Thursday, August 04, 2005

Deciphering Monoalphabetic substitution cipher

Monoalphabetic substitution cipher is a method of encryption where every letter of a plaintext is substituted with a corresponding ciphertext.

Deciphering a monoalphabetic substitution cipher is an interesting process. Let us try deciphering the following ciphertext.

QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

To differentiate between the plaintext and ciphertext, I am representing the former as all low case and the latter as all caps.

The first thing I would do in deciphering a monoalphabetic substitution is to use the frequency analysis to guess high frequency letters.

The alphabets’ frequency goes in this order in standard English,
E T A O I N S R H L D C U M F P G W Y B V K X J Q Z

with E having the highest and Z, the least frequency.

Step 1: So the first step is to count the occurrence of each letter in the enciphered text.
E-18 K-16 G-14 Z-9 L-8 Q-7 X-7 F-7 B-7 W-6 J-5 H-5 D-4 P-4 S-4 A-4 O-3 U-3 Y-3 C-2 I-1 R-1 M-1

E has the highest frequency, so it is likely to be ‘e’ in the plaintext too.

Step 2: Next step is to look for single letter word in the ciphertext. Here it is E. Hence E should be either 'a' or 'i' and not 'e' as per the frequency analysis.

Step 3: So the second highest frequency letter K is likely to be 'e'. But while substituting it in the word EKKEJS - EeeEJS, with a quick glance there is no six letter English word of this format with same first and fourth letters and 'ee' in between them. So K cannot be e.

Step 4: Now let us consider the third highest frequency letter G is likely to e. So let us try subsituting it in the ciphertext.

                       e            e
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
e
QZX BEU WQ CDJKXLG
e e e
ZP KFG HEJSBKLGGKB HEJS EALDIFK
e e e e e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
e e e
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

All the substitutions look fine.

Step 5: Now let us look at the two letter words with e. 'We' occurs twice.

as, am, an, at, be, do, hi, if, my, is, in, it, me, us, on and we are the meaningful two letter words in English. For the list of all two letter English words follow this link: http://en.wikipedia.org/wiki/List_of_two-letter_English_words

Pick out the words ending with 'e'. They are be, me and we. So W could be either of 'b', 'm' or 'w'. Also not the word 'EW'. It means 'W' is the first letter of 'WG' and second letter of 'EW'. From the list of two letter words only 'i' and 'm' has that characteristic, ie., 'm' is the last character of am and first character of me. Also as per 'WG', the word ends with 'e'. So 'WG' should be 'me'. So let us subsitute 'm' for 'W'.

                   me             e
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
m e
QZX BEU WQ CDJKXLG
e e e
ZP KFG HEJSBKLGGKB HEJS EALDIFK
e e m e e e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
e me m m e
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 6: Now let us look at the other two letter words made of 'm'. One is 'WQ' and we are sure that it starts with 'm'. The only other two letter word that starts with 'm' is my. So 'Q' should be 'y'. Other is 'EW' and we are sure that it ends with 'm'. The only two letter word that ends with 'm' is am. So 'E' should be 'a'. Also remember as per step 2, E can be either 'a' or 'i' also makes sense in this case.

Let us substitute Q and E with 'y' and 'a'.

y                 me     a      e         a    a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
y a my e
QZX BEU WQ CDJKXLG
e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a y e e m e a a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
y e me am ama e a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 7: Now let us consider the word 'QZX', which after substitution is 'yZX'. Pick out the three letter words starting with 'y' in standard English. I could think of yes, yet and you. It cannot be yes and yet, because 'e' is already substituted. So it should be you. For the list of all three letter English words follow this link: http://en.wikipedia.org/wiki/List_of_three-letter_English_words.

Let us substitute Z and X with 'o' and 'u' respectively.

 you            me    a      e          a    a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my u e
QZX BEU WQ CDJKXLG
o e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a you e e mo e a u a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
u o you oo e me am a ma e a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 8: Look at the word EWEMGY, after substitution 'amaMeY'. Look for a six letter word starting with 'ama' in the dictionary. Only 'amazed' fits in. So let us substitute M and Y with 'z' and 'd' respectively.

 you            me    a      e          a    a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my u e
QZX BEU WQ CDJKXLG
o e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a d you e e mo e a u a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
u o you oo ed me am a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 9: Look at the word EPY, after subsitution 'aPd'. Let us get some clue for the letter P. Look at the word ZP, after subsitution 'oP'. There are only two two-letter words starting with 'o' ie., on and or. So P should be either 'n' or 'r'. Substituting these with our earlier observation 'aPd', only 'and' makes sense. So P should be 'n'. Let us substitute P with 'n'.

 you            me    a      e          an  a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my u e
QZX BEU WQ CDJKXLG
on e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you e e mo e an u a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
u o you oo ed me am a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 10: Look at the words KFEP and KFG, after substitution 'KFan' and 'KFe'. I could only think of than and the to match them. Let us substitute K and F with 't' and 'h' respectively.

 you  h   t   me     a   t e     than   a    ha      a t t a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my tu e
QZX BEU WQ CDJKXLG
on the a t e e t a a ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you e e mo e than u t a e t t y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut ho you oo ed me am t a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 11: Look at the words OEBKGL and WZLG, after subsitution 'OaBteL' and 'moLe' respectively. They both are followed by than. So let us make them as comparative degree. They become 'OaBter' and 'more'. L should be 'r' and that makes CLGKKQ - 'Cretty' as pretty. Let us substitute L and C with 'r' and 'p' respectively.

 you  h   t   me     a   t e r  than   a    ha r    a t t a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my p t ur e
QZX BEU WQ CDJKXLG
on the a t r e e t a a r ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you e r e mor e than u t a pr e t t y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut ho you oo ed me am t a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 12: The word UGLG - 'Uere' is preceded by you and followed by more than. It shoudl be were. Substitute U with 'w'.

 you  h   t   me     a   t e r  than   a    ha r    a t t a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you aw my p t ur e
QZX BEU WQ CDJKXLG
on the a t r e e t a a r ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you w e r e mor e than u t a pr e t t y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut how you oo ed me am t a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEH

Step 13: Now things will move pretty faster.. let us get into guess works!.. Few obvious guesses.. FDK - 'hDt' becomes hit; OEBKGL - 'OaBter' becomes faster; EKKEJS - 'attaJS' becomes attack and BEU - 'Baw' becomes saw. Let us substitute D, O, B, J, S and B with 'i','f','s','c','k' and 's' respectively.

 you  h i t   me  f a s t e r  than   a  sh ark  a t t ack
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you s aw my pi c t ur e
QZX BEU WQ CDJKXLG
on the ack st r e e t s a ck a r i ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you w e r e mor e than us t a pr e t t y ac e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut how you oo ed me am s t i a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 14: Few more obvious guesses.. HEJSBKLGGKB - 'Hackstreets' becomes backstreets; RXBK - 'Rust' becomes just; OEJG - 'Oace' becomes face; OZZAGY - 'fooAed' becomes fooled; and EALDIFK - 'alriIht' becomes alright. Substitute H, R, O, A and I with 'b', 'j', 'f', 'l' and 'g' respectively.

 you  h i t   me  f a s t e r  than   a  sh ark  a t t ack
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you s aw my pi c t ur e
QZX BEU WQ CDJKXLG
on the b ack st r e e t s ba ck a l r i ght
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you w e r e mor e than j u s t a pr e t t y f a c e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
bu t how you f o o l e d me am s t i l l a ma z e d ba b y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ


Finally it deciphers to:

you hit me faster than a shark attack
you saw my picture
on the backstreets back alright
and you were more than just a pretty face
but how you fooled me am still amazed baby.

Tips and tricks:

  1. Use frequency analysis - the order goes as.. e t a o i n s r h l d c u m f p g w y b v k x j q z
  2. First substitution is critical, so cross check the high frequency letter with all possible word formations in the ciphertext, before substituting.
  3. Look for single letter word - it should be either A or I.
  4. Use frequency analysis for digraphs - the order goes as.. TH HE IN ER ED AN ND AR RE EN ES TO NT EA OU NG ST AS RO AT
  5. After every substitution, look for the clue from two letter and three letter words.
  6. Use the list of two letter and three letter words list in wikipedia.
  7. When the starting letters of a word are determined, look for the words with the same starting letters and word length in the dictionary.
  8. Once 50% complete, make quick guesses.

Monday, July 04, 2005

TUNES project

These days I had been surfing a lot and read any technical content that comes to my notice. This morning I was browsing for some interesting sites on operating systems and just happened to end up with this website. http://cliki.tunes.org/index

TUNES is a recursive acronym for TUNES is a Useful, Nevertheless Expedient, System.Based on my understanding of this project, their feeling is that there is a growing gap between the ever-growing hardware potential, and the slowly enhancing software realizations. Their ultimate goal is to rethink how computer systems should be to fill this gap. You can read more about this project here.

What attracted me more about this site are their reviews on the existing systems and methodologies. These include operating systems, programming languages, virtual machines, user interface, methods of reflection and migration.

I was simply amazed by the list of existing operating systems and programming languages they have mentioned. I wonder that such a huge numbers of them exist. It was really an eye-opener to me. To my knowledge, I thought and could number some 6 operating systems and 15 programming languages.

I hope that these pages would be a good reading for anyone.
Reviews on existing operating systems :
http://cliki.tunes.org/Operating%20Systems
Reviews on existing programming languages :
http://cliki.tunes.org/Programming%20Languages

The above links are must see for those interested in operating systems and programming languages.

Sunday, April 10, 2005

How function calls work?

This is the first time am peeking into low-level of programming. This topic is not part of my curriculum but had to explore this in order to understand few advanced features of functional programming language. I'll detail about those features in my future blog.

Function, is just a piece of code which can be referred using a name. The computer stores the data of the function (such as arguments, local variables, return address etc.,) when a function is invoked and destroys them when the function returns. Similarly when a function calls other functions, the caller function returns only after all the callee functions are returned. We can infer a LIFO (last-in-first-out) scenario in this, ie., the function that is invoked last completes first and the one invoked first completes last. So the stack becomes an obvious choice for function calls implementation.

In the memory, the stack grows from the higher memory address to the lower memory addresses. So we have to visualize an inverted stack. Have look at the below diagram.


Stack frame


Let me brief through all the terminologies before we get into the working.

Stack frame - The section of memory where the local variables, arguments, return address and other information of a function are stored, is called stack frame or activation record.

Stack pointer - For function calls, a bulk of data is pushed and popped from the memory, whenever a function is invoked and returned respectively. But we may need to access the data which is deep into the stack and it will be inefficient to pop off all the data to do that. Hence, unlike conventional stack implementation, we have a register called stack pointer that points to top of the stack such that all the addresses smaller than the address to which the stack pointer points are considered as garbage and all the address larger are considered as valid.

Stack bottom - It is the highest valid address of the stack. When the stack is initialized the stack pointer points to stack bottom.

Stack limit - It is the smallest valid address of the stack. When the stack pointer goes below this address, then there's stack overflow.

Frame pointer - When a new function is invoked the frame pointer points to where the stack pointer was and when the function returns the stack pointer points back to where the frame pointer is.

Return address - It points to address (within the caller function) to which the control should pass when the callee (current) function returns.

Static link - This comes into picture for the programming languages that support nested functions. When a function
(nested function) is defined within another function (enclosing function), the nested function may need to access the variables of the enclosing function. Therefore, the nested function's stack frame should be able to access the enclosing function's frame. This made possible by static link which points to the address of the enclosing function's frame.

How it works?

The working of stack differs with different programming languages. Programming languages like C, doesn't support nested functions and they dont need static link. Programming lanuages like Pascal support nested functions and they use static link. There are programming languages like ML and Scheme which support both nested functions and function-valued variables (ie., a function returned as a result of another function). I am not aware of how these function calls work. I have discussed the working only for programming languages that support nested functions.

Consider a pseudo-C code: (This doesn't work in C. Just using a comfortable syntax.)

int foo (int arg1)
{
   int f (int arg2)
   {
      int i=10;
      return arg2 * i;
   }
   int g(int arg3)
   {
      return f(arg3);
   }
   return g(arg1);
}

f and g are nested functions of foo. foo invokes g which inturn invokes f.


Stack frame - working

When the function foo invokes function g, the argument arg3 is pushed into the stack. A new section memory is allocated for the function g and the stack pointer points to the highest unused address. The frame pointer points to where stack pointer was. The old frame pointer is pushed ie., 1. The static link
which points to the address of the enclosing function foo is pushed, ie., 1. The return address is pushed ie., to foo.

Then when the function f is invoked the argument arg2 is pushed. Now the frame pointer points to where the stack pointer was. The old FP is pushed ie., 3. The static link pointing to foo is pushed ie., 1. The return address is pushed ie., to g. Then the local variable i is pushed.

When the function f returns. The stack pointer points to where stack pointer was and the frame pointer points to the address stored as old FP in f's stack frame.

This simply doesn't work with languages like ML and scheme because in those, the local variables should not be destroyed while the function returns! as they may be required for the nested callee function.

How recursive functions work?

The recursive functions work in same way as the other functions. For every recursive call a new stack frame of the function is created. But this avoided in tail recursive calls in functional languages. In tail recursive call, each recursive call is made to loop to the same stack frame.


Stack overflow

When the stack pointer goes below the stack limit, then there's stack overflow. The most common cause for stack overflow is the infinite loop in the recursive function calls. Imagine if a function is invoked recursively for infinite times, a stack frame gets appended for every recursive call and at one point the stack pointer goes below the stack limit and thus a stack overflow is caused.



References:
1. Understanding the stack.
2. Modern compiler implementation in ML by Andrew W. Appel

Sunday, March 06, 2005

Abstract data types vs Data structures

C++ data structures is another course am learning this semester. This course is a pre-requisite for all non-Computer Science students. I have already learnt and implemented all the C++ data structures in a private course during my undergraduate studies, but not with the same understanding am learning them these days. Frankly I dont know much about data structures during my work experience. So I consider this course as an opportunity to get my hands on C++ again and implement various datastructures.

First thing I got to know is.. the difference between the abstract data types (ADT) and datastructures. Earlier I thought they are same.

Before starting that, we need to be clear about the logical and implementation level of data. Let us take an example of a simple built-in data type integer. At the logic level, we application programmers know only about what are the operations an integer data type can perform, ie., addition, subtraction etc., But we are no way aware of how the data type is actually implemented. At the implementation level, it is about how the data type integer is implemented in the machine level, ie., it could either of binary-coded decimal, unsigned binary, sign-and-magnitude binary, One's complement and Two's complement notation.

Now.. for the understanding the ADT and data structure, we need to assume a higher level abstraction where we have the built-in types at the implementation level.

To put it simple, ADT is a logical description and data structure is concrete. ADT is the logical picture of the data and the operations to manipulate the component elements of the data. Data structure is the actual representation of the data during the implementation and the algorithms to manipulate the data elements. ADT is in the logical level and data structure is in the implementation level.

As you can see, ADT is implementation independent. For example, it only describes what a data type List consists (data) and what are the operations it can perform, but it has no information about how the List is actually implemented.

Whereas data structure is implementation dependent, as in the same example, it is about how the List implemented ie., using array or linked list. Ultimately, data structure is how we implement the data in an abstract data type.

This is something new of what I have learnt.

Tuesday, March 01, 2005

got to know.. 'Leda'

I just came across a new programming language called Leda when I was reading the book 'Thinking in C++' by Bruce Eckel. What is so unique about Leda is, it is a multiparadigm programming language. Multiparadigm, I mean, has all the paradigms of programming languages implemented in it ie., imperative, object-oriented, functional and logic!

As we all know that these abtraction of programming languages were basically to address the problem needs to be solved by the computer. As in imperative languages the problem space and solution space were different, so it became tedious for a programmer to map the spaces with each other. And then came the better abstraction, object-oriented programming. Here the problem space and solution space looked similar! like.. all were treated as objects ie., the abstraction was catered in a way that it fits the real world. Later we had functional and logic paradigms which had their own conceptualization of the real world ie., as 'lists' in LISP and 'algorithms' in Prolog. So there came a point that every paradigm best suits only for the particular kind of problem that it was catered for.

So Leda was designed to have all the paradigms in one language and the programmer can chose the paradigm based on which will best suit his problem to be solved. It was founded by Timothy Budd. There is a book named 'Multiparadigm Programming in Leda' authored by the founder himself. I tried finding a free soft copy of this book to learn more but in vain. But I got a research paper on this topic from my univ's 'Adaptive Systems and Languages group'. Will post more when I get time to read that paper.

Thursday, January 20, 2005

Paradigms of languages

In the first lecture, I learnt about the different paradigms of prg. languages. This is not new to us but generally we classify them as low-level, structural and object oriented languages, and what I learnt is a way forward. Have a look at it.

The taxonomy is provided based on the nature of program and data in these paradigms. The taxonomy is based on my instructor, Dr. Mattox A. Beckman's views.


1. Low level languages
Program : series of control signals for a CPU.
Data : integers, memory address, IEEE floating point.
eg: Assembly languages and microcode

2. Imperative languages
Program : list of commands to be executed.
Data : low level types (integers) and composite types (records and arrays)
eg: C, BASIC, Fortran, Forth and Pascal.
Fortran is still the best language to do intensive mathematic calculations. It has advantages over Matlab.

3. Data encapsulation languages
Program : list of commands to be executed.
Data : low level types (integers), composite types and we can hide details from the user.
eg: Clu, Modula 2, Ada

4. Object oriented languages
Program : message between objects
Data : object (functions with state)
eg: Smalltalk, Java, C++
C++ and Java were actually inspired by Smalltalk which should have been the web language, - what Java has become. But they didn't see the opportunity at the right time.

5. Functional languages
Program : an expression to evaluate.
Data : low level types and higher order types (the functions)
eg: LISP, Scheme, ML

6. Logic programming languages
Program : a logical predicate to satisfy.
Data : set of assertions about what we know to be true.
eg: Prolog

So from this taxonomy am already introduced to four of the paradigms. I learnt BASIC during my schooling days and have done few coding in assembly level language during my undergraduate studies. I have also learnt C and C++ in a private course. Then I learnt and worked in Java during my work experience.

Still there are two more paradigms unexplored and I'll learn them in this course (functional and logic programming languages). Hope that should provide me with the well-rounded understanding of the different paradigms, I strive for.

Wednesday, January 19, 2005

Programming languages and translators

One of my major reasons to opt for graduate studies is to learn the science behind programming languages. This thought blossomed 4 the first time when I was attending a training class in Java in my prev. organization. My tutor was Mr. Sanjay Kumar Mitra, a very nice gentle man with sound knowledge in Java. When he was talking about objects and primitive datatypes like 'int', the similarity in their implementation in Java and the fact that you can't do everything with an 'int' that you do with an object, added more mystery.. I wondered how the designer of the language would have designed it and whatz the science behind it?

With my experience as a software engineer, I felt that every software developer should have a good understanding of the different paradigms of progamming languages. Because this is an ever-growing field and it becomes compelling for us to be updated with the technical advancements thoughout our career. The understanding of the paradigms and the root of their conceptualization, is the key to adapt to any new programming language down the line.

Now.. I hope this course 'Programming languages and translators' will address that. In this blog I'll give an introduction abt what I'll learn through this course.

Yep.. so this is what I'll learn from this course.
  1. The major classes of programming languages.
  2. How to specify formally the meaning of a language - to people and to the computer. (type derivation etc.,)
  3. How to design and implement a language?
  4. Three powerful concepts - Abstraction, Recursion and Transformation.
Yeah! I am going to learn how to design and implement a language.. now doesn't that sound cool?

and basically there are three major themes in this course.
  1. How do we describe a language to ourselves?
  2. How do we describe a language to the computer? - how the computer understands what we typed in? (parsing)
  3. Once the computer has understood, how it does what we said? (interpreting and compiling)
and I am gonna learn all these through a new language called 'OCaml' - Objective Caml. Itz a functional language .

so.. now comes the question, why a functional language? and why not an object oriented programming language which is predominant in the market?

Because functional languages are step ahead and have all the features of procedural and OOP languages. So learning through this language, we can conceptualize all other paradigms.

Ground zero..

This is my first blog and the purpose is to share whatever I learn in the field of computer science.. or might add even more in the career perspective.

Why blogging? It will help me to express myself better and also logging my step-by-step learning curve will help me in the long run to evaluate myself. Apart from that my intention is to share the knowledge I gain and keep this blog as an informative resource to the reader.

and with this short note am starting off..