Thursday, August 04, 2005

Deciphering a Monoalphabetic Substitution Cipher

A monoalphabetic substitution cipher is a method of encryption in which every letter of the plaintext is replaced by a corresponding ciphertext letter, using one fixed mapping for the whole message.

Deciphering a monoalphabetic substitution cipher is an interesting process. Let us try deciphering the following ciphertext.

QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

To distinguish the plaintext from the ciphertext, I write plaintext in lowercase and ciphertext in capitals.

The first thing I do when deciphering a monoalphabetic substitution cipher is use frequency analysis to guess the most frequent letters.

In standard English, letters occur in roughly this order of frequency:
E T A O I N S R H L D C U M F P G W Y B V K X J Q Z

with E the most frequent and Z the least.

Step 1: Count the occurrences of each letter in the ciphertext.
E-18 K-16 G-14 Z-9 L-8 Q-7 X-7 F-7 B-7 W-6 J-5 H-5 D-4 P-4 S-4 A-4 O-3 U-3 Y-3 C-2 I-1 R-1 M-1

E is the most frequent ciphertext letter, so it is likely to stand for 'e' in the plaintext.
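
The counting in step 1 and the "most frequent means 'e'" guess can be reproduced in a few lines of Python. This is a minimal sketch, assuming the ciphertext is stored in a string I call CIPHERTEXT; `letter_counts` and `rank_guess` are names of my own. Because `Counter.most_common` keeps first-seen order for equal counts, the printed line should match the counts above.

```python
from collections import Counter

CIPHERTEXT = """\
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ"""

# English letters from most to least frequent, in the order given in the post.
ENGLISH_ORDER = "ETAOINSRHLDCUMFPGWYBVKXJQZ"


def letter_counts(text: str) -> list[tuple[str, int]]:
    """(letter, count) pairs, most frequent first; ties keep first-seen order."""
    return Counter(ch for ch in text if ch.isalpha()).most_common()


counts = letter_counts(CIPHERTEXT)
print(" ".join(f"{letter}-{n}" for letter, n in counts))

# Naive first guess: pair the i-th most frequent cipher letter
# with the i-th most frequent English letter.
rank_guess = {c: p.lower() for (c, _), p in zip(counts, ENGLISH_ORDER)}
print(rank_guess["E"])  # e
```

Pairing ranks blindly gets only K ('t'), Z ('o') and U ('w') right against the final key, which is why the next steps lean on word shapes rather than raw frequencies.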

Step 2: Next, look for single-letter words in the ciphertext. Here there is one, E, which appears twice. A one-letter English word is almost always 'a' or 'I', so E is probably 'a' or 'i' rather than 'e', despite what frequency analysis suggested.
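
The single-letter-word check can be sketched the same way, again assuming the ciphertext is in a CIPHERTEXT string (the variable names are mine):

```python
from collections import Counter

CIPHERTEXT = """\
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ"""

# One-letter cipher words and how often each occurs.
one_letter = Counter(word for word in CIPHERTEXT.split() if len(word) == 1)
print(one_letter)  # Counter({'E': 2})

# A one-letter English word is almost always "a" or "I".
candidates = {word: {"a", "i"} for word in one_letter}
```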

Step 3: The second most frequent letter, K, is then the best candidate for 'e'. Substituting it into the word EKKEJS gives 'EeeEJS': a six-letter word whose first and fourth letters are the same, with 'ee' between them. No English word comes to mind with that shape, so K is unlikely to be 'e'.
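
One way to mechanize step 3 is to compare word patterns: number each letter by its first appearance, so EKKEJS becomes (0, 1, 1, 0, 2, 3), and keep only dictionary words with the same pattern. The sketch below assumes a tiny hypothetical word list; a real search would need a full dictionary.

```python
def pattern(word: str) -> tuple[int, ...]:
    """Number each letter by its first appearance, ignoring case:
    'EKKEJS' and 'attack' both give (0, 1, 1, 0, 2, 3)."""
    seen: dict[str, int] = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word.lower())


# A tiny hypothetical word list standing in for a real dictionary.
WORDS = ["attack", "effect", "street", "coffee", "sheets"]

same_shape = [w for w in WORDS if pattern(w) == pattern("EKKEJS")]
# If K were 'e', letters 2 and 3 of the word would both have to be 'e'.
with_k_as_e = [w for w in same_shape if w[1:3] == "ee"]
print(same_shape, with_k_as_e)  # ['attack', 'effect'] []
```

Both same-shaped words drop out once K is forced to be 'e'. As it turns out, EKKEJS does decode to 'attack' in the end.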

Step 4: Now try the third most frequent letter, G, as 'e', and substitute it into the ciphertext.

         e     e
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
                 e
QZX BEU WQ CDJKXLG
     e        ee
ZP KFG HEJSBKLGGKB HEJS EALDIFK
         e e    e               e       e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
                e   e              e
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

None of the partial words this produces looks impossible, so let us keep G = 'e'.
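
The two-row layout used in these steps, with each known plaintext letter directly above its cipher letter, is easy to generate. A minimal sketch, assuming a monospaced font and a key stored as a dict from cipher letter to plaintext letter (`interlinear` is a name of my own):

```python
CIPHERTEXT = """\
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ"""


def interlinear(text: str, key: dict[str, str]) -> list[str]:
    """For each cipher line, return a row with every known plaintext letter
    placed directly above its cipher letter, followed by the cipher line."""
    rows = []
    for line in text.splitlines():
        # Unknown letters (and the spaces between words) become blanks.
        rows.append("".join(key.get(ch, " ") for ch in line).rstrip())
        rows.append(line)
    return rows


print("\n".join(interlinear(CIPHERTEXT, {"G": "e"})))
```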

Step 5: Now look at the two-letter words containing 'e'. WG, which now reads 'We', occurs twice.

Some common two-letter words in English are as, am, an, at, be, do, hi, if, my, is, in, it, me, us, on and we. For the list of all two-letter English words, follow this link: http://en.wikipedia.org/wiki/List_of_two-letter_English_words

Pick out the words ending in 'e': be, me and we. So W could be 'b', 'm' or 'w'. Now note the word EW: W is the first letter of WG and the second letter of EW. Of the three candidates, only 'm' works in both positions: it is the first letter of 'me' and the last letter of 'am'. So WG should be 'me'. Let us substitute 'm' for W.

        me     e
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
        m        e
QZX BEU WQ CDJKXLG
     e        ee
ZP KFG HEJSBKLGGKB HEJS EALDIFK
         e e m  e               e       e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
                e  me  m        m  e
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ
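
The elimination in step 5 amounts to checking which candidate words are consistent with a partial key, where consistent means one-to-one: each cipher letter maps to a single plaintext letter and no plaintext letter is used twice. A sketch, assuming the two-letter word list from step 5; `consistent` and `candidates` are hypothetical helper names:

```python
def consistent(cipher: str, plain: str, key: dict[str, str]) -> bool:
    """True if `plain` could decrypt `cipher` under a one-to-one key
    that extends the partial `key` (cipher letter -> plaintext letter)."""
    if len(cipher) != len(plain):
        return False
    trial = dict(key)
    for c, p in zip(cipher, plain):
        if trial.setdefault(c, p) != p:  # c already maps to something else
            return False
    # No two cipher letters may share a plaintext letter.
    return len(set(trial.values())) == len(trial)


def candidates(cipher: str, key: dict[str, str], words: list[str]) -> list[str]:
    return [w for w in words if consistent(cipher, w, key)]


# The two-letter words listed in step 5.
TWO_LETTER = "as am an at be do hi if my is in it me us on we".split()
key = {"G": "e"}

for word in candidates("WG", key, TWO_LETTER):
    # Try W = first letter of the candidate and see whether EW still has a match.
    trial_key = {**key, "W": word[0]}
    print(word, candidates("EW", trial_key, TWO_LETTER))
```

It prints `be []`, `me ['am']` and `we []`: only 'me' leaves a word for EW.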

Step 6: Now look at the other two-letter words containing W. One is WQ, which we now know starts with 'm'. The only other two-letter word in the list that starts with 'm' is 'my', so Q should be 'y'. The other is EW, which ends with 'm'. The only common two-letter word ending in 'm' is 'am', so E should be 'a'. This agrees with step 2, where E had to be either 'a' or 'i'.

Let us substitute Q and E with 'y' and 'a'.

y       me  a  e    a  a   a   a  a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
y    a  my       e
QZX BEU WQ CDJKXLG
     e  a     ee    a   a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a   y    e e m  e   a       a   e  y  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
        y       e  me am       ama e   a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 7: Now consider the word QZX, which after substitution reads 'yZX'. Three-letter English words starting with 'y' include yes, yet and you. It cannot be yes or yet, because their second letter, 'e', is already taken by G. So it should be 'you'. For the list of all three-letter English words, follow this link: http://en.wikipedia.org/wiki/List_of_three-letter_English_words.

Let us substitute Z and X with 'o' and 'u' respectively.

you     me  a  e    a  a   a   a  a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you  a  my     u e
QZX BEU WQ CDJKXLG
o    e  a     ee    a   a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a   you  e e mo e   a   u   a   e  y  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 u   o  you  oo e  me am       ama e   a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ
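
Steps 6 and 7 apply the same consistency test repeatedly, each time extending the key only when exactly one word fits. A sketch of that loop, assuming the word lists from the post; `consistent` and `extend` are helper names of my own:

```python
def consistent(cipher: str, plain: str, key: dict[str, str]) -> bool:
    """True if `plain` fits `cipher` under a one-to-one extension of `key`."""
    if len(cipher) != len(plain):
        return False
    trial = dict(key)
    for c, p in zip(cipher, plain):
        if trial.setdefault(c, p) != p:
            return False
    return len(set(trial.values())) == len(trial)


def extend(key: dict[str, str], cipher: str, words: list[str]) -> dict[str, str]:
    """Add the letters of the single word in `words` that fits `cipher`.
    Raise instead of guessing when zero or several words fit."""
    fits = [w for w in words if consistent(cipher, w, key)]
    if len(fits) != 1:
        raise ValueError(f"{cipher}: expected exactly one fit, got {fits}")
    return {**key, **dict(zip(cipher, fits[0]))}


TWO_LETTER = "as am an at be do hi if my is in it me us on we".split()
THREE_Y = ["yes", "yet", "you"]  # three-letter words starting with 'y', from step 7

key = {"G": "e", "W": "m"}            # after step 5
key = extend(key, "WQ", TWO_LETTER)    # my  -> Q = y
key = extend(key, "EW", TWO_LETTER)    # am  -> E = a
key = extend(key, "QZX", THREE_Y)      # you -> Z = o, X = u
print(key)
```

The resulting key is G = 'e', W = 'm', Q = 'y', E = 'a', Z = 'o' and X = 'u', the same substitutions made by hand. With a larger word list several words may fit, and the helper deliberately raises an error rather than picking one.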

Step 8: Look at the word EWEMGY, which after substitution reads 'amaMeY'. Search a dictionary for six-letter words of the form ama?e?. 'amazed' fits best, so let us substitute M and Y with 'z' and 'd' respectively.

you     me  a  e    a  a   a   a  a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you  a  my     u e
QZX BEU WQ CDJKXLG
o    e  a     ee    a   a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a d you  e e mo e   a   u   a   e  y  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 u   o  you  oo ed me am       amazed  a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ
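
A dictionary search like the one in step 8 can be written as a regular expression built from the partial word: known letters stay literal, each unknown cipher letter becomes a character class that excludes plaintext letters already in use, and different unknowns must match different letters. A sketch, assuming a hypothetical stand-in word list; `to_regex` is my own name:

```python
import re
import string


def to_regex(partial: str, used: set[str]) -> re.Pattern:
    """Turn a partly decrypted word such as 'amaMeY' into a regex.

    Lowercase letters are known plaintext and must match literally. Each
    uppercase letter is an unknown cipher letter: it may not be a plaintext
    letter that is already in use, repeats of it must match the same letter,
    and different unknown letters must match different letters."""
    free = "[" + "".join(c for c in string.ascii_lowercase if c not in used) + "]"
    groups: dict[str, int] = {}  # unknown cipher letter -> capture group number
    parts = []
    for ch in partial:
        if ch.islower():
            parts.append(ch)
        elif ch in groups:
            parts.append(f"\\{groups[ch]}")  # same cipher letter, same match
        else:
            # Lookaheads forbid reusing any earlier unknown's letter.
            differ = "".join(f"(?!\\{n})" for n in groups.values())
            groups[ch] = len(groups) + 1
            parts.append(f"{differ}({free})")
    return re.compile("".join(parts))


# Plaintext letters already assigned in steps 4 to 7.
USED = {"e", "m", "y", "a", "o", "u"}
# A hypothetical stand-in for a dictionary.
WORDS = ["amazed", "amazes", "amazon", "amused", "amassed"]

rx = to_regex("amaMeY", USED)
print([w for w in WORDS if rx.fullmatch(w)])  # ['amazed', 'amazes']
```

Note that 'amazes' passes the regex too; the pattern only narrows the field, and the context of the lyric is what picks 'amazed'.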

Step 9: Look at the word EPY, which after substitution reads 'aPd'. For a clue to P, look at the word ZP, which reads 'oP'. The common two-letter words starting with 'o' are of, oh, on and or, so P should be 'f', 'h', 'n' or 'r'. Of these, only 'n' turns 'aPd' into a word: 'and'. So P should be 'n'. Let us substitute P with 'n'.

you     me  a  e    an a   a   a  a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you  a  my     u e
QZX BEU WQ CDJKXLG
on   e  a     ee    a   a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you  e e mo e   an  u   a   e  y  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 u   o  you  oo ed me am       amazed  a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ
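
Step 9 combines two independent clues about P, which maps naturally onto set intersection. A sketch, assuming small hand-picked word lists (they are illustrations, not complete lists):

```python
# Hand-picked word lists; these are illustrations, not complete lists.
TWO_O = ["of", "oh", "on", "or"]   # two-letter words of the form 'o?'
THREE_AD = ["add", "aid", "and"]   # three-letter words of the form 'a?d'

# Plaintext letters already taken by other cipher letters after step 8.
USED = {"e", "m", "y", "a", "o", "u", "z", "d"}

from_zp = {w[1] for w in TWO_O}      # P as suggested by ZP = 'oP'
from_epy = {w[1] for w in THREE_AD}  # P as suggested by EPY = 'aPd'
print((from_zp & from_epy) - USED)   # {'n'}
```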

Step 10: Look at the words KFEP and KFG, which after substitution read 'KFan' and 'KFe'. The only common words I can think of that fit both, sharing their first two letters, are 'than' and 'the'. Let us substitute K and F with 't' and 'h' respectively.

you h t me  a te  than a  ha   atta
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you  a  my    tu e
QZX BEU WQ CDJKXLG
on the  a   t eet   a   a    ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you  e e mo e than  u t a   etty  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 ut ho  you  oo ed me am  t    amazed  a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 11: Look at the words OEBKGL and WZLG, which after substitution read 'OaBteL' and 'moLe'. Both are followed by 'than', so they are probably comparatives ending in 'er': 'OaBter' and 'more'. So L should be 'r', which turns CLGKKQ into 'Cretty', i.e. 'pretty'. Let us substitute L and C with 'r' and 'p' respectively.

you h t me  a ter than a  har  atta
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you  a  my p  ture
QZX BEU WQ CDJKXLG
on the  a   treet   a   a r  ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you  ere more than  u t a pretty  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 ut ho  you  oo ed me am  t    amazed  a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 12: The word UGLG, which reads 'Uere', is preceded by 'you' and followed by 'more than'. It should be 'were'. Substitute U with 'w'.

you h t me  a ter than a  har  atta
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you  aw my p  ture
QZX BEU WQ CDJKXLG
on the  a   treet   a   a r  ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you were more than  u t a pretty  a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 ut how you  oo ed me am  t    amazed  a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 13: Now things move faster, and we can start making educated guesses. A few obvious ones: FDK, 'hDt', becomes 'hit'; OEBKGL, 'OaBter', becomes 'faster'; EKKEJS, 'attaJS', becomes 'attack'; and BEU, 'Baw', becomes 'saw'. Let us substitute D, O, B, J and S with 'i', 'f', 's', 'c' and 'k' respectively.

you hit me faster than a shark attack
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you saw my picture
QZX BEU WQ CDJKXLG
on the  ackstreets  ack a ri ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you were more than  ust a pretty face
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
 ut how you foo ed me am sti   amazed  a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 14: A few more obvious guesses: HEJSBKLGGKB, 'Hackstreets', becomes 'backstreets'; RXBK, 'Rust', becomes 'just'; OZZAGY, 'fooAed', becomes 'fooled'; and EALDIFK, 'aAriIht', becomes 'alright'. OEJG already reads 'face', which confirms O = 'f' from step 13. Substitute H, R, A and I with 'b', 'j', 'l' and 'g' respectively.

you hit me faster than a shark attack
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you saw my picture
QZX BEU WQ CDJKXLG
on the backstreets back alright
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you were more than just a pretty face
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
but how you fooled me am still amazed baby
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ


Finally it deciphers to:

you hit me faster than a shark attack
you saw my picture
on the backstreets back alright
and you were more than just a pretty face
but how you fooled me am still amazed baby.
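
As a final check, the recovered key can be applied to the whole ciphertext with Python's `str.maketrans` and `str.translate`. A minimal sketch; the KEY dict below is simply the substitutions from steps 4 to 14 collected in one place:

```python
CIPHERTEXT = """\
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ"""

# Cipher letter -> plaintext letter, collected from steps 4 to 14.
KEY = {
    "G": "e", "W": "m", "Q": "y", "E": "a", "Z": "o", "X": "u",
    "M": "z", "Y": "d", "P": "n", "K": "t", "F": "h", "L": "r",
    "C": "p", "U": "w", "D": "i", "O": "f", "B": "s", "J": "c",
    "S": "k", "H": "b", "R": "j", "A": "l", "I": "g",
}
# A monoalphabetic key must be one-to-one.
assert len(set(KEY.values())) == len(KEY)

plaintext = CIPHERTEXT.translate(str.maketrans(KEY))
print(plaintext)
```

N, T and V never appear in the ciphertext, and q, v and x never appear in the plaintext, so those parts of the key stay unknown.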

Tips and tricks:

  1. Use frequency analysis. In English the letters run roughly in this order: e t a o i n s r h l d c u m f p g w y b v k x j q z
  2. The first substitution is critical, so before committing to it, check the high-frequency candidate against every word it appears in.
  3. Look for single-letter words: they are almost always 'a' or 'I'.
  4. Use frequency analysis on digraphs too. Common English digraphs run roughly: TH HE IN ER ED AN ND AR RE EN ES TO NT EA OU NG ST AS RO AT
  5. After every substitution, look for clues in the two-letter and three-letter words.
  6. Use Wikipedia's lists of two-letter and three-letter words.
  7. Once the first letters of a word are known, look up dictionary words with the same starting letters and length.
  8. Once about half the letters are known, start making quick guesses.
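
Tip 4 can be applied to the ciphertext itself by counting adjacent letter pairs inside words. A sketch (the function name is mine; pairs are not allowed to span the space between two words):

```python
from collections import Counter

CIPHERTEXT = """\
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ"""


def digraph_counts(text: str) -> list[tuple[str, int]]:
    """Count adjacent letter pairs inside words, most frequent first."""
    pairs = Counter(
        word[i:i + 2] for word in text.split() for i in range(len(word) - 1)
    )
    return pairs.most_common()


print(digraph_counts(CIPHERTEXT)[:5])
```

The top pairs are LG (5), then QZ, ZX, BK and EJ (4 each). Under the final key LG reads 're' and BK reads 'st', both of which appear in the digraph list above.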
