Friday, May 12, 2006

XML Encryption

XML Encryption allows hiding all or part of the XML document (confidentiality) from anyone other than the private key holder.

XML encryption is implemented using shared key (symmetric) cryptography. In shared key cryptography, the data is encrypted and decrypted by the same key. The challenge lies in transporting the shared key to the recipient. This one challenge in shared key cryptography, can simply makes its purpose meaningless. Because any intruder who receives the shared key and simply encrypt the message which is supposed to be confidential to the intended recipient. So the critical aspect is to secure the shared key itself.

This challenge is handled in XML encryption using public key (asymmetric) cryptography. The shared key is encrypted using the recipient's public key and sent to the recipient along with the encrypted data. Notice that both the data and key are encrypted, but both are encrypted using different keys. Data by shared key and key by recipient's public key. Hence the recipient will first decrypt the encrypted key using its private key and retrieve the shared key. Then it uses the shared key to decrypt the encrypted data.

Thus the shared key is transferred securely and the data's confidentiality is also maintained.

It is worth to discuss reg. the symmetric and asymmetric cryptographies. In asymmetric cryptography there is no need to distribute the encryption key. The public key of the recipient is published and the sender encrypts the data using the public key; and the private key to decrypt the message is possessed only by the recipient. But in asymmetric cryptography, the encryption using public key and the decryption using the private key are much slower when compared to symmetric cryptography. This performance issue becomes more critical when the data is big. Hence it is a common practice to combine the advantages of both the cryptographies. The encryption and decryption is done using the shared key as in symmetric cryptography and the distribution of the shared key is done using asymmetric cryptography. As only the shared key is processed using asymmetric cryptography, the performance issue is neglible when compared to processing the whole data. This same concept is implemented in XML encryption.

Thursday, August 04, 2005

Deciphering Monoalphabetic substitution cipher

Monoalphabetic substitution cipher is a method of encryption where every letter of a plaintext is substituted with a corresponding ciphertext.

Deciphering a monoalphabetic substitution cipher is an interesting process. Let us try deciphering the following ciphertext.

QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
QZX BEU WQ CDJKXLG
ZP KFG HEJSBKLGGKB HEJS EALDIFK
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

To differentiate between the plaintext and ciphertext, I am representing the former as all low case and the latter as all caps.

The first thing I would do in deciphering a monoalphabetic substitution is to use the frequency analysis to guess high frequency letters.

The alphabets’ frequency goes in this order in standard English,
E T A O I N S R H L D C U M F P G W Y B V K X J Q Z

with E having the highest and Z, the least frequency.

Step 1: So the first step is to count the occurrence of each letter in the enciphered text.
E-18 K-16 G-14 Z-9 L-8 Q-7 X-7 F-7 B-7 W-6 J-5 H-5 D-4 P-4 S-4 A-4 O-3 U-3 Y-3 C-2 I-1 R-1 M-1

E has the highest frequency, so it is likely to be ‘e’ in the plaintext too.

Step 2: Next step is to look for single letter word in the ciphertext. Here it is E. Hence E should be either 'a' or 'i' and not 'e' as per the frequency analysis.

Step 3: So the second highest frequency letter K is likely to be 'e'. But while substituting it in the word EKKEJS - EeeEJS, with a quick glance there is no six letter English word of this format with same first and fourth letters and 'ee' in between them. So K cannot be e.

Step 4: Now let us consider the third highest frequency letter G is likely to e. So let us try subsituting it in the ciphertext.

                       e            e
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
e
QZX BEU WQ CDJKXLG
e e e
ZP KFG HEJSBKLGGKB HEJS EALDIFK
e e e e e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
e e e
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

All the substitutions look fine.

Step 5: Now let us look at the two letter words with e. 'We' occurs twice.

as, am, an, at, be, do, hi, if, my, is, in, it, me, us, on and we are the meaningful two letter words in English. For the list of all two letter English words follow this link: http://en.wikipedia.org/wiki/List_of_two-letter_English_words

Pick out the words ending with 'e'. They are be, me and we. So W could be either of 'b', 'm' or 'w'. Also not the word 'EW'. It means 'W' is the first letter of 'WG' and second letter of 'EW'. From the list of two letter words only 'i' and 'm' has that characteristic, ie., 'm' is the last character of am and first character of me. Also as per 'WG', the word ends with 'e'. So 'WG' should be 'me'. So let us subsitute 'm' for 'W'.

                   me             e
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
m e
QZX BEU WQ CDJKXLG
e e e
ZP KFG HEJSBKLGGKB HEJS EALDIFK
e e m e e e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
e me m m e
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 6: Now let us look at the other two letter words made of 'm'. One is 'WQ' and we are sure that it starts with 'm'. The only other two letter word that starts with 'm' is my. So 'Q' should be 'y'. Other is 'EW' and we are sure that it ends with 'm'. The only two letter word that ends with 'm' is am. So 'E' should be 'a'. Also remember as per step 2, E can be either 'a' or 'i' also makes sense in this case.

Let us substitute Q and E with 'y' and 'a'.

y                 me     a      e         a    a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
y a my e
QZX BEU WQ CDJKXLG
e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a y e e m e a a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
y e me am ama e a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 7: Now let us consider the word 'QZX', which after substitution is 'yZX'. Pick out the three letter words starting with 'y' in standard English. I could think of yes, yet and you. It cannot be yes and yet, because 'e' is already substituted. So it should be you. For the list of all three letter English words follow this link: http://en.wikipedia.org/wiki/List_of_three-letter_English_words.

Let us substitute Z and X with 'o' and 'u' respectively.

 you            me    a      e          a    a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my u e
QZX BEU WQ CDJKXLG
o e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a you e e mo e a u a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
u o you oo e me am a ma e a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 8: Look at the word EWEMGY, after substitution 'amaMeY'. Look for a six letter word starting with 'ama' in the dictionary. Only 'amazed' fits in. So let us substitute M and Y with 'z' and 'd' respectively.

 you            me    a      e          a    a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my u e
QZX BEU WQ CDJKXLG
o e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
a d you e e mo e a u a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
u o you oo ed me am a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 9: Look at the word EPY, after subsitution 'aPd'. Let us get some clue for the letter P. Look at the word ZP, after subsitution 'oP'. There are only two two-letter words starting with 'o' ie., on and or. So P should be either 'n' or 'r'. Substituting these with our earlier observation 'aPd', only 'and' makes sense. So P should be 'n'. Let us substitute P with 'n'.

 you            me    a      e          an  a       a      a      a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my u e
QZX BEU WQ CDJKXLG
on e a e e a a
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you e e mo e an u a e y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
u o you oo ed me am a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 10: Look at the words KFEP and KFG, after substitution 'KFan' and 'KFe'. I could only think of than and the to match them. Let us substitute K and F with 't' and 'h' respectively.

 you  h   t   me     a   t e     than   a    ha      a t t a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my tu e
QZX BEU WQ CDJKXLG
on the a t e e t a a ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you e e mo e than u t a e t t y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut ho you oo ed me am t a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 11: Look at the words OEBKGL and WZLG, after subsitution 'OaBteL' and 'moLe' respectively. They both are followed by than. So let us make them as comparative degree. They become 'OaBter' and 'more'. L should be 'r' and that makes CLGKKQ - 'Cretty' as pretty. Let us substitute L and C with 'r' and 'p' respectively.

 you  h   t   me     a   t e r  than   a    ha r    a t t a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you a my p t ur e
QZX BEU WQ CDJKXLG
on the a t r e e t a a r ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you e r e mor e than u t a pr e t t y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut ho you oo ed me am t a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 12: The word UGLG - 'Uere' is preceded by you and followed by more than. It shoudl be were. Substitute U with 'w'.

 you  h   t   me     a   t e r  than   a    ha r    a t t a
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you aw my p t ur e
QZX BEU WQ CDJKXLG
on the a t r e e t a a r ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you w e r e mor e than u t a pr e t t y a e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut how you oo ed me am t a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEH

Step 13: Now things will move pretty faster.. let us get into guess works!.. Few obvious guesses.. FDK - 'hDt' becomes hit; OEBKGL - 'OaBter' becomes faster; EKKEJS - 'attaJS' becomes attack and BEU - 'Baw' becomes saw. Let us substitute D, O, B, J, S and B with 'i','f','s','c','k' and 's' respectively.

 you  h i t   me  f a s t e r  than   a  sh ark  a t t ack
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you s aw my pi c t ur e
QZX BEU WQ CDJKXLG
on the ack st r e e t s a ck a r i ht
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you w e r e mor e than us t a pr e t t y ac e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
ut how you oo ed me am s t i a ma z e d a y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ

Step 14: Few more obvious guesses.. HEJSBKLGGKB - 'Hackstreets' becomes backstreets; RXBK - 'Rust' becomes just; OEJG - 'Oace' becomes face; OZZAGY - 'fooAed' becomes fooled; and EALDIFK - 'alriIht' becomes alright. Substitute H, R, O, A and I with 'b', 'j', 'f', 'l' and 'g' respectively.

 you  h i t   me  f a s t e r  than   a  sh ark  a t t ack
QZX FDK WG OEBKGL KFEP E BFELS EKKEJS
you s aw my pi c t ur e
QZX BEU WQ CDJKXLG
on the b ack st r e e t s ba ck a l r i ght
ZP KFG HEJSBKLGGKB HEJS EALDIFK
and you w e r e mor e than j u s t a pr e t t y f a c e
EPY QZX UGLG WZLG KFEP RXBK E CLGKKQ OEJG
bu t how you f o o l e d me am s t i l l a ma z e d ba b y
HXK FZU QZX OZZAGY WG EW BKDAA EWEMGY HEHQ


Finally it deciphers to:

you hit me faster than a shark attack
you saw my picture
on the backstreets back alright
and you were more than just a pretty face
but how you fooled me am still amazed baby.

Tips and tricks:

  1. Use frequency analysis - the order goes as.. e t a o i n s r h l d c u m f p g w y b v k x j q z
  2. First substitution is critical, so cross check the high frequency letter with all possible word formations in the ciphertext, before substituting.
  3. Look for single letter word - it should be either A or I.
  4. Use frequency analysis for digraphs - the order goes as.. TH HE IN ER ED AN ND AR RE EN ES TO NT EA OU NG ST AS RO AT
  5. After every substitution, look for the clue from two letter and three letter words.
  6. Use the list of two letter and three letter words list in wikipedia.
  7. When the starting letters of a word are determined, look for the words with the same starting letters and word length in the dictionary.
  8. Once 50% complete, make quick guesses.

Monday, July 04, 2005

TUNES project

These days I had been surfing a lot and read any technical content that comes to my notice. This morning I was browsing for some interesting sites on operating systems and just happened to end up with this website. http://cliki.tunes.org/index

TUNES is a recursive acronym for TUNES is a Useful, Nevertheless Expedient, System.Based on my understanding of this project, their feeling is that there is a growing gap between the ever-growing hardware potential, and the slowly enhancing software realizations. Their ultimate goal is to rethink how computer systems should be to fill this gap. You can read more about this project here.

What attracted me more about this site are their reviews on the existing systems and methodologies. These include operating systems, programming languages, virtual machines, user interface, methods of reflection and migration.

I was simply amazed by the list of existing operating systems and programming languages they have mentioned. I wonder that such a huge numbers of them exist. It was really an eye-opener to me. To my knowledge, I thought and could number some 6 operating systems and 15 programming languages.

I hope that these pages would be a good reading for anyone.
Reviews on existing operating systems :
http://cliki.tunes.org/Operating%20Systems
Reviews on existing programming languages :
http://cliki.tunes.org/Programming%20Languages

The above links are must see for those interested in operating systems and programming languages.

Sunday, April 10, 2005

How function calls work?

This is the first time am peeking into low-level of programming. This topic is not part of my curriculum but had to explore this in order to understand few advanced features of functional programming language. I'll detail about those features in my future blog.

Function, is just a piece of code which can be referred using a name. The computer stores the data of the function (such as arguments, local variables, return address etc.,) when a function is invoked and destroys them when the function returns. Similarly when a function calls other functions, the caller function returns only after all the callee functions are returned. We can infer a LIFO (last-in-first-out) scenario in this, ie., the function that is invoked last completes first and the one invoked first completes last. So the stack becomes an obvious choice for function calls implementation.

In the memory, the stack grows from the higher memory address to the lower memory addresses. So we have to visualize an inverted stack. Have look at the below diagram.


Stack frame


Let me brief through all the terminologies before we get into the working.

Stack frame - The section of memory where the local variables, arguments, return address and other information of a function are stored, is called stack frame or activation record.

Stack pointer - For function calls, a bulk of data is pushed and popped from the memory, whenever a function is invoked and returned respectively. But we may need to access the data which is deep into the stack and it will be inefficient to pop off all the data to do that. Hence, unlike conventional stack implementation, we have a register called stack pointer that points to top of the stack such that all the addresses smaller than the address to which the stack pointer points are considered as garbage and all the address larger are considered as valid.

Stack bottom - It is the highest valid address of the stack. When the stack is initialized the stack pointer points to stack bottom.

Stack limit - It is the smallest valid address of the stack. When the stack pointer goes below this address, then there's stack overflow.

Frame pointer - When a new function is invoked the frame pointer points to where the stack pointer was and when the function returns the stack pointer points back to where the frame pointer is.

Return address - It points to address (within the caller function) to which the control should pass when the callee (current) function returns.

Static link - This comes into picture for the programming languages that support nested functions. When a function
(nested function) is defined within another function (enclosing function), the nested function may need to access the variables of the enclosing function. Therefore, the nested function's stack frame should be able to access the enclosing function's frame. This made possible by static link which points to the address of the enclosing function's frame.

How it works?

The working of stack differs with different programming languages. Programming languages like C, doesn't support nested functions and they dont need static link. Programming lanuages like Pascal support nested functions and they use static link. There are programming languages like ML and Scheme which support both nested functions and function-valued variables (ie., a function returned as a result of another function). I am not aware of how these function calls work. I have discussed the working only for programming languages that support nested functions.

Consider a pseudo-C code: (This doesn't work in C. Just using a comfortable syntax.)

int foo (int arg1)
{
   int f (int arg2)
   {
      int i=10;
      return arg2 * i;
   }
   int g(int arg3)
   {
      return f(arg3);
   }
   return g(arg1);
}

f and g are nested functions of foo. foo invokes g which inturn invokes f.


Stack frame - working

When the function foo invokes function g, the argument arg3 is pushed into the stack. A new section memory is allocated for the function g and the stack pointer points to the highest unused address. The frame pointer points to where stack pointer was. The old frame pointer is pushed ie., 1. The static link
which points to the address of the enclosing function foo is pushed, ie., 1. The return address is pushed ie., to foo.

Then when the function f is invoked the argument arg2 is pushed. Now the frame pointer points to where the stack pointer was. The old FP is pushed ie., 3. The static link pointing to foo is pushed ie., 1. The return address is pushed ie., to g. Then the local variable i is pushed.

When the function f returns. The stack pointer points to where stack pointer was and the frame pointer points to the address stored as old FP in f's stack frame.

This simply doesn't work with languages like ML and scheme because in those, the local variables should not be destroyed while the function returns! as they may be required for the nested callee function.

How recursive functions work?

The recursive functions work in same way as the other functions. For every recursive call a new stack frame of the function is created. But this avoided in tail recursive calls in functional languages. In tail recursive call, each recursive call is made to loop to the same stack frame.


Stack overflow

When the stack pointer goes below the stack limit, then there's stack overflow. The most common cause for stack overflow is the infinite loop in the recursive function calls. Imagine if a function is invoked recursively for infinite times, a stack frame gets appended for every recursive call and at one point the stack pointer goes below the stack limit and thus a stack overflow is caused.



References:
1. Understanding the stack.
2. Modern compiler implementation in ML by Andrew W. Appel

Sunday, March 06, 2005

Abstract data types vs Data structures

C++ data structures is another course am learning this semester. This course is a pre-requisite for all non-Computer Science students. I have already learnt and implemented all the C++ data structures in a private course during my undergraduate studies, but not with the same understanding am learning them these days. Frankly I dont know much about data structures during my work experience. So I consider this course as an opportunity to get my hands on C++ again and implement various datastructures.

First thing I got to know is.. the difference between the abstract data types (ADT) and datastructures. Earlier I thought they are same.

Before starting that, we need to be clear about the logical and implementation level of data. Let us take an example of a simple built-in data type integer. At the logic level, we application programmers know only about what are the operations an integer data type can perform, ie., addition, subtraction etc., But we are no way aware of how the data type is actually implemented. At the implementation level, it is about how the data type integer is implemented in the machine level, ie., it could either of binary-coded decimal, unsigned binary, sign-and-magnitude binary, One's complement and Two's complement notation.

Now.. for the understanding the ADT and data structure, we need to assume a higher level abstraction where we have the built-in types at the implementation level.

To put it simple, ADT is a logical description and data structure is concrete. ADT is the logical picture of the data and the operations to manipulate the component elements of the data. Data structure is the actual representation of the data during the implementation and the algorithms to manipulate the data elements. ADT is in the logical level and data structure is in the implementation level.

As you can see, ADT is implementation independent. For example, it only describes what a data type List consists (data) and what are the operations it can perform, but it has no information about how the List is actually implemented.

Whereas data structure is implementation dependent, as in the same example, it is about how the List implemented ie., using array or linked list. Ultimately, data structure is how we implement the data in an abstract data type.

This is something new of what I have learnt.

Tuesday, March 01, 2005

got to know.. 'Leda'

I just came across a new programming language called Leda when I was reading the book 'Thinking in C++' by Bruce Eckel. What is so unique about Leda is, it is a multiparadigm programming language. Multiparadigm, I mean, has all the paradigms of programming languages implemented in it ie., imperative, object-oriented, functional and logic!

As we all know that these abtraction of programming languages were basically to address the problem needs to be solved by the computer. As in imperative languages the problem space and solution space were different, so it became tedious for a programmer to map the spaces with each other. And then came the better abstraction, object-oriented programming. Here the problem space and solution space looked similar! like.. all were treated as objects ie., the abstraction was catered in a way that it fits the real world. Later we had functional and logic paradigms which had their own conceptualization of the real world ie., as 'lists' in LISP and 'algorithms' in Prolog. So there came a point that every paradigm best suits only for the particular kind of problem that it was catered for.

So Leda was designed to have all the paradigms in one language and the programmer can chose the paradigm based on which will best suit his problem to be solved. It was founded by Timothy Budd. There is a book named 'Multiparadigm Programming in Leda' authored by the founder himself. I tried finding a free soft copy of this book to learn more but in vain. But I got a research paper on this topic from my univ's 'Adaptive Systems and Languages group'. Will post more when I get time to read that paper.

Thursday, January 20, 2005

Paradigms of languages

In the first lecture, I learnt about the different paradigms of prg. languages. This is not new to us but generally we classify them as low-level, structural and object oriented languages, and what I learnt is a way forward. Have a look at it.

The taxonomy is provided based on the nature of program and data in these paradigms. The taxonomy is based on my instructor, Dr. Mattox A. Beckman's views.


1. Low level languages
Program : series of control signals for a CPU.
Data : integers, memory address, IEEE floating point.
eg: Assembly languages and microcode

2. Imperative languages
Program : list of commands to be executed.
Data : low level types (integers) and composite types (records and arrays)
eg: C, BASIC, Fortran, Forth and Pascal.
Fortran is still the best language to do intensive mathematic calculations. It has advantages over Matlab.

3. Data encapsulation languages
Program : list of commands to be executed.
Data : low level types (integers), composite types and we can hide details from the user.
eg: Clu, Modula 2, Ada

4. Object oriented languages
Program : message between objects
Data : object (functions with state)
eg: Smalltalk, Java, C++
C++ and Java were actually inspired by Smalltalk which should have been the web language, - what Java has become. But they didn't see the opportunity at the right time.

5. Functional languages
Program : an expression to evaluate.
Data : low level types and higher order types (the functions)
eg: LISP, Scheme, ML

6. Logic programming languages
Program : a logical predicate to satisfy.
Data : set of assertions about what we know to be true.
eg: Prolog

So from this taxonomy am already introduced to four of the paradigms. I learnt BASIC during my schooling days and have done few coding in assembly level language during my undergraduate studies. I have also learnt C and C++ in a private course. Then I learnt and worked in Java during my work experience.

Still there are two more paradigms unexplored and I'll learn them in this course (functional and logic programming languages). Hope that should provide me with the well-rounded understanding of the different paradigms, I strive for.