The Model Student: The ACCU 2010 Crypto Challenge

Electronic computers advanced cryptography enormously. Richard Harris sets a challenge, and finds a solution.

Once again I have had the good fortune to have been invited to contribute a cryptographic puzzle as part of the fund raising efforts for Bletchley Park and the Museum of Computing during the ACCU conference.

Electronic computers advanced cryptography enormously. Richard Harris sets a challenge, and finds a solution. Electronic computers advanced cryptography enormously. Richard Harris sets a challenge, and finds a solution. Last year, if you recall, I designed a puzzle based upon the Enigma machine which could be broken with pencil and paper in 15 to 30 minutes once the weaknesses in the rotor mechanism and the crib in the message had been spotted.

As was the case with that puzzle, this one has been designed to be possible to solve with pencil and paper alone and includes a bonus question that cannot be answered if the problem is simply brute-forced.

So that you too may enjoy the puzzle, it is repeated below followed by its historical justification, its solution and the names of the master cryptographers who solved it at the conference.

The challenge

Encoding

The enemy are using a 32 character alphabet encoded as 5 bit unsigned binary numbers. The # character is a control code indicating that the following character should be interpreted as its numerical value rather than as a letter or punctuation. The full table of character mappings is given below.

      00000000001111111111222222222233  
      01234567890123456789012345678901  
      #abcdefghijklmnopqrstuvwxyz ?!.,

Encryption

The encryption scheme used by the enemy is a symmetric key stream cipher in which each plaintext number is bitwise exclusive or-d with the key to create the ciphertext number after which the plaintext number is added to the key (discarding any bits above the 5th) to create a new key.

An in-place C implementation of this scheme is given in Listing 1.

    void  
    encrypt(unsigned char *s, size_t n,  
       unsigned char key)  
    {  
      while(n--)  
      {  
        unsigned char c = *s;  
        *s++ ^= key;  
        key += c;  
        key &= 0x1f;  
      }  
    }

Listing 1

Plaintext

We know that the enemy prefix their messages with a single 5 bit number (i.e. one character) representing the message's priority. We also know that they foolishly postfix their messages with the date in the form

      DDMMM

where DD is the day and MMM is the month, represented by a number and a three character abbreviation respectively.

The enemy encrypt each message with a randomly chosen key. This message key is repeated twice in a 2 character header at the start of the plaintext which is itself encrypted using a daily key.

Example

The plaintext

      kkbflee#dfeb

represents a priority 2 message of 'flee' sent on the 4th February that will be encrypted with the message key 'k' and the daily key using

      assert(*plaintext == *(plaintext+1));  
 
      encrypt(plaintext+2, 10, *plaintext);  
      encrypt(plaintext,   2,   daily_key);

If the daily key is 'd' this yields the ciphertext

      odik,zaimkvz

Ciphertext

Decrypting the following ciphertext will reveal the number of the box in which to deposit your ticket.

      ynb#zuybzmot.vt!xbhaxh

Bonus question

If we add to the ciphertext and the information we have about the structure and content of the plaintext a set of guessed key and/or plaintext characters at specific locations in the plaintext, and if we assume that these guessed characters are added to this set in the worst possible order that is logically consistent with a cryptanalysis, how large must this set be to guarantee that e correctly deduce where to deposit the ticket?

The historical justification

After the electro-mechanical rotor-based cryptosystems like the Enigma, the next big development in commercial cryptography were the digital electronic cryptosystems of the 1970s.

Perhaps the most successful of these was the Data Encryption Standard, or DES, developed by IBM as a candidate for a government encryption standard proposed by the US National Bureau of Standards and not officially retired until 2005.

This was a block cipher, in which the plaintext is broken up into relatively large blocks, each of which is algorithmically combined with a fixed key.

In contrast to block ciphers are stream ciphers which bear a much greater resemblance to their forebears, being polyalphabetic substitution ciphers operating on single characters, or even on single bits, of the message at a time. In doing so, they are attractive for telecommunications, for which the size of the message can rarely be determined in advance and the hardware may have limited volatile memory in which to store the message data.

These ciphers draw their inspiration from the only provably unbreakable encryption algorithm: the one-time pad. To operate a one-time pad, the sender and receiver of encrypted messages must first exchange identical sequences of random characters. To encrypt a message the sender iterates through both its characters and those in the sequence of random characters, combining them; typically with x-or. To decrypt a message the receiver performs the same operation on the ciphertext. Crucially, each must discard the random characters as they are used; if they do not do so their messages can be broken with statistical analyses.

The difficulty in exchanging the random characters makes the one-time pad unattractive for almost all applications. It is therefore extremely tempting to replace the random sequence with a pseudo-random sequence so that the two parties need only exchange the much shorter seed. The resulting cryptosystem is no longer provably unbreakable, but may be practically unbreakable.

Provided the pseudo-random sequence of characters is not too predictable, that is.

The solution

To decrypt the message we apply the same algorithm, although since the plaintext character is recovered after the x-or we don't need to store it to add to the key. An in-place C implementation is given in Listing 2.

    void  
    decrypt(unsigned char *s, size_t n,  
       unsigned char key)  
    {  
      while(n--)  
      {  
        *s ^= key;  
        key += *s++;  
        key &= 0x1f;  
      }  
    }

Listing 2

Trivially, we could brute force the message and discover the plaintext with a worst case of 32 guessed characters. The real question is whether or not we can decode it with fewer.

Noting that the enemy postfix their messages with the date we know that the 5th character from the end must be '#' , represented by a 0 . Since the ciphertext is generated by x-oring the plaintext with the key we can deduce that the key for this character must be 'b'. Running the last 5 characters through the decryption function yields

      #jmar

indicating that the message was sent on the 10th March.

We might assume that the final characters of the message represent the number of the box, in which case the 7th character from the end must also be a ' # ', meaning that the key must be ' ! '. Using this key to decrypt the last 7 characters of the message yields

      #e#jmar

which is entirely consistent with what we already know about the plaintext. This is reasonably convincing evidence that the ticket should be deposited in box 5 and that we need just one guessed character.

Reasonably convincing, but wrong.

In last year's Enigma challenge, the predictable structure of the message's postfix signature was the crib that ultimately allowed its decryption. As such it was too tempting for me to resist using it as a red herring in this puzzle.

This time the crib is, in fact, the repeated message key.

The trick is to recognise that for single bits, addition is exactly the same as x-or. Specifically, for any pair of integers, we have

      (x+y)&1 == (x^y)&1

Denoting the i th bits of the daily key, the message key and the j th character of the plaintext as d i , m i and T j,i respectively, we have

From this we can deduce that the least significant bit of the message key is given by

We can perform a similar calculation for the second least significant bits, but must take into account the fact that there may have been a carry when the daily key was added to the message key.

Fortunately, we can retrieve the carry bit c 1 from the information we have already extracted from the message.

Denoting the i th carry bit as c i we can generalise our formulae to bits other than the least significant.

giving us a recursive procedure for extracting the bits of the keys.

The first two characters of the ciphertext are

      y = 25 = 11001  
      n = 14 = 01110

Applying this procedure yields

Giving us daily and message keys of

      01010 = 10 = j  
      10011 = 19 = s

Using these to decrypt the message yields

      ssqdrop no. three#jmar

meaning that the ticket should be placed in box number 3 and that it was not necessary to guess any characters.

And finally...

Congratulations to Duncan Gary Duke, Andrew Kemp, Per Liboriussen, Callum Piper and Martin Turnock who successfully cryptanalysed the Crypto Challenge.

To everyone who took part, and to everyone who donated to Bletchley and the Museum of Computing, we extend our deepest gratitude.