Clubhouse Crypto

by Kirby Urner
First posted: Nov 19, 2000
Last modified: Nov 2, 2003

Neal Stephenson's best-selling novel Cryptonomicon provides a good example of "math through story-telling", meaning it weaves narrative (in many cases historically well-informed) with exposure to mathematical concepts and principles. This fits my model of the ideal math teacher: someone who makes the world more comprehensible to students by using math in a story-telling context.

Somewhere in the book, Neal mentions "clubhouse codes", to which many children gain exposure, and which involve simply substituting one letter of the alphabet for another, in order to scramble a message.

The recipient of the scrambled message needs the key, in order to reverse the substitution and regain the plaintext from the ciphertext (the encoded message). Or not: such clubhouse codes are in principle rather easily crackable, based on analysis of letter frequencies in the language being encrypted.

For example, the letter E occurs most frequently in English, and the word THE, also extremely common, includes E. That's a place to start then: look for whatever letter occurs most frequently and, if you know the plaintext was English to begin with, substitute E. Then look for a recurring 3-letter pattern that might be the word THE -- you now have a partial candidate key for E, H and T. With sufficient perseverance along these lines, your ciphertext will gradually reveal its secrets.

However, regardless of its crackability, a clubhouse code helps define some of the core concepts and key terms in cryptology: plaintext, key and ciphertext. In a system such as this one, the intended recipient of your encoded message needs the key in order to unlock it. So there's the problem of passing keys around -- shall these be encrypted as well, and with what keys?

The Python permute method below takes whatever list we give it and returns a random rearrangement of its members. The method selects one list member at a time, using random.choice, until all the members have been selected.

"""
Clubhouse crypto:  simple substitution, easily cracked
(but useful for learning purposes).

Permutation class S appended to facilitate group theory
approach to permutations as per this thread archived
at the Math Forum:

http://www.mathforum.com/epigone/mathedu/swexprundwand
"""
import string, random

def permute(pool):
    """
    Randomly permute whatever list is passed in as pool
    """
    oldlist = []+list(pool)
    newlist = []
    for i in range(len(oldlist)): # = range(26)
        randitem = random.choice(oldlist)
        oldlist.remove(randitem)
        newlist.append(randitem)
    return newlist

In the examples below, we take advantage of the fact that the uppercase alphabet is predefined as a variable in the string module.

Usage:

Python 2.1a1 (#9, Jan 22 2001, 20:58:30) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.6 -- press F1 for help
>>>
>>> from clubhouse import *
>>> permute(string.uppercase)  # random selection  
['Q', 'S', 'L', 'F', 'Z', 'D', 'N', 'Y', 'R', 'T', 'V', 'K', 'W', 
'B', 'H', 'A', 'X', 'E', 'C', 'G', 'M', 'O', 'I', 'P', 'J', 'U']
>>> permute(string.uppercase) # random selection  
['U', 'V', 'Q', 'Y', 'O', 'H', 'C', 'W', 'G', 'N', 'Z', 'S', 'D', 
'I', 'J', 'M', 'F', 'L', 'B', 'R', 'P', 'T', 'X', 'E', 'K', 'A']

The new ordering now serves as the basis for a substitution key: simply pair A with the first letter selected, B with the next, and so on, through Z -- the mkdict (make dictionary) method does this job:

def mkdict(n=0):
    """
    Pair uppercase alphabet or integers 1...n with
    randomly permuted version of same
    """
    if n>0:  pool = range(1,n+1)
    else:    pool = string.uppercase
    tuples = zip(pool,permute(pool))
    codedict = {}
    for pair in tuples:
        codedict[pair[0]]=pair[1]
    return codedict

Usage:

>>> dict = mkdict()  # pairing letters with substitutes, using permute()
{'Z': 'P', 'X': 'A', 'Y': 'O', 'V': 'K', 'W': 'S', 'T': 'C', 
'U': 'I', 'R': 'G', 'S': 'E', 'P': 'Q', 'Q': 'L', 'N': 'Y', 
'O': 'J', 'L': 'X', 'M': 'U', 'J': 'F', 'K': 'T', 'H': 'M', 
'I': 'R', 'F': 'W', 'G': 'D', 'D': 'V', 'E': 'H', 'B': 'N', 
'C': 'Z', 'A': 'B'}

The clubhouse.encrypt and clubhouse.decrypt methods (see source) simply make use of these auto-generated keys to encipher and decipher, making use of a key file on the side (a file that needs to be transmitted separately to the intended receiver).

The key file contains the same information as a substitution dictionary, but in a somewhat more compact form, as a list of one or more "cyclic permutations". For example, let's consider the key for the above dictionary:

>>> mkkey(dict)
[('Z', 'P', 'Q', 'L', 'X', 'A', 'B', 'N', 'Y', 'O', 'J', 'F', 'W', 
'S', 'E', 'H', 'M', 'U', 'I', 'R', 'G', 'D', 'V', 'K', 'T', 'C')]

Each letter maps to the one following (e.g. 'Z' maps to 'P'), while the last in a cycle maps to the first (in this case, 'C' maps to 'Z'). These are the same pairings defined by dict. In the next example, you will find more than one cycle, plus some letters are not mentioned at all: the ones which simply map to themselves.

>>> dict = mkdict()
>>> dict
{'Z': 'K', 'X': 'W', 'Y': 'H', 'V': 'S', 'W': 'F', 'T': 'U', 'U': 'M', 
'R': 'I', 'S': 'E', 'P': 'Z', 'Q': 'Y', 'N': 'N', 'O': 'J', 'L': 'G', 
'M': 'R', 'J': 'D', 'K': 'Q', 'H': 'A', 'I': 'L', 'F': 'T', 'G': 'O', 
'D': 'C', 'E': 'P', 'B': 'B', 'C': 'X', 'A': 'V'}
>>> mkkey(dict)
[('Z', 'K', 'Q', 'Y', 'H', 'A', 'V', 'S', 'E', 'P'), ('X', 'W', 'F', 
'T', 'U', 'M', 'R', 'I', 'L', 'G', 'O', 'J', 'D', 'C')]

The encrypt method invokes mkkey to extract the cyclic permutations from any substitution dictionary. The decrypt method invokes revkey to reverse the cyclic permutations, and feeds the output to gendict to build a reverse-lookup dictionary.

To make deciphering a little harder, you might want to use Python to mask word breaks by parsing your plaintext into 5-letter chunks, and strip out all punctuation (unless you want to include punctuation in the substitution, as per an "expanded alphabet" key). The methods given here take neither of these precautions, making decryption all that much easier.

Enigma

During WWII, the German intelligence community standardized on an enciphering system known as Enigma, which depended on mechanical devices, somewhat like typewriters, and which contained removable rotors. When the operator pressed a letter (e.g. W), another letter would light up on the console (e.g. Q). These illuminated letters comprised the ciphertext, which is what would actually get transmited, perhaps by Morse code over radio.

The receiver would need to assemble an identical Enigma machine using the same secret key, part of which was enciphered in the message itself, and part of which was scheduled ahead of time for each day of the year, and locked in a safe. Once the receiver's Enigma was properly configured, the operator would get the plaintext back out again (i.e. typing a W would light up a Q, and so forth).

Each rotor was internally wired to cause a letter permutation, such as in clubhouse codes, but then the first rotor in line would turn with each keystroke (changing the substitution), plus knock the next rotor a notch forward once per full rotation, as per a car's odometer -- and so on down the line of rotors (how many depended on the model -- three or four was about average). Note that although the different rotors were standard, their order was variable. Other scrambling tricks were also employed.

The idea behind Enigma, as with any key-based enciphering system (e.g. see Blowfish below) is to make possession of a secret key the only feasible means of recovering the plaintext. In other words, discovery of the key by "brute force" (trying all possibilities) would be (a) the only other option and (b) entirely infeasible -- even if the would-be decoder was in posession of an Enigma machine (or corresponding software algorithm).

Enigma is prototypical of software "machinery" which does nothing to hide the algorithm (for example, the Blowfish algorithm is in the public domain). You can see exactly how the Enigma works, but examination of the workings should only convince you that, without the key, it's hopeless to attempt any brute force solution, plus there are no short-cuts to exploit.

In fact, thanks to Lance Ellinghouse, Python optionally ships with the rotor module as part of the standard library, wherein you will find methods loosely based on the Enigma machine. The number of rotors is up to you. The Python documentation gives the details for using these methods, with the excerpt below giving some idea of the facilities:

Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.6 -- press F1 for help

>>> from rotor import *
>>> enigma = newrotor("this is my key") # nb rotors optional, default = 6
>>> plaintext="Now is the time for all bozos to clown around"
>>> ciphertext = enigma.encrypt(plaintext)
>>> ciphertext
'\216r\000\270\010\256z\220\310*\204\243\242\230\324l\256\243j\352\
233\036\224^\207[;\355\202u\214@\326\341Q\334\352\007 \235\225\011D|\304'
>>> decoder = newrotor("this is my key")
>>> decoder.decrypt(ciphertext)
'Now is the time for all bozos to clown around'

Unfortunately for learning purposes, rotor is not a source code module, meaning you will find no additional Python by looking under the hood.

As it turns out, Enigma was vulnerable to short cutting by means of various mathematical tricks. Intercepted ciphertexts were indeed cracked by the folks at Bletchley Park, working under the direction of Alan Turing -- inventor of the "Turing Machine" an abstraction for thinking about stored-program computing.

Stephenson's Cryptonomicon is in part about the Enigma story, including attempts to keep Enigma-users from believing that Enigma had been cracked -- so they would keep using it. This plot twist takes us deeper into the paranoia-filled world frequented by cryptologists, where the goal is not only to crack codes, but to keep the other side guessing as to which of its several security systems might have been compromised.

Blowfish

The Blowfish algorithm was placed in the public domain by Bruce Schneier in 1993. This secret key code is designed to be both easy to implement and robust i.e. it's supposedly uncrackable except by inefficient (i.e. impractical) brute force methods.

The algorithm accepts a user-supplied, variable-length key as a stick to stir 1168 hexadecimal digits of PI into a unique broth. Binary plaintext is then chopped into 64-bit chunks and enciphered against this key-initialized "PI soup" using something called a 16-round Feistel network. Anyone with a soup keyed the same way will be able to recover the binary content by reversing the 16 steps for each chunk and stringing them back together.

Python's built-in read method turns binary data into character strings. Eight-character strings of 64 bits are converted to long integers, the data type Python uses for bitwise operations such as AND (&) and XOR (^) -- operations at the heart of the blowfish algorithm. Since the plaintext may not divide evenly into 64-bit chunks, 1 to 8 bytes are padded on the end (before enciphering), each encoding the number of bytes to shave off when the ciphertext is decoded.

The session below gives an idea of how to use the included blowfish.py module:

>>> from blowfish import *
>>> encrypt(r'.\ocn\bftestpix.gif')
File size: 9793 bytes
Initialization complete
Padding with 7 bytes
New file size: 9800
Encrypting...
Writing to file: 9800 bytes
>>> decrypt(r'.\ocn\bftestpix.cpt')
File size: 9800 bytes
Reading from .\ocn\bftestpix.cpt
Writing to .\ocn\bftestpix.dcp
Using key .\ocn\bftestpix.key
Initialization complete
Decrypting...
Last byte: 7 
Writing file: 9793 bytes

A block enciphering strategy such as Blowfish is vulnerable to a specific form of tampering unless a feature known as "block chaining" is implemented. For example, if a bank routinely sends financial transactions with number amounts in blocks 14 and 31, then even without knowing the secret key, an interceptor might switch these two blocks, and the message would decrypt to a sensible message at the other end -- and yet not the one originally sent.

Block chaining stitches blocks together in some standard manner, such that block switching will not go undetected. A full-featured implementation of the Blowfish algorithm provides for this option, as do other sophisticated codes in this genre.

Also so far not a part of this implementation is a utility to convert a pass phrase into a key of bits. You can use the function blowfish.mkchunk for testing purposes: it takes a string of characters and returns a large number corresponding to the string's bytes. The built-in hex function will show you the bytes in hexadecimal. The key should be between 4 and 56 bytes.

>>> import blowfish
>>> blowfish.mkchunk("my key")
120367002379641L
>>> hex(120367002379641L) # note: "y"=79, space = 20
'0x6D79206B6579L'

Although you can use these large decimals generated by mkchunk, it waters down the overall strength of your system, since byte codes corresponding to the letters of your pass phrase would comprise but a small percentage of the total bit permutations theoretically available to you -- because a pass phrase would tend consist of readable, memorable words.

A more sophisticated utility would "hash" your pass phrase in some way, taking better advantage of the full key space, thereby removing this helpful clue that the key bits directly correspond to some humanly readable phrase.

A last word about security: for learning purposes, it's convenient to have the key and key length automatically saved to disk, in some [filename].key corresponding to whatever [filename].cpt contains your ciphertext. However, saving keys this way defeats the purpose of using strong encryption.

Ideally, in the real world, you would want to generate a key from a pass phrase using some utility and then save no trace of either that key or the pass phrase. Some of the readings below should give you more ideas about how to manage at this level.

RSA

RSA, developed in 1977 at MIT by Rivest, Shamir, and Adleman, is an algorithm used for public key cryptography. RSA is not the only public key encryption scheme, but it's a popular one, thanks in part to its early use in "Pretty Good Privacy" (PGP) freeware. The public key approach involves senders and receivers generating keys in pairs, only one of which is shared publicly.

The whole idea of public key cryptography is relatively new -- late 20th century. Cryptographer James Ellis, working in the GCHQ, first stumbled on the concept in 1970. Whit Diffie and Marty Hellman, collaborators at Stanford University, first published the idea in the open in 1976, having hit upon it independently. Clifford Cocks and Malcolm Williamson, associates of Ellis, came up with public key algorithms virtually identical to RSA and Diffie-Hellman respectively, but again, this did not become public knowledge until the 1990s, with the posthumous publication of The history of Non-Secret Encryption by Ellis himself.

Plaintext enciphered with the public key can only be deciphered with the private key. So would-be recipients of enciphered messages freely advertise their public keys, treating them like phone numbers in a directory. If I want to send you a message that only you can decipher, I'll look up your public key and use it to make the ciphertext. Since you have the private key, only you will be able to recover the plaintext (at least in theory).

In RSA, the public/private key pairs are generated by a mathematical process involving randomly generated large prime numbers (a hundred digits or more). In actual practice, these are more likely "strongly probable primes" with a 99.999% or greater chance of being true primes, as certified by the so-called Miller-Rabin algorithm.

>>> import primes
>>> primes.bigppr()
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 3.76411019293 seconds
5229236241814320020881869258471526191920757122084994253398649989353163288
4929115163958123166453249867L

Two such probable primes, p and q, get multiplied together to form composite number n (called the modulus). These three numbers, p,q and n, are further manipulated to derive additional values e and d. The tuple (n,e) constitutes the public key unique to each recipient, and is used to encipher, while the tuple (n,d) constitutes the private key each recipient uses to decipher. The private keys are never transmitted between parties. Generating d from (n,e) requires knowing p and q, the prime factors of n -- but these are destroyed after n is generated. Since factoring such huge numbers as n is well-nigh impossible, this system is as secure as any with only brute force solutions.

The example below uses small primes to demonstrate the RSA algorithm. The Python built-in pow function and % operator have great utility in this context. Note that the pow function optionally accepts a 3rd argument, such that pow(a,b,n) = (a**b) % n, meaning: "the remainder, when a raised to power b is divided by n."

>>> p = 83                 # prime, in secure RSA over 100 digits long
>>> q = 37                 # another prime, also huge in actual practice
>>> n = p*q                # n = product of p*q, publicly shared
>>> n
3071
>>> r = (p-1)*(q-1)        # used to compute e, and d from e
>>> r
2952
>>> e = 25                  # e must be odd w/ no factors in common w/ r
>>> d = primes.invmod(r,e)  # compute d such that d * e mod r = 1
>>> d
1417L
>>> (1417 * 25) % r        # check
1
>>> plaintext  = 1015       # long messages need to be block-enciphered
>>> ciphertext = pow(plaintext,e,n)   # (e,n) is the public key
>>> ciphertext
752
>>> deciphered = pow(ciphertext,d,n)  # (d,n) is the private key
>>> deciphered
1015

If n, the product of two probable primes, could be easily factored back into p and q, then the RSA scheme would not be secure. But present-day mathematics offers no speedy algorithms for factoring composite numbers with huge prime factors. Unless such algorithms are devised, or some other clever short-cutting "crack" is developed, RSA will remain both popular and pretty good.

Different encryption schemes may be combined of course. Given the relative slowness of RSA, it's typically used to encrypt and send a traditional secret key to the recipient, using the recipient's public key. The message body may then be encrypted with some other non-public key algorithm, such as IDEA (the original PGP strategy -- not counting the Phil Zimmermann's first draft using Bass-o-Matic).

For example, suppose you want to send someone the key for deciphering a Blowfish message: you could use your recipient's public key to encipher the Blowfish key. Your recipient will then use RSA to derive the Blowfish key, and decrypt the Blowfish-enciphered message.

For further reading:

Source code: clubhouse.py, blowfish.py, primes.py
Discussion of this code on edu-sig (Python elist)
More background on edu-sig (look for Cryptonomicon)
Exploring Groups and Codes with Python (Mar 2001)
Cryptonomicon by Neal Stephenson (interview with the author)
Crypto: How the Code Rebels Beat the Government -- Saving Privacy in the Digital Age
by Steven Levy (Viking Press, 2001)
Kids' Page at the National Security Agency (USA)
NSA Cryptologic Museum
cryptonomicon.com
Re: the Solitaire cypher (featured in the novel -- includes Python version)
Re: the Blowfish cypher (with Python elist discussion and source)
Virtual Tour of Bletchley Park (where Turing et al broke codes in WWII)
The Principle of the Enigma by Tony Sale
The Enigma machine by David Hamer (with a link to some of Turing's original writings)
Alan Turning Home Page maintained by Andrew Hodges
Download software simulators of WWII-era encryption machines (Windows)
Newsgroup thread re this essay (Nov 28, 2000)
Numeracy + Computer Literacy essay by Kirby Urner (more math through programming)
Re: Math Word Problems (MWP) as a distinct topic (Nov 19, 2005 12:19 PM)

Oregon Curriculum Network