It’s not pretty and it’s not clever. Frances Buontempo considers if the cut-up method can be used to generate editorials.
It is traditional to review the last twelve months around this time of year. Having now avoided writing a proper editorial for over twelve months, I wondered if my collection of musings was worth reviewing. The thing to do was clearly to try to automate the process. Having been far too busy to write an automatic editorial generator, but inspired by Charles Stross’s recent blog [ Stross ], wherein he used Markov chains to generate text based on the King James Bible and H. P. Lovecraft, leading to a strange and oddly pleasing fusion of the two styles, I found some Python code to generate text [ MarkovChains ] and ran it over my previous excuses for editorials. A Markov chain generates a new state from the current state without looking back at history and is therefore frequently described as memoryless. Each state (in text processing, a word) can move to one or more other states with a pre-specified probability. In text processing, the probabilities are formed from analysis of a ‘seed’ document. Though this would not create a review as such, it should generate text in the spirit of the inputs. Unfortunately, this code tended to keep whole sentences or at least phrases, though it did generate some interesting ‘thoughts’, if one can call machine-written words thoughts.
‘If the code is compiled, there is no documentation, or no version control.’
‘I suspect I will not be changed between runs’
‘get off having to write an automatic editorial generator’
‘I enjoy reading sci-fi, though I do wonder why these stories still tend to insist on the idea of carrying out instructions.’
‘I keep writing’
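For anyone wanting to try this at home, the idea can be sketched in a few lines of Python. This is not the code from [ MarkovChains ] or Stross’s Perl; it is a minimal illustration, and the names build_chain, generate and the order parameter are my own assumptions. Each state is a run of order words, and repeated followers in the list stand in for the probabilities.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen following it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Duplicates are kept, so frequent followers are chosen more often.
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain, picking each next word from the current state only."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state only ended the seed text
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


if __name__ == "__main__":
    seed_text = "the cat sat on the mat and the cat ran off with the hat"
    print(generate(build_chain(seed_text), length=12, seed=42))
```

With a small seed document, or a larger order, most states have only one possible follower, which may be one reason such generators tend to reproduce whole phrases.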
Deeply disappointed with the rehash of whole sentences, like a bad montage of television programmes at the end of the year, I then ran Stross’s Perl on the same input. After a couple of package installations, and filtering out all the errors, I got a variety of unhelpful or ridiculous musings, my favourites being
‘Many then fall in love with their brains engaged.’
‘Electronic wizards can be given the instructions for a four year stint.’
‘C++ is provable or falsifiable.’
‘The creation of the calculus gave ways to form the language, though paused for Turing.’
And near poetry
‘If it works, it easier than an answer.
We have sometimes taken as
‘You have decayed away.
Imagine that one day.
A variety of ways of editing inputs for computers,
so many technical books do you.’
Randomly generated machine outputs have a long and varied history. For example, Monte-Carlo simulations are frequently used to solve difficult numerical problems. This approach requires an upfront, often iterative, model wherein the next number is generated using the previous number with some degree of random perturbation, or each output generated by choosing a random input. Genetic programming, GP, attempts to automatically generate code, or even design machines such as circuit boards, by randomly piecing together shapes according to rules, be that expression trees or chips and connectors [ GP ]. GP needs no upfront model, but does require a pre-defined fitness function to select the better solutions to a given problem. It starts with a generation of randomly created solutions and cuts them up, referred to as crossover, sometimes randomly mutating parts, to reform other candidate solutions from the pieces, supposedly thereby mimicking evolution.
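To make the cutting up and reforming a little more concrete, here is a toy sketch of GP over arithmetic expression trees. Everything in it is an assumption for illustration: the tuple representation, the operator set, the population size, the mutation rate and the MAX_NODES cap are arbitrary choices of mine, not taken from [ GP ]. The fitness function is the total error against a target function, so lower is better.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1, 2]
MAX_NODES = 30  # arbitrary cap to stop trees bloating


def random_tree(rng, depth=2):
    """Grow a random expression: a terminal, or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree for a given value of x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def subtrees(tree, path=()):
    """Yield (path, node) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def graft(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = graft(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    """Cut a random piece out of b and paste it over a random point in a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return graft(a, path, donor)


def mutate(rng, tree):
    """Replace a random node with a small, freshly grown subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return graft(tree, path, random_tree(rng, depth=1))


def fitness(tree, target, samples=range(-3, 4)):
    """Total absolute error against the target function; lower is better."""
    return sum(abs(evaluate(tree, x) - target(x)) for x in samples)


def evolve(target, generations=30, size=50, seed=0):
    """Return the best tree found after a fixed number of generations."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, target))
        parents = population[: size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < size:
            child = crossover(rng, rng.choice(parents), rng.choice(parents))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if sum(1 for _ in subtrees(child)) <= MAX_NODES:
                children.append(child)
        population = parents + children
    return min(population, key=lambda t: fitness(t, target))
```

Calling evolve(lambda x: x * x + x) returns the best expression found for that seed. Nothing guarantees it will be the exact answer, which is rather the point: the fitness function only selects the better attempts.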
Unfortunately, it is difficult to decide a model for generating an editorial in advance, or give a precise fitness function for acceptable editorial attempts. This does not preclude the possibility of using randomness to create text, poems, or indeed other forms of art. Having (nearly) stuck to my book buying ban this year, I persuaded my sister to buy me Assimilate: A Critical History of Industrial Music [ Assimilate ] (not recommended for the faint-hearted). It delves into the artistic background and precedents for various noisy ‘metal machine music’ [ MMM ] bands. The original Lou Reed album is often described as, ‘ Electric guitars feeding back to create a complex multi-layered sound collage ’ [ BBC ], though the phrase has become almost legendary, appearing in songs and titles by Die Krupps, Sabaton and others. Assimilate suggests many approaches to creating industrial music trace back to William S. Burroughs’s ‘Cut-up’ method: “ The cutup is a mechanical method of juxtaposition in which Burroughs literally cuts up passages of prose by himself and other writers and then pastes them back together at random ” [ Cut-up ]. This in turn can be traced back to the Dada movement.
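The cut-up method needs neither a model nor a fitness function, so it is easily mechanised. The following is a rough sketch of my own, with scissors replaced by a fixed piece_len in words, which is an assumption on my part; Burroughs presumably cut wherever the blade fell.

```python
import random


def cut_up(passages, piece_len=4, seed=None):
    """Cut each passage into runs of piece_len words, shuffle, paste together."""
    rng = random.Random(seed)
    pieces = []
    for passage in passages:
        words = passage.split()
        for i in range(0, len(words), piece_len):
            pieces.append(" ".join(words[i:i + piece_len]))
    rng.shuffle(pieces)  # paste the pieces back together at random
    return " ".join(pieces)


if __name__ == "__main__":
    print(cut_up([
        "It is traditional to review the last twelve months around this time of year",
        "Randomly generated machine outputs have a long and varied history",
    ], seed=3))
```

Unlike the Markov chain, every word of the input appears exactly once in the output; only the order changes.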
Frequently, legacy code bases appear to have been formed using a similar cut (and paste) approach. Snippets of functions proliferate through the entire code base peppered with a variety of mutations on the way. Other functions fail to follow the artistic coding style, if there is one, for example breaking brace placement or white space conventions, suggesting a manual attempt to slam randomly selected code samples taken from elsewhere into the mix to solve a problem (possibly accidentally creating another). People usually leave typos intact when doing this, making it easy to trace the provenance of the code. If only they'd copy over any unit tests when taking such an approach. I believe it is traditional to ask when confronted by such modernist techniques and ‘installations’, “ But is it art? ” 1
The final result can appear like a discordant, jarring mess speckled with repeated leitmotifs. This mirrors the disorienting effect of some of the more extreme, experimental industrial noise artists’ outpourings. Where the music is a bid to either block out the world or to escape the claimed ‘viral impact’ of convention by shaking people out of their norms, the code is usually just an endeavour to implement some new features, though it may have a similar slightly nauseating impact on its audience.
Applying the cut-up coding methodology is impossible without an input stream of code to copy and paste. Nowadays people tend to use the internet as the source of all source, so in the spirit of Dadaist impishness, it can be lots of fun to unplug network cables at random. Since the Dada movement has been described as “ flout[ing] conventional aesthetic and cultural values by producing works marked by nonsense, travesty, and incongruity ” [ Dada ], there would be delicious irony in applying Dadaist techniques in order to enforce aesthetic and cultural values in a code base, thereby keeping the flame [ KoF ] and stopping “ nonsense, travesty and incongruity ”. I suggest a New Year’s resolution to attempt to write some code from time to time with your network cable unplugged.
If one already has a code base, that can be used directly as an input for genetic programming. Or, in lieu of a fully formed GP application, other types of randomness can still be applied fruitfully. Indeed, a partial step towards genetic programming becoming commonplace is the recent interest in mutation testing. This takes and mutates existing source code, then runs it against a suite of tests, which function like the fitness function in GP. The mutations may swap binary logical or arithmetic operators, delete statements, or swap variables in the same scope. Various other mutations are possible. The only differences from GP are the lack of crossover and that the original code is human generated, rather than randomly generated by a machine. For example, in the Java-based PITest,
Faults (or mutations) are automatically seeded into your code, then your tests are run. If your tests fail then the mutation is killed, if your tests pass then the mutation lived. The quality of your tests can be gauged from the percentage of mutations killed. [ PITest ]
This is therefore a way of testing the tests. Any living mutations can suggest further tests that need adding, or requirements that need clarifying, or, if you are very lucky, code that can be deleted. This supposes you have some tests. Applications do exist to write tests for you, for example Microsoft’s Pex and Moles [ Pex ], though I suggest you do write some of your own tests first, preferably before writing any code, let alone before using Pex. I assure you that, if you do not, the potential edge cases you may have missed are legion. Being presented with millions of tests which fail from a few thousand lines of code is overwhelming. Nonetheless, either randomly mutating code or randomly generating tests can be very informative.
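The principle of seeding faults and seeing which tests notice can be shown in miniature with Python’s standard ast module. To be clear, this is not how PITest works internally, and it only swaps three arithmetic operators; the SWAPS table, the function names and the total_price example are all made up for illustration.

```python
import ast
import copy

# Hypothetical, deliberately tiny set of operator swaps.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Add}


def mutants(source):
    """Yield (description, namespace) for each single-operator mutation."""
    tree = ast.parse(source)
    targets = [i for i, node in enumerate(ast.walk(tree))
               if isinstance(node, ast.BinOp) and type(node.op) in SWAPS]
    for index in targets:
        mutant = copy.deepcopy(tree)
        node = list(ast.walk(mutant))[index]  # same node in the copied tree
        old = type(node.op)
        node.op = SWAPS[old]()
        namespace = {}
        exec(compile(mutant, "<mutant>", "exec"), namespace)
        yield f"line {node.lineno}: {old.__name__} -> {SWAPS[old].__name__}", namespace


def mutation_score(source, test):
    """Run test(namespace) against every mutant; return (killed, survivors)."""
    killed, survivors = 0, []
    for description, namespace in mutants(source):
        try:
            test(namespace)
        except Exception:  # any failure, not just an assertion, kills a mutant
            killed += 1
        else:
            survivors.append(description)
    return killed, survivors


SOURCE = """
def total_price(price, quantity, discount):
    return price * quantity - discount
"""


def weak_test(ns):
    assert ns["total_price"](2, 2, 0) == 4


def strong_test(ns):
    assert ns["total_price"](3, 2, 1) == 5


if __name__ == "__main__":
    print(mutation_score(SOURCE, weak_test))
    print(mutation_score(SOURCE, strong_test))
```

The weak test lets both mutants live, since 2 * 2 and 2 + 2 agree and a zero discount hides the swapped subtraction. The strong test kills both, which is exactly the kind of hint about missing tests that a surviving mutation gives.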
It is difficult to draw concrete conclusions from these thoughts, so I ran Stross’s Perl 2 code [op cit] over the above. It produced two revelations:
‘Deeply disappointed with repeated leitmotifs’
‘The claimed ‘viral impact’ of convention by shaking people out of mutations’
It seems my automatic editorial generator is a long way off. However, the parallels between the ways in which some code bases develop, various art movements in the last hundred years and differing aesthetic viewpoints are interesting. The creative process is fascinating, whether applied to music, art or computer programming. Randomly shaking things up can give results. The results may not always be pleasing, but can be thought provoking and might just work. Whether everyone is in agreement about the final result is another matter. Some will scream in horror, “ Make it stop! ” while others may be delighted with the outcome. We are, after all, frequently told “Beauty is in the eye of the beholder.”