
Monday, February 19, 2018

Syntax Char RNN for Context Encoding

Summary


Syntax Char RNN attempts to enhance naive Char RNN by encoding syntactic context along with character information. The result is an algorithm that, in selected cases, learns faster and delivers more interesting results than naive Char RNN. The relevant cases appear to be those that allow for more accurate parsing of the text.

This blog post describes the general idea, some findings, and a link to the code.

Background


As both a writer and a technologist I have for some time now been interested in the ability to programmatically generate language that is at once creative and meaningful. Two of my previous projects in this context are Poetry DB and Poem Crunch. I also wrote a novel that incorporates words and phrases generated by a Char RNN that had been trained on the story's text.

Andrej Karpathy's now-famous article on RNNs was a revelation when I first read it. It proved that Deep Learning can generate text in ways that at first appear almost magical. It has afforded me a lot of fun ever since.

However, in the context of generating meaningful text and language creatively, it ultimately falls short.

It is helpful to remember that Char RNN is essentially an autoencoder. Given a particular piece of text, say this blog post, training builds a model that, if fully successful, will be able to reproduce the original text exactly, generating perfect copies of the text from which it learned.

The reason for Char RNN's widespread employment in fun creative projects is its ability to introduce novelty, either through tuning the temperature hyperparameter or, more commonly, as a side effect of imperfect learning.
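As an aside, the temperature mechanism is straightforward to sketch. The snippet below is illustrative only (plain Python, made-up logits), not taken from any particular Char RNN implementation:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=random):
    # Scale logits by 1/temperature: T < 1 sharpens the
    # distribution, T > 1 flattens it towards uniform.
    scaled = [l / temperature for l in logits]
    # Softmax (subtract the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the resulting distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

# A very low temperature behaves almost greedily:
logits = [1.0, 3.0, 0.5]
print(sample_with_temperature(logits, temperature=0.01))  # almost always 1
```

At higher temperatures the less likely characters get picked more often, which is where much of the apparent creativity comes from.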

To be sure, imperfect learning is the norm rather than the exception. For any text beyond a certain level of complexity, a naive Char RNN will reach a point during training when it can no longer improve its model.

This naturally leads to the question: can the Char RNN algorithm be enhanced?

Context encoding


Char RNN encodes individual characters, and the sequence of encodings can be learned using, for example, LSTM units to remember a certain length of sequence. Aside from the relative position of the character encodings, the neural network has no further contextual information to help it 'remember'.

What would happen if we added other contextual information to the character encodings? Would it learn better?

Parts of Speech


Parts of Speech are structural components of sentences and a fairly intuitive candidate for the problem at hand. Although POS parsing hasn't always been very accurate, SyntaxNet and spaCy have been setting new benchmarks in recent times. Accuracy might still be a problem (more on that later), but they certainly hold promise.

So how does POS parsing fit into Char RNN?

Let's take a look at the following sentence and its constituent parts.

Bob   bakes a  cake
PROPN VERB DET NOUN

We can see that the 'a' in 'bakes' and the 'a' in 'cake' are contextually different: the first is part of a verb and the second is part of a noun. If we were able to encode the character and POS together, for each character across the whole text, we would cover a sequence longer than is practical for an LSTM to remember. In other words, the model would understand syntactic structure in a more generic sense than with naive Char RNN.

B + PROPN
o + PROPN
b + PROPN
[space] + SPACE
b + VERB
a + VERB
k + VERB
e + VERB
s + VERB
[space] + SPACE
a + DET
[space] + SPACE
c + NOUN
a + NOUN
k + NOUN
e + NOUN
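Producing this pairing is mechanical once the text has been tagged. The sketch below assumes the parser's output is already available as (word, POS) tuples; in a real run these would come from spaCy or DRAGNN, and the function name is just illustrative:

```python
def char_pos_pairs(tagged_words):
    """Expand (word, POS) tuples into per-character (char, POS) units,
    inserting a space unit between words, as in the listing above."""
    pairs = []
    for i, (word, pos) in enumerate(tagged_words):
        if i > 0:
            pairs.append((" ", "SPACE"))
        pairs.extend((ch, pos) for ch in word)
    return pairs

tagged = [("Bob", "PROPN"), ("bakes", "VERB"), ("a", "DET"), ("cake", "NOUN")]
pairs = char_pos_pairs(tagged)
print(pairs[:4])  # [('B', 'PROPN'), ('o', 'PROPN'), ('b', 'PROPN'), (' ', 'SPACE')]
```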

One way of achieving this is, for each character, to create a new composite unit that captures both the character and the POS type. So if we create separate encodings for the characters and the POS categories, e.g. a = 1, b = 2, etc. and NOUN = 1, VERB = 2, etc., then we could do something along the lines of:

a + VERB = 1 + 2 = 3

However, this creates a new problem, namely one of duplicates: we'd end up with lots of cases that have the same final encoding (in this example, the final encoding is 3):

a + VERB = 1 + 2 = 3

and

b + NOUN = 2 + 1 = 3

A better solution would keep the encodings completely separate and then map them back into consecutive indexes to guarantee uniqueness.
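For illustration, one way to guarantee uniqueness is to index the distinct (character, POS) pairs directly rather than adding their codes. A minimal sketch, with names of my own invention:

```python
def build_composite_vocab(pairs):
    """Map each distinct (char, POS) pair to its own consecutive index,
    so that 'a'+VERB and 'b'+NOUN can never collide."""
    vocab = {}
    for pair in pairs:
        if pair not in vocab:
            vocab[pair] = len(vocab)
    return vocab

pairs = [("a", "VERB"), ("b", "NOUN"), ("a", "VERB"), ("a", "NOUN")]
vocab = build_composite_vocab(pairs)
# Unlike naive addition, the two pairs get different indexes:
assert vocab[("a", "VERB")] != vocab[("b", "NOUN")]
print(vocab)
```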

But the real problem with this solution is that, although we now have an encoding influenced by both characters and types, we've lost each unit's individual quality. In other words, the importance of a character encoded along with one type of POS unit is no longer properly weighted against a character of a different type of POS unit. Instead, it has simply become a composite type of its own.

An improved approach would be to encode both the character and the type as independent data, albeit of the same unit, and let the LSTM do the rest.

A Syntax Char RNN tensor might then look as follows:

[[ char, type ], [ char, type ] ... ]

However, this heavily favours the type encoding over the character encoding, which in turn will skew the weightings.
A more balanced encoding might be:

[[ char, char, char, char, type ], [ char, char, char, type ] ... ]
              word 1                          word 2

A sense of unevenness remains, because some words are longer than others: why should each word receive just one type encoding? This is left as an improvement for later.

For the time being, experimentation showed something of an optimum when adding two type encodings per word, as follows:

[[ char, char, char, char, type, type ], [ char, char, char, type, type ] ... ]
                  word 1                              word 2
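As a sketch, assembling such a sequence from pre-tagged (word, POS) input might look like this (the names are illustrative and not taken from the actual code):

```python
def encode_words(tagged_words, type_tokens_per_word=2):
    """Build the interleaved sequence: each word's characters followed
    by a fixed number of POS type tokens, as in the layout above."""
    sequence = []
    for word, pos in tagged_words:
        sequence.extend(list(word))
        sequence.extend([pos] * type_tokens_per_word)
    return sequence

seq = encode_words([("bakes", "VERB"), ("cake", "NOUN")])
print(seq)
# ['b', 'a', 'k', 'e', 's', 'VERB', 'VERB', 'c', 'a', 'k', 'e', 'NOUN', 'NOUN']
```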

Results


Char RNN can generate surprising turns of phrase and novel combinations of words, but longer extracts often read like gibberish. The hope was that context encoding might improve this state of affairs by strengthening the overall sentence structure represented in the model.

The SyntaxNet installation also installs DRAGNN and a language parsing model. Due to problems I had getting consistent results from SyntaxNet, I eventually settled on DRAGNN instead.

DRAGNN


Shakespeare


The first benchmark was based on the Tiny Shakespeare corpus. The following snippets are from checkpoints with equivalent validation loss, trained using the same hyperparameters (allowing for the proportionately longer sequence length in the Syntax Char RNN due to the additional type encodings).

Naive Char RNN (temperature: 0.8)

Capulet:
with him; there were not high against the nurse,
and i, as well as half of his brother should prove,
thou ask my father's power,
in this life of the world command betwixt
of greep and displent, rup in manrown:
and thou dost command thy stors; and take our woes,
that star in the sea is well.
 Now is a cunning four gloves of all violer on
himself and my friend traitor's jointure by us
to be holy part that were her horse:' the miles
for this seat with me in the island from scards
shall have stone your highness' weech with you.
 And unjust while i was born to take
with hardness from my cousin when i forget from me.

 Shepherd:
the unrelery, reign'd with a virtuous tongue,
to blush of his harms, and as sweet, if they
cape of england's purple say well true song hence,
shall appetite straight hath the law with mine?
 The composition should know thy face of my heart!

 Second huntsman:
i know him, and with my chast my mother.

Syntax Char RNN (DRAGNN parser; temperature: 0.8)

ITrue words, his son, fair despair is, and the guard brother; 
always you tell, 
though to see so many trees; i come!
 
Pedant:
 would you have no redress of joy of the march?
 
I, what i come up.
 
Menenius:
 i not a gentlemen cred; and as this old farewell.
 
Lucio:
 sir, she's a duke.
 
Capulet:
 so straight is the tyrant, shape, madness, he's weigh; 
which of the confail that e'er said and gentle sick ear together, 
we will see the backs.

Note: Syntax Char RNN output has been reformatted, but remains otherwise unaltered


I think it is easy to agree that the text generated by naive Char RNN reads significantly better. There is a somewhat interesting punchiness to Syntax Char RNN's shorter dialogue sections, but that's about all it has in its favour.

This was, frankly, disappointing.

However, there is a reasonable chance that inaccurate parsing could be influencing the results. The DRAGNN model probably doesn't generalise well to Shakespearian English.

Would prose offer a better benchmark?

Jane Austen


The works of Jane Austen were used next. They would almost certainly parse more accurately.

The results this time were rather surprising. Syntax Char RNN raced away, reducing its loss pretty quickly. After one hour and fifteen minutes on my laptop CPU, Syntax Char RNN hit a temporary minimum of 0.771.

After the same time frame and with the same hyperparameters (again allowing for a slight adjustment of sequence length due to the extra type encodings for POS), naive Char RNN went as low as 1.025 - still nowhere near the Syntax Char RNN checkpoint.

I left it running overnight and it still reached only 0.944 after just over 8 hours.

This was interesting.

What about the quality of the generated text?

Naive Char RNN (loss: 0.944; temperature: 0.8):

"i beg young man, nothing of myself, for i have promised to be whole 
that his usual observations meant to rain to yourself, married more word
--i believe i am sure you will allow that they were often communication 
of it, but that in that in as he was in the power of a hasty announte 
by print,  to have given me my friend; but at all," said lady russell, 
so then assisted to  his shall there must preferred them very ill--
considertaryingly, very pleasant, for my having forming fresh person's 
honour that you bowed really satisfied we go.

Syntax Char RNN (DRAGNN parser; loss: 0.771; temperature: 0.8)

Mr. Darcy was quite unrescrailous of place, which and want to carry 
by the attractions and fair at length; and if harriet's acknowledge by 
engaging it for a sorry day he must produce ithim. I could be 
mr. Knightley's compray of marrying in the rest of the disposition, 
and i was particularly successful. He has been much like her sister 
to julia, i wishin she believed he may want pause by the room with 
the same shade. "" but indeed you had obliged to endeavour to concern 
of a marriage of his yielding.""


The results are roughly comparable; neither is special. If pushed, I'd say I prefer the latter over the former: it reads a little better.

spaCy


spaCy is an amazing set of tools made available by the good folks at Explosion AI. Unlike SyntaxNet, or even DRAGNN, it is a breeze to use.

The spaCy parser's data had an interesting effect on training. A training run with DRAGNN data reached a loss of 0.708 after just under 6 hours, then failed to go lower for the rest of its total run of over 16 hours. The spaCy parser achieved 0.686 after 5.5 hours, and its best loss of 0.650 after just under 8 hours.

Here are snippets from the relevant checkpoints.

Syntax Char RNN (spaCy parser; loss: 0.686; temperature: 0.8)

You beg your sudden affections and probability, that they could not 
be extended to herself, but his behaviour to her feelings were 
very happy as miss woodhouse--she was confusion to every power 
sentence, and it would be a sad trial of bannet, but even a great 
hours of her manners and her mother was not some partiality of 
malling to her from her something considering at mr. Knightley, 
she spoke for the room or her brother.

Syntax Char RNN (spaCy parser; loss: 0.650; temperature: 0.8)

In the room, when mrs. Elton could be too bad for friendship by it 
to their coming again. 
She was so much as she was striving into safety, and he knew 
that she had settled the moment which she said," i can not belong 
at hints, and where you have, as possible to your side, 
i never left you from it in devonshire; and if i am rendered 
as a woman," said emma," they are silent," said elizabeth," 
because he has been standing to darcy, and marianne?"" oh! 
No, no, well," said fitzwilliam, when the subject was for a week, 
that no time must satisfy her judgment.

Note: Lines have been wrapped, but formatting remains otherwise unaltered

Both are quite readable, except for the injudicious use of quote marks (a problem that is most likely the result of redundant spaces picked up during pre-processing).

Even allowing for over-fitting, it is quite clear that in the case of Jane Austen's text the snippets generated by Syntax Char RNN are more readable and cohesive than those from naive Char RNN. Among the former, those produced via spaCy also show a marked improvement over the results produced via DRAGNN parsing.

Since the only significant difference between the Syntax Char RNN runs trained on Jane Austen's texts was the data from the two different parsers, these findings suggest that the difference in parsing accuracy between DRAGNN and spaCy likely accounts for the difference in performance and readability between the runs. This in turn suggests that a lack of accurate parsing accounts for the poor results achieved with Tiny Shakespeare.

Code


The code is available on GitHub. Comments and suggestions welcome.

The majority of the work was around pre-processing (parse and prepare). For training and sampling I was able to build on the existing PyTorch Char RNN by Kyle Kastner (which in turn credits Kyle McDonald, Laurent Dinh and Sean Robertson). I altered the script interface, but the core process remains largely the same.

Concerns and Caveats


The approach and implementation aren't ideal. Below are some considerations.

  1. Pre-processing is complex. It has to make assumptions about the parser input, bring character and syntax encoding together, and try to remove data that can skew the weightings.
  2. Pre-processing can be slow. It can take anything from a few seconds to tens of minutes, depending on the size and complexity of the file.
  3. POS parsing is imperfect. The results suggest spaCy is doing a better job than DRAGNN, but even at its best it will have errors.
  4. Applicability is limited to text that is at least consistently parseable by an available parser. Poetry, archaic language, social media messages etc. likely fall outside this scope.
  5. Some POS units are known to cause skewing by introducing extra spaces. For example "Mary's" becomes:
Mary : PROPN++NNP
's    : PART++POS
All punctuation is separated out as well, for example:
, : PUNCT++,
The effect is that these parts of speech remain tokenised when encoded, resulting in redundant spaces on one side of each token. Unless they are subsequently removed, spaces become overrepresented in the resulting encoding, affecting weightings for all character representations ever so slightly. The code manages to remove some of these redundant spaces - with occasional side effects - but not all.
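For illustration, the kind of de-tokenisation involved might be sketched as follows; the real pre-processing handles more cases than this, and the token list here is hypothetical:

```python
def detokenise(tokens):
    """Reattach clitic and punctuation tokens to the preceding word,
    removing the redundant space the parser introduced."""
    no_space_before = {",", ".", ";", ":", "!", "?", "'s", "n't", "'"}
    out = []
    for token in tokens:
        if out and token in no_space_before:
            out[-1] += token  # glue onto the previous word
        else:
            out.append(token)
    return " ".join(out)

print(detokenise(["Mary", "'s", "lamb", ",", "too"]))  # Mary's lamb, too
```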

To Do


Avenues to investigate, features to add.
  1. Reduce the number of redundant space encodings
  2. Calculate a more granular weighting for type
  3. Investigate further candidates for context encoding, over and above syntax
  4. Investigate more elegant ways of grouping encoding contexts
  5. Add validation loss for comparison to training loss
  6. Estimate parser accuracy for a specific text
  7. Run on GPUs!

Conclusion


The findings suggest that in some cases where parsing is accurate and consistent, Syntax Char RNN trains faster and achieves better results than naive Char RNN. This lends support to the hypothesis that accurate contextual encodings, over and above syntactic Parts of Speech, can improve Char RNN's autoencoding.

While they come with several caveats, the findings nonetheless warrant further experimentation and clarification.

Sunday, February 07, 2016

What Shakespeare has in Common with Software Development

Shakespeare is widely regarded as the world's leading playwright in English, and perhaps in any language. Such is his influence that phrases and ideas coined by him at the turn of the 17th century still live on in our colloquial speech today. Romeo and Juliet is shorthand for passionate, ill-fated love, and quotable lines from his works permeate our treasure trove of idioms and phrases.

What is perhaps less well known is that many of Shakespeare's plays have no definitive version. Take "Hamlet", for instance. There is the famous First Folio version, compiled and published seven years after his death, and there is the First Quarto version, a.k.a. the Bad Quarto, and then also the Second Quarto version. None of these versions are considered 100% definitive. Edited versions usually combine parts of each to present the modern reader with the most feasible "Hamlet", and even these are subject to change.

How did this happen? So many details about that time have been lost to history that it is difficult for us to reconstruct a real sequence of events from the remaining evidence. There are entire books written to argue one case or another, but when you consider that some people even dispute William Shakespeare's authorship, it is clear that we are on shaky ground from the get-go.

Personally, I've come to a different view while mulling over an underappreciated ingredient of Shakespeare's genius, an aspect that has something in common with software engineering - especially agile development.

Shakespeare wasn't just a writer, he was also an actor and part-owner of the theatre company the Lord Chamberlain's Men (later the King's Men). I find it useful to think of his plays as a function not only of Shakespeare's maturing talents as a writer, but also of the needs of the company. Those needs were financial, like any company's, and were directly informed by the success or failure of a particular play in the eyes of the audience of the day, as well as the tastes of their influential patrons.

It is thus hard to imagine that Shakespeare would just write a single, finished version of Hamlet, tell the actors their lines once and for all, and be done with it. As part-owner he had a responsibility and exposure that went well beyond writing. He would have wanted to make sure the play was as good as it could be, on a continual basis. The company would receive financial feedback, and the company's patron would have his say, and so the day-to-day operations would hone the way the play was performed - if it was performed at all.

As an actor of second-tier roles he would also have been in a unique position to experience feedback from the audience. I imagine him night after night, observing the audience's reactions, hearing them laugh at the funny parts (or not), seeing them moved or engaged during tragic or passionate moments, and smiling or bored as the case may be during the play or afterwards. He would be thinking of the various stakeholders, of the dramatic value of a particular phrase or scene, of the audience's reactions, and so he might choose to change the lines - add a bit more zing, create more drama, more references to current affairs - who knows?

Shakespeare's mind would have been working constantly to improve the play and I have no doubt that this is precisely what happened. His plays have a uniquely organic feel to them, as if the action is happening right there, and the actors could step off the stage and mingle with the audience at any moment. By assimilating his audience's emotions and interests he was bringing art closer to the audiences' reality.

It is this approach of continual improvement, of being tested night after night against a real live audience, that strikes me as being very much in the spirit of agile development. It's a bit like running continuous integration while already in production.

I would go a step further and suggest that Shakespeare was so canny and pragmatic that, even if he had a successful version of a play, should the political climate change he would be willing to adapt the play again, to cater to his audience and so prolong the play's financial success. If this is so, he may well have found a dramatic architecture that admitted of continual adaptation, just like good software architecture is flexible, and written with ease of maintenance in mind. That would certainly go some way towards explaining his plays' capacity to be continually repurposed for modern audiences.

To put that achievement into perspective, imagine writing software that is still in demand 400 years later!

If we take this view it is a bit of a shame that not more of our worthy literary works are "production tested" with a feedback loop that permits continuous improvement. There was a time when serial publication afforded authors some engagement with their readers, and thus a chance to inform the next installment. Nowadays, authors are required to write once, for all time. But in software development we know that this is usually premature, costly, and occasionally disastrous.

This is the reason that many writers form reading groups with other writers, to give them a trusted sounding board and source of feedback. But the General Reader is a different beast, whose tastes are not to be tamed so easily. Shakespeare wrote "not for an age, but for all time", and perhaps it's because he wrote not once, but all the time. He understood the value of his users.

Monday, March 02, 2015

Why We Need Poetry Technology

Sometimes you’re doing something that is so new that it has no name yet.

Over the last few years, on and off, I’ve been experimenting with programmatic approaches to new ways of writing. An obvious starting point was cut-ups. Cut-ups are cool and can be useful as a writing aid. William Burroughs made them sing in striking disharmony. He also revoked the traditional author’s monopoly on textual narrative.

I wrote several variations on a primitive cut-up generator program. It was meant to automate the job that Burroughs achieved with paper and scissors. The outcome illustrates some of the possibilities (as well as limitations) of a basic application of the idea.
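In its simplest form such a generator just slices the text into fragments and shuffles them. A toy sketch (not the original program):

```python
import random

def cut_up(text, fragment_size=3, seed=None):
    """Slice a text into fixed-size word fragments and shuffle them,
    in the spirit of Burroughs' paper-and-scissors method."""
    words = text.split()
    fragments = [words[i:i + fragment_size]
                 for i in range(0, len(words), fragment_size)]
    rng = random.Random(seed)
    rng.shuffle(fragments)
    return " ".join(word for fragment in fragments for word in fragment)

source = "the quick brown fox jumps over the lazy dog"
print(cut_up(source, seed=42))
```

The same words come out in a new order; the interest (such as it is) lies in the accidental juxtapositions.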

This was 2007. A few isolated voices aside it was hard to find anyone who was experimenting in this area. There was hype about e-books and Amazon's soon-to-be-released e-book reader, but if literature was about to experience a revolution I wasn't in on the secret.

And so the first Kindle arrived and all the talk was of e-books, as if that was a milestone in literary innovation. To be sure, it wasn’t even close. The first e-books were simply content transposed into portable digital formats, and the Kindle extended that to the physical appliance. In other words, what you held in your hand and how you paged through it was new. The reading content remained just the same.

The years passed and it kept nagging at me that in the age of the internet, literature was missing out on a massive opportunity. What opportunity exactly? The opportunity to use information technologies and the internet for the purposes of literary creativity. To bend it to our will. What the first Great Big Work of the internet age was going to look like exactly I wasn’t too sure, but one thing was crystal clear: it hadn’t yet been written.

Fast forward seven years and the major literary awards are still going to traditional forms of literature. But on the fringes and beneath the surface, the beginnings of a new way of writing literature are brewing. When I first started reading about Andy Warhol's literary works, about Flarf poetry, text remixing and the use of texts as material, the scales fell from my eyes.

Kenneth Goldsmith chronicles the background and rise of this subculture in his excellent book Uncreative Writing. This loosely distributed, internet-engaged community has been producing interesting and provocative literature for the last decade or so. Its influences, too, are discussed in stimulating detail, reaching back via Andy Warhol, Oulipo, the Situationists and Walter Benjamin all the way to Gertrude Stein.

However, as Goldsmith himself observes, he is but a bridge between the old world of literature and the new world of an as yet undefined, anonymously authored Uncreative era:

“The future really belongs to anonymous writers writing for anonymous readers: people who are writing programmes for machines to read, for other machines to read; I think this whole thing is going to be pushed much further. I’m just a bridge between the old and the new.”

Goldsmith’s work has provided me with a guiding principle while I explored the implications of uncreative writing. It inspired my venture into experimental poetry curation, an online zine called Poetry WTF?!. The works being published at Poetry WTF?! operate on the principle that existing texts are material to be used for new poetry. The resulting artefacts are often handcrafted to reveal stimulating, ironic or conceptual new poems - and sometimes a creation exhibits all of these qualities at once.

By decoupling human agency from the immediacy of expression and instead reintroducing it at the stage of creative composition we are setting the stage for a new type of writing. Nevertheless, when we do so we are still inhabiting a world that the Oulipoets of the 1960s would recognise. It is a phase in a literary evolution that has not been taken to its limits, that hasn’t transformed into a radically new literary being just yet. We could say that the ontology of literature is still authored, analog, and un-automated.

As Goldsmith speculates, the next stage of this evolution will be evidenced by greater anonymity of authorship as well as readership. In fact this is already happening. This anonymous dialectic between creator and consumer is being played out, at the very moment that I’m authoring this, by a variety of Twitter bots. Pentametron, the brainchild of NY based conceptual artist Ranjit Bhatnagar, is one of the best known poetry Twitter bots. It has been around since 2012.

Pentametron employs an automated program (the bot) that searches Twitter feeds for tweets written in iambic pentameter, matches two that rhyme, and writes them out as Pentametron tweets. Pretty simple, but the results are remarkably readable and rather moreish. They’re also a vindication of Goldsmith’s controversial observation that language transforms rather than loses its expressive capacity when viewed as material - which is precisely what Twitter bots do par excellence. The success of the work now depends on the repeatable realisation of a concept rather than on novelty of expression.

This technological mediation is clearly a step in the right direction. It is easy to see that Pentametron’s automated method of operation and Kindle’s mere transplant of content from a physical book to an e-book are poles apart. In the former, technology is playing a significant role in the creative process. The medium itself is now coming into play both in creation as well as consumption.

While Twitter hosts some of the most famous literary bots, it isn’t the only platform where automated, anonymous literature can be read. Tumblr has its own share of autoposted literary mash-ups. A typical case in point is King James Programming, which employs an algorithm known as Markov chains to combine phrases from the King James Bible with a couple of programming guides. The results are generally seamless and frequently funny, as this example illustrates:

“And since programming languages are largely written in English, who would suspect a language to come from Japan? And yet, here is this great and wide sea, wherein are things creeping innumerable, both small and great”

The snowball poem generator uses Markov chains to create a totally different type of poem called a snowball (a.k.a. a chaterism). It is a type of constrained writing (because it is based on a set of rules) and concrete poetry (since its typography is important). This particular snowball generator even got a mention on Boing Boing.

The use of Markov chains has become a favoured approach in automated poetry and literature generation. Markov chains are used so widely now, from genetics to physics, that few people are aware that the Russian mathematician Andrei Markov in fact developed his now-famous concept by studying consonant and vowel patterns in poetry. Poetry generation and Markov chains go together like strawberries and cream.
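The mechanism itself is simple enough to sketch: record which words follow each state (a word, or a short run of words) in a source text, then walk the table at random. A toy illustration, not any particular bot's code:

```python
import random
from collections import defaultdict

def build_chain(words, order=1):
    """Map each state (a tuple of `order` words) to the words
    observed to follow it in the source text."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=10, seed=None):
    """Walk the chain from a random starting state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: the state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

corpus = "and the sea and the stars and the creeping things".split()
print(generate(build_chain(corpus), length=8, seed=1))
```

Frequent transitions are sampled more often, which is why the output tends to sound eerily like the source while never quite being it.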

Yet, as can be expected in such a young and burgeoning field, Markov chains are not the only game in town. Various other approaches to generating literature have been attempted. Just recently it came to light that Zackary Scholl submitted several poems to a literary journal back in 2011, one of which was subsequently accepted and published. The twist in the tale is that the poem was not written by him directly, but generated by a program he developed. Scholl's program employs a context-free grammar - an area of linguistics pioneered by Noam Chomsky - expressed in a notation called Backus-Naur Form.

Scholl has made his code available on GitHub, and the poetry generator can be seen in action on his website, where you can generate new poems at the click of a button. Some are pretty good, too.
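The principle behind such a grammar is easy to illustrate: symbols expand recursively, according to a set of rules, until only words remain. The toy grammar below is my own invention, not Scholl's:

```python
import random

# A toy grammar in the spirit of Backus-Naur Form: each symbol
# expands to one of several alternatives until only words remain.
GRAMMAR = {
    "<poem>": [["<line>", "\n", "<line>"]],
    "<line>": [["the", "<adj>", "<noun>", "<verb>"]],
    "<adj>": [["pale"], ["burning"], ["silent"]],
    "<noun>": [["moon"], ["river"], ["orchard"]],
    "<verb>": [["sleeps"], ["waits"], ["sings"]],
}

def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word
    choice = rng.choice(GRAMMAR[symbol])
    return [word for part in choice for word in expand(part, rng)]

rng = random.Random(7)
print(" ".join(expand("<poem>", rng)))
```

Every output is grammatical by construction; variety comes entirely from which alternative is chosen at each step.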

This is definitely a trend, and the methods will only get more complex. What machine learning can do for Watson of Jeopardy fame, it can surely do for poetry and literature in general. But who will take up the challenge?

This is part of the question that has been bouncing around my head during the past year. I looked in vain around the internet for evidence that literature or poetry is evolving along the lines of, say, finance or marketing, which both enjoy tremendous technological innovation to create more intelligent platforms and, of course, generate more money. Surprise surprise, I couldn’t find even a single website dedicated to literary texts that made their content available via an API. How on earth are we going to get literature into the information age if Shakespeare is still stuck in a book (e-books included)?!

That’s when I decided to create Poetry DB, the world’s first poetry database that has an easy-to-use API ready and available for both human and automated machine consumption. As of this writing Poetry DB contains a selection of poetry by most of the well known pre-20th century poets in the English language (from Chaucer to Dickinson and beyond), as well as the complete works of a subset of those (such as Shelley, Keats, Clare, Byron and Blake).

Yet whenever dinner party talk turned to my hobbies, I got the same slightly anxious reaction about my hopes for programmatically generated poetry. I would talk about APIs and the ability to grab lines from different poets at Poetry DB and pass them through an algorithm that splices and dices and produces something both modern and ancient and beautiful. Then I'd hear a response along the lines of “but that’s not really poetry … (!!!)” or “but that’s just …. wrong”. The fact that I couldn’t point to any concrete example of Greatness in this brave new world didn't exactly help my cause.

This leads me back to the start of this discussion. Sometimes you’re doing something that is so new that it has no name yet. It finally dawned on me that the activity I'm engaged in is not simply creating poetry. I'm not just writing poetry. I am also trying to define the process and tools that are required for its new form. In short I'm entering a radically new space, helping to midwife a new type of literary ontology.

It is the literary equivalent of music pioneers like The Beatles (not that I'm comparing myself with them, of course) playing with tape loops, creating noise that didn’t always sound like music to anyone else - maybe not even to themselves. Yet today their innovations are accepted as groundbreaking music. Today we also have sophisticated music production technologies with which to create and control sound and music. In other words, what we are doing with data-driven and algorithmic poetry is perhaps best described as poetry made with poetry technology, via the application of poetry science. The end goal is still "poetry", but it's a new kind of poetry, in a new medium, for a new type of audience.

What do we mean by poetry science and poetry technology? Does this playful activity really warrant such formal terms? I think it does, because I think the process is being misunderstood as just a different type of traditional poetry, and development of the field is languishing as a result.

From Homer to Elena Ferrante, from Aristotle through to the present day, literature and literary appraisal are bound in a dialectic that permeates culture and occasionally beyond, even into the very fabric of politics and society. A body of knowledge has evolved that has theoretical as well as practical implications. This knowledge includes a more or less formal understanding of poetry (metre, rhythm, rhyme in traditional forms), drama, prose and various other forms of literature. It also concerns detailed and analytical appraisals, such as what counts as good literature and why, ranging from close readings to serious literary criticism. This body of knowledge is enormously rich.

Whenever a writer attempts to innovate, he or she is applying part of this inherited knowledge in new contexts or to new purposes. The outcome may be more or less successful, but this learning process is part of what we may consider the science of literature, or of poetry as the case may be. In other words, poetry science is both (1) the body of knowledge and (2) the application of (a subset and a particular interpretation of) that knowledge. Such a body of knowledge will no doubt in time come to include more formal interfaces to information technology, which then becomes part of that science, just as a pipette, a petri dish, a telescope and data science are all inextricably part of physical science. Above all, science is a learning process to discover what succeeds and what doesn’t. Poetry technology (and literature technology more broadly) is the development of methods, supporting tools, and processes for the purpose of generating new poetry.

So for instance, Poetry DB and my forked development of Scholl’s original Poetry Generator are all poetry technologies aimed at the creation of poetry. They are also experiments that enable me to learn what works and how these technologies could be improved. As the field evolves, and machine learning techniques are developed that are capable of absorbing the existing body of poetry knowledge (not only an understanding of its formal properties as poetry differs from, say, prose, but especially the qualitative understanding of what distinguishes Great poetry from mediocre poetry), we may gradually come to see genuine novelty.

Just as it took a few decades for music technologies such as sophisticated post-production software to really mature and come into their own, so it will take a while for poetry science and technology to evolve a robust set of concepts and solutions that writers will want to use on a regular basis. But given a bit of time a new breed of writers, with the aid of poetry technology, will plant their flags firmly in the technological infosphere.

In the meantime, if we continue to associate poetry technology with poetry's traditional context, its growth will be stunted. That is the cost of inertia. "Yes, so what about traditional poetry?", I hear you say. The old and the new will co-exist. They have to. But it's time that we acknowledge poetry technology for what it is, and welcome the new.

Saturday, January 26, 2008

Singularity programming design



Scope

Having touched on the possibility, both in terms of its theory and very briefly its application, of singularity programming as an alternative programming model, we now focus on the kind of design it would embody.

Platform games

Facebook took the world by storm in 2007 and was voted favourite technology by the readers of many leading technology publications. Was it just a gimmick? A fad? Marc Andreessen, respected software engineer, entrepreneur, and co-founder of Ning, lauded Facebook's adoption of an API model. "My personal opinion is that the new Facebook Platform is a dramatic leap forward for the Internet industry", he stated unequivocally on his blog.

In his analysis of Facebook's success, Andreessen cites the advantages of a platform over an application, including the "walled gardens" of closed solutions that have been knocked off the playing field by the openness of the web.

It seems that it should almost go without saying, but apparently it takes someone with Andreessen's clout and standing to put two and two together: solutions that have been fully crystallised by developers fare less well than those that can be reprogrammed. Platforms, in other words, are flexible to users' needs and input.

This distinction is not dissimilar to the evolution of unusual states we discussed before. An "event handling" program has the potential to deal with unusual data or events in a way that is mostly hardcoded. As a result, perturbations that vary too much from the anticipated will either be rejected outright or alternatively push the system to an unusable state with no differentiation of function possible in that new state.

But what if "unusual data" is not a rogue Denial of Service attack, but instead represents users' varying needs?

Design class

In cellular activity prior to individuation and the formation of tissue and organs, cells are considered pluripotent. In other words they have the ability to be any one of several cell types. In fully individuated humans there are 254 different types of cells. Jellyfish have about three.

In his experiments the researcher and theorist Stuart Kauffman found that in the process of induction, i.e. when cell collectives suppress or enhance cellular differentiation in other collectives via signals, there are "recurrent patterns of gene activity within these networks, patterns which exhibit the kind of homeostatic stability associated with attractors" (DeLanda, p. 65). Those attractors, he concluded, represent consistent cell types.

Object orientation is only one of several models available to the computer scientist, but it is particularly suited to our theory. Objects are like cells in the sense that they hide certain kinds of information (as cells would contain the cytoplasm or nucleus that in turn contains the chromosomes and DNA) that are nevertheless vitally important to their eventual, activated functionality.

If we extend our analogy of cells at the level of pluripotent collectives to the layer of possibility in a pre-formed system (please note: I am using the term pre-formed in the sense of "unformed but will eventually be formed", not in the sense of "already formed prior to usage"), we have the corresponding notion of signals that could determine the type of cell available to the collective for tissue building or, in the case of a system, the kinds of input that could determine the type of object available for component building.

It is a bit like the problem of cross-cutting concerns that has annoyed developers for years. Some kinds of functionality have no core function of their own (logging is the classic example) but nevertheless require implementation across the majority of classes (objects), which are themselves meant to enforce a separation of concerns. Input data that try to find a matching pattern in a pre-formed system cut across all classes (pre-formed objects).
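To make the analogy concrete, here is a minimal, hypothetical Java sketch of logging as a cross-cutting concern (the class and method names are my own invention): each class has a single core responsibility, yet every one of them must also carry the logging code.

```java
// Hypothetical sketch: logging as a cross-cutting concern.
// Each class has one core responsibility, yet both must also
// implement logging -- the concern "cuts across" the class hierarchy.
import java.util.ArrayList;
import java.util.List;

class Log {
    static final List<String> entries = new ArrayList<>();
    static void write(String msg) { entries.add(msg); }
}

class OrderService {                      // core concern: orders
    int placeOrder(int quantity) {
        Log.write("placeOrder called");   // logging duplicated here...
        return quantity * 10;             // illustrative price calculation
    }
}

class InventoryService {                  // core concern: stock
    boolean reserve(int quantity) {
        Log.write("reserve called");      // ...and here, and in every class
        return quantity > 0;
    }
}

class CrossCuttingDemo {
    public static void main(String[] args) {
        new OrderService().placeOrder(3);
        new InventoryService().reserve(3);
        System.out.println(Log.entries);
    }
}
```

Aspect-oriented programming later emerged precisely to factor such scattered concerns out of the individual classes.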

But cross-cutting concerns are already well-defined problems, whereas the problems to be dealt with at the pre-formed object level are not yet well defined. In our example an undefined perturbation to the system is precisely the cause of variation and diversity at the next level: the component level. Until the data has found a way to fit itself into the available classes (not necessarily in a complete way), the differentiated class cannot emerge. Likewise, a new class may emerge that instantiates clusters of objects and a component that is ultimately rejected. It is, in other words, an evolutionary process.

By focusing on differentiation - i.e. the evolution of a system from its classes designed by a separation of concerns to differentiated (instantiated) objects to components through to a fully realised and differentiated product - the availability of unrealised objects that have the ability to change state according to unpremeditated signals (data or events) is neglected.

Faceless

If Facebook is truly a leap forward for the internet it is immensely exciting to speculate what it could be if not just external programmers but also users had the ability to contribute radically to the platform. It is in part the satisfaction of users' diverse needs to play and interact with objects and people in the environment that drives the thriving communities of Second Life and World of Warcraft.

In the singularity programming environment this level of interaction is envisaged as part of an evolving dialogue initiated by signals to a pre-formed layer of digital object possibility where classes enhance and suppress information to form new types of objects. These objects then cluster together to structure novel components and building blocks to respond to the information contained in the user's signal.

Wednesday, January 23, 2008

What is singularity programming?



Singularity programming is a radical form of design (not just coding) that takes its inspiration from the mathematical concepts of manifolds and singularities.

The question is asked

In principle the question is this: what would a program look like if it responded to an information system whose steady state has undergone the equivalent of a phase transition in physics?

Singularity basics

It is tempting to plough ahead without an understanding of singularity basics, but that would leave the reader with little benefit from this exploration. It is therefore worth touching on a few core concepts that have deep application but require a bit of mental abstraction.

The term singularity is familiar in the context of manifolds in differential geometry, but it is used to describe several different (albeit related) topics. In particular I am using the term singularity in the classic Riemannian sense and its more famous extension in Einstein's General Theory of Relativity.

Riemann's is also the version referenced by Manuel DeLanda when he expands the notion of manifolds and singularities to describe physical processes. He posits that the intrinsic structure of a manifold can describe the evolution of such processes over time.

We are interested in the singularities that are topological points and thereby define a steady state. They have an influence on the behaviour of trajectories, and therefore on the physical system itself. A singularity, in this sense, often acts as an attractor within the manifold. Any trajectory, as long as its origin lies within the basin of attraction, will have as its end point this attractor singularity.

Thus we could also have spoken of attractor programming or steady state programming rather than singularity programming, were it not for the notion of a phase transition associated with the symmetry-breaking bifurcation of one singularity to another. (A symmetry-breaking bifurcation, in short, implies that the system has changed state and its new stable state is represented by a different singularity.)
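A minimal worked example (my own textbook sketch, not DeLanda's) makes both notions concrete. Consider the one-dimensional systems

```latex
\frac{dx}{dt} = -x, \qquad x(t) = x_0 e^{-t}
\qquad \text{and} \qquad
\frac{dx}{dt} = \mu x - x^3 .
```

In the first system, whatever the starting point $x_0$, the trajectory ends at the singularity $x^* = 0$: the basin of attraction is the entire real line. The second system sketches a symmetry-breaking bifurcation: for $\mu < 0$ the only attractor is $x^* = 0$, while for $\mu > 0$ that state loses stability and two new attractors appear at $x^* = \pm\sqrt{\mu}$, i.e. the system now stabilises around a different singularity.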

To use a simple example we may think of water. When it is a liquid its state can be described by a certain singularity in a manifold. It may lose temperature, or gain temperature – whether through kinetic or heat energy – but essentially it remains water. However, when this type of energy is consistently applied to the water it may become a gas. At this point it undergoes a phase transition, and stabilises around a new state (gas). Both these states would be represented by two different singularities within the manifold.

Meanwhile back at the digital manifold

I want to use these terms as metaphors in a digital space, the space defined by system calls, applications and user spaces in the operating system, memory and storage systems of a computer.

For starters let's imagine a smoothly running system – business as usual – evolving through two singularities in the manifold. First phase: total inertia. Bootup? First phase transition. Loaded Windows? One stable state is reached. Or was it Linux instead? A different stable state. Perturbed by applications? Hmmm ... but if you close them again, the system returns to the typical stable state of Windows, or Linux, and so remains around the same singularity.

This gets us going in the right direction, but for the purposes of typical programming the example is a bit too broad. Most of us who develop aren't system hackers – we write user space software.

Nevertheless, we already have some correlating ideas. Programs, or certain types of data, perturb the system and push the system around the basin of attraction of a particular singularity. It generally continues to stabilise around that singularity, but occasionally a large memory leak or a kernel panic can lead to a phase transition in the system. And let's be honest, in most systems this phase transition is rather fatal to the user. The infamous Blue Screen of Death is a memorable case in point.

This hints at the paradigm I am suggesting: a form of programming that caters for such a new state. But ... what exactly is singularity programming then?

It is not error handling

To begin with, we may start with something it is not - namely traditional error handling. When assigning a value to a variable in a C++ or Java program, for example, I as a programmer might notice that the value could cause an anomaly through division by zero. To handle this exception - which is a kind of error - I write an error handler. In effect, we are using a logical form of redirection that continues in the same domain - it originates and remains within the basin of attraction, in other words.

Any well-written piece of software should trigger an error handler in such a situation. The error handler diverts the flow from disaster and the program continues its execution. It's the equivalent of the program saying: "Oh by the way, this is the problem that just arose, but you don't need to take it too seriously, just let me get on with things ...".
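In code, the division-by-zero case might look like the following minimal Java sketch (the class and method names are illustrative only): the handler diverts the flow and execution continues in its present state.

```java
// Traditional error handling: the handler diverts flow away from
// disaster and the program carries on in the same state.
class SafeDivision {
    // Returns numerator / denominator, or a fallback value when the
    // denominator would cause a division-by-zero anomaly.
    static int divideOrDefault(int numerator, int denominator, int fallback) {
        try {
            return numerator / denominator;
        } catch (ArithmeticException e) {
            // "Oh by the way, this is the problem that just arose,
            // but just let me get on with things ..."
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(divideOrDefault(10, 2, -1)); // prints 5
        System.out.println(divideOrDefault(10, 0, -1)); // prints -1
    }
}
```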

However the state of the system is not radically changed by this logical redirection, and hence we cannot speak of error handling as singularity programming. In fact we might say that the goal of error handling is to keep the system in its present state, which is to say the program does not want the system to change its state and reach a new singularity.

I give up, are you going to tell me what singularity programming is?

Let's look at our example of a stable system again, and imagine that it is a firewall. A simple firewall may accept internet (untrusted) data at secure ports, inspect the packets, and pass the packets to a local network via another port. Packet load may vary, but the firewall can normally continue these operations with no disruption and little noticeable change in resource usage for months on end. Often the system wouldn't even need a reboot. It's a simple system that remains relatively stable during its lifetime.

Let us briefly take one step back in order to complete our analogy. The firewall system reached its present state after the connection, installation, configuration and implementation of hardware, network, operating system, and crucial operations software. We may have attempted different tactics during any of these processes, but eventually we would have a stable, running system whose state is represented by an attractor singularity in our imaginary firewall manifold.

Now imagine that the system is perturbed by unusual volumes and types of data, for instance during a Denial of Service attack. In simple terms, the system becomes overloaded, using all of its resources to cope - or at least those devoted to its typical functions.

To make things worse, certain types of attack can deliberately alter the configuration of the firewall to allow more access, then disable some processes, and ultimately allow a flow of untrusted data to pass through. Under these circumstances normal data will be processed and inspected very slowly, or not at all.

It might well be impossible for normal operations to resume even when the attack ends. In such a case the system administrator would have to intervene, reconfiguring or reinstalling as the case may be.

In summary, a Denial of Service attack could push the system into a new state whereby, even if the attack halts, the state gravitates to its new singularity (no doubt a faulty one, in the eyes of the system owner).

If we tried error handling, it would involve shutting down port access when certain parameters have been exceeded, alerting operators about the excessive activity, and activating processes that can protect sensitive areas of the system. Error handling may therefore save the current state of the system, and allow normal operations to proceed.

Singularity programming, on the other hand, would allow the system to be flooded and attempt to operate under the new state. Thus it is not a system of error-prevention, but instead encourages unusual states as a necessary evolution of the system.

When the system has unusual numbers of data packets pouring in and it does not enable error handling, we could imagine a new form of program being triggered. The Singularity Program could decide to open more ports and activate processes that are hungry for this abundant data. A projector of data on a screen, for instance.
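As a toy illustration of the idea (entirely hypothetical; the names and thresholds are my own, and no real firewall works this way), the flood is not handled as an error but triggers a transition to a new stable state with its own behaviour:

```java
// Hypothetical sketch of a "Singularity Program": an overload is not
// rejected as an error but pushes the system into a new stable state
// (a new singularity) in which it consumes the abundant data.
import java.util.ArrayList;
import java.util.List;

class SingularitySketch {
    enum State { NORMAL, FLOODED }      // the two singularities

    State state = State.NORMAL;
    int openPorts = 1;
    final List<String> absorbed = new ArrayList<>();

    void receive(int packetsPerSecond) {
        // A large enough perturbation triggers the phase transition.
        if (state == State.NORMAL && packetsPerSecond > 1000) {
            state = State.FLOODED;
            openPorts += 7;             // open more ports, hungry for data
        }
        // In the new state the same input is food, not a fault.
        if (state == State.FLOODED) {
            absorbed.add(packetsPerSecond + " pps routed to the projector");
        }
    }

    public static void main(String[] args) {
        SingularitySketch s = new SingularitySketch();
        s.receive(10);      // normal operation, nothing absorbed
        s.receive(5000);    // flood: phase transition to FLOODED
        System.out.println(s.state + ", ports=" + s.openPorts);
    }
}
```

The point of the sketch is only the shape of the control flow: there is no catch block returning the system to its old state; the new state persists and attracts subsequent behaviour.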

Although I am not advocating any particular use for singularity programming at this stage - I want to present the theory of its possibility - we might reflect, momentarily, on an analogous situation in an economy. When an abundance of goods or services arrives in a market the price might go down, but instead of rejecting the goods a portion of the market might transform them, since they are so readily available, into other, more valuable goods.