Monday, August 20, 2018

The Role of Gender in the Hong Kong Film "After This Our Exile"


After This Our Exile won numerous film awards, including Best Film at both the Golden Horse and Hong Kong film awards in 2006. It is a touching, often tragic film. A family falls apart when a father indulges his character flaws at the cost of his family. He gambles and borrows, losing money and failing to repay his debts.

His wife (Lin) realises he won't change his ways and decides to leave him. Their poor young son is caught in the middle. He spots her attempt to leave the first time around, but is blamed when she gets away the next time around. The son is referred to as "Boy" (at least in the English translations), and Lin abandons him too. Her role is simplified, no doubt as a way to focus on the father and son's relationship.

Boy misses her and experiences conflicting feelings of loyalty. However his dad's influence prevails, and he soon adopts his dad's negative view of his mom. Only when it is too late does he realise his dad is the real bad apple of the family. His father, impulsive and unwilling to work, forces the boy to steal for money. At this point the boy gets caught and thrown into a correctional facility.

After This Our Exile is therefore a cautionary morality tale. Now that China has an up and coming middle class, the film is perhaps saying don't throw away your parental responsibilities to chase your dreams.

While I found the film genuinely affecting, I lament the missed opportunity to realise the potential in the mother's role. Her character is at first wonderfully interesting, full of passionate restraint as she schemes to escape a dead end life. Sadly this is marred by the father's view of her as merely an unfaithful woman, a view which Boy believes and which is reinforced when she exchanges parental love for a kind of naive yet inconsequential sentimentality about Boy. And so she finally transforms into yet another stereotype, of the lover turned domesticated wife. For someone so strong-willed this doesn't make a lot of sense.

Shing, the dad, is portrayed as a weak-willed character. He is all the more dangerous for having once possessed a dream of success that may have been within reach had he worked at it. He doesn't want to lose face completely and looks for easy solutions. However the interest of the tragic story is based on more than character flaws. A central part of the plot revolves around the particular way in which gender roles play out in the narrative.

The Chinese version of the title is 父子, which literally means "father son". We should therefore be under no illusion that Shing and Boy are the central characters in this story. The moral seems to be that only a father can give his son the right education in life, and when he fails to do so, tragedy will follow.

It is worth remembering that Chinese culture is largely paternalistic, so this moral injunction isn't a surprise to Chinese audiences, nor even the strong filial loyalty, as filial piety is an essential part of Confucian teaching. To Western audiences, however, such a paternalistic morality is more likely to meet with disapproval, as they would expect a more equal, nuanced message about gender roles, such as I have expressed above. Yet it is precisely the strict partitioning of roles that proves instructive about the forces that drive the story.

Lin, the mother, runs away but fails (the first time) due to a premonition the boy has. In a powerful and dramatic early scene Shing apprehends her and takes her home in a fury. He treats her badly in what appears to be a domestic pattern of abuse, and he ignores her accusations about his bad habits. Instead he becomes very emotional - almost histrionic (a character trait that, in the West at least, has a long and unhappy association with women).

On the other hand Lin keeps her cool and gets to the point, even if it takes her a while to open up. It is a very powerful scene. Shing remains in denial at first, but eventually succumbs to the truth. It appears to be out of deep love, but we soon learn that he also needs Lin to help him pay his debts. She is the provider. His love has a dark side.

She, on the other hand, manages to win back his trust to win herself time. He clearly believes in his own masculinity, as they make love that same night in a scene that gives an insight to Lin's precarious position. She still finds him attractive enough to give in to their passion, but the viewer is aware that she may have something up her sleeve and probably needs to keep him on her side.

It is this 'cunning' element of her character that is juxtaposed with his more straightforward bad character. The question is hinted at, if not exactly asked: is it worse to be honest and emotional, yet a hopeless case, or more capable but a bit cunning and hypocritical? Given the way things play out, it is clear that Shing's character is judged in the negative. He is a bad sort who fails his own son in the worst kind of way. But what of Lin? Could she have saved them by staying on?

I think the answer is no, she was always more ambitious, and the context is about the father-son relationship being the backbone of society. However this also shows the way in which Lin's character is problematic. She has to leave because by staying and saving her family she would be fulfilling the father's role. She would become the backbone that rises through the slackness of her husband's lack of moral fibre. And this will not do. She can't be the man, she can't be the one to wear the trousers. Instead, it is better that she disown her family in a double negative, moving from female victim to active seeker of happiness in the arms of another man who happens to be rich and successful - even if it makes her look flighty. She is not even evil, incapable of real evil - just inconsequential.

Given the film's ending it is probably safe to assume that the film doesn't directly acknowledge society's role in Lin's decision. Or to put it differently, it doesn't acknowledge that her choices are by default highly constrained. It's a lose-lose situation, morally speaking, so she might as well choose the option in which she gains something.

Although she does not want to cut herself off from Boy completely, she acts in her own interest for the new baby, and the narrative turns her parental care into distracted sentimentality. She completes a double negative, leaving Shing and Boy to their circle of masculinity while she pursues a new motherhood. She is diminished by being seen as a giver of children, not a giver of souls, which is the role demanded of Shing.

It is perhaps slightly unfair to suggest that the film intentionally sets out to paint Lin as a mere stereotype. After all, it was meant to be about the father and the son. But as pointed out, the film does not acknowledge society's influence on her options. The film's limited view of gender roles has repercussions when, ultimately, the father fails to be a father - to be a man of substance - and we are left to wonder whether his excessive masculine posturing isn't partly responsible for his failure in fatherhood.

Boy, then, is the only character left to fulfil the expectation of being a man, and, being a boy, he cannot do so. It is only at the end of the film, once he has grown up, that he sets out to right those wrongs. He shows some of the backbone that both his parents lacked.

Sunday, August 12, 2018

Why Are All the Black Kids Sitting Together in the Cafeteria?: Learning about Racism

"Why Are All the Black Kids Sitting Together in the Cafeteria?", a book on race relations first published in 1997, has opened my eyes about the nature of racism and what we can do about it. I can't really do the book justice in a blog post, but I would like to highlight some of the points that have made an impression on me.

By way of quick introduction, the author Beverly Daniel Tatum, PhD, is a psychologist as well as an educator. She originally wrote the book in answer to questions about race she would often receive, in particular the one in the title, from well-meaning White teachers who were concerned and perplexed upon seeing Black kids sitting together in the cafeteria and not mingling.

The book draws on a variety of research studies while avoiding abstract theory, focusing instead on concrete examples. This makes the book very accessible without skimping on credibility. The version I read is the recently updated 20th anniversary edition. The first version is already a classic in the field.

I'm sure everyone who reads it will have their own a-ha! moment. I personally had several. Yet there was one that really stood out. It was the realisation that my fundamental assumption about racism was inadequate. Like many other people apparently, I equated racism with a kind of prejudice towards people of other races. Being a good citizen, I took this understanding of racism to mean I could check my attitude and behaviour and feel confident that, yes, I am not being racist in my daily life.

However that is a fairly superficial definition of racism that does not get at the heart of the problem. To put this in perspective, my understanding of how racism developed has forever been changed by a recent trip to Washington DC during which I visited the African-American Museum. It tells the story of the global slave trade during the colonial era, the immense suffering of the slaves who were sold and bought as chattel, and their resistance and perseverance over centuries to find a better way and a better life. In one single afternoon I learned more about slavery and its consequences than I'd ever known before.

As a UK citizen born in South Africa I also had occasion to ponder how slavery and racism have manifested in different ways on three different continents, but that is a whole analysis unto itself. For present purposes, suffice it to say that a historical perspective, of slavery and its consequences in particular, is essential to understanding what racism is. Those consequences amount to an ideology of White privilege that has been ingrained in culture and set in laws over centuries.

What that means is that racism is structural in nature, and that existing societal structures are racist inasmuch as they favour White people over Black people (and other races). This is the crux of the matter, and that is why the idea of racism as a prejudice is of limited use. To dismantle racism we can't only check our attitude and biases, we have to go much, much further.

Tatum quotes David Wellman, who defines racism in this sense as a "system of advantage based on race" (p. 87). Another definition of racism often used is that of "prejudice plus power", which explains how the structural inequality comes into being and has been enforced:

"Racial prejudice combined with social power - access to social, cultural, and economic resources and decision-making - leads to the institutionalization of racist policies and practices" - p. 87

However this definition, Tatum concludes, has one drawback: on a practical level many White people feel that they do not have the 'power' that is being alluded to. She therefore prefers Wellman's definition. Nevertheless, I would add that the "prejudice plus power" definition does indeed describe how racism was *initially* instantiated. It is now sustained simply through the ongoing maintenance of the status quo that was previously established by those with social power. Only when the status quo is challenged can this historical reality be uncovered and seen for what it is, namely the construction of racism.

The second big idea I encountered was that of intersectionality. This is a term that has had a fair amount of coverage in the popular media, but I never really looked at it closely. In practical terms a person's identity may form along several axes of distinctiveness, or otherness, of which Tatum highlights seven: race or ethnicity, gender, religion, sexual orientation, socioeconomic status, age, and physical or mental ability (p. 103). Following another book I'm currently reading (NeuroTribes by Steve Silberman) I would probably expect neurodiversity to eventually join this list.

The point is that each of these categories of otherness has a form of oppression associated with it: racism, sexism, religious oppression / anti-semitism, heterosexism, classism, ageism, and ableism (p. 103).

As an example of intersectionality, one's identity might form along the following manifested centres of experience: black, female, Christian, lesbian, working class, middle aged, and healthy. Any part of the identity not in the dominant or normal side of the category means that the individual will experience oppression or discrimination in some way or form.

When it comes to race, White is the dominant and normal race. As a result it is not uncommon for a White person to not really self-identify in terms of race. Tatum quotes Debby Irving in her memoir Waking up White:

"The way I understood it, race was for other people, brown and black-skinned people. Don't get me wrong - if you put a census form in my hand I would know to check 'white' or 'Caucasian'. It's more that I thought all those other categories, like Asian, African American, American Indian, and Latino, were the real races. I thought white was the raceless race- just plain, normal, the one against which all others are measured" - p. 186

However there is "a hidden cost of racism for Whites" (p. 187), namely the experience of psychological discomfort whenever racism is brought up - guilt, shame, frustration, even anger. The absence of a racial identity in the case of Whites, Tatum contends, is the root cause of this discomfort. She notes that a common reaction for the White person, once they become aware of racism and their own role in it as a White person, is to conclude that they need to have more interactions with Black people or make friends with Black people.

She explains that a more fruitful approach is to develop a positive White identity first of all. This point was another big a-ha! moment for me, because I have experienced that psychological discomfort myself. Combined with the perception of racism as a form of prejudice, silence often seems like the safest route in the face of uncertainty - and yet of course it does not change anything. But with a positive racial identity there would be a foundation to work from and things can begin to fall into place.

While acknowledging that there is no set recipe, Tatum offers practical advice to encourage White intragroup conversations and help develop such a positive White identity:
- find other Whites who have already progressed along the way and can show you what to do
- read autobiographies and biographies by White anti-racist activists, like "A Season of Justice" by Morris Dees, or "White Like Me" by Tim Wise
- participate in White anti-racism consciousness raising groups (eg. Showing Up for Racial Justice (SURJ))

She provides perspective on the need for all-White support groups and the function they fulfil:

"Particularly when Whites are trying to work through their feelings of guilt and shame, separate groups give White people the 'space to speak with honesty and candor rarely possible in racially mixed groups'. Even when Whites feel comfortable sharing these feelings with people of color, frankly, people of color don't necessarily want to hear about it" - p.205

The onus is on the White individual to do the work and develop his or her White racial identity, making it into something positive.

The question of identity is also central on the side of Black people to gain insight into the question in the title: why are all the Black kids sitting together in the cafeteria? During pre-adolescence, race isn't viewed in the same way by kids because their identity has not been fully formed yet. But during adolescence new social factors (who is dating who, who is friends with who, what is my future?, etc.) come into play and become increasingly important. Black kids are then often drawn together by their shared experiences of being institutionally othered and oppressed by the rest of society. In other words, by the sort of structural racism that does not affect the dominant White group much.

"In the prepuberty stage, the personal and social significance of one's REC[racial-ethnic-cultural]-group  membership has not yet been realized, and REC identity is not yet under examination [...] During adolescence their understanding evolves to include not just more about themselves but also more about their group, including an 'understanding of a common fate or shared destiny based on ethnic or racial group membership and that these shared experiences differ from the experiences of individuals from other groups" - p. 135

In the face of such experiences, being part of a larger group of people who understand one's situation and experiences is a benefit. Therefore the concerns of those White teachers who asked why they are sitting together are valid, but misplaced. The root of the problem should be sought in the institution of racism, and how to dismantle it, and not directly in the behaviour and thoughts of those being othered and oppressed.

This brings us to another important point, namely what those of us who are on the privileged side of the equation can do about it. The book points out on more than one occasion that those who are oppressed do not want us to speak for them because they have their own voices. So what is it that we can do? Plenty, as it turns out. For starters, we can begin in our own sphere of influence: pointing out when someone has made a racist comment or joke can change awareness.

One of my favourite examples is actually in the context of sexism, but it could as easily have applied to racism. It happens when Andy Murray corrects a journalist for referring to Sam Querrey as "the first US player to reach a major semi-final since 2009". Of course, Serena Williams (and other US women tennis players) have been winning plenty since 2009. The counter-argument that the context was implicitly "men's tennis" is almost the point, because the same can be said about all institutional racism and sexism: the existing context, or status quo, can only be exposed by drawing attention to it.

As White people we have more social power than we often realise, and even simple interventions, like the way Andy Murray used his influence in the media, can make a powerful statement.

There is tons more excellent material in the book, and I've glossed over much at the expense of nuance. Three more topics worth mentioning in passing are the need for affirmative action and for goal setting in affirmative action programs; the challenges of aversive racism; and how to counter the influence of bias in decision-making.

Rather than go into all of them, I want to highlight one last point that really stuck out for me. In the final section of the book Tatum discusses racism and the experience of racism in the context of other ethnic and racial groups in the US, including Native Americans, Asians, Latinx, and others.

In the case of Native Americans - a catchall name for many different communities - researchers like Paul Ongtooguk have noticed that such communities have been reduced to static stereotypes in the public mind. Even when their traditions have been preserved they are usually presented in terms of how things used to be once upon a time. In other words, this gives them no sense of their current existence, nor of their future.

While the traditional arts and crafts were worthy of study, the curriculum embodied a "museum" perspective whereby the traditional life of Alaska Natives was studied "as an interesting curiosity commemorating the past." Ongtooguk explained, "The most disturbing picture of Inupiaq culture, then, was of its static nature - something that had happened 'back then' rather than something that was happening now. Did this mean that the people living in the region now were like a cast of actors who had run out of lines?" - p. 267

Ongtooguk focuses on creating study materials that allow Native American students to see themselves in the future. This future-oriented imagination is an important part of the continuity of community identity, and therefore of their cultural survival as a distinct group, and almost certainly of their capability to thrive again in the future.

It is worth summarising these insights once more:

1. Racism is institutionalised and structural, not just a question of conscious prejudice
2. White people should develop a positive White racial identity that does not deny the reality and history of racism, but acknowledges, addresses, and helps to dismantle it
3. Personal identity formation is a complex process influenced by highly individual combinations of intersectionality
4. The survival and prosperity of a community lies not only in preserving its past, but also in connecting to its present and actively imagining its future

This has without doubt been an eye opener to me. From a practical point of view, and from my personal perspective as a middle class White male, the second point is a clear call to action.

Sunday, July 22, 2018

Detecting Similarity of Textual Style and Content

This script performs rudimentary detection of textual style and content. Basically it uses the predictive capability of the pytorch-char-rnn autoencoder to check the likelihood of each character in a provided input text against character sequences in an existing trained model (trained on some other text).

The average of likelihoods across the provided input text is calculated to provide a broad indication of the similarity of style and content of the input text compared to the original text on which the model was trained. In particular it provides a similarity score as a percentage (higher means more similar).

For example a sentence from the original modelled text should come up with high similarity, typically scoring over 97%. A text in the same language, but written in a very different style might score over 90% but not as high.

An input text written in a totally different language should score significantly lower, eg. 80-85%. If the texts do not share all the textual characters, for example the Turkish alphabet compared to the Roman alphabet, the score will drop even more.

Under the hood the script actually detects variance, and then converts it to a similarity score for convenience. The lower the detected variance, the more like the original text the provided input text is.
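
As a rough illustration of the averaging and conversion described above, here is a minimal sketch. It does not use pytorch-char-rnn: a toy bigram counter stands in for the trained model, and the names train_bigram, char_likelihoods and similarity_percent, as well as the exact "variance" formula, are assumptions made for illustration rather than the script's actual code. The scores it produces are therefore not comparable to the percentages quoted in this post.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Toy stand-in for a trained model: next-character counts per previous character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def char_likelihoods(model, text):
    """Probability the model assigns to each actual next character (0.0 if unseen)."""
    probs = []
    for prev, nxt in zip(text, text[1:]):
        total = sum(model[prev].values()) if prev in model else 0
        probs.append(model[prev][nxt] / total if total else 0.0)
    return probs


def similarity_percent(probs):
    """Assumed conversion: average shortfall from certainty, flipped into a percentage."""
    if not probs:
        return 0.0
    variance = sum(1.0 - p for p in probs) / len(probs)
    return 100.0 * (1.0 - variance)


if __name__ == "__main__":
    model = train_bigram("abababab")
    print(similarity_percent(char_likelihoods(model, "abab")))  # text the model knows
    print(similarity_percent(char_likelihoods(model, "xyz")))   # text it has never seen
```

The point of the sketch is only the shape of the calculation: one likelihood per character, averaged over the input, then turned into a single score where higher means more similar.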

The script is provided as part of my pytorch-char-rnn repo.

Below are some examples:

Example 1: Compare English text from Jane Austen's Persuasion with a model trained on Jane Austen's fiction.

python2.7 \
--text "Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who, for his own amusement, never took up any book but the Baronetage; there he found occupation for an idle hour" \
--checkpoint checkpoints/austen_checkpoint.cp \
--charfile data/austen_chars.pkl 
Parameters found at checkpoints/austen_checkpoint.cp... loading

Detected similarity: 99.15%

Example 2: Compare German text from the Bible with a model trained on Jane Austen's fiction.

python2.7 \
--text "Am Anfang schuf Gott Himmel und Erde. Und die Erde war wüst und leer, und es war finster auf der Tiefe; und der Geist Gottes schwebte auf dem Wasser." \
--checkpoint checkpoints/austen_checkpoint.cp \
--charfile data/austen_chars.pkl 
Parameters found at checkpoints/austen_checkpoint.cp... loading

Detected similarity: 83.84%

In principle the technique can be improved by creating a larger window for comparison. In other words not just character by character, but character sequence by character sequence across a moving window. A bit like LSTM in reverse. It isn't clear whether all the information is available to make this possible, so I'll have to do a bit of digging around the model's saved state.
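
One possible reading of the moving-window idea, sketched under assumptions: given the per-character likelihoods, score each run of consecutive characters by its geometric mean rather than scoring characters one at a time. The function name window_scores, the window size and the floor value are all hypothetical, and this says nothing about whether the model's saved state actually exposes what would be needed.

```python
import math


def window_scores(probs, window=5, floor=1e-6):
    """Geometric-mean likelihood of each run of `window` consecutive characters.

    `probs` holds one likelihood per character; `floor` avoids log(0) for
    characters the model considered impossible.
    """
    logs = [math.log(max(p, floor)) for p in probs]
    scores = []
    for start in range(len(logs) - window + 1):
        chunk = logs[start:start + window]
        scores.append(math.exp(sum(chunk) / window))
    return scores


if __name__ == "__main__":
    # One unlikely character drags down every window that contains it.
    print(window_scores([0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9], window=3))
```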

I'll leave that as an exercise for another day.

Monday, February 19, 2018

Syntax Char RNN for Context Encoding


Syntax Char RNN attempts to enhance naive Char RNN by encoding syntactic context along with character information. The result is an algorithm that, in selected cases, learns faster and delivers more interesting results than naive Char RNN. The relevant cases appear to be those that allow for more accurate parsing of the text.

This blog post describes the general idea, shares some findings, and links to the code.


As both a writer and a technologist I have for some time now been interested in the ability to programmatically generate language that is at once creative and meaningful. Two of my previous projects in this context are Poetry DB and Poem Crunch. I also wrote a novel that incorporates words and phrases generated by a Char RNN that had been trained on the story's text.

Andrej Karpathy's now-famous article on RNNs was a revelation when I first read it. It proved that Deep Learning can generate text in ways that at first appear almost magical. It has afforded me a lot of fun ever since.

However, in the context of generating meaningful text and language creatively, it ultimately falls short.

It is helpful to remember that Char RNN is essentially an autoencoder. Given a particular piece of text, let's say this blog post, training will build a model that, if fully successful, will be able to reproduce the original text exactly: it will generate exact copies of the original text from which it learned and created the model.

The reason for Char RNN's widespread employment in fun creative projects is its ability to introduce novelty by either tuning the temperature hyperparameter or, more commonly, as a side effect of imperfect learning.
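
For readers unfamiliar with the temperature hyperparameter, here is a minimal sketch of how it is commonly applied at sampling time: the model's raw scores for the next character are divided by the temperature before being turned into probabilities. The function name and the example scores are made up for illustration and are not taken from any particular Char RNN implementation.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw next-character scores into probabilities, scaled by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate characters
    for t in (0.5, 1.0, 2.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

A low temperature concentrates probability on the most likely character, so the output stays close to the training text; a high temperature flattens the distribution, which is where more of the novelty (and more of the gibberish) comes from.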

To be sure, imperfect learning is the norm rather than the exception. For any text beyond a certain level of complexity, a naive Char RNN will reach a point during training when it can no longer improve its model.

This naturally leads to the question, can the Char RNN algorithm be enhanced?

Context encoding

Char RNN encodes individual characters, and the sequence of encodings can be learned using for example LSTM units to remember a certain length of sequence. Aside from the relative position of the character encodings, the neural network has no further contextual information to help it 'remember'.

What would happen if we added other contextual information to the character encodings? Would it learn better?

Parts of Speech

Parts of Speech are structural parts of sentences and a fairly intuitive candidate for the problem at hand. Although POS parsing hasn't always been very accurate, SyntaxNet and spaCy have been setting new benchmarks in recent times. Even so, accuracy might still be a problem (more on that later), but they certainly hold promise.

So how does POS parsing fit into Char RNN?

Let's take a look at the following sentence and its constituent parts.

Bob   bakes a  cake

We can see that the 'a' in 'bakes' and the 'a' in 'cake' are contextually different. The first is part of a verb and the second is part of a noun. If we were able to encode the character and POS together, for each character across the whole text, we would cover a sequence longer than is practical for an LSTM to remember. In other words, the model would understand syntactical structure in a more generic sense than with naive Char RNN.

[space] + SPACE
b + VERB
a + VERB
k + VERB
e + VERB
s + VERB
[space] + SPACE
a + DET
[space] + SPACE
c + NOUN
a + NOUN
k + NOUN
e + NOUN
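
The pairing above can be sketched in a few lines. The tags are written by hand here, taken from the list above, rather than produced by a parser, and the function name char_pos_pairs is hypothetical.

```python
def char_pos_pairs(tagged_words):
    """Expand (word, POS) pairs into one (character, POS) pair per character.

    A (" ", "SPACE") pair is inserted between consecutive words.
    """
    pairs = []
    for i, (word, tag) in enumerate(tagged_words):
        if i > 0:
            pairs.append((" ", "SPACE"))
        for ch in word:
            pairs.append((ch, tag))
    return pairs


if __name__ == "__main__":
    # Hand-tagged words; a real pipeline would get the tags from a POS parser.
    tagged = [("bakes", "VERB"), ("a", "DET"), ("cake", "NOUN")]
    for ch, tag in char_pos_pairs(tagged):
        print(("[space]" if ch == " " else ch) + " + " + tag)
```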

One way of achieving this is, for each character, to create a new composite unit that captures both the character and the POS type. So if we create separate encodings for the characters and the POS categories, eg. a = 1, b = 2, etc. and NOUN = 1, VERB = 2 etc., then we could do something along the lines of:

a + VERB = 1 + 2 = 3

However this creates a new problem, namely one of duplicates. I.e. we'd end up with lots of cases that have the same final encoding (in this example, the final encoding is 3):

a + VERB = 1 + 2 = 3


b + NOUN = 2 + 1 = 3

A better solution would have to ensure the encodings are completely separate before sorting them back into consecutive indexes to ensure uniqueness.
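
To make the duplicate problem concrete, the sketch below reproduces the collision from the example and then shows one way (an assumption, not necessarily what the final code does) of keeping the pairs separate: enumerate every distinct (character, type) pair and give each its own consecutive index. The names additive and build_pair_index are hypothetical.

```python
from itertools import product

# The toy encodings from the example above.
chars = {"a": 1, "b": 2}
tags = {"NOUN": 1, "VERB": 2}


def additive(ch, tag):
    """Naive composite encoding: simply add the two codes (collides)."""
    return chars[ch] + tags[tag]


def build_pair_index(chars, tags):
    """Give every distinct (character, type) pair its own consecutive index."""
    pairs = sorted(product(chars, tags))
    return {pair: idx for idx, pair in enumerate(pairs)}


pair_index = build_pair_index(chars, tags)

if __name__ == "__main__":
    print(additive("a", "VERB"), additive("b", "NOUN"))            # same code twice
    print(pair_index[("a", "VERB")], pair_index[("b", "NOUN")])    # distinct codes
```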

But the real problem with this solution is that, although we now have an encoding influenced by both characters and types, we've lost each unit's individual quality. In other words, the importance of a character encoded along with one type of POS unit is no longer properly weighted against a character of a different type of POS unit. Instead, it has simply become a composite type of its own.

An improved approach would be to encode both the character and the type as independent data, albeit of the same unit, and let the LSTM do the rest.

A Syntax Char RNN tensor might then look as follows:

[[ char, type ], [ char, type ] ... ]

However this heavily favours the type encoding over the character encoding, which in turn will skew the weightings.
A more balanced encoding might be:

[[ char, char, char, char, type ], [ char, char, char, type ] ... ]
              word 1                          word 2

A sense of unevenness remains, because some words are longer than others: why should each word receive just one type encoding? This is something left as a refactoring improvement for later.

For the time being, experimentation showed that the best results came from adding two type encodings per word, as follows:

[[ char, char, char, char, type, type ], [ char, char, char, type, type ] ... ]
                  word 1                              word 2
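
A minimal sketch of how such a layout could be built. It assumes that characters and types share one index space, with the type indexes offset past the character vocabulary; that choice, the function name encode_words and the types_per_word parameter are illustrative assumptions rather than a description of the actual implementation.

```python
def encode_words(tagged_words, char_to_idx, tag_to_idx, types_per_word=2):
    """Encode each word as its character indexes followed by repeated type indexes.

    Type indexes are shifted past the character vocabulary so the two never clash.
    """
    offset = len(char_to_idx)
    encoded = []
    for word, tag in tagged_words:
        unit = [char_to_idx[ch] for ch in word]
        unit += [offset + tag_to_idx[tag]] * types_per_word
        encoded.append(unit)
    return encoded


if __name__ == "__main__":
    char_to_idx = {c: i for i, c in enumerate(sorted(set("bakescake")))}
    tag_to_idx = {"NOUN": 0, "VERB": 1}
    tagged = [("cake", "NOUN"), ("bakes", "VERB")]
    print(encode_words(tagged, char_to_idx, tag_to_idx))
```

With types_per_word set to 1 this gives the single trailing type encoding shown earlier; raising it is one simple way to shift the balance between character and type information.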


Char RNN can generate surprising turns of phrase and novel combinations of words, but longer extracts often read like gibberish. The hope was that context encoding might improve this state of affairs by strengthening the overall sentence structure represented in the model.

The SyntaxNet installation also installs DRAGNN and a language parsing model. Due to problems I had getting consistent results from SyntaxNet, I eventually settled on DRAGNN instead.



The first benchmark was based on the Tiny Shakespeare corpus. The following snippets are from checkpoints with equivalent validation loss, trained using the same hyperparameters (allowing for the proportionately longer sequence length in the Syntax Char RNN due to the additional type encodings).

Naive Char RNN (temperature: 0.8)

with him; there were not high against the nurse,
and i, as well as half of his brother should prove,
thou ask my father's power,
in this life of the world command betwixt
of greep and displent, rup in manrown:
and thou dost command thy stors; and take our woes,
that star in the sea is well.
 Now is a cunning four gloves of all violer on
himself and my friend traitor's jointure by us
to be holy part that were her horse:' the miles
for this seat with me in the island from scards
shall have stone your highness' weech with you.
 And unjust while i was born to take
with hardness from my cousin when i forget from me.

the unrelery, reign'd with a virtuous tongue,
to blush of his harms, and as sweet, if they
cape of england's purple say well true song hence,
shall appetite straight hath the law with mine?
 The composition should know thy face of my heart!

 Second huntsman:
i know him, and with my chast my mother.

Syntax Char RNN (DRAGNN parser; temperature: 0.8)

ITrue words, his son, fair despair is, and the guard brother; 
always you tell, 
though to see so many trees; i come!
 would you have no redress of joy of the march?
I, what i come up.
 i not a gentlemen cred; and as this old farewell.
 sir, she's a duke.
 so straight is the tyrant, shape, madness, he's weigh; 
which of the confail that e'er said and gentle sick ear together, 
we will see the backs.

Note: Syntax Char RNN output has been reformatted, but remains otherwise unaltered

I think it is easy to agree that the naive Char RNN generated text reads significantly better. There is a somewhat interesting punchiness to Syntax Char RNN's shorter dialogue sections, but that's about all it has in its favour.

This was, frankly, disappointing.

However, there is a reasonable chance that inaccurate parsing could be influencing the results. The DRAGNN model probably doesn't generalise well to Shakespearian English.

Would prose offer a better benchmark?

Jane Austen

The works of Jane Austen were used next. They would almost certainly parse more accurately.

The results this time were rather surprising. Syntax Char RNN raced away, reducing its loss pretty quickly. After one hour and fifteen minutes on my laptop CPU, Syntax Char RNN hit a temporary minimum of 0.771.

After the same time frame and with the same hyperparameters (again allowing for a slight adjustment of sequence length due to the extra type encodings for POS), naive Char RNN went as low as 1.025 - still nowhere near the Syntax Char RNN checkpoint.

I left it running overnight and it still reached only 0.944 after just over 8 hours.

This was interesting.

What about the quality of the generated text?

Naive Char RNN (loss: 0.944; temperature: 0.8):

"i beg young man, nothing of myself, for i have promised to be whole 
that his usual observations meant to rain to yourself, married more word
--i believe i am sure you will allow that they were often communication 
of it, but that in that in as he was in the power of a hasty announte 
by print,  to have given me my friend; but at all," said lady russell, 
so then assisted to  his shall there must preferred them very ill--
considertaryingly, very pleasant, for my having forming fresh person's 
honour that you bowed really satisfied we go.

Syntax Char RNN (DRAGNN parser; loss: 0.771; temperature: 0.8)

Mr. Darcy was quite unrescrailous of place, which and want to carry 
by the attractions and fair at length; and if harriet's acknowledge by 
engaging it for a sorry day he must produce ithim. I could be 
mr. Knightley's compray of marrying in the rest of the disposition, 
and i was particularly successful. He has been much like her sister 
to julia, i wishin she believed he may want pause by the room with 
the same shade. "" but indeed you had obliged to endeavour to concern 
of a marriage of his yielding.""

The results are roughly comparable, and neither is special. If pushed, I'd say I prefer the latter over the former, as it reads a little better.


spaCy is an amazing set of tools made available by the good folks at Explosion AI. Unlike SyntaxNet, or even DRAGNN, it is a breeze to use.
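To give a feel for how little code is involved, here is a rough sketch of turning parsed tokens into the "text : POS++TAG" lines quoted in the caveats further down. It relies only on the `text`, `pos_` and `tag_` attributes that spaCy tokens expose; the `FakeToken` stand-in, the `format_tokens` helper and the hand-written tags are assumptions of mine so that the snippet runs without spaCy installed, and this is not the pre-processing code from the repository.

```python
from collections import namedtuple

# Stand-in for a parsed token, using the same three attribute names
# that a spaCy Token exposes (text, pos_, tag_).
FakeToken = namedtuple("FakeToken", ["text", "pos_", "tag_"])

def format_tokens(tokens):
    """Return one 'text : POS++TAG' line per token."""
    return [f"{tok.text} : {tok.pos_}++{tok.tag_}" for tok in tokens]

# With spaCy installed, the tokens would come from something like:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")  # model choice is an assumption
#   tokens = nlp("Mary's book, of course.")
# Here the tags are written by hand for illustration.
tokens = [
    FakeToken("Mary", "PROPN", "NNP"),
    FakeToken("'s", "PART", "POS"),
    FakeToken(",", "PUNCT", ","),
]

for line in format_tokens(tokens):
    print(line)
```

Because the helper only reads attributes, the same function should work unchanged on a real spaCy `Doc`.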

The spaCy parser's data had an interesting effect on training. A training run with DRAGNN data reached a loss of 0.708 after just under 6 hours, then failed to go lower for the rest of its total run of over 16 hours. The spaCy parser achieved 0.686 after 5.5 hours, and its best loss of 0.650 after just under 8 hours.

Here are snippets from the relevant checkpoints.

Syntax Char RNN (spaCy parser; loss: 0.686; temperature: 0.8)

You beg your sudden affections and probability, that they could not 
be extended to herself, but his behaviour to her feelings were 
very happy as miss woodhouse--she was confusion to every power 
sentence, and it would be a sad trial of bannet, but even a great 
hours of her manners and her mother was not some partiality of 
malling to her from her something considering at mr. Knightley, 
she spoke for the room or her brother.

Syntax Char RNN (spaCy parser; loss: 0.650; temperature: 0.8)

In the room, when mrs. Elton could be too bad for friendship by it 
to their coming again. 
She was so much as she was striving into safety, and he knew 
that she had settled the moment which she said," i can not belong 
at hints, and where you have, as possible to your side, 
i never left you from it in devonshire; and if i am rendered 
as a woman," said emma," they are silent," said elizabeth," 
because he has been standing to darcy, and marianne?"" oh! 
No, no, well," said fitzwilliam, when the subject was for a week, 
that no time must satisfy her judgment.

Note: Lines have been wrapped, but formatting remains otherwise unaltered.

Both are quite readable, except for the injudicious use of quote marks (a problem that is most likely the result of redundant spaces picked up during pre-processing).

Even allowing for over-fitting, it is quite clear that in the case of Jane Austen's text the snippets generated by Syntax Char RNN are more readable and cohesive than those from naive Char RNN. Among the former, those produced via spaCy also show a marked improvement over the results produced via DRAGNN parsing.

Since the only significant difference between the Syntax Char RNN runs trained on Jane Austen texts was the data from the two different parsers, these findings suggest that the difference in parsing accuracy between DRAGNN and spaCy likely accounts for the difference in performance and readability between the runs. This in turn suggests that a lack of accurate parsing accounts for the poor results achieved with Tiny Shakespeare.


The code is available on GitHub. Comments and suggestions are welcome.

The majority of the work was in pre-processing (parse and prepare). For training and sampling I was able to build on the existing PyTorch Char RNN by Kyle Kastner (which in turn credits Kyle McDonald, Laurent Dinh and Sean Robertson). I altered the script interface, but the core process remains largely the same.

Concerns and Caveats

The approach and implementation aren't ideal. Below are some considerations.

  1. Pre-processing is complex. It has to make assumptions about the parser input, bring character and syntax encoding together, and try to remove data that can skew the weightings.
  2. Pre-processing can be slow. It can take anything from a few seconds to tens of minutes, depending on the size and complexity of the file.
  3. POS parsing is imperfect. The results suggest spaCy is doing a better job than DRAGNN, but even at its best it will have errors.
  4. Applicability is limited to text that is at least consistently parseable by an available parser. Poetry, archaic language, social media messages etc. likely fall outside this scope.
  5. Some POS units are known to cause skewing by introducing extra spaces. For example "Mary's" becomes:
's    : PART++POS
All punctuation is separated out as well, for example:
, : PUNCT++,
The effect is that these parts of speech remain tokenised when encoded, resulting in redundant spaces on one side of each token. Unless they are subsequently removed, spaces become overrepresented in the resulting encoding, affecting weightings for all character representations ever so slightly. The code manages to remove some of these redundant spaces - with occasional side effects - but not all.
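To make the redundant-space problem concrete, here is a minimal sketch of one way to re-attach tokens after parsing. It is not the clean-up code from the repository: the function name, the list of clitics and the punctuation set are assumptions chosen for illustration, and it deliberately ignores quote marks, which are harder to handle safely.

```python
import re

def strip_redundant_spaces(text):
    """Re-attach punctuation and clitics that tokenisation split off."""
    # Remove the space before closing punctuation: "well , then" -> "well, then".
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Remove the space before common English clitics: "Mary 's" -> "Mary's".
    text = re.sub(r"\s+('s|'d|'ll|'re|'ve|'m|n't)\b", r"\1", text)
    return text

print(strip_redundant_spaces("Mary 's book , of course ."))
# prints: Mary's book, of course.
```

A rule-based pass like this is cheap, but it can produce side effects of its own, for instance when a full stop belongs to an abbreviation or an ellipsis, so it only reduces the overrepresentation of spaces rather than removing it.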

To Do

Avenues to investigate, features to add.
  1. Reduce the number of redundant space encodings
  2. Calculate a more granular weighting for type
  3. Investigate further candidates for context encoding, over and above syntax
  4. Investigate more elegant ways of grouping encoding contexts
  5. Add validation loss for comparison to training loss
  6. Estimate parser accuracy for a specific text
  7. Run on GPUs!
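
For item 6, one simple starting point might be to hand-label a small sample of the text and measure how often the parser's tags agree with it. The sketch below assumes the two tag sequences are already aligned token for token; the function name and the example tags are hypothetical, and a real estimate would need a larger sample than this.

```python
def tag_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the hand-labelled tag."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must be aligned token for token")
    if not gold:
        raise ValueError("need at least one labelled token")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)

# Hypothetical tags for a five-token sentence: one disagreement.
gold = ["PRON", "VERB", "DET", "NOUN", "PUNCT"]
predicted = ["PRON", "VERB", "DET", "ADJ", "PUNCT"]
print(tag_accuracy(predicted, gold))  # 0.8
```

Running the same check on samples from Tiny Shakespeare and from Jane Austen would be one way to test whether parsing quality really differs between the two corpora.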


The findings suggest that in some cases where parsing is accurate and consistent, Syntax Char RNN trains faster and achieves better results than naive Char RNN. This lends support to the hypothesis that accurate contextual encodings, over and above syntactic Parts of Speech, can improve Char RNN's autoencoding.

While they come with several caveats, the findings nonetheless warrant further experimentation and clarification.