Bonsai Language Pt 1
What happens when you throw human communication into the gears of an adversarial ML loop?
Disclaimer: I’m not an academic, so take any concrete statements about the process here with a grain of salt.
I was astonished last year to discover that bonsai trees are genetically identical to regular trees. I had assumed that selective breeding was a factor in producing suitable dwarf trees for cultivation, but no, all bonsai trees are regular trees that have been carefully grown that way.
I thought of those strange, distorted trees recently as I contemplated the ongoing, and escalating, arms race around academic essays.
In order for academic essays to “work”, they have to be written by the student and assessed by the teacher, two humans. The student’s language is expected to conform to general and institution-specific standards of formal writing, and informally, clever students identify their marker’s personal language peccadilloes and make sure to adapt their work to appeal to this semi-conscious bias.
Of course, an essay can be a hefty effort, especially for the student who has not gained the understanding that the essay is supposed to demonstrate. Attempting to cheat by either plagiarising other texts or paying someone else to do the work is a time-honoured tradition. Some folks have done very well out of this black market for essays.
On the other side, marking an essay represents a smaller individual effort, but quickly becomes a hefty task when faced with several classes of such essays. The economic pressure on teachers in the modern world is always longer hours, more classes, fewer colleagues. Faced with hundreds of vaguely similar essays to mark out of hours, teachers without the luxury of TAs have been known to resort to loose heuristics of varying quality.
This is all before a computer enters the mix. In fact, the primary innovation in this process over the last twenty years has been the establishment of geo-arbitraged essay mills, where wealthy students in one country have poor but well-educated people in another country write essays for them.
Unwilling to throw away the essay format, the academic institutions have responded to an increased perception of student plagiarism and ghostwriting with the beginnings of computer-aided marking. The essay is reviewed against a large and growing corpus of known papers to identify wholesale copying, and compared with the essayist’s other writing to assess if the work is by the same author.
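To make the corpus-comparison idea concrete, here is a minimal sketch of one common approach: breaking texts into overlapping runs of words ("shingles") and measuring how many of the essay's shingles also appear in a known paper. I don't know what any real detection product actually does, so treat this as an illustration only. The function names, the three-word shingle size and the 0.5 threshold are all my own assumptions.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(essay, source, n=3):
    """Fraction of the essay's shingles that also appear in the source."""
    essay_shingles = shingles(essay, n)
    if not essay_shingles:
        return 0.0
    return len(essay_shingles & shingles(source, n)) / len(essay_shingles)


def flag_matches(essay, corpus, threshold=0.5, n=3):
    """Return (title, score) pairs at or above the threshold, highest first.

    `corpus` is a hypothetical dict mapping a paper's title to its text;
    the threshold is an arbitrary illustrative value, not a calibrated one.
    """
    scores = [(title, containment(essay, text, n)) for title, text in corpus.items()]
    return sorted((s for s in scores if s[1] >= threshold), key=lambda s: -s[1])
```

An essay copied wholesale from a paper in the corpus scores close to 1.0 against that paper, while unrelated writing scores close to 0.0. Checking whether the work matches the essayist's other writing is a different and much fuzzier problem, which this sketch doesn't attempt.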
Does this actually work? Well, probably it partly does. Maybe even enough to justify it. But like so many forensic techniques, the gap between its actual accuracy and where wishful thinking would like that accuracy to be is worth questioning.
Students (and supporting industries) didn’t stand still for this. Teachers started receiving essays with odd synonym use, words substituted without any apparent understanding of context, rendering paragraphs essentially meaningless. The students were feeding plagiarised text into Google Translate and rotating it through nearby languages to generate novelty. It’s a crude technique that tends not to fool human readers, but might get past a plagiarism detector (at least for a while).
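A toy illustration of why this can work against a naive detector: if the detector looks for exact runs of words shared with a known source, swapping even a few words breaks every run that touches them. The sketch below uses a tiny, hypothetical synonym table as a stand-in for round-trip translation (it calls no translation service), and the three-word overlap measure is my own assumption about how a simple detector might score copying.

```python
import re

# Hypothetical synonym table, standing in for round-trip machine translation.
SYNONYMS = {"quick": "rapid", "jumps": "leaps", "lazy": "idle"}


def words_of(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def trigram_overlap(candidate, source):
    """Fraction of the candidate's word trigrams that also occur in the source."""
    def trigrams(text):
        w = words_of(text)
        return {tuple(w[i:i + 3]) for i in range(len(w) - 2)}

    cand = trigrams(candidate)
    return len(cand & trigrams(source)) / len(cand) if cand else 0.0


def disguise(text):
    """Replace each word that has an entry in SYNONYMS, ignoring context."""
    return " ".join(SYNONYMS.get(w, w) for w in words_of(text))


source = "the quick brown fox jumps over the lazy dog"
disguised = disguise(source)

print(disguised)
print(trigram_overlap(source, source))     # verbatim copy
print(trigram_overlap(disguised, source))  # after substitution
```

Here three substitutions in a nine-word sentence are enough to drop the overlap from 1.0 to 0.0, even though a human reader would spot the copy immediately. It also shows where the odd essays come from: the substitution has no idea what the sentence means.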
I’ve run out of space for further words, so stay tuned for a Part 2, if I don’t get distracted!