It’s perhaps the ultimate goal: eliminate teacher workload by letting AI do all the marking. Even better, have the AI use all this data to design personalised follow-up work for each pupil, and to produce a detailed report for each teacher on what to teach next.
It’s also the goal we’re working towards at stylus — and we’re well on the way!
When we tell teachers about LearnCycle — our AI-marked, human-moderated feedback service — we’re often met with some scepticism. This is understandable: if you’ve ever experimented with getting AI to mark a piece of work, you’ll know how complex a process it can be.
The most sceptical of all are English teachers. In the world of English, nuance and complexity reign supreme and there are, famously, no right or wrong answers (though as a former English teacher myself, I’d like to see the latter notion consigned to the bin).
So how do you train AI to mark a piece of writing?
The structure of English
At its simplest, there are two forms of writing in an English lesson: writing critically, and writing creatively. The former means writing about what someone else wrote in the form of essays or reviews, and the latter means writing something of your own: stories, poems, speeches, letters, and anything else that is not structured around someone else’s work.
We’re interested in the latter today.
Broadly speaking, English teachers are concerned with two things when marking creative writing:
The technical accuracy — spelling, punctuation and grammar
The content — language and structural devices, and the pupil’s ability to choose these appropriately according to the task and audience
Point one above will generally remain the same across all writing (and all subjects), with the occasional deviation such as the decision to use slang when writing dialogue.
Point two is much more open to variation, and requires identification of both a purpose (persuading, instructing, informing) and a form (newspaper article, speech, brochure). Purpose is broadly concerned with the choice of language or linguistic device, while form includes structural devices along with elements of language.
There are therefore three areas in which we need to train an LLM:
Technical accuracy | Crafting by purpose | Crafting by form |
E.g. use of full stops, commas, semicolons | E.g. use of direct address and rhetorical questions in a piece of persuasive writing | E.g. opening a letter with ‘Dear…’ and closing with ‘Yours sincerely’ or ‘Yours faithfully’ |
Step one in training an LLM to mark writing therefore means teaching it the detail of each category.
Complexity at the top end
There will often be overlap across the three categories above: a rhetorical question, for instance, might be typical of several purposes and forms. As pupils develop a sense of individual voice in their writing, the lines between each area increasingly blur — they become more experimental, and at the top end can use language, structure and linguistic devices in surprising, and increasingly successful, ways.
Thus, we also need to train the LLM through examples, and build in scope for uncertainty. Most teachers will be familiar with the concept of comparative judgement — rather than scoring each piece individually, teachers instead compare only two and determine which is better. Over time, this produces a ranked list to which scores can be applied if needed.
A similar method can be used to help LLMs recognise ‘what a good one looks like’ and in turn, ‘why this one is better than that one’. Just as humans become better writers through reading and unpacking examples of good writing, LLMs become better markers when challenged, repeatedly, to apply what they already know in marking a text, and then justify their decisions.
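To make the comparative judgement idea concrete, here is a minimal sketch of how a series of ‘which is better?’ decisions can be turned into a ranked list. It is an illustration rather than a description of how LearnCycle works: the function name and the script labels are hypothetical, and it ranks by simple win rate, whereas comparative judgement tools often fit a statistical model to the comparisons instead.

```python
from collections import defaultdict


def rank_by_pairwise_wins(judgements):
    """Rank scripts from (winner, loser) pairs using a simple win rate."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in judgements:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Highest win rate first; ties are broken alphabetically for stable output.
    return sorted(appearances, key=lambda s: (-wins[s] / appearances[s], s))


# Hypothetical judgements: each pair records which of two scripts was judged better.
judgements = [
    ("script_A", "script_B"),
    ("script_A", "script_C"),
    ("script_B", "script_C"),
]
ranking = rank_by_pairwise_wins(judgements)
print(ranking)
```

The same structure applies whether the judge is a teacher or an LLM: each comparison is small and quick, and the ranking only emerges once enough of them have been collected.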
Zooming in: commas in focus
To explore the full process of training an LLM to mark writing would mean unpacking each of the three categories above in turn, and is perhaps the work of a novella rather than an article! Instead, let’s zoom in on one specific focus: commas.
Commas are, in my view, the most challenging piece of punctuation for pupils to get to grips with: they are highly versatile, and perhaps the most poorly defined piece of punctuation (just watch a grammar aficionado wince when a child declares that commas simply ‘create a pause’).
Since LLMs are only as good as the body of knowledge provided to them, meaningful feedback depends on defining clearly how we want the LLM to look for, and feed back on, commas: quality in, quality out.
We need to teach the LLM specific comma rules to look for: commas separating items in a list, for instance, or a single comma separating a subordinate clause from a main clause. Each of these needs multiple supporting examples.
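One way to picture ‘rules plus supporting examples’ is as a small data structure that gets turned into instructions for the model. The sketch below is an assumption about how this could be organised, not our actual prompt: the `CommaRule` class, the rule names and the prompt wording are all hypothetical, and a real rule set would carry many more examples per rule.

```python
from dataclasses import dataclass, field


@dataclass
class CommaRule:
    """One comma rule with the examples that illustrate it (hypothetical structure)."""
    name: str
    description: str
    examples: list = field(default_factory=list)


RULES = [
    CommaRule(
        "list",
        "Commas separate items in a list.",
        ["I bought flour, eggs, butter and milk."],
    ),
    CommaRule(
        "subordinate_clause",
        "A single comma separates a subordinate clause from a main clause.",
        [
            "As soon as I started cooking, I realised I was missing several ingredients.",
            "The moment I got home, I started cooking.",
        ],
    ),
]


def build_rule_prompt(rules):
    """Assemble the rules and their examples into plain-text instructions."""
    lines = ["Check the pupil's writing against these comma rules:"]
    for rule in rules:
        lines.append(f"- {rule.name}: {rule.description}")
        for example in rule.examples:
            lines.append(f"  Example: {example}")
    return "\n".join(lines)


prompt = build_rule_prompt(RULES)
print(prompt)
```

Keeping rules and examples together like this makes it easy to add a new rule, or more examples for an existing one, without rewriting the instructions by hand.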
We then need the LLM to be able to differentiate between the layers of complexity in the use of commas. For instance, in the two sentences below, which pupil might be said to be writing to a more advanced standard?
A: I bought flour, eggs, butter and milk.
B: As soon as I started cooking, I realised I was missing several ingredients.
With all else being equal, ‘B’ is probably stronger than ‘A’ because it is a complex sentence — yet it uses only one comma, compared to the two in ‘A’. Our LLM needs to understand that using commas to separate main and subordinate clauses is generally more skilful than using commas in a list. Simply counting the commas and awarding a mark for each one used correctly is not enough.
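The difference between counting commas and weighing up how they are used can be shown in a few lines. This is a sketch under stated assumptions: the rule labels are assigned by hand, and the weights are invented purely to illustrate that a clause comma counts for more than a list comma.

```python
# Hypothetical weights: a higher number means a more demanding use of the comma.
RULE_WEIGHTS = {"list": 1, "subordinate_clause": 2}


def comma_count_score(usages):
    """Naive approach: one mark for every correctly used comma."""
    return len(usages)


def weighted_rule_score(usages):
    """Credit each distinct rule once, according to how demanding it is."""
    return sum(RULE_WEIGHTS[rule] for rule in set(usages))


# Each entry is the rule behind one correctly used comma (labelled by hand).
sentence_a = ["list", "list"]        # "I bought flour, eggs, butter and milk."
sentence_b = ["subordinate_clause"]  # "As soon as I started cooking, I realised..."

print(comma_count_score(sentence_a), comma_count_score(sentence_b))
print(weighted_rule_score(sentence_a), weighted_rule_score(sentence_b))
```

Counting favours ‘A’, while the weighted version favours ‘B’, which matches the judgement a teacher would make.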
Then we challenge the LLM with a second example and ask it, again, which is better:
A: The shop, which was only a few metres down the road, was filled with options, and for a moment I forgot why I was there. I bought flour, eggs, butter and milk. It was only on my return that I remembered I was making soup.
B: As soon as I started cooking, I realised I was missing several ingredients. Quickly walking to the shop, I bought what I needed. The moment I got home, I started cooking.
This time, I expect you picked ‘A’. How would you explain this to an LLM, given that:
Both examples use commas correctly on more than one occasion
Both examples use commas accurately in complex sentences
For a start, we can add a new rule to help the LLM determine which is the better example: it is more skilful to apply commas in a range of ways than in a single way repeatedly. It is also more skilful to apply multiple comma rules within the same sentence than a single rule. We might also start to define by exclusion: not every sentence should use commas.
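These extra rules can be expressed as a simple ‘variety profile’ for each passage. The sketch below assumes the commas have already been labelled with the rule they follow; the labels are hand-assigned for illustration, and the function and field names are hypothetical.

```python
def comma_variety_profile(passage):
    """Summarise how varied the comma use is across a passage.

    `passage` is a list of sentences; each sentence is a list of rule labels,
    one label per correctly used comma.
    """
    distinct_rules = {rule for sentence in passage for rule in sentence}
    return {
        # Range of different rules used anywhere in the passage.
        "distinct_rules": len(distinct_rules),
        # Most rules combined within any single sentence.
        "max_rules_in_one_sentence": max(len(set(s)) for s in passage),
        # Sentences that (rightly) use no commas at all.
        "comma_free_sentences": sum(1 for s in passage if not s),
    }


# Hand-labelled versions of passages 'A' and 'B' (labels are illustrative).
passage_a = [
    ["embedded_clause", "embedded_clause", "coordinated_clauses"],
    ["list", "list"],
    [],
]
passage_b = [
    ["subordinate_clause"],
    ["subordinate_clause"],
    ["subordinate_clause"],
]

profile_a = comma_variety_profile(passage_a)
profile_b = comma_variety_profile(passage_b)
print(profile_a)
print(profile_b)
```

On these labels, ‘A’ comes out ahead on all three measures, even though every comma in ‘B’ is used correctly.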
Making a judgement
There is, of course, more going on than simply the commas to make ‘A’ the stronger example, for instance:
It uses a variety of sentence types
It creates a moment of humour through the contrast between the ingredients purchased and the recipe the narrator is trying to make
When the LLM makes its overall judgement, it will therefore need to draw together a series of ‘mini’ judgements from across a range of categories and rules, and in turn determine which is the highest priority for the pupil or teacher to work on next.
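As a rough illustration of drawing ‘mini’ judgements together, the sketch below picks the weakest area as the next priority. Everything here is an assumption made for the example: the scores are invented, the field names are hypothetical, treating the lowest score as the highest priority is only one possible policy, and the optional `focus` argument is simply one way to narrow feedback to a single category.

```python
def next_priority(mini_judgements, focus=None):
    """Return the lowest-scoring mini judgement, optionally within one category."""
    candidates = [
        j for j in mini_judgements
        if focus is None or j["category"] == focus
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda j: j["score"])


# Invented scores between 0 and 1, for illustration only.
mini_judgements = [
    {"category": "technical_accuracy", "rule": "commas_subordinate_clause", "score": 0.4},
    {"category": "technical_accuracy", "rule": "full_stops", "score": 0.9},
    {"category": "crafting_by_purpose", "rule": "rhetorical_questions", "score": 0.3},
    {"category": "crafting_by_form", "rule": "letter_opening_and_closing", "score": 0.8},
]

overall = next_priority(mini_judgements)
technical_only = next_priority(mini_judgements, focus="technical_accuracy")
print(overall["rule"], technical_only["rule"])
```

Here the overall priority and the technical-accuracy priority are different rules, which is why it matters to decide in advance what the feedback is for.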
But, if the assessment is made following a series of lessons on the use of commas, we might also narrow the parameters such that the feedback only focuses on commas. Thanks to the training earlier on, we can now expect a higher quality of feedback, which might include:
A specific comma rule for the pupil to try and apply in their redrafted work
A worked example, followed by practice sentences for the pupil to punctuate
A breakdown for the teacher of exactly which rules each pupil can successfully apply
Hyper-targeted teaching like this can turn a complex piece of punctuation into a series of achievable small steps which, when mastered, can radically impact a pupil’s outcomes in assessed work.
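The teacher-facing breakdown is essentially a regrouping of per-pupil results by rule. Here is a minimal sketch, assuming one observation per pupil per rule; the pupil names, rule labels and output keys are all hypothetical.

```python
from collections import defaultdict


def rule_breakdown(observations):
    """Group pupils by rule into those who apply it and those who need practice.

    `observations` is a list of (pupil, rule, applied_correctly) tuples.
    """
    secure = defaultdict(set)
    needs_practice = defaultdict(set)
    for pupil, rule, applied_correctly in observations:
        target = secure if applied_correctly else needs_practice
        target[rule].add(pupil)
    rules = sorted(set(secure) | set(needs_practice))
    return {
        rule: {
            "secure": sorted(secure[rule]),
            "needs_practice": sorted(needs_practice[rule]),
        }
        for rule in rules
    }


# Hypothetical observations for two pupils across two comma rules.
observations = [
    ("Pupil 1", "list", True),
    ("Pupil 1", "subordinate_clause", False),
    ("Pupil 2", "list", True),
    ("Pupil 2", "subordinate_clause", True),
]

breakdown = rule_breakdown(observations)
for rule, groups in breakdown.items():
    print(rule, groups)
```

Grouping by rule rather than by pupil shows at a glance which rule needs reteaching to the whole class and which only needs a short intervention for a few pupils.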
Embedding good habits
Isolating a single rule in this way means we don’t just have to use examples from English exercise books. The recent Ofsted subject report into English highlights the tendency for misconceptions and mistakes to go uncorrected — and given English lessons might only account for (at most) an hour per day of a pupil’s school experience, it is not solely down to English teachers to challenge this.
At stylus, any work written on our headed paper can be scanned in and returned to us for marking: a teacher might therefore decide to send in samples of a pupil’s work across every subject in order to identify the technical mistakes they are making most often. Tired pupils rushing to finish an exam will fall into bad habits, but spotting the errors they make in lessons when they aren’t thinking about them gives us the opportunity to fix them before they start to cost marks.
We are in the early stages of trialling our technical marking service for writing, suitable for pupils in years 5-8. We are seeking schools to share with us their approach to teaching writing, and potentially to trial our service with small groups of pupils. If you would be interested in taking part, please register interest here.