Despite ongoing debates about whether so-called large language models and other generative media tools are ‘proper’ AI (I’m sticking with the shorthand), my own approach to making sense of the ‘what’, ‘how’, ‘why’ and ‘to what end?’ is to use spare moments to read articles, listen to podcasts, watch videos, scroll through AI enthusiasts’ Twitter feeds and, above all, fiddle with various tools on my desktop or phone. When I find a tool or an approach that I think might be useful for colleagues with better things to do with their spare time, I jot notes in my sandpit, write something like this blog post comparing different tools, record a video or podcast like those collected here or, if prodded hard enough, try to cohere my tumbling thoughts in writing. The two videos I recorded last week are an effort to help non-experts like me to think, with examples, about what different tools can and can’t do and how we might find benefit amongst the uncertainty, ethical challenges, privacy questions and academic integrity anxieties.
The video summaries below were generated by GPT-4 from the video transcripts:
Can I use generative AI tools to summarise web content?
In this video, Martin Compton explores the limitations and potential inaccuracies of ChatGPT, Google Bard, and Microsoft Bing chat, particularly when it comes to summarizing external texts or web content. By testing these AI tools on an article he co-authored with Dr Rebecca Lindner, the speaker demonstrates that while ChatGPT and Google Bard may produce seemingly authoritative but false summaries, Microsoft Bing chat, which integrates GPT-4 with search functionality, can provide a more accurate summary. The speaker emphasizes the importance of understanding the limitations of these tools and communicating these limitations to students. Experimentation and keeping up to date with the latest AI tools can help educators better integrate them into their teaching and assessment practices, while also supporting students in developing AI literacy. (Transcript available via Media Central)
Using a marking rubric and ChatGPT to generate extended boilerplate (and tailored) feedback
In this video, Martin Compton explores the potential of ChatGPT, a large language model, as a labour-saving tool in higher education, particularly for generating boilerplate feedback on student assessments. Using the paid GPT-4 Plus version, the speaker demonstrates how to use a marking rubric for take-home papers to create personalized feedback for students. By pasting the rubric into ChatGPT and providing specific instructions, the AI generates tailored feedback that educators can then refine and customize further. The speaker emphasizes the importance of using this technology with care and ensuring that feedback remains personalized and relevant to each student’s work. This approach is already being used by some educators and is expected to improve over time. (Transcript available via Media Central)
I should say that in the time since I made the first video (four days ago) I have been shown a tool that connects ChatGPT to the web, and my initial fiddling there has re-dropped my jaw! More on that soon, I hope.
I have recently had a few interesting conversations about how our approaches to teaching and assessment in higher education might change post-Covid, and it seems apparent that ‘consensus’ is unlikely to be the defining word as we move forward. In several of those conversations I was talking about ‘ungrading’ and, judging by the more dismissive responses, I suspect that a fuller understanding of what ungrading could be might help challenge some of my interlocutors’ assumptions and pre-judgements. In this post I will start with a few provocations that I’d urge you to agree or disagree with before moving on. I will then offer a brief definition, followed by some examples from my own practice, then a rationale with links to other online articles (which deal with this topic more thoroughly and from a position of much greater expertise), before offering a rudimentary continuum of possibilities, all of which can broadly sit under ungrading as an umbrella term.
Yes or no?
It is possible for practised teachers/ lecturers to distinguish the quality of work to a precision of a few percentage points
Double marking will usually ensure fairness and reliability
For the purposes of student summative assessment, feedback is synonymous with evaluation
Grades (whether percentage scales or A-E) are useful for teachers and students
Individual teachers/ lecturers have little or no agency when it comes to making decisions about how to grade or whether to grade
If you said mostly ‘yes’ then you are likely to be harder to persuade but please read on! I’d very much like to hear reasoned objections to the arguments I try to pull together below. If you said mostly ‘no’ then I would like to hear about your ungrading activities, ideas or, indeed, ongoing reservations or obstacles.
As I mention above, ungrading is not a single approach but a broad range of possible alternative approaches and ways of seeing assessment and feedback. I posed the yes/ no statements above because the first prerequisite to trying an ungrading process is to hold (or be open to) a sentiment or value that questions the utility, effectiveness and ubiquity of grades on student work. At one end of the scale, ungrading means completely stopping the process of adding grades to student work. A less radical change might be to shift from graded systems to far fewer gradations, such as pass/ not yet passed (so-called ‘minimal grading’). A ‘dipping the toes’ approach might include more dialogue with students about their grades, self and peer assessment, or grade ‘concealment’ as part of a process to encourage deeper connection with the actual feedback. Wherever ungrading happens on this continuum, it doesn’t mean not collecting information about what students are doing. By eschewing grades and rigid (supposedly measurable) criteria we open opportunities for wider, qualitative, multi-voiced narratives about what has been achieved.
My toe dipping
In my previous role I was lucky enough to work on one of the only postgraduate (PG) programmes across the whole university that did not use percentage grades. Instead, all summative work was deemed pass or fail. This was because my students were my colleagues, studying for a PG Certificate in Teaching in HE. Grading colleagues was seen as problematic for all sorts of reasons, even discourteous. One senior colleague said it would open a can of worms: colleagues would question grades on all sorts of bases. This immediately raises several questions:
Why is grading discourteous to colleagues but not to ‘normal’ students?
How did their status change the degree to which we evaluated (labelled?) them?
If pass/ fail worked OK (it should be noted that the only student/ colleagues who expressed disappointment at having only pass/ fail were high fliers) and they achieved these qualifications, why wasn’t that happening on other qualifications?
Even if only appropriate with ‘professional’ students, why wasn’t it the default on, say, PG counselling programmes?
I pushed the ungrading a step further by de-coupling the previous ‘gatekeeping’ aspect of lesson observations from the graded assessment process (each had been deemed pass or fail to that point), by removing grading from formative work and by modifying the language used on first submission summatives to pass/ not yet passed.
I have also used audio and video feedback, sometimes coupled with grade discussions/ negotiations and at other times with embedding the grades within a multimedia response (harder to skim or ignore than text!). The biggest barriers in both instances were not the students but departmental and institutional pressures to conform to routine practice.
So why do it?
“When we consider the practically universal use in all educational institutions of a system of marks, whether numbers or letters, to indicate scholastic attainment of the pupils or students in these institutions, and when we remember how very great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment, we can but be astonished at the blind faith that has been felt in the reliability of the marking systems” (Finkelstein, 1913 – yes, 1913)
There are two ways of perceiving the above quote, I suppose: 1. The utility of grading has won through; over a hundred years on, grades are still ubiquitous, so surely that’s evidence enough that Finkelstein was mistaken. Or 2. Once we get stuck in our ways in education, it takes a monumental effort to change the fundamentals of our practices (cf. examinations and lectures).
Jesse Stommel reflects on the ubiquity and normalisation thus:
“Without much critical examination, teachers accept they have to grade, students accept they have to be graded, students are made to feel like they should care a great deal about grades, and teachers are told they shouldn’t spend much time thinking about the why, when, and whether of grades. Obedience to a system of crude ranking is crafted to feel altruistic, because it’s supposedly fair, saves time, and helps prepare students for the horrors of the “real world.” Conscientious objection is made to seem impossible.” (Stommel, 2018)
A century apart, both are objecting on one level to the claims (or assumptions) made in defence of grading: that grades provide accurate and fair measures; that there is no viable alternative; that they somehow prepare students for life after study. Although I admit I have not made a systematic review of the literature, it seems much easier to find compelling research suggesting that grading has all sorts of reliability problems. Hooking back to my own (dis)interest in judgemental observations on the PG Cert HE: Ofsted (the governmental body responsible for overseeing standards in schools in England), the epitome of graded judgements, was eventually persuaded that the judgements its inspectors made about lesson observations were neither valid nor reliable. If such a body has doubts about trained inspectors’ abilities to make fair judgements on a graded scale, it makes me wonder why similar discussions are not happening in that same body about teachers’ abilities to make fair, valid and reliable judgements of their students. One counter-argument I have read is that it’s the best system we have for allocating places and deciding who is most worthy of merit – hardly a glowing accolade. In addition to this:
“Grades can dampen existing intrinsic motivation, give rise to extrinsic motivation, enhance fear of failure, reduce interest, decrease enjoyment in class work, increase anxiety, hamper performance on follow-up tasks, stimulate avoidance of challenging tasks, and heighten competitiveness” (Schinske & Tanner, 2014)
And all this BEFORE we have even thought about implicit bias, the skewing of grading systems to favour elites and other prejudicial facets embedded in the assumptions that buttress them. Many esteemed experts in assessment and feedback are unequivocal in their concerns over grading and/ or the way grading is done. Chris Rust, for example, argues:
“much current practice in the use of marks and the arrival at degree classification decisions is not only unfair but is intellectually and morally indefensible, and statistically invalid” (Rust, 2007)
So, how might we do something with this? Without actually changing anything, a good starting point would be to develop and share a healthy scepticism about received wisdom and convention (of grading as well as many other seemingly immutable educational practices). If we read more of the research and feel compelled to act but constrained by the culture, the expectations of students, the demands of awarding bodies and so on, perhaps we could experiment with removing grades from one or two pieces of work. Alternatively, we might begin to change the ‘front and centre’ aspect of grades by, for example, concealing them within feedback or inviting students to determine or negotiate grades based on their feedback. Going further, we might involve students more in determining summative grades as well as assisting us in defining criteria for success at the outset. We may decide to shift to a minimal grading model, elect to grade only major summatives, offer a single grade across an entire module (or year?) or, going further still, use outcomes of peer review and student self-assessment to determine grades. We may invite students to grade themselves, with justifications (see Stommel example), or perhaps explore the possibilities of offering programmes that do not grade at all. As Stommel (2018) says:
“If you’re a teacher and you hate grading, stop doing it.”
Finkelstein, I. E. (1913). The Marking System in Theory and Practice. Baltimore: Warwick & York.
Rust, C. (2007). Towards a scholarship of assessment. Assessment & Evaluation in Higher Education, 32(2), 229-237.
Schinske, J., & Tanner, K. (2014). Teaching more by grading less (or differently). CBE—Life Sciences Education, 13(2), 159-166.
Stommel, J. (2018). How to Ungrade [blog post]. jessestommel.com.
Blum, S. D., & Kohn, A. (2020). Ungrading: Why Rating Students Undermines Learning (And What to Do Instead). West Virginia University Press.
Elbow, P. (1997). Grading student writing: Making it simpler, fairer, clearer. New directions for teaching and learning, 1997(69), 127-140.
Winstone, N., & Carless, D. (2019). Designing effective feedback processes in higher education: A learning-focused approach. Routledge. (Broader, contemporary issues around feedback and assessment design)