AI text generation: Should we get students back in exam halls?

There’s a lot of talk about in-person, invigilated, hand-written exams being the obvious solution to the assessment concerns being discussed across education in light of developments in what is popularly referred to as AI. Putting aside scalability issues for now, I have looked at some of the literature on the utility and impact of such exams so that we might remind ourselves that there is no such thing as a simple and obvious solution!

According to Williams and Wong (2009) in-person, closed-book exams are: 

an anachronism given the human capital needs of a knowledge economy, not just because of the absence of technology that is used routinely in everyday business and commerce, but because this type of examination instrument is incompatible with constructivist learning theory that facilitates deep learning (pp. 233-234). 

My own sense was that during the pandemic we were finally able to leverage circumstance, along with similar arguments, to effect change. We saw successful implementation of alternative assessments such as ‘capstones’, grade averaging and take-home exams as the examinations themselves were cancelled, modified or replaced. But since the great return to campus, we have witnessed a reinvigoration of enthusiasm for exams and the re-booking of exhibition centres and conference halls to host them, and we hear many academic colleagues doubling down on the exam as a panacea now that the capabilities of generative AI tools have caught the world’s attention.

Non-pedagogic reasons are often heard in support of the ‘traditional’ exam (imagine red bricks, sun shining through windows and squeaky invigilator shoes). These may invoke convention and tradition as well as pragmatic reasons such as identity confirmation and significant reductions in marking time where feedback is not required on examinations (Lawrence & Day, 2021). It has to be said that the widely held belief that examinations promote rigour is supported by some research (especially in medical education). For example, students spend more time preparing for traditional exams and attend to their studies more assiduously (Durning et al., 2016). Durning et al. also argue that medical students need to have the knowledge to hand and that the students who do well in these exams do better by their patients. Students' misunderstandings about the nature of open book exams, and their (over)confidence in being able to find answers in the sources available, lead to less preparation for open book exams and can lead some students to spend more time searching than producing (Johanns et al., 2017). In addition, closed-book, in-person exams are believed to reduce cheating in comparison to open book exams or other assessment types (Downes, 2017; D’Souza and Siegfeldt, 2017). Although exams are seen to favour high-achieving students (Simonite, 2003), it is interesting to note that high achievers are more likely to cheat in exams (Ottaway et al., 2017).

Online exams in particular are found to increase the likelihood of ‘cheating’ and lead to confusion about what is permitted and what constitutes collusion (Downes, 2017). However, whether cheating is less likely in closed book exams is contested (Williams, 2006). Of open book exams, where the pressure to memorize and the dependency on memorization are reduced, Williams and Wong (2009) argue:

“The opportunity for academically dishonest practice is less because of the way these examinations are structured, but so is the temptation to resort to this kind of behaviour in the first place” (p.230).

Whilst online exams are perceived by students to be more reliable and efficient than paper-based exams (Shraim, 2019; student sample n=342), both staff and students perceive cheating to be easier in online modes (Chirumamilla et al., 2020).

There are three dominant themes in the literature which focus on issues with traditional examinations: pedagogic, wellbeing and inclusivity. Closed exams tend to focus on recall and memorization at the expense of higher-order or critical thinking (Bengtsson, 2019). Significant proportions of students use memorization techniques and consequently can perceive exams as unfair when exam questions do not mirror problems or content they have practised (Clemmer et al., 2018). Open book exams de-emphasize memorization imperatives (Johanns et al., 2017). Open book or open web assessment, when well designed (e.g. problem based), is seen as more authentic, more applicable to real-world scenarios and more learner-directed, and as bridging learning with social context (Williams and Wong, 2009).

Exams put ‘unnatural pressure’ (Bengtsson, 2019, p.1) on students that affects performance. The common perception that stress is ‘good for students’ is undermined by studies that show impeded cognition and outcomes in stressed students (Rich, 2011). Students tend to prefer coursework, or coursework plus exams, to exams alone (Richardson, 2015; Turner and Briggs, 2018). A small study of student perceptions of the alternatives offered due to Covid-19 found that when traditional examinations were replaced with open-book, take-home examinations, the stresses reported were replaced by technical anxieties and a sense that the papers were much harder than traditional invigilated exams would have been (Tam, 2021). A study in New Zealand of ‘take home tests’, however, found that students performed better and saw benefits for learning and anxiety reduction (Hall, 2001).

A comparative study of undergraduate psychology students found the greatest student satisfaction and pass rates for students undertaking coursework, slightly lower satisfaction and pass rates for seen exams, and the lowest satisfaction and pass rates for unseen exams, which students saw as unfair, stressful and invalid due to the need to memorize (Turner and Briggs, 2018).

Although Richardson’s (2015) review found that studies offer contradictory findings in terms of ethnicity and performance in exams and coursework, all ethnicities tend to do better in terms of grade profile with coursework. However, markers are idiosyncratic and privilege ‘good’ language and expression (Brown, 2010), and this contributes to higher degree outcomes for students whose first language is English than for those for whom English is a second language (Smith, 2011). Coursework increases the consistency of marks across types of assessment, improves mean performance in terms of final degree outcomes and counter-balances the disproportionate disadvantage of exams faced by students whose mean scores are low (Simonite, 2003).

It goes without saying that there is no ‘one size fits all’ solution, but we do need to think carefully, in light of research, about the consequences of the decisions we make now about how we manage assessment in the future. It would be foolish to respond with a knee-jerk reaction, though. Because the wheels of change move so slowly in universities, a shift back to exams may appear to offer the path of least resistance. Instead, our first consideration must be modifications and innovations that address the issues but are also positive in their own right. We need to consider the possibilities of more programmatic assessment, for example, or perhaps learn from ‘OSCE’ assessments in medical education, where knowledge and communication are assessed in simulated settings, or even look further to other higher education cultures where oral assessments are already the default. To achieve this level of change we need to recognise that AI is a catalyst for changes that many have long been advocating (from a research-based position), but which have often achieved only limited success where the resource for change has not accompanied that advocacy.

References 

Bengtsson, L. (2019). Take-home exams in higher education: a systematic review. Education Sciences, 9(4), 267. 

Brown, G. (2010). The validity of examination essays in higher education: Issues and responses. Higher Education Quarterly, 64, 276-291. doi:10.1111/j.1468-2273.2010.00460.x

Chirumamilla, A., Sindre, G., & Nguyen-Duc, A. (2020). Cheating in e-exams and paper exams: the perceptions of engineering students and teachers in Norway. Assessment & Evaluation in Higher Education, 45(7), 940-957. 

Clemmer, R., Gordon, K., & Vale, J. (2018). Will that be on the exam? Student perceptions of memorization and success in engineering. Proceedings of the Canadian Engineering Education Association (CEEA).

Downes, M. (2017). University scandal, reputation and governance. International Journal for Educational Integrity, 13(1), 1-20. 

D’Souza, K. A., & Siegfeldt, D. V. (2017). A conceptual framework for detecting cheating in online and take‐home exams. Decision Sciences Journal of Innovative Education, 15(4), 370-391. 

Durning, S. J., Dong, T., Ratcliffe, T., Schuwirth, L., Artino, A. R., Boulet, J. R., & Eva, K. (2016). Comparing open-book and closed-book examinations: a systematic review. Academic Medicine, 91(4), 583-599. 

Hall, L. (2001). Take-Home Tests: Educational Fast Food for the New Millennium? Journal of the Australian and New Zealand Academy of Management, 7(2), 50-57. doi:10.5172/jmo.2001.7.2.50 

Johanns, B., Dinkens, A., & Moore, J. (2017). A systematic review comparing open-book and closed-book examinations: Evaluating effects on development of critical thinking skills. Nurse Education in Practice, 27, 89-94. https://doi.org/10.1016/j.nepr.2017.08.018

Lawrence, J., & Day, K. (2021). How do we navigate the brave new world of online exams? Times Higher Education. Available: https://www.timeshighereducation.com/opinion/how-do-we-navigate-brave-new-world-online-exams [accessed 17/6/21]

Ottaway, K., Murrant, C., & Ritchie, K. (2017). Cheating after the test: who does it and how often? Advances in Physiology Education, 41(3), 368-374.

Rich, J. D. (2011). An experimental study of differences in study habits and long-term retention rates between take-home and in-class examinations. International Journal of University Teaching and Faculty Development, 2(2), 121. 

Richardson, J. T. (2015). Coursework versus examinations in end-of-module assessment: a literature review. Assessment & Evaluation in Higher Education, 40(3), 439-455. 

Shraim, K. (2019). Online examination practices in higher education institutions: learners’ perspectives. Turkish Online Journal of Distance Education, 20(4), 185-196. 

Simonite, V. (2003). The impact of coursework on degree classifications and the performance of individual students. Assessment & Evaluation in Higher Education, 28(5), 459-470. 

Smith, C. (2011). Examinations and the ESL student–more evidence of particular disadvantages. Assessment & Evaluation in Higher Education, 36(1), 13-25. 

Tam, A. C. F. (2021). Students’ perceptions of and learning practices in online timed take-home examinations during Covid-19. Assessment & Evaluation in Higher Education, 1-16. 

Turner, J., & Briggs, G. (2018). To see or not to see? Comparing the effectiveness of examinations and end of module assessments in online distance learning. Assessment & Evaluation in Higher Education, 43(7), 1048-1060. 

Williams, J. B., & Wong, A. (2009). The efficacy of final examinations: A comparative study of closed‐book, invigilated exams and open‐book, open‐web exams. British Journal of Educational Technology, 40(2), 227-236. 

Williams, J. B. (2006). The place of the closed book, invigilated final examination in a knowledge economy. Educational Media International, 43(2), 107-119.