Photo: Sara Kerklaan

Pepijn Stoop | Will ChatGPT soon be making our exams?

Pepijn Stoop
12 January 2023 - 10:22

Just before the New Year, the cry “Help! AI is going to replace us!” rang out: the linguistic AI tool ChatGPT evoked enthusiasm, but also concern. As an artificial intelligence student, I was immediately sold on ChatGPT's potential.

What exactly is ChatGPT? It's a model that can understand language in context and respond accordingly. The model has been trained on billions of pieces of online text: from forum responses to articles. This makes it a master at analyzing the context of a question like “What should I eat for dinner tonight?” as well as “Explain to me what a black hole is.” To the latter question, the model responds with an answer as a student might write it: with a strong structure, multiple paragraphs, and a clear conclusion.


Okay, sounds great, but won't students use this en masse to do their homework? And can I, in my role as teaching assistant, detect it? Let's put it to the test. In September, I was a teaching assistant for the course Introduction to Artificial Intelligence, in which AI students learn the basics of the field and have to answer fundamental questions. As I mentioned earlier, ChatGPT is very good at answering exactly such questions.

 

A recent exam question from this course was: describe Lady Lovelace's objection (Turing, 1950) in your own words. AI pioneer Alan Turing cited Lovelace's opinion in his article: she believed that machines could not be intelligent because they only do what they are instructed to do and could never generate new ideas. The answer model says the following:

 

  • Lady Lovelace's objection is that a machine can only do that which is commanded (20 points) 
  • Can a deterministic machine be used for a conditioned reflex, or in other words, can a machine learn? (10 points) 
  • A Turing machine depends on input and the program. Can the injection of an idea lead to an explosion of new ideas (supercritical)? (10 points) 
  • A variant of Lady Lovelace's objection is that a machine can never do anything new (5 points) 
  • A better variant is that a machine can never surprise us (5 points) 

 

Could the program answer the exam question correctly?

 

ChatGPT gave this as its answer: 

 

Ada Lovelace, also known as Lady Lovelace, was a mathematician and writer who is known for her work on Charles Babbage’s proposed mechanical general-purpose computer, the Analytical Engine. In a footnote to an article by Luigi Menabrea about the Analytical Engine, Lovelace wrote a series of comments that are now known as “Notes on the Analytical Engine.” These notes contain what is considered to be the first published algorithm intended to be processed by a machine. 

 

In these notes, Lovelace expressed an objection to the idea that a machine could be capable of creating original and creative works, as proposed by Alan Turing in his 1950 paper Computing Machinery and Intelligence. Lovelace argued that a machine could only perform tasks that it had been specifically programmed to do and could not produce anything original or creative on its own. She wrote: “The analytical engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.”

 

Lovelace’s objection has been influential in shaping the debate about the capabilities of artificial intelligence and the potential for machines to exhibit creative or original thought. 

 

Wow! ChatGPT managed to find Turing's article, extract Lovelace's opinion, and write it down in understandable language. Putting it next to the answer model, we see that this answer satisfies item 1 (“Lovelace argued ...”, 20 points) and item 4 (“... anything original ...”, 5 points). Since 30 points yields a 10 for this question, the program's 25 points amount to an 8. I also ran the answer through the best-known GPT detector, which claims to predict whether a text comes from ChatGPT. It failed: according to the detector, the text was 85 percent authentic.

 

Still, the answer feels artificial. The average student would not be able to reproduce the first paragraph about Lovelace's work, because it mentions terms not covered in the course. Can this answer be called “in your own words”? Or is it plagiarism? The course's professor, Dr. Arnoud Visser, believes the latter: “This answer is fraudulent because GPT can be considered an entity, almost a personality. Entities should be given credit when you quote them.”

 

As a teaching assistant, I find that ChatGPT does a good job of answering questions about course material, but the whole thing still feels somewhat artificial. So, dear students I will be teaching in 2023: my advice is still to do your homework yourself, for the most part.

 

Pepijn Stoop is an artificial intelligence student at the UvA.