Large Language Models often work very well and can do everything from fixing bugs to coming up with exam questions. But that has enormous consequences for education, Han van der Maas observes. “What should the final attainment levels of our education be now? What should a student still know about statistics and mathematics?”
Large Language Models (LLMs), a type of generative AI trained on enormous datasets, cannot count, give fake references and even advise suicide. To train LLMs, dubious American and Chinese tech companies use enormous amounts of electricity and data that have been obtained improperly. By using LLMs, we are also giving away all kinds of data and regularly violating the GDPR. And yet I use them all day long: AI is the new smoking.
Kitchen cabinets
LLMs often work very well. They improve texts, find new literature, explain mathematics, look up the dimensions of kitchen cabinets, come up with exam questions, fix bugs in code, and so on. Not flawlessly, of course, but so well that I can hardly imagine how I could do without them. I never thought a statistical model could be so useful, and to be honest, I still don’t quite understand how that’s possible (my first scientific article was about neural networks of four nodes).
I am now once again amazed by the development of autonomous programming assistants, or agentic coding. I am not a pioneer in this field, and for some, this column contains only old news. But for many, that is not the case. In addition to the early adopters and avid users, and of course the opponents of AI in research and education, I see many colleagues who use it sparingly and do not have a good understanding of the latest developments. And as a student, teacher and researcher, you need to be aware of those developments.
Autonomous
The development of autonomous programming assistants is a major step forward. Anyone can now develop software in a short time and for little money. Tools such as OpenAI’s Codex, Anthropic’s Claude models and GitHub Copilot can develop complete software packages based on a plan written in natural language. I tested this by developing a website without looking at the code even once. It took five minutes to install the necessary tools.
My plan was for a game that people could play against each other online. I described that game, StrategoChess, a combination of Stratego and chess, in a 415-word document and placed it in the directory where these tools could do their thing. Codex first asked me for some clarification (do you also want an offline mode, should there be a chess clock, and so on) and then had a first working version ready within minutes. I then asked for all kinds of improvements, such as a nicer board, sounds and a better layout, and after an hour or two, the project was more or less finished. The interaction with these tools is particularly efficient. What is particularly striking is how well they keep an eye on the intention of the project and come up with useful suggestions themselves. “Yes, go ahead,” I say. For a good web programmer, such a project would probably take weeks of work. The result is StrategoChess, which according to its creator is a really great game. Play with me!
Explosion
And this is just a game. All around me, I see an explosion of serious applications in which the development of existing code is greatly accelerated and teams of specialised agents work on large projects. The next step is towards autonomous agent systems, using OpenClaw, for example. OpenClaw no longer waits for commands, but largely does its own thing. I am waiting to see how well that works in practice.
What does this mean for our research? Anyone can now develop software, such as tools for collecting data. Data analysis can also be performed by agents. In a dialogue about a dataset, they propose analyses, carry them out and, if desired, present the results in excellent graphs. Much has already been said about LLM co-writing; in any case, it saves me a lot of time. One effect of this is that scientific journals now have to process many more articles, which is a problem because the willingness to review does not increase proportionally. We can only hope that this development will ultimately translate into better, rather than simply more, scientific articles. This is clearly a major challenge for the research community.
With the pen
This also has enormous consequences for our education. What should the final attainment levels of our education be now? What should a student still know about statistics and mathematics? To what extent should you still be able to program or write independently without such tools, with the pen, so to speak? Until recently, I insisted that students needed a decent basic level of programming, but now I am beginning to have doubts. I also thought that an annual writing assignment in a room without aids was a good idea. Now I’m not so sure.
Han van der Maas is full professor of Psychological Methods.