AI-Generated Slides: Are They Good? Can Students Tell?
Juho Leinonen, Lisa Zhang, Arto Hellas
Problem
This preprint examines how well generative AI (GenAI) tools can produce instructional slides from instructor-authored course notes. Interest in using GenAI for educational purposes is growing, but empirical evaluations of the quality of AI-generated materials, and of how students receive them, remain scarce. The authors assess both the quality of slides produced by several GenAI tools and how students perceive these slides compared with slides created by instructors.
Method
The study compares five tools: an end-to-end educational tool (NotebookLM), two general-purpose large language models (Claude and M365 Copilot), and two coding assistants (Cursor and Claude Code). The authors conduct a narrative assessment of the slides each tool generates, judging them on accuracy, completeness, and pedagogical soundness. Selected slides are then modified and used in a real course, where surveys measure how students rate AI-generated slides against instructor-created ones. The analysis thus combines qualitative assessments from educators with quantitative ratings from students.
Results
Slides generated by the coding assistants (Cursor and Claude Code) were rated highest for accuracy, completeness, and pedagogical soundness. Students rated GenAI-generated slides comparably to instructor-created slides and showed no significant ability to tell the two apart. The study also reports a negative correlation between quality ratings and the identification of slides as AI-generated: slides that students rated highly were less likely to be labeled as AI-generated, suggesting that students may associate lower quality with AI sources. Taken together, these results suggest that GenAI tools can produce educational materials that students perceive as comparable in quality to instructor-created ones.
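To make the reported negative correlation concrete, the following is a minimal sketch of how such a relationship could be computed. The summary does not state which statistic the authors used, so the choice of Pearson correlation on a binary "guessed AI" flag (equivalent to a point-biserial correlation) is an assumption. The ratings and guesses below are hypothetical toy values for illustration only, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy responses (NOT from the study): one entry per slide rating.
# quality: a 1-5 rating; guessed_ai: 1 if the student labeled the slide as
# AI-generated, 0 if they labeled it as instructor-created.
quality = [5, 4, 4, 3, 2, 2, 1, 5]
guessed_ai = [0, 0, 1, 0, 1, 1, 1, 0]

r = pearson(quality, guessed_ai)
print(f"correlation between quality rating and 'guessed AI': {r:.2f}")
```

In this toy data the coefficient comes out negative, which mirrors the direction of the reported effect: higher-rated slides are less often labeled as AI-generated. It says nothing about the size of the effect in the actual study.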
Limitations
The authors acknowledge several limitations, including the potential bias in narrative assessments from educators who may have varying levels of familiarity with GenAI tools. The study’s reliance on a single course context may limit the generalizability of the findings across different educational settings and disciplines. Additionally, the survey methodology may not capture the full spectrum of student perceptions, particularly regarding long-term engagement with AI-generated materials. The authors also note the need for further research to explore the implications of integrating GenAI into instructional design and the ethical considerations surrounding its use.
Why it matters
This research has significant implications for the integration of GenAI in educational workflows, suggesting that such tools can produce high-quality instructional materials that are well-received by students. The findings encourage educators to explore the responsible and effective use of GenAI in their teaching practices, potentially transforming instructional design processes. Furthermore, the study raises important questions about student perceptions of AI-generated content, which could inform future research on the acceptance and trustworthiness of AI in educational contexts.
Source: arXiv:2605.13532
URL: https://arxiv.org/abs/2605.13532v1
By Turing Wire editorial staff · May 13, 2026
Source: arXiv cs.CL