A task-oriented review of multi-turn interaction with LLMs, produced at Carnegie Mellon University. The first draft was completed in April 2025 and extended, under a transparent PRISMA-ScR-inspired review protocol, to cover work available through April 2026.
Heinz College, Carnegie Mellon University.
yubol@andrew.cmu.edu
Carnegie Mellon University.
xiaobins@andrew.cmu.edu
Carnegie Mellon University.
yidim@andrew.cmu.edu
Carnegie Mellon University.
xding2@andrew.cmu.edu
Carnegie Mellon University.
xinyuyao@andrew.cmu.edu
Carnegie Mellon University.
rk2x@andrew.cmu.edu
Heinz College, Carnegie Mellon University.
rpadman@andrew.cmu.edu
† Co-second authors (equal contribution). ‡ Co-third authors (equal contribution).
The core scope is settings in which an LLM participates in sequential text-based interaction and is evaluated on its ability to maintain context, adapt across turns, and achieve task success along a dialogue trajectory. We organize this literature into two task families (instruction following and conversational engagement) spanning six high-impact domains (math, coding, healthcare, education, role-play, jailbreak).
We treat LLM-based agents as adjacent rather than central. Agentic systems often extend multi-turn interaction with tool use, explicit planning, environment manipulation, or multi-agent coordination over richer action spaces. We review them when they directly inform advances in multi-turn interaction, but we do not survey the agent literature exhaustively.
We exclude multimodal LLMs (MLLMs) from the main scope. Multimodal systems involve substantially different observation spaces, task formulations, and evaluation protocols. Limiting to non-multimodal settings keeps the scope coherent.
Borderline works are included when they directly inform multi-turn interaction. When a paper's primary contribution lies in agent planning, environment control, or multimodal perception, we treat it as adjacent rather than core.
The original manuscript was largely completed in April 2025. In the current revision, we retrospectively document the corpus-construction process used for that version and extend the survey to papers available through April 2026, applying the same inclusion, exclusion, and boundary-setting rules.
Rather than claiming a fully prospective systematic review, we adopt a PRISMA-ScR-inspired transparency protocol: the paper's Appendix (Review Methodology) reports the search sources, keyword families, screening stages, inclusion and exclusion criteria, and the logic by which papers were assigned to the final corpus. This improves methodological transparency while remaining consistent with the task-oriented narrative synthesis the survey provides.
If this survey or its companion website has helped your research, please cite us:
@article{li2025beyond,
  title   = {Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models},
  author  = {Li, Yubo and Shen, Xiaobin and Yao, Xinyu and Ding, Xueying and
             Miao, Yidi and Krishnan, Ramayya and Padman, Rema},
  journal = {arXiv preprint arXiv:2504.04717},
  year    = {2025}
}
We thank collaborators and colleagues at Carnegie Mellon University for feedback throughout the drafting of this survey. Bibliographic work, benchmark auditing, and cross-cutting synthesis were aided by careful reviewer critique on the early version. Any remaining errors or omissions are our own.
This survey is itself subject to four biases, scoped to the review process (system-level ethics are treated separately in the Challenges section).
We frame this survey as a transparent task-oriented synthesis rather than an exhaustive canon, and hope it will serve as a starting point for future extensions into multimodal, embodied, and fully agentic multi-turn systems.