A task-oriented review of multi-turn interaction with LLMs, produced at Carnegie Mellon University. The first draft was completed in April 2025 and extended, under a transparent PRISMA-ScR-inspired review protocol, to cover work available through April 2026.
Heinz College, Carnegie Mellon University.
yubol@andrew.cmu.edu
Carnegie Mellon University.
xiaobins@andrew.cmu.edu
Carnegie Mellon University.
yidim@andrew.cmu.edu
Carnegie Mellon University.
xding2@andrew.cmu.edu
Carnegie Mellon University.
xinyuyao@andrew.cmu.edu
Carnegie Mellon University.
rk2x@andrew.cmu.edu
Heinz College, Carnegie Mellon University.
rpadman@andrew.cmu.edu
† Co-second authors (equal contribution). ‡ Co-third authors (equal contribution).
The core scope is settings in which an LLM participates in sequential text-based interaction and is evaluated on its ability to maintain context, adapt across turns, and achieve task success along a dialogue trajectory. We organize this literature into two task families (instruction following and conversational engagement) spanning six high-impact domains (math, coding, healthcare, education, role-play, jailbreak).
We treat LLM-based agents as adjacent rather than central. Agentic systems often extend multi-turn interaction with tool use, explicit planning, environment manipulation, or multi-agent coordination over richer action spaces. We review them when they directly inform advances in multi-turn interaction, but we do not survey the agent literature exhaustively.
We exclude multimodal LLMs (MLLMs) from the main scope. Multimodal systems involve substantially different observation spaces, task formulations, and evaluation protocols. Limiting to non-multimodal settings keeps the scope coherent.
Borderline works are included when they directly inform multi-turn interaction. When a paper's primary contribution lies in agent planning, environment control, or multimodal perception, we treat it as adjacent rather than core.
The original manuscript was largely completed in April 2025. In the current revision, we retrospectively document the corpus-construction process used for that version and extend the survey to papers available through April 2026, applying the same inclusion, exclusion, and boundary-setting rules.
Rather than claiming a fully prospective systematic review, we adopt a PRISMA-ScR-inspired transparency protocol: the paper's Appendix (Review Methodology) reports the search sources, keyword families, screening stages, inclusion and exclusion criteria, and the logic by which papers were assigned to the final corpus. This improves methodological transparency while remaining consistent with the task-oriented narrative synthesis the survey provides.
If this survey or its companion website has helped your research, please cite us:
@article{li2025beyond,
  title   = {Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models},
  author  = {Li, Yubo and Shen, Xiaobin and Yao, Xinyu and Ding, Xueying and
             Miao, Yidi and Krishnan, Ramayya and Padman, Rema},
  journal = {arXiv preprint arXiv:2504.04717},
  year    = {2025}
}
We thank collaborators and colleagues at Carnegie Mellon University for feedback throughout the drafting of this survey. Bibliographic work, benchmark auditing, and cross-cutting synthesis were aided by careful reviewer critique on the early version. Any remaining errors or omissions are our own.
This survey is itself subject to four biases, scoped to the review process (system-level ethics are treated separately in the Challenges section).
We frame this survey as a transparent task-oriented synthesis rather than an exhaustive canon, and hope it will serve as a starting point for future extensions into multimodal, embodied, and fully agentic multi-turn systems.