Paper · High credibility · AAAI 2024 · Li et al. · January 1, 2024
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation Using the StepGame Benchmark
Our summary
Evaluates LLMs on the StepGame spatial-reasoning benchmark. The models map language to spatial relations reasonably well but degrade on multi-hop spatial inference. The paper proposes prompting and neuro-symbolic enhancements.
Why it matters
Traces spatial and geometric reasoning failures to multi-hop composition over relations, a concrete and reproducible weak spot.
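To make "multi-hop composition over relations" concrete, here is a minimal Python sketch of what a symbolic solver does when it chains pairwise spatial facts. It is an illustration only, not the paper's method or the benchmark's code: the relation vocabulary, the unit-offset encoding, and the function name `infer_relation` are all assumptions made for this example.

```python
from collections import deque

# Illustrative relation vocabulary (an assumption, not the benchmark's exact labels).
# "A is <rel> of B" is encoded as the (dx, dy) offset of A relative to B.
OFFSETS = {
    "left": (-1, 0), "right": (1, 0),
    "above": (0, 1), "below": (0, -1),
    "upper-left": (-1, 1), "upper-right": (1, 1),
    "lower-left": (-1, -1), "lower-right": (1, -1),
    "overlap": (0, 0),
}
NAMES = {offset: name for name, offset in OFFSETS.items()}


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def infer_relation(facts, source, target):
    """Compose pairwise facts to get the relation of `source` relative to `target`.

    `facts` is a list of (a, rel, b) triples meaning "a is rel of b".
    Returns a relation name, or None if the two entities are not connected.
    """
    # Build an undirected graph; each edge stores pos(node) - pos(neighbor).
    graph = {}
    for a, rel, b in facts:
        dx, dy = OFFSETS[rel]
        graph.setdefault(a, []).append((b, dx, dy))
        graph.setdefault(b, []).append((a, -dx, -dy))

    # Breadth-first search, accumulating pos(source) - pos(current node).
    queue = deque([(source, 0, 0)])
    seen = {source}
    while queue:
        node, ax, ay = queue.popleft()
        if node == target:
            # Only the direction of the accumulated offset matters for the label.
            return NAMES[(sign(ax), sign(ay))]
        for neighbor, dx, dy in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, ax + dx, ay + dy))
    return None


# Three hops: A is left of B, B is above C, C is right of D.
facts = [("A", "left", "B"), ("B", "above", "C"), ("C", "right", "D")]
print(infer_relation(facts, "A", "D"))
```

Each added fact lengthens the chain of offsets that must be tracked, which is the kind of composition the summary above identifies as the weak spot. The sketch assumes the facts are mutually consistent and uses the first path found between the two entities.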
Published June 26, 2026
Cite this
Qlarify Labs. (2026). Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation Using the StepGame Benchmark. Retrieved from https://labs.qlarify.fi/references/spatial-reasoning-stepgame-2024