News & Library
Reference library
The best external writing on AI testing, limitations and quality — curated, summarized, and rated. We link out to the source; the value-add is our summary and the findings each piece connects to.
2 references
- Paper · High credibility · arXiv
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Measures how requiring strict output formats (JSON/XML/schema) affects answer quality, finding that tight format constraints can degrade reasoning performance versus free-form responses.
🐛 1 linked finding · Tool use · Evals
- Paper · High credibility · arXiv
Gorilla: Large Language Model Connected with Massive APIs
Connects an LLM to large API collections and documents the tendency to hallucinate API calls and arguments when prompted directly; retrieval-aware training reduces but does not eliminate the fabrication.
🐛 1 linked finding · Hallucination · Tool use · Agents