Papers (12)
- Do Transformers Need Three Projections? (sharing the QKV projections to halve the KV cache)
- Hierarchical Reasoning Model (a brain-inspired architecture for hierarchical latent reasoning)
- Hallucinations Undermine Trust; Metacognition is a Way Forward (redefining hallucination in terms of faithful uncertainty)
- Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
- TurboQuant (online vector quantization that approaches the information-theoretic optimum)
- Prompt Repetition Improves Non-Reasoning LLMs
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
- REFRAG: Rethinking RAG based Decoding
- Do As We Do, Not As You Think: The Conformity of Large Language Models
- Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
- How we built our multi-agent research system
- Improving Factuality and Reasoning in Language Models through Multiagent Debate