attention 5

Do Transformers Need Three Projections? Halving the KV Cache by Sharing QKV Projections Jun 11, 2026
Prompt Repetition Improves Non-Reasoning LLMs Mar 8, 2026
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Mar 8, 2026
Stanford CME295: Lecture 1 - Transformer Basics Mar 8, 2026
Stanford CME295: Lecture 0 - Transformer Overview Mar 8, 2026