NeurIPS 1 Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Mar 8, 2026