Vision, RAG, and Attention Scaling
🔥 What's hot right now
MTRAG-UN is a new benchmark designed specifically for multi-turn RAG conversations, highlighting persistent failures on UNanswerable queries. It's a practical resource for anyone hardening retrieval pipelines against underspecified inputs. Meanwhile, the Modality Collapse paper identifies a fundamental bottleneck: text-only decoders in multimodal models fail to extract non-textual information such as voice characteristics or visual texture.
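As a rough illustration of the kind of hardening MTRAG-UN probes, here is a hypothetical abstention gate for a RAG pipeline. The `retriever` and `llm` interfaces and the `min_score` threshold are illustrative placeholders, not anything specified by the benchmark:

```python
# Hypothetical abstention gate (names, API, and threshold are
# illustrative, not taken from MTRAG-UN). If every retrieved passage
# scores poorly, the system declines rather than answering an
# underspecified or unanswerable query.
def answer_or_abstain(query, retriever, llm, min_score=0.35):
    hits = retriever.search(query, k=5)  # assumed API: [(text, score), ...]
    if not hits or max(score for _, score in hits) < min_score:
        return "I can't answer that from the indexed documents."
    context = "\n\n".join(text for text, _ in hits)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```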
🚀 Just shipped
The Discourse-Aware Dual-Track Streaming Response (DDTSR) framework cuts cascaded-system latency by up to 51% by overlapping the ASR, LLM, and TTS stages. The overlap enables "listen-while-thinking" and "speak-while-thinking" behavior for real-time voice AI applications.
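To make the overlap concrete, here is a toy asyncio sketch of a cascaded pipeline where each stage starts consuming before its upstream stage finishes. This illustrates the general pipelining idea, not the DDTSR implementation; stage names and timings are invented:

```python
import asyncio

# Toy overlap of ASR -> LLM -> TTS via queues (illustrative, not DDTSR).
# Downstream stages start before upstream ones finish, which is the
# essence of "listen-while-thinking" and "speak-while-thinking".
async def asr(words, out_q):
    for w in words:                   # simulate streaming partial transcripts
        await asyncio.sleep(0.10)
        await out_q.put(w)
    await out_q.put(None)             # end-of-stream sentinel

async def llm(in_q, out_q):
    while (w := await in_q.get()) is not None:
        await asyncio.sleep(0.05)     # "thinking" overlaps ongoing ASR
        await out_q.put(w.upper())    # emit response tokens incrementally
    await out_q.put(None)

async def tts(in_q):
    while (tok := await in_q.get()) is not None:
        print(f"speaking: {tok}")     # speech begins before the LLM is done

async def main():
    q1, q2 = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(asr("hello streaming world".split(), q1),
                         llm(q1, q2), tts(q2))

asyncio.run(main())
```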
🛠 Useful for your stack
Affine-Scaled Attention modifies standard Transformer attention with input-dependent scaling and bias, relaxing the strict unit-sum constraint that softmax imposes on attention weights. The authors report improved training stability and downstream performance, which makes the tweak relevant for anyone fine-tuning large models.
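A minimal PyTorch sketch of the idea, assuming the affine transform is a per-head, input-dependent scale and bias applied to the softmax weights (the paper's exact parameterization may differ):

```python
# Sketch of affine-scaled attention (hypothetical formulation; the
# paper's exact design may differ). Softmax weights are rescaled and
# shifted by learned, input-dependent terms, so each attention row no
# longer needs to sum exactly to 1.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineScaledAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        # Input-dependent scale and bias, one pair per head (assumption).
        self.scale_bias = nn.Linear(dim, num_heads * 2)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        # Standard scaled-dot-product softmax weights.
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim**0.5, dim=-1)

        # Affine relaxation: per-token, per-head scale s and bias b
        # computed from the input, applied to the softmax weights.
        s, b = (self.scale_bias(x)
                .view(B, T, self.num_heads, 2)
                .permute(0, 2, 1, 3)
                .chunk(2, dim=-1))
        attn = s * attn + b  # rows of `attn` need not sum to 1 anymore

        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y)
```

Relaxing the unit-sum constraint lets a head down-weight or amplify its entire output for a given token, which is one plausible mechanism behind the reported stability gains.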
💬 Community pulse
The OCR routing bottleneck study is causing a stir; the finding that removing OCR signals can actually improve counting performance challenges the assumption that more multimodal input is always better.
[Internal details redacted]