Benchmarking Reasoning and Reward-Free Scaling

Mar 3, 2026 · 7:58 AM · 1 min read

🔥 What's hot right now
SPARTA is a scalable framework for Table-Text multi-hop QA that exposes significant weaknesses in current models. It’s a critical tool for pushing cross-modal reasoning capabilities forward. Separately, Duel-Evolve is intriguing—it uses LLM self-preferences for reward-free test-time scaling, avoiding the need for external reward models or ground-truth labels.
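Duel-Evolve's core idea, as described, is selecting among candidate outputs using the model's own pairwise preferences instead of an external reward model. A minimal sketch of that selection loop, assuming a single-elimination tournament (the paper's actual protocol may differ, and `duel_select` / `toy_judge` are hypothetical names of mine):

```python
import random

def duel_select(candidates, judge):
    """Single-elimination tournament: pairwise preference 'duels'
    pick a winner -- no reward model or ground-truth labels needed."""
    pool = list(candidates)
    random.shuffle(pool)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            nxt.append(a if judge(a, b) else b)  # keep the preferred answer
        if len(pool) % 2:                        # odd one out gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Stand-in judge: in a real system this would prompt the same LLM to
# compare two candidate answers and return True if it prefers `a`.
def toy_judge(a, b):
    return len(a) < len(b)   # toy preference: shorter answer wins

best = duel_select(["a long rambling answer", "concise", "a medium one"], toy_judge)
```

With the toy judge the shortest candidate always survives every duel, so `best` is `"concise"` regardless of the shuffle; in practice the interesting part is that `judge` is the generator itself.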

🚀 Just shipped
MiSTER-E is a new Mixture-of-Experts framework for Emotion Recognition in Conversations. It leverages fine-tuned LLMs for speech and text embeddings, and a gating mechanism fuses predictions from speech-only, text-only, and cross-modal experts, achieving state-of-the-art results on IEMOCAP and MELD.
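The fusion step above can be sketched as a softmax gate weighting each expert's class distribution. A minimal illustration under my own assumptions (the gate in MiSTER-E is a learned network conditioned on the input; here the gate logits and emotion classes are just toy values):

```python
import math

def softmax(xs):
    m = max(xs)                           # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def fuse_experts(expert_logits, gate_logits):
    """Gate-weighted fusion: expert_logits is one list of class logits
    per expert; gate_logits is one scalar per expert."""
    gates = softmax(gate_logits)                 # expert weights, sum to 1
    probs = [softmax(l) for l in expert_logits]  # per-expert class distributions
    n_classes = len(probs[0])
    return [sum(g * p[c] for g, p in zip(gates, probs))
            for c in range(n_classes)]

# Toy example: speech-only, text-only, and cross-modal experts over 4 emotions
speech = [2.0, 0.1, 0.0, -1.0]
text   = [0.0, 1.5, 0.2, -0.5]
cross  = [1.0, 1.0, 0.0, -1.0]
fused  = fuse_experts([speech, text, cross], [0.5, 1.2, 2.0])  # gate favors cross-modal
```

The fused output is itself a valid distribution over emotion classes, which is what makes gated fusion a drop-in replacement for any single expert's prediction head.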

🛠 Useful for the toolkit
CCA (Causal Computational Asymmetry) offers a distinct alternative to statistical dependence tests: it identifies causal direction by comparing how quickly models converge when fit in each direction. It’s a notable theoretical advance with implications for interpreting and stress-testing neural network training dynamics.
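The comparison protocol can be illustrated with a toy experiment: fit a simple model in both directions and count optimization steps to convergence. This is only a sketch of the *protocol*, not a demonstration of the asymmetry itself; in this 1-D linear toy the winner depends on variable scales, and the paper's actual models and convergence criterion are surely richer. All names and constants below are mine:

```python
import random

def steps_to_converge(xs, ys, lr=0.05, tol=1e-4, max_steps=10_000):
    """Gradient-descent steps until a 1-D linear fit y ~ w*x converges
    (gradient magnitude below tol)."""
    w = 0.0
    n = len(xs)
    for step in range(1, max_steps + 1):
        grad = 2.0 / n * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
        if abs(grad) < tol:
            return step
    return max_steps

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [1.5 * x + random.gauss(0, 0.05) for x in xs]  # causal mechanism: x -> y

forward  = steps_to_converge(xs, ys)   # fit in the causal direction x -> y
backward = steps_to_converge(ys, xs)   # fit in the anti-causal direction y -> x
```

CCA's claim, as summarized above, is that such convergence-time asymmetries carry information about which direction is causal.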

💬 Community pulse
The "Scale is All You Need" narrative is being challenged by the reporting-bias paper. It argues that scaling data alone won't fix Vision-Language Models' reasoning gaps, because captions routinely omit tacit details like object counts; intentional data curation, not more of the same data, is the real fix.