Benchmarking Reasoning and Reward-Free Scaling
🔥 What's hot right now
SPARTA is a scalable framework for Table-Text multi-hop QA that exposes significant weaknesses in current models. It’s a critical tool for pushing cross-modal reasoning capabilities forward. Separately, Duel-Evolve is intriguing—it uses LLM self-preferences for reward-free test-time scaling, avoiding the need for external reward models or ground-truth labels.
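Duel-Evolve's reward-free selection can be pictured as a single-elimination tournament in which the model judges pairs of its own candidate answers. This is a minimal sketch, not the paper's actual algorithm: `duel_select` and the `judge` callable are hypothetical names, and the toy judge (prefer the longer answer) stands in for an LLM self-preference prompt.

```python
import random

def duel_select(candidates, judge, seed=0):
    """Single-elimination tournament over candidate answers.

    `judge(a, b)` stands in for an LLM self-preference call that
    returns the preferred answer; no external reward model or
    ground-truth label is needed.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    while len(pool) > 1:
        nxt = []
        # Pair candidates and keep each duel's winner.
        for i in range(0, len(pool) - 1, 2):
            nxt.append(judge(pool[i], pool[i + 1]))
        if len(pool) % 2:  # odd one out gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Toy judge: prefer the longer answer (a real system would prompt the LLM
# to compare the two candidates and name a winner).
answers = ["42", "42 because 6*7", "42 because 6*7 = 42, by multiplication"]
best = duel_select(answers, judge=lambda a, b: max(a, b, key=len))
print(best)  # the tournament surfaces the judge's overall favorite
```

With a transitive judge, the winner is independent of the shuffled bracket, which is what makes this kind of test-time scaling cheap: more candidates only add more duels, never a new training signal.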
🚀 Just shipped
MiSTER-E is a new Mixture-of-Experts framework for Emotion Recognition in Conversations. It leverages fine-tuned LLMs for speech and text embeddings, and a gating mechanism fuses the predictions of speech-only, text-only, and cross-modal experts, reaching state-of-the-art results on IEMOCAP and MELD.
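The gating step can be sketched as a softmax-weighted sum of the three experts' class logits. This is an illustrative reconstruction, not MiSTER-E's released code: `fuse_experts`, the gate scores, and the toy logits are all assumptions.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_experts(expert_logits, gate_scores):
    """Weight each expert's class logits by a softmax gate.

    `expert_logits`: one logit vector per expert (speech-only,
    text-only, cross-modal in MiSTER-E's setup); `gate_scores` are
    hypothetical gating-network outputs, one scalar per expert.
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_logits[0])
    fused = [0.0] * n_classes
    for w, logits in zip(weights, expert_logits):
        for i, logit in enumerate(logits):
            fused[i] += w * logit
    return fused

# Toy example: 3 experts, 4 emotion classes; the gate favors the
# cross-modal expert, so its top class wins the fused prediction.
speech = [2.0, 0.1, 0.0, 0.3]
text   = [0.5, 1.8, 0.2, 0.1]
cross  = [0.4, 2.5, 0.3, 0.2]
fused = fuse_experts([speech, text, cross], gate_scores=[0.2, 0.3, 2.0])
print(max(range(4), key=lambda i: fused[i]))  # → 1
```

In a trained system the gate scores would come from a small network conditioned on the input, letting the model lean on whichever modality is most reliable for a given utterance.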
🛠 Useful for your toolkit
CCA (Causal Computational Asymmetry) offers a distinct alternative to statistical methods by identifying causal direction through convergence time. It's a notable theoretical step toward making neural network training dynamics more interpretable and robust.
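The core intuition can be shown with a toy experiment: fit a model in both directions and compare how long each takes to converge. This is a deliberately simplified sketch, not CCA's actual procedure; the linear-fit proxy, `iters_to_fit`, and the quadratic data are all assumptions chosen to make the asymmetry visible.

```python
def iters_to_fit(xs, ys, tol, lr=0.1, max_iters=5000):
    """Gradient-descent steps for a linear fit y ≈ a*x + b to reach
    MSE <= tol; returns max_iters if it never converges."""
    a = b = 0.0
    n = len(xs)
    for t in range(1, max_iters + 1):
        errs = [a * x + b - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errs) / n
        if mse <= tol:
            return t
        ga = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        gb = 2 * sum(errs) / n
        a -= lr * ga
        b -= lr * gb
    return max_iters

# y = x^2 on a symmetric grid: the causal fit X→Y reaches the
# tolerance in a handful of steps, while the anticausal fit Y→X
# cannot (cov(x, y) = 0 here), so it exhausts max_iters.
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]
fwd = iters_to_fit(xs, ys, tol=0.12)
bwd = iters_to_fit(ys, xs, tol=0.12)
print("X->Y" if fwd < bwd else "Y->X")  # → X->Y
```

The shorter convergence time in the causal direction is the computational asymmetry being exploited; the paper develops this far beyond the linear toy case.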
💬 Community pulse
The "Scale is All You Need" narrative is getting challenged by the reporting bias paper. It argues that scaling data won't fix Vision-Language Models' reasoning gaps, because captions systematically omit tacit details like object counts; intentional data curation, not more data, is the real fix.