Daily Papers

by AK and the research community

Jun 8

ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations

Visual-Interleaved Chain-of-Thought (VI-CoT) enables MLLMs to continually update their understanding and decisions based on step-wise intermediate visual states (IVS), much like a human would. It has demonstrated impressive success in various tasks, leading to advances in related benchmarks. Despite this progress, current benchmarks provide models with relatively fixed IVS rather than free-style IVS, which might forcibly distort the models' original thinking trajectories and fail to evaluate their intrinsic reasoning capabilities. More importantly, existing benchmarks neglect to systematically explore the factors through which IVS affect untamed reasoning performance. To address these gaps, we introduce a specialized benchmark termed ViC-Bench, consisting of four representative tasks: maze navigation, jigsaw puzzle, embodied long-horizon planning, and complex counting, where each task has a dedicated free-style IVS generation pipeline supporting function calls. To systematically examine VI-CoT capability, we propose a thorough evaluation suite incorporating a progressive three-stage strategy with targeted new metrics. In addition, we establish an Incremental Prompting Information Injection (IPII) strategy to ablatively explore the prompting factors for VI-CoT. We conduct extensive evaluations of 18 advanced MLLMs, revealing key insights into their VI-CoT capability. Our proposed benchmark is publicly available on Hugging Face.

  • 9 authors
·
May 20, 2025

DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations

We present cosmological results from the measurement of baryon acoustic oscillations (BAO) in galaxy, quasar and Lyman-α forest tracers from the first year of observations from the Dark Energy Spectroscopic Instrument (DESI), to be released in the DESI Data Release 1. DESI BAO provide robust measurements of the transverse comoving distance and Hubble rate, or their combination, relative to the sound horizon, in seven redshift bins from over 6 million extragalactic objects in the redshift range 0.1 < z < 4.2. DESI BAO data alone are consistent with the standard flat ΛCDM cosmological model with a matter density Ω_m = 0.295 ± 0.015. Paired with a BBN prior and the robustly measured acoustic angular scale from the CMB, DESI requires H_0 = (68.52 ± 0.62) km/s/Mpc. In conjunction with CMB anisotropies from Planck and CMB lensing data from Planck and ACT, we find Ω_m = 0.307 ± 0.005 and H_0 = (67.97 ± 0.38) km/s/Mpc. Extending the baseline model with a constant dark energy equation of state parameter w, DESI BAO alone require w = -0.99^{+0.15}_{-0.13}. In models with a time-varying dark energy equation of state parametrized by w_0 and w_a, combinations of DESI with CMB or with SN Ia individually prefer w_0 > -1 and w_a < 0. This preference is 2.6σ for the DESI+CMB combination, and persists or grows when SN Ia are added in, giving results discrepant with the ΛCDM model at the 2.5σ, 3.5σ or 3.9σ levels for the addition of Pantheon+, Union3, or DES-SN5YR datasets respectively. For the flat ΛCDM model with the sum of neutrino masses ∑m_ν free, combining the DESI and CMB data yields an upper limit ∑m_ν < 0.072 (0.113) eV at 95% confidence for a ∑m_ν > 0 (∑m_ν > 0.059) eV prior. These neutrino-mass constraints are substantially relaxed in models beyond ΛCDM. [Abridged.]

  • 203 authors
·
Nov 3, 2024