BigEarthNet.txt: A Large-Scale Multi-Sensor Image-Text Dataset and Benchmark for Earth Observation Paper • 2603.29630 • Published 5 days ago • 1
AgentVLN: Towards Agentic Vision-and-Language Navigation Paper • 2603.17670 • Published 18 days ago • 1
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Paper • 2308.11462 • Published Aug 20, 2023 • 5
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23, 2025 • 24
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model Paper • 2505.23579 • Published May 29, 2025 • 2
Photon Collection Speedup Volume Understanding with Efficient Multimodal Large Language Models • 3 items • Updated 1 day ago • 1
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation Paper • 2603.28068 • Published 5 days ago • 9
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models Paper • 2603.25155 • Published 10 days ago • 1
HAI-DEF Concept Apps Collection Collection of concept apps built around HAI-DEF open models/libraries to inspire the community. Learn more at http://goo.gle/hai-def • 7 items • Updated 24 days ago • 51
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 40
VideoPrism Collection VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks. • 5 items • Updated 24 days ago • 19
Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation Paper • 1803.01199 • Published Mar 3, 2018 • 1