WildEval

non-profit

wild_eval

AI & ML interests

None defined yet.

Recent Activity

DongfuJiang authored a paper 19 days ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

DongfuJiang authored a paper 19 days ago

ClawBench: Can AI Agents Complete Everyday Online Tasks?

DongfuJiang authored a paper 19 days ago

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

DongfuJiang

authored 4 papers 19 days ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

Paper • 2604.05117 • Published Apr 6 • 36

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 263

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2604.12374 • Published Apr 14 • 37

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published about 1 month ago • 119

ChengsongHuang

submitted a paper to Daily Papers 22 days ago

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Paper • 2605.09959 • Published 23 days ago • 17

ChengsongHuang

submitted a paper to Daily Papers 23 days ago

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Paper • 2605.08083 • Published 26 days ago • 69

ChengsongHuang

authored 2 papers 23 days ago

Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented Generation

Paper • 2602.03689 • Published Feb 3

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Paper • 2605.05566 • Published 27 days ago • 37

ChengsongHuang

submitted a paper to Daily Papers 26 days ago

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Paper • 2605.05566 • Published 27 days ago • 37

DongfuJiang

authored 3 papers 2 months ago

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

Paper • 2603.12698 • Published Mar 13 • 1

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper • 2603.19220 • Published Mar 19 • 69

OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Paper • 2603.20278 • Published Mar 17 • 98

ChengsongHuang

authored a paper 3 months ago

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Paper • 2603.09206 • Published Mar 10 • 53

ChengsongHuang

authored 3 papers 4 months ago

submitted a paper to Daily Papers 4 months ago

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper • 2601.22628 • Published Jan 30 • 35

ChengsongHuang

authored a paper 5 months ago

RelayLLM: Efficient Reasoning via Collaborative Decoding

Paper • 2601.05167 • Published Jan 8 • 31

ChengsongHuang

submitted a paper to Daily Papers 5 months ago

RelayLLM: Efficient Reasoning via Collaborative Decoding

Paper • 2601.05167 • Published Jan 8 • 31

ChengsongHuang

authored a paper 5 months ago

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Paper • 2601.03986 • Published Jan 7 • 34

Team members 9
