PSFT+RL models
SII-Wenhong
wh-zhu
AI & ML interests
None yet
Recent Activity
updated a model 13 days ago: wh-zhu/qwen2_7B-ultrachatfeedback-wspo
published a model 13 days ago: wh-zhu/qwen2_7B-ultrachatfeedback-wspo
upvoted a paper about 1 month ago: Hybrid Policy Distillation for LLMs