Posts in the 'RL 🤖' category

TL;DR
Proposes the first unified (generation + understanding) reward model; until now, only task-specific reward models have existed.

Motivation
Reward models so far have been limited to specific tasks.
However, tasks are believed to be interconnected, and their effect grows stronger when they interact (e.g., image evaluation helps video evaluation).

Method
First, (1) build a large-scale human preference dataset, then (2) train a reward model for the preference pair dataset.
On specific baselines (VLM, diffusion model), mu..