RLHF
[TIL] LLM as reward models/evaluators (#RLHF, #Self-improvement)
I should study other areas too... time to focus...

☑️ RewardBench (8 Jun 2024)
Evaluating Reward Models for Language Modeling
A benchmark for evaluating reward models.
RLHF: the process of training a reward model on human-made preference data (see the sketch after this list).

☑️ Self-Taught Evaluators (8 Aug 2024)
Reward modeling requires human judgment annotations, but these are very costly. The paper builds a self-improvement framework that needs no human annotation at all (a hedged sketch follows below): "Without any labeled preference data, our Self-Taught Evaluator..."
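
For the RLHF step above, here is a minimal sketch of how a reward model is typically trained on preference pairs, using the standard Bradley-Terry objective. This is an illustration, not the exact setup of any paper mentioned here; `backbone` is assumed to be a pretrained encoder that returns one pooled hidden vector per sequence.

```python
# Minimal sketch: reward model training on (chosen, rejected) preference pairs.
# Assumption: `backbone(input_ids)` returns a (batch, hidden_size) tensor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                      # e.g. a pretrained transformer
        self.value_head = nn.Linear(hidden_size, 1)   # maps hidden state to a scalar reward

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)             # (batch, hidden_size)
        return self.value_head(hidden).squeeze(-1)    # (batch,) scalar rewards

def preference_loss(model: RewardModel,
                    chosen_ids: torch.Tensor,
                    rejected_ids: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push r(chosen) above r(rejected)."""
    r_chosen = model(chosen_ids)
    r_rejected = model(rejected_ids)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The human annotation only enters through which response is labeled "chosen"; the loss itself needs no absolute reward labels, which is exactly why preference data is the standard format for RLHF.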
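
And for Self-Taught Evaluators, a rough sketch of one self-improvement iteration as I understand the idea: generate a synthetic preference pair (the "rejected" answer responds to a deliberately modified instruction), have the current model judge the pair, and fine-tune on the judgments that agree with the synthetic label. `llm` and `finetune` are hypothetical stand-ins, not real API calls.

```python
# Hedged sketch of a Self-Taught Evaluators-style iteration, no human labels.
# `llm(prompt) -> str` and `finetune(examples)` are hypothetical helpers.
from typing import Callable, List, Tuple

def build_synthetic_pair(llm: Callable[[str], str],
                         instruction: str) -> Tuple[str, str]:
    """Create (chosen, rejected): the rejected answer targets a
    deliberately modified instruction, so it is plausibly worse."""
    chosen = llm(instruction)
    modified = llm(f"Write a similar but subtly different instruction:\n{instruction}")
    rejected = llm(modified)  # off-target for the original instruction
    return chosen, rejected

def self_taught_iteration(llm: Callable[[str], str],
                          instructions: List[str],
                          finetune: Callable[[List[Tuple[str, str]]], None]) -> None:
    training_examples = []
    for instr in instructions:
        chosen, rejected = build_synthetic_pair(llm, instr)
        judge_prompt = (f"Instruction: {instr}\n"
                        f"Response A: {chosen}\n"
                        f"Response B: {rejected}\n"
                        "Reason step by step, then answer 'A' or 'B'.")
        judgment = llm(judge_prompt)
        # Keep only judgments that agree with the synthetic label (A is
        # chosen by construction); a real pipeline would parse verdicts
        # more carefully than this endswith check.
        if judgment.strip().endswith("A"):
            training_examples.append((judge_prompt, judgment))
    # Fine-tune the model on its own correct reasoning traces, then repeat.
    finetune(training_examples)
```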