I should study other fields too .. time to get focused ..
☑️ RewardBench (8 Jun 2024)
- Evaluating Reward Models for Language Modeling
- A benchmark for evaluating reward models.
- RLHF: the process of training a reward model on human-written preference data (a minimal sketch follows below).
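To make that bullet concrete, here is a minimal, assumed sketch (not code from the RewardBench paper): a toy reward model is trained with the standard Bradley-Terry pairwise loss on preference pairs, and a RewardBench-style score is simply the fraction of pairs where the chosen response receives a higher reward than the rejected one. The model, embeddings, and data below are placeholders.

```python
import torch
import torch.nn.functional as F

# Toy reward model: any network mapping a (prompt, response) encoding to a
# single scalar score plays this role; the 16-dim embeddings are placeholders.
class TinyRewardModel(torch.nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, pair_embedding):              # (batch, dim) -> (batch,)
        return self.head(pair_embedding).squeeze(-1)

def bradley_terry_loss(score_chosen, score_rejected):
    # Standard pairwise preference loss for RLHF reward models:
    # maximize log sigmoid(r(chosen) - r(rejected)).
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def rewardbench_style_accuracy(model, chosen_emb, rejected_emb):
    # RewardBench-style metric: fraction of pairs where the reward model
    # scores the human-preferred ("chosen") response above the rejected one.
    with torch.no_grad():
        return (model(chosen_emb) > model(rejected_emb)).float().mean().item()

# Toy preference data (random embeddings stand in for real pairs).
model = TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
chosen, rejected = torch.randn(64, 16), torch.randn(64, 16)

for _ in range(100):
    opt.zero_grad()
    bradley_terry_loss(model(chosen), model(rejected)).backward()
    opt.step()

print("pairwise accuracy:", rewardbench_style_accuracy(model, chosen, rejected))
```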
☑️ Self-Taught Evaluators (8 Aug 2024)
- Reward modeling requires human judgment annotations, but collecting them is very costly.
- Builds a self-improvement framework that needs no human annotation (a loop sketch follows this list).
- Without any labeled preference data, our Self-Taught Evaluator can improve a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench
- The LLM-as-a-Judge (judgment generator) receives the following inputs:
- an input (user instruction) x;
- two possible assistant responses y (A) and y (B) to the user instruction x;
- the evaluation prompt containing the rubric, which asks the judge to evaluate and choose the winning answer.
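Putting the pieces above together, a hypothetical sketch of one Self-Taught Evaluator iteration might look like the following. `generate` stands in for sampling from the current LLM and `parse_verdict` for extracting the verdict letter from a judgment; the rubric wording and the "modified instruction" trick are paraphrased, not the paper's exact prompts.

```python
import random

def build_judge_prompt(x: str, y_a: str, y_b: str) -> str:
    """Assemble the three judge inputs listed above: instruction,
    two candidate responses, and an evaluation rubric."""
    rubric = ("Evaluate both responses for helpfulness, correctness, and "
              "clarity, explain your reasoning, then answer with [[A]] or [[B]].")
    return f"{rubric}\n\nUser instruction:\n{x}\n\nResponse A:\n{y_a}\n\nResponse B:\n{y_b}\n"

def synthesize_pair(generate, instruction):
    """Build a preference pair with a known winner and no human labels:
    respond to the real instruction (winner) and to a slightly modified
    instruction (assumed to be the worse answer for the original one)."""
    y_win = generate(f"Respond to: {instruction}")
    modified = generate(f"Write a similar but subtly different instruction: {instruction}")
    y_lose = generate(f"Respond to: {modified}")
    return y_win, y_lose

def self_taught_iteration(generate, parse_verdict, instructions, n_samples=8):
    """Keep only judgments whose verdict matches the synthetic label; these
    (prompt, judgment) pairs become fine-tuning data for the next evaluator."""
    kept = []
    for x in instructions:
        y_win, y_lose = synthesize_pair(generate, x)
        # Randomize sides so the evaluator cannot exploit position bias.
        if random.random() < 0.5:
            a, b, winner = y_win, y_lose, "A"
        else:
            a, b, winner = y_lose, y_win, "B"
        prompt = build_judge_prompt(x, a, b)
        for _ in range(n_samples):
            judgment = generate(prompt)
            if parse_verdict(judgment) == winner:   # reject-sample correct verdicts
                kept.append((prompt, judgment))
                break
    return kept  # fine-tune on these, then repeat with the improved model
```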
☑️ Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge (30 Jul 2024)
- Background: Self-Rewarding Language Models
- LLMs can improve by judging their own responses instead of relying on human labelers (AI feedback training)
- Key insight: repeatedly training the model this way improves instruction following, but there is no guarantee that its ability as a judge improves, which is the shortcoming of Self-Rewarding.
- LLM acts as an actor, judge, and meta-judge
- Self-improvement process where the model judges its own judgments and uses that feedback to refine its judging skill (a loop sketch follows below).
- Surprisingly, this unsupervised approach improves the model’s ability to judge and follow instructions, as demonstrated by a win rate improvement of Llama-3-8B-Instruct from 22.9% to 39.4%
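A rough, assumed sketch of one Meta-Rewarding iteration, with placeholder callables instead of real sampling and DPO training: the same model acts as actor, judge, and meta-judge, and the loop emits preference pairs both for responses and for judgments, so that acting and judging can improve together. `generate` samples from the model, `score` aggregates the parsed scores in a list of judgments, and `meta_prefer` returns 0 or 1 for whichever judgment the meta-judge prefers.

```python
def meta_rewarding_iteration(generate, score, meta_prefer, prompts,
                             n_responses=4, n_judgments=2):
    actor_pairs, judge_pairs = [], []
    for x in prompts:
        # Actor: sample several candidate responses to the same prompt.
        responses = [generate(f"Respond to: {x}") for _ in range(n_responses)]

        # Judge: sample several judgments (score + rationale) per response.
        judgments = {y: [generate(f"Judge this response to '{x}': {y}")
                         for _ in range(n_judgments)] for y in responses}

        # Actor preference pairs: rank responses by aggregated judge scores.
        ranked = sorted(responses, key=lambda y: score(judgments[y]), reverse=True)
        actor_pairs.append((x, ranked[0], ranked[-1]))  # (prompt, chosen, rejected)

        # Meta-judge: compare two judgments of the same response and keep the
        # preferred one; these pairs train the judging skill itself.
        for y in responses:
            j0, j1 = judgments[y][0], judgments[y][1]
            if meta_prefer(x, y, j0, j1) == 0:
                judge_pairs.append((x, y, j0, j1))  # (prompt, response, chosen judgment, rejected judgment)
            else:
                judge_pairs.append((x, y, j1, j0))

    return actor_pairs, judge_pairs  # both feed a DPO-style preference update
```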