[TIL] LLM as reward models/evaluators (#RLHF, #Self-improvement)

2024. 8. 30. 05:00·NLP

I should study other areas too... time to focus...

☑️ RewardBench (8 Jun 2024)

  • Evaluating Reward Models for Language Modeling
  • A benchmark for evaluating reward models.
  • RLHF: a reward model is trained on human-created preference data and then used to align the LLM (see the sketch below).
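
To make the two bullets above concrete, here is a minimal PyTorch sketch, with a toy `RewardModel` stub standing in for a real LLM backbone: the Bradley-Terry pairwise loss is the standard objective for training a reward model on preference data, and pairwise accuracy (does the model score the chosen response above the rejected one?) is the style of metric RewardBench reports.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Hypothetical stand-in: in practice this is an LLM backbone whose
    final hidden state feeds a scalar value head."""
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: encoded (prompt, response) pairs -> one scalar reward each
        return self.value_head(features).squeeze(-1)

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Standard RLHF pairwise preference loss:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def pairwise_accuracy(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> float:
    # RewardBench-style evaluation: how often does the reward model
    # score the human-preferred ("chosen") response above the rejected one?
    return (r_chosen > r_rejected).float().mean().item()

# Toy usage: random vectors stand in for encoded (prompt, response) pairs.
model = RewardModel()
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = bradley_terry_loss(model(chosen), model(rejected))
loss.backward()
with torch.no_grad():
    print(pairwise_accuracy(model(chosen), model(rejected)))
```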


☑️ Self-Taught Evaluators (8 Aug 2024)

  • Reward modeling requires human judgment annotations, but these are very costly
  • Proposes a self-improvement framework that works without any human annotation
    • Without any labeled preference data, our Self-Taught Evaluator can improve a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench
  • The LLM-as-a-Judge (judgment generator) receives the following inputs (assembled in the sketch after this list):
    • an input (user instruction) x;
    • two possible assistant responses y^(A) and y^(B) to the user instruction x;
    • the evaluation prompt containing the rubric, asking the model to evaluate and choose the winning answer.
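
A minimal sketch of how those three inputs can be packed into one judge prompt; the template wording and the `parse_verdict` helper are my illustrative assumptions, not the paper's exact evaluation prompt:

```python
# Illustrative only: the template and verdict format are assumptions,
# not the exact prompt used in the Self-Taught Evaluators paper.
JUDGE_TEMPLATE = """You are an impartial judge. Compare the two responses to the
user instruction below. Judge helpfulness, accuracy, and clarity; reason step
by step, then end with your verdict: "[[A]]" or "[[B]]".

[User Instruction]
{x}

[Response A]
{y_a}

[Response B]
{y_b}
"""

def build_judge_input(x: str, y_a: str, y_b: str) -> str:
    # Assembles the three inputs listed above into one evaluation prompt.
    return JUDGE_TEMPLATE.format(x=x, y_a=y_a, y_b=y_b)

def parse_verdict(judgment: str) -> str:
    # The judge generates chain-of-thought reasoning ending in a verdict tag.
    if "[[A]]" in judgment:
        return "A"
    if "[[B]]" in judgment:
        return "B"
    return "tie"
```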

Even without any human annotation, it performs on par with training on human-annotated data.


☑️ Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge (30 Jul 2024)

  • Background: Self-Rewarding Language Models
    • LLMs can improve by judging their own responses instead of relying on human labelers (AI feedback training)
    • Key insight: repeating this training loop keeps improving instruction following, but it does not guarantee that the judging ability itself improves (a limitation)
  • The LLM acts as actor, judge, and meta-judge (see the sketch after this list)
    • A self-improvement process in which the model judges its own judgments and uses that feedback to refine its judgment skills.

  • Surprisingly, this unsupervised approach improves the model's ability to judge and follow instructions, as demonstrated by a win rate improvement of Llama-3-8B-Instruct from 22.9% to 39.4% on AlpacaEval 2.
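
A schematic sketch of one such training iteration. The `SelfImprovingLM` stubs are hypothetical stand-ins; in the paper a single LLM plays all three roles via different prompts, and both resulting pair sets feed a DPO-style preference-optimization step.

```python
import random

class SelfImprovingLM:
    """Hypothetical stubs for the three roles one model plays."""
    def generate(self, x: str) -> str:                      # actor
        return f"response to {x!r} #{random.randint(0, 99)}"
    def judge(self, x: str, y: str) -> dict:                # judge
        return {"reasoning": "...", "score": random.random()}
    def meta_judge(self, x: str, y: str, j1: dict, j2: dict) -> int:  # meta-judge
        return random.randint(0, 1)  # index of the better judgment

def one_meta_rewarding_iteration(model, prompts, n_responses=4, n_judgments=2):
    actor_pairs, judge_pairs = [], []
    for x in prompts:
        # 1) Actor: sample several responses per prompt.
        responses = [model.generate(x) for _ in range(n_responses)]
        scored = []
        for y in responses:
            # 2) Judge: score each response multiple times.
            judgments = [model.judge(x, y) for _ in range(n_judgments)]
            # 3) Meta-judge: compare two judgments of the same response;
            #    the verdict yields a preference pair over *judgments*,
            #    which is what lets the judging skill itself improve.
            w = model.meta_judge(x, y, judgments[0], judgments[1])
            judge_pairs.append((judgments[w], judgments[1 - w]))
            scored.append((sum(j["score"] for j in judgments) / n_judgments, y))
        # 4) Best vs. worst response becomes an actor preference pair.
        scored.sort(key=lambda s: s[0], reverse=True)
        actor_pairs.append((scored[0][1], scored[-1][1]))
    # Both pair sets then feed a DPO-style preference-optimization step.
    return actor_pairs, judge_pairs
```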