[TIL] LLM as reward models/evaluators (#RLHF, #Self-improvement)
NLP

2024. 8. 30. 05:00

I should study other fields too.. time to get my act together..

☑️ RewardBench (8 Jun 2024)

  • Evaluating Reward Models for Language Modeling
  • A benchmark for evaluating reward models (a rough sketch of the evaluation setup follows this list).
  • RLHF: the process of training a reward model on human-created preference data.
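To make concrete what the benchmark measures, here is a minimal sketch of pairwise reward-model evaluation, assuming a generic `reward_model_score(prompt, response)` function (a hypothetical stand-in, not RewardBench's actual API): the reward model should score the chosen response higher than the rejected one, and accuracy is the fraction of pairs it ranks correctly.

```python
# Minimal sketch of RewardBench-style pairwise evaluation.
# `reward_model_score` is a hypothetical placeholder for any reward model
# that maps (prompt, response) to a scalar preference score.

def reward_model_score(prompt: str, response: str) -> float:
    """Hypothetical reward model; plug in a real scorer here."""
    raise NotImplementedError

def pairwise_accuracy(examples) -> float:
    """examples: iterable of (prompt, chosen_response, rejected_response)."""
    examples = list(examples)
    correct = 0
    for prompt, chosen, rejected in examples:
        # The reward model is "correct" when it prefers the chosen response.
        if reward_model_score(prompt, chosen) > reward_model_score(prompt, rejected):
            correct += 1
    return correct / len(examples)
```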

☑️ Self-Taught Evaluators (8 Aug 2024)

  • Reward modeling requires human judgment annotations, but collecting them is very costly.
  • The paper builds a self-improvement framework that needs no human annotation at all.
    • "Without any labeled preference data, our Self-Taught Evaluator can improve a strong LLM (Llama3-70B-Instruct) from 75.4 to 88.3 (88.7 with majority vote) on RewardBench."
  • The LLM-as-a-Judge (judgment generator) receives the following inputs (a prompt-assembly sketch is given below):
    • an input (user instruction) x;
    • two possible assistant responses y(A) and y(B) to the user instruction x;
    • an evaluation prompt containing the rubric, asking the model to evaluate the responses and choose the winning answer.

Even without any human annotation, it shows performance comparable to training on human-annotated data.
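Below is a rough sketch of how those three pieces (the instruction x, the responses y(A) and y(B), and a rubric-style evaluation prompt) could be assembled into a single judge input; the rubric wording is my own illustration, not the paper's exact template.

```python
# Rough sketch of assembling the LLM-as-a-Judge input: user instruction x,
# two candidate responses, and an evaluation prompt with a rubric.
# The rubric wording below is illustrative, not the paper's exact template.

JUDGE_TEMPLATE = """You are comparing two assistant responses to a user instruction.

[Instruction]
{instruction}

[Response A]
{response_a}

[Response B]
{response_b}

Evaluate both responses for helpfulness, correctness, and clarity.
Explain your reasoning step by step, then end with the verdict "[[A]]" or "[[B]]"."""

def build_judge_prompt(instruction: str, response_a: str, response_b: str) -> str:
    """Fill the template with x, y(A), and y(B)."""
    return JUDGE_TEMPLATE.format(
        instruction=instruction,
        response_a=response_a,
        response_b=response_b,
    )
```

As I understand the paper, such judgments are then sampled on synthetically constructed response pairs, and the ones with correct verdicts become training data for the next iteration of the evaluator.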


☑️ META-REWARDING LANGUAGE MODELS: Self-Improving Alignment with LLM-as-a-Meta-Judge (30 Jul 2024)

  • Background: Self-Rewarding Language Models
    • LLMs can improve by judging their own responses instead of relying on human labelers (AI feedback training) 
    • Key insight: repeatedly retraining the model this way keeps improving instruction following, but the judging ability itself is not guaranteed to improve.
  • LLM acts as an actor, judge, and meta-judge 
    • Self-improvement process in which the model judges its own judgments and uses that feedback to refine its judging skill (a schematic of one iteration is sketched after this list).

  • Surprisingly, this unsupervised approach improves the model’s ability to judge and follow instructions, as demonstrated by a win rate improvement of Llama-3-8B-Instruct from 22.9% to 39.4%
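As a very rough schematic of one iteration (my reading of the setup, with every method on `model` being a hypothetical placeholder rather than the authors' code): the actor samples responses, the judge scores them to build response-preference pairs, and the meta-judge compares two judgments of the same response to build judgment-preference pairs; both kinds of pairs then feed a preference-optimization step.

```python
# Schematic sketch of one Meta-Rewarding iteration (my reading of the paper).
# All methods on `model` are hypothetical placeholders, not an official API.

def meta_rewarding_iteration(model, prompts, n_responses=4):
    actor_prefs, judge_prefs = [], []
    for x in prompts:
        # Actor role: sample several candidate responses to the prompt.
        responses = [model.generate(x) for _ in range(n_responses)]

        # Judge role: produce two independent scored judgments per response,
        # so the meta-judge has something to compare.
        judgments = [(model.judge(x, y), model.judge(x, y)) for y in responses]

        # Response-level preference pair: best vs. worst response by average score.
        avg_scores = [(j1.score + j2.score) / 2 for j1, j2 in judgments]
        best = responses[avg_scores.index(max(avg_scores))]
        worst = responses[avg_scores.index(min(avg_scores))]
        actor_prefs.append((x, best, worst))

        # Meta-judge role: pick the better of the two judgments of each response,
        # giving preference pairs over judgments themselves.
        for y, (j1, j2) in zip(responses, judgments):
            if model.meta_judge(x, y, j1, j2) == "first":
                judge_prefs.append((x, y, j1, j2))   # j1 preferred over j2
            else:
                judge_prefs.append((x, y, j2, j1))   # j2 preferred over j1

    # Preference optimization (e.g., DPO-style) on both kinds of pairs,
    # so acting and judging improve together.
    model.train_on_preferences(actor_prefs, judge_prefs)
    return model
```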