[Daily] Self-Correct Reasoning / Verification of LLM


당니이 2025. 3. 21. 13:56

Today, while preparing for an interview, I read a couple of self-correction papers I had been interested in for a while!

1. Small Language Models Need Strong Verifiers to Self-Correct Reasoning

TLDR;

Collect self-correction data with a small LLM and fine-tune it so that it acquires self-refinement ability
The goal is to build a self-refining model

Motivation

Self-correction: Self-verify + Self-refine
- Self-verify: the LLM judges its own initial solution
- Self-refine: if the solution is judged incorrect, the LLM revises it
Self-refine: Critique + Correction
- Critique: pinpoints where the error is, explains it, and gives guidance on how to fix it (feedback)
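
The decomposition above can be sketched as a small control flow. This is only an illustrative sketch, not the paper's implementation: `self_correct` and the three toy callables are hypothetical names standing in for LLM calls.

```python
from typing import Callable


def self_correct(
    question: str,
    solution: str,
    verify: Callable[[str, str], bool],
    critique: Callable[[str, str], str],
    correct: Callable[[str, str, str], str],
) -> str:
    """Self-correction = self-verify, then (only if needed) self-refine."""
    # Self-verify: judge the initial solution.
    if verify(question, solution):
        return solution
    # Self-refine, part 1: produce a critique (feedback on the error).
    feedback = critique(question, solution)
    # Self-refine, part 2: produce a corrected solution guided by the critique.
    return correct(question, solution, feedback)


# Toy stand-ins (hypothetical); in practice each would be an LLM call.
def toy_verify(question: str, solution: str) -> bool:
    return solution.strip().endswith("4")


def toy_critique(question: str, solution: str) -> str:
    return "Step 1: the addition is wrong; 2 + 2 is 4, not 5."


def toy_correct(question: str, solution: str, feedback: str) -> str:
    return "2 + 2 = 4"


if __name__ == "__main__":
    print(self_correct("What is 2 + 2?", "2 + 2 = 5",
                       toy_verify, toy_critique, toy_correct))
```

Passing the verifier, critic, and corrector as separate callables keeps the structure visible: the verifier alone decides whether any refinement happens.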

Method

Uses rejection sampling fine-tuning (sampling diverse solutions and using them for fine-tuning)

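Step1. Generate + filter critiques
- Critiques are generated using the correct solution as a hint (with a few-shot critique prompt)
- The critique is asked to give step-by-step feedback -> these critiques are collected into a dataset
- Critiques are filtered so that they match the expected format
Step2. Supervised fine-tuning of the refiner
- Fine-tune on the dataset collected above with a cross-entropy loss
Note that the self-verifier is also a small LLM; it outputs the probability that a solution is correct, and refinement is applied when that probability is at or below a threshold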
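
A rough sketch of the data collection and the verifier gate described above. Everything here is an assumption made for illustration: the `Step k: ...` critique format, the `generate_critique` callable (standing in for the few-shot prompted LLM), the number of sampled critiques, and the 0.5 threshold are not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed critique format: every non-empty line looks like "Step k: <feedback>".
STEP_LINE = re.compile(r"^Step \d+: .+")


def has_valid_format(critique: str) -> bool:
    """Keep only critiques that give feedback step by step."""
    lines = [ln.strip() for ln in critique.splitlines() if ln.strip()]
    return bool(lines) and all(STEP_LINE.match(ln) for ln in lines)


@dataclass
class RefinerExample:
    question: str
    wrong_solution: str
    critique: str


def build_refiner_dataset(
    samples: Iterable[Tuple[str, str, str]],
    generate_critique: Callable[[str, str, str], str],
    n_critiques: int = 4,
) -> List[RefinerExample]:
    """samples: (question, wrong_solution, correct_solution) triples.

    The correct solution is passed to the critique generator only as a hint.
    """
    dataset = []
    for question, wrong, reference in samples:
        for _ in range(n_critiques):
            critique = generate_critique(question, wrong, reference)
            if has_valid_format(critique):  # format-based filtering
                dataset.append(RefinerExample(question, wrong, critique))
    return dataset


def needs_refinement(p_correct: float, threshold: float = 0.5) -> bool:
    """Gate: refine only when the verifier's confidence is at or below the threshold."""
    return p_correct <= threshold
```

The resulting `RefinerExample` records would then serve as supervised fine-tuning targets for the refiner (Step2).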

Results

Performance improves substantially after refinement (the solid lines in the plot)

2. A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

TLDR;

Builds a fine-grained, step-level dataset (REVEAL: Reasoning Verification Evaluation benchmark)
A dataset for detecting errors in each individual step of a reasoning chain

Motivation

It would be useful to be able to evaluate a reasoning chain automatically, step by step.

Method

Criteria for judging the correctness of each step
- Step relevance
- Step type
- Step attribution to external source (about factual error)
- Step logical correctness (about logical error)

Each step is first labeled for its relevance to the final answer, and is then classified as either an attribution step (one that relies on factual knowledge) or a logical step (one that should follow from the previous steps by logical inference).
For attribution step -> correctness is judged against retrieved Wikipedia paragraphs
For logical step -> labeled for logical correctness
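
The labeling scheme above could be represented roughly as follows. This is a sketch under assumptions, not the benchmark's actual schema: the field names are hypothetical, and the chain-level rule (a chain counts as correct only if every step is relevant and correct) is my reading of the "weakest link" idea in the title.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class StepType(Enum):
    ATTRIBUTION = "attribution"  # relies on factual knowledge
    LOGICAL = "logical"          # inferred from previous steps


@dataclass
class StepLabel:
    relevant: bool                            # relevance to the final answer
    step_type: StepType
    attributed: Optional[bool] = None         # attribution steps: supported by retrieved evidence?
    logically_correct: Optional[bool] = None  # logical steps: valid inference?


def step_is_correct(step: StepLabel) -> bool:
    """Use the label that matches the step type; a missing label counts as incorrect."""
    if step.step_type is StepType.ATTRIBUTION:
        return bool(step.attributed)
    return bool(step.logically_correct)


def chain_is_correct(steps: List[StepLabel]) -> bool:
    """Assumed rule: one irrelevant or incorrect step makes the whole chain incorrect."""
    return all(s.relevant and step_is_correct(s) for s in steps)


if __name__ == "__main__":
    chain = [
        StepLabel(True, StepType.ATTRIBUTION, attributed=True),
        StepLabel(True, StepType.LOGICAL, logically_correct=False),
    ]
    print(chain_is_correct(chain))  # the second step breaks the chain
```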

Results

Labels like these make it possible to assess the reasoning performance of LLMs.
