[TIL] In-context Learning with Long-context LLMs


당니이 2024. 9. 13. 05:10

☑️ Backgrounds

Long-context LLMs

GPT-4o can take 128k tokens, and Gemini 1.5 can reportedly take 2M tokens.
When Llama 3 405B was pre-trained, the context length was increased gradually over six stages (starting from an 8k context window and ending at a 128k window).

In-context Learning

[Example 1] [Example 2] ... [Your real question]
Each example placed in the prompt consists of a Question + Solution.
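
The prompt layout above can be sketched in a few lines. This is only an illustration: the `Question:` / `Solution:` labels, the function name `build_icl_prompt`, and the toy arithmetic examples are assumptions made here, not a format taken from either paper.

```python
def build_icl_prompt(examples, question):
    """Concatenate (question, solution) pairs, then append the real question."""
    blocks = [f"Question: {q}\nSolution: {s}" for q, s in examples]
    # The final block leaves the solution empty so the model completes it.
    blocks.append(f"Question: {question}\nSolution:")
    return "\n\n".join(blocks)


# Toy demonstrations (made up for illustration).
examples = [
    ("What is 2 + 3?", "2 + 3 = 5. The answer is 5."),
    ("What is 4 * 6?", "4 * 6 = 24. The answer is 24."),
]
prompt = build_icl_prompt(examples, "What is 7 - 2?")
print(prompt)
```

Going from few-shot to many-shot only changes the length of `examples`; the structure of the prompt stays the same.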

☑️ Many-Shot In-Context Learning (May 2024)

Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples
- Larger context windows make it possible to use many-shot examples.
- Many-shot ICL is also more effective than few-shot ICL. (Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks.)
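
To make the link between context window size and shot count concrete, here is a rough sketch that counts how many demonstrations fit in a token budget. The whitespace-based `count_tokens` is a crude stand-in for a real tokenizer, and the budgets and example pool are made up for illustration.

```python
def count_tokens(text):
    # Rough proxy: whitespace-separated words. A real tokenizer gives different counts.
    return len(text.split())


def shots_that_fit(examples, question, budget):
    """Return the longest prefix of examples that fits in the budget with the question."""
    used = count_tokens(question)
    kept = []
    for ex in examples:
        cost = count_tokens(ex)
        if used + cost > budget:
            break
        kept.append(ex)
        used += cost
    return kept


# Hypothetical pool of 1000 short demonstrations.
pool = [f"Question: item {i} ? Solution: label {i % 2}" for i in range(1000)]
target = "Question: item X ? Solution:"
few = shots_that_fit(pool, target, budget=100)
many = shots_that_fit(pool, target, budget=8000)
print(len(few), len(many))
```

With the small budget only a handful of demonstrations fit, while the larger budget holds the whole pool, which is the regime the paper calls many-shot.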

However, many-shot ICL can be bottlenecked by the available amount of human-generated outputs.
- In other words, many-shot ICL requires human-written rationales.
- --> The paper asks whether many-shot learning is possible without human-written rationales.

Methods

As noted above, an ICL example consists of a question + answer. This raises two questions:
Reinforced ICL: can human-written solutions be replaced with model-generated solutions?
Unsupervised ICL: can the model improve given only the "question", with no answer?

Unsupervised ICL (UICL) example -> the prompt just keeps listing questions.
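
A minimal sketch of how the two variants could build their demonstrations. Several things here are assumptions for illustration rather than details from these notes: `generate` and `extract_answer` are hypothetical callables standing in for a model call and an answer parser, and keeping only rationales whose final answer matches a known answer is an assumed filtering step.

```python
def reinforced_icl_examples(problems, generate, extract_answer):
    """Keep model-written rationales whose final answer matches the known answer."""
    kept = []
    for question, gold in problems:
        rationale = generate(question)  # hypothetical model call
        if extract_answer(rationale) == gold:  # assumed correctness filter
            kept.append((question, rationale))
    return kept


def unsupervised_icl_prompt(questions, target):
    """List unsolved questions only, then ask for a solution to the target."""
    listed = "\n".join(f"Question: {q}" for q in questions)
    return f"{listed}\n\nQuestion: {target}\nSolution:"


# Toy stand-ins (hypothetical) so the sketch runs without a model.
def toy_generate(question):
    canned = {"2 + 3": "2 + 3 = 5. Answer: 5", "4 * 6": "4 * 6 = 20. Answer: 20"}
    return canned[question]


def toy_extract(rationale):
    return rationale.rsplit("Answer:", 1)[-1].strip()


problems = [("2 + 3", "5"), ("4 * 6", "24")]
kept = reinforced_icl_examples(problems, toy_generate, toy_extract)
uicl_prompt = unsupervised_icl_prompt(["2 + 3", "4 * 6"], "7 - 2")
print(kept)
print(uicl_prompt)
```

In the toy run the second rationale reaches a wrong answer and is dropped, so only the first pair would be used as a Reinforced ICL demonstration. The Unsupervised ICL prompt contains no solutions at all, only the list of questions followed by the target.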

Results

☑️ In-Context Learning with Long-Context Models: An In-Depth Exploration (Apr 2024)

Tested on Llama 2 with classification tasks. Similar to the paper above.
- Motivation: in-context learning over large quantities of data becomes a potential alternative to finetuning.
- However, many-shot ICL involves a tradeoff between efficiency and performance. (The efficiency and performance tradeoff between many-shot ICL and finetuning on the same data is complex.)

As the figure above shows, performance rises sharply as the number of in-context examples increases.

The paper shows that ICL behavior changes as the number of ICL demonstrations (the in-context examples) increases:

In-context learning becomes less sensitive to example order.
The benefit of retrieval diminishes. (Long-context ICL makes careful retrieval less important.)
- Retrieval ICL here means retrieving, for each test example, a relevant subset of examples to use as demonstrations. (A strong alternative for in-context learning is to retrieve a relevant subset of examples as demonstrations for each test set example.)
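
The difference between retrieval ICL and random selection can be sketched as follows. The word-overlap (Jaccard) score is a deliberately simple stand-in chosen for this illustration and is not the retriever used in the paper; the pool, labels, and function names are also made up.

```python
import random


def overlap(a, b):
    """Jaccard similarity between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def retrieve_demos(pool, query, k):
    """Pick the k pool examples whose text is most similar to the query."""
    return sorted(pool, key=lambda ex: overlap(ex[0], query), reverse=True)[:k]


def random_demos(pool, k, seed=0):
    """Pick k pool examples uniformly at random."""
    return random.Random(seed).sample(pool, k)


# Hypothetical labeled pool for a classification task.
pool = [
    ("refund my order please", "billing"),
    ("app crashes on launch", "bug"),
    ("how do I reset my password", "account"),
    ("charge appeared twice on my card", "billing"),
]
query = "please refund the double charge"
top = retrieve_demos(pool, query, k=2)
rand = random_demos(pool, k=2)
print(top)
print(rand)
```

One intuition for the shrinking benefit: as `k` approaches the size of the pool, both strategies select nearly the same set of demonstrations, so how carefully they are chosen matters less.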

As the number of ICL examples grows, the performance gap between random and retrieval ICL narrows, as shown below.
- Finetuning, notably in plot (a), does not surpass long-context ICL no matter how much data is available.
