[VQA] An Overview of Zero-shot VQA + Domain Adaptation VQA | Daeun's Computer Study

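Today I happened to catch a seminar by a professor visiting from Nanjing. I went in without much expectation, but it turned out to be closer to my own interests than I thought, so even though I almost skipped it, I ended up listening to the whole thing. I've always been pretty interested in VQA ~~(I once tried to do research on it and it fell through..)~~, so I'm writing this down as a kind of idea notebook (readability may suffer).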

Current problems of LLMs

Model memorization > brittle
GPT prompt sensitivity > unstable performance (because its generalization is memory-based)

Keys to unlocking LLM capabilities

Chain-of-thought prompting
Think step by step
Instruction tuning
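Chain-of-thought prompting only changes the input text: few-shot CoT shows worked reasoning in the demonstrations, and zero-shot CoT just appends a trigger like "Let's think step by step." so the model writes its steps before the answer. A minimal sketch of assembling such prompts; the `Q:/A:` template and the `build_prompt` helper are my own hypothetical choices, not something from the seminar.

```python
from typing import List, Optional, Tuple

# (question, worked reasoning, final answer) used as a few-shot demonstration
Demo = Tuple[str, str, str]


def build_prompt(question: str, cot: bool = True, demos: Optional[List[Demo]] = None) -> str:
    """Assemble a plain-text prompt. The 'Q:/A:' layout is an illustrative assumption."""
    parts = []
    # Few-shot CoT: each demonstration shows its reasoning before the answer.
    for q, reasoning, answer in demos or []:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # Zero-shot CoT: the trigger phrase nudges the model to write out its steps first.
    trigger = " Let's think step by step." if cot else ""
    parts.append(f"Q: {question}\nA:{trigger}")
    return "\n\n".join(parts)


print(build_prompt("A pen costs 2 dollars. How much do 3 pens cost?"))
```

The model weights never change here, which is also why results can swing with small wording changes in the prompt.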

Leveraging LLMs for multimodal purposes

VisualGPT (2021): the first attempt to fine-tune an LLM for multimodal purposes
InstructBLIP (2023): handles a wide range of vision-language tasks

Model finetuning & Model deployment

How can we get new multimodal capabilities without fine-tuning?

Visual Question Answering
[Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training, EMNLP'22]
Seems to be a way of connecting a pretrained LLM with a pretrained vision model?

Plug-and-Play VQA (No training)
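Uses BLIP's image-question matching module (to find the image regions relevant to the question) and its captioning module (to describe those regions in text); a pretrained QA language model then answers from the captions alone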
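The appeal is that the modules are only chained together, never trained. A toy sketch of that data flow, assuming an "image" is a list of text-described regions so everything runs without real models; `toy_match`, `toy_caption` and `toy_qa` are hypothetical stand-ins for the frozen matching, captioning and QA models, not their actual behavior.

```python
from typing import Callable, List

# Assumption: a toy "image" is a list of text-described regions. Real PNP-VQA works on pixels.
Patches = List[str]


def words(text: str) -> List[str]:
    """Lowercase and strip simple punctuation."""
    return [w.strip("?.,!") for w in text.lower().split()]


def toy_match(patches: Patches, question: str, k: int) -> Patches:
    """Stand-in for image-question matching: keep the k regions sharing the most words with the question."""
    q = set(words(question))
    ranked = sorted(patches, key=lambda p: len(q & set(words(p))), reverse=True)
    return ranked[:k]


def toy_caption(regions: Patches) -> str:
    """Stand-in for the captioner: describe only the selected regions."""
    return "a photo of " + " and ".join(regions)


def toy_qa(question: str, captions: List[str]) -> str:
    """Stand-in for a text-only QA model: return the word right before the first question word in the captions."""
    q = set(words(question))
    ws = words(" ".join(captions))
    for i, w in enumerate(ws):
        if i > 0 and w in q:
            return ws[i - 1]
    return "unknown"


def plug_and_play_vqa(
    patches: Patches,
    question: str,
    match: Callable[[Patches, str, int], Patches] = toy_match,
    caption: Callable[[Patches], str] = toy_caption,
    qa: Callable[[str, List[str]], str] = toy_qa,
    k: int = 1,
) -> str:
    relevant = match(patches, question, k)  # 1) frozen matcher picks question-relevant regions
    captions = [caption(relevant)]          # 2) frozen captioner turns them into text
    return qa(question, captions)           # 3) frozen QA model answers from text only; nothing is trained


print(plug_and_play_vqa(["green tree", "red car", "blue sky"], "What color is the car?"))  # → red
```

Swapping any stand-in for a real pretrained model keeps the same interface, which is the "plug-and-play" part: text acts as the glue between vision and language.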

2. Model deployment - Prompt Tuning

Prompt tuning requires a lot of training examples > so improving data (sample) efficiency matters
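For reference, prompt tuning keeps the pretrained model frozen and learns only a few continuous "soft prompt" vectors that are prepended to the input embeddings. A toy sketch under heavy assumptions: the "model" is just mean pooling plus a fixed logistic head, and the embeddings, weights and helper names are all hypothetical. Only the prompt vectors receive gradient updates.

```python
import math
from typing import Dict, List

Vec = List[float]


def forward(prompt: List[Vec], tokens: List[str], emb: Dict[str, Vec], w: Vec) -> float:
    """Toy frozen model: mean-pool [soft prompt; token embeddings], then a fixed logistic head."""
    seq = prompt + [emb[t] for t in tokens]  # soft prompt vectors are prepended to the input
    pooled = [sum(v[i] for v in seq) / len(seq) for i in range(len(w))]
    z = sum(wi * xi for wi, xi in zip(w, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def avg_loss(prompt, data, emb, w) -> float:
    """Mean binary cross-entropy over (tokens, label) pairs."""
    total = 0.0
    for tokens, y in data:
        p = forward(prompt, tokens, emb, w)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def prompt_tuning_step(prompt, data, emb, w, lr=0.5):
    """One gradient-descent step on the prompt vectors only; emb and w are never modified."""
    grads = [[0.0] * len(w) for _ in prompt]
    for tokens, y in data:
        p = forward(prompt, tokens, emb, w)
        n = len(prompt) + len(tokens)
        for g in grads:
            for i in range(len(w)):
                # dBCE/dz = p - y, and dz/d(prompt vector) = w / n under mean pooling
                g[i] += (p - y) * w[i] / n / len(data)
    return [[pv - lr * gv for pv, gv in zip(vec, g)] for vec, g in zip(prompt, grads)]
```

The trainable part is tiny, but it still has to be fit from labeled examples, so with few examples it can latch onto quirks of that one (source) dataset.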
[Improving the Sample Efficiency of Prompt Tuning with Domain Adaptation]
Prompts easily overfit to the source domain, so the method adds regularization
Goal: a smooth decision boundary > models are more likely to misclassify near a non-smooth boundary
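One generic way to express "smooth boundary" as a loss term: measure how far the prediction can move when the input is nudged within a small radius, and penalize that. This is a textbook adversarial-smoothness penalty on a toy logistic classifier, not the paper's exact objective; `smoothness_penalty`, `radius` and `lam` are hypothetical names.

```python
import math
from typing import List

Vec = List[float]


def predict(w: Vec, b: float, x: Vec) -> float:
    """Toy classifier: logistic regression, returns p(y=1 | x)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bern(p) || Bern(q))."""
    return p * math.log((p + eps) / (q + eps)) + (1 - p) * math.log((1 - p + eps) / (1 - q + eps))


def smoothness_penalty(w: Vec, b: float, x: Vec, radius: float) -> float:
    """Largest prediction change (as KL) inside an L2 ball around x.
    For a linear logit the extreme points lie along +/- w, so checking both suffices."""
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
    p = predict(w, b, x)
    worst = 0.0
    for sign in (1.0, -1.0):
        x_adv = [xi + sign * radius * wi / norm for xi, wi in zip(x, w)]
        worst = max(worst, bernoulli_kl(p, predict(w, b, x_adv)))
    return worst


def regularized_loss(w: Vec, b: float, data, radius: float = 0.1, lam: float = 1.0) -> float:
    """Task loss (BCE) plus a penalty that discourages sharp changes near each input."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        bce = -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
        total += bce + lam * smoothness_penalty(w, b, x, radius)
    return total / len(data)
```

A steep classifier gets a much larger penalty near its boundary than a gentle one, so minimizing the sum trades a little source accuracy for a boundary that shifted (target) inputs are less likely to fall across.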

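Learn a perturbation that moves source examples closer to the target domain > to the point where a discriminator can no longer tell them apart
In short, it's a way to learn a better (closer to optimal) prompt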
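The perturbation step reads like a small adversarial game: a domain discriminator learns to separate (perturbed) source from target, and the perturbation learns to fool it. A toy sketch with a logistic discriminator on 2-D feature vectors and alternating gradient steps; the feature space, names, step sizes and L2 radius are all assumptions, and the sketch leaves out how the learned perturbation then feeds into prompt training.

```python
import math
from typing import List, Tuple

Vec = List[float]


def disc(wd: Vec, bd: float, x: Vec) -> float:
    """Toy domain discriminator: probability that x comes from the target domain."""
    return 1.0 / (1.0 + math.exp(-(sum(a * b for a, b in zip(wd, x)) + bd)))


def project(delta: Vec, radius: float) -> Vec:
    """Keep the perturbation inside an L2 ball so it stays a small nudge."""
    n = math.sqrt(sum(d * d for d in delta))
    return delta if n <= radius else [d * radius / n for d in delta]


def perturbation_step(delta: Vec, source: List[Vec], wd: Vec, bd: float, lr=0.1, radius=1.0) -> Vec:
    """Gradient ascent on mean log D(x_s + delta): make perturbed source look like target."""
    grad = [0.0] * len(delta)
    for x in source:
        p = disc(wd, bd, [xi + di for xi, di in zip(x, delta)])
        for i in range(len(delta)):
            grad[i] += (1.0 - p) * wd[i] / len(source)  # d/d(delta) of log sigmoid(z)
    return project([d + lr * g for d, g in zip(delta, grad)], radius)


def discriminator_step(wd: Vec, bd: float, source, target, delta: Vec, lr=0.1) -> Tuple[Vec, float]:
    """Gradient descent on BCE with labels: perturbed source = 0, target = 1."""
    batch = [([xi + di for xi, di in zip(x, delta)], 0) for x in source] + [(x, 1) for x in target]
    gw, gb = [0.0] * len(wd), 0.0
    for x, y in batch:
        err = disc(wd, bd, x) - y
        gw = [g + err * xi / len(batch) for g, xi in zip(gw, x)]
        gb += err / len(batch)
    return [w - lr * g for w, g in zip(wd, gw)], bd - lr * gb


def alternate(source: List[Vec], target: List[Vec], rounds: int = 50) -> Vec:
    """Alternate: D learns to tell the domains apart, then delta learns to fool D."""
    dim = len(source[0])
    wd, bd, delta = [0.0] * dim, 0.0, [0.0] * dim
    for _ in range(rounds):
        wd, bd = discriminator_step(wd, bd, source, target, delta)
        delta = perturbation_step(delta, source, wd, bd)
    return delta
```

The radius bound is the key design choice: without it the perturbation could simply move the source onto the target, while a small bound only nudges examples toward the region where the two domains overlap.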
