[Incremental Learning] A look at architecture-based methods

1. SupSup

Supermasks in Superposition (NeurIPS'20)

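During training, SupSup learns a separate supermask (i.e., a subnetwork) for each task. At inference, it superimposes the supermasks of all tasks and infers the task identity using gradients that maximize confidence, so that the supermask that performs best for the given task can be selected.
- Supermask: a concept that originates from pruning. The weights of a randomly initialized network are kept fixed, and a subnetwork that achieves the desired performance is searched for inside it.
The continual learning (CL) setting is split according to whether the task ID is provided at train/test time. If the task ID is not provided, it is inferred by solving an optimization problem.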
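The task-inference step described above can be sketched as follows. This is a minimal sketch under several assumptions, not the paper's implementation: the network is reduced to a single linear layer, the mixing coefficients `alphas` start from a uniform value, confidence is measured by the entropy of the softmax output, and the gradient is approximated with finite differences instead of backpropagation. The function names (`masked_logits`, `entropy`, `infer_task`) and the toy weights and masks are made up for illustration.

```python
import math

def masked_logits(weights, masks, alphas, x):
    """Logits of a linear layer whose fixed weights are gated by the
    alpha-weighted superposition of all task masks."""
    logits = []
    for r, row in enumerate(weights):
        total = 0.0
        for c, w in enumerate(row):
            gate = sum(a * m[r][c] for a, m in zip(alphas, masks))
            total += w * gate * x[c]
        logits.append(total)
    return logits

def entropy(logits):
    """Entropy of softmax(logits); low entropy means high confidence."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0.0)

def infer_task(weights, masks, x, eps=1e-4):
    """Evaluate the entropy gradient once at uniform alphas and return the
    task whose coefficient lowers the entropy the most."""
    k = len(masks)
    uniform = [1.0 / k] * k
    grads = []
    for i in range(k):
        up = list(uniform)
        up[i] += eps
        down = list(uniform)
        down[i] -= eps
        h_up = entropy(masked_logits(weights, masks, up, x))
        h_down = entropy(masked_logits(weights, masks, down, x))
        grads.append((h_up - h_down) / (2 * eps))  # central difference
    return min(range(k), key=lambda i: grads[i])

# Toy setup: mask_a yields confident logits, mask_b yields identical logits.
weights = [[3.0, 1.0], [0.0, 1.0]]   # fixed random-init weights (toy values)
mask_a = [[1, 0], [0, 0]]
mask_b = [[0, 1], [0, 1]]
x = [1.0, 1.0]
print(infer_task(weights, [mask_a, mask_b], x))  # 0, i.e. mask_a
```

In this toy setup only `mask_a` separates the two classes, so increasing its coefficient is the only change that lowers the entropy, and its index is returned.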

2. DyTox

Transformers for Continual Learning with DYnamic TOken eXpansion (CVPR'22)

Existing dynamic architectures require a task identifier at test time, because the parameters that match a specific task have to be selected. This does not fit real-world conditions. Some algorithms do not need task information at test time (e.g., DER), but they incur memory overhead and require a post-processing pruning step.
Existing algorithms are also sensitive to hyperparameters.
- DyTox is therefore designed on top of a transformer so that it avoids memory and time overhead and does not require hyperparameter setting. It also works correctly without knowing the task at test time.

DyTox assumes the CIL (Class-Incremental Learning) scenario and builds on the ViT architecture. ViT consists of three main parts.
- Patch tokenizer: the image is split into N patches, and each patch is projected by a linear layer. A learned positional embedding is then added element-wise.
- Self-Attention (SA) based encoder: the tokens produced above are fed into Self-Attention blocks.
- Classifier: the "class token" produced by the process above is fed into a linear classifier.
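The patch tokenizer step can be illustrated with a small pure-Python sketch. It assumes a single-channel image stored as a list of rows, non-overlapping square patches, and a projection without bias. The names `patchify` and `tokenize` are hypothetical, and the random values only stand in for parameters that would be learned in a real ViT.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened,
    non-overlapping patch x patch blocks, in row-major order."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches

def tokenize(image, patch, proj, pos_emb):
    """Project each flattened patch with `proj` (D rows of patch*patch
    weights) and add the positional embedding of its index element-wise."""
    tokens = []
    for n, flat in enumerate(patchify(image, patch)):
        projected = [sum(w * v for w, v in zip(row, flat)) for row in proj]
        tokens.append([p + e for p, e in zip(projected, pos_emb[n])])
    return tokens

rng = random.Random(0)
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]      # 4 x 4
proj = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]     # D = 3
pos_emb = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # N = 4
tokens = tokenize(image, patch=2, proj=proj, pos_emb=pos_emb)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of dimension 3
```

A 4 x 4 image with 2 x 2 patches gives N = 4 tokens, which are what the Self-Attention encoder receives.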
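The Self-Attention process of the encoder is adapted to operate per task, which yields the Task-Attention Block (TAB).
- Instead of the "class token", a "task token" ($\theta_i$) is used. The task token $\theta_i$ is concatenated with the patch tokens $x_L$ to form $z_i$, and Task-Attention (TA) is applied to it: the similarity between the Query and the Key is computed with scaled dot-product attention, and the resulting attention weights are used to take a weighted sum of the Values. Note that the Query contains only the task token.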
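The Task-Attention computation can be sketched in plain Python as below. This is a simplified illustration rather than the DyTox implementation: it uses a single attention head, scales by the square root of the query dimension, and omits the output projection and biases. The function names and the projection matrices `w_q`, `w_k`, `w_v` are assumptions introduced for the example.

```python
import math

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def task_attention(task_token, patch_tokens, w_q, w_k, w_v):
    """Single-head Task-Attention: the query is built from the task token
    only, while keys and values come from z_i = [task token; patch tokens]."""
    z = [task_token] + patch_tokens           # concatenation z_i
    query = matvec(w_q, task_token)           # a single query vector
    keys = [matvec(w_k, tok) for tok in z]
    values = [matvec(w_v, tok) for tok in z]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]          # softmax over N + 1 tokens
    dim = len(values[0])
    output = [sum(a * v[d] for a, v in zip(attn, values)) for d in range(dim)]
    return output, attn

identity = [[1.0, 0.0], [0.0, 1.0]]           # toy projections
theta = [1.0, 0.0]                            # task token
patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
out, attn = task_attention(theta, patches, identity, identity, identity)
print(attn)  # one weight for the task token and one per patch token
```

Because only the task token produces a query, the block returns one embedding per task token instead of one per input token, and the patch aligned with the task token receives the largest weight among the patches.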

This process outputs a final embedding per task, and the final embedding $e_i$ is fed into a task-specific classifier. The task-specific classifier consists of a normalization layer $Norm_i$ and a parametrized linear projection.
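Written as a formula, the task-specific classifier described above takes roughly the following form. The symbols $W_i$ and $b_i$ (the weight and bias of the linear projection) and $\tilde{y}_i$ (the logits for the classes of task $i$) are notation assumed here for illustration.

$$\tilde{y}_i = W_i \, Norm_i(e_i) + b_i$$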
