0516 :: shine

사과나무심기 2024. 5. 16. 23:15

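I was confused about how an LM runs inference from a prompt.

LLaMA and GPT are decoder-only models,
so does that mean even the prompt gets generated token by token?
That would be far too inefficient and wouldn't make sense, would it?
That's what I was wondering.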

Quite a few people had wondered about something similar.
https://www.reddit.com/r/LanguageTechnology/comments/16nl811/decoderonly_transformer_models_still_have_an/?rdt=64739

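After looking into it, it turns out, as I suspected,
the prompt is not generated token by token.
The whole prompt goes in as input at once,
and only then are new tokens generated.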
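Why is one pass over the prompt enough? The minimal pure-Python sketch below shows single-head causal self-attention. It assumes toy 2-dimensional embeddings and, for simplicity, uses the embeddings directly as Q, K and V. A real model applies learned projections and stacks many layers. Because of the causal mask, position i only looks at positions 0..i. So pushing the whole prompt through in one call gives every position the same output as feeding the prompt one token at a time. The next token is then predicted from the last position.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head causal self-attention over a whole sequence in one call.

    q, k, v are lists of vectors, one per position. Position i may only
    attend to positions 0..i (the causal mask)."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # scores only for j <= i: later positions are masked out
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out

# Toy prompt embeddings (assumption: 4 tokens, 2 dims, Q = K = V = embedding)
prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Whole prompt in a single pass
full_pass = causal_attention(prompt, prompt, prompt)

# Token by token: run each prefix and keep only its last position
one_by_one = [causal_attention(prompt[:t], prompt[:t], prompt[:t])[-1]
              for t in range(1, len(prompt) + 1)]

# The next token would be predicted from this last-position vector
last_hidden = full_pass[-1]
```

`full_pass` and `one_by_one` contain the same vectors. The prompt never has to be "generated." It only has to be read once, in parallel.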

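The same goes for Llama.
The input is the prompt, not a single token.
(After tokens have been generated, the input is the prompt + the generated tokens.
In practice, implementations usually cache the keys/values of earlier positions (the KV cache), so each later step only pushes the newest token through the model.)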
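The loop above can be sketched like this. `toy_model` and `TOY_NEXT` are hypothetical stand-ins for a real LM, and they just look up the last token. The point is only what gets passed in at each step: first the whole prompt, then the prompt plus everything generated so far. This sketch recomputes over the full sequence every step and does not model a KV cache.

```python
# Hypothetical stand-in for an LM: a real decoder would run a forward pass
# over all of `tokens` and take the argmax of the last position's logits.
TOY_NEXT = {"I": "like", "like": "apple", "apple": "trees", "trees": "<eos>"}

def toy_model(tokens):
    return TOY_NEXT.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=10, eos="<eos>"):
    tokens = list(prompt_tokens)      # step 0: the input is the whole prompt
    inputs_seen = []                  # record what the model received each step
    for _ in range(max_new_tokens):
        inputs_seen.append(list(tokens))
        next_token = toy_model(tokens)  # input = prompt + generated so far
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens[len(prompt_tokens):], inputs_seen

generated, inputs_seen = generate(["I", "like"])
```

Here `generated` is `["apple", "trees"]`. The model saw `["I", "like"]`, then `["I", "like", "apple"]`, then `["I", "like", "apple", "trees"]`.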

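**While I'm at it, a quick config summary
Because of GQA (grouped-query attention), the query heads and the K/V heads get separate settings; the layer count is listed alongside:
num_attention_heads (number of query heads)
num_key_value_heads (number of K, V heads)
num_hidden_layers (number of transformer layers)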
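The sketch below shows how the three fields fit together. `AttnConfig` and its methods are hypothetical helpers, not the Hugging Face `LlamaConfig` API. `hidden_size` is an extra field I'm assuming, with head_dim = hidden_size // num_attention_heads. Under GQA, the query heads are split into num_key_value_heads groups, and each group shares one K/V head. Compared with plain multi-head attention, this shrinks the KV cache by a factor of num_attention_heads / num_key_value_heads.

```python
from dataclasses import dataclass

@dataclass
class AttnConfig:
    hidden_size: int          # assumed extra field, used to derive head_dim
    num_attention_heads: int  # number of query heads
    num_key_value_heads: int  # number of K/V heads (GQA)
    num_hidden_layers: int    # number of transformer layers

    def __post_init__(self):
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("num_attention_heads must be divisible by num_key_value_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads

    @property
    def group_size(self) -> int:
        # how many query heads share a single K/V head
        return self.num_attention_heads // self.num_key_value_heads

    def kv_head_for(self, q_head: int) -> int:
        # consecutive query heads are grouped onto the same K/V head
        return q_head // self.group_size

    def kv_cache_bytes_per_token(self, bytes_per_elem: int = 2) -> int:
        # K and V (factor 2), for every layer and every K/V head; 2 bytes = fp16/bf16
        return (2 * self.num_hidden_layers * self.num_key_value_heads
                * self.head_dim * bytes_per_elem)

# Illustrative Llama-style numbers, not read from any particular config.json
gqa = AttnConfig(hidden_size=4096, num_attention_heads=32,
                 num_key_value_heads=8, num_hidden_layers=32)
mha = AttnConfig(hidden_size=4096, num_attention_heads=32,
                 num_key_value_heads=32, num_hidden_layers=32)
```

With these numbers, each K/V head is shared by 4 query heads. An fp16 KV cache then takes 2 * 32 * 8 * 128 * 2 = 131,072 bytes per token, a quarter of the 524,288 bytes needed with 32 K/V heads.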

shine_hyun