LLM Training

These are my notes from the highly recommended book https://www.manning.com/books/build-a-large-language-model-from-scratch, along with some additional information.

Basic Information

Start by reading this post, which explains the basic concepts you need to know:

0. Basic LLM Concepts

1. Tokenization

The goal of this initial phase is simple: split the input into tokens (IDs) in a way that makes sense.
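As a rough illustration of turning text into token IDs, here is a minimal word-level sketch. The helper names (`tokenize`, `build_vocab`, `encode`, `decode`) and the regular expression are assumptions made for this example; production tokenizers typically use subword schemes such as byte-pair encoding instead.

```python
import re

def tokenize(text):
    # Split into words and single punctuation marks; whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(text):
    # Map each unique token to an integer ID (sorted for determinism).
    return {tok: idx for idx, tok in enumerate(sorted(set(tokenize(text))))}

def encode(text, vocab):
    # Text -> list of token IDs. Unknown tokens would raise KeyError here.
    return [vocab[tok] for tok in tokenize(text)]

def decode(ids, vocab):
    # Token IDs -> text, joining tokens with single spaces.
    inverse = {idx: tok for tok, idx in vocab.items()}
    return " ".join(inverse[i] for i in ids)

sample = "the cat sat on the mat."
vocab = build_vocab(sample)
ids = encode(sample, vocab)
print(ids)                 # [5, 1, 4, 3, 5, 2, 0]
print(decode(ids, vocab))  # the cat sat on the mat .
```

Note that the round trip is not exact (a space appears before the final period), which is one reason real tokenizers encode whitespace explicitly.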

2. Data Sampling

The goal of this second phase is simple: sample the input data and prepare it for the training phase. This usually means splitting the dataset into sentences of a specific length and also generating the expected response for each one.

https://github.com/HackTricks-wiki/hacktricks/blob/kr/todo/llm-training-data-preparation/2.-data-sampling.md
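One common way to do this for next-token prediction is a sliding window: each input chunk is paired with the same chunk shifted one token to the right. The sketch below assumes that approach; `make_samples`, `max_length` and `stride` are names chosen for this example.

```python
def make_samples(token_ids, max_length, stride):
    # Slide a window over the token stream. The target is the input
    # shifted by one position, so every input token has a "next token".
    samples = []
    for start in range(0, len(token_ids) - max_length, stride):
        inputs = token_ids[start:start + max_length]
        targets = token_ids[start + 1:start + max_length + 1]
        samples.append((inputs, targets))
    return samples

token_ids = list(range(10))  # stand-in for real token IDs
pairs = make_samples(token_ids, max_length=4, stride=4)
for inputs, targets in pairs:
    print(inputs, "->", targets)
# [0, 1, 2, 3] -> [1, 2, 3, 4]
# [4, 5, 6, 7] -> [5, 6, 7, 8]
```

A `stride` smaller than `max_length` makes the windows overlap, which yields more samples from the same data at the cost of repetition.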

3. Token Embeddings

The goal of this third phase is simple: assign each token in the vocabulary a vector of the desired dimension to train the model. Each word in the vocabulary becomes a point in an X-dimensional space. The initial position of each word is set "randomly", and these positions are trainable parameters (they improve during training).

In addition, a second embedding layer is created during token embedding. It represents (in this case) the absolute position of the word in the training sentence. As a result, the same word at different positions in a sentence gets a different representation (meaning).
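The two lookup tables can be sketched in plain Python as follows. This is only an illustration under assumed sizes (`vocab_size`, `context_length`, `dim`); a real model would hold these tables as trainable tensors in a deep-learning framework.

```python
import random

random.seed(0)  # fixed seed so the "random" initialization is repeatable

def make_table(rows, dim):
    # One randomly initialized vector per row; these would be trained later.
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]

vocab_size, context_length, dim = 6, 4, 3
token_table = make_table(vocab_size, dim)         # one vector per token ID
position_table = make_table(context_length, dim)  # one vector per position

def embed(token_ids):
    # Final input vector = token embedding + absolute position embedding.
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]

vectors = embed([5, 1, 5, 2])
# Token 5 appears at positions 0 and 2 and gets two different vectors.
print(vectors[0] != vectors[2])  # True
```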

4. Attention Mechanisms

The goal of this fourth phase is simple: apply attention mechanisms. These are many repeated layers that capture the relationship between a word in the vocabulary and its neighbors in the current sentence. Because many layers are used, many trainable parameters end up capturing this information.
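To make the idea concrete, here is a simplified sketch of causal scaled dot-product self-attention. It deliberately omits the trainable query, key and value projections and the multi-head split that a real layer has, so the input vectors are used directly; the function names are assumptions for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    # x: list of token vectors. Returns context vectors and attention weights.
    dim = len(x[0])
    output, all_weights = [], []
    for i, query in enumerate(x):
        # Causal mask: position i only attends to positions 0..i.
        scores = [
            sum(q * k for q, k in zip(query, x[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Context vector: weighted sum of the visible token vectors.
        context = [
            sum(w * x[j][d] for j, w in enumerate(weights))
            for d in range(dim)
        ]
        output.append(context)
        all_weights.append(weights)
    return output, all_weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = causal_self_attention(x)
print(context[0])  # [1.0, 0.0]  (the first token can only attend to itself)
```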

5. LLM Architecture

The goal of this fifth phase is simple: develop the architecture of the full LLM. Put everything together, apply all the layers, and create all the functions needed to generate text or to convert text to IDs and back.

This architecture is also used to predict text after training.
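The text-to-IDs, model, IDs-to-text pipeline can be sketched with a stand-in model. Everything here is hypothetical: `toy_model` is not an LLM (it just favors the next vocabulary entry), and the four-word `VOCAB` exists only to show how a greedy generation loop wraps whatever architecture is plugged in.

```python
VOCAB = ["<end>", "hello", "world", "again"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def toy_model(ids):
    # Hypothetical stand-in for the LLM: one logit per vocabulary entry,
    # with the highest logit on the entry after the last input token.
    next_id = (ids[-1] + 1) % len(VOCAB)
    return [1.0 if i == next_id else 0.0 for i in range(len(VOCAB))]

def generate(model, ids, max_new_tokens, context_length=3):
    ids = list(ids)
    for _ in range(max_new_tokens):
        logits = model(ids[-context_length:])  # crop to the context window
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        ids.append(next_id)
    return ids

def text_to_ids(text):
    return [TOKEN_TO_ID[tok] for tok in text.split()]

def ids_to_text(ids):
    return " ".join(VOCAB[i] for i in ids)

out = ids_to_text(generate(toy_model, text_to_ids("hello"), max_new_tokens=2))
print(out)  # hello world again
```

Greedy selection is only one decoding choice; sampling from the softmax of the logits is a common alternative.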

6. Pre-training & Loading models

The goal of this sixth phase is simple: train the model from scratch. The previous LLM architecture is used, with loops that iterate over the datasets and apply the defined loss function and optimizer to train all of the model's parameters.

https://github.com/HackTricks-wiki/hacktricks/blob/kr/todo/llm-training-data-preparation/6.-pre-training-and-loading-models.md
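The shape of such a training loop can be shown on a much smaller problem. The sketch below swaps the full architecture for a bigram logit table (an assumption made to keep it dependency-free) and writes the cross-entropy loss and a plain SGD update by hand; a real run would use the architecture from the previous step and a framework optimizer.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(pairs, vocab_size, epochs=50, lr=0.5):
    # One row of logits per current token: the only trainable parameters here.
    weights = [[0.0] * vocab_size for _ in range(vocab_size)]
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for current, target in pairs:
            probs = softmax(weights[current])
            total_loss += -math.log(probs[target])  # cross-entropy loss
            for j in range(vocab_size):
                # Gradient of the loss w.r.t. the logits: probs - one_hot(target)
                grad = probs[j] - (1.0 if j == target else 0.0)
                weights[current][j] -= lr * grad     # SGD step
        history.append(total_loss / len(pairs))      # mean loss for this epoch
    return weights, history

pairs = [(0, 1), (1, 2), (2, 0)]  # (current token, next token) training pairs
weights, history = train_bigram(pairs, vocab_size=3)
print(history[0] > history[-1])  # True: the loss goes down over the epochs
```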

7.0. LoRA Improvements in fine-tuning

Using LoRA greatly reduces the computation needed to fine-tune already trained models.
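The saving comes from freezing the original weight matrix W and training only a low-rank update B·A. The sketch below compares parameter counts and shows the forward pass; the layer size (768 x 768), the rank and the `alpha` scaling value are illustrative assumptions, not figures from these notes.

```python
def lora_param_counts(d_in, d_out, rank):
    # Full fine-tuning trains d_in * d_out weights; LoRA trains only
    # A (rank x d_in) and B (d_out x rank).
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    base = matvec(W, x)               # frozen pre-trained weights
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

full, lora = lora_param_counts(768, 768, rank=8)
print(full, lora)  # 589824 12288

# B starts at zero, so before any training the layer behaves exactly like W.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # rank 1, shape (rank, d_in)
B = [[0.0], [0.0]]    # shape (d_out, rank)
y = lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, rank=1)
print(y)  # [2.0, 3.0]
```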

7.1. Fine-Tuning for Classification

The goal of this section is to show how to fine-tune an already pre-trained model so that, instead of generating new text, the LLM outputs the probability of the given text belonging to each of the given categories (for example, whether a text is spam or not).

https://github.com/HackTricks-wiki/hacktricks/blob/kr/todo/llm-training-data-preparation/7.1.-fine-tuning-for-classification.md
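One way to get class probabilities is to replace the vocabulary-sized output layer with a small classification head and apply softmax to its logits. The sketch below assumes that setup; the hidden vector and `head_weights` values are made up for illustration, whereas in practice they would come from the model and from fine-tuning.

```python
import math

def classify(last_hidden, head_weights, labels):
    # One logit per class: dot product of each head row with the hidden vector.
    logits = [sum(w * h for w, h in zip(row, last_hidden)) for row in head_weights]
    # Softmax turns the logits into probabilities that sum to 1.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

last_hidden = [0.5, -1.0, 2.0]      # hypothetical hidden state of the last token
head_weights = [[0.1, 0.2, -0.3],   # row for "not spam"
                [0.0, -0.5, 0.8]]   # row for "spam"
probs = classify(last_hidden, head_weights, ["not spam", "spam"])
print(max(probs, key=probs.get))  # spam
```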

7.2. Fine-Tuning to follow instructions

The goal of this section is to show how to fine-tune an already pre-trained model to follow instructions rather than just generate text, for example, responding to tasks as a chatbot.
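Instruction fine-tuning usually starts by rendering each training record into a fixed prompt layout. The sketch below assumes an Alpaca-style template and hypothetical field names (`instruction`, `input`, `output`); these notes do not specify a format, so treat both as placeholders.

```python
def format_example(entry):
    # Build the prompt the model sees and the response it should learn to produce.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    if entry.get("input"):  # the input field is optional
        prompt += f"\n\n### Input:\n{entry['input']}"
    response = f"\n\n### Response:\n{entry['output']}"
    return prompt, response

entry = {"instruction": "Translate to French.", "input": "cat", "output": "chat"}
prompt, response = format_example(entry)
print(prompt + response)
```

During training the prompt and response are concatenated and tokenized as one sequence, so the model learns to continue a prompt with the matching response.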
