This video tutorial provides an intuitive, in-depth breakdown of how an LLM learns language and uses that learning to generate text. The key concepts listed below are covered both broadly and in depth, keeping the material accessible without sacrificing technical rigor:
* Historical context for LLMs and GenAI
* Training an LLM -- a 100,000-foot overview
* What does an LLM learn during training?
* Inferencing an LLM -- a 100,000-foot overview
* 3 steps in the LLM journey from pre-training to serving
* Word Embeddings -- representing text in numeric format (sketch below)
* RMS Normalization -- the sound engineer of the Transformer (sketch below)
* Benefits of RMS Normalization over Layer Normalization
* Rotary Position Encoding (RoPE) -- making the Transformer aware of token position (sketch below)
* Masked Self-Attention -- making the Transformer understand context (sketch below)
* How RoPE generalizes well, making long-context LLMs possible
* Understanding what Causal Masking is (intuition and benefit)
* Multi-Head Attention -- improving stability of Self-Attention
* Residual Connections -- improving stability of learning
* Feed Forward Network
* SwiGLU Activation Function (sketch below)
* Stacking
* Projection Layer -- Next Token Prediction
* Inferencing a Large Language Model
* Step-by-step next-token generation to form sentences
* Perplexity Score -- how well the model did (sketch below)
* Next Token Selector -- Greedy Sampling (greedy, top-k, top-p, and temperature share one sketch below)
* Next Token Selector -- Top-k Sampling
* Next Token Selector -- Top-p/Nucleus Sampling
* Temperature -- making an LLM's generation more creative
* Instruction finetuning -- aligning an LLM's responses
* Learning going forward
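
To make a few of the bullets above concrete, here are some short, self-contained NumPy sketches. They are illustrative toy code only, not code from the video, and every name, size, and value in them is made up. First, word embeddings: each token id simply indexes a row of a learned embedding matrix.

```python
import numpy as np

# Toy vocabulary and a randomly initialized embedding table; in a real LLM
# the table is learned during pre-training and the vocabulary is far larger.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 8                                   # embedding width, illustrative
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]   # text -> token ids
embeddings = embedding_table[token_ids]                 # ids -> vectors
print(embeddings.shape)                                 # (3, 8)
```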
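
A minimal sketch of RMS Normalization, assuming a learnable per-dimension gain and a small epsilon for numerical stability. Unlike Layer Normalization it does not subtract the mean and carries no bias term, which is part of why it is cheaper.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # x: (..., d_model). Scale each vector by its root-mean-square,
    # then apply the learned per-dimension gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

d_model = 8
x = np.random.default_rng(1).normal(size=(4, d_model))
gain = np.ones(d_model)          # learned in practice; ones here for the demo
print(rms_norm(x, gain).shape)   # (4, 8)
```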
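
A sketch of Rotary Position Encoding in the common "rotate-half" implementation form: pairs of query/key dimensions are rotated by an angle that grows with the token's position, so relative position falls out of the dot product. The base of 10000 follows the RoFormer paper; the shapes here are illustrative.

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, d) with d even; returns x with rotary positions applied.
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)        # one frequency per pair
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                # the two halves to rotate
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.default_rng(2).normal(size=(5, 8))     # 5 tokens, d = 8
print(rope(q).shape)                                 # (5, 8)
```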
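
A single-head sketch of masked (causal) self-attention: scores for future positions are set to negative infinity before the softmax, so each token can only attend to itself and earlier tokens. The projection matrices are random stand-ins for learned weights.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (seq_len, seq_len)
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)       # hide future tokens
    return softmax(scores) @ v                       # weighted mix of values

rng = np.random.default_rng(3)
d = 8
x = rng.normal(size=(5, d))
Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
print(causal_self_attention(x, Wq, Wk, Wv).shape)    # (5, 8)
```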
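
A sketch of a SwiGLU feed-forward block in the LLaMA style: a SiLU-gated projection is multiplied element-wise with a second "up" projection and then projected back down. Dimensions and weights are illustrative only.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))        # equivalent to z * sigmoid(z)

def swiglu_ffn(x, W_gate, W_up, W_down):
    # x: (seq_len, d_model); gate with SiLU, multiply, project back down.
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(4)
d_model, d_ff = 8, 32
x = rng.normal(size=(5, d_model))
W_gate = rng.normal(size=(d_model, d_ff))
W_up = rng.normal(size=(d_model, d_ff))
W_down = rng.normal(size=(d_ff, d_model))
print(swiglu_ffn(x, W_gate, W_up, W_down).shape)     # (5, 8)
```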
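
Perplexity in one line: the exponential of the average negative log-likelihood the model assigned to the tokens that actually came next. The probabilities below are made up purely to show the arithmetic; lower perplexity means better next-token prediction.

```python
import numpy as np

def perplexity(true_token_probs):
    # true_token_probs[i] = model's probability for the correct token at step i
    nll = -np.log(np.asarray(true_token_probs))
    return float(np.exp(nll.mean()))

print(perplexity([0.25, 0.5, 0.125]))    # higher probabilities -> lower perplexity
```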
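
Finally, one combined sketch of the next-token selectors: greedy picks the argmax, top-k samples from the k most likely tokens, top-p (nucleus) samples from the smallest set whose probabilities sum to at least p, and temperature rescales the logits before the softmax (higher temperature flattens the distribution and makes generation more varied). The logits are invented; only the selection logic matters.

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def greedy(logits):
    return int(np.argmax(logits))                    # always the top token

def top_k_sample(logits, k=3, temperature=1.0):
    probs = softmax(logits / temperature)
    top = np.argsort(probs)[-k:]                     # k most likely token ids
    return int(rng.choice(top, p=probs[top] / probs[top].sum()))

def top_p_sample(logits, p=0.9, temperature=1.0):
    probs = softmax(logits / temperature)
    order = np.argsort(probs)[::-1]                  # most likely first
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    nucleus = order[:cutoff]                         # smallest set covering p
    return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))

logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])        # hypothetical scores
print(greedy(logits), top_k_sample(logits), top_p_sample(logits))
```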