Resources & references

Curated, low-noise links. Skip the AI-influencer YouTube feeds; the materials below are the ones that actually move the needle.

Official

Math

Classical machine learning

Deep learning

Transformers & modern AI

Practice grounds

Tools you'll keep using

Vocabulary cheat sheet

| Term | What it means |
| --- | --- |
| Epoch | One full pass through the training data |
| Batch / mini-batch | Subset of the training data used in one gradient step |
| Learning rate (LR) | Step size for the optimizer update |
| Logits | Raw, unnormalized model outputs before softmax / sigmoid |
| Cross-entropy | Loss for classification: −Σᵢ yᵢ log(pᵢ) |
| Backpropagation | Chain-rule algorithm for computing gradients through a network |
| SGD / Adam / AdamW | Optimizers (update rules for parameters from gradients) |
| Overfitting | Training loss low, validation loss high: the model memorized noise |
| Regularization | Techniques to reduce overfitting (dropout, weight decay, etc.) |
| Embedding | Dense vector representation of a discrete token / category |
| Attention | Mechanism that weighs input positions when producing each output |
| Self-attention | Attention where Q, K, V all come from the same sequence |
| Pre-train / fine-tune | Train on big general data, then adapt on small task-specific data |
| LoRA / PEFT | Parameter-efficient fine-tuning: train a small adapter, freeze the base |
| Inference | Running a trained model on new inputs (no gradient updates) |
| Token | Discrete unit fed to a language model (subword, character, or word) |
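
Most of the training-side terms in the table show up together in any supervised loop. Here is a minimal PyTorch sketch; the linear model and the synthetic data are placeholders chosen purely to make the loop runnable:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic 2-class data, purely for illustration.
X = torch.randn(512, 20)
y = torch.randint(0, 2, (512,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # mini-batches

model = nn.Linear(20, 2)                                  # tiny stand-in model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)      # optimizer + learning rate
loss_fn = nn.CrossEntropyLoss()                           # cross-entropy on logits

for epoch in range(5):            # one epoch = one full pass through the data
    for xb, yb in loader:         # one mini-batch per gradient step
        logits = model(xb)        # raw, unnormalized outputs
        loss = loss_fn(logits, yb)  # CrossEntropyLoss applies softmax internally
        opt.zero_grad()
        loss.backward()           # backpropagation: chain-rule gradients
        opt.step()                # parameter update from the gradients

# Inference: running the trained model on new inputs, no gradient updates.
model.eval()
with torch.no_grad():
    preds = model(torch.randn(4, 20)).argmax(dim=-1)
```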
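
And a bare-bones illustration of self-attention as defined above: Q, K, and V are all projections of the same sequence of embedding vectors. The sequence length, dimension, and random weights are arbitrary demo values:

```python
import torch
import torch.nn.functional as F

d = 16                                  # embedding dimension (arbitrary for the demo)
tokens = torch.randn(10, d)             # a sequence of 10 embedded tokens

# Self-attention: Q, K, V all come from the same sequence.
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv

scores = Q @ K.T / d ** 0.5             # scaled dot-product: position-vs-position scores
weights = F.softmax(scores, dim=-1)     # each row sums to 1: the attention weights
out = weights @ V                       # each output is a weighted mix of all positions
```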
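
Finally, the LoRA row in one picture: freeze the base weights, train only a small low-rank adapter. This is a from-scratch sketch of the idea, not the `peft` library's API; the rank and initialization choices are illustrative:

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a small trainable low-rank adapter."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze the base
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))  # starts as a no-op

    def forward(self, x):
        return self.base(x) + x @ self.A @ self.B    # base output + low-rank update

layer = LoRALinear(nn.Linear(20, 20), rank=4)        # only A and B receive gradients
```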