Resources & references
Curated, low-noise links. Skip the AI-influencer YouTube feeds; these are the materials that actually move the needle.
Official
- usaaio.org — Contest home, registration, syllabus, past problem archive, important dates.
- IOAI official site — International Olympiad in AI; problems and country results.
Math
- 3Blue1Brown — Essence of Linear Algebra (YouTube). The geometric intuition for vectors, matrices, determinants, eigenvectors. Watch it once a year.
- 3Blue1Brown — Essence of Calculus. Same treatment for derivatives, chain rule, integrals.
- Mathematics for Machine Learning — Deisenroth, Faisal, Ong. Free PDF at mml-book.github.io. The exact subset of math used in ML, no more.
- Khan Academy — fill in any high-school gaps in algebra, probability, calculus.
Classical machine learning
- Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow — Aurélien Géron. Read Parts I & II (skip the TensorFlow chapters; do PyTorch on the side).
- An Introduction to Statistical Learning (ISL) — James, Witten, Hastie, Tibshirani. Free PDF at statlearning.com. The conceptual foundation for every model in scikit-learn.
- scikit-learn user guide at scikit-learn.org. Worth reading the supervised-learning chapter cover-to-cover; a minimal fit/predict sketch follows this list.
- Kaggle Learn — short interactive lessons on pandas, ML, feature engineering. Free.
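
To see what the scikit-learn workflow these resources teach looks like in practice, here is a minimal fit-and-evaluate sketch. The dataset and model (a built-in toy dataset and a random forest) are arbitrary choices for illustration, not something the books above prescribe.

```python
# Minimal scikit-learn workflow: load data, split, fit, evaluate.
# Dataset and model are placeholder choices for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_val)
print(f"Validation accuracy: {accuracy_score(y_val, preds):.3f}")
```

Every scikit-learn estimator follows this same fit / predict pattern, which is why the user guide's supervised-learning chapter transfers so directly between models.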
Deep learning
- Deep Learning — Goodfellow, Bengio, Courville. Free at deeplearningbook.org. Dense; read Chapters 6–9 first.
- PyTorch official tutorials at pytorch.org/tutorials. Start with "Learn the Basics" and "Quickstart"; a bare-bones training loop follows this list.
- Karpathy's "Neural Networks: Zero to Hero" series (YouTube). Build everything from scratch in pure Python, then PyTorch. Watch micrograd and makemore first.
- fast.ai Practical Deep Learning for Coders at course.fast.ai. Top-down, hands-on. Free.
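
As a preview of what the PyTorch tutorials and Karpathy's videos build toward, here is a bare-bones training loop on synthetic data. The tiny MLP, batch size, and learning rate are placeholder choices, not taken from any of the courses above.

```python
# Minimal PyTorch training loop on synthetic data (illustration only).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fake dataset: 256 samples, 20 features, 3 classes.
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()            # expects raw logits, not probabilities
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(5):                     # one epoch = one full pass over the data
    for i in range(0, len(X), 32):         # mini-batches of 32
        xb, yb = X[i:i + 32], y[i:i + 32]
        logits = model(xb)                 # forward pass
        loss = loss_fn(logits, yb)
        optimizer.zero_grad()
        loss.backward()                    # backpropagation
        optimizer.step()                   # parameter update from gradients
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Swap the fake tensors for a real DataLoader and the model for something bigger, and this is still the skeleton of nearly every training script you will write.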
Transformers & modern AI
- "Attention Is All You Need" (Vaswani et al., 2017). The original transformer paper. Read it after watching Karpathy's GPT-from-scratch video.
- Karpathy "Let's build GPT: from scratch, in code, spelled out" (YouTube). Best single resource for understanding transformer training end-to-end.
- The Illustrated Transformer — Jay Alammar at jalammar.github.io. Visual companion to the original paper.
- Hugging Face course at huggingface.co/learn. Practical fine-tuning with the transformers library.
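
If you want the core equation of the Vaswani et al. paper in code before committing to the full resources, here is a minimal single-head self-attention sketch, softmax(QKᵀ/√d)·V, with random weights and no masking or multi-head machinery. The toy dimensions are arbitrary.

```python
# Single-head scaled dot-product self-attention:
# softmax(Q K^T / sqrt(d)) V, from "Attention Is All You Need".
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_model = 8, 32                  # toy sizes, chosen arbitrarily

x = torch.randn(seq_len, d_model)         # one sequence of token embeddings

# In self-attention, Q, K, V are all projections of the same sequence.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / math.sqrt(d_model)     # (seq_len, seq_len) similarity scores
weights = F.softmax(scores, dim=-1)       # each row sums to 1
out = weights @ V                          # weighted mix of value vectors

print(out.shape)                           # torch.Size([8, 32])
```

Multi-head attention runs several of these in parallel with smaller per-head dimensions and concatenates the results, which is the main extra step Karpathy's video and the Illustrated Transformer walk through.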
Practice grounds
- Kaggle — Getting Started competitions (Titanic, House Prices, MNIST) for fundamentals, then Featured competitions when you're ready.
- AIcrowd — Hosts research-style ML competitions.
- Codeforces / AtCoder — Not AI, but algorithm contests sharpen the coding speed you'll need under contest pressure.
- Hugging Face Spaces — Ship a working demo of every model you train. Cheap motivation.
Tools you'll keep using
- Python 3.11+, venv or conda for environments.
- JupyterLab for exploratory work, VS Code for serious editing.
- Git + GitHub for the portfolio repo. Commit every notebook.
- Weights & Biases (free for students) for experiment tracking once you start running many DL experiments.
- Google Colab as a free GPU backup. Useful when local hardware can't keep up.
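
If Weights & Biases is new to you, the whole tracking pattern is init, log, finish. The project name and logged metric below are placeholders, not conventions the tool requires.

```python
# Minimal experiment-tracking pattern with Weights & Biases.
# "usaaio-prep" and the logged values are placeholders.
import wandb

run = wandb.init(project="usaaio-prep", config={"lr": 1e-3, "batch_size": 32})

for step in range(100):
    fake_loss = 1.0 / (step + 1)           # stand-in for your real training loss
    wandb.log({"train/loss": fake_loss})   # one data point per call, plotted live

run.finish()
```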
Vocabulary cheat sheet
| Term | What it means |
|---|---|
| Epoch | One full pass through the training data |
| Batch / mini-batch | Subset of training data used in one gradient step |
| Learning rate (LR) | Step size for the optimizer update |
| Logits | Raw, unnormalized model outputs before softmax / sigmoid |
| Cross-entropy | Loss for classification: −Σ y log(p) |
| Backpropagation | Chain-rule algorithm for computing gradients through a network |
| SGD / Adam / AdamW | Optimizers (update rules for parameters from gradients) |
| Overfitting | Training loss low, validation loss high — model memorized noise |
| Regularization | Techniques to reduce overfitting (dropout, weight decay, etc.) |
| Embedding | Dense vector representation of a discrete token / category |
| Attention | Mechanism that weighs input positions when producing each output |
| Self-attention | Attention where Q, K, V all come from the same sequence |
| Pre-train / fine-tune | Train on big general data, then adapt on small task-specific data |
| LoRA / PEFT | Parameter-efficient fine-tuning — train a small adapter, freeze the base |
| Inference | Running a trained model on new inputs (no gradient updates) |
| Token | Discrete unit fed to a language model (subword, character, or word) |
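
To tie several rows of the table together, here is the cross-entropy formula computed by hand from raw logits and checked against PyTorch's built-in loss. The numbers are arbitrary.

```python
# Logits -> softmax -> cross-entropy, matching the table's -Σ y log(p).
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])    # raw, unnormalized model outputs
target = torch.tensor([0])                    # true class index

probs = F.softmax(logits, dim=-1)             # normalize logits to probabilities
manual = -torch.log(probs[0, target])         # -log p(true class), i.e. -Σ y log(p)
builtin = F.cross_entropy(logits, target)     # PyTorch fuses softmax + NLL

print(probs, manual.item(), builtin.item())   # manual and built-in values match
```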