UCLA Stats 404 - Statistical Computing and Programming

Course Overview (Python), Syllabus, and GitHub repository

  • Week 1: Setting up machine learning environment
    • Git, virtual environment, Jupyter Lab, PyCharm
    • Lab 1 Solution
  • Weeks 2 and 3: Introduction to Python and pandas
    • Python: expressions, control flow, functions, variable types, passing by reference, list comprehensions, functional programming
    • pandas: reading in data, subsetting, EDA, split + apply + combine, pandas + databases
    • Lab 2 Solution
  • Weeks 4 and 5: Regression methods + numerical optimization + loss functions
    • linear and logistic regression, elastic nets, PCA regression, hyperparameter tuning, deep learning, and custom loss functions
  • Week 6: Python and Big Data
    • pandas and big data, Dask, PySpark + SparkSQL, embarrassingly parallel processes, AWS S3
  • Weeks 7 and 8: Introduction to Software Development
    • robustness, reproducibility, readability
    • testing suite, ML test, typing, model roll-out
  • Weeks 9 and 10: Final Project Presentations
    • Presentations evaluation form