UCLA Stats 404 - Statistical Computing and Programming

Course overview (Python): website, syllabus, GitHub repository, class prerequisites, and software installation instructions

  • Week 1: Business use case, setting up reproducible machine learning environment, introduction to Git

  • Weeks 2 and 3: Introduction to Python, pandas and SQL

    • Python: expressions, control flow, functions, variable types, passing by reference, list comprehension, functional programming

    • pandas: reading in data, subsetting, EDA, split + apply + combine, pandas + databases

  • Weeks 4 and 5: Introduction to ML proof of concept (POC)

    • linear and logistic regression, elastic nets, PCA regression, hyperparameter tuning, deep learning, and custom loss functions

  • Week 6: Improving POC by understanding computational constraints

    • pandas and big data, Dask, PySpark + Spark SQL, embarrassingly parallel processes, AWS S3

  • Weeks 7 and 8: Productionalizing POC

    • reproducibility, readability, robustness

    • testing suite, ML test, typing, model roll-out

  • Weeks 9 and 10: Final Project Presentations

    • In-class presentations

  • Week 11: Finals Week