UCLA Stats 404 - Statistical Computing and Programming
Course overview (Python): website, syllabus, GitHub repository, class prerequisites, and software installation instructions
Week 1: Business use case, setting up a reproducible machine learning environment, introduction to Git
Weeks 2 and 3: Introduction to Python, pandas and SQL
Python: expressions, control flow, functions, variable types, passing by reference, list comprehensions, functional programming
pandas: reading in data, subsetting, EDA, split + apply + combine, pandas + databases
Weeks 4 and 5: Introduction to ML POC
linear and logistic regression, elastic nets, PCA regression, hyperparameter tuning, deep learning and custom loss functions
Week 6: Improving POC by understanding computational constraints
pandas and big data, Dask, PySpark + Spark SQL, embarrassingly parallel processes, AWS S3
Weeks 7 and 8: Productionalizing POC
reproducibility, readability, robustness
test suite, ML tests, typing, model rollout
Weeks 9 and 10: Final Project Presentations
In-class presentations
Week 11: Finals Week
Final project due