UCLA Stats 404 - Statistical Computing and Programming
Goal of course: Prepare students for Data Scientist or Machine Learning Engineer roles in industry by teaching marketable skills and best practices for collaborating with technical and non-technical stakeholders.
By the end of the course, students should be able to:
Explain how their work contributes to the business;
Implement iterative model development;
Write production-ready code that runs on more than just their own computer;
Answer a business question end-to-end with ML;
Work comfortably in Python, SQL and Git.
Agenda (subject to change):
Week 1: Business use case, setting up a reproducible machine learning environment, introduction to Git
Weeks 2 and 3: Introduction to Python, pandas and SQL
Python: expressions, control flow, functions, variable types, passing by reference, list comprehension, functional programming
pandas: reading in data, subsetting, EDA, split + apply + combine, pandas + databases
Weeks 4 and 5: Introduction to ML POC
linear and logistic regression, elastic nets, PCA regression, hyper-parameter tuning, deep learning and custom loss functions
Week 6: Improving POC by understanding computational constraints
pandas and big data, Dask, PySpark + Spark SQL, embarrassingly parallel processes, AWS S3
Weeks 7 and 8: Productionalizing POC
reproducibility, readability, robustness
testing suite, ML test, typing, model roll-out
Weeks 9 and 10: Final Project Presentations
In-class presentations
Week 11: Finals Week
Final project due
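As a small taste of two Weeks 2 and 3 Python topics (passing by reference and list comprehension), here is a minimal sketch. It is not taken from the course materials; the function and variable names (`append_in_place`, `rebind_locally`, `scores`) are hypothetical and chosen only for illustration. Strictly speaking, Python passes object references by value, which is what is often loosely called passing by reference.

```python
# Illustrative sketch only; all names here are hypothetical.

def append_in_place(items, value):
    """Mutate the list object the caller passed in."""
    items.append(value)


def rebind_locally(items, value):
    """Build a new list; the caller's list is left untouched."""
    items = items + [value]  # rebinds the local name only
    return items


scores = [1, 2, 3]

append_in_place(scores, 4)            # the caller's list now has four items
extended = rebind_locally(scores, 5)  # scores itself still has four items

# List comprehension: square only the even values.
squares_of_even = [s ** 2 for s in scores if s % 2 == 0]

print(scores)           # [1, 2, 3, 4]
print(extended)         # [1, 2, 3, 4, 5]
print(squares_of_even)  # [4, 16]
```

The contrast between the two functions is the part that tends to surprise newcomers: mutating an argument is visible to the caller, while rebinding the parameter name is not.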
Relevant Links:
UCLA Course listing
Class pre-requisites and software installation instructions
Final project guidelines