UCLA Stats 404 - Statistical Computing and Programming

Goal of course: Prepare students for Data Scientist or Machine Learning Engineer roles in industry by teaching marketable skills and best practices for collaborating with technical and non-technical stakeholders.

By the end of the course, students should be able to:

  • Explain how their work contributes to the business;

  • Implement iterative model development;

  • Write production-ready code that runs on more than just their own computer;

  • Answer the business question end-to-end with ML;

  • Work in Python, SQL and Git.

Agenda (subject to change):

  • Week 1: Business use case, setting up a reproducible machine learning environment, introduction to Git

  • Weeks 2 and 3: Introduction to Python, pandas and SQL

    • Python: expressions, control flow, functions, variable types, passing by reference, list comprehension, functional programming

    • pandas: reading in data, subsetting, EDA, split + apply + combine, pandas + databases

  • Weeks 4 and 5: Introduction to ML proof of concept (POC)

    • linear and logistic regression, elastic nets, PCA regression, hyperparameter tuning, deep learning and custom loss functions

  • Week 6: Improving POC by understanding computational constraints

    • pandas and big data, Dask, PySpark + Spark SQL, embarrassingly parallel processes, AWS S3

  • Weeks 7 and 8: Productionalizing POC

    • reproducibility, readability, robustness

    • testing suite, ML test, typing, model roll-out

  • Weeks 9 and 10: Final Project Presentations

    • In-class presentations

  • Week 11: Finals Week

Relevant Links: