Unsupervised clustering and data cleaning
[This is session 3 in the series Introduction to Machine Learning in Python]
Overview:
This workshop will focus on unsupervised machine learning and data cleaning. In unsupervised machine learning, an algorithm looks for structure, such as clusters, in unlabeled data. The workshop will scratch the surface of this side of machine learning, introducing clustering with the k-means and DBSCAN algorithms. It will also explore the data cleaning step of the machine learning pipeline in more detail.
By the end of the workshop, participants will be able to:
- Differentiate between supervised and unsupervised learning;
- Given a scaffolded environment and curated dataset, train a DBSCAN model and describe how this algorithm works at a high level;
- Articulate the steps in data cleaning, along with the common issues and solutions to incomplete or faulty datasets.
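For readers who want a preview of how DBSCAN works at a high level, the following is a minimal sketch written in plain Python. It is not the workshop's material: the function names `region_query` and `dbscan`, the parameter names `eps` and `min_samples`, and the toy points are all assumptions chosen for illustration. The idea is that a point with at least `min_samples` neighbors within distance `eps` is a core point, clusters grow outward from core points, and points reachable from no core point are labeled as noise.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Assign a cluster label (0, 1, ...) or NOISE to every point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Illustrative toy data: two tight groups of four points and one far-away point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 50),
]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

Unlike k-means, this procedure does not need the number of clusters in advance, and it can leave outliers unassigned. In practice a library implementation would normally be used instead of a hand-written loop like this one.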
Prerequisites:
- Participants should already have some familiarity with Python programming fundamentals, e.g. loops, conditional execution, importing modules, and calling functions. Furthermore, participants should ideally have attended the first lesson in the “Fundamentals of Machine Learning in Python” series, or they should already have some background on the general machine learning pipeline.
Date: Friday, 24 March 2023.
Time: 10 a.m. to 12 p.m.
Location: hybrid (in-person at Burnside Hall 1104, and online via Zoom).
Instructors: Jacob Errington, Faculty Lecturer, and Eric Mayhew, graduate student, School of Computer Science, McGill University.