Introduction to regression and data collection
[This is session 2 in the series Introduction to Machine Learning in Python]
Date: Friday, 24 February 2023.
Time: 10 a.m. to 12 p.m.
Location: hybrid (in-person at Burnside Hall 1104, and online via Zoom).
Instructors: Jacob Errington, Faculty Lecturer, and Eric Mayhew, graduate student, School of Computer Science, McGill University.
Overview:
This workshop covers regression and the process of data cleaning in machine learning. We will explore what regression is and how it differs from classification (to be covered later, in session 4). In terms of algorithms, we will discuss how decision trees and support vector machines are used for regression tasks, and you will work with these types of machine learning models in a hands-on way. We will also cover the data collection stage of the machine learning pipeline.
By the end of the workshop, participants will be able to:
- Describe plainly how decision trees and support vector machines work;
- Given a scaffolded environment and curated data set, train a decision tree and describe how this algorithm works at a high level;
- Articulate the data collection process along with common problems in data collection.
Prerequisites:
· Participants should already have some familiarity with Python programming fundamentals, e.g. loops, conditional execution, importing modules, and calling functions. Furthermore, participants should ideally have attended the first lesson in the “Fundamentals of Machine Learning in Python” series, or they should already have some background on the general machine learning pipeline.
· You need to bring your own laptop for this workshop. Contact us if you would like to attend but are unable to bring a laptop.
· Install Anaconda on your computer. You can find installation instructions here. Please contact us (cdsi.science [at] mcgill.ca) if you are having trouble with installation.