class: left, bottom, title-slide

.title[
# Machine learning
]
.subtitle[
## Introductory remarks
]
.author[
### Joshua Loftus
]

---

<style type="text/css">
.remark-slide-content {
    font-size: 1.2rem;
    padding: 1em 4em 1em 4em;
}
</style>

![](../../../files/theme/LSE_stats_banner.jpg)

### ST 310: Machine Learning (for Data Science)

Lecturer: [Joshua Loftus](https://joshualoftus.com/)

Website: Moodle and also [ml4ds.com](https://ml4ds.com)

???
This is the very first, introductory video for week 1

Provide context

Set expectations

---

# About the course

- Course info
- Quick preview
- Teaching/course philosophy

![](../../../files/lasso.gif)

???
Animation shows lasso, a method we'll learn about later

---
class: inverse, center, middle

# Course info

---

# Format

- *Mostly* self-contained -- ask if you need help!
- Seminars: pre-work
- http://ml4ds.com/ (links, slides, misc. notes)
- Participation, active learning
- Weekly readings, seminars, lectures
- Reading
  - **ISLR** [An Introduction to Statistical Learning](https://statlearning.com/)
  - **Mixtape** [Causal Inference: The Mixtape](https://mixtape.scunning.com/index.html)
  - **MLstory** [Patterns, Predictions, and Actions](https://mlstory.org/)
- Supplemental references
  - **R4DS** [R for Data Science](https://r4ds.had.co.nz/)
  - **ESL** [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/)
  - **CASI** [Computer Age Statistical Inference](https://web.stanford.edu/~hastie/CASI/)

---

# Assessments

As described in the [course listing](https://www.lse.ac.uk/resources/calendar/courseGuides/undergraduate.htm) for ST310

- Formative: problem sets
- Summative
  - Problem set(s)
  - Individual project: prediction competition
  - Group project: open-ended data analysis

---
class: inverse, center, middle

# Quick preview

Don't worry about following all the details now

We'll introduce R coding gradually

---

.panelset[
.panel[.panel-name[Data]

```r
library(tidyverse)  # provides %>%, filter(), and ggplot()
library(knitr)      # provides kable()
library(gapminder)
gapminder %>%
  head() %>%
  kable()
```

<table>
 <thead>
<tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> lifeExp </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> gdpPercap </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1952 </td> <td style="text-align:right;"> 28.801 </td> <td style="text-align:right;"> 8425333 </td> <td style="text-align:right;"> 779.4453 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1957 </td> <td style="text-align:right;"> 30.332 </td> <td style="text-align:right;"> 9240934 </td> <td style="text-align:right;"> 820.8530 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1962 </td> <td style="text-align:right;"> 31.997 </td> <td style="text-align:right;"> 10267083 </td> <td style="text-align:right;"> 853.1007 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1967 </td> <td style="text-align:right;"> 34.020 </td> <td style="text-align:right;"> 11537966 </td> <td style="text-align:right;"> 836.1971 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1972 </td> <td style="text-align:right;"> 36.088 </td> <td style="text-align:right;"> 13079460 </td> <td style="text-align:right;"> 739.9811 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:right;"> 38.438 </td> <td style="text-align:right;"> 14880372 </td> <td style="text-align:right;"> 786.1134 </td> </tr> </tbody> </table> ] 
.panel[.panel-name[Code]

```r
# Keep only the most recent year in the data
gdp_data <- gapminder %>%
  `filter`(year == max(year))

# Scatterplot of life expectancy against GDP per capita
life_exp_plot <-
  ggplot(gdp_data, `aes`(x = gdpPercap, y = lifeExp)) +
  `geom_point`(aes(color = continent, shape = continent, size = pop))

# Add a flexible (loess) smooth curve
life_exp_plot +
  `stat_smooth`(formula = y ~ x, method = "loess", span = 1)
```

]
.panel[.panel-name[Plot]

<img src="01-1-introduction_files/figure-html/unnamed-chunk-1-1.png" width="648" />

]
.panel[.panel-name[Modify]

```r
# Log-scale the x-axis and fit a straight line instead
life_exp_plot +
  scale_x_log10() +
  stat_smooth(formula = y ~ x, method = "lm") +
  xlab("GDP per capita") +
  ylab("Life expectancy")
```

]
.panel[.panel-name[Plot again]

<img src="01-1-introduction_files/figure-html/unnamed-chunk-2-1.png" width="648" />

]
]

???
Preview of some R

Model complexity: nonlinear vs relatively simple (linear) model

Linear model: may not fit the data as well

But! interpretable, one slope to summarize relationship

But! reliable/stable

Machine learning allows us to find models with just the right amount of complexity

---
class: bottom

.pull-left[
![xkcd 2048](https://imgs.xkcd.com/comics/curve_fitting_2x.png)

Source: [xkcd](https://xkcd.com/2048/)
]
.pull-right[
Machine learning in one picture
]

---
class: inverse, center, middle

# Teaching/course philosophy

???
Not just math, also my opinions

Textbooks are excellent, read them!

But! Value added by lectures

---

# Recurring themes

- Human-centric ML
  - Tools for us to control (not the other way around)
  - Ethics of data science

--

- Interpretability
  - Philosophy of science
  - Causality vs "curve fitting"

--

- Social learning
  - Come post on the forum!
  - (More on this later)

???
Examples with ethical aspects

Explicit consideration of ethics and professional responsibilities of data scientists

Future lectures include a dose of philosophy, vaccinating against mindless application of tools

Don't be a zombie data scientist

Learn from each other, from people who we think are trustworthy...

---
class: inverse, center, middle

# Please allow me
# to introduce
# myself

I'm Joshua Loftus

???
I'm the lecturer

For seminars, help from the very capable teaching staff

Am I trustworthy? Why should you listen to me? You decide!

---
class: bottom

.pull-left[
![](../../../files/brooklynbridge.jpg)
]
.pull-right[
- From the US
- Used to teach at NYU
- Cambridge postdoc
- Stats PhD from Stanford
- Reproducibility, fairness, interpretability, causality
- First-gen at university
- Travel, reading, audiobooks, lifting
]

???
I am a person in the world, not just on lecture slides!

I look forward to getting to meet you!

---
class: inverse, center, middle

# The end of the beginning

I hope you're as excited about the course as I am!