Definition: The science/art of programming computers so that they can learn from data
Popular ML Algorithms
 Linear & Polynomial Regression
 Logistic Regression
 k-Nearest Neighbors
 Support Vector Machines
 Decision Trees
 Random Forests
 Ensemble methods
Neural Networks Architectures
 Feedforward Neural Nets
 Convolutional Nets
 Recurrent Nets
 Long Short-Term Memory (LSTM) nets
 Autoencoders
 Multilayer Perceptrons (MLPs)
Famous Papers
 Machine Learning on handwritten digits - 2006 - <www.cs.toronto.edu/~hinton>
 The Unreasonable Effectiveness of Data - 2009
Useful links & resources
 www.kaggle.com/
 Competitions
 Datasets
 Kernels
 OpenAI Gym
 For reinforcement learning
 scikit-learn user guide
 Dataquest - www.dataquest.io
 deep learning website - http://deeplearning.net
 Imperial College Course
 Machine Learning – The Complete Guide: https://en.wikipedia.org/wiki/Book:Machine_Learning_%E2%80%93_The_Complete_Guide
 https://paperswithcode.com/
Hands-On Machine Learning book
For the DL part see [Deep Learning]
Pandas / Sklearn / Numpy / Scipy Cheatsheet
dt.describe()
→ statistics for each numeric column (count, mean, std, min, 25%, 50%, 75%, max)
dt.info()
→ info about the DataFrame (index dtype, column dtypes, non-null counts, memory usage)
dt["a_col"].value_counts()
→ count of each distinct value encountered in the column (most frequent first)
dt.corr()
→ compute the standard (Pearson) correlation coefficient between columns, to spot potential linear correlations
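The four exploration calls above can be sketched on a toy DataFrame (the column names here are invented for illustration):

```python
import pandas as pd

dt = pd.DataFrame({
    "a_col": ["x", "y", "x", "x"],
    "mileage": [12000, 35000, 7000, 52000],
})

stats = dt.describe()                  # count, mean, std, min, 25%, 50%, 75%, max per numeric column
dt.info()                              # prints index dtype, column dtypes, non-null counts, memory usage
counts = dt["a_col"].value_counts()    # each distinct value with its frequency, most frequent first
corr = dt.corr(numeric_only=True)      # pairwise Pearson correlation of the numeric columns
```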
Apply a function to a DataFrame: either dt.apply(fn),
or dt.where(cond, other, inplace=True) to replace the values where cond is False
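A quick sketch of the two approaches (toy data): `apply` maps a function over each column, while `where` keeps values where a condition holds and replaces the rest.

```python
import pandas as pd

dt = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

col_max = dt.apply(max)                # the function is applied to each column
dt.where(dt > 0, 0, inplace=True)      # in place: values failing the condition become 0
```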
Use the viridis color palette: colorblind-friendly and prints better in greyscale!
https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html
SkLearn: fill missing values in a dataset
Strategies:
 Get rid of the corresponding rows (e.g., the districts in the housing example)
 Get rid of the whole attribute
 Set the values to some value (zero, mean, median, etc.)
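The third strategy can be sketched with scikit-learn's `SimpleImputer` (the column names below are invented, in the style of the housing example; median is a common choice for skewed attributes):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

dt = pd.DataFrame({"total_rooms": [880.0, np.nan, 1467.0],
                   "median_income": [8.3, 7.2, np.nan]})

imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(dt)     # NumPy array with each NaN replaced by its column median
```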
Pandas/SkLearn: convert a string/categorical column to numbers
Get Numpy dense array from Scipy sparse matrix: sparse_mat.toarray()
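Both points above can be sketched together: `OrdinalEncoder` maps categories to integers, while `OneHotEncoder` returns a SciPy sparse matrix that `toarray()` densifies (the `ocean_proximity` column is an assumed example):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

cat = pd.DataFrame({"ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"]})

ordinal = OrdinalEncoder().fit_transform(cat)    # categories -> integers (sorted order)
sparse_mat = OneHotEncoder().fit_transform(cat)  # SciPy sparse matrix, one column per category
dense = sparse_mat.toarray()                     # plain dense NumPy array
```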
Feature Scaling
Machine Learning algorithms don’t perform well when the input numerical attributes have very different scales.
 min-max scaling / normalization
 standardization
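The two options above, sketched with scikit-learn on toy data: `MinMaxScaler` squashes each attribute into [0, 1], `StandardScaler` centers it to zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

X_minmax = MinMaxScaler().fit_transform(X)   # (x - min) / (max - min) -> range [0, 1]
X_std = StandardScaler().fit_transform(X)    # (x - mean) / std -> mean 0, std 1
```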
Definitions
Attribute: a data type (e.g., Mileage)
Feature: an attribute plus its value
Deep Neural Network
LTU (Linear Threshold Unit): a neuron that computes a weighted sum of its inputs, z = w1x1 + w2x2 + … + wnxn (= wᵀx), then applies a step function to it, e.g., the Heaviside step function
Perceptron: a single layer of LTUs, with each neuron connected to all inputs. The neurons are also fed an extra bias feature x0 = 1 (the bias neuron)
Passthrough input layer: inputs are represented by neurons that just propagate the input to the output
Activation function (activation_fn): the function that evaluates the neuron's inputs and decides whether the neuron fires
ReLU / Rectifier / Ramp: max(0, z)
Hint: the derivative of ReLU is the Heaviside step function, and that of the SmoothReLU (softplus) is the logistic function
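A tiny NumPy sketch of these definitions: an LTU computes z = wᵀx and thresholds it with the Heaviside step, and ReLU is max(0, z) (the weights and inputs below are arbitrary illustration values):

```python
import numpy as np

def heaviside(z):
    return (z >= 0).astype(float)      # step function: 0 if z < 0, else 1

def relu(z):
    return np.maximum(0.0, z)          # max(0, z); its derivative is the Heaviside step

x = np.array([1.0, -2.0, 0.5])         # inputs (a bias x0 = 1 could be appended)
w = np.array([0.4, 0.3, -1.0])         # weights, arbitrary for illustration
z = w @ x                              # weighted sum w^T x
out = heaviside(z)                     # LTU output
```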
Deep Learning Theorems
 Universal Approximation Theorem
 No Free Lunch Theorem: any two optimization algorithms are equivalent when their performance is averaged across all possible problems
FAQ
 How do I tune the hyperparameters of my model?
 Grid search with cross-validation to find the right hyperparameters
 Randomised search
 Use Oscar  http://oscar.calldesk.ai/
 It helps to have an idea of what values are reasonable for each hyperparameter!
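The first tuning option above can be sketched with scikit-learn's `GridSearchCV` on a built-in dataset (the hyperparameter grid here is an arbitrary illustration; `RandomizedSearchCV` has an analogous interface):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"max_depth": [2, 3, 4], "min_samples_split": [2, 4]}

# Every combination in the grid is scored by 3-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
best = search.best_params_             # best combination found
```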