1
Introduction to Machine Learning
- Big Data and Machine Learning.
- Supervised, unsupervised and reinforcement learning algorithms.
- Steps for building a predictive model.
- Detecting outliers and handling missing data.
- How to choose the algorithm and its variables.
Demonstration
Getting started in the Spark environment with Python, using Jupyter Notebook. Walkthrough of several examples of the provided models.
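As an illustration of the outlier and missing-data topic listed above, here is a minimal sketch in plain Python. It assumes median imputation and Tukey's interquartile-range rule, which are only two of many possible choices; the function names and the sample values are hypothetical and are not part of the course material.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: two missing readings and one suspicious value.
raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0, None, 11.0]
clean = impute_median(raw)
print(clean)
print(iqr_outliers(clean))  # flags 95.0
```

Imputing before detecting outliers is a design choice made here for brevity; in practice the order, and the statistic used for imputation, depend on the data.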
2
Model evaluation procedures
- Resampling techniques for building training, validation and test sets.
- Testing the representativeness of the training data.
- Performance metrics for predictive models.
- Confusion matrix, cost matrix, ROC curve and AUC.
Hands-on work
Evaluation and comparison of different algorithms on the provided models.
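The evaluation measures listed above can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions: binary labels in {0, 1}, a decision threshold of 0.5, and made-up misclassification costs. The AUC is computed from its rank interpretation, the probability that a randomly chosen positive scores higher than a randomly chosen negative, which is simple but quadratic in the number of samples.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def total_cost(fp, fn, cost_fp=1.0, cost_fn=5.0):
    """Weight the two error types with (hypothetical) unit costs."""
    return fp * cost_fp + fn * cost_fn


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(tp, fp, fn, tn)        # 3 1 1 3
print(total_cost(fp, fn))    # 6.0
print(roc_auc(y_true, scores))  # 0.8125
```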
3
Predictive models, the frequentist approach
- Statistical learning.
- Data conditioning and dimensionality reduction.
- Support vector machines and kernel methods.
- Vector quantization.
- Neural networks and Deep Learning.
- Ensemble learning and decision trees.
- Bandit algorithms: optimism in the face of uncertainty.
Hands-on work
Implementing algorithm families using various data sets.
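To make "optimism in the face of uncertainty" concrete, here is a minimal sketch of the UCB1 bandit strategy in plain Python. It assumes rewards in [0, 1]; the `ucb1` function, the `pull` callback and the Bernoulli arm probabilities are hypothetical names and values chosen for illustration, not the course's own implementation.

```python
import math
import random


def ucb1(pull, n_arms, rounds):
    """Run UCB1 for `rounds` steps; `pull(arm)` returns a reward in [0, 1].

    Returns the number of times each arm was played.
    """
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # Optimistic index: empirical mean plus an exploration bonus
            # that shrinks as an arm is played more often.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical Bernoulli arms; the last one has the highest success rate.
probs = [0.2, 0.5, 0.8]
rng = random.Random(0)
counts = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0,
              len(probs), 2000)
print(counts)
```

Arms that have been tried rarely keep a large bonus, so they are revisited occasionally, while most plays tend to concentrate on the arm with the best empirical mean.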
4
Bayesian models and learning
- Principles of Bayesian inference and learning.
- Graphical models: Bayesian networks, Markov random fields, inference and learning.
- Bayesian methods: Naive Bayes, mixtures of Gaussians, Gaussian processes.
- Markov models: Markov processes, Markov chains, hidden Markov models, Bayesian filtering.
Hands-on work
Implementing algorithm families using various data sets.
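The principle of Bayesian inference listed above (prior, data, posterior) can be illustrated with the simplest conjugate case. The sketch below assumes a Beta prior on the success probability of a Bernoulli variable; the function names and the observations are hypothetical.

```python
from fractions import Fraction


def beta_posterior(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior with Bernoulli observations (1 = success)."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact as a fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: 7 successes and 3 failures, starting from a uniform prior.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
a, b = beta_posterior(1, 1, data)
print(a, b)             # 8 4
print(beta_mean(a, b))  # 2/3
```

Because the Beta family is conjugate to the Bernoulli likelihood, the update reduces to adding counts; most of the models in this module need approximate or iterative inference instead.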
5
Machine Learning in live environments
- Specific considerations when developing a model in a distributed environment.
- Big Data deployment with Spark and MLlib.
- The Cloud: Amazon, Microsoft Azure ML, IBM Bluemix, etc.
- Maintenance of the model.
Hands-on work
Taking a predictive model live, with integration into batch processes and processing flows.
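As a rough sketch of the batch integration mentioned above, the following plain-Python example scores records in fixed-size chunks. It does not use Spark or MLlib; `threshold_model` is a hypothetical stand-in for a trained model, and the batch size is an arbitrary assumption.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def score_in_batches(model, records, size=1000):
    """Apply `model` to each batch and yield (record, prediction) pairs."""
    for chunk in batches(records, size):
        yield from zip(chunk, model(chunk))


def threshold_model(rows):
    """Hypothetical stand-in for a trained model: predicts 1 above 0.5."""
    return [1 if x > 0.5 else 0 for x in rows]


results = list(score_in_batches(threshold_model,
                                [0.1, 0.9, 0.6, 0.3, 0.7], size=2))
print(results)
```

Keeping the model behind a function that takes a whole batch makes it easier to swap in a real predictor later without changing the surrounding process.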