Synopsis
A recurring topic in recent years, data science brings together statistics, machine learning, computer science and domain expertise. Machine learning methods are algorithms that solve problems by learning from data.
This training gives an overview of the diversity of machine learning methods, for both supervised learning (the target variable values are known and can be compared with the model outputs) and unsupervised learning (the target variable values are not known a priori).
The training is built around Python; the basics of the language and some specialized libraries (pandas, scikit-learn) are its core building blocks.
Goals
Thanks to this training, you will develop the following skills:

 Know how to use Python in a data analysis project
 Know the main machine learning problems and the main models for each of them (which model for which context and which dataset?)
 Master the pandas library for data analysis and scikit-learn for implementing machine learning models
Duration
3 days

Prerequisites
 Strong background in statistics and probability
 Knowledge of the Python programming language
See also DS1: Introduction to Data Science and DS2: Python for scientific computing
Program
This program is indicative and can be adapted to your specific needs.

Theoretical basis
 Statistical variable types
 Basic notions in statistics (mean, standard deviation, correlation, …)
 Common probability distributions (Gaussian, uniform, Poisson, exponential, …)
 Refresher on matrix computation

Working environment configuration
 Setting up Python, IPython and Jupyter Notebook
 Overview of package management tools (pip, conda) and installation of Python data analysis libraries (numpy, pandas, matplotlib, seaborn)
 First program and test of the machine configuration

Using Python data science libraries
 Building a data pipeline with Luigi
 Scientific computing with numpy
 Dataset handling with pandas
 Data visualization with matplotlib and seaborn

Machine learning algorithms with scikit-learn
 Regression (linear regression, polynomial regression, Gaussian regression, XGBoost, …)
 Classification (logistic regression, SVM, decision trees, …)
 Clustering (k-means, DBSCAN, hierarchical clustering, …)
 Dimensionality reduction (Principal Component Analysis)

Analysis of a "real" dataset
 Reading from and writing to a CSV file
 Elementary statistics and feature interpretation
 Data handling with pandas
 Machine learning algorithm design with scikit-learn
 Data visualization

DS3 – Data Science – Python for Data Science
Upcoming courses in Paris:
Contact us: trainings are mostly held on-site at your office, and dates can be adapted to your needs.