SYNOPSIS

A recurring topic for several years now, data science brings together statistics, machine learning, computer science, and domain expertise. Machine learning methods are algorithms that solve problems by learning from data.

This training gives an overview of the diversity of machine learning methods, both for supervised learning (the values of the target variable are known and can be compared to the model's predictions) and for unsupervised learning (the values of the target variable are not known a priori).
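To make the distinction concrete, here is a minimal sketch on toy data. It uses only the Python standard library so that it runs without any installation; it does not use scikit-learn, which is the library the course works with. The function names `fit_line` and `two_means` and the sample values are hypothetical, chosen for illustration only.

```python
# Toy illustration only: plain Python, not the scikit-learn API used in the course.

def fit_line(xs, ys):
    """Supervised case: least-squares fit of y = slope * x + intercept.

    The known target values ys are used to fit the model and can later be
    compared to its predictions.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def two_means(values, n_iter=20):
    """Unsupervised case: split 1-D values into two groups (k-means, k=2).

    No target values are given; the groups are inferred from the data alone.
    """
    centers = [min(values), max(values)]  # simple deterministic initialization
    groups = [[], []]
    for _ in range(n_iter):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the previous center if a group happens to be empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Supervised: targets are known, so predictions can be checked against them.
xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
errors = [abs(slope * x + intercept - y) for x, y in zip(xs, ys)]
print("slope:", slope, "intercept:", intercept, "max error:", max(errors))

# Unsupervised: only the observations are available.
centers, groups = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print("cluster centers:", centers)
```

In the supervised part the quality of the model can be measured directly against the known targets, whereas in the unsupervised part there is nothing to compare the clusters to, so they have to be judged by other means.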

The training is built around Python; the basics of the language and some specialized libraries (pandas, scikit-learn) are its core building blocks.

GOALS

Thanks to this training, you will develop the following skills:

  • Know how to use Python in a data analysis project
  • Know the main machine learning problems and the main models for each of them (which model for which context and with which dataset?)
  • Master the pandas library for data analysis and scikit-learn for implementing machine learning models

PROGRAM

This program is indicative and can be adapted to your specific needs.

  • Theoretical foundations
    • Types of statistical variables
    • Basic notions in statistics (mean, standard deviation, correlation, …)
    • Common probability distributions (Gaussian, uniform, Poisson, exponential, …)
    • Refresher on matrix computation
  • Setting up the working environment
    • Setting up Python, IPython, and Jupyter Notebook
    • Overview of package management tools (pip, conda) and installation of Python data analysis libraries (numpy, pandas, matplotlib, seaborn)
    • First program and test of the machine configuration
  • Using Python data science libraries
    • Build a data pipeline with Luigi
    • Scientific computing with numpy
    • Dataset handling with pandas
    • Data visualization with matplotlib and seaborn
  • Machine learning algorithms with scikit-learn
    • Regression (linear regression, polynomial regression, Gaussian regression, XGBoost, …)
    • Classification (logistic regression, SVM, decision trees, …)
    • Clustering (K-means, DBSCAN, hierarchical clustering, …)
    • Dimensionality reduction (Principal Component Analysis)
  • Analysis of a “real” dataset
    • Reading from and writing to a CSV file
    • Elementary statistics and feature interpretation
    • Data handling with pandas
    • Designing a machine learning model with scikit-learn
    • Data visualization

DURATION

3 days

PREREQUISITES

  • Strong foundation in statistics and probability
  • Knowledge of the Python programming language

See also DS1: Introduction to Data Science and DS2: Python for scientific computing

The next courses (Lyon or Paris):


Contact us for on-site training (dates can be adapted to your needs).

Would you like to take part in this training?

Please provide the details below if you can:

* Training

Place of training, Number of people involved, Initial level of participants, Time constraints, Specific expectations

* Contact details

Organization, Address, Contact, Email, Intracommunity VAT