Learning data science step by step
Most examples in online tutorials rely on powerful libraries (scikit-learn, Keras…), complex models (neural nets), or data samples with many features.
In this collection of workbooks, I want to start from simple examples and raw Python code, then progressively increase the complexity of the data sets and move to more advanced techniques and libraries.
Most datasets are generated on purpose, so that their parameters can be tuned to fit each demonstration.
The notebooks are Jupyter notebooks, written for Python 3.7.
To read or edit the notebooks, you may:
- Browse notebooks in HTML from the HTML table of contents
- Open this repository in nbviewer
- Clone the repository to test and modify the notebooks locally in Jupyter or JupyterLab
Linear regression
Let’s start from a simple univariate example and then progressively add complexity:
- Univariate function approximation with linear regression
- Bivariate function approximation with linear regression
- Feature engineering or feature learning with linear regression (HTML / Jupyter)
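As a taste of the first item, here is a minimal sketch of univariate linear regression in raw Python on generated data. It is not taken from the notebooks: the function name `fit_line`, the true parameters (slope 2, intercept 1) and the noise level are assumptions chosen for illustration, and the closed-form least-squares solution is only one of several ways to fit the line.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the 1/n factors cancel in the ratio).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Generated data set (assumed parameters): y = 2x + 1 plus Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the data is generated, the estimated slope and intercept can be compared directly with the values used to build the data set.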
Classification
Binary classification with parametric models
- Univariate function as the boundary between two classes of data, approximated with logistic regression
- Bivariate parametric function as a boundary, approximated with logistic regression
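The idea behind these two items can be sketched in raw Python: generate points in the plane, label them according to which side of a boundary they fall on, and let logistic regression recover that boundary. This is an illustrative sketch and not the notebooks' code. The boundary `x2 = 0.5 * x1 + 0.5`, the function name `train_logistic`, the learning rate and the number of epochs are all assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(points, labels, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            # Gradient of the log loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


# Generated data set (assumed boundary): class 1 lies above x2 = 0.5 * x1 + 0.5.
random.seed(1)
points = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(300)]
labels = [1 if x2 > 0.5 * x1 + 0.5 else 0 for x1, x2 in points]

w, b = train_logistic(points, labels)
predictions = [
    1 if sigmoid(w[0] * x1 + w[1] * x2 + b) >= 0.5 else 0 for x1, x2 in points
]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"w={w}, b={b:.2f}, training accuracy={accuracy:.2f}")
```

The learned decision boundary is the line where `w[0] * x1 + w[1] * x2 + b = 0`, which can be plotted against the generating boundary to check the fit.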
Binary classification with non-parametric models
- Bivariate classification with K Nearest Neighbors (KNN), homemade and with scikit-learn (HTML / Jupyter)
- Non-linear problem solving with Support Vector Machine (SVM) (HTML / Jupyter)
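To illustrate what a "homemade" KNN can look like, here is a minimal sketch in raw Python. It is an assumption about the general approach, not the notebook's implementation: the function name `knn_predict`, the circular class boundary of radius 1.5 and the choice `k=5` are all hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.hypot(p[0] - query[0], p[1] - query[1]), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Generated data set (assumed boundary): class 1 inside a circle of radius 1.5,
# a shape that no single straight line can separate.
random.seed(2)
train_points = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(400)]
train_labels = [1 if math.hypot(x1, x2) < 1.5 else 0 for x1, x2 in train_points]

center_label = knn_predict(train_points, train_labels, (0.0, 0.0))
corner_label = knn_predict(train_points, train_labels, (2.8, 2.8))
print(center_label, corner_label)
```

No parameters are learned here: the training set itself is the model, which is what makes the method non-parametric.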
Multi-class classification with regression or neural networks
- Two features to separate the 2D plane into 3 or more categories
Multi-class classification with non-parametric models
Deep learning
Convolutional neural networks (CNN)
- Introduction to CNN as an image filter
- CNN versus Dense comparison on MNIST
- Interpretability
- Other CNNs
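The "CNN as an image filter" idea can be sketched without any library: a convolutional layer slides a small kernel over the image and sums the element-wise products. The sketch below is illustrative only and is not taken from the notebooks. The function name `convolve2d`, the toy image and the Sobel-like kernel are assumptions, and the operation shown is the unflipped cross-correlation with no padding that deep learning frameworks usually call convolution.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 5x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Sobel-like kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [0, 4, 4, 0]
```

The feature map is non-zero only where the kernel overlaps the edge. In a CNN the kernel values are not hand-picked like this but learned during training.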
Generative networks (VAE, GAN)
- Generative Adversarial Networks (GAN), the basics on MNIST, with TensorFlow 2 / Keras and TensorFlow Datasets
- Original GAN using Dense layers (HTML / Jupyter)
- GAN with convolutions (DCGAN) (HTML / Jupyter)
- GAN with convolutions (DCGAN), no Dense layer on the generator path (HTML / Jupyter)
- GAN and Bayesian network on ski outing reports and prediction of global warming impact on skiing in the Alps (HTML / Jupyter)
Natural Language Processing (NLP)
- Classification of mountaineering routes based on their textual description, with fastText and TensorFlow (HTML / Jupyter)
- Summarized in Medium article “Full NLP use case with fastText and Tensorflow 2”
- Data preparation (HTML / Jupyter)
Reading list
Books
- Deep Learning - I. Goodfellow, Y. Bengio, A. Courville, The MIT Press.
- Very good overview of machine learning and its extension to deep learning
- An Introduction to Statistical Learning with Applications in R - G. James, D. Witten, T. Hastie, R. Tibshirani.
- Traditional machine learning including regressions, clustering, SVM…
Nice notebooks
Tutorials and courses
Papers
- You Only Look Once: Unified, Real-Time Object Detection - J. Redmon et al.
- Learning to Forget: Continual Prediction with LSTM - F. A. Gers et al.
- What are the biases in my word embedding? - N. Swinger et al.