
Local approximation of a bivariate polynomial expression using linear regression

Given two features, compute y (the label) and perform linear regression. Compared to the single-feature model, two features add complexity not only to the maths, but also to the display through charts.

Learning goals:

  • Analytically solve the linear regression and implement it with simple "raw" Python
  • Use powers of the features to increase the model capacity
  • Use some off-the-shelf methods of NumPy
  • Switch to iterative methods with gradient descent (GD) and stochastic gradient descent (SGD)
  • SGD with powers of the features
  • Add an L2 regularizer to the SGD with powers to improve convergence and fit
In [1]:
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.preprocessing import normalize as skNormalize
import pandas

In 2D, it would require many points to uniformly cover the x plane

It is then preferred to draw x points from a 2D uniform distribution

Plotting y as a function of $x_0, x_1$ is then more challenging as $x_0$ and $x_1$ are not continuous monotonic vectors

$$ \begin{align} f(x) &= (x_0−0.2)^4 + (x_0−0.1)^3 + 0.1 x_1^2 + 0.35 \\ &= x_0^4 + 0.2 x_0^3 − 0.06 x_0^2 − 0.002 x_0 + 0.1 x_1^2 + 0.3506 \end{align} $$
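As a quick sanity check of the expansion above (for the $x_0$ part only), the coefficients can be recomputed with numpy.polynomial; this is only an illustrative verification, not part of the notebook's pipeline.

# Verify the x0 expansion with NumPy's Polynomial class (illustration only)
from numpy.polynomial import Polynomial as P
pX0 = P([-0.2, 1])**4 + P([-0.1, 1])**3 + P([0.35])
print(pX0.coef)  # expected: [0.3506, -0.002, -0.06, 0.2, 1.0]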
In [2]:
# f(x) as a bivariate polynomial: rows are degrees 1 to 4, columns are the x0 and x1 coefficients
fPoly = np.array([[-0.002, 0], [-0.06, 0.1], [0.2, 0], [1, 0]])
# Generator
def generateBatch(N):
    #
    xMin = np.array([0, -0.5])
    xMax = np.array([0.5, 0.5])
    #
    b = 0.35
    std = 0.01
    #
    x = random.uniform(xMin, xMax, (N, 2))
    yClean = (x[:,0]-0.2)**4 + (x[:,0]-0.1)**3 + 0.1*x[:,1]**2 + b
    y =  yClean + random.normal(0, std, N) 
    return (x, y, yClean)
In [3]:
N = 100000
xTrain, yTrain, yTrainClean = generateBatch(N);
print('x', xTrain.shape, ', yClean', yTrainClean.shape, ', y', yTrain.shape);

xTest, yTest, yTestClean = generateBatch(N);
x (100000, 2) , yClean (100000,) , y (100000,)
In [4]:
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTrain[:,0], yTrainClean, marker='.');
plt.xlabel('x0')
plt.subplot(1,2,2)
plt.scatter(xTrain[:,1], yTrainClean, marker='.');
plt.xlabel('x1')
plt.title('Bivariate model without noise');

It looks like a toboggan seen from the side ($x_0$) and from the front ($x_1$)
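Another possible view, not used in this notebook, is a 3D scatter of a random subsample; a minimal sketch with matplotlib's mplot3d, assuming a subsample of 2000 points to keep rendering fast:

# Optional 3D view of a random subsample (illustrative sketch)
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 - enables the '3d' projection
idx = random.choice(N, 2000, replace=False)
fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xTrain[idx, 0], xTrain[idx, 1], yTrainClean[idx], marker='.')
ax.set_xlabel('$x_0$'); ax.set_ylabel('$x_1$'); ax.set_zlabel('y');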

Remove biases (center the data)

In [5]:
xUnB = xTrain - np.mean(xTrain, axis=0)
yUnB = yTrain - np.mean(yTrain)

Analytically / closed form

Linear, 1st degree approximation of Y: \begin{align} Y = X W + b \end{align}

  • $X$ is a $N_{sample} \times N_{feature}$ matrix
  • $W$ is a $N_{feature}$ vector
  • $Y$ is a $N_{sample}$ vector
  • $b$ is a scalar

To simplify the calculation, let's prepend $b$ to $W$; $X$ is modified by prepending the $\mathbb{1}_n$ vector as its first column: \begin{align} \Theta &= \begin{bmatrix} b \\ W \\ \end{bmatrix} \\ X_m &= \begin{bmatrix} \mathbb{1}_n & X \\ \end{bmatrix} \\ Y &= X_m \Theta \\ \end{align}

Using the mean square error (Euclidean norm), we are looking for $\Theta_{est}$ such that:

\begin{align} \Theta_{est} \in \arg\min_\Theta \lVert X_m \Theta - Y \rVert_2^2 \end{align}

Let's define f, the function to minimize: \begin{align} f(\Theta) & = \lVert X_m \Theta - Y \rVert_2^2 \\ & = (X_m \Theta - Y)^T(X_m \Theta - Y) \\ & = \Theta^T X_m^T X_m \Theta - 2 \Theta^T X_m^T Y + Y^T Y \end{align}

Lookup the minimum through the gradient:

$$\nabla_\Theta f(\Theta) = 2 X_m^T X_m \Theta - 2 X_m^T Y$$

If $X_m^T X_m$ can be inverted: $$\nabla_\Theta f(\Theta) = 0 \iff \Theta = (X_m^T X_m)^{-1} X_m^T Y$$

With NumPy "primitive" operations

In [6]:
xTrainW1 = np.concatenate((np.ones(N).reshape(-1, 1), xTrain), axis=1)

xTxInv = np.linalg.inv(np.matmul(xTrainW1.T, xTrainW1))
wEstWb = np.matmul(xTxInv, np.matmul(xTrainW1.T, yTrain))
bEst = wEstWb[0]
wEst = wEstWb[1:]
print("Intercept: %.4f, coefficients: " % bEst, wEst)
Intercept: 0.3440, coefficients:  [0.11291328 0.00015331]
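Note: explicitly inverting $X_m^T X_m$ is usually avoided for numerical stability; the following equivalent formulation, which solves the normal equations directly, is a sketch added here and not part of the original notebook.

# Equivalent, numerically safer: solve the normal equations instead of inverting
wEstWbSolve = np.linalg.solve(np.matmul(xTrainW1.T, xTrainW1), np.matmul(xTrainW1.T, yTrain))
print(wEstWbSolve)  # should match wEstWb computed above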

Test model

In [7]:
yEst1 = np.matmul(xTest, wEst) + bEst
In [8]:
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTest, marker='.');
plt.scatter(xTest[:,0], yEst1, marker='.', alpha=0.05);
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTest, marker='.');
plt.scatter(xTest[:,1], yEst1, marker='.', alpha=0.05);
# Mean square error 'a la mano'
mse1 = np.dot(yTest-yEst1, yTest-yEst1) / N
print('Linear regression MSE = {:.3e}'.format(mse1));
Linear regression MSE = 2.594e-04

The toboggan has been approximated by a plane

Home-made with powers

Adding some flexibility to the model using powers of x

In [9]:
xTrain2 = np.concatenate((np.ones(N).reshape(-1, 1), xTrain, xTrain**2, xTrain**3, xTrain**4), axis=1)
xTrain2.shape
Out[9]:
(100000, 9)
In [10]:
x2Tx2Inv = np.linalg.inv(np.matmul(xTrain2.T, xTrain2))
w2Wb = np.matmul(x2Tx2Inv,np.matmul(xTrain2.T, yTrain))
b2 = w2Wb[0]
w2 = w2Wb[1:]
print("Intercept: %.4f, coefficients: " % b2, w2)
Intercept: 0.3505, coefficients:  [-1.63771045e-03 -1.90875981e-04 -5.14544930e-02  9.95880746e-02
  1.37758809e-01  1.80340031e-03  1.09411993e+00  5.54438162e-03]

Test model

In [11]:
xTest2 = np.concatenate((xTest, xTest**2, xTest**3, xTest**4), axis=1)
yEst2 = np.matmul(xTest2, w2) + b2
In [12]:
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTest2[:,0], yTest, marker='.');
plt.scatter(xTest2[:,0], yEst2, marker='.', alpha=0.05);
plt.subplot(1,2,2)
plt.scatter(xTest2[:,1], yTest, marker='.');
plt.scatter(xTest2[:,1], yEst2, marker='.', alpha=0.05);
mse2 = np.dot(yTest-yEst2, yTest-yEst2) / N
mse2Clean = np.dot(yTestClean-yEst2, yTestClean-yEst2) / N
print('Linear regression with powers, MSE = {:.3e}, MSE to clean test data = {:.3e}'.format(mse2, mse2Clean));
Linear regression with powers, MSE = 1.006e-04, MSE to clean test data = 1.402e-08

The polynomial generating y has been well approximated.

The MSE on the noisy sample should not be lower than the variance of the noise, which is $(10^{-2})^2=10^{-4}$
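A quick empirical check of that noise floor (illustrative, not in the original notebook):

# Empirical noise variance of the training batch, to compare with the MSE floor
print('Noise variance = {:.3e}'.format(np.var(yTrain - yTrainClean)))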

NumPy least squares

In [13]:
fit4, residues, rank, s = np.linalg.lstsq(xUnB, yUnB, rcond=None)
fit4, rank
Out[13]:
(array([0.11291328, 0.00015331]), 2)

Test model

In [14]:
yEst4 = np.matmul( xTest, fit4) + bEst
In [15]:
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTest, marker='.');
plt.scatter(xTest[:,0], yEst4, marker='.', alpha=0.05);
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTest, marker='.');
plt.scatter(xTest[:,1], yEst4, marker='.', alpha=0.05);
mse4 = metrics.mean_squared_error(yTest, yEst4)
print('Numpy least square, MSE = {:.3e}'.format(mse4) );
Numpy least square, MSE = 2.594e-04
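As a variant (an addition of this write-up, not in the original notebook), lstsq can estimate the intercept itself if the column of ones is kept instead of centering the data:

# Variant: let lstsq estimate the intercept through the ones column
fit4b, _, _, _ = np.linalg.lstsq(xTrainW1, yTrain, rcond=None)
print('Intercept: %.4f, coefficients:' % fit4b[0], fit4b[1:])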

Gradient descent

Attempt to find a value of $w$ for which the gradient is null (indicating a minimum or a saddle point)

Loop on the training data until the gradient norm falls below a threshold

The gradient on $w$ for a linear regression (dropping the constant factor 2, which is absorbed into the learning rate) is: \begin{align} \nabla_{w} = X^T (X w - y) \end{align}

In [16]:
# Gradient computation, returning the gradient and its squared L2 norm
def calcGradient(x, y, w):
    gradient = np.matmul(x.T, np.matmul(x, w) - y)
    return gradient, np.sum(gradient**2)

# Plot helper showing the evolution of a coefficient against its target reference value
def plotToRef(label, nRows, nCols, index, ref, values):
    nIter = len(values)
    r = range(nIter)
    plt.subplot(nRows, nCols, index)
    plt.title(label)
    plt.plot(r, values, r, np.ones((nIter))*ref, alpha=0.5)
    plt.grid();

Standard gradient descent

Loop on the same training data

In [17]:
w6 = np.ones(3)
threshold = 1e-6
learningRate = 1e-6
gradient, gradientNorm = calcGradient(xTrainW1, yTrain, w6)

print('w start =', w6, ', Gradient norm =', gradientNorm)
w6Learn = [np.concatenate((w6, [gradientNorm]))]

while gradientNorm > threshold:
    w6 = w6 - learningRate * gradient
    gradient, gradientNorm = calcGradient(xTrainW1, yTrain, w6)
    w6Learn.append(np.concatenate((w6, [gradientNorm])))
    
print('w end =', w6, ', Gradient norm =', gradientNorm, ', num iteration =', len(w6Learn))
df6 = pandas.DataFrame(w6Learn, columns = ('b', 'w0', 'w1', 'Gradient norm'))
w start = [1. 1. 1.] , Gradient norm = 8332553305.380094
w end = [3.44007156e-01 1.12913773e-01 1.53310758e-04] , Gradient norm = 9.96689182973102e-07 , num iteration = 7196
In [18]:
w6.shape
Out[18]:
(3,)
In [19]:
fig = plt.figure(figsize=(16,10))
plotToRef('$b$', 2,3,1, bEst, df6['b'])
plotToRef('$w_0$', 2,3,2, wEst[0], df6['w0'])
plotToRef('$w_1$', 2,3,3, wEst[1], df6['w1'])
plt.subplot(2,3,4)
plt.semilogy(df6["Gradient norm"], label='Gradient norm')
plt.grid()
plt.legend();
In [20]:
fig = plt.figure(figsize=(16,4))
plt.subplot(1,2,1)
plt.semilogy(df6['b'], df6['Gradient norm'])
plt.xlabel('b')
plt.ylabel('Gradient norm')
plt.grid();
plt.title('Trajectory of the Gradient norm during convergence')
plt.subplot(1,2,2)
plt.semilogy(df6['w0'], df6['Gradient norm'])
plt.semilogy(df6['w1'], df6['Gradient norm'])
plt.grid();
plt.xlabel('w')
plt.ylabel('Gradient norm')
plt.legend(("$w_0$", "$w_1$"));

Test model

In [21]:
yEst6 = np.matmul(xTest, w6[1:]) + w6[0]
In [22]:
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst6, marker='.', alpha=0.05)
plt.title('x0')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst6, marker='.', alpha=0.05)
plt.title('x1')
plt.legend(('ori clean', 'estimated'))
mse6 = metrics.mean_squared_error(yTest, yEst6)
print('Gradient descent MSE = {:.3e}'.format(mse6));
Gradient descent MSE = 2.594e-04

Stochastic gradient descent

Loop on freshly generated data, using a smaller batch at each iteration

In [23]:
Nbatch = 1000
w7 = np.ones(3)
threshold = 1e-4
learningRate = 1e-4
nIterMax = 1e5

onesBatch = np.ones((Nbatch, 1))

# Initial batch
xBatch, yBatch, yBC = generateBatch(Nbatch)
gradient, gradientNorm = calcGradient(np.concatenate((onesBatch, xBatch), axis=1), yBatch, w7)

print('w start =', w7, ', Gradient norm =', gradientNorm)
w7Learn = [np.concatenate((w7, [gradientNorm]))]

# Continue
nIter = 0
while (gradientNorm > threshold) & (nIter <= nIterMax):
    w7 = w7 - learningRate * gradient
    xBatch, yBatch, yBC = generateBatch(Nbatch)
    gradient, gradientNorm = calcGradient(np.concatenate((onesBatch, xBatch), axis=1), yBatch, w7)
    w7Learn.append(np.concatenate((w7, [gradientNorm])))
    learningRate = learningRate * 0.9999
    nIter += 1
                                          
print('w end =', w7, ', Gradient norm =', gradientNorm, ', num iteration =', len(w7Learn))
df7 = pandas.DataFrame(w7Learn, columns = ('b', 'w0', 'w1', 'Gradient norm'))
w start = [1. 1. 1.] , Gradient norm = 812489.7498501091
w end = [3.43957080e-01 1.12969836e-01 2.40469033e-05] , Gradient norm = 6.611670633228906e-05 , num iteration = 33932
In [24]:
fig = plt.figure(figsize=(16,10))
plotToRef('$b$', 2,3,1, bEst, df7['b'])
plotToRef('$w_0$', 2,3,2, wEst[0], df7['w0'])
plotToRef('$w_1$', 2,3,3, wEst[1], df7['w1'])
plt.subplot(2,3,4)
plt.semilogy(df7['Gradient norm'], label='Gradient norm')
plt.grid()
plt.legend();
In [25]:
fig = plt.figure(figsize=(16,4))
plt.subplot(1,2,1)
plt.semilogy(df7['b'], df7['Gradient norm'])
plt.xlabel('b')
plt.ylabel('Gradient norm')
plt.grid();
plt.title('Trajectory of the Gradient norm during convergence')
plt.subplot(1,2,2)
plt.semilogy(df7['w0'], df7['Gradient norm'])
plt.semilogy(df7['w1'], df7['Gradient norm'])
plt.grid();
plt.xlabel('w')
plt.ylabel('Gradient norm')
plt.legend(("$w_0$", "$w_1$"));

Test model

In [26]:
yEst7 = np.matmul(xTest, w7[1:]) + w7[0]
In [27]:
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst7, marker='.', alpha=0.05)
plt.title('$x_0$')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst7, marker='.', alpha=0.05)
plt.title('$x_1$')
plt.legend(('ori clean', 'estimated'))
mse7 = metrics.mean_squared_error(yTest, yEst7)
print('Stochastic gradient descent MSE = {:.3e}'.format(mse7));
Stochastic gradient descent MSE = 2.594e-04

SGD with powers

Adding powers of x to the stochastic gradient descent

In [28]:
# Batch size
nBatch = 1000
# Initial coefficients (fixed and tuned manually for this trial)
w8 = np.ones(9)* 1.5 
# Gradient descent end threshold (on the gradient)
threshold = 1e-2
# Initial learning rate
learningRate = 1e-4
# Maximum number of iterations
nIterMax = 1e5

onesBatch = np.ones((nBatch, 1))

# Initial batch
xBatch, yBatch, yBC = generateBatch(nBatch)
xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
gradient, gradientNorm = calcGradient(xBatch2, yBatch, w8)

print('w start =', w8, ', Gradient norm = {:.3e}'.format(gradientNorm))
w8Learn = [np.concatenate((w8, [gradientNorm]))]

# Continue
nIter = 0
while (gradientNorm > threshold) & (nIter < nIterMax):
    w8 = w8 - learningRate * gradient
    xBatch, yBatch, yBC = generateBatch(nBatch)
    xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
    gradient, gradientNorm = calcGradient(xBatch2, yBatch, w8)
    w8Learn.append(np.concatenate((w8, [gradientNorm])))
    learningRate = learningRate * 0.9999
    nIter += 1

print('w end =', w8, ', Gradient norm = {:.3e}'.format(gradientNorm), ', num iteration =', len(w8Learn))
df8 = pandas.DataFrame(w8Learn, columns = ('b', 'w0_1', 'w1_1', 'w0_2', 'w1_2', 'w0_3', 'w1_3', 'w0_4', 'w1_4', 'Gradient norm'))
w start = [1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5] , Gradient norm = 3.702e+06
w end = [ 0.38864066 -0.31918763 -0.13231316  0.24717002 -0.13794083  0.82083255
  0.87821587  1.1595413   1.12787625] , Gradient norm = 3.119e-01 , num iteration = 100001
In [29]:
fig = plt.figure(figsize=(12,20))
plotToRef('$w_{0,1}$', 5,2,1, fPoly[0,0], df8['w0_1'])
plotToRef('$w_{1,1}$', 5,2,2, fPoly[0,1], df8['w1_1'])
plotToRef('$w_{0,2}$', 5,2,3, fPoly[1,0], df8['w0_2'])
plotToRef('$w_{1,2}$', 5,2,4, fPoly[1,1], df8['w1_2'])
plotToRef('$w_{0,3}$', 5,2,5, fPoly[2,0], df8['w0_3'])
plotToRef('$w_{1,3}$', 5,2,6, fPoly[2,1], df8['w1_3'])
plotToRef('$w_{0,4}$', 5,2,7, fPoly[3,0], df8['w0_4'])
plotToRef('$w_{1,4}$', 5,2,8, fPoly[3,1], df8['w1_4'])
plotToRef('$b$', 5,2,9, bEst, df8['b'])
plt.subplot(5,2,10)
plt.semilogy(df8['Gradient norm'], label='Gradient norm')
plt.grid()
plt.legend();

Convergence is not correct on the $x_1$ coefficients, which should feature only a small value on the 2nd-order term and none on the 4th-order term $w_{1,4}$

Test model

In [30]:
yEst8 = np.matmul(xTest2, w8[1:]) + w8[0]
In [31]:
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst8, marker='.', alpha=0.05)
plt.title('$x_0$')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst8, marker='.', alpha=0.05)
plt.title('$x_1$')
plt.legend(('ori clean', 'estimated'))
mse8 = metrics.mean_squared_error(yTest, yEst8)
print('Stochastic gradient descent MSE = {:.3e}'.format(mse8));
Stochastic gradient descent MSE = 6.535e-04

Conclusion on SGD with powers

The fitting of the 3rd and 4th degree powers takes more time, leaving a residual, initial-w shape visible on the right-hand graph ($x_1$).

The MSE is not improved over the plain linear SGD, and the optimizer is harder to tune

SGD with powers and regularizer

Let's add a regularizer to the SGD with powers, using the Euclidean norm (L2)

Gradient on $w$ for a linear regression with L2 regularizer is: \begin{align} \nabla_{w} = X^T (X w - y) + \lambda w \end{align}

In [32]:
# Gradient computation with L2 regularizer, returning the gradient and its squared L2 norm
def calcGradientRegularizeL2(x, y, w, lam):
    gradient = np.matmul(x.T, np.matmul(x, w) - y) + lam * w
    return gradient, np.sum(gradient**2)
In [33]:
nBatch = 1000
w9 = np.ones(9)* 1.5 
threshold = 1e-2
learningRate = 1e-4
nIterMax = 1e5
# Regularizer param
lambdaReg = 0.01

# Initial batch
xBatch, yBatch, yBC = generateBatch(nBatch)
xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
gradient, gradientNorm = calcGradientRegularizeL2(xBatch2, yBatch, w9, lambdaReg)
print('w start =', w9, ', Gradient norm = {:.3e}'.format(gradientNorm))

w9Learn = [np.concatenate((w9, [gradientNorm]))]
# Continue
nIter = 0
while (gradientNorm > threshold) & (nIter < nIterMax):
    w9 = w9 - learningRate * gradient
    xBatch, yBatch, yBC = generateBatch(nBatch)
    xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
    gradient, gradientNorm = calcGradientRegularizeL2(xBatch2, yBatch, w9, lambdaReg)
    w9Learn.append(np.concatenate((w9, [gradientNorm])))
    # learningRate = learningRate * 0.9999
    nIter += 1
    
print('w end =', w9, ', Gradient norm = {:.3e}'.format(gradientNorm), ', num iteration =', len(w9Learn))
df9 = pandas.DataFrame(w9Learn, columns = ('b', 'w0_1', 'w1_1', 'w0_2', 'w1_2', 'w0_3', 'w1_3', 'w0_4', 'w1_4', 'Gradient norm'))
w start = [1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5] , Gradient norm = 3.619e+06
w end = [ 0.36046531 -0.03535633 -0.03482748 -0.10542564 -0.11390478  0.47877567
  0.23068354  0.92126672  0.99334058] , Gradient norm = 8.573e-03 , num iteration = 46918
In [34]:
fig = plt.figure(figsize=(12,20))
plotToRef('$w_{0,1}$', 5,2,1, fPoly[0,0], df9['w0_1'])
plotToRef('$w_{1,1}$', 5,2,2, fPoly[0,1], df9['w1_1'])
plotToRef('$w_{0,2}$', 5,2,3, fPoly[1,0], df9['w0_2'])
plotToRef('$w_{1,2}$', 5,2,4, fPoly[1,1], df9['w1_2'])
plotToRef('$w_{0,3}$', 5,2,5, fPoly[2,0], df9['w0_3'])
plotToRef('$w_{1,3}$', 5,2,6, fPoly[2,1], df9['w1_3'])
plotToRef('$w_{0,4}$', 5,2,7, fPoly[3,0], df9['w0_4'])
plotToRef('$w_{1,4}$', 5,2,8, fPoly[3,1], df9['w1_4'])
plotToRef('$b$', 5,2,9, bEst, df9['b'])
plt.subplot(5,2,10)
plt.semilogy(df9['Gradient norm'], label='Gradient norm')
plt.grid()
plt.legend();

Test model

In [35]:
yEst9 = np.matmul(xTest2, w9[1:]) + w9[0]
In [36]:
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst9, marker='.', alpha=0.05)
plt.title('x0')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst9, marker='.', alpha=0.05)
plt.title('x1')
plt.legend(('ori clean', 'estimated'))
mse9 = metrics.mean_squared_error(yTest, yEst9)
print('Stochastic gradient descent with L2 regularizer, MSE = {:.3e}'.format(mse9));
Stochastic gradient descent with L2 regularizer, MSE = 1.477e-04

Conclusion on SGD with regularizer

The regularizer has a larger effect on the 3rd and 4th degree terms, leading to a fit that mostly uses the 1st and 2nd degree terms (parabolic)
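For reference, a closed-form ridge regression from scikit-learn could cross-check the regularized fit. This is a sketch under the assumption that the penalty scalings are roughly comparable; scikit-learn's Ridge applies alpha to the full batch whereas the SGD above applies lambda per mini-batch, so the coefficients will not match exactly.

# Cross-check with scikit-learn's closed-form ridge regression (illustrative)
from sklearn.linear_model import Ridge
xTrainPow = np.concatenate((xTrain, xTrain**2, xTrain**3, xTrain**4), axis=1)
ridge = Ridge(alpha=lambdaReg).fit(xTrainPow, yTrain)
yEstRidge = ridge.predict(xTest2)
print('Ridge intercept: %.4f' % ridge.intercept_, ', coefficients:', ridge.coef_)
print('Ridge MSE = {:.3e}'.format(metrics.mean_squared_error(yTest, yEstRidge)))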

Where to go from here?

Another two-feature linear implementation using Keras (HTML / Jupyter)

Theoretical and practical quality aspects of linear regression with the Gaussian Model (HTML / Jupyter)

More complex linear regression with Feature engineering or feature learning (HTML / Jupyter)

Compare with the two-feature binary classification with logistic regression using "raw" Python or libraries (HTML / Jupyter)