Local approximation of a bivariate polynomial expression using linear regression¶
Given two features, extract y (the label) and perform a linear regression. Compared to the single-feature model, two features add complexity, not only to the maths but also to the display through charts.
Learning goals:¶
- Analytically solve the linear regression and implement it with simple "raw" Python
- Use powers of the features to increase the model capacity
- Use some off-the-shelf methods of NumPy
- Switch to iterative methods with gradient descent (GD) and stochastic gradient descent (SGD)
- SGD with powers of the features
- Add L2 regularizer to the SGD with powers to improve convergence and fit
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.preprocessing import normalize as skNormalize
import pandas
In 2D, many points would be required to uniformly cover the x plane
It is therefore preferable to draw the x points from a 2D uniform distribution
Plotting y as a function of $x_0, x_1$ is then more challenging, as $x_0$ and $x_1$ are not continuous monotonic vectors
# f(x) as a bivariate polynomial: row i holds the degree-(i+1) coefficients of [x0, x1]
fPoly = np.array([[-0.002, 0], [-0.06, 0.1], [0.2, 0], [1, 0]])
# Generator of noisy samples of the bivariate polynomial
def generateBatch(N):
    # Feature bounds for the uniform draw
    xMin = np.array([0, -0.5])
    xMax = np.array([0.5, 0.5])
    # Offset and noise standard deviation
    b = 0.35
    std = 0.01
    # Draw x from a 2D uniform distribution
    x = random.uniform(xMin, xMax, (N, 2))
    yClean = (x[:,0]-0.2)**4 + (x[:,0]-0.1)**3 + 0.1*x[:,1]**2 + b
    y = yClean + random.normal(0, std, N)
    return (x, y, yClean)
N = 100000
xTrain, yTrain, yTrainClean = generateBatch(N);
print('x', xTrain.shape, ', yClean', yTrainClean.shape, ', y', yTrain.shape);
xTest, yTest, yTestClean = generateBatch(N);
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTrain[:,0], yTrainClean, marker='.');
plt.xlabel('x0')
plt.subplot(1,2,2)
plt.scatter(xTrain[:,1], yTrainClean, marker='.');
plt.xlabel('x1')
plt.title('Bivariate model without noise');
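Since $x_0$ and $x_1$ are scattered over the plane, the two scatter plots above only show marginal projections. As an illustration, a 3D scatter or a triangulated contour view gives a fuller picture of the surface; below is a minimal sketch reusing xTrain and yTrainClean (the subsampling step of 100 is only there to keep the 3D scatter light)
from mpl_toolkits.mplot3d import Axes3D  # enables the '3d' projection on older Matplotlib versions
# Optional fuller view of the surface: 3D scatter and triangulated contour
fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(1, 2, 1, projection='3d')
ax.scatter(xTrain[::100,0], xTrain[::100,1], yTrainClean[::100], marker='.')
ax.set_xlabel('x0'); ax.set_ylabel('x1'); ax.set_zlabel('y')
plt.subplot(1,2,2)
plt.tricontourf(xTrain[:,0], xTrain[:,1], yTrainClean, levels=20)
plt.colorbar(); plt.xlabel('x0'); plt.ylabel('x1');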
It looks like a toboggan from the side ($x_0$) and from the front ($x_1$)
Remove the biases (center the data)
xUnB = xTrain - np.mean(xTrain, axis=0)
yUnB = yTrain - np.mean(yTrain)
Analytically / closed form¶
Linear, 1st degree approximation of Y: \begin{align} Y = X W + b \end{align}
- $X$ is a $N_{sample} \times N_{feature}$ matrix
- $W$ is a $N_{feature}$ vector
- $Y$ is a $N_{sample}$ vector
- $b$ is a scalar
To simplify the calculation, let's stack $b$ on top of $W$ and prepend a $\mathbb{1}_n$ column to $X$: \begin{align} \Theta &= \begin{bmatrix} b \\ W \\ \end{bmatrix} \\ X_m &= \begin{bmatrix} \mathbb{1}_n & X \\ \end{bmatrix} \\ Y &= X_m \Theta \\ \end{align}
Using the mean squared error (Euclidean norm), we are looking for $\Theta_{est}$ such that:
\begin{align} \Theta_{est} \in argmin_\Theta \lVert X_m \Theta - Y \rVert_2^2 \end{align}Let's define f, the function to minimize: \begin{align} f(\Theta) & = \lVert X_m \Theta - Y \rVert_2^2 \\ & = (X_m \Theta - Y)^T(X_m \Theta - Y) \\ & = \Theta^T X_m^T X_m \Theta + Y^T Y - 2 \Theta^T X_m^T Y \end{align}
Find the minimum through the gradient:
$$\nabla_\Theta f(\Theta) = 2 X_m^T X_m \Theta - 2 X_m^T Y$$If $X_m^T X_m$ can be inverted: $$\nabla_\Theta f(\Theta) = 0 \iff \Theta = (X_m^T X_m)^{-1} X_m^T Y$$
With NumPy "primitive" operations¶
xTrainW1 = np.concatenate((np.ones(N).reshape(-1, 1), xTrain), axis=1)
xTxInv = np.linalg.inv(np.matmul(xTrainW1.T, xTrainW1))
wEstWb = np.matmul(xTxInv, np.matmul(xTrainW1.T, yTrain))
bEst = wEstWb[0]
wEst = wEstWb[1:]
print("Intercept: %.4f, coefficients: " % bEst, wEst)
Test model¶
yEst1 = np.matmul(xTest, wEst) + bEst
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTest, marker='.');
plt.scatter(xTest[:,0], yEst1, marker='.', alpha=0.05);
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTest, marker='.');
plt.scatter(xTest[:,1], yEst1, marker='.', alpha=0.05);
# Mean squared error computed by hand
mse1 = np.dot(yTest-yEst1, yTest-yEst1) / N
print('Linear regression MSE = {:.3e}'.format(mse1));
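As a cross-check, the hand-computed MSE can be compared with the scikit-learn helper already imported above (and used for the later models); this is only a verification sketch
# Should match the hand-computed value up to rounding
print('sklearn MSE = {:.3e}'.format(metrics.mean_squared_error(yTest, yEst1)));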
The toboggan has been approximated by a plane
Home-made with powers¶
Adding some flexibility to the model by using powers of the features
xTrain2 = np.concatenate((np.ones(N).reshape(-1, 1), xTrain, xTrain**2, xTrain**3, xTrain**4), axis=1)
xTrain2.shape
x2Tx2Inv = np.linalg.inv(np.matmul(xTrain2.T, xTrain2))
w2Wb = np.matmul(x2Tx2Inv,np.matmul(xTrain2.T, yTrain))
b2 = w2Wb[0]
w2 = w2Wb[1:]
print("Intercept: %.4f, coefficients: " % b2, w2)
Test model¶
xTest2 = np.concatenate((xTest, xTest**2, xTest**3, xTest**4), axis=1)
yEst2 = np.matmul(xTest2, w2) + b2
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTest2[:,0], yTest, marker='.');
plt.scatter(xTest2[:,0], yEst2, marker='.', alpha=0.05);
plt.subplot(1,2,2)
plt.scatter(xTest2[:,1], yTest, marker='.');
plt.scatter(xTest2[:,1], yEst2, marker='.', alpha=0.05);
mse2 = np.dot(yTest-yEst2, yTest-yEst2) / N
mse2Clean = np.dot(yTestClean-yEst2, yTestClean-yEst2) / N
print('Linear regression with powers, MSE = {:.3e}, MSE to clean test data = {:.3e}'.format(mse2, mse2Clean));
The polynomial model of y has been well approximated.
The MSE on the noisy sample should not be lower than the variance of the noise, which is $(10^{-2})^2=10^{-4}$
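As a sanity check, the gap between the MSE on the noisy labels and the MSE on the clean labels should be close to that noise variance, since the noise is independent of the prediction; a minimal check with the quantities computed above
# mse2 - mse2Clean should be close to the noise variance (0.01**2 = 1e-4)
print('mse2 - mse2Clean = {:.3e}, noise variance = {:.1e}'.format(mse2 - mse2Clean, 0.01**2));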
NumPy least squares¶
fit4, residues, rank, s = np.linalg.lstsq(xUnB, yUnB, rcond=None)
fit4, rank
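Since lstsq was applied to the centered data, it returns only the slopes; the corresponding intercept can be recovered as $\bar{y} - \bar{x} \cdot w$ and should match bEst from the closed form, which is reused in the test below. A minimal sketch; b4 is just an illustrative name
# Recover the intercept of the un-centered model from the centered fit
b4 = np.mean(yTrain) - np.dot(np.mean(xTrain, axis=0), fit4)
print('b4 = {:.4f}, bEst = {:.4f}'.format(b4, bEst))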
Test model¶
# Reuse the intercept from the closed form (the slopes were fit on centered data)
yEst4 = np.matmul(xTest, fit4) + bEst
fig = plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTest, marker='.');
plt.scatter(xTest[:,0], yEst4, marker='.', alpha=0.05);
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTest, marker='.');
plt.scatter(xTest[:,1], yEst4, marker='.', alpha=0.05);
mse4 = metrics.mean_squared_error(yTest, yEst4)
print('Numpy least square, MSE = {:.3e}'.format(mse4) );
Gradient descent¶
Attempt to find a value of $w$ for which the gradient is null (indicating a minimum, or a saddle point)
Loop over the training data until the gradient norm is below a threshold
The gradient on $w$ for a linear regression is (dropping the constant factor of 2, which is absorbed into the learning rate): \begin{align} \nabla_{w} = X^T (X w - y) \end{align}
# Gradient computation
def calcGradient(x, y, w):
    # Return the gradient and its squared L2 norm (called 'gradient norm' below)
    gradient = np.matmul(x.T, np.matmul(x, w) - y)
    return gradient, np.sum(gradient**2)
# Plot helper to show the value trajectory against a target reference
def plotToRef(label, nRows, nCols, index, ref, values):
    nIter = len(values)
    r = range(nIter)
    plt.subplot(nRows, nCols, index)
    plt.title(label)
    # Plot the values together with a horizontal line at the reference
    plt.plot(r, values, r, np.ones((nIter))*ref, alpha=0.5)
    plt.grid();
Standard gradient descent¶
Loop over the same training data (full batch)
w6 = np.ones(3)
threshold = 1e-6
learningRate = 1e-6
gradient, gradientNorm = calcGradient(xTrainW1, yTrain, w6)
print('w start =', w6, ', Gradient norm =', gradientNorm)
w6Learn = [np.concatenate((w6, [gradientNorm]))]
while gradientNorm > threshold:
    # Step in the direction opposite to the gradient
    w6 = w6 - learningRate * gradient
    gradient, gradientNorm = calcGradient(xTrainW1, yTrain, w6)
    w6Learn.append(np.concatenate((w6, [gradientNorm])))
print('w end =', w6, ', Gradient norm =', gradientNorm, ', num iteration =', len(w6Learn))
df6 = pandas.DataFrame(w6Learn, columns = ('b', 'w0', 'w1', 'Gradient norm'))
w6.shape
fig = plt.figure(figsize=(16,10))
plotToRef('$b$', 2,3,1, bEst, df6['b'])
plotToRef('$w_0$', 2,3,2, wEst[0], df6['w0'])
plotToRef('$w_1$', 2,3,3, wEst[1], df6['w1'])
plt.subplot(2,3,4)
plt.semilogy(df6["Gradient norm"], label='Gradient norm')
plt.grid()
plt.legend();
fig = plt.figure(figsize=(16,4))
plt.subplot(1,2,1)
plt.semilogy(df6['b'], df6['Gradient norm'])
plt.xlabel('b')
plt.ylabel('Gradient norm')
plt.grid();
plt.title('Trajectory of the Gradient norm during convergence')
plt.subplot(1,2,2)
plt.semilogy(df6['w0'], df6['Gradient norm'])
plt.semilogy(df6['w1'], df6['Gradient norm'])
plt.grid();
plt.xlabel('w')
plt.ylabel('Gradient norm')
plt.legend(("$w_0$", "$w_1$"));
Test model¶
yEst6 = np.matmul(xTest, w6[1:]) + w6[0]
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst6, marker='.', alpha=0.05)
plt.title('x0')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst6, marker='.', alpha=0.05)
plt.title('x1')
plt.legend(('ori clean', 'estimated'))
mse6 = metrics.mean_squared_error(yTest, yEst6)
print('Gradient descent MSE = {:.3e}'.format(mse6));
Stochastic gradient descent¶
Loop over newly generated data, using smaller batches
Nbatch = 1000
w7 = np.ones(3)
threshold = 1e-4
learningRate = 1e-4
nIterMax = 1e5
onesBatch = np.ones((Nbatch, 1))
# Initial batch
xBatch, yBatch, yBC = generateBatch(Nbatch)
gradient, gradientNorm = calcGradient(np.concatenate((onesBatch, xBatch), axis=1), yBatch, w7)
print('w start =', w7, ', Gradient norm =', gradientNorm)
w7Learn = [np.concatenate((w7, [gradientNorm]))]
# Continue
nIter = 0
while (gradientNorm > threshold) and (nIter <= nIterMax):
    w7 = w7 - learningRate * gradient
    # Draw a fresh batch at each iteration
    xBatch, yBatch, yBC = generateBatch(Nbatch)
    gradient, gradientNorm = calcGradient(np.concatenate((onesBatch, xBatch), axis=1), yBatch, w7)
    w7Learn.append(np.concatenate((w7, [gradientNorm])))
    # Slowly decay the learning rate
    learningRate = learningRate * 0.9999
    nIter += 1
print('w end =', w7, ', Gradient norm =', gradientNorm, ', num iteration =', len(w7Learn))
df7 = pandas.DataFrame(w7Learn, columns = ('b', 'w0', 'w1', 'Gradient norm'))
fig = plt.figure(figsize=(16,10))
plotToRef('$b$', 2,3,1, bEst, df7['b'])
plotToRef('$w_0$', 2,3,2, wEst[0], df7['w0'])
plotToRef('$w_1$', 2,3,3, wEst[1], df7['w1'])
plt.subplot(2,3,4)
plt.semilogy(df7['Gradient norm'], label='Gradient norm')
plt.grid()
plt.legend();
fig = plt.figure(figsize=(16,4))
plt.subplot(1,2,1)
plt.semilogy(df7['b'], df7['Gradient norm'])
plt.xlabel('b')
plt.ylabel('Gradient norm')
plt.grid();
plt.title('Trajectory of the Gradient norm during convergence')
plt.subplot(1,2,2)
plt.semilogy(df7['w0'], df7['Gradient norm'])
plt.semilogy(df7['w1'], df7['Gradient norm'])
plt.grid();
plt.xlabel('w')
plt.ylabel('Gradient norm')
plt.legend(("$w_0$", "$w_1$"));
Test model¶
yEst7 = np.matmul(xTest, w7[1:]) + w7[0]
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst7, marker='.', alpha=0.05)
plt.title('$x_0$')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst7, marker='.', alpha=0.05)
plt.title('$x_1$')
plt.legend(('ori clean', 'estimated'))
mse7 = metrics.mean_squared_error(yTest, yEst7)
print('Stochastic gradient descent MSE = {:.3e}'.format(mse7));
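Note that each step above draws a fresh batch from the generator; the more common SGD setting samples mini-batches from a fixed training set. A minimal sketch of that variant on the first-degree model, assuming the helper name sampleBatch and reusing calcGradient (the iteration count and learning rate are illustrative, not tuned)
# Mini-batch SGD drawing batches from the fixed training set
def sampleBatch(x, y, batchSize):
    # Sample batchSize rows (with replacement) and prepend the ones column
    idx = random.randint(0, len(y), batchSize)
    return np.concatenate((np.ones((batchSize, 1)), x[idx]), axis=1), y[idx]

wMb = np.ones(3)
lr = 1e-4
for _ in range(20000):
    xB, yB = sampleBatch(xTrain, yTrain, 1000)
    grad, _ = calcGradient(xB, yB, wMb)
    wMb = wMb - lr * grad
print('Mini-batch SGD w =', wMb, ', closed-form reference =', bEst, wEst)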
SGD with powers¶
Adding powers of x to the stochastic gradient descent
# Batch size
nBatch = 1000
# Initial coefficients (fixed and tuned manually for this trial)
w8 = np.ones(9)* 1.5
# Gradient descent end threshold (on the gradient)
threshold = 1e-2
# Initial learning rate
learningRate = 1e-4
# Maximum number of iterations
nIterMax = 1e5
onesBatch = np.ones((nBatch, 1))
# Initial batch
xBatch, yBatch, yBC = generateBatch(nBatch)
xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
gradient, gradientNorm = calcGradient(xBatch2, yBatch, w8)
print('w start =', w8, ', Gradient norm = {:.3e}'.format(gradientNorm))
w8Learn = [np.concatenate((w8, [gradientNorm]))]
# Continue
nIter = 0
while (gradientNorm > threshold) and (nIter < nIterMax):
    w8 = w8 - learningRate * gradient
    # Draw a fresh batch and rebuild the power features
    xBatch, yBatch, yBC = generateBatch(nBatch)
    xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
    gradient, gradientNorm = calcGradient(xBatch2, yBatch, w8)
    w8Learn.append(np.concatenate((w8, [gradientNorm])))
    # Slowly decay the learning rate
    learningRate = learningRate * 0.9999
    nIter += 1
print('w end =', w8, ', Gradient norm = {:.3e}'.format(gradientNorm), ', num iteration =', len(w8Learn))
df8 = pandas.DataFrame(w8Learn, columns = ('b', 'w0_1', 'w1_1', 'w0_2', 'w1_2', 'w0_3', 'w1_3', 'w0_4', 'w1_4', 'Gradient norm'))
fig = plt.figure(figsize=(12,20))
plotToRef('$w_{0,1}$', 5,2,1, fPoly[0,0], df8['w0_1'])
plotToRef('$w_{1,1}$', 5,2,2, fPoly[0,1], df8['w1_1'])
plotToRef('$w_{0,2}$', 5,2,3, fPoly[1,0], df8['w0_2'])
plotToRef('$w_{1,2}$', 5,2,4, fPoly[1,1], df8['w1_2'])
plotToRef('$w_{0,3}$', 5,2,5, fPoly[2,0], df8['w0_3'])
plotToRef('$w_{1,3}$', 5,2,6, fPoly[2,1], df8['w1_3'])
plotToRef('$w_{0,4}$', 5,2,7, fPoly[3,0], df8['w0_4'])
plotToRef('$w_{1,4}$', 5,2,8, fPoly[3,1], df8['w1_4'])
plotToRef('$b$', 5,2,9, bEst, df8['b'])
plt.subplot(5,2,10)
plt.semilogy(df8['Gradient norm'], label='Gradient norm')
plt.grid()
plt.legend();
Convergence is not correct on the $x_1$ coefficients, which should only have a small value on the 2nd-order term and no coefficient on the 4th-order term $w_{1,4}$
Test model¶
yEst8 = np.matmul(xTest2, w8[1:]) + w8[0]
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst8, marker='.', alpha=0.05)
plt.title('$x_0$')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst8, marker='.', alpha=0.05)
plt.title('$x_1$')
plt.legend(('ori clean', 'estimated'))
mse8 = metrics.mean_squared_error(yTest, yEst8)
print('SGD with powers, MSE = {:.3e}'.format(mse8));
Conclusion on SGD with powers¶
Fitting the 3rd- and 4th-degree terms takes more time, leading to an initial "w" shape on the right-hand graph ($x_1$).
The MSE is improved, but the optimizer is harder to tune
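One common way to ease the tuning is to rescale the features: the higher powers of $x$ are much smaller than the first-degree terms, which makes $X^T X$ ill-conditioned and slows gradient descent. A minimal illustration of the issue on the last batch (xBatch2Std is an illustrative name and is not used in the rest of the notebook)
# Column scales of the power features are very uneven
print('Feature std by column:', np.round(xBatch2.std(axis=0), 4))
# Standardizing the feature columns (keeping the ones column) evens them out
xBatch2Std = xBatch2.copy()
xBatch2Std[:, 1:] = (xBatch2[:, 1:] - xBatch2[:, 1:].mean(axis=0)) / xBatch2[:, 1:].std(axis=0)
print('Condition number of X^T X: raw = {:.1e}, standardized = {:.1e}'.format(
    np.linalg.cond(np.matmul(xBatch2.T, xBatch2)),
    np.linalg.cond(np.matmul(xBatch2Std.T, xBatch2Std))))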
SGD with powers and regularizer¶
Let's add a regularizer to the SGD with powers, using the Euclidean norm (L2)
The gradient on $w$ for a linear regression with an L2 regularizer is (same convention on the factor of 2): \begin{align} \nabla_{w} = X^T (X w - y) + \lambda w \end{align}
# Gradient computation with L2 regularizer
def calcGradientRegularizeL2(x, y, w, lam):
    # Same gradient as before, plus the L2 penalty term lam * w
    gradient = np.matmul(x.T, np.matmul(x, w) - y) + lam * w
    return gradient, np.sum(gradient**2)
nBatch = 1000
w9 = np.ones(9)* 1.5
threshold = 1e-2
learningRate = 1e-4
nIterMax = 1e5
# Regularizer param
lambdaReg = 0.01
# Initial batch
xBatch, yBatch, yBC = generateBatch(nBatch)
xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
gradient, gradientNorm = calcGradientRegularizeL2(xBatch2, yBatch, w9, lambdaReg)
print('w start =', w9, ', Gradient norm = {:.3e}'.format(gradientNorm))
w9Learn = [np.concatenate((w9, [gradientNorm]))]
# Continue
nIter = 0
while (gradientNorm > threshold) and (nIter < nIterMax):
    w9 = w9 - learningRate * gradient
    # Draw a fresh batch and rebuild the power features
    xBatch, yBatch, yBC = generateBatch(nBatch)
    xBatch2 = np.concatenate((onesBatch, xBatch, xBatch**2, xBatch**3, xBatch**4), axis = 1)
    gradient, gradientNorm = calcGradientRegularizeL2(xBatch2, yBatch, w9, lambdaReg)
    w9Learn.append(np.concatenate((w9, [gradientNorm])))
    # No learning-rate decay in this trial
    # learningRate = learningRate * 0.9999
    nIter += 1
print('w end =', w9, ', Gradient norm = {:.3e}'.format(gradientNorm), ', num iteration =', len(w9Learn))
df9 = pandas.DataFrame(w9Learn, columns = ('b', 'w0_1', 'w1_1', 'w0_2', 'w1_2', 'w0_3', 'w1_3', 'w0_4', 'w1_4', 'Gradient norm'))
fig = plt.figure(figsize=(12,20))
plotToRef('$w_{0,1}$', 5,2,1, fPoly[0,0], df9['w0_1'])
plotToRef('$w_{1,1}$', 5,2,2, fPoly[0,1], df9['w1_1'])
plotToRef('$w_{0,2}$', 5,2,3, fPoly[1,0], df9['w0_2'])
plotToRef('$w_{1,2}$', 5,2,4, fPoly[1,1], df9['w1_2'])
plotToRef('$w_{0,3}$', 5,2,5, fPoly[2,0], df9['w0_3'])
plotToRef('$w_{1,3}$', 5,2,6, fPoly[2,1], df9['w1_3'])
plotToRef('$w_{0,4}$', 5,2,7, fPoly[3,0], df9['w0_4'])
plotToRef('$w_{1,4}$', 5,2,8, fPoly[3,1], df9['w1_4'])
plotToRef('$b$', 5,2,9, bEst, df9['b'])
plt.subplot(5,2,10)
plt.semilogy(df9['Gradient norm'], label='Gradient norm')
plt.grid()
plt.legend();
Test model¶
yEst9 = np.matmul(xTest2, w9[1:]) + w9[0]
fig = plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.scatter(xTest[:,0], yTestClean, marker='.')
plt.scatter(xTest[:,0], yEst9, marker='.', alpha=0.05)
plt.title('x0')
plt.legend(('ori clean', 'estimated'))
plt.subplot(1,2,2)
plt.scatter(xTest[:,1], yTestClean, marker='.')
plt.scatter(xTest[:,1], yEst9, marker='.', alpha=0.05)
plt.title('x1')
plt.legend(('ori clean', 'estimated'))
mse9 = metrics.mean_squared_error(yTest, yEst9)
print('Stochastic gradient descent with L2 regularizer, MSE = {:.3e}'.format(mse9));
Conclusion on SGD with regularizer¶
The regularizer has a larger effect on the 3rd- and 4th-degree terms, leading to a fit that mostly uses the 1st- and 2nd-degree terms (parabolic)
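For reference, a similar ridge (L2) regression on the same power features can be obtained in closed form with scikit-learn. This is only a cross-check sketch using sklearn.linear_model.Ridge; its alpha is not directly comparable to lambdaReg, since the hand-made gradient works per batch and drops the factor of 2, and xTrainPow / yEstRidge are illustrative names
from sklearn.linear_model import Ridge

# Closed-form ridge regression on the same degree-4 power features
xTrainPow = np.concatenate((xTrain, xTrain**2, xTrain**3, xTrain**4), axis=1)
ridge = Ridge(alpha=lambdaReg)
ridge.fit(xTrainPow, yTrain)
yEstRidge = ridge.predict(xTest2)
print('Intercept: %.4f, coefficients:' % ridge.intercept_, ridge.coef_)
print('Ridge (sklearn) MSE = {:.3e}'.format(metrics.mean_squared_error(yTest, yEstRidge)))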
Where to go from here?¶
- Another two-feature linear implementation using Keras (HTML / Jupyter)
- Theoretical and practical quality aspects of linear regression with the Gaussian model (HTML / Jupyter)
- More complex linear regression with feature engineering or feature learning (HTML / Jupyter)
- Compare with the two-feature binary classification with logistic regression using "raw" Python or libraries (HTML / Jupyter)