Higher Order Linear Regression

Roma Fatima
Nov 11, 2021
2 min read

This blog helps us understand the concept of overfitting using the Higher Order Linear Regression.

Concept of Overfitting and Underfitting:

Suppose we have a sample dataset such as below:

Our expected output for that dataset will be:

In simple words, overfitting is the concept used to describe the model which is trained with lots of data and impacts the performance of the model negatively. What causes overfitting is high model complexity and limited training size.

In linear regression overfitting occurs when the model is "too complex". This usually happens when there are a large number of parameters compared to the number of observations. Such a model will not generalize well to new data. That is, it will perform well on training data, but poorly on test data.

The above dataset will have overfitting as:

Similarly, underfitting is the concept used to describe the model which is trained with minimal data and impacts the performance of the model negatively. What causes underfitting is over simplification.

In linear regression underfitting occurs when the model is too "simple". This usually happens when there is lesser parameters than the observations. Such a model will not generalize well to new data and have unreliable predictions. That is, it will perform poorly on training data as well as on test data.

The above dataset will have underfitting as:

My Contribution:

We begin the program by loading packages of numpy as np, pandas as pd and matplotlib.pyplot as plt. They are used for linear algebra, data processing and plotting graphs, respectively.

We generate 20 data pairs (X, Y) using y = sin(2*pi*X) + 0.1 * N. Where we take N from the normal gaussian distribution, as np.random.normal(0,1).

Output:

Now we train and test 10 data pairs each and plot it's graph. Blue points are train data pairs and Red points are test data pairs in the graph.

Output:

Now we implement polynomial regression for orders 0, 1, 3, 9.

My Submission

Experiments:

I got the below referenced code with sklearn which could have made the program easier. I experimented with altering this code but unfortunately I was not able to implement weight updating in this format.