
Higher Order Linear Regression

  • Writer: Roma Fatima
  • Nov 11, 2021
  • 2 min read

This blog helps us understand the concept of overfitting using higher-order linear regression.

Concept of Overfitting and Underfitting:

Suppose we have a sample dataset such as below:

[image: sample dataset]

Our expected output for that dataset will be:


[image: expected fitted curve]

In simple words, overfitting describes a model that fits its training data too closely, which hurts its performance on new data. Overfitting is caused by high model complexity combined with a limited training set.

In linear regression, overfitting occurs when the model is "too complex". This usually happens when there is a large number of parameters compared to the number of observations. Such a model will not generalize well to new data: it performs well on training data but poorly on test data.

The above dataset will have overfitting as:

[image: overfitted curve on the sample dataset]

Similarly, underfitting describes a model that is too simple to capture the pattern in the data, which also hurts its performance. Underfitting is caused by oversimplification of the model.

In linear regression, underfitting occurs when the model is too "simple". This usually happens when there are fewer parameters than observations can support. Such a model will not generalize well to new data and gives unreliable predictions: it performs poorly on training data as well as on test data.

The above dataset will have underfitting as:

[image: underfitted curve on the sample dataset]

My Contribution:


We begin the program by loading the packages numpy as np, pandas as pd, and matplotlib.pyplot as plt. They are used for linear algebra, data processing, and plotting graphs, respectively.

[image: import statements]
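The imports described above amount to:

```python
# Core libraries used throughout the program:
import numpy as np               # linear algebra / array math
import pandas as pd              # data processing
import matplotlib.pyplot as plt  # plotting graphs
```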

We generate 20 data pairs (X, Y) using y = sin(2*pi*X) + 0.1 * N, where N is drawn from the standard normal (Gaussian) distribution via np.random.normal(0, 1).

[image: data generation code]
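A minimal sketch of this step (the seed and the assumption that X is sampled uniformly from [0, 1] are mine; the post does not state them):

```python
import numpy as np

np.random.seed(0)  # fixed seed for reproducibility (assumption, not in the post)

# 20 data pairs (X, Y): Y = sin(2*pi*X) + 0.1 * N, with N ~ standard normal.
# Assumption: X is drawn uniformly from [0, 1].
X = np.random.uniform(0, 1, 20)
N = np.random.normal(0, 1, 20)
Y = np.sin(2 * np.pi * X) + 0.1 * N
```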

Output:

[image: generated data pairs]

Now we split the data into 10 training pairs and 10 test pairs and plot them. Blue points are the training pairs and red points are the test pairs in the graph.

[image: train/test split and plotting code]
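A sketch of the split and the scatter plot (assuming the first 10 pairs go to training and the last 10 to testing; the post does not show how the split was ordered):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

np.random.seed(0)
X = np.random.uniform(0, 1, 20)
Y = np.sin(2 * np.pi * X) + 0.1 * np.random.normal(0, 1, 20)

# First 10 pairs for training, last 10 for testing (assumption).
X_train, Y_train = X[:10], Y[:10]
X_test, Y_test = X[10:], Y[10:]

plt.scatter(X_train, Y_train, color="blue", label="train")
plt.scatter(X_test, Y_test, color="red", label="test")
plt.legend()
plt.savefig("train_test.png")
```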

Output:

[image: scatter plot of train and test data]

Now we implement polynomial regression for orders 0, 1, 3, 9.


[image: polynomial regression code and fitted curves]
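The post trains the polynomial weights with its own update loop; as a closed-form stand-in, np.polyfit illustrates the same under/overfitting behaviour across orders 0, 1, 3, and 9 (the data setup and split repeat the assumptions above):

```python
import numpy as np

np.random.seed(0)
X = np.random.uniform(0, 1, 20)
Y = np.sin(2 * np.pi * X) + 0.1 * np.random.normal(0, 1, 20)
X_train, Y_train = X[:10], Y[:10]
X_test, Y_test = X[10:], Y[10:]

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Least-squares polynomial fits of increasing order.
# Order 0 underfits; order 9 can pass through all 10 training points
# and typically overfits.
for order in (0, 1, 3, 9):
    coeffs = np.polyfit(X_train, Y_train, order)
    train_err = rmse(Y_train, np.polyval(coeffs, X_train))
    test_err = rmse(Y_test, np.polyval(coeffs, X_test))
    print(f"order {order}: train RMSE {train_err:.3f}, test RMSE {test_err:.3f}")
```

Watching train RMSE fall while test RMSE rises at order 9 is the overfitting effect the post describes.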


Experiments:

I found the below-referenced code using sklearn, which could have made the program easier. I experimented with altering this code, but unfortunately I was not able to implement weight updating in this format.

[image: sklearn-based code]
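The referenced image is not recoverable, but a typical sklearn version of polynomial regression looks like the sketch below (degree 3 is chosen for illustration). sklearn fits the weights internally by least squares, which is why a manual weight-update loop is awkward to bolt onto this format:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

np.random.seed(0)
X = np.random.uniform(0, 1, 20)
Y = np.sin(2 * np.pi * X) + 0.1 * np.random.normal(0, 1, 20)

# PolynomialFeatures expands x into [1, x, x^2, x^3];
# LinearRegression then solves for the weights in closed form.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X.reshape(-1, 1), Y)
Y_pred = model.predict(X.reshape(-1, 1))
```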

Challenges Faced:


One of the challenges I faced was in the graphing part of the program: I was unable to plot line graphs because of an unknown error in plotting.


[image: plotting error]

To overcome this challenge, I plotted the graph in points format instead, as below.

[image: point-format plot]
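The points-format workaround can be sketched as below. The original error is not shown, but one common reason a line plot comes out unusable is that X is unsorted, so the line zig-zags between points; sorting X first fixes that case (an assumption on my part, not a diagnosis of the post's error):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

np.random.seed(0)
X = np.random.uniform(0, 1, 20)
Y = np.sin(2 * np.pi * X) + 0.1 * np.random.normal(0, 1, 20)

# Workaround from the post: draw the data as points rather than a line.
plt.plot(X, Y, "o", label="data (points)")

# For a clean line plot, sort X first so the line is drawn left to right.
order = np.argsort(X)
plt.plot(X[order], np.sin(2 * np.pi * X[order]), "-", label="true curve")
plt.legend()
plt.savefig("points_plot.png")
```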

References:


Explanation of overfitting and underfitting concepts from slides of CSE 5334 Data Mining course at UTA: https://uta.instructure.com/courses/88045/files/15438532?module_item_id=3762690

 
 
 
