Linear Regression Model
In a simple linear regression problem, our goal is to fit a straight line to data points that best predicts the output $latex y$ from an input $latex x$.
The model is expressed as:

$latex f_{w,b}(x) = wx + b$

where:
- $latex w$ — the weight (slope of the line)
- $latex b$ — the bias (intercept)
- $latex f_{w,b}(x)$ — the predicted value of $latex y$ for a given $latex x$
The objective of linear regression is to find parameters $latex w$ and $latex b$ that make $latex f_{w,b}(x^{(i)})$ as close as possible to the actual $latex y^{(i)}$ values in the training data.
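As a concrete illustration (the function name and sample values here are ours, not from the original), the model is just a few lines of Python:

```python
# Minimal sketch of the linear model f_{w,b}(x) = w*x + b.
def predict(x, w, b):
    """Return the model's prediction for a single input x."""
    return w * x + b

# Example: with slope w = 2 and intercept b = 1,
# an input of x = 3 predicts 2*3 + 1 = 7.
print(predict(3, 2, 1))  # 7
```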
Squared Error Cost Function
To measure how well our model fits the data, we use the Mean Squared Error (MSE) cost function:

$latex J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} ( f_{w,b}(x^{(i)}) - y^{(i)} )^2$

Substituting $latex f_{w,b}(x^{(i)}) = wx^{(i)} + b$ gives:

$latex J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} )^2$
Here, $latex m$ is the number of training examples. The goal is to find the values of $latex w$ and $latex b$ that minimize this cost.
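A direct translation of this cost into Python might look like the following sketch (the helper name and toy data are illustrative assumptions):

```python
def compute_cost(x, y, w, b):
    """Mean squared error cost J(w,b) = (1/2m) * sum_i (w*x_i + b - y_i)^2."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        err = w * xi + b - yi  # prediction minus target
        total += err ** 2
    return total / (2 * m)

# A perfect fit has zero cost: the points below lie exactly on y = 2x + 1.
print(compute_cost([1, 2, 3], [3, 5, 7], w=2, b=1))  # 0.0
```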
Gradient Descent Algorithm
Gradient Descent is an iterative optimization algorithm that finds the parameters $latex (w,b)$ minimizing $latex J(w,b)$ by repeatedly updating them in the direction opposite to the gradient:
$latex
\begin{aligned}
w &:= w - \alpha \frac{\partial J(w,b)}{\partial w} \\
b &:= b - \alpha \frac{\partial J(w,b)}{\partial b}
\end{aligned}
$
where $latex \alpha$ is the learning rate controlling how large a step is taken on each iteration.
Gradient Derivations
To implement gradient descent, we need the partial derivatives of $latex J(w,b)$ with respect to both $latex w$ and $latex b$.
Partial Derivative with Respect to $latex w$
Start with:

$latex J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} )^2$

Differentiating with respect to $latex w$ (the chain rule brings down a factor of 2 that cancels the $latex \frac{1}{2m}$ into $latex \frac{1}{m}$):

$latex \frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} ) x^{(i)}$
Partial Derivative with Respect to $latex b$
Similarly, differentiating with respect to $latex b$:

$latex \frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} )$
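The two derivatives share the same error term, so they can be computed together in one pass over the training set. A sketch (the function name is ours):

```python
def compute_gradients(x, y, w, b):
    """Return (dJ/dw, dJ/db) for the squared-error cost J(w,b)."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for xi, yi in zip(x, y):
        err = w * xi + b - yi  # common error term in both derivatives
        dj_dw += err * xi      # extra factor x_i for the w-derivative
        dj_db += err
    return dj_dw / m, dj_db / m

# At a perfect fit, both gradients vanish.
print(compute_gradients([1, 2, 3], [3, 5, 7], 2, 1))  # (0.0, 0.0)
```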
Gradient Descent Update Rules
Substituting the gradients into the update equations gives:
$latex
\begin{aligned}
w &:= w - \alpha \frac{1}{m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} ) x^{(i)} \\
b &:= b - \alpha \frac{1}{m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} )
\end{aligned}
$
These equations iteratively adjust $latex w$ and $latex b$ to minimize the cost function. Note that both parameters should be updated simultaneously: compute both gradients from the current values of $latex w$ and $latex b$ before applying either update.
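Putting the pieces together, a complete (if naive) gradient descent loop might look like this; the dataset, learning rate, and iteration count are illustrative choices, not from the original:

```python
def gradient_descent(x, y, alpha=0.05, iters=5000):
    """Fit w, b by repeatedly applying the gradient descent update rules."""
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(iters):
        # Compute both gradients before updating either parameter.
        dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
        dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

# Data generated from y = 2x + 1; the loop should recover w near 2, b near 1.
w, b = gradient_descent([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(round(w, 3), round(b, 3))
```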
Understanding Gradient Descent Behavior
To build intuition for gradient descent, consider the cost surface $latex J(w,b)$.
For linear regression, $latex J(w,b)$ is a smooth, bowl-shaped surface in 3D space.
Each point on this surface corresponds to a different pair of parameters $latex (w,b)$.
Gradient descent starts at some initial guess and moves iteratively down the slope of the cost surface until it reaches the bottom — the global minimum.
In general machine learning problems, gradient descent can sometimes converge to a local minimum instead of a global minimum.
However, for linear regression, the cost function:

$latex J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} ( wx^{(i)} + b - y^{(i)} )^2$

is a convex function.
This means it has a single global minimum and no local minima: gradient descent will always converge to that global minimum if the learning rate $latex \alpha$ is chosen appropriately.
Convexity and Convergence
Definition of Convexity:

A function $latex f$ is convex if, for any two points $latex x_1, x_2$ and any $latex \lambda \in [0,1]$:

$latex f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$

Intuitively, this means the line segment connecting any two points on the curve of $latex f$ lies on or above the function itself.
For convex functions, gradient descent is guaranteed to reach the global minimum if $latex \alpha$ is properly chosen.
- The cost function in linear regression is convex.
- Gradient descent always converges to the global minimum (no local minima).
- The choice of $latex \alpha$ affects convergence speed and stability.
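The effect of the learning rate can be seen empirically. In this sketch (the toy data, step counts, and learning rates are our own assumptions), a modest $latex \alpha$ steadily lowers the cost, while an overly large one overshoots the minimum and makes the cost grow:

```python
def final_cost(x, y, alpha, iters):
    """Run gradient descent from (0, 0) and return the final cost J(w,b)."""
    m = len(x)
    w = b = 0.0
    for _ in range(iters):
        dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
        dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
# With this data the initial cost at (0, 0) is 20.5.
good = final_cost(x, y, alpha=0.05, iters=50)  # cost shrinks
bad = final_cost(x, y, alpha=0.5, iters=50)    # steps overshoot; cost grows
```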
Visualization (Conceptual)
Imagine the cost function surface as a smooth 3D bowl:
- The horizontal axes represent the parameters $latex w$ and $latex b$.
- The vertical axis represents the cost $latex J(w,b)$.
- The global minimum is at the bottom of the bowl.
Gradient descent starts at some initial point and takes small steps downhill until it reaches the minimum.

Figure: Visualization of the convex cost function surface — a bowl-shaped curve with a single global minimum.