Overview
In this lecture we build intuition for how gradient descent optimizes a linear regression model.
We coordinate three visualizations:
- Training data with successive regression lines (iterations).
- Contour plot of the cost function J(w, b) showing the descent path.
- 3D cost surface of J(w, b) with the same iteration points.
1. Training Data and Regression Lines
We simulate a realistic training set of 30 points (size in square feet vs. price in $1000s).
The red points are the observed data. The colored lines show five successive fits produced by gradient descent:
purple (initial, poor fit) → blue → teal → orange → red (final, best fit).
Colors correspond across all three plots so you can track the iterations visually.

Figure 1: Training data (30 red points) and five regression lines showing progression of gradient descent.
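The setup behind Figure 1 can be sketched in code. This is a minimal reconstruction, not the lecture's exact script: the data-generating coefficients, learning rate, and snapshot iterations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 30 training points: size (in 1000s of sq ft) vs. price ($1000s).
# The true coefficients and noise level here are assumptions for illustration.
m = 30
x = rng.uniform(0.5, 3.5, m)
y = 100 + 150 * x + rng.normal(0, 30, m)

def cost(w, b):
    """Squared-error cost J(w, b) = (1/2m) * sum((w*x_i + b - y_i)^2)."""
    return np.mean((w * x + b - y) ** 2) / 2

def gradients(w, b):
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

# Run gradient descent, saving the (w, b) snapshots that would be drawn
# as the five successive regression lines (purple -> ... -> red).
w, b, alpha = 0.0, 0.0, 0.1
snapshots = [(w, b)]
for i in range(200):
    dw, db = gradients(w, b)
    w, b = w - alpha * dw, b - alpha * db
    if i in (0, 4, 19, 199):  # iterations chosen for visualization
        snapshots.append((w, b))

# The cost drops from the first snapshot (poor fit) to the last (best fit).
costs = [cost(wi, bi) for wi, bi in snapshots]
```

Plotting `w * x + b` for each snapshot over the scatter of `(x, y)` reproduces the progression of fits shown in Figure 1.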
2. Contour Plot of the Cost Function J(w, b)
The cost function for linear regression is convex and produces elliptical (contour) level sets when plotted in the (w, b) plane.
Each contour represents points of equal cost. The gradient-descent path (dashed line connecting iteration markers) moves from a high-cost region into the center (global minimum).

Figure 2: Contour plot of J(w, b) with iteration points (purple, blue, teal, orange, red) and dashed descent path.
Intuition:
- Each ellipse = constant J(w, b). Moving inward lowers the cost.
- Gradient descent steps are (approximately) orthogonal to contours and head toward the center.
- As we approach the center, step sizes decrease because the gradient magnitude shrinks.
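The shrinking-step behavior is easy to verify numerically. Here is a small sketch on a toy convex cost with elliptical contours; the particular quadratic and learning rate are assumptions, not the lecture's exact J(w, b):

```python
import numpy as np

# Toy convex cost with elliptical level sets, standing in for J(w, b).
def J(p):
    w, b = p
    return 2.0 * w**2 + 0.5 * b**2

def grad(p):
    w, b = p
    return np.array([4.0 * w, 1.0 * b])

p = np.array([3.0, 3.0])   # start in a high-cost region
alpha = 0.1
step_norms = []
for _ in range(20):
    g = grad(p)
    step_norms.append(alpha * np.linalg.norm(g))  # step length = alpha * |grad|
    p = p - alpha * g  # step along -grad, i.e. orthogonal to the contour

# Near the minimum the gradient magnitude shrinks, so step lengths shrink too.
```

With a fixed learning rate, the step length is proportional to the gradient magnitude, which is why the markers in Figure 2 bunch together near the center.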
3. 3D Cost Surface of J(w, b)
This 3D surface is the same convex bowl you saw as contours in the last plot.
Iteration points (same colors) are plotted on the surface and connected with a dashed path that slides down the bowl to the bottom (global minimum).

Figure 3: 3D visualization of J(w, b) with the gradient descent path shown by iteration points and connecting dashed line.
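A surface like Figure 3 is built by evaluating J(w, b) over a grid of parameter values. A minimal sketch, using a tiny made-up dataset (an assumption for illustration) rather than the lecture's 30 points:

```python
import numpy as np

# Tiny illustrative dataset that is fit exactly by w = 2, b = 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Grid of candidate (w, b) values, as one would pass to contour/plot_surface.
W, B = np.meshgrid(np.linspace(0, 4, 81), np.linspace(-2, 2, 81))

# Broadcast: predictions for every (w, b) pair against each data point,
# then average the squared errors to get the cost at each grid cell.
J = ((W[..., None] * x + B[..., None] - y) ** 2).mean(axis=-1) / 2

# The surface is a convex bowl; its lowest grid cell sits at (w, b) = (2, 0).
i, j = np.unravel_index(J.argmin(), J.shape)
w_min, b_min = W[i, j], B[i, j]
```

The same `J` array drives both views: `contour(W, B, J)` gives Figure 2's ellipses, and `plot_surface(W, B, J)` gives Figure 3's bowl.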
How the Three Plots Relate
- Figure 1 shows how the fitted line evolves on the data itself (improving prediction accuracy).
- Figure 2 shows the corresponding movement of (w, b) across equal-cost contours (2D view).
- Figure 3 shows the same process on the 3D cost surface (height = cost).
- The colors map iterations across all plots: Iter 1 → Iter 2 → Iter 3 → Iter 4 → Iter 5.
Key Insights
- The red training points are the observed data; the colored regression lines show parameter updates.
- Contours visualize constant-cost regions; gradient descent crosses these contours toward lower-cost areas.
- Because the squared-error cost for linear regression is convex, gradient descent converges to the global minimum (assuming a suitable learning rate).
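The "suitable learning rate" caveat is worth seeing concretely. On a 1-D toy cost J(w) = w², chosen here purely as a stand-in, the update is w ← w − α·2w = w·(1 − 2α), so convergence hinges on whether |1 − 2α| < 1:

```python
# Toy 1-D quadratic cost J(w) = w^2 with gradient J'(w) = 2w.
def run(alpha, steps=50, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= alpha * 2 * w  # gradient descent update
    return abs(w)

small = run(alpha=0.1)  # update factor |1 - 2*0.1| = 0.8 < 1: converges
large = run(alpha=1.1)  # update factor |1 - 2*1.1| = 1.2 > 1: diverges
```

Convexity guarantees there is only one minimum to find; the learning rate determines whether the iterates actually settle into it or overshoot ever farther up the bowl.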