Overview
In this lecture we build intuition for how gradient descent optimizes a linear regression model.
We coordinate three visualizations:
- Training data with successive regression lines (iterations).
- Contour plot of the cost function J(w, b) showing the descent path.
- 3D cost surface of J(w, b) with the same iteration points.
1. Training Data and Regression Lines
We simulate a realistic training set of 30 points (size in square feet vs. price in $1000s).
The red points are the observed data. The colored lines show five successive fits produced by gradient descent:
purple (initial, poor fit) → blue → teal → orange → red (final, best fit).
Colors correspond across all three plots so you can track the iterations visually.

Figure 1: Training data (30 red points) and five regression lines showing progression of gradient descent.
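The setup behind Figure 1 can be sketched in code. This is a minimal reconstruction, not the lecture's exact script: the data-generating coefficients, learning rate, and snapshot iterations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 30 training points: size (in 1000s of sq ft) vs. price ($1000s).
# The true coefficients and noise level here are assumptions for illustration.
m = 30
x = rng.uniform(0.5, 3.5, m)
y = 100 + 150 * x + rng.normal(0, 30, m)

def cost(w, b):
    """Squared-error cost J(w, b) = (1/2m) * sum((w*x_i + b - y_i)^2)."""
    return np.mean((w * x + b - y) ** 2) / 2

def gradients(w, b):
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

# Run gradient descent, saving the (w, b) snapshots that would be drawn
# as the five successive regression lines (purple -> ... -> red).
w, b, alpha = 0.0, 0.0, 0.1
snapshots = [(w, b)]
for i in range(200):
    dw, db = gradients(w, b)
    w, b = w - alpha * dw, b - alpha * db
    if i in (0, 4, 19, 199):  # iterations chosen for visualization
        snapshots.append((w, b))

# The cost drops from the first snapshot (poor fit) to the last (best fit).
costs = [cost(wi, bi) for wi, bi in snapshots]
```

Plotting `w * x + b` for each snapshot over the scatter of `(x, y)` reproduces the progression of fits shown in Figure 1.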
2. Contour Plot of the Cost Function J(w, b)
The cost function for linear regression is convex and produces elliptical (contour) level sets when plotted in the (w, b) plane.
Each contour represents points of equal cost. The gradient-descent path (dashed line connecting iteration markers) moves from a high-cost region into the center (global minimum).

Figure 2: Contour plot of J(w, b) with iteration points (purple, blue, teal, orange, red) and dashed descent path.
Intuition:
- Each ellipse = constant J(w, b). Moving inward lowers the cost.
- Gradient descent steps are (approximately) orthogonal to contours and head toward the center.
- As we approach the center, step sizes decrease because the gradient magnitude shrinks.
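The shrinking-step behavior is easy to verify numerically. Here is a small sketch on a toy convex cost with elliptical contours; the particular quadratic and learning rate are assumptions, not the lecture's exact J(w, b):

```python
import numpy as np

# Toy convex cost with elliptical level sets, standing in for J(w, b).
def J(p):
    w, b = p
    return 2.0 * w**2 + 0.5 * b**2

def grad(p):
    w, b = p
    return np.array([4.0 * w, 1.0 * b])

p = np.array([3.0, 3.0])   # start in a high-cost region
alpha = 0.1
step_norms = []
for _ in range(20):
    g = grad(p)
    step_norms.append(alpha * np.linalg.norm(g))  # step length = alpha * |grad|
    p = p - alpha * g  # step along -grad, i.e. orthogonal to the contour

# Near the minimum the gradient magnitude shrinks, so step lengths shrink too.
```

With a fixed learning rate, the step length is proportional to the gradient magnitude, which is why the markers in Figure 2 bunch together near the center.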
3. 3D Cost Surface of J(w, b)
This 3D surface is the same convex bowl you saw as contours in the last plot.
Iteration points (same colors) are plotted on the surface and connected with a dashed path that slides down the bowl to the bottom (global minimum).

Figure 3: 3D visualization of J(w, b) with the gradient descent path shown by iteration points and connecting dashed line.
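A surface like Figure 3 is built by evaluating J(w, b) over a grid of parameter values. A minimal sketch, using a tiny made-up dataset (an assumption for illustration) rather than the lecture's 30 points:

```python
import numpy as np

# Tiny illustrative dataset that is fit exactly by w = 2, b = 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Grid of candidate (w, b) values, as one would pass to contour/plot_surface.
W, B = np.meshgrid(np.linspace(0, 4, 81), np.linspace(-2, 2, 81))

# Broadcast: predictions for every (w, b) pair against each data point,
# then average the squared errors to get the cost at each grid cell.
J = ((W[..., None] * x + B[..., None] - y) ** 2).mean(axis=-1) / 2

# The surface is a convex bowl; its lowest grid cell sits at (w, b) = (2, 0).
i, j = np.unravel_index(J.argmin(), J.shape)
w_min, b_min = W[i, j], B[i, j]
```

The same `J` array drives both views: `contour(W, B, J)` gives Figure 2's ellipses, and `plot_surface(W, B, J)` gives Figure 3's bowl.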
How the Three Plots Relate
- Figure 1 shows how the fitted line evolves on the data itself (improving prediction accuracy).
- Figure 2 shows the corresponding movement of (w, b) across equal-cost contours (2D view).
- Figure 3 shows the same process on the 3D cost surface (height = cost).
- The colors map iterations across all plots: Iter 1 → Iter 2 → Iter 3 → Iter 4 → Iter 5.
Key Insights
- The red training points are the observed data; the colored regression lines show parameter updates.
- Contours visualize constant-cost regions; gradient descent crosses these contours toward lower-cost areas.
- Because the squared-error cost for linear regression is convex, gradient descent converges to the global minimum (assuming a suitable learning rate).
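The "suitable learning rate" caveat is worth seeing concretely. On a 1-D toy cost J(w) = w², chosen here purely as a stand-in, the update is w ← w − α·2w = w·(1 − 2α), so convergence hinges on whether |1 − 2α| < 1:

```python
# Toy 1-D quadratic cost J(w) = w^2 with gradient J'(w) = 2w.
def run(alpha, steps=50, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= alpha * 2 * w  # gradient descent update
    return abs(w)

small = run(alpha=0.1)  # update factor |1 - 2*0.1| = 0.8 < 1: converges
large = run(alpha=1.1)  # update factor |1 - 2*1.1| = 1.2 > 1: diverges
```

Convexity guarantees there is only one minimum to find; the learning rate determines whether the iterates actually settle into it or overshoot ever farther up the bowl.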