For linear regression with one variable, the cost function is defined as:

$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$

where
– $f_{w,b}(x^{(i)}) = w x^{(i)} + b$: the model's prediction for example $i$ given parameters $w$ and $b$.
– $\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$: the squared error between the prediction and the true target value.
– The sum adds these squared errors over all $m$ training examples, with $i$ running from $0$ to $m-1$.
– Dividing by $2m$ gives the average squared error, scaled by $\frac{1}{2}$ for convenience.
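To make the formula concrete, here is a tiny worked instance with made-up numbers (illustrative only, not taken from the dataset below). Suppose $m = 2$, the predictions are $3$ and $5$, and the targets are $2$ and $7$:

$$J = \frac{1}{2 \cdot 2}\left[(3-2)^2 + (5-7)^2\right] = \frac{1 + 4}{4} = 1.25$$

A perfect model would have every prediction equal to its target, giving $J = 0$.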
Regression Example
We want to fit a regression model using the dataset below, which shows the relationship between
house size (sq ft) and price ($1000s).
| Size (sq ft) | Price ($1000s) |
|---|---|
| 1000 | 230.0 |
| 1500 | 330.0 |
| 2000 | 430.0 |
| 2500 | 530.0 |
| 3000 | 630.0 |
Our goal is to model how the price depends on the size of the house.
```python
import numpy as np

# Training dataset
# x_train represents house size (sq ft)
# y_train represents price ($1000s)
x_train = np.array([1000, 1500, 2000, 2500, 3000])      # feature values
y_train = np.array([230.0, 330.0, 430.0, 530.0, 630.0]) # target values

# Cost function (Mean Squared Error with 1/(2m) convention)
def compute_cost(x, y, w, b):
    """
    Compute the cost function for linear regression.
    """
    m = x.shape[0]  # number of training examples
    cost = 0.0
    # Compute the sum of squared errors
    for i in range(m):
        f_wb = w * x[i] + b  # prediction for x[i]
        error = f_wb - y[i]
        cost += error**2
    # Return the average squared error scaled by 1/(2m)
    cost /= (2 * m)
    return cost
```
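The same cost can also be computed without the explicit loop using NumPy vectorization. The sketch below (an equivalent alternative added here, not part of the original listing; `compute_cost_vec` is a name introduced for illustration) should agree with the loop version:

```python
import numpy as np

x_train = np.array([1000, 1500, 2000, 2500, 3000])
y_train = np.array([230.0, 330.0, 430.0, 530.0, 630.0])

def compute_cost_vec(x, y, w, b):
    # Vectorized 1/(2m) mean squared error: same result as the loop version
    errors = (w * x + b) - y
    return np.sum(errors**2) / (2 * x.shape[0])

# Parameters that reproduce the data exactly should give (essentially) zero cost
print(compute_cost_vec(x_train, y_train, 0.2, 30))
```

Vectorized code is both shorter and faster on large arrays, since the arithmetic runs in compiled NumPy routines instead of a Python-level loop.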
Regression Lines and Their Fit
1. Best Fit Line
Equation: $f_{w,b}(x) = 0.2x + 30$
Explanation: This line closely follows the trend of the training data. Each 500 sq ft increase in house size raises the price by about $100,000 (100 in $1000s units), matching the dataset exactly. The slope and intercept are chosen to minimize the cost function.
Why it’s the best fit:
- Accurately predicts the training data points.
- Cost function is minimal among the three lines.
2. Underfit Line
Equation: $f_{w,b}(x) = 0.1x + 50$
Explanation: This line has a smaller slope than the true trend. It rises more slowly than the actual data, underestimating prices throughout the dataset, with the gap growing for larger houses.
Why it underfits:
- Too simple to capture the linear relationship in the data.
- High bias, low variance.
- Training data points are far from the line, leading to a higher cost.
3. Overfit Line
Equation: $f_{w,b}(x) = 0.3x + 10$
Explanation: This line has a steeper slope than the true trend. It overestimates prices throughout the dataset, and the overestimate widens for larger houses, fitting a trend more extreme than the data supports.
Why it overfits:
- Slope is too extreme compared to the actual trend.
- Cost is higher than the best fit.
- This line could be highly sensitive to small changes or noise in data.
Summary Table

| Line | Equation | Fit Type | Reason |
|---|---|---|---|
| Best Fit | $f(x) = 0.2x + 30$ | Good Fit | Closely matches data, minimal cost |
| Underfit | $f(x) = 0.1x + 50$ | Underfit | Too shallow, misses trend, high bias |
| Overfit | $f(x) = 0.3x + 10$ | Overfit | Too steep, does not generalize well |
```python
import numpy as np
import matplotlib.pyplot as plt

# Colors to match lines in the first plot
colors = ['blue', 'green', 'orange']

# Three regression lines with parameters and color
lines = {
    "best_fit": {"w": 0.2, "b": 30, "color": "blue"},
    "under_fit": {"w": 0.1, "b": 50, "color": "green"},
    "over_fit": {"w": 0.3, "b": 10, "color": "orange"}
}

# Compute and print costs for each line
for name, params in lines.items():
    w, b = params["w"], params["b"]
    cost = compute_cost(x_train, y_train, w, b)
    print(f"Cost for {name} line (w={w}, b={b}): {cost:.2f}")

# Compute costs for plotting
costs = {name: compute_cost(x_train, y_train, params["w"], params["b"])
         for name, params in lines.items()}

# Prepare a smooth range of w values for the U-shaped cost curve
w_range = np.linspace(0.0, 0.4, 200)
b_fixed = 30  # Fix bias near best fit for visualization
cost_curve = [compute_cost(x_train, y_train, w, b_fixed) for w in w_range]

# Plotting
plt.figure(figsize=(12, 5))

# 1st subplot: training data and regression lines
plt.subplot(1, 2, 1)
plt.scatter(x_train, y_train, color='red', label='Training data', s=50)
x_vals = np.linspace(900, 3100, 100)
for i, (name, params) in enumerate(lines.items()):
    y_vals = params["w"] * x_vals + params["b"]
    plt.plot(x_vals, y_vals, color=colors[i], label=f'{name} (w={params["w"]})')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($1000s)')
plt.title('House Prices and Regression Lines')
plt.legend()
plt.grid(True)

# 2nd subplot: U-shaped cost curve with dots for each line
plt.subplot(1, 2, 2)
plt.plot(w_range, cost_curve, color='black', linestyle='-', label='Cost curve (U-shape)')
for i, (name, params) in enumerate(lines.items()):
    plt.scatter(params["w"], costs[name], color=colors[i], s=100, label=f'{name} cost')
plt.xlabel('w (slope)')
plt.ylabel('Cost J(w,b)')
plt.title('Cost vs Slope (U-shaped)')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
```

Cost Function Visualization – 3D
Below you will see visualizations of the three regression lines we discussed: best fit, underfit, and overfit.
For each line, three charts are shown:
- 2D Plot (Size vs Price): The training data points are shown in red, along with the regression line. This illustrates how well each line fits the data.
- Contour Plot: Shows the cost function $J(w,b)$ over a grid of slope ($w$) and intercept ($b$) values, with the selected line's parameters marked.
- 3D Surface Plot: Displays the cost function as a 3D surface. The line’s parameters are highlighted, showing where it sits in the cost landscape.
These visualizations help understand not only the fit of the line in data space but also how the choice of parameters affects the cost function.
```python
import numpy as np
import matplotlib.pyplot as plt

# Loop through each regression line
for name, params in lines.items():
    w_line, b_line, color = params["w"], params["b"], params["color"]
    fig = plt.figure(figsize=(18, 5))

    # 1. 2D Plot: Size vs Price
    ax1 = fig.add_subplot(1, 3, 1)
    ax1.scatter(x_train, y_train, color='red', s=50, label='Training data')
    x_vals = np.linspace(900, 3100, 100)
    y_vals = w_line * x_vals + b_line
    ax1.plot(x_vals, y_vals, color=color, linewidth=2, label=f'{name} line')
    ax1.set_xlabel('Size (sq ft)')
    ax1.set_ylabel('Price ($1000s)')
    ax1.set_title(f'{name} Regression Line')
    ax1.legend()
    ax1.grid(True)

    # 2. Contour plot of cost function
    ax2 = fig.add_subplot(1, 3, 2)
    w_vals = np.linspace(0.0, 0.4, 50)
    b_vals = np.linspace(0, 60, 50)
    W, B = np.meshgrid(w_vals, b_vals)
    J = np.zeros_like(W)
    for i in range(len(b_vals)):
        for j in range(len(w_vals)):
            J[i, j] = compute_cost(x_train, y_train, W[i, j], B[i, j])
    cp = ax2.contour(W, B, J, levels=30, cmap='viridis')
    ax2.scatter(w_line, b_line, color=color, s=100, label=f'{name} line')
    ax2.set_xlabel('w (slope)')
    ax2.set_ylabel('b (intercept)')
    ax2.set_title('Contour of Cost Function')
    ax2.legend()

    # 3. 3D Surface plot
    ax3 = fig.add_subplot(1, 3, 3, projection='3d')
    ax3.plot_surface(W, B, J, cmap='viridis', alpha=0.7)
    ax3.scatter(w_line, b_line, compute_cost(x_train, y_train, w_line, b_line), color=color, s=50)
    ax3.set_xlabel('w (slope)')
    ax3.set_ylabel('b (intercept)')
    ax3.set_zlabel('Cost J(w,b)')
    ax3.set_title('3D Surface of Cost Function')

    plt.tight_layout()
    plt.show()
```
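As a cross-check on the "best fit" parameters (a sketch added here, not part of the original walkthrough), NumPy's built-in least-squares polynomial fit recovers the cost-minimizing slope and intercept directly. Because the dataset is exactly linear, the result matches $w = 0.2$, $b = 30$ up to floating-point error:

```python
import numpy as np

x_train = np.array([1000, 1500, 2000, 2500, 3000])
y_train = np.array([230.0, 330.0, 430.0, 530.0, 630.0])

# Degree-1 least-squares fit: returns [slope, intercept]
w_fit, b_fit = np.polyfit(x_train, y_train, 1)
print(f"w = {w_fit:.4f}, b = {b_fit:.4f}")  # close to w=0.2, b=30
```

This is the same minimum that gradient descent would converge to when applied to $J(w,b)$.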



