Simple and multiple regression example
Contents
Read in small car dataset and plot mpg vs. weight
load carsmall
whos
isdata = isfinite(MPG)&isfinite(Weight)&isfinite(Horsepower);
y = MPG(isdata);
x = Weight(isdata);
N = length(y)
clf
subplot(2,2,1)
plot(x,y,'x')
xlabel('weight [lb]')
ylabel('MPG')
hold on
Name Size Bytes Class Attributes
Acceleration 100x1 800 double
Cylinders 100x1 800 double
Displacement 100x1 800 double
Horsepower 100x1 800 double
MPG 100x1 800 double
Mfg 100x13 2600 char
Model 100x33 6600 char
Model_Year 100x1 800 double
Origin 100x7 1400 char
Weight 100x1 800 double
N =
93
Linear regression analysis
r = corrcoef(x,y)
r = r(1,2)
title(['r = ' num2str(0.01*round(r*100))])
xbar = mean(x);
ybar = mean(y);
sigx = std(x);
sigy = std(y);
a1 = r*sigy/sigx
yfit = ybar + a1*(x - xbar);
plot(x,yfit,'k-')
r =
1.0000 -0.8591
-0.8591 1.0000
r =
-0.8591
a1 =
-0.0086
Use Matlab regress function
X = [x ones(N,1)];
a = regress(y,X)
plot(x,X*a,'r-');
a =
-0.0086
49.2383
Multiple regression using weight and horsepower as predictors
Note weight and horsepower are highly correlated, so the additional
predictive power is unclear.
x1 = x;
x2 = Horsepower(isdata);
r12 = corrcoef(x1,x2);
r12 = r12(1,2);
ry2 = corrcoef(y,x2);
ry2 = ry2(1,2);
x2fit = mean(x2)+(x1-mean(x1))*r12*std(x2)/std(x1);
subplot(2,2,2)
plot(x2,y,'bx')
xlabel('Horsepower')
ylabel('MPG')
title(['r = ' num2str(0.01*round(ry2*100))])
subplot(2,2,3)
plot(x1,x2,'bx',x1,x2fit,'b-')
legend('raw','fit','Location','NorthWest')
xlabel('Weight')
ylabel('Horsepower')
title(['r = ' num2str(0.01*round(r12*100))])
X = [x1 x2 ones(N,1)];
b = regress(y,X)
ry12 = corrcoef(y,X*b);
ry12 = ry12(1,2);
subplot(2,2,4)
plot(X*b,y,'x');
xlabel('b0+b1*weight+b2*power')
ylabel('MPG')
title(['r = ' num2str(0.01*round(ry12*100))])
b =
-0.0066
-0.0420
47.7694
Stepwise regression
x2tilde = x2 - x2fit;
ytilde = y - yfit;
ry2t = corrcoef(ytilde,x2tilde);
ry2t = ry2t(1,2)
stepwisefit([x1 x2],y)
ry2t =
-0.2308
Initial columns included: none
Step 1, added column 1, p=3.24054e-28
Step 2, added column 2, p=0.02686
Final columns included: 1 2
'Coeff' 'Std.Err.' 'Status' 'P'
[-0.0066] [ 0.0011] 'In' [1.3519e-08]
[-0.0420] [ 0.0187] 'In' [ 0.0269]
ans =
-0.0066
-0.0420