# TOPIC 3 (2015)

Notes in English
University | Universidad Pompeu Fabra (UPF) |

Degree | Business Administration and Management - 2nd year |

Course | Econometrics I |

Year of the notes | 2015 |

Pages | 16 |

Upload date | 10/04/2016 |

Downloads | 15 |

Uploaded by | laparicioimbuluzqueta |

### Text preview

ECONOMETRICS I
TOPIC 3: MULTIPLE REGRESSION
LINEAR REGRESSION WITH MULTIPLE REGRESSORS: ESTIMATION
Topic 2 ended on a worried note:
Although school districts with lower student-teacher ratio (STR) tend to have higher scores in the California data set, perhaps
students from districts with small classes have other advantages that help them perform well on standardized tests. Could this have
produced misleading results, and, if so, what can be done?
MOTIVATION
Is our regression correctly capturing the truth? In other words, is the effect of increasing or decreasing STR faithfully reflected in our estimate?
Omitted factors can, in fact, make the OLS estimator of the effect of class size on test scores misleading or, more precisely, biased. This
topic explains this “omitted variable bias” and introduces multiple regression, a method that can eliminate this bias.

KEY IDEA OF MULTIPLE REGRESSION: If we have data on these omitted variables, then we can include them as additional
regressors and thereby estimate the effect of one regressor (the student-teacher ratio) while holding constant the other
variables.

ASPECTS IN COMMON WITH REGRESSION WITH A SINGLE REGRESSOR:
o The coefficients can be estimated from data using OLS.
o The OLS estimators are random variables because they depend on data from a random sample.
o In large samples, the sampling distribution of the OLS estimators is approximately normal.

Omitted variable bias = If the regressor (the student-teacher ratio) is correlated with a variable that has been omitted from the
analysis (the percentage of English learners, that is, students who have difficulties with the language and are still studying it) and
that determines, in part, the dependent variable (test scores), then the OLS estimator will have OMITTED VARIABLE BIAS.

Omitted variable bias occurs when two conditions are true:
[1] The omitted variable is correlated with the included regressor X.

[2] The omitted variable is a determinant of the dependent variable Y.

Laura Aparicio
EXAMPLE #1: PERCENTAGE OF ENGLISH LEARNERS
CONDITION 1: There is a small correlation [0.19] that suggests that districts with more English learners tend to have a
higher student-teacher ratio (larger classes).

CONDITION 2: It is plausible that students who are still learning English will do worse on standardized tests than native
English speakers, in which case the percentage of English learners is a determinant of test scores.

Thus omitting the percentage of English learners may introduce omitted variable bias.

EXAMPLE #2: TIME OF DAY OF THE TEST
CONDITION 1: If the time of day of the test varies from one district to the next in a way that is unrelated to class size, then
the time of day and class size would be uncorrelated.

CONDITION 2: Conversely, the time of day of the test could affect scores (alertness varies through the school day).

Thus omitting the time of day of the test does NOT result in omitted variable bias.

EXAMPLE #3: PARKING LOT SPACE PER PUPIL
CONDITION 1: Schools with more teachers per pupil probably have more teacher parking space.

CONDITION 2: Under the assumption that learning takes place in the classroom, not the parking lot, parking lot space has
no direct effect on learning.

Thus omitting the parking lot space per pupil does NOT result in omitted variable bias.

SUMMARY:
OMITTED VARIABLE BIAS AND THE FIRST LEAST SQUARES ASSUMPTION: Omitted variable bias means that the first least squares
assumption [𝐸(𝑢𝑖 |𝑋𝑖 ) = 0] does not hold. WHY? If one of the factors included in the error term is correlated with Xi, this means that the
error term is also correlated with Xi. Because ui and Xi are correlated, the conditional mean of ui given Xi is nonzero.

OLS ESTIMATOR IS BIASED AND INCONSISTENT
[1] Omitted variable bias is a problem whether the sample size is large or small.

[2] Whether this bias is large or small in practice depends on the correlation $\rho_{Xu}$ between the regressor and the error term. The larger $|\rho_{Xu}|$ is, the larger the bias.

[3] The direction of the bias in $\hat{\beta}_1$ depends on whether X and u are positively or negatively correlated.
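
Points [2] and [3] can be illustrated with a small simulation. This is only a sketch: it assumes the standard large-sample result that the single-regressor slope converges to β1 + ρXu·σu/σX, and every name and number in it (simulate_ovb, the coefficients, the sample size) is made up for illustration rather than taken from the California data.

```python
import numpy as np

def simulate_ovb(n=200_000, beta1=-1.0, beta_w=-0.5, corr_xw=0.5, seed=0):
    """Regress Y on X alone while W is omitted, and compare with the bias formula."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    # W is correlated with X (condition 1) and is a determinant of Y (condition 2).
    w = corr_xw * x + np.sqrt(1 - corr_xw**2) * rng.standard_normal(n)
    e = rng.standard_normal(n)
    y = 2.0 + beta1 * x + beta_w * w + e

    # Single-regressor OLS slope: sample cov(X, Y) / sample var(X).
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    # With W omitted, the error term of the short regression is u = beta_w * W + e.
    u = beta_w * w + e
    rho_xu = np.corrcoef(x, u)[0, 1]
    predicted = beta1 + rho_xu * np.std(u, ddof=1) / np.std(x, ddof=1)
    return slope, predicted

slope, predicted = simulate_ovb()
print(f"OLS slope with W omitted:           {slope:.3f}")
print(f"beta1 + rho_Xu * sigma_u / sigma_X: {predicted:.3f}")
```

In this setup X and u are negatively correlated (beta_w < 0 and corr_xw > 0), so the estimated slope lies below the true beta1 = -1. Flipping the sign of either parameter flips the direction of the bias.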

Omitted variable bias formula: PROOF
EXAMPLE
THREE WAYS TO OVERCOME OMITTED VARIABLE BIAS:
A. Run a randomized controlled experiment in which the treatment (STR) is randomly assigned: then EL_PCT (the percentage of English learners) is still a determinant of TestScore [𝛽𝑤 ≠ 0], but EL_PCT is uncorrelated with STR, as is any other factor in u.

B. Adopt a “cross tabulation” approach, with finer and finer gradations of STR and EL_PCT. This method consists of dividing the data into as many groups as possible. Some problems: we will soon run out of data, and the approach ignores other determinants such as family income or parental education.

EXAMPLE:
i. Districts are broken into four categories that correspond to the quartiles of the distribution of the percentage of English learners across districts.

ii. Within each of these four categories, districts are further broken down into two groups, depending on whether the student-teacher ratio is small (STR < 20) or large (STR ≥ 20).

iii. TOTAL: districts are divided into 8 groups.

Over the full sample of 420 districts, the average test score is 7.4 points higher in districts with a low student-teacher ratio than in districts with a high one; the t-statistic is 4.04, so the null hypothesis that the mean test score is the same in the two groups is rejected at the 1% significance level.

But if we look at the final four rows and hold the percentage of English learners constant, the difference in performance
between districts with high and low student-teacher ratios is perhaps half (or less) of the overall estimate of 7.4 points.

C. Multiple regression: the omitted variable is no longer omitted because we include EL_PCT as an additional regressor.

The multiple regression model is a linear regression model that includes multiple regressors, X1, X2, …, Xk. Associated with each regressor
is a regression coefficient, β1, β2, …, βk. The coefficient β1 is the expected change in Y associated with a 1-unit change in X1, holding the
other regressors constant. The other regression coefficients have an analogous interpretation.

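POPULATION MULTIPLE REGRESSION MODEL:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$, where $i = 1, \ldots, n$
POPULATION REGRESSION LINE:
$E(Y_i \mid X_{1i}, \ldots, X_{ki}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$
β0 is the intercept.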

VARIABLES:
Y is the dependent variable.

X1, …, Xk are the independent (or explanatory) variables. One or more of the independent variables in the multiple regression model are sometimes referred to as control variables.

β0 is the intercept parameter. It is the expected value of Yi when X1 and X2 are zero. In other words, it determines how far up the Y axis the population regression line starts.

β1 is the slope coefficient of X1. It is the causal effect of X1 on Y: the parameter that gives the change in Y for a unit change in X1, holding other factors constant. The interpretation of the coefficient β1 is different than it was when there was only one regressor: β1 is the effect on Y of a unit change in X1, holding X2 constant or controlling for X2.

β2 is the slope coefficient of X2.

ui are unobserved factors that influence Y, other than the variables X. Now that they are included as regressors, u no longer contains the formerly omitted variables that were correlated with both Y and X.

The error term ui in the multiple regression model is HOMOSKEDASTIC if the variance of the conditional distribution of ui given X1, …, Xk is constant and thus does not depend on the values of X. Otherwise, the error term is HETEROSKEDASTIC.

All these coefficients can be estimated using OLS.

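The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are the values of $b_0, b_1, \ldots, b_k$ that minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})^2$. The OLS predicted values $\hat{Y}_i$ and residuals $\hat{u}_i$ are:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki}$ and $\hat{u}_i = Y_i - \hat{Y}_i$, for $i = 1, \ldots, n$.
The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ and residuals $\hat{u}_i$ are computed from a sample of n observations of $(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$. These
are estimators of the unknown true population coefficients $\beta_0, \beta_1, \ldots, \beta_k$ and error term $u_i$.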
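
As a rough sketch of how these quantities can be computed, the snippet below minimizes the sum of squared prediction mistakes with NumPy's least-squares solver. The helper name ols and the simulated data are assumptions made for illustration; they are not part of the course material.

```python
import numpy as np

def ols(y, regressors):
    """Return (coefficients, predicted values, residuals); the intercept comes first."""
    y = np.asarray(y, dtype=float)
    # Prepend the constant regressor X0 = 1 so that b0 is estimated as well.
    X = np.column_stack([np.ones(len(y)), regressors])
    # lstsq picks the b that minimizes sum((y - X @ b) ** 2).
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat   # OLS predicted values
    u_hat = y - y_hat      # OLS residuals
    return beta_hat, y_hat, u_hat

# Made-up data, only to exercise the function.
rng = np.random.default_rng(1)
x1 = rng.normal(20, 2, size=500)
x2 = rng.normal(15, 10, size=500)
y = 700 - 1.0 * x1 - 0.5 * x2 + rng.normal(0, 5, size=500)

beta_hat, y_hat, u_hat = ols(y, np.column_stack([x1, x2]))
print("estimated coefficients:", beta_hat)
```

Because the regression includes an intercept, the residuals sum to zero and are uncorrelated in the sample with each regressor, which is a quick sanity check on any OLS routine.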

APPLICATION TO TEST SCORES AND THE STUDENT-TEACHER RATIO
BEFORE
We used OLS to estimate the intercept and slope coefficient of the regression relating TestScore to STR, using our 420 observations for California school districts.

PROBLEM: IS THIS RELATIONSHIP MISLEADING BECAUSE STR MIGHT BE PICKING UP THE EFFECT OF HAVING MANY ENGLISH LEARNERS IN DISTRICTS WITH LARGE CLASSES?

NOW
We are now in a position to address this concern by using OLS to estimate a multiple regression for our 420 districts in which the dependent variable is the test score (Y) and there are two regressors:
1. X: STR
2. W: PERCENTAGE OF ENGLISH LEARNERS
The estimated effect on test scores of a change in STR in the multiple regression is approximately half as large as when the STR was
the only regressor. This difference occurs because the coefficient on STR in the multiple regression is the effect of a change in STR,
holding constant (or controlling for) PctEL, whereas in the single-regressor regression, PctEL is not held constant.

These two estimates can be reconciled by concluding that there is omitted variable bias in the estimate from the single-regressor
model.

Previously we saw that districts with a high percentage of English learners tend to have (1) low test scores and (2) a high student-teacher
ratio. If the fraction of English learners is omitted from the regression, reducing the STR is estimated to have a larger effect on test
scores, but this estimate reflects BOTH the effect of a change in the STR and the omitted effect of having fewer English learners in the
district.
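
The same mechanism can be reproduced on simulated data. The sketch below is not the California data set: the variable names, the coefficients and the strength of the link between class size and English learners are all invented, chosen only so that districts with more English learners have larger classes and lower scores.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
pct_el = rng.uniform(0, 60, size=n)                     # hypothetical share of English learners
str_ = 18 + 0.05 * pct_el + rng.normal(0, 1.5, size=n)  # larger classes where pct_el is high
score = 700 - 1.0 * str_ - 0.6 * pct_el + rng.normal(0, 8, size=n)

def ols_coefs(y, *columns):
    """OLS coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

short = ols_coefs(score, str_)           # pct_el omitted
long_ = ols_coefs(score, str_, pct_el)   # pct_el held constant

print(f"coefficient on STR, single regressor:    {short[1]:.2f}")
print(f"coefficient on STR, controlling for EL:  {long_[1]:.2f}")
```

In this artificial example the true effect of the class-size variable is -1.0. The single-regressor estimate is much more negative because it also picks up the effect of pct_el, while the multiple regression recovers a value close to -1.0.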

MEASURES OF FIT IN MULTIPLE REGRESSION = Three commonly used summary statistics in multiple regression are the Standard Error
of the Regression (SER), the regression R2 and the adjusted R2 (also known as $\bar{R}^2$).

All three measure how well the OLS estimate of the multiple regression line describes, or “fits”, the data.

STANDARD ERROR OF THE REGRESSION (SER)
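It estimates the standard deviation of the error term ui. Thus the SER is a measure of the spread of the distribution of Y
around the regression line. In multiple regression, the SER is:
$SER = s_{\hat{u}} = \sqrt{s_{\hat{u}}^2}$, where $s_{\hat{u}}^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n-k-1}$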
The only difference between this formula and the SER for the single-regressor model is that here the divisor is 𝑛 − 𝑘 − 1
rather than 𝑛 − 2. As in previous sections, using this denominator instead of n is called a degrees-of-freedom adjustment.

If there is a single regressor, then 𝑘 = 1, so both formulas are the same. When n is large, the effect of the degrees-of-freedom
adjustment is negligible.

THE R2
The mathematical definition of the R2 is the same as for regression with a single regressor:
$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
In multiple regression, the R2 increases whenever a regressor is added, unless the estimated coefficient on the added
regressor is exactly zero. In practice, it is extremely unusual for an estimated coefficient to be exactly zero, so in general the
SSR will decrease when a new regressor is added. This means that the R2 generally increases (and never decreases) when
a new regressor is added.

Because the R2 increases when a new variable is added, an increase in the R2 does not mean that adding a variable actually
improves the fit of the model. In this sense, the R2 gives an inflated estimate of how well the regression fits the data.

THE ADJUSTED R2
One way to correct for this is to deflate or reduce R2 by some factor, and this is what the adjusted R2 does:
$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS}$
THREE USEFUL THINGS TO KNOW:
[1] $\bar{R}^2$ is always less than R2 because the factor (𝑛 − 1)/(𝑛 − 𝑘 − 1) is always greater than 1.
[2] Adding a regressor has two opposite effects:
a. The SSR falls, which increases the $\bar{R}^2$.
b. The factor (𝑛 − 1)/(𝑛 − 𝑘 − 1) increases, which decreases the $\bar{R}^2$.

Whether the $\bar{R}^2$ increases or decreases depends on which of these two effects is stronger.

[3] The $\bar{R}^2$ can be negative. This happens when the regressors, taken together, reduce SSR by such a small amount
that this reduction fails to offset the factor (𝑛 − 1)/(𝑛 − 𝑘 − 1).
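
The three measures of fit can be computed in a few lines once the fitted values are available. The sketch below assumes the usual definitions SER = sqrt(SSR/(n − k − 1)), R2 = 1 − SSR/TSS and adjusted R2 = 1 − [(n − 1)/(n − k − 1)]·SSR/TSS; the function name and the tiny data set are invented for illustration.

```python
import math

def fit_measures(y, y_hat, k):
    """Return (SER, R2, adjusted R2) for a regression with k regressors plus an intercept."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ser = math.sqrt(ssr / (n - k - 1))                     # degrees-of-freedom adjustment
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    return ser, r2, adj_r2

# Made-up outcomes and fitted values, treated as coming from a regression with k = 2.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_hat = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
ser, r2, adj_r2 = fit_measures(y, y_hat, k=2)
print(f"SER = {ser:.4f}, R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

Here SSR = 1.5 and TSS = 17.5, so the adjusted R2 (about 0.857) sits below the R2 (about 0.914), as point [1] says it must.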

Using the R2 and adjusted R2 is useful because they quantify the extent to which the regressors account for, or explain, the variation in the dependent variable.

Nevertheless, heavy reliance on these statistics can be a trap.

The decision about whether to include a variable in a multiple regression should be based on whether including that variable allows you to better estimate the causal effect of interest. We return later to the issue of how to decide which variables to include and which to exclude.

LEAST SQUARES ASSUMPTIONS
ASSUMPTION #1: THE CONDITIONAL DISTRIBUTION OF ui GIVEN X1i, …, Xki HAS A MEAN OF ZERO
This assumption means that sometimes Yi is above the population regression line and sometimes Yi is below the population regression line, but on average over the population Yi falls on the population regression line.

Therefore, for any value of the regressors, the expected value of ui is zero. THIS IS THE KEY ASSUMPTION THAT MAKES THE OLS ESTIMATORS UNBIASED.

ASSUMPTION #2: (X1i, …, Xki, Yi), i = 1, …, n ARE INDEPENDENTLY AND IDENTICALLY DISTRIBUTED
This assumption holds automatically if the data are collected by simple random sampling.

ASSUMPTION #3: LARGE OUTLIERS (= observations with values far outside the usual range of data) ARE UNLIKELY
This assumption serves as a reminder that, as in the single-regressor case, the OLS estimator of the coefficients in the multiple regression model can be sensitive to large outliers. In other words, we assume that the regressors and the dependent variable have nonzero finite fourth moments.

ASSUMPTION #4: NO PERFECT MULTICOLLINEARITY
The regressors are said to exhibit perfect multicollinearity if one of the regressors is a perfect linear function of the other regressors, which makes it impossible to calculate the OLS estimator. At an intuitive level, perfect multicollinearity is a problem because you are asking the regression to answer an illogical question. In multiple regression, the coefficient on one of the regressors is the effect of a change in that regressor, holding the other regressors constant. In the hypothetical regression of TestScore on STR and STR, the coefficient on the first occurrence of STR is the effect on test scores of a change in STR, holding constant STR. This makes no sense.

In general, the software will do one of two things: either it will drop one of the occurrences of STR or it will refuse to calculate the OLS estimates and give an error message.

SUMMARY
The coefficients in multiple regression can be estimated by OLS. When the four least squares assumptions are satisfied, the OLS
estimators are unbiased, consistent and normally distributed in large samples.

EXTRA: IMPLICATIONS OF ASSUMPTION #1
CONTROL VARIABLES
IMPLICATION ON CONDITIONAL MEAN INDEPENDENCE
EXTRA: ASSUMPTION #4 IN STATA
Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation
is summarized in the SAMPLING DISTRIBUTION OF THE OLS ESTIMATORS.

Under the least squares assumptions:
[1] The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are unbiased and consistent estimators of $\beta_0, \beta_1, \ldots, \beta_k$ in the linear multiple regression model.

[2] In large samples, the joint distribution of $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ is well approximated by a multivariate normal distribution, which is the extension of the bivariate normal distribution to the general case of two or more jointly normal random variables.

As discussed previously, PERFECT MULTICOLLINEARITY arises when one of the regressors is a perfect linear combination of the other
regressors. We are going to see:
I. Some examples of perfect multicollinearity.

II. How perfect multicollinearity can arise, and be avoided, in regressions with multiple binary regressors.

III. What imperfect multicollinearity is.
(I) EXAMPLES OF PERFECT MULTICOLLINEARITY: We’ll examine three hypothetical regressions.
EXAMPLE #1. We have three regressors: STR, PctEL (percentage of English learners) and FracEL (fraction of English learners,
which varies between 0 and 1). The regressors would be perfectly multicollinear because PctEL = 100·FracEL.
EXAMPLE #2. We have two regressors: STR and NVS (“not very small classes”, a binary variable that equals 1 if STR ≥ 12
and 0 otherwise). But, in fact, there are no districts in our data set with STR < 12; as the scatterplot shows, the smallest
value of STR is 14. Now recall that the linear regression model with an intercept can equivalently be thought of as including a
regressor, X0, that equals 1 for all i. The regressors would be perfectly multicollinear because NVS = X0 for every district in the data.
EXAMPLE #3. We have three regressors: STR, PctEL (percentage of English learners) and PctES (percentage of English
speakers). The regressors would be perfectly multicollinear because PctEL = 100 – PctES, which can also be written as PctEL =
100·X0 – PctES.

IMPORTANT! Perfect multicollinearity is a feature of the entire set of regressors. If either the intercept (X0) or one of the offending regressors
were excluded from this regression, the regressors would not be perfectly multicollinear.

(II) THE DUMMY VARIABLE TRAP
Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors.

Imagine you have partitioned the school districts into three categories: rural, suburban and urban. If you include all three binary variables
in the regression along with a constant, the regressors would be perfectly multicollinear because Rural + Suburban + Urban = 1 = X0.
To solve this problem, exclude one of these four variables, either one of the binary indicators or the constant term.
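
One way to see the dummy variable trap numerically is to check the rank of the regressor matrix. This is a sketch on six invented districts; the category labels and variable names are assumptions for illustration. When the rank is smaller than the number of columns, one regressor is an exact linear combination of the others and the OLS estimator cannot be computed uniquely.

```python
import numpy as np

# Hypothetical categories for six districts.
category = np.array(["rural", "suburban", "urban", "urban", "rural", "suburban"])
rural = (category == "rural").astype(float)
suburban = (category == "suburban").astype(float)
urban = (category == "urban").astype(float)
const = np.ones(len(category))  # the constant regressor X0

# Constant plus all three dummies: Rural + Suburban + Urban = X0 in every row.
X_trap = np.column_stack([const, rural, suburban, urban])
# Dropping one dummy (here Rural) removes the exact linear dependence.
X_ok = np.column_stack([const, suburban, urban])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print(f"with all dummies: rank {rank_trap} for {X_trap.shape[1]} columns")
print(f"one dummy dropped: rank {rank_ok} for {X_ok.shape[1]} columns")
```

With Rural dropped, the coefficients on Suburban and Urban are read relative to rural districts, which become the base category captured by the intercept.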

(III) IMPERFECT MULTICOLLINEARITY
Imperfect multicollinearity means that two or more of the regressors are highly correlated. Imperfect multicollinearity does not
pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the
various regressors when these regressors are potentially correlated.

If the regressors are imperfectly multicollinear, then the coefficient on at least one individual regressor will be imprecisely
estimated.

EXAMPLE: Consider the regression of TestScore on STR and PctEL. Suppose we were to add a third regressor, the percentage
of the district’s residents who are first-generation immigrants. First-generation immigrants often speak English as a second
language, so the variables PctEL and percentage immigrants will be highly correlated.

Districts with many recent immigrants will tend to have many students who are still learning English. Because these two
variables are highly correlated, it would be difficult to use these data to estimate the partial effect on test scores of an increase
in PctEL, holding constant the percentage of immigrants.

In other words, the data set provides little information about what happens to test scores when the percentage of English
learners is low but the fraction of immigrants is high, or vice versa. If the least squares assumptions hold, then the OLS
estimator of the coefficient on PctEL in this regression will be unbiased; however, it will have a larger variance than if the regressors PctEL and percentage immigrants were uncorrelated.

More generally, when multiple regressors are imperfectly multicollinear, the coefficients on one or more of these regressors will be imprecisely estimated (that is, they will have a large sampling variance).
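
How quickly the variance grows can be sketched with the textbook large-sample formula for the case of two regressors and homoskedastic errors, var(β̂1) = (1/n) · [1/(1 − ρ²)] · σu²/σX1², where ρ is the correlation between X1 and X2. Homoskedasticity is an extra assumption not stated above, and the function name and input values are made up for illustration.

```python
def var_beta1_hat(n, sigma_u2, sigma_x1_2, rho_x1x2):
    """Large-sample variance of beta1_hat with two regressors and homoskedastic errors."""
    return (1 / n) * (1 / (1 - rho_x1x2 ** 2)) * sigma_u2 / sigma_x1_2

# Same sample size and variances, increasingly correlated regressors.
for rho in (0.0, 0.5, 0.9, 0.99):
    variance = var_beta1_hat(n=420, sigma_u2=1.0, sigma_x1_2=1.0, rho_x1x2=rho)
    print(f"corr(X1, X2) = {rho:4.2f} -> var(beta1_hat) = {variance:.5f}")
```

The variance is multiplied by 1/(1 − ρ²) relative to the uncorrelated case, so a correlation of 0.9 inflates it by a factor of roughly 5.3, and the factor grows without bound as the correlation approaches 1 (perfect multicollinearity).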

PERFECT MULTICOLLINEARITY: It often signals the presence of a logical error.

IMPERFECT MULTICOLLINEARITY: It is not necessarily an error, but rather just a feature of OLS, your data, and the question you are trying to answer.

Perfect multicollinearity, which occurs when one regressor is an exact linear function of the other regressors, usually arises from a
mistake in choosing which regressors to include in a multiple regression. Solving perfect multicollinearity requires changing the set of
regressors.

CONCLUSION
Regression with a single regressor is vulnerable to omitted variable bias: If an omitted variable is a determinant of the dependent
variable and is correlated with the regressor, then the OLS estimator of the slope coefficient will be biased and will reflect both the
effect of the regressor and the effect of the omitted variable.

Multiple regression makes it possible to mitigate omitted variable bias by including the omitted variable in the regression. The coefficient
on a regressor, X, in multiple regression is the partial effect of a change in X, holding constant the other included regressors.

Finally, the least squares assumptions for multiple regression are extensions of the three least squares assumptions for regression with
a single regressor, plus a fourth assumption ruling out perfect multicollinearity. Because the regression coefficients are estimated using
a single sample, the OLS estimators have a joint sampling distribution and therefore have sampling uncertainty.

This sampling uncertainty must be quantified as part of an empirical study, and the ways to do so in the multiple regression model are
the topic of the next chapter.

HYPOTHESIS TESTS AND CONFIDENCE INTERVALS IN MULTIPLE REGRESSION:
INFERENCE
First of all, hypothesis tests and confidence intervals for a single regression coefficient are carried out using essentially the same
procedures that were used in the one-variable linear regression model of TOPIC 2. For example, a 95% confidence interval for β1 is
given by 𝛽̂1 ± 1.96𝑆𝐸(𝛽̂1 ).
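
As a small illustration of the single-coefficient case, the snippet below computes a t-statistic and a 95% confidence interval. The estimate and standard error are invented numbers, and the helper names are assumptions made for this sketch.

```python
def t_statistic(beta_hat, se, beta_null=0.0):
    """t-statistic for testing that the coefficient equals beta_null."""
    return (beta_hat - beta_null) / se

def confidence_interval_95(beta_hat, se):
    """Large-sample 95% confidence interval: estimate +/- 1.96 standard errors."""
    half_width = 1.96 * se
    return beta_hat - half_width, beta_hat + half_width

# Hypothetical coefficient estimate and standard error.
beta_hat, se = -1.0, 0.4
t = t_statistic(beta_hat, se)
low, high = confidence_interval_95(beta_hat, se)
reject_at_5pct = abs(t) > 1.96

print(f"t = {t:.2f}, 95% CI = ({low:.3f}, {high:.3f}), reject at 5%: {reject_at_5pct}")
```

The two checks agree by construction: the null value 0 lies outside the 95% interval exactly when |t| exceeds 1.96.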

NEW!! Hypotheses involving more than one restriction on the coefficients are called JOINT HYPOTHESES, which can be tested using an
F-statistic.

① HOW TO FORMULATE JOINT HYPOTHESES
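Why can’t we just test the individual coefficients one at a time? Because the coefficients may be individually insignificant yet
jointly significant, and because one-at-a-time testing does not have the intended significance level: with two independent
t-statistics and 5% critical values, a true joint null is rejected 1 − 0.95² = 9.75% of the time. So, the best approach, especially when the regressors are highly correlated, is
the F-statistic.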

② HOW TO TEST THEM USING AN F-STATISTIC
When q = 2: the F-statistic combines the two t-statistics $t_1$ and $t_2$ as $F = \frac{1}{2} \cdot \frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2} t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}$, where $\hat{\rho}_{t_1,t_2}$ is an estimator of the correlation between the two t-statistics. If the t-statistics are uncorrelated, this simplifies to the average of the
squared t-statistics.

When q = 1: the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic.

When there are q restrictions: the formula is complicated and is normally incorporated in regression
software. In large samples, under the null hypothesis the F-statistic is distributed as 𝐹𝑞,∞ . Thus the critical values
for the F-statistic can be obtained from the tables.
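
For the q = 2 case, the calculation can be sketched directly from the two t-statistics, assuming the standard formula F = ½ · (t1² + t2² − 2ρ̂·t1·t2)/(1 − ρ̂²), where ρ̂ is the estimated correlation between the t-statistics. The function name and the input values are hypothetical.

```python
def f_stat_two_restrictions(t1, t2, rho):
    """F-statistic for a joint null with q = 2, from two t-statistics and their correlation."""
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)

# Uncorrelated t-statistics: F is just the average of the squared t-statistics.
f_uncorrelated = f_stat_two_restrictions(2.0, 1.0, rho=0.0)

# Same t-statistics, but positively correlated.
f_correlated = f_stat_two_restrictions(2.0, 1.0, rho=0.5)

# Large-sample 5% critical value of the F(2, infinity) distribution.
CRITICAL_5PCT_Q2 = 3.00

for label, f in (("rho = 0.0", f_uncorrelated), ("rho = 0.5", f_correlated)):
    print(f"{label}: F = {f:.2f}, reject at 5%: {f > CRITICAL_5PCT_Q2}")
```

In this made-up example neither F-statistic exceeds 3.00, so the joint null is not rejected even though one of the individual t-statistics is above 1.96.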

P-value: $\Pr[F_{q,\infty} > F^{act}]$, where $F^{act}$ is the value of the F-statistic actually computed.
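Heteroskedasticity-robust F-statistic: In some software packages you must select a “robust” option.

Homoskedasticity-only F-statistic: $F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)} = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$. There is a link between the F-statistic and R2: a large F-statistic should be
associated with a substantial increase in the R2.

Restricted regression: the null hypothesis is forced to be true.

Unrestricted regression: the alternative hypothesis is allowed to be true.

$k_{unrestricted}$ = number of regressors in the unrestricted regression.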
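
The link between the homoskedasticity-only F-statistic and the R2 can be sketched as follows, assuming the usual formula F = [(R2 unrestricted − R2 restricted)/q] / [(1 − R2 unrestricted)/(n − k − 1)], with k the number of regressors in the unrestricted regression. The function name and the R2 values are invented for illustration.

```python
def f_homoskedasticity_only(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from the R2 of the two regressions."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# Hypothetical fit: imposing q = 2 restrictions lowers the R2 from 0.44 to 0.41.
f_stat = f_homoskedasticity_only(
    r2_unrestricted=0.44, r2_restricted=0.41, q=2, n=420, k_unrestricted=3
)
print(f"F = {f_stat:.2f}")
```

A larger drop in the R2 when the restrictions are imposed gives a larger F-statistic. This version is valid only if the errors are homoskedastic; otherwise the heteroskedasticity-robust statistic reported by the software should be used.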

SPECIAL CASE: TEST WITH ONLY ONE RESTRICTION INVOLVING MULTIPLE COEFFICIENTS
APPROACH 1: TEST THE RESTRICTION DIRECTLY (use the F-statistic when q = 1).

APPROACH 2: TRANSFORM THE REGRESSION.

Example: Suppose there are only two regressors, so the population regression has the form $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$.
MODEL SPECIFICATION FOR MULTIPLE REGRESSION
So far, we’ve distinguished variables of interest and control variables.

[1] Variable of interest: we wish to estimate its causal effect.

[2] Control variables: regressors included to hold constant factors that, if neglected, could lead the estimated causal effect of
interest to suffer from omitted variable bias. Their coefficients may not represent causal effects.

HOW CAN WE KNOW WHETHER TO INCLUDE A PARTICULAR VARIABLE? Regression specification proceeds by first determining a base
specification chosen to address concerns about omitted variable bias. The base specification can be modified by including additional
regressors that address other potential sources of omitted variable bias. Simply choosing the specification with the highest R2 can lead
to regression models that do not estimate the causal effect of interest.

...