# TOPIC 5 (2015)

| Field | Value |
|---|---|
| Notes language | English |
| University | Universidad Pompeu Fabra (UPF) |
| Degree | Business Administration and Management (Administración y Dirección de Empresas), 2nd year |
| Course | Econometrics I |
| Year of the notes | 2015 |
| Pages | 5 |
| Upload date | 10/04/2016 |
| Uploaded by | laparicioimbuluzqueta |


ECONOMETRICS I
TOPIC 5: ASSESSING STUDIES BASED ON MULTIPLE REGRESSION
What makes a study that uses multiple regression reliable or unreliable?
To answer this question, we use a framework that evaluates statistical studies by asking whether the analysis is internally and externally valid:
INTERNALLY VALID: a study is internally valid if its statistical inferences about causal effects are valid for the population being studied (the one from which the sample was drawn).

EXTERNALLY VALID: a study is externally valid if its inferences and conclusions can be generalized to other populations and settings.

① POTENTIAL THREATS TO EXTERNAL VALIDITY arise from differences between the population and setting studied and the
population and setting of interest:
DIFFERENCES IN POPULATIONS. These can arise from differences in the characteristics of the populations, from geographical differences, or because the study is out of date. For example, laboratory studies of the toxic effects of chemicals typically use animal populations like mice (the population studied), but the results are used to write health and safety regulations for human populations (the population of interest).

DIFFERENCES IN SETTINGS. These include differences in the institutional environment (public universities versus religious universities), differences in laws (such as legal penalties), or differences in the physical environment (California versus Alaska). For example, a study of the effect of an anti-drinking advertising campaign on college binge drinking might not generalize to another, otherwise identical group of college students if the legal penalties for drinking at the two colleges differ.

SOLUTION: External validity must be judged using specific knowledge of the populations and settings studied and those of interest.

These threats are best minimized at the early stages of a study, before the data are collected. Moreover, if there are several studies of different but related populations, similar findings bolster claims to external validity, while differences in their findings that are not readily explained cast doubt on it.

In regression estimation of causal effects, there are TWO TYPES OF ② THREATS TO INTERNAL VALIDITY:
[1] First, OLS estimators will be biased and inconsistent if the regressors and the error term are correlated (VIOLATION OF LEAST SQUARES ASSUMPTION #1, that the error term has conditional mean zero given the regressors). This happens in the following cases:
Laura Aparicio
Each threat below is described by its PROBLEM and its SOLUTION.
OMITTED VARIABLES

PROBLEM: Omitted variable bias arises when a variable that both determines Y and is correlated with one or more of the included regressors is omitted from the regression. This bias persists even in large samples, so the OLS estimator is inconsistent.

SOLUTION: The decision whether to include a variable involves a trade-off between bias (if it's not included) and the variance of the coefficient of interest (if it's included). There are 4 steps that can help you decide (summary table [*]).

However, adding an omitted variable to a regression is not an option if you do not have data on that variable and there are no adequate control variables. In this case, there are three solutions:
#1: Use panel data, in which the same observational unit is observed at different points in time.

#2: Use instrumental variables.

#3: Use a randomized controlled experiment.
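To see that omitted variable bias does not shrink as the sample grows, here is a minimal simulation sketch. The data-generating process (W omitted, X = 0.5W + noise, true coefficients 1 and 2) and the helper names `ols_slope`, `partial_out`, and `ovb_demo` are hypothetical illustrations, not part of the notes. Under this setup the regression of Y on X alone converges to β1 + β2·Cov(X, W)/Var(X) = 1 + 2·0.5/1.25 = 1.8 rather than to 1. Including W recovers a slope near 1.

```python
import random


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partial_out(x, w):
    """Residuals from an OLS regression of x on w (with an intercept)."""
    b = ols_slope(w, x)
    a = sum(x) / len(x) - b * sum(w) / len(w)
    return [xi - a - b * wi for xi, wi in zip(x, w)]


def ovb_demo(n, beta1=1.0, beta2=2.0, seed=0):
    """Return (short, full) OLS estimates of beta1 on simulated data.

    Hypothetical data-generating process:
        W ~ N(0, 1),  X = 0.5 W + N(0, 1),  Y = beta1 X + beta2 W + N(0, 1)
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * wi + rng.gauss(0, 1) for wi in w]
    y = [beta1 * xi + beta2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    short = ols_slope(x, y)  # W omitted
    # W included: by Frisch-Waugh-Lovell, the coefficient on X equals the
    # slope of Y on the part of X that W does not explain.
    full = ols_slope(partial_out(x, w), y)
    return short, full


for n in (1_000, 50_000):
    short, full = ovb_demo(n)
    print(f"n={n}: W omitted {short:.3f}, W included {full:.3f}")
```

The Frisch-Waugh-Lovell shortcut is used only to avoid needing a matrix library. Any OLS routine that regresses Y on both X and W would give the same coefficient on X.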

INCORRECT FUNCTIONAL FORM IS USED

PROBLEM: If the true population regression function is nonlinear but the estimated regression is linear, then this functional form misspecification makes the OLS estimator biased.

SOLUTION: Functional form misspecification often can be detected by plotting the data and the estimated regression function, and it can be corrected by using a different functional form.
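The "plot the data and the fitted line" check can be mimicked in text by averaging OLS residuals within ranges of X. This sketch assumes a hypothetical nonlinear population regression Y = 2 ln(X) + u. Under that assumption, a linear fit leaves residuals that are negative at both ends of the X range and positive in the middle. That systematic pattern signals misspecification, and regressing Y on ln(X) instead removes it. The function names and the four-bin choice are illustrative assumptions.

```python
import math
import random


def ols_fit(z, y):
    """Intercept and slope of an OLS regression of y on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    b = sum((a - mz) * (c - my) for a, c in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    return my - b * mz, b


def binned_residuals(x, y, transform, bins=4):
    """Regress y on transform(x), then average the residuals within
    equal-width bins of x (a text stand-in for a residual plot)."""
    z = [transform(v) for v in x]
    a, b = ols_fit(z, y)
    resid = [yi - a - b * zi for yi, zi in zip(y, z)]
    lo, width = min(x), (max(x) - min(x)) / bins
    sums, counts = [0.0] * bins, [0] * bins
    for xi, r in zip(x, resid):
        k = min(int((xi - lo) / width), bins - 1)
        sums[k] += r
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical nonlinear population regression: Y = 2 ln(X) + u
rng = random.Random(1)
x = [rng.uniform(1, 20) for _ in range(20_000)]
y = [2 * math.log(v) + rng.gauss(0, 0.5) for v in x]

linear_bins = binned_residuals(x, y, lambda v: v)  # misspecified: Y on X
log_bins = binned_residuals(x, y, math.log)        # corrected: Y on ln(X)
print([round(r, 2) for r in linear_bins])
print([round(r, 2) for r in log_bins])
```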

ONE OR MORE OF THE REGRESSORS ARE MEASURED WITH ERROR ("ERRORS IN VARIABLES")

PROBLEM: Errors-in-variables bias in the OLS estimator arises when an independent variable is measured imprecisely. This bias depends on the nature of the measurement error and persists even if the sample size is large. Mistakes can come from ① surveys (people don't know exactly and speculate) or ② administrative records (typographical errors).

SOLUTION:
MEASUREMENT ERROR IN X
If the measured variable equals the actual value plus a mean-zero, independently distributed measurement error, then the OLS estimator in a regression with a single right-hand variable is biased toward zero (attenuation bias).

③ The reason is that the measurement error becomes part of the regression error, $Y_i = \beta_0 + \beta_1 \tilde{X}_i + [\beta_1 (X_i - \tilde{X}_i) + u_i]$. Therefore, the measured regressor is correlated with the error term, and 𝛽̂1 is biased and inconsistent. Its probability limit is:

$$\hat{\beta}_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1,$$

where $\tilde{X}_i = X_i + w_i$ is the measured regressor. The measurement error $w_i$ has variance $\sigma_w^2$ and is uncorrelated with $X_i$ and $u_i$.

In the extreme case that the measurement error is so large that essentially no information about X remains ($\sigma_w^2 \to \infty$), 𝛽̂1 converges to 0. In the other extreme, when there is no measurement error ($\sigma_w^2 = 0$), 𝛽̂1 is consistent.
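A short simulation can check the attenuation result, which says that under classical measurement error the OLS slope converges to σ_X²/(σ_X² + σ_w²)·β1. The sketch assumes a hypothetical setup with σ_X² = 1, β1 = 2, and measurement error w ~ N(0, σ_w²). `attenuation_demo` is an illustrative name. As σ_w grows, the estimated slope should track that limit and shrink toward zero.

```python
import random


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def attenuation_demo(n, sigma_w, beta1=2.0, seed=0):
    """OLS slope using a mismeasured regressor, and its theoretical limit.

    Hypothetical setup: X ~ N(0, 1) (so sigma_X^2 = 1), Y = beta1 X + u,
    and only X_tilde = X + w is observed, with w ~ N(0, sigma_w^2).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta1 * xi + rng.gauss(0, 1) for xi in x]
    x_tilde = [xi + rng.gauss(0, sigma_w) for xi in x]
    # sigma_X^2 / (sigma_X^2 + sigma_w^2) * beta1, with sigma_X^2 = 1
    plim = beta1 / (1.0 + sigma_w ** 2)
    return ols_slope(x_tilde, y), plim


for sigma_w in (0.0, 1.0, 3.0):
    estimate, limit = attenuation_demo(20_000, sigma_w)
    print(f"sigma_w={sigma_w}: estimate {estimate:.3f}, theory {limit:.3f}")
```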

MEASUREMENT ERROR IN Y
If Y has classical measurement error, then this measurement error increases the variance of the regression error and of 𝛽̂1 but does NOT induce bias in 𝛽̂1.

So, to solve measurement error in X, we must get an accurate measure of this regressor.
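The claim that classical measurement error in Y adds variance but no bias can be checked by repeating a small regression many times. This sketch assumes a hypothetical model Y = 2X + u that is observed as Y + v, with v independent of X. `y_error_demo` and the replication counts are illustrative. The mean of the estimates should stay near 2 whether or not v is present, while their standard deviation grows with σ_v.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def y_error_demo(sigma_v, reps=400, n=100, beta1=2.0, seed=0):
    """Mean and standard deviation of beta1_hat across simulated samples
    when Y is observed as Y + v, with v ~ N(0, sigma_v^2) independent of X."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # true Y = beta1 X + u, then classical measurement error v is added
        y_obs = [beta1 * xi + rng.gauss(0, 1) + rng.gauss(0, sigma_v) for xi in x]
        estimates.append(ols_slope(x, y_obs))
    return statistics.mean(estimates), statistics.stdev(estimates)


print("no error in Y:  ", y_error_demo(0.0))
print("error in Y (sd=2):", y_error_demo(2.0))
```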

THE SAMPLE IS CHOSEN NONRANDOMLY FROM THE POPULATION

PROBLEM: Sample selection bias arises when a selection process influences the availability of data and that process is related to the dependent variable, beyond depending on the regressors. Sample selection induces correlation between one or more regressors and the error term, leading to bias and inconsistency of the OLS estimator.

SOLUTION (it depends on why the data are missing):
(I) DATA ARE MISSING COMPLETELY AT RANDOM. The effect is to reduce the sample size but not to introduce bias.

(II) DATA ARE MISSING BASED ON X. The effect is also to reduce the sample size but not to introduce bias.

(III) DATA ARE MISSING BECAUSE OF A SELECTION PROCESS THAT IS RELATED TO Y BEYOND DEPENDING ON X. This is the case that produces sample selection bias. Example: "Landon wins!" In that poll, the sampling method consisted of randomly selecting phone numbers of automobile owners. This selection was related to Y (which candidate the individual supported for president in 1936), because in 1936 car owners with phones were more likely to be Republicans.
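The three missing-data cases can be compared on one simulated population. This sketch assumes a hypothetical population Y = 1 + 2X + u and three made-up selection rules:

- dropping about half the observations at random (I);
- keeping only X > 0 (II);
- keeping only Y > 1 (III).

The first two rules leave the OLS slope near 2. Selecting on Y pulls it well below 2, because among the retained observations, units with low X are kept only when u happens to be large.

```python
import random


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Hypothetical population: Y = 1 + 2X + u, with X and u independent N(0, 1)
rng = random.Random(0)
N = 50_000
x = [rng.gauss(0, 1) for _ in range(N)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]


def slope_on_subsample(keep):
    """OLS slope using only the observations where keep[i] is True."""
    xs = [xi for xi, k in zip(x, keep) if k]
    ys = [yi for yi, k in zip(y, keep) if k]
    return ols_slope(xs, ys)


cases = {
    "I": [rng.random() < 0.5 for _ in range(N)],  # missing completely at random
    "II": [xi > 0 for xi in x],                   # missing based on X
    "III": [yi > 1 for yi in y],                  # missing based on Y
}
slopes = {case: slope_on_subsample(keep) for case, keep in cases.items()}
for case, b in slopes.items():
    print(f"({case}) slope = {b:.3f}")
```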

SIMULTANEOUS CAUSALITY BETWEEN THE REGRESSORS AND THE DEPENDENT VARIABLE

PROBLEM: Simultaneous causality bias, also called simultaneous equations bias, arises in a regression of Y on X when, in addition to the causal link of interest from X to Y, there is a causal link from Y to X. This reverse causality makes X correlated with the error term in the population regression of interest.

SOLUTION: We can solve this problem in two ways:
#1: Use instrumental variables regression.

#2: Implement a randomized controlled experiment.
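A two-equation simulation shows how reverse causality makes X correlated with the error term. It assumes a hypothetical linear system, Y = β1X + u and X = γY + v, with independent standard normal u and v. Solving the system gives X = (γu + v)/(1 − γβ1), so Cov(X, u) = γ/(1 − γβ1), which is nonzero whenever γ ≠ 0. `simultaneity_demo` and `simultaneity_plim` are illustrative names. The plim formula applies only to this particular system and requires γβ1 ≠ 1.

```python
import random


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def simultaneity_demo(n, beta1=1.0, gamma=0.5, seed=0):
    """OLS slope of Y on X when X also depends on Y.

    Hypothetical system (sigma_u = sigma_v = 1):
        Y = beta1 X + u     (causal effect of interest)
        X = gamma Y + v     (reverse causality)
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = (gamma * u + v) / (1 - gamma * beta1)  # reduced form for X
        x.append(xi)
        y.append(beta1 * xi + u)
    return ols_slope(x, y)


def simultaneity_plim(beta1=1.0, gamma=0.5):
    """beta1 + Cov(X, u) / Var(X) for the system above."""
    return beta1 + gamma * (1 - gamma * beta1) / (gamma ** 2 + 1)


print("OLS estimate:", simultaneity_demo(50_000), "theory:", simultaneity_plim())
print("no reverse causality:", simultaneity_demo(50_000, gamma=0.0))
```

Setting `gamma=0.0` switches off the Y-to-X link. In that case the OLS slope is consistent for β1 again.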

[*] FOUR STEPS FOR DECIDING WHETHER TO INCLUDE A VARIABLE

STEP #1: Identify the key coefficient (or coefficients) of interest in your regression. In the test score regressions, this is the coefficient on the student-teacher ratio.

STEP #2: Ask yourself: What are the most likely sources of important omitted variable bias in this regression? The answer gives you a base specification plus a list of additional "questionable" variables that might help to mitigate possible omitted variable bias.

STEP #3: Augment the base specification with the questionable variables. If their coefficients are statistically significant, or if including them changes the estimated coefficient of interest appreciably, they should remain in the specification and you should modify your base specification. If not, these variables can be excluded from the regression.

STEP #4: Present an accurate summary of your results in tabular form. Presenting the other regressions permits the sceptical reader to draw his or her own conclusions. An example of this summary would be:
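Step 3 can be written as a simple comparison between the base and the augmented specification. This is a sketch under stated assumptions, not rules from the notes: the 10% change threshold, the 1.96 critical value, the use of HC0 robust standard errors, and the simulated data are all hypothetical choices. The two-regressor coefficients are obtained via the Frisch-Waugh-Lovell theorem, so only the standard library is needed.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """Slope of an OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def partial_out(x, w):
    """Residuals from regressing x on w (with an intercept)."""
    b = ols_slope(w, x)
    a = mean(x) - b * mean(w)
    return [xi - a - b * wi for xi, wi in zip(x, w)]


def step3(x, w, y, rel_change=0.10, t_crit=1.96):
    """Hypothetical Step 3 rule: keep the questionable variable W if its
    coefficient is significant, or if adding it moves the coefficient on X
    by more than rel_change (as a fraction of the base estimate)."""
    b_base = ols_slope(x, y)                 # base specification: Y on X
    x_t, w_t = partial_out(x, w), partial_out(w, x)
    b_x = ols_slope(x_t, y)                  # augmented: Y on X and W (FWL)
    b_w = ols_slope(w_t, y)
    b_0 = mean(y) - b_x * mean(x) - b_w * mean(w)
    resid = [yi - b_0 - b_x * xi - b_w * wi for xi, wi, yi in zip(x, w, y)]
    # Heteroskedasticity-robust (HC0) standard error of b_w, via FWL
    sww = sum(a * a for a in w_t)
    se_w = math.sqrt(sum((a * e) ** 2 for a, e in zip(w_t, resid))) / sww
    t_w = b_w / se_w
    keep = abs(t_w) > t_crit or abs(b_x - b_base) > rel_change * abs(b_base)
    return b_base, b_x, t_w, keep


# Simulated (hypothetical) data: W matters for Y and is correlated with X
rng = random.Random(0)
n = 5_000
w = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * wi + rng.gauss(0, 1) for wi in w]
y_relevant = [1 - 2 * xi + 1.5 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
z = [rng.gauss(0, 1) for _ in range(n)]      # unrelated to X and Y
y_unrelated = [1 - 2 * xi + rng.gauss(0, 1) for xi in x]

print(step3(x, w, y_relevant))   # (base, augmented, t on W, keep?)
print(step3(x, z, y_unrelated))
```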
[2] Second, confidence intervals and hypothesis tests are not valid when the standard errors are incorrect, which occurs in the following cases:

a. The errors are heteroskedastic, but the computer software uses homoskedasticity-only standard errors.

b. The error term is correlated across different observations.
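To illustrate case a, this sketch computes both the homoskedasticity-only and the heteroskedasticity-robust standard error of the OLS slope. It assumes simulated data in which Var(u|X) = X² in the heteroskedastic case. The robust formula shown is the common HC1 variant, with an n/(n − 2) correction, and `se_comparison` is an illustrative name. With homoskedastic errors the two standard errors nearly coincide. With this form of heteroskedasticity, the homoskedasticity-only one understates the sampling uncertainty, by roughly a factor of √3 here, since E[X⁴] = 3 for a standard normal X.

```python
import math
import random


def se_comparison(x, y):
    """Homoskedasticity-only and heteroskedasticity-robust (HC1) standard
    errors for the slope in an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [c - b0 - b1 * a for a, c in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_homo = math.sqrt(s2 / sxx)
    # Robust variance: n/(n-2) * sum((x - xbar)^2 * e^2) / sxx^2
    meat = sum(((a - mx) * e) ** 2 for a, e in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat) / sxx
    return se_homo, se_robust


# Simulated (hypothetical) data
rng = random.Random(0)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
y_homo = [1 + 2 * a + rng.gauss(0, 1) for a in x]        # Var(u|X) = 1
y_hetero = [1 + 2 * a + a * rng.gauss(0, 1) for a in x]  # Var(u|X) = X^2

print("homoskedastic:  ", se_comparison(x, y_homo))
print("heteroskedastic:", se_comparison(x, y_hetero))
```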
SUMMARY
...