TOPIC 5 (2015)
TOPIC 5: ASSESSING STUDIES BASED ON MULTIPLE REGRESSION
What makes a study that uses multiple regression reliable or unreliable?
To answer this question we’re going to use a framework which evaluates statistical studies by asking whether the analysis is internally
and externally valid:
INTERNALLY VALID: if the statistical inferences about causal effects are valid for the population being studied (the one from
which the sample was drawn).
EXTERNALLY VALID: if its inferences and conclusions can be generalized to other populations and settings.
① POTENTIAL THREATS TO EXTERNAL VALIDITY arise from differences between the population and setting studied and the population and setting of interest:
DIFFERENCES IN POPULATIONS. These can be due to differences in the characteristics of the populations, to geographical differences, or to the study being out of date. For example, laboratory studies of the toxic effects of chemicals typically use animal populations like mice (the population studied), but the results are used to write health and safety regulations for human populations (the population of interest).
DIFFERENCES IN SETTINGS. These include differences in the institutional environment (public universities versus religious universities), differences in laws (like legal penalties), or differences in the physical environment (California vs Alaska). For example, a study of the effect of an anti-drinking advertising campaign on college binge drinking might not generalize to another, otherwise identical group of college students if the legal penalties for drinking at the two colleges differ.
SOLUTION: External validity must be judged using specific knowledge of the populations and settings studied and those of interest.
These threats are best minimized at the early stages of a study, before the data are collected. Moreover, when there are two or more studies on different but related populations, similar findings bolster claims to external validity, while differences in their findings that are not readily explained cast doubt on it.
② THREATS TO INTERNAL VALIDITY. In regression estimation of causal effects, there are TWO TYPES of threats to internal validity.
Laura Aparicio, ECONOMETRICS I
First, OLS estimators will be biased and inconsistent if the regressors and the error term are correlated (VIOLATION OF THE #1 LS ASSUMPTION), which will happen in the following cases:
OMITTED VARIABLES
PROBLEM: Omitted variable bias arises when a variable that both determines Y and is correlated with one or more of the included regressors is omitted from the regression. This bias persists even in large samples, so the OLS estimator is inconsistent.
SOLUTION: The decision whether to include a variable involves a trade-off between bias (if it is not included) and variance of the coefficient of interest (if it is included). There are 4 steps that can help you to decide (summary table [*]).
However, adding an omitted variable to a regression is not an option if you do not have data on that variable and there are no adequate control variables. In this case, we have three solutions:
#1: Use panel data. The same observational unit is observed at different points in time.
#2: Use instrumental variables.
#3: Use a randomized controlled experiment.
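The omitted variable problem above can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names (`x`, `w`, `y`), the coefficient values, and the helper `ols` are assumptions chosen for illustration, not part of the notes.

```python
import numpy as np

def ols(y, *columns):
    """OLS with an intercept; returns the coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000

w = rng.normal(size=n)                              # variable that will be omitted
x = 0.8 * w + rng.normal(size=n)                    # regressor, correlated with w
y = 1.0 + 2.0 * x + 3.0 * w + rng.normal(size=n)    # true effect of x on y is 2

beta_short = ols(y, x)[1]      # w omitted: w ends up in the error term
beta_long = ols(y, x, w)[1]    # w included as a control

# In this design the short regression converges to
# 2 + 3 * cov(x, w) / var(x) = 2 + 3 * 0.8 / 1.64, about 3.46,
# no matter how large n is, while the long regression converges to 2.
print(f"w omitted:  {beta_short:.3f}")
print(f"w included: {beta_long:.3f}")
```

Increasing `n` does not move the first estimate toward 2, which is what inconsistency means here.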
INCORRECT FUNCTIONAL FORM IS USED
PROBLEM: If the true population regression function is nonlinear but the estimated regression is linear, then this functional form misspecification makes the OLS estimator biased.
SOLUTION: Functional form misspecification can often be detected by plotting the data and the estimated regression function, and it can be corrected by using a different functional form.
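As a rough numerical counterpart to the plot-based check described above, the sketch below fits a straight line to data generated from a quadratic function. The data-generating process, the names, and the helper `ols` are assumptions made for this example only.

```python
import numpy as np

def ols(y, *columns):
    """OLS with an intercept; returns the coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 50_000

x = rng.uniform(0.0, 4.0, size=n)
y = 1.0 + 0.5 * x**2 + rng.normal(scale=0.5, size=n)   # true function is quadratic

lin = ols(y, x)            # misspecified: linear in x
quad = ols(y, x, x**2)     # correct functional form

# Residuals from the linear fit still move with x**2, which is the
# numerical analogue of seeing curvature in a plot of the data.
resid_lin = y - (lin[0] + lin[1] * x)
corr_resid_x2 = np.corrcoef(resid_lin, x**2)[0, 1]

# The true effect of a small change in x is dy/dx = x, so it is 3 at x = 3.
# The linear fit reports one slope for every x; the quadratic fit does not.
effect_at_3 = quad[1] + 2.0 * quad[2] * 3.0

print(f"linear slope (same for all x):       {lin[1]:.3f}")
print(f"quadratic fit, effect at x = 3:      {effect_at_3:.3f}")
print(f"corr(linear residuals, x squared):   {corr_resid_x2:.3f}")
```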
ONE OR MORE OF THE REGRESSORS ARE MEASURED WITH ERROR ("ERRORS IN VARIABLES")
PROBLEM: Errors-in-variables bias in the OLS estimator arises when an independent variable is measured imprecisely. This bias depends on the nature of the measurement error and persists even if the sample size is large. Mistakes can come from ① surveys (people don't know exactly and speculate) or ② administrative records (typographical errors). ③ Therefore, the regressor will be correlated with the error term and 𝛽̂1 will be biased and inconsistent.
MEASUREMENT ERROR IN X
If the measured variable equals the actual value plus a mean-zero, independently distributed measurement error, then the OLS estimator in a regression with a single right-hand variable is biased toward zero (attenuation bias). The probability limit is:
$$\hat{\beta}_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1$$
where $\sigma_X^2$ is the variance of the true regressor and $\sigma_w^2$ is the variance of the measurement error. In the extreme case that the measurement error is so large that essentially no information about X remains, 𝛽̂1 converges to 0. In the other extreme, when there is no measurement error, 𝛽̂1 is consistent.
MEASUREMENT ERROR IN Y
If Y has classical measurement error, then this measurement error increases the variance of the regression and of 𝛽̂1 but does NOT induce bias in 𝛽̂1.
SOLUTION: To solve the measurement error in X we must get an accurate measure of this regressor.
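The contrast between measurement error in X and in Y can be checked with a short simulation. It is a sketch under classical measurement error only; the names (`x_noisy`, `y_noisy`), the variances, and the helper `slope` are assumptions introduced for illustration.

```python
import numpy as np

def slope(y, x):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(2)
n = 100_000

x = rng.normal(size=n)                        # true regressor, variance 1
y = 1.0 + 2.0 * x + rng.normal(size=n)        # true slope is 2

x_noisy = x + rng.normal(size=n)              # classical error in X, variance 1
y_noisy = y + rng.normal(size=n)              # classical error in Y, variance 1

beta_clean = slope(y, x)
beta_error_in_x = slope(y, x_noisy)
beta_error_in_y = slope(y_noisy, x)

# With equal variances of X and of its measurement error, the slope is
# attenuated by the factor 1 / (1 + 1) = 0.5, so it converges to 1 instead of 2.
# Error in Y only adds noise: the slope still converges to 2.
print(f"no measurement error: {beta_clean:.3f}")
print(f"error in X:           {beta_error_in_x:.3f}")
print(f"error in Y:           {beta_error_in_y:.3f}")
```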
THE SAMPLE IS CHOSEN NONRANDOMLY FROM THE POPULATION
PROBLEM: Sample selection bias arises when a selection process influences the availability of data and that process is related to the dependent variable, beyond depending on the regressors. Sample selection induces correlation between one or more regressors and the error term, leading to bias and inconsistency of the OLS estimator.
(I) DATA ARE MISSING COMPLETELY AT RANDOM. The effect is to reduce the sample size but not introduce bias.
(II) DATA ARE MISSING BASED ON X. The effect is also to reduce the sample size but not introduce bias.
(III) DATA ARE MISSING BECAUSE OF A SELECTION PROCESS THAT IS RELATED TO Y BEYOND DEPENDING ON X. Example: "Landon wins!" In that example, the sample selection method consisted of randomly selecting phone numbers of automobile owners, which was related to Y (whom the individual supported for president in 1936) because in 1936 car owners with phones were more likely to be Republicans.
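The three missing-data cases above can be compared on simulated data. The sketch below assumes a simple linear model with a true slope of 2; the selection rules (`keep_random`, `keep_on_x`, `keep_on_y`) are hypothetical and chosen only to mimic cases (I), (II), and (III).

```python
import numpy as np

def slope(y, x):
    """Slope coefficient from an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(3)
n = 200_000

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)     # true slope is 2

keep_random = rng.random(n) < 0.5          # (I) missing completely at random
keep_on_x = x > 0.0                        # (II) missing based on X only
keep_on_y = y > 1.0                        # (III) selection related to Y

beta_random = slope(y[keep_random], x[keep_random])
beta_on_x = slope(y[keep_on_x], x[keep_on_x])
beta_on_y = slope(y[keep_on_y], x[keep_on_y])

# Cases (I) and (II) lose observations but keep the slope near 2.
# In case (III) low-x observations survive only when their error term is
# large, so x and the error are negatively correlated in the kept sample
# and the slope is pulled toward zero.
print(f"(I)   random:        {beta_random:.3f}")
print(f"(II)  based on X:    {beta_on_x:.3f}")
print(f"(III) related to Y:  {beta_on_y:.3f}")
```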
SIMULTANEOUS CAUSALITY BETWEEN THE REGRESSORS AND DEPENDENT VARIABLES
PROBLEM: Simultaneous causality bias, also called simultaneous equations bias, arises in a regression of Y on X when, in addition to the causal link of interest from X to Y, there is a causal link from Y to X. This reverse causality makes X correlated with the error term in the population regression of interest.
SOLUTION: We can solve this problem in two ways:
#1: Use instrumental variables regression.
#2: Implement a randomized controlled experiment.
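A minimal simulation of simultaneous causality, together with solution #1, is sketched below. The two-equation system, its coefficients, and the instrument `z` are assumptions made up for this example; the instrumental variables estimate is computed here as the simple ratio cov(z, y) / cov(z, x), which applies to the case of one regressor and one instrument.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

beta1 = 0.5     # causal effect of X on Y (the effect of interest)
gamma = 0.5     # reverse causal effect of Y on X

u = rng.normal(size=n)      # error in the Y equation
v = rng.normal(size=n)      # error in the X equation
z = rng.normal(size=n)      # hypothetical instrument: shifts X, unrelated to u

# Structural system:  y = beta1 * x + u   and   x = gamma * y + z + v.
# Solving the two equations jointly gives x; y then follows from its equation.
x = (gamma * u + z + v) / (1.0 - gamma * beta1)
y = beta1 * x + u

corr_x_u = np.corrcoef(x, u)[0, 1]                 # nonzero because Y feeds back into X
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # IV slope using z

print(f"corr(x, u): {corr_x_u:.3f}")
print(f"OLS slope:  {beta_ols:.3f}   (true effect is {beta1})")
print(f"IV slope:   {beta_iv:.3f}")
```

The OLS slope stays away from 0.5 however large the sample, while the ratio based on `z` recovers it, because `z` moves X without being related to the error term in the Y equation.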
[*] STEPS TO DECIDE WHETHER TO INCLUDE A VARIABLE
STEP #1: Identify the key coefficient of interest. In the test score regressions, this is the coefficient on the student-teacher ratio.
STEP #2: Ask yourself: What are the most likely sources of important omitted variable bias in this regression? You'll obtain a list of additional "questionable" variables that might help to mitigate possible omitted variable bias.
STEP #3: Add these questionable variables to the regression and test whether their coefficients are zero. If they have nonzero coefficients, they should remain in the specification and you should modify your base specification. If not, then these variables can be excluded from the regression.
STEP #4: Present a summary of your results. Presenting the other regressions permits the sceptical reader to draw his or her own conclusions. An example of this summary would be:
Second, confidence intervals and hypothesis tests are not valid when the standard errors are incorrect, which will occur in the following cases:
a. The errors are heteroskedastic and the computer software uses the homoskedasticity-only standard errors.
b. The error term is correlated across different observations.
SUMMARY
...