class: center, middle, inverse, title-slide

# The Standard Errors of Persistence
## A Discussion of Kelly 2019
### Alex Stephenson
### UC Berkeley
### 2020-08-19

---

# Take Home Point

Failing to account for spatial autocorrelation distorts regression inference by inflating *t*-statistics, which could call into question major findings in economics and political science.

Only about 25% of the studies in Kelly's sample have findings that remain robust after accounting for the possibility that the regressions are fitting spatial noise.

---

# Goals of Kelly (2019)

1. Show the connection between high *t*-statistics and severe spatial autocorrelation of residuals

--

2. Apply that finding to the persistence literature

---

# The setting for these findings

There is a large literature on persistence suggesting that modern outcomes (incomes, social attitudes) are shaped by past characteristics.

- Examples:
  - Medieval pogroms and votes for Nazi parties (Voigtlander and Voth)
  - Slave trade and mistrust in African societies (Nunn and Wantchekon 2011)
  - Slavery in the American South and contemporary differences in political attitudes (Acharya, Blackwell, Sen 2016)

The persistence variables have high explanatory power (i.e., large *t*-statistics).

---

# The Problem with Space

Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things."

--

- Persistence regressions are spatial regressions. Like time series regressions, spatial regressions can have correlated observations.

--

- Spatial autocorrelation describes the degree to which observations at nearby locations are similar to each other.

---

# Kelly's Test

Kelly proposes a two-step procedure for diagnosing problems in persistence regressions:

1. Compute a Moran index for the regression residuals. Large values of the Moran index are "reliable warnings that nominal significance levels differ substantially from true ones".

2. Generate synthetic spatial noise that matches the correlation structure of the variables of interest, and use it in placebo tests.
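---

# Sketch: Placebo Regressions on Spatial Noise

The placebo idea in step 2 can be illustrated with a minimal sketch. It assumes a simple stand-in for the noise generator: independent Gaussian shocks smoothed with a Gaussian distance kernel. This is not Kelly's procedure and not the R code linked from these slides. The function names, the `bandwidth` parameter, and the rough cutoff of 2.0 for |*t*| are all hypothetical choices made for illustration.

```python
import math
import random


def spatial_noise(points, rng, bandwidth=0.2):
    """Spatially correlated noise: kernel-smoothed i.i.d. Gaussian shocks.

    Assumption: a Gaussian kernel smoother is used only as a simple
    stand-in for a proper spatial noise generator.
    """
    shocks = [rng.gauss(0.0, 1.0) for _ in points]
    field = []
    for xi, yi in points:
        total, norm = 0.0, 0.0
        for (xj, yj), shock in zip(points, shocks):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            k = math.exp(-d2 / (2.0 * bandwidth ** 2))
            total += k * shock
            norm += k
        field.append(total / norm)
    return field


def ols_slope_t(x, y):
    """Slope and classical (i.i.d.-error) t-statistic from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def placebo_rejection_rate(n_points=50, n_sims=200, bandwidth=0.2, seed=1):
    """Share of simulations where two independent noise fields look 'significant'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        points = [(rng.random(), rng.random()) for _ in range(n_points)]
        x = spatial_noise(points, rng, bandwidth)  # placebo "historical" variable
        y = spatial_noise(points, rng, bandwidth)  # independent placebo "outcome"
        _, t_stat = ols_slope_t(x, y)
        if abs(t_stat) > 2.0:  # rough 5% two-sided cutoff (assumption)
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("Share of placebo regressions with |t| > 2:", placebo_rejection_rate())
```

If the classical standard errors were reliable, roughly 5% of these placebo regressions would pass the cutoff. Shrinking `bandwidth` toward zero makes the noise nearly independent across locations, which gives a useful comparison case.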
---

# What the paper is not doing

Checking for issues with data construction

- e.g. Acemoglu, Johnson, Robinson 2001 (Albouy 2012), La Porta et al 1997 (Spamann 2010)

--

Checking the plausibility of mechanisms or the quality of scholarship

--

Checking for other econometric issues

--

Trying to disprove the findings of the original studies

---

# Moran's Statistic

Moran's Statistic, or Moran's *I*, is a measure of spatial autocorrelation.

`$$I = \frac{n}{\sum_{i=1}^n \sum_{j=1}^n w_{ij}}\frac{\sum_{i=1}^n\sum_{j=1}^n w_{ij}(x_i -\bar{x})(x_j - \bar{x})}{\sum_{i=1}^n(x_i - \bar{x})^2}$$`

- `\(n\)` is the number of spatial units, indexed by *i* and *j*
- `\(x\)` is the variable of interest. `\(\bar{x}\)` is the mean of `\(x\)`.
- `\(w_{ij}\)` is the *(i, j)* entry of the spatial weights matrix, which has zeros on the diagonal

Think of this as a weighted sum of the covariances between every pair of residuals, scaled by their variance, with a weighting scheme that follows Tobler's First Law of Geography: nearer pairs get larger weights.

---

# R Implementation

[Full Code for R Simulation Setup](https://tinyurl.com/t3xh5zm)

---

# Two Independent Spatial Processes

![:scale 75%](spatnoise.png)

---

# Correlation in the Noise

Even though the two processes are unrelated by construction, a linear regression reports a negative and statistically significant coefficient.

<table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;">
<caption>Statistical models</caption>
<thead>
<tr>
<th style="padding-left: 5px;padding-right: 5px;">&nbsp;</th>
<th style="padding-left: 5px;padding-right: 5px;">Model 1</th>
</tr>
</thead>
<tbody>
<tr style="border-top: 1px solid #000000;">
<td style="padding-left: 5px;padding-right: 5px;">Intercept</td>
<td style="padding-left: 5px;padding-right: 5px;">0.69<sup>***</sup></td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">&nbsp;</td>
<td style="padding-left: 5px;padding-right: 5px;">(0.06)</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Hopscotch Enthusiasts in 1900</td>
<td style="padding-left: 5px;padding-right: 5px;">-0.43<sup>***</sup></td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">&nbsp;</td>
<td style="padding-left: 5px;padding-right: 5px;">(0.11)</td>
</tr>
<tr style="border-top: 1px solid #000000;">
<td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td>
<td style="padding-left: 5px;padding-right: 5px;">0.20</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Adj. R<sup>2</sup></td>
<td style="padding-left: 5px;padding-right: 5px;">0.19</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td>
<td style="padding-left: 5px;padding-right: 5px;">50</td>
</tr>
<tr style="border-bottom: 2px solid #000000;">
<td style="padding-left: 5px;padding-right: 5px;">RMSE</td>
<td style="padding-left: 5px;padding-right: 5px;">0.14</td>
</tr>
</tbody>
<tfoot>
<tr>
<td style="font-size: 0.8em;" colspan="2"><sup>***</sup>p &lt; 0.001; <sup>**</sup>p &lt; 0.01; <sup>*</sup>p &lt; 0.05</td>
</tr>
</tfoot>
</table>

---

# Visual Inspection of Correlation

The problem is that the residuals are correlated with one another because of spatial autocorrelation.

<img src="kelly_files/figure-html/unnamed-chunk-4-1.png" width="504" height="450" />

---

# Moran's Index for our simulated data

A Moran Monte Carlo test shows that our observed Moran statistic would be highly unlikely if there were no spatial autocorrelation.
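
A Moran Monte Carlo test of this kind can be sketched in a few lines. The version below is a minimal illustration in plain Python, not the R code behind these slides. It assumes inverse-distance weights and a one-sided permutation test, and the function names are hypothetical.

```python
import math
import random


def inverse_distance_weights(points):
    """Weights w_ij = 1 / distance(i, j), with zeros on the diagonal.

    Assumption: inverse distance is one common weighting choice;
    the points must be distinct.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                w[i][j] = 1.0 / math.hypot(dx, dy)
    return w


def morans_i(values, w):
    """Moran's I: (n / sum of weights) * weighted cross-products / sum of squares."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)
    return (n / w_sum) * (cross / ss)


def moran_permutation_test(values, w, n_perm=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value.

    Values are shuffled across locations to simulate the null of no
    spatial autocorrelation.
    """
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy data: a smooth gradient along a line of 30 locations.
    pts = [(float(i), 0.0) for i in range(30)]
    vals = [float(i) for i in range(30)]
    obs, p = moran_permutation_test(vals, inverse_distance_weights(pts), n_perm=199)
    print("Moran's I:", obs, "pseudo p-value:", p)
```

In a regression setting, `values` would be the residuals. The histogram in the figure corresponds to the distribution of the shuffled statistics, with the observed value marked against it.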
<img src="kelly_files/figure-html/unnamed-chunk-5-1.png" width="504" height="450" />

---

# The real world is spatially correlated

![:scale 75%](inflation.png)

---

# The real world is spatially correlated

![:scale 75%](significance1.png)

---

# Kelly's Results

- Replicates the leading regression in each paper exactly

--

- Applies the two-step procedure

--

- Implication: "Only about 1/4 of the persistence results we examine are robust after we take account of possibility that regressions might be fitting spatial noise"

---

# Moran Index for Papers

![:scale 75%](moran.png)

---

# The explanatory power of noise

![:scale 75%](noisesig.png)

---

# Actual predictors explain noise

![:scale 75%](depend.png)

---

# Concluding remarks

In this sample, the papers with low Moran statistics often use very weak instruments

--

The sample appears to trade off weak data against spatial correlation

--

Few, if any, of the papers estimate spatial regressions, even as a robustness check