Regression Discontinuity

Using Regression Discontinuity to Estimate a Local Average Treatment Effect of the Head Start Program in Impoverished American Counties

This project uses regression discontinuity (RD) to explore the effect of a 'treatment', in this case a county receiving assistance in applying for a Head Start program, on high school completion. The technique requires choosing a running variable (county poverty rate), an outcome variable (high school completion), and a cutoff point at which treatment is administered (60% poverty in a county). An estimate of the local average treatment effect is obtained by measuring any discontinuity in the outcome variable at the cutoff point.
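In potential-outcomes notation, the quantity described above can be written as follows. This is a sketch that assumes a sharp design, in which every county at or above the cutoff receives the application assistance. The symbols X_i (county poverty rate), Y_i (high school completion), D_i (treatment indicator), and c (cutoff) are introduced here for illustration and do not come from the original analysis.

```latex
\[
D_i = \mathbf{1}\{X_i \ge c\}, \qquad c = 60
\]
\[
\tau_{\mathrm{RD}}
  = \lim_{x \downarrow c} \mathbb{E}[\,Y_i \mid X_i = x\,]
  - \lim_{x \uparrow c} \mathbb{E}[\,Y_i \mid X_i = x\,]
\]
```

The estimate is local because it compares expected outcomes only in the limit as the poverty rate approaches the cutoff from either side.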

Scatterplot of County Poverty Rates Against High School Completion Rates

This visualization shows the bivariate relationship between county poverty rates and county high school completion rates. There appears to be a negative correlation between poverty and high school completion: as poverty decreases in a county, high school completion tends to increase. The red line in this plot marks the cutoff point of 60% poverty that is used to delineate which counties received treatment and which did not.

Regression Discontinuity Plot

This visualization is a scatterplot with a separate line of best fit on each side of the cutoff. There appears to be a discontinuity in high school completion at the 60% poverty threshold. This cutoff point reflects the level of poverty at which a county is eligible to receive assistance in applying for a Head Start program. Head Start is a program that aims to improve school performance in deeply impoverished areas, and programs will be established in only some counties that complete an application. With regression discontinuity, a cutoff point is set on the running variable (here, the county poverty rate) to examine whether there is an effect of treatment after it has been administered. This plot serves as preliminary evidence that a discontinuity in high school completion does exist.

Regression Discontinuity Model

RD analysis of the effect of treatment on high school completion indicates that receiving help applying for a Head Start program positively affects high school completion. RD_model_3 generates an estimated treatment effect of 0.043 (confidence interval: 0.008, 0.092). While this estimate of a local effect is small, the confidence interval does not include zero, so we can be fairly certain that treatment does have an effect on this outcome.
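To make the estimation step concrete, here is a minimal sketch of a sharp RD estimate using a separate linear fit on each side of the cutoff. It is not a reconstruction of RD_model_3, whose specification is not shown here. The data are simulated, and the bandwidth, the simulated jump, and every function name are assumptions made for illustration.

```python
import random

CUTOFF = 60.0      # poverty-rate threshold from the text (percent)
BANDWIDTH = 10.0   # hypothetical window around the cutoff, in percentage points


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(poverty, completion, cutoff=CUTOFF, bandwidth=BANDWIDTH):
    """Sharp RD estimate: gap between the two fitted lines at the cutoff."""
    # Center the running variable so each intercept is the fitted value
    # exactly at the cutoff.
    below = [(p - cutoff, y) for p, y in zip(poverty, completion)
             if cutoff - bandwidth <= p < cutoff]
    above = [(p - cutoff, y) for p, y in zip(poverty, completion)
             if cutoff <= p <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


def simulate_counties(n=2000, true_jump=0.05, seed=0):
    """Synthetic counties (NOT the Head Start data) with a known jump."""
    rng = random.Random(seed)
    poverty, completion = [], []
    for _ in range(n):
        p = rng.uniform(30.0, 90.0)
        jump = true_jump if p >= CUTOFF else 0.0
        # Completion falls with poverty, plus a jump for treated counties.
        y = 0.80 - 0.004 * (p - CUTOFF) + jump + rng.gauss(0.0, 0.03)
        poverty.append(p)
        completion.append(y)
    return poverty, completion


poverty, completion = simulate_counties()
estimate = rd_estimate(poverty, completion)
print(f"Estimated discontinuity at the cutoff: {estimate:.3f}")
```

Restricting both fits to a window around the cutoff reflects the idea that only counties near the threshold are comparable. A dedicated RD package would also choose the bandwidth from the data and report a confidence interval, which this sketch does not attempt.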

Regression Discontinuity Analysis

The RD plot above shows that below the cutoff, high school completion decreases as the poverty rate increases. At the threshold for treatment, the fitted line for counties above the cutoff starts at a higher level than the line for non-treated counties, so the two lines are discontinuous. Thus, there appears to be an estimated treatment effect of application assistance for a Head Start program on high school completion rates: counties just above the cutoff point, which received help in the application process and are therefore more likely to establish a Head Start program, had slightly higher high school completion rates than counties just below the cutoff point. RD rests on the assumption that counties within a very close range of the cutoff point are structurally quite similar, so that examining only these counties makes it possible to estimate an effect of treatment.


My mindset about data science shifted completely after my experience in SOAN 222, Data Science Tools for Social Policy. In this course, I furthered my ability to apply data science techniques to real-world social phenomena, and to social policy questions in particular. Regression discontinuity was just one skill we learned. The main focus was to learn statistical methods to examine causality and perform causal inference in relation to social policy decisions. I was introduced to the potential outcomes framework and the use of directed acyclic graphs (DAGs) to examine what it means to say that one social factor is a causal mechanism of another and how my assumptions about social influences are reflected in such graphs. I developed a much deeper understanding of regression analysis that incorporated multivariate analyses and the inclusion of interaction terms in regressions. This study of regression was essential to my learning of different techniques used to study causality and causal inference.

I first learned how to use instrumental variables to estimate a treatment effect when a focal x variable is not randomly assigned. Secondly, I learned the technique of difference-in-differences and used two-way fixed effects to perform causal inference while controlling for the influence of unit-invariant, time-varying social factors, as well as time-invariant, unit-varying factors. Thirdly, we covered regression discontinuity, as discussed above. Overall, this course expanded my understanding of regression analysis, taught me how complex determining causality truly is, and reinforced the importance of presenting results clearly and intelligibly so that readers with varying exposure to such techniques can understand the social policy implications of the analyses.