How Ecological Inference Corrupts an Ideology Model

Thursday, June 15, 2006 at 9:48AM

Sean Wilson in Law & Ideology, Quantitative Ideology Models, Quantitative Methods, Segal & Spaeth, composition


[version 1.1]*

In my last entry, I demonstrated that the bivariate ideology models constructed by judicial politics scholars over the last sixteen years had the unfortunate property of introducing ecological inference into the regression analysis. One may wonder why scholars did this to their models given that there was no reason to do so (at least not since the mid 1990s). That is a topic for another day, however. For now, I want to consider a more direct question: what is “wrong” with relying upon ecological inference in a bivariate ideology model?

Although there are many problems with models that aggregate voting data, I want to focus exclusively upon one phenomenon in this entry: goodness of fit and model misspecification. I’ll hit the other problems in my next entry.

**A**. Goodness-of-Fit and Modeling Flaws

The best way to demonstrate the fit problem is with an example, followed by an interpretation. Let us assume that there are two hypothetical courts, Alpha and Beta, each with five justices who have the following voting data:

| Alpha Court justice | N | Pct. L | Segal/Cover | Beta Court justice | N | Pct. L | Segal/Cover |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rove | 10 | .10 | .10 | Rove | 10 | .10 | .10 |
| O’Connor | 10 | .30 | .30 | Drudge | 10 | .10 | .30 |
| Stewart | 10 | .50 | .50 | “Teddy” | 10 | .90 | .50 |
| Clinton | 10 | .70 | .70 | Jesse | 10 | .90 | .70 |
| Nader | 10 | .90 | .90 | Nader | 10 | .90 | .90 |

The difference between these two courts is that one has a distribution of liberal votes that is symmetrical, while the other’s is polarized. The Alpha Court is anchored by two extreme justices, followed by those of proximate distance and a centrist. The Beta Court, by contrast, is plagued with two extreme “clans.” In both cases, however, the hypothetical Segal-Cover scores are the same. That is, in the case of Alpha, the scores correlate *perfectly* with the percentage of liberal votes cast -- think of it as the attitudinal model in heaven -- but in the case of Beta, newspaper editorials were simply not as accurate a forecast.

If one were to regress the votes in each of these courts on the Segal-Cover scores, what do you think the difference would be in the goodness-of-fit for an aggregated versus non-aggregated model? The answer may surprise you. In the logit model, goodness of fit is around .5 for the Alpha Court (symmetrical) and .8 for Beta (polarized).[1] However, in the aggregated model, goodness of fit is a perfect 1.0 for the Alpha Court and .47 for Beta.[2] In other words, *the fit of the models is moving in opposite directions*.
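For readers who would rather experiment than take my word for it, here is a minimal Python sketch of the comparison, built only from the table values as printed above. Two assumptions are mine and not part of the original analysis: the vote-level fit is measured with McFadden's pseudo R-squared (a stand-in, since the figures I report use phi-p and tau-p), and the logit is estimated with a small hand-written Newton-Raphson routine rather than Stata. The sketch need not reproduce the exact figures quoted above; what it illustrates is the direction of movement.

```python
import math

# Table values as printed: (Segal-Cover score, liberal votes out of 10)
ALPHA = [(0.1, 1), (0.3, 3), (0.5, 5), (0.7, 7), (0.9, 9)]
BETA = [(0.1, 1), (0.3, 1), (0.5, 9), (0.7, 9), (0.9, 9)]
N_VOTES = 10  # votes cast by each hypothetical justice


def ols_r2(court):
    """R-squared of the aggregated model: percent liberal regressed on score."""
    xs = [x for x, _ in court]
    ys = [k / N_VOTES for _, k in court]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def logit_mcfadden(court, iters=25):
    """McFadden pseudo R-squared of a vote-level logit (assumed fit measure)."""
    a = b = 0.0
    for _ in range(iters):  # Newton-Raphson on the two logit coefficients
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k in court:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = k - N_VOTES * p
            w = N_VOTES * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Log-likelihood of the fitted model
    ll = 0.0
    for x, k in court:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        ll += k * math.log(p) + (N_VOTES - k) * math.log(1.0 - p)
    # Log-likelihood of the intercept-only model
    total = N_VOTES * len(court)
    liberal = sum(k for _, k in court)
    p0 = liberal / total
    ll0 = liberal * math.log(p0) + (total - liberal) * math.log(1.0 - p0)
    return 1.0 - ll / ll0


for name, court in (("Alpha", ALPHA), ("Beta", BETA)):
    print(name, "aggregated R2:", round(ols_r2(court), 3),
          "| vote-level pseudo R2:", round(logit_mcfadden(court), 3))
```

Under these assumptions the aggregated fit is higher for Alpha than for Beta, while the vote-level fit is higher for Beta than for Alpha.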

And now the critical question: how come the aggregated model cannot properly distinguish between symmetrical and polarized voting for purposes of fit like the non-aggregated model can? That is, how can a model of bias report that a voting universe is less explained by the presence of bias when it is dominated by two extreme clans – the “Rehnquist Five” logic – versus when it has symmetrical variety? The answer is straightforward: when data exist in a binary format, cases that do not affiliate well with dichotomous outcomes – those that show little to no favoritism for “0s” or “1s” – are interpreted as not fitting the model framework well. Hence, Justice Stewart on the Alpha Court is pulling down the model’s fit. However, in an OLS regression of continuous-level data, median cases of X only fail to fit their model if they have extreme Y values (are outliers). Hence, “Teddy” on the Beta Court is pulling down the fit of the OLS regression. (Picture a scatter plot: Teddy is the farthest from the line. I have the data posted below if you want to play with it.)
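The scatter-plot picture can also be checked numerically. The sketch below is an illustration in Python, not the Stata output: it fits the OLS line to the aggregated Beta Court values as printed in the table and lists each justice's residual. The variable and function names are my own.

```python
# Aggregated Beta Court values as printed in the table:
# (justice, Segal-Cover score, proportion of liberal votes)
BETA = [
    ("Rove", 0.1, 0.1),
    ("Drudge", 0.3, 0.1),
    ("Teddy", 0.5, 0.9),
    ("Jesse", 0.7, 0.9),
    ("Nader", 0.9, 0.9),
]


def ols_line(points):
    """Return (intercept, slope) of the least-squares line through the points."""
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = ols_line(BETA)

# Residual = observed proportion minus the proportion predicted by the line
residuals = {name: y - (intercept + slope * x) for name, x, y in BETA}
farthest = max(residuals, key=lambda name: abs(residuals[name]))

for name, r in residuals.items():
    print(f"{name:>6}: {r:+.2f}")
print("Farthest from the line:", farthest)
```

With these values the largest residual in absolute terms belongs to the median-measured justice, which is the sense in which Teddy pulls down the OLS fit.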

So what do we make of this? The point is that the only way a median justice will fail to fit very well into an ecological regression that is already anchored by two extremes is if he or she is so extremely directional (biased) as to be an outlier (Teddy). Yet, if that same voting pattern happens in the logit regression, goodness of fit would increase, not decrease, because the more movement one sees “out of the middle,” the better those models perform. Hence, what aggregation does is *transform poorly fitting cases in one model into perfectly fitting cases in the other.* Stated another way, it transforms non-directional justices into optimally-biased justices.

Given what I have just demonstrated, it should be quite clear why aggregating votes is fundamentally objectionable from the standpoint of both measurement logic and model specification. Quite simply, a model that transforms median justices who do not affiliate with an observed measure of bias into cases that actually “jack up” the model’s assessment of how well bias explains the voting universe is nothing other than a kind of sophistry masquerading behind statistical software. It is sophistry because: (a) bias is supposed to be an observed, empirical phenomenon, not a manufactured one; and (b) a model predicated upon the idea that a median-measured justice could lower the overall picture of bias in a voting universe by becoming an extremist is simply an invalid theoretical design.

Some might be tempted to argue, however, that non-directional justices have “moderate ideology,” and that this is their true “bias.” The argument would be that aggregation is good because median-measured justices *should* bolster fit unless they become extreme. The reply to this view is straightforward: Segal and Spaeth do not have a criterion for observing moderation as a political subject matter at the case level. That is exactly what the whole objection is. What is a vote for moderate ideology? To determine whether non-directional justices are, in fact, expressing preference for a political subject matter that is different from liberalism or conservatism, one would need to *observe* it with a trichotomous variable that provides acceptable coding criteria for the three distinct ideological choices. Or, one would need to create a continuous level measure of quality liberalism for each choice available to a justice (McGuire and Vanberg). To date, neither of these options has materialized. Even if one ever does, it is doubtful that such an innovation will help the fit of ideology models. The reason is that the justices whom we think are liberal and conservative may “defect” quite regularly for centrist alternatives. If this happens with any regularity, the goodness of fit will not be as high as some in our field would like.

Therefore, transforming justices who systematically resist a measure of bias into perfectly-biased justices through the magic of aggregation is a most objectionable way to conduct empirical analysis of the data that is currently available. In short, these ecological models are misspecified. No longer can political scientists assert as an empirical matter that 60-to-80% of the choices justices make in civil liberties cases arise out of their political values -- at least not to the extent that researchers have observed such phenomena in a data set. There is absolutely no empirical truth in that assertion whatsoever.

OUTPUT FOR ALPHA AND BETA:

The STATA file: http://ludwig.squarespace.com/storage/experiment.dta; Goodness-of-fit tables for the logistic regressions: http://ludwig.squarespace.com/storage/table.alpha-beta.doc

REFERENCES:

McGuire, Kevin T., and Georg Vanberg. 2005. *Mapping the Policies of the U.S. Supreme Court: Data, Opinions, and Constitutional Law,* paper presented at the Annual Meeting of the American Political Science Association.

[1] Logit is estimated with maximum likelihood. The only way to achieve a 1.0 (perfect) goodness of fit in a logit model is if the classification table perfectly predicts complete polarization. There would be no classification errors whatsoever.

[2] For those wanting more information, the logit classification table appears at the end of this journal entry. Fit is assessed with the R-squared analogues of phi-p and tau-p.

* corrected the spelling error in the title; minor editing in the final paragraph.

Article originally appeared on Ludwig (http://ludwig.squarespace.com/).

See website for complete article licensing information.