A DEEPER DIVE INTO THE ADVERSE IMPACT ANALYSES IN BAZILE V. CITY OF HOUSTON (2012)
by Eric Dunleavy Ph.D., Principal Consultant, DCI Consulting
Art has already written two detailed posts on Bazile v. City of Houston, which demonstrates how complex the 100-page ruling is. Judge Rosenthal dealt with a number of intricate technical issues. Art’s last post focused on the job-relatedness of the promotion exam. Another interesting aspect of the case relates to the adverse impact of the promotions and exam results, which is what triggered the burden to demonstrate job-relatedness in the first place. A variety of adverse impact measurement methods were considered. Based on the written opinion, it appeared that the 4/5th rule may have been given more weight than statistical significance tests. However, as is often the case, it wasn’t that simple.
Substantial space was devoted to the 4/5th rule, which was correctly described as a descriptive index. Most of that space focused on its limitations, particularly in small samples. Multiple experts noted that, unlike statistical significance tests, the impact ratio and the 4/5th rule standard for evaluating it do not account for sampling error, and that research has shown the 4/5th rule’s error rates to be problematic in small samples. Of course, statistical significance test error rates may also be problematic when sample sizes are small.
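For readers unfamiliar with the mechanics, the impact ratio and the 4/5th rule check can be sketched in a few lines of Python. The counts below are hypothetical illustrations only, not the figures from Bazile v. City of Houston:

```python
def impact_ratio(minority_selected, minority_total,
                 majority_selected, majority_total):
    """Ratio of the minority selection rate to the majority selection rate."""
    minority_rate = minority_selected / minority_total
    majority_rate = majority_selected / majority_total
    return minority_rate / majority_rate

# Hypothetical example: 3 of 10 minority and 6 of 10 majority candidates promoted.
ratio = impact_ratio(3, 10, 6, 10)
print(f"impact ratio = {ratio:.2f}")
print("4/5th rule violated" if ratio < 0.8 else "no 4/5th rule violation")
```

Note that nothing in this calculation reflects sample size: 3 of 10 versus 6 of 10 and 300 of 1,000 versus 600 of 1,000 produce the same ratio, which is exactly the limitation the experts raised.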
During the time period of interest, the 4/5th rule was violated when analyzing the total selection process for promotions. There was expert disagreement over whether the 4/5th rule was violated on pass/fail results of the exam itself.
A number of statistical significance tests were also used to assess adverse impact in the time period of interest, including Fisher’s exact test, the Z “2 standard deviation” test, and the Pearson chi-square, which is essentially mathematically equivalent to the Z test. Interestingly, none of the tests produced statistically significant results. Fisher’s test produced a probability of .15, while the Z test produced a standard deviation of 1.66, which corresponded to a probability value of about .10 from a chi-square test. Thus, the statistical tests suggested that this difference in selection rates could happen by chance 1 time in 10, or even 1 time in 7, which is substantially more likely to be a chance event than the “1 in 20” EEO standard that “2 standard deviations” or “alpha less than or equal to .05” translates to. From a social science perspective, these results do not allow us to be confident that the difference in rates is anything other than chance. Interestingly, one section of Judge Rosenthal’s opinion discussed how the 2 standard deviation rule of thumb is itself arbitrary and not binding.
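To make these tests concrete, here is a stdlib-only Python sketch of the pooled two-proportion Z test and a two-tailed Fisher’s exact test, again using hypothetical counts rather than the Bazile data. The squared Z statistic equals the Pearson chi-square for the 2x2 table, which is why those two tests are mathematically equivalent:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z test for a difference in selection rates.
    Z squared equals the Pearson chi-square for the same 2x2 table."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled rate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def fisher_exact_p(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables with the same margins that are no
    more probable than the observed table."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def hyper(k):  # P(upper-left cell = k), hypergeometric with fixed margins
        return math.comb(r1, k) * math.comb(r2, c1 - k) / math.comb(n, c1)

    p_obs = hyper(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(hyper(k) for k in range(lo, hi + 1) if hyper(k) <= p_obs + 1e-12)

# Hypothetical example: 3 of 10 minority vs. 6 of 10 majority candidates selected.
print(f"Z = {two_proportion_z(3, 10, 6, 10):.2f}")
print(f"Fisher exact p = {fisher_exact_p(3, 7, 6, 4):.3f}")
```

Both tests ask whether the difference between the two selection rates could plausibly be zero; neither says anything directly about the ratio of the rates.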
It is also worth noting that there may have been some confusion concerning the relationship between the 4/5th rule and the statistical significance tests. At various points in the ruling it was concluded that there was “no demonstration that the 4/5th rule violation was statistically significant.” All three statistical tests listed above assess the likelihood that a difference in rates is 0, not whether a 4/5th rule violation is likely due to chance. This is an important distinction, because analyzing the difference between two rates and the ratio of those rates (via the 4/5th rule) are two different things. Interestingly, a relatively new statistical significance test, the “Z impact ratio” developed by Morris and Lobsenz in 2000, was used and discussed, although it is unclear how much weight it was given by Judge Rosenthal. This test directly assesses the likelihood that a 4/5th rule violation is due to chance. In this case it was the only test that was statistically significant, but the experts noted that the creators of the test cautioned against its use in small samples and suggested relying on a more traditional measure instead.
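The general logic of a significance test on the impact ratio itself can be sketched as a Z test on the log of the ratio of selection rates, with a delta-method standard error. To be clear, this is an illustrative sketch of the approach, not necessarily the exact Morris and Lobsenz (2000) formulation, and the counts are again hypothetical:

```python
import math

def z_impact_ratio(x1, n1, x2, n2):
    """Z test on the natural log of the impact ratio (selection-rate ratio),
    using a delta-method standard error. A generic sketch of the approach;
    the exact Morris and Lobsenz (2000) formula may differ."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    return math.log(p1 / p2) / se

# Hypothetical example: 3 of 10 vs. 6 of 10 selected.
print(f"Z(impact ratio) = {z_impact_ratio(3, 10, 6, 10):.2f}")
```

The key contrast with the tests above is the quantity being tested: here the null hypothesis is about the ratio of the rates, the same quantity the 4/5th rule evaluates, rather than about their difference.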
Some other small sample rules of thumb like the “flip flop” test and “shortfall less than 1” rule were also considered, but didn’t seem to be given substantial weight. There was also expert debate on the appropriate statistical test and which probability model was most appropriate for the selection decision under scrutiny, but again, these factors seemed to make little difference based on the written opinion.
Critically, in addition to the time period of interest, promotion and exam data from previous years were also analyzed. This historical context, and the results it produced, seemed to make the difference in the eyes of Judge Rosenthal. A Mantel-Haenszel test, which pools results across multiple years into a single weighted significance test, was computed and was statistically significant. Perhaps most importantly, the historical data showed a 4/5th rule violation in every promotion cycle since 1993.
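The Cochran-Mantel-Haenszel chi-square can be computed directly from a set of 2x2 tables, one per promotion cycle. The sketch below uses hypothetical cycles, not the Bazile data; with 1 degree of freedom, a chi-square above 3.84 is statistically significant at the conventional .05 level:

```python
def mantel_haenszel_chi2(tables, continuity=True):
    """Cochran-Mantel-Haenszel chi-square pooling 2x2 tables across strata
    (e.g., one table per promotion cycle). Each table is (a, b, c, d):
    a = minority selected, b = minority not selected,
    c = majority selected, d = majority not selected. 1 degree of freedom."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r1, r2 = a + b, c + d          # row margins
        c1, c2 = a + c, b + d          # column margins
        sum_a += a
        sum_e += r1 * c1 / n           # expected minority selections under H0
        sum_v += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    num = abs(sum_a - sum_e) - (0.5 if continuity else 0.0)
    return max(num, 0.0) ** 2 / sum_v

# Hypothetical cycles with a consistent disparity (not the Bazile data).
cycles = [(1, 9, 8, 2), (2, 8, 7, 3), (1, 9, 7, 3)]
print(f"CMH chi-square = {mantel_haenszel_chi2(cycles):.2f}")
```

This is why pooling can matter so much: each yearly table may be too small to reach significance on its own, while a consistent disparity repeated across cycles accumulates into a significant pooled result.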
In the end, Judge Rosenthal seemed to put more weight on the historical pattern of 4/5th rule violations than on anything else. In other words, the historical data appeared to trump any particular adverse impact measurement method in the most recent promotion cycle, including statistical significance tests. This is a finding worth noting, particularly given the recent trend in EEO enforcement of relying only on statistical significance tests.
Could the 4/5th rule be mounting a comeback after years of being discarded in favor of statistical significance tests? Judge Rosenthal reasonably pointed out that the “2 standard deviation” probability threshold is as arbitrary as the 4/5th rule. With the availability of the Morris and Lobsenz “Z impact ratio” statistical significance test, the EEO community now has the means to consider sampling error and control error rates related to the 4/5th rule. We certainly recommend that both statistical significance tests and practical significance measures be used, as did a recent TAC on the topic of adverse impact analyses (see Cohen, Aamodt, & Dunleavy, 2010). Stay tuned.