Calculating the observed chi-square value

 

a)      Analyze the following table that describes the relationship between the percent black and median income in Massachusetts census tracts in 2000.[1] 

 

Table 1a. Median Income by Percent Black, all MA census tracts, 2000.

 

                                  Percent Black

Income            Low % black    Medium % black    High % black    Total
                  (0-7%)         (7-50%)           (50%+)

Lower Income      11.3%          48.8%             68.9%           19%
$0-$40K           (125)          (101)             (31)            (257)

Higher Income     88.7%          51.2%             31.1%           81%
$40K+             (979)          (106)             (14)            (1099)

Total             100%           100%              100%            100%
                  (1104)         (207)             (45)            (1356)

 

Table 1a examines differences in median income across census tracts in Massachusetts with different proportions of African-Americans. The table describes a strong association between the percent African-American in census tracts and the median income of tracts: as the proportion African-American increases, the median income of a tract decreases.

 

Among tracts where the African-American population was 50 percent or higher, just under one third (31.1%) reported median incomes of $40,000 or more. In sharp contrast, among non-black tracts – that is, tracts that had 7 percent or fewer African-Americans – the vast bulk (88.7%) reported median incomes of $40,000 or more. This is a 57.6 percentage point difference between low black and high black tracts in Massachusetts in the share of tracts with median incomes of $40,000 or more. The magnitude of this difference suggests that tracts with African-American majorities are much poorer on average than tracts with low proportions of African-Americans. Of course, these latter tracts are predominantly White…
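As a quick check on the arithmetic in this paragraph, the column percentages can be recomputed from the tract counts in Table 1a. The following is only a minimal sketch; the dictionary layout and the helper name share_higher_income are illustrative choices, not part of the original exercise.

```python
# Observed tract counts from Table 1a, keyed by percent-black category.
# Each value is (lower-income tracts, higher-income tracts).
observed = {
    "low": (125, 979),
    "medium": (101, 106),
    "high": (31, 14),
}

def share_higher_income(counts):
    """Percent of tracts in one column with a median income of $40K or more."""
    lower, higher = counts
    return 100.0 * higher / (lower + higher)

shares = {name: share_higher_income(c) for name, c in observed.items()}

# Gap between the low and high percent-black columns, in percentage points.
gap = shares["low"] - shares["high"]

for name, pct in shares.items():
    print(f"{name:>6}: {pct:.1f}% of tracts at $40K+")
print(f"gap (low - high): {gap:.1f} percentage points")
```

The gap is a difference between two percentages, so it is expressed in percentage points rather than as a percent change in income.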

 

 

b)      To compute the chi-square statistic, first compute the expected cell frequency for each cell of the crosstab. The expected frequencies are those one would expect, given the univariate distributions (or marginals) of the variables in the table, if the two variables were unrelated. They are calculated as

 

                        Expected Cell Frequency = (column total)*(row total)/N

 

For example, the computation of the expected cell frequency for census tracts with lower income and low percent black would be:

 

Table 1b. Example, expected frequency for Table 1a.

 

                                  Percent Black

Income            Low % black                         Medium % black    High % black    Total
                  (0-7%)                              (7-50%)           (50%+)

Lower Income      Expected (257*1104)/1356 = 209.2                                      257
$0-$40K

Higher Income
$40K+

Total             1104                                                                  N=1356
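The single-cell computation in Table 1b can be written as a short function. This is a sketch only; the function name expected_frequency and its argument names are hypothetical and simply mirror the formula (column total)*(row total)/N.

```python
def expected_frequency(row_total, column_total, n):
    """Expected cell count implied by the row and column marginals."""
    return row_total * column_total / n

# Lower income row total = 257, low % black column total = 1104, N = 1356.
cell = expected_frequency(257, 1104, 1356)
print(round(cell, 1))  # 209.2
```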

 

 

c)      Using the same procedure to compute all the expected frequencies yields the following table of expected frequencies:

 

Table 1c. Table of expected frequencies for Table 1a.

 

 

 

                                    Low % black    Medium % black    High % black    Total
                                    (0-7%)         (7-50%)           (50%+)

Lower Income      Observed          125            101               31              257
$0-$40K           Expected          209.2          39.2              8.5             257

Higher Income     Observed          979            106               14              1099
$40K+             Expected          894.8          167.8             36.5            1099

Total             Observed          1104           207               45              1356
                  Expected          1104           207               45              1356
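The same calculation can be applied to every cell at once. The sketch below assumes the observed counts are stored as a list of rows (income) by columns (percent black); the variable names are illustrative only.

```python
# Observed counts from Table 1a: rows are income (lower, higher),
# columns are percent black (low, medium, high).
observed = [
    [125, 101, 31],
    [979, 106, 14],
]

row_totals = [sum(row) for row in observed]          # row marginals
col_totals = [sum(col) for col in zip(*observed)]    # column marginals
n = sum(row_totals)                                  # total number of tracts

# Expected frequency for each cell: (row total) * (column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

Rounded to one decimal place, the printed rows should match the Expected entries in Table 1c.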

 

 

 

 

d)      The next step is to subtract the expected cell frequency from the observed cell frequency for each cell. This value gives the amount of the deviation or error for each cell. Adding these to the preceding table yields the following.

 

 

Table 1d. Table of observed-expected frequencies for Table 1a.

 

 

 

                                    Low % black    Medium % black    High % black    Total
                                    (0-7%)         (7-50%)           (50%+)

Lower Income      Observed          125            101               31              257
$0-$40K           Expected          209.2          39.2              8.5             257
                  Obs-Exp           -84.2          61.8              22.5

Higher Income     Observed          979            106               14              1099
$40K+             Expected          894.8          167.8             36.5            1099
                  Obs-Exp           84.2           -61.8             -22.5

Total             Observed          1104           207               45              1356
                  Expected          1104           207               45              1356

 

Notice that each expected row total is the same as the corresponding observed row total; the same is true for the column totals. Note also that the observed – expected differences sum to zero across each row and down each column.
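The zero-sum property of the deviations can be checked numerically. The following sketch recomputes the expected frequencies at full precision and then verifies that the observed minus expected values cancel within each row and each column; the variable names are illustrative, and a small tolerance is used because floating-point sums are rarely exactly zero.

```python
import math

# Observed counts from Table 1a (rows: income; columns: percent black).
observed = [[125, 101, 31], [979, 106, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Deviation (observed - expected) for each cell.
deviations = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

row_sums = [sum(row) for row in deviations]
col_sums = [sum(col) for col in zip(*deviations)]

# Every row sum and column sum of the deviations should be (numerically) zero.
assert all(math.isclose(s, 0.0, abs_tol=1e-9) for s in row_sums + col_sums)

for row in deviations:
    print([round(d, 1) for d in row])
```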

 


e)      Following this, the difference computed in the last step (i.e., the observed – expected) is squared, resulting in the following table:

 

Table 1e. Table of (observed-expected)^2 frequencies for Table 1a.

                                    Low % black    Medium % black    High % black    Total
                                    (0-7%)         (7-50%)           (50%+)

Lower Income      Observed          125            101               31              257
$0-$40K           Expected          209.2          39.2              8.5             257
                  Obs-Exp           -84.2          61.8              22.5
                  (Obs-Exp)^2       7089.64        3819.24           506.25

Higher Income     Observed          979            106               14              1099
$40K+             Expected          894.8          167.8             36.5            1099
                  Obs-Exp           84.2           -61.8             -22.5
                  (Obs-Exp)^2       7089.64        3819.24           506.25

Total             Observed          1104           207               45              1356
                  Expected          1104           207               45              1356

 

 

f)        Each of the squared differences is then divided by the expected cell frequency for each cell, resulting in the following table:

 

Table 1f. Table of (observed-expected)^2/expected frequencies for Table 1a.

                                    Low % black    Medium % black    High % black    Total
                                    (0-7%)         (7-50%)           (50%+)

Lower Income      Observed          125            101               31              257
$0-$40K           Expected          209.2          39.2              8.5             257
                  Obs-Exp           -84.2          61.8              22.5
                  (Obs-Exp)^2       7089.64        3819.24           506.25
                  (Obs-Exp)^2/E     33.9           97.4              59.6

Higher Income     Observed          979            106               14              1099
$40K+             Expected          894.8          167.8             36.5            1099
                  Obs-Exp           84.2           -61.8             -22.5
                  (Obs-Exp)^2       7089.64        3819.24           506.25
                  (Obs-Exp)^2/E     7.9            22.8              13.9

Total             Observed          1104           207               45              1356
                  Expected          1104           207               45              1356
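Steps e) and f) can be reproduced in a few lines. To match the hand calculation, this sketch deliberately starts from the expected frequencies as rounded to one decimal place in the tables, rather than from full-precision values; the variable names are illustrative.

```python
# Observed counts from Table 1a (rows: income; columns: percent black).
observed = [[125, 101, 31], [979, 106, 14]]

# Expected frequencies as rounded to one decimal place in the tables above.
expected = [[209.2, 39.2, 8.5], [894.8, 167.8, 36.5]]

# Step e: squared deviation for each cell.
squared = [
    [(o - e) ** 2 for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Step f: squared deviation divided by the expected frequency.
contributions = [
    [s / e for s, e in zip(sq_row, exp_row)]
    for sq_row, exp_row in zip(squared, expected)
]

for sq_row, c_row in zip(squared, contributions):
    print([round(s, 2) for s in sq_row], [round(c, 1) for c in c_row])
```

Dividing by the expected frequency scales each squared deviation by the size of the cell, so a deviation of a given size counts for more in a cell where few tracts were expected.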

 

 

g)      The chi-square statistic is computed by summing the last entry of each cell, the (Obs-Exp)^2/E value, in the preceding table.

 

 

            The computation for this example would result in the following:

 

 

Observed chi-square = Sum [ (observed - expected)^2 / expected ]

                    = 33.9 + 97.4 + 59.6 + 7.9 + 22.8 + 13.9 = 235.5[2]
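The whole procedure, from observed counts to the final statistic, can be sketched as one function. The function name chi_square and the list-of-rows input format are assumptions made for illustration. Unlike the hand calculation, this version carries full precision through every step, so its result differs slightly from the sum of the rounded cell values.

```python
def chi_square(observed):
    """Chi-square statistic for a crosstab given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            total += (o - e) ** 2 / e               # this cell's contribution
    return total

table_1a = [[125, 101, 31], [979, 106, 14]]
stat = chi_square(table_1a)

# Sum of the rounded cell values used in the hand calculation.
rounded_sum = sum([33.9, 97.4, 59.6, 7.9, 22.8, 13.9])

print(f"sum of rounded contributions: {rounded_sum:.1f}")  # 235.5
print(f"full-precision statistic:     {stat:.1f}")         # 234.9
```

The gap of about 0.6 between the two figures comes entirely from rounding the expected frequencies and the cell contributions to one decimal place before summing.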

 




[1] Note that, in the interests of simplifying the example, I have reduced the number of categories below what would probably be sensible.

[2] Note that this value is within the rounding error of the value for chi-square computed by SPSS for the same data.