Chapter V
One of the main lessons of the previous chapter was another call for wariness: we need formal procedures to guard against reading our ideas into the data. Without a procedure such as simulation, we could easily misread the numerical pattern in Table 4.3 to support the preconceived conclusion that the prosecution of witchcraft in New England was biased against women. When a conclusion shapes the reading of the evidence, one has allowed the tail to wag the dog. And if not reined in, such mistakes can lead to "proofs" by assertion and personal conviction, not to sound and substantiated arguments.
As a formal check on the confirmatory urge, physical simulation has several notable virtues. For one thing, simulation keeps us on our toes. When an observed pattern looks for all the world like the consequence of a real "cause," simulation forces us to consider chance as an alternative explanation. It also demystifies the meaning of chance and fosters an understanding of probabilistic reasoning. So important are all these virtues that they ought to become habits. Even when the random draws will not be carried out, students should construct a simulation model of the problem being studied to build intuition and reinforce understanding.
Simulations can also be carried out with the aid of a computer. Because it excels at doing repetitious tasks at a rapid pace, a computer can generate the results of hundreds, thousands, or even millions of draws in a fraction of the time it would take by hand. The problem that Professor Tally and the students tackled is a good candidate for computer simulation. To recall the situation, 16 of 89 witchcraft cases resulted in execution; in 68 cases a female was charged, and in 21, a male. The aim of simulation is to find out how frequently random draws yield a pattern as extreme as or more extreme than the historical outcome of 2 males and 14 females executed. Table 5.1 displays the results of 1000 random draws by computer. [1] In 217 of the 1000 drawings, 0, 1, or 2 males were selected for execution simply by chance. This proportion (21.7 percent), only slightly less than the 22 percent Professor Tally's students got in their 50 random draws by hand, again supports the conclusion that the higher execution rate for females could easily have arisen by chance rather than from a systematic bias against females charged as witches.
Table 5.1. Computer Simulation of 89 Witchcraft Cases: 16 Executions, 68 Cases against Females, 21 Cases against Males

| Females Executed | Males Executed | Number of Occurrences |
|---|---|---|
| 16 | 0 | 8 |
| 15 | 1 | 52 |
| 14 | 2 | 157 |
| 13 | 3 | 237 |
| 12 | 4 | 240 |
| 11 | 5 | 184 |
| 10 | 6 | 82 |
| 9 | 7 | 29 |
| 8 | 8 | 8 |
| 7 | 9 | 2 |
| 6 | 10 | 1 |
| 5 or fewer | 11 or more | 0 |
| Total | | 1000 |

Note: The first three rows (0, 1, or 2 males executed) together account for 217 occurrences.
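The drawing procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used to produce Table 5.1: the function name `simulate_executions`, the seed, and the use of Python's standard `random` module are all choices made here. Each trial draws 16 cases without replacement from 68 females and 21 males and records how many males were drawn.

```python
import random

def simulate_executions(n_trials=1000, n_female=68, n_male=21,
                        n_executed=16, seed=1):
    """Tally the number of males among the executed in each random draw."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    cases = ["F"] * n_female + ["M"] * n_male
    counts = {}
    for _ in range(n_trials):
        drawn = rng.sample(cases, n_executed)   # draw without replacement
        males = drawn.count("M")
        counts[males] = counts.get(males, 0) + 1
    return counts

counts = simulate_executions()
# Draws at least as extreme as the historical outcome: 2 or fewer males.
as_extreme = sum(n for males, n in counts.items() if males <= 2)
proportion = as_extreme / 1000
print(f"2 or fewer males executed: {as_extreme} of 1000 draws ({proportion:.1%})")
```

Because the draws are random, the exact count will differ from one seed to another, but it should fall near one-fifth of the trials.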
The Logic of Hypothesis Testing: Proof by Contradiction
Another well-known method for determining whether a relationship exists between two categorical variables is the chi-square test of independence. The reasoning in this test is essentially the same as the reasoning used in simulation. Indeed, both procedures follow the general logic of what is called formal hypothesis testing. As in simulation, hypothesis testing is a systematic method for deciding whether chance can be dismissed as the explanation of an observed numerical pattern. If chance can be dismissed, then the pattern must be explained by some systematic relationship between the variables. The chance argument is called the "null hypothesis": it represents the idea that no relationship exists between the variables and that any observed co-variation between the variables is due to chance. A second argument is called the "alternative hypothesis." The inverse of the null hypothesis, the alternative hypothesis represents the claim that the variables are related, and that this relationship explains the pattern in the data. Typically, the alternative hypothesis is the one that the researcher would like to confirm, for it embodies his or her theory about how a particular set of observed effects came about.
To make this more concrete, we can turn again to the problem of gender and execution. Re-analyzing this familiar problem should make it easier to understand this method of testing the chance argument. The first step is to state the null and alternative hypotheses:
Null hypothesis: There was no association between the gender of the accused and the likelihood of being executed. Once formally charged as witches, women were no more likely to be executed than men; any observed difference between the female and male execution rates was due to chance.
Alternative hypothesis: There was an association between the gender of the accused and the likelihood of being executed. Once formally charged, women were more apt to be executed than men; the difference between the higher female execution rate and the lower male rate was not the result of mere chance but a meaningful, historical outcome: the consequence of gender prejudice.
One of the paradoxes of logical inquiry is that the alternative hypothesis of association cannot be directly proved. The best we can do is to demonstrate that the null hypothesis is not a satisfactory explanation of the data. When the null hypothesis proves unsatisfactory, that provides solid grounds for concluding that the alternative hypothesis is a better explanation. In short, a formal hypothesis test is a form of proof by contradiction which rests on probabilistic reasoning: if the null hypothesis is very unlikely, then the alternative hypothesis is presumably "true."
Having stated the competing hypotheses, we move on to see whether we can rule out the chance argument contained in the null hypothesis. This decision rests on a comparison between the observed cell frequencies in a table and what are called the expected values: the cell frequencies that would occur if the null hypothesis were perfectly true. In making this comparison, there are two key questions: 1) how much do the observed numbers differ from the expected numbers? 2) What is the probability that their overall difference could happen by chance? The reasoning used to interpret the answers will by now seem familiar. The greater the difference, the smaller the probability that the observed numbers happened by chance. The smaller the difference, the greater the probability that it occurred by chance. With this in mind, we turn next to the chi-square statistic.
The chi-square statistic can best be understood as the "chi-square difference" because it is a summary statistic that measures the overall difference between the observed and expected numbers. A large chi-square difference will have a low probability, indicating that the null hypothesis can be rejected. Conversely, a small chi-square difference will have a large probability, indicating that the two variables are not related but independent. (See Figure 5.1)
Table 5.2 displays the observed and expected frequencies for the data on gender and execution. The expected frequencies show the theoretical outcomes that would occur under the assumption that the null hypothesis is perfectly correct. Thus, under this assumption 12.22 females and 3.78 males would be executed, as compared to the actual outcome of 14 females and 2 males having been executed. Note also that the observed and expected numbers are fairly close, and that small differences are more likely to happen just by chance. As this comparison suggests, the relatively small differences between the observed and expected frequencies yield a rather small chi-square value of 1.33. (We shall return to the calculation of chi-square in a moment.)
Table 5.2 Sex and Execution of Accused Witches, New England 1638-1697: Observed and Expected Frequencies

| Execution | Female Observed | Female Expected | Male Observed | Male Expected | Total |
|---|---|---|---|---|---|
| No | 54 | 55.78 | 19 | 17.22 | 73 |
| Yes | 14 | 12.22 | 2 | 3.78 | 16 |
| Total | 68 | 68 | 21 | 21 | 89 |
Each chi-square value, be it large or small, is associated with a probability value, and it is this probability value that tells the researcher whether to accept or reject the null hypothesis. The probability associated with a chi-square value of 1.33 happens to be .248. In other words, a chi-square value as large as 1.33 or larger occurs by chance 24.8 times out of 100 random draws, or in 24.8 percent of any number of random draws. As was done in the physical simulation, we then compare the probability statistic to the chosen cutoff point. In this text the cutoff point (in technical terms, the level of statistical significance) is conventionally set at .05. If a chi-square value as large as or larger than ours happens 5 or fewer times out of 100 trials merely by chance, we have a sufficient reason for rejecting the null hypothesis. If a value as large as or larger than ours happens more than 5 times out of 100 by chance, then we cannot rule out chance and must accept the null hypothesis. The correct decision in this case should come as no surprise: since the probability value of .248 is much larger than the cutoff point of .05, we have to accept the null hypothesis and conclude that no statistically significant relationship exists between gender and execution. There are no statistical grounds for claiming that gender prejudice was operating in the prosecution of accused witches.
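For a table with 1 degree of freedom, the probability attached to a chi-square value can be computed directly rather than looked up. The sketch below is an illustration, not the procedure of any particular statistical package; it relies on the standard identity that, with 1 degree of freedom, the upper-tail probability equals `erfc(sqrt(x / 2))`. The function name `chi_square_p_value_df1` is an invention for this example.

```python
import math

def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

alpha = 0.05                          # chosen level of significance
p = chi_square_p_value_df1(1.33)      # chi-square value for Table 5.2
decision = "reject" if p <= alpha else "do not reject"
print(f"p = {p:.3f}; {decision} the null hypothesis at the {alpha} level")

# The same function shows why 3.84 serves as the critical value at .05.
p_at_critical = chi_square_p_value_df1(3.84)
```

The probability comes out close to the .248 reported in the text, well above the .05 cutoff.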
Calculation of Expected Values
Though computers and statistical programs can easily handle all the computations for a chi-square test, it is important to know how to calculate the expected values and the chi-square statistic. Learning and practicing calculations by hand clarifies the connection between expected values and the null hypothesis, and the meaning of the chi-square statistic as well.
To see the connection between the expected values and the null hypothesis, let us begin by looking at Table 5.3a, which shows only the row and column totals, together with the percentages in the totals column. If the null hypothesis were completely correct, the cell percentages would reflect the percentages in the totals column exactly. The second row of the totals column shows that 17.98 percent of the 89 witchcraft cases resulted in execution.

Table 5.3a Sex and Execution of Accused Witches, New England 1638-1697: Marginal Totals

| Execution | Female | Male | Total No. | Total % |
|---|---|---|---|---|
| No | | | 73 | 82.02 |
| Yes | | | 16 | 17.98 |
| Total | 68 | 21 | 89 | 100.00 |

If gender had absolutely no effect on the outcomes of cases, the female execution rate would equal the overall execution rate of 17.98 percent, as would the male execution rate. The same reasoning holds for the cases that did not result in execution: the female and male "non-execution" rates would both equal the overall non-execution rate of 82.02 percent. (See Table 5.3b.)
Table 5.3b Sex and Execution of Accused Witches, New England 1638-1697: Expected Cell Percentages

| Execution | Female % | Male % | Total No. | Total % |
|---|---|---|---|---|
| No | (a) 82.02 | (b) 82.02 | 73 | 82.02 |
| Yes | (c) 17.98 | (d) 17.98 | 16 | 17.98 |
| Total | 100.00 (68 cases) | 100.00 (21 cases) | 89 | 100.00 |
Using the column percentages for the cells, we can now compute the expected cell frequencies. If 17.98 percent of all females charged were executed, as the null hypothesis assumes, then the number of females executed in cell c would equal 12.22 (cell c: .1798 x 68 = 12.22). The expected numbers for the three remaining cells are calculated similarly, using the percentage in the totals column and the appropriate column total. (Two digits to the right of the decimal are retained to meet a requirement of the chi-square calculation which will follow.)
Cell a: .8202 x 68 = 55.78
Cell b: .8202 x 21 = 17.22
Cell d: .1798 x 21 = 3.78
With these results, we can construct a table of observed and expected frequencies like Table 5.2.
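The same expected frequencies can be obtained from the marginal totals alone. The following sketch is illustrative; the variable names are assumptions made for this example. It multiplies each row total by each column total and divides by the number of cases, which is equivalent to applying the overall row percentage to each column total.

```python
# Marginal totals from Table 5.3a.
row_totals = {"No": 73, "Yes": 16}        # not executed, executed
col_totals = {"Female": 68, "Male": 21}
n_cases = 89

# Expected count = (row total x column total) / number of cases.
expected = {
    (row, col): row_totals[row] * col_totals[col] / n_cases
    for row in row_totals
    for col in col_totals
}

for (row, col), value in expected.items():
    print(f"Execution {row:>3}, {col:<6}: {value:.2f}")
```

Working from the unrounded totals avoids the small rounding differences that arise when a percentage such as 17.98 is first rounded and then multiplied.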
Calculation of Chi-square
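The calculation of chi-square is straightforward. For each cell, subtract the expected frequency from the observed frequency and square the result, which removes the plus or minus signs. Then divide by the expected value. Finally, sum the results for all cells. A formula summarizes these steps, where O is the observed frequency and E the expected frequency of a cell, "1" is the first cell, "n" is the last cell, and Σ is the symbol for sum:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

Applied to the four cells of Table 5.2:

Cell a: (54 - 55.78)^2 / 55.78 = .057
Cell b: (19 - 17.22)^2 / 17.22 = .184
Cell c: (14 - 12.22)^2 / 12.22 = .259
Cell d: (2 - 3.78)^2 / 3.78 = .838
Total = 1.338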
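The steps just described translate directly into code. This is a minimal sketch; the function name `chi_square` is an assumption made for illustration. It is run twice: once with the expected values rounded to two decimals, as in a hand calculation, and once with unrounded expected values, as a computer program would use.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cells a, b, c, d of Table 5.2.
observed = [54, 19, 14, 2]

# Hand calculation: expected values rounded to two decimals.
expected_rounded = [55.78, 17.22, 12.22, 3.78]
by_hand = chi_square(observed, expected_rounded)

# Computer calculation: unrounded expected values from the marginal totals.
expected_exact = [73 * 68 / 89, 73 * 21 / 89, 16 * 68 / 89, 16 * 21 / 89]
by_computer = chi_square(observed, expected_exact)

print(f"by hand: {by_hand:.3f}, by computer: {by_computer:.3f}")
```

The small gap between the two results comes from rounding the expected values; it explains why the hand total is slightly larger than the value of 1.33 quoted earlier.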
After computing chi-square, we must determine the number of degrees of freedom for the table, which is the number of cell values that are free to vary when the numbers in the row and column totals are fixed. In a 2 X 2 table only one cell frequency is free to vary. This can be seen by removing the cell counts in Table 5.2, placing a number such as 10 in one cell, and then using addition or subtraction to determine the other three cell counts. In a table with R rows and C columns, the general formula for degrees of freedom is df = (R-1)(C-1). A 2 X 2 table has 1 degree of freedom [(2-1)(2-1) = 1].
With the chi-square statistic and the degrees of freedom identified, we can turn to a table of probabilities in a statistics book to find the critical value of chi-square. This is the minimum value of chi-square needed to reject the null hypothesis at a specified level of significance. We have already set the level of significance, or probability, to p=.05. For a table with 1 degree of freedom, the critical value of chi-square is 3.84. Since the calculated chi-square value (1.34) is less than the critical, or minimal, value, the null hypothesis cannot be rejected. (See Figure 5.1) The overall difference between the observed and the expected counts is too small to deny that this difference could have happened by chance. The two variables are not related but instead independent.
To review, the chi-square method of hypothesis testing has seven basic steps. (Note in steps 5 and 6 that there are different procedures to follow when doing the computations by hand.)
1. State the null and alternative hypotheses.
2. Specify the decision rule and the level of statistical significance for the test, i.e., .05, .01, or .001. (A significance level of .01 would mean that the probability of the chi-square value must be .01 or less to reject the null hypothesis, a more stringent criterion than .05.)
3. Compute the expected values.
4. Compute the chi-square statistic.
5. Identify the probability value associated with the chi-square statistic.
5a. (Alternative for hand calculations) Determine the degrees of freedom for the table. Then identify the critical value of chi-square at the specified level of significance (see step 2) and appropriate degrees of freedom.
6. Compare the probability value with the chosen level of significance; using the specified decision rule, reject the null hypothesis if the probability of chi-square is equal to or less than the significance level; accept the null hypothesis if the probability is greater than the significance level.
6a. (Alternative for hand calculations) Compare the computed chi-square statistic with the critical value of chi-square; reject the null hypothesis if the chi-square is equal to or larger than the critical value; accept the null hypothesis if the chi-square is less than the critical value.
7. State a substantive conclusion, i.e., describe the meaning and importance of the test results in terms of the historical problem under investigation.
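Steps 3 through 6 can be gathered into a single routine. The sketch below is one possible arrangement, not a standard library function: the names `expected_counts`, `p_value`, and `chi_square_test` are assumptions, and the probability calculation is deliberately limited to 1 degree of freedom or an even number of degrees of freedom, for which simple closed forms exist. Steps 1, 2, and 7 remain the researcher's responsibility.

```python
import math

def expected_counts(table):
    """Step 3: expected count = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def p_value(chi_square, df):
    """Step 5: upper-tail probability (this sketch: df = 1 or even df only)."""
    half = chi_square / 2
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df % 2 == 0:
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch handles only df = 1 or even df")

def chi_square_test(table, alpha=0.05):
    """Steps 3-6 for a table given as a list of rows of observed counts."""
    expected = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e                      # step 4
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)      # step 5a
    p = p_value(chi2, df)
    return {"chi_square": chi2, "df": df, "p": p,
            "reject_null": p <= alpha}               # step 6

# Gender and execution: rows are "No" and "Yes", columns Female and Male.
result = chi_square_test([[54, 19], [14, 2]])
print(result)
```

Passing `alpha` as an argument with a default mirrors step 2: the significance level is fixed before the statistic is computed, not chosen after seeing the result.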
Several of these steps deserve further comment. Step two is a precaution against confirmation bias. If it were permissible to set or change the decision rule and significance level after computing the test statistics, it would be possible to adjust these test criteria in favor of the preferred hypothesis. Suppose that a researcher did not specify the significance level before computing the test results and then found that the probability of chi-square was .0536. He or she might decide to adopt a significance level of .07, thereby making it possible to reject the null hypothesis in favor of the alternative. This kind of after-the-fact tinkering not only invalidates the test, it is one of the reasons why formal procedures are necessary in the first place.
Step seven calls attention to the difference between a statistical and a substantive conclusion. In the current example, the statistical result has one main point: the probability value of .248 leads us to accept the null hypothesis. Had the probability value been .036, we could summarize the statistical conclusion with similar succinctness: the chi-square value for this table is statistically significant at the .05 level; this provides grounds for concluding that there was an association between gender and execution. In contrast, a satisfactory substantive conclusion would translate and extend the bald statistical findings into a meaningful historical interpretation. One could capture the essential ideas in a couple of sentences, developing them further as seemed appropriate: There are no statistical grounds for believing that female witches were more likely to be executed than male witches. Evidently, the courts in seventeenth-century New England on the whole prosecuted cases of witchcraft without prejudice against women.
Formal Complaints: The Increasing Vulnerability of Old Women
Armed with additional statistical tools, we can now continue our investigations into New England witchcraft by taking up several new problems. An interesting way to begin is to take a closer look at the ages of witches. Using the exploratory approach, our aim is simply to discover and interpret interesting patterns in the data on the accused formally charged as witches from 1638 to 1697 (again omitting those charged during the Salem outbreak). With the help of the chi-square test, we can go further than was possible before. After identifying an interesting pattern, we can now determine whether to attribute it to chance or to a relationship between variables.
Whether female or male, the typical witch in New England was not "old" as in Europe but middle-aged. This is the finding made earlier by examining a large number of cases regardless of their sequence in time. But since history is centrally concerned with questions about change, it would be interesting to know whether the age patterns of New England witches persisted or changed during the six decades of witchcraft prosecution from 1638 to 1697. Did the typical witch continue to be middle-aged? As the years wore on, did the vulnerability of younger and older people to charges of witchcraft increase, decrease, or remain the same?
Some initial answers emerge after studying Table 5.4. It cross-tabulates 77 witchcraft cases by age group and three periods of roughly two decades each. Because of missing information on the age of the accused, the table omits 14 of the 91 cases in the sample. Since the percentages are computed down the columns, the best way to interpret this table is to read down the columns and make comparisons across the rows.
Table 5.4 Age of Accused Witches by Period, 1638-1697 (Expected frequencies in parentheses)

| Age | 1638-1659 | 1660-1679 | 1680-1697 | Total |
|---|---|---|---|---|
| Under 40 | 7 (4) 25.00% | 1 (4.57) 3.12% | 3 (2.43) 17.65% | 11 14.29% |
| 40 to 59 | 19 (18.91) 67.86% | 26 (21.61) 81.25% | 7 (11.48) 41.18% | 52 67.53% |
| 60 and over | 2 (5.09) 7.14% | 5 (5.82) 15.62% | 7 (3.09) 41.18% | 14 18.18% |
| Total | 28 100.00% | 32 100.00% | 17 100.00% | 77 100.00% |

Source: Sample of witches.
DF: 4
Chi-square: 14.75 (Probability = .0052)
Critical value of chi-square: 9.49
People under 40 years of age made up 25 percent of the 28 cases identified from 1638 to 1659, only 3 percent of the total identified in the 1660s and 70s, and 18 percent of the known total in the 1680s and 90s. Hence the vulnerability of this age group to charges of witchcraft was greatest during the initial period of prosecution; thereafter it declined sharply and then rose again. Women and men from 40 to 59 were always more vulnerable than younger people. They were named in two-thirds of the identifiable complaints brought in the first period, in four-fifths of those in the second, and two-fifths of those in the last period. Middle-aged New Englanders, especially women, thus remained the chief targets of formal complaints throughout the six decades of witchcraft prosecution, though they shared this unfortunate distinction during the last period with old women (no men over 60 are known to have been formally accused). In contrast to both younger groups, the vulnerability of old women increased without interruption, as the percentage of complaints against aged women rose from 7 percent in the first period (1638-1659) to 41 percent in the 1680s and 90s.

The typical witch continued to be a middle-aged woman during most of the seventeenth century. But the sharp drop in complaints against middle-aged women in the final period of witchcraft prosecution, compared to the increasing number and proportion of formal accusations against old women over all three periods, suggests that the image of the witch in the minds of accusers and the authorities evolved considerably. The mental association of witchcraft with middle-aged females, so clear and dominant from the late 1630s through the 1670s, faded in the 1680s and 90s, making way for the classic image of the witch as an aged woman.
The next question is whether we can rule out chance as the explanation of the observed numbers and the striking trends they seem to represent. The chi-square statistics are highly significant. The total chi-square of 14.75 has a probability of .0052, much less than our chosen level of significance, or probability, of .05. With hand calculations, we find the same result by comparing the chi-square value with the critical value of chi-square at 4 degrees of freedom [(3 rows -1) (3 columns -1) = 4]. The chi-square of 14.75 is considerably larger than the critical value of 9.49. In sum, the statistical results show that the null hypothesis can be rejected and that the observed trends are statistically significant, i.e., the trends represent a real shift in the age patterns of people charged as witches.
A problem arises, however, because of another requirement of the chi-square test not mentioned before. Chi-square provides reliable results so long as the number of cases is not too small and the expected frequencies in each cell are not too small. For tables with more than 2 rows or 2 columns, a rule of thumb says that no more than 20 percent of the cells should have an expected frequency of less than 5, and that no cell should have an expected frequency less than 1. (See Appendix B.) The expected values for this table do not meet this requirement because 4 out of 9 cells have numbers less than 5. Consequently, the chi-square estimate of the probability value is unreliable, casting doubt on our argument that the observed trends are statistically and historically significant. In this kind of situation, we have essentially two choices. We can retain our conclusion while adding the necessary qualification that the conclusion rests on a flawed significance test. Alternatively, we can redesign the table to eliminate the problem of too many small expected numbers.
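The rule of thumb is easy to check mechanically. The following sketch applies it to the observed counts of Table 5.4; the function names and the default thresholds (5, 20 percent, and 1) are written out here as assumptions taken from the rule as stated in the text.

```python
def expected_counts(table):
    """Expected count = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_expected_check(table, threshold=5, max_share=0.20, floor=1):
    """Return (cells below threshold, total cells, rule satisfied?)."""
    flat = [e for row in expected_counts(table) for e in row]
    n_small = sum(1 for e in flat if e < threshold)
    ok = n_small / len(flat) <= max_share and min(flat) >= floor
    return n_small, len(flat), ok

# Observed counts from Table 5.4: rows are age groups, columns are periods.
table_5_4 = [[7, 1, 3], [19, 26, 7], [2, 5, 7]]
n_small, n_cells, ok = small_expected_check(table_5_4)
print(f"{n_small} of {n_cells} expected counts are below 5; rule satisfied: {ok}")
```

Running such a check before interpreting the probability value guards against relying on a test whose assumptions the table does not meet.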
In this instance, there is still a further consideration. Table 5.4 includes cases against both females and males. Since the vast majority of cases involved women, it is important to know whether the trends just discovered will hold for the cases against women after omitting those against males. A new table incorporates both concerns by 1) omitting the accused males, and 2) reducing the categories of the variable "Period" from three to two. Some information is lost because reducing the periods from three to two results in cruder chronological distinctions. What we gain is the increased reliability of both the test results and the conclusions drawn from them. This seems a worthwhile trade-off. (See Table 5.5)
Table 5.5 Age of Accused Witches by Period, 1638-1667 and 1668-1697 (Expected frequencies in parentheses)

| Age | 1638-1667 | 1668-1697 | Total |
|---|---|---|---|
| Under 40 | 6 (5.16) 17.14% | 3 (3.84) 11.54% | 9 14.75% |
| 40 to 59 | 26 (21.8) 74.29% | 12 (16.2) 46.15% | 38 62.3% |
| 60 and over | 3 (8.03) 8.57% | 11 (5.97) 42.31% | 14 22.95% |
| Total | 35 100.00% | 26 100.00% | 61 100.00% |

Source: Sample of witches.
DF: 2
Chi-square: 9.61 (Probability = .0082)
Critical value of chi-square: 5.99
The summary statistics are again highly significant, but this time we can interpret them without reservation because the expected values are well within the required range. Only one of the six cells has an expected frequency less than 5. The chi-square value (9.61) is larger than the critical value (5.99) and highly significant (p = .0082 is considerably less than .05). So rarely would a chi-square difference as large or larger than ours occur by chance that we can confidently dismiss the null hypothesis and conclude that the trends in this table reflect real historical changes.
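The figures for the redesigned table can be reproduced from its observed counts. This sketch is illustrative only; the variable names are assumptions, and the probability uses the fact that with exactly 2 degrees of freedom the upper-tail probability of chi-square reduces to `exp(-x / 2)`.

```python
import math

# Observed counts from Table 5.5: rows are age groups, columns are periods.
table_5_5 = [[6, 3], [26, 12], [3, 11]]

row_totals = [sum(row) for row in table_5_5]
col_totals = [sum(col) for col in zip(*table_5_5)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table_5_5, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(table_5_5) - 1) * (len(table_5_5[0]) - 1)   # (3-1)(2-1) = 2
p = math.exp(-chi2 / 2)                               # valid only for df = 2

# Count the cells whose expected frequency falls below 5.
n_small = sum(1 for row in expected for e in row if e < 5)

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}, "
      f"small expected cells = {n_small} of 6")
```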
These trends lack the interesting complexities detected earlier. Missing are the distinct fluctuations in the proportion of complaints brought against females and males under 40 years of age and against women 40 to 59. Nevertheless, the trends in Table 5.5 present a similar, if simplified, story of change.
Even though the typical witch continued to be a middle-aged woman, this central tendency weakened as a smaller number of women 40 to 59 and a larger number of older women became the objects of formal complaints. The image of the witch as a middle-aged female faded from the late 1660s on, giving way to the classic European stereotype: the witch as an aged woman. Women at this stage in their lives became more vulnerable to witchcraft allegations as the severity and scale of prosecution declined. As the number of formal complaints diminished, they tended to fall on the weaker members of the community. Compared to middle-aged wives, older widows and spinsters lacked the protection that husbands provided against accusations. With or without husbands, older women no longer performed the primary functions expected of adult women: bearing and raising children and providing for the domestic needs of husbands and families. For this reason, among others, they were more apt to be viewed as burdens than as contributing members of families and communities. Relatively defenseless and likely regarded as burdens, older women were increasingly exposed to charges of witchcraft.
If most New England witches were women of middle age or older, they were also, argues historian Carol Karlsen, women who came from families without male heirs. Under these circumstances, they stood to inherit, or did inherit, property from their fathers or their husbands. In a society where the orderly transmission of property from one generation to the next was one of the over-riding concerns of family and community life, a chief source of social tension and conflict was the real and potential property disputes between those who inherited and those who did not or who received less than they believed was their due. For a female to inherit a father's or husband's estate made matters worse because contemporaries expected estates to pass down to males. Inheriting or potentially inheriting females were atypical in New England society. Often the objects of male resentment, they were more frequently accused of witchcraft than women from families with male heirs. Some accusations came at or near the time of the inheritance; others before, or long after, the inheritance. As in the case of Susanna Martin, some accused witches were women who felt wrongfully disinherited and pressed their claims of entitlement in court. Once accused, according to Karlsen, these women were more likely to be tried, convicted, and executed than accused women with brothers or sons.
This generalization and the larger argument of which it is a part appear in her recent and provocative book that seeks to explain why certain women were so susceptible to being denounced and prosecuted as witches. [2] The generalization rests on her analysis of 158 accused females whom she identified as having or not having brothers or sons. This claim deserves serious consideration because the research behind it supersedes other studies of witchcraft in New England and in Europe. By drawing on the numbers in the book, we can test her generalization to see if the likelihood of being tried, convicted, and executed was significantly greater for women without brothers or sons. This test will return us to the confirmatory approach to data analysis. In general, confirmatory studies evaluate an existing argument or theory using fresh data that differ from those used to generate the theory in the first place. Our test will be in keeping, not with the letter, but with the spirit of this approach because we shall be using Karlsen's information, not a new collection of data.
This said, we can evaluate each part of her generalization separately. Table 5.6a cross-classifies the 158 identified accused females with or without brothers or sons by whether they were tried or not. Of the 96 accused women without brothers and sons, 43 percent were tried, compared to 37 percent of the 62 accused women who had brothers or sons. This pattern conforms to the generalization, but the difference of about 6 percentage points proves too small to be statistically significant. The probability of .483, much larger than the chosen significance level of .05, indicates that a difference of that size could easily happen by chance. (The same conclusion follows from noting that the chi-square value of .49 is considerably less than the critical value of 3.841.) The statistical evidence does not confirm the claim that, once accused, women without brothers and sons were more apt to be tried than women with brothers and sons.

Table 5.6a Prosecution of Accused Females With or Without Brothers and Sons, 1620-1725

| Action | With Brothers or Sons | Without Brothers or Sons | Total |
|---|---|---|---|
| Not Tried | 39 62.90% | 55 57.29% | 94 59.49% |
| Tried | 23 37.10% | 41 42.71% | 64 40.51% |
| Total | 62 100.00% | 96 100.00% | 158 100.00% |

Source: Karlsen, Devil, p. 102, table 11.
DF: 1
Chi-square: .49 (Probability = .4829)
Critical value of chi-square: 3.84
Although one part of the generalization fails the test, the claims concerning conviction and execution are upheld. As Table 5.6b shows, 26 percent of the inheriting or potentially inheriting women were convicted, compared to 13 percent of the accused women from families with male heirs. The difference between the higher conviction rate of the former and the lower rate of the latter is large enough to be statistically significant because the probability of chi-square, .0473, is less than the cutoff point of .05. (The alternate method used with hand calculations also yields a significant result because the chi-square value of 3.94 is larger than the critical value of chi-square, 3.841, the minimum needed to reject the null hypothesis.)

Table 5.6b Conviction of Accused Females With or Without Brothers and Sons, 1620-1725

| Action | With Brothers or Sons | Without Brothers or Sons | Total |
|---|---|---|---|
| Not Convicted | 54 87.10% | 71 73.96% | 125 79.11% |
| Convicted | 8 12.90% | 25 26.04% | 33 20.89% |
| Total | 62 100.00% | 96 100.00% | 158 100.00% |

Source: Karlsen, Devil, p. 102, table 11.
DF: 1
Chi-square: 3.94 (Probability = .0473)
Critical value of chi-square: 3.84
The evidence supporting the final claim about executions is considerably stronger. Nearly 18 percent of the accused females without brothers and sons were executed, as opposed to only 3 percent of the accused women with brothers or sons. The probability that a difference of about 14 percentage points in this table could happen by chance is extremely remote: p=.0063, or roughly 6 times out of 1000. The evidence for this claim is "highly significant": there are strong statistical reasons for concluding that the likelihood of being executed was greater for accused women from families without male heirs than for women from families with male heirs. In comparison, the evidence concerning the likelihood of being convicted is merely "significant" because the test statistics are very close to the minimum values needed to eliminate the chance argument.

Table 5.6c Execution of Accused Females With or Without Brothers and Sons, 1620-1725

| Action | With Brothers or Sons | Without Brothers or Sons | Total |
|---|---|---|---|
| Not Executed | 60 96.77% | 79 82.29% | 139 87.97% |
| Executed | 2 3.23% | 17 17.71% | 19 12.03% |
| Total | 62 100.00% | 96 100.00% | 158 100.00% |

Source: Karlsen, Devil, p. 102, table 11.
DF: 1
Chi-square: 7.47 (Probability = .0063)
Critical value of chi-square: 3.84
Two of the three points in Karlsen's generalization stand up to the scrutiny of the chi-square test. In a full evaluation of her argument, we would consider other aspects of her evidence and reasoning, along the lines suggested in chapter three where we looked into Monter's patriarchal interpretation. One issue to explore is whether a potential or possible inheritance was as likely as an actual inheritance to prompt an accusation as well as a prosecution. Another question is whether the numerical pattern might be explained by something other than resentment and disapproval of female inheritance and economic independence. Possibly, women from families without male heirs were more likely to be accused, convicted, and executed because they could not call on brothers or sons to shield them from allegations and prosecution. Women from families without male heirs were apt to have fewer influential defenders.
As for the more technical aspects of the tests, the different outcomes show how we can draw distinctions among degrees of statistical significance. The following table clarifies this point.
Table 5.7 Comparison of Test Results at Significance Level p = .05
| Test | Chi-square | Critical Value | Probability | Null Hypothesis and Statistical Significance | Generalization |
|---|---|---|---|---|---|
| Prosecution | .49 | 3.84 | .483 | accept, not significant | not supported |
| Conviction | 3.94 | 3.84 | .047 | reject, significant | supported |
| Execution | 7.47 | 3.84 | .0063 | reject, highly significant | strongly supported |
The purpose of confirmatory analysis is not only to evaluate existing arguments but also to revise them when possible. In revision, the specific generalization we have examined should reflect our findings: once accused, women without brothers or sons and women with brothers and sons were more or less equally apt to be subjected to the rigors of a trial. But the former were more likely than the latter to be convicted and executed.
At this point it will be useful to listen in again to discussions between Professor Tally and students as they explore the statistical and historical issues that come up in their investigations. Their discussions open with Karlsen's generalization.
Professor Tally: For the first part of today's class I asked you to think over the results of testing Karlsen's generalization and to bring your comments and questions up at the beginning of the period. So, fire away!
(Two long minutes of silence)
That was a complicated analysis. I'm surprised that there aren't any questions. Or comments. What do you make of the argument? Earlier we found that there was no relationship between gender and the likelihood of being executed. Now we have an argument that tries to explain why certain accused women were more apt than others to be hanged. Are we getting closer to a persuasive explanation of the pattern of accusations and executions? . . . . Do I detect an overdose of chi-square? Or is it statistical burnout? . . . . Melissa?
Melissa: I don't know about the others, but I've had an overdose of chi-square. I've gotten the calculations down, but the idea of statistical significance is really confusing.
Professor Tally: I'm sure you're not the only one struggling with the meaning of statistical significance. A good way to understand it is this: we say a finding is statistically significant when we can eliminate chance as a possible explanation for it. When chance can't be eliminated, we say that the finding is "not significant." I introduced some complexity last time by drawing distinctions between statistically "significant" test results and "highly significant" results. "Highly significant" was appropriate because the probability of a chi-square statistic--or the chi-square difference, as I've called it--as large or larger than the one we had was very low--.006. That was much lower than the probability of the second chi-square difference we had, which was .0473, just below the cutoff point of .05.
The first and most important thing to determine is whether a pattern is statistically significant or not significant. Making distinctions among degrees of significance is a way of differentiating the strength of the evidence in support of a proposed alternative hypothesis. However, I don't want anyone to get too concerned with these distinctions until you're comfortable with the basic meaning of statistical significance.
Sometimes confusion arises because "significance" can refer to statistics as well as to important historical events. Because of this we have to use the terms precisely. I've just described what statistical significance means. Historical significance, or substantive significance generally, is another matter. A statistically significant finding may have little or no substantive significance. What if a diligent researcher discovered a statistically significant relationship between the number of vowels in a person's last name and being accused of witchcraft? He might point with pride to this new finding. In a well-turned field like the history of New England witchcraft, it's not easy to dig up something new. Would this "new" finding have any historical significance? Not really.
It seems to me that there are three points to remember. First, statistical significance and substantive significance are different. Second, the researcher must explain why a statistically significant finding is historically important; it's not going to speak for itself, and it's rarely going to be obvious. Third, the researcher has to be alert to the historical importance of a statistically non-significant finding. As we saw in the example of gender and execution, the discovery that two variables are independent may have something interesting to tell us. It shouldn't just be dropped.
Jim: At first I thought that execution could be explained by gender, so I was impressed by Karlsen's argument. Even though one part of the generalization is wrong, she seems to be on the right track.
Elizabeth: After thinking about it a while, it seemed so obvious that women who did or could inherit property would be chief targets of male resentment that I wondered why no one thought of this before. Or is Karlsen's idea an old one?
Professor Tally: Previous studies of European witchcraft have touched on the connection between property disputes and witchcraft accusations, but Karlsen's book is the first to my knowledge that offers extensive quantitative evidence to support and refine the idea. She also points out, remember, that the study of particular trials also provides important qualitative evidence on the circumstances and nature of property disputes, evidence that resists quantification. In this respect, her book--like John Demos' Entertaining Satan and Boyer and Nissenbaum's Salem Possessed--is a good example of the blending of quantitative and traditional forms of history. Neither form is as persuasive on its own as they can be together. The quantitative findings, for example, furnish the basis for choosing revealing examples like Susanna Martin. Once the overall pattern is known, the historian can select examples that are representative of the sample and not just the first or most striking example that catches his or her eye.
Don't forget that Karlsen's data consist of all the accused women whom she could identify as having or not having male heirs in their families. This includes women accused during the Salem outbreak, whom we've omitted in our previous work. Just imagine how long it took to collect all the information, the months spent reading through family records, wills, trials, and so forth. Her research is impressive.
Ed: I've got a more general question. Now that we're familiar with hypothesis testing and the chi-square test, is it really necessary to write down the null and alternative hypotheses? Do we have to include them in our paper for next week?
Professor Tally: The answer is "yes," and let me explain why. It's essential to know what question you're examining, and writing down the null and alternative hypotheses forces you to state explicitly and clearly what you're investigating. This will help you guard against misinterpreting your research question and your test results.
Let me give you an example. One of the questions we examined last time was whether, once accused, women without brothers or sons were more likely to be convicted than women with brothers and sons. The table we used is the first one on the board.
Table 5.6b Conviction of Accused Females With or Without Brothers and Sons, 1620-1725
| Action | With Brothers or Sons | Without Brothers or Sons | Total |
|---|---|---|---|
| Not Convicted | 54 (87.10%) | 71 (73.96%) | 125 (79.11%) |
| Convicted | 8 (12.90%) | 25 (26.04%) | 33 (20.89%) |
| Total | 62 (100.00%) | 96 (100.00%) | 158 (100.00%) |

Source: Karlsen, Devil, p. 102, table 11.

| DF | Chi-square | Probability | Critical value of chi-square |
|---|---|---|---|
| 1 | 3.94 | .0473 | 3.84 |
Now, what if the same question were modified so it began like this: once accused and tried, were women etc. What table do we need for this question? Look at the data sheet I handed out last time, and think about that while I put the correct table on the board.
Table 5.6d Conviction of Accused Females With or Without Brothers and Sons Who Were Tried, 1620-1725
| Action | With Brothers or Sons | Without Brothers or Sons | Total |
|---|---|---|---|
| Tried, not convicted | 15 | 16 | 31 |
| Tried, convicted | 8 | 25 | 33 |
| Total | 23 | 41 | 64 |

Source: Karlsen, Devil, p. 102, table 11.
Here we're considering only the 64 women who were tried, not all 158 women who were accused. The question and the table have to correspond.
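A small Python sketch may make the correspondence between question and table concrete. It builds both conviction tables from the same counts, changing only the base group (all accused versus those tried). The dictionary layout and the helper name `conviction_table` are hypothetical conveniences, not part of the class materials.

```python
# Counts by group, taken from Tables 5.6a, 5.6b, and 5.6d.
accused = {"with": 62, "without": 96}    # all accused women
tried = {"with": 23, "without": 41}      # those who went to trial
convicted = {"with": 8, "without": 25}   # those convicted

def conviction_table(base):
    """Per group: [not convicted, convicted], counted relative to `base`."""
    return {group: [base[group] - convicted[group], convicted[group]]
            for group in base}

among_accused = conviction_table(accused)  # question: once accused, ...
among_tried = conviction_table(tried)      # question: once accused and tried, ...

for label, table in [("accused", among_accused), ("tried", among_tried)]:
    n = sum(sum(cells) for cells in table.values())
    rates = {group: f"{cells[1] / sum(cells):.1%}" for group, cells in table.items()}
    print(f"base = {label:<8} N = {n:>3}  conviction rates: {rates}")
```

The convicted counts are identical in both tables; only the denominators, and therefore the rates and the test, change with the question.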
Think back to our first discussion of the gender and execution example. My hunch is that some initial confusion there stemmed from thinking that the question we were examining included all accused witches and not just those formally charged. So, yes, Ed, I do want you to write down the hypotheses and include them in your next paper. After you've had more practice framing questions and hypotheses, I won't keep requiring these formalities in your papers.
Okay. Let's move on to the major topic for today's discussion. Your assignment was to carry out hypothesis tests on two tables. These tables take up the question of whether execution rates varied significantly by the colony in which the accused lived. Below each table is a sample interpretation that you were supposed to comment on. Are the interpretations sufficient? satisfactorily developed? persuasive? If not, how could you improve them?
Let's take a look at them now and see what you've come up with.
Table A (5.8a) Execution of Accused Witches by Colony of Residence, New England, 1638-1697, Salem outbreak of 1692 excluded
| Action | Massachusetts | Connecticut | Total |
|---|---|---|---|
| Not executed | 44 (89.80%) | 31 (73.81%) | 75 (82.42%) |
| Executed | 5 (10.20%) | 11 (26.19%) | 16 (17.58%) |
| Total | 49 (100.00%) | 42 (100.00%) | 91 (100.00%) |

Source: Sample of witches

Interpretation: "Table A shows that the pattern did not occur by chance and that it can be attributed to a relationship between 'Colony' and 'Execution.'"
Table B (5.8b) Execution of Accused Witches by Colony of Residence, New England, 1638-1697, Salem outbreak of 1692 included
| Action | Massachusetts | Connecticut | Total |
|---|---|---|---|
| Not executed | 176 (88.44%) | 31 (73.81%) | 207 (85.89%) |
| Executed | 23 (11.56%) | 11 (26.19%) | 34 (14.11%) |
| Total | 199 (100.00%) | 42 (100.00%) | 241 (100.00%) |

Source: Sample of witches; Lyle Koehler, A Search for Power: The "Weaker Sex" in Seventeenth-Century New England (Urbana, Ill., 1980), pp. 480-490; Richard Weisman, Witchcraft, Magic, and Religion in 17th-Century Massachusetts (Amherst, 1984), pp. 209-216, Appendix C.

Interpretation: "The pattern in Table B is even more significant because the chi-square value is much larger than the critical value of 3.841. In these data, there was definitely a relationship between 'Colony' and 'Execution.'"
Karen: According to my figures, the null hypothesis can be rejected in both cases. So we can conclude that there was a relationship between "Colony" and "Execution." I think the interpretations are okay, but both could be improved. Like, the wording is pretty vague, and the conclusion doesn't have enough details.
Professor Tally: Good ideas. Can you be more specific? How would you state the conclusion?
Karen: I would say something like "a larger percentage of the accused witches were executed in Connecticut than in Massachusetts." That's important, but neither example mentions it.
Carrie: I agree. If you don't mention that the accused witches were more likely to be executed in Connecticut, you leave out the main point.
John: But all the reader has to do is look at the table to see that. I don't see why it's necessary to repeat what's pretty obvious in the table. Repetitious writing is a chore to read, and a drag to write, too.
Mark: Being clear about things doesn't have to be boring. Adding the sentence Karen and Carrie suggested makes the interpretation clearer and doesn't repeat anything. I remember misreading the gender and execution table. I don't think you can always count on a person reading the table correctly. The numbers won't speak for themselves; you've got to explain them to people.
Jim: Yeah, John. They're right. You're "outnumbered."
Professor Tally: Anyone want to add something? . . . . No? Okay. John, you're right in one sense. It's a real challenge to write interestingly about tables and numbers without being repetitious. Still, the others are making a point I want to stress, too. Including a clear, verbal description of the pattern in the table is the author's responsibility because it's his or her task to guide the reader through the analysis, pointing out what to notice and how to understand it.
Let me stress a related point, too. Now that you know the meaning of the summary statistics, chi-square and the probability value, you may end up focusing on them at the expense of other parts of the analysis. Yes, the summary statistics can tell you at a glance whether there's evidence for a significant relationship in the table. But these statistics will not tell you how two variables are related, or how they vary together. To know that, you have to study the pattern in the table. So be sure in your papers to include a clear description of the pattern in any table you present.
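As a rough illustration of describing the pattern rather than only the test result, the following Python sketch computes the execution rate in each colony from the Table A counts and drafts a one-sentence description. The dictionary layout and the wording of the generated sentence are hypothetical choices made for this example.

```python
# Counts from Table A (5.8a), Salem outbreak excluded.
table_a = {
    "Massachusetts": {"not executed": 44, "executed": 5},
    "Connecticut": {"not executed": 31, "executed": 11},
}

# Execution rate within each colony (column percentages).
rates = {
    colony: cells["executed"] / sum(cells.values())
    for colony, cells in table_a.items()
}

higher = max(rates, key=rates.get)
lower = min(rates, key=rates.get)

# A verbal description of the direction and size of the difference.
summary = (
    f"{rates[higher]:.1%} of the accused in {higher} were executed, "
    f"compared with {rates[lower]:.1%} in {lower}."
)
print(summary)
```

The chi-square statistic says whether the difference is likely to be due to chance; a sentence like this one says which way the difference runs.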
Before we go further, let's post the summary statistics for these two tables. Susan and Steve, will you put them on the board? You can collaborate, or each of you can take one of the tables.
Summary Statistics for Table A (5.8a) Execution of Accused Witches by Colony of Residence, New England, 1638-1697, Salem outbreak of 1692 excluded
| DF | Chi-square | Probability | Critical value of chi-square | Number of cases |
|---|---|---|---|---|
| 1 | 3.988 | .0458 | 3.841 | 91 |

Summary Statistics for Table B (5.8b) Execution of Accused Witches by Colony of Residence, New England, 1638-1697, Salem outbreak of 1692 included
| DF | Chi-square | Probability | Critical value of chi-square | Number of cases |
|---|---|---|---|---|
| 1 | 6.128 | .0133 | 3.841 | 241 |
Meanwhile, any other ideas for revising the flawed interpretations on your assignment sheet? The ones under the tables.
Margaret: Well, we've learned a lot about defining our samples, so I think it would be important to describe the samples. In Table A, the sample omits the Salem cases, but in Table B, they're included. Even though the conclusions seem to be the same, I would mention that the addition of the Salem cases didn't make much difference. The execution rate in Connecticut was significantly higher.
Steve: Yeah, I was surprised that the results were pretty much the same. I thought Salem would change things, being an exception and all.
Professor Tally: John?
John: There isn't much difference in the table percentages, but there are some large differences in the summary statistics. The chi-square and probability value are only "significant" in Table A, but they're "highly significant" in Table B. Now that I've had to re-think this interpretation business, I'd say these differences really should be noted.
Professor Tally (smiling): Bravo! I'm glad you've seen the light. That's a good point. Can anyone explain why these differences occurred?
Elizabeth (grinning): It's got to have something to do with the numbers.
Professor Tally: Yes, Oracle. But which numbers? Anyone? Here's a hint: the reason for the larger chi-square statistic in Table B doesn't have much to do with the distribution of cases in the cells. Look at the tables. The cell percentages differ only for Massachusetts, and then only very, very little.
John: There are more cases in Table B. Does it have something to do with that?
Professor Tally: You're on a roll, John. Yes, it does. Can you take it further?. . . . Anyone?
Carrie: Well, maybe when you put larger numbers into the chi-square formula--which now we all know by heart--you'll get bigger differences between the observed and expected numbers. So the chi-square statistic will tend to be larger.
Professor Tally: Excellent! I'm vindicated for having made you do all those hand calculations of chi-square. Carrie's got the basic idea. The chi-square "difference" is sensitive to changes in sample size. In fact, as long as the proportion of cases in each cell stays the same, it is directly proportional to the size of the sample. That means that doubling the cell counts in a table will double the computed chi-square value, even though the number of rows and columns of the table does not change.
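The proportionality can be seen from the formula itself. The derivation below is a sketch in notation assumed for this illustration: O_i and E_i are the observed and expected counts in cell i, and k is the factor by which every cell count is multiplied. Because each expected count is a row total times a column total divided by the sample size, multiplying every cell by k multiplies every expected count by k as well.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
\qquad\Longrightarrow\qquad
\chi^2_{(k)} = \sum_{i} \frac{(kO_i - kE_i)^2}{kE_i}
             = \sum_{i} \frac{k^2 (O_i - E_i)^2}{kE_i}
             = k \sum_{i} \frac{(O_i - E_i)^2}{E_i}
             = k\,\chi^2
```

With k = 2, the statistic doubles while the cell percentages are unchanged.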
Let's see what happens if we double the size of the cell counts in Table A. In the first cell we have 44 cases; 2 times 44 is 88. And so forth. I'll put the table and the chi-square statistics on the board:
Table A Revised (5.8c): Cell Counts Doubled
| Action | Massachusetts | Connecticut | Total |
|---|---|---|---|
| Not executed | 88 (89.80%) | 62 (73.81%) | 150 (82.42%) |
| Executed | 10 (10.20%) | 22 (26.19%) | 32 (17.58%) |
| Total | 98 (100.00%) | 84 (100.00%) | 182 (100.00%) |

| Results | DF | Chi-square | Probability | Critical value of chi-square | Sample size |
|---|---|---|---|---|---|
| Doubled cell counts | 1 | 7.98 | .0047 | 3.841 | 182 |
| Actual results for Table A (5.8a) | 1 | 3.99 | .0458 | 3.841 | 91 |
Okay. So here's the result of doubling the cell counts. The total number of cases has also doubled, but notice that the percentages have stayed the same: the proportion of cases in each cell has not changed. But look at the test results. Doubling the cell counts and sample size has doubled the chi-square and that larger chi-square has a much smaller probability. So where the test results are merely "significant" in Table A, they are "highly significant" in the table with doubled cell counts, even though the proportional relationships in the tables have remained the same.
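The doubling experiment is easy to repeat by computer. The Python sketch below recomputes chi-square for Table A and for the same table with every cell doubled; the function name `chi_square` is a hypothetical helper written for this illustration.

```python
def chi_square(table):
    """Pearson chi-square for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Table A (5.8a): rows are not executed / executed,
# columns are Massachusetts / Connecticut.
table_a = [[44, 31], [5, 11]]

# Double every cell; the percentages in each column stay the same.
doubled = [[2 * cell for cell in row] for row in table_a]

original_stat = chi_square(table_a)
doubled_stat = chi_square(doubled)

print(f"original: N = {sum(map(sum, table_a))}, chi-square = {original_stat:.2f}")
print(f"doubled:  N = {sum(map(sum, doubled))}, chi-square = {doubled_stat:.2f}")
print(f"ratio = {doubled_stat / original_stat:.2f}")
```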
Now that's the hypothetical case, and in practice you're not going to be dealing with two tables where one has twice as many cases in each cell as the other. So, let's get back to our real data. In the current example, the addition of 150 Salem cases increases the sample size from 91 to 241, which is a little more than two and a half times larger. These 150 cases are all distributed in the Massachusetts column, but they increase the Massachusetts execution rate by barely more than 1 percentage point. That small increase contributes a bit to the overall increase in the chi-square difference, but most of the overall increase simply comes from enlarging the sample size. For that reason, it wouldn't be wise to make too much out of this change and the corresponding decrease in the probability value from .0458 in Table A to .0133 in Table B. We can note that the addition of Salem raised the execution rate very slightly. However, the main point in both tables is that the Connecticut rate was significantly higher than the Massachusetts rate, with or without Salem.
Can you think of any implications of chi-square being proportional to sample size? Well, what might happen if you had a very large sample and a small chi-square difference?
Carrie: Probably the small difference would be statistically significant because the sample is so large.
Professor Tally: Right. And if the sample were small and the chi-square difference remained as small as before? Margaret?
Margaret: I guess the small difference would probably be not significant.
Professor Tally: Good. In a small sample, you need a relatively large chi-square difference to get a statistically significant result; and in a large sample, a relatively small chi-square difference will tend to be significant. You have to be very careful with large samples because small chi-square differences will be statistically significant at just about any chosen level of probability. To make matters worse, these small but statistically significant chi-square differences may have little if any substantive meaning. One way to compensate for this problem is to choose a more stringent level of probability when large samples are involved. This means choosing .01, or .001 instead of .05 when we're working with a large sample. And yet with rather large samples, small differences may prove to be significant at .001 or less, so we have to be on guard no matter what level of probability is chosen.
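The warning about large samples can be illustrated with invented numbers. In the Python sketch below, the counts are hypothetical and have nothing to do with the witchcraft data: two equal groups differ by 48 percent versus 52 percent, and the same proportions are evaluated at three sample sizes. The helper names are also hypothetical, and the probability shortcut applies only to one degree of freedom.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(stat):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical base table: outcome absent (26, 24), outcome present (24, 26),
# i.e. 48% versus 52% in two groups of 50.
base = (26, 24, 24, 26)

results = []
for scale in (1, 10, 100):
    cells = tuple(scale * count for count in base)
    stat = chi_square_2x2(*cells)
    p = p_value_df1(stat)
    results.append((sum(cells), stat, p))
    print(f"N = {sum(cells):>6}  chi-square = {stat:6.2f}  p = {p:.5f}")
```

The four-point difference is nowhere near significant with 100 or 1,000 cases, yet it passes even the .001 level with 10,000, which is why the substantive meaning of a small difference has to be argued separately from its statistical significance.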
Elizabeth: How does all this affect our ability to compare degrees of significance?
Professor Tally: If the samples we're comparing are roughly the same size, we don't have to worry. They were exactly the same size in the examples we worked through last time. Our current examples of Table A and Table B show that we should discount what appears to be the more highly significant result in Table B because the result is due mainly to the increase in sample size. If we have two significant results, A and B, and the more highly significant one, B, comes from a smaller sample, then we have a good reason to describe result B as more highly significant, because the higher significance in this case can't be attributed to a large sample size. Is that clear?
Alright, the final item on today's agenda is really preparation for your assignment. I'm going to give you a table and the following problem to tackle. The problem comes out of our reading about Salem Village in Boyer and Nissenbaum's Salem Possessed. [3] The data pertain to the heads of household in the Village; not surprisingly, all were men, except for five widows who were serving as household head after their husbands' deaths. The information comes from records of the village meetings and a list of the accusers and accused in the village. The problem is whether a relationship existed between wealth and support for the witch trials. Supporters of the trials were residents who made accusations; opponents included the accused, residents who defended the accused against the charges, and three residents who defended others and were accused themselves. [4] If there were a relationship, what might be the connection between wealth and support for or opposition to the trials? Let's take a little time now just listing the possibilities. I asked you to think about this before class, so you must have some thoughts. Susan?
Susan: At first it seemed to me that one possibility was a poor-versus-rich scenario. The poor might have been trying to use witchcraft accusations to strike back at wealthy enemies they couldn't attack openly. But then I remembered that most of the accused in New England and Europe were sort of poor, so I decided that it might have been the wealthy who supported the trials as a way of controlling the poor--I mean the poor who were troublemakers.
Professor Tally: Good. Other ideas?
Elizabeth: My hunch is that the middle class was trying to strike at the wealthy people like the Porters and their allies. I'd include the Putnams in the middle class because they were declining economically. Salem was a massive outbreak, and in those circumstances accusations usually moved up the social ladder. When that happened, the wealthy, powerful people tried to stop the trials. So I think it's plausible that the middle groups, who resented the growing wealth and power of people like the Porters, would support the trials as a way of getting back at their enemies.
Steve: I'd agree with that, but I'd add the low economic group to the middle-class supporters. I remember Boyer and Nissenbaum saying that the village faction that supported the minister, Samuel Parris, was a sort of coalition of middling and low economic groups. So I think it was the same coalition that supported the trials, against the opposition of the wealthy.
Professor Tally: Any other hunches?
Melissa: There's one possibility no one has mentioned. Maybe the wealthy and middle group joined forces against the poor. That's what usually happened because elites controlled the courts and the prosecution of witchcraft. But after reading Salem Possessed, I think Steve and Elizabeth's ideas fit better with the history of Salem Village.
Professor Tally: Okay, we've got a number of plausible alternatives. In each one, we seem to be saying that witchcraft accusations were a means of wielding power against a group of perceived antagonists. An accusation is an attempt to punish or control an individual or group by bringing down on them the weight of community and state sanctions. The question is who was attempting to wield power against whom? I'll list the possibilities you've mentioned so far on the board.
| Interpretation | Favored the Trials | Opposed the Trials |
|---|---|---|
| 1 | poor | rich |
| 2 | wealthy | poor |
| 3 | "middle class" | wealthy and allies |
| 4 | "middle class" and the poor | wealthy and allies |
| 5 | wealthy and "middle class" | poor |
Now, which one do you prefer? Let's take a quick poll. Hands up if you prefer the first interpretation. . . . . No one? No romantics . . . no radicals. . . no defenders of the oppressed? Okay. Number 2? . . . 2. Number 3?. . . . 5. Four? . . . . 12. Five? . . . . 2. A clear majority prefers number 4. But each of them has merit. Number 5, for instance, is quite plausible. Does anyone agree?
Ed: That seems to fit with a lot we've learned. The wealthy and middle group made up the elite, and the elite controlled the courts and the trials. The majority of the accused were relatively poor, so the purpose of the trials was to keep the poor in line or to get rid of people who were economic burdens, like widows and spinsters.
Professor Tally: Other thoughts?
Elizabeth: I think that Monter's argument fits here just as well. The purpose of the accusations and trials was to reinforce patriarchal control by punishing women who were considered too aggressive and too independent. They were pretty defenseless because they were rather poor, and many were widows or spinsters. The wealthy and middle-class men who controlled the courts made examples out of some women to warn other women to keep in line.
Professor Tally: We're nearly out of time, so I'll try to sum things up. A case can be made for each of these interpretations. It's fair to say, however, that the third and fourth interpretations are better at capturing the particular circumstances of Salem Village as Boyer and Nissenbaum describe them. The idea that the residents from the middle and low economic groups joined forces against the wealthy is in some respects consistent with their interpretation. In their view, village factionalism and the witch trials were part of the larger struggle between the Puritan farmers of rural Salem Village and the commercially-minded residents of Salem Town, plus some villagers who were oriented more toward the town and commerce than toward the village and agriculture. Chief among the latter were the Porters. This family included the wealthiest men in the village, Israel Porter and his younger brother Joseph. These men and their allies formed one of the two village factions. The Putnam family headed the other faction that represented the interests of the village farmers. Elizabeth, you suggested putting the Putnams in the middle economic group because their preeminence in the village had deteriorated. Although the Porters were wealthier in the 1690s, the Putnams' economic decline had not gone that far. Even though the individual wealth of the various Putnams diminished as the family lands were divided among successive generations, the majority of the Putnam men in 1692 were still well off and counted among the wealthy and most influential members of the village.
Apart from these reservations, interpretations three and four clearly deserve our consideration, and so does each of the others. I want you to work out for yourself an interpretation that seems to fit best with the particular history of Salem Village. Let me add that I can think of several other plausible interpretations that haven't been mentioned today, so don't feel limited to only the five on the board.
For next time, study the assignment sheet I'm now passing out. Then do a hypothesis test to decide whether your preferred hypothesis is supported by the data, and prepare to discuss your results in class as usual.
Table 5.9 Salem Village Residents: Wealth and Support for the Witch Trials of 1692
| Tax Bracket, 1690 | Favored | Opposed | Total |
|---|---|---|---|
| More than 20 shillings | 8 (57.14%) | 6 (42.86%) | 14 (100.00%) |
| 10 to 20 shillings | 9 (60.00%) | 6 (40.00%) | 15 (100.00%) |
| Less than 10 shillings | 11 (57.89%) | 8 (42.11%) | 19 (100.00%) |
| Total | 28 (58.33%) | 20 (41.67%) | 48 (100.00%) |
Source: Rate in support of the ministry, July 1689-July 1690, July 1694-July 1695, July 1695-July 1696; "List of Accused Witches Who Lived in and Around Salem Village;" Selected List of Accusers and Persons Against Whom They Testified;" List of Defenders Connected with Salem Village;" all in Paul Boyer and Stephen Nissenbaum, eds., Salem-Village Witchcraft: A Documentary Record of Local Conflict in Colonial New England (Belmont, Calif., 1972) pp. 353-54, 375, 379-82.
Note: Of the 137 heads of household residing in the village between 1690 and 1695, we know that 66 publicly declared their support or opposition to the accusations and trials: there were 30 accusers, 3 accused, 19 defenders, 1 accuser who was also accused, 10 defenders who were also accusers, and 3 defenders who were themselves accused. The table excludes 11 ambiguous cases involving heads of households who supported some trials but opposed others, as well as the 71 household heads who apparently took no public role in the witch-hunt. Also omitted are 2 supporters and 5 opponents whose wealth cannot be estimated because they did not appear on any of the tax lists for 1690, 1695, and 1695. In the cases of 5 supporters and 1 opponent, missing tax information in 1690 is estimated using the mean assessment for 1695 and 1696. Adding these 6 cases has a negligible effect on the chi-square statistics.
The tax assessed for the maintenance of the village church and ministry is an indicator of landed wealth, for the rates were based on the amount of improved and unimproved land a resident owned within the village boundaries. The tax assessments underestimate total wealth when a person had land holdings outside the village or non-landed wealth, or both. The most problematic case is Israel Porter, the leader of the Porter family. Most of his lands lay outside the village, and his ownership of two of the four sawmills in the village represented profitable commercial assets that were not accurately reflected by his village tax assessments. He was the wealthiest man in the village, but his tax assessment of 1690 was only 5 shillings. Porter proclaimed his opposition to the trials in a moving defense of Rebecca Nurse and should be counted among the wealthy opponents, despite his appearing in this table among the residents assessed less than 10 shillings who opposed the trials. Although further research might turn up other notable exceptions, this exercise assumes that the assessments give reasonably accurate estimates of residents' economic position. The assumption holds for Joseph Porter, Israel's brother, who had the highest assessment in 1690 (50 shillings), and for Daniel Andrew (assessed 36 shillings in 1690), who had married a Porter heiress. It also holds for the 11 Putnam men residing in the village at the time of the trials, because their lands were almost entirely within the village boundaries. Andrew was accused as a witch in 1692, but escaped the gallows by going into hiding and using his social and political connections to forestall prosecution. For further details, see Boyer and Nissenbaum, Salem Possessed, especially chapter five.
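As a hedged illustration, the chi-square test assigned for Table 5.9 can be sketched in a few lines of Python. The sketch assumes the counts printed in the table. The function name `chi_square` and the bracket labels are illustrative choices, not part of the original exercise, and the closed-form probability used at the end holds only for 2 degrees of freedom.

```python
import math

# Observed counts from Table 5.9 as (favored, opposed) per 1690 tax bracket.
# The bracket labels are illustrative shorthand, not the table's own wording.
observed = {
    "highest bracket": (8, 6),
    "middle bracket": (9, 6),
    "lowest bracket": (11, 8),
}


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)

    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count if wealth and support were unrelated.
            expected = row_total * col_total / n
            stat += (obs - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square(observed)

# Assumption: this shortcut is valid only when df == 2, where the
# upper-tail chi-square probability reduces to exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

Run on the counts as printed, the statistic falls far below the critical value of 5.991 and the probability is close to 1, so the null hypothesis of no relationship cannot be rejected.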
(The Next Class)
Professor Tally: How did your tests come out? . . . That bad? Where's the old sparkle? Or is it the rain today? . . . Amy?
Amy: I did the calculations several times and got the same results. The chi-square is .03, but the critical value for 2 degrees of freedom at the .05 level is 5.991, so the null hypothesis can't be rejected. There's no significant relationship between wealth and support for the trials.
Professor Tally: What was your alternative hypothesis?
Amy: I thought that the wealthy and middle-class groups supported the trials against the poorer people of the village. But I guess that's not right.
Professor Tally: Well, what conclusions can you draw from your test results?
Amy: Not much, except that wealth and support for the trials were unrelated.
Professor Tally: Anyone have something to add to Amy's summary statistics or her conclusion?
Karen: I did the chi-square by hand and by computer. My results are the same as Amy's except that the computer gave me the probability for the chi-square. It's .987, which means that a pattern like this could happen by chance