Response to Criticisms on Stylometry
Part 8 of "Critically Examining Oxfordian Claims"
Pat Dooley has done me the favor of responding to my posts on Ward Elliot's
and Don Foster's studies. Let me add a few words in reply, though they
certainly don't exhaust what I could say.
DOOLEY:
"Ward Elliot's work has a few problems that render
its value in discerning authorship less than
conclusive. His most infamous test was based on
punctuation. The problem was that he used the
punctuation from a modern edition of Shakespeare's
works (the Riverside, if I recall correctly) that
had been re-punctuated by a modern editor.
"One can hardly compare the frequency of
exclamation points between poetry written in the
1570s/1580s before the exclamation point had come
into the language with poetry written after it had
come into vogue, yet that is also what Ward
Elliot's punctuation test did."
The exclamation-point test was used by Elliot in provisional early versions
of the study, but when critics (mostly Oxfordian) pointed out how dependent
exclamation points are on editors, Elliot fully acknowledged that the test
was questionable. Certainly no conclusions were based on this one
test.
DOOLEY:
"Ward Elliot's hyphenation tests are also suspect.
As Peter Moore pointed out in an SOS newsletter
article, Ward Elliot's tests showed a significant
difference in the hyphenation rate between V&A and
Lucrece on the one hand and the Sonnets and a
Lovers Complaint on the other hand. It helps to
know that Richard Field printed the former while
George Eld printed the latter. In other words,
much of what Ward Elliot was measuring was the
difference between two printers and had nothing to
do with authorship."
I have not seen this particular article by Moore, so I don't know what
"significant" means here. But Shakespeare used hyphenated compound words
at roughly six to eight times the rate Oxford did (virtually all other
contemporary writers are similar to Oxford in this regard), and I'm certain
that no printer can be blamed for such a marked and consistent difference.
DOOLEY:
"Another problem has to do with the size and
reliability of the sample of the Earl of Oxford's
poetry. By my count, there are about 4000 words
available in the poetry traditionally attributed
to the Earl. That doesn't give you very many
blocks of 500 words to work with. If you start
looking at individual words, things get worse. For
example, from that 4000 words you have just 10
instances of "you" and 12 instances of "thou."
That works out at about 1 or 2 words per block.
So, even with relatively common words, you end up
with very small numbers to work with. That alone
renders the statistics dubious, since the numbers
violate the "law of large numbers." I don't have
the list of keywords that Ward Elliot used, but I
would be surprised if many of them give counts
that can be used with statistical certainty. The
sample size is just too small for statistically
significant conclusions to be obtained."
My short answer here is that Robert Valenza, the co-author of the study, is
a professional statistician. I defer to his judgement as to what is
"statistically significant," and he finds the differences between
Shakespeare and Oxford very significant. All the math can be found in the
articles I cited. The list of keywords is in the Computers and the
Humanities article.
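The per-block arithmetic in Dooley's objection is easy to make concrete. The sketch below uses only his quoted figures (about 4000 words, 500-word blocks, 10 instances of "you" and 12 of "thou") and assumes, purely as a simplification, that each word's occurrences scatter through the text like a Poisson process. The function name per_block_stats is hypothetical. It shows only that a single rare word is noisy at this scale; it says nothing about the significance of tests that pool many words at once.

```python
import math

# Figures quoted in Dooley's post: about 4000 words of Oxford's poetry,
# 500-word blocks, 10 instances of "you" and 12 of "thou".
TOTAL_WORDS = 4000
BLOCK_SIZE = 500
COUNTS = {"you": 10, "thou": 12}


def per_block_stats(count, total_words, block_size):
    """Return (number of blocks, mean count per block, standard deviation,
    standard deviation as a fraction of the mean).

    Assumption (a simplification, not anyone's published method): the word's
    occurrences behave like a Poisson variable, so variance equals the mean.
    """
    n_blocks = total_words // block_size
    mean = count / n_blocks
    sd = math.sqrt(mean)  # Poisson: variance == mean
    return n_blocks, mean, sd, sd / mean


if __name__ == "__main__":
    for word, count in COUNTS.items():
        n, mean, sd, rel = per_block_stats(count, TOTAL_WORDS, BLOCK_SIZE)
        print(f"{word!r}: {n} blocks, mean {mean:.2f} per block, "
              f"sd {sd:.2f} ({rel:.0%} of the mean)")
```

Run as a script, this prints one line per word; for "you" the expected rate is 1.25 per block with a standard deviation of about 1.12, so the block-to-block noise is nearly as large as the signal.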
DOOLEY:
"The choice of words greatly influences the
statistics you get out. If you wanted to get a
match between Shakespeare and another author, all
you have to do is select the right words. For
example, if you compare the relative frequencies
of "my" and "mine" you can get a very good match
between Sidney's Astrophil and Stella and
Shakespeare's sonnets. If you choose "you" and
"thou," you get a very poor match; the author of
the sonnets favoured "thou" over "you" by a 2:1
margin whereas Sidney used them about equally.
I've got a spreadsheet where you plug in the words
you want to test and out pops the counts for about
a dozen authors. It is very easy to go through a
concordance and pick word-pairs that suit whatever
point you are trying to make."
Well, of course you can find individual words which match or don't match
for different authors; the point of the Elliot/Valenza study was to look at
the patterns for a large number of words simultaneously, and to determine
how statistically significant the similarities and differences are. If you
want to try to construct a test comparable in range to Elliot and
Valenza's, but in which Shakespeare matches Oxford to the exclusion of
other authors, be my guest. I'd be interested to see if such a thing is
possible.
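To illustrate the general difference between judging one hand-picked word pair and judging a whole profile of words at once, here is a minimal sketch using a Pearson chi-square statistic over a two-row table of keyword counts. This is not Elliot and Valenza's procedure (their actual tests are described in the articles cited); the statistic is a stand-in, and the function name chi_square_profile and the two sets of counts are hypothetical toy numbers, not data from any real author.

```python
def chi_square_profile(counts_a, counts_b, keywords):
    """Pearson chi-square statistic for the 2 x k table of keyword counts.

    Larger values mean the two texts use the chosen keywords in less
    similar proportions; 0.0 means the proportions are identical.
    """
    row_a = [counts_a.get(w, 0) for w in keywords]
    row_b = [counts_b.get(w, 0) for w in keywords]
    total_a, total_b = sum(row_a), sum(row_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each text needs at least one keyword occurrence")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # keyword absent from both texts contributes nothing
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy counts, invented only to illustrate the idea.
TEXT_A = {"my": 40, "mine": 10, "you": 20, "thou": 40}
TEXT_B = {"my": 40, "mine": 10, "you": 30, "thou": 30}

if __name__ == "__main__":
    print(chi_square_profile(TEXT_A, TEXT_B, ["my", "mine"]))
    print(chi_square_profile(TEXT_A, TEXT_B, ["my", "mine", "you", "thou"]))
```

With these toy numbers the "my"/"mine" pair alone gives a perfect match (0.0), while the four-word profile gives about 3.43. A real test would compare the statistic against a chi-square distribution with k − 1 degrees of freedom, and the approximation is unreliable when expected counts are small, which is the same caution Dooley raises about tiny samples.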
DOOLEY:
"Yet another problem is that, according to Stephen
May in English Courtier Poets, much of the poetry
attributed to Oxford may be mis-attributed. If he
is correct, the pitifully small sample that Ward
Elliot started with is further reduced."
Ward Elliot used Stephen May's 1981 edition of Oxford's poetry, from which
the misattributed poems had already been removed.
DOOLEY:
"PS. I think Jonathan Hope's work is much better
formulated than Ward Elliot's. He recognizes when
his sample sizes are too small to yield conclusive
results, and tells the reader."
I wonder if you have actually read Elliot and Valenza's work, because they
are always very careful to caution the reader against overly rash
conclusions. I have not seen Hope's work, so I can't comment on it.