Response to Criticisms on Stylometry

Part 8 of "Critically Examining Oxfordian Claims"

Pat Dooley has done me the favor of responding to my posts on Ward Elliott's and Don Foster's studies. Let me add a few words in reply, though they certainly don't exhaust what I could say.
         "Ward Elliot's work has a few problems that render
         its value in discerning authorship less than
         conclusive. His most infamous test was based on
         punctuation. The problem was that he used the
         punctuation from a modern edition of Shakespeare's
         works (the Riverside, if I recall correctly) that
         had been re-punctuated by a modern editor.

         "One can hardly compare the frequency of
         exclamation points between poetry written in the
         1570s/1580s before the exclamation point had come
         into the language with poetry written after it had
         come into vogue, yet that is also what Ward
         Elliot's punctuation test did."
Elliott used the exclamation-point test in provisional early versions of the study, but when critics (mostly Oxfordian) pointed out how dependent exclamation points are on editors, he fully acknowledged that the test was questionable. Certainly no conclusions were based on this one test.
         "Ward Elliot's hyphenation tests are also suspect.
         As Peter Moore pointed out in an SOS newsletter
         article, Ward Elliot's tests showed a significant
         difference in the hyphenation rate between V&A and
         Lucrece on the one hand and the Sonnets and A
         Lover's Complaint on the other hand. It helps to
         know that Richard Field printed the former while
         George Eld printed the latter.  In other words,
         much of what Ward Elliot was measuring was the
         difference between two printers and had nothing to
         do with authorship."
I have not seen this particular article by Moore, so I don't know what "significant" means here. But Shakespeare used hyphenated compound words at roughly six to eight times the rate Oxford did (virtually all other contemporary writers are similar to Oxford in this regard), and I'm certain that no printer can be blamed for such a marked and consistent difference.
         "Another problem has to do with the size and
         reliability of the sample of the Earl of Oxford's
         poetry. By my count, there are about 4000 words
         available in the poetry traditionally attributed
         to the Earl. That doesn't give you very many
         blocks of 500 words to work with. If you start
         looking at individual words, things get worse. For
         example, from that 4000 words you have just 10
         instances of "you" and 12 instances of "thou."
         That works out at about 1 or 2 words per block.
         So, even with relatively common words, you end up
         with very small numbers to work with. That alone
         renders the statistics dubious, since the numbers
         violate the "law of large numbers."  I don't have
         the list of keywords that Ward Elliot used, but I
         would be surprised if many of them give counts
         that can be used with statistical certainty. The
         sample size is just too small for statistically
         significant conclusions to be obtained."
My short answer here is that Robert Valenza, the co-author of the study, is a professional statistician. I defer to his judgement as to what is "statistically significant," and he finds the differences between Shakespeare and Oxford very significant. All the math can be found in the articles I cited. The list of keywords is in the Computers and the Humanities article.
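For readers who want to check the arithmetic in the quoted objection, here is a minimal Python sketch. It is my own illustration, not anything taken from the Elliott and Valenza study. It takes the figures quoted above at face value (about 4000 words, 10 instances of "you", 12 of "thou", blocks of 500 words); the function names and the simple tokenizing rule are assumptions made for the example.

```python
import re
from collections import Counter


def blocks(text, size=500):
    """Split text into consecutive blocks of `size` words.

    The last block may be shorter. Tokenizing on runs of letters and
    apostrophes is an assumption made for this sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def counts_per_block(text, keywords, size=500):
    """Return one Counter per block with the occurrences of each keyword."""
    wanted = set(keywords)
    return [Counter(w for w in block if w in wanted)
            for block in blocks(text, size)]


def mean_per_block(total_count, total_words, size=500):
    """Average occurrences per block implied by corpus-wide totals."""
    return total_count / (total_words / size)


# Figures from the quoted passage: about 4000 words, 10 "you", 12 "thou".
print(mean_per_block(10, 4000))  # 1.25
print(mean_per_block(12, 4000))  # 1.5
```

This only reproduces the "1 or 2 words per block" figure. Whether counts that small can support a significance test is the statistical question at issue, and the sketch does not settle it.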
         "The choice of words greatly influences the
         statistics you get out. If you wanted to get a
         match between Shakespeare and another author, all
         you have to do is select the right words.  For
         example, if you compare the relative frequencies
         of "my" and "mine" you can get a very good match
         between Sidney's Astrophil and Stella and
         Shakespeare's sonnets. If you  choose "you" and
         "thou," you get a very poor match; the author of
         the sonnets favoured "thou" over "you" by a 2:1
         margin whereas Sidney used them about equally.
         I've got a spreadsheet where you plug in the words
         you want to test and out pops the counts for about
         a dozen authors. It is very easy to go through a
         concordance and pick word-pairs that suit whatever
         point you are trying to make."
Well, of course you can find individual words which match or don't match for different authors; the point of the Elliott/Valenza study was to look at the patterns for a large number of words simultaneously, and to determine how statistically significant the similarities and differences are. If you want to try to construct a test comparable in range to Elliott and Valenza's, but in which Shakespeare matches Oxford to the exclusion of other authors, be my guest. I'd be interested to see if such a thing is possible.
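To show the difference between checking one word-pair and checking many at once, here is a small Python sketch. It is a toy of my own and emphatically not the method Elliott and Valenza used: it computes, for each pair, the share of the first word among occurrences of either, then averages the gaps between two texts across all the pairs. On this measure a 2:1 preference for "thou" over "you" is a share of 2/3, and equal use is a share of 1/2. The function names are hypothetical and the two sample strings are invented placeholders, not real counts for any author.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def pair_share(counts, first, second):
    """Share of `first` among occurrences of either word.

    Returns None when neither word occurs, so the pair can be skipped.
    """
    total = counts[first] + counts[second]
    return counts[first] / total if total else None


def profile(text, pairs):
    """One share per word-pair, in the order the pairs are given."""
    counts = word_counts(text)
    return [pair_share(counts, a, b) for a, b in pairs]


def mean_abs_difference(profile_a, profile_b):
    """Average absolute gap over the pairs defined in both profiles."""
    gaps = [abs(x - y) for x, y in zip(profile_a, profile_b)
            if x is not None and y is not None]
    return sum(gaps) / len(gaps) if gaps else None


# Invented placeholder texts, for illustration only.
text_a = "thou thou you my mine"
text_b = "thou you my my mine mine"
pairs = [("thou", "you"), ("my", "mine")]

print(profile(text_a, pairs))
print(profile(text_b, pairs))
print(mean_abs_difference(profile(text_a, pairs), profile(text_b, pairs)))
```

In the toy texts the two "authors" agree exactly on "my"/"mine" and differ on "thou"/"you", so the verdict depends entirely on which single pair you pick. Averaging over a fixed list of pairs chosen in advance removes that freedom, though a real study would also need a proper significance test.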
         "Yet another problem is that, according to Stephen
         May in English Courtier Poets, much of the poetry
         attributed to Oxford may be mis-attributed. If he
         is correct, the pitifully small sample that Ward
         Elliot started with is further reduced."
Ward Elliott used Stephen May's 1981 edition of Oxford's poetry, in which the misattributed poetry was already taken out.
         "PS. I think Jonathan Hope's work is much better
         formulated than Ward Elliot's. He recognizes when
         his sample sizes are too small to yield conclusive
         results, and tells the reader."
I wonder if you have actually read Elliott and Valenza's work, because they are always very careful to caution the reader against overly rash conclusions. I have not seen Hope's work, so I can't comment on it.
