by Donald Foster, Vassar College

First, what is it? SHAXICON is a lexical database that indexes all of the words that appear in the canonical plays 12 times or fewer, with a line citation and speaking character for each occurrence of each word. (These are called "rare words," though they are not rare in any absolute sense--"family [n.]" and "real [ad.]" are rare words in Shakespeare.) All rare-word variants are indexed as well, including the entire "bad" quartos of H5, 2H6, 3H6, Ham, Shr, and Wiv; also the nondramatic works, canonical and otherwise (Ven, Luc, PP, PhT, Son, LC, FE, the Will, "Shall I die," et al.); the additions to Mucedorus and The Spanish Tragedy, the Prologue to Merry Devil of Edmonton, all of Edward III and Sir Thomas More (hands S and D); Ben Jonson's Every Man in His Humour (both Q1 and F1) and Sejanus (F1); and more. These other texts, however, have no effect on the 12-occurrence cutoff that sets the parameters for SHAXICON's lexical universe.
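For readers who think more easily in code, the indexing rule just described can be sketched as follows. This is only an illustration, not SHAXICON's actual implementation (which runs on other software): the function name `build_rare_word_index`, the record layout, and the toy data (`PlayA`, `PoemX`, and so on) are all hypothetical. The sketch assumes that a word qualifies if it occurs at least once and no more than 12 times in the canonical plays, and that occurrences in other texts are recorded but never counted toward the cutoff.

```python
from collections import Counter, defaultdict


def build_rare_word_index(occurrences, canonical_plays, cutoff=12):
    """Index words that occur 1..cutoff times in the canonical plays.

    occurrences: iterable of (word, text, citation, speaker) tuples.
    canonical_plays: set of text labels that count toward the cutoff.
    Returns {word: [(text, citation, speaker), ...]}. Occurrences in
    non-canonical texts are kept in the index but do not affect the cutoff.
    """
    canon_counts = Counter()
    all_occurrences = defaultdict(list)
    for word, text, citation, speaker in occurrences:
        all_occurrences[word].append((text, citation, speaker))
        if text in canonical_plays:
            canon_counts[word] += 1
    return {
        word: occs
        for word, occs in all_occurrences.items()
        if 1 <= canon_counts[word] <= cutoff
    }


# Toy records (invented placeholders, not real citations).
toy = (
    # 13 uses in a canonical play: one over the cutoff, so excluded.
    [("the", "PlayA", f"1.1.{n}", "RoleA") for n in range(1, 14)]
    # 2 canonical uses plus 1 in a poem: included, all 3 occurrences kept.
    + [
        ("family", "PlayA", "2.3.10", "RoleB"),
        ("family", "PlayB", "4.1.7", "RoleC"),
        ("family", "PoemX", "12", None),
    ]
    # Never appears in a canonical play: excluded under this sketch's assumption.
    + [("zany", "PoemX", "40", None)]
)

index = build_rare_word_index(toy, canonical_plays={"PlayA", "PlayB"})
print(sorted(index))  # ['family']
```

Keeping the canonical count separate from the full occurrence list is what lets the extra texts be searched without shifting the boundary of the lexical universe.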

What SHAXICON demonstrates is that the rare words in Shakespearean texts are not randomly distributed either diachronically or synchronically, but are "mnemonically structured." Shakespeare's active lexicon as a writer was systematically influenced by his reading, and by his apparent activities as a stage-player. When writing, Shakespeare was measurably influenced by plays then in production, and by particular stage-roles most of all. Most significant is that, while writing, he disproportionately "remembers" the rare-word lexicon of plays concurrently "in repertory." From these plays he always registers disproportionate lexical recall (as a writer) of just one role (or two or three smaller roles), and these remembered roles, it can now be shown, are most probably those that Shakespeare himself drilled in stage performance.
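To give a rough sense of what "disproportionate" recall of a role might mean numerically, here is a minimal sketch. It is not the statistic SHAXICON itself uses, which this note does not spell out; the function `role_recall_ratios`, the observed-over-expected ratio, and the toy roles are assumptions made purely for illustration. The sketch supposes that, absent any special memory of a role, a later text would share rare words with each role of an earlier play in proportion to the size of that role's rare-word lexicon.

```python
def role_recall_ratios(role_lexicons, later_text_words):
    """Compare observed and expected rare-word overlap, role by role.

    role_lexicons: {role: set of rare words spoken by that role in an
                    earlier play}.
    later_text_words: set of rare words found in a later text.
    Returns {role: observed / expected}. A ratio above 1.0 means the later
    text shares more rare words with that role than the role's size alone
    would predict (under this sketch's proportional assumption).
    """
    later = set(later_text_words)
    shared = {role: len(words & later) for role, words in role_lexicons.items()}
    total_shared = sum(shared.values())
    total_size = sum(len(words) for words in role_lexicons.values())

    ratios = {}
    for role, words in role_lexicons.items():
        expected = total_shared * len(words) / total_size if total_size else 0.0
        ratios[role] = shared[role] / expected if expected else 0.0
    return ratios


# Toy data (invented): two roles with equally sized rare-word lexicons.
roles = {
    "RoleA": {"a", "b", "c", "d"},
    "RoleB": {"e", "f", "g", "h"},
}
later_text = {"a", "b", "c", "e", "z"}

print(role_recall_ratios(roles, later_text))  # {'RoleA': 1.5, 'RoleB': 0.5}
```

In this toy case the later text recalls RoleA at one and a half times the expected rate and RoleB at half of it. A real analysis would also need a significance test and some correction for the variable "richness" of each role's lexicon.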

SHAXICON electronically maps Shakespeare's language so that we can now usually tell which texts influence which other texts, and when. Moreover, when collated with the OED or with early modern texts in a normalized machine-readable format, SHAXICON provides an incomplete record of Shakespeare's apparent reading. The main value of this resource has less to do with biographical novelties, however, than with problems of textual transmission, dating, probable authorship of revisions, early stage history, and the like. And because SHAXICON is a closed system, human bias in measuring lexical influence of this sort is effectively eliminated. The evidentiary value of supposed "verbal parallels" is no longer a matter of private intuition or subjective judgment, but quantifiable, using a stable lexical index (and measurable against a virtually limitless cross-sample of machine-readable texts).

In 1991, I published a 3-part report in SNL (see Summer, Fall, Winter 1991) about SHAXICON (the database was not then completed, and not yet dubbed), in which I made projections (in a few cases mistaken) concerning Shakespeare's apparent stage roles, based on entries for about a third of the final lexical sample. The few botched projections derived in part from key-punching errors--e.g., "Pand" (Pandarus of TRO) was often being entered for "CPan" (Pandulph of JN), and "QnElz" (R3) for "QnEliz" (3H6); and in part from unavoidable limitations, explained in the SNL series, concerning the variable "richness" of character-specific lexicons, which could not be measured until the whole canon was indexed. These problems have been eliminated.

The following list represents a corrected catalogue of those roles that Shakespeare is most likely to have acted. These assignments vary somewhat in statistical significance, depending on sample size, etc. A fuller report (with instructions on how to run cross-checks and fully automated statistical analysis) will appear in my "SHAXICON Notebook" (a written commentary that has yet to be completed). In the meantime, here follows a list of Shakespeare's most likely stage-roles, as statistically derived. Keep in mind that this catalogue cannot be proven to represent historical actuality. SHAXICON handily selects Adam of AYL and the Ghost of Ham as probable Shakespeare roles, both of which are supported by hearsay evidence from the 17th century; the remaining roles find no external historical confirmation (although Davies mentions that Shakespeare played some kings, and SHAXICON indicates that Shakespeare played king-roles in AWW, 1H4, 2H4, HAM, LLL, PER, and probably MAC). Having studied the evidence from every conceivable angle, I'd say that the assignments below are good bets, despite the lack of archival evidence to back them up, for the disproportion in Shakespeare's persistent recall of these roles is quite striking relative to other roles in the corresponding texts. There are a few texts (principally ADO, MV, and Jonson's EMI) in which Shakespeare may have played two different roles in two successive seasons of the same theatrical "run." But the statistical weight of Shakespeare's selective recall of particular roles is in most instances pretty clear; in fact, when multiple roles are identified by SHAXICON as probably Shakespearean, they are in most instances roles that are easily doubled (exceptions and problems are noted below).



  1. Patience.

  2. Disk space. In its present form, SHAXICON sucks up 40+ megs just for the raw data, plus another 20 megs or so for the commentary, help files, and graphics; plus another 20 megs or so for the software. But don't start erasing those electronic games just yet in order to make room for it. The main database for SHAXICON is now complete, purged of errors, and generally usable; but it's not yet ready for prime time: SHAXICON now runs on ETC Word-Cruncher, which is limited in its capabilities and requires way-too-much manual labor (keying in lexical searches, etc.). We're now using Excel for the summary figures and graphics, which is a big time-saver--but we're likely to change over, prior to publication, to a slicker and more fully automated database-management system so that SHAXICON is more user-friendly in ALL respects. I'm inquiring after Oracle, 4D, and Fox.
In advance of publication we're drawing on the expertise of people in various fields so that when it's finally distributed SHAXICON will be fully intelligible even to those users without expertise in computers, statistics, and/or textual scholarship. I'm shooting for 1996 publication, but cannot guess what technical problems may arise in the interim. CD-ROM may be too slow to be practicable, but disk space may otherwise be a problem for many users.

Address queries to Professor Donald Foster,
