2,300 Years of Indexing: From the Library of Alexandria to AI

If you visit the librarianship section of a library and scan the books on ‘indexing’, you’ll find broadly three kinds. The first are the practical guides. How should one actually go about creating an index? For the purposes of this blog, we’re not so interested in these (although we have a guide of our own), so let’s leave them on the shelf. The second are the historical works, which isn’t much of a section, since it really only contains Dennis Duncan’s excellent Index, A History of the. We want to know how the index came about, so we’ll be looking at that. The third kind is striking because it’s rather more sprawling, and its books are more recent. It consists of texts anxious about the meaning of indexing in the digital age, particularly the dangers of asking machines to collect, manage, connect (and inevitably sell) webs of information. In this blog, we’ll consider both how the book index came about and where things currently stand with it. Starting at the beginning, though: to create an index, one needs a certain mass of data and a system for ordering it.

Depending on who you ask, the Library of Alexandria housed somewhere between 40,000 and 500,000 scrolls. That’s a lot of scrolls, even at the most conservative end of those estimates, and certainly enough to merit some kind of organisation. The man to impose order was the poet Callimachus, who used the 24 letters of the alphabet to help account for and locate the library’s contents. His Pínakes (meaning ‘tables’, as in writing tablets) ran to 120 papyrus scrolls. Although it has since perished, along with the rest of the Library, classical writers referred to it, and scholars have used these references to deduce how the Pínakes worked. First, it seems, the scrolls were organised by genre. Then, within each genre, authors were arranged alphabetically. Historians aren’t certain it was the first case of alphabetical ordering, but it looks pretty likely. And that’s significant because, though not everyone would call the Pínakes an ‘index’, in the words of Dennis Duncan, it showed you that ‘[s]omething here locates something there: a heading in the catalogue points to its equivalent on the shelves’. The Pínakes didn’t help readers locate information within a discrete text, though. That innovation would have to wait until the thirteenth century and the work of another poet: Robert Grosseteste.

As both Bishop of Lincoln and Chancellor of Oxford, Grosseteste wrote and delivered countless sermons and lectures over his career. ‘Grosseteste’ might not be the man’s last name, in fact, but an epithet praising his ‘big head’ – his grosse tête – or incredible intelligence. We might see him as embodying the thirteenth-century rise of the mendicant orders and the university, and the newly emerging need to somehow read, digest, refer to, cross-reference, study, and use information in a variety of texts, across a growing number of new books. His solution was his Tabula. Like the Pínakes, it was an ordering system, but where Callimachus alphabetised names, Grosseteste used glyphs with which he would annotate his texts. Each glyph represented a concept, like ‘eternity’ or ‘truth’. Whenever he came across a mention of the Holy Trinity, for example, he’d mark the page with a triangle (though not all his glyphs were so literal). A reader who wanted to find every passage that mentions the Holy Trinity could then look up all the triangles. It was an ingenious way to create a web of connections not only between the pages of a single book, but across texts. It helped readers – probably in this instance limited only to Grosseteste and his immediate circle – locate and cross-reference information from the Bible to the writings of the Church fathers, as well as the pagan and Arabic writers. The solution is all the more remarkable for working in the absence of page numbers – something that we now take for granted.

Page numbers wouldn’t make their appearance until 1470, in the form of a ‘J’ shape on the first page of Werner Rolevinck’s Sermo de presentacione beatissime virginis Marie. That’s roughly two centuries after scholars started to bind concepts and ideas into indexes. Rolevinck’s text is something of an outlier, though. Even in the late fifteenth century, only about 10% of books had page numbers. But once they caught on, in the wake of Gutenberg’s press, ‘the printed page number would turbocharge [the index’s] pervasiveness’. You’d quite suddenly find them in all kinds of texts, from religious and historical to legal and medical. They’d even grace the corners of songbooks.

Alphabetical ordering, abstracting, arranging subject matter, and page numbering together make indexes recognisable in their modern form, and all have been around for a long time. As an aside, though: something curious happens to page numbers in the digital age. They quite suddenly become incidental, almost obsolete as finding aids. The page numbers on digital books change depending on the size of your device and even its orientation. The page-referenced index has been replaced by hyperlinks, and by the ability to pull up a search bar or jab the Ctrl+F keys to directly search for a term, or even an entire passage. This all feels very modern. But digital searches, according to Duncan, constitute a special kind of index called a ‘concordance’, with its own long history. A concordance is traditionally an alphabetical list of every word in a given text, along with all the places each one appears. Computers extend its power ‘infinitely’ because they allow you to perform this function well beyond a single text. But we’re getting ahead of ourselves. We’ll return later to the power of digital indexing.
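For the curious, here is roughly what a concordance looks like when a computer builds one. This is only a minimal sketch in Python, not a description of how any particular search tool works: the function name `build_concordance`, the two tiny sample texts, and the choice to record line numbers rather than page numbers are all our own assumptions for illustration.

```python
import re
from collections import defaultdict


def build_concordance(texts):
    """Map every word to the (title, line number) places where it appears.

    `texts` is a dict of {title: full text}. Passing several titles
    extends the concordance across texts, not just within one.
    """
    concordance = defaultdict(list)
    for title, text in texts.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            # Lower-case the line so 'Truth' and 'truth' share an entry
            for word in re.findall(r"[a-z']+", line.lower()):
                location = (title, line_no)
                # Skip a repeat of the same word on the same line
                if not concordance[word] or concordance[word][-1] != location:
                    concordance[word].append(location)
    # Alphabetical order, as in a traditional concordance
    return dict(sorted(concordance.items()))


# Hypothetical sample texts, invented purely for this example
sample_texts = {
    "Sermon": "truth and eternity\nthe truth",
    "Lecture": "eternity again",
}

for word, places in build_concordance(sample_texts).items():
    print(word, places)
```

Looking up a word then amounts to reading off its list of locations, which is essentially what a search bar does on your behalf, only across far more text than any hand-compiled concordance could cover.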

Traditional back-of-the-book indexes came to have functions beyond the organisational. Much like footnotes and endnotes, they seem to reveal information about the writer, as well as how works were conceived and their place in society. For instance, nineteenth-century historian J. Horace Round wrote a book called Feudal England in which he sets out to correct the errors of a rival scholar, Edward Augustus Freeman. In the text itself, he’s quite restrained. Over the book’s 600 pages, Freeman is barely mentioned. But here’s an excerpt from his index entry on his rival, where he really lets loose:

Freeman, Professor: unacquainted with the Inq. Com. Cant. 4; ignores the Northamptonshire geld-roll 149; confuses the Inquisitio geldi 149; his contemptuous criticism 150, 337, 385, 434, 454; when himself in error, 151…

And so on, pointing to all his rival’s faults. It’s one of many examples Dennis Duncan uses to demonstrate that even something as seemingly dry as a list of alphabetical terms and page locators can be imbued with a writer’s personality. But indexes can reveal far more than details of the individuals who compile them.

The index for J. G. Frazer’s twelve-volume The Golden Bough fills an entire book, chock-full of entries like ‘ass in rainmaking ceremonies’, ‘walrus, taboos concerning’, and ‘mock battle at festival of new fruits’. You have to wonder who would run their finger down the page seeking entries like ‘ox knees, why soldiers shouldn’t eat’, but once you see something like that, it compels you to flick through to page 117 of volume 1 to find out. (We did this to save you the trouble; it’s because centuries ago in Madagascar, soldiers apparently felt they’d lose the ability to march if they ate the famously ‘weak’ knee of an ox.) Beyond indicating the book’s contents, the index tells us a little about Frazer – as Round’s index does for him – but also something of the time and the spirit in which his magnum opus was written. It has a certain cultural flavour. It evokes one of those early twentieth-century explorers wandering around hot countries in a pith helmet. In Round’s case, the index is an entertaining window on a tiny world. In Frazer’s, though, it betrays an early twentieth-century scholarly obsession with nations ripe for ‘civilising’, and the alleged superiority of western models. Ultimately, however, the index is a microcosm of the author’s interests and values, irrespective of whether or not these align with those of society at large.

It is easy to make fun of early twentieth-century obsessions with ‘primitive cultures’, but we might see them as part of a darker role that the index began to assume in the twentieth century. What is Frazer’s index if not an attempt to represent and assign a value and place to the sacred practices of other cultures (as they’ve been understood by white men sporting pith helmets)? The language is very different, but it’s not a million miles from the concerns of the librarian Ronald E. Day, whose 2014 Indexing It All: The Subject in the Age of Documentation, Information and Data examines how indexing practices after the computing revolution are leveraged ‘in the service of state and corporate power’.

Worries about AI data collection aside, one would have thought that the technological advances of the 21st century would have at least rendered the indexing of one’s own books free from hassle. Nothing could be further from the truth. While it is certainly easier to search for a term on one’s PDF proofs than it would have been for Round or Frazer, new technologies inevitably give rise to new requirements. In the digital age, these include publisher requests for authors to produce indexes with paragraph ID locators or hyperlinked/toggled terms. Such tasks require time, patience, skill, and experience that many academics do not have. If you are one such academic, then you may wish to enlist our professional indexing services.
