Deflating Taxonomies -- do we really need another concept to organize knowledge?

Response to Heather Hedden's "Taxonomy Definition" -- https://accidental-taxonomist.blogspot.com/2022/12/taxonomy-definition.html

Hey Heather:

Thanks for posting. It really inspired me to write down a few things that I had been thinking about. It's taken me a while to put my finger on exactly what it was that set me spinning.

Here's my take: I just don't like the word "taxonomy" (other than as the name for the practice of arrangement). A "taxonomy" -- as a thing -- is just really hard to pin down. For some reason, Wayne Booth's memorable expression "simply the flinging of Greek-fed, polysyllabic bullshit" comes to mind. I can almost feel one of my grad school mentors standing over my shoulder, talking about Wittgenstein's language-games and Kuhn's guidance on the normalization of science (and practice). Ultimately, I'm just not sure what the word actually does for us. I don't think Cutter or Melvil Dewey talked about taxonomies. Nor did Julius Kaiser or S.R. Ranganathan. The word has just kind of appeared as a synonym for the cumbersome "knowledge organization system," throwing the unlikely bedfellows of gazetteers, subject headings, and semantic networks into the same semantic sleeping bag (and yes, I *am* echoing the first chapter of your book. Loved it!).

(Note that the mentor in question wrote an entire book called Deflating Information that basically asked what the word "information" does for us, so maybe my concern with "taxonomy" is simply an echo of that long-ago seminar!)

My old professor was quite keen on the idea of exploring what things actually do. He recommended taking an approach from the social studies of science and applying it to information resources in a kind of modern documentalist movement. He once asked our class: "Is the person looking for a book on gold prospecting looking for information? Or are they looking for gold? Do they want information on bear hunting, or do they want a bear? What does the word 'information' do?"

So here's the question: what does "taxonomy" do for us in this case?

Let's start with the most basic library scenario. A new book is sitting on the librarian's desk. What do they do with it? First, they must capture its details ("descriptive cataloging"), and then they have to describe what it's about ("subject cataloging"), first by applying "subject headings" and then by assigning an appropriate call number (pressmark, shelfmark, bookmark, etc.) via "classification." Both subject headings and classifications could be considered taxonomies, but they do different things. Subject headings (tags, terms, descriptors, etc.) categorize: they reveal the book's membership in any of a variety of different groups (although determining what those groups should be a priori is a bit of a challenge). Classification puts that book in a single class, where it is to live, ideally with genteel neighbors, so that users can be guided by Svenonius's "invisible hand of the classification system" to the resource that they didn't know they were seeking.
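To make the difference concrete, here's a minimal Python sketch of the two operations. Everything in it is hypothetical: the titles, headings, call numbers, and the helper names `categorize` and `shelve` are invented for illustration and don't follow any real scheme. Subject headings let one book land in as many groups as apply, while a call number gives it exactly one spot on the shelf.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    subject_headings: tuple[str, ...]  # categorization: any number of groups
    call_number: str                   # classification: exactly one place


# Hypothetical records; titles, headings, and call numbers are invented.
books = [
    Book("Panning for Gold", ("Gold", "Prospecting", "Geology"), "B200"),
    Book("Hunting Bears", ("Bears", "Hunting"), "C300"),
    Book("Minerals of the World", ("Geology", "Minerals"), "B100"),
]


def categorize(books):
    """Group titles by subject heading; one book may appear in many groups."""
    groups = defaultdict(list)
    for book in books:
        for heading in book.subject_headings:
            groups[heading].append(book.title)
    return dict(groups)


def shelve(books):
    """Order titles by call number; each book occupies exactly one slot."""
    return [book.title for book in sorted(books, key=lambda b: b.call_number)]


print(categorize(books)["Geology"])  # ['Panning for Gold', 'Minerals of the World']
print(shelve(books))  # ['Minerals of the World', 'Panning for Gold', 'Hunting Bears']
```

"Panning for Gold" shows up under three headings but sits in only one place on the shelf, which is the whole tension in a nutshell.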

People really seem to struggle with this tension between categorization and classification. I once did a consulting engagement with a trade organization that was stricken with information overwhelm. We used cognitive work analysis to build out a basic controlled vocabulary, based on literary, user, and organizational warrant, roughly organized as per Ranganathan's facets. The result wasn't exactly a thesaurus, but it was Z39.19-adjacent, if not compliant. The client nodded and said: "cool, cool." And then somebody asked: "but I thought we were going to build like a Linnean tree where everything could just live in one place! Your list of terms doesn't tell us which directory to use."

I really didn't have the heart to launch into a long discussion of the Cranfield experiments, subject cataloging inter-rater reliability, or the difficulties of organizing all of human knowledge into some sort of ineludible final structure. I didn't even go into how Linnaeus actually got a bit lucky by using reproduction as the core idea underlying his binomial nomenclature, or how his system is breaking down with our improved understanding of genetics. Nobody really remembers the third kingdom of his Systema Naturae, the regnum lapideum -- the kingdom of minerals. Applying the notion of sexual reproduction to rocks is a bit comical to the modern reader (although his identification of mineral features like crystal shape, number of faces, etc. was valuable).

The categorization/classification divide is deeply rooted in the history of knowledge organization. For much of history, we really weren't too picky about how we organized information. There weren't that many books, and we were pretty comfortable with organizing them in a fairly ad hoc manner. Perhaps we had some alcoves that separated religious books from non-, and gave us some rough headings aligned with the trivium and quadrivium, with some law, medicine, and practical arts thrown in. It didn't really matter since the shelves were just places to park books and most libraries had restricted access. There were likely other piles of books somewhere else, lined up by accession date, or according to some strange whim of the librarian. As long as the librarian knew where a book was, everything was fine. People didn't interact directly with presses of books; they worked with the catalog, generally a dictionary catalog with entries for authors, titles, and subjects. The library might have a stack of cards somewhere, but it was simply a way of organizing information to create the catalog. The catalog was a book, not a cabinet. Direct access to the cards, and a general physical arrangement of books that matched the organization of those cards, was really a Dewey thing. The dictionary catalog -- an artifact of categorization -- was the key tool. Classification, on the other hand, was rough and ready.

We now expect our collections to be classified, so as to support that "invisible hand" and the bibliographic objective of browsing. But classifying books -- and knowledge -- is tough. There isn't an underlying concept like reproduction to organize them. From a Linnean perspective, knowledge is more mineral than animal. Gesner, Naudé, Leibniz, and Boulliau all produced systems of their own. Gabriel Martin's system for the catalogs of Paris booksellers was particularly popular. But the systems we're most used to come to us from how Thomas Jefferson organized his books at Monticello. He -- aspiring continentalist that he was -- basically cribbed the system from d'Alembert's introduction to the Encyclopédie, which used Bacon's faculties of the mind as an underlying order. Melvil Dewey borrowed the structure from William Torrey Harris, who got it from Edward William Johnston, who probably learned it directly from Jefferson (I suspect Johnston was cancelled as a Confederate sympathizer… but that's a different story). Dewey was perhaps not a great taxonomist, so after establishing the major classes he turned to various textbooks and professors at Amherst College to build them out. Otlet then built directly on Dewey, initially as a French translation. Library of Congress Classification was largely an effort to do what Dewey did, without Dewey, while addressing a number of his system's limitations.

And we've been dealing with this rather shaky foundation ever since!

Categorization is a different matter. We just need a way to describe things, concepts, and ideas. This practice has deep roots in the habits of ancient scholars. Even Pliny kept a commonplace book organized by various headings, the hashtags of their day. Our modern approach to subject headings comes to us via Charles Cutter's Rules for a Dictionary Catalog. It established the -- Svenonius again -- semantics, syntax, and pragmatics of subject headings. These principles were crucial for building catalogs. Julius Otto Kaiser, working in special libraries, developed "systematic indexing." The problem with subject headings is that they can be about anything. Kaiser taught us that we should focus on three facets: "concretes," "places," and "processes." Ranganathan expanded the list, giving us "personality, matter, energy, space, and time" (even if we're all -- Ranganathan included -- a bit foggy on what "personality" means). The Classification Research Group ultimately expanded this list of facets to something that I think we can all be satisfied with: "thing – kind – part – material – property – process – operation – patient – agent – product – by-product – space – time." A thesaurus to control terms for these facets certainly helps.

Ultimately, the Cranfield experiments taught us that human categorization is perhaps of dubious value, and Mortimer Taube's work showed us that combining terms with Boolean operators, even when those terms are drawn from the contents of the resource rather than assigned by an indexer, actually gives us great results. Next step: AltaVista with Boolean operators on the web, then Google with PageRank, which layered link analysis on top of term-weighting schemes like TF-IDF. And now information retrieval has conquered the world in a potentially dystopian way. Huzzah!
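Taube's coordinate indexing (his Uniterm system) maps neatly onto a modern inverted index: each term points to the set of documents containing it, and a Boolean AND is just the intersection of those sets. Here's a minimal Python sketch under my own assumptions. It isn't a reconstruction of Taube's card system. The sample documents, their IDs, and the helper names `build_index`, `search_and`, and `search_or` are hypothetical, and terms are pulled straight from the text rather than assigned by a cataloger.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return dict(index)


def search_and(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Coordinate terms with AND: intersect their posting sets."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index: dict[str, set[int]], *terms: str) -> set[int]:
    """Coordinate terms with OR: union their posting sets."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Hypothetical documents, echoing the gold-and-bears question above.
docs = {
    1: "Prospecting for gold in the Yukon",
    2: "Gold coins and currency",
    3: "Hunting bears in the Yukon",
}
index = build_index(docs)

print(search_and(index, "gold", "yukon"))  # {1}
print(search_or(index, "gold", "bears"))  # {1, 2, 3}
```

Nobody decided in advance that document 1 was "about" gold prospecting in the Yukon. The combination of terms at search time does that work, which is roughly Taube's point.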

There are a lot of names here: Taube, Cranfield, Ranganathan, Kaiser, Cutter, Otlet, Dewey, Jefferson, Bacon. Did any of them use "taxonomy" to mean an actual thing? Does it add anything to our discourse beyond "categorization" and "classification"? I'm not sure it does. Is a taxonomy a concrete, place, OR process? OR is it a thing, kind, part, material, property, process, operation, patient, agent, product, by-product, space, OR time? In honor of Taube, I suppose the answer is "yes." But not in a helpful way.

I suspect that in this context, the word "taxonomy" is more akin to marketing speak, like a three-letter acronym or a business-guru concept, but suitably demure since it comes to us from information science folks (and I say that as both an information science person and an industry analyst who has coined a few technology three-letter acronyms!). It can be anything or nothing. We'd be better served by recognizing that taxonomies are a thing, but a thing with different identities depending on who is invoking them. They are boundary objects. As per Star and Griesemer, "boundary objects are both adaptable to different viewpoints and robust enough to maintain identity across them." They continue: "Boundary objects are objects which are both plastic enough to adapt to local needs and the constraints of the several parties employing them, yet robust enough to maintain common identity across sites. They are weakly structured in common use, and become strongly structured in individual site use. These objects may be abstract or concrete. They have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable, a means of translation."

Is a classification system a taxonomy? Yes. What about a controlled vocabulary? Yes. But they do different things. Need a term that appeals to computer scientists, librarians, archivists, and book indexers? "Taxonomy" is the ticket.

There's lots of room for boundary objects in the semantic sleeping bag. In this case, I wouldn't rush in to draw further distinctions between "classification systems" and "controlled vocabularies." I'm not so sure that "taxonomy" does anything here or, perhaps, anywhere else!

Just my $0.02. Thank you for the opportunity to rummage around in my thoughts on the topic.
