Response to Heather Hedden's "Taxonomy Definition" -- https://accidental-taxonomist.blogspot.com/2022/12/taxonomy-definition.html
Hey Heather:
Thanks for posting. It really inspired me to write down a few things that I had been thinking about. It's taken me a while to put my finger on exactly what it was that set me spinning.
Here's my take: I just don't like the word "taxonomy" (other than as the name for the practice of arrangement). A "taxonomy" -- as a thing -- is just really hard to pin down. For some reason, Wayne Booth's memorable expression "simply the flinging of Greek-fed, polysyllabic bullshit" comes to mind. I can almost feel one of my grad school mentors standing over my shoulder, talking about Wittgenstein's language-games and Kuhn's guidance on the normalization of science (and practice). Ultimately, I'm just not sure what the word actually does for us. I don't think Cutter or Melvil Dewey talked about taxonomies. Nor did Julius Kaiser or S.R. Ranganathan. The word has just kind of appeared as a synonym for the cumbersome "knowledge organization system," throwing the unlikely bedfellows of gazetteers, subject headings, and semantic networks into the same semantic sleeping bag (and yes, I *am* echoing the first chapter of your book. Loved it!).
(Note that the mentor in question wrote an entire book called Deflating
Information that basically asked what the word "information" does
for us, so maybe my concern with "taxonomy" is simply an echo of that
long-ago seminar!)
My old professor was quite keen on the idea of exploring what things actually do.
He recommended taking an approach from the social studies of science and
applying it to information resources in a kind of modern documentalist
movement. He once asked a question to our class: "Is the person looking
for a book on gold prospecting looking for information? Or are they looking for
gold? Do they want information on bear hunting, or do they want a bear? What
does the word 'information' do?"
So here's the question: what does "taxonomy" do for us in this case?
Let's start with a most basic library scenario. A new book is sitting on the
librarian's desk. What do they do with it? First, they must capture its details
("descriptive cataloging") and then they have to describe what it's
about ("subject cataloging"), by first applying "subject
headings" and then assigning an appropriate call number (pressmark,
shelfmark, bookmark, etc.) via "classification." Both subject
headings and classifications could be considered taxonomies but they do
different things. Subject headings (tags, terms, descriptors, etc.) categorize:
they reveal the book's membership in any of a variety of different groups
(although determining what those groups could be a priori is a bit of a
challenge). Classification puts that book in a single class, where it is
to live, ideally with genteel neighbors, so that users can be guided by
Svenonius's "invisible hand of the classification system" to the
resource that they didn't know they were seeking.
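To make the difference concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the titles, the class marks ("A1", "B1"), and the headings are made up, and the function names are hypothetical. It only shows the shape of the two operations: classification yields one position per book, while categorization lists a book under every heading it carries.

```python
from collections import defaultdict

# Hypothetical catalog records; titles, class marks, and headings are made up.
BOOKS = [
    {"title": "Gold Prospecting Basics", "class_mark": "A1",
     "subjects": ["Gold", "Prospecting", "Outdoor recreation"]},
    {"title": "Bear Hunting Field Guide", "class_mark": "B1",
     "subjects": ["Bears", "Hunting", "Outdoor recreation"]},
    {"title": "Panning in Bear Country", "class_mark": "A2",
     "subjects": ["Gold", "Prospecting", "Bears"]},
]


def shelf_order(books):
    """Classification: each book gets exactly one place, ordered by class mark."""
    return [b["title"] for b in sorted(books, key=lambda b: b["class_mark"])]


def subject_index(books):
    """Categorization: each book is listed under every heading assigned to it."""
    index = defaultdict(list)
    for b in books:
        for heading in b["subjects"]:
            index[heading].append(b["title"])
    return dict(index)


shelf = shelf_order(BOOKS)
index = subject_index(BOOKS)
```

Each title appears once in `shelf` but several times across `index`, which is the tension in a nutshell: the shelf answers "where does it live?" and the index answers "what groups does it belong to?"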
People really seem to struggle with this tension between categorization and classification.
I once did a consulting engagement with a trade organization that was stricken
with information overwhelm. We used cognitive work analysis to build out a
basic controlled vocabulary, based on literary, user, and organizational
warrant, roughly organized as per Ranganathan's facets. The result wasn't
exactly a thesaurus but it was Z39.19-adjacent, if not compliant. The client
nodded and said: "cool, cool." And then somebody asked: "but I
thought we were going to build like a Linnean tree where everything could just
live in one place! Your list of terms doesn't tell us which directory to
use."
I really didn't have the heart to launch into a long discussion of the Cranfield
experiments, subject cataloging inter-rater reliability, or the difficulties of
organizing all of human knowledge into some sort of ineludible final structure.
I didn't even go into how Linnaeus actually got a bit lucky by using reproduction
as the core idea underlying his binomial nomenclature and how his system is
breaking down with our improved understanding of genetics. Nobody really
remembers his third work on the regnum lapideum -- the kingdom of
minerals. Applying the notion of sexual reproduction to rocks is a bit comical
to the modern reader (although his identification of mineral features like
crystal shape, number of faces, etc. was valuable).
The categorization/classification divide is deeply rooted in the history of knowledge
organization. For much of history, we really weren't too picky in how we
organized information. There weren't that many books and we were pretty
comfortable with organizing them in a fairly ad hoc manner. Perhaps we had some
alcoves that separated religious books from non-, and gave us some rough
headings aligned with the trivium and quadrivium, with some law, medicine, and
practical arts thrown in. It didn't really matter since the shelves were just
places to park books and most libraries had restricted access. There were
likely other piles of books somewhere else, lined up by accession date, or
according to some strange whim of the librarian. As long as the librarian knew
where a book was, everything was fine. People didn't interact directly with presses
of books; they worked with the catalog, generally a dictionary catalog with
entries for authors, titles, and subjects. The library might have a stack of
cards somewhere but it was simply a way of organizing information to create the
catalog. The catalog was a book, not a cabinet. Direct access to the cards,
and a general physical arrangement of books that matched the organization of
those cards, was really a Dewey thing. The dictionary catalog -- an artifact of
categorization -- was the key tool. Classification, on the other hand, was
rough and ready.
We now expect our collections to be classified, so as to support that
"invisible hand" and the bibliographic objective of browsing. But
classifying books -- and knowledge -- is tough. There isn't an underlying
concept like reproduction to organize them. From a Linnean perspective,
knowledge is more mineral than animal. Gesner, Naudé, Leibniz, and Boulliau all
produced appropriate systems. Gabriel Martin's system for the catalogues of
Paris book-sellers was particularly popular. But the systems we're most used to
come to us from how Thomas Jefferson organized his books at Monticello. He --
aspiring continentalist that he was -- basically cribbed the system from
d'Alembert's introduction to the Encyclopédie, which used Bacon's
faculties of the mind as an underlying order. Melvil Dewey borrowed the
structure from William Torrey Harris, who got it from Edward William Johnston,
who probably learned it directly from Jefferson (I suspect Johnston was cancelled
as a Confederate sympathizer… but that's a different story). Dewey was perhaps
not a great taxonomist so, after establishing the major classes, he turned to
various textbooks and professors at Amherst College to build them out. Otlet
then built directly from Dewey, initially as a French translation. Library of
Congress Classification was largely an effort to do what Dewey did, without
Dewey, while addressing a number of his system's limitations.
And we've been dealing with this rather shaky foundation ever since!
Categorization is a different matter. We just need a way to describe things, concepts, and
ideas. This practice has deep roots in the work of ancient scholars.
Even Pliny kept a commonplace book organized by various headings, the hashtags
of their day. Our modern approach to subject headings comes to us via Charles
Cutter's Rules for a Dictionary Catalog. It established the -- Svenonius
again -- semantics, syntax, and pragmatics of subject headings. These
principles were crucial for building catalogs. Julius Otto Kaiser, working in
special libraries, developed "systematic indexing." The problem with
subject headings is that they can be about anything. Kaiser taught us that we
should focus on three facets: "concretes," "places," and
"processes." Ranganathan expanded the list, giving us
"personality, matter, energy, space, and time" (even if we're all --
Ranganathan included -- a bit foggy on what "personality" means). The
Classification Research Group ultimately expanded this list of facets to
something that I think we can all be satisfied with: "thing – kind – part
– material – property – process – operation – patient – agent – product –
by-product – space – time." A thesaurus to control terms for these facets
certainly helps.
Ultimately, the Cranfield experiments taught us that human categorization is perhaps of
dubious value and Mortimer Taube's work informed us that combining terms with
Boolean operators, even if those terms are based on the contents of the
resource rather than assigned terms, actually gives us great results. Next
step: AltaVista with Boolean operators on the web, then Google with PageRank,
an innovation on TF-IDF. And now information retrieval has conquered the world
in a potentially dystopian way. Huzzah!
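For anyone who hasn't seen it spelled out, combining content-derived terms with Boolean operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Taube's system or of any search engine: the three "documents" are made up, the terms are simply the words in the text, and `build_index` and `postings` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical mini-collection; the texts are made up for illustration.
DOCS = {
    1: "gold prospecting in mountain streams",
    2: "bear hunting in mountain forests",
    3: "gold mining and bear safety",
}


def build_index(docs):
    """Map each word found in the text to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


INDEX = build_index(DOCS)


def postings(term):
    """Return the set of document ids for a term (empty set if the term is unseen)."""
    return INDEX.get(term, set())


gold_and_bear = postings("gold") & postings("bear")   # AND: both terms present
gold_or_bear = postings("gold") | postings("bear")    # OR: either term present
gold_not_bear = postings("gold") - postings("bear")   # AND NOT: gold without bear
```

No cataloger assigned anything here. The "headings" fall out of the text itself, and the set operations do the coordinating at query time.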
There are a lot of names here: Taube, Cranfield, Ranganathan, Kaiser, Cutter, Otlet,
Dewey, Jefferson, Bacon. Did any of them use "taxonomy" to mean an
actual thing? Does it add anything to our discourse beyond
"categorization" and "classification"? I'm not sure it
does. Is a taxonomy a concrete, place, OR process? OR is it a thing, kind,
part, material, property, process, operation, patient, agent, product,
byproduct, space, OR time? In honor of Taube, I suppose the answer is
"yes." But not in a helpful way.
I suspect that in this context, the word "taxonomy" is more akin to
marketing speak, like a three-letter acronym, or business guru concept, but
suitably demure since it was coined by information science folks (and I say
that as both an information science person and an industry analyst who has
coined a few technology three-letter acronyms!). It can be anything or nothing.
We'd be better served by recognizing that taxonomies are a thing, but a thing
with different identities depending on who is invoking them. They are boundary
objects. As per Star and Griesemer, "boundary objects are both adaptable
to different viewpoints and robust enough to maintain identity across
them." They continue: "Boundary objects are objects which are both
plastic enough to adapt to local needs and the constraints of the several
parties employing them, yet robust enough to maintain common identity across
sites. They are weakly structured in common use, and become strongly structured
in individual site use. These objects may be abstract or concrete. They have
different meanings in different social worlds but their structure is common
enough to more than one world to make them recognizable, a means of
translation."
Is a classification system a taxonomy? Yes. What about a controlled vocabulary?
Yes. But they do different things. Need a term that appeals to computer
scientists, librarians, archivists, and book indexers? "Taxonomy" is
the ticket.
There's lots of room for boundary objects in the semantic sleeping bag. In this case, I
wouldn't be so quick to rush in and elaborate further on the distinction between
"classification systems" and "controlled vocabularies." I’m
not so sure that "taxonomy" does anything here or, perhaps, anywhere
else!
Just my $0.02. Thank you for the opportunity to rummage around in my thoughts on the
topic.