Drupal 7 and FRBR: A Mental Model

I have long been interested in library technology and cataloguing, even though I'm far from any sort of expert in the matter. When I started my "real" research into the subject, I jumped on the Functional Requirements for Bibliographic Records (FRBR) bandwagon, as it was a "new" and "great" leap forward with regard to describing and cataloguing the work that makes up a creative effort. Back in 2004, I started coding a Perl implementation of FRBR entitled LibDB, then eventually moved it to a (now unpublished) Drupal module and wrote up a relational database schema for it. The relational schema was the only decent output of both attempts, and it still lives on in the Drupal code repository. But, as things go, the attempt died, though I never stopped thinking about it.

General discontent with the much-loved Delicious Library kept alive the desire to roll my own, but the time never arrived, nor do I see it coming anytime soon. After having the wonders of CouchDB extolled to me in 2009, I figured my first experiment should be with FRBR, and I even started a thread on the mailing list about it. General "eh"-ness with CouchDB, however, had me moving on before anything progressed.

Drupal 7 is "nearing" release and I'm once again thinking about FRBR. Drupal 7 now has the ability to add custom fields to its content types, functionality that previously required the contributed module CCK. While CCK, as a framework, had tons of additional third-party modules that provided different types of fields, Drupal 7 doesn't, solely because it isn't in the wild yet. I don't consider this bad news, really, because I've always been of the opinion that most of the contributed modules available to Drupal are crap. They scratch itches, certainly, but very few of them are what I'd consider quality productions. So, for me, thinking about Drupal 7 and FRBR is constrained to "core" and "my own custom code". Primarily, I'm interested to see just how much of FRBR could be modeled without custom code at all, so I've made some odd decisions to accentuate this. One could even accuse me of "just" making a boring old cataloguing system: regardless, I'm doing it with FRBR's model fully in mind.

The following serves mostly to jot down my notes and experiments.

Group 1 Entities

As with many who sit down to work with FRBR, one of my biggest stumbling blocks is exactly how to define the Work and Expression entities (the WE of WEMI). For my purposes, I'd focus only on Manifestations and Items (the MI of WEMI). You can always tack WE on top of MI, but you could never implement FRBR without MI. As smallest building blocks go, MI is the place to start.

This leads to the first odd decision I've been leaning toward: Manifestations would be a Drupal content type, but Items would not. To back up even further, there'd be no single content type called Manifestation: instead, I'd implement a content type for Books, another content type for Movies, and so forth. This saves me from complicating the UI with an extra step and extra code: a single Manifestation content type with selectable sub-types would involve complicated form and UI custom code to pull off effectively. Individual content types allow me to define (and share, as necessary) Drupal 7 fields specific to the Book or Movie without custom code based on sub-type selection.

Each of these content types would be a form of Manifestation (an Album manifestation, a Comic Book manifestation, etc.). According to WEMI, the form of these creative endeavors is specified in the Work entity. Since I'm not modeling WE, putting them in Manifestation is good enough - if WE is ever added, the Work would happily inherit the type from M (since a W book could only contain Es, Ms, and Is that were also books, implied or otherwise).

Back to Items. One of the CCK field types that did not make it into Drupal 7 core was node reference: a field to relate the current node to another existing node. Its absence makes some sense from a usability standpoint, as there's never been a clean solution to creating a new node within a reference field itself. Say you create a Book node, and one of the node reference fields is labeled "Author". You dutifully type in "Stephen King", but since you've never defined the Stephen King node previously, it just won't work - you'd be forced to stop what you actually wanted to do and go create "Stephen King" first. There have been a few attempts in CCK-land to solve this problem, but none that are elegant or polished enough for core.

So, for Items, I can't think of any other way than custom code: a simple relational table keyed to user ID and node ID, with a few fields for location, condition, and notes. There might be an "Add an Item" section on the Manifestation's view page (or even as a second form after the initial node submit), but even this isn't a huge requirement for a first version, since one could assume that if you're adding a Manifestation to your collection, you also have custodianship over a matching Item. Either way, I don't see the necessary custom code as being anything but rote. I am sure, though willing to be convinced otherwise, that Items should not be nodes: I can't find any compelling use for a node's feature set, and I'd be aghast at 100,000 teenage girls creating 125,000 nodes detailing Twilight Book 1's location on their Favorite Books shelf (with some divining their autographed versions too). Similarly, in a single-user environment, the mental hurdles a non-librarian would have to leap over to justify two nodes for "the single book I bought today" are not cleanly solvable without more custom code than the non-node alternative.
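
To make the Items table concrete, here is a minimal sketch in Python with SQLite. It is not Drupal code: the table name, column names, and the add_item() and items_for() helpers are all hypothetical, and I'm assuming a user may own more than one copy of the same Manifestation, so (uid, nid) is indexed rather than unique.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not Drupal's.
SCHEMA = """
CREATE TABLE item (
    item_id        INTEGER PRIMARY KEY,
    uid            INTEGER NOT NULL,  -- ID of the user who holds the Item
    nid            INTEGER NOT NULL,  -- node ID of the Manifestation (a Book, a Movie, ...)
    location       TEXT,
    item_condition TEXT,
    notes          TEXT
);
-- Assumption: several copies per user are allowed, so this index is not UNIQUE.
CREATE INDEX item_uid_nid ON item (uid, nid);
"""


def add_item(db, uid, nid, location=None, item_condition=None, notes=None):
    """Record one physical Item of a Manifestation and return its row ID."""
    cur = db.execute(
        "INSERT INTO item (uid, nid, location, item_condition, notes) "
        "VALUES (?, ?, ?, ?, ?)",
        (uid, nid, location, item_condition, notes),
    )
    return cur.lastrowid


def items_for(db, uid, nid):
    """List the Items a given user holds for a given Manifestation node."""
    rows = db.execute(
        "SELECT location, item_condition, notes FROM item "
        "WHERE uid = ? AND nid = ? ORDER BY item_id",
        (uid, nid),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Made-up sample data: user 1 owns two copies of the Book stored as node 42.
add_item(db, uid=1, nid=42, location="Favorite Books shelf", item_condition="good")
add_item(db, uid=1, nid=42, location="Office", notes="autographed")
```

Calling items_for(db, 1, 42) returns both rows, while any other user gets an empty list for the same node: the Manifestation is stored once, and only the small per-user rows multiply.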

Group 2 Entities

FRBR encourages relationships between entities: if there's a relationship called "Author", then a "Person" is related to a "Book" with relationship type "Author". This allows an unlimited number of relationship types to be created, but also allows relationships to tie any entity to any other entity: this "Work" is related to another "Work" with relationship type "Sequel", this "Corporate Body" is related to this "Expression" with relationship type "Special Effects", and so forth. Drupal 7 has no relationship API built in, so to accomplish the above generic relations, we'd need custom code.
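
The generic shape of such relationships is a typed link between any two entities. A small Python sketch of that data structure follows; the Entity and Relationship classes, the related() helper, and every sample name other than "Stephen King" are my own illustrative assumptions, not anything FRBR or Drupal defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "Work", "Expression", "Person", "Corporate Body"
    name: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    rel_type: str  # free-form, so new relationship types cost nothing
    target: Entity


def related(relationships, entity, rel_type=None):
    """Return the relationships touching an entity, optionally of one type."""
    return [
        r
        for r in relationships
        if entity in (r.source, r.target)
        and (rel_type is None or r.rel_type == rel_type)
    ]


# Made-up sample data mirroring the three examples in the text.
king = Entity("Person", "Stephen King")
book = Entity("Manifestation", "Example Book")
work = Entity("Work", "Example Work")
sequel = Entity("Work", "Example Sequel")
studio = Entity("Corporate Body", "Example Effects House")
expression = Entity("Expression", "Example Expression")

relationships = [
    Relationship(king, "Author", book),
    Relationship(sequel, "Sequel", work),
    Relationship(studio, "Special Effects", expression),
]
```

Nothing in this structure restricts which kinds of entity may be linked or how a link is labeled, which is exactly the flexibility that core Drupal 7 fields can't give without custom code.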

In striving to satisfy the "as much core as possible" mentality, the Person and Corporate Body of FRBR become another odd decision. For the same reason as Items, Persons and Corporate Bodies aren't nodes either: this time, they're taxonomy terms in a single vocabulary. I have no qualms about this: I am interested in endeavors, not the endeavorers. I have no intention of blurbing each particular entity, or defining their birthdate or address or website, or any such. In Drupal 7, being a taxonomy term doesn't preclude this if someone else wanted to do it (and FRBR does define fields for each of these entities, so it wouldn't be unheard of).

There are two important changes to taxonomies in Drupal 7: one, terms can be modified with fields just like content types (you could add a radio for Person or Corporate Body, a text field for birthdate or website, etc.) and two, a single vocabulary can be applied to the same content type as many times as you deem necessary. This allows us to fake up a hardcoded relationship system.

My mental model asserts a single vocabulary called "Endeavorers" (which sorely needs a better name; "Responsibility" sucks too) which contains terms for both Persons and Corporate Bodies. If distinguishing between the two is a must, we could define and maintain the radio field suggested above. This "Endeavorers" vocabulary is then associated as many times as necessary with a particular content type: once each for "Author", "Illustrator", "Editor", and "Publisher" for Books, once each for "Director", "Producer", "Distributor" for Movies, and so forth. Each field would be an "unlimited" "autocomplete term widget (tagging)"... in other words, you could create as many new entities, or autocomplete existing entities, as needed per field.

This nicely solves the UI problem described above for Items and Stephen King, and has a few additional extras thrown in. We use a single vocabulary for both entities so that a single "Producer" field could contain "J. R. Bookwalter" (a Person) and "Tempe Video" (a Corporate Body). If one were to head to a particular term's URL (ex. taxonomy/term/13), we would see all nodes associated with it, regardless of whether the term was an Author here, an Editor there, or a Gaffer elsewhere. On the other hand, if we ONLY wanted to show nodes where the term was used solely in the Director field, that'd require custom code or, likely, Views. I don't think that sort of filtering would be necessary for the first few versions, nor do I think it's difficult to implement.
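
The difference between the term page and the per-field filter can be shown with a toy model. This is a plain Python sketch, not the Drupal taxonomy API: the node dictionaries, their titles, and the nodes_for_term() helper are hypothetical, and the sample credits are invented purely for illustration.

```python
# Made-up sample nodes: each content type attaches the one "Endeavorers"
# vocabulary several times, once per role-named field.
nodes = [
    {
        "nid": 1,
        "type": "Book",
        "title": "Example Book",
        "fields": {"Author": ["Stephen King"]},
    },
    {
        "nid": 2,
        "type": "Movie",
        "title": "Example Movie",
        "fields": {
            "Director": ["J. R. Bookwalter"],
            "Producer": ["J. R. Bookwalter", "Tempe Video"],
        },
    },
    {
        "nid": 3,
        "type": "Movie",
        "title": "Another Example Movie",
        "fields": {
            "Producer": ["Stephen King"],
            "Distributor": ["Tempe Video"],
        },
    },
]


def nodes_for_term(nodes, term, field=None):
    """Return node IDs that use a term, in any field or only in `field`."""
    matches = []
    for node in nodes:
        for field_name, terms in node["fields"].items():
            if term in terms and (field is None or field_name == field):
                matches.append(node["nid"])
                break  # one hit per node is enough
    return matches
```

With field left as None, the function behaves like the term page: nodes_for_term(nodes, "Tempe Video") finds nodes 2 and 3 regardless of role. Passing field="Director" is the narrower query that would need Views or custom code.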

Group 3 Entities

Group 3 Entities are handled just like Group 2 entities: a single vocabulary called "Subjects" that would be added to a Manifestation's content type four times: once each for "Concept", "Object", "Event", and "Place". This allows a single term to be used in multiple ways: "Shangri-La" might be a concept in this book, but a place in that movie. Browsing to the "Shangri-La" term would show matches for both Book and Movie. I find this particular approach necessary as I just don't trust myself to remember how I categorized the "Shangri-La" in a particular character's drug-induced hallucination from a book I read 12 years ago. Again, filtering down to particular types would come at a later date.

And that completes my current mental map on Drupal 7 and FRBR. Nothing has been implemented, but I'd lean toward a full-fledged module rather than an installation profile (which could set up all the above, but would suffer from the inability to upgrade existing installs) (see comments). I'm also envisioning another data exchange layer on top of the above, where one would put in an identifier (ISBN, ISSN, UPC, ASIN, etc.) and click a button to prefill the fields. The lack of fieldsets for custom fields in Drupal 7 would cause the above forms to look ungrouped and ugly, but a module could fix that up as well. I might end up actually implementing all the above in a demo site the next time I have a few hours to kill: if anyone is interested in seeing such a demo, don't hesitate to email morbus@disobey.com.

There are a few gaps in what this overall approach can do: it doesn't handle collections or serials nicely (i.e. a book with chapters, or a magazine with articles, written by different people, etc.) and if I were to go down that path, I probably would end up making a second "class" of content type, with a node reference, and there'd be a specific sub-content type per form (Sequence for Comics, Article for Magazines, Chapter/Section for Book, Short Film for movie anthologies, etc.). Also, Persons or Bodies with the same name could be solved the IMDb way ("Stephen King (I)", "Stephen King (II)") or the traditional way (birthdate on taxonomy term, autocomplete tweak to show "Stephen King, 1947-", etc.). And I'm sure I'll find more as I devote more mental slices to it.
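
The two disambiguation styles for same-named Persons or Bodies are easy to sketch. The Python below is only an illustration: imdb_labels(), birthdate_label(), and roman() are hypothetical helpers, and assigning numerals in order of appearance is my assumption.

```python
from collections import Counter


def roman(n):
    """Convert a positive integer to a Roman numeral."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def imdb_labels(names):
    """Suffix duplicated names with (I), (II), ... in order of appearance."""
    totals = Counter(names)
    seen = Counter()
    labels = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            labels.append(f"{name} ({roman(seen[name])})")
        else:
            labels.append(name)  # unique names are left untouched
    return labels


def birthdate_label(name, birth_year):
    """Traditional style: the name followed by an open-ended birth year."""
    return f"{name}, {birth_year}-"
```

The IMDb style needs nothing beyond the names themselves, while the traditional style depends on a birthdate field being present and filled in on the taxonomy term.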

For the three people who know what I'm on about: thoughts?