Drupal 7 and FRBR: A Mental Model

I have long been interested in librarian technology and cataloguing, even though I'm far from any sort of expert in the matter. When I started my "real" research into the subject, I jumped on the Functional Requirements for Bibliographic Records (FRBR) bandwagon, as it was a "new" and "great" leap forward in regards to describing and cataloguing the work that makes up a creative effort. Back in 2004, I started coding a Perl implementation of FRBR entitled LibDB, then eventually moved it to a (now unpublished) Drupal module and wrote up a relational database schema for it. The relational schema was the only decent output of both attempts, and it still lives on in the Drupal code repository. But, as things go, the attempt died, though I never stopped thinking about it.

General discontent with much-loved Delicious Library kept the desire to roll-my-own, but the time never arrived, nor do I see it coming anytime soon. After being extolled the wonders of CouchDB in 2009, I figured my first experiment should be with FRBR, and I even started a thread on the mailing list about it. General "eh"-ness with CouchDB, however, had me moving on before anything progressed.

Drupal 7 is "nearing" release and I'm once again thinking about FRBR. 7 now has the ability to add custom fields to its content types, functionality that previously required the contributed module CCK. While CCK, as a framework, had tons of additional third-party modules that mocked up different types of fields, Drupal 7 doesn't, solely because it isn't in the wild yet. I don't consider this bad news, really, because I've always been of the opinion that most of the contributed modules available to Drupal are crap. They scratch itches, certainly, but very few of them are what I'd consider quality productions. So, for me, thinking about Drupal 7 and FRBR is thus constrained to "core" and "my own custom code". Primarily, I'm interested to see just how much of FRBR could be modeled without custom code at all, so I've made some odd decisions to accentuate this. One could even accuse me of "just" making a boring old cataloguing system: regardless, I'm doing it with FRBR's model fully in mind.

The following serves mostly to jot down my notes and experiments.

Group 1 Entities

Like many who sit down to work with FRBR, one of the biggest stumbling blocks is exactly how to define the Work and Expression entities (the WE of WEMI). For my purposes, I'd only focus on Manifestations and Items (the MI of WEMI). You can always tack on WE on top of MI, but you could never implement FRBR without MI. As smallest building blocks go, MI is the place to start.

This leads to the first odd decision I've been leaning toward: Manifestations would be a Drupal content type, but Items would not. To back up even further, there'd be no single content type called Manifestation: instead, I'd implement a content type for Books, another content type for Movies, and so forth. This saves me from complicating the UI with an extra step and extra code: a single Manifestation content type with selectable sub-types would involve complicated form and UI custom code to pull off effectively. Individual content types allow me to define (and share, as necessary) Drupal 7 fields specific to the Book or Movie without custom code based on sub-type selection.

Each of these content types would be a form of Manifestation (an Album manifestation, a Comic Book manifestation, etc.). According to WEMI, the form of these creative endeavors is specified in the Work entity. Since I'm not modeling WE, putting them in Manifestation is good enough - if WE is ever added, the Work would happily inherit the type from M (since a W book could only contain Es, Ms, and Is that were also books, implied or otherwise).

Back to Items. One of the CCK field types that did not make it into Drupal 7 core was node reference: a field to relate the current node to another existing node. Its absence makes some sense from a usability level, as there's never been a clean solution to creating a new node within a reference field itself. Say you create a Book node, and one of the node reference fields is labeled "Author". You dutifully type in "Stephen King", but since you've never defined the Stephen King node previously, it just won't work - you'd be forced to stop what you actually wanted to do and go create "Stephen King" first. There have been a few attempts in CCK-land to solve this problem, but none that are entirely elegant or perfect enough for core.

So, for Items, I can't think of any other way than custom code: a simple relational table keyed to user ID and node ID, with a few fields for location, condition, and notes. There might be an "Add an Item" section on the Manifestation's view page (or even as a second form after the initial node submit), but even this isn't a huge requirement for a first version, since one could assume that if you're adding a Manifestation to your collection, that you also have custodianship over a matching Item. Either way, I don't see the necessary custom code as being anything but rote. I am sure, though willing to be convinced otherwise, that Items should not be nodes: I can't find any compelling use for a node's feature set, and I'd be aghast at 100,000 teenage girls creating 125,000 nodes detailing Twilight Book 1's location on their Favorite Books shelf (with some divining their autographed versions too). Similarly, in a single-user environment, the mental hurdles a non-librarian would have to leap through to justify two nodes for "the single book I bought today" is not cleanly solvable without more custom code than the non-node alternative.

Group 2 Entities

FRBR encourages relationships between entities: if there's a relationship called "Author", then a "Person" is related to a "Book" with relationship type "Author". This allows an unlimited number of relationship types to be created, but also allows relationships to tie any entity to any other entity: this "Work" is related to another "Work" with relationship type "Sequel", this "Corporate Body" is related to this "Expression" with relationship type "Special Effects", and so forth. Drupal 7 has no relationship API built in, so to accomplish the above generic relations, we'd need custom code.

In striving to satisfy the "as much as core as possible" mentality, the Person and Corporate Body of FRBR become another odd decision. For the same reason as Items, Persons and Corporate Bodies aren't nodes either: this time, they're taxonomy terms in a single vocabulary. I have no qualms about this: I am interested in endeavors, not the endeavorers. I have no intention of blurbing each particular entity, or defining their birthdate or address or website, or any such. In Drupal 7, being a taxonomy term doesn't preclude this if someone else wanted to do it (and FRBR does define fields for each of these entities, so it wouldn't be unheard of).

There are two important changes to taxonomies in Drupal 7: one, terms can be modified with fields just like content types (you could add a radio for Person or Corporate Body, a text field for birthdate or website, etc.) and two, a single vocabulary can be applied to the same content type as many times as you deem necessary. This allows us to fake up a hardcoded relationship system.

My mental model asserts a single vocabulary called "Endeavorers" (which sorely needs a better name; "Responsibility" sucks too) which contains terms for both Persons and Corporate Bodies. If distinguishing between the two is a must, we could define and maintain the radio field suggested above. This "Endeavorers" vocabulary is then associated as many times as necessary with a particular content type: once each for "Author", "Illustrator", "Editor", and "Publisher" for Books, once each for "Director", "Producer", "Distributor" for Movies, and so forth. Each field would be an "unlimited" "autocomplete term widget (tagging)"... in other words, you could create as many new entities, or autocomplete existing entities, as needed per field.

This nicely solves the UI problem described above for Items and Stephen King, and has a few additional extras thrown in. We use a single vocabulary for both entities so that a single "Producer" field could contain "J. R. Bookwalter" (a Person) and "Tempe Video" (a Corporate Body). If one were to head to a particular term's URL (ex. taxonomy/term/13), we would see all nodes associated with them, regardless if they were an Author here, an Editor there, or a Gaffer elsewhere. On the other hand, if we ONLY wanted to show nodes where the term was used solely in the Director field, that'd require custom code or, likely, Views. I don't think that sort of filtering would be necessary for the first few versions, nor do I think it's difficult to implement.

Group 3 Entities

Group 3 Entities are handled just like Group 2 entities: a single vocabulary called "Subjects" that would be added to a Manifestation's content type four times: once for "Concept", "Object", "Event", and "Place". This allows a single term to be used in multiple ways: "Shangri-La" might be a concept in this book, but a place in that movie. Browsing to the "Shangri-La" term would show matches for both Book and Movie. I find this particular approach necessary as I just don't trust myself to remember how I categorized the "Shangri-La" in a particular character's drug-induced hallucination from a book I read 12 years ago. Again, filtering down to particular types would come at a later date.

And that completes my current mental map on Drupal 7 and FRBR. Nothing has been implemented, but I'd lean toward a full-fledged module than an installation profile (which could setup all the above, but would suffer from the inability to upgrade existing installs) (see comments). I'm also envisioning another data exchange layer on top of the above, where one would put in an identifier (ISBN, ISSN, UPC, ASIN, etc.) and click a button to prefill the fields. The lack of fieldsets for custom fields in Drupal 7 would cause the above forms to look ungrouped and ugly, but a module could fix that up as well. I might end up actually implementing all the above in a demo site the next time I have a few hours to kill: if anyone is interested in seeing such a demo, don't hesitate to email morbus@disobey.com.

There are a few gaps in what this overall approach can do: it doesn't handle collections or serials nicely (i.e. a book with chapters, or a magazine with articles, written by different people, etc.) and if I were to go down that path, I probably would end up making a second "class" of content type, with a node reference, and there'd be a specific sub-content type per form (Sequence for Comics, Article for Magazines, Chapter/Section for Book, Short Film for movie anthologies, etc.). Also, Persons or Bodies with the same name could be solved the IMDb way ("Stephen King (I)", "Stephen King (II)") or the traditional way (birthdate on taxonomy term, autocomplete tweak to show "Stephen King, 1947-", etc.). And I'm sure I'll find more as I devote more mental slices to it.

For the three people who know what I'm on about: thoughts?

Comments

quote 1:

I'd lean toward a full-fledged module than an installation profile (which could setup all the above, but would suffer from the inability to upgrade existing installs)

quote 2 from: http://lists.drupal.org/pipermail/development/2009-September/033741.html

you can also do everything from install profiles you can do with modules including use the full Drupal API and write update functions to move from one version to another.

Wow, it sounds like you have done an amazing amount of work towards realizing FRBR. I'm afraid I don't do Drupal, but would be really interested in seeing a test site if you have progressed to that point. Also, I would like to say I greatly respect your willingness to follow FRBR to a T. If I had the time to take a hack at it, I would be tempted to mess with parts of it that rub me the wrong way.

Thanks for alerting us on the FRBR list.

Best of luck in your endeavours!

Unfortunately enough, I *do* know what you are on about.
I've been looking into current tech for archive records, bibliographies and citations for several groups now, some of it related to our http://digitalnz.org.nz/ project which aims to tie together NZs National Library, Archives, Museum collections, and private collections. Several of these,
Hosting for private collections: http://thecommunityarchive.org.nz/
The Encyclopedia of NZ: http://www.teara.govt.nz/
Are already Drupal. And more to come.
Jonathan Hunt has developed an interface to the public API http://drupal.org/project/digitalnz

So there is interesting movement in this field where I'm standing. This may not be *exactly* what you are looking at with bibliographic records ... but it's got a lot of cross-over in the requirements, I think. These guys have been using OAI-PMH for their indexing. I don;t know how that compares to FRBR, or if it's even the same domain. BUT my thoughts on your thoughts are the same as my thoughts on this several years ago The "generalized relationship module" discussion - I call it "RDF Metadata"

You are trying to map relationships between entities. And are looking for the right sort of model to do that with. And it's hairy. Drupal7 does not have node relationships in core. But it DOES have RDF!!! You can use THAT to add the hasAuthor or publishDate for all your info. And avoid the whole object modelling issue altogether. You can create nodes that have type=media:Video and stuff like that.

I'm not sure of the best deployment approach for this, though. It's a big ball of string whichever way you approach it. I'm in favour of "patterns" in install profiles myself (define content types, define views) - for the config stage, but you are going to need something custom to hold it all together.

I know (boy do I know) that this approach has drawbacks and overhead. But I think - now that RDF is a reality - it's worth looking at again.

dman: I'm familiar with your existing Drupal work, and it was a pleasure to see your comment!
Currently working on a demo of all the above; should have it ready within a week.

I ran through a lot of these same questions over a year ago when I was developing the Library module for Drupal 6. I ended up with a system where multiple content types could be designated as 'manifestations' (to use your language) and I used a non-node 'item' container for individual occurrences. I was working on a comic book library however - and did run into problems in trying to associate individual issues within a collection.

Add new comment