Drupal

 

This took me far too long to figure out, but here's how you create a vocabulary, and then a taxonomy field and instance, all through an .install file (or anywhere else you really care to put it). Took a bunch of debugging before I hit on the right magical combination. This is part of a demo I'm working on for the Drupal 7 and FRBR mental model I published last night.

  // Create the "Endeavorers" vocabulary, which holds both Persons and
  // Corporate Bodies (FRBR Group 2 entities).
  $libdb_vocabulary_endeavorers = (object) array(
    'name'          => 'Endeavorers',
    'machine_name'  => 'libdb_endeavorers',
    'description'   => t('FRBR Group 2 entities: Persons and Corporate Bodies.'),
  );
  taxonomy_vocabulary_save($libdb_vocabulary_endeavorers);

  // Define an unlimited term reference field restricted to that vocabulary.
  // The Drupal 7.0 Field API identifies the vocabulary by machine name.
  $libdb_m_group2_field = array(
    'field_name'  => 'libdb_m_authors',
    'type'        => 'taxonomy_term_reference',
    'cardinality' => FIELD_CARDINALITY_UNLIMITED,
    'settings' => array(
      'allowed_values' => array(
        array(
          'vocabulary' => 'libdb_endeavorers',
          'parent'     => 0,
        ),
      ),
    ),
  );

  // Attach the field to the libdb_book content type with the tagging
  // (autocomplete) widget, so new terms can be created while typing.
  $libdb_m_group2_instance = array(
    'field_name'  => 'libdb_m_authors',
    'entity_type' => 'node',
    'bundle'      => 'libdb_book',
    'label'       => 'Authors',
    'required'    => TRUE,
    'description' => t('Enter a comma-separated list of persons or corporations.'),
    'widget' => array(
      'type' => 'taxonomy_autocomplete',
    ),
  );
  field_create_field($libdb_m_group2_field);
  field_create_instance($libdb_m_group2_instance);

I have long been interested in library technology and cataloguing, even though I'm far from any sort of expert in the matter. When I started my "real" research into the subject, I jumped on the Functional Requirements for Bibliographic Records (FRBR) bandwagon, as it was a "new" and "great" leap forward with regard to describing and cataloguing the work that makes up a creative effort. Back in 2004, I started coding a Perl implementation of FRBR called LibDB, then eventually moved it to a (now unpublished) Drupal module and wrote up a relational database schema for it. The relational schema was the only decent output of both attempts, and it still lives on in the Drupal code repository. But, as things go, the attempt died, though I never stopped thinking about it.

General discontent with the much-loved Delicious Library kept my desire to roll my own alive, but the time never arrived, nor do I see it coming anytime soon. After hearing the wonders of CouchDB extolled in 2009, I figured my first experiment with it should be FRBR, and I even started a thread on the mailing list about it. General "eh"-ness with CouchDB, however, had me moving on before anything progressed.

Drupal 7 is "nearing" release and I'm once again thinking about FRBR. Drupal 7 can now add custom fields to its content types, functionality that previously required the contributed CCK module. While CCK, as a framework, had tons of third-party modules that mocked up different types of fields, Drupal 7 doesn't yet, simply because it isn't in the wild. I don't consider this bad news, really, because I've always been of the opinion that most of the contributed modules available to Drupal are crap. They scratch itches, certainly, but very few of them are what I'd consider quality productions. So, for me, thinking about Drupal 7 and FRBR is constrained to "core" and "my own custom code". Primarily, I'm interested to see just how much of FRBR could be modeled without custom code at all, so I've made some odd decisions to accentuate this. One could even accuse me of "just" making a boring old cataloguing system: regardless, I'm doing it with FRBR's model fully in mind.

The following serves mostly to jot down my notes and experiments.

Group 1 Entities

As for many who sit down to work with FRBR, one of my biggest stumbling blocks is exactly how to define the Work and Expression entities (the WE of WEMI). For my purposes, I'll focus only on Manifestations and Items (the MI of WEMI). You can always tack WE on top of MI, but you could never implement FRBR without MI. As smallest building blocks go, MI is the place to start.

This leads to the first odd decision I've been leaning toward: Manifestations would be a Drupal content type, but Items would not. To back up even further, there'd be no single content type called Manifestation: instead, I'd implement a content type for Books, another content type for Movies, and so forth. This saves me from complicating the UI with an extra step and extra code: a single Manifestation content type with selectable sub-types would involve complicated form and UI custom code to pull off effectively. Individual content types allow me to define (and share, as necessary) Drupal 7 fields specific to the Book or Movie without custom code based on sub-type selection.

Each of these content types would be a form of Manifestation (an Album manifestation, a Comic Book manifestation, etc.). According to WEMI, the form of these creative endeavors is specified in the Work entity. Since I'm not modeling WE, putting them in Manifestation is good enough - if WE is ever added, the Work would happily inherit the type from M (since a W book could only contain Es, Ms, and Is that were also books, implied or otherwise).

Back to Items. One of the CCK field types that did not make it into Drupal 7 core was node reference: a field to relate the current node to another existing node. Its absence makes some sense from a usability standpoint, as there's never been a clean solution to creating a new node within a reference field itself. Say you create a Book node, and one of the node reference fields is labeled "Author". You dutifully type in "Stephen King", but since you've never defined the Stephen King node previously, it just won't work - you'd be forced to stop what you actually wanted to do and go create "Stephen King" first. There have been a few attempts in CCK-land to solve this problem, but none that are entirely elegant or perfect enough for core.

So, for Items, I can't think of any other way than custom code: a simple relational table keyed to user ID and node ID, with a few fields for location, condition, and notes. There might be an "Add an Item" section on the Manifestation's view page (or even a second form after the initial node submit), but even this isn't a huge requirement for a first version, since one could assume that if you're adding a Manifestation to your collection, you also have custodianship over a matching Item. Either way, I don't see the necessary custom code as being anything but rote. I am sure, though willing to be convinced otherwise, that Items should not be nodes: I can't find any compelling use for a node's feature set, and I'd be aghast at 100,000 teenage girls creating 125,000 nodes detailing Twilight Book 1's location on their Favorite Books shelf (with some divining their autographed versions too). Similarly, in a single-user environment, the mental hurdles a non-librarian would have to leap to justify two nodes for "the single book I bought today" can't be cleared without more custom code than the non-node alternative requires.
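The Items table described above could be declared with a schema array along these lines. This is a minimal sketch, not code from an actual module: the module name libdb, the table name, and every column name are assumptions, and the array simply follows the format a Drupal 6/7 hook_schema() implementation returns, so it also runs as plain PHP outside Drupal.

```php
<?php
// Hypothetical schema for the Items table, in the array format returned by
// a Drupal 6/7 hook_schema() implementation. Module, table, and column
// names are illustrative assumptions.
function libdb_schema() {
  $schema['libdb_items'] = array(
    'description' => 'Copies (FRBR Items) of a Manifestation held by a user.',
    'fields' => array(
      'iid'            => array('type' => 'serial', 'unsigned' => TRUE, 'not null' => TRUE),
      'uid'            => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0),
      'nid'            => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0),
      'location'       => array('type' => 'varchar', 'length' => 255, 'not null' => TRUE, 'default' => ''),
      // "condition" is a reserved word in MySQL, hence the prefix.
      'item_condition' => array('type' => 'varchar', 'length' => 64, 'not null' => TRUE, 'default' => ''),
      'notes'          => array('type' => 'text', 'not null' => FALSE),
    ),
    'primary key' => array('iid'),
    // One user may own several copies of the same Manifestation, so
    // (uid, nid) is a plain index rather than a unique key.
    'indexes' => array(
      'uid_nid' => array('uid', 'nid'),
    ),
  );
  return $schema;
}

$schema = libdb_schema();
echo implode(', ', array_keys($schema['libdb_items']['fields'])), "\n";
```

Keeping Items out of the node table this way means a copy costs one narrow row rather than a full node with revisions, comments, and access records.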

Group 2 Entities

FRBR encourages relationships between entities: if there's a relationship called "Author", then a "Person" is related to a "Book" with relationship type "Author". This allows an unlimited number of relationship types to be created, but also allows relationships to tie any entity to any other entity: this "Work" is related to another "Work" with relationship type "Sequel", this "Corporate Body" is related to this "Expression" with relationship type "Special Effects", and so forth. Drupal 7 has no relationship API built in, so to accomplish the above generic relations, we'd need custom code.
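In its simplest form, that custom code might store each relationship as a (source, relationship type, target) triple, which lets new relationship types appear without any schema change. The sketch below keeps the triples in a PHP array so it runs on its own; the function names and the "kind:name" identifiers are hypothetical, and a real module would store the rows in a database table instead.

```php
<?php
// Hypothetical generic relationship store: each row is a
// (source, type, target) triple, so adding a relationship type such as
// "Sequel" or "Special Effects" needs no new columns.
function libdb_relate(array &$store, $source, $type, $target) {
  $store[] = array('source' => $source, 'type' => $type, 'target' => $target);
}

// Return every target related to $source, optionally limited to one type.
function libdb_related(array $store, $source, $type = NULL) {
  $targets = array();
  foreach ($store as $row) {
    if ($row['source'] === $source && ($type === NULL || $row['type'] === $type)) {
      $targets[] = $row['target'];
    }
  }
  return $targets;
}

$relationships = array();
libdb_relate($relationships, 'person:Stephen King', 'Author', 'book:Carrie');
libdb_relate($relationships, 'work:Book One', 'Sequel', 'work:Book Two');

echo implode(', ', libdb_related($relationships, 'person:Stephen King', 'Author')), "\n"; // book:Carrie
```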

In striving to satisfy the "as much core as possible" mentality, the Person and Corporate Body of FRBR lead to another odd decision. For the same reason as Items, Persons and Corporate Bodies aren't nodes either: this time, they're taxonomy terms in a single vocabulary. I have no qualms about this: I am interested in endeavors, not the endeavorers. I have no intention of blurbing each particular entity, or defining their birthdate or address or website, or any such. In Drupal 7, being a taxonomy term doesn't preclude this if someone else wanted to do it (and FRBR does define fields for each of these entities, so it wouldn't be unheard of).

There are two important changes to taxonomies in Drupal 7: one, terms can be modified with fields just like content types (you could add a radio for Person or Corporate Body, a text field for birthdate or website, etc.) and two, a single vocabulary can be applied to the same content type as many times as you deem necessary. This allows us to fake up a hardcoded relationship system.

My mental model asserts a single vocabulary called "Endeavorers" (which sorely needs a better name; "Responsibility" sucks too) which contains terms for both Persons and Corporate Bodies. If distinguishing between the two is a must, we could define and maintain the radio field suggested above. This "Endeavorers" vocabulary is then associated as many times as necessary with a particular content type: once each for "Author", "Illustrator", "Editor", and "Publisher" for Books, once each for "Director", "Producer", "Distributor" for Movies, and so forth. Each field would be an "unlimited" "autocomplete term widget (tagging)"... in other words, you could create as many new entities, or autocomplete existing entities, as needed per field.
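Because the same vocabulary is attached once per role, the field and instance definitions differ only by name and label, so a module could generate them in a loop. This sketch only builds the arrays. It assumes the Drupal 7.0 release key names (taxonomy_term_reference, entity_type, and a vocabulary machine name inside allowed_values), and the libdb_<bundle>_<role> field-naming pattern is a hypothetical choice; inside Drupal, each array would be handed to field_create_field() and field_create_instance().

```php
<?php
// Drupal 7 defines this constant as -1; define it here so the sketch runs
// outside Drupal.
if (!defined('FIELD_CARDINALITY_UNLIMITED')) {
  define('FIELD_CARDINALITY_UNLIMITED', -1);
}

// Build one unlimited, autocomplete term reference field per role, all
// pointing at the single "Endeavorers" vocabulary.
function libdb_role_fields($bundle, array $roles, $vocabulary = 'libdb_endeavorers') {
  $fields = array();
  $instances = array();
  foreach ($roles as $role) {
    $name = 'libdb_' . $bundle . '_' . strtolower($role);
    $fields[] = array(
      'field_name'  => $name,
      'type'        => 'taxonomy_term_reference',
      'cardinality' => FIELD_CARDINALITY_UNLIMITED,
      'settings'    => array(
        'allowed_values' => array(array('vocabulary' => $vocabulary, 'parent' => 0)),
      ),
    );
    $instances[] = array(
      'field_name'  => $name,
      'entity_type' => 'node',
      'bundle'      => 'libdb_' . $bundle,
      'label'       => $role,
      'widget'      => array('type' => 'taxonomy_autocomplete'),
    );
  }
  return array($fields, $instances);
}

list($fields, $instances) = libdb_role_fields('book', array('Author', 'Illustrator', 'Editor', 'Publisher'));
echo count($fields), ' fields, first: ', $fields[0]['field_name'], "\n"; // 4 fields, first: libdb_book_author
```

The same call with 'movie' and array('Director', 'Producer', 'Distributor') would cover Movies; only the role list changes per content type.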

This nicely solves the UI problem described above for Items and Stephen King, and has a few additional extras thrown in. We use a single vocabulary for both entities so that a single "Producer" field could contain "J. R. Bookwalter" (a Person) and "Tempe Video" (a Corporate Body). If one were to head to a particular term's URL (ex. taxonomy/term/13), we would see all nodes associated with them, regardless of whether they were an Author here, an Editor there, or a Gaffer elsewhere. On the other hand, if we ONLY wanted to show nodes where the term was used solely in the Director field, that'd require custom code or, likely, Views. I don't think that sort of filtering would be necessary for the first few versions, nor do I think it's difficult to implement.

Group 3 Entities

Group 3 Entities are handled just like Group 2 entities: a single vocabulary called "Subjects" that would be added to a Manifestation's content type four times, once each for "Concept", "Object", "Event", and "Place". This allows a single term to be used in multiple ways: "Shangri-La" might be a concept in this book, but a place in that movie. Browsing to the "Shangri-La" term would show matches for both Book and Movie. I find this particular approach necessary as I just don't trust myself to remember how I categorized "Shangri-La" in a particular character's drug-induced hallucination from a book I read 12 years ago. Again, filtering down to particular types would come at a later date.

And that completes my current mental map on Drupal 7 and FRBR. Nothing has been implemented, but I'd lean toward a full-fledged module rather than an installation profile (which could set up all the above, but would suffer from the inability to upgrade existing installs) (see comments). I'm also envisioning another data exchange layer on top of the above, where one would put in an identifier (ISBN, ISSN, UPC, ASIN, etc.) and click a button to prefill the fields. The lack of fieldsets for custom fields in Drupal 7 would cause the above forms to look ungrouped and ugly, but a module could fix that up as well. I might end up actually implementing all the above in a demo site the next time I have a few hours to kill: if anyone is interested in seeing such a demo, don't hesitate to email morbus@disobey.com.
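A first step for that data exchange layer might be normalizing whatever identifier the user types before any lookup happens. As a hedged example, the hypothetical helper below checks an ISBN-10 or ISBN-13 check digit and converts both to a bare ISBN-13 (re-prefixing an ISBN-10 with 978 is the standard conversion); querying an external service with the result is left out entirely.

```php
<?php
// ISBN-13 check digit: alternate weights 1 and 3 over the first 12 digits.
function isbn13_check_digit($first12) {
  $sum = 0;
  for ($i = 0; $i < 12; $i++) {
    $sum += (int) $first12[$i] * ($i % 2 === 0 ? 1 : 3);
  }
  return (string) ((10 - $sum % 10) % 10);
}

// Hypothetical helper: accept an ISBN-10 or ISBN-13 (hyphens and spaces
// allowed) and return a bare ISBN-13, or NULL if the check digit is wrong.
function normalize_isbn($input) {
  $isbn = strtoupper(preg_replace('/[\s-]/', '', $input));
  if (preg_match('/^\d{9}[\dX]$/', $isbn)) {
    // ISBN-10: weights 10 down to 1; the sum must be divisible by 11.
    $sum = 0;
    for ($i = 0; $i < 10; $i++) {
      $digit = ($isbn[$i] === 'X') ? 10 : (int) $isbn[$i];
      $sum += (10 - $i) * $digit;
    }
    if ($sum % 11 !== 0) {
      return NULL;
    }
    // Re-prefix with 978 and compute a fresh ISBN-13 check digit.
    $isbn = '978' . substr($isbn, 0, 9);
    return $isbn . isbn13_check_digit($isbn);
  }
  if (preg_match('/^\d{13}$/', $isbn)) {
    return isbn13_check_digit(substr($isbn, 0, 12)) === $isbn[12] ? $isbn : NULL;
  }
  return NULL;
}

var_dump(normalize_isbn('0-306-40615-2'));     // string(13) "9780306406157"
var_dump(normalize_isbn('978-0-306-40615-7')); // string(13) "9780306406157"
var_dump(normalize_isbn('0-306-40615-3'));     // NULL
```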

There are a few gaps in what this overall approach can do: it doesn't handle collections or serials nicely (e.g. a book with chapters, or a magazine with articles, written by different people, etc.) and if I were to go down that path, I would probably end up making a second "class" of content type, with a node reference, and there'd be a specific sub-content type per form (Sequence for Comics, Article for Magazines, Chapter/Section for Book, Short Film for movie anthologies, etc.). Also, Persons or Bodies sharing the same name could be disambiguated the IMDb way ("Stephen King (I)", "Stephen King (II)") or the traditional way (birthdate on taxonomy term, autocomplete tweak to show "Stephen King, 1947-", etc.). And I'm sure I'll find more as I devote more mental slices to it.
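Both naming schemes are easy to generate. These hypothetical helpers only build display labels: the first numbers duplicate names IMDb-style, and the second produces the "Name, birth year-" form. Where the birth year would come from (for example, a field on the taxonomy term) is an assumption left outside the sketch.

```php
<?php
// IMDb style: only names that occur more than once get a Roman numeral,
// numbered in the order they appear. Handles up to ten duplicates.
function imdb_style_labels(array $names) {
  $numerals = array('I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X');
  $totals = array_count_values($names);
  $seen = array();
  $labels = array();
  foreach ($names as $name) {
    if ($totals[$name] === 1) {
      $labels[] = $name;
      continue;
    }
    $seen[$name] = isset($seen[$name]) ? $seen[$name] + 1 : 0;
    $labels[] = $name . ' (' . $numerals[$seen[$name]] . ')';
  }
  return $labels;
}

// Traditional style: "Name, birth-death", left open-ended for the living.
function dated_label($name, $birth_year, $death_year = NULL) {
  return $name . ', ' . $birth_year . '-' . ($death_year === NULL ? '' : $death_year);
}

echo implode(' | ', imdb_style_labels(array('Stephen King', 'Peter Straub', 'Stephen King'))), "\n";
// Stephen King (I) | Peter Straub | Stephen King (II)
echo dated_label('Stephen King', 1947), "\n"; // Stephen King, 1947-
```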

For the three people who know what I'm on about: thoughts?

SimpleTest, the test suite that Drupal started using, and then improved upon, has primarily been used to test modules in their own little sandbox, unaffected by the outside world, user data, or client-desired tweaks. This is perfectly fine when you're working on a controlled piece of code, like a module intended for release. When you're building a client site, however, you often have a much more ephemeral set of quality assurances to make: that this CCK node has seven fields, that this field doesn't show up for that particular user role, that "Body" has been renamed to "Description", or that the front page display has a certain set of blocks.

Each of those scenarios involves changes to the "in-progress" database, the one where the client is adding content, or Views are being configured, or blocks are being made and placed, et cetera. Since SimpleTest's default goal is to start fresh for each particular test method, creating new database tables that have no content in them, you'd never be able to test any of the tweaks the client wants on their live site. If someone accidentally deleted a field from a previously perfect content type, SimpleTest wouldn't catch it.

Thankfully, we can fix this by overriding the setUp() and tearDown() methods that our test classes inherit from DrupalWebTestCase - these methods normally handle the creation of the fake database and the cleanup of any fake created data. (Note: this assumes you're using SimpleTest 6.x-2.x or Drupal 7: earlier versions will not work.)

class FunctionalGenericTestCase extends DrupalWebTestCase {
  function getInfo() {
    return array(
      'name' => t('Generic functionality'),
      'description' => t('Generic user functionality tests for custom code.'),
      'group' => t('Trellon Development'),
    );
  }

  // for our functional testing, we want to test the pages and code that
  // we've been generating in the real database. to do this, we need to
  // ignore SimpleTest's normal fake database creation and fake data
  // deletion by overriding it with our own setUp and tearDown. NOTE that
  // if we make our own fake data, we're responsible for cleaning it up!
  function setUp() {
    // support existing database prefixes. if we didn't,
    // the prefix would be set as '', causing failures.
    $this->originalPrefix = $GLOBALS['db_prefix'];
  }
  function tearDown() { }

  // ensure items from mocks exist.
  function testFrontpageChanged() {
    $this->drupalGet('');
    $this->assertNoText(t('Welcome to your new Drupal website!'),
      t('Default Drupal front page has been changed.'));
  }

  // check for all end-user fields.
  function testContentTypeFields() {
    $this->drupalLogin((object)array('name' => 'authed', 'pass_raw' => 'ahem'));

    $this->drupalGet('node/add/story');
    $this->assertNoRaw('',
      t('"Body" renamed to "Story" per content-type-outline.pdf (2009-02-18).'));
  }
}

As noted in the code comments, by overriding tearDown() we are now responsible for cleaning up any fake data that our tests create. To lessen that burden, we plan to create a number of standard test users (one for each client-required user role) that we can log in as with drupalLogin(), as opposed to calling drupalCreateUser() with a custom set of permissions (which would leave us responsible for deleting the created user and role). We don't see this as anything too upsetting: these types of tests are all per-client anyway, so while we hope for some reuse (such as the useless default front page test above), the benefits of using SimpleTest for functional tests like these outweigh the extra cleanup.

Since the setUp() and tearDown() methods are per-class, we still have the ability to test any custom modules (or complex algorithms, etc.) in a fresh/sandbox environment - we'd just define a new testing class and leave out the overrides.

Note that this approach still emphasizes data and structure over actual display: tests would happily claim that Body has been renamed to Story, but wouldn't be able to tell us that an errant piece of CSS has caused that entire input field to be hidden from view. A human would still have to manually eyeball display issues (caused by CSS, etc.), as well as test any JavaScript functionality.

My Drupal IRC bot.module received a new release today, bringing it to 6.x-1.1:

  • bot_seen ping prevention matched inside strings; now only word boundaries.
  • #284666: We now use preg_quote() for various nickname escaping.
  • #349245: bot_tell doesn't consume username whitespace (thanks drewish).
  • #356003: bot_tell sorts queued messages by oldest first (thanks litwol).
  • #343245: Better regex for usernames like betz--; supports betz---- now.
  • #338723: Missing decode_entities() on project statuses (thanks RobLoach).
  • #313025: Better regex for log filtering to prevent substring matches.
  • #300206: Better factoid-ignoring of tell-like messages (thanks RobLoach).
  • #275042: Randomized messages now centralized in bot_randomized_choice().
  • #274888: Move all INSERTs and UPDATEs to use drupal_write_record().
  • #218577: bot_tell.module added (thanks Rob Loach). Additions/changes:
    • pending message queue now exists to remove SELECT on every message.
    • received messages use format_interval(), not a date (thanks litwol).
  • bot_factoid: PM a factoid with "BOTNAME: tell <who> about <factoid>".
  • #190825: Get URL to current logged discussion with "BOTNAME: log pointer?"
  • bot_name_regexp() now exists for matching inside a regular expression.
    • #117876: if bot has a nick clash, it now responds to both nicks.
    • #184015: bot name with non-word characters failed regexp addressing.
  • #137171: bot_karma.module added (thanks walkah/Rob Loach). Additions:
    • patch supported only words: committed version supports phrases.
    • if someone tries to karma themselves, the response is customizable.
    • drupal_write_record() is your friend; get used to using it!
    • "BOTNAME: karma foo?" is required to prevent bad parsing.
    • highest/lowest karmas are available at example.com/bot/karma.
    • terms shorter than 3 or longer than 15 characters are ignored.
  • #267560: OS-specific newlines broke comparisons (thanks Gurpartap Singh).
  • #245610: bot_agotchi greeting triggers now customizable (thanks Alan Evans).
  • #184032: ignore improper hook_help implementations (thanks John Morahan).
  • #229880: bot_factoid stopwords were case sensitive (thanks John Morahan).
  • #187137: Drupal 7 style concats, and other style fixes (thanks dmitrig01).
  • #167097: fixed undefined index and better host check (thanks czarphanguye).
  • #142812: auto reconnect and retry are now configurable (Morbus/Shiny).
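Several of the fixes above concern matching nicknames inside messages: word boundaries for bot_seen, preg_quote() escaping, and names containing non-word characters. The following is a minimal sketch of that idea, not bot.module's actual code. The function name is hypothetical, and the lookarounds stand in for \b because \b gives the wrong answer when a nick begins or ends with a character such as "-".

```php
<?php
// Hypothetical helper: is $nick mentioned in $message as a whole word?
// preg_quote() escapes regex metacharacters in the nick (e.g. "betz--"),
// and the lookarounds reject matches glued to word characters or hyphens.
function nick_mentioned($nick, $message) {
  $pattern = '/(?<![\w-])' . preg_quote($nick, '/') . '(?![\w-])/i';
  return preg_match($pattern, $message) === 1;
}

var_dump(nick_mentioned('Morbus', 'Morbus: ping'));      // bool(true)
var_dump(nick_mentioned('Morbus', 'MorbusIff said hi')); // bool(false)
var_dump(nick_mentioned('betz--', 'thanks betz--!'));    // bool(true)
var_dump(nick_mentioned('betz--', 'betz---- is here'));  // bool(false)
```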

I've also recently taken over maintenance of Node Adoption, which I've upgraded and released as 6.x-1.0: "Node Adoption allows you to automatically reassign nodes created by a deleted user to another user of your choice. Similarly, a form is provided to change ownership of all nodes from one user to another at any time. Node Adoption was originally maintained by Mark Dickson (ideaoforder) and sponsored by The Chicago Technology Cooperative. As of Drupal 6.x, Node Adoption is maintained by Morbus Iff."

Back on October 7th, 2007, I wrote that I was a judge over at Adrian Hon's newest project, Let's Change the Game, "a competition to fund development of an [alternate reality game (ARG)] that would raise money for Cancer Research UK". Besides building the Let's Change the Game site (in Drupal), I continued my involvement in the project by becoming an advisor to the winning team, Law 37. Now, a year later, the winner of that competition has just launched the alternate reality game Operation: Sleeper Cell, another Drupal site:

"Operation: Sleeper Cell will see teams of players from around the world working together to solve 'puzzle cells' in a grid. By donating money to the game, they can unlock extra cells for all players, and also advance the story, which takes place over websites, blogs, Twitter and even in real life."

My advisor role largely played to "how do ya do this in Drupal?" so, gladly, I've remained out of the content, missions, and puzzles produced. Gladly because, with the site launched, it looks so tasty that I'm quite happy to be along for the ride with all the other players. I hope to be sponsoring some cells, with proceeds donated to cancer research, sometime soon. Follow the progress of, or sponsor, team #swhack.

Operation: Sleeper Cell launches as another alternate reality experience closes: Liberty News, a companion to the BBC's Spooks: Code 9 from Kudos. The site was created by Adrian Hon's Six to Start and was built in Drupal by yours truly. Unfortunately, an IP filter denies non-UK residents, so you'll need to use Anonymouse.org to see it.

Over at Drupal Tough Love, chx and I just reviewed Signatures for Forums 5.x-2.3, which "provides user signatures that will be familiar to users of popular forum software", with features such as a selectable input filter ("the administrator can choose the input filter for signatures"), conditional signatures that are hidden "if a post is under a particular length", and showing the signature only once per conversation.

Friend Mark Bernstein promotes "software as craft" with the phrase NeoVictorian Computing. Jeremy recalls that "Part of his argument is that software creators have something to learn from the ideals of the arts and crafts movement: the software world is full of soulless bits and bytes, and maybe we would all be a little happier if we embraced handcraft ... During the talk, I remember Bernstein proposed that software creators should sign their work as a painter signs a painting, which is a lovely visual metaphor that I hope to keep around." And Greg Wilson co-edited (with Andy Oram) a book called Beautiful Code.

Happily, I already agree - they're all echoes of my own belief in "code shui", be it XML (a Morbus Rant from 2002 on "why beauty is important in computer file formats") or in code from 2004 ("His style is quite unique. [Morbus' AmphetaDesk] source reads almost like a paper, instead of terse code. He documents his code well and I've thus far found nothing that was very hard to understand. Best of all, its so un-Perl. He doesn't seem to use really clever tricks to do simple things, so the code has been very easy to understand").

In the Drupal content management system, a "node teaser" is a small bit of content used to encourage you to "read more" of the post. Drupal can set the teaser to the entire length of the post (typically used for blogs where you don't need extra click-through), or can automatically truncate the teaser to a specific character length. In the past, you could also manually define teasers by including <!--break--> in the node's body. In Drupal 6, manual teaser definition has been improved with JavaScript wizardry, along with a new checkbox: "Show summary in full view".
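The teaser selection described above can be sketched in a few lines. This is a simplified, hypothetical function, not Drupal's node_teaser(), which additionally tries to break at paragraph or sentence boundaries; the 600-character default is only illustrative, and byte-based strlen() is used for brevity.

```php
<?php
// Simplified, hypothetical teaser selection.
function sketch_teaser($body, $size = 600) {
  // A manual <!--break--> marker wins: everything before it is the teaser.
  $pos = strpos($body, '<!--break-->');
  if ($pos !== FALSE) {
    return substr($body, 0, $pos);
  }
  // Short bodies are used whole.
  if (strlen($body) <= $size) {
    return $body;
  }
  // Otherwise cut at the last space within the size limit.
  $cut = strrpos(substr($body, 0, $size), ' ');
  return substr($body, 0, $cut === FALSE ? $size : $cut);
}

echo sketch_teaser('Intro paragraph.<!--break-->The rest.'), "\n"; // Intro paragraph.
echo sketch_teaser('one two three four', 9), "\n";                 // one two
```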

But there's a small problem with the use of the word "summary". Generally, when a Drupal teaser is included in the node's full view, it's because it's the introduction of the node itself, not necessarily a teaser or summary of the entire body. Over at gamegrene.com, a node's teaser is, in fact, a summary of the node, and is also displayed on the full view itself. It's not the first paragraph of the article but, rather, is styled differently to provide an overview of what you'll be reading. IBM uses the same model at their developerWorks.

If you placed a "summary" at the beginning of the node's body, unstyled, readability would tend to suffer - you'd have the summary (node teaser), and then, theoretically, the introduction (node body), with no clear indication that two different types of content, with two different purposes, are being served.

As I've been working on moving Gamegrene to Drupal 6 (in time for Dungeons and Dragons 4th Edition, coming June 7th), I had to solve a problem: how do I theme the teaser differently than the body inside node.tpl.php? By the time the template gets the node data, only $body and $content exist; $content only contains the teaser (for list views) or body (for full views). The teaser never exists in a node's full view as its own variable.

To solve this and get the same view as seen on IBM's developerWorks, I used themename_preprocess_node() to detect if a teaser has been manually set and that the "Show summary in full view" checkbox has NOT been enabled. When that checkbox is checked, Drupal automatically adds the teaser to the node's $body (or $content) - it treats the teaser as the introduction to the post, not an actual summary of what you're reading:

function phptemplate_preprocess_node(&$variables) {
  // we like to display teasers on the node view pages in a different style,
  // but only if they were NOT set to "show summary on full view" (which seems
  // backward, but the implication with that checkbox is that the teaser is
  // PART of the node's body, instead of an actual summary of the entire
  // node's body). if a node's unbuilt body starts with <!--break-->, then
  // a teaser has been manually set, and "show summary" is not checked.
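  // default to FALSE so node.tpl.php never reads an undefined variable.
  $variables['style_teaser_differently'] = FALSE;
  if ($variables['page'] == TRUE) { // only do this on full page views.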
    $node = node_load($variables['nid']); // we reload the node because
    // by the time it gets here <!--break--> has already been filtered out.
    // this if logic stolen from node.module's node_teaser_include_verify().
    if (strpos($node->body, '<!--break-->') === 0) {
      $variables['style_teaser_differently'] = TRUE;
      $variables['teaser'] = check_markup($node->teaser, $node->format, FALSE);
    }
  }
}

Note that the extra node_load() is nothing to worry about - since the node has already been loaded earlier in this execution, node_load() will happily return a cached version, saving us any performance concerns.

Now, it's just a matter of displaying it in node.tpl.php:

<?php if ($style_teaser_differently) { ?>
  <div class="node-summary"><?php print $teaser; ?></div>
<?php } ?>

Comments and concerns? Note that, for my particular needs, I wanted this entirely in a theme - I'm not changing data or its structure, merely its display, so doing this sort of stuff in hook_nodeapi() with a module's overhead would be a little much.

All in all, the response to Drupal Tough Love, the new code reviewing blog from chx and me, has been quite favorable, and we've already got a queue of at least a dozen submitted modules to look over. I had a grande chuckle at Amy Stephen's post on it over at OpenSourceCommunity.org:

I lift this service up because it's a perfect example of a functioning community ... What I am trying to say to those of you who are considering this service, but are not quite in touch with the inner geek inside, is this --> don't ask these guys if they think you are fat if you have your box of big clothes out and your skinny clothes are pushed in the back of the closet. It would be just like asking Simon Cowell if you can sing ... Károly and Morbus are not only acknowledged for who they are, and they are not only accepted for who they are, but who they are is celebrated. So, rock on, Drupal community!

We all make mistakes; that's how we learn. Sometimes, though, we need someone to point out our mistakes, to sift through the chaos that is Drupal's contributions repository. Inspired by jpoesen's comment on my code quality entry, chx and I have taken up the task of giving some tough love to Drupal's greatest strength: the army of developers using its APIs. Want your own code publicly reviewed at DrupalToughLove.com? Let us know!
