Fedora object repository

I had a meeting with some people who have a lot of experience using the Fedora project, to get a better idea of its features, possibilities, strong points and possible weaknesses. First, a short intro (quoted from their website).

Fedora is open source software that gives organizations flexible tools for managing and delivering their digital content. At its core is a powerful digital object model that supports multiple views of each digital object and relationships between digital objects. Digital objects can encapsulate locally managed content or reference remote content.

Fedora has quite a few interesting features.

  • Digital objects can be stored. In principle it handles object streams, each of which can be inline XML (validated XML), managed content (a bytestream) or external content (a URI).
  • Data can be manipulated using functions, which they call “disseminators”. These are web services that operate on the objects. Some examples are an XML transformation, or a transformer that converts an image to a thumbnail or to a black and white picture.
  • Data is versioned. When an object is updated, the previous versions stay in the database. Earlier versions can be accessed by their timestamps.
  • Objects can be linked to extra information (like Dublin Core metadata) or, more generally, to other objects. As such, Fedora behaves as a triple store that links objects through relationships, and it can be used for storing semantic relationships. There is a powerful search engine which can search based on these relations.
  • All operations (changes on the database) are atomic.
  • Fedora can be used as an object proxy, since data can be remote, and extra data (metadata) can be attached to the link. Fedora does not cache remote objects.
  • There is powerful rights management (based on XACML). Each object and each disseminator can be given its own access control list.
  • There is a powerful search engine, based on either full-text search or searches on the metadata (the Dublin Core fields). For full-text search, the data streams are indexed. This already works for formats like tex and pdf but can easily be extended.
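
To make the object model above more concrete, here is a minimal in-memory sketch in Python. It is not Fedora's API: the class names, method names and the three stream kinds as strings are all hypothetical, chosen only to illustrate streams of different kinds, versions looked up by timestamp, and relationships stored as triples.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names throughout: this models the ideas, not Fedora's real API.
STREAM_KINDS = {"inline_xml", "managed", "external"}


@dataclass
class StreamVersion:
    created: datetime
    kind: str      # one of STREAM_KINDS
    content: str   # XML text, stored content, or a URI for external streams


@dataclass
class DigitalObject:
    pid: str
    streams: dict = field(default_factory=dict)    # stream id -> list of versions
    relations: list = field(default_factory=list)  # (subject, predicate, object)

    def put(self, stream_id: str, kind: str, content: str, when: datetime) -> None:
        if kind not in STREAM_KINDS:
            raise ValueError(f"unknown stream kind: {kind}")
        # An update appends a new version; older versions are kept.
        self.streams.setdefault(stream_id, []).append(
            StreamVersion(when, kind, content))

    def get(self, stream_id: str, as_of: Optional[datetime] = None) -> StreamVersion:
        """Return the newest version, or the newest one not later than as_of."""
        versions = self.streams[stream_id]
        if as_of is not None:
            versions = [v for v in versions if v.created <= as_of]
        if not versions:
            raise KeyError(f"no version of {stream_id} at {as_of}")
        return max(versions, key=lambda v: v.created)

    def relate(self, predicate: str, target_pid: str) -> None:
        # Relationships are plain triples, as in a triple store.
        self.relations.append((self.pid, predicate, target_pid))


t1 = datetime(2008, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2008, 6, 1, tzinfo=timezone.utc)

obj = DigitalObject("demo:1")
obj.put("DC", "inline_xml", "<title>Draft</title>", t1)
obj.put("DC", "inline_xml", "<title>Final</title>", t2)
obj.put("IMAGE", "external", "http://example.org/image.png", t1)
obj.relate("isMemberOf", "demo:collection")

latest = obj.get("DC").content             # newest version
earlier = obj.get("DC", as_of=t1).content  # version as it was at t1
```

Appending versions instead of overwriting them is what makes the timestamp lookup possible; the external stream only stores the URI, which matches the point that remote objects are not cached.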

For all these features, there is one downside. Although individual operations are atomic, the system knows no transactions, so I would not recommend storing volatile data in it. I have heard of plans to introduce an XForms layer on top of the data stored in objects. This would provide a kind of simple application-building feature on top of the objects. Though this would be simple and relatively easy to change, there would be update problems (and lost changes without warnings). I would rather recommend storing that information in a traditional relational database, as it would result in a more robust application which behaves better under load. However, once these applications store data as BLOB or CLOB data, using Fedora instead of the BLOB features of the database could offer some advantages. Ah, more ideas for improving equanda.
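
The "lost changes without warnings" problem can be sketched in a few lines of Python. The `Store` class below is a hypothetical in-memory stand-in for any store where each write is atomic but there is no transaction around a read-modify-write cycle; it is not Fedora's API. The `write_if_unchanged` method is my own addition, showing one common remedy (an optimistic version check).

```python
# Hypothetical in-memory store: it stands in for any store without
# multi-step transactions and is not Fedora's API.
class Store:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, value):
        """Unconditional write: atomic on its own, but blind to other writers."""
        version, _ = self.read(key)
        self._data[key] = (version + 1, value)

    def write_if_unchanged(self, key, value, expected_version):
        """Optimistic check: refuse the write if the version has moved on."""
        version, _ = self.read(key)
        if version != expected_version:
            return False
        self._data[key] = (version + 1, value)
        return True


# Scenario 1: two editors read the same version, then each changes one field.
store = Store()
store.write("form", {"name": "old", "phone": "000"})

_, copy_a = store.read("form")
_, copy_b = store.read("form")
store.write("form", {**copy_a, "name": "new"})
store.write("form", {**copy_b, "phone": "111"})  # silently discards A's change
lost_update_result = store.read("form")[1]

# Scenario 2: the same edits, but each write checks the version it read.
checked = Store()
checked.write("form", {"name": "old", "phone": "000"})

version, copy = checked.read("form")
ok_a = checked.write_if_unchanged("form", {**copy, "name": "new"}, version)
ok_b = checked.write_if_unchanged("form", {**copy, "phone": "111"}, version)
```

In the first scenario the name change disappears without any error. In the second, the later writer is told its copy is stale and can reload and retry, which is roughly what a relational database gives you through transactions or row versioning.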
