


Archiving for CMS presentation at OSCMS Summit

February 7th, 2006 · System Architecture, System Requirements, Web 2.0

Today I gave a presentation on archiving for CMS at the Open Source Content Management Systems shindig (OSCMS Summit) here in (currently sunny) Vancouver.

This is a PDF copy of my presentation slides: OSCMS Summit: Archiving Presentation (54kb)

We started the presentation with an introduction to some of the basic digital preservation issues that I deal with in my work as a digital archivist. During this part of the session, I was surprised at the level of interest in legal requirements for recordkeeping and records retention scheduling (the day-to-day bane of archivists and records managers, including some of my colleagues from the City of Vancouver and Simon Fraser University who came down to sit in on this session at the ‘geek conference’). I had assumed that attendees at an open-source content management system conference would be more interested in maintaining the long-term functionality and usability of web content than in issues around authenticity and legal liability.

We moved on to talk about how digital preservation requirements could be addressed in the context of systems that manage and post content to the web (e.g. content management systems, blogs or wikis).

We talked about the two basic techniques that are available to us today: (1) system backup and restore procedures and (2) snapshotting using a web crawler such as HTTrack.

We then discussed future directions that could improve upon those techniques, namely (1) incorporating data archiving and retention rule workflows into web content management systems and (2) a standardized preservation format for CMS content types (e.g. posts, articles, static pages, comments, etc.). This led to a discussion about open and/or public file format standards (e.g. PDF/A (PDF-Archival), OpenDocument, structured blogging microformats, etc.).
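To make the retention rule idea a little more concrete, here is a minimal Python sketch of how a CMS might decide what to do with a content item. Everything in it is an assumption for illustration only: the `RETENTION_YEARS` schedule, the `ContentItem` fields and the `disposition` function are hypothetical names, and the retention periods are made up rather than drawn from any real records schedule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical retention schedule: years to keep each content type.
# None means "preserve permanently". These values are invented for illustration.
RETENTION_YEARS: dict[str, Optional[int]] = {
    "post": 7,
    "comment": 2,
    "static_page": None,
}


@dataclass
class ContentItem:
    content_type: str
    title: str
    published: date


def disposition(item: ContentItem, today: date) -> str:
    """Return 'preserve', 'retain', 'dispose' or 'review' for a content item."""
    if item.content_type not in RETENTION_YEARS:
        # No rule for this content type: flag it for a records manager.
        return "review"
    years = RETENTION_YEARS[item.content_type]
    if years is None:
        return "preserve"
    try:
        expiry = item.published.replace(year=item.published.year + years)
    except ValueError:
        # Published on 29 February and the expiry year is not a leap year.
        expiry = item.published.replace(year=item.published.year + years, day=28)
    return "dispose" if today >= expiry else "retain"


if __name__ == "__main__":
    old_comment = ContentItem("comment", "Nice post!", date(2003, 1, 15))
    print(disposition(old_comment, date(2006, 2, 7)))
```

In a real system the schedule would come from the organisation's records retention policy, and 'dispose' would more likely trigger a review step than an automatic deletion.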

We closed by talking about how syndication formats and techniques (e.g. RSS, Atom, XML-RPC) might be used to help capture web content for the purpose of archiving, both from the viewpoint of a website administrator and from the viewpoint of a content owner who has web-based content distributed throughout the Web 2.0 universe (e.g. Flickr photos, Blogger posts, del.icio.us tags, etc.).
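As a rough sketch of the feed-based capture idea, the Python below reads an RSS 2.0 document and turns each new item into a small archival record with a checksum and a capture timestamp. The sample feed, the record fields and the `capture_items` function are all assumptions made up for this example, not part of any existing tool; a real harvester would also fetch the feed over HTTP and follow each link to capture the full page.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Invented sample feed, standing in for a feed fetched from a blog or photo site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
      <guid>http://example.org/first</guid>
      <pubDate>Tue, 07 Feb 2006 10:00:00 GMT</pubDate>
      <description>Hello archive</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second</link>
      <pubDate>Wed, 08 Feb 2006 10:00:00 GMT</pubDate>
      <description>More content</description>
    </item>
  </channel>
</rss>"""


def capture_items(feed_xml: str, seen_ids: set[str]) -> list[dict[str, str]]:
    """Return archival records for feed items not already in seen_ids."""
    root = ET.fromstring(feed_xml)
    captured_at = datetime.now(timezone.utc).isoformat()
    records = []
    for item in root.iter("item"):
        # Fall back to the link when the item carries no guid.
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id is None or item_id in seen_ids:
            continue
        body = item.findtext("description", default="")
        records.append({
            "id": item_id,
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "body": body,
            # Checksum lets us verify later that the stored copy is unchanged.
            "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
            "captured_at": captured_at,
        })
        seen_ids.add(item_id)
    return records


if __name__ == "__main__":
    seen: set[str] = set()
    for record in capture_items(SAMPLE_FEED, seen):
        print(record["id"], record["sha256"][:12])
```

Keeping the set of already-seen identifiers between runs means the same function can be called on a schedule and will only record items that are new since the last capture.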

For me personally, the future direction topics are just at the brainstorming stage right now, but I hope to incorporate them at some point into the system prototyping that I am doing as part of my research into digital archives access.

The first day of the OSCMS conference, by the way, was a big success. We’ve got developers, administrators, and designers crammed into the three session rooms at UBC Robson centre for some very stimulating and informative sessions. Kudos to Boris Mann, Roland Tanglao and the other organisers for getting this great ‘summit’ off the ground in such a short period of time. Looking forward to the next couple of days.