Looking for Archivematica, the software project? See archivematica.org.

archivemati.ca

Zend Search Lucene, Symfony and the ICA-AtoM application

March 8th, 2007 · 25 Comments · ICA-AtoM

About six months ago Dave Dash posted a great little tutorial demonstrating how to integrate the Zend Framework’s Search component into a Symfony application. That is exactly what I did for the ICA-AtoM archival description application that I am developing using the Symfony platform. Now that I am working on the next version of this application I have upgraded to the latest version (0.8.0) of Zend Search. This upgrade adds proximity, grouped and boolean searches as well as term rank boosting to the ICA-AtoM application. At the end of this post I have some tips to add to Dave’s tutorial that anyone upgrading from older Zend Search versions should be aware of.

The Apache Lucene search engine is probably the most widely adopted open-source search engine. It is, in fact, gaining huge popularity in the digital collections world (as evidenced, for example, by the Lucene workshop at the recent Code4Lib conference). As Mark Jordan noted in his report on the Access 2006 library conference, “Lucene is emerging as the indexer of choice for a number of open source and commercial products since it provides fast searches, can search across separate indexes, and facilitates faceted browsing.”

The beauty of the Zend_Search_Lucene component is that it is intended to be a direct PHP port of the Java-based Apache Lucene search engine. The index files it generates are natively accessible to Java-based Lucene applications as well as to handy little Lucene utilities like Luke. So we'll be getting all the functionality of a world-class search engine with the power and flexibility of the web-ready, object-oriented PHP5 language.
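To make the new query types a little more concrete, here is a minimal Python sketch that assembles query strings in the standard Lucene query-parser syntax, which Zend_Search_Lucene aims to follow: a phrase with a proximity window (`"..."~N`), a boosted term (`term^N`), a field restriction (`field:...`) and grouped boolean clauses. The helper functions and the `title` field name are hypothetical; the sketch only builds strings and does not call Zend_Search_Lucene or any Symfony code.

```python
# Hypothetical helpers that build query strings in Lucene query-parser syntax.
# They only produce text; parsing and searching are left to the search engine.

def phrase(words: str, slop: int = 0) -> str:
    """Quoted phrase; a slop > 0 turns it into a proximity query ("a b"~N)."""
    quoted = f'"{words}"'
    return f"{quoted}~{slop}" if slop > 0 else quoted


def boost(clause: str, factor: float) -> str:
    """Raise the relevance weight of a term or clause (clause^factor)."""
    return f"{clause}^{factor:g}"


def field(name: str, clause: str) -> str:
    """Restrict a clause to a single indexed field (name:clause)."""
    return f"{name}:{clause}"


def group(op: str, *clauses: str) -> str:
    """Combine clauses with a boolean operator inside parentheses."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    return "(" + f" {op} ".join(clauses) + ")"


# Titles containing "annual" within 3 positions of "report",
# that also mention photographs (weighted higher) or maps.
query = group(
    "AND",
    field("title", phrase("annual report", slop=3)),
    group("OR", boost("photographs", 2), "maps"),
)
print(query)  # (title:"annual report"~3 AND (photographs^2 OR maps))
```

Building queries from small functions like these keeps user input and operator syntax separate, although a real application would also need to escape Lucene's special characters in user-supplied terms.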

Archives Access System Glossary, v1.0

February 26th, 2007 · 8 Comments · PhD Research, Terms & Definitions

Over the past few weeks I have worked and posted my way through definitions and explanations of some of the key concepts that are relevant to my research into archives access systems. I have compacted and combined each of these definitions into a working glossary that will be part of my system design and requirements documentation for a prototype archives access system.

This glossary replaces some of the definitions that I posted to this blog in late Fall 2005 when I was establishing the conceptual background for the research (see the Terms and Definitions category in the blog archive).

This is only the first draft of this glossary. There are still a number of other critical concepts that I need to study as part of my research literature review (e.g. archival description, collection, etc.). Over the course of my research I will update this glossary with new terms and any revisions of the existing definitions.

O.K., Let’s Make Sure ‘Access’ is in that Definition.

February 19th, 2007 · No Comments · PhD Research, Terms & Definitions

Last week my friend and colleague Ian McAndrew humoured me by reviewing my definition of archival materials. Ian’s sharp mind immediately picked up on a couple of potential cracks in my definition and examples.

In particular, he pointed out that I needed to make an explicit link between preservation and access and he is absolutely right. I am so entrenched in the viewpoint of archives access systems (i.e. working under the assumption that the archival materials I am talking about are going to be made available by an archives access system) that I neglected to make this explicit in my definition. However, I can correct this error by adding a couple of words to my definition for archival materials:

archival materials are objects in any form that record information which is preserved for future access and use as a memory aid or proxy for a past event

The Anatomy of a Digital Information Object

February 12th, 2007 · No Comments · PhD Research, Terms & Definitions

From a theoretical point of view there is no difference between a digital and an analogue information object. As I explained in previous posts about the concept of information and information objects, both are entities that contain the content of a message and have the required structure and context to allow that message to be decoded and understood.

However, in practice, it is much easier for an analogue information object to carry the content and structure of a message forward in space and time because these are intrinsically linked to the information’s medium. For example, if information is inscribed on a stone tablet or a sheet of paper then the information’s content and structure will likely survive as long as the actual tablet or sheet of paper survives. [1]

Digital information objects, by contrast, are much more complicated to preserve and keep accessible over time because their relationship to their storage medium is much more ephemeral. The content and structure of a digital information object are not easily contained within a single physical object like a sheet of paper. Instead, the binary inscriptions of a digital information object depend on a complex chain of encodings and electrical components for rendering.
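One small link in that chain can be illustrated in a few lines of Python. The sketch below (the word "résumé" is an arbitrary example) stores a message as bytes and then reads those identical bytes back through two different character encodings. Only the encoding that was used to write them recovers the intended message.

```python
# The same stored bytes, read through different character encodings.
message = "résumé"
stored = message.encode("utf-8")      # the bits that actually sit on the medium

as_utf8 = stored.decode("utf-8")      # the decoding step that matches the writer
as_latin1 = stored.decode("latin-1")  # a mismatched decoding step

print(stored.hex())
print(as_utf8)    # résumé
print(as_latin1)  # rÃ©sumÃ©
```

Note that the mismatched decoding raises no error: nothing in these particular bytes records which encoding was used, so that piece of context has to be preserved alongside the bits for the message to remain understandable.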

Information as an Object

February 5th, 2007 · 2 Comments · PhD Research, Terms & Definitions

Last week I wrote a post that looked in more detail at the concept of information as a way to help define archival materials. I am going through this process of definitions and explanations to help establish the scope and context for my PhD research on archives access systems. These software-intensive systems provide online users with contextual information about collections of archival materials, allow them to search and browse for archival materials, learn more about their context of creation, management and use, identify their storage location, and request their retrieval.

The key characteristic of archival materials is that they preserve information for future use. As discussed in the previous post, information is a set of related signals, symbols or patterns that communicate a message which is received with the requisite contextual knowledge to decode and understand it.

In order for archival materials to preserve information for future use, the message that communicates the information must be recorded so that at some point in the future it may be retrieved and re-communicated or re-experienced. This requires that the message transmission is captured and converted into an object that can be carried forward through space and time. This brings us to the concept of an information object. An information object is an entity that contains the content of a message and has the required structure and context to allow that message to be decoded and understood.

What is Information Anyway?

January 29th, 2007 · No Comments · PhD Research, Terms & Definitions

Last week, I posted an access-based definition for the concept of archival materials to help establish the scope and context of my research into archives access systems.

Like most definitions in the archives and records management literature, it leans heavily on the concept of information. However, this term is seldom defined further. It is usually expected that the author and reader share the same understanding of what is assumed to be a universal concept. Therefore, it is necessary to ask “what is information?”

Archival Materials: A Practical Definition (cont’d)

January 23rd, 2007 · 1 Comment · PhD Research, Terms & Definitions

Yesterday I posted and explained a practical definition for archival materials that works within the scope and context of my research on archives access systems: “archival materials are objects in any form that record information which is preserved for future use as a memory aid or proxy for a past event.” Because this definition is so broad, it requires some further elaboration: it distinguishes archival materials from the entire set of recorded information (through the criterion of preservation) while remaining broad enough to include the variety of published materials that are often part of archival collections.

Archival Materials: A Practical Definition

January 22nd, 2007 · 4 Comments · PhD Research, Terms & Definitions

In my PhD research I am investigating the nature of archives access systems and the enabling technologies and practices that can be used to improve these.

I am currently in the process of establishing the scope and context for my investigation. This requires defining a number of key concepts so that I use terminology that is internally consistent within the research, properly identify the nature and characteristics of the entities under investigation, and facilitate the communication of research results to archival professionals as well as laypersons (i.e. users of archives access systems and personal digital archives).

One of the more critical definitions is for the actual objects that are made available via archives access systems. I am referring to these as ‘archival materials.’ As I explain below, I’ve chosen to move away from some of the more traditional definitions for records and archives to establish a practical, access-based definition for ‘archival materials’ that better reflects the reality of the types of information objects that are being made available via archives access systems today and into the foreseeable future.

Merry Christmas…from Santa’s IT staff

December 22nd, 2006 · No Comments · News

source: Craig Castelaz (2004), Creative Commons Attribution-NonCommercial-ShareAlike 2.0

Merry Christmas from Santa and his IT staff, who keep his global gift production and delivery operation running 24/7.

I wonder if they archive the naughty list?

The Information Model to End All Information Models

December 8th, 2006 · 4 Comments · ICA-AtoM, System Architecture, System Requirements

I am now beginning work on the second alpha iteration of the open-source ICA-AtoM software application.

I am working on a considerable upgrade to the underlying object and database model which supports the application. As the application matures and evolves it will be relatively easy to make changes and updates to the application modules and user interfaces. However, it will be difficult to make significant changes to the database model once the application goes into active deployment and people start entering information. Therefore, I had better get this right now…

ICA-AtoM must be flexible enough to support a wide variety of potential uses. First, it must work as an archival description package for individual archival institutions as well as a union catalog application that can combine descriptions from multiple repositories (e.g. http://humanrightsarchives.org).

Beyond that, I have high hopes that ICA-AtoM can evolve to become an even more universal information cataloging tool that can be used, for example, to manage personal digital archives (e.g. family photos, home movies, music collections, etc.) as well as other reference resources that typically support archival collections (e.g. encyclopedias, dictionaries, bibliographies, reference libraries, web links, etc.).

Therefore, I want to be sure that the ICA-AtoM information model supports the International Council on Archives standards out-of-the-box. At the same time, I want it to be flexible enough to support additional information cataloging and classification needs. I want it to be open and extensible, and to anticipate the future wave of semantic information organization and sharing.
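One common way to balance "standards out-of-the-box" against open-ended extension is a fixed core record plus generic name/value properties and subject-predicate-object relations. The Python sketch below illustrates that pattern only; it is not the actual ICA-AtoM schema, and every class, field name and identifier in it (`InformationObject`, `Property`, `Relation`, `reference_code`, `"isPartOf"`, `"CA-XYZ-001"`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    """Open-ended name/value pair for data outside the core standard."""
    name: str
    value: str
    scheme: str = "local"  # hypothetical: vocabulary the name is drawn from


@dataclass
class InformationObject:
    # Hypothetical core fields, loosely modelled on archival description elements.
    reference_code: str
    title: str
    level_of_description: str
    properties: list[Property] = field(default_factory=list)

    def add_property(self, name: str, value: str, scheme: str = "local") -> None:
        self.properties.append(Property(name, value, scheme))

    def get(self, name: str) -> list[str]:
        """Return every value recorded under the given property name."""
        return [p.value for p in self.properties if p.name == name]


@dataclass(frozen=True)
class Relation:
    """Subject-predicate-object link, in the spirit of semantic-web triples."""
    subject: str
    predicate: str
    obj: str


photos = InformationObject("CA-XYZ-001", "Family photographs", "fonds")
photos.add_property("format", "image/jpeg", scheme="MIME")
links = [Relation("CA-XYZ-001", "isPartOf", "CA-XYZ")]

print(photos.get("format"))  # ['image/jpeg']
```

The trade-off in this design is that generic name/value properties are easy to extend without schema changes but harder to validate and query efficiently than dedicated columns, which is why the standard elements stay in the fixed core.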