The E-Learning Framework

Scope and Definition

Harvesting is most often associated with gathering metadata from distributed repositories that expose it and collecting that metadata into another repository, particularly in the OAI context. A 'Harvest interface' is identified by Andy Powell as having the following function: "Makes metadata records available for harvesting (regular gathering). Typically, this service will be invoked in order to harvest metadata records into a local database so that an end-user search or browse interface can be offered or so that the harvested records can be re-exposed for harvesting or searching by other services." [1] This harvest interface might be employed to harvest any of the following: collection descriptions, user preferences, annotations, ratings, identifiers, metadata schemas (e.g. from a metadata schema registry), service descriptions (e.g. from a service registry) and licences [2]. The use of content packaging specifications enables the exposure and harvesting of complex objects, extending harvesting from metadata alone to resources together with their associated metadata.

'Harvest' is a single service with two 'functions': expose and harvest. For harvesting to occur, there must be one or more data sources that data is harvested *from* and a destination data store or repository that data is harvested *into*. In the context of OAI, the Data Provider exposes data for harvesting and the Service Provider harvests this data and uses it to build service(s). A harvesting specification or protocol enables this interoperability between repositories.
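
The two functions can be illustrated with a minimal in-memory sketch. It is not drawn from any specification: the class names `Record`, `DataProvider` and `Harvester`, and the optional `since` date filter, are hypothetical and chosen only to show one or more sources exposing records and a destination store gathering them.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Record:
    """A hypothetical metadata record with an identifier and a datestamp."""
    identifier: str
    title: str
    datestamp: date


@dataclass
class DataProvider:
    """A source repository: implements the 'expose' function."""
    name: str
    records: list = field(default_factory=list)

    def expose(self, since=None):
        """Return all records, or only those dated on or after `since`."""
        return [r for r in self.records
                if since is None or r.datestamp >= since]


@dataclass
class Harvester:
    """A destination repository: implements the 'harvest' function."""
    store: dict = field(default_factory=dict)

    def harvest(self, providers, since=None):
        """Gather exposed records from each provider into the local store.

        Returns the number of records gathered in this run.
        """
        count = 0
        for provider in providers:
            for record in provider.expose(since):
                # Keyed by identifier, so a re-harvest overwrites old copies.
                self.store[record.identifier] = record
                count += 1
        return count
```

Passing a `since` date on later runs models the "regular gathering" described above: only records changed after the previous harvest need to be collected again.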

The specification in wide use is the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). OAI-PMH "defines a mechanism for harvesting XML-formatted metadata from repositories" [3]. The OAI-PMH requires every repository to support unqualified Dublin Core (DC) as a common metadata format. The OAI-PMH also "supports the notion of multiple metadata sets, allowing communities to expose metadata in formats that are specific to their applications and domains" [4]. Other formats that may be exposed in this way include LOM and MARC records, ODRL rights metadata, and content packaging standards such as METS, IMS Content Packaging and MPEG-21 DIDL [5].
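
As a rough sketch of what an OAI-PMH harvest involves, the following builds a `ListRecords` request for unqualified Dublin Core (`metadataPrefix=oai_dc`) and reads identifiers and titles from a response. The base URL `http://example.org/oai`, the sample response and the function names are assumptions made for illustration; no network request is made, and details such as resumption tokens and error responses are left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, [titles]) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in record.findall(".//dc:title", NS)]
        results.append((identifier, titles))
    return results


# A hand-written sample response, standing in for a real repository's reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2006-01-10</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://example.org/oai", from_date="2006-01-01")
harvested = parse_records(SAMPLE_RESPONSE)
```

A community-specific format would be requested the same way by passing a different `metadata_prefix`, provided the repository advertises support for it.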

Other standards that are, or could be, used to harvest or extract metadata include HTML and RSS, each used in conjunction with a metadata schema such as Dublin Core.
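
For the HTML case, one common convention is to embed Dublin Core elements in `meta` tags whose names begin with `DC.`. The sketch below assumes that convention and extracts such elements from a page; the class name, function name and sample page are hypothetical, and real pages may encode their metadata differently.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only Dublin Core elements, e.g. "DC.title" -> "title".
        if name.lower().startswith("dc.") and content is not None:
            key = name[3:].lower()
            self.elements.setdefault(key, []).append(content)


def extract_dc(html_text):
    """Return a dict mapping DC element names to lists of values."""
    parser = DublinCoreMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements


# A hand-written sample page, used only to exercise the parser.
SAMPLE_PAGE = (
    '<html><head>'
    '<meta name="DC.title" content="Scope and Definition">'
    '<meta name="DC.creator" content="A. Author">'
    '<meta name="description" content="not Dublin Core">'
    '</head><body></body></html>'
)

dc_elements = extract_dc(SAMPLE_PAGE)
```

Values are kept in lists because a Dublin Core element may be repeated within a single page.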

Implementations of 'Harvest' will often work alongside other services, e.g. search.

[1] Andy Powell, "A 'service-oriented' view of the JISC Information Environment", November 2005, p. 14
[2] ibid., pp. 15-17
[3] Open Archives Initiative Frequently Asked Questions
[4] ibid.
[5] Andy Powell, "A 'service-oriented' view of the JISC Information Environment", November 2005, pp. 18-26

[added by Julie Allinson, 2006-02-02]
Created by wilbert
Last modified 2006-04-03 12:56 PM