The E-Learning Framework

Curriculum Content Harvesting, Conversion and Publishing

Many HE and FE institutions are attracted to the concept of a definitive data source of curriculum specifications from which relevant content can be extracted to meet the needs of multiple audiences and communication channels. However, the reality tends to be a host of word-processed documents maintained by different individuals. This page demonstrates that curriculum content built up on- or off-line can provide the basis for definitive XML data that can be transformed to meet varied needs.

Experience from the XCRI partnership (and further afield) suggests that mechanisms for maintaining definitive curriculum content vary, but many institutions use familiar Microsoft Office tools (usually Word) to create and maintain specification documents. These documents are often built up over several weeks following discussion and review involving course teams, service managers, quality assurance personnel and others. This incremental build leading to a predominantly paper-based validation or approval event has tended to make users favour Microsoft Word over online form-filling. Recently, with its Office 2003 release, Microsoft has blurred the line between the two with the InfoPath product. (The promotional video for InfoPath demonstrates its potential for on- and off-line working with content validation and back-end systems integration.) In another recent development, student record system vendors have started to add quality assurance workflows to incentivise on-line form-filling of a definitive curriculum database.

For institutions looking to maintain and transform definitive curriculum content, there appear to be three main options:

  1. Online data entry into a back-end database - student records systems vendors, such as Distinction Systems, are now offering web-based Curriculum Management Systems that surround online database forms with quality assurance workflows for the creation, management and approval of courses and units
    • The Nottingham Trent University case study demonstrates how a student records system can be extended through custom development and online forms to provide a definitive source of programme information with workflow support for the creation and maintenance of data therein
  2. Using Microsoft Word styles to separate monolithic content into constituent section components and then extracting and converting the component content that appears within each style tag
    • The London Metropolitan University case study demonstrates how this approach was used to transform thousands of unit documents in a multiple-step conversion process
    • Although not in the area of curriculum documentation, it is useful to contrast London Metropolitan's batch external processing approach with the document macros approach used in the CourseGenie e-learning materials product to convert content marked up with Word styles into IMS/SCORM XML
  3. Using Microsoft InfoPath forms to enter and validate XML content directly
    • The Fife College case study demonstrates how this approach met their Marketing Department's requirement for an editable single source that could be converted for the web and printed prospectus

Case Studies

Nottingham Trent University

Nottingham Trent University (NTU) has a student population of 26,000 and positions itself as one of the leading new universities in the UK. In its 2004 Institutional Audit, the Quality Assurance Agency (QAA) stated that: "Broad confidence can be placed in the soundness of the university's current and future management of the quality of its academic courses and the academic standards of its awards". NTU's Centre for Academic Standards and Quality (CASQ) has been keen to underscore this success with a robust electronic tracking and management system for its academic courses. Scheduled for release in October 2005, the system is intended to:

  • ensure that all publicly available information relating to the University's programmes is accurate, complete and compliant with the QAA Code of Practice, especially the section on Recruitment and Admissions
  • ensure that the status of programmes is clear to all applicants, i.e. whether they have completed, or are in the process of completing, a formal approval process
  • enable CASQ to track systematically the approval of programmes and to sign-off the release of categories of information to different users at different times
  • ensure that all relevant NTU users receive notification of developments at particular stages in the approval process
  • simplify the updating and sign-off of material for public information purposes
  • enable different NTU users to draw upon accurate and up-to-date programme information for their own purposes
  • generate templates and cover sheets at different stages in the approval process for use by schools, Colleges and approval panels/committees
  • enable the University to meet its Teaching Quality Information (TQI) obligations, through automatic linking and updating of Programme Specifications to the TQI website (www.tqi.ac.uk).

Approach

The initiative is driven by the concept of a single, definitive source of approved programme information captured through automated support for the programme approvals process. CASQ is the designated custodian of this information, sanctioning where necessary and appropriate, its copy-(re)writing for different audiences, for instance by Marketing staff for prospectus entries. To minimise re-writing, programme information is structured into re-usable fields and guidance notes encourage those supplying the information for each field to provide student-friendly text. To ensure that the initiative delivered what the business required, the project team adopted a business-process-led approach:

  1. Model the process for approving new programmes
  2. Identify the information requirements for each step in the process
  3. Map those requirements to existing information systems
  4. Devise suitable solutions for closing any gaps and a scaffolded workflow to support data entry

Information Requirements

Analysis of the business process information requirements revealed a core set of programme information that needed to be maintained:

  • Basic Programme Information
    • Awarding institution
    • School/Campus
    • Final Award, Programme Title and Modes of Study
    • Normal Duration
    • UCAS Code
  • Overview and general educational aims
    • Introductory text emphasising why the programme matters
    • Special features
  • Programme outcomes
    • Knowledge and understanding (noting any specific benchmark references)
    • Skills, qualities and attributes (noting any specific benchmark references)
  • Teaching and Learning Methods
    • Overview of methods emphasising how outcomes will be achieved
  • Assessment Methods
    • Overview of methods emphasising how intended learning outcomes are to be demonstrated
    • Note of any programme-specific assessment regulations / accrediting body requirements
  • Programme structure and curriculum
    • Overview including details of exit/interim awards
    • Details of placement experience (overview, year of programme, certification details)
    • List of approved core modules for each level
  • Admission
    • Statement about admission criteria and applicant qualities necessary to benefit from the programme
    • Codified entry requirements (Set, Subset, Qual, Desc, Subject, Grade/Tariff)
  • Support for Learning
    • Summary of pastoral and academic support and guidance mechanisms
  • Graduate destinations/employability
    • Overview of how students are prepared for the world of work
    • Principal graduate destinations and employment trends
  • Programme standards and quality
    • Indicators of quality and evidence from student feedback, employers, etc
  • Additional Information
    • Collaborative partner(s)
    • Relevant QAA benchmark statements
    • Professional, statutory or regulatory bodies (PSB) accreditations
    • Implementation date
    • Other
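
The core information set above can be pictured as a structured record. The following is a minimal sketch only, not NTU's actual data model: the class names, field names and types are all assumptions chosen to mirror a few of the headings in the list, and the completeness check is a hypothetical illustration of what scaffolded data entry might enforce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntryRequirement:
    # Hypothetical field names mirroring the codified columns
    # (Set, Subset, Qual, Desc, Subject, Grade/Tariff); types are assumed.
    set_id: str
    subset_id: str
    qualification: str
    description: str
    subject: str
    grade_or_tariff: str


@dataclass
class ProgrammeRecord:
    # Only a subset of the listed information is modelled in this sketch.
    awarding_institution: str
    school_or_campus: str
    final_award: str
    programme_title: str
    modes_of_study: List[str]
    normal_duration: str
    ucas_code: str
    overview: str = ""
    entry_requirements: List[EntryRequirement] = field(default_factory=list)


def missing_basic_fields(record: ProgrammeRecord) -> List[str]:
    """Return the names of basic-information text fields that are still empty."""
    basic = ["awarding_institution", "school_or_campus", "final_award",
             "programme_title", "normal_duration", "ucas_code"]
    return [name for name in basic if not getattr(record, name).strip()]
```

Splitting the content into named fields like this is what allows the same approved text to be re-used for different audiences without re-keying.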

Solution

Key to the development was ensuring that the system integrated with the existing programmes data held in the Banner 2000 student system developed by SCT. NTU decided to extend the standard SCT model to include the additional fields. A series of Oracle Forms was then created to provide a scaffolded dialogue for entering the required programme information into the definitive Banner database.

Recognising the culture shift required to make the initiative work, the team split the project into two initial phases:

  1. Core data entry
  2. Workflow and automated alerting and communication procedures for programme approval/modification

The project follows the CASQ mantra of "consistency not conformity" and it is hoped that the benefits of definitive programme information, from which all required documentation can be produced for quality assurance purposes and annual programme monitoring activities, will convince programme teams of the merits of the culture shift from loosely-structured word-processed documents to carefully-scaffolded online form-filling.

Conclusion

It is too early to write the conclusion for this case study as the system is scheduled for "go-live" in October 2005. Signs are encouraging: the project involves a multi-disciplinary team headed by a highly experienced senior academic; it enjoys support from the highest level of the university and has adopted a business-process led approach. A pragmatic, in-house solution has been adopted to extend the student record system with data captured through scaffolded on-line form-filling. Institutions with systems from other vendors are unlikely to enjoy such flexibility, but the desire for a single, definitive database of programme information controlled by a university's quality assurance people and processes is likely to resonate widely.

Case study derived from an interview with Professor Paul Periton, Head of CASQ, and Francis Lowry, Director of IT Development at Nottingham Trent University



London Metropolitan University

London Metropolitan University was formed out of a recent merger between the University of North London and London Guildhall University. Following the merger a new, comprehensive Undergraduate Scheme has been developed. This has necessitated a complete revision of the specifications of courses and modules offered to undergraduates which has, in turn, required that the courses and modules be re-validated through a full quality audit process.

Background

As the number of specifications (approximately 600 courses and 2,600 modules) is large, and they needed to be validated in one process, the Quality Unit at LondonMet began to look for ways to make the process more efficient. Up until this point, specifications had been submitted as paper documents, prepared with the author's favourite word-processor (typically some version of Microsoft Word). The Quality Unit approached the Intranet Services Team with a proposal to try to automate some of this process. Some broad requirements were outlined at this point:

  • electronic versions, rather than paper copies, of the documents would be gathered
  • the documents would be authored in a structured way
  • it would be possible to author the documents offline
  • it would be possible to revise the documents (version control would be necessary)
  • the documents would eventually feed into prospectuses, course-catalogues on the web site, the Student Record system etc.

The Intranet Services Team began by outlining the major functions of a system to handle this new validation process:

  • authoring
  • document storage
  • version control
  • work-flow
  • parsing
  • validation
  • re-publishing
It was quickly decided that the documents, however they were authored, would be converted into XML at some stage before being parsed and transformed for different target systems. We investigated the state of electronic forms technology, but could find no affordable solution which met our authoring/revision requirements. We identified the following options:
  • author, gather and validate the specifications as MS Word documents, and then convert these into XML using some third-party or bespoke conversion utility
  • use MS Office 2003/InfoPath to generate XML from MS Word
  • write a bespoke forms-based application with a back-end database offering version control etc.

Solution

We adopted the first option because:

  • we already have a good document management system (Livelink)
  • the specification authors were accustomed to using MS Word to write their specifications
  • the University had no plans to roll out MS Office 2003/Infopath to staff

Approach

We began by formalising the structure of course and module specifications. We were able to completely standardise the modules specifications as they are relatively straightforward. The course specifications are more complex, so although a general structure was agreed for these, it could not always be consistently applied.

A set of MS Word templates were created, which provided a form for specification authors to use. The form was highly prescriptive for module specifications, less so for courses. The completed specification documents were submitted to the document management system as email attachments - from here they were managed by the Quality Unit as they were fed into the validation process. While being validated, the documents were only available to the authors and the Quality Unit staff. Once validated, the documents were released to the Intranet Services Team to begin the process of conversion to XML.

We evaluated some commercial MS Word to XML conversion utilities, but found nothing really suitable. We also looked at using OpenOffice.org to perform this task. All of these tools take the same (and only realistic) approach, which is to use the style information encoded in the MS Word document to derive a sense of structure. For example, it may be implied that a section of the document styled as a 'header' introduces a new section of content which continues until the next new 'header'. However, we found that, even with our prescriptive MS Word forms, we could not rely on the documents having the correct styles in the correct places, as MS Word content had often been 'pasted' into the form from other, non-regulated MS Word documents.
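
To illustrate the style-driven approach in general terms, here is a minimal Python sketch. It assumes the paragraphs have already been extracted from the document as (style name, text) pairs; the function name and the 'header' style name are hypothetical and do not come from any of the tools evaluated.

```python
def group_by_header(paragraphs, header_style="header"):
    """Group (style, text) paragraphs into sections.

    A paragraph carrying the header style starts a new section; every
    following paragraph belongs to that section until the next header.
    """
    sections = []
    current = None
    for style, text in paragraphs:
        if style == header_style:
            current = {"title": text, "body": []}
            sections.append(current)
        elif current is None:
            # Content before the first header: keep it in an untitled section.
            current = {"title": None, "body": [text]}
            sections.append(current)
        else:
            current["body"].append(text)
    return sections
```

The sketch also shows why mis-applied styles are so damaging: a heading pasted in with a body style is silently absorbed into the preceding section.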

The solution we adopted was to use OpenOffice.org to batch process the MS Word documents, converting them into plain ASCII, and then to write a custom parser to build XML documents out of the plain text files. This was possible because the structured form used to write the documents had fixed headings for each section which could be simply read by the parser. Line feed characters in the text were preserved in CDATA sections so that they could be re-used during XSLT transformations later in the process. OpenOffice.org provides an RPC-type interface which allows you to run the application as a server - you can then call methods on this server to do conversions etc. Some source code outlining this functionality is here. You will need OpenOffice.org version 1.x; this has not been tested with version 2.x.
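
The parsing step can be sketched as follows. This is not the project's parser (which was written in Java); it is a minimal Python illustration of the same idea, and the heading strings, element names and root element are hypothetical.

```python
from xml.dom.minidom import Document

# Hypothetical fixed headings mapped to hypothetical element names.
HEADINGS = {
    "Module title": "title",
    "Module summary": "summary",
    "Assessment": "assessment",
}


def text_to_xml(plain_text, headings=HEADINGS, root_name="module"):
    """Build an XML string from plain text that uses fixed section headings.

    Section text is wrapped in CDATA so that line feeds survive for later
    transformations. Text containing ']]>' cannot be stored in CDATA and
    makes minidom raise ValueError on serialisation.
    """
    doc = Document()
    root = doc.createElement(root_name)
    doc.appendChild(root)
    current_tag, buffer = None, []

    def flush():
        # Write out the section collected so far, if any.
        if current_tag is not None:
            element = doc.createElement(current_tag)
            body = "\n".join(buffer).strip("\n")
            element.appendChild(doc.createCDATASection(body))
            root.appendChild(element)

    for line in plain_text.splitlines():
        key = line.strip()
        if key in headings:
            flush()
            current_tag, buffer = headings[key], []
        elif current_tag is not None:
            buffer.append(line)
        # Lines before the first recognised heading are ignored in this sketch.
    flush()
    return doc.toxml()
```

Because the headings are fixed by the template, the parser needs no style information at all, which is what made plain text a workable intermediate format.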

[Diagram: process work-flow]

[Example: XML resulting from a specification]

Outcome

After a long period of revising the MS Word templates and debugging the Java code, we did achieve the primary objective of validating the courses and modules for the entire scheme. The automatic conversion and parsing of specification documents was very successful, with a success rate in the order of 98%. A few specifications could not be parsed because of illegal characters which had been pasted into the documents. The parser was configured with complex regular expressions to try to catch such anomalies, but a few slipped through. The number of failures was sufficiently small for it to be more efficient to amend such cases manually, rather than add to the growing exception-handling code.
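
The case study does not say which characters caused the failures. As one plausible illustration, the Python sketch below detects and removes characters that fall outside the range permitted by XML 1.0 (for example, control characters pasted in from other applications). The function names are hypothetical.

```python
import re

# XML 1.0 allows tab, line feed, carriage return, U+0020 to U+D7FF,
# U+E000 to U+FFFD and U+10000 to U+10FFFF. Anything else is illegal.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_chars(text):
    """Return (position, hex code point) for each character XML 1.0 forbids."""
    return [(m.start(), hex(ord(m.group())))
            for m in _ILLEGAL_XML_CHARS.finditer(text)]


def strip_illegal_chars(text, replacement=""):
    """Remove (or replace) characters that XML 1.0 forbids."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)
```

Reporting positions before stripping makes it easier to decide whether a document should be cleaned automatically or amended by hand.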

The resulting XML was of sufficient quality to populate the University's web-based module catalogues. This involved calling a set of web services in the API for the University's content management system. The complexities of the course specifications meant that a degree of manual intervention was necessary to populate some of the data. However, the core course specification content was successfully processed, converted to XML and used to generate HTML pages in the same way.

The XML generated from the module specifications was also used to populate tables in the Student Record System (SRS). This was achieved by running an XSLT process to extract the content of specific tags from the XML documents to prepare a large CSV file which could then be imported into the relational database underpinning the SRS.
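
The project used XSLT for this extraction. The sketch below shows the same idea in Python instead, purely for illustration: it pulls the text of selected child elements from each module document and writes one CSV row per document. The element names passed in are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def modules_to_csv(xml_documents, fields):
    """Return CSV text with one row per XML document.

    `fields` lists child element names of the root element to extract;
    elements that are missing produce an empty cell.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        writer.writerow([(root.findtext(name) or "").strip() for name in fields])
    return out.getvalue()
```

Using a CSV writer rather than string concatenation ensures that commas and quotes inside the extracted text are escaped correctly for the database import.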

One interesting outcome of the project has been the way in which we have used XML. Originally, we assumed that we would end up creating an XML repository of some sort to store the XML documents derived from the original MS Word documents. The repository would then become the 'authoritative source' for specification data. We looked at various technologies for creating an XML repository, and had selected Oracle 9i with its XML data-type as a good candidate to fulfill this role (we are already licensed for this). However, as the project has panned out, it has become clear that it is much more about work-flow and process control, and XML has been used as a format for exchanging data, rather than for storing it. The original MS Word documents, still in the document repository, have remained the authoritative source for the specifications, with the XML being re-generated as needed. The specification processing system polls the document store regularly and detects the addition of new documents, or new versions of existing documents (via WebDAV) in order to automatically process them. The important point is that the XML itself is transient.
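
The change-detection part of such polling can be sketched in a few lines of Python. This assumes that each poll of the document store yields a mapping from document path to some version marker (for example a version number or last-modified value); how that listing is fetched over WebDAV is deliberately left out, and the function name is hypothetical.

```python
def detect_changes(previous, current):
    """Compare two snapshots of the document store.

    Each snapshot maps document path -> version marker. Returns a pair
    (new_documents, updated_documents), each a sorted list of paths.
    """
    new_documents = sorted(path for path in current if path not in previous)
    updated_documents = sorted(
        path for path in current
        if path in previous and current[path] != previous[path]
    )
    return new_documents, updated_documents
```

Each path returned would then be converted again from the Word original, which is consistent with treating the XML as transient.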

This project has been an interesting one for a number of reasons. We have learned a good deal about the University's business processes (or lack of them!) in this area. We now know more than we ever wanted to about parsing MS Word files, and the limitations of MS Word as an authoring component in an integrated, enterprise-wide system. We also learned a good deal about the state of the art of forms technology.

Conclusions

This project has been a qualified success. The document conversion system has worked, albeit in a fairly inelegant way. The generation of XML has been a useful approach, and we have been able to use this to drive the population of external data-stores such as the web CMS and Student Record Systems. From early trials in the XCRI project, it has been possible to transform the XML into experimental RDF formats. However, it has not been possible to capture and auto generate the more complex, tabular relationships between courses and modules. The management of the work-flow process, from authoring to re-publishing via the quality validation process has also needed a great deal of manual intervention.

For future development, we should consider:

  • finding a better authoring tool (MS Office 2003/InfoPath, OpenOffice.org or some intelligent forms technology)
  • creating a data repository for this information (not necessarily document-centric)
  • introducing an automated, or semi-automated, work-flow system
  • adopting an industry standard XML format (XCRI)

Case study kindly supplied by Paul Walk, Senior Web Services Developer, London Metropolitan University


Fife College

Fife's Marketing Department wished to have a single source of descriptive course information that could be published automatically on the web and imported into a DTP prospectus document for print publishing. They wished to edit this themselves, and it had to be in a future-proofed format that could be re-purposed for external course portal website harvesting/upload.

Background

The initial Prospectus, containing the course descriptions, was in Quark XPress format. An export into Word was supplied. Fife College and Glenrothes College were in the process of merging, and a new web site and prospectus would need to carry the new course information, which was in a state of flux (e.g. new course codes were being written).

Of note to those familiar with CDM, Fife College(s) refer to the highest abstract entity as a course, of which a program is an instance. This nomenclature is reversed in CDM. This initiative was mainly concerned with publishing the highest-level course information.

Solution

Fife set out to export the content of the Prospectus into an XML document of a type corresponding as closely as possible to an accepted industry standard, then build a Microsoft InfoPath form around this which would allow Marketing personnel to edit the content, and possibly draw further information from the College databases. The XML file could be simply saved on a webserver, and an ASP.NET web page could provide a browsing facility for users.

Approach

As the initial Word document was unstructured, a set of Word styles had to be created and applied, eventually arriving at the document that appears in the supporting case study documentation

  • Course DescriptionsREVISED02.doc

A number of ways of exporting the content were examined, including HTML export. Export to XML using an XSLT which turned the WordprocessingML into a subset of Norway's CDM was decided upon (partly because Word apparently didn't support nested styles natively, but WordprocessingML allowed section export, grouping elements together). Various unnecessary elements could be removed by switching SmartTags off (in Word 2003).

A cut-down (and somewhat butchered) CDM-lite schema:

  • cdmproginfopath.xsd
was radically altered partly to simplify the process of building a form and a transformation (bearing in mind that Marketing were only interested in a subset of functionality), and partly because InfoPath doesn't support all the W3C XSD schema functionality (recursive elements were a problem, for example). InfoPath does support 'rich text' (XHTML) editing fields, so this was chosen over the CDM infoBlockType.

The XSLT stylesheet:

was applied as the Word document was saved as XML. One missing feature of the stylesheet was a method of combining <ul><li></li></ul><ul><li></li></ul> individual unordered lists into <ul><li></li><li></li></ul>. This was accomplished after the fact by a simple text find-and-replace.
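
A find-and-replace of this kind might look like the following Python sketch, which removes any closing list tag that is immediately followed (ignoring whitespace) by an opening one. The function name is hypothetical, and the sketch assumes the lists carry no attributes.

```python
import re


def merge_adjacent_lists(xhtml):
    """Merge consecutive <ul> lists by deleting each '</ul><ul>' boundary."""
    return re.sub(r"</ul>\s*<ul>", "", xhtml)
```

Note that this merges every pair of adjacent lists, including any that were meant to stay separate, so the output is worth checking by eye.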

In the resultant XML document, the XML declaration and root element opening tag were replaced by the one generated by the InfoPath form for its XML documents:

<?xml version="1.0" encoding="UTF-8"?>
<?mso-infoPathSolution productVersion="11.0.6415" PIVersion="1.0.0.0" href="file:///C:\Documents%20and%20Settings\t-reddick\My%20Documents\Projects\XCRI\Forms\cdmreducedorgprog.xsn" name="urn:schemas-microsoft-com:office:infopath:cdmreducedorgprog:" language="en-gb" solutionVersion="1.0.0.43" ?>
<?mso-application progid="InfoPath.Document"?>
<CDM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-06-28T14:08:18" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">

Note that, because the InfoPath form was unpublished at this stage, the processing instruction attribute href points to a local file; this could be changed for testing purposes to wherever the form is.
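
The header swap described above could be scripted along these lines. This is a sketch rather than the method actually used: the function name is hypothetical, and it assumes the root element is named CDM and appears once, as in the header shown.

```python
import re


def swap_header(xml_text, infopath_header, root_name="CDM"):
    """Replace everything up to and including the root element's opening tag.

    `infopath_header` is the full replacement text: XML declaration,
    InfoPath processing instructions and the root opening tag.
    """
    pattern = re.compile(
        r"\A.*?<" + re.escape(root_name) + r"\b[^>]*>", re.DOTALL
    )
    # A function is used as the replacement so that backslashes in the
    # header (e.g. in a Windows file path) are inserted literally.
    new_text, count = pattern.subn(lambda match: infopath_header, xml_text, count=1)
    if count != 1:
        raise ValueError("root element <%s> not found" % root_name)
    return new_text
```

Pointing the form reference at a different location is then a matter of editing the href value in the header text before calling the function.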

The InfoPath form itself:

  • cdmreducedorgprog.xsn
is an early design, with only one (editing) view, and could do with more work to provide search functionality, ODBC or web service links with back-end databases, and a help guide (an XHTML document which can be viewed in the task pane). Once the desired edits are made, the file is saved and placed on a webserver. The final version of this I have is:
  • CourseDescriptionsRevisedCDM20050629a.xml
This is capable of feeding a course browser web page, and much more (for instance, it should be possible to search accurately for potential future careers).

Conclusions

The XML schema, drastically cut-down from CDM, is a temporary measure, and will be replaced by a recommended one from the XCRI project in due course (although there's nothing to stop organizations from using multiple schemas for course publishing, particularly if documents can be automatically transformed between them).

The export from Word (via transforming WordprocessingML) proved complicated, although in some respects more reliable than via HTML export. I wouldn't recommend relying on this to allow users to edit in Word (even if you lock users down to pre-designed styles in Word 2003 > Tools > Protect Document > Formatting restrictions, although this may help if you have legacy Word documents).

The InfoPath form works, and should be user-friendly for non-technical editors, although this hasn't been tested, and the form needs more work (it's rather unwieldy due to the large amount of information in it, and isn't entirely user-friendly). It could be connected (via secondary data sources) to back-end databases.

Case study kindly supplied by Tavis Reddick, Publishing Technologist, Fife College

Created by stubbsy
Last modified 2005-08-01 06:22 PM
Funding Partner
JISC Distributed eLearning Strand