The Florida Heritage Project
was the first statewide digital library initiative in Florida.
The Project, proposed in 1998 by the libraries of the State
University System of Florida (SUS) in partnership with the
Florida Center for Library Automation (FCLA) and the State
Library of Florida, intends to build an openly accessible
collection of digital materials documenting the history
and culture of Florida from prehistoric times to the modern
day.
The Project is supported through a central fund created
by the directors of the SUS libraries. Most funds are redistributed
to libraries to reimburse the direct costs of digitizing
Florida Heritage materials. A small percentage is allocated
for graphics design, historical consulting, and other professional
services. The costs of selection, cataloging, and other
support activities are borne by the individual libraries.
Image storage, retrieval and website maintenance are provided
by FCLA.
Fiscal management is provided through the Florida Center
for Library Automation. Ongoing project oversight is provided
by the Digitization Services Planning Committee, a standing
committee of the SUS libraries.
OVERVIEW AND WORKFLOW
Participating libraries select materials for inclusion
in the Florida Heritage Collection and contribute catalog
records for the digitized version to a central database.
The libraries perform or outsource the digitization and
create files of structural metadata describing the relation
of images to logical parts of the resource. The structural
metadata record and the set of images for each resource
are transmitted to FCLA, where the data is loaded into a
DB2 application on a central Unix server. Identifiers that
function as persistent URLs pointing to the DB2
application are inserted into the catalog records, which
are used for name and topical access to the electronic
resources.
IMAGE CAPTURE AND CONVERSION
Participating libraries are responsible for digitization
of selected materials from their own collections. Each
library may perform its own digitization, or contract with
a vendor or with another SUS library for digitization services.
Image capture must adhere to the standards promulgated
by the Cornell Department of Preservation and Conservation
(see Digital Imaging for Libraries and Archives, Kenney and
Chapman, 1996). A Quality Index of 5 or better for visual
images is required.
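
As a rough illustration of the Quality Index requirement, the sketch
below uses the benchmarking formulas commonly attributed to the Cornell
guidelines: QI = (dpi x 0.039 x h) / 3 for bitonal scans and
QI = (dpi x 0.039 x h) / 2 for greyscale or color scans, where h is the
height in millimetres of the smallest significant character. These
formulas are not stated in this document and should be treated as an
assumption; the function names are hypothetical.

```python
def quality_index(dpi: float, char_height_mm: float, bitonal: bool) -> float:
    """Estimate the Quality Index of a scan.

    Assumption: QI = dpi * 0.039 * h / 3 for bitonal scans and
    dpi * 0.039 * h / 2 for greyscale or color scans, where h is the
    height (mm) of the smallest significant character.
    """
    divisor = 3 if bitonal else 2
    return dpi * 0.039 * char_height_mm / divisor


def meets_requirement(dpi: float, char_height_mm: float, bitonal: bool,
                      minimum_qi: float = 5.0) -> bool:
    """Return True if the estimated QI reaches the project minimum of 5."""
    return quality_index(dpi, char_height_mm, bitonal) >= minimum_qi
```

Under these assumed formulas, a bitonal scan of 1 mm type needs a
resolution well above 300 dpi to reach a Quality Index of 5.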
Three types of images are created for all textual materials
in the collection: TIFF, JPEG and PDF. A TIFF and a JPEG
image are created for every page; related sets of pages
(e.g. chapters or articles) are bundled into PDF files.
In the first year of the project, participating libraries
created TIFF images and submitted them to FCLA, which subsequently
created PDF and JPEG derivatives.
TIFF images are created as the direct result of scanning
source materials (that is, as the native file format),
using a variety of scanning hardware, primarily flat-bed
scanners. TIFFs are archived as uncompressed electronic
masters. Bit-depth is appropriate to the source and its
anticipated use, and may be bitonal, 8-bit grey, 24-bit
color, or greater. Color images are created and maintained
in the sRGB color-space. Both grey and color images are
calibrated and scanned to within the tolerances promulgated
by the Library of Congress for the American Memory project.
Images created from microfilmed sources reflect the quality
of the source microfilm.
TIFF images are used to create JPEG derivatives using
Adobe ImageReady Version 2.0 in a batch executable process.
The TIFF image is resized to a width of 600 pixels, with
the height scaled proportionally. The process then saves
the result as an optimized progressive JPEG. In a Web
browser, such an image displays as a series of
overlays, enabling viewers to see a low-resolution version
of the image before it downloads completely.
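
The resizing arithmetic described above can be sketched as follows.
This is only an illustration of the proportional scaling, not the
ImageReady batch process itself; the function name and the rounding
rule are assumptions.

```python
def derivative_size(width: int, height: int, target_width: int = 600):
    """Return (width, height) of a derivative scaled to target_width.

    The height is scaled by the same factor as the width so that the
    aspect ratio of the master image is preserved. Rounding to the
    nearest pixel is an assumption of this sketch.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height
```

For example, a master of 1200 by 1800 pixels would yield a derivative
of 600 by 900 pixels.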
Creation of PDF files is a function performed by the
locally written Florida Heritage loader software. The loader
calls a LeadTools custom ActiveX control to open sets of
JPEG images, and then uses Thomas Merz's PDFlib software
to build the PDF.
Text-based versions, whether encapsulated with PDF, HTML
or other mark-up, are produced either by re-keying from
source documents or by optical character recognition (OCR)
of TIFF images. A minimum accuracy rate of 99.995% is required.
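
To give a sense of how strict this threshold is, the sketch below works
out the number of errors it permits. It assumes accuracy is measured
per character, which this document does not specify; the function
names are hypothetical.

```python
import math
from fractions import Fraction

# 99.995% expressed exactly, avoiding floating-point rounding.
MIN_ACCURACY = Fraction("99.995") / 100


def max_allowed_errors(char_count: int,
                       min_accuracy: Fraction = MIN_ACCURACY) -> int:
    """Largest number of wrong characters that still meets the threshold."""
    return math.floor(char_count * (1 - min_accuracy))


def meets_accuracy(char_count: int, error_count: int,
                   min_accuracy: Fraction = MIN_ACCURACY) -> bool:
    """Return True if (correct characters / all characters) >= threshold."""
    if char_count <= 0:
        raise ValueError("char_count must be positive")
    return Fraction(char_count - error_count, char_count) >= min_accuracy
```

Under this assumption, the requirement allows at most one incorrect
character in every 20,000.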
RESOURCE DESCRIPTION -- CATALOGING
Participating libraries are responsible for creating
full MARC catalog records for selected materials from their
own collections. Cataloging records are maintained in a
union database of all Florida Heritage materials at FCLA
and are also contributed to the OCLC WorldCat.
Cataloging is expected to adhere to the Cataloging
and Access Guidelines for Electronic Resources (CAGER),
developed by the Technical Services Planning Committee.
The guidelines specify that records should represent the
electronic versions only, and include specific instructions
to:
* Put the date of the original in Fixed Field Date1,
the date of digitization in Date2, and use Form of Reproduction "s";
* Include a title (245) subfield h to indicate the resource
is electronic;
* Specify the digitizing institution and date of digitization
in the imprint (260);
* Include a series statement (830) for the Florida Heritage
Project, justified by a general note (500);
* Use an original version note (534) to record the location
of and publication information for the source document.
Catalog records also contain a target audience note (521)
indicating the grade level of the material according to
the Florida State Department of Education Sunshine State
Standards (FDOESS).
Each record should also contain at least one Florida
Heritage Timeline heading from the Florida History Timeline
added as a geographic subject heading (651).
Complete MARC cataloging instructions can be found in
the CAGER Guidelines.
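
The requirements listed in this section could be checked mechanically
along the following lines. This is a minimal sketch, not part of the
project's software: the record is modelled as plain Python dictionaries
rather than real MARC, and the key names (date1, date2, form) and the
function name are hypothetical.

```python
# Variable fields the guidelines above call for in every record.
REQUIRED_TAGS = ("245", "260", "500", "521", "534", "651", "830")


def missing_requirements(fixed: dict, fields: dict) -> list:
    """List the Florida Heritage cataloging requirements a record fails.

    fixed  -- simplified fixed-field elements, e.g. {"date1": ..., "form": "s"}
    fields -- maps a tag such as "245" to a list of {subfield: value} dicts
    """
    problems = []
    for element in ("date1", "date2"):
        if not fixed.get(element):
            problems.append(f"fixed field {element} is empty")
    if fixed.get("form") != "s":
        problems.append('form of reproduction is not "s"')
    for tag in REQUIRED_TAGS:
        if not fields.get(tag):
            problems.append(f"missing field {tag}")
    # The title field must carry subfield h to flag the electronic version.
    titles = fields.get("245", [])
    if titles and not any("h" in field for field in titles):
        problems.append("245 lacks subfield h")
    return problems
```

An empty list means the record satisfies every check in this sketch; the
CAGER Guidelines remain the authority for the full set of rules.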
RESOURCE DESCRIPTION -- STRUCTURAL METADATA
A file of structural metadata is created for every document
to indicate the relationship between the physical units
of digitization (TIFF, JPEG and other images) and the logical
units of publication (pages, chapters, and other parts).
The metadata format used is a modified version of the Elsevier
EFFECT format called DataSet.TOC.
For each electronic resource (book volume, journal issue,
manuscript, etc.), the DataSet.TOC file:
* identifies and names the image files comprising the
resource,
* defines the order of images,
* identifies and names the subsections (such as chapters),
* indicates which images belong to particular subsections,
* and establishes the order and hierarchy of subsections.
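
The information carried by the structural metadata can be pictured as a
small tree. The sketch below is a hypothetical in-memory model of that
information only; it does not reproduce the syntax of the DataSet.TOC
file, and the class and function names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Section:
    """A logical unit (e.g. a chapter) and the page images it contains."""
    title: str
    images: List[str] = field(default_factory=list)          # in page order
    subsections: List["Section"] = field(default_factory=list)


@dataclass
class Resource:
    """One electronic resource, such as a book volume or journal issue."""
    identifier: str
    sections: List[Section] = field(default_factory=list)


def images_in_order(sections: List[Section]) -> List[str]:
    """Flatten the hierarchy into the overall reading order of images."""
    ordered = []
    for section in sections:
        ordered.extend(section.images)
        ordered.extend(images_in_order(section.subsections))
    return ordered


def outline(sections: List[Section], depth: int = 0) -> List[Tuple[int, str]]:
    """Return (depth, title) pairs describing the section hierarchy."""
    rows = []
    for section in sections:
        rows.append((depth, section.title))
        rows.extend(outline(section.subsections, depth + 1))
    return rows
```

A model of this kind is enough to drive a table of contents display and
page-by-page navigation.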
IMAGE LOADING, STORAGE AND NAVIGATION
For each volume that is digitized, a directory containing
one DataSet.TOC file and a set of images is sent by FTP
from the contributing institution to FCLA. The metadata
and images are processed by a locally written loader, which
first checks that all the image files referenced by the
DataSet.TOC are present, copies the images into a Florida
Heritage directory, and loads the structural metadata into
DB2 tables maintained on a Unix server. If instructed,
the loader will also create derivative formats such as
PDF files.
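
The loader's first step, confirming that every referenced image is
present, might look like the sketch below. It is not the actual loader:
the list of referenced file names is taken as an input because the
DataSet.TOC syntax is not described here, and the function name is
hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def missing_images(directory: str, referenced: Iterable[str]) -> List[str]:
    """Return the referenced image names that are absent from directory.

    An empty result means the submission is complete and loading can
    proceed; otherwise the names identify what the contributor omitted.
    """
    base = Path(directory)
    return [name for name in referenced if not (base / name).is_file()]
```

Running the check before any files are copied or any rows are loaded
means that an incomplete submission can be rejected as a whole.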
Once structural metadata is loaded and images are moved
to the appropriate directories, access and navigation are
provided by another locally written DB2 server program.
Persistent URLs referencing the server application are
generated programmatically and inserted into the bibliographic
record describing the resource.
RETRIEVAL
The cataloging records describing Florida Heritage resources
are loaded into a shared central library management system,
a locally developed application based on NOTIS, on an IBM
mainframe. The records can be searched through the SUS
Libraries' online catalog application, WebLUIS. All traditional
catalog access points are available (author, title, subject,
etc.) as well as Florida Heritage Timeline headings and
grade level from the Sunshine State Standards categories.
Once records are retrieved, the URLs in the bibliographic
record are used as hotlinks to the DB2 server application,
which initially presents a Table of Contents display.
FUTURE DIRECTIONS -- PLANS FOR YEAR 2000
Participating libraries will continue to contribute materials
to Florida Heritage. Funding has been provided for the
digitization of approximately 50,000 additional pages by
July 1, 2000.
A Panel for the Identification of Florida Heritage Resources
will be formed to advise the libraries on selection of
materials for digitization.
The Florida History Timeline will be fully developed
to include narrative information and links to digitized
materials for all Timeline headings. A thematic index to
identify topics that cross Timeline categories will be
developed.
The project will develop the capability of storing ASCII
text obtained by performing Optical Character Recognition
(OCR) on textual image files. This "dirty ASCII" will
be used for full text retrieval of the documents.
The format for contributing structural metadata will
be changed from the current modified EFFECT format to an
XML-based structure.