Case Study: Modern PARADISEC

Status: LIVE

PARADISEC (Pacific And Regional Archive for Digital Sources in Endangered Cultures) has been operating for 20 years and currently holds material in 1,350 languages across Australia and the Pacific. The archive contains over 210 TB of content, including more than 16,000 hours of audio recordings, 1,600 hours of video and 8,000 transcriptions. It serves both as an archive of research recordings and as an integral part of the research workflow, in which primary data is made citable, preserved, and publicised (with licence agreements) for access.

The Modern PARADISEC demonstrator, developed with previous funding from the ARDC, shows how RO-Crate can be used to describe collections and items, and how those items can be stored in an OCFL system. The demonstrator includes an Elasticsearch service and a webserver, but its key feature is that it keeps working with only the filesystem and a webserver.
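Because the storage layer is plain files, an item's description can be read without any search service. The following is a minimal Python sketch of that idea, not the demonstrator's own code. It assumes a standard OCFL object layout (an `inventory.json` with `head`, `versions` and `manifest`) and an RO-Crate metadata file named `ro-crate-metadata.json` at the logical root of the object. The function names and the sample object are hypothetical, and the sample omits parts of a complete OCFL object (such as the object declaration file and inventory sidecar) for brevity.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def resolve_content_path(object_root: Path, logical_path: str) -> Path:
    """Map a logical path in the head version of an OCFL object to its file on disk."""
    inventory = json.loads((object_root / "inventory.json").read_text(encoding="utf-8"))
    state = inventory["versions"][inventory["head"]]["state"]
    for digest, logical_paths in state.items():
        if logical_path in logical_paths:
            # The manifest maps each digest to content paths relative to the object root.
            return object_root / inventory["manifest"][digest][0]
    raise FileNotFoundError(logical_path)


def load_root_dataset(object_root: Path) -> dict:
    """Return the root dataset entity of the RO-Crate stored in the OCFL object."""
    crate_file = resolve_content_path(object_root, "ro-crate-metadata.json")
    graph = json.loads(crate_file.read_text(encoding="utf-8"))["@graph"]
    entities = {entity["@id"]: entity for entity in graph}
    # The metadata descriptor points at the root dataset through "about".
    root_id = entities["ro-crate-metadata.json"]["about"]["@id"]
    return entities[root_id]


def build_sample_object(base: Path) -> Path:
    """Write a tiny, simplified OCFL object holding one RO-Crate (illustration only)."""
    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
            {"@id": "./", "@type": "Dataset", "name": "Example item"},
        ],
    }
    payload = json.dumps(crate).encode("utf-8")
    digest = hashlib.sha512(payload).hexdigest()
    object_root = base / "example-item"
    content_dir = object_root / "v1" / "content"
    content_dir.mkdir(parents=True)
    (content_dir / "ro-crate-metadata.json").write_bytes(payload)
    inventory = {
        "id": "example-item",
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": "v1",
        "manifest": {digest: ["v1/content/ro-crate-metadata.json"]},
        "versions": {
            "v1": {
                "created": "2020-01-01T00:00:00Z",
                "state": {digest: ["ro-crate-metadata.json"]},
            }
        },
    }
    (object_root / "inventory.json").write_text(json.dumps(inventory), encoding="utf-8")
    return object_root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = build_sample_object(Path(tmp))
        print(load_root_dataset(sample)["name"])
```

Only the standard library and direct file reads are involved, which is the point of the design: any webserver that can serve these files can expose the same description.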

The main page showing the quantity of content, simple search capabilities and links to specific items.
Transcription search. The indexing tools parse various linguistic transcription formats and index each segment so that transcriptions can be searched in depth.
Advanced search capability enabling the user to assemble complex queries with a simple-to-use GUI builder.
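As an illustration of what indexing each segment can involve, the Python sketch below extracts time-aligned segments from an ELAN (.eaf) transcript. It is not PARADISEC's indexing code: the function name, the output fields and the sample document are hypothetical, and it handles only time-aligned annotations (`ALIGNABLE_ANNOTATION`), skipping reference annotations.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

# A minimal, hand-written EAF fragment used only for demonstration.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first segment</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second segment</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def iter_segments(eaf_xml: str) -> Iterator[Dict[str, object]]:
    """Yield one record per time-aligned annotation, ready to send to a search index."""
    root = ET.fromstring(eaf_xml)
    # Time slots hold millisecond offsets; unaligned slots carry no TIME_VALUE.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    for tier in root.iter("TIER"):
        for annotation in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(annotation.get("TIME_SLOT_REF1"))
            end = slots.get(annotation.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations without both time anchors
            yield {
                "tier": tier.get("TIER_ID"),
                "start_s": start / 1000,
                "end_s": end / 1000,
                "text": (annotation.findtext("ANNOTATION_VALUE") or "").strip(),
            }


if __name__ == "__main__":
    for segment in iter_segments(SAMPLE_EAF):
        print(segment)
```

Keeping the start and end times with each segment's text is what would let a search hit link back to the matching point in the media.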
PARADISEC serves a number of functions. It is a curated set of citable primary data (using DOIs). It continues to convert analog material (audio, video, text, images) into standard digital files. It provides the motivation and means for researchers to build, describe, and cite their research materials, which would otherwise be inaccessible or lost. It publishes metadata using OAI-PMH for the Open Language Archives Community (OLAC), and that metadata is also harvested by RDA and Trove. It connects researchers and the communities they have worked in. As most of the materials in PARADISEC record traditional cultural expression (TCE), they have high significance to the source communities, who can now access them via the web. Because the collection is structured using standards, we are able to deliver it in various forms appropriate to the recipient. One of these is on Raspberry Pi transmitters for local access by phone, which use the same microservices we developed for the online platform.
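The OAI-PMH publishing mentioned above follows a simple request and response pattern, which the Python sketch below illustrates from a harvester's side. It is an assumption-laden example rather than a description of PARADISEC's service: the base URL `https://archive.example.org/oai`, the record identifiers and the function names are hypothetical, and the sample response is hand-written so that no network access is needed. The `oai_dc` metadata prefix is used because the protocol requires every repository to support it.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written response fragment, for demonstration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:archive.example.org:ITEM-001</identifier></header></record>
    <record><header><identifier>oai:archive.example.org:ITEM-002</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_identifiers(response_xml: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers and the resumption token, if any."""
    root = ET.fromstring(response_xml)
    identifiers = [
        element.text
        for element in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    # An empty or missing token means there are no further pages to request.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or None)


if __name__ == "__main__":
    print(list_records_url("https://archive.example.org/oai"))
    print(parse_identifiers(SAMPLE_RESPONSE))
```

A harvester would repeat the request with the returned resumption token until none is returned.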

PARADISEC’s access and storage systems have been developed over 20 years, and some parts are in need of renewal. PARADISEC is also the model for other archives: we are currently advising a consortium in Japan and working actively with colleagues at the University of French Polynesia to build an archive in Papeete.

Tools adapted in the PARADISEC system

  • ELAN - Media transcription with XML output. A microservice was developed to play media alongside transcripts, allowing citation of specific points in media.
  • FieldWorks Language Explorer - Text output is structured text with interlinear annotations; dictionaries are structured and can be output as formatted documents or phone apps.
  • LaMeta – A tool for creating metadata in a form that can be imported into an archive.
  • CSV - metadata entry sheet using a simple row/column layout
  • ELAN file viewer – provides researchers with a dashboard to see how much of a file has been transcribed
  • Media players - linking transcripts to media (granular citation of media for research purposes)
  • OLAC data viewer – presents all aggregated metadata in a map view