ILS Symposium: Alan Darnell
Welcoming the Prodigal Child: E-Resources and the OPAC, Alan Darnell, Scholars Portal Project
Scholars Portal
- repatriation of e-journal literature from publishers
- collection of 7500 journals and 10 million full-text articles
- local load of 130 abstracts and index databases representing over 150 million records
- interest in extending this model to ebooks
Why bother?
- archiving
- ease of access: single interface to find content
- capturing the conversation that occurs in scholarly research
The catalogue
- describes an important body of scholarly research and source material
- its absence is a huge gap in our effort to represent the scholarly conversation, which is captured only partially in the electronic article literature
- mix of primary and secondary content
- historical coverage
- but the catalogue and e-journals (the tools we use to make e-journal content available) aren’t well integrated
Back in the day (early 90s), OPACs were hot and journals were staid and boring. But then something happened – journals “left” (prodigal child!); Scholars Portal is focused on bringing back the journal content. The effort is to make it modern, relevant, innovative, and user focused. Answer this question honestly: do you consider your OPAC to be a PC or a Mac?!
Scopus Lucene project
- Elsevier was interested in exploring Lucene to index its content (currently they use Fast) – Lucene is open source
- combine Scopus A&I content with full-text articles from Scholars Portal and XMLMARC records from the U of Toronto Catalogue
- Index them all under Lucene and what do you get?
Challenges, Benefits
- authority control vs. relevance ranking
- whole item vs. components (the OPAC describes the book itself, not the chapter)
- surrogate metadata vs. digital objects (electronic resources are true digital objects)
- open content vs. commercial content (electronic publishing, in the current context, has commercial value and needs to be protected with rights management)
Authority control
- important not only for searching but also, perhaps more importantly, for clustering
- in electronic journals there is no consistency in recording author names – varies from journal to journal
- different vocabularies used by different publishers (sometimes only author supplied tags) – so subject access has never been great in ejournals
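To make the clustering point concrete, here is a minimal sketch of grouping articles under an authorized name form. The authority table, the names, and the function names are all invented for illustration; a real authority file would be far larger and the matching far less tidy.

```python
from collections import defaultdict

# Hypothetical authority table: variant name forms -> authorized heading.
# These entries are made up for the example.
AUTHORITY = {
    "lee, j.": "Lee, Jane",
    "lee, jane": "Lee, Jane",
    "j. lee": "Lee, Jane",
}

def normalize(name):
    """Lowercase and collapse whitespace so variants compare consistently."""
    return " ".join(name.lower().split())

def cluster_by_authority(articles):
    """Group article titles under the authorized form of each author name."""
    clusters = defaultdict(list)
    for title, author in articles:
        # Fall back to the name as recorded when no authority entry matches.
        heading = AUTHORITY.get(normalize(author), author)
        clusters[heading].append(title)
    return dict(clusters)

articles = [
    ("Article one", "Lee, J."),
    ("Article two", "J.  Lee"),
    ("Article three", "Smith, J."),
]
clusters = cluster_by_authority(articles)
```

Without the table, the two "Lee" articles would land in separate clusters simply because the journals recorded the name differently.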
So how can we bridge the two?
- Scholars Universe (from CSA) is trying to bridge the gap.
- what if we could continue this by taking the authority records in our catalogues and applying them to ejournal content?
Leveraging the strength of the catalogue
- can we match articles and ebooks to print surrogates and then map vocabularies?
- can we seed automatic classification algorithms with authority terms?
Supplementing surrogate records
- many libraries use TOC, cover image, reviews (e.g. Syndetics) to supplement catalogue records – like eye candy after you’ve gone through the search process
- is there any way we can turn this content into supplementary access points? Like searching the reviews?
Elephant and mouse
- mixing surrogate records with full-text digital objects creates complexities with relevance ranking
- word occurrence weighted against document length
- using traditional relevance ranking algorithms will favour less complete records
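A toy illustration of the elephant-and-mouse problem, assuming the simplest possible length-normalized score (term hits divided by document length). This is not how Lucene scores documents; the function and the sample texts are made up to show the effect.

```python
def length_normalized_score(query_terms, text):
    """Count query-term occurrences and divide by document length in tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(tokens.count(term.lower()) for term in query_terms)
    return hits / len(tokens)

# A three-word surrogate record (the mouse) ...
surrogate = "history of cataloguing"
# ... and a 100-word full-text article that uses the term five times (the elephant).
full_text = "cataloguing " * 5 + "filler " * 95

short_score = length_normalized_score(["cataloguing"], surrogate)
long_score = length_normalized_score(["cataloguing"], full_text)
```

The brief surrogate outscores the full-text article even though the article mentions the term five times as often, which is the sense in which the less complete record is favoured.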
Commercial content
- OPACs are open to all
- if we integrate the content how do we make sure certain material is not available to unauthenticated users?
- move to a finer grained rights management when entitlements are complex (like Shibboleth)
- in the era of Google Scholar, can libraries begin pushing the envelope on public access to metadata from commercial sources?
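One way to picture the metadata-versus-full-text split is sketched below. It assumes a hypothetical record layout with a `licence` field and a set of entitlements supplied by whatever authentication layer is in place; none of this is Shibboleth's actual interface.

```python
def visible_view(record, entitlements):
    """Always expose metadata; expose full text only for open or entitled content."""
    view = {"title": record["title"], "source": record["source"]}
    licence = record.get("licence")  # None is taken to mean open content
    if licence is None or licence in entitlements:
        view["full_text"] = record["full_text"]
    return view

# Hypothetical licensed article.
record = {
    "title": "An example article",
    "source": "Example Journal",
    "licence": "publisher-x",
    "full_text": "Full text of the article.",
}

anonymous = visible_view(record, set())            # unauthenticated user
member = visible_view(record, {"publisher-x"})     # user entitled to publisher-x
```

The anonymous user still gets the citation metadata, which is the public-access question raised in the last bullet.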
Finding a common playing field
- do we load econtent into the OPAC or do we load OPAC records into econtent services?
– neither fits very well
– OPAC serves as both a resource discovery tool and an inventory tool
– both functions are necessary but not necessarily best combined into a common application
Liberating data from the OPAC
- XML encoding of MARC records and adoption of Unicode make it easier to use these records in other contexts
- but do we need an XML schema that represents the object and not the catalogue record?
- provides ability to search and re-factor the content for different views to satisfy different information needs
How do we do this?
- search engine technology (e.g. Lucene)
- but also need structured content repository based on XML
XML databases
- emerging class of tech that allows for storage and querying of XML documents in native format
- XQuery allows for search
- XQuery allows for extraction of elements, refactoring these to create new documents, new views
Stupid XQuery tricks!
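The extract-and-refactor idea can be sketched without an XML database. The example below uses Python's standard ElementTree as a stand-in for XQuery (it is not XQuery) to pull the title and main author out of a MARCXML record and build a new, simpler document. The sample record and the `works`/`work`/`title`/`creator` element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML records.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record with a main entry (100) and a title statement (245).
MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Example, Author</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An example title</subfield>
    </datafield>
  </record>
</collection>"""

def brief_view(marcxml):
    """Extract subfield $a of fields 245 and 100 into a new, flatter document."""
    root = ET.fromstring(marcxml)
    out = ET.Element("works")
    for rec in root.findall("marc:record", NS):
        work = ET.SubElement(out, "work")
        for tag, name in (("245", "title"), ("100", "creator")):
            path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='a']"
            sub = rec.find(path, NS)
            if sub is not None:
                ET.SubElement(work, name).text = sub.text
    return out

view = brief_view(MARCXML)
```

The resulting `view` describes the work rather than the catalogue record, which is roughly the kind of alternate view the XQuery bullets have in mind.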
- great demos and visuals for the rest of the session; I’m not sure if Alan is putting his slides online, but if he does, will let you know where they’re at.