JISC depoST Meetup
Yesterday was the JISC Deposit Show & Tell meetup in London (#depoST). The idea was to bring together everyone with a vested interest in, or actively developing, a tool for depositing to an online repository (such as EPrints or DSpace).
The day started with 5-minute lightning talks from attendees. Here are summaries of some of the most interesting and relevant to our desired use case:
Drag & Drop Deposit Tool
This was our own prototype, which we demonstrated at the event. We have taken a strong end-user perspective to design a simple tool that reduces the workload involved in submitting to an Institutional Repository (IR).
For this we have made the following basic assumptions:
- Academic Researchers are busy people
- They may not be technical users
- They are unlikely to ‘markup’ their documents
The current submission process requires far more work than is necessary to get a paper into an IR:
- Open web browser & browse to repository
- Log in
- Fill in metadata (EPrints has 12 compulsory fields)
- Upload files
We hypothesise that a lot of the metadata required can actually be found within the document that you are uploading, and we created a small desktop client in Cocoa to demonstrate this.
The ideal end-user experience is being able to drag a number of PDFs onto the ‘droplet’ application and then be presented with a report. In most cases the user would just have to accept the report, but there would also be the facility to make adjustments should the PDF extraction have made a mistake.
The current state of the prototype is such that only PDF extraction is done. Were the project to be funded, we would like to interact with the following existing projects to provide an end-to-end solution:
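To give a flavour of the kind of extraction we mean, here is a minimal Python sketch. It is not the code in our Cocoa prototype: it assumes the text of the first page has already been pulled out of the PDF, and it uses a deliberately naive heuristic (first non-empty line is the title, the next one lists the authors). The function name and the sample paper are made up for illustration.

```python
import re


def guess_metadata(first_page_text):
    """Guess title and authors from the text of a paper's first page.

    Assumed heuristic: the first non-empty line is the title and the
    next non-empty line is a comma/"and" separated list of authors.
    """
    lines = [ln.strip() for ln in first_page_text.splitlines() if ln.strip()]
    title = lines[0] if lines else ""
    author_line = lines[1] if len(lines) > 1 else ""
    # Split "A, B and C" on commas and on the standalone word "and".
    authors = [a.strip() for a in re.split(r",|\band\b", author_line) if a.strip()]
    return {"title": title, "authors": authors}


# Invented sample text standing in for an extracted first page.
sample = """
Drag and Drop Deposit for Repositories

Alice Smith, Bob Jones and Carol Brown
University of Somewhere
"""
report = guess_metadata(sample)
```

The resulting `report` is what the user would be asked to accept or correct. A heuristic this simple will get things wrong on many real papers, which is exactly why the adjustment step matters.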
SWORD is a lightweight protocol for depositing content from one location to another. It stands for Simple Web-service Offering Repository Deposit and is a profile of the Atom Publishing Protocol.
We would use this for the actual deposit to the repository. Also of interest (although not for our implementation) is the PHP library for SWORD, which was mentioned on the event's Twitter stream: http://github.com/stuartlewis/swordapp-php-library
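Because SWORD is a profile of the Atom Publishing Protocol, a deposit is essentially an HTTP POST of a package to a collection URL. The Python sketch below only builds such a request and does not send it. The collection URL, file name and package bytes are hypothetical, authentication is left out, and the `X-No-Op` (dry run) header is our recollection of the SWORD 1.x extension headers, so check it against the specification before relying on it.

```python
import urllib.request


def build_deposit_request(collection_url, package_bytes, filename):
    """Build (but do not send) an AtomPub-style POST of a zip package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": "filename=" + filename,
        # Assumed SWORD 1.x extension header: ask the server for a dry run.
        "X-No-Op": "true",
    }
    return urllib.request.Request(
        collection_url, data=package_bytes, headers=headers, method="POST"
    )


req = build_deposit_request(
    "http://repository.example.org/sword/collection",  # hypothetical URL
    b"PK...",  # placeholder bytes, not a real zip archive
    "paper.zip",
)
```

Passing `req` to `urllib.request.urlopen` would perform the deposit against a real server, which should answer with an Atom entry describing the new item.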
The Names Project
The project scoped the requirements of UK institutional and subject repositories for a service that will reliably and uniquely identify individuals and institutions.
More information on Names is given below. We would also like to make the suggestion of names more contextual, by prioritising results for researchers with whom the academic has previously co-authored.
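The contextual prioritisation could be as simple as the Python sketch below. It assumes we already hold a list of candidate names returned by a name service and a set of the depositor's earlier co-authors; both the function name and the sample data are invented, and a real version would match on identifiers rather than display strings.

```python
def rank_suggestions(candidates, previous_coauthors):
    """Order candidate names so earlier co-authors of the depositor come first.

    sorted() is stable, so candidates keep the name service's original
    order within each of the two groups.
    """
    known = {name.lower() for name in previous_coauthors}
    # False sorts before True, so known co-authors move to the front.
    return sorted(candidates, key=lambda name: name.lower() not in known)


# Invented candidates for an ambiguous author name.
candidates = ["J. Smith (Leeds)", "J. Smith (Bath)", "J. Smith (York)"]
ranked = rank_suggestions(candidates, {"J. Smith (Bath)"})
```

Here the Bath entry moves to the top because the depositor has written with that person before, while the other two keep their original order.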
Mendeley was not strictly an academic repository deposit tool but was one of the most interesting technologies on show. It is a VC-funded startup whose investors include backers of Skype and Last.fm. Mendeley is a combination of an iTunes-esque desktop client with a Last.fm-styled website. Researchers use the desktop client to organise their research: annotating PDFs, sorting, tagging and so on. It is also possible to automatically generate bibliographic references that can be pushed straight to Word.
Much like the Audioscrobbler idea, all research and additional metadata is uploaded to a centralised service. This allows researchers to access their library from the web but, more interestingly, allows Mendeley to start to extract research trends and, in the future, to introduce researchers to one another based on the profiles it has built up about them. Co-author discovery is a problem we know personally, and hopefully this will go a long way to creating a solution!
Jan Reichelt, who presented, stated that whilst not currently available, a full API was on its way!
EPrints WebDAV support
http://files.eprints.org/451/ (Video Walkthrough)
This solution was presented by the EPrints development team and offered a WebDAV or FTP mountable drive directly into an EPrints 3.2 install. The solution offered a browsable folder hierarchy with folders for each year, drilling down to a folder representing an EPrint (named via its unique numerical ID, which is not terribly user friendly).
On the deposit side, a user would browse to the ‘Inbox’ folder and deposit their documents there. Although the title of the deposit could be given to the folder created inside the inbox, no other metadata was provided automatically and the user was forced back to the standard EPrints web interface to complete the submission. It was suggested that the user might upload a Dublin Core XML file with all the metadata into the inbox folder but this seems altogether unlikely!
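For the curious, this is roughly what producing such a Dublin Core file involves, sketched in Python with the standard library. The `dc` namespace URI is the standard Dublin Core elements namespace, but the `metadata` wrapper element and the choice of fields are our assumptions; we do not know what layout the EPrints inbox would actually expect.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_xml(title, creators, date):
    """Serialise a few Dublin Core fields to an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    for creator in creators:
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(root, f"{{{DC_NS}}}date").text = date
    return ET.tostring(root, encoding="unicode")


# Invented example values.
xml_text = dublin_core_xml("An Example Paper", ["Alice Smith", "Bob Jones"], "2009")
```

Even this small example shows why hand-writing the file is too much to ask of a busy academic; it only becomes realistic if a tool generates it for them.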
This approach seemed to offer a useful alternative for technical users as it produces a mountable view of the repository within an OS. An example given was the ability to grep over all files in a repository. This, however, is not functionality that would benefit a non-technical academic and as such does not solve the issue for the primary end user!
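To illustrate what a mounted view makes possible, here is a small grep-like search in Python. A temporary directory stands in for the mounted drive, and the year/ID folder layout and file name are invented to mirror the hierarchy described above. It searches text only, so it would not see inside binary PDFs.

```python
import os
import re
import tempfile


def grep_tree(root_dir, pattern):
    """Yield (path, line_number, line) for lines under root_dir matching pattern."""
    regex = re.compile(pattern)
    for folder, _subfolders, filenames in os.walk(root_dir):
        for filename in sorted(filenames):
            path = os.path.join(folder, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file or dropped connection: skip it


# A temporary directory stands in for the mounted repository.
with tempfile.TemporaryDirectory() as mount_point:
    eprint_dir = os.path.join(mount_point, "2009", "451")
    os.makedirs(eprint_dir)
    with open(os.path.join(eprint_dir, "abstract.txt"), "w", encoding="utf-8") as f:
        f.write("Title\nA study of repositories\n")
    matches = list(grep_tree(mount_point, r"repositor(y|ies)"))
```

Each match carries the file path, so a technical user could jump straight from a hit to the EPrint folder it lives in.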
There are, in my experience, also some issues with the WebDAV protocol:
- Lack of user feedback to errors
- Connections are slow and not particularly robust
Word SWORD Plugin
This was a plugin for the Windows version of Word which provided direct deposit access to an Institutional Repository. This was one of the few projects that had a working solution that could be downloaded today! From the demonstration it was, however, clear that the code was early in development and that steps would need to be taken to improve usability.
One of the noteworthy features was the ability to support client side validation templates for repository specific requirements. Perhaps this will be a feature of SWORD 3?
Names 2 Project
The Names project is again not strictly concerned with depositing to repositories. Instead, it is a service which aims to disambiguate authors. This is a service which will be invaluable for any deposit tool or solution!
The Names project is something we have been keeping a close eye on for a while, as author disambiguation is a recurring problem within a number of our projects. The first phase of the project ended in August with no publicly available API, so it was great to hear that they had received further funding to continue the work and even better to hear that a public beta API was available.
The Names project currently uses data from Zetoc, the British Library and contextual information from research documents to build a database of all UK research authors. Zetoc is quoted as containing over 10 million authors, whilst the Library of Congress name authority file is said to have 6-7 million unique authors.
The progress is promising but on initial tests authors found in the Zetoc web service were not found by the Names prototype.