Aug 03 2011

We are moving to the Roof Terrace!

But from the hackathon we already have something useful for you: a post from Mark MacGillivray showing his and Richard Jones’ entry for the RepoFringe Developer Challenge. More info can be found in his blog post.

“This allows you to find out what is on at the fringe, whilst also checking what is on at RepoFringe, by embedding a search of the Fringe catalogue right here in the RepoFringe website! (May only work in Firefox tho…)”

August 3, 2011, 4:01 pm | Live Blog | 1 Response »
Aug 03 2011

Ian Stuart is chairing our first Pecha Kucha session. As usual you can vote for your favourite – and no eating the chocolate coins we’re using to vote!

In this session we will be hearing from:

  • Adrian Stevenson (UKOLN) – Linked Open Copac Archives Hub (LOCAH) project – use of Timemap for visualising linked data

I work on LOCAH, which is part of #jiscexpo. The Archives Hub is an aggregation of archival descriptions; Copac is similar for library catalogues. We are exposing linked data for both of these and also creating a prototype visualisation – lots more on the blog. And see also the linked data design issue resources.

We are linking to VIAF, to DBPedia, to the BBC etc. It’s important to the geotemporal side of what we are doing. We have linked around names, locations, people and subjects.

Archives Hub is already live:  – get your data here, it’s free CC0 data. There’s a SPARQL endpoint there as well as a more basic browser view. We are doing the same for Copac; it’s coming soon and should be released around the beginning of September.
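For anyone wanting to try a SPARQL endpoint like the Hub’s, the request is just a query string sent over HTTP. A minimal Python sketch – the endpoint URL and the query here are placeholders invented for illustration, not the Archives Hub’s actual service details:

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- not the Archives Hub's real service URL.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: fetch a few titles via Dublin Core terms.
query = """
SELECT ?item ?title WHERE {
  ?item <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""

def build_sparql_url(endpoint, query, fmt="application/sparql-results+json"):
    """Build a GET URL for a SPARQL query, asking for JSON results."""
    params = urlencode({"query": query, "format": fmt})
    return f"{endpoint}?{params}"

url = build_sparql_url(ENDPOINT, query)
# To run it for real you would then fetch the URL, e.g.:
#   import urllib.request
#   with urllib.request.urlopen(url) as resp:
#       results = resp.read()
```

Most endpoints also accept the same parameters via POST, which matters once queries get long.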

Visualisation prototype – several use cases – using tools like Simile, Many Eyes, Google Chart. So you can see timelines, maps etc.

We have a new project the Linking Lives Project – based around individuals much like BBC artist pages.

Key benefits of Linked Data? It has the potential to be a universal API in a way – you shouldn’t need to hand-craft things, though there is a challenge around matching things, particularly placenames etc. It’s quite a thing to do this matching. And sustainability-wise, the links to others’ work make you vulnerable to losing your work: this guy called Ed Summers put out the Library of Congress Subject Headings and it disappeared (though that is now resolved). Data modelling can be tricky and can get complex. Licensing is important but less of an issue these days. But linked data can make your repository work harder, find new channels in your data, and expose potentially hidden collections.


  • Sheila Fraser (EDINA) – Using OpenURL Activity Data

I have the challenge of talking about middleware and data – what kind of analogy can I use? I started with pizza (I’m a fan) as it gave me an idea. When I’m hungry I go to the web, I put in what I want, they figure out how to get that pizza made and at my door, and that works really well. The OpenURL router is just like pizza. You go to it, it finds the right copy and brings you the full text back. Like my pizza supplier, the OpenURL router stores some data so that I can get what I want as quickly as possible.

The log of all this data is hugely valuable but looked like a bit of a mess – we have been turning our data into something much more usable, and using it, for instance, to create recommendations for future reading. We did some work with our logs to prove that the data could help us find citations and further reading.

What else can we find out? We can see that the busy periods of the year seem to be around the exam periods. Saturday is the quietest day of the week but then there is a weird peak of usage. We can have a good look through to find times when it may be appropriate to do system maintenance work for instance.
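That sort of analysis needs little more than counting log entries per weekday. A toy sketch with an invented log format – the real router logs are richer and access-controlled:

```python
from collections import Counter
from datetime import datetime

# Invented sample log lines: "ISO timestamp<TAB>target" -- not the router's real format.
log_lines = [
    "2011-05-14T10:02:11\tjournal-a",   # a Saturday
    "2011-05-16T09:15:40\tjournal-b",   # a Monday
    "2011-05-16T11:20:05\tjournal-a",
    "2011-05-17T14:45:59\tjournal-c",   # a Tuesday
]

def requests_by_weekday(lines):
    """Count OpenURL resolutions per weekday name."""
    counts = Counter()
    for line in lines:
        stamp = line.split("\t", 1)[0]
        day = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").strftime("%A")
        counts[day] += 1
    return counts

counts = requests_by_weekday(log_lines)
quietest = min(counts, key=counts.get)  # a candidate maintenance window
```

The same counting, done per week of the year, is what surfaces the exam-time peaks mentioned above.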


ODC PL – links to lots of others experiments

  • Jodie Double (Univ. Leeds) – RePosit

Last year Sarah Malloy talked about the beginning of this project. See also our blog on the project:

We are looking at how advocacy can affect engagement with repositories through connections to repositories. There are 5 HEI partners and 1 commercial partner. We all have a research management system and separate implementations, with different numbers of repositories connected to our CRIS. We had to talk to a lot of stakeholders – that takes time, so we are a little behind where we hoped to be, but we do have more of our objectives achieved. We have a Google Group set up that you are welcome to join if you want to discuss CRIS issues.

These are community resources that we want others to use – materials that you can get out there. Is life different with a connector? Not really, but we connect in more places with more people. As the faculty team went out to talk about the system we did find that easier login helps, and we’ve stolen tag lines to help sell our repository. Open Access is the same no matter what the connector – whether a CRIS or your own stand-alone system. Have the numbers increased? Those we did advocacy with did get better numbers. But we don’t really have carrot or stick – mandates would be important here. We have a survey out: now that we offer a £100 Amazon voucher on our survey we suddenly have 300 responses – bribery works.

Please comment and join in discussion about our repositories.


Q1 – Jackie ?, from the Repositories RP) You said numbers had not increased – was that of full text?

A1) Yes.

Q2 – Kevin Ashley, DCC) You were talking about excessive matches – the scale you were getting seemed reasonable to handle (not thousands). There are a fair few online services that use just the same level of match but they are still useful as humans can handle that short list of choices.

A2) Well, four matches means a manual check. We’ve done some cheating here – we’ve taken that as a starting point. It still felt like a challenge. In some cases those matches may be more important, or more difficult to sort between, than others.

Q3 – Peter Murray Rust) Glad to see that the author data is open. What percentage of the other data is open?

A3) It’s a small subset. We’d like the whole archive to be open. We have a stylesheet that we’d like to apply, but we think we should be able to make the whole thing CC0. Copac is the same actually. Initially RLUK were quite uncomfortable about us making their data open but they are fine with it now; it seems to have momentum now.

Q4) In the OpenURL Data how many years or months does it cover?

A4) It covers everything from the 1st April this year. We spent a lot of time looking at data privacy concerns. It’s hidden underneath, we needed permission for that data so we had that from the 1st April and that will be made available on an ongoing basis.

Q5 – Balvier Notay, JISC) Interesting use of SHERPA ROMEO in the symplectic system?

A5) It’s been an incredible journey actually. We get the added value that wasn’t in the system before – email reminders to faculty saying “by the way, you have a paper to deposit”.

A5) Richard, Symplectic: what’s been really interesting is that you can prove to repository managers what their potential deposits could be – realistic expectations of what could be collected in an ideal world.

  • Marie-Therese Gramstadt (VADS) – ‘Kultivating’ artistic research deposit

Kultivate is a JISC-funded project under the Reposit strand. It came from the Kultur II Group, which is a group of researchers and repository managers working around the arts. One of the key issues is: what is artistic research? It is quite a young subject. Another issue is describing the artistic researcher – e.g. the anonymous Carrot Collective. One of the barriers to deposit is terminology. When I say “repository”, people say “suppository”. And what is the alternative? Maybe “archive”, but there are connotations around control and authority that can be negative.

But the repository can enable the collection of ephemera around exhibitions – such as an invitation. One of the problems with artistic research is that it is an ongoing process; researchers want to be able to edit. A Royal College of Art case study has looked at simplifying the process. There are also administrative barriers at present – ones that can be overcome – such as needing separate logins for repositories.

EPrints projects container – a wrapper for project items. Important, as artistic research is often complex, with multiple objects and part of a project-type process. Some depositors refused to include their content until the repository was customised for their artistic research.

One of the tips we picked up was to have high-profile champions for your repository. The Goldsmiths case study also suggested tailoring advocacy for each department. Researchers are not clear what the repository will be used for. We know it is for the REF, but some wondered if it was for performance review. Making links to personal researcher websites is also important to getting researchers to understand that the repository is a one-stop shop. Also, highlighting improvement in Googlability has been crucial – one researcher said that when talking in the US you get Googled, so having material visible there is a great driver.

  • Richard Jones (CottageLabs) – SWORD2

Ignore the slides – they are just for Ian!

I will talk about SWORD2 the process ( documents the protocol). This has been an awesome community project. I ran about asking people for feedback and comments. We took that away and turned our white paper into a JISCpress page – a great way to get commentary on your content paragraph by paragraph. We had people commenting from all over the world and that was very useful and fantastic and helped us see where we were going with SWORD2. At that point JISC funded us for the rest of the year to turn that proposal into a real thing. The advisory group is open – do join us! That group included many of the people that had commented on the white paper. A great blend of senior and technical people and we had a savage discussion that led to a technical paper and a business case. And we tore that apart too. And then we shared an alpha version, made available on the website with version control.

We have had developers from all the main repositories, we have programmers, we have lots of people developing different implementations of the same standard. So we have a massive amount of technology to let SWORD be used in contexts we haven’t even thought about yet. The current version is still only a beta – it was launched just before OR11 and led to whole new requests and use cases for SWORD. JISC have now given further funding to investigate data deposit aspects. We also have some money for client developments – we have a call out to develop support for SWORD in your system.

We’ve worked hard to make this come out of the grassroots needs of this community. All who wanted to be involved can be involved. We’d like you to continue contributing to that and giving us feedback. I haven’t told you what SWORD is, that’s not so important.

That’s the last of our proper Pecha Kuchas – for voting on – but this is a wee presentation of a similar overall time.

  • Extra talk: Charles Duncan (Intrallect) – Deposit from a mobile phone (PK with a difference!) *

This is about using SWORD as a standard to deposit via your mobile device – whether tablet or phone. So first an example with a video!

A pupil doing a school project on Mary Queen of Scots at the National Museum of Scotland. She’s taking pictures on her mobile and she can deposit her images from her phone. That could be images, sound, text etc. – whatever you want to upload in this way. The whole thing is built into the system – so we take a picture outside the Intrallect office, share the image, and one of the options on the list is IntraLibrary. You can see that you are asked for key information – title, group, tags, etc. This is informal metadata. You can then hit the magic Upload button. Now when you log in to the repository – this works on all IntraLibrary repositories – it has already grabbed the file’s name, size, etc. and it appears in your resource page. You can choose to automatically publish when you post stuff, depending on your settings.

To get set up you need to have your username, password, and IntraLibrary URL (for SWORD deposit).
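Under the hood a SWORD deposit is essentially an HTTP POST of an Atom entry (plus the file) to a collection URI, using credentials like those above. A rough sketch of building such an entry with the standard library – the element choices are illustrative, not IntraLibrary’s exact requirements:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def make_atom_entry(title, tags):
    """Build a minimal Atom entry carrying the metadata the app collects."""
    ET.register_namespace("", ATOM)
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    for tag in tags:
        ET.SubElement(entry, f"{{{ATOM}}}category", term=tag)
    return ET.tostring(entry, encoding="unicode")

entry_xml = make_atom_entry("Mary Queen of Scots photo", ["museum", "history"])
# A real deposit would then POST entry_xml (and the image file) with HTTP
# Basic auth to the repository's SWORD collection URI, e.g. via urllib.request.
```

The mobile app is doing essentially this on the user’s behalf, which is why only a username, password and URL are needed.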

This is currently in beta. It only works on Android at present and supports IMS packaging – intended for learning object repositories. We are very interested in partnering and projects, and of course if anyone wants to test it.


Q1) Was wondering whether Intrallect is specific just to one type of repository or whether you could use it for any others?

A1) Currently it supports IMS packaging but other than that it could work with any IMS and SWORD compatible repository

Comment) Would be a great app to go forward for the call for SWORD 2 App development

Q2) For the culture presentation – you mentioned that sometimes repositories are quite hard to find, they have weird names – can’t you tell them to call their repository something obvious?

A2) Maybe we need a rebranding exercise but really comes down to advocacy

Q3 – Peter Burnhill, EDINA) Shameless plug: have you considered “put it in the Depot?”!

A3) We did have a workshop on terminology and we did try various phrases on them.

Q4 – Peter Murray Rust) If SWORD2 is awesome! why is there SWORD 3 or will there be?!

A4) Actually SWORD2 is the fourth SWORD project. But we think SWORD2 covers all we’ll need, so we shouldn’t do a SWORD3 – we shouldn’t need to.

Q5 – Balviar Notay, JISC) Highlight for the Kultur project – there is a huge engaged community around the Kultivate project here. There is more work being done in the research project in terms of research papers. The project has done an amazing job of pulling in the community and engaging them in preservation, in terms of advocacy.

And with that we are done with the main bit of Day 1! Thanks to all who have followed the live blogging today. We will be back tomorrow and there’ll be tweets coming out of the drinks reception that we’re all heading to up on the lovely informatics roof terrace just now!

August 3, 2011, 3:30 pm | Live Blog | 1 Response »
Aug 03 2011

I will be keeping occasional notes on my own Round Table session this afternoon – on Social Media and Repositories but we would welcome guest posts after the event on the other Round Tables.  Those running this afternoon are:

  • What Needs to be in a package when transferring into your repository? (Chair – Ian Stuart & Theo Andrew, EDINA)
  • Repositories and Linked Open Data (Chair – Adrian Stevenson, UKOLN)
  • Social Media & Repositories (Chair – Nicola Osborne, EDINA)

Full notes on the Social Media session will appear here this evening. For now we are back to live blogging.

Brief reports from round tables – Facilitator – Robin Rice

Linda Kerr on the Social Media round table

We started by going round the group to see what we were interested in. Some were tweeting deposits, some were just interested.

Glasgow Enlighten has a tweet button for each paper, researchers can do that to raise their profile and to comment

Mostly we talked about raising the profile of, and promoting, items through social media. We particularly talked about Twitter and the idea that impact could be demonstrated through that sort of activity. And we also talked about not mixing up automatic tweets with public engagement type tweets and materials. We also talked about researchers and their reluctance to take up social media, and the possibility of raising the profile of materials through social media. Also about tapping into social networks and communities – like those on Mendeley and in other social spaces.

Usage statistics – important to get feedback. If we are using repositories, what do academics find useful? Links to social networking profiles – perhaps to a researcher profile page. A way in which a researcher can raise their profile in the community – and perhaps their Amazon author profiles.

And we talked about Google Scholar Citations – William retweeted a link to blog post about this – a whole new community for researchers – is that a threat or an opportunity?

Theo Andrew on SWORD packages for repository deposits.

SWORD is a protocol for depositing content into repositories. We had a very focused chat on what kind of packages we need to actually move content from point A to point B. SWORD is very simple, based on AtomPub, and we should use any extensions only very sparingly. We looked at the minimum data required for data transfer – really a URL would be the basic minimum. We were very concerned with how we encourage repositories to share, especially when repositories all do their own customisations and have differing needs. We talked about standards – they can be an answer but are generally more of a problem. Negotiation would be a better way to handle this: any service for transferring content can interrogate a repository for what it understands – what metadata fields. This is particularly important for our Repository Junction project, which will take data and place it in a series of appropriate repositories. The broker in this sense makes a lot of sense – you have a relationship with a single broker.
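That kind of interrogation is what AtomPub’s service document provides: a deposit client fetches it to learn which collections a repository offers. A sketch of parsing one – the sample XML here is invented, and real SWORD service documents add extension elements not shown:

```python
import xml.etree.ElementTree as ET

APP = "http://www.w3.org/2007/app"
ATOM = "http://www.w3.org/2005/Atom"

# Invented sample service document; real ones carry SWORD-specific elements
# (accepted packaging formats, mediation flags, etc.).
service_doc = f"""
<service xmlns="{APP}" xmlns:atom="{ATOM}">
  <workspace>
    <atom:title>Main Repository</atom:title>
    <collection href="http://example.org/deposit/articles">
      <atom:title>Articles</atom:title>
    </collection>
  </workspace>
</service>
"""

def list_collections(doc):
    """Return (title, href) pairs for each deposit collection offered."""
    root = ET.fromstring(doc)
    out = []
    for coll in root.iter(f"{{{APP}}}collection"):
        title = coll.find(f"{{{ATOM}}}title")
        out.append((title.text if title is not None else "", coll.get("href")))
    return out

collections = list_collections(service_doc)
```

A broker like Repository Junction could use exactly this kind of lookup to decide where, and in what form, to hand content over.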

Peter Burnhill on the Linked Open Data round table

Our topic was Linked Data and repositories. In some sense we should have asked what can linked data do for repositories, and perhaps what can repositories do for linked data. Part of this issue was about whether linked data is for the metadata or for the object. In some sense objects are all different aside from having common forms of metadata.

Motivations for linked data: to some extent it’s more about the metadata – content is often in PDF form. There was talk of giving something a URI and having everything connect to that. Then we went off into why institutional repositories should be interested in Linked Data. Partly it was about making content more accessible – another channel, if you like. Another interesting idea was that this was a way of putting repositories and their content on the linked data map. But then a debate about how to make a start – how to reach a base point in using linked data. Assigning URIs or publishing minted URIs could be the way to go. Fedora definitely does this, EPrints is doing it, DSpace has it in sight. The URIs are already there, perhaps even for the metadata. Essentially it’s about assertions without trust – a big argument that one should just do it, and that the value far outweighs the risk. And the same for authors, papers etc. Although names are messy, identifiers are less messy; identifiers for organisations are easier than for people, who move any time, any place.


Q1) What do you think about people that tweet publications or journals? Should closed materials be tweeted, or should only open ones be tweeted? Surely this leans towards an open access model for greater impact.

A1) Put impact to one side; there is value even for a non-full-text paper. There’s a discovery element there – one of the things we need to consider. Perhaps we distinguish between open access and non open access tweets. The question of altmetrics, tweets, etc. and impact is going to be more important, and so is the way that Google Scholar Citations works. REF will be looking at it at the narrative level of impact, not at the counts etc. Anecdotally, tweeting impacts on rankings and searchability.

My comment: People do tweet about New York Times links. But there is an issue of expectation management here and we should distinguish news feeds or everything feeds from public engagement type content.

Q2 Balvier Notay, JISC) You were talking about usage statistics and I was wondering if anyone in the Social Media group mentioned the PIRUS2 project – aggregating statistics from repositories and publishers, normalising them and in harvesting statistics centrally they will do COUNTER compliance. We are looking at a statistics service type thing at a national level.

August 3, 2011, 2:06 pm | Live Blog | 1 Response »
Aug 03 2011

We are fresh from coffee and Philip Hunter has just introduced our first speaker in this session who is coming in live from Skype:

Thomas Krichel (Long Island University) – AuthorClaim (via Skype)

My co-author here is ?. Wolfram is the Chief Information Officer for Scholarly Information at Bielefeld University; they have run the BASE search engine since 2004. It’s not really attached to any one funded project but is a long-run concern. I too am interested in running things over the long term. I run RePEc and have been involved in repositories since the early 1990s.

The motivation is to make (economics) papers freely available – the full text of those papers – to make information about the papers freely available, and to have self-sustaining infrastructure for these materials.

RePEc is misunderstood as a repository; actually it is a collection of around 1300 institutional (subject) repositories from libraries and research centres with specialist collections. It predates OAI, it is a reduced business model, more tightly interoperable. There are lots of sources of its success: the business case is decentralised as much as possible, it runs on volunteer power, and RePEc encourages the reuse of RePEc data – we aggressively push out the data we have collected, as we think this is in the best interest of those who have set up these repositories.

The RePEc technical case:

RePEc registers authors with the RePEc Author Service (RAS). We register institutions. And we provide evaluative data for authors and institutions. So what is the relationship with repositories? Well, it’s a bibliographic layer over repositories. IRs can/will benefit from a similar layer around them – a free bibliographic layer that places the IR in the wider context. The requirement for such a layer is that it is not dependent on external funding, it’s freely reusable instantaneously, and it must be there for the long run.

A RePEc for all disciplines:

  • RePEc bibliographic data -> 3lib
  • RePEc Author Service -> AuthorClaim
  • EDIRC -> ARIW – I won’t talk about this, it’s a topic for another day.

3lib is an initial attempt at building an aggregate of freely available bibliographic data, a project by OLS sponsored by the Open Knowledge Foundation. The data elements are very simple, as it is designed not to run into copyright issues and primarily to be for author claiming: title; author name expressions; link to item page on provider site; identifier. 3lib is meant to serve AuthorClaim.

AuthorClaim is an authorship claiming service for 3lib data. I started the first author claiming system, for RePEc, in 1999; the system was set up by me and written by Markus J. R. Klink. Author claiming is not the same thing as author identification – the difference is “Klink’s Problem”. The actual AuthorClaim data is CC0 licensed and available as XML for reuse. The data on refused papers helps the system to build learning models for author names.

IRs and author identification: generally the task is too large to perform author identification for IRs, yet IRs are too small to make it meaningful for authors to claim papers in them directly. Only registration of contributors is usually required. ORCID offers possibilities here, but doing it for each publisher isn’t perfect. AuthorClaim lets you put all papers by an author together, and the task can be completely automated once an AuthorClaim record claims a paper in the IR. You have an incentive for people to actually claim their papers at first.

We have formed a partnership with BASE, as they already have a centralized collection and can deliver the AuthorClaim data. They constantly monitor the OAI-PMH world, and they normalize data and provide an API – REST, SOAP and rsync – for AuthorClaim. The BASE data in AuthorClaim is selected to those records which include author, title, link and identifier. AuthorClaim discards some IRs that contain student work, digitized old material, link collections, or primary research data – though in principle it could be extended to data etc. There are also some minor manual exclusions (e.g. UK PubMed Central, as it is already in PubMed).
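That selection rule is easy to picture in code. A sketch, with an invented record layout rather than BASE’s actual schema:

```python
# The four fields AuthorClaim needs from each harvested record.
REQUIRED = ("author", "title", "link", "identifier")

def select_records(records):
    """Keep only records carrying non-empty values for all required fields."""
    return [r for r in records if all(r.get(f) for f in REQUIRED)]

# Invented sample records, shaped as plain dicts for the sketch.
sample = [
    {"author": "A. Smith", "title": "On Widgets",
     "link": "http://example.org/1", "identifier": "oai:example:1"},
    {"title": "Anonymous report", "link": "http://example.org/2",
     "identifier": "oai:example:2"},  # no author -- dropped
]
kept = select_records(sample)
```

Whole repositories (student work, digitised old material, link collections) would be excluded before this per-record filter ever runs.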

So far there are 1930 repositories and about 12 million records. About 534 records have been claimed in the system. Documentation is at: http://wotan/ – beware that this needs a little debugging. The collection is not yet announced because it is being read – some more time needed.

For more information contact myself ( or Wolfram (


Q1 – Peter Murray Rust) I congratulate you on what you’ve done. The key thing for repositories is to create this bibliographic overlay. It’s impossible to search repositories in the UK at present. Have all 1900 repositories been done by you – the analysis of the API etc. – or have you farmed out to volunteers?

A1) I’m not providing search services for repositories. I am working on a search service for authors – a project called Author Profile (I spoke about this in Boston in June) – searching for authors, bringing their work together. I’m not doing searching at this time. We do have Google, but we need elements in repositories to be more available to search engines. PageRank requires a more linked world – we need to bring in more links to items in repositories, and an author profile used elsewhere will create in-bound links to the repository. These links will help the document to rise up in the search engine. So I’m not doing search particularly at this time, but we all need to work on different things. I’ll probably be doing this until the end of my working life, and others will be working on search. We just all need to work together.

And with that Thomas is off to the (Siberian) beach! Next up…

Mo McRoberts (BBC Data Analyst) – BBC Digital Public Space project

I work on the BBC’s Digital Public Space Project. Three things you should know about the BBC:

  1. We like to do things big
  2. We like things where we have no idea what will come out of it
  3. We like silly names!

We are looking at ways to make the best use of the BBC archive, and we are trying to find out how we should fit into the digital world. Last year we published the “Putting Quality First” BBC Strategy Review ( That review said we should open the archive to the public and work with the British Library, BFI and Arts Council England to bring other public archives to a wider audience. My job is to see if this is technically possible and then how it could be done. The review went to the Trust last year and the BBC Trust has approved the move to make the archive open to the public. So we have to do it, but we don’t know when and we don’t know how – hence this project.

The BBC Archive has 2.3m hours of film and video, 300k hours of audio, 4m photographs, and 20k rolls of microfilm – it took us 2 years to find out the scale of it! There is also sheet music and ridiculous amounts of other material. A bit of it is digitised – 206k digitised radio programmes, 23k digitised TV programmes – and there is an ongoing project to digitise it all, effectively a digital tape library. The underpinning mantra of this project is: how do we maximise the value of this stuff?

A lot of the things we need to do here are not only important for the BBC but also to other archives of cultural heritage. Is YouTube part of the cultural heritage? That skateboarding cat might be a really important moment – but for now we are focusing on the well-known institutions like the BFI, Kew, NLS, LLGC NLW, the National Archives, the National Maritime Museum, the British Library and the Royal Opera House. So we thought: why don’t we link these collections together? We have been looking at how to make those journeys between materials work well. We don’t have long-term internal funding, but we are working in partnership, and if we can demonstrate the potential of this working then it could become something big and cool and useful. Right now it’s a tiny little thing that we hope will become big in the future.

Right now the technical bit!

All institutions maintain catalogues of stuff best suited to archivists. Some point to physical assets, some to digital assets, some do not point to assets at all. We all deal with our data in very different ways. If we could express what we do in a common way – a way which allowed links between things and the assets, using a well-known grammar – then we could probably do something quite interesting with that. So we are taking dumps from the participating institutions – there was no particular selection process, by the way; the ones who gave us data fast are in – and we are publishing RDF XML on a private server for each institution. That data pushes into a central aggregator. We make use of a single golden rule: “give everything a single permanent URI, and make the data about that thing accessible at that URI” (or rather, you give your assertions about things a URI).

The aggregation is evaluated via a straightforward logical process – are two things the same? – but also some heuristic stuff: we build a full-text index to mine, and evaluate new material against it. We use scoring of that evaluation to decide what is and is not the same. We also match the things to external sources – DBPedia Lite, GeoNames, FreeBase etc. We create a stub object. We are opening the archive to normal people, so we rearrange the catalogues a bit as they come in. We break items into thing, person, place, event or collection. The stub object has a type (e.g. Person) and relationships to things it’s matched to (e.g. George Orwell). We deal with real-world things rather than individual entries in the catalogue. We express relationships between stubs and source entities as skos:exactMatch (or non-exact matches). We also take any references and reflect them. We call these stub objects as they are just a reflection of the evaluation process. It’s a hard design constraint that whatever data goes in, you should be able to get it out again verbatim. We don’t need lots of data attached to the stub – enough to do top-level browsing and indexing – we leave everything else in the source objects, and then you can follow your nose, which is why we have cached this data. If internet connectivity was better we wouldn’t even need to cache it.
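By way of illustration, a stub of this shape could be serialised as a Turtle fragment along these lines – the URIs, the type, and the helper function are all invented for the sketch; only the skos:exactMatch relationship comes from the talk:

```python
def make_stub(stub_uri, rdf_type, exact_matches):
    """Emit a minimal Turtle fragment for a stub linking to its source matches."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"<{stub_uri}> a <{rdf_type}> ;",
    ]
    for match in exact_matches:
        lines.append(f"    skos:exactMatch <{match}> ;")
    # Turtle ends a statement with "." -- swap the final " ;" for " ."
    lines[-1] = lines[-1][:-2] + "."
    return "\n".join(lines)

stub = make_stub(
    "http://example.org/stub/brazil",          # hypothetical stub URI
    "http://schema.org/Place",                 # illustrative type
    ["http://dbpedialite.org/things/3470"],    # illustrative external match
)
```

The real system keeps the stub this thin on purpose: everything else stays in the source objects, which the stub merely points at.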

Exciting! An actual stub object for the Republic of Brazil. The key things are that it’s a place, it has taken on the types of the source data, and it has some references to DBPedia Lite and some source data (BBC News On This Day – which I cheekily scraped!). And from that data we can build some interesting interfaces – building a user interface on top of it is a doddle. In order to get people to build stuff you need to be able to get them to browse that data, so we are building this for all resources. We are also building something called “Forage” – a search-driven debugging tool to see the raw data and the relationships. And then we have the Digital Public Space interface that we commissioned a firm to produce for us – we asked them to produce something a bit left of field; they have a lot of experience of video aggregation. You’d think at the BBC we’d have lots of AV material for all our entries. We will, but it’s not that easy – getting anything internally is far harder than getting it externally from project partners. This will change over time but things don’t move quickly. So this interface combines our data with that company’s existing video aggregation data.

There are a few hard constraints that we are trying to keep to. We want to maintain the provenance of everything, so that if the data is preserved but technology has changed massively, you will still be able to do useful things with it. So we are looking at things like digitally signing the source data as it comes in – challenging in RDF – and we want it to be open to all comers as a read and write database. Ultimately we want all partners to provide their own data and just link it together, but that’s a way off for now. – a blog post here by me gives further information on the project


Q1) This is huge and awesome. Is there any chance of open sourcing the code?

A1) Yes, we will be open sourcing the code but we need to get to the end of this project, and we have some paperwork to do. We would like to open it up to the academic community within about 18 months – an actual running version. All of the metadata should be fine but how many of the assets will be open we are not quite sure. We are trying to find the right frameworks. The code should be open source in a fairly short space of time. As the author of it I have to say it’s not about to set the world alight.

Q2) Perhaps an unfair question. You've brought to our attention that the BBC defines a phenomenon of "The Public Space" and the "national interest". This is a political move. Given that we are engaged in the same sort of activity, in a public rather than a private and owned space, how do our activities relate, and how do we start to recognise each other and work with each other?

A2) It’s a difficult one. The edges are always fuzzy. We are getting better at it as well. We have been talking to the JISC and the OU in this project, also with the University of Westminster amongst others. We are not trying to draw a line in the sand about this being only arts and culture. We want this aimed and available to academics for research purposes. The BBC as an institution – I work in the Archive (part of BBC Vision) I also work with R&D and we like our research. We are very open to working with others. Perhaps the whole organisation doesn’t share that view now but it’s getting there. There is no choice but to engage with as many different interests as possible – for good and for bad. The academic community is a big and significant part of that though and that will only get bigger over time.

Ben Ryan (University of Leeds) – Timescapes Project

Timescapes is an ESRC funded project running for 5 years, looking at how family relationships change over time. I am the technical officer for Timescapes and I'll be talking about the Timescapes Next Generation Archive – but we don't have a second tranche of funding so we will deliver a proof of concept by the end of the project in early 2012.

We have been working with a product called Digital which is hosted by Leeds University. This platform sees all files as digital objects and does not allow modelling of complex structures of information and their inter-relationships. You can't easily display connections and context around materials. We want to publish, archive and allow secondary research on data, and that brings huge challenges. We have been looking for solutions for social science longitudinal data storage and delivery.

We chose Fedora as it has a Content Model Architecture allowing the researcher to see connections in meaningful terms. And it allows multiple views onto the data. It allows the creation of content models – say we have an interview: is it anonymised? partly redacted? We have different levels of access to data so we need a flexible model that enables that. We also need to link data objects. Fedora allows us to link concepts and to set up our own relationships. It is all based on RDF triplestores and that is hugely powerful.
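The power of those RDF relationships can be shown with a toy example. This is illustrative only – the predicates, object identifiers and collection names below are made up, not Timescapes' actual Fedora model.

```python
# Object relationships as (subject, predicate, object) triples, in the
# spirit of Fedora's RELS-EXT. All identifiers here are hypothetical.

triples = [
    ("ts:interview-01", "rel:isMemberOf", "ts:men-as-fathers"),
    ("ts:interview-01", "rel:hasVersion", "ts:interview-01-anonymised"),
    ("ts:interview-02", "rel:isMemberOf", "ts:men-as-fathers"),
]

def members_of(collection, triples):
    """Subjects related to a collection via rel:isMemberOf."""
    return [s for s, p, o in triples
            if p == "rel:isMemberOf" and o == collection]

# All interviews grouped under the (hypothetical) "men as fathers" study:
print(members_of("ts:men-as-fathers", triples))
```

The same triple store can answer "which version of this interview is anonymised?" just as easily, which is what makes the flexible-access-levels requirement tractable.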

So our current archive shows the relationships between data on men as fathers (a particular study in the project), we can group material by interviewee, by waves of research, etc.

The services mentioned earlier are responsible for producing the views of relationships within the archive – these are built to suit the needs of the researcher. You can access whole groups of material or perhaps just case by case – both depending on who the viewer/reader is. We have flexibility there that allows us to differentiate between "types" of social science data such as DDI or QuDEx. You can't just look at one object; we want to link internally, conceptually and thematically within the system.

Solr is being used for searching and browsing – it's off the shelf and easy to set up. It will look for data objects that have any of the search terms in pre-configured DisMax metadata fields. We can set up custom searches really quickly for our researchers. We can also do advanced searches and get these up and running fast. I am the only resource on this project so this has been a very fast way to build a nice system. We also use jQuery here. We have been using MIT SIMILE tools for faceted browsing and searching as well.
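A hedged sketch of the kind of DisMax request being described: searching boosted metadata fields with the `dismax` query parser. The field names, boosts, core name and host here are assumptions for illustration, not the Timescapes configuration.

```python
# Build a Solr select URL using the DisMax parser with boosted fields.
from urllib.parse import urlencode

def dismax_query(terms, fields_with_boosts, rows=10):
    """Return a Solr query URL searching the given fields with boosts."""
    params = {
        "q": " ".join(terms),
        "defType": "dismax",
        # qf lists the fields to search, each with a boost factor
        "qf": " ".join(f"{f}^{b}" for f, b in fields_with_boosts.items()),
        "rows": rows,
        "wt": "json",
    }
    return "http://localhost:8983/solr/archive/select?" + urlencode(params)

url = dismax_query(["fatherhood", "interview"], {"title": 2.0, "abstract": 1.0})
print(url)
```

Because DisMax only searches the fields listed in `qf`, adding a new "custom search" is largely a matter of changing that one parameter, which fits the fast-turnaround approach described.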

Another reason for choosing Fedora is that it has XACML. It is crucial that we keep this data well protected, especially in its raw form. XACML lets us bring the policies from the repository right down to specific data objects. Fedora manages this, and that means we have good reliability and an audit trail around authentication and authorisation.

So the system is based on three sources: DDI, QuDEx and Timescapes metadata. This is ingested, via an XSLT transform, into Fedora via METS. We then connect up multiple search and functionality elements and a PHP web app that sits on top.


Q1) Can you explain a bit about the benefits that you’ve seen – you described the subject, predicate, object model. Often people only find that useful when you combine data with lots of other systems. Presumably for your work you could have had a relational database instead – could you outline why this was useful? Is there an intention that the ESRC’s other projects might benefit from this?

A1) It was mainly because it was in Fedora. We could define our own ontologies. We use the flexibility of the RDF to do our structural stuff. We could move into combining that with other data but we haven't yet. We are working closely with the UKDA about the use of these technologies; there are very close relationships and connections there.

Yvonne Howard (Southampton Univ.) – Campus ROAR

I work with Pat McSweeney and Andy Day at ECS at the University of Southampton. We were looking at learning materials and we looked at EdShare, Humbox, Language Box etc. But we started thinking beyond these repositories about scholarly discourse. Where does scholarly discourse take place? It was once about scholars in a big room where everyone knew what was new. It was easy to follow the discourse. That 19th century form didn't change much until the mid twentieth century, perhaps until the internet.

What is scholarly discourse now? It's websites, online journals, social media locations. It's not just a small group in the room but conferences all over the world. And yes, you get the article, but a lot of what happens is ephemeral. When people talk about their research at a conference, it's gone. The slides, tweets and blogs disappear. It's not connected anymore; it's not all in that one room. One thing we know is that there is a lifecycle going on. We researchers get inspired; it's a dynamic process and so is the research at the heart of that discourse. So how can we start to support that within a scholarly discourse idea?

Mostly we think of repositories as being about archiving, storing and keeping material safe and permanent. But what if we had a research repository that captured some of that discourse? We would want to archive not just the data but the discourse, the scholar and their presence. How do you showcase interesting research? Well, we can syndicate new research and showcase researchers. We want to make things engaging. So we hosted content as well as metadata, capturing discourse and commentary about it. And you have a community that highlights awareness. And we want to reuse what's going on in the Web 2.0 world. We have new formats in place here – iPads and iPhones etc. We are extending the concept of the web/RSS feed – and we provide engaging magazine-style products. And this is based on content syndicated through RSS, Twitter etc.

But how do real repository users respond to the idea of using RSS? People see it as geeky. Take-up of RSS from teaching and learning repositories was poor, so we asked people why – it scares users or seems unmanageable to them. But people seemed to like Twitter – what's the difference? Well, it's easy to use and understand.

So Campus ROAR is an editorially mediated institutional publication – how do you make content available and capture that scholarly discourse – plus tools for using that data. Cue a demo from Andy and Pat. We've been looking at making more digestible content from what we have in our repository. We have made an EPrints plugin (available in the Bazaar for EPrints 3.3) that makes content customisable and digestible. You can build a custom feed for academic news in your area. A web spider crawls the university webspace and identifies the keywords; the user can input their own keywords and it outputs a custom feed – it filters content for you. See:
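A toy version of that filtering step can make the idea clearer: given crawled pages and a user's keywords, keep the matching pages and emit a minimal RSS feed. The data structure and field names below are illustrative assumptions, not the Campus ROAR code.

```python
# Filter crawled pages by keyword and emit a minimal RSS 2.0 feed.
import xml.etree.ElementTree as ET

def build_feed(pages, keywords):
    """Keep pages whose title/body mention any keyword; return RSS XML."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Custom academic news feed"
    for page in pages:
        text = (page["title"] + " " + page["body"]).lower()
        if any(k.lower() in text for k in keywords):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = page["title"]
            ET.SubElement(item, "link").text = page["url"]
    return ET.tostring(rss, encoding="unicode")

pages = [
    {"title": "New repository launched", "body": "Open access news",
     "url": "http://example.ac.uk/1"},
    {"title": "Campus sports day", "body": "Fun run results",
     "url": "http://example.ac.uk/2"},
]
feed = build_feed(pages, ["repository", "open access"])
print(feed)
```

The real plugin of course does the crawling and keyword extraction too; this only shows the final filter-and-syndicate stage.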

At the moment we have the Edinburgh, Glasgow and Southampton campuses already crawled for today, but we're happy to add the campus of anyone here!

The feed is designed to look great in Flipboard on the iPad and in similar apps. We'll be doing a Pecha Kucha in more detail as well! Go check out the website. The other part of Campus ROAR is the EPrints Publisher plugin.

Q1) Are you planning to crawl more widely?

A1) At present we have 3 institutions included and we try to keep track of where the news is from. It's brand new, but we hope to be able to filter it down to specific campuses if you want to – for use by your comms team, say. Worth noting that it takes time to crawl new universities, so it would take time to broaden out.

 August 3, 2011  Posted by at 10:45 am Live Blog Tagged with: ,  1 Response »
Aug 03 2011

Welcome and Introduction by Prof. Jeff Haywood, VP and CIO University of Edinburgh

Jeff spends quite a lot of time drifting around the edges of repositories, and gets the sense of it being quite an interesting time in terms of repositories that hold publications and that hold data – the very way of storing and translating data into academic publishing, and also the grey ways academics have traded their data. We used to openly trade within such domains but only between those who knew and trusted each other. There's a step up now that is part of a general move towards open everything, and that will really challenge us in terms of long term sustainability at a sensible price.

It’s also interesting to see the number of events targetted at senior managers, those staff making policy decisions about how the university will act. We see examples of “open university x” as people make a concious decision to make their outputs available. There is a recognition that the open agenda is important, but there is also some hard action to be taken to work out how that is funded and sustainable. It was be good if we had a strong human and physical network across the UK and I know that JISC is doing some work on this. In Edinburgh we have done quite a lot of work around research data management and storage. We have spent some time making that process one process and defining roles for academics and support staff. We have work now to encourage and ensure compliance. My colleagues Robin Rice and Sheila Cannell will both be able to speak more

Two final events to announce: Open Repositories 2012 is coming to Edinburgh next July, and the annual international preservation conference from the DCC takes place in December in Bristol. Finally, I do hope you will have time to see some of the Fringe whilst you're here!

And with that Jeff hands over to Stuart Macdonald who is chairing our first session:

Opening Keynote: Eloy Rodrigues, Director of the University of Minho Documentation Services

Eloy has been heavily involved in repositories for some years and is currently working on the Open Access Science Repository, a project that began in 2008. Over to Eloy:

I will be talking mainly about the RCAAP project – Repositorio Cientifico de Acesso Aberto de Portugal

We started our first institutional repository – Repositorium – at Minho University in 2003, and our policy in 2005. In 2005 the activity on open access was still very limited in Portugal. SciELO already existed in South America and we set up a Portuguese section; we also set up the first conference on open access.

In November 2006 the Portuguese Rectors Council (CRUP) published a “Declaration on Open Access” and created a working group – there was real support for moving forward with this.

Before our RCAAP project started we had fewer than 10 repositories and fewer than ten thousand items across them. When the project began we aimed to set up a portal to promote the visibility of Portuguese research, improve access to national scientific output and to integrate our work into the international context. At this time there were very few university repositories so there was much to do.

UMIC – the Knowledge Society Agency – funded the project; also helping to govern it were FCCN (general and infrastructure services) and the University of Minho (scientific and technical expertise).

Although I will focus mainly on the policy and management of the repository I wanted to give some idea of how the infrastructure for the project is set up – we have two clusters and a database, some of this was set up in 2008, some is still being completed.

The RCAAP Portal is an OAI-PMH Harvester and Data Provider, a Search Interface, a Directory of repositories, and SARI.

Repositories are either visible in Google or they are not, so a search interface is useful but not the most important thing.

We do a daily harvest and indexing of the full text of every Portuguese repository, we validate the results, and we send the harvesting report to repository administrators on a daily basis. The OAI-PMH interface is also provided here:
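The harvesting side of this is standard OAI-PMH. As a minimal sketch, the snippet below parses a hand-written `ListRecords` response and pulls out Dublin Core titles using only the standard library – the sample XML is illustrative, not real RCAAP output.

```python
# Parse an OAI-PMH ListRecords response and extract Dublin Core titles.
import xml.etree.ElementTree as ET

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Um exemplo</dc:title>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

def extract_titles(xml_text):
    """Return the text of every dc:title element in the response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in
            root.iter("{http://purl.org/dc/elements/1.1/}title")]

print(extract_titles(SAMPLE))  # ['Um exemplo']
```

A real harvester would also follow `resumptionToken` elements to page through large repositories, which is how a daily whole-country harvest stays feasible.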

Our search portal combines institutional repositories, several national repositories, and one Brazilian repository. We have a Search Portal where you can search either only Portuguese or both Portuguese and Brazilian content. We have various search options, tag clouds, etc. When you search, each result shows the title, linking to the full text in the repository, and authors link to the current research information system – so you can find profiles for the authors. You can also share each item via social media.

We have a directory of repositories on the portal with pages on each repository, text about that repository, links to relevant information and icons showing the compliance of each repository with standards.

The last component is the validator, which checks whether a repository is compliant with the project's requirements and rules. These are based on the DRIVER guidelines (the second version was issued some months before this project began in 2008) and that offers a basic level of interoperability. So we check that each record has a title, author, URI, is in the correct language, provides rights, and a date. You can find the validator:
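An illustrative check along the lines of those rules: each record must have a title, creator, URI, language, rights and date. The field names and rule shape below are a guess from the talk, not the actual RCAAP validator.

```python
# Report which required Dublin Core fields are missing from a record.

REQUIRED = ["title", "creator", "identifier", "language", "rights", "date"]

def validate_record(record):
    """Return a list of missing required fields (empty list = valid)."""
    return [f for f in REQUIRED if not record.get(f)]

record = {
    "title": "Acesso aberto em Portugal",
    "creator": "E. Rodrigues",
    "identifier": "http://hdl.handle.net/1822/example",  # hypothetical handle
    "language": "por",
    "date": "2011-08-03",
    # "rights" deliberately missing
}
print(validate_record(record))  # ['rights']
```

Running something like this over every harvested record, then mailing the resulting error list to the repository administrator, is essentially the daily report described above.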

If you have closed or embargoed access content in your repository, or items that are not scholarly content, and those make up more than 3% of what is in your repository, then we ask that they are not exposed to our harvester. 3% is fairly arbitrary but we knew we wanted the majority of materials to be open access. When our harvester goes to your repository it will go either to the DRIVER set that you have specified or harvest the whole collection – that's why it's important to provide a DRIVER set if more than 3% of your content is not scholarly or not open access.
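The 3% rule is simple enough to sketch directly. The threshold comes from the talk; the item fields and the decision function are illustrative assumptions.

```python
# Decide whether a repository may expose its whole collection to the
# harvester, or should expose a filtered (e.g. DRIVER) set instead.

def expose_whole_collection(items, threshold=0.03):
    """True if closed/embargoed/non-scholarly items are within threshold."""
    not_eligible = sum(1 for i in items
                       if i["access"] != "open" or not i["scholarly"])
    return (not_eligible / len(items)) <= threshold

items = ([{"access": "open", "scholarly": True}] * 97 +
         [{"access": "closed", "scholarly": True}] * 3)
print(expose_whole_collection(items))  # True: exactly 3% closed
```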

An example validation report shows errors clearly and lets repository administrators test out what does and does not work.

SARI – the Institutional Repository Hosting Service – offers academic and research institutions free repository space on a SaaS (Software as a Service) model, regulated by contract. We (RCAAP) house the data, and provide infrastructure management, software management, training and helpdesk support. We also harvest the data automatically for the portal. The institution gets 1TB of storage, institutional branding and support in exchange for meeting a regular annual deposit target and complying with the rules of the project.

SARI runs on DSpace 1.6.2 plus add-ons (stats, request copy, oaiextended, Portuguese help, send to curricula DeGois). Each institution is on the same code base, just installed locally for them.

You can’t tell on the RCAAP portal site who is on the hosting site and who is not. They are managed in an autonomous way and there are loads of external interface customisations. There is a free helpdesk by email and phone. There are 24 repositories hosted at present – three of those go live today!

On top of those 24 repositories we have a Common Repository – it was practical but also political, as we didn't want anyone to say we couldn't have a national open access policy.

We use the OAIextended add-on to create virtual sets for DRIVER, OpenAIRE and ETDs, and it activates DIDL. It's based on filters and modifiers (such as dc.type, and dc.rights: open access >> info:eu-repo/semantics/openAccess). It's highly configurable and perhaps will be included in DSpace 1.8 in October? We would certainly be pleased to see that happen.
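The "modifier" idea can be sketched as a simple rewrite of local rights values to the info:eu-repo vocabulary on the way out through OAI-PMH. The local value strings on the left are assumptions; the target URIs are the standard vocabulary values mentioned above.

```python
# Map local dc.rights values to the info:eu-repo access vocabulary.

RIGHTS_MAP = {
    "open access": "info:eu-repo/semantics/openAccess",
    "embargoed": "info:eu-repo/semantics/embargoedAccess",
    "restricted": "info:eu-repo/semantics/restrictedAccess",
    "closed": "info:eu-repo/semantics/closedAccess",
}

def modify_rights(dc_rights):
    """Rewrite a local rights value, leaving unknown values untouched."""
    return RIGHTS_MAP.get(dc_rights.strip().lower(), dc_rights)

print(modify_rights("Open Access"))  # info:eu-repo/semantics/openAccess
```

Applying a table of such modifiers per field is what lets one add-on serve DRIVER, OpenAIRE and ETD sets from the same underlying metadata.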

We also use the UMinho Stats add-on which gathers and processes information about repository usage: access, downloads, internal stats etc. There are different levels of analysis (repository, community, collection, document) and it is highly configurable.

We also have the Request Copy add-on, for restricted access items. It sends an email to the person who deposited the item, requesting a copy. It connects you to the author or depositor, who can reply in 2 clicks: they select either "send copy" or "don't send copy". One thing we would like to change is that currently the depositor receives the request for the document; we want the request to go directly to the author, no matter who deposited.

We also have the Sharing Bar add-on. This allows visitors to share items on social networks and with reference management tools (EndNote, Mendeley, BibTeX), and allows sending to DeGois, a Portuguese system. The DeGois add-on allows items to be sent to DeGois Curricula – the Portuguese Current Research Information System (CRIS) – via SWORD.

A recent activity has been co-operation with Brazil. The ministers for technology of both countries signed a memorandum of understanding. Mainly they agreed that there should be search portal interoperability, an open access Portuguese-Brazilian directory, and an annual conference.

Brazil and Portugal both aggregate national resources and we aggregate each other's content – we do this via OAI. The Portuguese-Brazilian conference took place for the first time in Minho last year. This year it takes place in November in Rio de Janeiro, Brazil – there is still time to get your proposals in!

Dissemination, advocacy, networking and training – we have created flyers and mousemats, we have created elearning materials, and we have been very lucky to be featured on two national television channels, giving us a chance to…

We are finishing off a new website for RCAAP. You will find various modules there on open access, on the process, on copyright, etc. Well worth a look. We have also produced several studies and documents, including the State of the Art Report on Open Access in Portugal (2009) and two reports in 2010.

We set up 5 new repositories on SARI in 2008; now there are 24 and we have a total of 36 repositories in Portugal. We have gone from 1300 items to some 63700 now. The introduction of policies, mandates and toolkits has made a big difference here. Several Portuguese institutions introduced mandates and policies in 2010 and we expect more to follow in 2011.

At present we are working on a pilot project on data repositories – this is a small experiment for several institutions with datasets in local IRs. We hope to also have these on the national portal soon – we hope to let you filter for items, journals or data. We have different metadata schemas in use here and are using DSpace for this work again.

We are working on aggregated/centralized statistics (SCEUR workbench) for 36 repositories. We want to look at views, downloads and deposits. We want to create evolution data/charts and rankings. And we want to enable graphics customisation and embeddable charts around these statistics.

We are setting up a hosting service for OA journals (SARC) on the SaaS model. This one launches in September and is based on OJS (version 2.3.3). We have one OJS instance for several journals (rather than one per journal). It is highly configurable to accommodate different journal practices and brandings. We have selected 8 journals for 2011 and will have them in production by the end of the year.

We will do another state of the art report, this time on digital preservation. And we are developing and progressing our collaboration with Brazil around repositories.


We think that RCAAP has been a successful project for various reasons. We have achieved our objectives. The growth indicators are positive, RCAAP has obtained national and international visibility, and we have increased the uptake of repositories in Portugal. We think this is down to having a real global and integrated vision – we think this is particularly important for a country like Portugal, but having awareness of what happens outside your own country is still important even for a country like the UK. Our governance model has been very successful – we have political, management and operational commitment, and it is based in centres of expertise. We are open to partnerships (e.g. Blimunda – translating SHERPA/RoMEO into Portuguese, Data Rep, Brazil). We have a service model that allows institutions to focus on their own core activities, and we also offer economies of scale through this model. We have a methodology for repository creation: from the first meeting to delivering a finished repository is less than 2 months now. And we have worked hard on community building – we include academics, researchers, libraries, the community at all levels. And we hope to host more repositories in the future, building on these successes.


Q1) That was an impressive array of repositories. I was wondering if you have looked at author identifiers. There was some chat on Twitter about how small countries are doing better than bigger countries in some of these areas.

A1) That is an area we will consider for the workplan for next year. We will try to get some partnership in Portugal on digital preservation. The issue of identifiers needs to be addressed. There is an author identification system in Portugal but it is not interoperable with any other system. We aim to have an identification schema that can be used and is interoperable at a number of levels. We are interested and happy to co-operate, and yes, we are following ORCID of course.

Q2 – Niamh Brennan, Trinity College Dublin) It's worth saying that Eloy has also been enormously helpful on compliance with standards like OpenAIRE for us in Ireland. Have you overcome that issue with OpenAIRE over provenance?

A2) No, the problem is identifying the provenance from the repository. For the time being we harvest directly from repositories. It should not be difficult to do technically, but we have not yet defined how the problem should be addressed. It would be easier to harvest from one OAI aggregator, but it's not a big problem, and we do want to go on and do it.

Q3 – Vicki Picton, University of Northampton) I'm particularly interested in journal hosting with OJS. We use that at Northampton but combining it with a repository is really challenging – it's quite hard to advocate and promote both and get the message across coherently.

A3) Our main focus is on repositories. We don't see it as a competing service. We decided to create the service – the project is named Blimunda, after a woman with special powers to see what others cannot, which is an appropriate name – after seeing there was demand for it. We are converting some of those 8 journals from closed print runs to open access electronic journals. We have worked with journals mainly from universities and scientific societies. We are helping them be open access friendly for repositories as well as for the journals. We will try to take advantage of that connection between the repository and the journals – we are helping them engage with the open access agenda. We want to at least get them to support self-archiving in repositories.

Q4 – Les Carr, University of Southampton) A cheeky question: experience suggests that technical infrastructure, software etc. can be taken care of, but the real challenge to knowledge management is human. How does it feel to be in Portugal now you have all your technical problems taken care of: has that lightened the burden for repository managers hugely? Or are repository users still your main problem?

A4) For many repositories the managers and librarians find things much easier now. We had 13 repositories when we started and 36 now. It's hard to compare when you go from no repository to one. Those who already had a repository still host theirs locally. For the migrated repositories, it has helped them focus on users and promotion rather than technical issues – we have had some successes on that front. But users can still be the problem and we have much to do to get content into repositories.

 August 3, 2011  Posted by at 8:42 am Live Blog Tagged with: ,  Comments Off on LiveBlog: Welcome, Introduction & Opening Keynote