Aug 03 2011

I will be keeping occasional notes on my own Round Table session this afternoon – on Social Media and Repositories – but we would welcome guest posts after the event on the other Round Tables. Those running this afternoon are:

  • What Needs to be in a package when transferring into your repository? (Chair – Ian Stuart & Theo Andrew, EDINA)
  • Repositories and Linked Open Data (Chair – Adrian Stevenson, UKOLN)
  • Social Media & Repositories (Chair – Nicola Osborne, EDINA)

Full notes on the Social Media session will appear here this evening. For now we are back to live blogging.

Brief reports from round tables – Facilitator – Robin Rice

Linda Kerr on the Social Media round table

We started by going round the group to see what we were interested in. Some were tweeting deposits, some were just interested.

Glasgow's Enlighten has a tweet button for each paper; researchers can use it to raise their own profile and to comment.
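
For context, a per-item tweet button of this kind is typically just a link to Twitter's "web intent" endpoint with the item's title and URL pre-filled. A minimal sketch in Python – the item title and repository URL are invented, and I'm assuming the standard intent endpoint rather than whatever Enlighten actually uses:

```python
from urllib.parse import urlencode

def tweet_button_url(title, item_url):
    """Build a Twitter 'web intent' share link for one repository item."""
    params = {"text": title, "url": item_url}
    return "https://twitter.com/intent/tweet?" + urlencode(params)

# Hypothetical repository item.
print(tweet_button_url(
    "A Paper About Repository Impact",
    "https://eprints.example.ac.uk/1234/"))
```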

Mostly we talked about raising the profile of items and promoting them through social media. We particularly talked about Twitter and the idea that impact could be demonstrated through that sort of activity. We also talked about not mixing up automatic tweets with public engagement type tweets and materials; about researchers' reluctance to take up social media, and the possibility of raising the profile of materials through it; and about tapping into social networks and communities – like those on Mendeley and in other social spaces.

Usage statistics are important for getting feedback: if we are running repositories, what do academics actually find useful? Links to social networking profiles – perhaps to a researcher profile page – give researchers a way to raise their profile in the community, perhaps alongside their Amazon author profiles.

And we talked about Google Scholar Citations – William retweeted a link to a blog post about this – a whole new community for researchers: is that a threat or an opportunity?

Theo Andrew on SWORD packages for repository deposits.

SWORD is a protocol for depositing content into repositories. We had a very focused chat on what kind of packages we need to actually move content from point A to point B. SWORD is very simple, based on AtomPub, and any extensions should be used only very sparingly. We looked at the minimum data required for transfer – really a URL would be the basic minimum. We were very concerned with how to encourage repositories to share, especially when repositories all do their own customisations and have differing needs. We talked about standards – they can be an answer but are generally more of a problem; negotiation would be a better way to handle this. Any service for transferring content can interrogate a repository for what it understands – which metadata fields, for example. This is particularly important for our Repository Junction project, which will take data and place it in a series of appropriate repositories. A broker makes a lot of sense in this context – you have a relationship with a single broker.
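
To make the AtomPub point concrete: a SWORD deposit is essentially an HTTP POST of a package to a repository's collection URI, with headers describing the packaging. A rough sketch in Python – the endpoint, credentials and packaging identifier are placeholders, so consult the target repository's SWORD service document for real values:

```python
import requests

# Hypothetical SWORD (AtomPub-based) collection endpoint.
DEPOSIT_URI = "https://repository.example.ac.uk/sword/deposit/articles"

with open("article-package.zip", "rb") as package:
    response = requests.post(
        DEPOSIT_URI,
        data=package,
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": "filename=article-package.zip",
            # Tells the repository how to unpack the content; METS-based
            # profiles are one common choice.
            "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
        },
        auth=("depositor", "secret"),
    )

# A successful deposit returns an Atom entry describing the new item.
print(response.status_code, response.text[:200])
```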

Peter Burnhill on the Linked Open Data round table

Our topic was Linked Data and repositories. In some sense we should have asked what can linked data do for repositories, and perhaps what can repositories do for linked data. Part of this issue was about whether linked data is for the metadata or for the object. In some sense objects are all different aside from having common forms of metadata.

Motivations for linked data: to some extent it's more about the metadata – content is often in PDF form. There was talk of giving something a URI and having everything connect to that. Then we went off into why institutional repositories should be interested in Linked Data. Partly it was about making content more accessible – another channel, if you like. Another interesting idea was that this was a way of putting repositories and their content on the linked data map. But then came a debate about how to make a start, how to reach a base point in using linked data. Assigning URIs, or publishing minted URIs, could be the way to go. Fedora definitely does this, EPrints is doing it, and DSpace has it in sight. The URIs are already there, perhaps even for the metadata. Essentially it's about assertions without trust – there was a big argument that one should just do it, and that the value far outweighs the risk. The same goes for authors, papers etc. Although names are messy, identifiers are less messy – and identifiers for organisations are easier than identifiers for people, who move around over time.
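
As a sketch of that "mint a URI and publish assertions against it" starting point, here is roughly what the minimal case looks like with Python's rdflib – the item URI and metadata values are invented:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

# A minted, permanent URI for one repository item (hypothetical).
item = URIRef("https://repository.example.ac.uk/id/eprint/1234")

g = Graph()
g.bind("dc", DC)

# Assertions without trust: just publish them and let others link in.
g.add((item, DC.title, Literal("A Paper About Linked Data")))
g.add((item, DC.creator, Literal("Smith, Jane")))

print(g.serialize(format="turtle"))
```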

Q&A

Q1) What do you think about people who tweet publications or journals? Should closed materials be tweeted, or only open ones? Surely this leans towards an open access model for greater impact.

A1) Putting impact to one side, there is value even for a non-full-text paper – there is a discovery element there. That's one of the things we need to consider. Perhaps we distinguish between open access and non open access tweets. The question of altmetrics, tweets, etc. and impact is going to become more important, as is the way that Google Scholar Citations works. The REF will be looking at impact at the narrative level, not at counts. Anecdotally, tweeting has an impact on rankings and searchability.

My comment: People do tweet about New York Times links. But there is an issue of expectation management here and we should distinguish news feeds or everything feeds from public engagement type content.

Q2 – Balviar Notay, JISC) You were talking about usage statistics and I was wondering if anyone in the Social Media group mentioned the PIRUS2 project – aggregating statistics from repositories and publishers, normalising them, and ensuring COUNTER compliance by harvesting statistics centrally. We are looking at a statistics service of this kind at a national level.

Aug 03 2011

We are fresh from coffee and Philip Hunter has just introduced our first speaker in this session, who is joining us live via Skype:

Thomas Krichel (Long Island University) – AuthorClaim (via Skype)

My co-author here is Wolfram, the Chief Information Officer for Scholarly Information at Bielefeld University; Bielefeld have run the BASE search engine since 2004. It's not really attached to any one funded project but is a long-running concern. I too am interested in running things over the long term: I run RePEc and have been involved in repositories since the early 1990s.

The motivation is to make (economics) papers freely available – the full text of those papers – to make information about the papers freely available, and to have a self-sustaining infrastructure for these materials.

RePEc is misunderstood as a repository; actually it is a collection of around 1300 institutional (subject) repositories from libraries and research centres with specialist collections. It predates OAI, with a reduced business model that is more tightly interoperable. There are several sources of its success: the business case is decentralised as much as possible, it runs on volunteer power, and RePEc encourages the reuse of RePEc data – we aggressively push out the data we have collected, as we think this is in the best interests of those who have set up these repositories.

The RePEc technical case:

RePEc registers authors with the RePEc Author Service (RAS). We register institutions. And we provide evaluative data for authors and institutions. So what is the relationship with repositories? It's a bibliographic layer over repositories. IRs can and will benefit from a similar layer around them – a free bibliographic layer that places the IR in the wider context. The requirement for such a layer is that it is not dependent on external funding, is freely reusable instantaneously, and must be there for the long run.

A RePEc for all disciplines:

  • RePEc bibliographic data -> 3lib
  • RePEc Author Service -> AuthorClaim
  • EDIRC -> ARIW – I won’t talk about this, it’s a topic for another day.

3lib is an initial attempt at building an aggregate of freely available bibliographic data, a project by OLS sponsored by the Open Knowledge Foundation. The data elements are very simple, as it is designed to avoid copyright issues and to serve primarily for author claiming: title; author name expressions; link to the item page on the provider site; identifier. 3lib is meant to serve AuthorClaim.

AuthorClaim is an authorship claiming service for 3lib data – http://authorclaim.org/. I started the first author claiming system, for RePEc, in 1999; I set that system up and it was written by Markus J. R. Klink. Author claiming is not the same thing as author identification – the difference is “Klink’s Problem”. The actual AuthorClaim data is CC0 licensed and available as XML for reuse. The data on refused papers helps the system to build learning models for author names.

IRs and author identification: generally the task of author identification is too large to perform for an IR, yet IRs are too small to make it meaningful for authors to claim papers in them directly – usually only registration of contributors is required. ORCID offers possibilities here, but doing this for each publisher isn't perfect. AuthorClaim lets you put all papers by an author together, and the task can be completely automated once an AuthorClaim record claims a paper in the IR. That gives people an incentive to actually claim their papers in the first place.

We have formed a partnership with BASE as they already have a centralised collection and can deliver the AuthorClaim data. They constantly monitor the OAI-PMH world, normalise the data, and provide an API (REST, SOAP and rsync) for AuthorClaim. The BASE data used in AuthorClaim is selected to be those records which include author, title, link and identifier. AuthorClaim discards some IRs that contain student work, digitised old material, link collections or primary research data – though in principle it could be extended to data etc. There are also some minor manual exclusions (e.g. UK PubMed Central, as it is already in PubMed).
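
The selection rule is simple enough to sketch: keep only records that carry all four elements AuthorClaim needs. Assuming harvested records have already been flattened into dictionaries (the field names and sample records are illustrative):

```python
REQUIRED_FIELDS = ("author", "title", "link", "identifier")

def is_claimable(record):
    """True if a harvested record has everything author-claiming needs."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

records = [
    {"author": "Smith, J.", "title": "A Paper",
     "link": "https://example.org/1", "identifier": "oai:example.org:1"},
    {"title": "An Anonymous Report",
     "link": "https://example.org/2", "identifier": "oai:example.org:2"},
]

print([r["title"] for r in records if is_claimable(r)])
# -> ['A Paper']; the second record lacks an author, so it is discarded
```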

So far there are 1930 repositories and about 12 million records; about 534 records have been claimed in the system. Documentation is at http://wotan.liu.edu/base/ – beware that this needs a little debugging. The collection is not yet announced because it is still being reviewed – some more time is needed.

For more information contact myself (krichel@openlib.org) or Wolfram (whorstma@uni-bielefeld.de).

Q&A

Q1 – Peter Murray-Rust) I congratulate you on what you've done. The key thing for repositories is to create this bibliographic overlay – it's impossible to search repositories in the UK at present. Have all 1900 repositories been done by you – the analysis of the API etc. – or have you farmed it out to volunteers?

A1) I'm not providing search services for repositories; I am working on a search service for authors – a project called Author Profile (I spoke about this in Boston in June) – searching for authors and bringing their work together. I'm not doing searching at this time. We do have Google, but we need items in repositories to be more visible to search engines. PageRank requires a more linked world – we need to bring in more links to items in repositories. An author profile used elsewhere will create in-bound links to the repository, and these links will help the documents rise up the search results. So I'm not doing search particularly at this time, but we all need to work on different things. I'll probably be doing this until the end of my working life, and others will be working on search. We just all need to work together.

And with that Thomas is off to the (Siberian) beach! Next up…

Mo McRoberts (BBC Data Analyst) – BBC Digital Public Space project

I work on the BBC’s Digital Public Space Project. Three things you should know about the BBC:

  1. We like to do things big
  2. We like things where we have no idea what will come out of them
  3. We like silly names!

We are looking at ways to make the best use of the BBC archive, and we are trying to find out how we should fit into the digital world. Last year we published the “Putting Quality First” BBC Strategy Review (http://bbc.in/strategyreview). That review said we should open the archive to the public and work with the British Library, BFI and Arts Council England to bring other public archives to a wider audience. My job is to see if this is technically possible and then how it could be done. The review went to the BBC Trust last year and the Trust has approved the move to make the archive open to the public. So we have to do it, but we don't know when and we don't know how – hence this project.

The BBC Archive has 2.3m hours of film and video, 300k hours of audio, 4m photographs and 20k rolls of microfilm – it took us two years just to find out the scale of it! There is also sheet music, ridiculous amounts of material. A bit of it is digitised – 206k digitised radio programmes, 23k digitised TV programmes – and there is an ongoing project to digitise it all, effectively a digital tape library. The underpinning mantra of this project is: how do we maximise the value of this stuff?

A lot of what we need to do here is important not only for the BBC but also for other archives of cultural heritage. Is YouTube part of the cultural heritage? That skateboarding cat might be a really important moment – but for now we are focusing on the well known institutions like the BFI, Kew, NLS, LLGC NLW, the National Archives, the National Maritime Museum, the British Library and the Royal Opera House. So we thought: why don't we link these collections together? We have been looking at how to make those journeys between materials work well. We don't have long term internal funding, but we are working in partnership, and if we can demonstrate the potential of this working then it could become something big and cool and useful. Right now it's a tiny little thing that we hope will become big in the future.

Now, the technical bit!

All institutions have catalogues of stuff best suited to archivists. Some point to physical assets, some to digital assets, some do not point to assets at all. We all deal with our data in very different ways. If we could express what we do in a common way – a way which allowed links between things and the assets, using a well known grammar – then we could probably do something quite interesting with that. So we are taking dumps from the participating institutions – there was no particular selection process, by the way; the ones who gave us data fast are in – and publishing RDF/XML on a private server for each institution. That data pushes into a central aggregator. We make use of a single golden rule: “give everything a single permanent URI, and make the data about that thing accessible at that URI” (or rather, you give your assertions about things a URI).

The aggregation is evaluated via a straightforward logical process – are two things the same? – but with some heuristics too: we build a full text index to mine, evaluate new material against it, and use the scoring of that evaluation to decide what is and is not the same. We also match the things to external sources – DBPedia Lite, GeoNames, FreeBase etc. We create a stub object. We are opening the archive to normal people, so we rearrange the catalogues a bit as they come in: we break items into thing, person, place, event or collection. The stub object has a type (e.g. Person) and relationships to the things it's matched to (e.g. George Orwell). We deal with real world things rather than individual entries in the catalogue, and we express relationships between stubs and source entities as skos:exactMatch (or as non-exact matches). We also take any references and reflect them. We call these stub objects because they are just a reflection of the evaluation process. It's a hard design constraint that whatever data goes in, you should be able to get it out again verbatim. We don't need lots of data attached to the stub – just enough to do top level browsing and indexing – and we leave everything else in the source objects, where you can follow your nose – which is why we have cached this data. If internet connectivity was better we wouldn't even need to cache it.
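
A rough sketch of what such a stub object might look like when modelled with Python's rdflib – the aggregator namespace and source-entry URIs are invented; only the skos:exactMatch pattern and the George Orwell example come from the description above:

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, SKOS

# Hypothetical namespace for the aggregator's stub objects.
STUB = Namespace("https://dps.example.org/stub/")

g = Graph()
g.bind("skos", SKOS)

orwell = STUB["george-orwell"]

# The stub carries a top-level type for browsing and indexing...
g.add((orwell, RDF.type, FOAF.Person))

# ...and skos:exactMatch links out to the source entities it was
# matched against, so the original catalogue data stays authoritative.
g.add((orwell, SKOS.exactMatch,
       URIRef("https://catalogue.example.org/entry/orwell-g")))
g.add((orwell, SKOS.exactMatch,
       URIRef("https://dbpedialite.example.org/things/george-orwell")))

print(g.serialize(format="turtle"))
```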

Exciting! An actual stub object, for the Republic of Brazil. The key things are that it's a place, it has taken on the types of the source data, and it has some references to DBPedia Lite and some source data (BBC News On This Day – which I cheekily scraped!). From that data we can build some interesting interfaces – building a user interface on top of it is a doddle. In order to get people to build stuff you need to let them browse that data, so we are building this for all resources. We are also building something called “Forage” – a search-driven debugging tool to see the raw data and the relationships. And then we have the Digital Public Space interface that we commissioned a firm to produce for us – we asked them to produce something a bit left of field, and they have a lot of experience of video aggregation. You'd think at the BBC we'd have lots of AV material for all our entries. We will, but it's not that easy – getting anything internally is far harder than getting it externally from project partners. This will change over time, but things don't move quickly. So this interface combines our data with this company's existing video aggregation data.

There are a few hard constraints that we are trying to keep to. We want to maintain the provenance of everything, so that if the data is preserved but technology has changed massively, you will still be able to do useful things with it. So we are looking at things like digitally signing the source data as it comes in – challenging in RDF – and we want it to be open to all comers as a read and write database. Ultimately we want all partners to provide their own data and just link it together, but that's a way off for now.

http://bbc.in/dpsblog – a blog post here by me gives further information on the project

Q&A

Q1) This is huge and awesome. Is there any chance of open sourcing the code?

A1) Yes, we will be open sourcing the code but we need to get to the end of this project, and we have some paperwork to do. We would like to open it up to the academic community within about 18 months – an actual running version. All of the metadata should be fine but how many of the assets will be open we are not quite sure. We are trying to find the right frameworks. The code should be open source in a fairly short space of time. As the author of it I have to say it’s not about to set the world alight.

Q2) Perhaps an unfair question. You've brought to our attention that the BBC defines a phenomenon of “The Public Space” and the “national interest”. This is a political move. In the sense that we are engaged in the same sort of activity, in a public space rather than a private and owned space, how do our activities relate, and how do we start to recognise each other and work with each other…?

A2) It's a difficult one. The edges are always fuzzy, though we are getting better at it. We have been talking to JISC and the OU in this project, and also with the University of Westminster amongst others. We are not trying to draw a line in the sand about this being only arts and culture; we want this aimed at, and available to, academics for research purposes. As for the BBC as an institution – I work in the Archive (part of BBC Vision), I also work with R&D, and we like our research. We are very open to working with others. Perhaps the whole organisation doesn't share that view right now, but it's getting there. There is no choice but to engage with as many different interests as possible – for good and for bad. The academic community is a big and significant part of that, and that will only get bigger over time.

Ben Ryan (University of Leeds) – Timescapes Project

Timescapes is a five-year ESRC funded project looking at how family relationships change over time. I am the technical officer for Timescapes and I'll be talking about the Timescapes Next Generation Archive – though we don't have a second tranche of funding, so we will deliver a proof of concept by the end of the project in early 2012.

We have been working with a product called Digital, which is hosted by Leeds University. This platform sees all files as digital objects and does not allow modelling of complex structures of information and their inter-relationships; you can't easily display connections and context around materials. We want to publish, archive and allow secondary research on data, and that brings huge challenges. We have been looking for solutions for social science longitudinal data storage and delivery.

We chose Fedora as it has a Content Model Architecture, allowing the researcher to see connections in meaningful terms, and it allows multiple views onto the data. It allows the creation of content models – say we have an interview: is it anonymised? Partly redacted? We have different levels of access to data, so we need a flexible model that enables that. We also need to link data objects: Fedora allows us to link concepts and to set up our own relationships. It is all based on RDF triplestores, and that is hugely powerful.
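
As a sketch of the kind of query that RDF backing makes possible: Fedora 3.x exposes its resource index over an HTTP search endpoint, so related objects can be pulled back with SPARQL. Everything project-specific below – the host, the relationship predicate, the object PIDs – is invented for illustration:

```python
import requests

FEDORA = "https://repository.example.ac.uk/fedora"  # hypothetical host

# Find all interview objects linked to one case via a (hypothetical)
# project-defined relationship predicate.
query = """
PREFIX rel: <http://timescapes.example.org/relations#>
SELECT ?interview WHERE {
  ?interview rel:interviewOf <info:fedora/timescapes:case-17> .
}
"""

response = requests.get(
    FEDORA + "/risearch",
    params={"type": "tuples", "lang": "sparql",
            "format": "CSV", "query": query},
    auth=("fedoraAdmin", "secret"),
)
print(response.text)  # CSV listing of matching interview objects
```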

So our current archive shows the relationships between data on men as fathers (a particular study in the project); we can group material by interviewee, by waves of research, etc.

The services mentioned earlier are responsible for producing the views of relationships within the archive – these are built to suit the needs of the researcher. You can access whole groups of material or go case by case – both depending on who the viewer/reader is. We have flexibility there that allows us to differentiate between “types” of social science data such as DDI or QuDEx. You can't just look at one object; we want to link internally, conceptually and thematically within the system.

Solr is being used for searching and browsing – it's off the shelf and easy to set up. It will look for data objects that have any of the search terms in pre-configured DisMax metadata fields. We can set up custom searches really quickly for our researchers, and we can also get advanced searches up and running fast. I am the only resource on this project, so this has been a very fast way to build a nice system. We also use jQuery, and we have been using the MIT SIMILE tools for faceted browsing and searching as well.
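
A sketch of what such a pre-configured DisMax search looks like against Solr's HTTP API – the core name, field names and boosts are illustrative, but the dismax parameters themselves are standard Solr:

```python
import requests

SOLR_SELECT = "http://localhost:8983/solr/timescapes/select"

params = {
    "q": "fatherhood employment",
    "defType": "dismax",
    # The pre-configured metadata fields to match, with per-field boosts.
    "qf": "title^2 description interviewee",
    "rows": 10,
    "wt": "json",
}

results = requests.get(SOLR_SELECT, params=params).json()
for doc in results["response"]["docs"]:
    print(doc.get("title"))
```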

Another reason for Fedora is that it has XACML. It is crucial that we keep this data well protected, especially in its raw form. XACML lets us bring policies from the repository level right down to specific data objects. Fedora manages this, and that means we have good reliability and an audit trail around authentication and authorisation.

So the system is based on three sources: DDI, QuDEx and Timescapes metadata. This is ingested, via an XSLT transform, into Fedora as METS. We then connect up multiple search and functionality elements, with a PHP web app that sits on the top.
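
The ingest step might look something like this minimal sketch with Python's lxml – the file names and the stylesheet are placeholders for the project's real DDI/QuDEx-to-METS transform:

```python
from lxml import etree

# Transform a source metadata record into METS with a pre-written
# stylesheet, ready for submission to Fedora's ingest interface.
source = etree.parse("interview-042-ddi.xml")
to_mets = etree.XSLT(etree.parse("ddi-to-mets.xsl"))

mets_doc = to_mets(source)

with open("interview-042-mets.xml", "wb") as out:
    out.write(etree.tostring(mets_doc, pretty_print=True,
                             xml_declaration=True, encoding="UTF-8"))
```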

Q&A

Q1) Can you explain a bit about the benefits that you’ve seen – you described the subject, predicate, object model. Often people only find that useful when you combine data with lots of other systems. Presumably for your work you could have had a relational database instead – could you outline why this was useful? Is there an intention that the ESRC’s other projects might benefit from this?

A1) It was mainly because it was in Fedora. We could define our own topologies, and we use the flexibility of the RDF to do our structural work. We could move into combining that with other data, but we haven't yet. We are working closely with the UKDA about the use of these technologies; there are very close relationships and connections there.

Yvonne Howard (Southampton Univ.) – Campus ROAR

I work with Pat McSweeney and Andy Day at ECS at the University of Southampton. We were looking at learning materials – EdShare, Humbox, Language Box etc. – but we started thinking beyond these repositories, about scholarly discourse. Where does scholarly discourse take place? It was once about scholars in a big room where everyone knew what was new; it was easy to follow the discourse. That 19th century form didn't change much until the mid twentieth century, perhaps until the internet.

What is scholarly discourse now? It's websites, online journals, social media. It's not just a small group in a room but conferences all over the world. And yes, you get the article, but a lot of what happens is ephemeral. When people talk about their research at a conference, it's gone; the slides, tweets and blogs disappear. It's not connected anymore, it's not all in that one room. One thing we know is that there is a lifecycle going on: we researchers get inspired, it's a dynamic process, and so is the research at the heart of that discourse. So how can we start to support that as part of scholarly discourse?

Mostly we think of repositories as being about archiving: storing and keeping material safe and permanent. But what if we had a research repository that captured some of that discourse? We would want to archive not just the outputs but also the data, the discourse, the scholar and their presence. How do you showcase interesting research? Well, we can syndicate new research, we can showcase researchers – we want to make things engaging. So we hosted content as well as metadata, capturing discourse and commentary about it, and you have a community that highlights awareness. And we want to reuse what's going on in the Web 2.0 world. We have new formats in play here – iPads and iPhones etc. We are extending the concept of the web/RSS feed to provide engaging magazine-style products, based on content syndicated through RSS, Twitter etc.

But how do real repository users respond to the idea of using RSS? People see it as geeky. Take-up of RSS from teaching and learning repositories was poor, so we asked people why – it scares users or seems unmanageable. But people seemed to like Twitter – what's the difference? Well, it's easy to use and understand.

So Campus ROAR is an editorially mediated institutional publication – how do you make content available and capture that scholarly discourse? – plus tools for using that data. Cue a demo from Andy and Pat. We've been looking at making more digestible content from what we have in our repository. We have made an EPrints plugin (available in the Bazaar for EPrints 3.3) that makes content customisable and digestible. You can build a custom feed for academic news in your area: a web spider crawls the university webspace and identifies keywords, the user can input their own keywords, and it outputs a custom feed – it filters content for you. See: http://panfeed.ecs.soton.ac.uk/
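
The filtering idea is simple to sketch: pull a crawled campus feed and keep only the entries matching a user's keywords. Here with the feedparser library – the feed path and the keywords are invented:

```python
import feedparser

FEED_URL = "http://panfeed.ecs.soton.ac.uk/feeds/example.rss"  # hypothetical path
KEYWORDS = {"repository", "open access"}

feed = feedparser.parse(FEED_URL)

for entry in feed.entries:
    text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
    if any(keyword in text for keyword in KEYWORDS):
        # This entry matches the user's interests; keep it in their feed.
        print(entry.title, "->", entry.link)
```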

At the moment we have the Edinburgh, Glasgow and Southampton campuses already crawled for today, but we're happy to add the campus of anyone here!

The feed is designed to look great in Flipboard on the iPad and in similar apps. We'll be doing a Pecha Kucha going into more detail as well! Go check out the website. The other part of Campus ROAR is the EPrints Publisher plugin.

Q1) Are you planning to crawl more widely?

A1) At present we have three institutions included, and we try to keep track of where the news is from. It's brand new, but we hope to be able to filter down to specific campuses if you want to – for use by your comms team, say. Worth noting that it takes time to crawl new universities, so it would take time to broaden out.

Aug 03 2011

Welcome and Introduction by Prof. Jeff Haywood, VP and CIO University of Edinburgh

Jeff spends quite a lot of time drifting around the edges of repositories, and gets the sense of it being quite an interesting time for repositories that hold publications, and that hold data – in the way data is stored and translated into academic publishing, and also in the grey ways academics have traded their data. We used to trade openly within such domains, but only between those who knew and trusted each other. There's a step up now, part of a general move towards open everything, that will really challenge us in terms of long term sustainability at a sensible price.

It's also interesting to see the number of events targeted at senior managers – those staff making policy decisions about how the university will act. We see examples of “open university X” as people make a conscious decision to make their outputs available. There is a recognition that the open agenda is important, but there is also some hard action to be taken to work out how it is funded and sustained. It would be good if we had a strong human and physical network across the UK, and I know that JISC is doing some work on this. In Edinburgh we have done quite a lot of work around research data management and storage. We have spent some time making that a single process and defining roles for academics and support staff, and we have work now to encourage and ensure compliance. My colleagues Robin Rice and Sheila Cannell will both be able to say more about this.

Two final events to announce: Open Repositories 2012 is coming to Edinburgh next July, and the DCC's annual international preservation conference takes place in December in Bristol. Finally, I do hope you will have time to see some of the Fringe whilst you're here!

And with that Jeff hands over to Stuart Macdonald who is chairing our first show:

Opening Keynote: Eloy Rodrigues, Director of the University of Minho Documentation Services

Eloy has been heavily involved in repositories for some years and is currently working on the Open Access Science Repository, a project that began in 2008. Over to Eloy:

I will be talking mainly about the RCAAP project – Repositorio Cientifico de Acesso Aberto de Portugal

We started our first institutional repository – Repositorium – at Minho University in 2003, and our open access policy followed in 2005. At that time activity on open access was still very limited in Portugal. SciELO already existed in South America and we set up a Portuguese section; we also set up the first national conference on open access.

In November 2006 the Portuguese Rectors Council (CRUP) published a “Declaration on Open Access” and created a working group – there was real support for moving forward with this.

Before our RCAAP project started we had fewer than ten repositories and fewer than ten thousand items across them. When the project began we aimed to set up a portal to promote the visibility of Portuguese research, improve access to national scientific output and integrate our work into the international context. At this time there were very few university repositories, so there was much to do.

UMIC – the Knowledge Society Agency – funded the project; also helping to govern it were FCCN (general and infrastructure services) and the University of Minho (scientific and technical expertise).

Although I will focus mainly on the policy and management of the repository, I wanted to give some idea of how the infrastructure for the project is set up – we have two clusters and a database; some of this was set up in 2008, some is still being completed.

The RCAAP Portal is an OAI-PMH Harvester and Data Provider, a Search Interface, a Directory of repositories, and SARI.

Repositories are either visible in Google or they are not. So a search interface is useful, but not the most important thing.

We do daily harvesting and indexing of the full text of every Portuguese repository, we validate the results, and we send the harvesting report to repository administrators each day. The OAI-PMH interface is also provided here: http://www.rcaap.pt/oai
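
One daily-harvest request against that endpoint looks like this – the OAI-PMH verb and parameters are standard; the "from" date (restricting the harvest to records changed since the last run) is illustrative:

```python
import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "http://www.rcaap.pt/oai"

response = requests.get(OAI_ENDPOINT, params={
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",
    "from": "2011-08-02",  # incremental: only records changed since then
})

# Walk the response and print each record's OAI identifier.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
root = ET.fromstring(response.content)
for header in root.iter(OAI + "header"):
    print(header.findtext(OAI + "identifier"))
```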

Our search portal, www.rcaap.pt, combines institutional repositories, several national repositories, and one Brazilian repository. You can search either Portuguese content only, or both Portuguese and Brazilian content, with various search options, tag clouds, etc. When you search, each result shows the title, linking to the full text in the repository, and author names link to the current research information system – so you can find profiles for the authors. You can also share each item via social media.

We have a directory of repositories on the portal with pages on each repository, text about that repository, links to relevant information and icons showing the compliance of each repository with standards.

The last component is the validator, which checks whether a repository is compliant with the project's requirements and rules. These are based on the draft guidelines (the second version was issued some months before this project began in 2008) and offer a basic level of interoperability. So we check that each record has a title, an author, a URI, is in the correct language, provides rights information, and has a date. You can find the validator at: http://validator.rcaap.pt.
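
A toy version of that record-level check – return the required fields a record is missing; a real validator also inspects the field values themselves (e.g. that the language code is valid). The sample record is invented:

```python
REQUIRED = ("title", "creator", "identifier", "language", "rights", "date")

def missing_fields(record):
    """Return the required Dublin Core fields absent from a record."""
    return [field for field in REQUIRED if not record.get(field)]

record = {
    "title": "Um Artigo",
    "creator": "Silva, A.",
    "identifier": "http://repositorio.example.pt/handle/123/456",
    "language": "por",
    "date": "2011-08-03",
}
print(missing_fields(record))  # -> ['rights']
```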

If closed or embargoed content, or items that are not scholarly content, make up more than 3% of what is in your repository, then we ask that those items are not exposed to our harvester. 3% is fairly arbitrary, but we knew we wanted the majority of materials to be open access. When our harvester goes to your repository it will go either to the DRIVER set that you have specified or will harvest the whole collection – that's why it's important to provide sets if more than 3% of your content is not scholarly or open access.

An example validation report shows errors clearly and lets repository administrators test out what does and does not work.

SARI – the Institutional Repository Hosting Service – allows academic and research institutions free repository space on a SaaS (Software as a Service) model, regulated by contract. We (RCAAP) house the data and provide infrastructure management, software management, training and helpdesk support. We also harvest the data automatically for the portal. The institution gets 1TB of storage, institutional branding and support, in exchange for meeting a regular annual deposit target and complying with the rules of the project.

SARI runs on DSpace 1.6.2 plus add-ons (stats, request copy, OAIextended, Portuguese help, send to DeGois curricula). Each institution is on the same code base, just installed locally for them.

You can't tell on the RCAAP portal who is on the hosting service and who is not. The hosted repositories are managed in an autonomous way and there are loads of external interface customisations. There is a free helpdesk by email and phone. There are 24 repositories hosted at present – three of those go live today!

On top of those 24 repositories we have a Common Repository – practical but also political, as we didn't want anyone to be able to say we couldn't have a national open access policy: http://comum.rcaap.pt/

We use the OAIextended add-on to create virtual sets for DRIVER, OpenAIRE and ETDs, and to activate DIDL. It is based on filters and modifiers (such as mapping dc.type values, or a dc.rights value of “open access” to info:eu-repo/semantics/openAccess). It's highly configurable, and perhaps will be included in DSpace 1.8 in October? We would certainly be pleased to see that happen.
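
A sketch of the modifier idea: normalise whatever local dc.rights values a repository uses to the info:eu-repo vocabulary before export. The local spellings on the left are illustrative; the target values are the standard DRIVER/OpenAIRE terms:

```python
RIGHTS_MAP = {
    "open access": "info:eu-repo/semantics/openAccess",
    "embargoed access": "info:eu-repo/semantics/embargoedAccess",
    "restricted access": "info:eu-repo/semantics/restrictedAccess",
    "closed access": "info:eu-repo/semantics/closedAccess",
}

def modify_rights(record):
    """Rewrite a record's dc.rights values for DRIVER/OpenAIRE export."""
    record["rights"] = [
        RIGHTS_MAP.get(value.strip().lower(), value)
        for value in record.get("rights", [])
    ]
    return record

print(modify_rights({"rights": ["Open Access"]}))
# -> {'rights': ['info:eu-repo/semantics/openAccess']}
```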

We also use the UMinho Stats add-on, which gathers and processes information about repository usage: accesses, downloads, internal stats etc. There are different levels of analysis (repository, community, collection, document) and it is highly configurable.

We also have the Request Copy add-on for restricted access items. A visitor can request a copy; an email goes to the person who deposited the item, who can reply in two clicks – “send copy” or “don't send copy”. One thing we would like to change: at the moment the depositor is the person who receives the request for the document, and we want the request to go directly to the author no matter who deposited.

We also have the Sharing Bar add-on, which allows visitors to share items on social networks and send them to reference management tools (EndNote, Mendeley, BibTeX). And the DeGois add-on allows items to be sent to DeGois Curricula – the Portuguese Current Research Information System (CRIS) – via SWORD.

A recent activity has been co-operation with Brazil. The ministers for technology of both countries signed a memorandum of understanding, agreeing mainly that there should be search portal interoperability, an open access Portuguese-Brazilian directory, and an annual conference.

Brazil and Portugal both aggregate national resources, and we aggregate each other's content – we do this via OAI-PMH. The Portuguese-Brazilian conference took place for the first time in Minho last year; this year it takes place in November in Rio de Janeiro, Brazil – there is still time to get your proposals in!

Dissemination, advocacy, networking and training – we have created flyers, mousemats and e-learning materials, and we have been very lucky to be featured on two national television channels, giving us a chance to reach a much wider audience.

We are finishing off a new website for RCAAP. You will find various modules there on open access, on the process, on copyright, etc. Well worth a look. We have also created several studies and documents including the State of the Art Report on Open Access in Portugal (2009) and two reports in 2010.

We set up 5 new repositories on SARI in 2008; there are now 24, and we have a total of 36 repositories in Portugal. We have gone from 1,300 items to some 63,700 now. The introduction of policies, mandates and toolkits has made a big difference here. Several Portuguese institutions introduced mandates and policies in 2010 and we expect more to follow in 2011.

At present we are working on a pilot project on data repositories – this is a small experiment with several institutions putting datasets in local IRs. We hope to have these on the national portal soon too – we hope to let you filter for items, journals or data. We have different metadata schemas in use here and are using DSpace for this work again.

We are working on aggregated/centralized statistics (SCEUR workbench) for 36 repositories. We want to look at views, downloads and deposits. We want to create evolution data/charts and rankings. And we want to enable graphics customisation and embeddable charts around these statistics.

We are also setting up a hosting service for OA journals (SARC) on the SaaS model. This one launches in September and is based on OJS (version 2.3.3). We have one OJS instance for several journals (rather than one per journal), highly configurable to accommodate different journal practices and brandings. We have selected 8 journals for 2011 and will have them in production by the end of the year.

We will do another state-of-the-art report, this time on digital preservation. And we are developing and progressing our collaboration with Brazil around repositories.

Finally…

We think that RCAAP has been a successful project for various reasons. We have achieved our objectives, the growth indicators are positive, RCAAP has obtained national and international visibility, and we have increased the uptake of repositories in Portugal. We think this is down to having a real global and integrated vision – particularly important for a country like Portugal, though awareness of what happens outside the country is still important even for a country like the UK. Our governance model has been very successful – we have political, management and operational commitment, based in centres of expertise. We are open to partnerships (e.g. Blimunda – translating SHERPA/RoMEO into Portuguese – Data Rep, Brazil). We have a service model that allows institutions to focus on their own core activities, and we also offer economies of scale through this model. We have a methodology for repository creation: from the first meeting to delivering a finished repository is now less than two months. And we have worked hard on community building – we include academics, researchers, libraries, the community at all levels. And we hope to host more repositories in the future, building on these successes.

Q&A

Q1) That was an impressive array of repositories. I was wondering if you have been looking at author identifiers. There was some chat on Twitter about how small countries are doing better than bigger countries in some of these areas.

A1) That is an area we will consider for the workplan for next year. We will try to get some partnership in Portugal on digital preservation. The issue of identifiers needs to be addressed: there is an author identification system in Portugal, but it is not interoperable with any other system. We aim to have an identification schema that can be used, and is interoperable, at a number of levels. We are interested and happy to co-operate and yes, we are following ORCID of course.

Q2 – Niamh Brennan, Trinity College Dublin) It's worth saying that Eloy has also been enormously helpful on compliance with standards like OpenAIRE for us in Ireland. Have you overcome that issue with OpenAIRE over provenance?

A2) No – the problem is identifying the provenance from the repository. For the time being we harvest directly from repositories. It should not be difficult to do technically, but we have not yet defined how the problem should be addressed. It would be easier to harvest from a single OAI aggregator, but it's not a big problem, and we do want to go on and do it.

Q3 – Vicki Picton, University of Northampton) I'm particularly interested in journals hosting with OJS. We use that at Northampton, but combining it with a repository is really challenging – it's quite hard to advocate and promote both and get that message across coherently.

A3) Our main focus is on repositories; we don't see journal hosting as a competing service. We decided to create the service after seeing there was demand for it – the project is named Blimunda, after a woman with special powers to see what others cannot, which is an appropriate name for our project. We are converting some of those 8 journals from closed print runs to open access electronic journals. We have worked with journals mainly from universities and scientific societies. We are helping them be open access friendly for repositories as well as for the journals, and we will try to take advantage of that connection between the repositories and the journals – we are helping them engage with the open access agenda. We want at least to get them to support self-archiving in repositories.

Q4 – Les Carr, University of Southampton) A cheeky question: experience would say that technical infrastructure, software etc. are the easy part, but the real challenge to knowledge management is human. How does it feel to be in Portugal now you have all your technical problems taken care of: has that lightened the burden for repository managers hugely? Or are repository users still your main problem?

A4) For many repositories the managers and librarians find things much easier now. We had 13 repositories when we started and 36 now – it's hard to compare when you go from no repository to one. Those who already had a repository still host theirs locally. For the migrated repositories, it has helped them focus on users and promotion rather than technical issues – we have had some successes on that front. But users can still be the problem, and we have much to do to get content into repositories.

Aug 02 2011

Things are getting very exciting here in slightly drizzly Edinburgh: Repository Fringe 2011 begins tomorrow!

Programme

The final programme is now available to view or download and we have a fantastic and busy line up of speakers, round tables and pecha kuchas!

If you are speaking and are happy to share your presentation via the Repository Fringe website please remember to email a copy or link to it to the normal email address: repofringe@gmail.com.

Sharing Repository Fringe 2011: A Call for Help!

If you are bringing your laptop, smart phone or camera with you there are a few things we would love your help with:

  • If you are taking pictures of the event please consider sharing them via the Repository Fringe 2011 Flickr Group.
  • If you are tweeting and blogging about the event please use the tag: #rfringe11.
  • If you have a smartphone and fancy creating SoundCloud clips around the event please include the word “Fringe” and the #rfringe11 tag in the title so that we can find them and they can also be included on the Edinburgh Festival Sounds of Fringe map!

We will be recording high resolution video of all presentations and Pecha Kucha sessions and will make these available as soon as possible after the event. We will be streaming comments and tweets around the event via CoverItLive and we hope to also stream lower resolution video via UStream. We will also be live blogging all of the sessions here on the Repository Fringe website and would welcome your comments or guest posts.

Travel and Weather

If you are travelling to Edinburgh tonight you may wish to bring a light rainproof jacket but otherwise the forecast is for warm, humid but hopefully rain-free Repository Fringe days.

We have added lots of information about the venue, nearby bus stops, Taxis, etc. over on our location and travel pages which should help you find your way through the city.

Bear in mind that Edinburgh is busier at this time of year so do expect buses and walks to take a little longer than advertised and, in a few places, you may encounter whole new temporary buildings on your route!

And finally…

We have set up a special hashtag for social arrangements at this year’s event.  If you tweet with or search for the tag #rf11social you will find discussions of what to do this evening (Tuesday) and which shows might be fun to jointly head to. If you need a little inspiration have a look at our Entertainment and Food & Drink pages.

The Open Access Monitor project is even offering free drinks for any Repository Fringe attendee who takes part in their beta test!

Jul 19 2011

As of this morning you will find a new tab here on the Repository Fringe blog taking you to our Programme page. Here you can view the latest version of the programme.

We are delighted to have Eloy Rodrigues, Director of the University of Minho Documentation Services, as our opening keynote this year, and Gary Hall, Professor of Media and Performing Arts at Coventry University and an Open Access evangelist, as our closing keynote.

The new draft programme features an updated list of presentations with highlights including:

  • Mo McRoberts, BBC Data Analyst on the BBC Digital Public Space project
  • Ben Ryan, University of Leeds, speaking on the Timescapes Project
  • Charles Duncan, Intrallect, on Deposit from a mobile phone
  • Anna Clements & Janet Aucock, St Andrews University on PURE-OAR Implementation
  • Theo Andrew, EDINA & Edinburgh University Library, on Open Access Repository Junction (OARJ)
  • Siobhán Jordan, ERI, on OpenBIZ – knowledge exchange between HE & Business

Round Tables will include discussion of Repositories and the REF; Repositories and Linked Open Data; Social Media & Repositories; and Open Scholarship Principles. There will also be a special session with a wide variety of presentations from the JISC Repositories Takeup and Embedding Programme.

Meanwhile the Pecha Kucha sessions will include presentations from:

  • Sheila Fraser, EDINA, on Using OpenURL Activity Data
  • Marie-Therese Gramstadt, VADS on ‘Kultivating’ artistic research deposit
  • Richard Jones, CottageLabs, SWORD2
  • Robbie Ireland & Toby Hannin, Enlighten, Glasgow University on the Glasgow Mini-REF exercise
  • Martin Donnelly, DCC on JISC 07/11 work
  • Dan Needham & Phil Cross, Mimas on the Names Project

Finally we are also hugely looking forward to both the DevSci Hackathon and Developer Day during the Repository Fringe and the one day JISC CETIS workshop on Teaching and Learning Repositories on the Friday workshop day.

It’s shaping up to be a fantastic programme so we would recommend that if you haven’t booked your place yet you should do so very soon via eventbrite. If you have already booked why not have a look at our new(ish) location pages which will help you find out a bit more about the venue, nearby eateries and fringe venues.

You will see that there are a few slots still available for Pecha Kuchas so please do get in touch with us as soon as possible if you'd like to take part: repofringe@gmail.com.

May 02 2011

Welcome to our new blog for 2011. This site is currently under construction but we will be adding content here over the next few weeks and would love to hear from any potential contributors, sponsors or attendees of this year’s Repository Fringe if you have any ideas about what you would like to see on the website or at the event.

For now the best place to find information on Repository Fringe is our 2010 blog.

Apr 15 2011

As we mentioned at the end of March (on our old site) we are currently planning the Repository Fringe 2011 and we now have a wee update to report.

We are hoping to again have a two day repository fringe event and one day of workshops/meetings this year so grab those diaries and pencil 3rd, 4th and 5th August in as the expected Repository Fringe dates.

The dates also cunningly dovetail with the Edinburgh Festival Fringe preview week so once we have finalised the details we'll let you know here on the blog and through the usual mailing lists, as it will be important to book travel and accommodation early.

As ever if you are interested in taking part (with a talk, pecha kucha, roundtable, workshop etc) or sponsoring the event it’s never too early to give us a wee shout (repofringe@gmail.com).
