Aug 04 2011

After a refreshing coffee we’re back and Robin Rice of EDINA is introducing our next speaker. All of the work in the Research Data Management strand is about long term cultural change and I think Mark’s approach here is really inspired.

Mark Hahnel (Imperial College London) – Figshare – Publish All Your Data

Don’t be mad at me for not having a guitar!

Basically this is a bit different to the other repositories in terms of what it does. One problem everyone seems to have is incentivising people to upload and share their data. This is about what would incentivise me as someone from a science background.

I was doing a PhD and generating lots of data, charts, graphs, etc. Only a tiny percentage of what I produced will ever be written up but that other data is useful too. That smaller subset will get out there with traditional publication methods. What can I do so that others can use, cite or be aware of the rest? This was the whole idea behind FigShare. It was originally an idea selfishly for myself. It’s built on a MediaWiki base. Others said – well it’s useful for you but it might be useful to others too…

But why do this? Well within that data I have tested what x does to y. But I know that 20 other labs may fund the same research. There is this whole issue of negative data – it’s part of what is broken in the current publishing systems. In those 20 labs you can get 19 with negative results, 1 with a false positive but it’s much easier to publish that one result than those negative ones.

So FigShare comes in here. A very simple set of boxes – I won’t use a repository that I have to be trained in. No one would use Facebook if you needed training for it! And researchers want their data to be visualised – we are working on making that embeddable. Each set of data has a persistent URL (no matter where it is hosted). And this has clickable everything. You can also preview datasets on the page without having to download everything. And a researcher profile automatically collects their work.

And we also have space for videos – again not publishable but they show interesting things. You can link your theses to this permanent URL in the same way. One of the things I have learned is that if you build a platform for scientists they will do their own thing with it. I thought it would be great for disseminating data and finding stuff on Google. Others have said they want feedback on material for publication. People started sharing their research through different outputs. If you click on a person you can pull in an RSS feed of their research. So people have been plugging that RSS feed into FriendFeed to disseminate their work, and others have given great feedback, questioned the methodology and started collaborating. You could also plug the RSS feed into a blog as an eLab Book.
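As a rough illustration of the "plug the RSS feed into a blog" idea, here is a minimal Python sketch that reads an RSS 2.0 feed and lists the title and link of each item. The feed content and the `latest_items` helper are assumptions for illustration only; the talk did not describe the actual FigShare feed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real researcher feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example researcher</title>
  <item><title>Dataset A</title><link>http://example.org/a</link></item>
  <item><title>Figure B</title><link>http://example.org/b</link></item>
</channel></rss>"""


def latest_items(feed_xml, limit=10):
    """Return (title, link) pairs for the first `limit` items in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = []
    for item in channel.findall("item")[:limit]:
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    # Print one line per research output, ready to paste into a blog post.
    for title, link in latest_items(SAMPLE_FEED):
        print(f"{title}: {link}")
```

A blog or lab notebook could call something like `latest_items` on a schedule and post any entries it has not seen before.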

And there is the permanent storage of something online – you can access your research anywhere, which means you can instantly show people what you are working on. In terms of permanence we are working on exports to EndNote and so on. The handles are similar to DOIs. Everything is listed by tags, searches etc. It is discoverable. You can search or browse by anything here.

I wanted to do this for selfish reasons. When I started my PhD (on mobilisation of MSCs) my lab had just had a huge paper released, reviewed in Nature, with a feature on page 3 of the Guardian. If I search now, my own work on FigShare – which is useful for others – is the top result even though it will not be published in a journal. I am happy to see that it is working in terms of discoverability. So the thing about this is that the data is more discoverable, it’s disseminated, it’s available for sharing.

We have done all this on a budget of zero and for that reason we are asking researchers to make their data open when they upload it here. The thing about JISC is that they fund these amazing tools and resources but even as an interested researcher I don’t find out about them. When I do I retweet, I get the word out. Retweet everything! Make the most of the amazing stuff that is being built.

In the first few months we had several hundred researchers and 700 ish data sets submitted. Even with 700 objects that’s not great to search. It was suggested that I seed the database. There is an open subset of articles in PMC but finding the figures is tough so this is about breaking figures out of repositories. About a month ago we began parsing the XML files and we have been pulling in about 2,000-3,000 figures per day. About 50,000 figures so far. We should make about half a million figures more discoverable in total in this process. The other thing is that if you publish in an open access journal you may therefore already have a profile and data available.
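To give a feel for what "parsing the XML files" to break figures out of articles might involve, here is a minimal Python sketch. It assumes article XML in the JATS/NLM style used by PMC, where each figure is a `fig` element with a `label`, a `caption` and a `graphic` carrying an `xlink:href`. The sample article, its IDs and the `extract_figures` helper are invented for illustration; this is not the FigShare pipeline, which the talk did not describe in detail.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by the xlink:href attribute on <graphic> elements.
XLINK = "{http://www.w3.org/1999/xlink}"

# Invented sample article; real files come from the PMC open subset.
SAMPLE_ARTICLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="F1">
      <label>Figure 1</label>
      <caption><p>Cell counts after <italic>treatment</italic>.</p></caption>
      <graphic xlink:href="example-1"/>
    </fig>
    <fig id="F2">
      <label>Figure 2</label>
      <caption><p>Control group.</p></caption>
      <graphic xlink:href="example-2"/>
    </fig>
  </body>
</article>"""


def extract_figures(xml_text):
    """Return one dict per <fig> element with its id, label, caption and image reference."""
    root = ET.fromstring(xml_text)
    figures = []
    for fig in root.iter("fig"):
        caption_el = fig.find("caption")
        # Flatten nested markup (italics etc.) and normalise whitespace.
        caption = ""
        if caption_el is not None:
            caption = " ".join("".join(caption_el.itertext()).split())
        graphic = fig.find("graphic")
        figures.append({
            "id": fig.get("id"),
            "label": fig.findtext("label", default=""),
            "caption": caption,
            "image": graphic.get(XLINK + "href") if graphic is not None else None,
        })
    return figures


if __name__ == "__main__":
    for figure in extract_figures(SAMPLE_ARTICLE):
        print(figure["label"], "-", figure["caption"], "->", figure["image"])
```

Each extracted record could then be stored as its own object with tags and a persistent URL, which is what makes the figure searchable separately from the paper it came from.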

We’ve been looking at what else might be needed…

We were asked to allow grouped files – for projects but also for complex 3D imaging objects. Researchers like to big themselves up. We are including altmetrics here – allowing researchers new ways to boast about their work. Also graphical representations of page views – in a nice graph it’s quite appealing. And we also provide embed code for adding their data to their theses or papers etc.

So that is the long and short of the features as they are. And everyone I’ve talked to in science has an opinion – positive or negative. I am really pleased that so many repositories are educating researchers on depositing data and articles and on open access.


Q1 – Les Carr) It’s just amazing what one can accomplish as a diversion almost from one’s PhD. Looking at all these figures from external data sources, the actual data sets – which are so important – you have only a few dozen of those. Any sense of how you will increase this?

A1) I have an idea that when we’re doing journal clubs and things like that you can use the QR code to look at the figure, see the data, explore further. Some journals require you to upload all of your data. There are projects like Dryad. There are lots of datasets under CC0 – I could pull those in the same way as we have for the figures but I’d prefer people to upload their own data.

Q2 – Peter Murray-Rust) I think this is fantastic. Have you had any interest from journals about this? For instance I work with BioMed Central and this would be trivial to link back and forth.

A2) BioMed Central have been in touch, mainly as we have been compiling a list of repositories for depositing specialist materials.

Q3 – Robin Rice) If journals and publishers are becoming dependent on figures being there, what do you see as the sustainability model for FigShare?

A3) In the first week of pre-beta a not-for-profit organisation offered to host FigShare indefinitely – at least 3 years and it’s just had funding for at least the next 20 years.

Posted August 4, 2011 at 10:51 am in Live Blog