The EIF (European Internet Foundation) hosted a dinner debate on Open Data, also known as Public Sector Information (PSI), at the European Parliament, Brussels, on Tuesday 24th January 2012. Attendees came from a broad range of commerce (including Microsoft, Facebook and Google), education, NGOs, national and regional government departments, and the European Commission and European Parliament. The event was hosted by MEP and EIF governor Marietje Schaake, and led by three speakers: Marcus Dapp, who drove an innovative open data project for the City of Munich; Richard Swetenham, who heads the Access to Information unit of DG Information Society at the European Commission; and Willem van Valkenburg, who is director of Delft University’s OpenCourseWare project. A link to the EIF event site and speaker podcast is here, and we have added further reference links at the bottom of this article.
Such an “obviously” good idea
Just about everyone agrees open data is a good idea, but some public bodies are resistant to change, or perhaps to the release of data whose quality would be put in question if viewed by a wider audience. In conversation with a group of delegates after the dinner, it was pointed out that opening up data would mean mistakes being spotted more quickly. Half the group thought this a good thing; half thought it bad! This, it seems, is the main barrier to the opening up of public data.
But need evidence to persuade reluctant public bodies
Public sector attendees’ main plea was for examples and case studies that they could take back to persuade their departments to release data. The private sector, especially the start-up space, is well aware of apps and businesses that have sprung up on the back of publicly available data, but it seems these have not been collated anywhere for public servants to peruse. PolicyBloggers has done some digging and come up with a starter-resource for anyone needing examples, at the end of this article.
Copyright probably a non-issue except in education
The issue of copyright was often raised during the evening, but it seems this worry is misplaced. Data, as opposed to human-created content, is not copyrightable. Time is wasted issuing data under a Creative Commons licence that has no standing in law. Departments can charge a reasonable fee for provision of data, especially if its collection into publishable form adds to their costs, or if it is requested at a faster pace than the department itself needs. In education, there is an opportunity to open up publicly funded research. Existing research publications fund just 2% of their content, and often take the copyright of articles in exchange for publishing them in a prestigious journal. Coincidentally, around the time of the event, academics began to boycott some research publishers.
Open data, or Public Sector Information (PSI)
Open Data refers to the release of public sector data, often called Public Sector Information (PSI), by public departments or publicly funded bodies. Open data means any data collected by, or whose creation is funded by, public institutions. Examples mentioned during the debate include the release of weather information in the Netherlands spurring mobile apps that tell you whether it is safe to go shopping without your raincoat; data on schools in the Netherlands mashed up with a social network, allowing parents to compare notes; land registry data in the UK and Spain; live bus and metro information in London and New York City; and the creation of “open data” cities such as Munich. It could also mean the opening up of university research data that is funded by public money, or of digital art collections, or of aggregated demographic, financial or tax data held by public institutions.
What the European Commission is doing
The re-use of PSI has become a high priority for the EU, having garnered little attention at first. As Richard Swetenham noted, “open data is so obviously a good idea, how can you be against it?”. Swetenham outlined the Commission’s work to date. The November 2003 PSI directive set out fair commercial re-use policies. Some public data is by nature a monopoly, so it has to be commercialised in a fair, non-discriminatory way. The directive worked well, but not well enough, and so a new proposal to amend and reinforce it is being worked on.
The scope of the directive is being extended to cultural heritage including libraries, museums and archives. Publicly funded academic research will also be addressed.
Open Data in Munich
Marcus Dapp, whose background includes research on open source software, led the first city open data project in Germany. He identified pride at being the first city to open up as one of the drivers behind the decision. The project was a success, but yielded some interesting lessons.
He emphasised the need to reach out to the community on what it wanted well before the project started. Talking to him afterwards, he noted that the release of individual house sale price information, which is now freely available online in the UK, would not be tolerated by German citizens.
His main issue with the process was the measurement of success and the difficulty of persuading public sector management to release data. Tangible and also intangible (social return on investment, SROI) measures should be proposed up front. One group of citizens demanded the city’s financial accounts in digital form. The City pointed out that it published its accounts online as a 500-page PDF, but this wasn’t acceptable to the group, and in the end the high-level accounts data were published.
Opening up Education
Willem van Valkenburg made the point that much more could be addressed in the directive on the subject of education. Each narrow sub-sector of academic research has its preferred publications, and the publishers make use of the exclusivity and reputation enhancement of being cited to charge for inclusion and, surprisingly, retain copyright on published articles. He estimates that only 2% of research publication content is actually funded by those publications, with the bulk of the cost coming from public purses. In an online age this seems ripe for disruption. True, publications have rigorous selection processes, but even these are driven by groups of often unpaid academics, also funded by the public. A few days before the event, mathematician Timothy Gowers sparked a heated debate in the press by blogging about his boycott of one academic publisher.
But there is a less tangible angle to open data in education, and that is the opening up of coursework materials. OpenCourseWare at Delft University has 20,000 courses freely available, often used outside the EU. An example is Delft’s water management course, used in Indonesia and Africa, which in turn fed back local case studies that now enrich the course content. Stanford University in the US opened up an Artificial Intelligence course and saw 160,000 students sign up.
There are drawbacks – course texts are copyright protected and cannot be changed, and there is a lack of direct tutor contact. However, in the US, OpenStudy set up a feedback site and found that 70% of posted student questions were answered within 5 minutes. Willem felt that European open courseware is lagging behind the US and Asia.
Barriers and issues
Richard Swetenham opened the debate on this question by noting that the behaviour of “Data Huggers” had already been described by Andrew Stott, the UK Cabinet Office’s Director of Digital Engagement. His data-hugging excuses were not listed at the event, but for reference they are:
- It’s held separately by n different organisations and we can’t join it up
- It will make people angry and scared without helping them
- It is technically impossible
- We do not own the data
- The data is just too large to be published and used
- Our website cannot hold files this large
- We know the data is wrong
- We know the data is wrong, and people will tell us when it’s wrong
- We know the data is wrong, and we will waste valuable resources inputting the corrections people send us
- People will draw superficial conclusions from the data without understanding the wider picture
- People will construct league tables from it
- It will generate more Freedom of Information requests
- It might be combined with other data to identify individuals/sensitive information
- It will cost too much to put it into a standard format
- Our IT suppliers will charge us a fortune to do an ad hoc extract
The question of the cost of digitising analogue (print or image) data was raised. 20 million digital items of cultural heritage have already been digitised by Europeana, funded by European money. It is true that the cost of digitisation is difficult to justify, especially today, but it is important to remember this is an investment, not an expense with no future payback. It is also possible to post metadata, that is, information about an object, much more cheaply.
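To see why posting metadata is so much cheaper than full digitisation, it helps to look at what a metadata record actually is. The sketch below is purely illustrative: the field names loosely follow the Dublin Core convention often used for cultural-heritage records, but the object, identifier and values are invented for this example and are not drawn from Europeana’s actual schema.

```python
import json

# Illustrative only: a minimal metadata record for a digitised
# cultural-heritage object. Publishing a record like this costs a
# few hundred bytes, versus the expense of scanning the object itself.
# Field names loosely follow Dublin Core; all values are invented.
record = {
    "title": "View of a Canal",               # hypothetical object
    "creator": "Unknown, 17th century",
    "type": "oil painting",
    "subject": ["landscape", "canal"],
    "identifier": "example-collection/0001",  # invented identifier
    "rights": "public domain",
}

# Serialise to JSON so other services could, in principle, harvest it.
print(json.dumps(record, sort_keys=True))
```

Even without an image, a record like this makes the object discoverable and linkable, which is often the bulk of the value at a fraction of the cost.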
The directive does not apply to personal data, but to aggregate data. This raises the question of how far openness can go. In the UK, individual house sale data is published, which is effectively personal data for privately owned properties. The data comes from the collection of stamp-duty tax, and is widely used by real estate agent and house-buyer websites in the UK. As Marcus Dapp noted above, this level of openness, traceable back to individual transactions, might not go down so well in other countries.
There is a fundamental difference between the European and US approaches to privacy, and this led to a discussion on informed consent. The US, in the eyes of European legislators, is a curious case: there is opposition to government-stored private data, yet it is considered fine for private citizens or corporations to obtain private information, such as credit status, that is not available in Europe. Europe is more in line with the spirit of OECD rules, which centre on “informed consent”.
But the concept of informed consent is not working in the world of mobile apps and social networking. Very few people bother to read the 10 pages on privacy before putting their personal information on Facebook, yet hitting the “I agree” button counts as informed consent. PolicyBloggers notes that the situation is the same with iPhone or Android apps, where it is easy to skip over the privacy warnings, which might include full access to your contact lists or to your location, and hit the “I agree” button. These are cases of un-informed consent.
Marcus Dapp thinks a simple icon based approach, similar to that used by creative commons to describe copyright, would simplify the consent process. This is off the PSI topic but is an interesting point and merges the questions of governments and citizens opening up their data.
Quality of data
One question concerned the usefulness of data, how quickly it would age, and the work needed to keep it current. Here the connection between open government and e-government (the digitisation of government processes, the move to electronic invoicing, and so on) becomes important. Marcus Dapp noted that making publication of data part of the process, rather than an added process, helps solve this problem. If commercial interests request data on a more timely basis than the public body itself needs, then this is a reason to charge for the upgraded service. The directive does not say that data has to be free.
Allied to the quality question is ease of access. A delegate from Microsoft (which offers cloud hosting) noted the much higher than expected traffic generated when London opened up its live tube and bus data.
The Open Source experience
Marcus Dapp cited work earlier in his career on open source software. The drivers of free, open source software development have interesting parallels for open data. He found 16 drivers that encouraged people to contribute to open source projects, and only two of them were financial: the ability to use the product in business, and the direct enhancement of career prospects by being associated with a project. Others included reputation, identification with a philosophy of openness, and being seen as a first mover. He noted that arguments on a psychological level can be just as important as rational ones.