Data Skills Training with Open Knowledge Foundation

“More eyes, better data.” This was how Anders Pedersen of the Open Knowledge Foundation (OKF) sold the concept of Open Data to the seminar attendees, mostly representatives from government agencies keen on partnering with Open Data Philippines (ODP). This was Monday, Day One of the weeklong OKF Data Skills Training held from May 12-17, 2014, in Taguig City. The event was organized in collaboration with ODP, and the trainings were generously supported by the World Bank in the Philippines and the Partnership for Open Data.

In his introduction, Pedersen described OKF as a global network of people whose passion revolves around “opening” information through technology. This is important, as it helps us cope with the vast amounts of data we produce daily and make them work for us, whether in the areas of finance or health, education or employment. But to make data “open,” it is not enough to simply make it available; it must also be reusable and redistributable. What does this mean? Basically, it entails moving away from traditional formats like .pdf to more open ones such as .csv and .txt. The difference lies in the level of accessibility. Open formats can be read by a wide variety of programs and allow for the manipulation of data, as opposed to closed formats like .pdf. In between are files like .xls, whose contents can be manipulated but which are tied to certain programs.
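To make the distinction concrete, here is a minimal Python sketch, assuming a small hypothetical budget table saved as .csv. The agency names, column names, and figures are invented for illustration and were not part of the training. The point is only that an open format can be read and reused with nothing more than a standard library.

```python
import csv
import io

# Hypothetical sample data; the agencies and amounts are invented for illustration.
SAMPLE_CSV = """agency,year,budget
Health,2013,120
Education,2013,300
Health,2014,150
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)

# Once the data is machine-readable, reusing it takes one line.
total_health = sum(int(r["budget"]) for r in rows if r["agency"] == "Health")
print(total_health)
```

Doing the same with a .pdf would first require extracting the table from the page layout, which is the work that scraping tools take on.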

For many of the participants, this was the first time they were hearing about such a distinction. PDF files had been the norm in their agencies for so long that the thought of them being inaccessible had probably never even crossed their minds. It was at this point that Pedersen expounded on the principles of Open Data, a worldwide movement that aims to institutionalize good governance by sharing government data with the public in machine-readable formats. The idea is that once all this data becomes available, citizens will be able to verify key government transactions for themselves and make informed decisions, thus rendering data personally useful. Questions such as “Where does my tax go?” and “How does my son’s school compare to others?” will be answerable through Open Data.

Additionally, with so many people looking at government data, governments themselves will set even higher expectations for their agencies: to produce accurate reports and to release them in a timely fashion. Citizens, too, will demand the information they need and help validate what has already been published. Thus, “more eyes, better data.”

But Open Data is a sustained effort. As Pedersen emphasized, a lot of work still remains to be done after setting up a portal. There is a need to constantly clean, share, and update data, as well as to promote citizen participation, with the ultimate goal of making information not just open, but useful.

With this in mind, Pedersen launched into the day’s breakout sessions, which mostly involved practical training in cleaning, scraping, and visualizing data, all essential skills for any Open Data advocate. Whereas cleaning comprises skills such as sorting and filtering within programs like Microsoft Excel, scraping requires familiarity with code and software that can grab data from .pdf files and websites for transfer into more open formats. While Pedersen handled basic data cleaning, Sergio Araiza of Escuela de Datos took charge of the other sessions. He introduced the attendees to scraping tools such as Table Capture and Tabula. But even as he explained pivot tables and other data manipulation techniques, Araiza emphasized that the most important thing is to effectively convey this information to the public. To this end, he offered tips on visualization, showcasing some examples (see end of article) and tools like Datawrapper, Raw, D3.js, Tableau, and ThingLink.
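The cleaning and pivot-table steps described above can also be sketched in code. The following Python example is an illustration only, not the material used in the sessions, which relied on tools such as Microsoft Excel. The records, field names, and the helper functions clean and pivot are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records, as they might look after scraping: stray spaces,
# inconsistent capitalization, thousands separators, and a missing value.
RAW = [
    {"region": " ncr ", "year": "2013", "amount": "1,200"},
    {"region": "NCR", "year": "2014", "amount": "900"},
    {"region": "Region VII", "year": "2013", "amount": ""},
    {"region": "region vii", "year": "2014", "amount": "400"},
]

def clean(records):
    """Normalize text fields, parse numbers, and drop rows with no amount."""
    cleaned = []
    for rec in records:
        amount = rec["amount"].replace(",", "").strip()
        if not amount:
            continue  # filter out incomplete rows
        cleaned.append({
            "region": rec["region"].strip().upper(),
            "year": int(rec["year"]),
            "amount": int(amount),
        })
    # Sort by region, then by year.
    return sorted(cleaned, key=lambda r: (r["region"], r["year"]))

def pivot(records):
    """Sum amounts into a region-by-year table, like a spreadsheet pivot table."""
    table = defaultdict(dict)
    for rec in records:
        row = table[rec["region"]]
        row[rec["year"]] = row.get(rec["year"], 0) + rec["amount"]
    return dict(table)

rows = clean(RAW)
table = pivot(rows)
print(table)
```

Normalizing the region names before pivoting matters: without it, " ncr " and "NCR" would be counted as two different regions.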

Day One ended on a high note, with all the attendees gathered back into a plenary after the breakout sessions. Pedersen then presided over a brainstorming period, urging the government agency representatives to reflect on their own data initiatives and on areas of possible improvement.

The rest of the week followed a similar pattern. Tuesday and Wednesday were slated for members of the media and civil society organizations (CSOs), while the sessions on Thursday and Friday built on the basic lessons already introduced to the government agency representatives on the first day. Araiza led more intermediate sessions on data cleaning using OpenRefine, and gave the attendees time to practice the same tasks on their own laptops.

This hands-on approach met with great enthusiasm from the participants, especially those from the media, as they learned new tricks they can use in their daily interaction with data. They were particularly impressed by OnlineOCR and ILovePDF, free online tools that allow users to liberate data from .pdf files and even merge or split pages according to their needs. This sudden simplification of tasks previously considered near-impossible animated the attendees considerably, even eliciting “oohs” and “aahs” from them. Such highlights reflect the overall atmosphere during the training, which more than anything else centered on learning and the exchange of ideas.

Finally, the training ended on Saturday with a lengthy Data Expedition. The attendees—a mix of all three groups—used a Bureau of Customs (BOC) dataset to frame interesting questions, ranging from “Where does our meat come from?” to “What are our top 10 import origin countries and what do we import from them?” Using these as starting points, they formed narratives out of the data and presented their findings to the group.
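A question like “What are our top 10 import origin countries?” boils down to grouping and ranking. The Python sketch below shows one way to approach it, assuming a hypothetical list of import records. The countries, commodities, and values are placeholders and are not real BOC figures, and the function top_origins is invented for this illustration.

```python
from collections import Counter

# Hypothetical import records; these are NOT real Bureau of Customs figures.
IMPORTS = [
    {"origin": "Country A", "commodity": "meat", "value": 500},
    {"origin": "Country B", "commodity": "rice", "value": 300},
    {"origin": "Country A", "commodity": "rice", "value": 200},
    {"origin": "Country C", "commodity": "meat", "value": 100},
]

def top_origins(records, n=10, commodity=None):
    """Rank origin countries by total import value, optionally for one commodity."""
    totals = Counter()
    for rec in records:
        if commodity is None or rec["commodity"] == commodity:
            totals[rec["origin"]] += rec["value"]
    return totals.most_common(n)

overall = top_origins(IMPORTS, n=2)
meat_only = top_origins(IMPORTS, commodity="meat")
print(overall)
print(meat_only)
```

Passing a commodity narrows the ranking, which is how a question like “Where does our meat come from?” could be framed against the same data.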

Truly, the OKF Data Skills Training constituted a successful, meaningful “expedition” for both the organizers and participants. By fostering a learning environment, the speakers imparted to the audience not just practical techniques but also novel perspectives they can in turn pass on to their respective organizations. Through such continued engagements, we from the Open Data Task Force hope to cultivate a society where knowledge is shared, not withheld, and where citizens are empowered to make informed choices about the lives they lead—with the awareness that it is not just about me, but about my contributions toward building a better tomorrow.

Sample interactive infographics
http://guns.periscopic.com/?year=2013
http://dirtyenergymoney.com/view.php?type=congress#view=connections
http://hint.fm/wind/
http://bolid.es/
http://www.crimemapping.com/