Putting user needs first: challenges of data access, linking and quality


I have worked with multi-country time series data since the mid-1980s. I first used the World Bank’s *STARS* data, which eventually evolved into today’s World Development Indicators. At the time, *STARS* was sold on CD, although the Bank made free copies available to member countries and offered concessionary rates to nationals of developing countries. Most large UN organizations, including FAO, produced databases that were also sold commercially. With the development of the World Wide Web, UN organizations and governments began providing some data online, usually as teasers for their commercial products. In 2009/2010 the situation changed dramatically: first the World Bank and then FAO made all of their commercial databases freely available on the web, and the UN itself converted many of its databases from commercial into free products. At the same time, these organizations began adding more databases to their data sites.

The World Bank currently has 51 databases available in the World DataBank, two of which are specifically health and nutrition databases, while many of the others contain some agricultural data. FAO has 14 databases in its FAOSTAT system, which offers a choice of two interfaces. The UN Population Division has a range of databases of population and demographic data. All three sites provide online querying of their databases, as well as prepared bulk downloads in Excel or CSV format. In addition, each organization has numerous other datasets on its website. Other organizations with free databases relevant to agriculture and nutrition include WHO, UNICEF, OECD, the European Union, the US Census Bureau and USDA. Many national governments and NGOs also provide agricultural and nutritional information.

There is thus a massive amount of data already available on food and nutrition, although there are significant gaps: data on labour inputs, fertilizer response curves, cropping calendars and agricultural systems are hard to find.

Key challenges

Although there are now many sources of data, the data is often difficult to use in practice, even for experienced users. For example:

  • It is often difficult to link multi-country data from different databases because there is no common standard for spelling country names or for country codes.
  • Bulk downloads may be so large that they cannot be opened with either a spreadsheet or common database software.
  • Bulk data in the same group of files may be in different formats.
  • Hierarchies of identification codes may be used inconsistently.
  • Data cells may include both text and numeric characters.
  • Accents in text fields may cause data corruption.
  • There may be obvious errors in the data.
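
To illustrate the first and sixth problems above, here is a minimal Python sketch of one common workaround: mapping each source's country names to a shared code before joining. It assumes a hand-maintained alias table (`COUNTRY_ALIASES`, hypothetical and deliberately tiny) that maps spellings to ISO 3166-1 alpha-3 codes; the function names are also illustrative and are not part of any of the organizations' own tools. Accents and typographic apostrophes are stripped before lookup, so that “Côte d’Ivoire” and “Cote d'Ivoire” resolve to the same code.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: normalized spelling -> ISO 3166-1 alpha-3 code.
# A working table would be far longer and built from each source's own country list.
COUNTRY_ALIASES = {
    "cote d'ivoire": "CIV",
    "ivory coast": "CIV",
    "viet nam": "VNM",
    "vietnam": "VNM",
    "united republic of tanzania": "TZA",
    "tanzania, united republic of": "TZA",
    "tanzania": "TZA",
}


def normalize_name(name: str) -> str:
    """Lower-case a country name, trim it and strip accents."""
    # Treat the typographic apostrophe (U+2019) as a plain one.
    name = name.replace("\u2019", "'").strip().lower()
    # NFKD splits an accented letter into base letter + combining mark;
    # dropping the combining marks leaves the unaccented letters.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_iso3(name: str) -> Optional[str]:
    """Return the ISO3 code for a name, or None if it is not in the alias table."""
    return COUNTRY_ALIASES.get(normalize_name(name))


def link_by_country(
    left: Dict[str, float], right: Dict[str, float]
) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Join two {country name: value} tables on ISO3 code.

    Returns (linked, unmatched): linked maps ISO3 -> (left value, right value)
    for countries present in both tables; unmatched lists names from either
    table that could not be coded, so they can be checked by hand.
    """
    left_coded: Dict[str, float] = {}
    right_coded: Dict[str, float] = {}
    unmatched: List[str] = []
    for source, coded in ((left, left_coded), (right, right_coded)):
        for name, value in source.items():
            code = to_iso3(name)
            if code is None:
                unmatched.append(name)
            else:
                coded[code] = value
    shared = left_coded.keys() & right_coded.keys()
    linked = {code: (left_coded[code], right_coded[code]) for code in shared}
    return linked, unmatched
```

Names that cannot be coded are returned rather than silently dropped, so that they can be checked by hand and added to the alias table.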

These problems are not insurmountable, but they often require time-consuming modification of the data. Some of these problems will be demonstrated on slides during the presentation.
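
One way to reduce that modification work, sketched here in Python under stated assumptions, is to stream a bulk download one row at a time instead of opening it in a spreadsheet, keeping only the countries of interest and splitting mixed text and number cells into a numeric column and a separate flag column. The column names `Area` and `Value`, the `_flag` suffix, and the rule that commas are thousands separators are all assumptions for illustration; they should be checked against the actual file.

```python
import csv
import re
from typing import Iterable, Optional, TextIO, Tuple

# Matches the first signed decimal number in a cell, e.g. "1234" in "1234 F".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_value(cell: str) -> Tuple[Optional[float], str]:
    """Split a mixed text/number cell into (number, leftover text).

    Assumes commas are thousands separators, which is not true for every source.
    Cells with no digits (e.g. "..", "n/a") return (None, original text).
    """
    cleaned = cell.strip().replace(",", "")
    match = NUMBER.search(cleaned)
    if match is None:
        return None, cleaned
    leftover = (cleaned[:match.start()] + cleaned[match.end():]).strip()
    return float(match.group()), leftover


def extract_subset(src: TextIO, dst: TextIO, countries: Iterable[str],
                   country_col: str = "Area", value_col: str = "Value") -> int:
    """Copy rows for the chosen countries from src to dst, one row at a time.

    Only one row is held in memory, so the source file can be far larger than
    a spreadsheet can open. The value column is split into a numeric column
    and a separate flag column. Returns the number of rows written.
    """
    wanted = set(countries)
    reader = csv.DictReader(src)
    flag_col = value_col + "_flag"
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames or []) + [flag_col])
    writer.writeheader()
    written = 0
    for row in reader:
        if row[country_col] not in wanted:
            continue
        number, flag = parse_value(row[value_col])
        row[value_col] = "" if number is None else number
        row[flag_col] = flag
        writer.writerow(row)
        written += 1
    return written
```

To run it on a real download, open the source file with `open(path, newline="", encoding=...)`, passing the encoding the file was actually saved in; reading a file with the wrong encoding is one way accented country names get corrupted.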

At a more general level, particularly in emerging countries, the use of data at the user’s end may be hampered by limited internet capacity, hardware and software problems, and user inexperience. For example:

  • Many of the databases may be difficult to download in countries with low bandwidth.
  • Government offices and other institutions in some countries may not have internet access.
  • In many developing countries, computers running pirated software may become difficult to maintain as Microsoft increasingly cracks down on unlicensed copies, yet legal software is beyond the budgets of many organizations and individuals.
  • Database software is needed to process multi-country databases that consist of more than one table.
  • Users often lack the experience with spreadsheets or database software needed to process the data.
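
Free database software can handle the multi-table case. The following Python sketch uses the standard library's `sqlite3` module (SQLite is open source and needs no server) to join a table of observations to lookup tables of country and item names. The three-table schema and all table, column and function names are hypothetical; real bulk downloads are laid out in many different ways.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table layout.

    Codes live in the observations table; names and units live in lookup tables.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE countries (country_code TEXT PRIMARY KEY, country_name TEXT);
        CREATE TABLE items (item_code INTEGER PRIMARY KEY, item_name TEXT, unit TEXT);
        CREATE TABLE observations (
            country_code TEXT REFERENCES countries(country_code),
            item_code INTEGER REFERENCES items(item_code),
            year INTEGER,
            value REAL
        );
    """)
    return con


# Replace country and item codes with their names, for a range of years.
QUERY = """
    SELECT c.country_name, i.item_name, o.year, o.value, i.unit
    FROM observations AS o
    JOIN countries AS c ON c.country_code = o.country_code
    JOIN items AS i ON i.item_code = o.item_code
    WHERE o.year BETWEEN ? AND ?
    ORDER BY c.country_name, i.item_name, o.year
"""


def observations_between(con: sqlite3.Connection, first_year: int, last_year: int) -> list:
    """Return (country, item, year, value, unit) rows for the given years."""
    return con.execute(QUERY, (first_year, last_year)).fetchall()
```

Once the lookup tables are loaded, a single query of this kind replaces the repeated code-to-name lookups that would otherwise be done by hand in a spreadsheet.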

Topics for discussion

A few of these problems will be demonstrated, but the main aim will be to identify the problems that make the agriculture and nutrition databases difficult to access and use. Discussion topics will include:

  • Problems inherent to the databases themselves.
  • Bottlenecks in the “supply chain” of data in both high- and low-bandwidth countries.
  • Possible solutions to these problems.

Key objective of the session

The main objective is to identify the problems that are limiting the productive use of this wealth of data and to propose solutions. There has been little innovative use of the data, possibly because it is so difficult to access. Perhaps the most popular website promoting understanding of international data is Gapminder. Tellingly, Gapminder features only four “agricultural” datasets: one from ILO on agricultural labour, two from the World Bank on land and on agriculture’s share of GDP, and one from FAO on agricultural water withdrawal.

Relevant sources

The website http://www.sustainableworld.com has been developed to begin addressing some of these issues. It is designed to load quickly in low-bandwidth countries and is divided into three main parts:

  • Data Links, which provides access to many of the important international time series databases.
  • Using the Data, which gives advice on downloading data from various websites, identifies problems with the databases, offers temporary solutions and workarounds, and provides access to training material.
  • Open Source Software, which gives advice on obtaining and using the open-source Linux operating system and on finding and downloading open-source applications that run on Linux or Windows.
