Monday 16 September 2013

Eurostat data in R

Eurostat, the statistical office of the European Union, collects and publishes national and regional data for Europe. The site's features include Regional Statistics Illustrated, a data tool with interactive maps and charts, and the Statistics Explained articles.

It's possible to download data from the Eurostat database to carry out your own analysis and visualisation. There are two possible approaches to this:
  1. The browse/search page. Within this, the Database sections are customisable, while the Tables are ready-made. The advantage of downloading data by this route is that very little processing is required to get the data into the correct shape. The disadvantage is that you need to use a web browser to get the data, rather than downloading straight into software such as R. But the process can be made very quick by using bookmarks.
  2. From the bulk data downloads section, you can download full datasets for use in statistical software.
I'll begin by discussing option 1. When you click on a dataset from the browse/search page, a new window opens (see screenshot). In this window, you can choose the variables, countries, etc. that you are interested in.

To find out more about the dataset, click "Explanatory texts (metadata)".

It's possible to bookmark your data selection for future use.

To download the data for use in a stats package, click the Download button and choose to download in csv format.
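Once the csv file is saved, reading it into R could look something like the sketch below. The column names (TIME, GEO, Value), the use of ":" for missing values and the thousands separators are assumptions about the export layout, and the rows are made-up placeholders rather than real Eurostat figures, so check them against your own download.

```r
# Made-up rows in the layout assumed for the csv export; the column
# names and figures are placeholders, not real Eurostat data.
csv_text <- 'TIME,GEO,Value
2010,Belgium,"1,234.5"
2011,Belgium,:
2010,France,"2,345.6"
2011,France,"2,400.0"'

# For a real download, replace `text = csv_text` with the path to the file.
# Everything is read as text first so that nothing is converted silently;
# ":" is assumed to mark a missing value.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                na.strings = ":", colClasses = "character")

# Remove thousands separators (if present) before converting to numbers
dat$Value <- as.numeric(gsub(",", "", dat$Value))
dat$TIME  <- as.integer(dat$TIME)

str(dat)
```

Reading every column as character and converting afterwards is a little more typing, but it makes it obvious which cells could not be turned into numbers.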

This is an example script for plotting data downloaded from Eurostat. Here is the result:
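The linked script is not reproduced here. As a rough idea of the kind of plot involved, here is a minimal base R sketch that draws one line per country. The data frame, its column names and its values are invented for illustration and are not taken from the script or from Eurostat.

```r
# Invented example data: one row per country and year
dat <- data.frame(
  TIME  = rep(2010:2012, times = 2),
  GEO   = rep(c("Belgium", "France"), each = 3),
  Value = c(10, 12, 11, 20, NA, 23)
)

# Reshape to a matrix with one row per year and one column per country
wide  <- tapply(dat$Value, list(dat$TIME, dat$GEO), function(x) x[1])
years <- as.integer(rownames(wide))

# One line per country; missing values simply leave a gap in the line
matplot(years, wide, type = "b", pch = 1, lty = 1,
        col = seq_len(ncol(wide)), xlab = "Year", ylab = "Value")
legend("topleft", legend = colnames(wide), lty = 1, pch = 1,
       col = seq_len(ncol(wide)))
```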

For downloading from the bulk facility, this is a useful blog post by Johannes Kutsam. Here is an example script I wrote using Johannes's function. The result is very similar to the chart above, although for some reason estimated values seem to be missing from the downloaded data.
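This is not Johannes's function, just a minimal sketch of how a bulk file might be parsed once it has been downloaded and unzipped. It assumes the tab-separated layout in which the first column packs the dimension codes together (for example "unit,geo\time"), the remaining columns are time periods, ":" marks a missing value and a trailing letter is a flag (such as "e" for estimated). The three lines of sample data are made up. A flag attached to a number makes a plain as.numeric() return NA, which may or may not be what happened to the estimated values mentioned above.

```r
# Made-up lines in the layout assumed for the bulk TSV files
tsv_lines <- c(
  "unit,geo\\time\t2011\t2010",
  "PC,BE\t7.2\t8.3 e",
  "PC,FR\t:\t9.1 p"
)

# For a real file, replace `text = tsv_lines` with the path to the file
raw <- read.delim(text = tsv_lines, header = TRUE, sep = "\t",
                  check.names = FALSE, colClasses = "character",
                  strip.white = TRUE)

# Split the packed first column into one column per dimension
dims <- strsplit(sub("\\\\.*$", "", names(raw)[1]), ",")[[1]]
keys <- do.call(rbind, strsplit(raw[[1]], ",", fixed = TRUE))
colnames(keys) <- dims

# Stack the time columns into long format
periods <- trimws(names(raw)[-1])
long <- data.frame(
  keys[rep(seq_len(nrow(raw)), times = length(periods)), , drop = FALSE],
  time = rep(as.integer(periods), each = nrow(raw)),
  cell = unlist(raw[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Separate the number from any flag so that flagged values are kept
long$value <- as.numeric(sub("^([-0-9.]*).*$", "\\1", long$cell))
long$flag  <- trimws(gsub("[-0-9.:]", "", long$cell))

long
```

Keeping the flag in its own column means estimated or provisional values can still be plotted, and can be styled differently if needed.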
