Home
rsdmx is a package to read SDMX data and metadata in R. It provides an SDMX format abstraction library and an SDMX web-services interface, including an embedded list of well-known national and international data providers.
rsdmx is currently seeking institutional or individual sponsors to fund the development of the rsdmx package in order to (1) enhance and strengthen existing functionalities, (2) provide new functionalities, and (3) ensure rsdmx maintenance and user support.
If you wish to sponsor rsdmx and/or fund additional functionalities, do not hesitate to contact me.
Do you need support to use rsdmx? You may:
- ask on http://stackoverflow.com or on the rsdmx mailing list
- create a ticket on GitHub
Do you use rsdmx? Your feedback is welcome! You may:
- report a bug
- send suggestions either by email or on the dev mailing-list.
Table of contents
1. Overview & Vision
2. Package status
3. Success stories
3.1 Inventory of Data Sources
3.2 Projects using rsdmx
4. Credits
5. Fundings
6. User guide
6.1 Install rsdmx
6.2 readSDMX
6.3 Examples
6.4 Community use cases
6.5 R documentation
6.6 User mailing list
7. Developer guide
7.1 How to contribute
7.2 Unit tests
7.3 Build tests
7.4 Developers mailing list
8. Issue reporting
`rsdmx` follows an iterative approach to reading SDMX data & metadata documents. The schema below illustrates the current scope of `rsdmx` and its vision to facilitate the use of SDMX data & metadata in R:
Low-level SDMX format abstraction library
- Today: Its primary role and current emphasis is to provide a low-level SDMX format abstraction library, supporting the SDMX `1.0`, `2.0` and `2.1` format standards. Such a low-level approach offers the flexibility required to read SDMX data whatever their location (`web` or `local` resources) and the way they are provided (through `web-services`, or not), and hence to guarantee that most SDMX datasources can be read in R.
- SDMX-ML format, and more? Currently `rsdmx` focuses on the SDMX-ML (i.e. SDMX - XML) format. Other formats, like SDMX-EDI or SDMX-JSON, could be considered at a later stage.
- What about writing SDMX documents? Reading SDMX-ML documents is very useful when you need to extract and analyze data from scattered sources. What about writing? Since SDMX-ML remains an exchange format, having the capacity to write R objects (such as `data.frame`) would be a step forward in statistical data exchange. One possible future objective of `rsdmx` is to provide an SDMX-ML `writer` to let R users export their data to SDMX format. In the same way a user can read data with `readSDMX`, they could write some data analysis output with `writeSDMX`!
SDMX Web-services interfaces
- Currently, `rsdmx` does not provide an interface to web-services that implement SDMX web-service standards. The main reason is that many SDMX datasources are not provided through SDMX web-services: SDMX documents are published in an ad hoc manner (single SDMX datasets, zipped files); some web-services also redirect to zipped SDMX files when the amount of data becomes huge; other documents are exchanged between people, etc. In order to guarantee that all SDMX datasources can be read in R, whether they are `local`, `remote`, published with `SDMX web-services`, or not, `rsdmx` started with a flexible, generic low-level approach.
- Building SDMX web-services interfaces with `rsdmx` is however considered in the package vision, as it would facilitate the interaction with SDMX web-services and the data extraction, for those data sources that offer these web-services.
SDMX Graphical User Interfaces
- The logical next step in the `rsdmx` vision is to make SDMX data extraction and reading in R more `user-friendly`. This is also considered within the scope of the package. Experiments using R Shiny have been performed in this direction.
Object-oriented approach
`rsdmx` currently follows an approach based on `S4` classes and methods. This means that the SDMX-ML object model is fully mapped to R. Beyond reading the SDMX content as a `data.frame`, `rsdmx` also gives access to all the associated information / metadata that is exchanged in SDMX-ML documents.
Ease of use
The main end-user functionality of `rsdmx` is a single function, named `readSDMX`, which takes care of reading the SDMX-ML document and returns the appropriate R SDMX object, on which the user can run common R functions such as `as.data.frame`.
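To make the object-oriented idea concrete, here is a minimal sketch in base R. The class `ToyDataSet`, its slots and the column names are hypothetical and far simpler than the actual rsdmx classes; the sketch only illustrates how an S4 object can carry metadata next to the observations while still being convertible with `as.data.frame`.

```r
library(methods)

# Hypothetical toy class (not an rsdmx class): metadata plus observations
setClass("ToyDataSet",
  representation(
    sender  = "character",  # header-like metadata
    periods = "character",  # observation periods
    values  = "numeric"     # observation values
  )
)

# S3 method so that as.data.frame() works on the S4 object
as.data.frame.ToyDataSet <- function(x, row.names = NULL, optional = FALSE, ...) {
  data.frame(obsTime = x@periods, obsValue = x@values,
             stringsAsFactors = FALSE)
}

ds <- new("ToyDataSet", sender = "MYORG",
          periods = c("2010", "2011"), values = c(1.5, 2.5))
df <- as.data.frame(ds)  # tabular view of the observations
sender <- ds@sender      # metadata stays available on the object
```

The point of the design is that the tabular view is only one projection of the object: anything not shown in the `data.frame` remains reachable through the slots.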
Enhanced SDMX-ML document reader
Currently, each main SDMX R object instantiated from the SDMX-ML document contains a `slot` (property) that holds the XML R object. For very large datasets, this could lead to memory issues. We are investigating how `rsdmx` could enhance its engine to use the event-driven `xmlEventParse` function of the XML package, so that the complete XML tree is not loaded in R, while maintaining the object-oriented approach.
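As a rough illustration of the event-driven idea, the sketch below uses `xmlEventParse` from the XML package (assumed to be installed) to collect observations without building the document tree. The helper `read_obs_sax`, the sample document and its attribute names are assumptions made for this example: the sample is a simplified, namespace-free stand-in for SDMX-ML, and this is not how rsdmx is implemented.

```r
library(XML)  # assumes the XML package is installed

# Hypothetical helper: collect <Obs> attributes with a SAX-style handler
read_obs_sax <- function(xml_text) {
  periods <- character(0)
  values <- numeric(0)
  handlers <- list(
    # called once per opening tag; attrs is a named character vector
    startElement = function(name, attrs, ...) {
      if (name == "Obs") {
        periods <<- c(periods, attrs[["TIME_PERIOD"]])
        values <<- c(values, as.numeric(attrs[["OBS_VALUE"]]))
      }
    }
  )
  xmlEventParse(xml_text, handlers = handlers, asText = TRUE)
  data.frame(obsTime = periods, obsValue = values, stringsAsFactors = FALSE)
}

# Simplified sample document (not real SDMX-ML)
xml_text <- paste0(
  '<DataSet><Series FREQ="A">',
  '<Obs TIME_PERIOD="2010" OBS_VALUE="1.5"/>',
  '<Obs TIME_PERIOD="2011" OBS_VALUE="2.5"/>',
  '</Series></DataSet>'
)
obs <- read_obs_sax(xml_text)
```

Growing vectors inside the handler keeps the sketch short; for really large documents, preallocating or collecting chunks in a list would be the more careful choice.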
Moreover, reading SDMX-ML is one thing, but discovering and accessing SDMX data from web-services is another. Not everyone knows how to use SDMX web-services and related web protocols, and it is not necessarily straightforward for users to prepare an SDMX query that returns only the data they want. A future vision of the package is to extend the role of SDMX format abstraction library to an SDMX web-service R interface, to facilitate data discovery and data access for the R end-user.
The package currently allows you to read SDMX `datasets` and `data structure definitions (DSD)` (including `concepts`, `codelists` and `data structures`). For datasets, it has been successfully tested on SDMX 1.0 (`CompactData`), 2.0 (`GenericData` and `CompactData` types) and 2.1 (`GenericData`).
Initial support for the `MessageGroup` type was enabled in order to read embedded generic or compact data.
Tests were performed using several data sources, such as FAO, OECD, EUROSTAT, the European Central Bank (ECB), and many others! Check the complete list here.
Check the [Change History](https://github.com/opensdmx/rsdmx/wiki/Change-History), which provides a list of fixes and improvements by milestone.
Check also the success stories to see how and where rsdmx is used!
While rsdmx is still evolving, it is worth mentioning that its user community is growing, and that positive feedback and acknowledgments have been received about its use. Support was provided to users either by supplying examples and help or by improving the package (enhancements, bug fixing).
As success stories, the rsdmx package has been used as an SDMX data abstraction library with multiple international and national data sources, listed below:
- international data sources:
Name | SDMX Web resource | Embedded web-service interface |
---|---|---|
UN data portal | Link | yes |
UN Food & Agriculture Organization (FAO) | Link | yes |
UN International Labour Organization (ILO) | Link | yes |
UN World Health Organization (WHO) | Link | no |
Organisation for Economic Co-operation and Development (OECD) | Link | yes |
EUROSTAT | Link | yes |
European Central Bank (ECB) | Link | yes |
International Monetary Fund (IMF) | Link | yes |
World Bank | Link | yes |
World Integrated Trade Solution | Link | yes |
Bank for International Settlements | Link | no |
- national data sources:
Country | Name | SDMX Web resource | Embedded web-service interface |
---|---|---|---|
Australia | Australian Bureau of Statistics (ABS) | Link | yes |
Belgium | National Bank of Belgium | Link | yes |
Canada | Statistics Canada | Link | no |
Germany | Deutsche Bundesbank | Link | no |
Germany | DESTATIS Statistisches Bundesamt | Link | no |
Estonia | Estonia Statistics | Link | yes |
France | Banque de France | Link | no |
France | Institut National de la Statistique et des Etudes Economiques (INSEE) | Link | yes |
Italy | Istituto nazionale di statistica | Link | yes |
Lithuania | Lithuanian Department of Statistics | Link | yes |
Mexico | Sistema Nacional de Información Estadística y Geográfica de México (SNIEG) | Link | yes |
Netherlands | De Nederlandsche Bank | Link | no |
Sultanate of Oman | National Center For Statistics & Information | Link | yes |
Spain | Instituto Nacional de Estadística (España) | Link | no |
Russia | Government Statistics (Russian Federation) | Link | no |
Sweden | Statistics Sweden | Link | no |
Switzerland | Swiss Statistics (classifications) | Link | no |
UK | UK's Office for National Statistics (ONS) | Link | no |
UK | UK's official labour market statistics (NOMIS) | Link | yes |
UK | UK Data Service (UKDS) | Link | yes |
USA | US Federal Reserve | Link | no |
USA | Federal Reserve Bank of New York | Link | no |
USA | Bureau of Labor Statistics | Link | no |
- other data sources:
Name | SDMX Web resource | Embedded web-service interface |
---|---|---|
KNOEMA Knowledge Platform | Link | yes |
The `rsdmx` package has also been used in the following projects:
- SYRTO project: Systemic Risk Tomography Signals, Measurements, Transmission Channels and Policy Interventions. The EC-funded project uses rsdmx as part of its data quality framework.
- iMarine data e-infrastructure, within R statistical analysis processes made available through Web Processing Services (WPS).
- Live Labour Force project, to allow reading SDMX datasets from the Australian Bureau of Statistics (ABS) portal (ABS.Stat). The project won the first prize in the category Best Statistical Storytelling with ABS.Stat (API) at the Australian GovHack 2014 edition.
Did you use `rsdmx` in your work?
We would be very grateful if you could add a citation in your published work. By citing `rsdmx`, beyond acknowledging the work, you help make it more visible and support its growth and sustainability. For this, please use the DOI
The rsdmx package was born from a volunteer development initiative to facilitate accessing and analyzing SDMX-ML data in R. At this stage, the project offers some functionalities to reach this objective.
Currently, the project is seeking funding opportunities in order to grow the package with new functionalities and improvements, and to guarantee quality maintenance of the R package and user support, hence ensuring the sustainability of the rsdmx project. If you wish to donate in acknowledgment of the work accomplished, please contact us.
Below is a list of enhancements for which we seek funds:
Enhancement | Description | Ticket |
---|---|---|
SDMX-ML SAX parser | Capacity of rsdmx to rely on the Simple API for XML (SAX) event-driven XML parser, as an additional SDMX-ML parsing functionality. Currently the approach relies on XPath and requires loading the complete SDMX-ML document tree in R. The SAX approach intends to provide rsdmx with the capacity to read huge datasets without running into R memory issues. Such an enhancement would add value especially where XML data becomes really huge, and where rsdmx is to be used in the context of web-services. This enhancement will make rsdmx very flexible in the way it can read SDMX data from the web. | #36 |
SDMX-ML ObsTime Date format | In datasets, there is a need to coerce the observation time into an appropriate date format. Such coercion requires a generic functionality that takes into consideration the time granularity specific to each dataset, using time format information inherited from the datasource or through time pattern identification. | #37 |
SDMX-ML writeSDMX support | Supporting an SDMX-ML document writer in R, to facilitate SDMX data exchange for R users. | |
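The "time pattern identification" idea behind the ObsTime enhancement can be sketched in base R. The helper `parse_obs_time` below is hypothetical (it is not part of rsdmx) and assumes only four common period notations: annual (`2010`), monthly (`2010-03`), quarterly (`2010-Q2`) and daily (`2010-03-15`). Each period is mapped to its first day; anything else becomes `NA`.

```r
# Hypothetical helper: coerce SDMX-like period strings to Date
parse_obs_time <- function(x) {
  out <- rep(as.Date(NA), length(x))
  annual    <- grepl("^[0-9]{4}$", x)
  monthly   <- grepl("^[0-9]{4}-[0-9]{2}$", x)
  quarterly <- grepl("^[0-9]{4}-Q[1-4]$", x)
  daily     <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)

  fmt <- "%Y-%m-%d"
  # sprintf() returns character(0) for empty input, so empty groups are safe
  out[annual]  <- as.Date(sprintf("%s-01-01", x[annual]), format = fmt)
  out[monthly] <- as.Date(sprintf("%s-01", x[monthly]), format = fmt)
  q <- as.integer(sub("^.*-Q", "", x[quarterly]))
  first_month <- (q - 1L) * 3L + 1L  # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
  out[quarterly] <- as.Date(
    sprintf("%s-%02d-01", substr(x[quarterly], 1, 4), first_month),
    format = fmt
  )
  out[daily] <- as.Date(x[daily], format = fmt)
  out
}

dates <- parse_obs_time(c("2010", "2010-03", "2010-Q2", "2010-03-15", "n/a"))
```

Mapping a period to its first day is one convention among several (end of period is equally defensible), which is part of why a generic solution needs the time format information mentioned in the table.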
Installing the package requires at least R 2.15 and the devtools package:
install.packages("devtools")
Once the devtools package is loaded, you can use install_github as follows:
library(devtools)
install_github("opensdmx/rsdmx")
The `readSDMX` function is first designed at a low level, so it can take as parameter a URL (`isURL = TRUE` by default) or a file (an XML or RData file). So wherever the SDMX document is located, `readSDMX` will allow you to read it, as follows:
#read a remote file
sdmx <- readSDMX(file = "someUrl")
#read a local file
sdmx <- readSDMX(file = "somelocalfile", isURL = FALSE)
#read a SDMX object from RData
sdmx <- readSDMX(file = "tmp.RData", isRData = TRUE)
In addition, in order to facilitate querying datasources, `readSDMX` also provides helpers to query well-known remote datasources. This makes it possible to specify a simple provider ID and the different parameters needed to build an SDMX query (e.g. for a dataset query: operation, key, filter, startPeriod and endPeriod) rather than the entire URL.
This is made possible by a list of SDMX service providers embedded within `rsdmx`, which provides all the information required for `readSDMX` to build the SDMX request (URL) before accessing the datasource.
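To give a feel for what "building the SDMX request" involves, here is a minimal base R sketch. The helpers `sdmx_key` and `build_data_url` are hypothetical and are not the rsdmx implementation; they simply reproduce the URL layout of the OECD StatExtracts example listed on this page, where a key such as `list("TOT", NULL, NULL)` becomes `TOT..` (empty positions mean "no filter on that dimension"). Joining several codes with `+` is an assumption based on common SDMX REST usage.

```r
# Hypothetical helper: collapse a key list into a dot-separated key string
sdmx_key <- function(key) {
  parts <- vapply(key, function(k) {
    if (is.null(k)) "" else paste(k, collapse = "+")
  }, character(1))
  paste(parts, collapse = ".")
}

# Hypothetical helper: assemble a data request URL (OECD-like layout)
build_data_url <- function(base_url, flow_ref, key, agency,
                           start = NULL, end = NULL) {
  url <- sprintf("%s/GetData/%s/%s/%s",
                 base_url, flow_ref, sdmx_key(key), agency)
  query <- c(startTime = start, endTime = end)  # NULL values are dropped
  if (length(query) > 0) {
    url <- paste0(url, "?",
                  paste(names(query), query, sep = "=", collapse = "&"))
  }
  url
}

oecd_url <- build_data_url(
  base_url = "http://stats.oecd.org/restsdmx/sdmx.ashx",
  flow_ref = "MIG",
  key      = list("TOT", NULL, NULL),
  agency   = "OECD",
  start    = 2000,
  end      = 2011
)
```

Other providers use different path layouts and parameter names (for instance `startPeriod` and `endPeriod`), which is why each embedded provider needs its own request-building information.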
The list of known SDMX service providers can be queried as follows:
providers <- getSDMXServiceProviders()
#list all provider ids
sapply(providers, function(x) slot(x, "agencyId"))
It is also possible to create and add a new SDMX service provider to this list (so that `readSDMX` is aware of it). A provider can be created with `SDMXServiceProvider`, and is made of three parameters: an `agencyId`, its `name`, and a request `builder`.
The request builder can be created with `SDMXRequestBuilder`, which takes 3 arguments: the `baseUrl` of the service endpoint, a `suffix` logical parameter (whether the `agencyId` has to be used as a suffix in the web request), and a `handler` function which builds the web request.
`rsdmx` intends to provide specific request builders that already embed a handler function (no need to implement it), and is now attempting to provide an `SDMXRESTRequestBuilder` to build SDMX REST web requests. All this is still experimental.
Let's see it with an example.
First create a request builder for our provider. An `SDMXRequestBuilder` is built by specifying the following parameters:
- `regUrl` and `repoUrl`: respectively the URLs of the SDMX registry and repository. Although rsdmx offers the possibility to distinguish the two URLs (some data providers require it), both URLs will generally be the same.
- `formatter`: a list of functions (one function per type of resource to be handled) that pre-formats the values of the SDMX request parameters (handled through a single `SDMXRequestParams` object). This is particularly useful for customization.
- `handler`: a list of functions (one function per type of resource to be handled) that constructs the SDMX resource request URL that will be invoked by rsdmx.
- `compliant`: a boolean property to indicate if the SDMX provider is compliant with SDMX web-service specifications.
myBuilder <- SDMXRequestBuilder(
  regUrl = "http://www.myorg.org/sdmx/registry",
  repoUrl = "http://www.myorg.org/sdmx/repository",
  formatter = list(
    dataflow = function(obj){
      obj@resourceId <- paste0("DF_",obj@resourceId)
      return(obj)
    },
    datastructure = function(obj){
      obj@resourceId <- paste0("DSD_",obj@resourceId)
      return(obj)
    },
    data = function(obj){return(obj)}
  ),
  handler = list(
    dataflow = function(obj){
      req = sprintf("%s/%s/%s",obj@regUrl,obj@resource,obj@resourceId)
      return(req)
    },
    datastructure = function(obj){
      req = sprintf("%s/%s/%s",obj@regUrl,obj@resource,obj@resourceId)
      return(req)
    },
    data = function(obj){
      req = sprintf("%s/%s/%s",obj@regUrl,obj@resource,obj@flowRef)
      return(req)
    }
  ),
  compliant = FALSE
)
We can create a provider with the above request builder, and add it to the list of known SDMX service providers:
#create the provider
provider <- SDMXServiceProvider(
agencyId = "MYORG",
name = "My Organization",
builder = myBuilder
)
#add it to the list
addSDMXServiceProvider(provider)
#check provider has been added
sapply(getSDMXServiceProviders(), function(x){slot(x, "agencyId")})
Another helper lets you ask `rsdmx` whether a specific provider is known, given an id:
oecd <- findSDMXServiceProvider("OECD")
Now that you know how to add an SDMX provider, you can use `readSDMX` without having to specify an entire URL, just by specifying the `providerId` (agency ID of the provider) and the different query parameters to reach your SDMX document:
sdmx <- readSDMX(providerId = "MYORG", resource = "data", flowRef = "MYSERIE",
key = "ALL", key.mode = "SDMX", start = 2000, end = 2015)
It is possible to save SDMX R objects as RData files (.RData, .rda, .rds) and reload them later into the R session. This is useful for users who want to keep their SDMX objects in R data files, but also for fast loading of large SDMX objects (e.g. DSD objects) for use in statistical analyses and R-based web applications.
To save a SDMX R object to RData file:
saveSDMX(sdmx, "tmp.RData")
To reload a SDMX R object from RData file:
readSDMX("tmp.RData", isRData = TRUE)
The following sections show you how to query SDMX documents by using `readSDMX` in different ways: for local or remote files, using `readSDMX` at a low level or with the helpers.
The following code shows you how to read a dataset from the FAO data portal: http://data.fao.org/sdmx/repository/data/CROP_PRODUCTION/.156.5312../FAO?startPeriod=2008&endPeriod=2008
myUrl <- "http://data.fao.org/sdmx/repository/data/CROP_PRODUCTION/.156.5312../FAO?startPeriod=2008&endPeriod=2008"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)
Try it out with other datasources!
- OECD StatExtracts portal: http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011
- EUROSTAT portal: http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011
- European Central Bank (ECB): https://sdw-wsrest.ecb.europa.eu/service/data/DD/M.SE.BSI_STF.RO.4F_N
- UN International Labour Organization (ILO): http://www.ilo.org/ilostat/sdmx/ws/rest/data/ILO,DF_CPI_FRA_CPI_TCPI_COI_RT/ALL?startPeriod=2000-01-01&endPeriod=2014-12-31
The service providers mentioned above are known by `rsdmx`, which lets users call `readSDMX` with the helper parameters. Let's see what this looks like when querying an OECD datasource:
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
key = list("TOT", NULL, NULL), start = 2010, end = 2011)
df <- as.data.frame(sdmx)
head(df)
It is also possible to query a dataset together with its "definition", handled in a separate SDMX-ML document named `DataStructureDefinition` (DSD). This is particularly useful when you want to enrich your dataset with all labels. For this, you need the DSD, which contains all reference data.
To do so, you only need to append `dsd = TRUE` (the default value is `FALSE`) to the previous request, and specify `labels = TRUE` when calling `as.data.frame`, as follows:
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
key = list("TOT", NULL, NULL), start = 2010, end = 2011,
dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)
head(df)
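Conceptually, enriching a dataset with labels amounts to looking up each code in the matching codelist of the DSD. The base R sketch below illustrates that idea on toy data; the helper `add_labels`, the sample codes and the `_label` column naming are assumptions made for this example and do not describe the actual output of rsdmx.

```r
# Toy dataset: one coded dimension (COU) and an observation value
toy_data <- data.frame(
  COU = c("FRA", "DEU", "FRA"),
  obsValue = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Toy codelist, as it could be extracted from a DSD
toy_codelist <- data.frame(
  id = c("DEU", "FRA"),
  label = c("Germany", "France"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add a label column next to a coded column
add_labels <- function(df, codelist, column) {
  idx <- match(df[[column]], codelist$id)  # NA when a code is unknown
  df[[paste0(column, "_label")]] <- codelist$label[idx]
  df
}

labelled <- add_labels(toy_data, toy_codelist, "COU")
```

Using `match` keeps the original row order and yields `NA` labels for codes missing from the codelist, rather than silently dropping rows as an inner `merge` would.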
Note that if you are reading SDMX-ML documents with the native approach (with URLs) instead of the embedded providers, it is also possible to associate a DSD with a dataset by using the function `setDSD`. Let's see how it works:
#data without DSD
sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
key = list("TOT", NULL, NULL), start = 2010, end = 2011)
#DSD
sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "MIG")
#associate data and dsd
sdmx.data <- setDSD(sdmx.data, sdmx.dsd)
This example shows you how to use `rsdmx` with local SDMX files, previously downloaded from EUROSTAT.
#bulk download from Eurostat
tf <- tempfile(tmpdir = tdir <- tempdir()) #temp file and folder
download.file("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf)
sdmx_files <- unzip(tf, exdir = tdir)
sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)
head(stats)
A similar use case is the download of SDMX data and metadata from the UN data portal. For information, in this case the XML files are wrapped in a `SOAP` response; however, `rsdmx` provides a convenience mechanism to detect and read the embedded `SDMX-ML` message.
- Read concept schemes from FAO data portal
csUrl <- "http://data.fao.org/sdmx/registry/conceptscheme/FAO/ALL/LATEST/?detail=full&references=none&version=2.1"
csobj <- readSDMX(csUrl)
csdf <- as.data.frame(csobj)
head(csdf)
- Read codelists from FAO data portal
clUrl <- "http://data.fao.org/sdmx/registry/codelist/FAO/CL_FAO_MAJOR_AREA/0.1"
clobj <- readSDMX(clUrl)
cldf <- as.data.frame(clobj)
head(cldf)
##### Data Structures (Key Families)
- Read the complete list of data structures (or key families) from the OECD StatExtracts portal
dsUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/ALL"
ds <- readSDMX(dsUrl)
dsdf <- as.data.frame(ds)
head(dsdf)
- Read a complete DSD from OECD StatExtracts portal
dsdUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1"
dsd <- readSDMX(dsdUrl)
#get codelists from DSD
cls <- slot(dsd, "codelists")
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id")) #get list of codelists
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS") #get a codelist
#get concepts from DSD
concepts <- as.data.frame(slot(dsd, "concepts"))
Description | Link |
---|---|
A good introduction to the SDMX standard and the use of rsdmx to facilitate SDMX data extraction in R. | Link |
A nice example on how to use rsdmx to extract multiple SDMX unemployment timeseries, merge them and then display statistics in graphs & maps | Link |
The package embeds R documentation accessible from the R console (e.g. by typing `?readSDMX`), or as PDF documentation available, once installed, in the package directory.
A Google group / mailing list is available for users here: https://groups.google.com/forum/#!forum/rsdmx
You can subscribe directly in the google group, or by email: [email protected] To send a post, use: [email protected] To unsubscribe, send an email to: [email protected]
Here are some guidelines on how to contribute to the package:
- the first step is to write a post on the dev mailing list to discuss the enhancement and how to implement it
- create an issue, and describe the bug/enhancement/new feature, in order to discuss the new requirement
- create a branch on your fork with reference to the issue. A branch for an improvement can be named as follows: `branch-issueNb-shortdescription`, e.g. `master-18-readsdmx-httpheader`.
- commit/push to your branch after a successful `R CMD check`, and reference each commit in this way: `branch #issuenb message`, e.g. `master #issuenb my commit`. By adding the issue number, the commit will be linked to the GitHub issue previously created. Indicating the branch is very useful, especially when we want to handle a fix in a previous version (backport).
- once you have committed all the work, with a successful package build verified with `R CMD check`, you can open a pull request
- Each new feature should be accompanied by unit tests, written with the `testthat` R package.
- For each R script file named `script.R`, a corresponding test file should be created in the `tests/testthat` directory, using the naming convention `test_<script>.R`.
- The `test_<script>.R` file should have the following structure:
require(rsdmx, quietly = TRUE) #load the rsdmx package
require(testthat) # load the testthat package
context("script") # create a unit test context for the given script file
#unit test 1
test_that("Test1",{
...
})
#unit test 2
test_that("Test2",{
...
})
- After any modification of the source code (bug fix, enhancement, added feature), a package build should be tested by the developer using the command `R CMD check` (requires an R installation and RTools). The option `--as-cran` should be enabled to ensure the updated package will later be accepted by CRAN. This command runs a set of check operations required for a proper package build, including the unit tests.
- In order to guarantee a proper R package build, `R CMD check` is performed automatically after each commit, through Travis Continuous Integration (see https://travis-ci.org/opensdmx/rsdmx). This second build test is required to ensure users will be able to successfully install the package from GitHub.
A Google group / mailing list is available for discussing developments here: https://groups.google.com/forum/#!forum/rsdmx-dev
You can subscribe directly in the google group, or by email: [email protected] To send a post, use: [email protected] To unsubscribe, send an email to: [email protected]
Issues can be reported at https://github.com/opensdmx/rsdmx/issues