\frontmatter
\cleardoublepage
\mainmatter
label:sec:introduction
label:sec:thisBook
A data graphic is not only a static image but also tells a story about the data. It activates cognitive processes that are able to detect patterns and discover information not readily available with the raw data. This is particularly true for time series, spatial, and space-time datasets.
There are several excellent books about data graphics and visual perception theory, with guidelines and advice for displaying information, including visual examples. Let’s mention The Elements of Graphing Data cite:Cleveland1994 and Visualizing Data cite:Cleveland1993 by W. S. Cleveland, Envisioning Information cite:Tufte1990 and The Visual Display of Quantitative Information cite:Tufte2001 by E. Tufte, The Functional Art by A. Cairo cite:Cairo2012, and Visual Thinking for Design by C. Ware cite:Ware2008. Ordinarily, they do not include the code or software tools to produce those graphics.
On the other hand, there is a collection of books that provides code and detailed information about the graphical tools available with \textsf{R}. Commonly they do not use real data in the examples and do not provide advice for improving graphics according to visualization theory. Three books are the unquestioned representatives of this group: R Graphics by P. Murrell cite:Murrell2011, Lattice: Multivariate Data Visualization with R by D. Sarkar cite:Sarkar2010, and ggplot2: Elegant Graphics for Data Analysis by H. Wickham cite:Wickham2016.
This book proposes methods to display time series, spatial, and space-time data using \textsf{R}, and aims to be a synthesis of both groups providing code and detailed information to produce high-quality graphics with practical examples.
label:sec:thisBookIsNot
- This is not a book to learn =R=.
Readers should have a fair knowledge of programming with \textsf{R} to understand the book. In addition, previous experience with the =zoo=, =sp=, =raster=, =lattice=, =ggplot2=, and =grid= packages is helpful.
If you need to improve your \textsf{R} skills, consider these information sources:
- Introduction to =R=[fn:3].
- Official manuals[fn:4].
- Contributed documents[fn:5].
- Mailing lists[fn:6].
- R-bloggers[fn:7].
- Books related to =R=[fn:8] and particularly Software for Data Analysis by John M. Chambers cite:Chambers2008.
- This book does not provide an exhaustive collection of visualization methods.
Instead, it illustrates what I found to be the most useful and effective methods. Notwithstanding, each part includes a section titled “Further Reading” with bibliographic proposals for additional information.
- This book does not include a complete review or discussion of =R= packages.
Their most useful functions, classes, and methods regarding data and graphics are outlined in the introductory chapter of each part, and conveniently illustrated with the help of examples. However, if you need detailed information about a certain aspect of a package, you should read the corresponding package manual or vignette. Moreover, if you want to know additional alternatives, you can navigate through the CRAN Task Views about Time Series[fn:9], Spatial Data[fn:10], Spatiotemporal Data[fn:11], and Graphics[fn:12].
- Finally, this book is not a handbook of data analysis, geostatistics, point pattern analysis, or time series theory.
Instead, this book is focused on the exploration of data with visual methods, so it may be framed in the Exploratory Data Analysis approach. Therefore, this book may be a useful complement for superb bibliographic references where you will find plenty of information about those subjects, such as cite:Chatfield2016, cite:Cressie.Wikle2015, cite:Slocum.McMaster.ea2005, and cite:Bivand.Pebesma.ea2013.
label:sec:how-read
This book is organized into three parts, each devoted to different types of data. Each part comprises several chapters according to the various visualization methods or data characteristics. The chapters are structured as independent units so readers can jump directly to a certain chapter according to their needs. Of course, there are several dependencies and redundancies between the sets of chapters that have been conveniently signaled with cross-references.
The content of each chapter illustrates how to display a dataset starting with an easy and direct approach. Often this first result is not entirely satisfactory so additional improvements are progressively added. Each step involves additional complexity which, in some cases, can be overwhelming during a first reading. Thus, some sections, marked with the sign \floweroneleft, can be safely skipped for later reading.
Although I have done my best to help readers understand the methods and code, you should not expect to understand it after one reading. The key is practical experience, and the best way is to try out the code with the provided data and modify it to suit your needs with your own data. There is a website and a code repository to help you in this task.
label:sec:github
The book website with the main graphics of this book is located at
The full code is freely available from the repository:
On the other hand, the datasets used in the examples are either available at the repository or can be freely obtained from other websites. It must be underlined that the combination of code and data freely available allows this book to be fully reproducible.
I have chosen the datasets according to two main criteria:
- They are freely available without restrictions for public use.
- They cover different scientific and professional fields (meteorology and climate research, economy and social sciences, energy and engineering, environmental research, epidemiology, etc.).
The repository and the website can be downloaded as a compressed file[fn:13], or if you use =git=, you can clone the repository with:
git clone https://github.com/oscarperpinan/bookvis.git
label:sec:r-graphics
There are two distinct graphics systems built into \textsf{R}, referred to as traditional and grid graphics. Grid graphics are produced with the =grid= package cite:Murrell2011, a flexible low-level graphics toolbox. Compared with the traditional graphics model, it provides more flexibility to modify or add content to an existing graphical output, better support for combining different outputs easily, and more possibilities for interaction. All the graphics in this book have been produced with the grid graphics model.
Other packages are constructed over it to provide high-level functions, most notably the =lattice= and =ggplot2= packages.
label:sec:lattice
The =lattice= package cite:Sarkar2010 is an independent implementation of Trellis graphics, which were mostly influenced by The Elements of Graphing Data cite:Cleveland1994. Trellis graphics often consist of a rectangular array of panels. The =lattice= package uses a formula interface to define the structure of the array of panels with the specification of the variables involved in the plot. The result of a =lattice= high-level function is a =trellis= object.
For bivariate graphics, the formula is generally of the form =y ~ x=, representing a single panel plot with =y= versus =x=. This formula can also involve expressions. The main function for bivariate graphics is =xyplot=.
Optionally, the formula may be =y ~ x | g1 * g2=, and =y= is represented against =x= conditional on the variables =g1= and =g2=. Each unique combination of the levels of these conditioning variables determines a subset of the variables =x= and =y=. Each subset provides the data for a single panel in the Trellis display, an array of panels laid out in columns, rows, and pages.
For example, in the following code, the variable =wt= of the dataset =mtcars= is represented against =mpg=, with a panel for each level of the categorical variable =am=. The points are grouped by the values of the =cyl= variable.
library(lattice)
xyplot(wt ~ mpg | am, data = mtcars, groups = cyl)
For trivariate graphics, the formula is of the form =z ~ x * y=, where =z= is a numeric response, and =x= and =y= are numeric values evaluated on a rectangular grid. Once again, the formula may include conditioning variables, for example =z ~ x * y | g1 * g2=. The main function for these graphics is =levelplot=.
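The trivariate interface can be sketched as follows. This is only an illustration: the data frame =grd= and its columns are synthetic values made up for the example, not one of the datasets of the book.

```r
library(lattice)

# Illustrative data: a numeric response evaluated on a rectangular grid.
grd <- expand.grid(x = seq(-2, 2, length.out = 30),
                   y = seq(-2, 2, length.out = 30))
grd$z <- with(grd, exp(-(x^2 + y^2)))

# z ~ x * y displays z as colored levels over the (x, y) grid.
p <- levelplot(z ~ x * y, data = grd)
print(p)
```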
The plotting of each panel is performed by the panel function, specified in a high-level function call as the =panel= argument. Each high-level =lattice= function has a default panel function, although the user can create new Trellis displays with custom panel functions.
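As a minimal sketch of a custom panel function, the following code reuses =mtcars= and adds a least-squares line to each panel. The choice of =panel.lmline= is only an example of what a panel function may contain.

```r
library(lattice)

# Custom panel function: draw the points with the default panel
# function, then add a least-squares line for the data of that panel.
p <- xyplot(wt ~ mpg | factor(am), data = mtcars,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              panel.lmline(x, y, col = "black")
            })
print(p)
```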
=lattice= is a member of the recommended packages list so it is commonly distributed with \textsf{R} itself. There are more than 250 packages depending on it, and the most important packages for our purposes (=zoo=, =sp=, and =raster=) define methods to display their classes using =lattice=.
On the other hand, the =latticeExtra= package cite:Sarkar.Andrews2016 provides additional flexibility for the somewhat rigid structure of the Trellis framework implemented in =lattice=. This package complements =lattice= with the implementation of layers via the =layer= function, and superposition of =trellis= objects and layers with the =+.trellis= function. Using both packages, you can define a graphic with the formula interface (under the =lattice= model) and overlay additional content as layers (following the =ggplot2= model).
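A small sketch of this combination, assuming that =latticeExtra= is installed: the graphic is defined with the formula interface and a horizontal reference line is overlaid as a layer. The reference line is an arbitrary choice for illustration.

```r
library(lattice)
library(latticeExtra)

# Base graphic defined with the formula interface.
p <- xyplot(wt ~ mpg, data = mtcars)

# layer() captures panel code (x and y are the panel data);
# the + operator overlays it on the trellis object.
p2 <- p + layer(panel.abline(h = mean(y), lty = 2))
print(p2)
```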
label:sec:ggplot2
The =ggplot2= package cite:Wickham2016 is an implementation of the system proposed in The Grammar of Graphics cite:Wilkinson2005, a general scheme for data visualization that breaks up graphs into semantic components such as scales and layers. Under this framework, the definition of the graphic with =ggplot2= is done with a combination of several functions that provide the components, instead of the formula interface of =lattice=.
With =ggplot2=, a graphic is composed of:
- A dataset, =data=, and a set of mappings from variables to aesthetics, =aes=.
- One or more layers, each composed of: a geometric object, =geom_*=, to control the type of plot you create (points, lines, etc.); a statistical transformation, =stat_*=; and a position adjustment (and optionally, additional dataset and aesthetic mappings).
- A scale, =scale_*=, to control the mapping from data to aesthetic attributes. Scales are common across layers to ensure a consistent mapping from data to aesthetics.
- A coordinate system, =coord_*=.
- Optionally, a faceting specification, =facet_*=, the equivalent of Trellis graphics with panels.
The function =ggplot= is typically used to construct a plot incrementally, using the =+= operator to add layers to the existing ggplot object. For instance, the following code (equivalent to the previous =lattice= example) uses =mtcars= as the dataset, and maps the =mpg= variable on the x-axis and the =wt= variable on the y-axis. The geometric object is the point using the =cyl= variable to control the color. Finally, the levels of the =am= variable define the panels of the graphic.
library(ggplot2)
ggplot(mtcars, aes(mpg, wt)) +
    geom_point(aes(colour = factor(cyl))) +
    facet_grid(. ~ am)
This package is very popular, with a large list of packages depending on it. In the context of this book, time series can be displayed with it because the =zoo= package defines the =autoplot= function based on =ggplot2=. Regarding spatial data, recent versions of this package provide a =geom= function designed for spatial data. Detailed information is provided in Section ref:sec:sf.
label:sec:comparison
Which package to choose is, for a wide range of datasets, a question of personal preferences. You may be interested in a comparison between them published in a series of blog posts[fn:1]. Consequently, where possible most of the code contains alternatives defined both with =lattice= and with =ggplot2=.
It is important to note that both =latticeExtra= and =ggplot2= define a function named =layer=. The =ggplot2::layer= function is rarely called by the user, because the wrapper functions =geom_*= and =stat_*= are preferred. On the other hand, the =latticeExtra::layer= function is designed to be directly called by the user, and therefore its masking must be prevented. Consequently, when the =latticeExtra= and =ggplot2= packages are to be working together in the same session, the =latticeExtra= package must be loaded after =ggplot2=.
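The loading order can be checked with a short sketch, assuming both packages are installed. The last line asks which namespace the unqualified name =layer= resolves to, and with this order it should report =latticeExtra=.

```r
library(ggplot2)
library(latticeExtra)  # loaded last, so its layer() masks ggplot2::layer()

# Namespace that provides the function found under the name "layer".
environmentName(environment(layer))
```

If a session has already loaded the packages in the opposite order, the qualified form =latticeExtra::layer= avoids the ambiguity.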
Both =lattice= and =ggplot2= (and every package based on =grid=) generate static graphics. However, interactive web graphics produced with \textsf{R} have experienced a boost in recent years, mainly thanks to the package =htmlwidgets= cite:Vaidyanathan2017. This package provides a framework for creating \textsf{R} bindings to JavaScript libraries, and it is the base for important visualization packages such as =dygraphs=, =highcharter=, =plotly=, =leaflet=, and =mapview=. They will be covered throughout the chapters of the book.
On the other hand, the package =gridSVG= cite:Murrell.Potter2017 converts any grid scene to a Scalable Vector Graphics (\textsf{SVG}) document. The =grid.hyperlink= function allows a hyperlink to be associated with any component of the scene, the =grid.animate= function can be used to animate any component of a scene, and the =grid.garnish= function can be used to add \textsf{SVG} attributes to the components of a scene. By setting event handler attributes on a component, plus possibly using the =grid.script= function to add \textsf{JavaScript} to the scene, it is possible to make the component respond to user input such as mouse clicks.
\nomenclature{SVG}{Scalable Vector Graphics.}
label:sec:introduction-packages
Throughout the book, several \textsf{R} packages are used. All of them are available from \textsf{CRAN}, and you must install them before using the code. Most of them are loaded at the start of the code of each chapter, although some of them are loaded later if they are used only inside optional sections (marked with \floweroneleft). You should install the latest version available at \textsf{CRAN} to ensure correct functioning of the code.
\nomenclature{CRAN}{Comprehensive R Archive Network.}
Although the introductory chapter of each part includes a section with an outline of the most relevant packages, some of them deserve to be highlighted here:
- =zoo= cite:Zeileis.Grothendieck2005 provides infrastructure for time series using arbitrary classes for the time stamps (Section ref:sec:zoo).
- =sp= cite:Pebesma2012 and =sf= cite:Pebesma2018 provide a coherent set of classes and methods for the major spatial data types: points, lines, polygons, and grids (Sections ref:sec:sp and ref:sec:sf).
- =spacetime= cite:Pebesma2012 defines classes and methods for spatiotemporal data, and methods for plotting data as map sequences or multiple time series (Section ref:sec:spacetime).
- =raster= cite:Hijmans2017 is a major extension of gridded spatial data classes. It provides a unified access method to different raster formats, permitting large objects to be analyzed with the definition of basic and high-level processing functions (Sections ref:sec:raster and ref:sec:rasterST).
- =rasterVis= cite:Perpinan.Hijmans2017 provides enhanced visualization of raster data with methods for spatiotemporal rasters (Sections ref:sec:rasterVis and ref:sec:rastervisST).
label:sec:software-book
This book has been written using different computers running Debian GNU Linux and using several gems of open-source software:
- \textsf{org-mode} cite:Schulte.Davison.ea2012, \LaTeX{}, and AUC\TeX{}, for authoring text and code.
- \textsf{R} cite:R2017 with \textsf{Emacs Speaks Statistics} cite:Rossini.Heiberger.ea2004.
- \textsf{GNU Emacs} as development environment.
label:sec:aboutMe
During the past 18 years, my main area of expertise has been photovoltaic solar energy systems, with a special interest in solar radiation. Initially I worked as an engineer for a private company, and I was involved in several commercial and research projects. The project teams partly consisted of people with low technical skills who relied on the input from engineers to complete their work. I learned how a good visualization output eased the communication process.
Now I work as a professor and researcher at the university. Data visualization is one of the most important tools I have available. It helps me embrace and share the steps, methods, and results of my research. With students, it is an inestimable partner in helping them understand complex concepts.
I have been using \textsf{R} to simulate the performance of photovoltaic energy systems and to analyze solar radiation data, both as time series and spatial data. As a result, I have developed packages that include several graphical methods to deal with multivariate time series (namely, =solaR= cite:Perpinan2012b, =meteoForecast= cite:Perpinan.Almeida2015, and =PVF= cite:Pinho-Almeida2015) and space-time data (=rasterVis= cite:Perpinan.Hijmans2017).
label:sec:acknow
Writing a book is often described as a solitary activity. It is certainly difficult to write when you are with friends or spending time with your family,… although with three little children at home I have learned to write prose and code while my baby wants to learn typing and my daughters need help to share a family of dinosaurs.
Seriously speaking, solitude is the best partner of a writer. But when I am writing or coding I feel I am immersed in a huge collaborative network of past and present contributors. Piotr Kropotkin described it with the following words cite:Kropotkin1906:
Thousands of writers, of poets, of scholars, have laboured to increase knowledge, to dissipate error, and to create that atmosphere of scientific thought, without which the marvels of our century could never have appeared. And these thousands of philosophers, of poets, of scholars, of inventors, have themselves been supported by the labour of past centuries. They have been upheld and nourished through life, both physically and mentally, by legions of workers and craftsmen of all sorts.
And Lewis Mumford claimed cite:Mumford1934:
Socialize Creation! What we need is the realization that the creative life, in all its manifestations, is necessarily a social product.
I want to express my deepest gratitude and respect to all those women and men who have contributed and contribute to strengthening the communities of free software, open data, and open science. My special thanks go to the people of the \textsf{R} community: users, members of the \textsf{R} Core Development Team, and package developers.
With regard to this book in particular, I would like to thank John Kimmel for his constant support, guidance, and patience.
Last, and most importantly, thanks to Candela, Marina, and Javi, my crazy little shorties, my permanent source of happiness, imagination, and love. Thanks to María, mi amor, mi cómplice y todo.
label:part:Time
label:cha:timeIntro
A time series is a sequence of observations registered at consecutive time instants. When these time instants are evenly spaced, the distance between them is called the sampling interval. The visualization of time series is intended to reveal changes of one or more quantitative variables through time, and to display the relationships between the variables and their evolution through time.
The standard time series graph displays the time along the horizontal axis. Several variants of this approach can be found in Chapter ref:cha:timeHorizontalAxis. On the other hand, time can be conceived as a grouping or conditioning variable (Chapter ref:cha:timeGroupFactor). This solution allows several variables to be displayed together with a scatterplot, using different panels for subsets of the data (time as a conditioning variable) or using different attributes for groups of the data (time as a grouping variable). Moreover, time can be used as a complementary variable that adds information to a graph where several variables are confronted (Chapter ref:cha:timeComplementary).
These chapters provide a variety of examples to illustrate a set of useful techniques. These examples make use of several datasets (available at the book website) described in Chapter ref:cha:dataTime.
label:sec:time-series-packages
The CRAN Task View “Time Series Analysis” [fn:14] summarizes the packages for reading, visualizing, and analyzing time series. This section provides a brief introduction to the =zoo= and =xts= packages. Most of the information has been extracted from their vignettes, webpages, and help pages. You should read them for detailed information.
Both packages extensively use the time classes defined in \textsf{R}. The interested reader will find an overview of the different time classes in \textsf{R} in cite:Ripley.Hornik2001 and cite:Grothendieck.Petzoldt2004.
label:sec:zoo
The =zoo= package cite:Zeileis.Grothendieck2005 provides an =S3= class with methods for indexed totally ordered observations. Its key design goals are independence of a particular index class and consistency with base \textsf{R} and the =ts= class for regular time series.
Objects of class =zoo= are created by the function =zoo= from a numeric vector, matrix, or a factor that is totally ordered by some index vector. This index is usually a measure of time but every other numeric, character, or even more abstract vector that provides a total ordering of the observations is also suitable. It must be noted that this package defines two new index classes, =yearmon= and =yearqtr=, for representing monthly and quarterly data, respectively.
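As a minimal sketch (the values and dates are made up for illustration), a =zoo= object can be created from a vector and an index supplied in arbitrary order, or with a =yearmon= index for monthly data:

```r
library(zoo)

# Illustrative values indexed by dates supplied in arbitrary order.
days <- as.Date("2020-01-01") + c(2, 0, 1)
z <- zoo(c(30, 10, 20), order.by = days)
z  # the observations are stored sorted by the index

# A monthly series indexed with the yearmon class.
zm <- zoo(1:3, order.by = as.yearmon("2020-01") + 0:2 / 12)
index(zm)
```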
The package defines several methods associated with standard generic functions such as =print=, =summary=, =str=, =head=, =tail=, and =[= (subsetting). In addition, standard mathematical operations can be performed with =zoo= objects, although only for the intersection of the indexes of the objects.
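The restriction to the intersection of the indexes can be seen with two short series that overlap only partially. The objects =a= and =b= are illustrative.

```r
library(zoo)

# Two series whose indexes share only 2020-01-03 and 2020-01-04.
a <- zoo(1:4, order.by = as.Date("2020-01-01") + 0:3)
b <- zoo(10 * (1:4), order.by = as.Date("2020-01-01") + 2:5)

# The sum is computed only for the dates present in both series.
s <- a + b
s
```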
On the other hand, the data stored in =zoo= objects can be extracted with =coredata=, which drops the index information, and can be replaced by =coredata<-=. The index can be extracted with =index= or =time=, and can be modified by =index<-=. Finally, the =window= and =window<-= methods extract or replace time windows of =zoo= objects.
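A brief sketch of these accessors, using an illustrative series =z=:

```r
library(zoo)

z <- zoo(c(5, 7, 9, 11), order.by = as.Date("2020-01-01") + 0:3)

coredata(z)  # the data without the index
index(z)     # the index (time(z) gives the same result)

# Extract a time window delimited by two dates (both included).
w <- window(z, start = as.Date("2020-01-02"), end = as.Date("2020-01-03"))

# Replace the data and shift the index one week forward.
coredata(z) <- coredata(z) * 2
index(z) <- index(z) + 7
```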
Two =zoo= objects can be merged by common indexes with =merge= and =cbind=. The =merge= method combines the columns of several objects along the union or the intersection of the indexes. The =rbind= method combines the indexes (rows) of the objects.
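The difference between these methods can be sketched with illustrative series. Note that =rbind= is used here with series whose indexes do not overlap.

```r
library(zoo)

a <- zoo(1:3, order.by = as.Date("2020-01-01") + 0:2)
b <- zoo(4:6, order.by = as.Date("2020-01-01") + 1:3)

u <- merge(a, b)               # union of the indexes, NA where missing
i <- merge(a, b, all = FALSE)  # intersection of the indexes

# rbind appends observations with new index values.
r <- rbind(a, zoo(7:8, order.by = as.Date("2020-01-10") + 0:1))
```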
The =aggregate= method splits a =zoo= object into subsets along a coarser index grid, computes a function (=sum= is the default) for each subset, and returns the aggregated =zoo= object.
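For instance, a daily series can be aggregated to monthly values using =as.yearmon= as the coarser index grid. The data below are illustrative.

```r
library(zoo)

dates <- as.Date(c("2020-01-10", "2020-01-20", "2020-02-05",
                   "2020-02-15", "2020-03-01", "2020-03-31"))
z <- zoo(1:6, order.by = dates)

# Monthly totals (sum is the default function).
tot <- aggregate(z, as.yearmon)

# Monthly means, passing the function explicitly.
avg <- aggregate(z, as.yearmon, mean)
```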
This package provides four methods for dealing with missing observations:
- =na.omit= removes incomplete observations.
- =na.contiguous= extracts the longest consecutive stretch of non-missing values.
- =na.approx= replaces missing values by linear interpolation.
- =na.locf= replaces missing observations by the most recent non-NA prior to it.
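The four methods can be compared on a short daily series with gaps. The values are illustrative.

```r
library(zoo)

z <- zoo(c(1, 2, NA, 4, NA, NA, 7),
         order.by = as.Date("2020-01-01") + 0:6)

zo <- na.omit(z)        # drops the three missing observations
zc <- na.contiguous(z)  # keeps the longest stretch without NA
za <- na.approx(z)      # linear interpolation between neighbours
zl <- na.locf(z)        # last observation carried forward
```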
The package defines interfaces to =read.table= and =write.table= for reading, =read.zoo=, and writing, =write.zoo=, =zoo= series from or to text files. The =read.zoo= function expects either a text file or connection as input or a =data.frame=. =write.zoo= first coerces its argument to a =data.frame=, adds a column with the index, and then calls =write.table=.
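A round trip through a temporary text file can be sketched as follows. The file and the data are illustrative, and the index conversion is stated explicitly with =FUN = as.Date= instead of relying on the defaults of =read.zoo=.

```r
library(zoo)

z <- zoo(cbind(a = 1:3, b = c(2.5, 3.5, 4.5)),
         order.by = as.Date("2020-01-01") + 0:2)

# Write the series (index plus data columns) to a temporary file.
f <- tempfile(fileext = ".txt")
write.zoo(z, file = f)

# Read it back, converting the first column to a Date index.
z2 <- read.zoo(f, header = TRUE, FUN = as.Date)
```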
label:sec:xts
The =xts= package cite:Ryan.Ulrich2013 extends the =zoo= class definition to provide a general time-series object. The index of an =xts= object must be of a time or date class: =Date=, =POSIXct=, =chron=, =yearmon=, =yearqtr=, or =timeDate=. With this restriction, the subset operator =[= is able to extract data using the ISO 8601 [fn:15] time format notation =CCYY-MM-DD HH:MM:SS=. It is also possible to extract a range of times with a =from/to= notation, where both =from= and =to= are optional. If either side is missing, it is interpreted as a request to retrieve data from the beginning, or through the end of the data object.
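The subsetting notation can be sketched with a short daily series running from 2020-01-27 to 2020-02-05. The data are illustrative.

```r
library(xts)

x <- xts(1:10, order.by = as.Date("2020-01-27") + 0:9)

feb   <- x["2020-02"]                # every observation in February 2020
rng   <- x["2020-01-30/2020-02-02"]  # from/to range
first <- x["/2020-01-28"]            # from the beginning up to a date
last  <- x["2020-02-04/"]            # from a date through the end
```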
Furthermore, this package provides several time-based tools:
- =endpoints= identifies the endpoints with respect to time.
- =to.period= changes the periodicity to a coarser time index.
- The functions =period.*= and =apply.*= evaluate a function over a set of non-overlapping time periods.
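These tools can be tried with a daily series covering January and February 2020. This is only a sketch with illustrative data.

```r
library(xts)

# 60 daily observations: 2020-01-01 to 2020-02-29.
x <- xts(1:60, order.by = as.Date("2020-01-01") + 0:59)

# Positions of the last observation of each month (preceded by 0).
ep <- endpoints(x, on = "months")

# Monthly sums, and the same periods evaluated with period.apply.
m <- apply.monthly(x, sum)
avg <- period.apply(x, INDEX = ep, FUN = mean)

# Coarser periodicity: one row per month with open, high, low, close.
mo <- to.period(x, period = "months")
```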
label:cha:further-reading-time
- cite:Wills2011 provides a systematic analysis of the visualization of time series, and a section of cite:Heer.Bostock.ea2010 summarizes the main techniques to display time series.
- cite:Cleveland1994 includes a section about time series visualization with a detailed discussion of the banking to $\SI{45}{\degree}$ technique and the cut-and-stack method. cite:Heer.Agrawala2006 propose the multi-scale banking, a technique to identify trends at various frequency scales.
- cite:Few2008,Heer.Kong.ea2009 explain in detail the foundations of the horizon graph (Section ref:cha:timeHorizontalAxis).
- The small multiples concept (Sections ref:SEC:sameScale and ref:SEC:groupVariable) is illustrated in cite:Tufte2001,Tufte1990.
- Stacked graphs are analyzed in cite:Byron.Wattenberg2008, and the ThemeRiver technique is explained in cite:Havre.Hetzler.ea2002.
- cite:Cleveland1994,Friendly.Denis2005 study the scatterplot matrices (Section ref:SEC:groupVariable), and cite:Carr.Littlefield.ea1987 provide information about hexagonal binning.
- cite:Harrower.Fabrikant2008 discuss the use of animation for the visualization of data. cite:Few2007 presents a software tool resembling the Trendalyzer.
- The =D3= gallery [fn:16] shows several great examples of time-series visualizations using the JavaScript library =D3.js=.
label:cha:timeHorizontalAxis
label:cha:timeGroupFactor
label:cha:timeComplementary
label:cha:dataTime
label:part:Spatial
label:cha:spatialIntro
Spatial data (also known as geospatial data) are directly or indirectly referenced to a location on the surface of the Earth. Their spatial reference is composed of coordinate values and a system of reference for these coordinates. Spatial data are often accessed, manipulated, or analyzed through Geographic Information Systems (GIS).
\nomenclature{GIS}{Geographic Information Systems.}
Real objects represented by GIS data can be divided into two abstractions: discrete objects (e.g., a road or a river) represented with vector data (points, lines, and polygons), and continuous fields (such as elevation or solar radiation) represented with raster data. The =sp= and =sf= packages are the preferred option to use vector data in \textsf{R}, and the =raster= package is the choice for raster data [fn:18].
This part presents several examples where vector and raster data are displayed. These examples make use of several datasets (available at the book website) described in Chapter ref:cha:dataSpatial.
On the one hand, Chapters ref:cha:bubble, ref:cha:choropleth, and ref:cha:raster focus on thematic maps, which display a specific variable, commonly using geographic data such as coastlines, boundaries, and places as points of reference for the variable being mapped. These maps provide specific information about particular locations or areas (proportional symbol mapping and choropleth maps) and information about spatial patterns (isarithmic and raster maps).
On the other hand, Chapter ref:cha:refer-phys-maps focuses on reference maps, which show the geographic location of features, and on physical maps, which show the landscape and features of a place.
label:sec:spatial-packages
The CRAN Task View “Analysis of Spatial Data” [fn:19] summarizes the packages for reading, visualizing, and analyzing spatial data. This section provides a brief introduction to =sp=, =sf=, =raster=, =rasterVis=, =maptools=, =rgdal=, =gstat=, and =maps=. Most of the information has been extracted from their vignettes, webpages, and help pages. You should read them for detailed information.
label:sec:sp
The =sp= package cite:Pebesma.Bivand2005 provides classes and methods for dealing with spatial data in \textsf{R}. The spatial data classes implemented are points (=SpatialPoints=), grids (=SpatialPixels= and =SpatialGrid=), lines (=Line=, =Lines=, and =SpatialLines=), rings, and polygons (=Polygon=, =Polygons=, and =SpatialPolygons=), each of them without data or with data (for example, =SpatialPointsDataFrame= or =SpatialLinesDataFrame=)[fn:37].
\nomenclature{SpatialPointsDataFrame}{Class for spatial attributes that have spatial point locations.} \nomenclature{SpatialLinesDataFrame}{Class for spatial attributes consisting of sets of lines, where each set of lines relates to an attribute row in a data.frame.} \nomenclature{SpatialPixelsDataFrame}{Class for spatial attributes that have spatial locations on a regular grid.} \nomenclature{SpatialPolygonsDataFrame}{Class to hold polygons with attributes.}
Selecting, retrieving, or replacing certain attributes in spatial objects with data is done using standard methods:
- =[= selects rows (items) and columns in the =data.frame=.
- =[[= selects a column from the =data.frame=.
- =[[<-= assigns or replaces values to a column in the =data.frame=.
A number of spatial methods are available for the classes in =sp=:
- =coordinates(object) <- value= sets spatial coordinates to create spatial data. It promotes a =data.frame= into a =SpatialPointsDataFrame=. =value= may be specified by a formula, a character vector, or a numeric matrix or =data.frame= with the actual coordinates.
- =coordinates(object, ...)= returns a matrix with the spatial coordinates. If used with =SpatialPolygons= it returns a matrix with the centroids of the polygons.
- =bbox= returns a matrix with the coordinates bounding box.
- =proj4string(object)= and =proj4string(object) <- value= retrieve or set projection attributes on spatial classes.
- =spTransform= transforms from one coordinate reference system (geographic projection) to another (requires package =rgdal=).
- =spplot= plots attributes combined with spatial data: points, lines, grids, polygons.
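Some of these methods can be sketched with a small =data.frame= of points. The coordinates and values are made up for illustration, and the projection methods are left out of the sketch because they may need additional packages.

```r
library(sp)

# Illustrative points with one attribute.
pts <- data.frame(lon = c(-3.7, 2.2, -0.4),
                  lat = c(40.4, 41.4, 39.5),
                  value = c(10, 20, 30))

# Promote the data.frame to a SpatialPointsDataFrame with a formula.
coordinates(pts) <- ~ lon + lat
class(pts)

coordinates(pts)  # matrix with the spatial coordinates
bb <- bbox(pts)   # bounding box of the coordinates

# Standard methods for the attributes.
pts[["double"]] <- pts[["value"]] * 2
sel <- pts[pts$value > 15, ]
```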
label:sec:sf
The =sf= package cite:Pebesma2018, the long term successor of =sp=, implements simple features in \textsf{R}. Simple features is an open (OGC and ISO) interface standard for access and manipulation of spatial vector data (points, lines, polygons).
This package represents simple features using simple data structures, commonly =data.frame= objects. Feature geometries are stored in a =data.frame= column, using a list-column because geometries are not single-valued. The length of this list is equal to the number of records in the =data.frame=, with the simple feature geometry of that feature in each element of the list.
=sf= implements three classes to represent simple features:
- =sf=, a =data.frame= with feature attributes and feature geometries. It contains
- =sfc=, the list-column with the geometries for each feature (record), which is composed of
- =sfg=, the feature geometry of an individual simple feature.
All functions and methods in =sf= that operate on spatial data are prefixed by =st_= (spatial and temporal). For the purposes of this book, the most important are:
- =st_read=, =st_write=, for reading and writing spatial data, respectively.
- =st_transform= for coordinate reference system transformations.
- =st_as_sf.*=, a family of conversion functions between =sp= and =sf=.
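The three classes and a coordinate transformation can be sketched with a few illustrative points. The EPSG codes used here (4326 for longitude and latitude, 32630 for UTM zone 30N) are choices made for the example.

```r
library(sf)

# Illustrative points with one attribute.
df <- data.frame(name = c("a", "b", "c"),
                 lon = c(-3.7, 2.2, -0.4),
                 lat = c(40.4, 41.4, 39.5))

# Convert the data.frame to an sf object with geographic coordinates.
pts <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)

class(pts)                    # sf: data.frame with attributes and geometries
class(st_geometry(pts))       # sfc: list-column with the geometries
class(st_geometry(pts)[[1]])  # sfg: geometry of a single feature

# Transform to a projected coordinate reference system.
utm <- st_transform(pts, 32630)
```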
The =sf= package implements =plot= methods for displaying data using =base= graphics. Besides, this package provides a number of methods for conversion to =grob= objects in order to display these objects with packages working with the =grid= system (=lattice= and =ggplot2=). Finally, the =ggplot2= version[fn:35] to be released after 2.2.1 (on CRAN at the time of writing this book) contains the =geom_sf= geom, designed for =sf= objects.
label:sec:raster
The raster
package cite:Hijmans2017 has functions for creating, reading, manipulating, and writing raster data. The package provides general raster data manipulation functions. The package also implements raster algebra and most functions for raster data manipulation that are common in Geographic Information Systems (GIS).
The raster package can work with raster datasets stored on disk if they are too large to be loaded into memory. The package can work with large files because the objects it creates from these files only contain information about the structure of the data, such as the number of rows and columns, the spatial extent, and the filename, but it does not attempt to read all the cell values in memory. In computations with these objects, the data are processed in chunks.
The package defines a number of =S4= classes. =RasterLayer=, =RasterBrick=, and =RasterStack= are the most important:
- A =RasterLayer= object represents single-layer (variable) raster data. It can be created with the function =raster=. This function is able to create a =RasterLayer= from another object, including another =Raster*= object[fn:36], a =SpatialPixels*= or =SpatialGrid*= object, or even a matrix. In addition, it can create a =RasterLayer= by reading data from a file. The =raster= package can use raster files in several formats, some of them via the =rgdal= package. Supported formats for reading include GeoTIFF, ESRI, ENVI, and ERDAS.
\nomenclature{RasterLayer}{A class to represent single-layer (variable) raster data.}
- =RasterBrick= and =RasterStack= are classes for multilayer data. A =RasterStack= is a list of =RasterLayer= objects with the same spatial extent and resolution. A =RasterStack= can be formed with a collection of files in different locations or even mixed with =RasterLayer= objects that only exist in memory. A =RasterBrick= is truly a multilayered object, and processing it can be more efficient than processing a =RasterStack= representing the same data.
\nomenclature{RasterBrick}{A class to represent multilayer (variable) raster data.} \nomenclature{RasterStack}{A class to represent multilayer (variable) raster data.}
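The three classes can be illustrated with objects that exist only in memory. The following is a minimal sketch with made-up values; the matrix contents and dimensions are assumptions for the example.

```r
library(raster)

## A RasterLayer built from a matrix (by default its extent is 0-1 in x and y)
r <- raster(matrix(1:12, nrow = 3, ncol = 4))

## A RasterStack: a collection of RasterLayer objects
## sharing the same extent and resolution
s <- stack(r, r * 2, r * 3)

## A RasterBrick: the same layers stored as a single multilayer object
b <- brick(s)

class(r)
class(s)
class(b)
nlayers(b)
```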
The raster
package defines a number of methods for raster algebra with Raster*
objects: arithmetic operators, logical operators, and functions such as abs
, round
, ceiling
, floor
, trunc
, sqrt
, log
, log10
, exp
, cos
, sin
, max
, min
, range
, prod
, sum
, any
, and all
. In these functions, Raster*
objects can be mixed with numbers.
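A short sketch of raster algebra follows. The two small layers are hypothetical and are used only to show how operators and functions work cell by cell.

```r
library(raster)

## Two hypothetical layers with the same extent and resolution
r1 <- raster(matrix(1:6, nrow = 2))
r2 <- raster(matrix(6:1, nrow = 2))

## Arithmetic between layers: every cell of the result equals 7
total <- r1 + r2

## Layers mixed with numbers and mathematical functions
scaled <- sqrt(r1 * 10)

## Logical operators return a layer of TRUE/FALSE values
high <- r1 > 3

## Summary functions on a multilayer object work across layers
## and return a RasterLayer
top <- max(stack(r1, r2))
```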
There are several functions to modify the content or the spatial extent of Raster*
objects, or to combine Raster*
objects:
- The =crop= function takes a geographic subset of a larger =Raster*= object. =trim= crops a =RasterLayer= by removing the outer rows and columns that only contain =NA= values. =extend= adds new rows and/or columns with =NA= values.
- The =merge= function merges two or more =Raster*= objects into a single new object.
- =projectRaster= transforms values of a =Raster*= object to a new object with a different coordinate reference system.
- With =overlay=, multiple =Raster*= objects can be combined (for example, by multiplying them).
- =mask= removes all values from one layer that are =NA= in another layer, and =cover= combines two layers by taking the values of the first layer except where these are =NA=, where the values of the second layer are used instead.
- =calc= computes a function for a =Raster*= object. With =RasterLayer= objects, another =RasterLayer= is returned. With multilayer objects the result depends on the function: with a summary function (=sum=, =max=, etc.), =calc= returns a =RasterLayer= object, and a =RasterBrick= object otherwise.
- =stackApply= computes summary layers for subsets of a =RasterStack= or =RasterBrick=.
- =cut= and =reclassify= replace ranges of values with single values.
- =zonal= computes zonal statistics, that is, summarizes a =Raster*= object using zones (areas with the same integer number) defined by another =RasterLayer=.
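Some of these functions are sketched below with a small layer defined in memory. The extent, the cell values, and the class limits are assumptions made only for this example.

```r
library(raster)

## A hypothetical 4 x 4 layer covering the square from (0, 0) to (4, 4)
r <- raster(matrix(1:16, nrow = 4), xmn = 0, xmx = 4, ymn = 0, ymx = 4)

## crop: geographic subset (the upper-left 2 x 2 block)
sub <- crop(r, extent(0, 2, 2, 4))

## mask: cells that are NA in the second layer become NA in the result
m <- r
m[r > 8] <- NA
masked <- mask(r, m)

## cover: NA cells of the first layer are filled with values of the second
filled <- cover(masked, r)

## calc with a summary function on a multilayer object returns a RasterLayer
total <- calc(stack(r, r * 2), sum)

## reclassify: values in (0, 8] become 1, values in (8, 16] become 2
classes <- reclassify(r, matrix(c(0, 8, 1,
                                  8, 16, 2),
                                ncol = 3, byrow = TRUE))

## zonal: mean of r within each zone defined by classes
zmean <- zonal(r, classes, fun = "mean")
```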
label:sec:rasterVis
The rasterVis
package cite:Perpinan.Hijmans2017 complements the raster
package, providing a set of methods for enhanced visualization and interaction. This package defines visualization methods (levelplot
) for quantitative data and categorical data, both for univariate and multivariate rasters.
It also includes several methods in the frame of the Exploratory Data Analysis approach: scatterplots with xyplot
, histograms and density plots with histogram
and densityplot
, violin and boxplots with bwplot
, and a matrix of scatterplots with splom
.
On the other hand, this package is able to display vector fields using arrows, vectorplot
, or with streamlines cite:Wegenkittl.Groeller1997, streamplot
. In this last method, for each point, droplet, of a jittered regular grid, a short streamline portion, streamlet, is calculated by integrating the underlying vector field at that point. The main color of each streamlet indicates local vector magnitude (slope). Streamlets are composed of points whose sizes, positions, and color degradation encode the local vector direction (aspect).
label:sec:rgdal
The rgdal
package cite:Bivand.Keitt.ea2017 provides bindings to the Geospatial Data Abstraction Library (GDAL) [fn:21]. With readOGR
and readGDAL
, both GDAL raster and OGR vector map data can be imported into R
, and GDAL raster data and OGR vector data can be exported with writeGDAL
and writeOGR
.
In addition, this package provides access to projection and transformation operations from the PROJ.4 library [fn:22]. This package implements several spTransform
methods providing transformation between datums and conversion between projections using PROJ.4 projection arguments.
label:sec:maptools
The maptools
package cite:Bivand.Lewin-Koh2017 provides a set of tools for manipulating geographic data. The package also provides interface wrappers for exchanging spatial objects with packages such as PBSmapping, spatstat, maps, RArcInfo, Stata tmap, WinBUGS, Mondrian, and others. The main functions in the context of this book are:
- =map2SpatialPolygons= and =map2SpatialLines= may be used to convert map objects returned by the =map= function in the =maps= package to the classes defined in the =sp= package.
- =spCbind= provides cbind-like methods for =Spatial*DataFrame= and =data.frame= objects.
The topology operations on geometries performed by this package (for example, unionSpatialPolygons
) use the package rgeos
, an interface to the Geometry Engine Open Source (GEOS) [fn:20].
label:sec:gstat
The gstat
package cite:Pebesma2004 provides functions for geostatistical modeling, prediction, and simulation, including variogram modeling and simple, ordinary, universal, and external drift kriging.
Most of the functionality of this package is beyond the scope of this book. However, some functions must be mentioned:
- =variogram= calculates the sample variogram from data, or for the residuals if a linear model is given.
- =vgm= generates a variogram model, and =fit.variogram= fits ranges and/or sills from a variogram model to a sample variogram.
- =krige= is the function for simple, ordinary or universal kriging.
- =gstat= is the function for univariate or multivariate geostatistical prediction.
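The way these functions fit together can be sketched with the =meuse= dataset distributed with the =sp= package. The dataset, the log-transformed zinc variable, and the initial parameters of the spherical model are assumptions borrowed from common usage of the package, not choices made in this book.

```r
library(sp)
library(gstat)

## Example data shipped with sp: point samples and a prediction grid
data(meuse)
coordinates(meuse) <- ~ x + y
data(meuse.grid)
gridded(meuse.grid) <- ~ x + y

## Sample variogram of log(zinc) with a constant mean
vSample <- variogram(log(zinc) ~ 1, meuse)

## Spherical model with initial values (partial sill 1, range 900, nugget 1),
## fitted to the sample variogram
vModel <- fit.variogram(vSample, vgm(1, "Sph", 900, 1))

## Ordinary kriging on the grid
pred <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vModel)
names(pred)
```

The result contains the prediction and its variance for each cell of the grid.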
label:sec:maps
The maps
cite:Becker.Wilks.ea2017, mapdata
cite:Becker.Wilks.ea2017b, and mapproj
cite:McIlroy.Brownrigg.ea2017 packages are useful to draw or create geographical maps. mapdata
contains higher resolution databases, and mapproj
converts latitude/longitude coordinates into projected coordinates.
label:cha:further-reading-spatial
- cite:Slocum.McMaster.ea2005 and cite:Dent.Torguson.ea2008 are comprehensive books on thematic cartography and geovisualization. They include chapters devoted to data classification, scales, map projections, color theory, typography, and proportional symbol, choropleth, dasymetric, isarithmic, and multivariate mapping. Several resources are available at their accompanying websites [fn:23].
- cite:Bivand.Pebesma.ea2013 is the essential reference to work with spatial data in =R=. R. Bivand and E. Pebesma are the authors of the fundamental =sp= package, and they are the authors or maintainers of several important packages such as =gstat= (for geostatistical modeling, prediction, and simulation), =rgdal=, =rgeos=, and =maptools=. Chapter 3 is devoted to the visualization of spatial data. Code, figures, and data of the book are available at the accompanying website [fn:24].
- cite:Hengl2009 is an open-access book with seven spatial data analysis exercises. The author is the creator and maintainer of the Spatial-Analyst webpage [fn:25].
- The CRAN Task View “Analysis of Spatial Data” [fn:26] summarizes the packages for reading, visualizing, and analyzing spatial data. The packages in development published at R-Forge are listed in the “Spatial Data & Statistics” topic view [fn:27]. The R-SIG-Geo mailing list [fn:28] is a powerful resource for obtaining help.
- The “Spatial.ly” [fn:29] and “Kartograph” [fn:30] webpages publish a variety of beautiful visualization examples.
label:cha:bubble
label:cha:choropleth
label:cha:raster
label:cha:vector
label:cha:refer-phys-maps
label:cha:dataSpatial
label:part:SpaceTime
label:cha:introductionST
Space-time datasets are indexed in both space and time. The data may consist of a spatial vector object (for example, points or polygons) or raster data at different times. The first case is representative of data from fixed sensors providing measurements abundant in time but sparse in space. The second case is the typical format of satellite imagery, which produces high spatial resolution data sparse in time cite:Pebesma2012.
Several approaches to the visualization of space-time data try to cope with the four dimensions of the data cite:Cressie.Wikle2015.
On the one hand, the data can be conceived as a collection of snapshots at different times. These snapshots can be displayed as a sequence of frames to produce an animation, or can be printed on one page with different panels for each snapshot using the small-multiple technique described repeatedly in previous chapters.
On the other hand, one of the two spatial dimensions can be collapsed through an appropriate statistic (for example, mean or standard deviation) to produce a space-time plot (also known as a Hovmöller diagram). The axes of this graphic are typically longitude or latitude as the x-axis, and time as the y-axis, with the spatially averaged value of the raster data represented with color.
Finally, the space-time object can be reduced to a multivariate time series (where each location is a variable or column of the time series) and displayed with the time series visualization techniques described in Part ref:part:Time. This approach is directly applicable to space-time data sparse in space (for example, point measurements at different times). However, raster data must be aggregated first. In this case, the multivariate time series is composed of the evolution of the raster data averaged along a certain direction.
The next chapters, focused on raster space-time data (Chapters ref:cha:rasterST and ref:cha:animationST) and point space-time data (Chapter ref:cha:pointsST), illustrate with examples how to produce animations, multipanel graphics, Hovmöller diagrams, and time series with R
.
label:sec:spacetime-packages
The CRAN Task View “Handling and Analyzing Spatiotemporal Data” [fn:31] summarizes the packages for reading, visualizing, and analyzing space-time data. This section provides a brief introduction to the spacetime
, raster
, and rasterVis
packages. Most of the information has been extracted from their vignettes, webpages, and help pages. You should read them for detailed information.
label:sec:spacetime
The spacetime
package cite:Pebesma2012 is built upon the classes and methods for spatial data from the sp
package, and for time series data from the xts
package. It defines classes to represent four space-time layouts:
- =STF=, =STFDF=: full space-time grid of observations for spatial features and observation time, with all space-time combinations.
- =STS=, =STSDF=: sparse grid layout, which stores only the non-missing space-time combinations on a lattice.
- =STI=, =STIDF=: irregular layout, where time and space points of measured values have no apparent organisation.
- =STT=, =STTDF=: simple trajectories.
Moreover, spacetime
provides several functions and methods to work with these classes:
- =stConstruct=, =STFDF=, and =STIDF= create objects from single or multiple tables.
- =as= coerces to other spatiotemporal objects, =xts=, =Spatial=, =matrix=, or =data.frame=.
- =[[= selects or replaces data values.
- =[= selects spatial or temporal subsets, and data variables.
- =over= retrieves index or data values of one object at the locations and times of another.
- =aggregate= aggregates data values over particular spatial, temporal, or spatiotemporal domains.
- =stplot= creates spatiotemporal plots. It is able to produce multi-panel plots, space-time plots, animations, and time series plots.
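A full space-time grid can be built by hand as follows. This is only a sketch: the three station locations, the four hourly time steps, and the values are hypothetical.

```r
library(sp)
library(spacetime)

## Three hypothetical stations
xy <- cbind(x = c(0, 0, 1), y = c(0, 1, 1))
rownames(xy) <- paste0("station", 1:3)
stations <- SpatialPoints(xy)

## Four hourly time steps
tt <- as.POSIXct("2010-08-05", tz = "UTC") + 3600 * (10:13)

## One value per space-time combination, with the spatial index cycling fastest
vals <- data.frame(values = seq_len(length(stations) * length(tt)))

## Full space-time grid with data
stfdf <- STFDF(stations, tt, vals)
dim(stfdf)

## Spatial subset: the first two stations at all times
sub <- stfdf[1:2, ]

## Coercion to a multivariate time series (one column per station)
wide <- as(stfdf, "xts")
```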
label:sec:rasterST
The raster
package cite:Hijmans2017 is able to add time information associated with layers of a RasterStack
or RasterBrick
object with the setZ
function. This information can be extracted with getZ
.
If a Raster*
object includes this information, the zApply
function can be used to apply a function over a time series of layers of the object.
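These three functions can be sketched with a small stack of four daily layers spanning two months. The dates, the values, and the grouping by month are assumptions for the example.

```r
library(raster)

## Four hypothetical daily layers
r <- raster(matrix(1:4, nrow = 2))
s <- stack(r, r * 2, r * 3, r * 4)

## Attach the time index to the layers and read it back
days <- as.Date("2020-01-30") + 0:3
s <- setZ(s, days)
getZ(s)

## Monthly means: the function passed in 'by' is applied to the time index
## to define the groups, and 'fun' summarizes the layers of each group
monthly <- zApply(s, by = function(d) format(d, "%Y-%m"), fun = mean)
nlayers(monthly)
```

Here the first two layers fall in January and the last two in February, so the result has one layer per month.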
label:sec:rastervisST
rasterVis
cite:Perpinan.Hijmans2017 provides three methods to display spatiotemporal rasters:
- =hovmoller= produces Hovmöller diagrams cite:Hovmoeller1949a. The axes of this kind of diagram are typically longitude or latitude (x-axis) and time (ordinate or y-axis) with the value of some aggregated field represented through color. However, the user can define the direction with =dirXY= and the summary function with =FUN=.
- =horizonplot= creates horizon graphs cite:Few2008, with many time series displayed in parallel by cutting the vertical range into segments and overplotting them with color representing the magnitude and direction of deviation. Each time series corresponds to a geographical zone defined with =dirXY= and averaged with =zonal=.
- =xyplot= displays conventional time series plots. Each time series corresponds to a geographical zone defined with =dirXY= and aggregated with =zonal=.
On the other hand, the histogram
, densityplot
, and bwplot
methods accept a FUN
argument to be applied to the z
slot of a Raster*
object (defined by setZ
). The result of this function is used as the grouping variable of the plot to create different panels.
rgl
is a package that produces real-time interactive 3D plots. It
allows the user to interactively rotate and zoom the graphics and to select
regions. This package uses the OpenGL[fn:34] library as the rendering
backend, providing an interface to graphics hardware. It contains
high-level graphics functions similar to base R graphics, but working
in three dimensions. Moreover, it provides low-level functions
inspired by the grid
package.
label:cha:further-reading-spatiotime
- cite:Cressie.Wikle2015 is a systematic approach to key quantitative techniques on statistics for spatiotemporal data. The book begins with separate treatments of temporal data and spatial data, and later combines these concepts to discuss spatiotemporal statistical methods. There is a chapter devoted to exploratory methods, including visualization techniques.
- cite:Pebesma2012 presents the =spacetime= package, which implements a set of classes for spatiotemporal data. This paper includes examples that illustrate how to import, subset, coerce, and export spatiotemporal data, proposes several visualization methods, and discusses spatiotemporal geostatistical interpolation.
- cite:Slocum.McMaster.ea2005 (previously cited in Chapter ref:cha:further-reading-spatial) includes a chapter about map animation, discussing several approaches for displaying spatiotemporal data.
- cite:Hengl2009 (previously cited in Chapter ref:cha:further-reading-spatial) includes a working example with spatiotemporal data to illustrate space-time variograms and interpolation.
- cite:Harrower.Fabrikant2008 explore the role of animation in geographic visualization and outline the challenges, both conceptual and technical, involved in the creation and use of animated maps.
- The CRAN Task View “Handling and Analyzing Spatiotemporal Data” [fn:32] summarizes the packages for reading, visualizing, and analyzing space-time data. The R-SIG-Geo mailing list [fn:33] is a powerful resource for obtaining help.
label:cha:rasterST
label:cha:pointsST
label:cha:animationST
\backmatter
\printnomenclature
\clearpage
\printbibliography
\clearpage
\printindex
[fn:37] The asterisk is commonly used as a wildcard character to denote subsets of classes. Thus, SpatialLines*
comprises SpatialLines
and SpatialLinesDataFrame
classes. Moreover, Spatial*
represents all the classes defined by the sp
package.
[fn:36] The notation Raster*
represents all the classes of Raster
objects: RasterLayer
, RasterStack
, and RasterBrick
.
[fn:35] The development version can be installed with the remotes
package: remotes::install_github("tidyverse/ggplot2")
.
[fn:34] https://www.opengl.org/
[fn:31] http://cran.r-project.org/web/views/SpatioTemporal.html
[fn:32] http://cran.r-project.org/web/views/SpatioTemporal.html
[fn:33] https://stat.ethz.ch/mailman/listinfo/R-SIG-Geo/
[fn:18] Although sp
, sf
, and raster
are the most important packages, there are an increasing number of packages designed to work with spatial data. They are summarized in the corresponding CRAN Task View. Read Section ref:cha:further-reading-spatial for details.
[fn:19] http://CRAN.R-project.org/view=Spatial
[fn:20] http://trac.osgeo.org/geos/
[fn:21] http://www.gdal.org/
[fn:22] https://trac.osgeo.org/proj/
[fn:23] http://www.pearsonhighered.com/slocum3e/ and http://highered.mcgraw-hill.com/sites/0072943823/
[fn:24] http://www.asdar-book.org/
[fn:25] http://spatial-analyst.net
[fn:26] http://CRAN.R-project.org/view=Spatial
[fn:27] http://r-forge.r-project.org/softwaremap/trove_list.php?form_cat=353
[fn:28] https://stat.ethz.ch/mailman/listinfo/R-SIG-Geo/
[fn:29] http://spatial.ly/r/
[fn:30] http://kartograph.org/
[fn:14] http://CRAN.R-project.org/view=TimeSeries
[fn:15] http://en.wikipedia.org/wiki/ISO_8601
[fn:16] https://github.com/mbostock/d3/wiki/Gallery
[fn:13] https://github.com/oscarperpinan/bookvis/archive/master.zip
[fn:12] http://cran.r-project.org/web/views/Graphics.html
[fn:11] http://cran.r-project.org/web/views/SpatioTemporal.html
[fn:10] http://cran.r-project.org/web/views/Spatial.html
[fn:9] http://cran.r-project.org/web/views/TimeSeries.html
[fn:8] http://www.r-project.org/doc/bib/R-books.html
[fn:7] http://www.r-bloggers.com
[fn:6] http://www.r-project.org/mail.html
[fn:5] http://cran.r-project.org/other-docs.html
[fn:4] http://cran.r-project.org/manuals.html
[fn:3] http://cran.r-project.org/doc/manuals/R-intro.html
[fn:2] Take a look at the time comparison published as the final result of the previous series of blog posts, http://learnr.files.wordpress.com/2009/08/latbook.pdf