Introduction to spatial data in R

Andrea Sánchez-Tapia , Sara Mortara
7/26/2022

Examining an sf object

tmap package is one of the current strongest packages to build thematic maps in R. It has a syntax similar to ggplot and works with sf objects.

There are several sources of administrative boundaries data. We are going to load World, an internal dataset in contained in package tmap. It is an sf object.

The quickest map in tmap can be done like this:

library(sf)
library(tmap)
library(dplyr)
data(World)
# package tmap has a syntax similar to ggplot. The functions start all with tm_ 
tm_shape(World) + 
  tm_borders()

A simple examination with head(), str() or dplyr::glimpse() can tell us the class, dimensions, and other characteristics of object World.

#head(World)
names(World)
class(World)
dplyr::glimpse(World)

What happens when you execute plot(World)? How would you plot only one of the variables? What happens when you execute

plot(World[1])
plot(World[,1])
plot(World[1,])
plot(World["pop_est"])

The geometry “column” and geometries as objects

A key difference between data frames and sf objects is the presence of geometry, that looks like a column when printing the summary of any sf object

head(World[, 1:4])
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -73.41544 ymin: -55.25 xmax: 75.15803 ymax: 42.68825
Geodetic CRS:  WGS 84
  iso_a3                 name           sovereignt     continent
1    AFG          Afghanistan          Afghanistan          Asia
2    AGO               Angola               Angola        Africa
3    ALB              Albania              Albania        Europe
4    ARE United Arab Emirates United Arab Emirates          Asia
5    ARG            Argentina            Argentina South America
6    ARM              Armenia              Armenia          Asia
                        geometry
1 MULTIPOLYGON (((61.21082 35...
2 MULTIPOLYGON (((16.32653 -5...
3 MULTIPOLYGON (((20.59025 41...
4 MULTIPOLYGON (((51.57952 24...
5 MULTIPOLYGON (((-65.5 -55.2...
6 MULTIPOLYGON (((43.58275 41...

When calling World$geometry, however, we perceive it’s not a single column. Actually, geometries are a class in their own, sfc

class(World)
[1] "sf"         "data.frame"
class(World$geometry)
[1] "sfc_MULTIPOLYGON" "sfc"             

We don’t need to understand the deep structure of geometry right now, the sf package has extensive vignettes you can explore. For practical purposes, the existence of geometries and sf objects like data.frames facilitates a lot the manipulation of sf objects, including extracting and assigning geometries to existant data frames and the possibility to drop the geometries, (literally, function sf::st_drop_geometry() and use the data frame)

head(sf::st_coordinates(World))
            X        Y L1 L2 L3
[1,] 61.21082 35.65007  1  1  1
[2,] 62.23065 35.27066  1  1  1
[3,] 62.98466 35.40404  1  1  1
[4,] 63.19354 35.85717  1  1  1
[5,] 63.98290 36.00796  1  1  1
[6,] 64.54648 36.31207  1  1  1
no_geom <- sf::st_drop_geometry(World)
class(no_geom)
[1] "data.frame"
#bounding boxes
st_bbox(World)
      xmin       ymin       xmax       ymax 
-180.00000  -89.90000  179.99999   83.64513 

Manipulating sf objects

For practical purposes, sf objects work like data.frames or tibbles, you can subset, filter, merge, join, and create new variables.

For example, imagine I want to plot only countries in South America or have different colors for them

You can filter depending on the columns, for example, all the countries in South America:

names(World)
 [1] "iso_a3"       "name"         "sovereignt"   "continent"   
 [5] "area"         "pop_est"      "pop_est_dens" "economy"     
 [9] "income_grp"   "gdp_cap_est"  "life_exp"     "well_being"  
[13] "footprint"    "inequality"   "HPI"          "geometry"    
unique(World$continent)
[1] Asia                    Africa                 
[3] Europe                  South America          
[5] Antarctica              Seven seas (open ocean)
[7] Oceania                 North America          
8 Levels: Africa Antarctica Asia Europe North America ... South America
World %>% 
  filter(continent == "South America") %>% 
  tm_shape() + 
  tm_borders()

You can also create new variables and use them in your maps:

World %>%
  mutate(our_countries = if_else(iso_a3 %in% c("COL","BRA", "MEX"), "red", "grey")) %>%
  tm_shape() +
  tm_borders() +
  tm_fill(col = "our_countries") +
  tm_add_legend("fill",
                "Countries",
                col = "red")

A quick note about sp objects

sp objects are going to be replaced with sf objects and packages rgdal, rgeos will be archived by the end of 2023. However, it is extremely possible that the data and tutorials you find use sp objects rather than sf and that you may have to go transform them into sf objects (function sf::st_as_sf().

Data

Loading, ploting, and saving a shapefile from the disk

#install.packages("rnaturalearth")
#install.packages("remotes")
#remotes::install_github("ropensci/rnaturalearthhires")
library(rnaturalearth)
library(rnaturalearthhires)
bra <- ne_states(country = "brazil", returnclass = "sf")
plot(bra)
dir.create("data/shapefiles", recursive = TRUE)
st_write(obj = bra, dsn = "data/shapefiles/bra.shp", delete_layer = TRUE)
Deleting layer `bra' using driver `ESRI Shapefile'
Writing layer `bra' to data source 
  `data/shapefiles/bra.shp' using driver `ESRI Shapefile'
Writing 27 features with 83 fields and geometry type Multi Polygon.

Check the files that are created: .shp, .shx, .dbf, .cpg, prj.

To read again this shapefile, you would execute:

bra2 <- read_sf("data/shapefiles/bra.shp")
class(bra)
[1] "sf"         "data.frame"
class(bra2)
[1] "sf"         "tbl_df"     "tbl"        "data.frame"
plot(bra)
plot(bra2)

Loading, ploting, and saving a raster from the disk

library(raster)
dir.create(path = "data/raster/", recursive = TRUE)
tmax_data <- getData(name = "worldclim", var = "tmax", res = 10, path = "data/raster/")
plot(tmax_data)
is(tmax_data) #the data are a raster stack, several rasters piled 
[1] "RasterStack"      "Raster"           "RasterStackBrick"
[4] "BasicRaster"     
dim(tmax_data)
[1]  900 2160   12
extent(tmax_data)
class      : Extent 
xmin       : -180 
xmax       : 180 
ymin       : -60 
ymax       : 90 
res(tmax_data)
[1] 0.1666667 0.1666667

To move forward

This is really the tip of the iceberg. Choose your own adventure:


Palettes

#| eval: false
library(RColorBrewer)
display.brewer.all(type = "seq")
display.brewer.all(type = "div")