Analysing soil moisture data in NetCDF format with the ncdf4 library

The netCDF format is popular in sciences that analyse sequential spatial data. It is a self-describing, machine-independent data format for creating, accessing and sharing array-oriented information. The netCDF format provides spatial time-series such as meteorological or environmental data. This article shows how to visualise and analyse this data format by reviewing soil moisture data published by the Australian Bureau of Statistics.

Soil Moisture data

The Australian Bureau of Meteorology publishes hydrological data in both a simple map grid and in the NetCDF format. The map grid consists of a flat text file that requires a bit of data jujitsu before it can be used. The NetCDF format is much easier to deploy as it provides a three-dimensional matrix of spatial data over time.

We are looking at the possible relationship between sewer main blockages and deep soil moisture levels. You will need to manually download this dataset from the Bureau of Meteorology website. I have not been able to scrape the website automatically. For this analysis, I use the actual deep soil moisture level, aggregated monthly in NetCDF 4 format.

Soil moisture data from the Australian Bureau of meteorology in netCDF format

Reading, Extracting and Transforming the netCDF format

The ncdf4 library, developed by David W. Pierce, provides the necessary functionality to manage this data. The first step is to load the data, extract the relevant information and transform the data for visualisation and analysis. When the data is read, it essentially forms a complex list that contains the metadata and the measurements.

The ncvar_get function extracts the data from the list. The lon, lat and dates variables are the dimensions of the moisture data. The time data is stored as the number of days since 1 January 1900. The spatial coordinates are stored in decimal degrees with 0.05-decimal degree intervals. The moisture data is a three-dimensional matrix with longitue, latitude and time as dimensions. Storing this data in this way will make it very easy to use.

library(ncdf4)
# Load data
bom <- nc_open("Hydroinformatics/SoilMoisture/sd_pct_Actual_month.nc")
print(bom) # Inspect the data

# Extract data
lon <- ncvar_get(bom, "longitude")
lat <- ncvar_get(bom, "latitude")
dates <- as.Date("1900-01-01") + ncvar_get(bom, "time")
moisture <- ncvar_get(bom, "sd_pct")
dimnames(moisture) <- list(lon, lat, dates)

Visualising the data

The first step is to check the overall data. This first code snippet extracts a matrix from the cube for 31 July 2017 and plots it. This code pipe extracts the date for the end of July 2017 and creates a data frame which is passed to ggplot for visualisation. Although I use the Tidyverse, I still need reshape2 because the gather function does not like matrices.

library(tidyverse)
library(RColorBrewer)
library(reshape2)

d <- "2017-07-31"
m <- moisture[, , which(dates == d)] %>%
       melt(varnames = c("lon", "lat")) %>%
       subset(!is.na(value))

ggplot(m, aes(x = lon, y = lat, fill = value)) + borders("world") + 
    geom_tile() + 
    scale_fill_gradientn(colors = brewer.pal(9, "Blues")) + 
    labs(title = "Total moisture in deep soil layer (100-500 cm)",
    subtitle = format(as.Date(d), "%d %B %Y")) + 
    xlim(range(lon)) + ylim(range(lat)) + coord_fixed()

Deep soil moisture: Source Bureau of Meteorology, Australia

With the ggmap package we can create a nice map of a local area.

library(ggmap)
loc <- round(geocode("Bendigo") / 0.05) * 0.05 
map_tile <- get_map(loc, zoom = 12, color = "bw") %>% 
    ggmap()

map_tile + 
    geom_tile(data = m, aes(x = lon, y = lat, fill = value), alpha = 0.8) + 
    scale_fill_gradientn(colors = brewer.pal(9, "Blues")) + 
    labs(title = "Total moisture in deep soil layer (100-500 cm)",
        subtitle = format(as.Date(d), "%d %B %Y"))

Analysing the data

For my analysis, I am interested in the time series of moisture data for a specific point on the map. The previous code slices the data horizontally over time. To create a time series we can pierce through the data for a specific coordinate. The purpose of this time series is to investigate the relationship between sewer main blockages and deep soil data, which can be a topic for a future post.

mt <- data.frame(date = dates, 
                 dp = moisture[as.character(loc$lon), as.character(loc$lat), ])
ggplot(mt, aes(x = date, y = dp)) + geom_line() + 
    labs(x = "Month",
         y = "Moisture",
         title = "Total moisture in deep soil layer (100-500 cm)",
         subtitle = paste(as.character(loc), collapse = ", "))

The latest version of this code is available on my GitHub repository.

Deep soil moisture time series.

Pacific Island Hopping using R and iGraph

Last month I enjoyed a relaxing holiday in the tropical paradise of Vanuatu. One rainy day I contemplated how to go island hopping across the Pacific ocean visiting as many island nations as possible. The Pacific ocean is a massive body of water between, Asia and the Americas, which covers almost half the surface of the earth. The southern Pacific is strewn with island nations from Australia to Chile. In this post, I describe how to use R to plan your next Pacific island hopping journey.

Pacific Island hopping

The Pacific Ocean.

Listing all airports

My first step was to create a list of flight connections between each of the island nations in the Pacific ocean. I am not aware of a publically available data set of international flights so unfortunately, I created a list manually (if you do know of such data set, then please leave a comment).

My manual research resulted in a list of international flights from or to island airports. This list might not be complete, but it is a start. My Pinterest board with Pacific island airline route maps was the information source for this list.

The first code section reads the list of airline routes and uses the ggmap package to extract their coordinates from Google maps. The data frame with airport coordinates is saved for future reference to avoid repeatedly pinging Google for the same information.

# Init
library(tidyverse)
library(ggmap)
library(ggrepel)
library(geosphere)

# Read flight list and airport list
flights <- read.csv("Geography/PacificFlights.csv", stringsAsFactors = FALSE)
f <- "Geography/airports.csv"
if (file.exists(f)) {
    airports <- read.csv(f)
    } else airports <- data.frame(airport = NA, lat = NA, lon = NA)

# Lookup coordinates for new airports
all_airports <- unique(c(flights$From, flights$To))
new_airports <- all_airports[!(all_airports %in% airports$airport)]
if (length(new_airports) != 0) {
    coords <- geocode(new_airports)
    new_airports <- data.frame(airport = new_airports, coords)
    airports <- rbind(airports, new_airports)
    airports <- subset(airports, !is.na(airport))
    write.csv(airports, "Geography/airports.csv", row.names = FALSE)
}

# Add coordinates to flight list
flights <- merge(flights, airports, by.x="From", by.y="airport")
flights <- merge(flights, airports, by.x="To", by.y="airport")

Create the map

To create a map, I modified the code to create flight maps I published in an earlier post. This code had to be changed to centre the map on the Pacific. Mapping the Pacific ocean is problematic because the -180 and +180 degree meridians meet around the date line. Longitudes west of the antemeridian are positive, while longitudes east are negative.

The world2 data set in the borders function of the ggplot2 package is centred on the Pacific ocean. To enable plotting on this map, all negative longitudes are made positive by adding 360 degrees to them.

# Pacific centric
flights$lon.x[flights$lon.x < 0] <- flights$lon.x[flights$lon.x < 0] + 360
flights$lon.y[flights$lon.y < 0] <- flights$lon.y[flights$lon.y < 0] + 360
airports$lon[airports$lon < 0] <- airports$lon[airports$lon < 0] + 360

# Plot flight routes
worldmap <- borders("world2", colour="#efede1", fill="#efede1")
ggplot() + worldmap + 
    geom_point(data=airports, aes(x = lon, y = lat), col = "#970027") + 
    geom_text_repel(data=airports, aes(x = lon, y = lat, label = airport), 
      col = "black", size = 2, segment.color = NA) + 
    geom_curve(data=flights, aes(x = lon.x, y = lat.x, xend = lon.y, 
      yend = lat.y, col = Airline), size = .4, curvature = .2) + 
    theme(panel.background = element_rect(fill="white"), 
          axis.line = element_blank(),
          axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank()
          ) + 
    xlim(100, 300) + ylim(-40,40)

Pacific island hopping

Pacific Island Hopping

This visualisation is aesthetic and full of context, but it is not the best visualisation to solve the travel problem. This map can also be expressed as a graph with nodes (airports) and edges (routes). Once the map is represented mathematically, we can generate travel routes and begin our Pacific Island hopping.

The igraph package converts the flight list to a graph that can be analysed and plotted. The shortest_path function can then be used to plan routes. If I would want to travel from Auckland to Saipan in the Northern Mariana Islands, I have to go through Port Vila, Honiara, Port Moresby, Chuuk, Guam and then to Saipan. I am pretty sure there are quicker ways to get there, but this would be an exciting journey through the Pacific.

library(igraph)
g <- graph_from_edgelist(as.matrix(flights[,1:2]), directed = FALSE)
par(mar = rep(0, 4))
plot(g, layout = layout.fruchterman.reingold, vertex.size=0)
V(g)
shortest_paths(g, "Auckland", "Saipan")

View the latest version of this code on GitHub.

Pacific Flight network

Visualising Water Consumption using a Geographic Bubble Chart

A geographic bubble chart is a straightforward method to visualise quantitative information with a geospatial relationship. Last week I was in Vietnam helping the Phú Thọ Water Supply Joint Stock Company with their data science. They asked me to create a map of a sample of their water consumption data. In this post, I share this little ditty to explain how to plot a bubble chart over a map using the

In this post, I share this little ditty to explain how to plot a bubble chart over a map using the ggmap package.

Load and Explore the Data

The sample data contains a list of just over 100 readings from water meters in the city of Việt Trì in Vietnam, plus their geospatial location. This data uses the World Geodetic System of 1984 (WGS84), which is compatible with Google Maps and similar systems.

# Load the data
water <- read.csv("PhuTho/MeterReads.csv")
water$Consumption <- water$read_new - water$read_old

# Summarise the data
head(water)
summary(water$Consumption)

The consumption at each connection is between 0 and 529 cubic metres, with a mean consumption of 23.45 cubic metres.

Visualise the data with a geographic bubble chart

With the ggmap extension of the ggplot package, we can visualise any spatial data set on a map. The only condition is that the spatial coordinates are in the WGS84 datum. The ggmap package adds a geographical layer to ggplot by adding a Google Maps or Open Street Map canvas.

The first step is to download the map canvas. To do this, you need to know the centre coordinates and the zoom factor. To determine the perfect zoon factor requires some trial and error. The ggmap package provides for various map types, which are described in detail in the documentation.

# Load map library
library(ggmap)

# Find the middle of the points
centre <- c(mean(range(water$lon)), mean(range(water$lat)))

# Download the satellite image
viettri <- get_map(centre, zoom = 17, maptype = "hybrid")
g <- ggmap(viettri)

The ggmap package follows the same conventions as ggplot. We first call the map layer and then add any required geom. The point geom creates a nice bubble chart when used in combination with the scale_size_area option. This option scales the points to a maximum size so that they are easily visible. The transparency (alpha) minimises problems with overplotting. This last code snippet plots the map with water consumption.

# Add the points
g + geom_point(data = reads, aes(x = lon, y = lat, size = Consumption), 
        shape = 21, colour = "dodgerblue4", fill = "dodgerblue", alpha = .5) +
    scale_size_area(max_size = 20) + 
    # Size of the biggest point 
    ggtitle("Việt Trì sự tiêu thụ nước")

You can find the code and data for this article on my GitHub repository. With thanks to Ms Quy and Mr Tuyen of Phu Tho water for their permission to use this data.

This map visualises water consumption in the targeted area of Việt Trì. The larger the bubble, the larger the consumption. It is no surprise that two commercial customers used the most water. Ggplot automatically adds the legend for the consumption variable.

Geographic Bubble Chart: Visualising Water Consumption in Vietnam

Create Air Travel Route Maps in ggplot: A Visual Travel Diary

Create Air Travel Route Maps: Emirates Route MapI have been lucky to fly to a few countries around the world. Like any other bored traveller, I thumb through the airline magazines and look at the air travel route maps. These maps are beautifully stylised depictions of the world with gently curved lines between the many destinations. I always wanted such a map for my own travel adventures.

Create Air Travel Route Maps using ggplot2

The first step was to create a list of all the places I have flown between at least once. Paging through my travel photos and diaries, I managed to create a pretty complete list. The structure of this document is simply a list of all routes (From, To) and every flight only gets counted once. The next step finds the spatial coordinates for each airport by searching Google Maps using the geocode function from the ggmap package. In some instances, I had to add the country name to avoid confusion between places.

# Read flight list
flights <- read.csv("flights.csv", stringsAsFactors = FALSE)

# Lookup coordinates
library(ggmap)
airports <- unique(c(flights$From, flights$To))
coords <- geocode(airports)
airports <- data.frame(airport=airports, coords)

We now we have a data frame of airports with their coordinates and can create air travel route maps. The data frames are merged so that we can create air travel route maps using the curve geom. The borders function of ggplot2 creates the map data. The ggrepel package helps to prevent overplotting of text.

# Add coordinates to flight list
flights <- merge(flights, airports, by.x="To", by.y="airport")
flights <- merge(flights, airports, by.x="From", by.y="airport")

# Plot flight routes
library(ggplot2)
library(ggrepel)
worldmap <- borders("world", colour="#efede1", fill="#efede1") # create a layer of borders
ggplot() + worldmap + 
 geom_curve(data=flights, aes(x = lon.x, y = lat.x, xend = lon.y, yend = lat.y), col = "#b29e7d", size = 1, curvature = .2) + 
 geom_point(data=airports, aes(x = lon, y = lat), col = "#970027") + 
 geom_text_repel(data=airports, aes(x = lon, y = lat, label = airport), col = "black", size = 2, segment.color = NA) + 
 theme(panel.background = element_rect(fill="white"), 
 axis.line = element_blank(),
 axis.text.x = element_blank(),
 axis.text.y = element_blank(),
 axis.ticks = element_blank(),
 axis.title.x = element_blank(),
 axis.title.y = element_blank()
 )

I also tried to use ggmap package to display the maps to get a satellite image background. This did not work because the curve geom struggles with the map projection methods used in ggmap. Another problem is that the flight from Auckland to Los Angeles is drawn the wrong way. I hope no flat-earthers will see this map because they might use it as proof that the world is flat.

You can view the recent version of the code and associated files in GitHub.

Finding antipodes using the globe and ggmap packages

The antipodes of each point on the Earth's surface

The antipode of each point on the Earth’s surface—the points where the blue and yellow overlap, are the land antipodes.

When I was a kid, I was was fascinated by the conundrum of what happens when you drill a hole straight through the centre of the earth. I always believed that I would turn up in Australia. But is this really the case?

The antipodes of any place on earth is the place that is diametrically opposite to it. A pair of antipodes are connected by a straight line running through the centre of the Earth. These points are as far away from each other as is possible on this planet. Two people are antipodes when they live on opposite sides of the globe. Their feet (πούς/pous in Ancient Greek) are directly opposite each other.

How can we use coding in R to solve this conundrum?

Using R to find antipodes

We can roughly recreate the antipodean map on Wikipedia with the globe package. This package, written by Adrian Baddeley, plots 2D and 3D views of the earth. The package contains a data file with major coastlines that can be used to create a flipped map of the world.

The package contains a data file with major coastlines that can be used to create a flipped map of the world. To turn a spatial location into its antipode you subtract 180 degrees from the longitude and reverse the sign of the latitude, shown below.

library(globe)
# Create antipodean map
flip <- earth
flip$coords[,1] <- flip$coords[,1]-180
flip$coords[,2] <- -flip$coords[,2]
par(mar=rep(0,4)) # Remove plotting margins
globeearth(eye=c(90,0), col="blue")
globepoints(loc=flip$coords, eye=c(90,0), col="red", pch=".")

We can also use the ggmap package to visualise antipodes. This package, developed by David Kahle antipodean R-guru Hadley Wickham, has a neat geocoding function to obtain a spatial location. The antipode function takes the description of a location and a zoom level to plot a dot on the antipode location. The gridExtra package is used to created a faceted map, which is not otherwise possible in ggmap.

library(ggmap)
library(gridExtra)

antipode <- function(location, zm=6) {
    # Map location
    lonlat <- geocode(location)
    loc1 <- get_map(lonlat, zoom=zm)
    map1 <- ggmap(loc1) + geom_point(data=lonlat, aes(lon, lat, col="red", size=10)) + 
        theme(legend.position="none")
    # Define antipode
    lonlat$lon <- lonlat$lon-180
    if (lonlat$lon < -180) lonlat$lon <- 360 + lonlat$lon
    lonlat$lat <- -lonlat$lat
    loc2 <- get_map(lonlat, zoom=zm)
    map2 <- ggmap(loc2) + geom_point(data=lonlat, aes(lon, lat, col="red", size=10)) + 
        theme(legend.position="none")
    grid.arrange(map1, map2, nrow=1)
}

antipode("Rector Nelissenstraat 47 Hoensbroek", 4)

This code solves the problem I was thinking about as a child. Running the code shows that the antipodes location of the home I grew up in is not in Australia, but quite a way south of New Zealand. Another childhood fantasy shattered …

You can also view this code on GitHub.

Antipodes using ggmap and gridExtra.