Hydroinformatics is the science of analysing data related to water utilities. Wikipedia defines hydroinformatics as “a branch of informatics which concentrates on the application of information and communications technologies (ICTs) in addressing the increasingly serious problems of the equitable and efficient use of water for many different purposes.”

This website contains a series of posts with examples of data analysis for water utilities in R, an open-source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners.

I started this website to help me create a coalition of data scientists who work for water utilities. I would love to jointly develop an open-source repository of R code that helps water utilities with their data science problems. To achieve that goal, I started writing a book to promote the virtues of writing code in R.

The articles below provide an insight into the types of water utility data science problems I am working on. Please contact me if you have ideas or would like to share code.

Qualitative Data Science: Using RQDA to analyse interviews

How can R be used to conduct qualitative data science? An example analyses interviews with customer advocacy groups and regulators of water utilities using the RQDA package.

Tap Water Sentiment Analysis using Tidytext

In developed countries, tap water is safe to drink and available for a meagre price. Although high-quality drinking water is almost freely available, the consumption of bottled water is increasing every year. Bottled water companies use sophisticated…

Analysing Digital Water Meter Data using the Tidyverse

Many water utilities are implementing or considering digital metering. This article describes how to analyse digital water meter data using the Tidyverse collection of R packages.

Simulating Water Consumption to Develop Analysis and Reporting

This article simulates water consumption to assist with developing leak detection algorithms. Simulated data makes it possible to develop and test analysis and reporting tools before real meter data is available.
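The idea of simulating consumption to test a leak detection rule can be illustrated with a minimal base R sketch. Everything here is an assumption made for illustration and is not taken from the article: the number of meters, the diurnal usage probabilities, the gamma-distributed volumes, the 5 L/h leak and the "no zero-flow hour" heuristic are all hypothetical.

```r
# Hypothetical sketch: simulate hourly digital-meter reads and flag possible leaks.
set.seed(42)  # reproducible random numbers

n_meters <- 5
n_days <- 7
hours <- rep(0:23, times = n_days)

# Assumed diurnal pattern: probability that water is used in a given hour
# (peaks in the morning and evening, no use between midnight and 4 am)
p_use <- ifelse(hours %in% c(6:8, 18:21), 0.8,
                ifelse(hours %in% 0:4, 0, 0.3))

simulate_meter <- function(id, leak_rate = 0) {
  used <- rbinom(length(hours), size = 1, prob = p_use)
  volume <- used * rgamma(length(hours), shape = 2, scale = 15)  # litres per hour
  data.frame(meter = id,
             day = rep(seq_len(n_days), each = 24),
             hour = hours,
             litres = volume + leak_rate)  # a leak adds a constant base flow
}

# Meter 3 is given a constant 5 L/h leak; the others have none
leaks <- c(0, 0, 5, 0, 0)
reads <- do.call(rbind, Map(simulate_meter, seq_len(n_meters), leaks))

# Simple leak heuristic: a meter that never records a zero-flow hour
min_flow <- aggregate(litres ~ meter, data = reads, FUN = min)
min_flow$possible_leak <- min_flow$litres > 0
print(min_flow)
```

Because the leak is planted in a known meter, the heuristic can be checked against the truth, which is the main attraction of working with simulated data.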

Analysing soil moisture data in NetCDF format with the ncdf4 library

The netCDF format is popular in sciences that analyse sequential spatial data. It is a self-describing, machine-independent data format for creating, accessing and sharing array-oriented information. The netCDF format is commonly used to store spatial time series such as meteorological or environmental data. This article shows how to visualise…

Visualising Water Consumption using a Geographic Bubble Chart

A geographic bubble chart is a straightforward method to visualise quantitative information with a geospatial relationship. Last week I was in Vietnam helping the Phú Thọ Water Supply Joint Stock Company with their data science. They asked me to create…

Data Science for Water Utilities Using R

Data Science for Water Utilities is a forthcoming book that explains how to use R to analyse problems typical of water utilities.

How Virtual Tags have transformed SCADA data analysis

This article describes how to use virtual tags to analyse SCADA data. Virtual tags provide context to SCADA or Historian data by combining information from various tags with metadata about these tags.

Data Science from a Strategic Business Perspective

Summary of my presentation to the Melbourne R User Group (MelbuRn) about Data Science from a Strategic Business Perspective.

Percentile Calculations in Water Quality Regulations

This article demonstrates the various ways to calculate percentiles in R, specifically with respect to measuring turbidity in water supplies.
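To give a flavour of why the calculation method matters, the sketch below compares the nine quantile definitions offered by the `type` argument of base R's `quantile()` function. The turbidity values are simulated from an assumed log-normal distribution and the sample size of 52 is hypothetical; none of these numbers come from the article or from real measurements.

```r
# Illustrative only: simulated turbidity readings, not real measurements
set.seed(1)
turbidity <- rlnorm(52, meanlog = log(0.2), sdlog = 0.5)  # NTU, hypothetical weekly samples

# 95th percentile under each of the nine definitions in stats::quantile()
p95 <- sapply(1:9, function(t) {
  quantile(turbidity, probs = 0.95, type = t, names = FALSE)
})
names(p95) <- paste0("type_", 1:9)
print(round(p95, 3))
```

R uses type 7 when no `type` is given. With small samples the definitions can give noticeably different results, so it is worth checking which one a regulation prescribes before reporting compliance.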

© 2018 The Devil is in the Data — Powered by WordPress