# How Virtual Tags have transformed SCADA data analysis

Yesterday, I delivered the International Keynote at the Asset Data & Insights Conference in Auckland, New Zealand (the place where R was originally developed). My talk was about how to create value from SCADA data, using a method I developed with my colleagues called Virtual Tags. My talk started with my views on data science strategy, which I also presented to the R User Group in Melbourne. In this post, I like to explain what Virtual Tags are, and how they can be used to improve the value of SCADA data.

## SCADA Systems at Water Treatment Plants

Water treatment plants are mostly fully automated systems, using analysers and the SCADA system to communicate this data. For those of you not familiar with water treatment plants, this video below gives a cute summary of the process.

Water treatment plants need sensors to measure a broad range of parameters. These instruments record data 24 hours per day to control operations. When the process operates effectively, all values fall within a very narrow band. All these values are stored by the SCADA system for typically a year, after which they are destroyed to save storage space.

Water treatment plants measure turbidity (clarity of the water) to assess the effectiveness of filtration. The code snippet below simulates the measurements from a turbidity instrument at a water treatment plant over five hours. The code simulates measurements from a turbidity instrument at a water treatment plant over a period of five hours. Most water quality data has a log-normal distribution with a narrow standard deviation.

```# Simulate measured data
set.seed(1234)
n <- 300
wtp <- data.frame(DateTime = seq.POSIXt(ISOdate(1910, 1, 1), length.out = n, by = 60),
WTP = rlnorm(n, log(.1), .01))
library(ggplot2)
p <- ggplot(wtp, aes(x = DateTime, y = WTP)) + geom_line(colour = "grey") +
ylim(0.09, 0.11) + ylab("Turbidity") + ggtitle("Turbidity simulation")
p
```

The data generated by the SCADA system is used to take operational decisions. The data is created and structured to make decisions in the present, not to solve problems in the future. SCADA Historian systems archive this information for future analysis. Historian systems only store new values when the new reading is more or less than a certain percentage than the previous one. This method saves storage space without sacrificing much accuracy.

For example, when an instrument reads 0.20 and the limit is set at 5%, new values are only recorded when they are below 0.19 or above 0.21. Any further values are stored when they deviate 5% from the new value, and so on. The code snippet below simulates this behaviour, based on the simulated turbidity readings generated earlier. This Historian only stores the data points marked in black.

```# Historise data
threshold <- 0.03
h <- 1 # First historised point
# Starting conditions
wtp\$historise <- FALSE
wtp\$historise[c(1, n)] <- TRUE
# Testing for delta <> threshold
for (i in 2:nrow(wtp)) {
delta <- wtp\$WTP[i] / wtp\$WTP[h] if (delta > (1 + threshold) | delta < (1 - threshold)) {
wtp\$historise[i] <- TRUE
h <- i
}
}
historian <- subset(wtp, historise == TRUE)
historian\$Source <- "Historian"
p <- p + geom_point(data = historian, aes(x = DateTime, y = WTP)) + ggtitle("Historised data")
p
```

## Virtual Tags

This standard method to generate and store SCADA data works fine to operate systems but does not work so well when using the data for posthoc analysis. The data in Historian is an unequally-spaced time series which makes it harder to analyse the data. The Virtual Tag approach expands these unequal time series to an equally-spaced one, using constant interpolation.

The `vt` function undertakes the constant interpolation using the approx function. The function`vt` is applied to all the DateTime values, using the historised data points. The red line shows how the value is constant until it jumps by more than 5%. This example demonstrates that we have a steady process with some minor spikes, which is the expected outcome of this simulation.

```# Create Virtual Ttags
vt <- function(t) approx(historian\$DateTime, historian\$WTP, xout = t, method = "constant")
turbidity <- lapply(as.data.frame(wtp\$DateTime), vt)
wtp\$VirtualTag <- turbidity[[1]]\$y
p + geom_line(data = wtp, aes(x = DateTime, y = VirtualTag), colour = "red") + ggtitle("Virtual Tags")
```

The next step in Virtual Tags is to combine the tags from different data points. For example, we are only interested in the turbidity readings when the filter was running. We can do this by combining this data with a valve status or flow in the filter.

This approach might seem cumbersome but it simplifies analysing data from Historian. Virtual Tags enables several analytical processes that would otherwise be hard to. This system also adds context to the SCADA information by linking tags to each other and the processes they describe. If you are interested in more detail, then please download the technical manual for Virtual Tags and how they are implemented in SQL.

## The Presentation

And finally, these are the slides of my keynote presentation.

## 7 thoughts on “How Virtual Tags have transformed SCADA data analysis”

1. Forgive my pedantry but this seems more like a virtual sample than a virtual tag. Nevertheless the function you describe would be incredibly useful. It is amazing how difficult it can be to analyse data from a SCADA historian and how un-user-friendly they are. Your function should be a standard feature offered whenever you try to export data from a historian – there should be a popup saying “would you like to export the data as x interpolated samples every x seconds beginning at 00:00”. Sell it to Citect.

• Thanks for the kind words Tom.

We called them tags instead of samples because besides creating equally-spaced time series to simplify analysis we also add context to the tags and create new tags by combining several into one or through formulas.

We share the manual and the code as open source. We have implemented it as an SQL package. Both are linked in the article.