Tic Tac Toe Part 3: The Minimax Algorithm

Tic Tac Toe minimax alogrithmIn two previous posts, I presented code to teach R to play the trivial game of Tic Tac Toe. I thought this was an unbeatable algorithm. Alas, a comment from Alberto shattered my pride as he was able to beat my code.

The reason for the demise of my code was that I didn’t implement a full minimax algorithm, but instead looked only two moves ahead. I thought the common strategy rules (start in the centre and occupy the corners) would make the program unbeatable. When I simulated the game by instructing the computer to play against itself, Alberto’s strategy never arose because the code forces the centre field. Alberto’s code shows that you need to look at least three moves ahead for a perfect game. He has been so kind to share his code and gave me permission to publish it.

Alberto recreated two functions, for completeness I have added the complete working code that merges his improvements with my earlier work. The first two functions are identical to the previous post. These functions draw the game board and process the human player’s move by waiting for a mouse click.

# Draw the game board
draw.board <- function(game) {
    xo <- c("X", " ", "O") # Symbols
    par(mar = rep(1,4))
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col="darkgrey", lwd = 4)
    abline(v = c(10, 20), col="darkgrey", lwd = 4)
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), xo[game + 2], cex = 4)
    # Identify location of any three in a row
    square <- t(matrix(game, nrow = 3))
    hor <- abs(rowSums(square))
    if (any(hor == 3)) 
        hor <- (4 - which(hor == 3)) * 10 - 5 
    else 
        hor <- 0
    ver <- abs(colSums(square))
    if (any(ver == 3)) 
        ver <- which(ver == 3) * 10 - 5 
    else
        ver <- 0
    diag1 <- sum(diag(square))
    diag2 <- sum(diag(t(apply(square, 2, rev)))) 
    # Draw winning lines
    if (all(hor > 0)) 
        for (i in hor) lines(c(0, 30), rep(i, 2), lwd = 10, col="red")
    if (all(ver > 0)) 
        for (i in ver) lines(rep(i, 2), c(0, 30), lwd = 10, col="red")
    if (abs(diag1) == 3) 
        lines(c(2, 28), c(28, 2), lwd = 10, col = "red")
    if (abs(diag2) == 3) 
        lines(c(2, 28), c(2, 28), lwd = 10, col = "red")
}

# Human player enters a move
move.human <- function(game) {
    text(4, 0, "Click on screen to move", col = "grey", cex=.7)
    empty <- which(game == 0)
    move <- 0
    while (!move %in% empty) {
        coords <- locator(n = 1) # add lines
        coords$x <- floor(abs(coords$x) / 10) + 1
        coords$y <- floor(abs(coords$y) / 10) + 1
        move <- coords$x + 3 * (3 - coords$y)
    }
    return (move)
}

Alberto rewrote the functions that analyse the board and determine the move of the computer. The ganador (Spanish for winning) function assesses the board condition by assigning -10 or + 10 for a winning game and 0 for any other situation.

ganador <- function(juego, player) {
    game <- matrix(juego, nrow = 3, byrow = T)
    hor <- rowSums(game)
    ver <- colSums(game)
    diag <- c(sum(diag(game)), sum(diag(apply(game, 1, rev))))
    if (-3 %in% c(hor, ver, diag))
        return(-10)
    if (3 %in% c(hor, ver, diag))
        return(10)
    else
        return(0)
}

The next function is the actual minimax algorithm. If the computer starts then the first move (9!= 362880 options to assess) takes a little while. The commented lines can be used to force a corner and make the games faster by forcing a random corner.

The minimax function returns a list with the move and its valuation through the ganador function. The function works recursively until it has filled the board and retains the best scoring move using the minimax method. To avoid the computer always playing the same move in the same situation random variables are added.

minimax <- function(juego, player) {
    free <- which(juego == 0)
    if(length(free) == 1) {
        juego[free] <- player
        return(list(move = free, U = ganador(juego, player)))
    }
    poss.results <- rep(0, 9)
    for(i in free) {
        game <- juego
        game[i] <- player
        poss.results[i] <- ganador(game, player)
    }
    mm <- ifelse(player == -1, "which.min", "which.max")
    if(any(poss.results == (player * 10))) {
        move <- do.call(mm, list(poss.results))
        return(list(move = move, U = poss.results[move]))
    }
    for(i in free) {
        game <- juego
        game[i] <- player
        poss.results[i] <- minimax(game, -player)$U
    }
    random <- runif(9, 0, 0.1)
    poss.results[-free] <- 100 * -player
    poss.results <- poss.results + (player * random)
    move <- do.call(mm, list(poss.results))
    return(list(move = move, U = poss.results[move]))
}

This final function stitches everything together and lets you play the game. Simply paste all functions in your R console and run them to play a game. The tic.tac.toe function can take two parameters, “human” and/or “computer”. The order of the parameters determines who starts the game.

# Main game engine
tic.tac.toe <- function(player1 = "human", player2 = "computer") {
    game <- rep(0, 9) # Empty board
    winner <- FALSE # Define winner
    player <- 1 # First player
    players <- c(player1, player2)
    draw.board(game)
    while (0 %in% game & !winner) { # Keep playing until win or full board
        if (players[(player + 3) %% 3] == "human") # Human player
            move <- move.human(game)
        else { # Computer player
            move <- minimax(game, player)
            move <- move$move
            }
        game[move] <- player # Change board
        draw.board(game)
        winner <- max(eval.game(game, 1), abs(eval.game(game, -1))) == 6 # Winner, winner, chicken dinner?
        player <- -player # Change player
    }
}

tic.tac.toe()

This is my last word on Tic Tac Toe but now that the minimax conundrum is solved I could start working on other similar games such as Connect Four, Draughts or even the royal game of Chess.

View the latest version of this code on GitHub.

R-Cade Games: Simulating the Legendary Game of Pong

The legendary game of PongPong is one of the earliest arcade games on the market, first released in 1972. From the day I first saw this miracle box, I wanted to know more about computers.

I learnt how to write code from the 1983 book Dr. C. Wacko’s Miracle Guide to Designing and Programming your own Atari Computer Arcade Games. This book explains in a very clear and humorous way how to write computer games in Atari basic. I devoured this book and spent many hours developing silly games. This article is an ode to Dr Wacko, a computer geek’s midlife-crisis and an attempt to replicate the software I developed thirty years ago.

I showed in a previous post that R can be used for board games. The question is whether we create arcade games in R. My challenge is to recreate the look and feel of 1980s arcade games, or R-Cade games, using R? The code shown below simulates the legendary game of pong.

Playing Pong in R

The code is based on the Wacko’s Boing Program in the above-mentioned book. The R code is fully commented and speaks for itself. Please note that the animation is very clunky when you run it in RStudio. Only the native R Terminal displays the animation correctly.

Perhaps somebody can help me perfect this little ditty. I love to know how to read real-time USB input to control the game, so we get a step closer to the first R-Cade game.

The Pong Code

# Sound library
library(beepr) 

# Game parameters
skill <- 0.87 # Skill (0-1)
score <- 0
high.score <- 0

# Define playing field
par(mar = rep(1,4), bg = "black")
plot.new()
plot.window(xlim = c(0, 30), ylim = c(0, 30))
lines(c(1, 30, 30, 1), c(0, 0, 30, 30), type = "l", lwd = 5, col = "white")

# Playing field boundaries (depend on cex)
xmin <- 0.5
xmax <- 29.4
ymin <- 0.5
ymax <- 29.4

# initial position
x <- sample(5:25, 1)
y <- sample(5:25, 1)
points(x, y, pch = 15, col = "white", cex = 2)

# Paddle control
psize <- 4
ypaddle <- y

# Set direction
dx <- runif(1, .5, 1)
dy <- runif(1, .5, 1) 

# Game play 
while (x > xmin - 1) {
    sound <- 0 # Silence
    Sys.sleep(.05) # Pause screen. Reduce to increase speed
    points(x, y, pch = 15, col = "black", cex = 2) # Erase ball
    # Move ball
    x <- x + dx
    y <- y + dy 
    # Collision detection 
    if (x > xmax) {
        dx <- -dx * runif(1, .9, 1.1) # Bounce 
        if (x > xmin) x <- xmax # Boundary
        sound <- 10 # Set sound
        }
    if (y < ymin | y > ymax) {
        if (y < ymin) y <- ymin 
        if (y > ymax) y <- ymax
        dy <- -dy * runif(1, .9, 1.1)
        sound <- 10
    }
    # Caught by paddle?
    if (x < xmin & (y > ypaddle - (psize / 2)) & y < ypaddle + (psize / 2)) {
        if (x < xmin) x <- xmin
        dx <- -dx * runif(1, .9, 1.1)
        sound <- 2
        score <- score + 1
        }
    # Draw ball
    points(x, y, pch = 15, col = "white", cex = 2)
    if (sound !=0) beep(sound)
    # Move paddle
    if (runif(1, 0, 1) < skill) ypaddle <- ypaddle + dy # Imperfect follow
    # Draw paddle
    # Erase back line
    lines(c(0, 0), c(0, 30), type = "l", lwd = 8, col = "black")
    # Keep paddle inside window
    if (ypaddle < (psize / 2)) ypaddle <- (psize / 2) 
    if (ypaddle > 30 - (psize / 2)) ypaddle <- 30 - (psize / 2) 
    # Draw paddle 
    lines(c(0, 0), c(ypaddle - (psize / 2), ypaddle + (psize / 2)), type = "l", lwd = 8, col = "white") 
} 
beep(8) 
text(15,15, "GAME OVER", cex=5, col = "white") 
s <- ifelse(score == 1, "", "s")
text(15,5, paste0(score, " Point", s), cex=3, col = "white") 

View the latest version of this code on GitHub.

Tic Tac Toe War Games: The Intelligent Minimax Algorithm

Tic Tac Toe War GamesIn a previous post, I shared how to build a randomised Tic Tac Toe simulation. The computer plays against itself playing at random positions. In this post, I will share how to teach the computer to play the game strategically.

I love the 1983 classic movie War Games. In this film, a computer plays Tic Tac Toe against itself to learn that it cannot win the game to prevent a nuclear war.

Back in those days, I devoured the wonderful book Writing Strategy Games on your Atari by John White which contains an algorithm to play Tic Tac Toe War Games. This is my attempt to relive the eighties using R.

You can find the code on my GitHub page.

Drawing the Board

A previous post describes the function that draws the Tic Tac Toe board. For completeness, the code is replicated below. The game board is a vector of length nine consisting of either -1 (X), 0 (empty field) or 1 (O). The vector indices correspond with locations on the game board:

1 2 3
4 5 6
7 8 9

draw.board &lt;- function(board) { # Draw the board
    xo &lt;- c(&quot;X&quot;, &quot; &quot;, &quot;O&quot;) # Symbols
    par(mar = rep(0,4))
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col=&quot;darkgrey&quot;, lwd = 4)
    abline(v = c(10, 20), col=&quot;darkgrey&quot;, lwd = 4)
    pieces &lt;- xo[board + 2]
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), pieces, cex = 6)
    # Identify location of any three in a row
    square &lt;- t(matrix(board, nrow = 3))
    hor &lt;- abs(rowSums(square))
    if (any(hor == 3)) 
      hor &lt;- (4 - which(hor == 3)) * 10 - 5 
    else 
      hor &lt;- 0
    ver &lt;- abs(colSums(square))
    if (any(ver == 3)) 
      ver &lt;- which(ver == 3) * 10 - 5 
    else
      ver &lt;- 0
    diag1 &lt;- sum(diag(square))
    diag2 &lt;- sum(diag(t(apply(square, 2, rev)))) # Draw winning lines if (hor &gt; 0) lines(c(0, 30), rep(hor, 2), lwd=10, col=&quot;red&quot;)
    if (ver &gt; 0) lines(rep(ver, 2), c(0, 30), lwd=10, col=&quot;red&quot;)
    if (abs(diag1) == 3) lines(c(2, 28), c(28, 2), lwd=10, col=&quot;red&quot;)
    if (abs(diag2) == 3) lines(c(2, 28), c(2, 28), lwd=10, col=&quot;red&quot;)
}

Human Players

This second code snippet lets a human player move by clicking anywhere on the graphic display using the locator function. The click location is converted to a number to denote the position on the board. The entered field is only accepted if it has not yet been used (the empty variable contains the available fields).

# Human player enters a move
move.human &lt;- function(game) {
    text(4, 0, &quot;Click on screen to move&quot;, col = &quot;grey&quot;, cex=.7)
    empty &lt;- which(game == 0)
    move &lt;- 0
    while (!move %in% empty) {
        coords &lt;- locator(n = 1) # add lines
        coords$x &lt;- floor(abs(coords$x) / 10) + 1
        coords$y &lt;- floor(abs(coords$y) / 10) + 1
        move &lt;- coords$x + 3 * (3 - coords$y)
    }
    return (move)
}

Evaluate the Game

This code snippet defines the eval.game function which assesses the current board and assigns a score. Zero means no outcome, -6 means that the X player has won and +6 implies that the O player has won.

# Evaluate board position
eval.game &lt;- function(game, player) {
    # Determine game score
    square &lt;- t(matrix(game, nrow = 3))
    hor &lt;- rowSums(square)
    ver &lt;- colSums(square)
    diag1 &lt;- sum(diag(square))
    diag2 &lt;- sum(diag(t(apply(square, 2, rev))))
    eval &lt;- c(hor, ver, diag1, diag2)
    # Determine best score
    minimax &lt;- ifelse(player == -1, &quot;min&quot;, &quot;max&quot;)
    best.score &lt;- do.call(minimax, list(eval))
    if (abs(best.score) == 3) best.score &lt;- best.score * 2
    return (best.score)
}

Computer Moves

The computer uses a modified Minimax Algorithm to determine its next move. This article from the Never Stop Building blog and the video below explain this method in great detail.

The next function determines the computer’s move. I have not used a brute-force minimax algorithm to save running time. I struggled building a fully recursive minimax function. Perhaps somebody can help me with this. This code looks only two steps deep and contains a strategic rule to maximise the score.

The first line stores the value of the players move, the second remainder of the matrix holds the evaluations of all the opponents moves. The code adds a randomised variable, based on the strategic value of a field. The centre has the highest value because it is part of four winning lines. Corners have three winning lines and the rest only two winning lines. This means that the computer will, all things being equal, favour the centre over the corners and favour the other fields least. The randomised variables in the code ensure that the computer does not always pick the same field in a similar situation.

# Determine computer move
move.computer &lt;- function(game, player) {
    empty &lt;- which(game == 0)
    eval &lt;- matrix(nrow = 10, ncol = 9, data = 0)
    for (i in empty) {
        game.tmp &lt;- game
        game.tmp[i] &lt;- player
        eval[1, i] &lt;- eval.game(game.tmp, player)
        empty.tmp &lt;- which(game.tmp ==0)
        for (j in empty.tmp) {
            game.tmp1 &lt;- game.tmp
            game.tmp1[j] &lt;- -player
            eval[(j + 1), i] &lt;- eval.game(game.tmp1, -player)
        }
    }
    if (!any(abs(eval[1,]) == 6)) { # When winning, play move
        # Analyse opponent move
        minimax &lt;- ifelse(player == -1, &quot;max&quot;, &quot;min&quot;) # Minimax
        best.opponent &lt;- apply(eval[-1,], 1, minimax)
        eval[1,] &lt;- eval[1,] * -player * best.opponent
    }
    # Add randomisation and strategic values
    board &lt;- c(3, 2, 3, 2, 4, 2, 3, 2, 3) # Strategic values
    board &lt;- sapply(board, function(x) runif(1, 0.1 * x, (0.1 * x) + 0.1)) # Randomise
    eval[1, empty] &lt;- eval[1, empty] + player * board[empty] # Randomise moves
    # Pick best game
    minimax &lt;- ifelse(player == -1, &quot;which.min&quot;, &quot;which.max&quot;) # Minimax
    move &lt;- do.call(minimax, list(eval[1,])) # Select best move
    return(move)
}

This last code snippet enables computers and humans play each other or themselves. The players vector contains the identity of the two players so that a human can play a computer or vice versa. The human player moves by clicking on the screen.

The loop keeps running until the board is full or a winner has been identified. A previous Tic Tac Toe post explains the draw.board function.

# Main game engine
tic.tac.toe &lt;- function(player1 = &quot;human&quot;, player2 = &quot;computer&quot;) {
    game &lt;- rep(0, 9) # Empty board
    winner &lt;- FALSE # Define winner
    player &lt;- 1 # First player
    players &lt;- c(player1, player2)
    draw.board(game)
    while (0 %in% game &amp; !winner) { # Keep playing until win or full board
        if (players[(player + 3) %% 3] == &quot;human&quot;) # Human player
            move &lt;- move.human(game)
        else # Computer player
            move &lt;- move.computer(game, player)
        game[move] &lt;- player # Change board
        draw.board(game)
        winner &lt;- max(eval.game(game, 1), abs(eval.game(game, -1))) == 6 # Winner, winner, chicken dinner?
        player &lt;- -player # Change player
    }
}

You can play the computer by running all functions and then entering tic.tac.toe().

I am pretty certain this simplified minimax algorithm is unbeatable—why don’t you try to win and let me know when you do.

Tic Tac Toe War Games

Now that this problem is solved, I can finally recreate the epic scene from the WarGames movie. The Tic Tac Toe War Games code uses the functions explained above and the animation package. Unfortunately, there are not many opportunities to create sound in R.

# WAR GAMES TIC TAC TOE
source(&quot;Tic Tac Toe/Tic Tac Toe.R&quot;)

# Draw the game board
draw.board.wargames &lt;- function(game) {
    xo &lt;- c(&quot;X&quot;, &quot; &quot;, &quot;O&quot;) # Symbols
    par(mar = rep(1,4), bg = &quot;#050811&quot;)
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col = &quot;#588fca&quot;, lwd = 20)
    abline(v = c(10, 20), col = &quot;#588fca&quot;, lwd = 20)
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), xo[game + 2], cex = 20, col = &quot;#588fca&quot;)
    text(1,0,&quot;r.prevos.net&quot;, col = &quot;#588fca&quot;, cex=2)
    # Identify location of any three in a row
    square &lt;- t(matrix(game, nrow = 3))
    hor &lt;- abs(rowSums(square))
    if (any(hor == 3)) 
        hor &lt;- (4 - which(hor == 3)) * 10 - 5 
    else 
        hor &lt;- 0
    ver &lt;- abs(colSums(square))
    if (any(ver == 3)) 
        ver &lt;- which(ver == 3) * 10 - 5 
    else
        ver &lt;- 0
    diag1 &lt;- sum(diag(square))
    diag2 &lt;- sum(diag(t(apply(square, 2, rev)))) # Draw winning lines if (all(hor &gt; 0)) for (i in hor) lines(c(0, 30), rep(i, 2), lwd = 10, col=&quot;#588fca&quot;)
    if (all(ver &gt; 0)) for (i in ver) lines(rep(i, 2), c(0, 30), lwd = 10, col=&quot;#588fca&quot;)
    if (abs(diag1) == 3) lines(c(2, 28), c(28, 2), lwd = 10, col = &quot;#588fca&quot;)
    if (abs(diag2) == 3) lines(c(2, 28), c(2, 28), lwd = 10, col = &quot;#588fca&quot;)
}

library(animation)
player &lt;- -1
games &lt;- 100
saveGIF ({
    for (i in 1:games) {
        game &lt;- rep(0, 9) # Empty board
        winner &lt;- 0 # Define winner
        #draw.board.wargames(game)
        while (0 %in% game &amp; !winner) { # Keep playing until win or full board
            empty &lt;- which(game == 0)
            move &lt;- move.computer(game, player)
            game[move] &lt;- player
            if (i &lt;= 12) draw.board.wargames(game)
            winner &lt;- max(eval.game(game, 1), abs(eval.game(game, -1))) == 6
            player &lt;- -player } if (i &gt; 12) draw.board.wargames(game)
    }
},
interval = c(unlist(lapply(seq(1, 0,-.2), function (x) rep(x, 9))), rep(0,9*94)), 
movie.name = &quot;wargames.gif&quot;, ani.width = 1024, ani.height = 1024)

View the latest version of this code on GitHub.

Tic Tac Toe Simulation — Random Moves

Tic Tac Toe Simulation - Part 1Tic Tac Toe might be a futile children’s game but it can also teach us about artificial intelligence. Tic Tac Toe, or Naughts and Crosses, is a zero-sum game with perfect information. Both players know exactly what the other did and when nobody makes a mistake, the game will always end in a draw.

Tic Tac Toe is a simple game but also the much more complex game of chess is a zero-sum game with perfect information.

In this two-part post, I will build an unbeatable Tic Tac Toe Simulation. This first part deals with the mechanics of the game. The second post will present an algorithm for a perfect game.

Drawing the Board

This first code snippet draws the Tic Tac Toe simulation board. The variable xo holds the identity of the pieces and the vector board holds the current game. Player X is denoted with -1 and player O with +1. The first part of the function draws the board and the naughts and crosses. The second part of the code check for three in a row and draws the corresponding line.

draw.board <- function(board) { # Draw the board
    xo <- c("X", " ", "O") # Symbols
    par(mar = rep(0,4))
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col="darkgrey", lwd = 4)
    abline(v = c(10, 20), col="darkgrey", lwd = 4)
    pieces <- xo[board + 2]
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), pieces, cex = 6)
    # Identify location of any three in a row
    square <- t(matrix(board, nrow = 3))
    hor <- abs(rowSums(square))
    if (any(hor == 3)) 
      hor <- (4 - which(hor == 3)) * 10 - 5 
    else 
      hor <- 0
    ver <- abs(colSums(square))
    if (any(ver == 3)) 
      ver <- which(ver == 3) * 10 - 5 
    else
      ver <- 0
    diag1 <- sum(diag(square))
    diag2 <- sum(diag(t(apply(square, 2, rev)))) 
    # Draw winning lines 
    if (hor > 0) lines(c(0, 30), rep(hor, 2), lwd=10, col="red")
    if (ver > 0) lines(rep(ver, 2), c(0, 30), lwd=10, col="red")
    if (abs(diag1) == 3) lines(c(2, 28), c(28, 2), lwd=10, col="red")
    if (abs(diag2) == 3) lines(c(2, 28), c(2, 28), lwd=10, col="red")
}

Random Tic Tac Toe

The second part of the code generates ten random games and creates and animated GIF-file. The code adds random moves until one of the players wins (winner <> 0) or the board is full (no zeroes in the game vector). The eval.winner function checks for three in a row and declares a winner when found.

There are 255,168 possible legal games in Tic Tac Toe, 46,080 of which end in a draw. This implies that these randomised games result in a draw 18% of the time.

eval.winner <- function(board) { # Identify winner
    square <- t(matrix(board, nrow = 3))
    hor <- rowSums(square)
    ver <- colSums(square)
    diag1 <- sum(diag(square))
    diag2 <- sum(diag(t(apply(square, 2, rev))))
    if (3 %in% c(hor, ver, diag1, diag2)) return (1)
    else
        if (-3 %in% c(hor, ver, diag1, diag2)) return (2)
    else
        return(0)
}

# Random game
library(animation)
saveGIF ({
 for (i in 1:10) {
 game <- rep(0, 9) # Empty board
 winner <- 0 # Define winner
 player <- -1 # First player
 draw.board(game)
 while (0 %in% game & winner == 0) { # Keep playing until win or full board
   empty <- which(game == 0) # Define empty squares
   move <- empty[sample(length(empty), 1)] # Random move
   game[move] <- player # Change board
   draw.board(game)
   winner <- eval.winner(game) # Evaulate game
   player <- player * -1 # Change player
 }
 draw.board(game)
 }
 },
 interval = 0.25, movie.name = "ttt.gif", ani.width = 600, ani.height = 600)

View the latest version of this code on GitHub.

Tic Tac Toe Simulation

In a future post, I will outline how to program the computer to play against itself, just like in the 1983 movie War Games.

SCADA spikes in Water Treatment Data

Turbid waterSCADA spikes are events in the data stream of water treatment plants or similar installations. These SCADA spikes can indicate problems with the process and could result in an increased risk to public health.

The WSAA Health Based Targets Manual specifies a series of decision rules to assess the performance of filtration processes. For example, this rule assesses the performance of conventional filtration:

“Individual filter turbidity ≤ 0.2 NTU for 95% of month and not > 0.5 NTU for ≥ 15 consecutive minutes.”

Turbidity is a measure for the cloudiness of a fluid because of large numbers of individual particles otherwise invisible to the naked eye. Turbidity is an important parameter in water treatment because a high level of cloudiness strongly correlates with the presence of microbes. This article shows how to implement this specific decision rule using the R language.

Simulation

To create a minimum working example, I first create a simulated SCADA feed for turbidity. The turbidity data frame contains 24 hours of data. The seq.POSIXt function creates 24 hours of timestamps at a one-minute spacing. In addition, the rnorm function creates 1440 turbidity readings with an average of 0.1 NTU and a standard deviation of 0.01 NTU. The image below visualises the simulated data. The next step is to assess this data in accordance with the decision rule.

# Simulate data
set.seed(1234)
turbidity <- data.frame(DateTime = seq.POSIXt(as.POSIXct("2017-01-01 00:00:00"), 
                        by = "min", length.out=24*60),
                        Turbidity = rnorm(n = 24*60, mean = 0.1, sd = 0.01)
 )

The second section simulates five spikes in the data. The first line picks a random start time for the spike. The second line in the for-loop picks a duration between 10 and 30 minutes. In addition, the third line simulates the value of the spike. The mean value of the spike is determined by the rbinom function to create either a low or a high spike. The remainder of the spike simulation inserts the new data into the turbidity data frame.

# Simulate spikes
for (i in 1:5) {
   time <- sample(turbidity$DateTime, 1)
   duration <- sample(10:30, 1)
   value <- rnorm(1, 0.5 * rbinom(1, 1, 0.5) + 0.3, 0.05)
   start <- which(turbidity$DateTime == time)
   turbidity$Turbidity[start:(start+duration - 1)] <- rnorm(duration, value, value/10)
}

The image below visualises the simulated data using the mighty ggplot. Only four spikes are visible because two of them overlap. The next step is to assess this data in accordance with the decision rule.

library(ggplot2)
ggplot(turbidity, aes(x = DateTime, y = Turbidity)) + geom_line(size = 0.2) + 
   geom_hline(yintercept = 0.5, col = "red") + ylim(0,max(turbidity$Turbidity)) + 
   ggtitle("Simulated SCADA data")

Simulated SCADA data with spikes

SCADA Spikes Detection

The following code searches for all spikes over 0.50 NTU using the run length function. This function transforms a vector into a vector of values and lengths. For example, the run length of the vector c(1, 1, 2, 2, 2, 3, 3, 3, 3, 5, 5, 6) is:

  • lengths: int [1:5] 2 3 4 2 1
  • values : num [1:5] 1 2 3 5 6

The value 1 has a length of 1, the value 2 has a length of 3 and so on. The spike detection code creates the run length for turbidity levels greater than 0.5, which results in a boolean vector. The cumsum function calculates the starting point of each spike which allows us to calculate their duration.

The code results in a data frame with all spikes higher than 0.50 NTU and longer than 15 minutes. The spike that occurred at 11:29 was higher than 0.50 NTU and lasted for 24 minutes. The other three spikes are either lower than 0.50 NTU. The first high spike lasted less than 15 minutes.

# Spike Detection
spike.detect <- function(DateTime, Value, Height, Duration) {
 runlength <- rle(Value > Height)
 spikes <- data.frame(Spike = runlength$values,
 times <- cumsum(runlength$lengths))
 spikes$Times <- DateTime[spikes$times]
 spikes$Event <- c(0,spikes$Times[-1] - spikes$Times[-nrow(spikes)])
 spikes <- subset(spikes, Spike == TRUE & Event > Duration)
 return(spikes)
}
spike.detect(turbidity$DateTime, turbidity$Turbidity, 0.5, 15)

This approach was used to prototype a software package to assess water treatment plant data in accordance with the Health-Based Targets Manual. The finished product has been written in SQL and is available under an Open Source sharing license.

View this code on GitHub.