Euler Problem 19: Counting Sundays — When does the week start?

Euler Problem 19 is so trivial it is almost not worth writing an article about. One interesting aspect of this problem is the naming of weekdays and deciding which day the week starts. This issue is more complex than it sounds because data science is in essence not about data but about people.

Euler Problem 19 Definition

  • 1 Jan 1900 was a Monday.
  • Thirty days has September, April, June and November.
  • All the rest have thirty-one,
  • Saving February alone, Which has twenty-eight, rain or shine. And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

Solution

The problem can be quickly solved with R base code and a tiny bit faster when using the lubridate package.

# Base R-code
dates <- seq.Date(as.Date("1901/01/01"), as.Date("2000/12/31"), "days")
days <- rep(1:7, length.out = length(dates))
answer <- sum(days[substr(dates, 9, 10) == "01"] == 1)
print(answer)

#Using Lubridate
library(lubridate, quietly = TRUE)
answer <- sum(wday(dates[substr(dates, 9, 10) == "01"]) == 1)
print(answer) 	 

To draw out this post a little bit further I wrote some code to solve the problem without using the calendar functions in R.

week.day <- 0
answer <- 0
for (y in 1901:2000) {
    for (m in 1:12) {
        max.day <- 31
        if (m %in% c(4, 6, 9, 11)) max.day <- 30
        # Leap years
        if (m == 2) {
            if (y %% 4 == 0 & y %% 100 != 0 | y %% 400 == 0) max.day <- 29
            else max.day <- 28
            }
        for (d in 1:max.day) {
            week.day <- week.day + 1
            if (week.day == 8) week.day <- 1
            if (week.day == 1 & d == 1) answer <- answer + 1
        }
    }
}
print(answer)

View the latest version of this code on GitHub.

Which day does the week start?

The only aspect remotely interesting about this problem is the conversion from weekdays to numbers. In R, Monday is considered day one, which makes sense in the Christian context of Western culture. Saturday and Sunday are the weekend, the two last days of the week so they are day 6 and 7. According to international standard ISO 8601, Monday is the first day of the week. Although this is the international standard, several countries, including the United States and Canada, consider Sunday to be the first day of the week.

The international standard is biased towards Christianity. The Christian or Western world marks Sunday as their day of rest and worship. Muslims refer to Friday as their day of rest and prayer. The Jewish calendar counts Saturday—the Sabbath—as the day of rest and worship. This idea is also shared by Seventh-Day Adventists.

this example shows that data science is not only about data: it is always about people and how they interpret the world.

via chartsbin.com

Euler Problem 18 & 67: Maximum Path Sums

A pedigree is an example of a binary tree: Euler Problem 18

An example of a pedigree. Source: Wikimedia.

Euler Problem 18 and 67 are exactly the same besides that the data set in the second version is larger than in the first one. In this post, I kill two Eulers with one code.

These problems deal with binary trees, which is a data structure where each node has two children. A practical example of a binary tree is a pedigree chart, where each person or animal has two parents, four grandparents and so on.

Euler Problem 18 Definition

By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

3
7 4
2 4 6
8 5 9 3

That is, 3 + 7 + 4 + 9 = 23. Find the maximum total from top to bottom of the triangle below:

75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

As there are only 16,384 routes, it is possible to solve this problem by trying every route. However, Problem 67, is the same challenge with a triangle containing one-hundred rows; it cannot be solved by brute force, and requires a clever method! ;o)

Solution

This problem seeks a maximum path sum in a binary tree. The brute force method, as indicated in the problem definition, is a very inefficient way to solve this problem. The video visualises the quest for the maximum path, which takes eleven minutes of hypnotic animation.

A more efficient method is to define the maximum path layer by layer, starting at the bottom. The maximum sum of 2+8 or 2+5 is 10, the maximum sum of 4+5 or 4+9 is 13 and the last maximum sum is 15. These numbers are now placed in the next row. This process cycles until only one number is left. This algorithm solves the sample triangle in four steps:

Step 1:

3
7 4
2 4 6
8 5 9 3

Step 2:

3
7 4
10 13 15

Step 3:

3
20 19

Step 4:

23

In the code below, the data is triangle matrix. The variables rij (row) and kol (column) drive the search for the maximum path. The triangle for Euler Problem 18 is manually created and the triangle for Euler Problem 67 is read from the website.

path.sum <- function(triangle) {
    for (rij in nrow(triangle):2) {
        for (kol in 1:(ncol(triangle)-1)) {
            triangle[rij - 1,kol] <- max(triangle[rij,kol:(kol + 1)]) + triangle[rij - 1, kol]
        }
        triangle[rij,] <- NA
    }
    return(max(triangle, na.rm = TRUE))
}

# Euler Problem 18
triangle <- matrix(ncol = 15, nrow = 15)
triangle[1,1] <- 75
triangle[2,1:2] <- c(95, 64)
triangle[3,1:3] <- c(17, 47, 82)
triangle[4,1:4] <- c(18, 35, 87, 10)
triangle[5,1:5] <- c(20, 04, 82, 47, 65)
triangle[6,1:6] <- c(19, 01, 23, 75, 03, 34)
triangle[7,1:7] <- c(88, 02, 77, 73, 07, 63, 67)
triangle[8,1:8] <- c(99, 65, 04, 28, 06, 16, 70, 92)
triangle[9,1:9] <- c(41, 41, 26, 56, 83, 40, 80, 70, 33)
triangle[10,1:10] <- c(41, 48, 72, 33, 47, 32, 37, 16, 94, 29)
triangle[11,1:11] <- c(53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14)
triangle[12,1:12] <- c(70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57)
triangle[13,1:13] <- c(91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48)
triangle[14,1:14] <- c(63, 66, 04, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31)
triangle[15,1:15] <- c(04, 62, 98, 27, 23, 09, 70, 98, 73, 93, 38, 53, 60, 04, 23)

answer <- path.sum(triangle)
print(answer)

Euler Problem 67

The solution for problem number 67 is exactly the same. The data is read directly from the Project Euler website.

# Euler Problem 67
triangle.file <- read.delim("https://projecteuler.net/project/resources/p067_triangle.txt", stringsAsFactors = F, header = F)
triangle.67 <- matrix(nrow = 100, ncol = 100)
for (i in 1:100) {
    triangle.67[i,1:i] <- as.numeric(unlist(strsplit(triangle.file[i,], " ")))
}
answer <- path.sum(triangle.67)
print(answer)

View the latest version of this code on GitHub.

R-Cade Games: Simulating the Legendary Game of Pong

The legendary game of PongPong is one of the earliest arcade games on the market, first released in 1972. From the day I first saw this miracle box, I wanted to know more about computers.

I learnt how to write code from the 1983 book Dr. C. Wacko’s Miracle Guide to Designing and Programming your own Atari Computer Arcade Games. This book explains in a very clear and humorous way how to write computer games in Atari basic. I devoured this book and spent many hours developing silly games. This article is an ode to Dr Wacko, a computer geek’s midlife-crisis and an attempt to replicate the software I developed thirty years ago.

I showed in a previous post that R can be used for board games. The question is whether we create arcade games in R. My challenge is to recreate the look and feel of 1980s arcade games, or R-Cade games, using R? The code shown below simulates the legendary game of pong.

Playing Pong in R

The code is based on the Wacko’s Boing Program in the above-mentioned book. The R code is fully commented and speaks for itself. Please note that the animation is very clunky when you run it in RStudio. Only the native R Terminal displays the animation correctly.

Perhaps somebody can help me perfect this little ditty. I love to know how to read real-time USB input to control the game, so we get a step closer to the first R-Cade game.

The Pong Code

# Sound library
library(beepr) 

# Game parameters
skill <- 0.87 # Skill (0-1)
score <- 0
high.score <- 0

# Define playing field
par(mar = rep(1,4), bg = "black")
plot.new()
plot.window(xlim = c(0, 30), ylim = c(0, 30))
lines(c(1, 30, 30, 1), c(0, 0, 30, 30), type = "l", lwd = 5, col = "white")

# Playing field boundaries (depend on cex)
xmin <- 0.5
xmax <- 29.4
ymin <- 0.5
ymax <- 29.4

# initial position
x <- sample(5:25, 1)
y <- sample(5:25, 1)
points(x, y, pch = 15, col = "white", cex = 2)

# Paddle control
psize <- 4
ypaddle <- y

# Set direction
dx <- runif(1, .5, 1)
dy <- runif(1, .5, 1) 

# Game play 
while (x > xmin - 1) {
    sound <- 0 # Silence
    Sys.sleep(.05) # Pause screen. Reduce to increase speed
    points(x, y, pch = 15, col = "black", cex = 2) # Erase ball
    # Move ball
    x <- x + dx
    y <- y + dy 
    # Collision detection 
    if (x > xmax) {
        dx <- -dx * runif(1, .9, 1.1) # Bounce 
        if (x > xmin) x <- xmax # Boundary
        sound <- 10 # Set sound
        }
    if (y < ymin | y > ymax) {
        if (y < ymin) y <- ymin 
        if (y > ymax) y <- ymax
        dy <- -dy * runif(1, .9, 1.1)
        sound <- 10
    }
    # Caught by paddle?
    if (x < xmin & (y > ypaddle - (psize / 2)) & y < ypaddle + (psize / 2)) {
        if (x < xmin) x <- xmin
        dx <- -dx * runif(1, .9, 1.1)
        sound <- 2
        score <- score + 1
        }
    # Draw ball
    points(x, y, pch = 15, col = "white", cex = 2)
    if (sound !=0) beep(sound)
    # Move paddle
    if (runif(1, 0, 1) < skill) ypaddle <- ypaddle + dy # Imperfect follow
    # Draw paddle
    # Erase back line
    lines(c(0, 0), c(0, 30), type = "l", lwd = 8, col = "black")
    # Keep paddle inside window
    if (ypaddle < (psize / 2)) ypaddle <- (psize / 2) 
    if (ypaddle > 30 - (psize / 2)) ypaddle <- 30 - (psize / 2) 
    # Draw paddle 
    lines(c(0, 0), c(ypaddle - (psize / 2), ypaddle + (psize / 2)), type = "l", lwd = 8, col = "white") 
} 
beep(8) 
text(15,15, "GAME OVER", cex=5, col = "white") 
s <- ifelse(score == 1, "", "s")
text(15,5, paste0(score, " Point", s), cex=3, col = "white") 

View the latest version of this code on GitHub.

Data Science from a Strategic Business Perspective

Data science from a strategic business perspective - The MelbuRn R user group.Last night I spoke at the Melbourne R User Group (MelbuRn) about data science from a strategic business perspective. It was great to see so many people attending.

My presentation outlined the strategy that I developed and am implementing for my employer Coliban Water. This strategy is based on a common-sense approach that leverages our existing strengths. This strategy was also outlined in an article for the Source journal.

Water utilities are, pardon the pun, awash with data. For decades we have been using a rudimentary version of the Internet of Things called SCADA (Supervisory Control and Data Aquisition). This system controls our assets and provides operators and analysts with the needed data. All this data is used to control our systems and stored for future reference.

There is no such thing as ‘dark data’. All data has been used for its designated purpose when it was created. My job at Coliban Water is to create value from this information.

In this presentation, I explained how Coliban Water is creating more value from data by following a systematic strategy,

Data Science Strategy Presentation

Tic Tac Toe War Games: The Intelligent Minimax Algorithm

Tic Tac Toe War GamesIn a previous post, I shared how to build a randomised Tic Tac Toe simulation. The computer plays against itself playing at random positions. In this post, I will share how to teach the computer to play the game strategically.

I love the 1983 classic movie War Games. In this film, a computer plays Tic Tac Toe against itself to learn that it cannot win the game to prevent a nuclear war.

Back in those days, I devoured the wonderful book Writing Strategy Games on your Atari by John White which contains an algorithm to play Tic Tac Toe War Games. This is my attempt to relive the eighties using R.

You can find the code on my GitHub page.

Drawing the Board

A previous post describes the function that draws the Tic Tac Toe board. For completeness, the code is replicated below. The game board is a vector of length nine consisting of either -1 (X), 0 (empty field) or 1 (O). The vector indices correspond with locations on the game board:

1 2 3
4 5 6
7 8 9

draw.board &lt;- function(board) { # Draw the board
    xo &lt;- c(&quot;X&quot;, &quot; &quot;, &quot;O&quot;) # Symbols
    par(mar = rep(0,4))
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col=&quot;darkgrey&quot;, lwd = 4)
    abline(v = c(10, 20), col=&quot;darkgrey&quot;, lwd = 4)
    pieces &lt;- xo[board + 2]
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), pieces, cex = 6)
    # Identify location of any three in a row
    square &lt;- t(matrix(board, nrow = 3))
    hor &lt;- abs(rowSums(square))
    if (any(hor == 3)) 
      hor &lt;- (4 - which(hor == 3)) * 10 - 5 
    else 
      hor &lt;- 0
    ver &lt;- abs(colSums(square))
    if (any(ver == 3)) 
      ver &lt;- which(ver == 3) * 10 - 5 
    else
      ver &lt;- 0
    diag1 &lt;- sum(diag(square))
    diag2 &lt;- sum(diag(t(apply(square, 2, rev)))) # Draw winning lines if (hor &gt; 0) lines(c(0, 30), rep(hor, 2), lwd=10, col=&quot;red&quot;)
    if (ver &gt; 0) lines(rep(ver, 2), c(0, 30), lwd=10, col=&quot;red&quot;)
    if (abs(diag1) == 3) lines(c(2, 28), c(28, 2), lwd=10, col=&quot;red&quot;)
    if (abs(diag2) == 3) lines(c(2, 28), c(2, 28), lwd=10, col=&quot;red&quot;)
}

Human Players

This second code snippet lets a human player move by clicking anywhere on the graphic display using the locator function. The click location is converted to a number to denote the position on the board. The entered field is only accepted if it has not yet been used (the empty variable contains the available fields).

# Human player enters a move
move.human &lt;- function(game) {
    text(4, 0, &quot;Click on screen to move&quot;, col = &quot;grey&quot;, cex=.7)
    empty &lt;- which(game == 0)
    move &lt;- 0
    while (!move %in% empty) {
        coords &lt;- locator(n = 1) # add lines
        coords$x &lt;- floor(abs(coords$x) / 10) + 1
        coords$y &lt;- floor(abs(coords$y) / 10) + 1
        move &lt;- coords$x + 3 * (3 - coords$y)
    }
    return (move)
}

Evaluate the Game

This code snippet defines the eval.game function which assesses the current board and assigns a score. Zero means no outcome, -6 means that the X player has won and +6 implies that the O player has won.

# Evaluate board position
eval.game &lt;- function(game, player) {
    # Determine game score
    square &lt;- t(matrix(game, nrow = 3))
    hor &lt;- rowSums(square)
    ver &lt;- colSums(square)
    diag1 &lt;- sum(diag(square))
    diag2 &lt;- sum(diag(t(apply(square, 2, rev))))
    eval &lt;- c(hor, ver, diag1, diag2)
    # Determine best score
    minimax &lt;- ifelse(player == -1, &quot;min&quot;, &quot;max&quot;)
    best.score &lt;- do.call(minimax, list(eval))
    if (abs(best.score) == 3) best.score &lt;- best.score * 2
    return (best.score)
}

Computer Moves

The computer uses a modified Minimax Algorithm to determine its next move. This article from the Never Stop Building blog and the video below explain this method in great detail.

The next function determines the computer’s move. I have not used a brute-force minimax algorithm to save running time. I struggled building a fully recursive minimax function. Perhaps somebody can help me with this. This code looks only two steps deep and contains a strategic rule to maximise the score.

The first line stores the value of the players move, the second remainder of the matrix holds the evaluations of all the opponents moves. The code adds a randomised variable, based on the strategic value of a field. The centre has the highest value because it is part of four winning lines. Corners have three winning lines and the rest only two winning lines. This means that the computer will, all things being equal, favour the centre over the corners and favour the other fields least. The randomised variables in the code ensure that the computer does not always pick the same field in a similar situation.

# Determine computer move
move.computer &lt;- function(game, player) {
    empty &lt;- which(game == 0)
    eval &lt;- matrix(nrow = 10, ncol = 9, data = 0)
    for (i in empty) {
        game.tmp &lt;- game
        game.tmp[i] &lt;- player
        eval[1, i] &lt;- eval.game(game.tmp, player)
        empty.tmp &lt;- which(game.tmp ==0)
        for (j in empty.tmp) {
            game.tmp1 &lt;- game.tmp
            game.tmp1[j] &lt;- -player
            eval[(j + 1), i] &lt;- eval.game(game.tmp1, -player)
        }
    }
    if (!any(abs(eval[1,]) == 6)) { # When winning, play move
        # Analyse opponent move
        minimax &lt;- ifelse(player == -1, &quot;max&quot;, &quot;min&quot;) # Minimax
        best.opponent &lt;- apply(eval[-1,], 1, minimax)
        eval[1,] &lt;- eval[1,] * -player * best.opponent
    }
    # Add randomisation and strategic values
    board &lt;- c(3, 2, 3, 2, 4, 2, 3, 2, 3) # Strategic values
    board &lt;- sapply(board, function(x) runif(1, 0.1 * x, (0.1 * x) + 0.1)) # Randomise
    eval[1, empty] &lt;- eval[1, empty] + player * board[empty] # Randomise moves
    # Pick best game
    minimax &lt;- ifelse(player == -1, &quot;which.min&quot;, &quot;which.max&quot;) # Minimax
    move &lt;- do.call(minimax, list(eval[1,])) # Select best move
    return(move)
}

This last code snippet enables computers and humans play each other or themselves. The players vector contains the identity of the two players so that a human can play a computer or vice versa. The human player moves by clicking on the screen.

The loop keeps running until the board is full or a winner has been identified. A previous Tic Tac Toe post explains the draw.board function.

# Main game engine
tic.tac.toe &lt;- function(player1 = &quot;human&quot;, player2 = &quot;computer&quot;) {
    game &lt;- rep(0, 9) # Empty board
    winner &lt;- FALSE # Define winner
    player &lt;- 1 # First player
    players &lt;- c(player1, player2)
    draw.board(game)
    while (0 %in% game &amp; !winner) { # Keep playing until win or full board
        if (players[(player + 3) %% 3] == &quot;human&quot;) # Human player
            move &lt;- move.human(game)
        else # Computer player
            move &lt;- move.computer(game, player)
        game[move] &lt;- player # Change board
        draw.board(game)
        winner &lt;- max(eval.game(game, 1), abs(eval.game(game, -1))) == 6 # Winner, winner, chicken dinner?
        player &lt;- -player # Change player
    }
}

You can play the computer by running all functions and then entering tic.tac.toe().

I am pretty certain this simplified minimax algorithm is unbeatable—why don’t you try to win and let me know when you do.

Tic Tac Toe War Games

Now that this problem is solved, I can finally recreate the epic scene from the WarGames movie. The Tic Tac Toe War Games code uses the functions explained above and the animation package. Unfortunately, there are not many opportunities to create sound in R.

# WAR GAMES TIC TAC TOE
source(&quot;Tic Tac Toe/Tic Tac Toe.R&quot;)

# Draw the game board
draw.board.wargames &lt;- function(game) {
    xo &lt;- c(&quot;X&quot;, &quot; &quot;, &quot;O&quot;) # Symbols
    par(mar = rep(1,4), bg = &quot;#050811&quot;)
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col = &quot;#588fca&quot;, lwd = 20)
    abline(v = c(10, 20), col = &quot;#588fca&quot;, lwd = 20)
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), xo[game + 2], cex = 20, col = &quot;#588fca&quot;)
    text(1,0,&quot;r.prevos.net&quot;, col = &quot;#588fca&quot;, cex=2)
    # Identify location of any three in a row
    square &lt;- t(matrix(game, nrow = 3))
    hor &lt;- abs(rowSums(square))
    if (any(hor == 3)) 
        hor &lt;- (4 - which(hor == 3)) * 10 - 5 
    else 
        hor &lt;- 0
    ver &lt;- abs(colSums(square))
    if (any(ver == 3)) 
        ver &lt;- which(ver == 3) * 10 - 5 
    else
        ver &lt;- 0
    diag1 &lt;- sum(diag(square))
    diag2 &lt;- sum(diag(t(apply(square, 2, rev)))) # Draw winning lines if (all(hor &gt; 0)) for (i in hor) lines(c(0, 30), rep(i, 2), lwd = 10, col=&quot;#588fca&quot;)
    if (all(ver &gt; 0)) for (i in ver) lines(rep(i, 2), c(0, 30), lwd = 10, col=&quot;#588fca&quot;)
    if (abs(diag1) == 3) lines(c(2, 28), c(28, 2), lwd = 10, col = &quot;#588fca&quot;)
    if (abs(diag2) == 3) lines(c(2, 28), c(2, 28), lwd = 10, col = &quot;#588fca&quot;)
}

library(animation)
player &lt;- -1
games &lt;- 100
saveGIF ({
    for (i in 1:games) {
        game &lt;- rep(0, 9) # Empty board
        winner &lt;- 0 # Define winner
        #draw.board.wargames(game)
        while (0 %in% game &amp; !winner) { # Keep playing until win or full board
            empty &lt;- which(game == 0)
            move &lt;- move.computer(game, player)
            game[move] &lt;- player
            if (i &lt;= 12) draw.board.wargames(game)
            winner &lt;- max(eval.game(game, 1), abs(eval.game(game, -1))) == 6
            player &lt;- -player } if (i &gt; 12) draw.board.wargames(game)
    }
},
interval = c(unlist(lapply(seq(1, 0,-.2), function (x) rep(x, 9))), rep(0,9*94)), 
movie.name = &quot;wargames.gif&quot;, ani.width = 1024, ani.height = 1024)

View the latest version of this code on GitHub.

Data Pseudo-Science: Developing a Biorhythm Calculator

biorhythm: data pseudo-scienceData science is a serious occupation. Just like any other science, however, it can also be used for spurious topics, such as calculating your biorhythm.

This post provides an example of data Pseudo-Science though a function that calculates and visualises your biorhythm. Based on the graph, I must be having a great day right now.

The broader and more pertinent message in this post is that data pseudo-science is more common than you would think. Is our belief in machine learning justified or are some of these models also a pseudo-science with not much more reliability than a biorhythm?

Biorhythm Theory

The idea that our physical states follow a predetermined rhythm has been around as long as mathematics. The basic concept of biorhythm is that a regular sinusoid cycle accurately describes our physical, emotional and intellectual states. Each of these three cycles has a different wavelength (w ):

  • physical: w = 23 days
  • emotional: w = 28 days
  • intellectual: w = 33 days

The cycle is calculated with \sin (2 \pi t / w) , where t indicates the number of days since birth. This idea was developed by German surgeon Wilhelm Fliess in the late 19th century and was popularised in the United States in the late 1970s. There is no scientific evidence of the validity of this theory but it is an entertaining way to play with data.

The combination of the 23- and 28-day cycles repeats every 644 days, while the triple combination of 23-, 28-, and 33-day cycles repeats every 21,252 days, 58 years, two months and three weeks. You can, by the way, never reach a point where all cycles are maximised. The best you can achieve is 299.7 our of a maximum 300 which occurs when you are 17,003 days old.

Calculating your Biorhythm

When I was a teenager in the 1980s, several books and magazines described computer code to calculate your biorhythm. I used to play with these functions on my Atari 130XE computer.

Building a biorhythm calculator in R is easy. This function takes two dates as input and plots the biorhythm for the two weeks before and after the date. To calculate your biorhythm, run the function with your date of birth and target date: biorhythm(“yyyy-mm-dd”). The default version uses today as the target.

library(ggplot2)
library(reshape2)
biorhythm <- function(dob, target = Sys.Date()) {
    dob <- as.Date(dob)
    target <- as.Date(target)
    t <- round(as.numeric(difftime(target, dob)))
    days <- (t - 14) : (t + 14)
    period <- data.frame(Date = seq.Date(from = target - 15, by = 1, length.out = 29),
                         Physical = sin (2 * pi * days / 23) * 100, 
                         Emotional = sin (2 * pi * days / 28) * 100, 
                         Intellectual = sin (2 * pi * days / 33) * 100)
    period <- melt(period, id.vars = "Date", variable.name = "Biorhythm", value.name = "Percentage")
    ggplot(period, aes(x = Date, y = Percentage, col = Biorhythm)) + geom_line() +  
        ggtitle(paste("DoB:", format(dob, "%d %B %Y"))) + 
        geom_vline(xintercept = as.numeric(target))
}

biorhythm("1969-09-12", "2017-03-30")

View the latest version of this code on GitHub.

Biorhythms are an early attempt for human beings to predict the future. Although there is no relationship between this algorithm and reality, many people believed in its efficacy. Does the same hold true for the hyped capabilities of machine learning?

Data Pseudo-Science

Data pseudo-science is not only an issue when people use spurious mathematical relationships such as biorhythms or astrology. This post is also written as a warning not to only rely on numerical models to predict qualitative aspects of life.

The recent failures in predicting the results of elections, even days before the event, are a case in point. There are many reasons machine learning methods can go wrong. When machine learning algorithms fail, they are often just as useful as a biorhythm. It would be fun to write a predictive analysis package for R using only pseudoscientific approaches such as I-Ching, astrology or biorhythm.

Euler Problem 17: Number Letter Counts

Euler Problem 17: written numbersEuler Problem 17 asks to count the letters in numbers written as words. This is a skill we all learnt in primary school mainly useful when writing cheques—to those that still use them.

Each language has its own rules for writing numbers. My native language Dutch has very different logic to English. Both Dutch and English use compound words after the number twelve.

Linguists have theorised this is evidence that early Germanic numbers were duodecimal. This factoid is supported by the importance of a “dozen” as a counting word and the twelve hours in the clock. There is even a Dozenal Society that promotes the use of a number system based on 12.

The English language changes the rules when reaching the number 21. While we say eight-teen in English, we do no say “one-twenty”. Dutch stays consistent and the last number is always spoken first. For example, 37 in English is “thirty-seven”, while in Dutch it is written as “zevenendertig” (seven and thirty).

Euler Problem 17 Definition

If the numbers 1 to 5 are written out in words: one, two, three, four, five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total. If all the numbers from 1 to 1000 (one thousand) inclusive were written out in words, how many letters would be used?

NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and forty-two) contains 23 letters and 115 (one hundred and fifteen) contains 20 letters. The use of “and” when writing out numbers is in compliance with British usage.

Solution

The first piece of code provides a function that generates the words for numbers 1 to 999,999. This is more than the problem asks for, but it might be a useful function for another application. The last line concatenates all words together and removes the spaces.

numword.en <- function(x) { if (x > 999999) return("Error: Oustide my vocabulary")
    # Vocabulary 
    single <- c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")
    teens <- c( "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen")
    tens <- c("ten", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety")
    # Translation
    numword.10 <- function (y) {
        a <- y %% 100
        if (a != 0) {
            and <- ifelse(y > 100, "and", "")
            if (a < 20)
                return (c(and, c(single, teens)[a]))
            else
                return (c(and, tens[floor(a / 10)], single[a %% 10]))
        }
    }
    numword.100 <- function (y) {
        a <- (floor(y / 100) %% 100) %% 10
        if (a != 0)
            return (c(single[a], "hundred"))
    }
    numword.1000 <- function(y) {
        a <- (1000 * floor(y / 1000)) / 1000
        if (a != 0)
            return (c(numword.100(a), numword.10(a), "thousand"))
    }
    numword <- paste(c(numword.1000(x), numword.100(x), numword.10(x)), collapse=" ")
    return (trimws(numword))
}

answer <- nchar(gsub(" ", "", paste0(sapply(1:1000, numword.en), collapse="")))
print(answer)

Writing Numbers in Dutch

I went beyond Euler Problem 17 by translating the code to spell numbers in Dutch. Interesting bit of trivia is that it takes 307 fewer characters to spell the numbers 1 to 1000 in Dutch than it does in English.

It would be good if other people can submit functions for other languages in the comment section. Perhaps we can create an R package with a multi-lingual function for spelling numbers.

numword.nl <- function(x) {
    if (x > 999999) return("Error: Getal te hoog.")
    single <- c("een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht", "negen")
    teens <- c( "tien", "elf", "twaalf", "dertien", "veertien", "fifteen", "zestien", "zeventien", "achtien", "negentien")
    tens <- c("tien", "twintig", "dertig", "veertig", "vijftig", "zestig", "zeventig", "tachtig", "negengtig")
    numword.10 <- function(y) {
        a <- y %% 100
        if (a != 0) {
            if (a < 20)
                return (c(single, teens)[a])
            else
                return (c(single[a %% 10], "en", tens[floor(a / 10)]))
        }
    }
    numword.100 <- function(y) {
        a <- (floor(y / 100) %% 100) %% 10
        if (a == 1)
            return ("honderd")
        if (a > 1) 
            return (c(single[a], "honderd"))
    }
    numword.1000 <- function(y) {
        a <- (1000 * floor(y / 1000)) / 1000
        if (a == 1)
            return ("duizend ")
        if (a > 0)
            return (c(numword.100(a), numword.10(a), "duizend "))
    }
    numword<- paste(c(numword.1000(x), numword.100(x), numword.10(x)), collapse="")
    return (trimws(numword))
}

antwoord <- nchar(gsub(" ", "", paste0(sapply(1:1000, numword.nl), collapse="")))
print(antwoord)

print(answer - antwoord)

View the latest version of this code on GitHub.

Euler Problem 16: Power Digit Sum

Euler Problem 16: Power Digit SumEuler Problem 16 is reminiscent of the famous fable of wheat and chess. Lahur Sessa invented the game of chess for King Iadava. The king was very pleased with the game and asked Lahur to name his reward.

Lahur asked the king to place one grain of rice on the first square of a chessboard, two on the next square, four on the third square and so on until the board is filled. The king was happy with his humble request until his mathematicians worked out that it would take millions of tonnes of grain. Assuming there are 25 grains of wheat in a gramme, the last field will contain more than 461,168,602,000 tonnes of grain.

Euler Problem 16 Definition

2^{15} = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26 . What is the sum of the digits of the number 2^{1000} ?

Solution

The most straightforward solution uses the GMP package for Multiple Precision Arithmetic to calculate big integers. The as.bigz function results in a special class of arbitrarily large integer numbers

# Raise 2 to the power 1000
library(gmp)
digits <- as.bigz(2^1000) # Define number
# Sum all digits
answer <- sum(as.numeric(unlist(strsplit(as.character(digits), ""))))
print(answer)

We can also solve this problem in base-r with the bigg.add function which I developed for Euler Problem 13. This function uses basic string operations to add to arbitrarily large numbers. Raising a number to the power of two can also be written as a series of additions:

2^4 = 2 \times 2 \times 2 \times 2 = ((2+2)+(2+2)) + ((2+2)+(2+2))

The solution to this problem is to add 2 + 2 then add the outcome of that equation to itself, and so on. Repeat this one thousand times to raise the number two to the power of one thousand.

# Raise 2 to the power 1000
pow <- 2
for (i in 2:1000)
    pow <- big.add(pow, pow)
# Sum all digits
answer <- sum(as.numeric(unlist(strsplit(pow, ""))))
print(answer)

View the latest version of this code on GitHub.

Tic Tac Toe Simulation — Random Moves

Tic Tac Toe Simulation - Part 1Tic Tac Toe might be a futile children’s game but it can also teach us about artificial intelligence. Tic Tac Toe, or Naughts and Crosses, is a zero-sum game with perfect information. Both players know exactly what the other did and when nobody makes a mistake, the game will always end in a draw.

Tic Tac Toe is a simple game but also the much more complex game of chess is a zero-sum game with perfect information.

In this two-part post, I will build an unbeatable Tic Tac Toe Simulation. This first part deals with the mechanics of the game. The second post will present an algorithm for a perfect game.

Drawing the Board

This first code snippet draws the Tic Tac Toe simulation board. The variable xo holds the identity of the pieces and the vector board holds the current game. Player X is denoted with -1 and player O with +1. The first part of the function draws the board and the naughts and crosses. The second part of the code check for three in a row and draws the corresponding line.

draw.board <- function(board) { # Draw the board
    xo <- c("X", " ", "O") # Symbols
    par(mar = rep(0,4))
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col="darkgrey", lwd = 4)
    abline(v = c(10, 20), col="darkgrey", lwd = 4)
    pieces <- xo[board + 2]
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), pieces, cex = 6)
    # Identify location of any three in a row
    square <- t(matrix(board, nrow = 3))
    hor <- abs(rowSums(square))
    if (any(hor == 3)) 
      hor <- (4 - which(hor == 3)) * 10 - 5 
    else 
      hor <- 0
    ver <- abs(colSums(square))
    if (any(ver == 3)) 
      ver <- which(ver == 3) * 10 - 5 
    else
      ver <- 0
    diag1 <- sum(diag(square))
    diag2 <- sum(diag(t(apply(square, 2, rev)))) 
    # Draw winning lines 
    if (hor > 0) lines(c(0, 30), rep(hor, 2), lwd=10, col="red")
    if (ver > 0) lines(rep(ver, 2), c(0, 30), lwd=10, col="red")
    if (abs(diag1) == 3) lines(c(2, 28), c(28, 2), lwd=10, col="red")
    if (abs(diag2) == 3) lines(c(2, 28), c(2, 28), lwd=10, col="red")
}

Random Tic Tac Toe

The second part of the code generates ten random games and creates and animated GIF-file. The code adds random moves until one of the players wins (winner <> 0) or the board is full (no zeroes in the game vector). The eval.winner function checks for three in a row and declares a winner when found.

There are 255,168 possible legal games in Tic Tac Toe, 46,080 of which end in a draw. This implies that these randomised games result in a draw 18% of the time.

eval.winner <- function(board) { # Identify winner
    square <- t(matrix(board, nrow = 3))
    hor <- rowSums(square)
    ver <- colSums(square)
    diag1 <- sum(diag(square))
    diag2 <- sum(diag(t(apply(square, 2, rev))))
    if (3 %in% c(hor, ver, diag1, diag2)) return (1)
    else
        if (-3 %in% c(hor, ver, diag1, diag2)) return (2)
    else
        return(0)
}

# Random game
library(animation)
saveGIF ({
 for (i in 1:10) {
 game <- rep(0, 9) # Empty board
 winner <- 0 # Define winner
 player <- -1 # First player
 draw.board(game)
 while (0 %in% game & winner == 0) { # Keep playing until win or full board
   empty <- which(game == 0) # Define empty squares
   move <- empty[sample(length(empty), 1)] # Random move
   game[move] <- player # Change board
   draw.board(game)
   winner <- eval.winner(game) # Evaulate game
   player <- player * -1 # Change player
 }
 draw.board(game)
 }
 },
 interval = 0.25, movie.name = "ttt.gif", ani.width = 600, ani.height = 600)

View the latest version of this code on GitHub.

Tic Tac Toe Simulation

In a future post, I will outline how to program the computer to play against itself, just like in the 1983 movie War Games.

Euler Problem 15: Pathways Through a Lattice

Euler Problem 15 analyses taxicab geometry. This system replaces the usual distance function with the sum of the absolute differences of their Cartesian coordinates. In other words, the distance a taxi would travel in a grid plan. The fifteenth Euler problem asks to determine the number of possible routes a taxi can take in a city of a certain size.

Euler Problem 15 Definition

Starting in the top left corner of a 2×2 grid, and only being able to move to the right and down, there are exactly 6 routes to the bottom right corner. How many possible routes are there through a 20×20 grid?

Euler Problem 15: Lattice Paths

Solution

The defined lattice is one larger than the number of squares. Along the edges of the matrix, only one pathway is possible: straight to the right or down. We can calculate the number of possible pathways for the remaining number by adding the number to the right and below the point.

p_{i,j}=p_{i,j-1}+p_{{i+1},j}

For the two by two lattice the solution space is:

6  3  1
3  2  1
1  1  0

The total number of pathways from the upper left corner to the lower right corner is thus 6. This logic can now be applied to a grid of any arbitrary size using the following code.

The code defines the lattice and initiates the boundary conditions. The bottom row and the right column are filled with 1 as there is only one solution from these points. The code then calculates the pathways by working backwards through the matrix. The final solution is the number is the first cell.

# Define lattice
nLattice <- 20
lattice = matrix(ncol=nLattice + 1, nrow=nLattice + 1)

# Boundary conditions
lattice[nLattice + 1,-(nLattice + 1)] <- 1
lattice[-(nLattice + 1), nLattice + 1] <- 1

# Calculate Pathways
for (i in nLattice:1) {
    for (j in nLattice:1) {
        lattice[i,j] <- lattice[i+1, j] + lattice[i, j+1]
    }
}

answer <- lattice[1,1]

View the latest version of this code on GitHub.

Taxicab Geometry