Large integers in R: Fibonacci number with 1000 digits, Euler Problem 25

The Fibonacci Sequence in nature: The nautilus shell. Euler problem 25

The Fibonacci Sequence occurs in nature: The nautilus shell.

Euler Problem 25 takes us back to the Fibonacci sequence and the problems related to working with very large integers.

The Fibonacci sequence follows a simple mathematical rule but it can create things of great beauty. This pattern occurs quite often in nature, like to nautilus shell shown in the image. The video by Arthur Benjamin at the end of this post illustrates some of the magic of this sequence.

Large Integers in R

By default, numbers with more than 7 digits are shown in scientific notation in R, which reduces the accuracy of the calculation. You can change the precision of large integers with the options function but R struggles with integers with more than 22 digits. This example illustrates this issue.

factorial(24)
[1] 6.204484e+23
> options(digits=22)
> factorial(24)
[1] 620448401733239409999872

However, finding a number of 1000 digits is a problem with using special functions.

Euler Problem 25 Definition

The Fibonacci sequence is defined by the recurrence relation:

F_n = F_{n-1} + F_{n-2} , where F_1 = 1 and F_2 = 1 . The 12th term, F_{12} , is the first term to contain three digits.

What is the index of the first term in the Fibonacci sequence to contain 1000 digits?

Proposed Solutions

The first solution uses the GMP library to manage very large integers. This library also contains a function to generate Fibonacci numbers. This solution cycles through the Fibonacci sequence until it finds a number with 1000 digits.

library(gmp) # GNU Multiple Precision Arithmetic Library
n <- 1
fib <- 1
while (nchar(as.character(fib)) < 1000) {
   fib <- fibnum(n) # Determine next Fibonacci number
   n <- n + 1
}
answer <- n
print(answer)

This is a very fast solution but my aim is to solve the first 100 Project Euler problems using only base-R code. The big.add function I developed to solve Euler Problem 13.

t <- proc.time()
fib <- 1 # First Fibonaci number
cur <- 1 # Current number in sequence
pre <- 1 # Previous number in sequence
index <- 2
while (nchar(fib) < 1000) {
    fib <- big.add(cur, pre) # Determine next Fibonacci number
    pre <- cur
    cur <- fib
    index <- index + 1
}
answer <- index
print(answer)

This code is much slower than the GMP library but is was fun to develop.

The Magic of the Fibonacci Numbers

Lexicographic Permutations: Euler Problem 24

Euler Problem 24 asks to develop lexicographic permutations which are ordered arrangements of objects in lexicographic order. Tushar Roy of Coding Made Simple has shared a great introduction on how to generate lexicographic permutations.

Euler Problem 24 Definition

A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are:

012 021 102 120 201 210

What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?

Brute Force Solution

The digits 0 to 9 have 10! = 3628800 permutations (including combinations that start with 0). Most of these permutations are, however, not in lexicographic order. A brute-force way to solve the problem is to determine the next lexicographic permutation of a number string and repeat this one million times.

nextPerm <- function(a) {
    # Find longest non-increasing suffix
    i <- length(a) while (i > 1 && a[i - 1] >= a[i])
        i <- i - 1
    # i is the head index of the suffix
    # Are we at the last permutation?
    if (i <= 1) return (NA)
    # a[i - 1] is the pivot
    # Find rightmost element that exceeds the pivot
    j <- length(a)
    while (a[j] <= a[i - 1]) 
        j <- j - 1
    # Swap pivot with j
    temp <- a[i - 1]
    a[i - 1] <- a[j]
    a[j] <- temp
    # Reverse the suffix
    a[i:length(a)] <- rev(a[i:length(a)])
    return(a)
    }

numbers <- 0:9
for (i in 1:(1E6 - 1)) numbers <- nextPerm(numbers)
answer <- numbers
print(answer)

This code takes the following steps:

  1. Find largest index i such that a_{i-1} < a_i.
    1. If no such index exists, then this is already the last permutation.
  2. Find largest index j such that j \geq i and a_j > a_{i-1}.
  3. Swap a_j and a_{i-1}.
  4. Reverse the suffix starting at a_i.

Combinatorics

A more efficient solution is to use combinatorics, thanks to MathBlog. The last nine digits can be ordered in 9! = 362880 ways. So the first 9! permutations start with a 0. By extending this thought, it follows that the millionth permutation must start with a 2.

\lfloor (1000000 - 1) / 9! \rfloor  = 2

From this rule, it follows that the 725761st permutation is 2013456789. We now need 274239 more lexicographic permutations:

(1000000 - 1) - (2 \times 9!) = 274239

We can repeat this logic to find the next digit. The last 8 digits can be ordered in 40320 ways. The second digit is the 6th digit in the remaining numbers, which is 7 (2013456789).

\lfloor 274239 / 8! \rfloor  = 6

274239 - (6 \times 7!) = 32319

This process is repeated until all digits have been used.

numbers <- 0:9
n <- length(numbers)
answer <- vector(length = 10)
remain <- 1E6 - 1
for (i in 1:n) {
    j <- floor(remain / factorial(n - i))
    answer[i] <- numbers[j + 1]
    remain <- remain %% factorial(n - i)
    numbers <- numbers[-(j + 1)]
}
answer <- paste(answer, collapse = "")
print(answer)

R blogger Tony’s Bubble Universe created a generalised function to solve this problem a few years ago.

Tic Tac Toe Part 3: The Minimax Algorithm

Tic Tac Toe minimax alogrithmIn two previous posts, I presented code to teach R to play the trivial game of Tic Tac Toe. I thought this was an unbeatable algorithm. Alas, a comment from Alberto shattered my pride as he was able to beat my code.

The reason for the demise of my code was that I didn’t implement a full minimax algorithm, but instead looked only two moves ahead. I thought the common strategy rules (start in the centre and occupy the corners) would make the program unbeatable. When I simulated the game by instructing the computer to play against itself, Alberto’s strategy never arose because the code forces the centre field. Alberto’s code shows that you need to look at least three moves ahead for a perfect game. He has been so kind to share his code and gave me permission to publish it.

Alberto recreated two functions, for completeness I have added the complete working code that merges his improvements with my earlier work. The first two functions are identical to the previous post. These functions draw the game board and process the human player’s move by waiting for a mouse click.

# Draw the game board
draw.board <- function(game) {
    xo <- c("X", " ", "O") # Symbols
    par(mar = rep(1,4))
    plot.new()
    plot.window(xlim = c(0,30), ylim = c(0,30))
    abline(h = c(10, 20), col="darkgrey", lwd = 4)
    abline(v = c(10, 20), col="darkgrey", lwd = 4)
    text(rep(c(5, 15, 25), 3), c(rep(25, 3), rep(15,3), rep(5, 3)), xo[game + 2], cex = 4)
    # Identify location of any three in a row
    square <- t(matrix(game, nrow = 3))
    hor <- abs(rowSums(square))
    if (any(hor == 3)) 
        hor <- (4 - which(hor == 3)) * 10 - 5 
    else 
        hor <- 0
    ver <- abs(colSums(square))
    if (any(ver == 3)) 
        ver <- which(ver == 3) * 10 - 5 
    else
        ver <- 0
    diag1 <- sum(diag(square))
    diag2 <- sum(diag(t(apply(square, 2, rev)))) 
    # Draw winning lines
    if (all(hor > 0)) 
        for (i in hor) lines(c(0, 30), rep(i, 2), lwd = 10, col="red")
    if (all(ver > 0)) 
        for (i in ver) lines(rep(i, 2), c(0, 30), lwd = 10, col="red")
    if (abs(diag1) == 3) 
        lines(c(2, 28), c(28, 2), lwd = 10, col = "red")
    if (abs(diag2) == 3) 
        lines(c(2, 28), c(2, 28), lwd = 10, col = "red")
}

# Human player enters a move
move.human <- function(game) {
    text(4, 0, "Click on screen to move", col = "grey", cex=.7)
    empty <- which(game == 0)
    move <- 0
    while (!move %in% empty) {
        coords <- locator(n = 1) # add lines
        coords$x <- floor(abs(coords$x) / 10) + 1
        coords$y <- floor(abs(coords$y) / 10) + 1
        move <- coords$x + 3 * (3 - coords$y)
    }
    return (move)
}

Alberto rewrote the functions that analyse the board and determine the move of the computer. The ganador (Spanish for winning) function assesses the board condition by assigning -10 or + 10 for a winning game and 0 for any other situation.

ganador <- function(juego, player) {
    game <- matrix(juego, nrow = 3, byrow = T)
    hor <- rowSums(game)
    ver <- colSums(game)
    diag <- c(sum(diag(game)), sum(diag(apply(game, 1, rev))))
    if (-3 %in% c(hor, ver, diag))
        return(-10)
    if (3 %in% c(hor, ver, diag))
        return(10)
    else
        return(0)
}

The next function is the actual minimax algorithm. If the computer starts then the first move (9!= 362880 options to assess) takes a little while. The commented lines can be used to force a corner and make the games faster by forcing a random corner.

The minimax function returns a list with the move and its valuation through the ganador function. The function works recursively until it has filled the board and retains the best scoring move using the minimax method. To avoid the computer always playing the same move in the same situation random variables are added.

minimax <- function(juego, player) {
    free <- which(juego == 0)
    if(length(free) == 1) {
        juego[free] <- player
        return(list(move = free, U = ganador(juego, player)))
    }
    poss.results <- rep(0, 9)
    for(i in free) {
        game <- juego
        game[i] <- player
        poss.results[i] <- ganador(game, player)
    }
    mm <- ifelse(player == -1, "which.min", "which.max")
    if(any(poss.results == (player * 10))) {
        move <- do.call(mm, list(poss.results))
        return(list(move = move, U = poss.results[move]))
    }
    for(i in free) {
        game <- juego
        game[i] <- player
        poss.results[i] <- minimax(game, -player)$U
    }
    random <- runif(9, 0, 0.1)
    poss.results[-free] <- 100 * -player
    poss.results <- poss.results + (player * random)
    move <- do.call(mm, list(poss.results))
    return(list(move = move, U = poss.results[move]))
}

This final function stitches everything together and lets you play the game. Simply paste all functions in your R console and run them to play a game. The tic.tac.toe function can take two parameters, “human” and/or “computer”. The order of the parameters determines who starts the game.

# Main game engine
tic.tac.toe <- function(player1 = "human", player2 = "computer") {
    game <- rep(0, 9) # Empty board
    winner <- FALSE # Define winner
    player <- 1 # First player
    players <- c(player1, player2)
    draw.board(game)
    while (0 %in% game & !winner) { # Keep playing until win or full board
        if (players[(player + 3) %% 3] == "human") # Human player
            move <- move.human(game)
        else { # Computer player
            move <- minimax(game, player)
            move <- move$move
            }
        game[move] <- player # Change board
        draw.board(game)
        winner <- max(eval.game(game, 1), abs(eval.game(game, -1))) == 6 # Winner, winner, chicken dinner?
        player <- -player # Change player
    }
}

tic.tac.toe()

This is my last word on Tic Tac Toe but now that the minimax conundrum is solved I could start working on other similar games such as Connect Four, Draughts or even the royal game of Chess.

Euler Problem 23: Non-Abundant Sums

A demonstration of the abundance of the number 12 using Cuisenaire rods (Euler Problem 23)

A demonstration of the abundance of the number 12 using Cuisenaire rods (Wikipedia).

Euler problem 23 asks us to solve a problem with abundant or excessive numbers.

These are numbers for which the sum of its proper divisors is greater than the number itself.

12 is an abundant number because the sum of its proper divisors (the aliquot sum) is larger than 12: (1 + 2 + 3 + 4 + 6 = 16).

All highly composite numbers or anti-primes greater than six are abundant numbers. These are numbers that have so many divisors that they are considered the opposite of primes, as explained in the Numberphile video below.

Euler Problem 23 Definition

A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.

A number n is called deficient if the sum of its proper divisors is less than n and it is called abundant if this sum exceeds n.

As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24. By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis, even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.

Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.

Solution

This solution repurposes the divisors function that determines the proper divisors for a number, introduced for Euler Problem 21. The first code snippet creates the sequence of all abundant numbers up to 28123 (sequence A005101 in the OEIS). An abundant number is one where its aliquot sum is larger than n.

# Generate abundant numbers (OEIS A005101)
A005101 <- function(x){
    abundant <- vector()
    a <- 1
    for (n in 1:x) {
        aliquot.sum <- sum(proper.divisors(n)) - n 
        if (aliquot.sum > n) {
            abundant[a] <- n
            a <- a + 1
        }
    }
    return(abundant)
}

abundant <- A005101(28123)

The solution to this problem is also a sequence in the Online Encyclopedia of Integer Sequences (OEIS A048242). This page states that the highest number in this sequence is 20161, not 28123 as stated in the problem definition.

The second section of code creates a list of all potential numbers not the sum of two abundant numbers. The next bit of code sieves any sum of two abundant numbers from the list. The answer is determined by adding remaining numbers in the sequence.

# Create a list of potential numbers that are not the sum of two abundant numbers
A048242 <- 1:20161

# Remove any number that is the sum of two abundant numbers
for (i in 1:length(abundant)) {
    for (j in i:length(abundant)) {
        if (abundant[i] + abundant[j] <= 20161) {
            A048242[abundant[i] + abundant[j]] <- NA
        }
    }
}
A048242 <- A048242[!is.na(A048242)]
answer <- sum(A048242)
print(answer)

The Sierpinski Triangle: Visualising infinity in R

Sierpinski triangleWacław Sierpiński was a mathematical genius who developed several of the earliest fractals. The Sierpiński triangle is an easy to conceptualise geometrical figure but it hides a fascinating mathematical complexity. Start by drawing an equilateral triangle and draw another one in its centre. Then draw equilateral triangles in the four resulting triangles, and so on, ad infinitum.

The original Sierpinski triangle will eventually disappear into Cantor dust, a cloud of ever shrinking triangles of infinitesimal size. The triangle is self-similar, no matter how far you zoom in, the basic geometry remains the same.

The Chaos Game

A fascinating method to create a Sierpinski Triangle is a chaos game. This method uses random numbers and some simple arithmetic rules. Sierpinski Triangles can be created using the following six steps:

  1. Define three points in a plane to form a triangle.
  2. Randomly select any point on the plane.
  3. Randomly select any one of the three triangle points.
  4. Move half the distance from your current position to the selected vertex.
  5. Plot the current position.
  6. Repeat from step 3.

This fractal is an implementation of chaos theory as this random process attracts to a complex ordered geometry. The game only works with random numbers and when selecting random vertices of the triangle.

Sierpinski Triangle Code

This code implements the six rules in R. The code first initializes the triangle, defines a random starting point and then runs a loop to place random dots. The R plot engine does not draw pixels but uses characters, which implies that the diagram is not as accurate as it could be but the general principle is clear. The x(11) and Sys.sleep() commands are used to plot during the for-loop.

# Sierpinsky Triangle

# Initialise triangle
p <- c(0, 500, 1000)
q <- c(0, 1000, 0)
x11()
par(mar = rep(0, 4))
plot(p, q, col= "red", pch = 15, cex = 1, axes = FALSE)

# Random starting point
x <- sample(0:1000, 1)
y <- sample(0:1000, 1)

# Chaos game
for (i in 1:10000) {
    Sys.sleep(.001)
    n <- sample(1:3, 1)
    x <- floor(x + (p[n] - x) / 2)
    y <- floor(y + (q[n] - y) / 2)
    points(x, y, pch = 15, cex = 0.5)
}

This algorithm demonstrates how a seemingly chaotic process can result in order. Many other versions of chaos games exist, which I leave to the reader to play with. If you create your own versions then please share the code in the comment box below.

Euler Problem 22 : Names Scores

Euler Problem 22

R logo in ASCII art by picascii.com

Euler problem 22 is another trivial one that takes us to the realm of ASCII codes. ASCII is a method to convert symbols into numbers, originally invented for telegraphs.

Back in the 8-bit days, ASCII art was a method to create images without using lots of memory. Each image consists of a collection of text characters that give the illusion of an image. Euler problem 22 is, unfortunately, a bit less poetic.

Euler Problem 22 Definition

Using names.txt, a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49,714.

What is the total of all the name scores in the file?

Solution

This code reads and cleans the file and sorts the names alphabetically. The charToRaw function determines the numerical value of each character in each name. This output of this function is the hex ASCII code for each character. The letter A is number 65, so we subtract 64 from each value to get the sum total.

# ETL: reads the file and converts it to an ordered vector.
names <- readLines("https://projecteuler.net/project/resources/p022_names.txt", warn = F)
names <- unlist(strsplit(names, ","))
names <- gsub("[[:punct:]]", "", names)
names <- sort(names)

# Total Name scores
answer <- 0
for (i in names) {
    value <- sum(sapply(unlist(strsplit(i, "")), function(x) as.numeric(charToRaw(x)) - 64))
    value <- value * which(names==i)
    answer <- answer + value
}
print(answer)

We can have a bit more fun with this problem by comparing this list with the most popular baby names in 2016. The first section of the code extracts the list of popular names from the website. The rest of the code counts the number of matches between the lists.

# Most popular baby names
library(rvest)
url <- "https://www.babycenter.com/top-baby-names-2016.htm"
babynames <- url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="babyNameList"]/table') %>%
    html_table()
babynames <- babynames[[1]]

# Convert Project Euler list and test for matches
proper=function(x) paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
names <- proper(names)

sum(babynames$GIRLS %in% names)
sum(babynames$BOYS %in% names)

Euler Problem 21: Amicable Numbers

Euler Problem 21: Amicable numbers key chainEuler problem 21 takes us to the realm of amicable numbers, which are listed in sequence A259180 in the OEIS. Amicable, or friendly, numbers are the most romantic numbers known to maths. Amicable numbers serve absolutely no practical purpose, other than mathematical entertainment.

A related concept is a perfect number, which is a number that equals the sum of its proper divisors. Mathematicians have also defined sociable numbers and betrothed numbers which are similar to amicable numbers. But perhaps these are for another Euler problem.

Euler Problem 21 Definition

Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n). If d(a) = b and d(b) = a, where a \neq b, then a and b are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so, d(284) = 220.

Evaluate the sum of all the amicable numbers under 10000.

Solution

The first part of the code provides for a function to list all proper divisors for a given integer x. The loop determines the divisors for the numbers 220 to 10,000, calculates their sum and then checks if these numbers are amicable. When the code finds an amicable number, the counter jumps to the sum of the divisors to check for the next one.

proper.divisors <- function(x) {
    divisors <- vector()
    d <- 1
    for (i in 1:floor(sqrt(x))) {
        if (x %% i == 0) {
            divisors[d] <- i
            if (i != x/i) {
                d <- d + 1
                divisors[d] <- x / i
            }
            d <- d + 1
        }
    }
    return(divisors)
}

answer <- 0
n <- 220
while (n <= 10000) {
    div.sum <- sum(proper.divisors(n)) - n
    if (n == sum(proper.divisors(div.sum)) - div.sum & n != div.sum) {
        answer <- answer + n + div.sum
        print(paste0("(", n, ",", div.sum, ")"))
        n <- div.sum
    }
    n <- n + 1
}
print(answer)

Amicable numbers were known to the Pythagoreans, who credited them with many mystical properties. Before we had access to computers, finding amicable numbers was a task that required a lot of patience. No algorithm can systematically generate all amicable numbers, and until 1946 only 390 pairs were known. Medieval Muslim mathematicians developed several formulas to create amicable numbers, but the only way to be complete is using brute force.

Euler Problem 20: Large Integer Factorials

Euler Problem 20Euler Problem 20 is the third problem that requires special consideration for working with very large integers. In this problem, we look at factorials. These numbers are useful in combinatorics if, for example, you like to know in how many ways you can arrange a deck of cards.

The problem with computing factorials is that they are mostly very large numbers, beyond the generic capabilities of computers to process. This problem can be solved using a specialised R package and using only base-R code.

Euler Problem 20 Definition

n! = n \times (n - 1) \times (n-2) \times \ldots \times 3 \times 2 \times 1 .

For example:  10! = 10 \times 9 \times \ldots \times 3 \times 2 \times 1 = 3628800 .

The sum of the digits in the number 10! is 3 + 6 + 2 + 8 + 8 + 0 + 0 = 27 .

Find the sum of the digits in the number 100!

Euler Problem 20 Solution

The factorial of the number 100 contains 158 digits, which is a lot more digits than a 64-bit operating system can accurately produce. Using the standard function: factorial(100) = 9.332622e+157. Without using a specialised algorithm, we cannot determine the sum of all digits. We need to deploy arbitrary-precision arithmetic to solve this problem.

Many computer languages, including R, have special libraries to deal with such large numbers. The GMP Multiple Precision Arithmetic package renders this problem almost trivial.

library(gmp)
digits <- factorialZ(100)
digits <- as.character(digits)
answer <- sum(as.numeric(unlist(strsplit(digits, ""))))

Base-R Code

The problem becomes more interesting when only using basic R code. I developed the big.add function to solve Euler Problem 13 through the addition of very large integers. We can extend this function to also calculate factorials. A factorial can be replaced by a series of additions, for example:

3! = 1 \times 2 \times 3 = (((1+1) + (1+1)) + (1+1))

This can be mimicked in R using the Reduce function. This function reduces a vector to a single value by recursively calling a function. Reduce(“+”, rep(4, 5)) is the same as:

4 \times 5 = ((((4 + 4) + 4) + 4) + 4) = 20

Using a loop, we can use the Reduce function to calculate a factorial, using only additions:

fact <- 1
x <- 100
for (i in 2:x) {
    fact <- Reduce("+", rep(fact, i))
}
print(fact)

The big.factorial function below implements this idea by combining the big.add and Reduce functions to calculate large integer factorials. The function returns a value of 1 when the factorial of 0 or 1 is requested. This function does not calculate the Gamma-function for fractions. For all other values, it goes through a loop from 2 to the requested factorial. The temporary values are stored in the bf variable. The code loops through the factorials by using the result of the previous Reduce call into the current one.

big.factorial <- function(x) {
    x <- floor(x)
    bf <- 1 if (x > 1) {
        for (i in 2:x) {
            bf <- Reduce(big.add, rep(bf,i))
        }
    }
    return (bf)
}

digits <- big.factorial(100)
answer <- sum(as.numeric(unlist(strsplit(as.character(digits), ""))))
print(answer)

This function is most certainly not as fast as the GMP package but it was fun to write and to learn about the mechanics behind arbitrary precision arithmetic at work.

If you like to know how factorials can be used to determine the number of ways a deck can be shuffled the watch this video.

Euler Problem 19: Counting Sundays — When does the week start?

Euler Problem 19 is so trivial it is almost not worth writing an article about. One interesting aspect of this problem is the naming of weekdays and deciding which day the week starts. This issue is more complex than it sounds because data science is in essence not about data but about people.

Euler Problem 19 Definition

  • 1 Jan 1900 was a Monday.
  • Thirty days has September, April, June and November.
  • All the rest have thirty-one,
  • Saving February alone, Which has twenty-eight, rain or shine. And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

Solution

The problem can be quickly solved with R base code and a tiny bit faster when using the lubridate package.

# Base R-code
dates <- seq.Date(as.Date("1901/01/01"), as.Date("2000/12/31"), "days")
days <- rep(1:7, length.out = length(dates))
answer <- sum(days[substr(dates, 9, 10) == "01"] == 1)
print(answer)

#Using Lubridate
library(lubridate, quietly = TRUE)
answer <- sum(wday(dates[substr(dates, 9, 10) == "01"]) == 1)
print(answer) 	 

To draw out this post a little bit further I wrote some code to solve the problem without using the calendar functions in R.

week.day <- 0
answer <- 0
for (y in 1901:2000) {
    for (m in 1:12) {
        max.day <- 31
        if (m %in% c(4, 6, 9, 11)) max.day <- 30
        # Leap years
        if (m == 2) {
            if (y %% 4 == 0 & y %% 100 != 0 | y %% 400 == 0) max.day <- 29
            else max.day <- 28
            }
        for (d in 1:max.day) {
            week.day <- week.day + 1
            if (week.day == 8) week.day <- 1
            if (week.day == 1 & d == 1) answer <- answer + 1
        }
    }
}
print(answer)

Which day does the week start?

The only aspect remotely interesting about this problem is the conversion from weekdays to numbers. In R, Monday is considered day one, which makes sense in the Christian context of Western culture. Saturday and Sunday are the weekend, the two last days of the week so they are day 6 and 7. According to international standard ISO 8601, Monday is the first day of the week. Although this is the international standard, several countries, including the United States and Canada, consider Sunday to be the first day of the week.

The international standard is biased towards Christianity. The Christian or Western world marks Sunday as their day of rest and worship. Muslims refer to Friday as their day of rest and prayer. The Jewish calendar counts Saturday—the Sabbath—as the day of rest and worship. This idea is also shared by Seventh-Day Adventists.

this example shows that data science is not only about data: it is always about people and how they interpret the world.

via chartsbin.com

Euler Problem 18 & 67: Maximum Path Sums

A pedigree is an example of a binary tree: Euler Problem 18

An example of a pedigree. Source: Wikimedia.

Euler Problem 18 and 67 are exactly the same besides that the data set in the second version is larger than in the first one. In this post, I kill two Eulers with one code.

These problems deal with binary trees, which is a data structure where each node has two children. A practical example of a binary tree is a pedigree chart, where each person or animal has two parents, four grandparents and so on.

Euler Problem 18 Definition

By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

3
7 4
2 4 6
8 5 9 3

That is, 3 + 7 + 4 + 9 = 23. Find the maximum total from top to bottom of the triangle below:

75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

As there are only 16,384 routes, it is possible to solve this problem by trying every route. However, Problem 67, is the same challenge with a triangle containing one-hundred rows; it cannot be solved by brute force, and requires a clever method! ;o)

Solution

This problem seeks a maximum path sum in a binary tree. The brute force method, as indicated in the problem definition, is a very inefficient way to solve this problem. The video visualises the quest for the maximum path, which takes eleven minutes of hypnotic animation.

A more efficient method is to define the maximum path layer by layer, starting at the bottom. The maximum sum of 2+8 or 2+5 is 10, the maximum sum of 4+5 or 4+9 is 13 and the last maximum sum is 15. These numbers are now placed in the next row. This process cycles until only one number is left. This algorithm solves the sample triangle in four steps:

Step 1:

3
7 4
2 4 6
8 5 9 3

Step 2:

3
7 4
10 13 15

Step 3:

3
20 19

Step 4:

23

In the code below, the data is triangle matrix. The variables rij (row) and kol (column) drive the search for the maximum path. The triangle for Euler Problem 18 is manually created and the triangle for Euler Problem 67 is read from the website.

path.sum <- function(triangle) {
    for (rij in nrow(triangle):2) {
        for (kol in 1:(ncol(triangle)-1)) {
            triangle[rij - 1,kol] <- max(triangle[rij,kol:(kol + 1)]) + triangle[rij - 1, kol]
        }
        triangle[rij,] <- NA
    }
    return(max(triangle, na.rm = TRUE))
}

# Euler Problem 18
triangle <- matrix(ncol = 15, nrow = 15)
triangle[1,1] <- 75
triangle[2,1:2] <- c(95, 64)
triangle[3,1:3] <- c(17, 47, 82)
triangle[4,1:4] <- c(18, 35, 87, 10)
triangle[5,1:5] <- c(20, 04, 82, 47, 65)
triangle[6,1:6] <- c(19, 01, 23, 75, 03, 34)
triangle[7,1:7] <- c(88, 02, 77, 73, 07, 63, 67)
triangle[8,1:8] <- c(99, 65, 04, 28, 06, 16, 70, 92)
triangle[9,1:9] <- c(41, 41, 26, 56, 83, 40, 80, 70, 33)
triangle[10,1:10] <- c(41, 48, 72, 33, 47, 32, 37, 16, 94, 29)
triangle[11,1:11] <- c(53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14)
triangle[12,1:12] <- c(70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57)
triangle[13,1:13] <- c(91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48)
triangle[14,1:14] <- c(63, 66, 04, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31)
triangle[15,1:15] <- c(04, 62, 98, 27, 23, 09, 70, 98, 73, 93, 38, 53, 60, 04, 23)

answer <- path.sum(triangle)
print(answer)

Euler Problem 67

The solution for problem number 67 is exactly the same. The data is read directly from the Project Euler website.

# Euler Problem 67
triangle.file <- read.delim("https://projecteuler.net/project/resources/p067_triangle.txt", stringsAsFactors = F, header = F)
triangle.67 <- matrix(nrow = 100, ncol = 100)
for (i in 1:100) {
    triangle.67[i,1:i] <- as.numeric(unlist(strsplit(triangle.file[i,], " ")))
}
answer <- path.sum(triangle.67)
print(answer)