Euler Problem 14 looks at the Collatz Conjecture. These playful sequences, named after German mathematician Lothar Collatz (1910–1990), cause mathematicians a lot of headaches. This video introduces the problem much better than I can describe it.

## Euler Problem 14 Definition

The following iterative sequence is defined for the set of positive integers:

( is even)

( is odd)

Using the rule above and starting with 13, we generate the following sequence:

This sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? Note: Once the chain starts the terms are allowed to go above one million.

## Solution

This problem is highly computationally intensive and it highlights R’s lack of speed. Generating one million Collatz sequences and finding the longest one requires a lot more than a minute of processing time allowed for in Project Euler.

collatz.chain <- function(n) { chain <- vector() i <- 1 while (n! = 1) { if (n%%2 == 0) n <- n / 2 else n <- 3 * n + 1 chain[i] <- n i <- i + 1 } return(chain) } answer <- 0 collatz.max <- 0 for (n in 1:1E6) { collatz.length <- length(collatz.chain(n)) if (collatz.length > collatz.max) { answer <- n collatz.max <- collatz.length } } print(answer)

The second version of the code contains some optimisations. The code stores the length of all sequences in an array. When the code generates a sequence and lands on a number already analysed, then it adds that previous number to the current one and moves on. This approach requires more memory but saves a lot of computation time. A minor tweak to the code optimises the rule for uneven numbers. Tripling an uneven number and adding one will always result in an even number so we can skip one step. This solution is more than twice as fast as the first version.

collatz.length <- vector(length=1e6) collatz.length[1] <- 0 for (n in 2:1e6) { x <- n count <- 0 while (x != 1 & x >= n) { if (x %% 2 == 0) { x <- x / 2 count <- count + 1 } else { x <- (3 * x + 1) / 2 count <- count + 2 } } count <- count + collatz.length[x] collatz.length[n] <- count } answer <- which.max(collatz.length) print(answer)

## Visualising Collatz Sequences

The Collatz sequence is an example of a simple mathematical rule that can create an unpredictable pattern. The number of steps required to reach 1 is listed in A006577 of the Online Encyclopedia of Integer Sequences.

The image below visualises the number of steps for the first 1000 positive numbers. The scatterplot shows some interesting patterns. Does this visualisation show that the Collatz Sequence does have a pattern after all?

## Collatz Chains

The Collatz sequences can also be visualised using networks. Each step between two numbers is an edge and the numbers are the vertices. For example, the network for the Collatz sequence for number 10 is 5–16, 16–8, 8–4, 4–2, 2–1. When generating subsequent sequences the network will start to overlap and a tree of sequences appears. The tree below combines the Collatz sequences for the numbers 2 to 26. Number 27 has a very long sequence, making the tree much harder to read.

edgelist <- data.frame(a = 2, b = 1) for (n in 3:26) { chain <- as.character(c(n, collatz.chain(n))) chain <- data.frame(a = chain[-length(chain)], b = chain[-1]) edgelist <- rbind(edgelist, chain) } library(igraph) g <- graph.edgelist(as.matrix(edgelist)) g <- simplify(g) par(mar=rep(0,4)) V(g)$color <- degree(g, mode = "out") + 1 plot(g, layout=layout.kamada.kawai, vertex.color=V(g)$color, vertex.size=6, vertex.label.cex=.7, vertex.label.color="black", edge.arrow.size=.1, edge.color="black" )

View the code on GitHub.

That’s a pretty impressive speedup you’ve got there from reducing duplication of calculation in the code. By vectorising your code I think you can reduce it much further, this gives a 16x improvement on your second code (at least on my computer) by considering the chains that remain above their start:

system.time({ #0.57

collatz.length <- rep(0,1e6)

collatz.n <- seq_along(collatz.length)

#consider the subset which remains above its initial value

sub.n0 <- sub.n 0)

{

evens <- (sub.n%%2==0)

sub.n[evens] <- sub.n[evens]/2

collatz.length[sub.n0[evens]] <- collatz.length[sub.n0[evens]]+1

sub.n[!evens] <- (3*sub.n[!evens]+1)/2

collatz.length[sub.n0[!evens]] <- collatz.length[sub.n0[!evens]] +2

above.n0 sub.n0

collatz.n[sub.n0[!above.n0]] <- sub.n[!above.n0]

sub.n0 <- sub.n0[above.n0]

sub.n <- sub.n[above.n0]

}

#Now lookup length to get back to 1

collatz.length <- collatz.length + collatz.length[collatz.n]

answer <- which.max(collatz.length)

print(answer)

})

Great work Miff, vectorisation always trumps loops. I think there a bits missing in your code, i could not get it to run.

Your solution looks pretty interesting! I would love to try it out, but it seems to be broken or some lines are missing… Could you fix it?

Pingback: Euler Problem 14: Longest Collatz Sequence – Mubashir Qasim

Pingback: Euler Problem 14: Longest Collatz Sequence | A bunch of data

Pingback: Euler Problem 14: Longest Collatz Sequence - Use-R!Use-R!