Euler Problem 14: Longest Collatz Sequence

Euler Problem 14 looks at the Collatz Conjecture. These playful sequences, named after German mathematician Lothar Collatz (1910–1990), cause mathematicians a lot of headaches. This video introduces the problem much better than I can describe it.

Euler Problem 14 Definition

The following iterative sequence is defined for the set of positive integers:

n \rightarrow n/2 ( n is even)
n \rightarrow 3n + 1 ( n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 \rightarrow 40 \rightarrow 20 \rightarrow 10 \rightarrow 5 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1

This sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? Note: Once the chain starts the terms are allowed to go above one million.


This problem is highly computationally intensive and it highlights R’s lack of speed. Generating one million Collatz sequences and finding the longest one requires a lot more than a minute of processing time allowed for in Project Euler.

collatz.chain <- function(n) {
    chain <- vector()
    i <- 1
    while (n! = 1) {
        if (n%%2 == 0) 
            n <- n / 2
            n <- 3 * n + 1
        chain[i] <- n
        i <- i + 1
answer <- 0
collatz.max <- 0
for (n in 1:1E6) {
    collatz.length <- length(collatz.chain(n)) 
    if (collatz.length > collatz.max) {
        answer <- n
        collatz.max <- collatz.length

The second version of the code contains some optimisations. The code stores the length of all sequences in an array. When the code generates a sequence and lands on a number already analysed, then it adds that previous number to the current one and moves on. This approach requires more memory but saves a lot of computation time. A minor tweak to the code optimises the rule for uneven numbers. Tripling an uneven number and adding one will always result in an even number so we can skip one step. This solution is more than twice as fast as the first version.

collatz.length <- vector(length=1e6)
collatz.length[1] <- 0
for (n in 2:1e6) {
    x <- n
    count <- 0 while (x != 1 & x >= n) {
        if (x %% 2 == 0) {
            x <- x / 2
            count <- count + 1
        else {
            x <- (3 * x + 1) / 2
            count <- count + 2
    count <- count + collatz.length[x]
    collatz.length[n] <- count
answer <- which.max(collatz.length)

Visualising Collatz Sequences

The Collatz sequence is an example of a simple mathematical rule that can create an unpredictable pattern. The number of steps required to reach 1 is listed in A006577 of the Online Encyclopedia of Integer Sequences.

The image below visualises the number of steps for the first 1000 positive numbers. The scatterplot shows some interesting patterns. Does this visualisation show that the Collatz Sequence does have a pattern after all?

Euler Problem 14: Number of halving and tripling steps to reach 1 in the Collatz problem.

Collatz Chains

The Collatz sequences can also be visualised using networks. Each step between two numbers is an edge and the numbers are the vertices. For example, the network for the Collatz sequence for number 10 is 5–16, 16–8, 8–4, 4–2, 2–1. When generating subsequent sequences the network will start to overlap and a tree of sequences appears. The tree below combines the Collatz sequences for the numbers 2 to 26. Number 27 has a very long sequence, making the tree much harder to read.

Network of Collatz sequences n=2-26

Network of Collatz sequences n=2-26

edgelist <- data.frame(a = 2, b = 1)
for (n in 3:26) {
   chain <- as.character(c(n, collatz.chain(n)))
   chain <- data.frame(a = chain[-length(chain)], b = chain[-1])
   edgelist <- rbind(edgelist, chain)
g <- graph.edgelist(as.matrix(edgelist))
g <- simplify(g)
V(g)$color <- degree(g, mode = "out") + 1

View the code on GitHub.

6 thoughts on “Euler Problem 14: Longest Collatz Sequence

  1. Pingback: Euler Problem 14: Longest Collatz Sequence - Use-R!Use-R!

  2. Pingback: Euler Problem 14: Longest Collatz Sequence | A bunch of data

  3. Pingback: Euler Problem 14: Longest Collatz Sequence – Mubashir Qasim

  4. That’s a pretty impressive speedup you’ve got there from reducing duplication of calculation in the code. By vectorising your code I think you can reduce it much further, this gives a 16x improvement on your second code (at least on my computer) by considering the chains that remain above their start:

    system.time({ #0.57
    collatz.length <- rep(0,1e6)
    collatz.n <- seq_along(collatz.length)
    #consider the subset which remains above its initial value
    sub.n0 <- sub.n 0)
    evens <- (sub.n%%2==0)
    sub.n[evens] <- sub.n[evens]/2
    collatz.length[sub.n0[evens]] <- collatz.length[sub.n0[evens]]+1
    sub.n[!evens] <- (3*sub.n[!evens]+1)/2
    collatz.length[sub.n0[!evens]] <- collatz.length[sub.n0[!evens]] +2
    above.n0 sub.n0
    collatz.n[sub.n0[!above.n0]] <- sub.n[!above.n0]
    sub.n0 <- sub.n0[above.n0]
    sub.n <- sub.n[above.n0]
    #Now lookup length to get back to 1
    collatz.length <- collatz.length + collatz.length[collatz.n]
    answer <- which.max(collatz.length)

Feel free to share your thoughts about this article