Euler Problem 19: Counting Sundays — When does the week start?

Euler Problem 19 is so trivial it is almost not worth writing an article about. One interesting aspect of this problem is the naming of weekdays and deciding on which day the week starts. This issue is more complex than it sounds because data science is, in essence, not about data but about people.

Euler Problem 19 Definition

  • 1 Jan 1900 was a Monday.
  • Thirty days has September, April, June and November. All the rest have thirty-one, Saving February alone, Which has twenty-eight, rain or shine. And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?
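The leap-year rule in the definition can be written as a small predicate. This is only a sketch: the helper name is.leap.year is an assumption made for illustration and is not part of base R.

```r
# Hypothetical helper: TRUE when year y is a leap year under the rule above
is.leap.year <- function(y) {
    (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0
}

# 1900 is a century not divisible by 400, so it is not a leap year; 2000 is
print(is.leap.year(c(1900, 1904, 2000)))

# Number of leap years in the twentieth century as defined in the problem
leap.count <- sum(is.leap.year(1901:2000))
print(leap.count)
```

The function is vectorised, so it can be applied to a whole range of years at once.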

Solution

The problem can be solved quickly with base R code, and a tiny bit faster with the lubridate package.

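# Base R code
dates <- seq.Date(as.Date("1901/01/01"), as.Date("2000/12/31"), "days")
# 1 Jan 1900 was a Monday, so 1 Jan 1901 was a Tuesday.
# Number the days Monday = 1 to Sunday = 7, starting the cycle on Tuesday.
days <- rep(c(2:7, 1), length.out = length(dates))
# Count the firsts of the month that fall on a Sunday
answer <- sum(days[substr(dates, 9, 10) == "01"] == 7)
print(answer)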

# Using lubridate
library(lubridate, quietly = TRUE)
# wday() numbers Sunday as 1 by default
answer <- sum(wday(dates[substr(dates, 9, 10) == "01"]) == 1)
print(answer)

To draw out this post a little further, I wrote some code to solve the problem without using the calendar functions in R.

# Weekdays are numbered Monday = 1 to Sunday = 7.
# 31 Dec 1900 was a Monday, so the counter reaches Tuesday on 1 Jan 1901.
week.day <- 1
answer <- 0
for (y in 1901:2000) {
    for (m in 1:12) {
        max.day <- 31
        if (m %in% c(4, 6, 9, 11)) max.day <- 30
        # Leap years
        if (m == 2) {
            if ((y %% 4 == 0 && y %% 100 != 0) || y %% 400 == 0) max.day <- 29
            else max.day <- 28
        }
        for (d in 1:max.day) {
            week.day <- week.day + 1
            if (week.day == 8) week.day <- 1
            # Count Sundays that fall on the first of the month
            if (week.day == 7 && d == 1) answer <- answer + 1
        }
    }
}
print(answer)
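As a cross-check on the solutions above, here is a shorter sketch that assumes only base R. It builds the 1,200 firsts of the month directly and relies on the "%u" format code, which numbers weekdays from Monday = 1 to Sunday = 7.

```r
# First day of every month from January 1901 to December 2000
firsts <- seq(as.Date("1901-01-01"), as.Date("2000-12-01"), by = "month")

# "%u" returns the ISO weekday number as text, so Sunday is "7"
sundays <- sum(format(firsts, "%u") == "7")
print(sundays)  # 171
```

Stepping by month avoids generating and filtering every day of the century.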

Which day does the week start?

The only remotely interesting aspect of this problem is the conversion from weekdays to numbers. By default, the wday function in lubridate numbers Sunday as day one, which is why the lubridate solution tests for a value of 1. According to the international standard ISO 8601, Monday is the first day of the week, which makes sense in the Christian context of Western culture: Saturday and Sunday are the weekend, the last two days of the week, so they are days 6 and 7. Although this is the international standard, several countries, including the United States and Canada, consider Sunday to be the first day of the week.
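The competing numbering conventions are visible inside R itself. The sketch below uses only base R and a date that is known to be a Sunday from the problem definition; the variable names are chosen for illustration.

```r
# 7 Jan 1900 was a Sunday, because 1 Jan 1900 was a Monday
sunday <- as.Date("1900-01-07")

# ISO 8601 convention: Monday = 1, ..., Sunday = 7
iso.number <- as.integer(format(sunday, "%u"))
# C library convention: Sunday = 0, ..., Saturday = 6
c.number <- as.integer(format(sunday, "%w"))
# POSIXlt objects also count from Sunday = 0
lt.number <- as.POSIXlt(sunday)$wday

print(c(iso = iso.number, c = c.number, posixlt = lt.number))
```

The same Sunday is day 7 under one convention and day 0 under another. In lubridate, wday accepts a week_start argument that changes which day is numbered 1.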

The international standard is biased towards Christianity. The Christian or Western world marks Sunday as its day of rest and worship. Muslims observe Friday as their day of rest and prayer. The Jewish calendar counts Saturday, the Sabbath, as the day of rest and worship. Seventh-day Adventists share this practice.

This example shows that data science is not only about data: it is always about people and how they interpret the world.


Data Pseudo-Science: Developing a Biorhythm Calculator

Data science is a serious occupation. Just like any other science, however, it can also be used for spurious topics, such as calculating your biorhythm.

This post provides an example of data pseudo-science through a function that calculates and visualises your biorhythm. Based on the graph, I must be having a great day right now.

The broader and more pertinent message in this post is that data pseudo-science is more common than you would think. Is our belief in machine learning justified or are some of these models also a pseudo-science with not much more reliability than a biorhythm?

Biorhythm Theory

The idea that our physical states follow a predetermined rhythm has been around as long as mathematics. The basic concept of biorhythm is that a regular sinusoidal cycle accurately describes our physical, emotional and intellectual states. Each of these three cycles has a different wavelength (w):

  • physical: w = 23 days
  • emotional: w = 28 days
  • intellectual: w = 33 days

The cycle is calculated with $\sin(2 \pi t / w)$, where t indicates the number of days since birth. This idea was developed by German surgeon Wilhelm Fliess in the late 19th century and was popularised in the United States in the late 1970s. There is no scientific evidence of the validity of this theory, but it is an entertaining way to play with data.
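To make the formula concrete, here is a minimal sketch that evaluates the three cycles for a given age in days. The helper name cycle is an assumption for illustration, and the scaling to a percentage between -100 and 100 is a presentation choice rather than part of the formula.

```r
# Hypothetical helper: value of a cycle with wavelength w (days) at age t (days),
# scaled to a percentage between -100 and 100
cycle <- function(t, w) sin(2 * pi * t / w) * 100

t <- 7  # one week after birth
values <- c(physical = cycle(t, 23),
            emotional = cycle(t, 28),
            intellectual = cycle(t, 33))
print(round(values, 1))
```

After seven days the 28-day emotional cycle is exactly a quarter of the way through its wavelength, so it sits at its maximum of 100, while the other two cycles are close to their peaks but not on them.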

The combination of the 23- and 28-day cycles repeats every 644 days, while the triple combination of 23-, 28-, and 33-day cycles repeats every 21,252 days, which is roughly 58 years and two months. You can, by the way, never reach a point where all cycles are maximised. The best you can achieve is 299.7 out of a maximum 300, which occurs when you are 17,003 days old.
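These figures can be checked by brute force. The sketch below assumes that the cycles are only evaluated at whole days since birth and that each one is scaled to a maximum of 100; the variable names are illustrative.

```r
# Wavelengths of the three cycles in days
w <- c(physical = 23, emotional = 28, intellectual = 33)

# The wavelengths share no common factors, so the combined pattern
# repeats after their product
full.period <- prod(w)

# Sum of the three cycles for every whole day in one full period
t <- 1:full.period
total <- rowSums(sapply(w, function(p) sin(2 * pi * t / p) * 100))

# Day on which the combined score peaks, and the score on that day
best.day <- which.max(total)
print(full.period)           # 21252
print(best.day)              # 17003
print(round(max(total), 1))  # 299.7
```

The sum never reaches 300 because the physical and intellectual cycles peak at 5.75 and 8.25 days into their wavelengths, which are never whole days.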

Calculating your Biorhythm

When I was a teenager in the 1980s, several books and magazines described computer code to calculate your biorhythm. I used to play with these functions on my Atari 130XE computer.

Building a biorhythm calculator in R is easy. This function takes two dates as input and plots the biorhythm for the two weeks before and after the target date. To calculate your biorhythm, run the function with your date of birth and, optionally, a target date: biorhythm("yyyy-mm-dd"). The default version uses today as the target.

library(ggplot2)
library(reshape2)
biorhythm <- function(dob, target = Sys.Date()) {
    dob <- as.Date(dob)
    target <- as.Date(target)
    # Age in whole days on the target date
    t <- round(as.numeric(difftime(target, dob, units = "days")))
    # Two weeks either side of the target date
    days <- (t - 14):(t + 14)
    period <- data.frame(Date = seq.Date(from = target - 14, by = 1, length.out = 29),
                         Physical = sin(2 * pi * days / 23) * 100,
                         Emotional = sin(2 * pi * days / 28) * 100,
                         Intellectual = sin(2 * pi * days / 33) * 100)
    # Reshape to long format so each cycle is drawn as its own line
    period <- melt(period, id.vars = "Date", variable.name = "Biorhythm", value.name = "Percentage")
    ggplot(period, aes(x = Date, y = Percentage, col = Biorhythm)) + geom_line() +
        ggtitle(paste("DoB:", format(dob, "%d %B %Y"))) +
        # Vertical line marking the target date
        geom_vline(xintercept = as.numeric(target))
}

biorhythm("1969-09-12", "2017-03-30")

Biorhythms are an early attempt by human beings to predict the future. Although there is no relationship between this algorithm and reality, many people believed in its efficacy. Does the same hold true for the hyped capabilities of machine learning?

Data Pseudo-Science

Data pseudo-science is not only an issue when people use spurious mathematical relationships such as biorhythms or astrology. This post is also written as a warning not to rely only on numerical models to predict qualitative aspects of life.

The recent failures in predicting the results of elections, even days before the event, are a case in point. There are many reasons machine learning methods can go wrong. When machine learning algorithms fail, they are often just as useful as a biorhythm. It would be fun to write a predictive analysis package for R using only pseudoscientific approaches such as I-Ching, astrology or biorhythm.