*This blog post was built using Rmarkdown; all source code and data is available on GitHub. If you want to know more about me, go back to my personal homepage.*

So I have a friend—let’s call him Rob, since that’s his name. Rob is serious about fantasy football, and he’s been stewing about last season’s results. He messaged me the other day:

hey rich explain this shit if youre so smart

(I had not claimed to be so smart, but this is always a fun way to start a conversation.)

last year in fantasy football my team scored 1393 cumulative points against an average of 1136

i was 2.5 standard deviations above average which makes it the best output in 212 player seasons of fantasy football

but i went 6-7 and missed the playoffs

After a great many follow-up questions, Rob’s objection boiled down to this:

like if i was the greatest team of all time and 50% of teams make the playoffs every year than statistically i should have made the playoffs

If you’re not familiar with fantasy football, it’s a game that lines up with the NFL season in which a group of friends/enemies/strangers (in this case, 12) each draft a “fantasy team” of currently active NFL players. Each week, your team goes head-to-head against one of the other teams. You score is determined by the performance of the players on your team during the actual NFL season: If your quarterback plays that Sunday afternoon and throws for 280 yards and four touchdowns, you may get something like get 6 points per touchdown, plus 1 point for every 30 yards of passing, so your quarterback in that week would have earned your team a total of 33 points. Similar calculations are done for your running backs, wide receivers, and so on. At the end of the week, your team gets a “win” if your team scored more fantasy points than your opponent’s fantasy team. In the fantasy league in question, they played a 13-week regular season schedule: For the first 11 weeks, every team plays every other team exactly once. In the last two weeks, the match-ups start over, and the schedule from weeks 1 and 2 repeats.

The reason fantasy football is a practical simulation exercise is that the fantasy teams aren’t really playing *against* each other—they’re both “playing,” but there’s no way for one fantasy team to (directly) influence the performance of the other. If Rob’s team scores 85 points in week 1, they scored 85 points completely independently of how many points Rob’s week 1 “opponent” scored. There *could* be a very minor bit of strategy in choosing your starting lineup based on your opponent’s team—how should you handle it if you have a good quarterback but your opponent has that quarterback’s best wide receiver, for example—but overall, the match-ups don’t directly influence each other.

So now we look at Rob’s first statement: Based on his total points at the end of the season, “i was the greatest team of all time.” The contention here is basically that his record didn’t reflect how good his team actually was. This is interesting (and convenient) because he’s not challenging the notion that *points* are a good measurement of a team’s ability—the problem is the weekly match-ups.

We could use total points over the course of the season to measure ability, similarly to how it’s done in “rotisserie” fantasy baseball, but that doesn’t seem exactly right either: For example, if Team A scores 90 points every single week, and Team B scores 95 points every week, it would be easy to say Team B performed better over the course of a season. But if Team A has one week in which they score 150 points rather than 90, their total season points passes that of Team B. Saying Team A had a better season in this case doesn’t seem accurate—they had a better *week*, for sure, but did Team B still have a better season?

So it seems like what we’re after is related to consistency—week over week, which team is good *more often*. There are endless ways we could try to evaluate this—removing outliers, figuring out weekly ranks for each team, and so on. However, **assuming fantasy points are an accurate reflection of how good a team is**, the most straightforward and most easily interpreted way to evaluate a team’s success is to just try out a lot of schedules and see what happens. We can bypass questions about player performance and unbalanced scoring rubrics and get at a simpler but compelling question: whether *the schedule itself* was the cause of Rob’s woes. If his team is as good as his season total suggests, what are the odds of the match-ups only giving him 6 wins in a season? If his team is doing well every week, it seems very improbable that a bunch of inferior teams would each have their *one good week* in the week they happen to play against Rob—but how improbable?

First, we start with a matrix of all the weekly score totals. Each column is a team, and each row is a week. (It would have been tidier to make each column a week, but I was too far along to go back and fix this later.)

```
data <- read.csv('./scores.csv')
data
```

```
## john brian jacob rob matt lockwood mark nick larson kyle adam greg
## 1 81 124 57 115 84 86 89 126 65 98 79 112
## 2 85 83 73 67 54 124 50 85 101 89 79 111
## 3 136 78 65 150 94 73 97 78 62 86 101 94
## 4 63 82 68 83 73 78 90 99 94 79 85 65
## 5 100 92 77 115 64 91 103 122 87 126 111 131
## 6 63 96 108 96 81 88 56 91 106 116 105 75
## 7 115 93 81 55 70 87 86 67 97 67 54 77
## 8 89 98 89 122 117 91 76 63 130 118 102 82
## 9 72 82 102 166 66 79 58 89 57 85 69 63
## 10 125 95 90 140 71 94 81 97 55 46 78 113
## 11 88 78 59 83 60 93 107 85 88 69 84 80
## 12 52 74 65 108 63 87 126 122 55 107 62 51
## 13 86 69 78 93 96 62 68 98 86 66 87 109
```

Since we know the rules for how schedules are determined, we can generate a random schedule for Rob:

```
schedule <- sample(colnames(data[, names(data) != 'rob']))
schedule <- append(schedule, schedule[1:2])
schedule
```

```
## [1] "adam" "mark" "john" "kyle" "brian" "larson"
## [7] "nick" "greg" "matt" "lockwood" "jacob" "adam"
## [13] "mark"
```

Now we know every team’s score in every week, plus who Rob’s hypothetical opponent is in each week for this randomly generated schedule. Now we just have to see what the results are:

```
wins <- 0
losses <- 0
ties <- 0
week <- 0
for(team in schedule) {
week <- week + 1
if(data[week, 'rob'] > data[week, team]) {
wins <- wins +1
} else if(data[week, 'rob'] < data[week, team]) {
losses <- losses + 1
} else {
ties <- ties + 1
}
}
print(paste(wins, losses, ties, sep='-'))
```

`## [1] "11-2-0"`

So at the end of this fictional season, Rob wins an amazing 11 out of 13 games, *without anything actually changing about his team*. This shows that it’s *possible* for the team to do better than 6-7, but we’re still just looking at one randomly selected season, and we know there are millions. We could just run this simulation a bunch more times, but unfortunately, there was more to Rob’s objection than just his record: His main issue was that **he didn’t make the playoffs**, which is based not only on *his* record, but everyone else’s, too. If the six teams with the best records make the playoffs in a 12-team league, then in some seasons, going 6-7 might actually make the cut.

Now we get to the challenging part. Generating a schedule for Rob’s team only required a single line of code. But to figure out the playoffs, we need to generate random schedules *for the league*. It might not be impossible to model every single possibility, but it’s close enough to impossible on a laptop that so we’re going to attack this using the Monte Carlo method instead: Testing thousands of possible schedules isn’t exhaustive, but it will give us more than enough accuracy for what we’re asking.

First is a script to generate a 12-team schedule:

```
makeschedule <- function(data, weeks) {
schedule <- matrix(nrow=12, ncol=weeks)
for(week in 1:weeks) {
done <- FALSE
attempts <- 0
while(!done) {
done <- TRUE
attempts <- attempts + 1
for(team in 1:12) {
if(!is.na(schedule[team,week])) {
next # skip if they already have a match-up
}
# start with all teams
options <- 1:12
# don't match up a team against itself
options <- options[options != team]
# don't match up a team against a team it's already played
if(week > 1) {
options <- options[!options %in% schedule[team,1:week-1]]
}
# don't match up a team against one that's already playing that week
options <- options[!options %in% schedule[,week]]
# Check if we're out of options
if(length(options) == 0) {
if(attempts > 30) {
# If we try to resolve a week's schedule but fail 30 times
# in a row, just throw away the whole schedule and try again
return(FALSE)
}
# If we haven't tried 30 times, but we're still out of options,
# throw out this week and try again
schedule[,week] <- NA
done <- FALSE
break
} else if(length(options) == 1) {
# this is here because sample(list, 1)
# does NOT work like you think when $list is of
# length 1.
opponent <- options[1]
} else {
# if we still have options for an opponent, pick one
opponent <- sample(options,1)
}
schedule[team,week] <- opponent
schedule[opponent,week] <- team # add entry for opponent also
}
}
}
return(schedule)
}
schedule <- FALSE
# make the 11-game unique schedule
while(length(schedule)==1) {
# the makeschedule() function will return FALSE
# if it got stuck in a scheduling loop that couldn't
# be resolved. so then we try again.
schedule <- makeschedule(data, 11)
}
# Once we have the 11-week schedule, repeat the first 2 weeks
schedule <- cbind(schedule, schedule[,1:2])
schedule
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 4 7 12 9 2 5 11 3 6 10 8 4 7
## [2,] 5 3 4 10 1 9 6 7 12 8 11 5 3
## [3,] 11 2 6 12 9 10 4 1 8 7 5 11 2
## [4,] 1 6 2 8 10 7 3 11 9 5 12 1 6
## [5,] 2 11 8 6 12 1 9 10 7 4 3 2 11
## [6,] 8 4 3 5 7 11 2 12 1 9 10 8 4
## [7,] 12 1 10 11 6 4 8 2 5 3 9 12 1
## [8,] 6 10 5 4 11 12 7 9 3 2 1 6 10
## [9,] 10 12 11 1 3 2 5 8 4 6 7 10 12
## [10,] 9 8 7 2 4 3 12 5 11 1 6 9 8
## [11,] 3 5 9 7 8 6 1 4 10 12 2 3 5
## [12,] 7 9 1 3 5 8 10 6 2 11 4 7 9
```

In the above matrix, each column is a week, and each row is a team. Each cell indicates the opponent for a single team in a single week. Now that we have the weekly match-ups, and we already know the weekly scores, we can figure out what everyone’s records would be if the league had been randomly assigned this schedule instead.

```
# figure out team records
winmatrix <- matrix(nrow=12, ncol=13)
lossmatrix <- matrix(nrow=12, ncol=13)
for(week in 1:13) {
for(team in 1:12) {
opponent <- schedule[team,week]
if(data[week, team] > data[week, opponent]) {
winmatrix[team,week] <- 1
lossmatrix[opponent,week] <- 1
} else if(data[week, team] < data[week, opponent]) {
winmatrix[opponent,week] <- 1
lossmatrix[team,week] <- 1
}
}
}
# tally up the records
rankings <- data.frame(
team = integer(),
wins=integer(),
losses=integer(),
ties=integer()
)
for(team in 1:12) {
wins = sum(na.omit(winmatrix[team,]))
losses = sum(na.omit(lossmatrix[team,]))
ties = 13 - wins - losses
rankings[team,] <- c(team, wins, losses, ties)
}
```

This gives us a data frame for this random schedule listing the record of all the teams in the league. We can then add some logic to tell us which teams would have made it to the playoffs.

```
library(dplyr)
# The runif() call here is a cheat: After teams are ranked by wins
# and losses, it sorts any ties by using random numbers. This is a
# hacky way to deal with situations where multiple teams are tied
# for the cutoff to make the playoffs. We just give it to random teams.
rankings <- rankings %>% arrange(desc(wins), losses, runif(1))
rankings$playoffs <- c(rep(TRUE,6), rep(FALSE,6))
# add names to the team column
rankings$team <- colnames(data)[rankings$team]
print.data.frame(rankings)
```

```
## team wins losses ties playoffs
## 1 rob 9 4 0 TRUE
## 2 john 7 5 1 TRUE
## 3 brian 7 6 0 TRUE
## 4 lockwood 7 6 0 TRUE
## 5 nick 7 6 0 TRUE
## 6 kyle 7 6 0 TRUE
## 7 jacob 6 6 1 FALSE
## 8 mark 6 7 0 FALSE
## 9 greg 6 7 0 FALSE
## 10 matt 5 8 0 FALSE
## 11 larson 5 8 0 FALSE
## 12 adam 5 8 0 FALSE
```

So in our single simulated schedule, we see that not only did Rob’s team have the best record in the league (9-4), but that his team also made the playoffs. To figure out the probability of this happening in the long run, we can use this code inside a loop to simulate many possible seasons. (This is an embarrassingly parallel problem, so you can speed it up dramatically for your own uses if you want to.)

```
alltime <- data.frame(
team = integer(),
wins=integer(),
losses=integer(),
ties=integer(),
iteration=integer()
)
seasons <- 25000 # how many sims to do
for(iteration in 1:seasons) {
if(iteration %% 5000 == 0) {
print(paste('iteration',iteration))
}
schedule <- FALSE
# make the 11-game unique schedule
while(length(schedule)==1) {
# the makeschedule() function will return FALSE
# if it got stuck in a scheduling loop that couldn't
# be resolved. so then we try again.
schedule <- makeschedule(data, 11)
}
# you play the first two weeks again at the end of the season
schedule <- cbind(schedule, schedule[,1:2])
# figure out team records
winmatrix <- matrix(nrow=12, ncol=13)
lossmatrix <- matrix(nrow=12, ncol=13)
for(week in 1:13) {
for(team in 1:12) {
opponent <- schedule[team,week]
if(data[week, team] > data[week, opponent]) {
winmatrix[team,week] <- 1
lossmatrix[opponent,week] <- 1
} else if(data[week, team] < data[week, opponent]) {
winmatrix[opponent,week] <- 1
lossmatrix[team,week] <- 1
}
}
}
# tally up the records
rankings <- data.frame(
team = integer(),
wins=integer(),
losses=integer(),
ties=integer(),
iteration=integer()
)
for(team in 1:12) {
wins = sum(na.omit(winmatrix[team,]))
losses = sum(na.omit(lossmatrix[team,]))
ties = 13 - wins - losses
rankings[team,] <- c(team, wins, losses, ties,iteration)
}
rankings <- rankings %>% arrange(desc(wins), desc(losses), runif(1))
rankings$playoffs <- c(rep(TRUE,6), rep(FALSE,6))
if(iteration > 1) {
alltime <- rbind(alltime,rankings)
} else {
alltime <- rankings # so we don't blow up the column titles
}
}
```

```
## [1] "iteration 5000"
## [1] "iteration 10000"
## [1] "iteration 15000"
## [1] "iteration 20000"
## [1] "iteration 25000"
```

Now we have a data frame called `alltime`

with the league results for 25,000 seasons. We can make the same histogram of Rob’s wins:

```
library(ggplot2)
rob <- alltime[alltime$team==4,]
ggplot(rob, aes(x=wins)) +
geom_histogram(binwidth=1, fill='#6baed6', color='black') +
scale_x_continuous(breaks=c(0:20)) +
theme_bw() +
theme(panel.grid.minor = element_blank())
```

So we can see that most of the time, Rob’s team actually wins 9 games, and in 0.064 percent of the simulations, his team—which, again, went 6-7 in real life—**actually goes undefeated**. (In 0.036 percent of seasons, the team wins only 4 out of 13 games, which would have been much funnier.) Now that we have the distribution of wins, we can use it to estimate just how unusual it was for the team to only win 6 games.

`nrow(rob[rob$wins <= 6,])/nrow(rob)`

`## [1] 0.03232`

`nrow(rob[rob$losses >= 7,])/nrow(rob)`

`## [1] 0.0274`

The first number is essentially the area under the left tail of the win distribution—in our simulations, Rob’s team achieves 6 or fewer wins only *3.2 percent of the time*, which means **it was very unusual that his team did not win more games**. If we look at losses, the probability is even lower: His team only recorded 7 or more losses in 2.7 percent of simulations. This discrepancy is even more pronounced when we compare it to the distribution of the records of everyone else:

```
library(scales)
facetlabel <- function(labels) {
return(data.frame(label=c('Rob','everyone else')))
}
ggplot(alltime, aes(x=wins, y = stat(density))) +
geom_histogram(binwidth=1, fill='#6baed6',color='black') +
scale_x_continuous(breaks=c(0:20)) +
theme_bw() +
theme(panel.grid.minor = element_blank()) +
scale_y_continuous(labels = percent_format()) +
facet_grid(
rows=alltime$team!=4,
labeller = facetlabel
)
```

The top panel is the same histogram of Rob’s wins, and the bottom panel is the histogram for everyone else. Clearly, the median wins per season for Rob’s team is much higher—in fact, in 25,000 seasons, Rob’s was the **only** team to have an undefeated season. However, lumping everyone together in the bottom panel could obscure the presence of teams that are better than average, which is relevant here. This is what it looks like when we split out everyone separately:

```
labeled <- alltime
labeled$team <- colnames(data)[labeled$team]
library(RColorBrewer)
getPalette = colorRampPalette(brewer.pal(11, "Spectral"))
ggplot(labeled, aes(x=wins, fill=team)) +
geom_density(adjust=5, alpha=0.8) +
theme_bw() +
scale_fill_manual(values=getPalette(13))
```