- The Birthday "Paradox"
- The Monte Hall Problem
- Many people obtain the wrong answer to the Monty Hall
problem because their probabilistic intuition has fail them. In fact,
many columnists, including Keith Devlin and Marilyn vos Savant, upon
presenting the problem and its solution in their columns, have found
their inboxes deluged with reader emails (some from mathematicians!)
insisting they were wrong. You can find Keith Devlin's columns about
the Monty Hall problem here and here.
- The Problem of Points
- Coin-flipping
- A paper by Persi Diaconis, Susan Holmes, and Richard
Montgomery, titled "Dynamical Bias in the Coin Toss", available here,
discusses the physics of coin flipping. The authors show that under a
certain physical model of coin flipping, the side of the coin facing up
initially has a greater than 50% chance of facing up when the coin lands
(due to the concept of precession, which often arises in astronomy when
discussing planetary motion). They also run experiments suggesting that
this physical model is an accurate description of what happens when
people actually flip coins. The physics in the middle of the paper is
not light reading, but I found the beginning and end of the paper to be
very readable and interesting.
- SET
- The card game SET is published by SET Enterprises . The company
website has a list of stores that sell SET decks. The mathematics of
SET runs quite deep, and a Google search will reveal several sites and
papers that explain some aspects of it.
- SET can be played online. This is one possible website.
- The Kevin Bacon Game
- At the Oracle of
Bacon, computer scientists maintain a interface to the IMDB which
computes the distance between any two actors, which is the smallest
number of links required to connect them, where two actors are linked if
they have appeared in a movie together. Classically, one plays this
game using Kevin Bacon as one actor. This raises the question of which
actor averages the smallest distance to any other actor in the IMDB.
The answer is Rod Steiger, while Kevin Bacon only ranks 1049.
- A similar
concept is used in mathematics with Paul Erdos and coauthorship of
papers. Erdos wrote around 1500 papers and had 511 coauthors. My Erdos
number is 5. Collaboration distances in math can be computed via this website
- Simpson's Paradox
- Simpson's Paradox is a phenomenon that occurs often in
data analysis. Data seems to lead to one conclusion (e.g., treatment A
is more effective than treatment B), but there is a hidden factor, and
when the data is separated into groups according to the hidden factor,
the opposite conclusion arises. This is related to sample
bias and selection bias.
- There are compelling real examples of Simpson's Paradox
at Wikipedia, in
a paper of Tom Moore's, the
Exploring
Data website, and Darci Kracht's
website. The examples come from such varied sources as: kidney
stone treatments; racial bias in death penalty decisions; airline
ontime flight data; death rates in smokers; sex bias in Berkeley
graduate school admissions; and batting averages in baseball.
- Mean vs. Median
- An article in the New York Times by Gina Kolata discusses
the mathematical issues surrounded data on the mean number of sex
partners reported by men and women.
- An
article on Slate.com by Jordan Ellenberg points out that one needs
to be careful to make a distinction between the median and the mean, and
that while the mean sex partners for men and women must agree, the
medians can be different.
- I point out that they are both missing a crucial
assumption, which is that the number of men and women are the same.
Granted, in most populations this is roughly the case, so the means must
be close to each other, but both articles attempt to state mathematical
theorems, so significantly more care should have been taken by both
authors.
- (Bayesian) Spam Filters
- The Professor, the Prosecutor, and the Blonde
- Keith Devlin and Gary Lorden have written a book
entitled "The Numbers Behind NUMB3RS: Solving Crimes with Mathematics".
Keith Devlin wrote an abridged excerpt from
the book for his column, Devlin's Angle, on the 1964 assault conviction
of a couple based on a ludicrous probabilistic argument by the
prosecution.
- The seriously fascinating judicial decision reversing
the conviction can be found here
- Card-shuffling machines
- Ivars Peterson has a MathTrek column here
about the mathematics of card-shuffling machines. In particular, he
discusses how Persi Diaconis and Susan Holmes were asked to analyze the
prototype of a card-shuffling machine, and they showed that it did a
poor job of randomizing the deck of cards. To demonstrate this, Holmes
developed a very simple card guessing game in which the expected number
of correct guesses after one use of the shuffling machine was about
twice the number of correct guesses when done with a totally randomized
deck. This shift in odds is easily enough for a blackjack player to
develop a betting system to "beat" the casino.
- Optimizing your life partner
- This page gives a
solution to the following problem: Suppose there are n objects, one of
which is better than all the others in some way. The objects are
presented to you one at a time, and you get precisely one opportunity to
choose the object currently being presented to you. How do you maximize
your chances of choosing the best object? The above link states this
problem in the somewhat sexist context of "Optimizing your wife", in
which the "objects" are potential wives that you will meet over the
course of your life. The analysis, while perfectly correctly
mathematically, has questionable real-life applications. But it is not
impossible to imagine a situation (say, house-hunting in a really tight
market) in which this type of strategy might be reasonable.
- Stable marriages
- The stable matching, or stable marriage, algorithm is
actually used to match up hospitals with residents. In its simplest
form, the problem assumes that each of N hospitals ranks the residents
in order of preference and that each of N residents ranks the hospitals
in order of preference. We want to pair up hospitals and residents so
that there does not exist an unmatched hospital A and a resident B so
that hospital A prefers resident B to its resident and resident B
prefers hospital A to his/her hospital (such a matching is unstable in
the sense that the resident and hospital would be tempting to make a
private deal).
- The program is called the National Resident Match Program.
- Wikipedia has a page
describing the problem.
- DNA profiling
- The discussion for this special topics centers around
the accuracy of DNA matching, the chance that two people's DNA will
result in the same test data, and the admissibility of certain DNA
evidence in court.
- These issues are discussed in this article and this
article by
Keith Devlin and this writeup by
Dave Trautman.
- Benford's Law (or How to Embezzle Better)
- Benford's Law is a probability distribution on the
digits 1-9 that often approximates the distribution of the leading
non-zero digits in a list of numbers. In this distribution, 1 occurs as
the leading digit roughly 30 percent of the time, while 9 occurs roughly
4.6 percent of the time.
- Benford's Law tends to work well with numerical data
satisfying 3 criteria: the data should measure the size of something
(the Dow Jones Industrial Average, lengths of rivers, populations of
cities, sizes of incomes); the data should not have fixed upper and
lower bounds (other than 0, say) (unlike heights of people, for example,
which are roughly between 36 inches and 84 inches); and the data should
not been assigned in any way (unlike Social Security numbers or credit
card numbers).
- One heuristic explanation for Benford's Law comes from
the stock market. Consider the year-end closing numbers for the Dow
Jones Industrial Average over the past 100 years. Suppose, on average,
that the Dow Jones Industrial Average has gain 8 percent each year (this
is totally made-up). Well, to go from 1000, say, to 2000 requires a 100
percent gain, which would take 10 or 11 years. But to go from 8000 to
9000 requires a 12.5 percent gain, which would take over 1.5 years. So
the DJIA should generally spend a lot more time at levels where the
leading digit is small than at levels where the leading digit is large.
If you formalize this, you get Benford's Law.
- There is a video lecture available
here about Benford's Law.
- A Probablity Puzzle
- Here is a great probability puzzle. You have an adversary who chooses (via any means she wants) two distinct real numbers x and y. She flips a fair coin. If it comes up heads, she tells you what x is. If it comes up tails, she tells you what y is. Your goal is to say if the number she tells you is larger or smaller than her other number. Find a strategy that will succeed over 50% of the time. In fact, your strategy should succeed over 50% of the time even if you tell your adversary the strategy before she picks the numbers.
- Census
- There have been controversial proposals to change the method by which the U.S. Census Bureau conduct the census every decade. The proposals involve the use of statistic sampling to improve the accuracy of the results, a method supported by most statisticians and Democrats and opposed by most Republicans and a few statisticians. A starting place for more information is this article from Science News.
- Baseball Streaks
- Card Shuffling
- There is a great deal to say about the mathematics of card shuffling. Much of it is summarized in a book by Brent Morris called "Magic Tricks, Card Shuffling, and Dynamic Computer Memories," while a Google search will reveal many references, often to articles about or by Persi Diaconis.
- A perfect (out-)shuffle consists of cutting a deck of 52 cards exactly in half, and perfectly interleaving the two halves (with the original top card staying on top and the original bottom card staying on the bottom). It is a curious and surprising fact that only 8 perfect out-shuffles are needed to return the deck to its original position. This can be vividly demonstrated by writing a word in black marker on the side of the deck. The word will become mixed up for seven shuffles and restored on the eighth shuffle (make sure to see what it looks like after 4,6, and 7 shuffles, though). Very few people in the world can perfectly shuffle a full deck of cards (in a smooth way). However, the same trick works for an 18-card deck, and such a deck is significantly easier to shuffle perfectly.
- Mathematical models are all(?) Markov chain models, because the result of a shuffle depends only on the previous state of the deck. Using the GSR model of random riffle shuffling, Bayer and Diaconis showed that asymptotically (3/2) log n shuffles are needed to randomize a deck of n cards pretty well. For a deck of 52 cards, essentially 7 shuffles are needed. You can read about this result in an article entitled "Trailing the Dovetail Shuffle to its Lair."
- There are ways to demonstrate that 3 or 4 shuffles really are not enough. When bridge tournament hands began being computer-dealt, players found the bridge hands had much wilder distributions. It turns out that when too few shuffles are performed, the bridge hands tend to have more boring distributions than expected.
- One nice trick to do is the following. Order a deck of cards as Ace of spades, ..., 2 of spades, A of hearts, ..., 2 of hearts, ..., 2 of clubs. Do three riffle shuffles, cut the deck, take the top card off the deck and put it in the middle 2/3 or so of the deck. Now look at the cards in the deck. Reading from left-to-right, find the Ace of spades, King of spades, ... until you reach the right-most card. Read again from left-to-right, starting with the card coming after the last card you found. Repeat. Exactly once during this procedure you will scan the deck left-to-right and only find one card that is next in the sequence. This is the card that was taken off the top.
- Google PageRank
- Google PageRank algorithm gives a ranking of all webpages on the Internet that is intended to signify the relative importance of the webpages. This is a key factor in determining how webpages are ranked in Google search results. While the full algorithm is no doubt quite complex, the basic idea of the algorithm is described by a Markov chain whose states are the webpages. A quick Google search will show many sites with descriptions of the algorithms; one such site is here.
- NCAA Football Rankings
- For a nice, reasonable way to rank NCAA football teams using a Markov chain algorithm (very similar to Google PageRank), see this page
- Monopoly
- The squares you land on in successive turns in Monopoly can be modeled as a Markov chain. This lets one compute the long-term probability of being on a particular square, which is quite important for developing a good Monopoly strategy. There are several websites with analysis with varying degrees of complexity. Check out the analysis here or here.
|