M362K Probability, Fall 2007, Special Topics Instructor: Geir Helleloid

Main Page | Lecture Postmortem | Homework | Course Documents | Special Topics | Calendar | My Homepage

  • The Birthday "Paradox"
  • The Monte Hall Problem
    • Many people obtain the wrong answer to the Monty Hall problem because their probabilistic intuition has fail them. In fact, many columnists, including Keith Devlin and Marilyn vos Savant, upon presenting the problem and its solution in their columns, have found their inboxes deluged with reader emails (some from mathematicians!) insisting they were wrong. You can find Keith Devlin's columns about the Monty Hall problem here and here.
  • The Problem of Points
  • Coin-flipping
    • A paper by Persi Diaconis, Susan Holmes, and Richard Montgomery, titled "Dynamical Bias in the Coin Toss", available here, discusses the physics of coin flipping. The authors show that under a certain physical model of coin flipping, the side of the coin facing up initially has a greater than 50% chance of facing up when the coin lands (due to the concept of precession, which often arises in astronomy when discussing planetary motion). They also run experiments suggesting that this physical model is an accurate description of what happens when people actually flip coins. The physics in the middle of the paper is not light reading, but I found the beginning and end of the paper to be very readable and interesting.
  • SET
    • The card game SET is published by SET Enterprises . The company website has a list of stores that sell SET decks. The mathematics of SET runs quite deep, and a Google search will reveal several sites and papers that explain some aspects of it.
    • SET can be played online. This is one possible website.
  • The Kevin Bacon Game
    • At the Oracle of Bacon, computer scientists maintain a interface to the IMDB which computes the distance between any two actors, which is the smallest number of links required to connect them, where two actors are linked if they have appeared in a movie together. Classically, one plays this game using Kevin Bacon as one actor. This raises the question of which actor averages the smallest distance to any other actor in the IMDB. The answer is Rod Steiger, while Kevin Bacon only ranks 1049.
    • A similar concept is used in mathematics with Paul Erdos and coauthorship of papers. Erdos wrote around 1500 papers and had 511 coauthors. My Erdos number is 5. Collaboration distances in math can be computed via this website
  • Simpson's Paradox
    • Simpson's Paradox is a phenomenon that occurs often in data analysis. Data seems to lead to one conclusion (e.g., treatment A is more effective than treatment B), but there is a hidden factor, and when the data is separated into groups according to the hidden factor, the opposite conclusion arises. This is related to sample bias and selection bias.
    • There are compelling real examples of Simpson's Paradox at Wikipedia, in a paper of Tom Moore's, the Exploring Data website, and Darci Kracht's website. The examples come from such varied sources as: kidney stone treatments; racial bias in death penalty decisions; airline ontime flight data; death rates in smokers; sex bias in Berkeley graduate school admissions; and batting averages in baseball.
  • Mean vs. Median
    • An article in the New York Times by Gina Kolata discusses the mathematical issues surrounded data on the mean number of sex partners reported by men and women.
    • An article on Slate.com by Jordan Ellenberg points out that one needs to be careful to make a distinction between the median and the mean, and that while the mean sex partners for men and women must agree, the medians can be different.
    • I point out that they are both missing a crucial assumption, which is that the number of men and women are the same. Granted, in most populations this is roughly the case, so the means must be close to each other, but both articles attempt to state mathematical theorems, so significantly more care should have been taken by both authors.
  • (Bayesian) Spam Filters
  • The Professor, the Prosecutor, and the Blonde
    • Keith Devlin and Gary Lorden have written a book entitled "The Numbers Behind NUMB3RS: Solving Crimes with Mathematics". Keith Devlin wrote an abridged excerpt from the book for his column, Devlin's Angle, on the 1964 assault conviction of a couple based on a ludicrous probabilistic argument by the prosecution.
    • The seriously fascinating judicial decision reversing the conviction can be found here
  • Card-shuffling machines
    • Ivars Peterson has a MathTrek column here about the mathematics of card-shuffling machines. In particular, he discusses how Persi Diaconis and Susan Holmes were asked to analyze the prototype of a card-shuffling machine, and they showed that it did a poor job of randomizing the deck of cards. To demonstrate this, Holmes developed a very simple card guessing game in which the expected number of correct guesses after one use of the shuffling machine was about twice the number of correct guesses when done with a totally randomized deck. This shift in odds is easily enough for a blackjack player to develop a betting system to "beat" the casino.
  • Optimizing your life partner
    • This page gives a solution to the following problem: Suppose there are n objects, one of which is better than all the others in some way. The objects are presented to you one at a time, and you get precisely one opportunity to choose the object currently being presented to you. How do you maximize your chances of choosing the best object? The above link states this problem in the somewhat sexist context of "Optimizing your wife", in which the "objects" are potential wives that you will meet over the course of your life. The analysis, while perfectly correctly mathematically, has questionable real-life applications. But it is not impossible to imagine a situation (say, house-hunting in a really tight market) in which this type of strategy might be reasonable.
  • Stable marriages
    • The stable matching, or stable marriage, algorithm is actually used to match up hospitals with residents. In its simplest form, the problem assumes that each of N hospitals ranks the residents in order of preference and that each of N residents ranks the hospitals in order of preference. We want to pair up hospitals and residents so that there does not exist an unmatched hospital A and a resident B so that hospital A prefers resident B to its resident and resident B prefers hospital A to his/her hospital (such a matching is unstable in the sense that the resident and hospital would be tempting to make a private deal).
    • The program is called the National Resident Match Program.
    • Wikipedia has a page describing the problem.
  • DNA profiling
    • The discussion for this special topics centers around the accuracy of DNA matching, the chance that two people's DNA will result in the same test data, and the admissibility of certain DNA evidence in court.
    • These issues are discussed in this article and this article by Keith Devlin and this writeup by Dave Trautman.
  • Benford's Law (or How to Embezzle Better)
    • Benford's Law is a probability distribution on the digits 1-9 that often approximates the distribution of the leading non-zero digits in a list of numbers. In this distribution, 1 occurs as the leading digit roughly 30 percent of the time, while 9 occurs roughly 4.6 percent of the time.
    • Benford's Law tends to work well with numerical data satisfying 3 criteria: the data should measure the size of something (the Dow Jones Industrial Average, lengths of rivers, populations of cities, sizes of incomes); the data should not have fixed upper and lower bounds (other than 0, say) (unlike heights of people, for example, which are roughly between 36 inches and 84 inches); and the data should not been assigned in any way (unlike Social Security numbers or credit card numbers).
    • One heuristic explanation for Benford's Law comes from the stock market. Consider the year-end closing numbers for the Dow Jones Industrial Average over the past 100 years. Suppose, on average, that the Dow Jones Industrial Average has gain 8 percent each year (this is totally made-up). Well, to go from 1000, say, to 2000 requires a 100 percent gain, which would take 10 or 11 years. But to go from 8000 to 9000 requires a 12.5 percent gain, which would take over 1.5 years. So the DJIA should generally spend a lot more time at levels where the leading digit is small than at levels where the leading digit is large. If you formalize this, you get Benford's Law.
    • There is a video lecture available here about Benford's Law.
  • A Probablity Puzzle
    • Here is a great probability puzzle. You have an adversary who chooses (via any means she wants) two distinct real numbers x and y. She flips a fair coin. If it comes up heads, she tells you what x is. If it comes up tails, she tells you what y is. Your goal is to say if the number she tells you is larger or smaller than her other number. Find a strategy that will succeed over 50% of the time. In fact, your strategy should succeed over 50% of the time even if you tell your adversary the strategy before she picks the numbers.
  • Census
    • There have been controversial proposals to change the method by which the U.S. Census Bureau conduct the census every decade. The proposals involve the use of statistic sampling to improve the accuracy of the results, a method supported by most statisticians and Democrats and opposed by most Republicans and a few statisticians. A starting place for more information is this article from Science News.
  • Baseball Streaks
  • Card Shuffling
    • There is a great deal to say about the mathematics of card shuffling. Much of it is summarized in a book by Brent Morris called "Magic Tricks, Card Shuffling, and Dynamic Computer Memories," while a Google search will reveal many references, often to articles about or by Persi Diaconis.
    • A perfect (out-)shuffle consists of cutting a deck of 52 cards exactly in half, and perfectly interleaving the two halves (with the original top card staying on top and the original bottom card staying on the bottom). It is a curious and surprising fact that only 8 perfect out-shuffles are needed to return the deck to its original position. This can be vividly demonstrated by writing a word in black marker on the side of the deck. The word will become mixed up for seven shuffles and restored on the eighth shuffle (make sure to see what it looks like after 4,6, and 7 shuffles, though). Very few people in the world can perfectly shuffle a full deck of cards (in a smooth way). However, the same trick works for an 18-card deck, and such a deck is significantly easier to shuffle perfectly.
    • Mathematical models are all(?) Markov chain models, because the result of a shuffle depends only on the previous state of the deck. Using the GSR model of random riffle shuffling, Bayer and Diaconis showed that asymptotically (3/2) log n shuffles are needed to randomize a deck of n cards pretty well. For a deck of 52 cards, essentially 7 shuffles are needed. You can read about this result in an article entitled "Trailing the Dovetail Shuffle to its Lair."
    • There are ways to demonstrate that 3 or 4 shuffles really are not enough. When bridge tournament hands began being computer-dealt, players found the bridge hands had much wilder distributions. It turns out that when too few shuffles are performed, the bridge hands tend to have more boring distributions than expected.
    • One nice trick to do is the following. Order a deck of cards as Ace of spades, ..., 2 of spades, A of hearts, ..., 2 of hearts, ..., 2 of clubs. Do three riffle shuffles, cut the deck, take the top card off the deck and put it in the middle 2/3 or so of the deck. Now look at the cards in the deck. Reading from left-to-right, find the Ace of spades, King of spades, ... until you reach the right-most card. Read again from left-to-right, starting with the card coming after the last card you found. Repeat. Exactly once during this procedure you will scan the deck left-to-right and only find one card that is next in the sequence. This is the card that was taken off the top.
  • Google PageRank
    • Google PageRank algorithm gives a ranking of all webpages on the Internet that is intended to signify the relative importance of the webpages. This is a key factor in determining how webpages are ranked in Google search results. While the full algorithm is no doubt quite complex, the basic idea of the algorithm is described by a Markov chain whose states are the webpages. A quick Google search will show many sites with descriptions of the algorithms; one such site is here.
  • NCAA Football Rankings
    • For a nice, reasonable way to rank NCAA football teams using a Markov chain algorithm (very similar to Google PageRank), see this page
  • Monopoly
    • The squares you land on in successive turns in Monopoly can be modeled as a Markov chain. This lets one compute the long-term probability of being on a particular square, which is quite important for developing a good Monopoly strategy. There are several websites with analysis with varying degrees of complexity. Check out the analysis here or here.