For example, suppose we consider tossing a fair die. There are six possible numbers that could come up ("outcomes"), and, since the die is fair, each one is equally likely to occur. So we say each of these outcomes has probability 1/6. Since the event "an odd number comes up" consists of exactly three of these basic outcomes, we say the probability of "odd" is 3/6, i.e. 1/2.

More generally, if we have a situation (a "random process") in which there are n equally likely outcomes, and the event A consists of exactly m of these outcomes, we say that the probability of A is m/n. We may write this as "P(A) = m/n" for short.
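The m/n rule above can be sketched in a few lines of Python. This is only an illustration: the function name `classical_probability` is made up for this example, and the code assumes that every outcome passed in is equally likely.

```python
from fractions import Fraction


def classical_probability(event, outcomes):
    """Return m/n for an event over n equally likely outcomes.

    n is the number of possible outcomes; m is the number of those
    outcomes that belong to the event.
    """
    outcomes = set(outcomes)
    favorable = outcomes & set(event)  # outcomes that make up the event
    return Fraction(len(favorable), len(outcomes))


die = range(1, 7)   # the six faces of a fair die
odd = {1, 3, 5}     # the event "an odd number comes up"

print(classical_probability(odd, die))  # 1/2
```

Using `Fraction` keeps the answer exact (3/6 reduces to 1/2) rather than turning it into a rounded decimal.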

This perspective has the advantage that it is conceptually simple for many situations. However, it is limited, since many situations do not have finitely many equally likely outcomes. Tossing a weighted die is an example where we have finitely many outcomes, but they are not equally likely. Studying people's incomes over time would be a situation where we need to consider infinitely many possible outcomes, since there is no way to say what a maximum possible income would be, especially if we are interested in the future.

To get the idea, suppose that we have a die which we are told is weighted, but we don't know how it is weighted. We could get a rough idea of the probability of each outcome by tossing the die a large number of times and using the proportion of times that the die gives that outcome to estimate the probability of that outcome.

This idea is formalized to define the probability of the event A as

P(A) = the limit as n approaches infinity of m/n,

where n is the number of times the process (e.g., tossing the die) is performed, and m is the number of times the event A happens.

(Notice that m and n stand for different things in this definition from what they meant in Perspective 1.)

In other words, imagine tossing the die 100 times, 1000 times, 10,000 times, ... . Each time we expect to get a better and better approximation to the true probability of the event A. The mathematical way of describing this is that the true probability is the limit of the approximations, as the number of tosses "approaches infinity" (that just means that the number of tosses gets bigger and bigger indefinitely). Example
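The thought experiment of tossing a die more and more times can be imitated with a short simulation. The sketch below is an illustration under stated assumptions: the weighting (face 6 three times as likely as each other face) is invented for the example, as are the function name `estimate_probability` and the random seed. With that weighting the true probability of a 6 is 3/8 = 0.375, so the printed estimates can be compared against it.

```python
import random


def estimate_probability(event, weights, n_tosses, rng):
    """Estimate P(event) as m/n from simulated tosses of a weighted die.

    n is the number of tosses; m is the number of tosses whose result
    belongs to the event.
    """
    faces = [1, 2, 3, 4, 5, 6]
    tosses = rng.choices(faces, weights=weights, k=n_tosses)
    m = sum(1 for face in tosses if face in event)
    return m / n_tosses


# Hypothetical weighting: face 6 is three times as likely as any other face.
weights = [1, 1, 1, 1, 1, 3]
rng = random.Random(0)  # fixed seed so the run is repeatable

for n in (100, 1_000, 10_000, 100_000):
    print(n, estimate_probability({6}, weights, n, rng))
```

In a real experiment the weights would be unknown; they appear here only so that the simulated die has something to follow.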

This view of probability generalizes the first view: If we indeed have a fair die, we expect that the number we will get from this definition is the same as we will get from the first definition (e.g., P(getting 1) = 1/6; P(getting an odd number) = 1/2). In addition, this second definition also works for cases when outcomes are not equally likely, such as the weighted die. It also works in cases where it doesn't make sense to talk about the probability of an individual outcome. For example, we may consider randomly picking a positive integer (1, 2, 3, ...) and ask, "What is the probability that the number we pick is odd?" Intuitively, the answer should be 1/2, since every other integer (when counted in order) is odd. To apply this definition, we consider randomly picking 100 integers, then 1000 integers, then 10,000 integers, and so on. Each time we calculate what fraction of these chosen integers are odd. The resulting sequence of fractions should give better and better approximations to 1/2.

However, the empirical perspective does have some disadvantages. First, it involves a thought experiment. In some cases, the experiment could never in practice be carried out more than once. Consider, for example, the probability that the Dow Jones average will go up tomorrow. There is only one today and one tomorrow. Going from today to tomorrow is not at all like rolling a die. We can only imagine all possibilities of going from today to a tomorrow (whatever that means). We can't actually get an approximation.

A second disadvantage of the empirical perspective is that it leaves open the question of how large n has to be before we get a good approximation. The example linked above shows that, as n increases, we may have some wobbling away from the true value, followed by some wobbling back toward it, so it's not even a steady process.

The empirical view of probability is the one that is used in most statistical inference procedures. These are called frequentist statistics. The frequentist view is what gives credibility to standard estimates based on sampling. For example, if we choose a large enough random sample from a population (for example, if we randomly choose a sample of 1000 students from the population of all 50,000 students enrolled in the university), then the average of some measurement (for example, college expenses) for the sample is a reasonable estimate of the average for the population.
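The sampling claim above can be tried out on made-up data. In the sketch below, the population of 50,000 "college expense" values is entirely hypothetical (drawn from a normal distribution with an invented mean and spread), and the seed is arbitrary; the point is only to compare the average of a random sample of 1000 with the average of the whole population.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: yearly expenses (in dollars) for 50,000 students.
# The mean of 20,000 and spread of 5,000 are invented for illustration.
population = [rng.gauss(20_000, 5_000) for _ in range(50_000)]

# A simple random sample of 1000 students, drawn without replacement.
sample = rng.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"difference:      {abs(sample_mean - population_mean):.2f}")
```

In practice the population mean is unknown, which is why the sample mean is used to estimate it; it is computed here only so the two can be compared.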

However, subjective probability also has its downsides. First, since it is subjective, one person's probability (e.g., that the Dow Jones will go up tomorrow) may differ from another's. This is disturbing to many people. Still, it models the reality that people often do differ in their judgments of probability.

The second downside is that subjective probabilities must obey certain "coherence" (consistency) conditions in order to be workable. For example, if you believe that the probability that the Dow Jones will go up tomorrow is 60%, then to be consistent you cannot believe that the probability that the Dow Jones will go down tomorrow is also 60%. It is easy to fall into subjective probabilities that are not coherent.
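One simple coherence condition can be checked mechanically: beliefs about outcomes that cannot happen together, and that cover every possibility, should each lie between 0 and 1 and should add up to 1. The sketch below checks only that one condition; the function name `is_coherent`, the "unchanged" outcome, and all of the numbers are assumptions made for illustration.

```python
import math


def is_coherent(beliefs):
    """Check one basic coherence condition on subjective probabilities.

    `beliefs` maps mutually exclusive outcomes that together cover every
    possibility to the probability a person assigns to each.
    """
    in_range = all(0 <= p <= 1 for p in beliefs.values())
    sums_to_one = math.isclose(sum(beliefs.values()), 1.0)
    return in_range and sums_to_one


# "Up" and "down" both at 60% cannot be held together: the total exceeds 1.
incoherent = {"up": 0.6, "down": 0.6, "unchanged": 0.0}
coherent = {"up": 0.6, "down": 0.35, "unchanged": 0.05}

print(is_coherent(incoherent))  # False
print(is_coherent(coherent))    # True
```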

The subjective perspective of probability fits well with Bayesian statistics, which are an alternative to the more common frequentist statistical methods. (This website will mainly focus on frequentist statistics.)

The axiomatic perspective says that probability is any function (we'll call it P) from events to numbers satisfying the three conditions (axioms) below. (Just what constitutes events will depend on the situation where probability is being used.)

The three axioms of probability:

- 0 ≤ P(E) ≤ 1 for every allowable event E. (In other words, 0 is the smallest allowable probability and 1 is the largest allowable probability.)
- The certain event has probability 1. (The certain event is the event "some outcome occurs." For example, in rolling a die, the certain event is "One of 1, 2, 3, 4, 5, 6 comes up." In considering the stock market, the certain event is "The Dow Jones either goes up or goes down or stays the same.")
- The probability of the union of mutually exclusive events is the sum of the probabilities of the individual events. (Two events are called mutually exclusive if they cannot both occur simultaneously. For example, the events "the die comes up 1" and "the die comes up 4" are mutually exclusive, assuming we are talking about the same toss of the same die. The union of events is the event that at least one of the events occurs. For example, if E is the event "a 1 comes up on the die" and F is the event "an even number comes up on the die," then the union of E and F is the event "the number that comes up on the die is either 1 or even.")
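For a small, finite set of outcomes, the three axioms can be checked by brute force. The sketch below is one way to do that, under assumptions: events are taken to be all subsets of the outcomes, the third axiom is checked for pairs of mutually exclusive events (which is enough when there are finitely many outcomes), and the names `make_probability` and `satisfies_axioms` are invented for this example.

```python
from fractions import Fraction
from itertools import combinations


def make_probability(weights):
    """Build a function P on events from integer weights on the outcomes."""
    total = sum(weights.values())

    def P(event):
        return sum((Fraction(weights[o], total) for o in event), Fraction(0))

    return P


def satisfies_axioms(P, outcomes):
    """Check the three axioms for P, treating every subset as an event."""
    outcomes = list(outcomes)
    events = [frozenset(c)
              for r in range(len(outcomes) + 1)
              for c in combinations(outcomes, r)]
    # Axiom 1: every probability lies between 0 and 1.
    axiom1 = all(0 <= P(e) <= 1 for e in events)
    # Axiom 2: the certain event (all outcomes together) has probability 1.
    axiom2 = P(frozenset(outcomes)) == 1
    # Axiom 3: probabilities of mutually exclusive events add.
    axiom3 = all(P(e | f) == P(e) + P(f)
                 for e in events for f in events if not (e & f))
    return axiom1 and axiom2 and axiom3


fair = make_probability({face: 1 for face in range(1, 7)})
print(satisfies_axioms(fair, range(1, 7)))               # True
print(fair({1}) + fair({2, 4, 6}) == fair({1, 2, 4, 6}))  # True
```

The second printed line is the union example from the third axiom: "a 1 comes up" and "an even number comes up" are mutually exclusive, so their probabilities add.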

P(1 comes up) + P(2 comes up) + ... + P(6 comes up) = P(the certain event),

which is 1 (by Axiom 2). Since all six probabilities on the left are equal, that common probability must be 1/6.