COMMON MISTEAKS MISTAKES IN USING STATISTICS: Spotting and Avoiding Them

# Random Variables and Probability Distributions

I. In most applications, a random variable can be thought of as a variable whose value depends on the outcome of a random process. Here are some examples to help explain the concepts involved:

1. Toss a die and look at what number is on the side that lands up.
Tossing the die is an example of a random process; the number on top is the random variable.

2. Toss two dice and take the sum of the numbers that land up.
Tossing the dice is the random process; the sum is the random variable.

3. Toss two dice and take the product of the numbers that land up.
Tossing the dice is the random process; the product is the random variable.

Examples 2 and 3 together show that the same random process can be involved in two different random variables.
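
As a minimal sketch of Examples 2 and 3, the two-dice toss can be enumerated once and then used to build both random variables. The sketch assumes fair six-sided dice, so that all 36 ordered outcomes are equally likely; the names `outcomes`, `distribution`, `sum_dist`, and `product_dist` are illustrative and not part of the original text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The random process: tossing two dice.
# Assumption: the dice are fair, so each of the 36 ordered outcomes is equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def distribution(variable):
    """Return {value: probability} for a random variable defined on the outcomes."""
    counts = Counter(variable(a, b) for a, b in outcomes)
    return {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}

# Two different random variables built on the same random process.
sum_dist = distribution(lambda a, b: a + b)      # Example 2
product_dist = distribution(lambda a, b: a * b)  # Example 3

print("P(sum = 7)     =", sum_dist[7])
print("P(product = 6) =", product_dist[6])
```

The list `outcomes` is shared, but the two dictionaries differ in both their possible values and their probabilities, which is the sense in which one random process supports two different random variables.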

4. Randomly pick (in a way that gives each student an equal chance of being chosen) a UT student and measure their height.
Picking the student is the random process; their height is the random variable.

5. Randomly pick (in a way that gives each student an equal chance of being chosen) a student in a particular class and measure their height.
Picking the student is the random process; their height is the random variable.

Examples 4 and 5 illustrate that using the same variable (in this case, height) but different random processes (in this case, choosing from different populations) gives different random variables.
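
The point of Examples 4 and 5 can be sketched with made-up numbers. The heights below are hypothetical (they are not real data from any university or class), and the helper name `height_distribution` is an assumption introduced only for this illustration.

```python
from fractions import Fraction

# Hypothetical heights in inches; these numbers are invented for illustration only.
whole_population = [62, 64, 66, 66, 68, 68, 68, 70, 72, 74]
one_class = [66, 68, 68, 70]  # stands in for "a particular class"

def height_distribution(population):
    """Probability of each height when every member is equally likely to be picked."""
    n = len(population)
    return {h: Fraction(population.count(h), n) for h in sorted(set(population))}

dist_population = height_distribution(whole_population)
dist_class = height_distribution(one_class)

print("P(height = 68) picking from the whole population:", dist_population[68])
print("P(height = 68) picking from the class:           ", dist_class[68])
```

The measured quantity is height in both cases, but because the student is drawn from a different group, the probabilities attached to each height differ, so the two random variables are not the same.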

6. Measure the height of the third student who walks into this class.
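Which student happens to walk in third is the random process; that student's height is the random variable. In all the examples before this one, the random process was carried out deliberately; in Example 6, the random process is one that occurs naturally (see Note 1).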

Because Examples 5 and 6 involve different random processes, they give different random variables.

7. Toss a coin and see whether it comes up heads or tails.
Tossing the coin is the random process; the outcome (heads or tails) is the random variable.

Example 7 shows that a random variable doesn't necessarily have to take on numerical values.

II. Usually, some values of a random variable occur more frequently than others. For example, if we are talking about heights of university students, heights of around 5' 7" are much more common than heights of around 4' or around 7'. In other words, some values of the random variable occur with higher probability than others. This can be represented graphically by the probability distribution of the random variable. For example, a random variable might have a probability distribution that looks like this:

[Figure: a probability distribution curve, highest near the value 2 and low for large values such as 12 to 14.]

The possible values for the random variable are along the horizontal axis. The height of the curve above a possible value roughly tells how likely the nearby values are. This particular distribution tells us that values of the random variable around 2 (where the curve is highest) are most common, and that very large values (where the curve is lowest) are uncommon. More precisely, the area under the curve between two values a and b is the probability that the random variable will take on values between a and b. In this example, we can see that the value of the random variable is much more likely to lie between 2 and 4 (where the curve is high) than between 12 and 14 (where the curve is low).
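
The "area under the curve" statement can be checked numerically. The sketch below does not reproduce the curve from the original figure; it assumes a stand-in density (a gamma curve with shape 2 and scale 2), chosen only because it is highest at 2 and low for large values. The function names `density` and `area` are likewise illustrative.

```python
import math

def density(x):
    """Stand-in density (assumed, not the original figure): gamma with shape 2, scale 2.

    Its formula is x * exp(-x / 2) / 4 for x >= 0, and it is highest at x = 2.
    """
    return x * math.exp(-x / 2) / 4 if x >= 0 else 0.0

def area(a, b, steps=10_000):
    """Approximate the area under the density between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    total += sum(density(a + i * width) for i in range(1, steps))
    return total * width

p_near_peak = area(2, 4)   # probability of a value between 2 and 4
p_in_tail = area(12, 14)   # probability of a value between 12 and 14
p_total = area(0, 100)     # nearly the whole curve; should be very close to 1

print(f"P(2 < X < 4)   is about {p_near_peak:.4f}")
print(f"P(12 < X < 14) is about {p_in_tail:.4f}")
print(f"Total area     is about {p_total:.4f}")
```

Both intervals have the same width, yet the area over the interval near the peak is far larger than the area over the interval in the tail, which is what "more likely to lie between 2 and 4 than between 12 and 14" means.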

Notes:
1. See footnote 1 on the page What Is a Random Sample?