This was originally written on Nov 25, 2013, for the probability theory course I was serving as TA.

Converted from .tex using latex2wp.

In this note, we explain why the conditional expectation $E[X \mid \mathcal{G}]$ of a random variable $X$ given a $\sigma$-field $\mathcal{G}$ can be seen as a “smoothed version of $X$ over $\mathcal{G}$” (in Example 2), and we briefly relate the definition of conditional expectation to the elementary notation $E[X \mid Y = y]$.

**1. Preliminaries**

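Let $(\Omega, \mathcal{F})$ and $(\Omega', \mathcal{F}')$ be two measurable spaces (that is to say, $\mathcal{F}$ is a $\sigma$-field on $\Omega$, and $\mathcal{F}'$ on $\Omega'$), and let $f$ be a mapping from $\Omega$ to $\Omega'$. We say $f$ is $\mathcal{F}/\mathcal{F}'$-measurable if $f^{-1}(A') \in \mathcal{F}$ for every $A' \in \mathcal{F}'$. Usually, when there is no confusion, we abbreviate “$f$ is $\mathcal{F}/\mathcal{F}'$-measurable” as “$f$ is $\mathcal{F}$-measurable”. Another convention is that, when people write $f : (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$, sometimes they are implying that $f$ is $\mathcal{F}/\mathcal{F}'$-measurable (and we’ll adopt this convention in this note).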

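**Example 1** Say $f : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, i.e. assume $f$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable, where $\mathcal{B}(\mathbb{R})$ denotes the Borel $\sigma$-field on the real line. Consider the following figure, where the large rectangle represents $\Omega$, and the small grids represent $\mathcal{F}$, so in this case the $\sigma$-field $\mathcal{F}$ consists of all the small cells, and all combinations of them. In this case, say $A$ is one of the “finest” elements in $\mathcal{F}$, indicated in red (“finest” here means that there is no nonempty proper subset of $A$ that is also in $\mathcal{F}$). Then **$f$ must be constant on $A$**, i.e. there exists some $c \in \mathbb{R}$ such that $f(\omega) = c$ for all $\omega \in A$. This is because, if $f$ could take two values on $A$, say $f(\omega_1) = c_1$ and $f(\omega_2) = c_2$ for some $\omega_1, \omega_2 \in A$ where $c_1 \neq c_2$, then the pullback $f^{-1}(\{c_1\})$ would contain some but not all of $A$ (possibly along with some other part of $\Omega$ outside $A$), illustrated in blue, and hence this pullback cannot be in $\mathcal{F}$ (recall that $\mathcal{F}$ consists of all the small cells, and all combinations of them), which violates the assumption that $f$ is $\mathcal{F}$-measurable. Note that this observation is also true when the target space is a general $(\Omega', \mathcal{F}')$, provided that $\mathcal{F}'$ contains all singletons: when $f : (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$, because $f$ is $\mathcal{F}/\mathcal{F}'$-measurable, $f$ must be constant on each “finest” piece of $\mathcal{F}$.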
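The constancy claim in Example 1 can be checked on a small finite model. The following is only a sketch under assumptions that are not part of the original note: the sample space is taken to be six points, the partition `cells` plays the role of the small grid cells, and the helper names `generated_sigma_field` and `is_measurable` are hypothetical.

```python
from itertools import combinations

def generated_sigma_field(cells):
    """All unions of cells of a finite partition (including the empty union)."""
    events = set()
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            events.add(frozenset().union(*combo))
    return events

def is_measurable(f, omega, sigma_field):
    """Check that the preimage of every value taken by f is an event.

    On a finite space this suffices, because any preimage is a finite
    union of such level sets.
    """
    values = {f(w) for w in omega}
    return all(
        frozenset(w for w in omega if f(w) == v) in sigma_field
        for v in values
    )

# Hypothetical sample space with three "finest" cells of two points each.
omega = range(6)
cells = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
F = generated_sigma_field(cells)

constant_on_cells = lambda w: w // 2   # one value per cell
splits_a_cell = lambda w: w            # two different values inside each cell

print(len(F))
print(is_measurable(constant_on_cells, omega, F))
print(is_measurable(splits_a_cell, omega, F))
```

In this toy model the function that is constant on each cell passes the check, while the function that takes two values inside a cell fails it, because the preimage of a single value is then a strict part of a cell.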

In particular, take $\mathcal{F}$ to be $\sigma(f)$ (recall that $\sigma(f)$ is defined as the smallest $\sigma$-field such that $f$ is measurable). Since $f$ is always $\sigma(f)$-measurable, we know that, roughly speaking, $f$ is constant on each “finest” piece of $\sigma(f)$. To me, this is why people always say that $\sigma(f)$ “contains the information” of $f$: by knowing how “fine” $\sigma(f)$ is, one knows how “complicated” $f$ is, i.e., how many different values $f$ could take.
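On a finite sample space, the “finest” pieces of the $\sigma$-field generated by a map are exactly its level sets, so their number equals the number of distinct values the map takes. The sketch below illustrates this; the sample space, the map `f`, and the helper name `atoms_of_sigma_f` are hypothetical choices made for the example.

```python
from collections import defaultdict

def atoms_of_sigma_f(f, omega):
    """Group sample points by the value of f; each group is a finest piece of sigma(f)."""
    level_sets = defaultdict(set)
    for w in omega:
        level_sets[f(w)].add(w)
    return {value: frozenset(points) for value, points in level_sets.items()}

omega = range(8)
f = lambda w: w % 3   # hypothetical map taking the three values 0, 1, 2

atoms = atoms_of_sigma_f(f, omega)
for value, atom in sorted(atoms.items()):
    print(value, sorted(atom))

# One finest piece per distinct value of f.
print(len(atoms) == len({f(w) for w in omega}))
```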

However, do note that this observation is valid only when $\Omega$ or $\mathcal{F}$ is “discrete” in the sense that you can identify the “finest” grid, like in the above example. Otherwise, if say $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, then we cannot find a useful “finest” piece of $\mathcal{F}$: you can say a single number in $\mathbb{R}$ is a “finest” piece of $\mathcal{B}(\mathbb{R})$, but this won’t be useful, because $f$ will always be constant on a single point in the sample space.

**2. Illustration of conditional expectation**

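(Billingsley, Section 34, P445) Suppose $X$ is an integrable random variable on $(\Omega, \mathcal{F}, P)$, and that $\mathcal{G}$ is a sub $\sigma$-field of $\mathcal{F}$ (i.e. $\mathcal{G} \subset \mathcal{F}$). Then there exists a function on $\Omega$, called the conditional expectation of $X$ given $\mathcal{G}$ and denoted as $E[X \mid \mathcal{G}]$, such that $E[X \mid \mathcal{G}]$ has the following two properties: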

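- $E[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable and integrable;
- $E[X \mid \mathcal{G}]$ satisfies the following equation:

  $$\int_G E[X \mid \mathcal{G}] \, dP = \int_G X \, dP \quad \text{for all } G \in \mathcal{G}.$$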

One can show that the conditional expectation is a.s. unique, i.e., if $Z_1$ and $Z_2$ both satisfy the above two conditions, then $Z_1 = Z_2$ a.s.
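A brief sketch of why uniqueness holds, writing $Z_1$ and $Z_2$ for two functions that satisfy both conditions (the symbols are chosen here for illustration). The set $G_+ = \{Z_1 > Z_2\}$ belongs to $\mathcal{G}$ because both functions are $\mathcal{G}$-measurable, so the defining equation can be applied to it:

$$\int_{G_+} (Z_1 - Z_2) \, dP = \int_{G_+} X \, dP - \int_{G_+} X \, dP = 0.$$

Since the integrand is strictly positive on $G_+$, this forces $P(G_+) = 0$. The same argument applied to $\{Z_1 < Z_2\}$ gives $P(Z_1 \neq Z_2) = 0$.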

In the following example, we illustrate how the conditional expectation $E[X \mid \mathcal{G}]$ can be seen as the “smoothed version of $X$ over $\mathcal{G}$”.

**Example 2** Suppose $X$ is an integrable random variable on $(\Omega, \mathcal{F}, P)$, and that $\mathcal{G}$ is a sub $\sigma$-field of $\mathcal{F}$. The following two figures represent $\mathcal{F}$ and $\mathcal{G}$, respectively. So $\mathcal{F}$ contains all the 24 rectangles with grey edges and all combinations of them, while $\mathcal{G}$ contains all the 6 rectangles with red edges and all combinations of them. Note that $\mathcal{F}$ is a “finer” partition of $\Omega$ than $\mathcal{G}$.

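Now, consider the 4 grey cells $A_1, A_2, A_3, A_4$ in the figure, and call their union $G$. Note that $G \in \mathcal{G}$, and $G$ can be seen as a “finest” piece of $\mathcal{G}$. Denote $Z = E[X \mid \mathcal{G}]$. By definition $Z$ is $\mathcal{G}$-measurable, so $Z$ needs to be constant on $G$ (as explained in Example 1). Also, by definition we need $Z$ to satisfy the equation:

$$\int_G Z \, dP = \int_G X \, dP.$$

Since $X$ is $\mathcal{F}$-measurable, the same reasoning shows that $X$ is constant on each $A_i$. Using that $Z$ is constant on $G$ and $X$ is constant on each $A_i$, we have (with a little abuse of notation, denote the constant value of $Z$ on $G$ as $Z(G)$, and similarly the constant value of $X$ on $A_i$ as $X(A_i)$)

$$Z(G) \, P(G) = \sum_{i=1}^{4} X(A_i) \, P(A_i),$$

i.e., assuming $P(G) > 0$,

$$Z(G) = \sum_{i=1}^{4} X(A_i) \, \frac{P(A_i)}{P(G)}.$$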

This means $Z(G)$ is the average of the 4 values that $X$ takes on the $A_i$’s, with weights proportional to the probabilities $P(A_i)$. This observation extends to the other cells of the $\mathcal{G}$ grid as well. To conclude what we get from this example: on each cell of $\mathcal{G}$, $E[X \mid \mathcal{G}]$ is constant, and the constant value equals the weighted average of the values $X$ can take within this cell. So with this observation, we can say that $E[X \mid \mathcal{G}]$ is a “weighted average version of $X$ over $\mathcal{G}$”.
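The weighted-average description in Example 2 can be reproduced numerically. The sketch below is an illustration only: the probabilities, the values of the random variable, and the layout of the 24 fine cells into 6 coarse blocks of 4 are all hypothetical, and `conditional_expectation` is a helper name introduced here. Exact fractions are used so the defining equation can be checked without rounding error.

```python
from fractions import Fraction

n_cells, block_size = 24, 4

# Hypothetical probabilities of the fine cells (proportional to i + 1)
# and hypothetical values of X on them.
total = sum(range(1, n_cells + 1))
P = [Fraction(i + 1, total) for i in range(n_cells)]
X = [Fraction((3 * i) % 7) for i in range(n_cells)]

# Coarse cells: consecutive groups of four fine cells (an assumed layout).
blocks = [range(b, b + block_size) for b in range(0, n_cells, block_size)]

def conditional_expectation(X, P, blocks):
    """On each block, return the P-weighted average of X over that block."""
    Z = [None] * len(X)
    for block in blocks:
        p_block = sum(P[j] for j in block)
        average = sum(X[j] * P[j] for j in block) / p_block
        for j in block:
            Z[j] = average
    return Z

Z = conditional_expectation(X, P, blocks)

# Defining property: the integrals of Z and X agree on every coarse cell,
# and hence on every union of coarse cells.
for block in blocks:
    assert sum(Z[j] * P[j] for j in block) == sum(X[j] * P[j] for j in block)

print([str(Z[b[0]]) for b in blocks])
```

The resulting `Z` takes only six distinct values at most, one per coarse cell, which is the “smoothing” effect described above.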

More generally, if $\mathcal{G}$ is a more complicated $\sigma$-field than in this example, then instead of calling $E[X \mid \mathcal{G}]$ a “weighted average version of $X$ over $\mathcal{G}$”, we can call it a “smoothed version of $X$ over $\mathcal{G}$”. In other words, $E[X \mid \mathcal{G}]$ tries its best to perform just like $X$, under the constraint that $E[X \mid \mathcal{G}]$ needs to be $\mathcal{G}$-measurable.

One can also substitute $\mathcal{G}$ by $\sigma(Y)$ for some random variable $Y$, assuming $\sigma(Y) \subset \mathcal{F}$, and the logic follows in the same way.

**3. Relation to conditional expectation defined in introductory probability course**

How is the above definition of conditional expectation related to the definition in introductory probability courses, i.e. $E[X \mid Y = y]$?

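Given $(\Omega, \mathcal{F}, P)$, let $X : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, and $Y : (\Omega, \mathcal{F}) \to (\Omega', \mathcal{F}')$, with $E|X| < \infty$. For the usual $E[X \mid Y = y]$, consider it as a function of $y$, and denote it by $g$, i.e., $g(y) = E[X \mid Y = y]$. Now let $h = g \circ Y$, as in the following figure (figure adapted from R. Ash’s Probability and Measure Theory):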

One can show that the random variable $h$ here serves as the conditional expectation $E[X \mid \sigma(Y)]$. For a detailed proof, see R. Ash’s Probability and Measure Theory (2nd edition), Section 5.4, P215-216.
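For a discrete conditioning variable, this claim can be checked directly. The sketch below is a toy illustration under assumptions that are not in the original note: a 12-point sample space with uniform probability, hypothetical random variables `X` and `Y`, `g(y)` standing for the elementary conditional expectation of `X` given `Y = y`, and `h` for the composition of `g` with `Y`.

```python
from fractions import Fraction

# Hypothetical finite sample space with uniform P and two random variables.
omega = range(12)
P = {w: Fraction(1, 12) for w in omega}
X = lambda w: w * w
Y = lambda w: w % 3

def g(y):
    """Elementary E[X | Y = y] = E[X 1{Y = y}] / P(Y = y), assuming P(Y = y) > 0."""
    event = [w for w in omega if Y(w) == y]
    p_event = sum(P[w] for w in event)
    return sum(X(w) * P[w] for w in event) / p_event

h = lambda w: g(Y(w))   # composition of g with Y: a random variable on omega

# h is constant on each level set {Y = y}, so it is measurable with respect to
# the sigma-field generated by Y, and its integral over {Y = y} matches that of X.
for y in {Y(w) for w in omega}:
    event = [w for w in omega if Y(w) == y]
    assert len({h(w) for w in event}) == 1
    assert sum(h(w) * P[w] for w in event) == sum(X(w) * P[w] for w in event)

print({y: str(g(y)) for y in sorted({Y(w) for w in omega})})
```

Because every event in the $\sigma$-field generated by a discrete variable is a union of its level sets, checking the two properties on each level set is enough in this finite setting; the general case needs the measure-theoretic argument cited above.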

**References**

- Billingsley, P. (2008). Probability and measure. John Wiley & Sons.
- Ash, R. B., & Doléans-Dade, C. A. (2000). Probability and measure theory (2nd ed.). Academic Press.