# Illustration of Conditional Expectation: a random variable smoothed on a sigma field

This was originally written on Nov 25, 2013, for a probability theory course for which I was serving as a TA.

Converted from .tex using latex2wp.

In this note, we explain why the conditional expectation of a random variable ${Y}$ given a ${\sigma}$-field ${\mathscr{G}}$ can be seen as a “smoothed version of ${Y}$ over ${\mathscr{G}}$” (in Example 2), and we briefly relate the definition of conditional expectation to the elementary ${E\left(Y\mid X=x\right)}$ notation.

1. Preliminaries

Let ${(\Omega,\mathscr{F})}$ and ${(\Omega',\mathscr{F}')}$ be two measurable spaces (that is to say, ${\mathscr{F}}$ is a ${\sigma}$-field on ${\Omega}$, and ${\mathscr{F}'}$ is one on ${\Omega'}$), and let ${X}$ be a mapping from ${\Omega}$ to ${\Omega'}$. We say ${X}$ is ${\mathscr{F}/\mathscr{F}'}$-measurable if ${X^{-1}(B):=\left\{ \omega\in\Omega:X\left(\omega\right)\in B\right\} \in\mathscr{F}}$ for all ${B\in\mathscr{F}'}$. Usually, when there is no confusion, we abbreviate “${X}$ is ${\mathscr{F}/\mathscr{F}'}$-measurable” to “${X}$ is ${\mathscr{F}}$-measurable”. Another convention is that when people write ${X:\left(\Omega,\mathscr{F}\right)\rightarrow(\Omega',\mathscr{F}')}$, they are sometimes implying that ${X}$ is ${\mathscr{F}/\mathscr{F}'}$-measurable (and we adopt this convention in this note).

Example 1 Say ${X:\left(\Omega,\mathscr{F}\right)\rightarrow\left(\mathbb{R},\mathscr{B}\left(\mathbb{R}\right)\right)}$, i.e., assume ${X}$ is ${\mathscr{F}/\mathscr{B}\left(\mathbb{R}\right)}$-measurable, where ${\mathscr{B}\left(\mathbb{R}\right)}$ denotes the Borel ${\sigma}$-field on the real line. Consider the following figure, where the large rectangle represents ${\Omega}$, and the small grid cells generate ${\mathscr{F}}$, so in this case the ${\sigma}$-field ${\mathscr{F}}$ consists of all the small cells and all combinations (unions) of them. Say ${A}$ is one of the “finest” elements of ${\mathscr{F}}$, indicated in red (“finest” here means that no nonempty proper subset of ${A}$ is also in ${\mathscr{F}}$). Then ${X}$ must be constant on ${A}$, i.e., there exists some ${c\in\mathbb{R}}$ such that ${X(\omega)=c\;\forall\omega\in A}$. This is because, if ${X}$ could take two values on ${A}$, say ${X\left(\omega_{1}\right)=c_{1}}$ and ${X\left(\omega_{2}\right)=c_{2}}$ for some ${\omega_{1},\omega_{2}\in A}$ with ${c_{1}\neq c_{2}}$, then ${X^{-1}\left(\left\{ c_{1}\right\} \right)}$ would contain a proper nonempty part of ${A}$, possibly along with some other part of ${\Omega}$ outside ${A}$ (illustrated in blue), and hence this preimage could not be in ${\mathscr{F}}$ (recall that ${\mathscr{F}}$ consists of all the small cells and all combinations of them), which violates the assumption that ${X}$ is ${\mathscr{F}/\mathscr{B}\left(\mathbb{R}\right)}$-measurable. Note that this observation remains true when the target space is a general ${(\Omega',\mathscr{F}')}$: if ${X:\left(\Omega,\mathscr{F}\right)\rightarrow(\Omega',\mathscr{F}')}$ is ${\mathscr{F}}$-measurable, then ${X}$ must be constant on each “finest” piece of ${\mathscr{F}}$.

In particular, take ${\mathscr{F}}$ to be ${\sigma\left(X\right)}$ (recall that ${\sigma\left(X\right)}$ is defined as the smallest ${\sigma}$-field with respect to which ${X}$ is measurable). Since ${X}$ is always ${\sigma\left(X\right)}$-measurable, we know that, roughly speaking, ${X}$ is constant on each “finest” piece of ${\sigma\left(X\right)}$. To me, this is why people always say that ${\sigma\left(X\right)}$ “contains the information” of ${X}$: by knowing how “fine” ${\sigma\left(X\right)}$ is, one knows how “complicated” ${X}$ is, i.e., how many different values ${X}$ can take.

However, do note that this observation is only valid when ${\sigma\left(X\right)}$ or ${\mathscr{F}}$ is “discrete”, in the sense that you can identify the “finest” cells, as in the above example. If instead, say, ${\left(\Omega,\mathscr{F}\right)=\left(\mathbb{R},\mathscr{B}\left(\mathbb{R}\right)\right)}$, then we cannot find a useful “finest” piece of ${\mathscr{F}}$ – one could say a single point of ${\mathbb{R}}$ is a “finest” piece of ${\mathscr{B}\left(\mathbb{R}\right)}$, but this is not useful, because any ${X}$ is trivially constant on a single point of the sample space.
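To make the “constant on the finest pieces” observation concrete, here is a minimal Python sketch under made-up assumptions (the sample space, the cells, and the maps `X_good` and `X_bad` below are all hypothetical, not from the original note): we generate the ${\sigma}$-field from a partition and check measurability by brute force.

```python
from itertools import chain, combinations

# Hypothetical finite example: Omega has 8 points, and F is generated by
# a partition of Omega into 4 "cells" (the finest elements of F).
omega = list(range(8))
cells = [frozenset({0, 1}), frozenset({2, 3}),
         frozenset({4, 5}), frozenset({6, 7})]

def sigma_field(atoms):
    """All unions of atoms, including the empty set and Omega itself."""
    subsets = chain.from_iterable(combinations(atoms, r)
                                  for r in range(len(atoms) + 1))
    return {frozenset().union(*s) for s in subsets}

F = sigma_field(cells)  # 2^4 = 16 sets

def is_measurable(X, F):
    """For a finitely-valued X, measurability reduces to checking that
    the preimage of each value in the range of X is an element of F."""
    return all(frozenset(w for w in omega if X(w) == c) in F
               for c in {X(w) for w in omega})

X_good = lambda w: w // 2   # constant on each cell -> measurable
X_bad  = lambda w: w        # takes two values on {0, 1} -> not measurable

print(is_measurable(X_good, F))  # True
print(is_measurable(X_bad, F))   # False
```

The failing case is exactly the blue-set argument from Example 1: the preimage ${X^{-1}(\{0\})=\{0\}}$ splits a cell, so it cannot be written as a union of cells.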

2. Illustration of conditional expectation

(Billingsley, Section 34, p. 445) Suppose ${Y}$ is an integrable random variable on ${\left(\Omega,\mathscr{F},P\right)}$, and that ${\mathscr{G}}$ is a sub-${\sigma}$-field of ${\mathscr{F}}$ (i.e. ${\mathscr{G}\subset\mathscr{F}}$). Then there exists a function ${Z:\Omega\rightarrow\mathbb{R}}$, called the conditional expectation of ${Y}$ given ${\mathscr{G}}$ and denoted ${E\left(Y\mid\mathscr{G}\right)}$, such that ${Z}$ has the following two properties:

1. ${Z}$ is ${\mathscr{G}}$-measurable and integrable;
2. ${Z}$ satisfies the following equation:

$\displaystyle \int_{A}Z\, dP=\int_{A}Y\, dP,\quad\forall A\in\mathscr{G}. \ \ \ \ \ (1)$

One can show that the conditional expectation is a.s. unique, i.e., if ${Z}$ and ${W}$ both satisfy the above two conditions, then ${Z=W}$ a.s.

In the following example, we illustrate how the conditional expectation ${E\left(Y\mid\mathscr{G}\right)}$ can be seen as the “smoothed version of ${Y}$ over ${\mathscr{G}}$”.

Example 2 Suppose ${Y}$ is an integrable random variable on ${\left(\Omega,\mathscr{F},P\right)}$, and that ${\mathscr{G}}$ is a sub-${\sigma}$-field of ${\mathscr{F}}$. The following two figures represent ${\mathscr{F}}$ and ${\mathscr{G}}$, respectively. So ${\mathscr{F}}$ contains all of the 24 rectangles with grey edges and all combinations of them, while ${\mathscr{G}}$ contains all of the 6 rectangles with red edges and all combinations of them. Note that ${\mathscr{F}}$ is generated by a “finer” partition of ${\Omega}$ than ${\mathscr{G}}$.

Now, consider the 4 grey cells ${A_{1},A_{2},A_{3},A_{4}}$ in the figure, and call their union ${A=\bigcup_{i=1}^{4}A_{i}}$; note that ${A\in\mathscr{G}}$, and ${A}$ can be seen as a “finest” piece of ${\mathscr{G}}$. Denote ${Z:=E\left(Y\mid\mathscr{G}\right)}$. By definition ${Z}$ is ${\mathscr{G}}$-measurable, so ${Z}$ needs to be constant on ${A}$ (as explained in Example 1). Also, by definition ${Z}$ must satisfy the equation:

$\displaystyle \int_{A}Z\, dP=\int_{A}Y\, dP=\sum_{i=1}^{4}\int_{A_{i}}Y\, dP. \ \ \ \ \ (2)$

Using the facts that ${Z}$ is constant on ${A}$ and ${Y}$ is constant on each ${A_{i}}$, we have (with a slight abuse of notation, denoting the constant value of ${Z}$ on ${A}$ by ${Z(A)}$, and similarly for ${Y}$)

$\displaystyle Z\left(A\right)P\left(A\right)=\sum_{i=1}^{4}Y\left(A_{i}\right)P\left(A_{i}\right), \ \ \ \ \ (3)$

i.e.

$\displaystyle Z\left(A\right)=\frac{1}{P\left(A\right)}\sum_{i=1}^{4}P\left(A_{i}\right)Y\left(A_{i}\right)=\sum_{i=1}^{4}\frac{P\left(A_{i}\right)}{P\left(A\right)}Y\left(A_{i}\right). \ \ \ \ \ (4)$

This means ${Z(A)}$ is the average of the 4 values that ${Y}$ takes on the ${A_{i}}$‘s, with weights proportional to the probabilities of the ${A_{i}}$‘s. This observation extends to the other cells of the grid as well. To conclude what we get from this example: on each cell of ${\mathscr{G}}$, ${E\left(Y\mid\mathscr{G}\right)}$ is constant, and the constant value equals the weighted average of the values ${Y}$ takes within this cell. With this observation, we can say that ${E\left(Y\mid\mathscr{G}\right)}$ is a “weighted average version of ${Y}$ over ${\mathscr{G}}$”.
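Here is a minimal numerical sketch of this computation in Python (the atoms, the probabilities, and the values of ${Y}$ are made up for illustration): we construct ${Z}$ cell by cell via equation (4), and then verify that it satisfies the defining equation (1) for every ${A\in\mathscr{G}}$, not just for the coarse atoms.

```python
from itertools import chain, combinations

# Hypothetical finite model: F is generated by 8 fine atoms {0},...,{7};
# G is generated by 2 coarse atoms, each a union of 4 fine atoms
# (playing the role of A = A1 u A2 u A3 u A4 in Example 2).
coarse_atoms = [frozenset(range(0, 4)), frozenset(range(4, 8))]

P = {i: 1 / 8 for i in range(8)}          # probability of each fine atom
Y = {0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0,      # Y is constant on each fine atom
     4: 2.0, 5: 2.0, 6: 4.0, 7: 4.0}

# Equation (4): on each coarse atom A, Z equals the P-weighted average of Y.
Z = {}
for A in coarse_atoms:
    pA = sum(P[i] for i in A)
    zA = sum(P[i] * Y[i] for i in A) / pA
    for i in A:
        Z[i] = zA

# Check the defining equation (1) on every A in G, i.e., on every
# union of coarse atoms (including the empty set).
subsets = chain.from_iterable(combinations(coarse_atoms, r)
                              for r in range(len(coarse_atoms) + 1))
for s in subsets:
    A = frozenset().union(*s)
    lhs = sum(Z[i] * P[i] for i in A)   # integral of Z over A
    rhs = sum(Y[i] * P[i] for i in A)   # integral of Y over A
    assert abs(lhs - rhs) < 1e-12

print(sorted(set(Z.values())))  # [3.0, 4.0]: the constant values of E(Y|G)
```

Note that it suffices to check (1) on the coarse atoms: every ${A\in\mathscr{G}}$ is a disjoint union of them, and both integrals are additive over disjoint sets.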

More generally, if ${\mathscr{G}}$ is a more complicated ${\sigma}$-field than in this example, then when considering ${E\left(Y\mid\mathscr{G}\right)}$, instead of calling it a “weighted average version of ${Y}$ over ${\mathscr{G}}$”, we can call it a “smoothed version of ${Y}$ over ${\mathscr{G}}$”. In other words, ${E\left(Y\mid\mathscr{G}\right)}$ tries its best to behave just like ${Y}$, under the constraint that ${E\left(Y\mid\mathscr{G}\right)}$ must be ${\mathscr{G}}$-measurable.

One can also substitute ${\sigma\left(X\right)}$ for ${\mathscr{G}}$, for some random variable ${X}$ with ${\sigma\left(X\right)\subset\mathscr{F}}$, and the same logic applies.

3. Relation to conditional expectation defined in introductory probability course

How is the above definition of conditional expectation related to the definition in introductory probability courses, i.e. ${E\left(Y\mid X=x\right)}$?

Given ${\left(\Omega,\mathscr{F},P\right)}$, let ${Y:\left(\Omega,\mathscr{F}\right)\rightarrow\left(\mathbb{R},\mathscr{B}\left(\mathbb{R}\right)\right)}$ and ${X:\left(\Omega,\sigma\left(X\right)\right)\rightarrow\left(\Omega',\mathscr{F}'\right)}$, with ${\sigma\left(X\right)\subset\mathscr{F}}$. For the usual ${E\left(Y\mid X=x\right)}$, consider it as a function of ${x}$ and denote it by ${g\left(x\right)}$, i.e., ${g:\Omega'\rightarrow\mathbb{R}}$, ${g\left(x\right)=E\left(Y\mid X=x\right)}$. Now let ${h:=g\circ X:\Omega\rightarrow\mathbb{R}}$, ${h\left(\omega\right)=g\left(X\left(\omega\right)\right)}$, as in the following figure (figure adapted from R. Ash’s Probability and Measure Theory):

One can show that the random variable ${h}$ here serves as the conditional expectation ${E\left(Y\mid\sigma\left(X\right)\right)}$. For a detailed proof, see R. Ash’s Probability and Measure Theory (2nd edition), Section 5.4, pp. 215–216.
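As a sanity check, here is a small Python sketch of this construction on a made-up discrete example (the probabilities and the maps ${X}$ and ${Y}$ below are hypothetical): we compute ${g\left(x\right)=E\left(Y\mid X=x\right)}$ in the elementary way, set ${h=g\circ X}$, and verify that ${h}$ satisfies the defining equation (1) on ${\sigma\left(X\right)}$.

```python
# Hypothetical discrete experiment: omega determines both X and Y;
# X takes values in Omega' = {'a', 'b'}.
omega_probs = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
X = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}
Y = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

def g(x):
    """Elementary conditional expectation g(x) = E(Y | X = x)."""
    px = sum(p for w, p in omega_probs.items() if X[w] == x)
    return sum(Y[w] * p for w, p in omega_probs.items() if X[w] == x) / px

# h = g o X is a function of omega, and it is sigma(X)-measurable by
# construction: it is constant on each atom {X = x} of sigma(X).
h = {w: g(X[w]) for w in omega_probs}

# Verify equation (1) on each atom {X = x}; since every A in sigma(X)
# is a union of such atoms and integrals are additive, this suffices.
for x in set(X.values()):
    A = [w for w in omega_probs if X[w] == x]
    lhs = sum(h[w] * omega_probs[w] for w in A)
    rhs = sum(Y[w] * omega_probs[w] for w in A)
    assert abs(lhs - rhs) < 1e-12

print({x: round(g(x), 4) for x in sorted(set(X.values()))})
# {'a': 1.6667, 'b': 3.5714}
```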

References

1. Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.
2. Ash, R. B., & Doléans-Dade, C. A. (2000). Probability and Measure Theory (2nd ed.). Academic Press.