3.1 Point Estimation
(Casella & Berger, Chapter 7; Wasserman, Chapter 6.1)
Definition 3.1 Let \(\{X_i\}\), \(i = 1, \dots, n\), be a sample. A point estimator is a function \(g(X_1, \dots, X_n)\) of the sample.
The purpose of a point estimator is to provide the “best guess” of some quantity of interest. That quantity could be a parameter in a parametric model, a CDF, a PDF, …
Typically, the quantity of interest is denoted by \(\theta\) and the point estimator by \(\hat \theta\) or \(\hat \theta_n\). Combined with the above definition, \[ \hat \theta_n = g(X_1, \dots, X_n).\] Note that \(\hat \theta_n\) is itself a random variable, since it is a function of the sample data, which are random variables themselves.
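For instance, the sample mean is a point estimator of the population mean. The short NumPy sketch below (the Normal\((2, 1)\) population and the sample size \(n = 50\) are arbitrary choices for illustration) shows that \(\hat \theta_n\) is indeed random: two independent samples give two different values of \(g(X_1, \dots, X_n)\).

```python
import numpy as np

rng = np.random.default_rng(0)

def g(sample):
    """A point estimator: here, the sample mean as a guess for the population mean."""
    return np.mean(sample)

# Two independent samples of size n = 50 from the same Normal(2, 1) population:
# the estimator theta_hat_n = g(X_1, ..., X_n) takes a different value on each.
x1 = rng.normal(loc=2.0, scale=1.0, size=50)
x2 = rng.normal(loc=2.0, scale=1.0, size=50)
print(g(x1), g(x2))
```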
Of course, an estimator can be systematically off from the target. A way to measure this bias is to compare the expected value of \(\hat \theta_n\) with the true value of the quantity of interest \(\theta\).
Definition 3.2 The bias of an estimator is defined by \[ b_\theta(\hat \theta_n) = \mathbb{E}_\theta(\hat \theta_n) - \theta. \] We say that \(\hat \theta_n\) is unbiased if \(b_\theta(\hat \theta_n) = 0\).
We also define the variance of an estimator by \[ \mathbb{V}_\theta(\hat \theta_n) = \mathbb{E}_\theta (\hat \theta_n - \mathbb{E}_\theta (\hat \theta_n))^2 .\]
The standard error (\(\mathrm{se}\) for short sometimes) is then \[ \mathrm{se}(\hat \theta_n) = \sqrt{\mathbb{V}_\theta (\hat \theta_n)}. \]
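When the sampling distribution of \(\hat \theta_n\) is not available in closed form, the bias, variance, and standard error can be approximated by simulation. As an illustration (the Normal\((0, 2^2)\) population and \(n = 20\) below are arbitrary choices), the plug-in variance estimator \(\hat \theta_n = \frac{1}{n}\sum_i (X_i - \bar X)^2\) has exact bias \(-\sigma^2/n = -0.2\) here, which the Monte Carlo estimate should reproduce.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 200_000
theta = 4.0                      # true variance of a Normal(0, 2^2) population

# Plug-in variance estimator: theta_hat = (1/n) * sum (X_i - X_bar)^2
samples = rng.normal(0.0, 2.0, size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)

bias = theta_hat.mean() - theta            # b_theta(theta_hat); exactly -theta/n = -0.2 here
variance = theta_hat.var()                 # V_theta(theta_hat)
se = np.sqrt(variance)                     # standard error
print(f"bias ≈ {bias:.3f}, variance ≈ {variance:.3f}, se ≈ {se:.3f}")
```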
Classically, unbiased estimators received a lot of attention, since people wanted estimates free of systematic error. Modern statistics takes a different point of view: with large data sets, a small bias matters little as long as the estimator converges to the true quantity of interest as the sample size grows. This gives rise to the following definition.
Definition 3.3 A point estimator \(\hat \theta_n\) of a parameter \(\theta\) is consistent if \(\hat \theta_n\) converges to \(\theta\) in probability.
Here comes the million-dollar question: how do we measure how good an estimator is?
One possible approach is to use the so-called mean squared error.
Definition 3.4 The mean squared error of an estimator is defined by \[ \mathrm{MSE} = \mathbb{E}_\theta (\theta - \hat \theta_n)^2 \,.\]
Theorem 3.1 (Bias-Variance decomposition) \[ \mathrm{MSE} = b_\theta^2(\hat \theta_n) + \mathbb{V}_\theta(\hat \theta_n) \]
Exercise 3.1 Prove the Bias-Variance decomposition.
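The decomposition can also be checked numerically; this is not a proof, only a sanity check. Reusing the plug-in variance estimator from the earlier sketch (same arbitrary Normal\((0, 2^2)\) population and \(n = 20\)), the two sides of Theorem 3.1 agree:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, theta = 20, 200_000, 4.0          # same setup as the earlier sketch

samples = rng.normal(0.0, 2.0, size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)    # plug-in variance estimator

mse = np.mean((theta_hat - theta) ** 2)    # Monte Carlo estimate of E_theta (theta - theta_hat)^2
bias_sq = (theta_hat.mean() - theta) ** 2  # b_theta^2(theta_hat)
variance = theta_hat.var()                 # V_theta(theta_hat)

print(f"MSE ≈ {mse:.4f}")
# Matches MSE up to floating-point rounding, mirroring Theorem 3.1.
print(f"bias^2 + variance ≈ {bias_sq + variance:.4f}")
```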
Theorem 3.2 If, as \(n\to \infty\), \(b_\theta^2(\hat \theta_n) \to 0\) and \(\mathbb{V}_\theta(\hat \theta_n) \to 0\), then \(\hat\theta_n\) is consistent.
Exercise 3.2 Prove the above theorem.
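The plug-in variance estimator illustrates Theorem 3.2: it is biased for every finite \(n\) (its bias is \(-\sigma^2/n\)), but both its bias and its variance vanish as \(n \to \infty\), so it is consistent. The sketch below (with an arbitrary Normal\((0, 2^2)\) population and tolerance \(\varepsilon = 0.1\)) shows \(\mathbb{P}(|\hat \theta_n - \theta| < \varepsilon)\) approaching 1 as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 4.0                      # true variance of a Normal(0, 2^2) population
eps = 0.1                        # tolerance for "close to theta"

# Plug-in variance estimator: biased (bias = -theta / n), but its bias and
# variance both vanish as n grows, so Theorem 3.2 gives consistency.
for n in (10, 100, 1000, 10000):
    samples = rng.normal(0.0, 2.0, size=(2000, n))
    theta_hat = samples.var(axis=1, ddof=0)
    prob_close = np.mean(np.abs(theta_hat - theta) < eps)
    print(f"n = {n:>5}: P(|theta_hat - theta| < {eps}) ≈ {prob_close:.3f}")
```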
A big part of elementary statistics deals with estimators whose distributions are approximately Normal.
Definition 3.5 An estimator is said to be asymptotically Normal if \[ \frac{\hat \theta_n - \theta}{\mathrm{se}(\hat \theta_n)} \to N(0,1) \] in distribution.
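For example, the sample mean of an Exponential population is asymptotically Normal even though the population itself is skewed. The sketch below (population mean \(0.5\) and \(n = 500\) are arbitrary choices) standardizes \(\hat \theta_n\) by its exact standard error and checks that the result behaves like \(N(0, 1)\):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 20_000
theta = 0.5                                      # true mean of an Exponential population with scale 0.5

samples = rng.exponential(scale=0.5, size=(reps, n))
theta_hat = samples.mean(axis=1)                 # sample mean as the estimator
se = 0.5 / np.sqrt(n)                            # exact standard error: sd(X) / sqrt(n)
z = (theta_hat - theta) / se

# If asymptotic Normality holds, z should behave like N(0, 1).
print("mean of z ≈", round(float(z.mean()), 3), "  variance of z ≈", round(float(z.var()), 3))
print("P(z <= 1.96) ≈", round(float(np.mean(z <= 1.96)), 3), "  (N(0,1) would give 0.975)")
```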