6.2 Maximum Likelihood Estimation

Definition 6.1 Suppose \(X_1, \dots, X_n \sim f_\theta\). The likelihood function is defined by \[ \mathcal{L}_n(\theta) = \prod_{i = 1}^n f (X_i; \theta) \,. \] The log-likelihood function is defined by \[ \ell_n (\theta) = \ln \mathcal{L}_n (\theta) \,. \]

The maximum likelihood estimator (MLE), denoted by \(\hat \theta_n\), is the value of \(\theta\) that maximizes \(\mathcal{L}_n(\theta)\). Since \(\ln\) is strictly increasing, \(\hat \theta_n\) also maximizes \(\ell_n(\theta)\), which is usually easier to work with.

Notation. Another common notation for the likelihood function is \[ L(\theta| X) = \mathcal{L}_n(\theta).\]
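As a minimal numerical illustration (the model, data, and grid below are just illustrative choices, not part of the examples that follow), here is a sketch that evaluates the log-likelihood of an \(\mathrm{Exponential}(\lambda)\) sample on a grid and locates its maximizer, which should land near the closed-form MLE \(1/\bar X_n\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: an Exponential(lambda) sample with assumed true rate 2.0.
x = rng.exponential(scale=1 / 2.0, size=500)

def log_likelihood(lam, x):
    """Log-likelihood of an Exponential(lambda) sample: n*ln(lam) - lam*sum(x)."""
    return len(x) * np.log(lam) - lam * np.sum(x)

# Crude grid search over candidate values of lambda.
grid = np.linspace(0.1, 5.0, 2000)
values = [log_likelihood(lam, x) for lam in grid]
lam_hat = grid[int(np.argmax(values))]

print("grid-search MLE:", lam_hat)
print("closed-form MLE 1/xbar:", 1 / x.mean())
```

The two printed values should agree up to the grid resolution; in practice one would maximize \(\ell_n\) analytically or with a numerical optimizer rather than a grid.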

Example 6.3 Let \(X_1, \dots, X_n\) be a sample from \(\mathrm{Bernoulli}(p)\). Use MLE to find an estimator for \(p\).
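A sketch of the standard computation: since \(f(x; p) = p^x (1-p)^{1-x}\) for \(x \in \{0, 1\}\), \[ \ell_n(p) = \sum_{i=1}^n \left[ X_i \ln p + (1 - X_i) \ln (1 - p) \right], \qquad \ell_n'(p) = \frac{\sum_i X_i}{p} - \frac{n - \sum_i X_i}{1 - p}. \] Setting the derivative to zero gives \(\hat p_n = \frac{1}{n} \sum_{i=1}^n X_i = \bar X_n\).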

Example 6.4 Let \(X_1, \dots, X_n\) be a sample from \(N(\theta, 1)\). Use MLE to find an estimator for \(\theta\).
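A sketch: \[ \ell_n(\theta) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \sum_{i=1}^n (X_i - \theta)^2, \qquad \ell_n'(\theta) = \sum_{i=1}^n (X_i - \theta). \] Setting the derivative to zero gives \(\hat \theta_n = \bar X_n\).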

Exercise 6.2 Let \(X_1, \dots, X_n\) be a sample from \(\mathrm{Uniform}(0,\theta)\), where \(\theta > 0\).

  1. Find the MLE for \(\theta\).

  2. Find an estimator by the method of moments.

  3. Compute the mean and the variance of the two estimators above.

Theorem 6.2 Let \(\tau = g(\theta)\) be a bijective function of \(\theta\). Suppose that \(\hat \theta_n\) is the MLE of \(\theta\). Then \(\hat \tau_n = g(\hat \theta_n)\) is the MLE of \(\tau\).
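For instance, in the Bernoulli model the MLE of \(p\) is \(\bar X_n\), and \(g(p) = p/(1-p)\) is a bijection from \((0,1)\) to \((0,\infty)\), so the MLE of the odds \(\psi = p/(1-p)\) is \(\hat \psi_n = \bar X_n / (1 - \bar X_n)\).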

To discuss the consistency of the MLE, we define the Kullback-Leibler distance between two pdfs \(f\) and \(g\):

\[ D(f,g) = \int f(x) \ln \left( \frac{f(x)}{g(x)} \right) \, dx.\]

Abusing notation, we will write \(D(\theta, \varphi)\) to mean \(D(f(x;\theta), f(x;\varphi))\).

We say that a model \(\mathcal{F}\) is identifiable if \(\theta \not= \varphi\) implies \(D(\theta, \varphi) > 0\).
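For example, in the \(N(\theta, 1)\) model a direct computation gives \[ D(\theta, \varphi) = \frac{(\theta - \varphi)^2}{2}, \] which is strictly positive whenever \(\theta \not= \varphi\), so the model is identifiable.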

Theorem 6.3 Let \(\theta_{\star}\) denote the true value of \(\theta\). Define \[ M_n(\theta)=\frac{1}{n} \sum_{i=1}^n \ln \frac{f\left(X_i ; \theta\right)}{f\left(X_i ; \theta_{\star}\right)} \] and \(M(\theta)=-D\left(\theta_{\star}, \theta\right)\). Suppose that \[ \sup _{\theta \in \Theta}\left|M_n(\theta)-M(\theta)\right| \to 0 \] in probability and that, for every \(\epsilon>0\), \[ \sup _{\theta:|\theta-\theta_{\star}| \geq \epsilon} M(\theta)<M\left(\theta_{\star}\right) . \]

Let \(\widehat{\theta}_n\) denote the MLE. Then \(\widehat{\theta}_n \to \theta_{\star}\) in probability.
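A minimal simulation sketch of this consistency statement (the model, true value, and sample sizes below are assumptions made for illustration): in the \(N(\theta, 1)\) model the MLE is the sample mean, and it settles near \(\theta_{\star}\) as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star = 1.5  # assumed true value for the demo

# In the N(theta, 1) model the MLE is the sample mean; it should approach theta_star.
for n in (10, 100, 1_000, 10_000, 100_000):
    sample = rng.normal(loc=theta_star, scale=1.0, size=n)
    print(n, sample.mean())
```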

Exercise 6.3 Let \(X_1, \ldots, X_n\) be a random sample from a distribution with density: \[ p(x; \theta) = \theta x^{-2}, \quad 0 < \theta \leq x < \infty. \]

  1. Find the MLE for \(\theta\).

  2. Find the Method of Moments estimator for \(\theta\).

Exercise 6.4 Let \(X_1, \ldots, X_n \sim \text{Poisson}(\lambda)\).

  1. Find the method of moments estimator, the maximum likelihood estimator, and the Fisher information \(I(\lambda)\).

  2. Use the fact that the mean and variance of the Poisson distribution are both \(\lambda\) to propose two unbiased estimators of \(\lambda\).

  3. Show that one of these estimators has a larger variance than the other.