“Prediction” and “estimation” indeed are sometimes used interchangeably in non-technical writing and they seem to function similarly, but there is a sharp distinction between them in the standard model of a statistical problem. **An estimator uses data to guess at a parameter while a predictor uses the data to guess at some random value that is not part of the dataset.** For those who are unfamiliar with what “parameter” and “random value” mean in statistics, the following provides a detailed explanation.

In this standard model, data are assumed to constitute a (possibly multivariate) observation x of a random variable X whose distribution is known only to lie within a definite set of possible distributions, the “states of nature”. An *estimator* t is a mathematical procedure that assigns to each possible value of x some property t(x) of a state of nature θ, such as its mean μ(θ). Thus **an estimate is a guess about the true state of nature.** We can tell how good an estimate is by comparing t(x) to μ(θ).

A *predictor* p(x) concerns the independent observation of another random variable Z whose distribution is related to the true state of nature. **A prediction is a guess about another random value.** We can tell how good a particular prediction is only by comparing p(x) to the value realized by Z. We hope that *on average* the agreement will be good (in the sense of averaging over all possible outcomes x *and* simultaneously over all possible values of Z).
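
The contrast between the two kinds of comparison can be illustrated with a small simulation. This is only a sketch under assumed conditions that are not part of the discussion above: the data are n independent normal draws, the sample mean serves both as the estimator of the mean μ and as the predictor of a fresh draw Z, and the function name `simulate_mse` and all numeric values are hypothetical.

```python
import random
import statistics

def simulate_mse(mu, sigma, n, reps, seed=0):
    """Monte Carlo mean squared errors of the sample mean used two ways.

    Returns (estimation MSE, prediction MSE):
    - estimation compares the sample mean to the fixed parameter mu;
    - prediction compares it to a fresh, independent draw Z.
    """
    rng = random.Random(seed)
    est_sq, pred_sq = 0.0, 0.0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]  # the dataset
        guess = statistics.fmean(x)                   # t(x) and p(x) share one formula here
        z = rng.gauss(mu, sigma)                      # new random value, not in the dataset
        est_sq += (guess - mu) ** 2                   # compare to the parameter
        pred_sq += (guess - z) ** 2                   # compare to the realized Z
    return est_sq / reps, pred_sq / reps

# Hypothetical values chosen only for illustration.
mu, sigma, n = 10.0, 2.0, 25
est_mse, pred_mse = simulate_mse(mu, sigma, n, reps=10_000)
print(f"estimation MSE ~ {est_mse:.3f} (theory: sigma^2 / n = {sigma**2 / n:.3f})")
print(f"prediction MSE ~ {pred_mse:.3f} (theory: sigma^2 * (1 + 1/n) = {sigma**2 * (1 + 1 / n):.3f})")
```

Averaging over both the datasets and the new draws, as the loop does, is the "on average" sense described above. The prediction error carries an extra σ² that no amount of data removes.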

**Ordinary least squares affords the standard example.** The data consist of pairs (xᵢ, yᵢ) associating values yᵢ of the dependent variable to values xᵢ of the independent variable. The state of nature is specified by three parameters α, β, and σ: it says that each yᵢ is like an independent draw from a normal distribution with mean α+βxᵢ and standard deviation σ. The parameters α, β, and σ are numbers believed to be fixed and unvarying. Interest focuses on α (the intercept) and β (the slope). The OLS estimate, written (α̂, β̂), is good in the sense that α̂ tends to be close to α and β̂ tends to be close to β, *no matter what the true (but unknown) values of α and β might be*.
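
To make the "no matter what the true values might be" claim concrete, here is a minimal sketch that simulates data from this model under a few made-up states of nature and applies the usual closed-form OLS formulas for a single regressor. The helper names `ols_fit` and `simulate_data`, the uniform design for the xᵢ, and every numeric value are assumptions introduced for illustration.

```python
import random

def ols_fit(xs, ys):
    """Closed-form OLS estimates (intercept, slope) for one regressor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def simulate_data(alpha, beta, sigma, n, seed=0):
    """Draw each y_i from Normal(alpha + beta * x_i, sigma); x_i uniform on [0, 10]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
    return xs, ys

# Several hypothetical states of nature: the estimates track each of them.
for alpha, beta in [(1.0, 2.0), (-5.0, 0.5), (100.0, -3.0)]:
    xs, ys = simulate_data(alpha, beta, sigma=1.0, n=2000)
    a_hat, b_hat = ols_fit(xs, ys)
    print(f"true ({alpha}, {beta})  estimated ({a_hat:.3f}, {b_hat:.3f})")
```

In a real problem only the data are available, so the comparison with the true α and β printed here is possible only because the simulation chose them.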

OLS *prediction* concerns a *new* value Z=Y(x) of the dependent variable associated with some value x of the independent variable. Whether x is among the xᵢ in the dataset is immaterial. One intuitively good prediction is that this new value is likely to be close to α̂+β̂x. Better predictions say *just how close* the new value might be (they are called prediction intervals). They account for the fact that α̂ and β̂ are uncertain (because they depend mathematically on the random values (yᵢ)), that σ is not known for certain (and therefore has to be estimated), and that Y(x) is itself random: it is assumed to have a normal distribution with standard deviation σ and mean α+βx (note the absence of any hats!).

Note especially that this prediction has *two separate* sources of uncertainty: uncertainty in the data (xᵢ, yᵢ) leads to uncertainty in the estimated slope, intercept, and residual standard deviation (σ); in addition, there is uncertainty in just what value of Y(x) will occur. This additional uncertainty, which arises because Y(x) is random, characterizes predictions. A prediction may *look* like an estimate (after all, α̂+β̂x *estimates* α+βx 🙂) and may even have the very same mathematical formula (p(x) can sometimes be the same as t(x)), but *it will come with a greater amount of uncertainty than the estimate.*
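
The two sources of uncertainty can be seen side by side in the textbook standard-error formulas for simple regression. The sketch below assumes those standard formulas. The function name `ols_standard_errors` and the six data points are made up for illustration. An interval would be the fitted value plus or minus a Student t quantile (n − 2 degrees of freedom) times the relevant standard error; that quantile is left out here because it is not available in Python's standard library.

```python
import math

def ols_standard_errors(xs, ys, x0):
    """Return (fitted value at x0, SE of the estimated mean, SE of a prediction)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance estimate s^2, using n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    # Uncertainty inherited from the estimated intercept and slope.
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)          # for estimating alpha + beta * x0
    se_pred = math.sqrt(s2 * (1.0 + leverage))  # adds the variance of the new Y(x0)
    return intercept + slope * x0, se_mean, se_pred

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
fit, se_mean, se_pred = ols_standard_errors(xs, ys, x0=3.5)
print(f"fitted value {fit:.3f}, SE as estimate {se_mean:.3f}, SE as prediction {se_pred:.3f}")
```

Both uses share the same point value, but the prediction's squared standard error exceeds the estimate's by exactly s², the estimated variance of the new observation.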

Here, then, in the example of OLS, we see the distinction clearly: an *estimate* guesses at the parameters (which are fixed but unknown numbers), while a *prediction* guesses at the value of a random quantity. The source of potential confusion is that the prediction usually builds on the estimated parameters and might even have the same formula as an estimator.

**In practice, you can distinguish estimators from predictors in two ways:**

1. *purpose*: an estimator seeks to know a property of the true state of nature, while a predictor seeks to guess the outcome of a random variable; and

2. *uncertainty*: a predictor usually has larger uncertainty than a related estimator, due to the added uncertainty in the outcome of that random variable. Well-documented and described predictors therefore usually come with uncertainty bands (prediction intervals) that are wider than the uncertainty bands of estimators, known as confidence intervals. A characteristic feature of prediction intervals is that they can (hypothetically) shrink as the dataset grows, but they will not shrink to zero width, because the uncertainty in the random outcome is “irreducible”. The widths of confidence intervals, by contrast, will tend to shrink to zero, corresponding to our intuition that the precision of an estimate can become arbitrarily good with sufficient amounts of data.
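
The different limiting behavior of the two kinds of interval can be checked numerically. The following sketch simulates OLS data of increasing size and reports approximate 95% half-widths at a fixed point. It rests on several assumptions introduced here for illustration: a made-up state of nature (α=1, β=2, σ=1), a uniform design on [0, 10], and a normal quantile used as a large-sample stand-in for the Student t quantile (a rough approximation at small n). The function name `interval_half_widths` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def interval_half_widths(n, x0=5.0, alpha=1.0, beta=2.0, sigma=1.0, level=0.95, seed=0):
    """Simulate n points, fit OLS, return approximate (CI, PI) half-widths at x0."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    s2 = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    # Normal quantile as a large-sample approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci_half = z * math.sqrt(s2 * leverage)          # confidence interval for the mean
    pi_half = z * math.sqrt(s2 * (1.0 + leverage))  # prediction interval for a new Y(x0)
    return ci_half, pi_half

for n in (10, 100, 1_000, 10_000):
    ci_half, pi_half = interval_half_widths(n)
    print(f"n={n:>6}: CI half-width {ci_half:.3f}, PI half-width {pi_half:.3f}")
```

As n grows, the confidence half-width heads toward zero, while the prediction half-width settles near the normal quantile times σ, which is the irreducible part.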

In applying this to assessing potential investment loss, first consider the purpose: do you want to know how much you might actually lose on *this* investment (or *this* particular basket of investments) during a given period, or are you really just guessing at the *expected* loss (over a large universe of investments, perhaps)? The former is a prediction, the latter an estimate. Then consider the uncertainty. How would your answer change if you had nearly infinite resources to gather data and perform analyses? If it would become very precise, you are probably estimating an expected value, such as the expected loss or return on the investment, whereas if you would remain highly uncertain about the answer, you are making a prediction.