GLOSSARY

 

Bias. The difference between the expected value of an estimator and the true quantity being estimated. For example, if Y is a function of the data that estimates an unknown parameter θ, the bias of Y is E(Y) - θ.
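
As an illustration (an assumed example, not from the handbook), bias can be approximated by simulation. The sketch below uses the maximum likelihood estimator of a normal variance, which divides by n rather than n - 1; its true bias is -σ²/n.

    # Estimating the bias of an estimator by simulation (assumed values).
    import numpy as np

    rng = np.random.default_rng(1)
    sigma2, n = 4.0, 10                      # true variance and sample size
    # np.var divides by n by default (ddof=0), giving the biased MLE.
    estimates = [np.var(rng.normal(0.0, np.sqrt(sigma2), n))
                 for _ in range(100_000)]
    print(np.mean(estimates) - sigma2)       # approximately -sigma2/n = -0.4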

 

Bin. A group of values of a continuous variable, used to partition the data into subsets. For example, event dates can be grouped so that each year is one bin, and all the events during a single year form a subset of the data.

 

c.d.f. Cumulative distribution function, typically denoted F.

 

Cell. When the data are expressed in a table of counts, a cell is the smallest element of the table. Each cell has an observed count and, under some null hypothesis, an expected count. Each cell can be analyzed on its own, and then compared to the other cells to see if the data show trends, patterns, or other forms of nonhomogeneity. In a 1 × J table, as with events in time, each cell corresponds to one subset of the data. In a 2 × J table, as with failures on demand, each data subset corresponds to two cells, one cell for failures and one for successes.

 

Confidence interval. In the frequentist approach, a 100p% confidence interval has a probability p of containing the true unknown parameter. This is a property of the procedure, not of any one particular interval. Any one interval either does or does not contain the true parameter. However, any random data set leads to a confidence interval, and 100p% of these contain the true parameter.
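
The coverage property can be checked by simulation. The sketch below (illustrative values, not from the handbook) builds a 90% confidence interval for a normal mean with known standard deviation and counts how often the interval contains the true mean.

    # Coverage of a 90% confidence interval, checked by simulation.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    mu, sigma, n = 5.0, 2.0, 25               # assumed true values
    z = stats.norm.ppf(0.95)                  # 5% in each tail
    covered = 0
    trials = 10_000
    for _ in range(trials):
        xbar = rng.normal(mu, sigma, n).mean()
        half = z * sigma / np.sqrt(n)         # known-sigma interval
        covered += (xbar - half <= mu <= xbar + half)
    print(covered / trials)                   # close to 0.90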

 

Conjugate. A family of prior distributions is conjugate, for data from a specified distribution, if a prior distribution in the family results in the posterior distribution also being in the family. A prior distribution in the conjugate family is called a conjugate prior. For Poisson data, the gamma distributions are conjugate. For binomial data, the beta distributions are conjugate.
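
As a minimal sketch with assumed numbers: for Poisson data with x events in exposure time t and a gamma(α, β) prior on the event rate λ (β a rate parameter), the posterior is gamma(α + x, β + t).

    # Gamma-Poisson conjugate update (assumed prior and data values).
    from scipy import stats

    alpha, beta = 1.5, 10.0          # assumed prior shape and rate
    x, t = 3, 50.0                   # assumed data: 3 events in 50 time units
    # scipy parameterizes the gamma by shape and scale = 1/rate.
    posterior = stats.gamma(alpha + x, scale=1.0 / (beta + t))
    print(posterior.mean())          # posterior mean (alpha + x)/(beta + t) = 0.075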

 

Credibility interval. In the Bayesian approach, a 100p% credibility interval contains 100p% of the Bayesian probability distribution. For example, if q has been estimated by a posterior distribution, the 5th and 95th percentiles of this distribution contain 90% of the probability, so they form a (posterior) 90% credibility interval. It is not required to have equal probability in the two tails (5% in this example), although it is very common. For example, the interval bounded by 0 and the 90th percentile would also be a 90% credibility interval, a one-sided interval. Bayes credibility intervals have the same intuitive purpose as frequentist confidence intervals, but their definitions and interpretations are different.
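
A sketch with assumed numbers: if the posterior distribution for a failure probability p is beta(2, 40), the 5th and 95th percentiles bound a two-sided 90% credibility interval.

    # 90% credibility interval from posterior percentiles (assumed posterior).
    from scipy import stats

    posterior = stats.beta(2, 40)
    lower, upper = posterior.ppf([0.05, 0.95])   # 5th and 95th percentiles
    print(lower, upper)                          # two-sided 90% interval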

 

Cumulative distribution function (c.d.f.). For a random variable X, the c.d.f. F(x) = Pr(X ≤ x). If X is discrete, such as a count of events, the c.d.f. is a step function, with a jump at each possible value of X. If X is continuous, such as a duration time, the c.d.f. is continuous.

 

Density. A function that is integrated to yield a probability for a continuously distributed random variable. If X has density f, then Pr(a ≤ X ≤ b) = ∫ f(x)dx, with the integral taken from a to b. The density f is related to the c.d.f. F by f(x) = F′(x) and F(x) = ∫ f(u)du, with the integral taken from -∞ to x. The density is sometimes referred to as the p.d.f., the probability density function.
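
As a quick numerical check (using the standard normal as an assumed example), integrating the density over (a, b) reproduces F(b) - F(a):

    # Integrating a density gives a probability (standard normal example).
    from scipy import stats, integrate

    a, b = -1.0, 2.0
    prob_from_density, _ = integrate.quad(stats.norm.pdf, a, b)
    prob_from_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)
    print(prob_from_density, prob_from_cdf)   # the two values agree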

 

Duration. The time until something of interest happens, such as failure to run, recovery from a failure, restoration of offsite power, etc.

 

Estimate, estimator. In the frequentist approach, an estimator is a function of random data, and an estimate is the particular value taken by the estimator for a particular data set. That is, the term estimator is used for the random variable, and estimate is used for a number. The usual convention of using upper case letters for random variables and lower case letters for numbers is often ignored in this setting, so the context must be used to show whether a random variable or a number is being discussed.

 

Event rate. See failure rate for repairable systems, and replace the word “failure” by “event.”

 

Expected value. The mean, or average, value of a random variable. If X is a discrete random variable, taking values xᵢ with probability f(xᵢ) = Pr(X = xᵢ), the expected value of X is E(X) = Σᵢ xᵢf(xᵢ). If X is a continuously distributed random variable with density f, the expected value of X is E(X) = ∫ xf(x)dx, with the integral taken over the range of X. The expected value of X is also called the expectation of X or the mean of X.
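
A minimal sketch of the discrete case, with an assumed toy distribution:

    # Expected value of a discrete random variable: sum of value * probability.
    values = [0, 1, 2]
    probs = [0.5, 0.3, 0.2]
    print(sum(x * p for x, p in zip(values, probs)))   # 0.7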

 

Exposure time. The length of time during which the events of interest can possibly occur. The units must be specified, such as reactor-critical-years, site-calendar-hours, or system-operating-hours. Also called time at risk.

 

Failure on demand. Failure when a standby system is demanded, even though the system was apparently ready to function just before the demand. It is modeled as a random event, having some probability, but unpredictable on any one specific demand. Compare standby failure.

 

Failure rate. For a repairable system, the failure rate, λ, is such that λΔt is approximately the expected number of failures in a short time period from t to t + Δt. If simultaneous failures do not occur, λΔt is also approximately the probability that a failure will occur in the period from t to t + Δt. In this setting, λ is also called a failure frequency. For a nonrepairable system, λΔt is approximately the probability that an unfailed system at time t will fail in the time period from t to t + Δt. In this setting, λ is also called the hazard rate.

 

Frequency. For a repairable system, frequency and rate are two words with the same meaning, and are used interchangeably. If simultaneous events do not occur, the frequency satisfies λ(t)Δt ≈ Pr(an event occurs between t and t + Δt), for small Δt.

 

Hazard rate. For a nonrepairable system, hazard rate and failure rate are two phrases with the same meaning, used interchangeably. The hazard rate h(t) satisfies h(t)Δt ≈ Pr(t < T ≤ t + Δt | T > t), where Δt is small and T denotes the duration time of interest.

 

Hypothesis. If the evidence against the null hypothesis, H₀, is strong, H₀ is rejected in favor of the alternative hypothesis, H₁. If the evidence against H₀ is not strong, H₀ is “accepted”; that is, it is not necessarily believed, but it is given the benefit of the doubt and is not rejected.

 

Improper distribution. A function that is treated as a probability distribution function (p.d.f.), but which is not a p.d.f. because it does not have a finite integral. For example, a uniform distribution (constant p.d.f.) on an infinite range is improper. Improper distributions are sometimes useful prior distributions, as long as the resulting posterior distribution is a proper distribution.

 

Interval. The notation (a, b) denotes the interval of all points from a to b. This is enough for all the applications in this handbook. However, sometimes an additional refinement is added, giving a degree of mathematical correctness that most readers may ignore: The standard notation in mathematics is that (a, b) includes the points between a and b, but not the two end points. In set notation, it is {x | a < x < b}. Square brackets show that the end points are included. Thus, (a, b] includes b but not a, {x | a < x £ b}.

 

Likelihood. For discrete data, the likelihood is the probability of the observations. For continuous data, the likelihood is the joint density of the observations, which is the product of the densities of the individual observations if the observations are independent. When some of the observations are discrete and some are continuous, the likelihood is the product of the two types.

 

Maximum likelihood estimation. A general technique for estimating unknown parameters. Suppose that X is a random variable with distribution governed by a parameter θ. Given data, consider the likelihood as a function of θ. Estimate θ by the value that makes the likelihood largest (maximizes the likelihood). This estimate, a function of the data, is often denoted θ̂. Maximum likelihood estimation typically yields a simple and intuitive formula. With large data sets, the maximum likelihood estimator is known to have good properties, such as being asymptotically unbiased and asymptotically minimum variance. With small data sets, the estimate is not necessarily very good. The acronym MLE is used both for the maximum likelihood estimate and the maximum likelihood estimator, depending on the context (see estimate, estimator).
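
A sketch with assumed numbers: for x events observed in exposure time t under a Poisson model, the likelihood is maximized at λ̂ = x/t, and a numerical maximization recovers the same value.

    # Maximum likelihood for a Poisson event rate (assumed data values).
    from scipy import optimize, stats

    x, t = 7, 100.0                          # 7 events in 100 hours (assumed)
    neg_log_lik = lambda lam: -stats.poisson.logpmf(x, lam * t)
    result = optimize.minimize_scalar(neg_log_lik, bounds=(1e-6, 1.0),
                                      method="bounded")
    print(result.x, x / t)                   # numerical MLE vs closed form 0.07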

 

Mean. For a random variable X, the mean is the expectation. For a set of observations, the mean, also called the sample mean, is the average of the observed values.

 

Mean square error (MSE). The expected squared difference between an estimator and the true quantity being estimated. For example, if Y is a function of the data that estimates a parameter θ, the mean square error (MSE) of Y is E[(Y - θ)²]. It can be shown that MSE(Y) = var(Y) + [bias(Y)]².
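
The identity MSE(Y) = var(Y) + [bias(Y)]² can be checked by simulation; the sketch below reuses the biased variance estimator from the bias entry (assumed values).

    # Checking MSE = variance + bias^2 by simulation (assumed example).
    import numpy as np

    rng = np.random.default_rng(3)
    sigma2, n = 4.0, 10
    y = np.array([np.var(rng.normal(0.0, 2.0, n)) for _ in range(200_000)])
    mse = np.mean((y - sigma2) ** 2)
    bias = np.mean(y) - sigma2
    print(mse, np.var(y) + bias ** 2)        # agree up to simulation error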

 

Median. The 50th percentile.

 

Moments. The expected values of specified powers of a random variable. The moments are numbers that help characterize the distribution of the random variable. The first moment of a random variable X is the mean, E(X). In general, the ith moment is E(Xⁱ). For i > 1, the ith central moment is defined as E[(X - E(X))ⁱ]. The second central moment is called the variance, and its square root is called the standard deviation, often denoted σ. The skewness is the third central moment divided by σ³. The kurtosis is the fourth central moment divided by σ⁴. Moments, and simple functions of moments such as the standard deviation, skewness, and kurtosis, are the most commonly used numbers for characterizing probability distributions. The other commonly used numbers are percentiles.
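
As a sketch, scipy reports these moment-based summaries directly; note that its kurtosis is the excess kurtosis, i.e., the ratio above minus 3. A gamma distribution with shape 2 is used as an assumed example.

    # Mean, variance, skewness, and (excess) kurtosis of a gamma distribution.
    from scipy import stats

    mean, var, skew, excess_kurt = stats.gamma(a=2.0).stats(moments="mvsk")
    print(mean, var, skew, excess_kurt)      # 2, 2, sqrt(2), 3 for shape 2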

 

Nonrepairable system. A system that can only fail once, after which data collection stops. An example is a standby safety system, if the failure to run cannot be recovered during the mission of the system. Data from a nonrepairable system consist of data from identical copies of the system, not collected in any particular sequence. For example, data from a safety system may be collected, with each run starting with the system nominally operable, and the system either running to completion of the mission or failing before that time. The successive demands to run are regarded as demands on identical copies of the system, having no time sequence. See repairable system.

 

Outage, outage time. An outage is an event when a system is unavailable, out of service for some reason. The outage time is the duration of the event. Compare unavailability.

 

p.d.f. Probability distribution function for a discrete random variable, or probability density function for a continuous random variable; in either case typically denoted f.

 

Percentile. Consider a continuous distribution with density (p.d.f.) f and cumulative distribution function (c.d.f.) F. The 100qth percentile is the value x such that F(x) = q, or equivalently, ∫ f(u)du = q with the integral taken from -∞ to x. If the distribution is concentrated on the positive line, the lower limit of integration may be replaced by 0. The 100qth percentile is equal to the qth quantile. For example, the 95th percentile equals the 0.95 quantile. If X has a discrete distribution, a percentile may not be unique. The 100qth percentile is defined as x such that Pr(X ≤ x) ≥ 100q% and Pr(X ≥ x) ≥ 100(1 - q)%. Similarly, for a finite sample, the 100qth percentile is defined as x such that at least 100q% of the values in the sample are x or smaller, and at least 100(1 - q)% are x or larger. For example, if a sample is a set of three numbers, {1.2, 2.5, 5.9}, the median (corresponding to q = 0.5) is 2.5, because at least half of the numbers are 2.5 or smaller and at least half are 2.5 or larger. If the sample has four numbers, {1.2, 2.5, 2.8, 5.9}, then any number from 2.5 to 2.8 can be considered a median. In this case, the average, (2.5 + 2.8)/2, is often chosen.
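
A sketch of both settings above: the continuous case solves F(x) = q (the ppf, or inverse c.d.f.), and the sample cases reproduce the medians from the text.

    # Percentiles of a continuous distribution and of small samples.
    import numpy as np
    from scipy import stats

    print(stats.norm.ppf(0.95))              # 95th percentile, standard normal
    print(np.median([1.2, 2.5, 5.9]))        # 2.5, as in the text
    print(np.median([1.2, 2.5, 2.8, 5.9]))   # 2.65 = (2.5 + 2.8)/2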

 

Poisson process. A process in which events (such as failures) occur in a way that satisfies the three assumptions given in Section 4.2.2 and Appendix A5 (homogeneous Poisson process) or the assumptions given in Section 8.2.2 (nonhomogeneous Poisson process). The number of events in any time period is a Poisson random variable.

 

Posterior distribution. A distribution that quantifies, in a Bayesian way, the belief about a parameter after data have been observed. It reflects both the prior belief and the observed data.

 

Power of a test. The probability that the test will reject H₀ when H₀ is false. If many possible alternatives to H₀ are considered, the power depends on the particular alternative.

 

Pr( ). The probability function. Pr(event) is the probability that the event occurs.

 

Prior. A colloquial abbreviation for prior distribution.

 

Prior distribution. A distribution that quantifies, in a Bayesian way, the belief about a parameter before any data have been observed.

 

Probability density function (p.d.f.). For a continuous random variable X, the p.d.f. f satisfies f(x)Δx ≈ Pr(x < X ≤ x + Δx) for small Δx. The p.d.f. is related to the c.d.f. by f(x) = F′(x), the derivative.

 

Probability distribution function (p.d.f.). For a discrete random variable X, the p.d.f. f(x) = Pr(X = x).

 

p-value. In the context of testing, the p-value is the significance level at which the data just barely cause H₀ to be rejected. H₀ is rejected when a test statistic is extreme, and the p-value is the probability (under H₀) that the random test statistic would be at least as extreme as actually observed.
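
A sketch with assumed numbers: under H₀ the event rate is 0.05 per hour, so in t = 100 hours the expected count is 5. Observing 10 events gives the one-sided p-value Pr(X ≥ 10 | H₀).

    # One-sided p-value for a Poisson count (assumed rate and data).
    from scipy import stats

    mu = 0.05 * 100.0                  # expected count under H0
    p_value = stats.poisson.sf(9, mu)  # Pr(X >= 10) = Pr(X > 9)
    print(p_value)                     # about 0.03; reject H0 at the 0.05 level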

 

Quantile. Consider a continuous distribution with density (p.d.f.) f and cumulative distribution function (c.d.f.) F. The qth quantile is the value x such that F(x) = q, or equivalently, ∫ f(u)du = q with the integral taken from -∞ to x. If the distribution is concentrated on the positive line, the lower limit of integration may be replaced by 0. The qth quantile is equal to the (100q)th percentile. For example, the 0.95 quantile equals the 95th percentile. If X has a discrete distribution, a quantile may not be unique. See percentile for a fuller explanation.

 

Random sample. x₁, ..., xₙ are a random sample if they are the observed values of X₁, ..., Xₙ, where the Xᵢ are statistically independent of each other and all have the same distribution.

 

Rate. See frequency.

 

Relative standard deviation. The standard deviation, expressed as a fraction of the mean. The relative standard deviation of X is st.dev.(X)/E(X). Some authors call it the coefficient of variation, and express it as a percent.

 

Relative variance. The square of the relative standard deviation. The relative variance of X is var(X)/[E(X)]2.
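
A sketch covering this entry and the previous one, using a gamma distribution with shape 4 as an assumed example:

    # Relative standard deviation and relative variance (assumed distribution).
    from scipy import stats

    X = stats.gamma(a=4.0)
    print(X.std() / X.mean())        # relative standard deviation: 0.5
    print(X.var() / X.mean() ** 2)   # relative variance: 0.25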

 

Renewal process. A process in which events (such as failures or restorations) occur in a way such that the times between events are independent and identically distributed. For example, if the process consists of failures and nearly instantaneous repairs, each repair restores the system to good-as-new condition.

 

Repairable system. A system that can fail repeatedly. Each failure is followed by repair, and the possibility of another failure sooner or later. An example is a power plant, with initiating events counted as the “failures.” After such an event, the plant is brought back up to its operating condition, and more initiating events can eventually occur. The sequence of events is important, and can reveal whether long-term degradation or improvement occurs. See nonrepairable system.

 

Residual. When a model is fitted to data, the residual for a data point is the data value minus the fitted value (the estimated mean). The residuals together can be used to quantify the overall scatter of the data around the fitted model. If the assumed model assigns different variances to different data points, the standardized residuals are sometimes constructed. A standardized residual is the ordinary residual divided by its estimated standard deviation.
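
A sketch with assumed counts and exposure times: under a pooled Poisson model the fitted value for a bin is λ̂t, its estimated standard deviation is the square root of λ̂t, and the standardized residual divides by that.

    # Residuals and standardized residuals for Poisson counts (assumed data).
    import numpy as np

    counts = np.array([3.0, 12.0, 7.0])         # observed counts per bin
    times = np.array([50.0, 150.0, 100.0])      # exposure time per bin
    lam_hat = counts.sum() / times.sum()        # pooled rate estimate
    fitted = lam_hat * times
    print(counts - fitted)                      # ordinary residuals
    print((counts - fitted) / np.sqrt(fitted))  # standardized residuals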

 

Return-to-service test. A test performed at the end of maintenance, which must be successful. If the system does not perform successfully on the test, the maintenance is resumed and the test is not counted as a return-to-service test. A return-to-service test can demonstrate that no latent failed conditions exist (see standby failure), but it provides absolutely no information about the probability of failure on a later demand (see failure on demand).

 

Skewed distribution. A distribution that is not symmetrical. A distribution that is restricted to the range from 0 to ∞ is typically skewed to the right. Its mean is larger than its median, and the 95th percentile is farther from the median than the 5th percentile is. The Poisson, gamma, and lognormal distributions are a few examples of positively skewed distributions.

 

Standardized residual. See residual.

 

Standby failure. For a standby system, failure to start resulting from an existing, or latent, failed condition. The system is in this failed condition for some time, but the condition is not discovered until the demand. Compare failure on demand.

 

Standard deviation. The square root of a variance. The standard deviation of a random variable X has the same units as X.

 

Standard error. The estimated standard deviation of the estimator of a parameter, in the frequentist approach. For example, suppose that λ is the parameter to be estimated, and λ̂ is the estimator. The estimator depends on random data, and therefore is random, with a standard deviation, s.d.(λ̂). The estimated value of this standard deviation is the standard error for λ.
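
A sketch with assumed numbers: for the Poisson estimator λ̂ = X/t, var(X) = λt, so the standard deviation of λ̂ is the square root of λ/t, estimated by substituting λ̂ for λ.

    # Standard error of a Poisson rate estimate (assumed data values).
    import math

    x, t = 7, 100.0                          # 7 events in 100 hours (assumed)
    lam_hat = x / t
    print(lam_hat, math.sqrt(lam_hat / t))   # 0.07 and about 0.026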

 

Statistic. A function of the data, such as the sample mean or the Pearson chi-squared statistic. Before the data are observed, the statistic is a random variable, which can take many values, depending on the random data. The observed value of a statistic is a number.

 

Time at risk. See exposure time.

 

Unavailability. For a standby system, the probability that the system is unavailable, out of service, when demanded. This may be divided into different causes: unavailability from planned maintenance and unavailability from unplanned maintenance. Unavailability is distinct from failure to start of a nominally available system.

 

Variance. The variance of X, var(X), is the expected value of [X - E(X)]². It is a measure of how much X deviates from its average. It has units equal to the squared units of X. Compare standard deviation.