Statistics Glossary

addition rule for mutually exclusive random events, the chance of at least one of them occurring is the sum of their individual probabilities.

alternative hypothesis a research hypothesis; the hypothesis that is supported if the null hypothesis is rejected.

bar chart a graphic that displays how data fall into different categories or groups.

bell-shaped curve symmetrical, single-peaked frequency distribution. Also called the normal curve or Gaussian curve.

bias the consistent underestimation or overestimation of a true value, because of a preconceived notion held by the person sampling the population.

bimodal curve with two equal scores of highest frequency.

binomial event with only two possible outcomes.

binomial probability distribution for binomial events, the frequency of the number of favorable outcomes. For a large number of trials, the binomial distribution approaches the normal distribution.

bivariate involving two variables; in particular, an analysis that attempts to show a correlation between two variables is said to be bivariate.

box plot (box-and-whiskers) a graphic display of data indicating symmetry and central tendency.

Central Limit Theorem a rule that states that the sampling distribution of means from any population with finite variance will be approximately normal for large sample size n.
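
The theorem can be illustrated with a small simulation. This is only a sketch: the uniform population, the sample size of 50, the 2000 repetitions, and the name sample_means are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, num_samples):
    """Draw num_samples samples from Uniform(0, 1) and return their means."""
    means = []
    for _ in range(num_samples):
        sample = [random.random() for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(sample_size=50, num_samples=2000)

# The population is flat, not bell-shaped, yet the sample means
# cluster around the population mean (0.5) with a spread close to
# the population standard deviation divided by sqrt(sample_size).
clt_center = statistics.fmean(means)
clt_spread = statistics.stdev(means)
expected_spread = (1 / 12) ** 0.5 / 50 ** 0.5
print(clt_center, clt_spread, expected_spread)
```

A histogram of means would look roughly bell-shaped even though the individual draws are uniform.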

chi-square a probability distribution used to test the independence of two nominal variables.

class frequency the number of observations that fall into each class interval.

class intervals categories or groups contained in frequency graphics.

coefficient of determination a measure of the proportion of each other's variability that two variables share.

confidence interval the range of values within which a population parameter is estimated to lie, at a given confidence level.
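
As a rough illustration, an interval for a population mean can be computed from a sample. The sketch below assumes the normal approximation is acceptable (for a sample this small, a critical value from the t-distribution would normally be used instead); the function name and the data are made up for the example.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

low, high = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(low, high)  # an interval centered on the sample mean, 5
```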

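confidence level the probability that a confidence interval constructed by a given procedure contains the true population parameter; equal to one minus the significance level.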

continuous variable a variable that can be measured with whole numbers and fractional (or decimal) parts thereof.

correlated two (or more) quantities that change together in a consistent manner. Thus, if the value of one variable is known, the value of the other can be estimated from their relationship.

correlation coefficient a measure of the degree to which two variables are linearly related.
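
A minimal sketch of how the coefficient might be computed by hand; the function name pearson_r and the data are illustrative, not part of the glossary.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfect positive relationship
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative relationship
print(r_up, r_down)
```

Squaring the result gives the coefficient of determination defined earlier in this glossary.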

critical value the value of a computed statistic used as a threshold to decide whether the null hypothesis will be rejected.

data numerical information about variables; the measurements or observations to be analyzed with statistical methods.

degrees of freedom a parameter used to help select the critical value in some probability distributions.

dependent events events such that the outcome of one has an effect on the probability of the outcome of the other.

dependent variable a variable that is caused or influenced by another.

descriptive statistics numerical data that describe phenomena.

deviation the distance of a value in a population (or sample) from the mean value of the population (or sample).

directional test a test of the prediction that one value is higher than another; also called a one-tailed test.

discrete variable a variable that can be measured only by means of whole numbers; or one which assumes only a certain set of definite values, and no others.

disjoint occurrence both outcomes unable to happen at the same time.

distribution a collection of measurements; how scores tend to be dispersed about a measurement scale.

dot plot a graphic that displays the variability in a small set of measures.

double counting a mistake encountered in calculating the probability of at least one of several events occurring, when the events are not mutually exclusive. In this case, the addition rule does not apply.
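
The mistake can be seen with one fair die. In this sketch the two events (an even face, a face above four) are chosen for illustration; they overlap on the face 6, so adding their probabilities counts that face twice. Subtracting the probability of the joint occurrence is the standard correction.

```python
from fractions import Fraction

faces = range(1, 7)                           # one fair six-sided die
even = {n for n in faces if n % 2 == 0}       # {2, 4, 6}
above_four = {n for n in faces if n > 4}      # {5, 6}

def prob(event):
    """Probability of an event, given six equally likely faces."""
    return Fraction(len(event), 6)

naive_sum = prob(even) + prob(above_four)         # counts the face 6 twice
corrected = naive_sum - prob(even & above_four)   # remove the overlap once
direct = prob(even | above_four)                  # count the union directly
print(naive_sum, corrected, direct)
```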

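empirical rule a rule that is founded on observation, without a theoretical basis; a "rule of thumb." In statistics, the term often refers specifically to the observation that about 68%, 95%, and 99.7% of normally distributed values lie within one, two, and three standard deviations of the mean, respectively.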

frequency distribution the frequency of occurrence of the values of a variable. For each possible value of the variable, there is an associated frequency with which the variable assumes that value.

frequency histogram a graphic that displays how many measures fall into different classes, giving the frequency at which each class is observed.

frequency polygon a graphic presentation of frequency of a phenomenon that typically uses straight lines and points.

grouped data data that have been sorted into categories, usually in order to construct a frequency histogram.

grouped measures a set of values that belong to the same class.

independent events events such that the outcome of one has no effect on the probability of the outcome of the other.

independent variable a variable that causes, or influences, another variable.

inference conclusion about a population parameter based upon analysis of a sample statistic. Inferences are always stated with a confidence level.

intercept the value of y at which a line crosses the vertical axis.

interquartile range set of measures lying between the lower quartile (25th percentile) and the upper quartile (75th percentile), inclusive; as a single number, the difference between the upper and lower quartiles.
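
The interquartile range is often reported as a single number, the upper quartile minus the lower quartile. The sketch below uses Python's statistics.quantiles with its "inclusive" method; other quartile conventions exist and can give slightly different values, so the choice here is an assumption.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the ordered data at the three quartiles Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)
```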

interval a scale using numbers to rank order; its intervals are equal but with an arbitrary 0 point.

joint occurrence both outcomes happening simultaneously; P(AB).

least squares any line- or curve-fitting model that minimizes the sum of the squared vertical distances of the data points from the line or curve.
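
For a straight line, the least-squares slope and intercept have a closed form. The following is a sketch with made-up data; fit_line is an illustrative name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept = fit_line(xs, ys)

# Vertical distance from each actual value to its predicted value.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```

Because these points lie exactly on a line, every residual is zero; with noisy data the fitted line is the one whose squared residuals have the smallest sum.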

lower quartile (Q1), the 25th percentile of a set of measures.

mean the sum of the measures in a distribution divided by the number of measures; the average.

measures of central tendency descriptive measures that indicate the center of a set of values, for example, mean, median, and mode.
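
A short sketch of the three measures on one small, made-up set of scores, using Python's standard statistics module:

```python
import statistics

scores = [1, 2, 2, 3, 7]

mean = statistics.mean(scores)      # (1 + 2 + 2 + 3 + 7) / 5
median = statistics.median(scores)  # middle value of the ordered scores
mode = statistics.mode(scores)      # most frequent score
print(mean, median, mode)
```

The single large score pulls the mean above the median, which is one reason the three measures are usually reported together.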

measures of variation descriptive measures that indicate the dispersion of a set of values, for example, variance, standard deviation, and standard error of the mean.

median the middle measure in an ordered distribution.

middle quartile (Q2), the 50th percentile of a set of measures; the median.

mode most frequent measure in a distribution; the high point on a frequency distribution.

mound-shaped curve symmetrical, single-peaked frequency distribution. Also called the bell-shaped curve, normal curve, or Gaussian curve.

multiplication rule the probability of two or more independent (hence, not mutually exclusive) events all occurring is the product of their individual probabilities.
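
The rule can be checked by listing every outcome. In this sketch the two independent events (rolling a six, flipping heads) are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

p_six = Fraction(1, 6)      # fair die shows a six
p_heads = Fraction(1, 2)    # fair coin shows heads
by_rule = p_six * p_heads   # multiplication rule

# Enumerate all 12 equally likely (die, coin) outcomes and count the
# ones where both events occur.
sample_space = list(product(range(1, 7), ["H", "T"]))
favorable = [outcome for outcome in sample_space if outcome == (6, "H")]
by_counting = Fraction(len(favorable), len(sample_space))
print(by_rule, by_counting)
```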

mutually exclusive events such that the occurrence of one precludes the occurrence of the other.

negative relationship a relationship between two variables such that when one increases, the other decreases.

negatively skewed curve a probability or frequency distribution that is not normal, but rather is shifted such that the mean is less than the mode.

nominal a scale using numbers, symbols, or names to designate different subclasses.

non-directional test a test of the prediction that two values are equal or a test that they are not equal; a two-tailed test.

non-parametric test statistical test used when assumptions about normal distribution in the population cannot be met, or when the level of measurement is ordinal or less. For example, the chi-square test.

normal distribution smooth bell-shaped curve symmetrical about the mean such that its shape and area obey the empirical rule.

null hypothesis the reverse of the research hypothesis. The null hypothesis is directly tested by statistical analysis so that it is either rejected or not rejected, with a confidence level. If the null hypothesis is rejected, the alternative hypothesis is supported.

numerical statistics statistical parameters presented as numbers (as opposed to pictorial statistics).

ogive a graphic that displays a running total.

one-tailed test a test of the prediction that one value is higher than another.

ordinal a scale using numbers or symbols to rank order; its intervals are unspecified.

outlier a data point that falls far from most other points; a score extremely divergent from the other measures of a set.

parameter a characteristic of a population. The goal of statistical analysis is usually to estimate population parameters, using statistics from a sample of the population.

Pearson's product moment coefficient identical to the correlation coefficient.

percentile the value in an ordered set of measurements such that P% of the measures lie below that value.

pictorial statistics statistical parameters that are presented as graphs or charts (as opposed to simply as numbers).

pie chart a graphic that displays parts of the whole, in the form of a circle with its area divided appropriately.

point estimate a number computed from a sample to represent a population parameter.

population a group of phenomena that have something in common. The population is the larger group, whose properties (parameters) are estimated by taking a smaller sample from within the population, and applying statistical analysis to the sample.

positive relationship a relationship between two variables such that when one increases, the other increases, or when one decreases, the other decreases.

positively skewed curve a probability or frequency distribution that is not normal, but rather is shifted such that the mean is greater than the mode.

power the probability that a test will reject the null hypothesis when it is, in fact, false.

probability a quantitative measure of the chances for a particular outcome or outcomes.

probability distribution a smooth curve indicating the frequency distribution for a continuous random variable.

proportion for a binomial random event, the probability of a successful (or favorable) outcome in a single trial.

qualitative variable phenomenon measured in kind, that is, non-numerical units. For example, color is a qualitative variable, because it cannot be expressed simply as a number.

quantitative variable phenomenon measured in amounts, that is, numerical units. For example, length is a quantitative variable.

random an event for which there is no way to know, before it occurs, what the outcome will be. Instead, only the probabilities of each possible outcome can be stated.

random error error that occurs as a result of sampling variability, through no direct fault of the sampler. It is a reflection of the fact that the sample is smaller than the population; for larger samples, the random error is smaller.

range difference between the largest and smallest measures of a set.

ratio a scale using numbers to rank order; its intervals are equal, and the scale has an absolute 0 point.

region of acceptance the area of a probability curve in which a computed test statistic will lead to acceptance of the null hypothesis.

region of rejection the area of a probability curve in which a computed test statistic will lead to rejection of the null hypothesis.

regression a statistical procedure used to estimate the linear dependence of a dependent variable on one or more independent variables.

relative frequency the ratio of class frequency to total number of measures.
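
A minimal sketch of the calculation, with made-up categorical measures:

```python
from collections import Counter

measures = ["red", "blue", "red", "green", "red", "blue", "red", "red"]

class_frequency = Counter(measures)    # observations per class
total = sum(class_frequency.values())  # total number of measures
relative_frequency = {
    label: count / total for label, count in class_frequency.items()
}
print(relative_frequency)
```

The relative frequencies of all classes always sum to 1.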

relative frequency principle of probability if a random event is repeated a large number of times, then the proportion of times that a particular outcome occurs is the probability of that outcome occurring in a single event.
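
A simulated coin gives a feel for the principle. This is a sketch only: the fair coin, the seed, and the run lengths are assumptions made for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def proportion_of_heads(num_flips):
    """Flip a simulated fair coin num_flips times; return the share of heads."""
    heads = sum(random.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

short_run = proportion_of_heads(10)       # can be far from 0.5
large_run = proportion_of_heads(100_000)  # typically very close to 0.5
print(short_run, large_run)
```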

research hypothesis a prediction or expectation to be tested. If the null hypothesis is rejected, then the research hypothesis (also called alternative hypothesis) is supported.

residual the vertical distance between a predicted value y and its actual value.

sample a group of members of a population selected to represent that population. A sample to which statistical analysis is applied should be randomly drawn from the population, to avoid bias.

sampling distribution the distribution obtained by computing a statistic for a large number of samples drawn from the same population.

sampling variability the tendency of the same statistic computed from a number of random samples drawn from the same population to differ.

scatter plot a graphic display used to illustrate degree of correlation between two variables.

skewed a distribution displaced toward one end of the scale, with a tail strung out toward the other end.

slope a measure of a line's slant.

standard deviation a measure of data variation; the square root of the variance.

standard error a measure of the random variability of a statistic, such as the mean (i.e., standard error of the mean). The standard error of the mean is equal to the standard deviation divided by the square root of the sample size (n).
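
The formula can be sketched directly; the standard deviation of 10 and the sample sizes are illustrative values.

```python
import math

def standard_error_of_mean(standard_deviation, n):
    """Standard deviation divided by the square root of the sample size."""
    return standard_deviation / math.sqrt(n)

se_25 = standard_error_of_mean(10, 25)    # 10 / 5
se_100 = standard_error_of_mean(10, 100)  # 10 / 10
print(se_25, se_100)
```

Quadrupling the sample size halves the standard error, so precision improves with the square root of n rather than with n itself.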

standardize to convert to a z-score.

statistic a characteristic of a sample. A statistic is an estimate of a population parameter. For larger samples, the statistic is a better estimate of the parameter.

statistical significance the probability of obtaining a result at least as extreme as the one observed by chance alone, that is, if the null hypothesis were true. High statistical significance does not necessarily imply importance.

statistics a branch of mathematics that describes and reasons from numerical observations; or descriptive measures of a sample.

stem-and-leaf graphic display that shows actual scores as well as distribution of classes.

symmetry a shape such that one side is the exact mirror image of the other.

symmetric distribution a probability or frequency distribution whose two halves are mirror images about its center, so that the mean and median coincide (and, for a single-peaked distribution, the mode as well).

systematic error the consistent underestimation or overestimation of a true value, due to poor sampling technique.

t-distribution a probability distribution often used when the population standard deviation is not known or when the sample size is small.

tabled value the value of a computed statistic used as a threshold to decide whether the null hypothesis will be rejected.

test statistic a computed quantity used to decide hypothesis tests.

two-tailed test a test of the prediction that two values are equal, or a test that they are not equal.

Type I error rejecting a null hypothesis that is, in fact, true.

Type II error failing to reject a null hypothesis that is, in fact, false.

upper quartile (Q3), the 75th percentile of a set of measures.

value a measurement or classification of a variable.

variable an observable characteristic of a phenomenon that can be measured or classified.

variance a measure of data variation; the mean of the squared deviation scores about the mean of a distribution.
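
A worked sketch with made-up data. It follows the definition above literally (dividing by the number of measures, the population form); a sample variance would usually divide by n - 1 instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)                  # 40 / 8
deviations = [x - mean for x in data]         # distance of each value from the mean
variance = sum(d ** 2 for d in deviations) / len(data)  # 32 / 8
std_dev = variance ** 0.5                     # square root of the variance
print(mean, variance, std_dev)
```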

z-score a unit of measurement obtained by subtracting the mean and dividing by the standard deviation.
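
As a sketch, standardizing a small made-up data set might look like this; the population standard deviation is used here, which is an assumption about which form is intended.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)       # 5
std_dev = statistics.pstdev(data)  # population standard deviation, 2.0

# Subtract the mean, then divide by the standard deviation.
z_scores = [(x - mean) / std_dev for x in data]
print(z_scores)
```

Each z-score states how many standard deviations a value lies above (positive) or below (negative) the mean.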