Descriptive statistics employs a set of procedures that make it possible to summarize and describe samples of data meaningfully and accurately. To make meaningful statements about psychological events, the variable or variables involved must be organized, measured, and then expressed as quantities. Such measurements are commonly summarized with measures of central tendency and measures of variability.
Organization of data. Arranging the data in tabular or graphical form is typically the first organizational step. Frequency distributions, histograms, and/or frequency polygons are usually prepared in this process.
- A frequency distribution, often the first organizational step, is an ordered arrangement of all values of a variable, which shows the number of occurrences in each category. Table 1 shows a frequency distribution concerning how much time students spent studying for an exam. Note that the total number tallied (counted) in each category by the researcher equals the number listed in the frequency column.
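The tallying described above can be sketched in a few lines of Python. The scores below are hypothetical values, chosen only to agree with the figures quoted in this section (40 students, hours ranging from 1 to 10, with 6 the most common value); they are not the actual table data, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical raw scores: hours studied by 40 students (NOT the original table data).
hours_studied = [
    6, 9, 4, 7, 5, 6, 8, 3, 6, 7,
    5, 10, 6, 2, 7, 4, 8, 6, 5, 9,
    3, 7, 6, 4, 8, 5, 1, 6, 7, 3,
    4, 6, 9, 5, 2, 7, 3, 5, 8, 4,
]

# Tally the number of occurrences in each category (each number of hours).
tally = Counter(hours_studied)

# Ordered arrangement: one (hours, frequency) row per category, highest value first.
frequency_distribution = [(hours, tally[hours]) for hours in sorted(tally, reverse=True)]

print("Hours  Frequency")
for hours, frequency in frequency_distribution:
    print(f"{hours:>5}  {frequency:>9}")

# The frequency column must add up to the number of students tallied.
total = sum(frequency for _, frequency in frequency_distribution)
print("Total:", total)
```

The final check mirrors the note above: the frequencies sum to the number of scores that were counted.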
Such a frequency distribution can be presented graphically as a frequency histogram or frequency polygon.
- Frequency histograms are bar graphs. Figure 1 shows a frequency histogram derived from the data in the frequency distribution in Table 1. The frequency (number of students) determined from the tally is the ordinate (vertical, or Y, axis), and the number of hours studied is the abscissa (horizontal, or X, axis). Each one‐hour interval is presented sequentially, and the height of each bar represents the number of students who studied that number of hours.
- Frequency polygons are graphs in which the frequency of occurrence of the variable measured is shown by using connected points rather than bars. Figure 2 shows, in a frequency polygon, the same data displayed in Figure 1. (Note that if the midpoints of the tops of the bars in Figure 1 were connected, the result would be this frequency polygon.)
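As a rough illustration of how both graphs come from the same frequency distribution, the sketch below builds a text histogram and the list of points a frequency polygon would connect. The frequencies are hypothetical (consistent with the 40 students and the mode of 6 mentioned in this section, but not the original table), and the function names are made up for this example. The text bars run sideways, so the axes are swapped relative to a conventional histogram.

```python
# Hypothetical frequency distribution: hours studied -> number of students.
frequencies = {1: 1, 2: 2, 3: 4, 4: 5, 5: 6, 6: 8, 7: 6, 8: 4, 9: 3, 10: 1}


def text_histogram(freqs):
    """Return one text bar per one-hour interval; bar length equals the frequency."""
    return [
        f"{hours:>2} h | {'#' * count} ({count})"
        for hours, count in sorted(freqs.items())
    ]


def polygon_points(freqs):
    """Return the (hours, frequency) points that a frequency polygon connects.

    Each point sits at the midpoint of the top of the corresponding histogram bar.
    """
    return [(hours, count) for hours, count in sorted(freqs.items())]


for line in text_histogram(frequencies):
    print(line)

print(polygon_points(frequencies))
```

Passing the points to any plotting tool as a connected line would give the polygon; drawing them as bars would give the histogram.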
Measures of central tendency. The three measures of central tendency (the mean, median, and mode) describe a distribution of data; each is an index of the average, or typical, value of a distribution of scores.
- The mean, the arithmetic average of all scores under consideration, is computed by dividing the sum of the scores by the number of scores. Based on the data in Table 1,
- The median is the point at which 50% of the observations fall below and 50% above or, in other words, the middle number of a set of numbers arranged in ascending or descending order. (If the list includes an even number of entries, the median is the arithmetic average of the middle two numbers.) Based on the data in Table 1, the full list of each student's study hours would be written 10, 9, 9, 9, 8, 8, 8, 8, and so on. If the list were written out in full, it would be clear that the middle two numbers of the 40 entries are 6 and 6, which average 6. So the median of the hours studied is 6.
- The mode is the number that appears most often. Based on the data in Table 1, the mode of the number of hours studied is also 6 (8 students studied for 6 hours, so 6 appears 8 times in the list, more than any other number).
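The three definitions above can be checked with a short Python sketch. The frequency table here is hypothetical: it was chosen to match the median and mode of 6 reported in the text for 40 students, but it is not the original table, so the mean it produces is only illustrative.

```python
import statistics

# Hypothetical frequency distribution: hours studied -> number of students.
frequencies = {1: 1, 2: 2, 3: 4, 4: 5, 5: 6, 6: 8, 7: 6, 8: 4, 9: 3, 10: 1}

# Write the list out in full: each score repeated as often as it occurs.
scores = [hours for hours, count in frequencies.items() for _ in range(count)]

# Mean: sum of the scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Median: middle value of the ordered list (average of the middle two if the count is even).
ordered = sorted(scores)
n = len(ordered)
middle = n // 2
if n % 2 == 0:
    median = (ordered[middle - 1] + ordered[middle]) / 2
else:
    median = ordered[middle]

# Mode: the score that appears most often.
mode = max(frequencies, key=frequencies.get)

print(mean, median, mode)  # 5.65 6.0 6 for this hypothetical data

# Cross-check the hand-written versions against the standard library.
assert median == statistics.median(scores)
assert mode == statistics.mode(scores)
```

With 40 entries, the middle two positions are the 20th and 21st; both hold a 6 in this data, which matches the reasoning given for the median above.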
The measures of central tendency can be located on frequency polygons that take the form of curves, which may be normal or skewed.
- For many variables, if enough measures are taken and plotted as a frequency polygon, the result approximates a normal curve (bell‐shaped curve), or normal distribution (Figure 3a). The curve is symmetrical, and the mean, median, and mode fall at the highest point on the curve.
- Skewed distributions are asymmetrical, with most of the scores grouped toward one end. The mean, median, and mode fall at different points. Distributions may be skewed to the left (negatively skewed) (Figure 3b) or to the right (positively skewed) (Figure 3c).
- A bimodal frequency distribution has two peaks, which represent two scores that share the highest frequency. In such a distribution, the mean and median may be at the same point or at different points.
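A small numeric sketch can make the shapes above concrete. The four data sets below are invented for illustration, and the `describe` helper is a hypothetical name. The ordering of mean, median, and mode shown for the skewed sets is the typical pattern, not a guarantee for every skewed distribution.

```python
import statistics

# Small invented data sets, one per shape discussed above.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]   # positively skewed: tail toward high scores
left_skewed = [1, 5, 7, 8, 9, 9, 10, 10, 10]  # negatively skewed: tail toward low scores
bimodal = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]      # two scores tie for highest frequency


def describe(scores):
    """Return the mean, the median, and every mode of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "modes": statistics.multimode(scores),  # lists all values tied for most frequent
    }


for name, data in [
    ("symmetric", symmetric),
    ("right_skewed", right_skewed),
    ("left_skewed", left_skewed),
    ("bimodal", bimodal),
]:
    print(name, describe(data))
```

In the symmetric set all three measures coincide at 3. In the right-skewed set the mean is pulled above the median by the long tail, and in the left-skewed set it is pulled below. The bimodal set reports two modes (2 and 5), while its mean and median happen to coincide at 3.5.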
Measures of variation. Variability refers to the extent to which scores differ from one another and from the mean. Widely used measures of variability are the range, variance, and standard deviation.
- The range describes the spread of scores in a distribution. It is calculated by subtracting the lowest from the highest score in the distribution. (In the example of hours of study, the range is 10 − 1 = 9 hours.)
- The variance is a measure of variation: the mean of the squared deviation scores about the mean of a distribution. Using the data from Table as an example, the variance for the entire distribution is computed by
  - determining the mean of the distribution of data
  - subtracting the mean from each score to determine the deviation score for that item (Table 2, column 1)
  - squaring each deviation score (to eliminate minus signs) and multiplying it by the frequency of that score to account for the total number of scores (Table 2, column 2)
  - summing the results of the previous multiplication step (Table 2, last entry in column 2) to arrive at the total of all squared deviation scores, and then dividing that total by (N − 1), where N is the number of scores
TABLE 2 COMPUTATION OF DEVIATION SCORE, VARIANCE, STANDARD DEVIATION: HOURS STUDIED FOR AN EXAM
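Some variance computations simply divide by N, but dividing by (N − 1) is generally preferred when the scores are a sample, because it gives a less biased estimate of the variance in the population from which the sample was drawn. The variance gives one indication of how much the scores differ.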
- The standard deviation (SD) is the square root of the variance.
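The range, variance, and standard deviation can be computed step by step as in the sketch below, which follows the procedure described above. The frequency table is hypothetical (it matches the range of 1 to 10 and the 40 students mentioned in the text, but not the original table), so the resulting variance and standard deviation are illustrative only.

```python
import math
import statistics

# Hypothetical frequency distribution: hours studied -> number of students.
frequencies = {1: 1, 2: 2, 3: 4, 4: 5, 5: 6, 6: 8, 7: 6, 8: 4, 9: 3, 10: 1}

n = sum(frequencies.values())  # N, the number of scores

# Range: highest score minus lowest score.
score_range = max(frequencies) - min(frequencies)

# Step 1: determine the mean of the distribution.
mean = sum(score * count for score, count in frequencies.items()) / n

# Step 2: subtract the mean from each score to get its deviation score.
deviations = {score: score - mean for score in frequencies}

# Step 3: square each deviation and multiply by the frequency of that score.
weighted_squares = {
    score: frequencies[score] * deviation ** 2
    for score, deviation in deviations.items()
}

# Step 4: sum the weighted squared deviations and divide by (N - 1).
sum_of_squares = sum(weighted_squares.values())
sample_variance = sum_of_squares / (n - 1)

# Dividing by N instead gives the slightly smaller "population" variance.
population_variance = sum_of_squares / n

# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(sample_variance)

print("range:", score_range)
print("variance (N - 1):", round(sample_variance, 3))
print("variance (N):", round(population_variance, 3))
print("SD:", round(standard_deviation, 3))

# Cross-check against the standard library on the fully written-out list.
scores = [score for score, count in frequencies.items() for _ in range(count)]
assert math.isclose(sample_variance, statistics.variance(scores))
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(standard_deviation, statistics.stdev(scores))
```

Because the standard deviation is the square root of the variance, it is expressed in the same units as the original scores (hours, in this example), which often makes it easier to interpret than the variance itself.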