is so named because the subject “operates” on the environment. An early theory of operant conditioning, proposed by Edward Thorndike, used the name instrumental learning
because the response is “instrumental” in obtaining the reward. (Both operant and classical conditioning are also called S‐R learning
because a stimulus, S, has been paired with a response, R.)
A device called an operant box (sometimes called a Skinner box) was designed by the well‐known experimenter B. F. Skinner. Learning in the operant conditioning procedure can be explained by the law of effect (also proposed by Thorndike, in 1911), which suggests that responses are learned when they are followed by a “satisfying state of affairs.” Although operant conditioning requires the use of neither a CS nor a UCS, it does require the use of shaping and reinforcement procedures.
Shaping. In operant conditioning, the subject must first emit the response that the experimenter plans to reward. Shaping is the name given to those initial steps needed to get the subject to engage in the behavior that is to be rewarded. If, for example, a rat is to be rewarded for pressing a bar, it must first learn to approach the bar, then to touch it, and finally to press it.
Generally, rewards (usually food) initially are given at the end of each of these steps. Finally, however, a reward is given only when the bar is pressed. (With subjects who understand spoken commands, shaping can sometimes be accomplished verbally.)
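In procedural terms, shaping amounts to rewarding successive approximations of the target behavior while gradually tightening the criterion for reward. The following is a minimal sketch of that logic; the step names and the `shape` function are illustrative assumptions, not a real experimental protocol.

```python
# Toy shaping procedure: reinforce successive approximations of bar-pressing.
# Step names and reward logic are hypothetical, for illustration only.

steps = ["approach the bar", "touch the bar", "press the bar"]

def shape(observed_behaviors, steps):
    """Reward each behavior that meets the current criterion, then raise
    the criterion to the next step; the final step remains rewarded."""
    current = 0
    outcomes = []
    for behavior in observed_behaviors:
        if behavior == steps[current]:
            outcomes.append((behavior, True))   # deliver food
            if current < len(steps) - 1:
                current += 1                    # tighten the criterion
        else:
            outcomes.append((behavior, False))  # no reward
    return outcomes
```

Once the final criterion is reached, only the full bar press is rewarded, mirroring the transition described above from rewarding each step to rewarding only the complete response.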
Positive and negative reinforcement. Reinforcement is the process of following an event with a second event meant to make the recurrence of the first event more likely. The second event itself is called the reinforcer.
Positive reinforcement is the presentation of a rewarding or pleasant stimulus (something that the subject wants, also called a positive reinforcer) that increases the probability that a particular response will occur. For example, if a student rewrites a term paper and is rewarded for that rewrite by a better grade, getting the grade is the positive reinforcer, and the teacher's awarding the grade to encourage rewrites is positive reinforcement. While a rat may learn to press a bar in an operant box if the action triggers a mechanism that delivers food, it may also respond to such rewards as water or even a minute amount of pleasurable electrical stimulation of a particular brain structure. (If food or water is to be used for reinforcement, the animal is usually first deprived of that substance for a time.)
Negative reinforcement, on the other hand, is the presentation of an unpleasant stimulus (something the subject does not want, also called a negative reinforcer) that increases the likelihood that a particular response will occur in order to remove or avoid that stimulus. For example, giving a rat an unpleasant electric shock when it presses a bar increases the probability that the rat will avoid pressing the bar. Similarly, a rat presented with such a negative reinforcer may learn to run to the right in a maze to avoid the shock that awaits it on the left, and a child may clean toys off the floor without being told in order to avoid a spanking (many of which were received in the past for not complying).
Punishment. Punishment differs from negative reinforcement in that it decreases the probability that a particular preceding event will occur again. When subjects are punished, they experience the unpleasant (aversive) stimulus rather than avoid it. Once experienced, punishment may sometimes serve as a negative reinforcer; a subject may increase certain types of responses to avoid the unpleasant experience. For example, a student who doesn't study may be punished by being given an F on an exam. But while the F was initially punishment, it can now serve as a negative reinforcer that causes the student to increase study time to avoid getting an F.
Schedules of reinforcement. Reinforcement can occur after every response, a situation called continuous reinforcement. It can also occur only after some responses, a pattern called intermittent reinforcement. A response learned under the latter condition is more resistant to extinction, a phenomenon called the partial reinforcement effect.
Psychologists have studied intermittent reinforcement effects by using various patterns of delivering rewards after a response. These patterns, called schedules of reinforcement, include
fixed‐ratio schedule: reinforcement after a set number of responses
variable‐ratio schedule: reinforcement after a variable number of responses
fixed‐interval schedule: reinforcement after the same (fixed) interval of time has elapsed
variable‐interval schedule: reinforcement after a variable interval of time has elapsed
The response styles of subjects, whether they be rats in operant boxes or employees in a workplace, vary based on the schedule used. Other things being equal, a variable‐ratio schedule produces the most responses from a subject in a given time. A fixed‐ratio schedule fosters quick learning of the desired response; the number of responses then remains steady but fewer than those produced by the variable‐ratio schedule. Variable‐interval schedules result in slower learning of the response followed by a steady number of responses (but fewer than those produced by the fixed‐ratio schedule). Fixed‐interval schedules produce both relatively few responses overall and a drop in number of responses right after reinforcement, although the number increases as the time for reinforcement nears. (It is interesting that most people are paid on a fixed‐interval schedule. Perhaps productivity could be increased if more frequent rewards for good performance (bonuses) were given, that is, by using a fixed‐ratio schedule.)
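Because each schedule is simply a rule for when a response earns a reinforcer, the four schedules can be made concrete in a short simulation. This is a minimal sketch under stated assumptions: the class names, the time units, and the uniform distributions used for the variable schedules are illustrative choices, not part of any standard formulation.

```python
import random

# Toy models of the four schedules of reinforcement.
# respond() returns True when a reinforcer (e.g., a food pellet) is delivered.

class FixedRatio:
    """Reinforce every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """Reinforce after a random number of responses averaging n (assumed uniform)."""
    def __init__(self, n, seed=0):
        self.n, self.count = n, 0
        self.rng = random.Random(seed)
        self.next_at = self.rng.randint(1, 2 * n - 1)

    def respond(self):
        self.count += 1
        if self.count >= self.next_at:
            self.count = 0
            self.next_at = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """Reinforce the first response after n time units have elapsed."""
    def __init__(self, n):
        self.n, self.last = n, 0

    def respond(self, t):
        if t - self.last >= self.n:
            self.last = t
            return True
        return False

class VariableInterval:
    """Reinforce the first response after a random interval averaging n (assumed uniform)."""
    def __init__(self, n, seed=0):
        self.n, self.last = n, 0
        self.rng = random.Random(seed)
        self.wait = self.rng.uniform(0, 2 * n)

    def respond(self, t):
        if t - self.last >= self.wait:
            self.last = t
            self.wait = self.rng.uniform(0, 2 * self.n)
            return True
        return False
```

On a `FixedRatio(5)` schedule, exactly every fifth bar press pays off, so the subject can learn the rule quickly; on the variable schedules the payoff point is unpredictable, which is consistent with the steadier, more extinction‐resistant responding described above.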
Primary and secondary reinforcers. In addition to being classified as positive or negative, reinforcers can be primary or secondary.
A primary reinforcer is a substance (such as palatable food) or situation (such as the administration of a painful electric shock) that is universally rewarding or punishing.
A secondary reinforcer is a formerly neutral stimulus that has acquired reward or punishment value, for example, the letters A or F given on examinations.