Data Analysis and Probability Standard for Grades 6–8

Expectations
Instructional programs from prekindergarten through grade 12 should enable all students to do the following. Listed under each standard is what all students in grades 6–8 should do.

Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them
 • formulate questions, design studies, and collect data about a characteristic shared by two populations or different characteristics within one population;
 • select, create, and use appropriate graphical representations of data, including histograms, box plots, and scatterplots.

Select and use appropriate statistical methods to analyze data
 • find, use, and interpret measures of center and spread, including mean and interquartile range;
 • discuss and understand the correspondence between data sets and their graphical representations, especially histograms, stem-and-leaf plots, box plots, and scatterplots.

Develop and evaluate inferences and predictions that are based on data
 • use observations about differences between two or more samples to make conjectures about the populations from which the samples were taken;
 • make conjectures about possible relationships between two characteristics of a sample on the basis of scatterplots of the data and approximate lines of fit;
 • use conjectures to formulate new questions and plan new studies to answer them.

Understand and apply basic concepts of probability
 • understand and use appropriate terminology to describe complementary and mutually exclusive events;
 • use proportionality and a basic understanding of probability to make and test conjectures about the results of experiments and simulations;
 • compute probabilities for simple compound events, using such methods as organized lists, tree diagrams, and area models.

Prior to the middle grades, students should have had experiences collecting, organizing, and representing sets of data. They should be facile both with representational tools (such as tables, line plots, bar graphs, and line graphs) and with measures of center and spread (such as median, mode, and range). They should have had experience in using some methods of analyzing information and answering questions, typically about a single population.

In grades 6–8, teachers should build on this base of experience to help students answer more-complex questions, such as those concerning relationships among populations or samples and those about relationships between two variables within one population or sample. Toward this end, new representations should be added to the students' repertoire. Box plots, for example, allow students to compare two or more samples, such as the heights of students in two different classes. Scatterplots allow students to study related pairs of characteristics in one sample, such as height versus arm span among students in one class. In addition, students can use and further develop their emerging understanding of proportionality in various aspects of their study of data and statistics.

Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them

Middle-grades students should formulate questions and design experiments or surveys to collect relevant data so that they can compare characteristics within a population or between populations. For example, a teacher might ask students to examine how various design characteristics of a paper airplane—such as its length or the number of paper clips attached to its nose—affect the distance it travels and its consistency of flight. Students would then plan experiments in which they collect data that would allow them to compare the effects of particular design features. In addition to helping students design their experiments logically, the teacher should help them consider other factors that might affect the data, such as wind or inconsistencies in launching the planes.

Because laboratory experiments involving data collection are part of the middle-grades science curriculum, mathematics teachers may find it useful to collaborate with science teachers so that they are consistent in their design of experiments. Such collaboration could be extended so that students might collect the data for an experiment in science class and analyze it in mathematics class.

In addition to collecting their own data, students should learn to find relevant data in other resources, such as Web sites or print publications. Consumer Reports, for example, regularly compares the characteristics of various products, such as the quality of peanut butter; the longevity of rechargeable batteries; or the cost, size, and fuel efficiency of automobiles. When using data from other sources, students need to determine which data are appropriate for their needs, understand how the data were gathered, and consider limitations that could affect interpretation.

Middle-grades students should learn to use absolute- and relative-frequency bar graphs and histograms to represent the data they collect and to decide which form of representation is appropriate for different purposes. For example, suppose students were considering the following question:

Compare the distance traveled by a paper airplane constructed using one paper clip with the distance traveled by a plane that is built with two paper clips. Which one travels farther when thrown indoors?

In an experiment conducted to answer this question, one student might throw one of the airplanes forty times while team members measure and record the distance traveled each time. The group might later do the same for the other paper airplane. The teacher might then have the students use a relative-frequency histogram to represent the data, as shown in figure 6.27. For comparison, the teacher might suggest that students display both sets of data using box plots, as in figure 6.28.

 Fig. 6.27. A relative-frequency histogram for data for a paper airplane with one paper clip

 Fig. 6.28. Box plots for the paper airplane data

Select and use appropriate statistical methods to analyze data

In the middle grades, students should learn to use the mean, and continue to use the median and the mode, to describe the center of a set of data. Although the mean often quickly becomes the method of choice for students when summarizing a data set, their knack for computing the mean does not necessarily correspond to a solid understanding of its meaning or purpose (McClain 1999). Students need to understand that the mean "evens out" or "balances" a set of data and that the median identifies the "middle" of a data set. They should compare the utility of the mean and the median as measures of center for different data sets. As several authors have noted (e.g., Uccellini [1996]; Konold [forthcoming]), students often fail to apprehend many subtle aspects of the mean as a measure of center. Thus, the teacher has an important role in providing experiences that help students construct a solid understanding of the mean and its relation to other measures of center.

Relating Mean and Median

Students also need to think about measures of center in relation to the spread of a distribution. In general, the crucial question is, How do changes in data values affect the mean and median of a set of data? To examine this question, teachers could have students use a calculator to create a table of values and compute the mean and median. Then they could change one of the data values in the table and see whether the values of the mean and the median are also changed. These relationships can be effectively demonstrated using software through which students can control a data value and observe how the mean and median are affected. For example, using software that produces line plots for data sets, students could plot a set of data and mark the mean and median on the line. The students could then change one data value and observe how the mean and median change. By repeating this process for various data points, they can notice that changing one data value usually does not affect the median at all, unless the moved value is at the middle of the data set or moves across the middle, but that every change in a value affects the mean. Thus, the mean is more likely to be influenced by extreme values, since it is affected by the actual data values, but the median involves only the relative positions of the values. Other similar problems can be useful in helping students understand the different sensitivities of the mean and median; for example, the mean is very sensitive to the addition or deletion of one or two extreme data points, whereas the median is far less sensitive to such changes.
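The exploration described above can also be sketched in a few lines of Python. This is only an illustration: the data set is hypothetical, and the helper name `summarize` is an assumption of this sketch, not something taken from the activity.

```python
from statistics import mean, median

# Hypothetical data set (not from the text): seven flight distances in feet.
distances = [15, 18, 20, 21, 24, 27, 29]

def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

original = summarize(distances)

# Change one value that is not the middle value and does not cross the middle:
# the largest distance, 29, becomes an extreme value of 64.
changed = distances.copy()
changed[-1] = 64
after_change = summarize(changed)

print("original (mean, median):    ", original)
print("after change (mean, median):", after_change)
```

With these values the median stays at 21 while the mean moves from 22 to 27, which is the pattern students are meant to notice.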

Students should consider how well different graphs represent important characteristics of data sets. For example, they might notice that it is easier to see symmetry or skewness in a graph than in a table of values. Graphs, however, can lose some of the features of the data, as can be demonstrated by generating a family of histograms for a single set of data, using different bin sizes: the different histograms may convey different pictures of the symmetry, skewness, or variability of the data set. Another example is seen when comparing a histogram and a box plot for the same data, such as those for the one-clip plane in figures 6.27 and 6.28. Box plots do not convey as much specific information about the data set, such as where clusters occur, as histograms do. But box plots can provide effective comparisons between two data sets because they make descriptive characteristics such as median and interquartile range readily apparent.
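One way to see the effect of bin size is to tabulate the same data with several bin widths. The sketch below uses a hypothetical list of distances (not the data behind figure 6.27) and a helper, `relative_frequencies`, that is an assumption of this example.

```python
from collections import Counter

# Hypothetical distances in feet (not the data behind figure 6.27).
distances = [15, 16, 18, 19, 20, 21, 22, 24, 25, 27, 28, 30, 31, 33, 34, 36]

def relative_frequencies(values, bin_width, start):
    """Map each bin's left edge to the fraction of values in
    [edge, edge + bin_width). Empty bins are omitted."""
    counts = Counter(
        start + ((v - start) // bin_width) * bin_width for v in values
    )
    n = len(values)
    return {edge: counts[edge] / n for edge in sorted(counts)}

# Print a rough text histogram for each bin width.
for width in (3, 6, 12):
    print(f"bin width {width}:")
    table = relative_frequencies(distances, width, start=15)
    for edge, freq in table.items():
        bar = "#" * round(freq * 40)
        print(f"  {edge:>2} to {edge + width:<2} {bar} {freq:.2f}")
```

Narrow bins show small clusters and gaps. Wide bins smooth them away, so the same data can look quite different from one histogram to the next.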

Develop and evaluate inferences and predictions that are based on data

In collecting and representing data, students should be driven by a desire to answer questions on the basis of the data. In the process, they should make observations, inferences, and conjectures and develop new questions. They can use their developing facility with rational numbers and proportionality to refine their observations and conjectures. For example, when considering the relative-frequency histogram in figure 6.27, most students would observe that "the paper plane goes between 15 and 21 feet about as often as it goes between 24 and 33 feet," but such an observation is not very precise about the frequency. A teacher could press students to make more-precise statements: "The plane goes between 15 and 21 feet about 45 percent of the time."

Box plots are useful when making comparisons between populations. A teacher might pose the following question about the box plots in figure 6.28:

From the box plots (in fig. 6.28), which type of plane appears to fly farther? Which type of plane is more consistent in the distance it flies?

From the relative position of the two graphs, students can infer that the two-clip plane generally flies slightly farther than the one-clip plane. Students can answer the second question by using the spreads of the data portrayed in the box plots to argue that the one-clip plane is more variable in the distance it travels than the two-clip plane.
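The comparison above rests on the numbers a box plot displays: the median for "how far" and the interquartile range for "how consistent." The following sketch computes them for two hypothetical samples, since the data behind figure 6.28 are not given here. The quartile convention used (medians of the lower and upper halves) is one of several in common use and is an assumption of this sketch.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; the middle value is left out of both halves when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical flight distances in feet for each plane.
one_clip = [12, 15, 17, 18, 20, 22, 24, 27, 30, 33]
two_clip = [18, 20, 21, 22, 23, 24, 25, 26, 27, 29]

for name, sample in (("one clip", one_clip), ("two clips", two_clip)):
    low, q1, med, q3, high = five_number_summary(sample)
    print(f"{name}: median = {med}, IQR = {q3 - q1}, range = {high - low}")
```

In these made-up samples the two-clip plane has the higher median and the smaller interquartile range, matching the kind of argument students would make from the box plots.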

Scatterplots are useful for detecting and examining relationships between two characteristics of a population. For example, a teacher might ask students to consider if a relationship exists between the length and the width of warblers' eggs (activity adapted from Encyclopaedia Britannica Educational Corporation [1998, pp. 104–19]). She might provide the students with data and ask the students to make a scatterplot in which each point displays the length and the width of an egg, as shown in figure 6.29. Most students will note that the relationship between the length and the width of the eggs seems to be direct (or positive); that is, longer eggs also tend to be wider. Many students will also note that the points on this scatterplot approximate a straight line, thus suggesting a nearly linear relationship between length and width. To make this relationship even more apparent, the teacher could have students draw an approximate line of fit for the data, as has been done in figure 6.29. Students could apply their developing understanding of the slope of a line to determine that the slope is approximately three-fourths and that therefore the ratio of the width to the length of warblers' eggs is approximately 3:4.

 Fig. 6.29. A scatterplot showing the relationship between the length and the width of warblers' eggs (Encyclopaedia Britannica Educational Corporation 1998, p. 109)
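Students would usually draw the line of fit by eye. As a rough check, the slope can also be computed with a least-squares fit, which is a different technique from the eyeballed line in the activity. The measurements below are hypothetical, not the warbler data in figure 6.29, and the function name is an assumption of this sketch.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical egg measurements in millimeters (not the data in figure 6.29).
lengths_mm = [16, 17, 18, 19, 20]
widths_mm = [12.1, 12.6, 13.6, 14.2, 15.0]

slope, intercept = least_squares_line(lengths_mm, widths_mm)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Reading the slope as the width-to-length ratio is reasonable only when the line passes close to the origin, so it is worth checking that the intercept is small compared with the measurements.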

Teachers can also help students learn to use scatterplots to consider the relationship between two characteristics in different populations. For example, students could measure the height and arm span for groups of middle school and high school students and then make a scatterplot in which the points for middle school students are plotted with one color and the points for high school students are plotted with a second color. Students can make observations about the differences between the two samples, such as that students in the high school sample are generally taller than those in the middle school sample. They can also use the plots to examine possible similarities. In particular, if students draw an approximate line of fit for each set of points, they can determine whether the slopes are approximately equal (i.e., the lines are approximately parallel), which would indicate that the relationship between height and arm span is about the same for both middle school and high school students.

Because linearity is an important idea in the middle grades, students should encounter many scatterplots that have a nearly linear shape. But teachers should also have students explore plots that represent nonlinear relationships. For example, in connection with their study of geometry and measurement, students could measure the lengths of the bases of several similar triangles and use formulas to find their areas or graph paper to estimate their areas. Creating a scatterplot of the lengths of the bases and the areas will make evident the quadratic relationship between length and area in similar figures.
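The quadratic pattern can be previewed with a short calculation. The sketch assumes, purely for illustration, that every triangle is similar to one with base 2 and height 3, so the height is always 1.5 times the base.

```python
# Assumption for this sketch: all triangles are similar to one with
# base 2 and height 3, so height = 1.5 * base.
HEIGHT_TO_BASE = 1.5

def area(base):
    """Area of the similar triangle with the given base."""
    height = HEIGHT_TO_BASE * base
    return 0.5 * base * height

bases = [2, 4, 6, 8, 10]
areas = [area(b) for b in bases]

for b, a in zip(bases, areas):
    # area / base**2 is the same for every triangle: a quadratic relationship.
    print(f"base = {b:>2}, area = {a:>5}, area / base^2 = {a / b**2}")
```

Doubling the base from 2 to 4 multiplies the area by four (from 3 to 12), which is why the plotted points curve upward rather than lying on a line.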

Teachers should encourage students to plot many data sets and look for relationships in the plots; computer graphing software and graphing calculators can be very helpful in this work. Students should see a range of examples in which plotting data sets suggests linear relationships, nonlinear relationships, and no apparent relationship at all. When a scatterplot suggests that a relationship exists, teachers should help students determine the nature of the relationship from the shape and direction of the plot. For example, for an apparently linear relationship, students could use their understanding of slope to decide whether the relationship is direct or inverse. Students should discuss what the relationships they have observed might reveal about the sample, and they should also discuss whether their conjectures about the sample might apply to larger populations containing the sample. For example, if a sample consists of students from one sixth-grade class in a school, how valid might the inferences made from the sample be for all sixth graders in the school? For all middle-grades students in the school? For all sixth graders in the city? For all sixth graders in the country? Such discussions can suggest further studies students might undertake to test the generality of their conjectures.

Understand and apply basic concepts of probability

Teachers should give middle-grades students numerous opportunities to engage in probabilistic thinking about simple situations from which students can develop notions of chance. They should use appropriate terminology in their discussions of chance and use probability to make predictions and test conjectures. For example, a teacher might give students the following problem:

Suppose you have a box containing 100 slips of paper numbered from 1 through 100. If you select one slip of paper at random, what is the probability that the number is a multiple of 5? A multiple of 8? Is not a multiple of 5? Is a multiple of both 5 and 8?

Students should be able to use basic notions of chance and some basic knowledge of number theory to determine the likelihood of selecting a number that is a multiple of 5 and the likelihood of not selecting a multiple of 5. In order to facilitate classroom discussion, the teacher should help students learn commonly accepted terminology. For example, students should know that "selecting a multiple of 5" and "selecting a number that is not a multiple of 5" are complementary events and that because 40 is in the set of possible outcomes for both "selecting a multiple of 5" and "selecting a multiple of 8," they are not mutually exclusive events.
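The four probabilities in the problem can be checked by listing the outcomes and counting. The sketch below does this with exact fractions; the helper name `probability` is an assumption of this example.

```python
from fractions import Fraction

slips = range(1, 101)  # slips numbered 1 through 100

def probability(event):
    """Probability that a randomly selected slip satisfies `event`."""
    favorable = sum(1 for n in slips if event(n))
    return Fraction(favorable, len(slips))

p_mult5 = probability(lambda n: n % 5 == 0)
p_mult8 = probability(lambda n: n % 8 == 0)
p_not_mult5 = probability(lambda n: n % 5 != 0)
p_both = probability(lambda n: n % 5 == 0 and n % 8 == 0)

print("multiple of 5:      ", p_mult5)      # 20 of the 100 slips
print("multiple of 8:      ", p_mult8)      # 12 of the 100 slips
print("not a multiple of 5:", p_not_mult5)
print("multiple of both:   ", p_both)       # only 40 and 80

# Complementary events: their probabilities add to 1.
print(p_mult5 + p_not_mult5 == 1)
```

Because the probability of "both" is not zero, the events "multiple of 5" and "multiple of 8" share outcomes and so are not mutually exclusive.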

Teachers can help students relate probability to their work with data analysis and to proportionality as they reason from relative-frequency histograms. For example, referring to the data displayed in figure 6.27, a teacher might pose questions like, How likely is it that the next time you throw a one-clip paper airplane, it goes at least 27 feet? No more than 21 feet?

Although the computation of probabilities can appear to be simple work with fractions, students must grapple with many conceptual challenges in order to understand probability. Misconceptions about probability have been held not only by many students but also by many adults (Konold 1989). To correct misconceptions, it is useful for students to make predictions and then compare the predictions with actual outcomes.

Computer simulations may help students avoid or overcome erroneous probabilistic thinking. Simulations afford students access to relatively large samples that can be generated quickly and modified easily. Technology can thus facilitate students' learning of probability in at least two ways: With large samples, the sample distribution is more likely to be "close" to the actual population distribution, thus reducing the likelihood of incorrect inferences based on empirical samples. With easily generated samples, students can focus on the analysis of the data rather than be distracted by the demands of data collection. If simulations are used, teachers need to help students understand what the simulation data represent and how they relate to the problem situation, such as flipping coins.
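A very small simulation of this kind might look like the following sketch. It models a fair coin with Python's standard pseudorandom generator; the seed value and the sample sizes are arbitrary choices made for this example.

```python
import random

def heads_fraction(flips, rng):
    """Simulate `flips` tosses of a fair coin; return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

rng = random.Random(2024)  # fixed, arbitrary seed so that runs are repeatable
results = {n: heads_fraction(n, rng) for n in (10, 100, 1_000, 100_000)}

for n, fraction in results.items():
    print(f"{n:>7} flips: fraction of heads = {fraction:.3f}")
```

Small samples can land well away from one-half, while the largest sample is very likely to be close to it. Students still need to connect each simulated "flip" to the physical coin it stands for.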

Although simulations can be useful, students also need to develop their probabilistic thinking by frequent experience with actual experiments. Many can be quite simple. For example, students could be asked to predict the probability of various outcomes of flipping two coins sixty times. Some students will incorrectly expect that there are three equally likely outcomes of flipping two coins once: two heads, two tails, and one of each. If so, they may predict that each of these will occur about twenty times. If groups of students conducted this experiment, they could construct a relative-frequency bar graph from the pooled data for the entire class. Then they could discuss whether the results of the experiment are consistent with their predictions. If students are accustomed to reasoning from and about data, they will understand that discrepancies between predictions and outcomes from a large and representative sample must be taken seriously. The detection of discrepancies can lead to learning when students turn to classmates and their teacher for alternative ways to think about the possible results of flipping two coins (or other similar compound events). Teachers can then introduce students to various methods (organized lists, tree diagrams, and area models) to help them understand and compute the probabilities of compound events.
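An organized list for the two-coin experiment can be generated mechanically. The sketch below assumes two fair, distinguishable coins, so that "heads then tails" and "tails then heads" are listed separately; the names used are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Organized list of equally likely outcomes: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def label(outcome):
    """Group an outcome the way students usually describe it."""
    heads = outcome.count("H")
    return {2: "two heads", 1: "one of each", 0: "two tails"}[heads]

counts = Counter(label(o) for o in outcomes)
probabilities = {name: Fraction(c, len(outcomes)) for name, c in counts.items()}

TRIALS = 60  # number of two-coin flips in the classroom experiment
expected = {name: p * TRIALS for name, p in probabilities.items()}

for name in ("two heads", "one of each", "two tails"):
    print(f"{name}: probability {probabilities[name]}, "
          f"expected about {expected[name]} of {TRIALS}")
```

The list shows why "one of each" should turn up about thirty times in sixty trials, not twenty: two of the four equally likely outcomes produce it.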

Using a problem like the following, a teacher might assess students' understanding of probability in a manner that includes data analysis and reveals possible misconceptions:

For the one-clip paper airplane, which was flight-tested with the results shown in the relative-frequency histogram (in fig. 6.27), what is the probability that exactly one of the next two throws will be a dud (i.e., it will travel less than 21 feet) and the other will be a success (i.e., it will travel 21 feet or more)?

To solve this problem, students would need to understand the data representation in figure 6.27 and use ratios to estimate that there is about a 45 percent chance that a throw will be a dud and about a 55 percent chance that it will be a success. Then they would need to use some method for handling the compound event and deal with the fact that there are two ways it might occur. Students who understand all that is required might produce a tree diagram like the one in figure 6.30 to show that the total probability is 198/400, or .495, since each of the two possibilities—"dud first, then success" and "success first, then dud"—has a probability of 99/400.

 Fig. 6.30. A tree diagram for determining the probability of a compound event, given simple data.
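The tree-diagram reasoning can be written out as a short calculation. The sketch uses the estimates quoted above (45 percent dud, 55 percent success) and assumes, as the tree diagram implicitly does, that the two throws are independent.

```python
from fractions import Fraction
from itertools import product

# Estimated from the histogram, as described in the text.
p = {"dud": Fraction(45, 100), "success": Fraction(55, 100)}

# Each path through the two-level tree, with its probability
# (the product along the branches, assuming independent throws).
paths = {(first, second): p[first] * p[second]
         for first, second in product(p, repeat=2)}

# "Exactly one dud" is reached by two different paths.
exactly_one_dud = sum(
    prob for path, prob in paths.items() if path.count("dud") == 1
)

for path, prob in paths.items():
    print(path, prob)
print("exactly one dud:", exactly_one_dud, "=", float(exactly_one_dud))
```

The four path probabilities add to 1, and the two paths with exactly one dud each contribute 99/400, giving 198/400, or .495.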