Data Analysis and Probability Standard for Grades 6–8
Prior to the middle grades, students should have had experiences collecting, organizing, and representing sets of data. They should be facile both with representational tools (such as tables, line plots, bar graphs, and line graphs) and with measures of center and spread (such as median, mode, and range). They should have had experience in using some methods of analyzing information and answering questions, typically about a single population.
In grades 6–8, teachers should build on this base of experience to help
students answer more-complex questions, such as those concerning relationships
among populations or samples and those about relationships between two
variables within one population or sample. Toward this end, new representations
should be added to the students' repertoire. Box plots, for example, allow
students to compare two or more samples, such as the heights of students
in two different classes. Scatterplots allow students to study related
pairs of characteristics in one sample, such as height versus arm span
among students in one class. In addition, students can use and further
develop their emerging understanding of proportionality in various aspects
of their study of data and statistics.
Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them
Middle-grades students should formulate questions and design experiments or surveys to collect relevant data so that they can compare characteristics within a population or between populations. For example, a teacher might ask students to examine how various design characteristics of a paper airplane, such as its length or the number of paper clips attached to its nose, affect the distance it travels and its consistency of flight. Students would then plan experiments in which they collect data that would allow them to compare the effects of particular design features. In addition to helping students design their experiments logically, the teacher should help them consider other factors that might affect the data, such as wind or inconsistencies in launching the planes.
Because laboratory experiments involving data collection are part of the middle-grades science curriculum, mathematics teachers may find it useful to collaborate with science teachers so that they are consistent in their design of experiments. Such collaboration could be extended so that students might collect the data for an experiment in science class and analyze it in mathematics class.
In addition to collecting their own data, students should learn to find relevant
data in other resources, such as Web sites or print publications. Consumer
Reports, for example, regularly compares the characteristics of various
products, such as the quality of peanut butter; the longevity of rechargeable
batteries; or the cost, size, and fuel efficiency of automobiles. When
using data from other sources, students need to determine which data are
appropriate for their needs, understand how the data were gathered, and
consider limitations that could affect interpretation.
Middle-grades students should learn to use absolute- and relative-frequency bar graphs and histograms to represent the data they collect and to decide which form of representation is appropriate for different purposes. For example, suppose students were considering the following question:
In an experiment conducted to answer this question, one student might throw one of the airplanes forty times while team members measure and record the distance traveled each time. The group might later do the same for the other paper airplane. The teacher might then have the students use a relative-frequency histogram to represent the data, as shown in figure 6.27. For comparison, the teacher might suggest that students display both sets of data using box plots, as in figure 6.28.
Select and use appropriate statistical methods to analyze data
In the middle
grades, students should learn to use the mean, and continue to use the median
and the mode, to describe the center of a set of data. Although the mean
often quickly becomes the method of choice for
students when summarizing a data set, their knack for computing the mean
does not necessarily correspond to a solid understanding of its meaning
or purpose (McClain 1999). Students need to understand that the mean "evens
out" or "balances" a set of data and that the median identifies the "middle"
of a data set. They should compare the utility of the mean and the median
as measures of center for different data sets. As several authors have noted
(e.g., Uccellini; Konold [forthcoming]), students often fail to apprehend
many subtle aspects of the mean as a measure of center. Thus, the teacher
has an important role in providing experiences that help students construct
a solid understanding of the mean and its relation to other measures of center.
Students also need to think about measures of center in relation to the spread of a distribution. In general, the crucial question is, How do changes in data values affect the mean and median of a set of data? To examine this question, teachers could have students use a calculator to create a table of values and compute the mean and median. Then they could change one of the data values in the table and see whether the values of the mean and the median are also changed. These relationships can be effectively demonstrated using software through which students can control a data value and observe how the mean and median are affected. For example, using software that produces line plots for data sets, students could plot a set of data and mark the mean and median on the line. The students could then change one data value and observe how the mean and median change. By repeating this process for various data points, they can notice that changing one data value usually does not affect the median at all, unless the moved value is at the middle of the data set or moves across the middle, but that every change in a value affects the mean. Thus, the mean is more likely to be influenced by extreme values, since it is affected by the actual data values, but the median involves only the relative positions of the values. Other similar problems can be useful in helping students understand the different sensitivities of the mean and median; for example, the mean is very sensitive to the addition or deletion of one or two extreme data points, whereas the median is far less sensitive to such changes.
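The change-one-value experiment described above can be sketched in a few lines of Python. This is only an illustration: the distances are invented (they are not taken from any figure here), `summarize` is a hypothetical helper, and the code relies on the standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical flight distances in feet (invented for illustration).
distances = [15, 18, 20, 21, 24, 27, 30]


def summarize(values):
    """Return the (mean, median) pair for a list of numbers."""
    return mean(values), median(values)


original = summarize(distances)

# Change one value that is not in the middle: the largest
# distance is replaced by an extreme value.
changed = distances.copy()
changed[-1] = 90
after_change = summarize(changed)

print("original (mean, median):    ", original)
print("after change (mean, median):", after_change)
```

In this sketch the changed value stays on the same side of the middle, so the median is unchanged while the mean is pulled upward. Students could repeat the experiment with a value that crosses the middle to see the median move as well.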
Students should consider how well different graphs represent important characteristics
of data sets. For example, they might notice that it is easier to see
symmetry or skewness in a graph than in a table of values. Graphs, however,
can lose some of the features of the data, as can be demonstrated by generating
a family of histograms for a single set of data, using different bin sizes:
the different histograms may convey different pictures of the symmetry,
skewness, or variability of the data set. Another example is seen when
comparing a histogram and a box plot for the same data, such as those
for the one-clip plane in figures 6.27 and 6.28. Box plots do not convey
as much specific information about the data set, such as where clusters
occur, as histograms do. But box plots can provide effective comparisons
between two data sets because they make descriptive characteristics such
as median and interquartile range readily apparent.
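To see how bin size changes the picture a histogram gives, the sketch below groups one data set into relative-frequency bins of two different widths. The distances are invented for illustration, and `relative_frequencies` is a hypothetical helper rather than a function from any library.

```python
from collections import Counter

# Hypothetical flight distances in feet (invented for illustration).
distances = [14, 15, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29, 31, 32]


def relative_frequencies(values, bin_width, start=0):
    """Map each bin's lower edge to the fraction of values that fall
    in the half-open interval [edge, edge + bin_width)."""
    counts = Counter(
        start + ((v - start) // bin_width) * bin_width for v in values
    )
    total = len(values)
    return {edge: counts[edge] / total for edge in sorted(counts)}


narrow = relative_frequencies(distances, bin_width=3)
wide = relative_frequencies(distances, bin_width=12)

for name, table in (("3-foot bins", narrow), ("12-foot bins", wide)):
    print(name)
    for edge, fraction in table.items():
        print(f"  from {edge} ft: {fraction:.2f}")
```

With narrow bins the data look fairly evenly spread. With wide bins the same data collapse into two bars, which hides most of that detail.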
Develop and evaluate inferences and predictions that are based on data
In collecting and representing data, students should be driven by a desire to answer questions on the basis of the data. In the process, they should make observations, inferences, and conjectures and develop new questions. They can use their developing facility with rational numbers and proportionality to refine their observations and conjectures. For example, when considering the relative-frequency histogram in figure 6.27, most students would observe that "the paper plane goes between 15 and 21 feet about as often as it goes between 24 and 33 feet," but such an observation is not very precise about the frequency. A teacher could press students to make more-precise statements: "The plane goes between 15 and 21 feet about 45 percent of the time."
Box plots are useful when making comparisons between populations. A teacher might pose the following question about the box plots in figure 6.28:
From the relative position of the two graphs, students can infer that the two-clip plane generally flies slightly farther than the one-clip plane. Students can answer the second question by using the spreads of the data portrayed in the box plots to argue that the one-clip plane is more variable in the distance it travels than the two-clip plane.
Scatterplots are useful for detecting and examining relationships between two characteristics of a population. For example, a teacher might ask students to consider if a relationship exists between the length and the width of warblers' eggs (activity adapted from Encyclopaedia Britannica Educational Corporation [1998, pp. 104–19]). She might provide the students with data and ask the students to make a scatterplot in which each point displays the length and the width of an egg, as shown in figure 6.29. Most students will note that the relationship between the length and the width of the eggs seems to be direct (or positive); that is, longer eggs also tend to be wider. Many students will also note that the points on this scatterplot approximate a straight line, thus suggesting a nearly linear relationship between length and width. To make this relationship even more apparent, the teacher could have students draw an approximate line of fit for the data, as has been done in figure 6.29. Students could apply their developing understanding of the slope of a line to determine that the slope is approximately three-fourths and that therefore the ratio of the width to the length of warblers' eggs is approximately 3:4.
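As a rough check on a line of fit drawn by eye, the slope can also be estimated numerically. The sketch below substitutes an ordinary least-squares line for the hand-drawn line described above, so it is a different technique from the one in the activity. The egg measurements are invented for illustration and are not the data behind figure 6.29, and `least_squares_line` is a hypothetical helper.

```python
# Hypothetical (length, width) pairs in millimeters, invented for
# illustration; these are NOT the measurements plotted in figure 6.29.
eggs = [(16.0, 12.1), (17.0, 12.8), (18.0, 13.4), (19.0, 14.3), (20.0, 15.0)]


def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(eggs)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For these made-up values the slope comes out close to three-fourths. The slope can be read as a width-to-length ratio only when the intercept is near zero, which is worth discussing with students.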
Teachers can also help students learn to use scatterplots to consider the relationship
between two characteristics in different populations. For example, students
could measure the height and arm span for groups of middle school and
high school students and then make a scatterplot in which the points for
middle school students are plotted with one color and the points for high
school students are plotted with a second color. Students can make observations
about the differences between the two samples, such as that students in
the high school sample are generally taller than those in the middle school
sample. They can also use the plots to examine possible similarities.
In particular, if students draw an approximate line of fit for each set
of points, they can determine whether the slopes are approximately equal
(i.e., the lines are approximately parallel), which would indicate that
the relationship between height and arm span is about the same for both
middle school and high school students.
Because linearity
is an important idea in the middle grades, students should encounter many
scatterplots that have a nearly linear shape. But
teachers should also have students explore plots that represent nonlinear
relationships. For example, in connection with their study of geometry and
measurement, students could measure the lengths of the bases of several
similar triangles and use formulas to find their areas or graph paper to
estimate their areas. Creating a scatterplot of the lengths of the bases
and the areas will make evident the quadratic relationship between length
and area in similar figures.
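A short computation can show what students should expect to see in such a scatterplot. The sketch below assumes, purely for illustration, a family of similar triangles whose height is always half the base. The constant `HEIGHT_TO_BASE` and the list of bases are made up.

```python
# Assumed shape: every triangle's height is half its base, so all of
# the triangles are similar. The ratio is an arbitrary choice.
HEIGHT_TO_BASE = 0.5


def triangle_area(base, height):
    """Area of a triangle from its base and height."""
    return 0.5 * base * height


bases = [2, 4, 6, 8, 10]
points = [(b, triangle_area(b, HEIGHT_TO_BASE * b)) for b in bases]

# For similar triangles, area divided by base squared is constant,
# which is what makes the relationship quadratic rather than linear.
ratios = [area / b**2 for b, area in points]

for (b, area), ratio in zip(points, ratios):
    print(f"base {b}: area {area}, area / base^2 = {ratio}")
```

Doubling the base multiplies the area by four, so the plotted points bend upward instead of falling along a line.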
Teachers should encourage students to plot many data sets and look for relationships in the plots; computer graphing software and graphing calculators can be very helpful in this work. Students should see a range of examples in which plotting data sets suggests linear relationships, nonlinear relationships, and no apparent relationship at all. When a scatterplot suggests that a relationship exists, teachers should help students determine the nature of the relationship from the shape and direction of the plot. For example, for an apparently linear relationship, students could use their understanding of slope to decide whether the relationship is direct or inverse. Students should discuss what the relationships they have observed might reveal about the sample, and they should also discuss whether their conjectures about the sample might apply to larger populations containing the sample. For example, if a sample consists of students from one sixth-grade class in a school, how valid might the inferences made from the sample be for all sixth graders in the school? For all middle-grades students in the school? For all sixth graders in the city? For all sixth graders in the country? Such discussions can suggest further studies students might undertake to test the generality of their conjectures.
Understand and apply basic concepts of probability
Teachers should give middle-grades students numerous opportunities to engage in probabilistic thinking about simple situations from which students can develop notions of chance. They should use appropriate terminology in their discussions of chance and use probability to make predictions and test conjectures. For example, a teacher might give students the following problem:
Students should be able to use basic notions of chance and some basic knowledge of number theory to determine the likelihood of selecting a number that is a multiple of 5 and the likelihood of not selecting a multiple of 5. In order to facilitate classroom discussion, the teacher should help students learn commonly accepted terminology. For example, students should know that "selecting a multiple of 5" and "selecting a number that is not a multiple of 5" are complementary events and that because 40 is in the set of possible outcomes for both "selecting a multiple of 5" and "selecting a multiple of 8," they are not mutually exclusive events.
Teachers can help students relate probability to their work with data analysis and to proportionality as they reason from relative-frequency histograms. For example, referring to the data displayed in figure 6.27, a teacher might pose questions like, How likely is it that the next time you throw a one-clip paper airplane, it goes at least 27 feet? No more than 21 feet?
Although the computation of probabilities can appear to be simple work with fractions, students must grapple with many conceptual challenges in order to understand probability. Misconceptions about probability have been held not only by many students but also by many adults (Konold 1989). To correct misconceptions, it is useful for students to make predictions and then compare the predictions with actual outcomes.
Computer simulations may help students avoid or overcome erroneous probabilistic thinking. Simulations afford students access to relatively large samples that can be generated quickly and modified easily. Technology can thus facilitate students' learning of probability in at least two ways: With large samples, the sample distribution is more likely to be "close" to the actual population distribution, thus reducing the likelihood of incorrect inferences based on empirical samples. With easily generated samples, students can focus on the analysis of the data rather than be distracted by the demands of data collection. If simulations are used, teachers need to help students understand what the simulation data represent and how they relate to the problem situation, such as flipping coins.
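A minimal sketch of such a simulation, assuming a fair coin and using Python's standard `random` module, is shown below. The seed value and the sample sizes are arbitrary choices made so that runs are repeatable, and `heads_fraction` is a hypothetical helper.

```python
import random


def heads_fraction(num_flips, rng):
    """Simulate num_flips fair coin flips and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


rng = random.Random(2024)  # fixed, arbitrary seed for repeatable runs
results = {n: heads_fraction(n, rng) for n in (10, 100, 10_000)}

for n, fraction in results.items():
    print(f"{n:>6} flips: fraction of heads = {fraction:.3f}")
```

Small samples can land far from one-half, while the largest sample should land close to it. Each simulated flip stands in for one physical coin flip, a connection the teacher should make explicit.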
Although simulations can be useful, students also need to develop their probabilistic thinking by frequent experience with actual experiments. Many can be quite simple. For example, students could be asked to predict the probability of various outcomes of flipping two coins sixty times. Some students will incorrectly expect that there are three equally likely outcomes of flipping two coins once: two heads, two tails, and one of each. If so, they may predict that each of these will occur about twenty times. If groups of students conducted this experiment, they could construct a relative-frequency bar graph from the pooled data for the entire class. Then they could discuss whether the results of the experiment are consistent with their predictions. If students are accustomed to reasoning from and about data, they will understand that discrepancies between predictions and outcomes from a large and representative sample must be taken seriously. The detection of discrepancies can lead to learning when students turn to classmates and their teacher for alternative ways to think about the possible results of flipping two coins (or other similar compound events). Teachers can then introduce students to various methods (organized lists, tree diagrams, and area models) to help them understand and compute the probabilities of compound events.
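The organized-list method for the two-coin experiment can be sketched as follows, assuming fair coins. The event labels and the `label` helper are illustrative names, not part of the original activity.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Organized list of the equally likely outcomes for two fair coins:
# ('H','H'), ('H','T'), ('T','H'), ('T','T').
outcomes = list(product("HT", repeat=2))


def label(outcome):
    """Group an ordered outcome into one of the three events students name."""
    heads = outcome.count("H")
    return {2: "two heads", 1: "one of each", 0: "two tails"}[heads]


counts = Counter(label(o) for o in outcomes)
probabilities = {
    event: Fraction(count, len(outcomes)) for event, count in counts.items()
}

# Expected number of occurrences in sixty flips of the pair.
expected_in_60 = {event: p * 60 for event, p in probabilities.items()}

for event in ("two heads", "one of each", "two tails"):
    print(event, probabilities[event], expected_in_60[event])
```

Listing the outcomes shows that "one of each" can happen in two ways. It should therefore appear about thirty times in sixty trials, and the other two events about fifteen times each, rather than twenty times apiece.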
Using a problem like the following, a teacher might assess students' understanding of probability in a manner that includes data analysis and reveals possible misconceptions:
To solve this problem, students would need to understand the data representation in figure 6.27 and use ratios to estimate that there is about a 45 percent chance that a throw will be a dud and about a 55 percent chance that it will be a success. Then they would need to use some method for handling the compound event and deal with the fact that there are two ways it might occur. Students who understand all that is required might produce a tree diagram like the one in figure 6.30 to show that the total probability is 198/400, or 0.495, since each of the two possibilities ("dud first, then success" and "success first, then dud") has a probability of 99/400.
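The arithmetic behind the tree diagram can be checked with exact fractions. The sketch below assumes that the two throws are independent and that the event of interest is exactly one dud and one success, as the two branches named above imply. The variable names are illustrative.

```python
from fractions import Fraction

# Estimates read from the relative-frequency histogram, as stated above.
p_dud = Fraction(45, 100)
p_success = 1 - p_dud  # 55/100

# Assumption: the two throws are independent, so the probabilities
# along each branch of the tree are multiplied.
dud_then_success = p_dud * p_success
success_then_dud = p_success * p_dud

# The two branches are mutually exclusive, so their probabilities add.
exactly_one_dud = dud_then_success + success_then_dud

print(dud_then_success, success_then_dud, exactly_one_dud)
print(float(exactly_one_dud))
```

Python reduces the total to lowest terms, so it appears as 99/200, which is the same value as 198/400.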