### Data Analysis and Probability

Instructional programs from prekindergarten through grade 12 should enable all students to:

- formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them;
- select and use appropriate statistical methods to analyze data;
- develop and evaluate inferences and predictions that are based on data;
- understand and apply basic concepts of probability.

The Data Analysis and Probability Standard recommends that students formulate questions that can be answered using data and addresses what is involved in gathering and using the data wisely. Students should learn how to collect data, organize their own or others' data, and display the data in graphs and charts that will be useful in answering their questions. This Standard also includes learning some methods for analyzing data and some ways of making inferences and conclusions from data. The basic concepts and applications of probability are also addressed, with an emphasis on the way that probability and statistics are related.

The amount of data available to help make decisions in business, politics, research, and everyday life is staggering: Consumer surveys guide the development and marketing of products. Polls help determine political-campaign strategies, and experiments are used to evaluate the safety and efficacy of new medical treatments. Statistics are often misused to sway public opinion on issues or to misrepresent the quality and effectiveness of commercial products. Students need to know about data analysis and related aspects of probability in order to reason statistically—skills necessary to becoming informed citizens and intelligent consumers.

The increased curricular emphasis on data analysis proposed in these Standards is intended to span the grades rather than to be reserved for the middle grades and secondary school, as is common in many countries. NCTM's 1989 Curriculum and Evaluation Standards for School Mathematics introduced standards in statistics and probability at all grade bands; a number of organizations have developed instructional materials and professional development programs to promote the teaching and learning of these topics. Building on this base, these Standards recommend a strong development of the strand, with concepts and procedures becoming increasingly sophisticated across the grades so that by the end of high school students have a sound knowledge of elementary statistics. To understand the fundamentals of statistical ideas, students must work directly with data. The emphasis on working with data entails students' meeting new ideas and procedures as they progress through the grades rather than revisiting the same activities and topics. The data and statistics strand allows teachers and students to make a number of important connections among ideas and procedures from number, algebra, measurement, and geometry. Work in data analysis and probability offers a natural way for students to connect mathematics with other school subjects and with experiences in their daily lives.

In addition, the processes used in reasoning about data and statistics will serve students well in work and in life. Some things children learn in school seem to them predetermined and rule bound. In studying data and statistics, they can also learn that solutions to some problems depend on assumptions and have some degree of uncertainty. The kind of reasoning used in probability and statistics is not always intuitive, and so students will not necessarily develop it if it is not included in the curriculum.

#### Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them

Because young children are naturally curious about their world, they often raise questions such as, How many? How much? What kind? or Which of these? Such questions often offer opportunities for beginning the study of data analysis and probability. Young children like to design questions about things close to their experience—What kind of pets do classmates have? What are children's favorite kinds of pizza? As students move to higher grades, the questions they generate for investigation can be based on current issues and interests. Students in grades 6–8, for example, may be interested in recycling, conservation, or manufacturers' claims. They may pose questions such as, Is it better to use paper or plastic plates in the cafeteria? or Which brand of batteries lasts longer? By grades 9–12, students will be ready to pose and investigate problems that explore complex issues.

Young children can devise simple data-gathering plans to attempt to answer their questions. In the primary grades, the teacher might help frame the question or provide a tally sheet, class roster, or chart on which data can be recorded as they are collected. The "data" might be real objects, such as children's shoes arranged in a bar graph or the children themselves arranged by interest areas. As students move through the elementary grades, they should spend more time planning the data collection and evaluating how well their methods worked in getting information about their questions. In the middle grades, students should work more with data that have been gathered by others or generated by simulations. By grades 9–12, students should understand the various purposes of surveys, observational studies, and experiments.

A fundamental idea in prekindergarten through grade 2 is that data can be organized or ordered and that this "picture" of the data provides information about the phenomenon or question. In grades 3–5, students should develop skill in representing their data, often using bar graphs, tables, or line plots. They should learn what different numbers, symbols, and points mean. Recognizing that some numbers represent the values of the data and others represent the frequency with which those values occur is a big step. As students begin to understand ways of representing data, they will be ready to compare two or more data sets. Books, newspapers, the World Wide Web, and other media are full of displays of data, and by the upper elementary grades, students ought to learn to read and understand these displays. Students in grades 6–8 should begin to compare the effectiveness of various types of displays in organizing the data for further analysis or in presenting the data clearly to an audience. As students deal with larger or more-complex data sets, they can reorder data and represent data in graphs quickly, using technology so that they can focus on analyzing the data and understanding what they mean.
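
The distinction between data values and their frequencies can be made concrete with a small sketch. The class data below are hypothetical, invented only for illustration, and the text-based line plot is one possible display rather than a prescribed one.

```python
from collections import Counter

# Hypothetical data: number of pets reported by each of 12 classmates.
pets_per_student = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 4, 1]

# Each key is a data VALUE; each count is the FREQUENCY of that value.
frequency = Counter(pets_per_student)


def line_plot(freq):
    """Return a text line plot: one row per data value, one X per occurrence."""
    rows = []
    for value in range(min(freq), max(freq) + 1):
        rows.append(f"{value} | {'X' * freq[value]}")
    return "\n".join(rows)


print(line_plot(frequency))
```

In each row, the number to the left of the bar is a value in the data set, and the number of X marks is how many students reported that value.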

#### Select and use appropriate statistical methods to analyze data

Although young children are often most interested in their own piece of data on a graph (I have five people in my family), putting all the students' information in one place draws attention to the set of data. Later, students should begin to describe the set of data as a whole. Although this transition is difficult (Konold forthcoming), students may, for example, note that "more students come to school by bus than by all the other ways combined." By grades 3–5, students should be developing an understanding of aggregated data. As older students begin to see a set of data as a whole, they need tools to describe this set. Statistics such as measures of center or location (e.g., mean, median, mode), measures of spread or dispersion (range, standard deviation), and attributes of the shape of the data become useful to students as descriptors. In the elementary grades, students' understandings can be grounded in informal ideas, such as middle, concentration, or balance point (Mokros and Russell 1995). With increasing sophistication in secondary school, students should choose particular summary statistics according to the questions to be answered.
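
As an illustration of the descriptors named above, the following sketch computes measures of center and spread for a small data set. The travel times are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption made for this example; a sample standard deviation would be an equally reasonable choice.

```python
import statistics

# Hypothetical data: minutes each of 9 students spends traveling to school.
travel_minutes = [5, 10, 10, 12, 15, 15, 15, 20, 42]

# Measures of center or location.
mean = statistics.mean(travel_minutes)
median = statistics.median(travel_minutes)
mode = statistics.mode(travel_minutes)

# Measures of spread or dispersion.
data_range = max(travel_minutes) - min(travel_minutes)
std_dev = statistics.pstdev(travel_minutes)  # population standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, standard deviation={std_dev:.2f}")
```

Here the single long trip of 42 minutes pulls the mean (16) above the median (15), which shows why the choice of summary statistic should depend on the question being asked.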

Throughout the school years, students should learn what it means to make valid statistical comparisons. In the elementary grades, students might say that one group has more or less of some attribute than another. By the middle grades, students should be quantifying these differences by comparing specific statistics. Beginning in grades 3–5 and continuing in the middle grades, the emphasis should shift from analyzing and describing one set of data to comparing two or more sets (Konold forthcoming). As they move through the middle grades into high school, students will need new tools, including histograms, stem-and-leaf plots, box plots, and scatterplots, to identify similarities and differences among data sets. Students also need tools to investigate association and trends in bivariate data, including scatterplots and fitted lines in grades 6–8 and residuals and correlation in grades 9–12.
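
The tools for bivariate data mentioned above (fitted lines, residuals, and correlation) can be sketched with a short calculation. The battery data are hypothetical, and the least-squares line is only one way to fit a line; middle-grades students might instead fit a line informally by eye.

```python
import math

# Hypothetical paired data: hours of use and remaining battery charge (percent).
hours = [1, 2, 3, 4, 5]
charge = [90, 82, 69, 61, 48]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(charge) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in charge)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, charge))

# Least-squares fitted line: predicted charge = intercept + slope * hours.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(hours, charge)]

# Pearson correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

print(f"fitted line: charge = {intercept:.1f} + ({slope:.1f}) * hours")
print("residuals:", [round(e, 2) for e in residuals])
print(f"correlation: {r:.3f}")
```

For a least-squares line the residuals sum to zero, so looking at their pattern, rather than their total, is what reveals whether a line describes the data well.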

#### Develop and evaluate inferences and predictions that are based on data

Central elements of statistical analysis—defining an appropriate sample, collecting data from that sample, describing the sample, and making reasonable inferences relating the sample and the population—should be understood as students move through the grades. In the early grades, students are most often working with census data, such as a survey of each child in the class about favorite kinds of ice cream. The notion that the class can be viewed as a sample from a larger population is not obvious at these grades. Upper elementary and early middle-grades students can begin to develop notions about statistical inference, but developing a deep understanding of the idea of sampling is difficult (Schwartz et al. 1998). Research has shown that students in grades 5–8 expect their own judgment to be more reliable than information obtained from data (Hancock, Kaput, and Goldsmith 1992). In the later middle grades and high school, students should address the ideas of sample selection and statistical inference and begin to understand that there are ways of quantifying how certain one can be about statistical results.

In addition, students in grades 9–12 should use simulations to learn about sampling distributions and make informal inferences. In particular, they should know that basic statistical techniques are used to monitor quality in the workplace. Students should leave secondary school with the ability to judge the validity of arguments that are based on data, such as those that appear in the press.
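
A simulation of the kind described above might look like the following sketch. The population (1,000 students, 60 percent in favor of a proposal), the sample sizes, and the number of trials are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so that a run can be repeated

# Hypothetical population: 1,000 students, 600 of whom favor a proposal (1 = in favor).
population = [1] * 600 + [0] * 400


def sample_proportions(sample_size, trials=2000):
    """Draw many random samples and return the proportion in favor in each one."""
    return [
        sum(random.sample(population, sample_size)) / sample_size
        for _ in range(trials)
    ]


for size in (10, 40, 160):
    props = sample_proportions(size)
    center = statistics.mean(props)
    spread = statistics.pstdev(props)
    print(f"sample size {size:3d}: mean {center:.3f}, standard deviation {spread:.3f}")
```

The sample proportions cluster around the population value of 0.6, and their spread shrinks as the sample size grows, which is the informal idea behind quantifying how certain one can be about a statistical result.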

#### Understand and apply basic concepts of probability

A subject in its own right, probability is connected to other areas of mathematics, especially number and geometry. Ideas from probability serve as a foundation to the collection, description, and interpretation of data.

In prekindergarten through grade 2, the treatment of probability ideas should be informal. Teachers should build on children's developing vocabulary to introduce and highlight probability notions, for example, We'll probably have recess this afternoon, or It's unlikely to rain today. Young children can begin building an understanding of chance and randomness by doing experiments with concrete objects, such as choosing colored chips from a bag. In grades 3–5 students can consider ideas of chance through experiments (using coins, dice, or spinners) with known theoretical outcomes or through designating familiar events as impossible, unlikely, likely, or certain. Middle-grades students should learn and use appropriate terminology and should be able to compute probabilities for simple compound events and use them, for example, to find the expected number of occurrences of two heads when two coins are tossed 100 times. In high school, students should compute probabilities of compound events and understand conditional and independent events. Through the grades, students should be able to move from situations for which the probability of an event can readily be determined to situations in which sampling and simulations help them quantify the likelihood of an uncertain outcome.
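
The two-coin example above can be worked out by listing the equally likely outcomes. The sketch below assumes fair coins and independent tosses; the variable names are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Probability of two heads = favorable outcomes / total outcomes.
favorable = sum(1 for outcome in outcomes if outcome == ("H", "H"))
p_two_heads = Fraction(favorable, len(outcomes))

# Expected number of double heads when the pair of coins is tossed 100 times.
tosses = 100
expected_two_heads = p_two_heads * tosses

print(p_two_heads)         # 1/4
print(expected_two_heads)  # 25
```

The expected count of 25 is a long-run average, not a guarantee; any particular set of 100 tosses will usually give a number near 25 rather than exactly 25.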

Many of the phenomena that students encounter, especially in school, have predictable outcomes. When a fair coin is flipped, it is equally likely to come up heads or tails. Which outcome will result on a given flip is uncertain—even if ten flips in a row have resulted in heads, for many people it is counterintuitive that the eleventh flip has only a 50 percent likelihood of being tails. If an event is random and if it is repeated many, many times, then the distribution of outcomes forms a pattern. The idea that individual events are not predictable in such a situation but that a pattern of outcomes can be predicted is an important concept that serves as a foundation for the study of inferential statistics.
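
Both ideas in this paragraph (a predictable long-run pattern and the unpredictability of each flip, even after a streak) can be explored with a simulation. This is a sketch under stated assumptions: the coin is modeled with Python's pseudorandom generator, and streaks of three heads are used instead of ten so that they occur often enough to count.

```python
import random

random.seed(1)  # fixed seed so that a run can be repeated

# Simulate 100,000 flips of a fair coin.
flips = [random.choice("HT") for _ in range(100_000)]

# Long-run pattern: the overall proportion of heads.
overall_heads = flips.count("H") / len(flips)

# Collect every flip that comes immediately after three heads in a row.
after_streak = [
    flips[i]
    for i in range(3, len(flips))
    if flips[i - 3:i] == ["H", "H", "H"]
]
heads_after_streak = after_streak.count("H") / len(after_streak)

print(f"proportion of heads overall: {overall_heads:.3f}")
print(f"proportion of heads right after three heads: {heads_after_streak:.3f}")
```

Both proportions come out close to one half: a streak of heads does not make tails more likely on the next flip, yet the pattern across many flips is highly predictable.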

Copyright © 2000 by the National Council of Teachers of Mathematics.