This chapter summarizes some of the key concepts and relationships of single-variable statistics that we might find useful for characterizing measurements, particularly when we have measured a quantity at multiple times, or we’ve measured many individual members of a population or collection. It does, however, point to some connections that we can make between the measurement and characterization of data and the scientific description of nature that we sometimes seek.

## 6.1 Measurement and Sampling

In the natural sciences we often need to estimate or measure a quantity or set of quantities that is too large, too numerous, or too complex to characterize completely in an efficient way. We can instead characterize it approximately with a *representative sample*. A representative sample is a small subset of the whole that is measured in order to characterize the whole.

Consider an example. In small headwater streams, many aspects of biotic health are linked with the size of the substrate – the sand, pebbles or boulders that compose the streambed. But it is impractical to measure all the gajillions of particles scattered over the entire bed. Instead, we attempt to get a smaller but representative sample of the bed material. This may be done in a number of different ways, but two common methods are: 1) to take one or more buckets full of sediment from the streambed and do a detailed particle-size analysis in a laboratory; and 2) to measure the size of 100 randomly selected particles from the bed. Both methods obtain a sample, but each may represent the true streambed in a different way. The bucket method requires us to choose sample sites on the streambed. Our choices might be *biased* toward those places where sampling might be easier, the bed more visible, or the water shallower. In this case, our results might not be representative of the streambed as a whole.

The “pebble count” method(^{1}), on the other hand, is intended to produce a more random sample of the streambed. A person wading in the stream steps diagonally across the channel, and at each step places her index finger at the toe of her boot. The diameter of the particle that her finger touches first is measured, and then she repeats the process across the channel until she has measured 100 (or some larger predetermined number) particles. In principle, this *random sample* is more representative of the streambed, particularly as the number of particles in the sample is increased. Of course, increasing the number of particles in the sample increases the time and effort used, but with diminishing returns for improving the accuracy of the sample.


(^{1})This method is sometimes called the “Wolman pebble count” method for Reds Wolman, the scientist who first described and popularized it.

Hypothetically speaking, an alternative pebble-count method could be to stretch a tape measure across the stream and measure the particle size at regular intervals, say every half meter. We can call this strategy the “point count” method. This alternative is appealing since it ensures that samples are distributed evenly across the channel and are not clustered in space. However, it is conceivable that such *systematic sampling* could lead to a systematic bias(^{2}).


(^{2})Systematic sampling is sometimes an easier, more straightforward approach to sampling. However, if the setting within which sampling is taking place has some systematic structure, systematic sampling could inadvertently bias the sample.

If, for example, the streambed had clusters or patterns of particles in it with a wavelength of 0.5 m, such as small dunes, you could inadvertently sample only one part of each dune, which might skew your results toward particle sizes that are concentrated on dune crests. Thus, a random sample is usually preferable as it is less susceptible to this kind of systematic bias.
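To make the dune argument concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the bed is assumed to have particle sizes that vary as a cosine with a 0.5 m wavelength (coarsest on the crests), the transect is assumed to be 20 m long, and the function and variable names are hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

WAVELENGTH_M = 0.5   # spacing of the hypothetical bedform pattern
TRANSECT_M = 20.0    # hypothetical transect length


def particle_size_mm(x_m):
    """Made-up bed: size swings between 10 and 30 mm, coarsest on the crests."""
    return 20.0 + 10.0 * math.cos(2.0 * math.pi * x_m / WAVELENGTH_M)


# Systematic "point count": one particle every 0.5 m along the tape.
systematic_positions = [0.5 * k for k in range(40)]
systematic_sizes = [particle_size_mm(x) for x in systematic_positions]
systematic_mean = statistics.fmean(systematic_sizes)

# Random sample: 100 positions drawn uniformly along the same transect.
random_positions = [random.uniform(0.0, TRANSECT_M) for _ in range(100)]
random_sizes = [particle_size_mm(x) for x in random_positions]
random_mean = statistics.fmean(random_sizes)

print(f"systematic mean: {systematic_mean:.1f} mm")
print(f"random mean:     {random_mean:.1f} mm")
```

Because the sampling interval matches the wavelength of the pattern, every systematic sample in this toy bed lands on a crest and reports the coarsest size, while the random sample averages close to the bed-wide mean of 20 mm.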

Quantities derived from a random sample are unrelated to one another in the same way that the size of one grain measured during a pebble count has no influence on the size of the next one. Part of our sequence of data might look like this:

12, 2, 5, 26, 4, 28, 19, 29, 3, 15, 31, 19, 24, 27, 7, 22, 28, 33, 21, 28, 13, 15, 25, 10, 14, 13, 16, 18, 33, 5

The random nature of this set of data allows us to use some of the familiar ways of describing our data, while boosting our confidence that we are also properly characterizing the larger system that we are sampling.

### 6.1.1 Example: mark-recapture

A frequent concern of the wildlife ecologist is the abundance and health of a particular species of interest. Ideally, we could count and assess the health of every individual in a population, but that is usually not practical - heck, we have a tough enough time counting and assessing the health of all the humans in a small town! Instead of trying to track down every individual though, we can do a decent job by simply taking a random sample from the population and performing the desired analysis on that random sample. As we have seen, if we are sufficiently careful about avoiding bias in our sampling, we can be reasonably confident that our sample will tell us something useful (and not misleading) about the larger population that the sample came from.

If our concern is mainly with the population of a target species in a certain area, we can use a method called *mark-recapture*, or *capture-recapture*. The basic premise is simple: we capture some number of individuals in a population at one time, band, tag or mark them in such a way that they can be recognized later as individuals that were previously captured, then release them. Some time later, after these individuals have dispersed into the population as a whole, we capture another set. The proportion of the individuals in the second capture who are marked should, in theory, be the same as the proportion of the whole population that we marked to begin with. If the number of individuals we marked the first time around is \(N_1\), the number we captured the second time around is \(N_2\), and the number in the second group that bore marks from the first capture is \(M\), the population \(P\) may be estimated most simply as:

\( P = \frac{N_{1} N_{2}}{M} \) (6.1)

This comes from the assumption that our sample each time is random, and that the marked individuals have exactly the same likelihood of being in the second capture as they did in the first: \(1/P\). Therefore, if we sampled and marked a fraction \(N_1/P\) of the population the first time around and sample \(N_2\) individuals the second time around, then we should expect a fraction \(M/N_2\) of them to be marked. Setting the two fractions equal, \(N_1/P = M/N_2\), and solving for \(P\) gives Equation 6.1.
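As a quick illustration, Equation 6.1 can be written as a small function. This simplest form is commonly known as the Lincoln-Petersen estimator. The function name and the survey numbers below are hypothetical, chosen only to show the arithmetic.

```python
def estimate_population(n_marked_first, n_caught_second, n_recaptured):
    """Equation 6.1: P = N1 * N2 / M."""
    if n_recaptured <= 0:
        # With no marked individuals in the second capture, the ratio is undefined.
        raise ValueError("no marked individuals recaptured; P cannot be estimated")
    return n_marked_first * n_caught_second / n_recaptured


# Hypothetical survey: 50 marked, 40 caught later, 10 of those bearing marks.
population_estimate = estimate_population(50, 40, 10)
print(population_estimate)  # 200.0
```

Here a quarter of the second capture was marked, so the 50 marked individuals are taken to be a quarter of the whole population.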

Of course this whole plan can be foiled if some key assumptions are not met. For example, we need the population to be “closed” – that is, individuals do not enter or leave the population, so that each sample comes from the same set of individuals. Problems could also ensue if our “random” sample isn’t random, if somehow the process of marking individuals either harmed them or made their re-capture more or less likely, or if the time we allowed for them to re-mix with their population was not appropriate. On the last point, you can imagine that if we recapture tortoises 10 minutes after releasing them from their first capture, our second sample will not be very random. On the other hand, if we recapture marked fish 20 years after they were first marked, many of them may have died and been replaced by their offspring, and thus our assumption of a “closed” population is violated. So in planning a mark-recapture study, space and timescales need to be taken into account.

It is worth noting that the method described here is about the most stripped-down version of mark-recapture. There are many modifications to the method and to the equation used to compute population that account for immigration and emigration, multiple recaptures, possible re-recaptures, and so on. There are also related methods using tagging and marking that can be used to explore the dispersal of individuals, migration routes and a lot more!

## 6.2 Describing measurements

Measurements, or “data”, can inform and influence much of a resource manager’s work objectives, since they convey information about the systems of interest. Sometimes the data speak for themselves: raw numbers are sufficiently clear and compelling that nothing more needs to be done. More commonly, however, the data need to be summarized and characterized through one or more processes of **data processing** and **data reduction**. Processing might simply refer to a routine set of algorithms applied to raw data to make them satisfy the objectives of the project or problem. Data reduction usually summarizes a large set of data with a smaller set of descriptive statistics. For a set of measurements of a simple quantity, for example, we might wish to know:


**Things we often want to know about our data**

1. what is a typical observation?

2. how diverse are the data?

3. how should these properties of the data be characterized for different types of quantities?

The first point suggests the use of our measures of central tendency: mean, median and mode. The second goal relates to measures of spread or dispersion in the data. For example, how close are most values in the data set to the mean?

## 6.3 Central tendency

The central tendency of a data set is a characteristic central value that may be the **mean**, **median**, or **mode**. Which of these measures of central tendency best characterizes the data set depends on the nature of the data and what we wish to characterize about it.

Most of us are already familiar with the concept of a mean, or average value of a set of numbers. We normally just add together all of the observed values and divide by the number of values to get the mean. Actually, this is the *arithmetic mean*, and there are many alternative ways of computing different kinds of means that are useful in particular circumstances, but we won’t worry about these now. For our purposes, the arithmetic mean is the mean we mean when we say mean or average. It would be mean to say otherwise.

Before continuing, let’s briefly discuss the different kinds of notation that we might use when talking about data. To define something like the mean with an equation, we’d like to make the definition as general as possible, i.e., applicable to all cases rather than just one. So we need notation that, for example, does not specify the number of data points in the data set but allows that to vary. If we want to find the mean (call it \(\bar{x}\)) of a set of 6 data points (\(x_1\), \(x_2\), and so on), one correct formula might look like this:

\( \bar{x} = \frac{x_{1}+x_{2}+x_{3}+x_{4}+x_{5}+x_{6}}{6} \) (6.2)

This is correct, but we can’t use the same formula for a dataset that has 7 or 8 values, or anything other than 6 values. Furthermore, it is not very convenient to have to write out each term in the numerator if the data set is really large. So we need a shorthand that is both brief and not specific to a certain number of data points. One approach is to write:

\( \bar{x} = \frac{x_{1}+x_{2}+\ldots+x_{n}}{n} \) (6.3)

where we understand that \(n\) is the number of observations in the data set. The ellipsis in the numerator denotes all the missing values between \(x_2\) and \(x_n\), the last value to be included in the average. Using this type of equation to define the mean is much more general than the first example, and is more compact as long as there are 4 or more values to be averaged.

One additional way you might see the mean defined is using so-called “sigma notation”(^{3}), where it looks like this:

\( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i} \) (6.4)


(^{3})This symbol is a handy shorthand for the process of adding a bunch of quantities together, but also serves the purpose of scaring many poor students away. Once you realize that it’s just an abbreviation for listing all the terms to be added (\(x_1 + x_2 + \ldots\)) and some of the rules for doing so, it becomes a tad less fearsome.

where the big Σ is the summation symbol. If you’ve never encountered this before, here’s how to interpret it: the “summand”, the stuff after the Σ, is to be interpreted as a list of values (in this case \(x_i\)) that need to be added together, and \(i\) starts at 1 and increases until you get to \(n\). You can see the rules for what \(i\) means by looking at the text below and above the Σ. Below, where it says \(i = 1\), that means that \(i\) begins with a value of 1 and increases with each added term until \(i = n\), which is the last term. So in the end, you can interpret this to have a meaning identical to the equivalent expressions above, but in some cases this notation can be more compact and explicit. It also looks fancier and more intimidating, so people will sometimes use this notation to scare you off, even though it gives you the same result as the second equation above.
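If you have done any programming, the summation symbol is just a loop. The sketch below spells out the sigma-notation definition of the mean step by step, using the sequence of pebble-count values listed in Section 6.1. The variable names are hypothetical.

```python
# Pebble-count values from the example sequence in Section 6.1
pebble_sizes = [12, 2, 5, 26, 4, 28, 19, 29, 3, 15, 31, 19, 24, 27, 7,
                22, 28, 33, 21, 28, 13, 15, 25, 10, 14, 13, 16, 18, 33, 5]

n = len(pebble_sizes)

total = 0
for i in range(1, n + 1):          # i runs from 1 to n, as written under and over the sigma
    total += pebble_sizes[i - 1]   # Python lists start at index 0, so x_i is element i - 1

mean_size = total / n              # the 1/n factor in front of the sum

print(n, total, round(mean_size, 2))
```

In practice the built-in `sum(pebble_sizes) / len(pebble_sizes)` does the same thing in one line; the explicit loop is only there to mirror the notation.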

### 6.3.1 Mean versus Median

For some data sets, the mean can be a misleading way to describe the central tendency. If your creel after a day of fishing includes 5 half-pound crappies, a 3/4-pound walleye, 4 one-pound smallmouth bass and one 16-pound muskie, it would be correct but misleading to say that the average size of the fish you caught was 2.1 pounds. The distribution of weights includes one distant outlier, the muskie, that greatly distorts the mean, but all of the other fish you caught weighed one pound or less. We might say in this case that the mean is sensitive to outliers.

The median is an alternative measure of central tendency that is not sensitive to outliers. It is simply the value for which half the observations are greater and half are smaller. From your fishing catch, the 0.75-pound walleye represents the median value, since 5 fish (the crappies) were smaller and 5 fish (the smallies and the muskie) were larger. The median may also be thought of as the middle value in a sorted list of values, although there is really only a distinct middle value when you have an odd number of observations. In the event that you’ve got an even number of observations, the median is halfway between the two middle observations.
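The fishing example is small enough to check directly. The sketch below uses Python's standard `statistics` module. The list of weights is read off the creel described above, assuming that the four "smallies" weighed one pound each, which is what the stated 2.1-pound mean implies. Variable names are hypothetical.

```python
import statistics

# Creel weights in pounds: 5 crappies, 1 walleye, 4 smallmouth bass, 1 muskie
creel_lb = [0.5] * 5 + [0.75] + [1.0] * 4 + [16.0]

mean_weight = statistics.fmean(creel_lb)
median_weight = statistics.median(creel_lb)

# Drop the outlier to see how strongly it pulls on each measure.
without_muskie = creel_lb[:-1]
mean_without = statistics.fmean(without_muskie)
median_without = statistics.median(without_muskie)

print(f"all fish:       mean {mean_weight:.2f} lb, median {median_weight:.2f} lb")
print(f"without muskie: mean {mean_without:.3f} lb, median {median_without:.3f} lb")
```

Removing the muskie leaves ten fish, an even number, so the median becomes the value halfway between the two middle observations (0.5 and 0.75 pounds).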

### 6.3.2 Mode

The mode is the value or range of values that occurs most frequently in a data set. Since you caught 5 half-pound fish and fewer of every other weight value in the dataset, the mode of this distribution is 0.5 pounds. Now if the weights we’ve reported above are actually rounded from true measured weights that differ slightly, this definition becomes less satisfactory. For example, suppose the half-pound crappies actually weighed 0.46, 0.49, 0.5, 0.55 and 0.61 pounds. None of these are actually the same value, so can we say that this is still a mode? Indeed we can if we choose to *bin* these data. We might say that our fish weights fall into bins that range from 0.375 to 0.625, 0.625 to 0.875, 0.875 to 1.125, and so on. In this case, since all of our crappies fall in the range 0.375 to 0.625 (which is 0.5 ± 1/8 lbs), this size range remains the mode of the data set. We can see this visually in a histogram, which is just a bar chart showing how often measurements fall within each bin in a range (Figure 6.2).
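Binning is easy to sketch in code. The example below assigns each weight to a quarter-pound bin centered on a multiple of 0.25 lb, matching the bin edges given above, and then reports the most populated bin. The crappie weights are the ones quoted in the text. The other fish are assumed to weigh exactly their nominal values, and the helper name is hypothetical.

```python
import math
from collections import Counter

BIN_WIDTH_LB = 0.25


def bin_center(weight_lb):
    """Center of the quarter-pound bin (0.375-0.625, 0.625-0.875, ...) holding this weight."""
    return math.floor(weight_lb / BIN_WIDTH_LB + 0.5) * BIN_WIDTH_LB


weights_lb = [0.46, 0.49, 0.5, 0.55, 0.61,   # crappies
              0.75,                          # walleye
              1.0, 1.0, 1.0, 1.0,            # smallmouth bass
              16.0]                          # muskie

bin_counts = Counter(bin_center(w) for w in weights_lb)
modal_center, modal_count = bin_counts.most_common(1)[0]
modal_bin = (modal_center - BIN_WIDTH_LB / 2, modal_center + BIN_WIDTH_LB / 2)

print(f"modal bin: {modal_bin[0]} to {modal_bin[1]} lb, {modal_count} fish")
```

The counts in `bin_counts` are exactly the bar heights a histogram of these data would show.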

## 6.4 Spread

As mentioned previously, one way to quantify dispersion of a data set is to find the difference between any given observation and the expected value or sample mean. If we write this:

\( x_{i} - \bar{x} \), (6.5)

we can call each such difference a **residual**. A residual could be used to describe the relationship between individual data points and the sample mean, but doesn’t by itself characterize the spread of the entire data set. But what if we add together all of these residuals and divide by the number of data points? Well, this should just give us zero, according to the definition of the mean! But suppose instead that we *squared* the residuals before adding them together. The formula would look like:

\( \frac{1}{n} \sum_{i=1}^{n} \left(x_{i}-\bar{x}\right)^{2} \) (6.6)

This expression is defined as the **variance** and is strangely denoted by \(\sigma^{2}\), but you’ll see why in a minute. Squaring the residuals made most of them larger and made negative residuals positive. It also accentuated those outlier data points that were farther from the mean. Now if we take the square root of the variance, we’re left with a positive value that represents well how far data typically are from the mean: the **standard deviation** of the sample, or \(\sigma\)(^{4}). The formal definition of standard deviation looks like this:

\( \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_{i}-\bar{x}\right)^{2}} \) (6.7)

The standard deviation gives us a good sense for how far from the mean a typical measurement lies. We can now characterize a sample as having a mean value of \(\bar{x}\) and standard deviation of \(\sigma\), or say that typical values are \(\bar{x} \pm \sigma\). But in reality, if the data are approximately normally distributed, the bounds set by \(\bar{x} - \sigma\) and \(\bar{x} + \sigma\) contain only about 68% of the data points. If we want to include more of the data, we could use two standard deviations above and below the mean, in which case we’ve bounded more than 95% of the data.
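The chain from residuals to variance to standard deviation can be followed in a few lines of code. This sketch reuses the pebble-count sequence from Section 6.1 and uses the \(1/n\) form of the variance written above. Many software packages divide by \(n - 1\) instead when working with samples, so their results may differ slightly. Variable names are hypothetical.

```python
import math

data = [12, 2, 5, 26, 4, 28, 19, 29, 3, 15, 31, 19, 24, 27, 7,
        22, 28, 33, 21, 28, 13, 15, 25, 10, 14, 13, 16, 18, 33, 5]

n = len(data)
mean = sum(data) / n

# Residuals: signed distance of each observation from the mean.
residuals = [x - mean for x in data]

# Averaging the raw residuals gives (essentially) zero, so square them first.
variance = sum(r ** 2 for r in residuals) / n
std_dev = math.sqrt(variance)

# Fraction of observations that fall within one standard deviation of the mean.
within_one_sd = sum(1 for x in data if abs(x - mean) <= std_dev) / n

print(f"sum of residuals: {sum(residuals):.2e}")
print(f"variance: {variance:.2f}, standard deviation: {std_dev:.2f}")
print(f"fraction within one standard deviation: {within_one_sd:.2f}")
```

For a short sequence like this one, the fraction within one standard deviation need not come out at 68%. That figure applies to data that follow a normal distribution.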

## 6.5 Error & Uncertainty

One piece of information we have thus far omitted from our list of properties that fully define a quantity’s value is uncertainty. This is particularly important when we are quantifying something that has been measured directly or derived from measurements. Thus, to even more completely define the value of a *measured quantity*, we should include some estimate of the uncertainty associated with the number assigned to it. This will often look like:

\( x = x_{best} \pm \delta x \), (6.8)

where \(x\) is the thing we are trying to quantify, \(x_{best}\) is our best guess of its value, and \(\delta x\) is our estimate of the uncertainty. Though it will depend on the quantity in question, our best estimate will often be the result of a single measurement or – better yet – the mean of a number of repeated measurements.


The preferred value \(x_{best}\) for a quantity of interest will often be the **mean** of repeated measurements of that quantity.

### 6.5.1 Uncertainty in measured quantities

All measurements are subject to some degree of uncertainty, arising from the limited resolution of the instrument or scale used to make the measurement, or from random or systematic errors resulting from the method or circumstances of measurement. Let’s consider an example:

Suppose two fisheries biologists each measured the lengths of ten of the brook trout captured during the electrofishing traverse from Problem 3.7. Both used boards with identical scales printed on them, graduated to half of a centimeter. They then planned to put their measurements together to get a data set of 20 fish. One of them was trained to pinch together the tail fins to make this measurement, while the other was not. In addition, because they wished not to harm the fish, they made their measurements quickly, even if the fish flopped and wiggled during the measurement. What are the potential sources of error and how big are they relative to one another?

For starters, implicit in the graduations on this board is that the user cannot confidently read any better than half-centimeters off the scale. He or she can visually *interpolate* between two adjacent graduations to improve precision (see below), but this step is inherently subjective and limits the certainty of the measurement. We might call this **instrumental error** because its magnitude is set by the instrument or device used to make the measurement. One way to reduce this source of error is to use a more finely graduated scale.

Instrumental error

Instrumental error is fixed by the resolution of the device used to make a measurement, and can usually only be reduced by using a more precise instrument.

A second source of error arises from the hasty measurements and the fact that the fish were not necessarily cooperative. Perhaps the mouth was sometimes not pressed up all the way against the stop, or the fish wasn’t well aligned with the scale. Some lengths may have been too large or small as a result, yielding a source of error that was essentially random. Indeed, we can call this **random error **since its sign and magnitude are largely unrelated from one measurement to the next. Reducing this source of error in this case would require either more careful and deliberate effort at aligning and immobilizing the fish, or making multiple measurements of the same fish. Both of these solutions could endanger the fish and may therefore not be desirable.

Random error

Random measurement errors may be mitigated by repeating measurements.

A third source of error is associated with the difference in the way the two scientists dealt with the tail fin. Length measurements made with the fins pinched together will usually be longer than those without. Had they measured the same group of ten fish, one set of measurements would have yielded lengths consistently smaller than the other. This is a **systematic error**, and can often be troublesome and difficult to detect. This highlights the need for a procedural statement that establishes clear guidelines for measurements wherever such sources of systematic error can arise.

Systematic error

Systematic errors result in data that deviate systematically from the true values. These errors may often be more difficult to detect and correct, and data collection efforts should take great pains to eliminate any sources of systematic error.

Each of these types of error can affect the results of the measurements, and should be quantified and included in the description of the best estimate of fish length. But errors can affect the best estimate in different ways. Instrumental error, as described above, can itself either be random or systematic. The printed scale on one of the fish measurement boards could be stretched by a factor of 3% compared to the other, resulting in a systematic error. Likewise one board might be made from plastic that is more slippery than the other and thus more difficult to align the fish on. This could result in additional random error associated with that device. But what are the relationships between these types of errors and the best estimate that we are seeking?


**Error or variation? Questions to ask yourself**

1. What were possible sources of error in your measurements? Are they random or systematic?

2. How can you tell the difference between error in measurement and natural variability?
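One way to build intuition for the first question is a small simulation of the two biologists. All of the numbers here are invented for illustration: a single true fish length, a normally distributed random error from hurried measurements, and a fixed offset for the biologist who does not pinch the tail fin. Names are hypothetical, and the large number of repeat measurements is only there to make the pattern obvious.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_LENGTH_MM = 150.0   # hypothetical true length of one fish
RANDOM_SD_MM = 2.0       # assumed scatter from a wiggling fish
PINCH_OFFSET_MM = -3.0   # assumed shortfall when the tail fin is not pinched
N_MEASUREMENTS = 1000    # far more repeats than is realistic, for clarity

# Biologist A pinches the tail: random error only.
measured_a = [TRUE_LENGTH_MM + random.gauss(0.0, RANDOM_SD_MM)
              for _ in range(N_MEASUREMENTS)]

# Biologist B does not pinch the tail: the same random error plus a constant offset.
measured_b = [TRUE_LENGTH_MM + PINCH_OFFSET_MM + random.gauss(0.0, RANDOM_SD_MM)
              for _ in range(N_MEASUREMENTS)]

mean_a = statistics.fmean(measured_a)
mean_b = statistics.fmean(measured_b)

print(f"A: mean {mean_a:.2f} mm, spread {statistics.pstdev(measured_a):.2f} mm")
print(f"B: mean {mean_b:.2f} mm, spread {statistics.pstdev(measured_b):.2f} mm")
```

Averaging many measurements pulls biologist A's mean toward the true length, because the random errors cancel. Biologist B's mean settles just as firmly, but about 3 mm short: repetition does nothing to remove a systematic error, and the spread alone gives no hint that it is there.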

### 6.5.2 Real variability

Not all deviations from the mean are errors. For real quantities in nature, there is no good reason to assume that, for example, all age-0 brook trout will be the same length. Indeed we expect that there are real variations among fish of a single age cohort due to differences in genetics, feeding patterns, and other real factors. If we’re measuring a group of age-0 fish to get a handle on how those fish vary in size, then at least some of the variation in our data reflects real variation in the length of those fish. How do we tease out the variation that is due to errors from the variation that is due to real variability?

Often a good approach is to try to independently estimate the magnitude of the measurement errors. If those measurement errors are about the same magnitude as the variations (residuals) within the data, then it may not be possible to identify real variability. However, in the more likely event that our measurements are reasonably accurate and have small measurement errors compared to their spread about the mean, then the indicated variations probably reflect true variability.

This observation returns us to our earlier question: when we seek to characterize some quantity, how should we identify our best estimate and our degree of uncertainty in that estimate? If we wish to characterize a single quantity and our certainty that our best estimate is close to or equal to the true value, we should use the mean of repeated measurements of this value and the standard error of those measurements. The standard error can be readily estimated by dividing the standard deviation of the repeated measures by the square root of the number of measurements \(n\):

\( \mathrm{SE} = \frac{\sigma}{\sqrt{n}} \) (6.9)

This should be equivalent to the standard deviation of a number of estimates of the mean \(\bar{x}\), if several samples were taken from the full population of measurements. As with the standard deviation, we can be about 68% confident that the range \(x_{best} - \mathrm{SE}\) to \(x_{best} + \mathrm{SE}\) includes the true value we wish to characterize, but if we use 1.96 SE instead, we can have 95% confidence(^{5}). A complete statement, then, of our best estimate with 95% certainty in this context is to say:

\( x = x_{best} \pm 1.96\, \mathrm{SE} \). (6.10)

If instead we desire a characterization of a typical value and range for something that has real variability among individuals in a population, we will usually describe it with the mean and standard deviation:

\( x = x_{best} \pm 1.96\, \sigma \). (6.11)


(^{5})Note that we are currently assuming that our measurements are normally distributed.
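The two ways of reporting a best estimate can be compared side by side. The sketch below uses a made-up set of eight repeated length measurements, computes the standard deviation with the \(1/n\) form of Equation 6.7, and then forms the 1.96 SE and 1.96 \(\sigma\) ranges. Variable names and data are hypothetical.

```python
import math
import statistics

# Hypothetical repeated measurements of one quantity, in millimeters
repeated_mm = [152.0, 151.5, 152.5, 151.0, 153.0, 152.0, 151.5, 152.5]

n = len(repeated_mm)
x_best = statistics.fmean(repeated_mm)
sigma = statistics.pstdev(repeated_mm)   # 1/n form, as in Equation 6.7
standard_error = sigma / math.sqrt(n)    # Equation 6.9

# Uncertainty in the mean itself (Equation 6.10)
ci_low = x_best - 1.96 * standard_error
ci_high = x_best + 1.96 * standard_error

# Range of typical individual values (Equation 6.11)
typical_low = x_best - 1.96 * sigma
typical_high = x_best + 1.96 * sigma

print(f"best estimate: {x_best:.2f} mm, SE: {standard_error:.3f} mm")
print(f"95% range for the mean:     {ci_low:.2f} to {ci_high:.2f} mm")
print(f"95% range for single values: {typical_low:.2f} to {typical_high:.2f} mm")
```

The range built from the standard error is narrower by a factor of \(\sqrt{n}\), which is why the choice between the two depends on whether the spread reflects measurement error or real variability.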

## 6.6 Distributions

The kind of data we’ve been talking about thus far is univariate: a single quantity with variable values like the diameter of a streambed particle, or the length of a fish. As we know, not all age-0 brook trout are the same size. In a first-pass capture of 50 fish, for example, we should expect some variability in length that might reflect age, genetics, social structure, or any other factor that might influence development. The variation may be visualized graphically in a number of ways. We’ll start with a histogram.

A histogram shows the distribution of a set of *discrete* measurements – that is, the range of values and the number of data points falling into each of a number of bins, which are just ranges of values (112.5 to 117.5 is one bin, 117.5 to 122.5 another, and so on). This can be called a frequency distribution, and a histogram is one of the best ways to visualize a frequency distribution (Figure 6.3).

But what if we had uniformly distributed data? A uniform distribution means that it is equally likely that we’ll find an individual with a length on the low end (97.5-102.5 mm) of the range as any other. That would look quite different – there would be no hump in the middle of the histogram, but rather a similar number of measurements of each possible length. The uniform distribution is great: in fact, we count on uniformity sometimes. If you are at the casino and rolling the dice, you probably assume (unless you’re dishonest) that there is an equal probability that you’ll roll a 6 as there is that you’ll roll a 1 on any given die. We can call that a uniform probability distribution for a single roll of a die. But what if the game you are playing counts the sum of the numbers on 5 dice? Is there still a uniform probability of getting any total value from 5 to 30?

We could actually simulate that pretty easily by randomly choosing (with a computer program like R(^{6}) or Excel) five integers between 1 and 6 and adding them together. Figure 6.4 shows the plot that comes out. Looks sorta like a bell curve, right? Well, how likely is it that you’ll get five 1’s or five 6’s? Not very, right? You’re no more likely to get one each of 1, 2, 3, 4 and 5 either, right? However, there are multiple ways to get a 1, 2, 3, 4 and 5 with different dice showing each of the possible numbers, whereas there is only one way to get all sixes and one way to get all ones. So there are better chances that you’ll get a *random* assortment of numbers, some higher and some lower, and their sum will tend toward a central value, the mean of the possible values. So, since your collection of rolls of the dice represents a random sample from a uniform distribution, the sum of several rolls will be approximately normally distributed.
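The dice experiment takes only a few lines in most languages. Here is a minimal sketch in Python rather than R or Excel. The number of trials and the crude text histogram are arbitrary choices for illustration, and the names are hypothetical.

```python
import random
from collections import Counter

random.seed(6)  # fixed seed so the sketch is repeatable

N_TRIALS = 20000
N_DICE = 5

# Each trial: roll five fair dice and add them up.
totals = [sum(random.randint(1, 6) for _ in range(N_DICE))
          for _ in range(N_TRIALS)]

counts = Counter(totals)
mean_total = sum(totals) / N_TRIALS

# Crude text histogram: one '#' per 50 occurrences of each possible total.
for total in range(N_DICE, 6 * N_DICE + 1):
    print(f"{total:2d} {'#' * (counts[total] // 50)}")

print(f"mean total: {mean_total:.2f}")
```

Each individual die is uniform, yet the totals pile up around the middle of the 5 to 30 range, near 17.5, and thin out toward the extremes.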


(^{6}) R is a top choice software for general purpose data analysis and modeling. It is free software, works on most computer platforms, and has nearly infinite capabilities due to the user-contributed package repository. Learn more about R at https://cran.r-project.org/

What’s it got to do with fish? If we sample brook trout randomly from one stream reach and measure their lengths, we might expect them to be normally distributed. Describing such a normal distribution with quantities like the mean and standard deviation gives us the power to compare different populations, or to decide whether some individuals are outliers. The nuts and bolts of those comparisons depend on the type of distribution represented by the population. An ideal normal distribution is defined by this equation:

\( f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left[-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right] \) (6.12)

and its graph, in the context of our original hypothetical distribution of fish lengths, looks like the red line in Figure 6.5. In order to compare the continuous and discrete distributions, we’ve divided the counts in each bin by the total number in the sample (50), to yield a *density* distribution. The blue line is just a smoothed interpolation of the top centers of each bar in the discrete distribution, so it generally reflects the density of data within each bin. As you can see, the discrete distribution density and the continuous normal distribution functions are similar, but there are some bumps in the discrete distribution that don’t quite match the continuous curve. As you can imagine though, that difference would become less pronounced as your dataset grows larger. Related to this, then, is the idea that your *confidence* in the central tendency and spread derived from your dataset should get better with more data.
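For readers who want to draw their own version of the continuous curve, the normal density \( f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right] \) is straightforward to code. In the sketch below the mean, standard deviation, and bin layout are invented values, not the ones behind Figure 6.5, and the function name is hypothetical. Multiplying the density by the bin width to get an expected fraction per bin is an assumption added here so that the curve can be compared with binned data.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Hypothetical fish-length distribution and 5 mm bins centered on multiples of 5
MU_MM = 115.0
SIGMA_MM = 8.0
BIN_WIDTH_MM = 5.0
bin_centers = [60.0 + BIN_WIDTH_MM * k for k in range(23)]   # 60, 65, ..., 170 mm

# Expected fraction of the sample in each bin: density at the center times bin width.
expected_fraction = [normal_pdf(c, MU_MM, SIGMA_MM) * BIN_WIDTH_MM
                     for c in bin_centers]
total_fraction = sum(expected_fraction)

for center, fraction in zip(bin_centers, expected_fraction):
    if fraction > 0.01:
        print(f"{center:5.1f} mm  expected fraction {fraction:.3f}")
print(f"fractions sum to {total_fraction:.4f}")
```

Multiplying each expected fraction by the sample size (50 in the example above) would give the expected count per bin, which is the number to set beside the bar heights of a histogram.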

Exercise 1)

1. Download the data from Derek Ogle’s InchLake2 dataset from the fishR data website. Using either a spreadsheet or data analysis package, isolate the bluegill from the dataset and identify the following:

(a) Mean bluegill length.

(b) Standard deviation of bluegill length.

(c) Mean bluegill weight.

(d) Standard deviation of bluegill weight.

Exercise 2)

The graph and data table below and right show measurements of brook trout lengths from pass #1 of the electrofishing campaign described in Problem 3.7. Use these resources to answer the following questions:

(a) Judging from the histogram in Figure 2, does the dataset contain just one mode or more than one? What might be the reason for this?

(b) What are the mean and standard deviation for the (presumed) age-0 portion of this sample?

## Key Ideas - Chapter 6: Reasoning about data

This article by J. Michael Shaughnessy and Maxine Pfannkuch, subtitled "Statistical Thinking: A Story of Variation and Prediction", considers work carried out by students using real data.

### Crime Scene Evidence

This task, produced by the Royal Statistical Society with Plymouth University, uses a problem-solving approach.

The resource enables teachers to lead pupils through a crime investigation to help solve the problem of whodunit? A theft has occurred and the only clue to identify the culprit is a footprint.

Pupils investigate how helpful the footprint may be in identifying the thief. They use averages, histograms and scatter diagrams to explore the likelihood of various suspects being the culprit.

### Chapter 3: Using Random Samples of Real Data

This chapter in the booklet Relevant and Engaging Statistics and Data Handling from the Royal Statistical Society Centre for Statistical Education (RSSCSE) describes the steps needed to both take and use random samples of real data from the CensusAtSchool website. In addition it offers some ideas to allow students to use samples of real data in their data handling and statistics lessons.

### Chapter 6: Data Visualisation

This chapter in the booklet Relevant and Engaging Statistics and Data Handling from the Royal Statistical Society Centre for Statistical Education (RSSCSE) looks at ways to visualise data. In particular how data can be displayed in tables and charts having been retrieved from an online database, particularly the AtSchool database, using a Database Interrogation Tool.

Examples of common visualisations include: tables, matrices, charts, graphs, maps, Venn diagrams, and Chernoff faces.

### Data With No Name

In this resource from CensusAtSchool, a set of data is presented with little background information. Students are invited, via a series of questions, to turn the data into usable, useful information applying both mathematical reasoning and use of statistical methods.

It encourages the use of spreadsheets as a means to further enhance the quality of the work and provides scope for further investigation. Students will be engaged in using frequency tables, grouped data, mean, median, mode and range, and in comparing distributions.

### Towards The Construction of Meaning for Trend in Active Graphing

This link is to a PDF of a paper by Ainley, Nardi and Pratt on the Institute of Education website.

Page 12 describes two tasks that were used in the research, ostensibly to engage students in the use of scatterplots. However, the students also need to consider the signal and noise in the data that emerge during the active graphing process, particularly in the helicopters task.

As students carry out the tasks, the bivariate data are plotted using a spreadsheet. For example, in the case of the helicopter task, the students might be aiming to find the ‘best’ helicopter: the one with the longest time of flight. They might consider the time of flight when helicopters of differing wing lengths are dropped.

The data are likely to be quite noisy given the need to measure lengths and durations of time. Nevertheless, the plot of wing length against time should gradually reveal a wing length that seems to offer the maximum time of flight. The signal that emerges through the noise is likely to be a humped shape with large or small wing lengths resulting in helicopters that drop very quickly.

### Visualisation Inference Tools

This is a link to Chris Wild’s personal website in which he reports on the latest developments of his visual inference tools.

This is not a tool to use directly with younger students, but it provides the reader with an interesting insight into how modern tools are beginning to make sophisticated ideas about statistical inference easier to visualise. This is work in progress but well worth monitoring.

The website links to seminars, webinars and movies describing the tools. It is also possible to download and install the software for the tools themselves. The emphasis is very much on visualisation. By sampling and re-sampling many times, and keeping a graphical trace of the parameters of interest, it becomes possible to imagine sampling distributions as animations.

## Groundworks Reasoning with Data and Probability

Just like the other titles in the Groundworks series, Reasoning with Data and Probability centers on the big ideas of data organization and analysis and probability using interesting and challenging problems. The five big ideas in Reasoning with Data and Probability are:

- Interpret Displays of Data
- Organize Data
- Describe Data
- Ways to Count
- Probability

The text contains 12 different sets of problems. A set refers to a specific type of mathematical reasoning problem. Each set contains six different problems for ample practice reinforcement. With 12 different sets of six problems, each text totals 72 different mathematical reasoning problems.

Each problem set consists of eight pages and begins with a teaching information page. This page contains several features to help guide the teacher through the activity, either as a class or individually. Following the teacher page are six pages of student problems, each containing one problem, all addressing the same concept. The final page in the problem set is a solutions page.

## Quantitative Aptitude - Data Interpretation Questions

Data interpretation is the process of analysing data, inspecting its elements, and interpreting them to extract as much information as possible from a given data set. The data are given in the form of charts, tables and graphs. Data interpretation has no particular syllabus; this section tests one's ability to analyse data, make decisions, and work quickly. The questions look simple, but the calculations are time consuming. To solve data interpretation problems efficiently, analyse the given data and focus on the aspects needed to answer the questions. Before attempting a data interpretation section, one should be very comfortable with numbers, calculations, percentages, fractions, averages and ratios in order to calculate quickly.

We come across data interpretation questions in many competitive exams and entrance tests, such as bank exams (SBI PO), MBA entrance exams (CAT, MAT), HPAS, APPSC Group 1, HR executives, UPSC CPF (AC), IBPS, UP Police constable exams, TNPSC VAO, WBSC, PPSC, HAL Results, NDA, Lok Sabha Secretariat and Rajya Sabha Secretariat exams, and more.

Thorough practice of different papers on data interpretation allows you to solve different kinds of data interpretation and can help improve your logic in solving problems.

We have a large database of questions on Quantitative Aptitude (Data Interpretation) for you to practice and score high.

## Applying Practice Evidence

Research continues to find that using evidence-based guidelines in practice, informed through research evidence, improves patients’ outcomes. 81 Research-based guidelines are intended to provide guidance for specific areas of health care delivery. 84 The clinician, both the novice and the expert, is expected to use the best available evidence for the most efficacious therapies and interventions in particular instances, to ensure the highest-quality care, especially when deviations from the evidence-based norm may heighten risks to patient safety. Otherwise, if nursing and medicine were exact sciences, or consisted only of techne, then a 1:1 relationship could be established between results of aggregated evidence-based research and the best path for all patients.

### Evaluating Evidence

Before research should be used in practice, it must be evaluated. There are many complexities and nuances in evaluating the research evidence for clinical practice. Evaluation of research behind evidence-based medicine requires critical thinking and good clinical judgment. Sometimes the research findings are mixed or even conflicting. As such, the validity, reliability, and generalizability of available research are fundamental to evaluating whether evidence can be applied in practice. To do so, clinicians must select the best scientific evidence relevant to particular patients, a complex process that involves intuition to apply the evidence. Critical thinking is required for evaluating the best available scientific evidence for the treatment and care of a particular patient.

Good clinical judgment is required to select the most relevant research evidence. The best clinical judgment, that is, reasoning across time about the particular patient through changes in the patient’s concerns and condition and/or the clinician’s understanding, is also required. This type of judgment requires clinicians to make careful observations and evaluations of the patient over time, as well as know the patient’s concerns and social circumstances. To evolve to this level of judgment, additional education beyond clinical preparation is often required.

### Sources of Evidence

Evidence that can be used in clinical practice has different sources and can be derived from research, patient’s preferences, and work-related experience. 85 , 86 Nurses have been found to obtain evidence from experienced colleagues believed to have clinical expertise and research-based knowledge 87 as well as other sources.

For many years now, randomized controlled trials (RCTs) have often been considered the best standard for evaluating clinical practice. Yet, unless the common threats to the validity (e.g., representativeness of the study population) and reliability (e.g., consistency in interventions and responses of study participants) of RCTs are addressed, the meaningfulness and generalizability of the study outcomes are very limited. Relevant patient populations may be excluded, such as women, children, minorities, the elderly, and patients with multiple chronic illnesses. The dropout rate of the trial may confound the results. And it is easier to get positive results published than it is to get negative results published. Thus, RCTs are generalizable (i.e., applicable) only to the population studied, which may not reflect the needs of the patient under the clinician’s care. In instances such as these, clinicians need to also consider applied research using prospective or retrospective populations with case control to guide decisionmaking, yet this too requires critical thinking and good clinical judgment.

Another source of available evidence may come from the gold standard of aggregated systematic evaluation of clinical trial outcomes for the therapy and clinical condition in question, be generated by basic and clinical science relevant to the patient’s particular pathophysiology or care need situation, or stem from personal clinical experience. The clinician then takes all of the available evidence and considers the particular patient’s known clinical responses to past therapies, their clinical condition and history, the progression or stages of the patient’s illness and recovery, and available resources.

In clinical practice, the particular is examined in relation to the established generalizations of science. With readily available summaries of scientific evidence (e.g., systematic reviews and practice guidelines) available to nurses and physicians, one might wonder whether deep background understanding is still advantageous. Might it not be expendable, since it is likely to be out of date given the current scientific evidence? But this assumption is a false opposition and false choice because without a deep background understanding, the clinician does not know how to best find and evaluate scientific evidence for the particular case in hand. The clinician’s sense of salience in any given situation depends on past clinical experience and current scientific evidence.

### Evidence-Based Practice

The concept of evidence-based practice is dependent upon synthesizing evidence from the variety of sources and applying it appropriately to the care needs of populations and individuals. This implies that evidence-based practice, indicative of expertise in practice, appropriately applies evidence to the specific situations and unique needs of patients. 88 , 89 Unfortunately, even though providing evidence-based care is an essential component of health care quality, it is well known that evidence-based practices are not used consistently.

Conceptually, evidence used in practice advances clinical knowledge, and that knowledge supports independent clinical decisions in the best interest of the patient. 90 , 91 Decisions must prudently consider the factors not necessarily addressed in the guideline, such as the patient’s lifestyle, drug sensitivities and allergies, and comorbidities. Nurses who want to improve the quality and safety of care can do so through improving the consistency of data and information interpretation inherent in evidence-based practice.

Initially, before evidence-based practice can begin, there needs to be an accurate clinical judgment of patient responses and needs. In the course of providing care, with careful consideration of patient safety and quality care, clinicians must give attention to the patient’s condition, their responses to health care interventions, and potential adverse reactions or events that could harm the patient. Nonetheless, there is wide variation in the ability of nurses to accurately interpret patient responses 92 and their risks. 93 Even though variance in interpretation is expected, nurses are obligated to continually improve their skills to ensure that patients receive quality care safely. 94 Patients are vulnerable to the actions and experience of their clinicians, which are inextricably linked to the quality of care patients have access to and subsequently receive.

The judgment of the patient’s condition determines subsequent interventions and patient outcomes. Attaining accurate and consistent interpretations of patient data and information is difficult because each piece can have different meanings, and interpretations are influenced by previous experiences. 95 Nurses use knowledge from clinical experience 96 , 97 and, although infrequently, research. 98

Once a problem has been identified, using a process that utilizes critical thinking to recognize the problem, the clinician then searches for and evaluates the research evidence 101 and evaluates potential discrepancies. The process of using evidence in practice involves “a problem-solving approach that incorporates the best available scientific evidence, clinicians’ expertise, and patient’s preferences and values” 102 (p. 28). Yet many nurses do not perceive that they have the education, tools, or resources to use evidence appropriately in practice. 103

Reported barriers to using research in practice have included difficulty in understanding the applicability and the complexity of research findings, failure of researchers to put findings into the clinical context, lack of skills in how to use research in practice, 104 , 105 amount of time required to access information and determine practice implications, 105 lack of organizational support to make changes and/or use in practice, 104 , 97 , 105 , 107 and lack of confidence in one’s ability to critically evaluate clinical evidence. 108

### When Evidence Is Missing

In many clinical situations, there may be no clear guidelines and few or even no relevant clinical trials to guide decisionmaking. In these cases, the latest basic science about cellular and genomic functioning may be the most relevant science, or by default, guesstimation. Consequently, good patient care requires more than a straightforward, unequivocal application of scientific evidence. The clinician must be able to draw on a good understanding of basic sciences, as well as guidelines derived from aggregated data and information from research investigations.

Practical knowledge is shaped by one’s practice discipline and the science and technology relevant to the situation at hand. But scientific, formal, discipline-specific knowledge is not sufficient for good clinical practice, whether the discipline be law, medicine, nursing, teaching, or social work. Practitioners still have to learn how to discern generalizable scientific knowledge, know how to use scientific knowledge in practical situations, discern what scientific evidence/knowledge is relevant, assess how the particular patient’s situation differs from the general scientific understanding, and recognize the complexity of care delivery, a process that is complex, ongoing, and changing, as new evidence can overturn old.

Practice communities, like individual practitioners, may also be mistaken, as is illustrated by variability in practice styles and practice outcomes across hospitals and regions in the United States. This variability in practice is why practitioners must learn to critically evaluate their practice and continually improve their practice over time. The goal is to create a living self-improving tradition.

Within health care, students, scientists, and practitioners are challenged to learn and use different modes of thinking when they are conflated under one term or rubric, using the best-suited thinking strategies for taking into consideration the purposes and the ends of the reasoning. Learning to be an effective, safe nurse or physician requires not only technical expertise, but also the ability to form helping relationships and engage in practical ethical and clinical reasoning. 50 Good ethical comportment requires that both the clinician and the scientist take into account the notions of good inherent in clinical and scientific practices. The notions of good clinical practice must include the relevant significance and the human concerns involved in decisionmaking in particular situations, centered on clinical grasp and clinical forethought.

### The Three Apprenticeships of Professional Education

We have much to learn in comparing the pedagogies of formation across the professions, such as is being done currently by the Carnegie Foundation for the Advancement of Teaching. The Carnegie Foundation’s broad research program on the educational preparation of the profession focuses on three essential apprenticeships:

To capture the full range of crucial dimensions in professional education, we developed the idea of a three-fold apprenticeship: (1) intellectual training to learn the academic knowledge base and the capacity to think in ways important to the profession; (2) a skill-based apprenticeship of practice; and (3) an apprenticeship to the ethical standards, social roles, and responsibilities of the profession, through which the novice is introduced to the meaning of an integrated practice of all dimensions of the profession, grounded in the profession’s fundamental purposes. 109

This framework has allowed the investigators to describe tensions and shortfalls as well as strengths of widespread teaching practices, especially at articulation points among these dimensions of professional training.

Research has demonstrated that these three apprenticeships are taught best when they are integrated so that the intellectual training includes skilled know-how, clinical judgment, and ethical comportment. In the study of nursing, exemplary classroom and clinical teachers were found who do integrate the three apprenticeships in all of their teaching, as exemplified by the following anonymous student’s comments:

With that as well, I enjoyed the class just because I do have clinical experience in my background and I enjoyed it because it took those practical applications and the knowledge from pathophysiology and pharmacology, and all the other classes, and it tied it into the actual aspects of like what is going to happen at work. For example, I work in the emergency room and question: Why am I doing this procedure for this particular patient? Beforehand, when I was just a tech and I wasn’t going to school, I’d be doing it because I was told to be doing it, or I’d be doing CPR because, you know, the doc said, start CPR. I really enjoy the Care and Illness because now I know the process, the pathophysiological process of why I’m doing it and the clinical reasons of why they’re making the decisions, and the prioritization that goes on behind it. I think that’s the biggest point. Clinical experience is good, but not everybody has it. Yet when these students transition from school and clinicals to their job as a nurse, they will understand what’s going on and why.

The three apprenticeships are equally relevant and intertwined. In the Carnegie *National Study of Nursing Education* and the companion study on medical education as well as in cross-professional comparisons, teaching that gives an integrated access to professional practice is being examined. Once the three apprenticeships are separated, it is difficult to reintegrate them. The investigators are encouraged by teaching strategies that integrate the latest scientific knowledge and relevant clinical evidence with clinical reasoning about particular patients in unfolding rather than static cases, while keeping the patient and family experience and concerns relevant to clinical concerns and reasoning.

Clinical judgment or phronesis is required to evaluate and integrate techne and scientific evidence.

Within nursing, professional practice is wise and effective usually to the extent that the professional creates relational and communication contexts where clients/patients can be open and trusting. Effectiveness depends upon mutual influence between patient and practitioner, student and learner. This is another way in which clinical knowledge is dialogical and socially distributed. The following articulation of practical reasoning in nursing illustrates the social, dialogical nature of clinical reasoning and addresses the centrality of perception and understanding to good clinical reasoning, judgment and intervention.

## Data Analysis and Displays STEAM Video/Performance Task

**STEAM Video**

Fuel Economy

The fuel economy of a vehicle is a measure of the efficiency of the vehicle’s engine. What are the benefits of using a car with high fuel economy?

Watch the STEAM Video “Fuel Economy.” Then answer the following questions.

1. Tory says that the footprint of a vehicle is the area of the rectangle formed by the wheel base and the track width. What is the footprint of a car with a wheel base of 106 inches and a track width of 61 inches?

2. The graph shows the relationship between the fuel economy and the footprint for four vehicles.

a. What happens to the fuel economy as the footprint increases?

b. Plot the point (50, 40) on the graph. What does this point represent? Does the point fit in with the other points? Explain.

Answer:

1.The footprint of a car = 6,466 sq inches.

Explanation:

In the above-given question,

Tory says that the footprint of a vehicle is the area of the rectangle formed by the wheelbase and the track width.

area of rectangle = length x width

Given that the wheel base of the car = 106 inches

and the track width = 61 inches.

area = 106 x 61

footprint = 6,466 sq inches.
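The arithmetic above can be checked with a short sketch. The function name `footprint_sq_in` is a hypothetical helper introduced only for this illustration; the conversion to square feet is an extra step, included because vehicle footprints are often quoted in square feet.

```python
def footprint_sq_in(wheel_base_in: float, track_width_in: float) -> float:
    """Area of the rectangle formed by the wheel base and the track width."""
    return wheel_base_in * track_width_in


area_sq_in = footprint_sq_in(106, 61)
area_sq_ft = area_sq_in / 144  # 12 in x 12 in = 144 square inches per square foot

print(f"{area_sq_in} sq in is about {area_sq_ft:.1f} sq ft")
```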

Answer:

2. a.The fuel economy increases when the footprint increases.

Explanation:

In the above-shown video,

Tory says that whenever the footprint increases the fuel economy also increases.

Whenever the footprint decreases the fuel economy decreases.

Answer:

2.b.The point (50, 40) represents the outlier.

Explanation:

In the above-given graph,

the point (50, 40) lies in the graph.

it represents the outlier of the graph.

**Performance Task**

Cost vs. Fuel Economy

After completing this chapter, you will be able to use the STEAM concepts you learned to answer the questions in the Video Performance Task. You will be given fuel economies and purchase prices of hybrid and non hybrid car models.

You will be asked to create graphs to compare car models. Why might you want to know the relationship between the fuel economy and the purchase price of a vehicle?

Answer:

For the three cars listed below, the purchase price decreases as the city fuel economy increases, so the relationship is negative rather than proportional.

Explanation:

In the above-given figure,

the city fuel economy and the purchase price of each car are given as (fuel economy, purchase price):

for car A (21.8, 24)

for car B (22.4, 22)

for car C (40.1, 18)

As the fuel economy increases from 21.8 to 40.1, the purchase price decreases from 24 to 18.

### Data Analysis and Displays Getting Ready for Chapter 6

**Chapter Exploration**

1. Work with a partner. The table shows the number of absences and the final grade for each student in a sample.

a.Write the ordered pairs from the table. Then plot them in a coordinate plane.

b. Describe the relationship between absences and final grade.

c. MODELING A student has been absent 6 days. Use the data to predict the student’s final grade. Explain how you found your answer.

Answer:

a. (0, 95), (3, 88), (2, 90), (5, 83), (7, 79), (9, 70), (4, 85), (1, 94), (10, 65), (8, 75).

b. The final grade decreases as the number of absences increases, so the relationship is negative.

c. The student’s final grade is 80.

Explanation:

a. From the above-given figure,

The ordered pairs are:

(0, 95), (3, 88), (2, 90), (5, 83), (7, 79), (9, 70), (4, 85), (1, 94), (10, 65), (8, 75).

b. Whenever the number of absences increases, the final grade decreases.

Students with fewer absences have higher final grades.

c. Given that the student has been absent for 6 days.

The student’s final grade is 80.
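The prediction in part (c) can be cross-checked with a least-squares line of fit, as in the sketch below. This is one possible method, not necessarily the one the exercise intends (reading the value off the scatter plot is equally valid); `fit_line` is a hypothetical helper written for this illustration.

```python
# Ordered pairs (absences, final grade) from part (a).
absences = [0, 3, 2, 5, 7, 9, 4, 1, 10, 8]
grades = [95, 88, 90, 83, 79, 70, 85, 94, 65, 75]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(absences, grades)
predicted = slope * 6 + intercept  # predicted final grade for 6 absences

print(f"slope = {slope:.2f} points per absence, prediction = {predicted:.1f}")
```

The fitted slope is negative, in line with part (b), and the prediction for 6 absences comes out at roughly 79, close to the value of 80 given above.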

2. Work with a partner. Match the data sets with the most appropriate scatter plot. Explain your reasoning.

a. month of birth and birth weight for infants at a day care

b. quiz score and test score of each student in a class

c. age and value of laptop computers

**Vocabulary**

The following vocabulary terms are defined in this chapter. Think about what each term might mean and record your thoughts.

scatter plot

two-way table

line of fit

joint frequency

Answer:

Scatter plot = A scatter plot uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point.

Two-way table = A two-way table is a way to display frequencies or relative frequencies for two categorical variables.

Line of fit = Line of fit refers to a line through a scatter plot of data points that best expresses the relationship between those points.

Joint frequency = A joint frequency is an entry in the body of a two-way table: the number of observations that fall into one particular row category and one particular column category at the same time.
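A small sketch may make the two-way table and joint frequency definitions more concrete. The survey responses below are hypothetical and exist only for illustration; each count in `joint` is a joint frequency, and the row and column totals are the marginal frequencies.

```python
from collections import Counter

# Hypothetical responses: (grade level, owns a pet). Not data from the chapter.
responses = [
    ("7th", "yes"), ("7th", "no"), ("8th", "yes"),
    ("8th", "yes"), ("7th", "yes"), ("8th", "no"),
]

# Joint frequencies: how often each (row category, column category) pair occurs.
joint = Counter(responses)

# Marginal frequencies: totals for each row category and each column category.
row_totals = Counter(grade for grade, _ in responses)
col_totals = Counter(pet for _, pet in responses)

# Print the table with one row per grade level.
for grade in sorted(row_totals):
    cells = {pet: joint[(grade, pet)] for pet in sorted(col_totals)}
    print(grade, cells, "total:", row_totals[grade])
```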


### Lesson 6.1 Scatter Plots

**EXPLORATION 1**

Work with a partner. The weights and circumferences of several sports balls are shown.

a. Represent the data in the coordinate plane. Explain your method.

b. Is there a relationship between the size and the weight of a sports ball? Explain your reasoning.

c. Is it reasonable to use the graph to predict the weights of the sports balls below? Explain your reasoning.

Kickball : circumference = 26 in.

Bowling ball : circumference = 27 in.

Answer:

a.(21, 30), (5, 9), (1.6, 5.3), (16, 28), (2, 8), (1.4, 7), (7, 12), (10, 26).

Explanation:

Answer:

b. Yes. In general, the balls with larger circumferences also weigh more, so there is a positive relationship. Circumference is measured in inches and weight is measured in ounces.

Explanation:

In the above-given figure,

the weight and the circumference of each ball are given as (weight, circumference):

basketball = (21, 30).

baseball = (5, 9).

golf ball = (1.6, 5.3).

soccer ball = (16, 28).

tennis ball = (2, 8).

racquetball = (1.4, 7).

softball = (7, 12).

volleyball = (10, 26).

Answer:

c. No, it is not reasonable to use the graph.
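The strength of the relationship in part (b), and the caution in part (c), can be illustrated with the sketch below. It assumes the ordered pairs above are (weight in ounces, circumference in inches); the helper names are hypothetical. It computes the Pearson correlation and then the weight a least-squares line would assign to a ball with a 27-inch circumference.

```python
from math import sqrt

# Assumed to be (weight in ounces, circumference in inches) for the eight balls.
balls = [(21, 30), (5, 9), (1.6, 5.3), (16, 28), (2, 8), (1.4, 7), (7, 12), (10, 26)]
weights = [w for w, _ in balls]
circs = [c for _, c in balls]


def centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy), the centered sums of products and squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy


# x = circumference, y = weight, so the fitted line predicts weight from size.
sxy, sxx, syy = centered_sums(circs, weights)
r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient

slope = sxy / sxx
intercept = sum(weights) / len(weights) - slope * (sum(circs) / len(circs))
predicted_27 = slope * 27 + intercept  # fitted weight for a 27 in. circumference

print(f"r = {r:.2f}, fitted weight at 27 in. = {predicted_27:.1f} oz")
```

The correlation is strongly positive, but the fitted line only describes balls like those in the sample. A bowling ball is solid and weighs many times more than the roughly 15 ounces the line suggests, which is one way to justify the answer to part (c).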

Question 1.

Make a scatter plot of the data. Identify any outliers, gaps, or clusters.

Answer:

outliers = (120, 70)

gaps =(10, 62) to (45, 85)

clusters =(80, 95), (90, 97), (80, 91)


Question 2.

Describe the relationship between the data in Example 1.

Answer:

Linear relationship.

Explanation:

In the above-given graph,

the relationship used is a linear relationship.

**Self-Assessment for Concepts & Skills**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 3. **SCATTER PLOT**

Make a scatter plot of the data. Identify any outliers, gaps, or clusters. Then describe the relationship between the data.

Answer:

outliers = (3,24)

clusters = 22 to 36

gaps = (4, 27), (8, 36)


Question 4. **WHICH ONE DOESN’T BELONG?**

Using the scatter plot, which point does not belong with the other three? Explain your reasoning.

Answer:

The point (3.5, 3) does not belong with the other three.

Explanation:

In the above-given figure

The points (1, 8), (3, 6.5), and (8, 2) lie in the coordinate plane.

The point (3.5, 3) does not belong with the other three.

The point (3.5, 3) is an outlier.

**Self-Assessment for Problem Solving**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 5.

The table shows the high school and college grade point averages (GPAs) of 10 students. What college GPA do you expect for a high school student with a GPA of 2.7?

Answer:

The college GPA I expect for a high school student with a GPA of 2.7 is 2.45.

Explanation:

In the above-given points,

the high school and college GPAs of 10 students are given.

A high school GPA of 2.6 corresponds to a college GPA of 2.4,

so a college GPA of about 2.45 is expected for a high school GPA of 2.7.

Question 6.

The scatter plot shows the ages of 12 people and the numbers of pets each person owns. Identify any outliers, gaps, or clusters. Then describe the relationship between the data.

Answer:

outliers = (40, 6)

clusters = (20, 2) to (70, 1)

gaps = (0, 30), (1, 35), (2, 50) and so on.

Explanation:

Given that,

the person’s age (years) in the x-axis.

a number of pets owned in the y-axis.

outliers = (40, 6)

clusters = (20, 2) to (70, 1)

gaps = (0, 30), (1, 35), (2, 50) and so on.

### Scatter Plots Homework & Practice 6.1

**Review & Refresh**

**Solve the system. Check your solution.**

Question 1.

y = – 5x + 1

y = – 5x – 2

Answer:

There is no solution for the given system.

Explanation:

Given that y = – 5x + 1

y = – 5x – 2

Both lines have slope –5 but different y-intercepts (1 and –2), so they are parallel and never intersect. The system has no solution.

Question 2.

2x + 2y = 9

x = 4.5 – y

Explanation:

Given that,

2x + 2y = 9

x = 4.5 – y

2(4.5 – y) + 2y = 9

9 – 2y + 2y = 9

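The terms –2y and +2y cancel on the left side.

9 = 9

Because 9 = 9 is always true, the two equations describe the same line, and the system has infinitely many solutions.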

Question 3.

y = – x

6x + y = 4

Explanation:

Given that y = -x

6x + y = 4

6x + (-x) = 4

6x – x = 4

5x = 4

x = 4/5

y = –x = –4/5

The solution is (4/5, –4/5). Check: 6(4/5) + (–4/5) = 20/5 = 4.
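The three review systems above show the three possible outcomes for a pair of linear equations: no solution, infinitely many solutions, or exactly one solution. The sketch below classifies a system written in the form a1·x + b1·y = c1 and a2·x + b2·y = c2 using the determinant. The function `solve_2x2` is a hypothetical helper, and rewriting each equation into this standard form is a step added here for illustration.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Classify and solve a1*x + b1*y = c1, a2*x + b2*y = c2 with exact fractions."""
    a1, b1, c1, a2, b2, c2 = (Fraction(v) for v in (a1, b1, c1, a2, b2, c2))
    det = a1 * b2 - a2 * b1
    if det != 0:
        # Cramer's rule gives the single intersection point.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        return "one solution", (x, y)
    # det == 0: the lines are either identical or parallel.
    if a1 * c2 == a2 * c1 and b1 * c2 == b2 * c1:
        return "infinitely many solutions", None
    return "no solution", None


# Question 1: y = -5x + 1 and y = -5x - 2, rewritten as 5x + y = 1 and 5x + y = -2.
q1 = solve_2x2(5, 1, 1, 5, 1, -2)
# Question 2: 2x + 2y = 9 and x = 4.5 - y, rewritten as x + y = 9/2.
q2 = solve_2x2(2, 2, 9, 1, 1, Fraction(9, 2))
# Question 3: y = -x and 6x + y = 4, rewritten as x + y = 0.
q3 = solve_2x2(1, 1, 0, 6, 1, 4)

for label, result in (("Q1", q1), ("Q2", q2), ("Q3", q3)):
    print(label, result)
```

Exact fractions are used instead of floating-point numbers so that the equality tests in the zero-determinant branch are reliable.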

Question 4.

When graphing a proportional relationship represented by y = mx, which point is not on the graph?

A. (0, 0)

B. (0, m)

C. (1, m)

D. (2, 2m)

Answer:

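Point B, (0, m), is not on the graph (unless m = 0).

Explanation:

In the above question,

given that the points are:

(0, 0)

(0, m)

(1, m)

(2, 2m)

Substituting x = 0 into y = mx gives y = 0, so the graph passes through (0, 0), not (0, m). The points (1, m) and (2, 2m) also satisfy y = mx.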

**Concepts, Skills, & Problem Solving**

**USING A SCATTER PLOT** The table shows the average prices (in dollars) of jeans sold at different stores and the numbers of pairs of jeans sold at each store in one month. (See Exploration 1, p. 237.)

Question 5.

Represent the data in a coordinate plane.

Answer:

The points are (22, 152), (40, 94), (28, 134), (35, 110), and (46, 81)

Explanation:

In the above-given figure,

The points are (22, 152), (40, 94), (28, 134), (35, 110), and (46, 81)

Question 6.

Is there a relationship between the average price and the number sold? Explain your reasoning.

Answer:

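Yes. There is a negative linear relationship.

Explanation:

In the above-given figure,

as the average price increases from $22 to $46, the number of pairs sold decreases from 152 to 81, so the relationship is a negative linear relationship.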
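The direction and strength of the jeans relationship can be checked numerically with the ordered pairs from Question 5. The sketch below is an illustration only; it computes the Pearson correlation and the least-squares slope by hand so that it does not depend on a particular Python version.

```python
from math import sqrt

# (average price in dollars, pairs sold in one month) from Question 5.
prices = [22, 40, 28, 35, 46]
sold = [152, 94, 134, 110, 81]

n = len(prices)
p_bar = sum(prices) / n
s_bar = sum(sold) / n

# Centered sums of products and squares.
sxy = sum((p - p_bar) * (s - s_bar) for p, s in zip(prices, sold))
sxx = sum((p - p_bar) ** 2 for p in prices)
syy = sum((s - s_bar) ** 2 for s in sold)

r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
slope = sxy / sxx          # change in pairs sold per extra dollar of price

print(f"r = {r:.3f}, slope = {slope:.2f} pairs per dollar")
```

A correlation close to –1 indicates a strong negative linear relationship, and the slope suggests that each additional dollar in average price goes with about three fewer pairs sold in this sample.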

**MAKING A SCATTER PLOT** Make a scatter plot of the data. Identify any outliers, gaps, or clusters.

Question 7.

Answer:

Outliers = (102, 63)

gaps = x from 40 to 44

clusters = 82 to 89


Question 8.

Answer:

Outliers = (0, 5.5)

gaps = x from 4.5 to 5.5

clusters = 1.5 to 2.5


**IDENTIFYING RELATIONSHIPS** Describe the relationship between the data. Identify any outliers, gaps, or clusters.

Question 9.

Answer:

Outliers = (15, 10)

gaps = from x = 15 to x = 25

clusters = 0

Negative linear relationship.

Explanation:

Outliers = (15, 10)

gaps = from x = 15 to x = 25

clusters = 0

There are no clusters.

Question 10.

Answer:

There are no clusters.

gaps = from x = 4 to x = 36

no outliers.

Explanation:

In the above-given figure,

there are no clusters.

gaps = from x = 4 to x = 36

no outliers.

Question 11.

Answer:

There is no relationship.

there are no clusters.

no gaps.

no outliers.

Explanation:

In the above-given graph,

there are no clusters.

no gaps.

no outliers.

there is no relationship.

Question 12. **CRITICAL THINKING**

The table shows the average price per pound for honey at a store from 2014 to 2017. Describe the relationship between the data.

Answer:

The relationship is a positive linear relationship.

Explanation:

In the above-figure,

given points are:

(2014, $4.65), (2015, $5.90), (2016, $6.50), and (2017, $7.70)

so the above given is a positive linear relationship.

Question 13. **MODELING REAL LIFE**

The scatter plot shows the amount of rainfall and the amount of corn produced by a farm over the last 10 years. Describe the relationship between the amount of rainfall and the amount of corn produced.

Answer:

The relationship is a positive linear relationship.

Explanation:

In the above-given figure,

outliers = (49, 80)

clusters = from x = 190 to 220.

Question 14. **OPEN-ENDED**

Describe a set of real-life data that has a negative linear relationship.

Answer:

Question 15. **MODELING REAL LIFE**

The scatter plot shows the total earnings (wages and tips) of a food server during one day.

a. About how many hours must the server work to earn $70?

b. About how much does the server earn for 5 hours of work?

c. Describe the relationship shown by the data.

Answer:

a. 3.5 h

b. $85

c. positive linear relationship.

Explanation:

In the above-given graph,

given that,

a. The server must work about 3.5 hours to earn $70.

b. The server earns about $85 for 5 hours of work.

c. The relationship shown by the data is a positive linear relationship.

Question 16. **PROBLEM SOLVING**

The table shows the memory capacities (in gigabytes) and prices (in dollars) of tablet computers. (a) Make a scatter plot of the data. Then describe the relationship between the data. (b) Identify any outliers, gaps, or clusters. Explain why they might exist.

Answer:

Outliers =(16, 50)

gaps = 128 on x.

clusters = 64, 32, 64

Explanation:

Outliers =(16, 50)

gaps =128 on x.

clusters = 64, 32, 64.

Question 17. **PATTERNS**

The scatter plot shows the numbers of drifting scooters sold by a company.

a. In what year were 1000 scooters sold?

b. About how many scooters were sold in 2015?

c. Describe the relationship shown by the data.

d. Assuming this trend continues, in what year are about 500 drifting scooters sold?

Answer:

a. 2014

b. about 950 scooters.

c. negative linear relationship.

d. 2019.

Explanation:

In the above-given figure,

the scatter plot shows the number of drifting scooters sold each year.

a. 2014

b. about 950 scooters.

c. negative linear relationship.

d. 2019

Question 18. **DIG DEEPER!**

Sales of sunglasses and beach towels at a store show a positive linear relationship in the summer. Does this mean that the sales of one item cause the sales of the other item to increase? Explain.

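Answer:

No. A positive linear relationship does not mean that the sales of one item cause the sales of the other item to increase.

Explanation:

Given that the sales of sunglasses and beach towels at a store show a positive linear relationship in the summer.

Both sales likely increase for a common reason, such as warm, sunny weather that brings more people to the beach. So, the data show a correlation, not causation.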

### Lesson 6.2 Lines of Fit

**EXPLORATION 1**

Representing Data by a Linear Equation

Work with a partner. You have been working on a science project for 8 months. Each month, you measured the length of a baby alligator.

a. Use a scatter plot to draw a line that you think best describes the relationship between the data.

b. Write an equation for your line in part(a).

c. MODELING Use your equation in part(b) to predict the length of the baby alligator next September.

Answer:

a. The relation is a linear relationship.

Explanation:

Question 1.

The table shows the numbers of people who attend a festival over an eight-year period. (a) Make a scatter plot of the data and draw a line of fit. (b) Write an equation of the line of fit. (c) Interpret the slope and the y-intercept of the line of fit.

Answer:

The ordered pairs are (1, 420), (2, 500), (3, 650), (4, 900), (5, 1100), (6, 1500), (7, 1750), (8, 2400).

Explanation:

Question 2.

Find an equation of the line of best fit for the data in Example 1. Identify and interpret the correlation coefficient.

Answer:

**Self-Assessment for Concepts & Skills**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 3. **FINDING A LINE OF FIT**

The table shows the numbers of days spent training and the race times for several people in a race.

a. Make a scatter plot of the data and draw a line of fit.

b. Write an equation of the line of fit.

c. Interpret the slope and the y-intercept of the line of fit.

Question 4. **IDENTIFYING RELATIONSHIPS**

Find an equation of the line of best fit for the data at the left. Identify and interpret the correlation coefficient

Answer:

**Self-Assessment for Problem Solving**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 5.

The ordered pairs show amounts y (in inches) of rainfall equivalent to x inches of snow. About how many inches of rainfall are equivalent to 6 inches of snow? Justify your answer.

(16, 1.5) (12, 1.3) (18, 1.8) (15, 1.5) (20, 2.1) (23, 2.4)

Answer:

Question 6.

The table shows the heights (in feet) of a high jump bar and the number of people who successfully complete each jump. Identify and interpret the correlation coefficient.

Answer:

### Lines of Fit Homework & Practice 6.2

**Review & Refresh**

**Describe the relationship between the data. Identify any outliers, gaps, or clusters.**

Question 1.

Answer:

Negative linear relationship.

outliers = (6, 10)

clusters = 0

gaps = 0

Explanation:

In the above-given figure,

The relationship is negative linear relationship.

outliers = (6, 10)

cluster = 0

gaps = 0

there are no clusters and no gaps.

Question 2.

Question 3.

Answer:

Positive linear relationship.

outliers = 0

gaps = 0

clusters = x = 11 to x = 15

Explanation:

In the above-given figure,

given that

positive linear relationship.

outliers = 0

gaps = 0

clusters = x = 11 to x = 15

**Write the fraction as a decimal and a percent.**

Question 4.

\(\frac{29}{100}\)

Answer:

Decimal = 0.29

percent = 29 %

Explanation:

Given that

(29/100)

0.29

percent = 29%

decimal = 0.29

Answer:

Decimal = 0.28

percent = 28%

Explanation:

Given that

(7/25) = 0.28

decimal = 0.28

percent = 28%

Answer:

Decimal = 0.7

percent = 70%

Explanation:

Given that

(35/50) = 0.7

decimal = 0.7

percent = 70%
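The conversions above follow one rule: divide the numerator by the denominator to get the decimal, then multiply by 100 to get the percent. A minimal sketch of that rule, where the function name `to_decimal_and_percent` is a hypothetical helper used only for illustration:

```python
from fractions import Fraction

def to_decimal_and_percent(numerator, denominator):
    """Return (decimal, percent) for the fraction numerator/denominator."""
    value = Fraction(numerator, denominator)
    decimal = float(value)        # divide numerator by denominator
    percent = float(value * 100)  # a percent is the decimal times 100
    return decimal, percent

# The three fractions that appear in the explanations above.
for num, den in [(29, 100), (7, 25), (35, 50)]:
    decimal, percent = to_decimal_and_percent(num, den)
    print(f"{num}/{den}: decimal = {decimal}, percent = {percent}%")
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a decimal.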

**Concepts, Skills, & Problem Solving** **REPRESENTING DATA BY A LINEAR EQUATION** Use a scatter plot to draw a line that you think best describes the relationship between the data. (See Exploration 1, p. 243.)

Question 7.

Answer:

The points are (0,0), (1, 0.8), (2, 1.50), (3, 2.20), (4, 3.0), (5, 3.75)

Explanation:

In the above-given figure,

Given that :

the points are (0, 0), (1, 0.8), (2, 1.50), (3, 2.20), (4, 3.0), (5, 3.75)

Blueberries are shown on the x-axis.

Weight is measured in pounds.

Weight is shown on the y-axis.

Question 8.

Answer:

The given points are (0,91), (2, 82), (4, 74), (6, 65), (8, 55), (10, 43).

Explanation:

In the above-given figure,

Given that :

the points are (0, 91), (2, 82), (4, 74), (6, 65), (8, 55), (10, 43)

The Age is given on the x-axis.

value is measured in dollars.

value is given in the y-axis.

Question 9. **FINDING A LINE OF FIT**

The table shows the daily high temperatures (°F) and the numbers of hot chocolates sold at a coffee shop for eight randomly selected days.

a. Make a scatter plot of the data and draw a line of fit.

b. Write an equation of the line of fit.

c. Interpret the slope and the y-intercept of the line of fit.

Answer:

a. The given points are (30, 45), (36, 43), (44, 36), (51, 35), (60, 30), (68, 27), (75, 23), (82, 17).

b. y = -0.5x + 60

c. You could expect that 60 hot chocolates are sold when the temperature is 0°F, and the sales decrease by 1 hot chocolate for every 2°F increase in temperature.

Explanation:

a. The given points are (30, 45), (36, 43), (44, 36), (51, 35), (60, 30), (68, 27), (75, 23), (82, 17).

b. y = -0.5x + 60

c. You could expect that 60 hot chocolates are sold when the temperature is 0°F, and the sales decrease by 1 hot chocolate for every 2°F increase in temperature.

Question 10. **NUMBER SENSE**

Which correlation coefficient indicates a stronger relationship: – 0.98 or 0.91? Explain.

Answer:

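– 0.98 indicates a stronger relationship.

Explanation:

In the above-given question,

the strength of a relationship depends on how close the correlation coefficient is to 1 or – 1, not on its sign. The absolute value of – 0.98 is 0.98, which is greater than 0.91.

So – 0.98 indicates a stronger relationship.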

Question 11. **IDENTIFYING RELATIONSHIPS**

The table shows the admission costs (in dollars) and the average number of daily visitors at an amusement park each year for the past 8 years. Find an equation of the line of best fit. Identify and interpret the correlation coefficient.

Answer:

The equation for the line of best fit is y = -4.9x + 1042.

The correlation coefficient is about -0.969.

This indicates a strong negative correlation.

Explanation:

In the above-given figure,

The given points are (20, 940), (21, 935), (22, 940), (24, 925), (25, 920), (27, 905), (28, 910), and (30, 890)

The equation for the line of best fit is y = -4.9x + 1042.

The correlation coefficient is about -0.969.

This indicates a strong negative correlation.
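The slope, y-intercept, and correlation coefficient above can be reproduced from the eight listed points with the usual least-squares formulas. The sketch below is one possible way to do this by hand in code, assuming a plain least-squares fit; the textbook expects a graphing calculator, and the function name `best_fit` is hypothetical.

```python
import math

# Data from the Explanation: (admission cost in dollars, average daily visitors).
points = [(20, 940), (21, 935), (22, 940), (24, 925),
          (25, 920), (27, 905), (28, 910), (30, 890)]

def best_fit(pairs):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = best_fit(points)
print(f"y = {slope:.1f}x + {intercept:.0f}, r = {r:.3f}")
```

Rounded to the precision used in the answer, the result agrees with y = -4.9x + 1042 and a correlation coefficient of about -0.969.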

Question 12. **REASONING**

The table shows the weights (in pounds) and the prescribed dosages (in milligrams) of medicine for six patients.

a. Find an equation of the line of best fit. Identify and interpret the correlation coefficient.

b. Interpret the slope of the line of best fit.

c. A patient who weighs 140 pounds is prescribed 135 milligrams of medicine. How does this affect the line of best fit?

Answer:

Question 13. **MODELING REAL LIFE**

The table shows the populations (in millions) and the numbers of electoral votes assigned for eight states in the 2016 presidential election.

a. Find an equation of the line of best fit. Identify and interpret the correlation coefficient.

b. Interpret the slope of the line of best fit.

c. Interpret the y-intercept of the line of best fit.

d. RESEARCH Research the Electoral College to explain the meaning of your answer in part(c).

Answer:

a. y = 1.3x + 2; the correlation coefficient is about 0.9995, which indicates a strong positive correlation.

b. The number of electoral votes increases by 1.3 for every increase of 1 million people in the state.

c. A state with a population of 0 has 2 electoral votes.

d. The number of electoral votes a state has is based on the number of members that the state has in Congress. Each state has 2 senators, plus a number of members of the House of Representatives based on its population. So, the y-intercept is 2 because a hypothetical state with no population would still have 2 senators.

Explanation:

a. y = 1.3x + 2; the correlation coefficient is about 0.9995, which indicates a strong positive correlation.

b. The number of electoral votes increases by 1.3 for every increase of 1 million people in the state.

c. A state with a population of 0 has 2 electoral votes.

d. The number of electoral votes a state has is based on the number of members that the state has in Congress. Each state has 2 senators, plus a number of members of the House of Representatives based on its population. So, the y-intercept is 2 because a hypothetical state with no population would still have 2 senators.

Question 14. **MODELING REAL LIFE**

The table shows the numbers (in millions) of active accounts for two social media websites over the past five years. Assuming this trend continues, how many active accounts will Website B have when Website A has 280 million active accounts? Justify your answer.

Question 15. **DIG DEEPER!**

The table shows the heights y (in feet) of a baseball x seconds after it was hit.

a. Predict the height after 5 seconds.

b. The actual height after 5 seconds is about 3 feet. Why might this be different from your prediction?

Answer:

a. 251 ft.

b. The height of the baseball does not change linearly over time. The ball rises, reaches a maximum height, and then falls.

Explanation:

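a. The predicted height after 5 seconds is about 251 feet.

Given that the seconds are on the x-axis and the height is on the y-axis,

the points are (0, 3), (0.5, 39), (1, 67), (1.5, 87), and (2, 99).

The line of best fit for these points is y = 48x + 11, so y = 48(5) + 11 = 251.

b. The actual height after 5 seconds is about 3 feet, because the baseball rises and then falls. A line does not model the height well beyond the given data.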
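The prediction of 251 feet in part (a) comes from extending a line of best fit well past the last data point at x = 2. The sketch below, assuming a plain least-squares fit (the helper name `fit_line` is hypothetical), shows how that number arises and why it should be treated with caution.

```python
# Data from the Explanation: (seconds after the hit, height in feet).
points = [(0, 3), (0.5, 39), (1, 67), (1.5, 87), (2, 99)]

def fit_line(pairs):
    """Return (slope, intercept) of the least-squares line through the pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(points)
predicted = slope * 5 + intercept  # extrapolating far beyond x = 2
print(f"y = {slope:g}x + {intercept:g}; predicted height at 5 s = {predicted:g} ft")

# The height gained per half second shrinks (36, 28, 20, 12 feet),
# which signals that the data are not linear.
gains = [b[1] - a[1] for a, b in zip(points, points[1:])]
print(gains)
```

The shrinking gains show that the ball is slowing as it rises, so a straight line overestimates the height at later times.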

### Lesson 6.3 Two-Way Tables

**EXPLORATION 1**

Analyzing Data

Work with a partner. You are the manager of a sports shop. The table shows the numbers of soccer T-shirts that your shop has left in stock at the end of a soccer season.

a. Complete the table.

b. Are there any black-and-gold XL T-shirts in stock? Justify your answer.

c. The numbers of T-shirts you ordered at the beginning of the soccer season are shown below. Complete the table.

d. REASONING How would you alter the numbers of T-shirts you order for the next soccer season?

Answer:

Question 1.

How many students in the survey above studied for the test and failed?

Answer:

Question 2.

You randomly survey students in a cafeteria about their plans for a football game and a school dance. The two-way table shows the results. Find and interpret the marginal frequencies for the survey.

Answer:

Question 3.

You randomly survey students about whether they buy a school lunch or pack a lunch. The results are shown. Make a two-way table that includes the marginal frequencies.

Answer:

**Self-Assessment for Concepts & Skills**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 4. **READING A TWO-WAY TABLE**

The results of a music survey are shown in the two-way table. How many students dislike both country and jazz? How many students like country but dislike jazz?

Answer:

Question 5. **MAKING A TWO-WAY TABLE**

You randomly survey students about their preference for a class field trip. The results are shown in the tally sheets. Make a two-way table that includes the marginal frequencies.

Answer:

**Self-Assessment for Problem Solving**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 6.

The results of a voting survey are shown in the two-way table. For each age group, what percent of voters prefer Candidate A? Candidate B? Determine whether there is a relationship between age and candidate preference.

Answer:

Question 7.

You randomly survey 40 students about whether they play an instrument. You find that 8 males play an instrument and 13 females do not play an instrument. A total of 17 students in the survey play an instrument. Make a two-way table that includes the marginal frequencies.

Answer:

Question 8.

Collect data from each student in your math class about whether they like math and whether they like science. Is there a relationship between liking math and liking science? Justify your answer.

Answer:

### Two-Way Tables Homework & Practice 6.3

**Review & Refresh**

**Find an equation of the line of best fit for the data.**

Question 1.

Answer:

The line y = 12.6x + 75.8 best fit for the data.

Explanation:

In the above-given figure,

Given that the points are (0,75), (1, 91), (2, 101), (3, 109) and (4, 129).

The line y = 12.6x + 75.8 is the best fit for the data.

Question 2.

Answer:

The vertices of a triangle are A (1, 2), B (3, 1), and C (1, – 1). Draw the figure and its image after the translation.

Question 3.

4 units left

Answer:

Question 4.

2 units down

Answer:

Question 5.

(x – 2, y + 3)

Answer:

**Concepts, Skills, & Problem Solving**

**ANALYZING DATA** In Exploration 1, determine how many of the indicated T-shirts are in stock at the end of the soccer season. (See Exploration 1, p. 249.)

Question 6.

black-and-white M

Answer:

4 T-shirts are in stock at the end of the soccer season.

Explanation:

In the above-given Exploration 1,

Given that The T-shirts are in stock.

4 T-shirts are in stock at the end of the soccer season.

Question 7.

blue-and-gold XXL

Explanation:

In the above-given Exploration 1,

Given that The T-shirts are in stock.

0 T-shirts are in stock at the end of the soccer season.

Question 8.

blue-and-white L

Explanation:

In the above-given Exploration 1,

Given that the T-shirts are in stock.

1 T-shirt is in stock at the end of the soccer season.

**READING A TWO-WAY TABLE** You randomly survey students about participating in a yearly fundraiser. The two-way table shows the results.

Question 9.

How many female students participate in the fundraiser?

Answer:

51 female students participate.

Explanation:

In the above-given table,

the numbers of male and female students who participate in the fundraiser are given.

so 51 female students participate.

Question 10.

How many male students do not participate in the fundraiser?

Answer:

30 male students do not participate.

Explanation:

In the above-given table,

the numbers of male and female students who participate in the fundraiser are given.

so 30 male students do not participate.

**FINDING MARGINAL FREQUENCIES** Find and interpret the marginal frequencies.

Question 11.

Answer:

71 students are juniors.

75 students are seniors.

93 students will attend the school play.

53 students will not attend the school play.

146 students were surveyed.

Explanation:

In the above-given table,

Given that students of the class participate in the school play.

71 students are juniors.

75 students are seniors.

93 students will attend the school play.

53 students will not attend the school play.

146 students were surveyed.

Question 12.

Answer:

The data plan of 78 people is limited for the cell phone company A.

The data plan of 94 people is limited for the cell phone company B.

The data plan of 175 people is unlimited for the cell phone company A.

The data plan of 135 people is unlimited for the cell phone company B.

482 people were surveyed.

Explanation:

In the above-given table,

The data plan of the cell phone company are given.

The data plan of 78 people is limited for the cell phone company A.

The data plan of 94 people is limited for the cell phone company B.

The data plan of 175 people is unlimited for the cell phone company A.

The data plan of 135 people is unlimited for the cell phone company B.

482 people were surveyed.

Question 13. **MAKING A TWO-WAY TABLE**

A researcher randomly surveys people with a medical condition about whether they received a treatment and whether their condition improved. The results are shown. Make a two-way table that includes the marginal frequencies.

Answer:

The people who improved with treatment = 34.

The people who did not improve with treatment = 10

The people who improved with no treatment = 12.

The people who did not improve with no treatment = 29

In total, 85 people were surveyed.

Explanation:

The people who improved with treatment = 34.

The people who did not improve with treatment = 10

The people who improved with no treatment = 12.

The people who did not improve with no treatment = 29

In total, 85 people were surveyed.
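The marginal frequencies that the question asks for are the row and column totals of the four counts above. A minimal sketch of that step, where the row and column labels and the function name `marginal_frequencies` are illustrative assumptions rather than the textbook's wording:

```python
# Joint frequencies from the Answer to Question 13 (labels are illustrative).
counts = {
    ("Treatment", "Improved"): 34,
    ("Treatment", "Did not improve"): 10,
    ("No treatment", "Improved"): 12,
    ("No treatment", "Did not improve"): 29,
}

def marginal_frequencies(table):
    """Return (row totals, column totals, grand total) of a two-way table."""
    row_totals, col_totals = {}, {}
    for (row, col), n in table.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(table.values())

row_totals, col_totals, total = marginal_frequencies(counts)
print(row_totals)  # totals for treatment and no treatment
print(col_totals)  # totals for improved and did not improve
print(total)       # number of people surveyed
```

The row totals and the column totals each add up to the grand total, which is a quick check that the table was filled in correctly.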

Question 14. **MODELING REAL LIFE**

You randomly survey students in your school about the color of their eyes. The results are shown in the tables.

a. Make a two-way table.

b. Find and interpret the marginal frequencies for the survey.

c. For each eye color, what percent of the students in the survey are male? female? Organize the results in a two-way table.

Answer:

Question 15. **REASONING**

Use the information from Exercise 14. For each gender, what percent of the students in the survey have green eyes? blue eyes? brown eyes? Organize the results in a two-way table.

Answer:

Question 16. **CRITICAL THINKING**

What percent of students in the survey in Exercise 14 are either female or have green eyes? What percent of students in the survey are males who do not have green eyes? Find and explain the sum of these two percents.

Answer:

Question 17. **MODELING REAL LIFE**

You randomly survey people in your neighborhood about whether they have at least $1000 in savings. The results are shown in the tally sheets. For each age group, what percent of the people have at least $1000 in savings? do not have at least $1000 in savings? Determine whether there is a relationship between age and having at least $1000 in savings.

Answer:

Question 18. **DIG DEEPER!**

The three-dimensional bar graph shows information about the numbers of hours students at a high school work at part-time jobs during the school year.

a. Make a two-way table that represents the data. Use estimation to find the entries in your table.

b. A newspaper article claims that more males than females drop out of high school to work full-time. Do the data support this claim? Explain your reasoning.

Answer:

### Lesson 6.4 Choosing a Data Display

**EXPLORATION 1**

Displaying Data

Work with a partner. Analyze and display each data set in a way that best describes the data. Explain your choice of display.

a. NEW ENGLAND ROADKILL A group of schools in New England participated in a two-month study. They reported 3962 dead animals.

Birds: 307

Mammals: 2746

Amphibians: 145

Reptiles: 75

Unknown: 689

b. BLACK BEAR ROADKILL The data below show the numbers of black bears killed on a state’s roads each year for 20 years.

c. RACCOON ROADKILL A one-week study along a four-mile section of road found the following weights (in pounds) of raccoons that had been killed by vehicles.

d. What can be done to minimize the number of animals killed by vehicles?

Answer:

**Choose an appropriate data display for the situation. Explain your reasoning.**

Question 1.

the population of the United States divided into age groups

Answer:

Question 2.

the number of students in your school who play basketball, football, soccer, or lacrosse

Answer:

**Tell whether the data display is appropriate for representing the data in Example 2. Explain your reasoning.**

Question 3.

dot plot

Answer:

Question 4.

circle graph

Answer:

Question 5.

stem-and-leaf plot

Answer:

Question 6.

Which bar graph is misleading? Explain.

Answer:

**Self-Assessment for Concepts & Skills**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

**CHOOSING A DATA DISPLAY** Choose an appropriate data display for the situation. Explain your reasoning.

Question 7.

the percent of band students playing each instrument

Answer:

Question 8.

a comparison of the amount of time spent using a tablet computer and the remaining battery life

Answer:

Question 9. **IDENTIFYING A MISLEADING DISPLAY**

Is the box-and-whisker plot misleading? Explain.

Answer:

**Self-Assessment for Problem Solving**

Solve each exercise. Then rate your understanding of the success criteria in your journal.

Question 10.

An employee at an animal shelter creates the histogram shown. A visitor concludes that the number of 7-year-old to 9-year-old dogs is triple the number of 1-year-old to 3-year-old dogs. Determine whether this conclusion is accurate. Explain.

Answer:

Question 11. **DIG DEEPER!**

A business manager creates the line graph shown. (a) How do the data appear to change over time? Explain why this conclusion may not be accurate. (b) Why might the business manager want to use this line graph?

Answer:

### Choosing a Data Display Homework & Practice 6.4

**Review & Refresh**

**You randomly survey students about whether they recycle. The two-way table shows the results.**

Question 1.

How many male students recycle? How many female students do not recycle?

Answer:

Question 2.

Find and interpret the marginal frequencies.

Answer:

**Find the slope and the y-intercept of the graph of the linear equation.**

Question 3.

y = 4x + 10

Answer:

Question 4.

y = – 3.5x – 2

Answer:

Question 5.

y – 8 = – x

Answer:

**Concepts, Skills, & Problem Solving**

Question 6. **DISPLAYING DATA**

Analyze and display the data in a way that best describes the data. Explain your choice of display. (See Exploration 1, p. 255.)

**CHOOSING A DATA DISPLAY** Choose an appropriate data display for the situation. Explain your reasoning.

Question 7.

a student’s test scores and how the scores are spread out

Answer:

A stem-and-leaf plot, because it shows how the data are distributed.

Question 8.

the prices of different televisions and the numbers of televisions sold

Answer:

Question 9.

the outcome of rolling a number cube

Answer:

Question 10.

the distance a person drives each month

Answer:

Question 11. **IDENTIFYING AN APPROPRIATE DISPLAY**

A survey asked 800 students to choose their favorite school subject. The results are shown in the table. Tell whether each data display is appropriate for representing the portion of students who prefer math. Explain your reasoning.

Answer:

Question 12. **IDENTIFYING AN APPROPRIATE DISPLAY**

The table shows how many hours you worked as a lifeguard from May to August. Tell whether each data display is appropriate for representing how the number of hours worked changed during the 4 months. Explain your reasoning.

Answer:

Question 13. **WRITING**

When should you use a histogram instead of a bar graph to display data? Use an example to support your answer.

Answer:

**IDENTIFYING MISLEADING DISPLAYS** Which data display is misleading? Explain.

Question 14.

Answer:

Question 15.

Answer:

Question 16. **REASONING**

What type of data display is appropriate for showing the mode of a data set?

Answer:

Question 17. **CRITICAL THINKING**

The director of a music festival creates the data display shown. A customer concludes that the ticket price for Group C is more than double the ticket price for Group A. Determine whether this conclusion is accurate. Explain.

Answer:

Question 18. **PATTERNS**

A scientist gathers data about a decaying chemical compound and creates the scatter plot shown.

a. The scientist concludes that there is a negative linear relationship between the data. Determine whether this conclusion is accurate. Explain.

b. Estimate the amount of the compound remaining after 1 hour, 3 hours, 5 hours, and 7 hours.

Answer:

Question 19. **REASONING**

A survey asks 100 students to choose their favorite sports. The results are shown in the circle graph.

a. Explain why the graph is misleading.

b. What type of data display is more appropriate for the data? Explain.

Answer:

Question 20. **STRUCTURE**

With the help of computers, mathematicians have computed and analyzed trillions of digits of the irrational number π. One of the things they analyze is the frequency of each of the numbers 0 through 9. The table shows the frequency of each number in the first 100,000 digits of π.

a. Display the data in a bar graph.

b. Display the data in a circle graph.

c. Which data display is more appropriate? Explain.

d. Describe the distribution.

Answer:

### Data Analysis and Displays Connecting Concepts

**Using the Problem-Solving Plan**

Question 1.

You randomly survey middle school students about whether they prefer action, comedy, or animation movies. The two-way table shows the results. Estimate the probability that a randomly selected middle school student prefers action movies.

Understand the problem.

You know the results of a survey about movie preference. You are asked to estimate the probability that a randomly selected middle school student prefers action movies.

Make a plan.

Find the marginal frequencies for the data. Then use the marginal frequencies to find the probability that a randomly selected middle school student prefers action movies.

Solve and check.

Use the plan to solve the problem. Then check your solution.

Answer:

Question 2.

An equation of the line of best fit for a data set is y = – 0.68x + 2.35. Describe what happens to the slope and the y-intercept of the line when each y-value in the data set increases by 7.

Answer:

Question 3.

On a school field trip, there must be 1 adult chaperone for every 16 students. There are 8 adults who are willing to be a chaperone for the trip, but only the number of chaperones that are necessary will attend. In a class of 124 students, 80 attend the trip. Make a two-way table that represents the data.

Answer:

**Performance Task**

Cost vs. Fuel Economy

At the beginning of this chapter, you watched a STEAM Video called “Fuel Economy.” You are now ready to complete the performance task related to this video, available at BigIdeasMath.com. Be sure to use the problem-solving plan as you work through the performance task.

### Data Analysis and Displays Chapter Review

**Review Vocabulary**

Write the definition and give an example of each vocabulary term.

**Graphic Organizers**

You can use an Information Frame to help organize and remember a concept. Here is an example of an Information Frame for scatter plots.

Choose and complete a graphic organizer to help you study the concept.

1. lines of fit

2. two-way tables

3. data displays

**Chapter Self-Assessment**

As you complete the exercises, use the scale below to rate your understanding of the success criteria in your journal.

**6.1 Scatter Plots (pp. 237–242)**

Learning Target: Use scatter plots to describe patterns and relationships between two quantities.

Question 1.

Make a scatter plot of the data. Identify any outliers, gaps, or clusters.

Answer:

**Describe the relationship between the data. Identify any outliers, gaps, or clusters.**

Question 2.

Answer:

Question 3.

Answer:

Question 4.

Answer:

Question 5.

Your school is ordering custom T-shirts. The scatter plot shows the numbers of T-shirts ordered and the cost per shirt. Describe the relationship between the numbers of T-shirts ordered and the cost per T-shirt.

Answer:

Question 6.

Describe a set of real-life data that has each relationship.

a. positive linear relationship

b. no relationship

Answer:

Question 7.

The table shows the numbers of hours a waitress works and the amounts she earns in tips. How many hours do you expect the waitress to work when she earns $42 in tips?

Answer:

**6.2 Lines of Fit (pp. 243–248)**

Learning Target: Use lines of fit to model data.

Question 8.

The table shows the numbers of students at a middle school over a 10-year period.

a. Make a scatter plot of the data and draw a line of fit.

b. Write an equation of the line of fit.

c. Interpret the slope and the y-intercept of the line of fit.

d. Predict the number of students in year 11.

Answer:

Question 9.

Find an equation of the line of best fit for the data in Exercise 8. Identify and interpret the correlation coefficient.

Answer:

Question 10.

The table shows the revenue (in millions of dollars) for a company over an eight-year period. Assuming this trend continues, how much revenue will there be in year 9?

Answer:

**6.3 Two-Way Tables (pp. 249–254)**

Learning Target: Use two-way tables to represent data. You randomly survey students about participating in the science fair. The two-way table shows the results.

Question 11.

How many male students participate in the science fair?

Answer:

Question 12.

How many female students do not participate in the science fair?

Answer:

Question 13.

You randomly survey students in your school about whether they liked a recent school play. The two-way table shows the results. Find and interpret the marginal frequencies.

Answer:

You randomly survey people at a mall about whether they like the new food court. The results are shown.

Question 14.

Make a two-way table that includes the marginal frequencies.

Answer:

Question 15.

For each group, what percent of the people surveyed like the food court? dislike the food court? Organize your results in a two-way table.

Answer:

Question 16.

Does your table in Exercise 15 show a relationship between age and whether people like the food court?

Answer:

**6.4 Choosing a Data Display (pp. 255–262)**

Learning Target: Use appropriate data displays to represent situations.

Choose an appropriate data display for the situation. Explain your reasoning.

Question 17.

the numbers of pairs of shoes sold by a store each week

Answer:

Question 18.

the percent of votes that each candidate received in an election.

Answer:

Question 19.

Bird banding is attaching a tag to a bird’s wing or leg to track the movement of the bird. This provides information about the bird’s migration patterns and feeding behaviors. The table shows the numbers of robins banded in Pennsylvania over 5 years. Tell whether each data display is appropriate for representing how the number of bandings changed during the 5 years. Explain your reasoning.

Answer:

Question 20.

Give an example of a bar graph that is misleading. Explain your reasoning.

Answer:

Question 21.

Give an example of a situation where a dot plot is an appropriate data display. Explain your reasoning.

Answer:

### Data Analysis and Displays Practice Test

Question 1.

The graph shows the population (in millions) of the United States from 1960 to 2010.

a. In what year was the population of the United States about 180 million?

b. What was the approximate population of the United States in 1990?

c. Describe the relationship shown by the data.

Answer:

Question 2.

The table shows the weight of a baby over several months.

a. Make a scatter plot of the data and draw a line of fit.

b. Write an equation of the line of fit.

c. Interpret the slope and the y-intercept of the line of fit.

Answer:

Question 3.

You randomly survey students at your school about what type of books they like to read. The two-way table shows your results. Find and interpret the marginal frequencies.

Answer:

Choose an appropriate data display for the situation. Explain your reasoning.

Question 4.

magazine sales grouped by price range

Answer:

Question 5.

the distance a person hikes each week

Answer:

Question 6.

The table shows the numbers of AP exams (in thousands) taken from 2012 to 2016, where x = 12 represents the year 2012. Find an equation of the line of best fit. Identify and interpret the correlation coefficient.

Answer:

Question 7.

You randomly survey shoppers at a supermarket about whether they use reusable bags. Of 60 male shoppers, 15 use reusable bags. Of 110 female shoppers, 60 use reusable bags. Organize your results in a two-way table. Include the marginal frequencies. Estimate the probability that a randomly selected male shopper uses reusable bags.

Answer:
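One way to organize the counts in Question 7 is sketched below in Python. The counts come from the question; the dictionary layout, the function name `marginal_frequencies`, and the category labels are illustrative assumptions, not part of the textbook.

```python
# Counts taken from Question 7; the table layout and labels are illustrative.
counts = {
    "male": {"uses": 15, "does_not_use": 60 - 15},
    "female": {"uses": 60, "does_not_use": 110 - 60},
}


def marginal_frequencies(table):
    """Return (row_totals, column_totals, grand_total) for a two-way table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    column_totals = {}
    for cells in table.values():
        for column, value in cells.items():
            column_totals[column] = column_totals.get(column, 0) + value
    return row_totals, column_totals, sum(row_totals.values())


row_totals, column_totals, grand_total = marginal_frequencies(counts)

# Estimated probability that a randomly selected male shopper uses reusable bags.
p_male_uses = counts["male"]["uses"] / row_totals["male"]
```

The row and column totals are the marginal frequencies, and the probability is the "uses" count for male shoppers divided by the male row total.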

### Data Analysis and Displays Cumulative Practice

Question 1.

What is the solution of the system of linear equations?

y = 2x – 1

y = 3x + 5

A. (13, 6)

B. (-6, -13)

C. (-13, 6)

D. (-6, 13)

Answer:

Question 2.

The diagram shows parallel lines cut by a transversal. Which angle is the corresponding angle for ∠6 ?

F. ∠2

G. ∠3

H. ∠4

I. ∠8

Answer:

Question 3.

You randomly survey students in your school. You ask whether they have jobs. You display your results in the two-way table. How many male students do not have a job?

Answer:

Question 4.

Which scatter plot shows a negative relationship between x and y?

Answer:

Question 5.

A system of two linear equations has no solution. What can you conclude about the graphs of the two equations?

F. The lines have the same slope and the same y-intercept.

G. The lines have the same slope and different y-intercepts.

H. The lines have different slopes and the same y-intercept.

I. The lines have different slopes and different y-intercepts.

Answer:

Question 6.

What is the solution of the equation?

0.22(x + 6) = 0.2x + 1.8

A. x = 2.4

B. x = 15.6

C. x = 24

D. x = 156

Answer:

Question 7.

A person who is $5\frac{1}{2}$ feet tall casts a $3\frac{1}{2}$-foot-long shadow. A nearby flagpole casts a 28-foot-long shadow. What is the height (in feet) of the flagpole?

Answer:

Question 8.

A store records total sales (in dollars) each month for three years. Which type of graph can best show how sales increase over this time period?

F. circle graph

G. line graph

H. histogram

I. stem-and-leaf plot

Answer:

Question 9.

Trapezoid KLMN is graphed in the coordinate plane shown.

Rotate trapezoid KLMN 90° clockwise about the origin. What are the coordinates of point M′, the image of point M after the rotation?

A. (-3, -2)

B. (-2, -3)

C. (-2, 3)

D. (3, 2)

Answer:

Question 10.

The table shows the numbers of hours students spent watching television from Monday through Friday for one week and their scores on a test that Friday.

Part A Make a scatter plot of the data.

Part B Describe the relationship between the hours of television watched and the test scores.

Part C Explain how to justify your answer in Part B using the linear regression feature of a graphing calculator.

Answer:


## Data Sufficiency - Solved Examples

Each of the following items consists of a question and two points, labelled I and II. Decide whether the information provided in the points is adequate to answer the question. Read both points and give your answer.

Q 1 - In a state library, 10% of the books are added every year. What was the number of books that the library had in 1994?

I. During 1996, the library had 1,00,000 books.

II. During 1995, 10,000 books were added.

Both I and II individually are adequate to answer the question. Hence option C is the answer.

Q 2 - Ravi Yadav scored an aggregate of 80 marks in English, mathematics and computer. How much did he get in mathematics?

I. His aggregate in English and computer is 45.

II. He got 40 marks in computer.

From point I we can get the marks in mathematics by subtracting the aggregate in English and computer from the aggregate in all three subjects (80 - 45 = 35). But from II alone we cannot get an answer. Hence option A is correct.

Q 3 - The sum of the ages of O, M, and N is 50 years. What is N’s age?

II. N is 10 years older than M.

Both I and II are necessary to answer the question. Subtracting O’s age of 30 years from 50 leaves 20 years for M and N together. Then, comparing N’s age and M’s age using II, we can get the answer. Hence option E is correct.

Q 4 - The wages of Ravish, Anoop and Sandeep are in the ratio 4:5:7, respectively. How much is Anoop’s wage?

I. The difference between Anoop’s and Sandeep’s wages is double that between Ravish’s and Anoop’s.

II. Anoop gets 4000 less than Sandeep.

Point I only restates what the ratio already implies. From II, the two-part gap between Anoop (5 parts) and Sandeep (7 parts) equals 4000, so one part is 2000 and Anoop’s wage is 5 × 2000 = 10000. Hence option B is the correct answer.

Q 5 - What is the difference in the ages of P and L?

I. P is 20 years older than M.

II. M is 2 years younger than Z.

The details given in I and II are not adequate to answer the question.

Q 6 - D is the sister of C. How is D associated with A?

Details in both the points are necessary to get the answer. By using both the points we can get the relationship between D and A.

Q 7 - In a certain coding system, 146 equals adopt good habits. What is the coding for habit in that system?

I. 473 equals like good pictures.

II. 826 equals passion becomes habit.

Point II individually is adequate to answer the question because by comparing the question and the point II, we can get the coding for habit. Hence option B is the correct answer.

Q 8 - P, B, C, D and X are positioned in a line. What is the location of B from the left hand end?

I. X is to the left hand of B.

II. P is positioned at one end second right hand of D who is the next neighbour of C and B.

The details in both points are necessary to get the answer. Hence option E is the correct answer.

Q 9 - How will INDIA be coded? Find out from the points given below.

I. If SALTY is coded as ASLYT.

II. If MANGO is coded as AMNOG.

Either I or II alone is adequate to answer the question. INDIA will be coded as NIDAI.
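The coding rule in Q 9 can be sketched in Python. The rule used here (swap the first two letters, swap the last two letters, keep the middle unchanged) is inferred from the two examples and is an assumption; the function name `encode` is hypothetical.

```python
def encode(word):
    """Swap the first two letters and the last two letters, keeping the middle.

    Assumes a word of at least four letters, as in the examples of Q 9.
    """
    return word[1] + word[0] + word[2:-2] + word[-1] + word[-2]


coded_salty = encode("SALTY")
coded_mango = encode("MANGO")
coded_india = encode("INDIA")
```

Both given examples are reproduced by the same rule, which is why either point alone is enough to code INDIA.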

Q 10 - How many students are there in the class?

I. Dilip is 10th from the right hand and Jagdish is 14th from the left hand.

II. After interchanging their locations, Dilip becomes 27th from the right hand.

Both points are required to get the answer. After the interchange, Dilip stands where Jagdish stood, which is 14th from the left and 27th from the right, so the total number of students is 27 + 14 - 1 = 40. Hence option E is correct.
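The counting step in Q 10 can be checked with a short Python sketch. The helper names are hypothetical; the only fact used is that a position counted from both ends is counted twice.

```python
def total_from_both_ends(from_left, from_right):
    """Row size when one position is known from both ends.

    The student at that position is counted from both sides, hence the -1.
    """
    return from_left + from_right - 1


def position_from_right(total, from_left):
    """Convert a position counted from the left into one counted from the right."""
    return total - from_left + 1


# After the interchange Dilip stands in Jagdish's old place: 14th from the
# left (point I) and 27th from the right (point II).
total_students = total_from_both_ends(14, 27)

# Consistency check: Jagdish's old place, seen from the right.
dilip_new_from_right = position_from_right(total_students, 14)
```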

## A Data & Reasoning Fabric to Enable Advanced Air Mobility

A Data & Reasoning Fabric (DRF) is envisioned to enable the full potential of advanced air mobility by providing all data and reasoning where they are needed. The DRF marketplace is based on an open foundational ecosystem of data and reasoning exchange between the many systems that must seamlessly interplay to manage the complex and dense airspace operations required to achieve advanced air mobility goals. DRF activities will identify, test and - as needed - research and develop critical core technologies, and collaboratively test these technologies, open standards and architectures, and an integrated framework with end-users so as to deliver reference designs and development environments that catalyze broad private and public sector buy-in and self-sustaining development of DRF and associated standards. This paper discusses some of the critical characteristics of establishing and maintaining DRF and addresses how NASA may be a significant contributor to that effort.

## THE CASE

We can start to explore a typical money laundering pattern based on the **concealment of the ultimate beneficial owner** of an asset.

In this case, a person who is issuing a loan request from a bank of which he or she is the ultimate beneficial owner may intend to launder unclean money via the bank. The **ultimate beneficial owner** is the entity that truly controls an asset. And at the same time, we need to specify **all the possible patterns** to conceal the ultimate beneficial owner of an asset, in this case, Acme Bank.

But how can we express the meaning of company control? And how can we generalize all possible paths of control by an individual or another company with ‘something’ a computer can run in a reasonable amount of time?

This is a set of 5 rules written in Vadalog, a language of the Datalog± family that extends Datalog with many useful features such as existential quantification, aggregations, stratified negation, Boolean conditions, mathematical expressions, probabilistic reasoning, embedded functions, and arbitrary machine learning models while guaranteeing scalability thanks to PTIME data complexity for the reasoning task. [4]

With this set of rules, we can easily describe the concept of control of a company.

Let us describe the concept of company control via a set of Datalog rules as follows:

Rule 1 is the reflexive property for the predicate ‘control’. In general, a company (or a person or a family) x **controls** a company y, if:

- (Rule 2) x directly owns more than 50% of y
- or, (Rule 3) x controls a set of companies that jointly (i.e., summing the share amounts), and possibly together with x, own more than 50% of y. [2,3]

We can also assume that the CEO of a company has full control over it (Rule 4). This is of course a simplification but applies to this case. In Rule 5 we see the aggregation function that accumulates, summing them, direct and indirect ownerships along all possible ownership paths.
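The control definition above can be illustrated with a small fixpoint computation. This is a Python sketch, not the Vadalog rules themselves: the function name `controlled_by`, the data layout, and the toy companies A to D are all assumptions made for illustration.

```python
from collections import defaultdict


def controlled_by(owner, own, ceo_at=()):
    """Return the set of entities that `owner` controls (including itself).

    own:    dict mapping (holder, company) -> fraction of shares held directly.
    ceo_at: iterable of (person, company) pairs; a CEO is assumed to have
            full control, mirroring the simplification in the text.
    """
    # Reflexive case, plus any company where the owner is CEO.
    controlled = {owner} | {c for (p, c) in ceo_at if p == owner}
    changed = True
    while changed:  # iterate until no new company is added (fixpoint)
        changed = False
        joint = defaultdict(float)
        # Sum the shares held by the owner and everything it already controls.
        for (holder, company), share in own.items():
            if holder in controlled:
                joint[company] += share
        for company, share in joint.items():
            if share > 0.5 and company not in controlled:
                controlled.add(company)
                changed = True
    return controlled


# Hypothetical toy data: A owns 60% of B; A and B each own 30% of C;
# C owns 40% of D.
toy_own = {("A", "B"): 0.6, ("A", "C"): 0.3, ("B", "C"): 0.3, ("C", "D"): 0.4}
toy_result = controlled_by("A", toy_own)
```

In the toy data, A controls B directly and controls C only jointly with B, while D stays outside A's control because the combined holding is below 50%. A hand-written loop like this is exactly the kind of ad-hoc program the declarative rules are meant to replace.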

With 5 lines of Datalog, we can test thousands of control paths among millions of companies in an AML-KG in minutes if we run the reasoning process on state-of-the-art cloud machines with the Vadalog System, instead of trying to find plausible paths via queries or with ad-hoc programs or algorithms. Also consider that expressing unknown navigation patterns in the graph is not trivial and involves resorting to sophisticated devices such as recursion, beyond the reach of the standard programming skills of the analysts.

Let’s go deeper into the activation of these simple five rules on the FIU data!

This is a partial view of the AML-KG after the reasoning process, combining the EDB and IDB parts of the KG. The solid black edges are those already present in the EDB: they represent the ownership levels between companies, as well as the isCeoAt link. The dotted green Control edge between My Bank and Acme Bank, by contrast, has been inferred by the reasoning. This green link therefore belongs to the derived (IDB) part, inferred through the application of the rules.

For now, we discovered that our bandit does not control Acme Bank. We only know that My Bank controls Acme Bank.

Now, after having tested a very common pattern of money laundering, namely hiding the ultimate beneficial owner of an asset, let’s go further.

Sometimes criminals, especially in organized crime, try to conceal the control of an asset through their affiliates, often even family members or relatives, as is common within Mafia families.

So let’s add some more rules to spot this kind of relationship.

The goal of this other group of rules is to cluster individuals into families that can be real families or just criminal affiliates in a broader sense. In particular, Rule 1 contains a specialized **machine learning model** for link prediction, denoted by the **#sim** embedded function. It returns a **score p** measuring how likely the two individuals i_1 and i_2 are spouses. Observe that the “::” symbol deviates from the standard Datalog syntax and denotes a kind of ‘rule probability’. In particular, Rule 1 yields spouses facts with a probability depending on p.

Rule 3 states that every individual belongs to a family, his/her own, and Rule 4 merges families f_1 and f_2 whenever they contain two spouses, i_1 and i_2. Similar rules could merge families having individuals with different kinds of relationships. The overall effect is **clustering the space of persons**.

Then, we can link the first group of rules with the second group in Rule 5 where we can **aggregate ownership amounts** from different family members.

This is what we can finally reveal, using the reasoning on the available data:

Applying the second group of rules we find out the family members of ‘The bad guy’, in particular his spouse P1. The family also contains P2, P3, and potentially more people. Knowing the family members, we can determine the overall relationship of the family f with Acme Bank. To this aim, Rule 5 aggregates ownership amounts originating from different family members that together possibly control the asset with all the different contributions.

We can finally conclude that ‘The bad guy’ does not control Acme Bank **but** he is concealing the control of Acme Bank through his Mafia family. P2 directly owns 0.34 (34%) of My Bank, and P1 indirectly owns about 0.21 (21%) of My Bank, obtained as the product 1 × 0.93 × 0.23 of the shares along the ownership path. In total, family f controls My Bank by owning 0.55 (55%) of the shares. My Bank, in turn, controls Acme Bank, holding 0.52 (52%) of the shares via a **pyramidal shareholding structure**, probably set up to obfuscate the connection between the two companies.
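The arithmetic behind this conclusion can be checked with a few lines of Python. The sketch reads the quoted figures as fractions of the share capital, which is an assumption (it is the reading under which the more-than-50% control threshold makes sense); the variable and function names are hypothetical.

```python
from math import prod


def path_share(fractions):
    """Indirect ownership along one path: the product of the shares on it."""
    return prod(fractions)


p2_direct = 0.34                              # P2's direct stake in My Bank
p1_indirect = path_share([1.0, 0.93, 0.23])   # P1's stake via the ownership chain
family_share = p2_direct + p1_indirect        # contributions of family members add up

family_controls_my_bank = family_share > 0.5
my_bank_controls_acme = 0.52 > 0.5
```

The product along P1's path is 0.2139, so the family's combined stake is about 0.55, just above the control threshold even though neither member reaches it alone.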

Family f controls Acme Bank and ‘The bad guy’ was trying to conceal the control of Acme Bank through his family. So, the trigger of the case, the initial STR containing only the transaction in which ‘The bad guy’ asks for a loan from Acme Bank, the bank that he indirectly controls, is **probably** an attempt to launder money by justifying unclean money with a fake loan. The overall confidence in this conclusion depends on the certainty in the existence of the personal relationship, the output of a link prediction model, as well as on the intrinsic reliability of the money laundering pattern.

Remember that the goal here is deciding on the suspiciousness of this STR and as a consequence, assessing a score of the suspiciousness. To settle this score, we can use this rule:

This rule tells us our individual is not literally the ultimate beneficial owner of Acme Bank BUT his family as a whole **is**. Moreover, as we have seen, the ‘w’ in the left-hand side of the rule controls the bias towards activating the rule. It is in some sense a measure of the importance of the rule and, consequently, controls the likelihood of the suspicion.

Here is the full set of 11 rules used for the explanation of this case:


### What is a fallacy in mathematical reasoning?

A fallacy is an error in reasoning caused by logical inaccuracy.

### Why is mathematical reasoning important?

Students have the potential to solve higher-order thinking questions, which are frequently asked in competitive examinations. But a lack of mathematical reasoning skills may leave that potential unrealized. Encouragement is needed to develop a student's natural inclination to strive for purpose and meaning.

Reasoning is the most fundamental and essential tool of mathematics. It helps one understand and justify mathematical theorems. A good grip on reasoning will help students apply the concepts they learn in the classroom.

### What are the two types of fallacy?

The two types of fallacies are as follows:

Formal fallacy: A formal fallacy arises when the relationship between premises and conclusion is not valid or when the premises are unsound.

Informal fallacy: Misuse of language and evidence is classified as an informal fallacy.