# COMPLETE IDIOTS GUIDE STATISTICS PDF


| Field | Value |
| --- | --- |
| Author | HESTER LANDEN |
| Language | English, Spanish, Arabic |
| Country | Japan |
| Genre | Religion |
| Pages | 188 |
| Published (Last) | 23.04.2016 |
| ISBN | 871-3-56612-991-1 |
| ePub File Size | 17.57 MB |
| PDF File Size | 13.26 MB |
| Distribution | Free* [*Registration Required] |
| Downloads | 42149 |
| Uploaded by | BESS |

Not a numbers person? No problem! What follows is a selection from The Complete Idiot's Guide to Statistics, 2nd Edition.

Inferential statistics, because it would not be feasible to survey every Asian American household in the country. These results would be based on a sample of the population and used to make an inference on the entire population.

Inferential statistics, because it would not be feasible to survey every household in the country. Descriptive statistics, because the average SAT score would be based on the entire population, which is the incoming freshman class.

- Interval data has all the properties of ordinal data with the additional capability of calculating meaningful differences between the observations.
- Ratio data has all the properties of interval data with the additional capability of expressing one observation as a multiple of another.

In its basic form, making sense of the patterns in the data can be very difficult because our human brains are not very efficient at processing long lists of raw numbers.

We do a much better job of absorbing data when it is presented in summarized form through tables and graphs.

In the next several sections, we will examine many ways to present data so that it will be more useful to the person performing the analysis. Through these techniques, we are able to get a better overview of what the data is telling us. And believe me, there is plenty of data out there with some very interesting stories to tell.

Stay tuned. The best way to describe a frequency distribution is to start with an example. Anyway, below is a table of the batting averages of individual Pirates at the end of the season. I have not attached names with these averages in order to protect their identities.

A frequency distribution is a table that shows the number of data observations that fall into specific intervals. Transforming this data into the frequency distribution shown here makes this fact more obvious.

In this example, the intervals are the batting average ranges in the first column of the table (Batting Average), and the second column (Number of Players) counts the players in each range. The intervals in a frequency distribution are officially known as classes, and the number of observations in each class is known as the class frequency.

A very confusing phone bill that requires a Ph.

Using this data, I have constructed the following frequency distribution.

- Form classes of equal size. I chose 3 data values to be in each class for this distribution. An example of a class is 0–2, which includes the number of days when John made 0, 1, or 2 calls.
- Make classes mutually exclusive, or in other words, prevent classes from overlapping. Classes are considered mutually exclusive when observations can only fall into one class.
- Try to have no fewer than 5 classes and no more than 15 classes. Too few or too many classes tend to hide the true characteristics of the frequency distribution.
- Avoid open-ended classes, if possible (for instance, a highest class of 15–over).
- Include all data values from the original table in a class. In other words, the classes should be exhaustive.

Too few or too many classes will obscure patterns in a frequency distribution.

Consider the extreme case where there are so many classes that no class has more than one observation. The other extreme is where there is only one class with all the observations residing in that class. This would be a pretty useless frequency distribution!

Rather than display the number of observations in each class, a relative frequency distribution calculates the percentage of observations in each class by dividing the frequency of each class by the total number of observations.

Relative frequency distributions display the percentage of observations of each class relative to the total number of observations. The total percentage in a relative frequency distribution should be 100 percent, or very close to it (within 1 percent), because of rounding errors.

Get it? Cousin, relative?

A cumulative frequency distribution provides you with the percentage of observations that are less than or equal to the class of interest. The resulting cumulative frequency distribution is shown here. According to this table, John used his phone 8 times or less on 84 percent of the days in the month. If designed properly, a frequency distribution is an excellent way to tease good information out of stubborn data.
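The frequency, relative frequency, and cumulative frequency distributions described above can be sketched in a few lines of Python. This is only an illustration: the call counts below are hypothetical values chosen for the sketch (the original table is not reproduced here), and the helper name `frequency_table` is an assumption, not something from the book.

```python
from collections import Counter

# Hypothetical daily call counts for a 30-day month (illustrative data only).
calls = [3, 1, 2, 1, 1, 3, 9, 1, 4, 2, 6, 4, 9, 13, 15,
         2, 5, 5, 2, 7, 3, 0, 1, 2, 7, 1, 8, 6, 9, 4]

CLASS_WIDTH = 3  # each class holds 3 values: 0-2, 3-5, 6-8, ...


def frequency_table(data, width):
    """Return rows of (class label, frequency, relative %, cumulative %)."""
    # Integer division maps every value to exactly one class, so the
    # classes are mutually exclusive and exhaustive.
    counts = Counter(value // width for value in data)
    total = len(data)
    rows = []
    running = 0
    for index in range(max(counts) + 1):
        freq = counts.get(index, 0)
        running += freq
        low = index * width
        label = f"{low}-{low + width - 1}"
        rows.append((label, freq, 100 * freq / total, 100 * running / total))
    return rows


table = frequency_table(calls, CLASS_WIDTH)
for label, freq, relative, cumulative in table:
    print(f"{label:>6} {freq:>3} {relative:6.1f}% {cumulative:6.1f}%")
```

The last cumulative percentage should always reach 100, which is a quick check that no observation was left out of a class.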

Figure 3. A histogram is a bar graph showing the number of observations in each class as the height of each bar. At least the highest class on the graph is the 0 to 2 calls per day.

Things could be worse. How nice! The first thing we need to do is open Excel to a blank sheet and enter our data in Column A starting in Cell A1 (use the data from the earlier table). Next, enter the upper limits of each class in Column B starting in Cell B1.


For example, in the class 0–2, the upper limit would be 2.

Random Thoughts: For some bizarre reason, Excel refers to classes as bins. Go figure.

Go to the Tools menu at the top of the Excel window and select Data Analysis. Select the Histogram option from the list of Analysis Tools (see Figure 3). In the Histogram dialog box (as shown in Figure 3), click in the Bin Range list box and click in the worksheet to select cells B1 through B6 (the upper limits for the 6 classes).

Click OK to generate the frequency distribution and histogram (see Figure 3).

The problem here is that the histogram looks like an elephant sat on it. Click on the chart to select it, and then click on the bottom border to drag the bottom of the chart down lower, expanding the histogram to look like Figure 3.

Frequency distributions and histograms are convenient ways to get an accurate picture of what your data is trying to tell you. The Chart Wizard allows me more control over the final appearance.

Another way to display data is the stem and leaf display. A statistician named John Tukey originated the idea.

## The Complete Idiot's Guide to Statistics, 2nd Edition

The major benefit of this approach is that all the original data points are visible on the display. Normally, Brian would only report his better scores, but we statisticians must be unbiased and accurate. Because there were 5 scores in the 70s, there are 5 digits to the right of 7.

If we choose to, we can break this display down further by adding more stems.

The stem and leaf display splits the data values into stems (the first digit in the value) and leaves (the remaining digits in the value). By listing all of the leaves to the right of each stem, we can graphically describe how the data is distributed.
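A stem and leaf display is easy to sketch in Python. The scores below are hypothetical values chosen for illustration (not the scores from the text), and `stem_and_leaf` is an assumed helper name. The sketch handles two-digit values only, using the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

# Hypothetical golf scores (illustrative data only).
scores = [81, 80, 78, 84, 91, 75, 83, 79, 82, 88, 74, 86, 77, 72, 95]


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) with sorted ones digits (leaves)."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(scores)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Because every leaf is printed, all the original data points remain visible, which is the main benefit of this display over a histogram.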

Here, the stem labeled 7 (5) stores all the scores between 75 and 79. The stem 8 (0) stores all the scores between 80 and 84. Brian usually scores in the low 80s. You can find an excellent source of information about stem and leaf displays at the Statistics Canada website at www.

This type of chart is simply a circle divided into portions whose areas are proportional to the relative frequencies of the classes. If you cannot use colors, use patterns and textures to display pie charts.

As you can see, the pie chart approach is much easier on the eye when compared to looking at data from a table. This person must be a pretty good statistics teacher! To construct a pie chart by hand, you first need to calculate the center angle for each slice in the pie, which is illustrated in Figure 3.

You determine the center angle of each slice by multiplying the relative frequency of the class by 360 (which is the number of degrees in a circle). These results are shown in the following table.

To demonstrate this type of chart see Figure 3. An unnamed filing cabinet. The histogram that we visited earlier in the chapter is actually a special type of bar chart that plots frequencies rather than actual data values.
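The center-angle rule for pie charts (relative frequency times 360 degrees) can be checked with a short Python sketch. The grade counts are hypothetical and `center_angles` is an assumed helper name used only for this example.

```python
# Hypothetical class frequencies (illustrative data only).
frequencies = {"A": 9, "B": 13, "C": 6, "D": 2}


def center_angles(freqs):
    """Return each slice's center angle in degrees: relative frequency x 360."""
    total = sum(freqs.values())
    return {label: 360 * count / total for label, count in freqs.items()}


angles = center_angles(frequencies)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles should add up to 360 degrees, just as the relative frequencies add up to 100 percent.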

Bar charts are more useful when you want to highlight the actual data values. Our current resident teenagers seem to have a costly compulsion to take very long, very hot showers, and sometimes more than once a day. As I lie awake at night listening to the constant stream of hot water, all I can envision are dollar bills flowing down the drain. So I have tabulated some data, which shows the number of showers the cleanest kids on the block have taken in each of the recent months with the corresponding utility bill.

Notice that at these rates we average more than one shower per day. Because the line connecting the data points seems to have an overall upward trend, my suspicions hold true.

It seems the more showers our waterlogged darlings take, the higher the utility bill. Line charts prove very useful when you are interested in exploring patterns between two different types of data.

They are also helpful when you have many data points and want to show all of them on one graph. Now that you have mastered the art of displaying descriptive statistics, you are ready to move on to calculating them in the next chapter. The following table represents the exam grades from 36 students from a certain class that I might have taught.

1. Construct a frequency distribution with 9 classes ranging from 56 to 100.
2. Construct a histogram using the solution from Problem 1.
3. Construct a relative and a cumulative frequency distribution from the data in Problem 1.
4. Construct a pie chart from the solution to Problem 1.
5. Construct a stem and leaf diagram from the data in Problem 1 using one stem for the scores in the 50s, 60s, 70s, 80s, and 90s.
6. Construct a stem and leaf diagram from the data in Problem 1 using two stems for the scores in the 50s, 60s, 70s, 80s, and 90s.

- Histograms provide a graphical overview of data from frequency distributions.
- Pie, bar, and line charts are effective ways to present data in different graphical forms.

With that task behind us, we can now proceed to the next step: summarizing our data numerically. If these summary statistics are not calculated with loving care, our final analysis could be misleading. And as everybody knows, statisticians never want to be misleading. So this chapter focuses on how to calculate descriptive statistics manually and, if you choose, how to verify these results with our good friend Excel.

This is the first chapter that uses mathematical formulas that have all those funny-looking Greek symbols that can make you break out into a cold sweat. But have no fear. We will slay these demons one by one through careful explanation and, in the end, victory will be ours.

Descriptive statistics fall into two major categories. The first, measures of central tendency, describes the center point of our data set with a single value.

The second category, measures of dispersion, is the topic of Chapter 5. Measures of central tendency describe the center point of a data set with a single value. Measures of dispersion describe how far individual data values have strayed from the mean.

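The formula for the sample mean is as follows:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_i$ are the data values and $n$ is the number of values in the sample.

As in many teenage households, video games are a common form of entertainment in our family room.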

Turns out the delay was really between my brain and my fingers. Anyway, the following data set represents the number of hours each week that video games are played in our household. A weighted mean refers to a mean that needs to go on a diet. Just kidding; I was checking to see whether you were paying attention.

A weighted mean allows you to assign more weight to certain values and less weight to others.

| Type | Score | Weight (Percent) |
| --- | --- | --- |
| Exam | 94 | 50 |
| Project | 89 | 35 |
| Homework | 83 | 15 |

We can calculate your final grade using the following formula for a weighted average:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{(50)(94) + (35)(89) + (15)(83)}{50 + 35 + 15} = 90.6$$

Note that here we are dividing by the sum of the weights rather than by the number of data values. You earned an A! The weights in a weighted average do not need to add to 1 as in the previous example.
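As a quick check of the weighted average, here is a minimal Python sketch using the exam, project, and homework scores and weights from the table above. The function name `weighted_mean` is an assumption made for this example.

```python
# Scores and percentage weights taken from the grade table in the text.
scores = {"Exam": 94, "Project": 89, "Homework": 83}
weights = {"Exam": 50, "Project": 35, "Homework": 15}


def weighted_mean(values, weights):
    """Sum of weight x value, divided by the sum of the weights."""
    total_weight = sum(weights.values())
    return sum(values[key] * weights[key] for key in values) / total_weight


final_grade = weighted_mean(scores, weights)
print(f"Final grade: {final_grade:.1f}")
```

Because the function divides by the sum of the weights, it gives the same answer whether the weights are written as 50, 35, 15 or as 0.50, 0.35, 0.15.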

I would calculate this by:

You can actually calculate the mean of grouped data from a frequency distribution. After we have determined the midpoint for each class, we can calculate the mean of the frequency distribution using the following equation, which is basically a weighted average formula:

$$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{\sum_{i=1}^{m} f_i}$$

where $x_i$ is the midpoint of class $i$, $f_i$ is its frequency, and $m$ is the number of classes.

Wrong Number: The mean of a frequency distribution where data is grouped into classes is only an approximation to the mean of the original data set from which it was derived. This is true because we make the assumption that the original data values are at the midpoint of each class, which is not necessarily the case. The true mean of the 30 original data values in the cell phone example is only 4.

What is the average daily demand during the past 65 days?

The median is the value in the data set for which half the observations are higher and half the observations are lower.
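The grouped-data mean discussed above is a weighted average of the class midpoints, with the class frequencies as weights. The sketch below assumes a hypothetical frequency distribution with classes of width 3; the numbers and the name `grouped_mean` are illustrative only.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(0, 2, 12), (3, 5, 8), (6, 8, 5), (9, 11, 3), (12, 14, 1), (15, 17, 1)]


def grouped_mean(classes):
    """Approximate the mean by treating every value as its class midpoint."""
    total = sum(freq for _, _, freq in classes)
    weighted_sum = sum(freq * (low + high) / 2 for low, high, freq in classes)
    return weighted_sum / total


approx_mean = grouped_mean(classes)
print(f"Approximate mean from grouped data: {approx_mean:.2f}")
```

The result is only an approximation, because the original values rarely sit exactly at the midpoint of their class.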

We find the median by arranging the data values in ascending order and identifying the halfway point. In this case, that will be the values 5 and 6, resulting in a median of 5.5. Notice that there are four data values to the left (3, 4, 4, and 4) of these center points and four data values to the right (7, 7, 9, and 17).

The median is a measure of central tendency that represents the value in the data set for which half the observations are higher and half the observations are lower.

When there is an even number of data points, the median will be the average of the two center points. Therefore, the median for this data set is 5 hours of video games per week. Again, there are four data values to the left and right of this center point. That wraps up all the different ways to measure central tendency of our data set.
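Python's standard `statistics` module can confirm these measures of central tendency. In the sketch below, the first nine values are the ones quoted in the text; the tenth value (17) is an assumption, inferred from the later discussion of 17 as an unusually large value in this data set.

```python
import statistics

# Weekly hours of video games. The value 17 is an assumption (see lead-in).
hours = [3, 4, 4, 4, 5, 6, 7, 7, 9, 17]

mean_hours = statistics.mean(hours)
median_hours = statistics.median(hours)   # average of the two center values
modes = statistics.multimode(hours)       # list, since there can be several modes

print(f"mean = {mean_hours}, median = {median_hours}, mode(s) = {modes}")
```

With an even number of values, `statistics.median` averages the two center points, matching the rule described above.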

However, one question is screaming to be answered, and that is …

Random Thoughts: There can be more than one mode of a data set if more than one value occurs the greatest number of times.

If you think all the data in your data set is relevant, then the mean is your best choice. This measurement is affected by both the number and magnitude of your values. However, very small or very large values can have a significant impact on the mean, especially if the size of the sample is small.

If this is a concern, perhaps you should consider using the median. The median is not as sensitive to a very large or small value. Consider the following data set from the original video game example: The mean of this sample was 6.

If you think 17 is not a typical value that you would expect in this data set, the median would be your best choice for central tendency. The poor lonely mode has limited applications. It is primarily used to describe data at the nominal scale—that is, data that is grouped in descriptive categories such as gender. If 60 percent of our survey respondents were male, then the mode of our data would be male. To begin, open a blank Excel worksheet and enter the video game data Figure 4.


Click on the Tools menu at the top of the spreadsheet between Format and Data and select Data Analysis. After selecting Data Analysis, you should see the dialog box shown in Figure 4.

Select Descriptive Statistics and click OK. The Descriptive Statistics dialog box will appear (Figure 4). Then choose the Summary statistics check box and click OK. After you expand columns C and D slightly to see all the figures, your spreadsheet should look like Figure 4.

As you can see, the mean is 6. Piece of cake! Calculate the mean, median, and mode for the following data set: A company counted the number of their employees in each of the following age classes. According to this distribution, what is the average age of the employees in the company?


| Age Range | Number of Employees |
| --- | --- |
| 20–24 | 8 |
| 25–29 | 37 |
| 30–34 | 25 |
| 35–39 | 48 |
| 40–44 | 27 |
| 45–49 | 10 |

6. Calculate the weighted mean of the following values with the corresponding weights.

Value Weight 3 2 1

7. A company counted the number of employees at each level of years of service in the following table. What is the average number of years of service in this company?

- The median of a data set is the midpoint of the set if the values are arranged in ascending or descending order.
- The median is the single center value from the data set if there is an odd number of values in the set. The median is the average of the two center values if the number of values in the set is even.
- The mode of a data set is the value that appears most often in the set. There can be more than one mode in a data set.

But in doing so, we lost information that could be useful. For the video game example, if the only information I provided you was that the mean of my sample was 6.

As you will see later, this distinction can be very important. To address this issue, we rely on the second major category of descriptive statistics, measures of dispersion, which describes how far the individual data values have strayed from the mean. Recently, we needed to purchase a new grill since our old one mysteriously caught fire one night when I was at school teaching.

The best part about this kind of grill is that it has about four parts to assemble, which is something I can easily put together in 3 or 4 weeks. As protection measurement. Anyway, the following data set represents the number of meals each month that Debbie cranks up on the turbo-charged grill:

However, the limitation of the range is that it relies on only two data points to describe the variation in the sample. No other values between the highest and lowest points are part of the range calculation.

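The formula for the sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

This measure is widely used in inferential statistics. The rest of the calculations can be facilitated by the following table.

The raw score method gives the same result:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

Even though at first glance this equation may look more imposing, its bark is much worse than its bite.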

Check it out and decide for yourself what works best for you. Let me lay this out in the following table to prove to you there are fewer calculations than with the previous method. The benefits of the raw score method become more obvious as the size of the sample n gets larger.
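To see that the raw score method and the deviation-from-the-mean method agree, here is a small Python sketch. It assumes the usual textbook form of the raw score formula (the sum of squared values minus the squared sum divided by n, all over n - 1). The sample values are illustrative ones chosen for this example, and both function names are assumptions.

```python
# Illustrative sample values chosen for this sketch.
sample = [7, 9, 8, 11, 4]


def variance_definitional(data):
    """Sample variance from squared deviations around the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def variance_raw_score(data):
    """Sample variance from the sums of x and x squared (no mean needed first)."""
    n = len(data)
    sum_x = sum(data)
    sum_x_squared = sum(x * x for x in data)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)


print(variance_definitional(sample), variance_raw_score(sample))
```

The raw score version needs only two running totals, which is why it involves fewer hand calculations as the sample grows.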

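The good news is the variance of a population is calculated in the same manner as the sample variance. The bad news is I need to introduce another funny-looking Greek symbol: $\sigma$ (sigma). The equation for the variance of a population is as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean and $N$ is the number of values in the population.

Can you guess which one is me?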

My age adds a little spice to the variance.

The standard deviation is simply the square root of the variance. Just as with the variance, there is a standard deviation for both the sample and population, as shown in the following equations.

Sample standard deviation:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Population standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Recall from the previous sections that the variance from my sample of the number of meals Debbie grilled per month was 6. The standard deviation of this sample is as follows: The standard deviation of the age of this population is as follows: In comparison, the units of the variance for the grill example would be 6.
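The difference between the sample and population versions comes down to the denominator (n - 1 versus N). Python's `statistics` module provides both, as this sketch shows. The data values are hypothetical and chosen so the population results come out to round numbers.

```python
import math
import statistics

# Hypothetical data set (illustrative only); its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_variance = statistics.variance(data)       # divides by n - 1
population_variance = statistics.pvariance(data)  # divides by N
sample_std = statistics.stdev(data)               # square root of sample variance
population_std = statistics.pstdev(data)          # square root of population variance

print(f"sample:     variance = {sample_variance:.3f}, std dev = {sample_std:.3f}")
print(f"population: variance = {population_variance:.3f}, std dev = {population_std:.3f}")
```

The sample figures are always a little larger than the population figures for the same numbers, because dividing by n - 1 gives a bigger result than dividing by N.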

From Chapter 4, we know the mean of this frequency distribution is this: The frequency of these potty breaks must keep Debbie very busy.

When this is the case, the empirical rule (sounds like a decree from the emperor) tells us that approximately 68 percent of the data values will be within one standard deviation from the mean.

For example, suppose that the average exam score for my large statistics class is 88 points and the standard deviation is 4. The empirical rule applies if the distribution follows a bell shape, a symmetrical curve centered around the mean. In our example, two standard deviations equal 8, so approximately 95 percent of the exam scores should fall between 80 and 96 (Figure 5: two standard deviations around the mean).

Three standard deviations equal 12. In this case, I would expect all the exam scores to be between 76 and 100 (Figure 5: three standard deviations around the mean, plotted against the number of students). In general, the interval $\mu \pm k\sigma$ covers $k$ standard deviations around the mean. We will revisit the empirical rule concept in subsequent chapters.
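The empirical rule intervals for the exam example (mean 88, standard deviation 4) can be tabulated with a few lines of Python. The 68 and 95 percent figures come from the text; the 99.7 percent figure for three standard deviations is the usual textbook value and is an assumption here. The names are illustrative.

```python
MEAN = 88
STD_DEV = 4

# Approximate share of values within k standard deviations (bell-shaped data only).
# 99.7 is the commonly quoted value for k = 3 and is an assumption in this sketch.
EMPIRICAL_PERCENT = {1: 68, 2: 95, 3: 99.7}


def interval(mean, std_dev, k):
    """Return the (low, high) range covering k standard deviations around the mean."""
    return (mean - k * std_dev, mean + k * std_dev)


for k, percent in EMPIRICAL_PERCENT.items():
    low, high = interval(MEAN, STD_DEV, k)
    print(f"about {percent}% of scores between {low} and {high}")
```

These percentages only apply when the distribution is roughly bell-shaped and symmetrical.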

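For any distribution, at least $\left(1 - \frac{1}{k^2}\right) \times 100$ percent of the data values will fall within $k$ standard deviations of the mean, for $k > 1$. Using this equation, we can state the following:

- At least 75 percent of the data values will fall within 2 standard deviations of the mean.
- At least 88.9 percent of the data values will fall within 3 standard deviations of the mean.
- At least 93.75 percent of the data values will fall within 4 standard deviations of the mean.

This last example is shown as:

$$\left(1 - \frac{1}{4^2}\right) \times 100 = 93.75 \text{ percent}$$

The following table shows the number of home runs hit by the top 40 players in Major League Baseball during the season.

Number of Home Runs from Top 40 Players

| | | | |
| --- | --- | --- | --- |
| 57 | 38 | 33 | 29 |
| 52 | 38 | 32 | 29 |
| 49 | 37 | 31 | 29 |
| 46 | 37 | 31 | 28 |
| 43 | 35 | 31 | 28 |
| 42 | 34 | 30 | 28 |
| 42 | 34 | 30 | 28 |
| 41 | 34 | 30 | 28 |
| 39 | 33 | 29 | 27 |
| 39 | 33 | 29 | 27 |

The following histogram shows that this distribution is neither bell-shaped nor symmetrical, so we cannot apply the empirical rule (see Figure 5).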
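The home run data can be compared against the minimum percentages given by the rule above, which is generally known as Chebyshev's theorem (a name not used in the surviving text). This is a sketch under one stated assumption: the 40 players are treated as a population, so the population standard deviation is used. The function names are illustrative.

```python
import statistics

# Home runs for the top 40 players, as listed in the text.
home_runs = [57, 38, 33, 29, 52, 38, 32, 29, 49, 37, 31, 29, 46, 37, 31, 28,
             43, 35, 31, 28, 42, 34, 30, 28, 42, 34, 30, 28, 41, 34, 30, 28,
             39, 33, 29, 27, 39, 33, 29, 27]

mean = statistics.mean(home_runs)
std_dev = statistics.pstdev(home_runs)  # assumption: treat the data as a population


def chebyshev_minimum(k):
    """Minimum percent of values within k standard deviations of the mean (k > 1)."""
    return (1 - 1 / k ** 2) * 100


def observed_percent(data, center, spread, k):
    """Percent of values that actually fall within k standard deviations."""
    inside = [x for x in data if center - k * spread <= x <= center + k * spread]
    return 100 * len(inside) / len(data)


for k in (2, 3, 4):
    actual = observed_percent(home_runs, mean, std_dev, k)
    print(f"k={k}: at least {chebyshev_minimum(k):.2f}%, observed {actual:.1f}%")
```

The observed percentages can be well above the guaranteed minimums; the theorem only promises a lower bound that holds for any distribution shape.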

The mean for this distribution is The following table summarizes various intervals around the mean with the percentage of values within those intervals. From the data set, we can observe that 95 percent actually fall between The same explanation holds true for three and four standard deviations around the mean. This technique includes quartile and interquartile measurements.

Approximately 25 percent of the data points will fall below the first quartile, Q1. Approximately 50 percent of the data points will fall below the second quartile, Q2. And, you guessed it, 75 percent should fall below the third quartile, Q3.

1. Arrange your data in ascending order.
2. Find the median of the data set. This is Q2.
3. Find the median of the lower half of the data set (shown in parentheses). This is Q1.
4. Find the median of the upper half of the data set (shown in parentheses). This is Q3.

The interquartile range (IQR) is simply the difference between the third and first quartiles, as follows:

$$IQR = Q_3 - Q_1$$

The interquartile range is also used to identify outliers. These are extreme values whose accuracy is questioned and can cause unwanted distortions in statistical results. Any values that are more than $Q_3 + 1.5\,IQR$ or less than $Q_1 - 1.5\,IQR$ should be considered outliers. Therefore, the value 10 would be an outlier in this data set.

Use the exact same steps to calculate these measures as those used to calculate measures of central tendency shown in Chapter 4. As you can see from Figure 5. Also note that this data set has no mode since no value appears more than once.
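The quartile steps and the outlier check can be sketched in Python as follows. Two assumptions are made: when the data set has an odd number of values, the median itself is left out of both halves, and the outlier limits sit 1.5 interquartile ranges beyond the quartiles (the common convention). The data and function names are hypothetical.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Assumption: with an odd count, exclude the median from the upper half too.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)


def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) at 1.5 x IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical data set (illustrative only).
data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 30]

q1, q2, q3 = quartiles(data)
low_limit, high_limit = outlier_limits(q1, q3)
outliers = [x for x in data if x < low_limit or x > high_limit]

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={q3 - q1}")
print(f"limits: {low_limit} to {high_limit}, outliers: {outliers}")
```

Other software may use slightly different quartile conventions, so results can differ a little from spreadsheet functions on small data sets.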


This wraps up our discussion on the different ways to describe measures of dispersion.

Wrong Number: The values for variance and standard deviation reported by Excel are for a sample. If your data set represents a population, you need to recalculate the results using N in the denominator rather than n - 1.

Calculate the variance, standard deviation, and the range for the following sample data set:

Calculate the variance, standard deviation, and the range for the following population data set:

Calculate the quartiles and the cutoffs for the outliers for the following data set:

A company counted the number of their employees in each of the age classes as follows.

A company counted the number of employees at each level of years of service in the table that follows. What is the standard deviation for the number of years of service in this company?

| Years of Service | Number of Employees |
| --- | --- |
| 1 | 5 |
| 2 | 7 |
| 3 | 10 |
| 4 | 8 |
| 5 | 12 |
| 6 | 3 |

7. A data set that follows a bell-shaped and symmetrical distribution has a mean equal to 75 and a standard deviation equal to What range of values centered around the mean would represent 95 percent of the data points?

A data set that is not bell-shaped and symmetrical has a mean equal to 50 and a standard deviation equal to 6. What is the minimum percent of values that would fall between 38 and 62?

- The variance of a data set summarizes the squared deviation of each data value from the mean.
- The standard deviation of a data set is the square root of the variance and is expressed in the same units as the original data values.
- The empirical rule states that if a distribution follows a bell shape, a symmetrical curve centered around the mean, we would expect approximately 68, 95, and 99.7 percent of the data values to fall within one, two, and three standard deviations of the mean, respectively.
- The interquartile range measures the spread of the center half of the data set and identifies outliers, which are extreme values that you should discard before analysis.

I know the topic of probability theory scares the living daylights out of many students, but it is a very important topic in the world of statistics.

The topic of probability acts as a critical link between descriptive and inferential statistics. Without a firm grasp of probability concepts, inferential statistics will seem like a foreign language. Because of this, Part 2 is designed to help you over this hurdle. But before we enter that realm, we need to arm ourselves with probability theory.

Accurately predicting the probability that an event will occur has widespread applications. For instance, the gaming industry uses probability theory to set odds for lotteries, card games, and sporting events. The focus of this chapter is to start with the basics of probability, after which we will gently proceed to more complex concepts in Chapters 7 and 8.

When I see that the weather forecast shows an 80 percent chance of rain tomorrow and I want to play golf, or that my beloved Pittsburgh Pirates have only won 40 percent of their games this year (which they also did last year and the year before that), there is a 65 percent chance I will get moody. In simple terms, probability is the likelihood of a particular event (like rain or winning a ballgame).

- Experiment. The process of measuring or observing an activity for the purpose of collecting data. An example is rolling a pair of dice.
- Outcome. A particular result of an experiment. An example is rolling a pair of threes with the dice.
- Sample space. All the possible outcomes of the experiment.
- Event. One or more outcomes of the experiment. An example is rolling a total of 2, 3, 4, or 5 with two dice.

To properly define probability, we need to consider which type of probability we are referring to. I have underlined the outcomes that correspond to Event A. There is a total of 10 of them. You also need to know the total number of possible outcomes in the sample space.
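Classical probability for the two-dice event above (a total of 2, 3, 4, or 5) can be verified by listing the whole sample space in Python. The variable names are illustrative; the count of 10 matching outcomes agrees with the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of faces from two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two dice total 2, 3, 4, or 5.
event_a = [roll for roll in sample_space if sum(roll) in {2, 3, 4, 5}]

# Classical probability: outcomes in the event / outcomes in the sample space.
probability_a = Fraction(len(event_a), len(sample_space))

print(f"{len(event_a)} of {len(sample_space)} outcomes, P(A) = {probability_a} "
      f"(about {float(probability_a):.3f})")
```

This counting approach only works when every outcome in the sample space is equally likely, as it is for fair dice.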

To use classical probability, you need to understand the underlying process so you can determine the number of outcomes associated with the event. You also need to be able to count the total number of possible outcomes in the sample space. As you will see next, this may not always be possible. This type of probability observes the number of occurrences of an event through an experiment and calculates the probability from a relative frequency distribution.

The following table indicates the number of wake-up calls John required over the past 20 school days. Empirical probability requires that you count the frequency that an event occurs through an experiment and calculate the probability from the relative frequency distribution. Using the previous table, we can also examine the probability of other events. That boy needs to go to bed earlier on school nights! This is calculated using classical probability. Compare this to the probability that you will be struck by lightning once during your lifetime, which is 0.

This is an empirical probability determined by the number of times people have been struck by lightning in the past. According to these statistics, you are more than 5, times more likely to be struck by lightning than win the lottery!

However, if I were to observe days of this data, the relative frequencies would approach the true or classical probabilities of the underlying process.

This pattern is known as the law of large numbers. For this experiment, the empirical probability for the event heads is percent. However, if I were to flip the coin times, I would expect the empirical probability of this experiment to be much closer to the classical probability of 50 percent.
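The law of large numbers can be illustrated with a simulated coin. This is a minimal sketch: it assumes a fair coin modeled with Python's pseudo-random generator, the seed is an arbitrary choice for repeatability, and `empirical_heads` is an assumed helper name.

```python
import random


def empirical_heads(flips, rng):
    """Flip a simulated fair coin and return the observed proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


rng = random.Random(42)  # fixed seed, chosen arbitrarily so runs are repeatable
for flips in (10, 100, 1_000, 100_000):
    proportion = empirical_heads(flips, rng)
    print(f"{flips:>7} flips: empirical probability of heads = {proportion:.4f}")
```

With only a handful of flips the proportion can be far from 0.5, but it tends to settle near the classical probability as the number of flips grows.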

The law of large numbers states that when an experiment is conducted a large number of times, the empirical probabilities of the process will converge to the classical probabilities.

With subjective probability, we rely on experience and intuition to estimate the probabilities.

The basic properties of probability are as follows:

- The probability of Event A must be between 0 and 1.
- The sum of all the probabilities for the events in the sample space must be equal to 1.

Using this definition, we can state the following: For example, if the experiment is rolling a single six-sided die, the sample space is shown in Figure 6.

To demonstrate this technique, I will use the following example. Now that my children are older and living away from home, I cherish those moments when the phone rings and I see one of their numbers appear on my caller ID.

The following table, called a contingency table, categorizes the last 50 phone calls by child and type of call. In this case, the data types are child and type of call. We can use the contingency table to calculate the simple probability that the next phone call will come from Christin as follows: The intersection of Events A and B represents the number of instances where Events A and B occur at the same time that is, the same phone call is both from Christin and a crisis.

The probability of the intersection of two events is known as a joint probability. What about the probability that the next phone call will come from Christin and will involve a crisis?

The number of phone calls from our contingency table that meet both criteria is 14, so:

$$P(A \text{ and } B) = \frac{14}{50} = 0.28$$

Using our previous example, the following table shows the four combinations that include either a call from Christin or a crisis phone call.
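Simple, joint, and "either/or" probabilities can all be read from a contingency table. In the sketch below, only two numbers come from the text: the 14 crisis calls from Christin and the total of 50 calls. Every other cell, and the grouping into "other child" and "other" call types, is a hypothetical assumption made so the example runs.

```python
from fractions import Fraction

# (caller, type of call) -> count. Only the 14 and the total of 50 are from
# the text; the remaining cells are hypothetical.
calls = {
    ("Christin", "crisis"): 14,
    ("Christin", "other"): 6,
    ("other child", "crisis"): 12,
    ("other child", "other"): 18,
}
total = sum(calls.values())


def probability(condition):
    """Share of all calls whose (caller, type) key satisfies the condition."""
    matching = sum(count for key, count in calls.items() if condition(key))
    return Fraction(matching, total)


p_a = probability(lambda key: key[0] == "Christin")        # simple probability
p_b = probability(lambda key: key[1] == "crisis")          # simple probability
p_a_and_b = probability(lambda key: key == ("Christin", "crisis"))  # joint
p_a_or_b = p_a + p_b - p_a_and_b                           # addition rule

print(f"P(A and B) = {p_a_and_b} = {float(p_a_and_b):.2f}")
print(f"P(A or B)  = {p_a_or_b} = {float(p_a_or_b):.2f}")
```

Subtracting the joint probability in the last step avoids counting the Christin crisis calls twice.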

Define each of the following as classical, empirical, or subjective probability.

- The probability that the baseball player Derek Jeter will get a hit during his next at bat.
- The probability of drawing an Ace from a deck of cards.
- The probability that I will shoot lower than a 90 during my next round of golf.
- The probability of winning the next state lottery drawing.
- The probability that the drive belt for my riding lawnmower will break this summer (it did).
- The probability that I will finish writing this book before my deadline.

Identify whether each of the following are valid probabilities.

A survey of families asked whether the household had Internet access. Each family was classified by race.

The contingency table is shown here. We define:

- Event A: The selected family has an Internet connection in its home.
- Event B: The selected family is Asian American.

Now click on the Tools menu again; Data Analysis will now appear in the list.

Click the Yes button and follow any further instructions.

Later, English mathematician Thomas Bayes developed probability concepts that have also been very useful to the field of statistics.

People often use statistics when attempting to persuade you to their point of view.
