Midterm Exam Review (Chapters 1 - 7)
Key Terms and Concepts
| Descriptive statistics | Numerical, graphical, and tabular methods for organizing and summarizing data. |
| Inferential statistics | Methods for generalizing from a sample to the population from which the sample was selected. |
| Population | The entire collection of individuals or measurements about which information is desired. |
| Sample | A part of the population selected for study. |
| Categorical data | Individual observations are categorical responses (non-numerical). |
| Numerical data | Individual observations are numerical in nature. |
| Discrete numerical data | Possible values are isolated points along a number line. |
| Continuous numerical data | Possible values form an entire interval along the number line. |
| Bivariate and multivariate data | Each observation consists of two (bivariate) or more (multivariate) responses or values. |
| Observational study | A study that observes characteristics of an existing population. |
| Simple random sample of size n | A sample selected in a way that gives every different sample of size n an equal chance of being selected. |
| Stratified sampling | Dividing a population into subgroups (strata) and then taking independent random samples from each stratum. |
| Confounding variable | A variable that is related to both group membership and to the response variable. |
| Measurement or response bias | The tendency for a sample to differ from the population because the method of observation tends to produce values that differ from the true value. |
| Selection bias | The tendency for a sample to differ from the population due to systematic exclusion of some part of the population. |
| Nonresponse bias | The tendency for a sample to differ from the population because measurements are not obtained from all individuals selected for inclusion in the sample. |
| Experiment | A procedure for investigating the effect of an experimental condition (which is manipulated by the experimenter) on a response variable. |
| Extraneous factor | A variable that is not of interest in the current study but is thought to affect the response variable. |
| Direct Control | Holding extraneous factors constant so that their effects are not confounded with those of the experimental conditions. |
| Blocking | Using extraneous factors to create experimental groups that are similar with respect to those factors, thereby filtering out their effect. |
| Randomization | Random assignment to experimental conditions. |
| Replication | Ensuring that there is an adequate number of observations on each experimental treatment. |
| Placebo treatment | A treatment that resembles the other treatments in an experiment, but which has no active ingredients. |
| Control group | A group that receives no treatment or a placebo treatment. |
| Frequency Distribution | A table that displays frequencies, and sometimes relative and cumulative relative frequencies, for categories (categorical data), possible values (discrete data), or class intervals (continuous data). |
| Bar Chart | A graph of a frequency distribution for a categorical data set. Each category is represented by a bar and the area of the bar is proportional to the corresponding frequency or relative frequency. |
| Pie Chart | A graph of a frequency distribution for a categorical data set. Each category is represented by a slice of the pie and the area of the slice is proportional to the corresponding frequency or relative frequency. |
| Dotplot | A picture of numerical data in which each observation is represented by a dot on or above a horizontal measurement axis. |
| Stem-and-leaf display | A method of organizing quantitative data in which the stem values (leading digit(s) of the observations) are listed in a column, and the leaf (trailing digit(s)) for each observation is then listed beside the corresponding stem. Sometimes stems are repeated to stretch the display. |
| Histogram | A picture of the information in a frequency distribution. A rectangle is drawn above each category label, possible value (discrete data), or class interval. The rectangle's area is proportional to the corresponding relative frequency (or, equivalently, frequency). |
| Histogram Shapes | A (smoothed) histogram may be unimodal (a single peak), bimodal (two peaks), or multimodal. A unimodal histogram may be symmetric, positively skewed (a long right or upper tail), or negatively skewed. A frequently occurring shape is the normal curve. |
| $x_1, x_2, \ldots, x_n$ | Notation for sample data consisting of observations on a variable x, where n is the sample size. |
| Sample mean, $\bar{x}$ | The most frequently used measure of center of a sample. It can be very sensitive to the presence of even a single outlier (unusually large or small observation). |
| Population mean, µ | The average x value in the entire population. |
| Sample median | The middle value in the ordered list of sample observations. (For n even, the median is the average of the two middle values.) It is very insensitive to outliers. |
| Trimmed mean | A measure of center in which the observations are first ordered from smallest to largest, one or more observations are deleted from each end, and the remaining ones are averaged. In terms of sensitivity to outliers, it is a compromise between the mean and the median. |
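| Deviations from the mean: $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$ | Quantities used to assess variability in a sample. Except for rounding effects, $\sum (x - \bar{x}) = 0$. |
| The sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, and standard deviation, $s = \sqrt{s^2}$ | The most frequently used measures of variability for sample data. |
| The population variance, $\sigma^2$, and standard deviation, $\sigma$ | Measures of variability for the entire population. |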
| Quartiles and the interquartile range | The lower quartile separates the smallest 25% of the data from the remaining 75%, and the upper quartile separates the largest 25% from the smallest 75%. The interquartile range (iqr), a measure of variability less sensitive to outliers than s, is the difference between the upper and lower quartiles. |
| Chebyshev's Rule | This rule states that for any number k ≥ 1, at least $100(1 - 1/k^2)\%$ of the observations in any data set are within k standard deviations of the mean. It is typically conservative in that the actual percentages often considerably exceed $100(1 - 1/k^2)\%$. |
| Empirical Rule | This rule gives the approximate percentage of observations within one standard deviation (68%), two standard deviations (95%), or three standard deviations (99.7%) of the mean when the histogram is well approximated by a normal curve. |
| z score | This quantity gives the distance between an observation and the mean as a certain number of standard deviations. It is positive (negative) if the observation lies above (below) the mean. |
| rth percentile | The value such that r percent of the observations in the data set fall at or below that value. |
| Boxplot | A picture that conveys information about the most important features of a data set: center, spread, extent of skewness, and presence of outliers. |
| Scatter Plot | A picture of bivariate numerical data in which each observation (x, y) is represented as a point located with respect to a horizontal x-axis and a vertical y-axis. |
| Pearson's sample correlation coefficient: $r = \frac{\sum z_x z_y}{n - 1}$, where $z_x$ and $z_y$ are the z scores of the x and y values | A measure of the extent to which sample x and y values are linearly related; $-1 \le r \le 1$. |
| Spearman's correlation coefficient, $r_s$ | Pearson's r applied to the ranks of the x and y values. It can detect monotonic relationships, whether linear or non-linear, and is not as sensitive to outliers as r is. |
| Principle of least squares | The method used to select a line that summarizes an approximate linear relationship between x and y. The least squares line is the line that minimizes the sum of the squared vertical deviations from the points in the scatter plot. |
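| $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$, $a = \bar{y} - b\bar{x}$ | The slope and vertical (y) intercept of the least squares line. |
| Predicted (fitted) values: $\hat{y}_1, \ldots, \hat{y}_n$ | Obtained by substituting the x value for each observation into the least squares line; $\hat{y}_i = a + b x_i$. |
| Residuals | Obtained by subtracting each predicted value from the corresponding observed y value; $y_1 - \hat{y}_1, \ldots, y_n - \hat{y}_n$. |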
| Residual Plot | Scatter plot of the (x, residual) pairs. Isolated points or a pattern of points in a residual plot are indicative of potential problems. |
| Residual (error) sum of squares: $\text{SSResid} = \sum (y - \hat{y})^2$ | The sum of the squared residuals is a measure of y variation that cannot be attributed to an approximate linear relationship (unexplained variation). |
| Total sum of squares: $\text{SSTo} = \sum (y - \bar{y})^2$ | The sum of squared deviations of the observed y values from the sample mean $\bar{y}$. |
| Coefficient of determination: $r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$ | The proportion of variation in observed y's that can be attributed to an approximate linear relationship. |
| Standard deviation about the least squares line: $s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}$ | The size of a "typical" deviation from the least squares line. |
| Transformation | A simple function of the x and/or y variable which is then used in a regression. |
| Power transformation | An exponent, or power, p is first specified, and then new (transformed) data values are calculated as $\text{transformed value} = (\text{original value})^p$. |
| Chance Experiment | Any experiment for which there is uncertainty concerning the resulting outcome. |
| Sample Space | The collection of all possible outcomes from a chance experiment. |
| Event | Any collection of possible outcomes from a chance experiment. |
| Simple event | Any event that consists of a single outcome. |
| Events: 1. not A, $A^c$, !A, A', ~A 2. A or B, $A \cup B$ 3. A and B, $A \cap B$ | 1. The event consisting of all outcomes not in A. 2. The event consisting of all outcomes in at least one of the two events. 3. The event consisting of all outcomes common to both events. |
| Disjoint (mutually exclusive) events | Events that have no outcomes in common and so cannot occur at the same time. |
| Basic properties of probability | 1. The probability of an event must be a value between 0 and 1. 2. If S is the sample space for an experiment, P(S) = 1. 3. If A and B are disjoint events, $P(A \cup B) = P(A) + P(B)$. 4. P(A) + P(!A) = 1. |
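| $P(E) = \frac{\text{number of outcomes favorable to } E}{N}$ | P(E) when the outcomes are equally likely and where N is the number of outcomes in the sample space. |
| $P(E \cup F) = P(E) + P(F)$; $P(E_1 \cup \cdots \cup E_k) = P(E_1) + \cdots + P(E_k)$ | Addition rules when events are disjoint. |
| $P(E \mid F) = \frac{P(E \cap F)}{P(F)}$ | The conditional probability of the event E given that the event F has occurred. |
| $P(E \mid F) = P(E)$ | Events E and F are independent if the probability of E given that F has occurred is the same as the probability that E will occur with no knowledge of F. |
| $P(E \cap F) = P(E)P(F)$; $P(E_1 \cap \cdots \cap E_k) = P(E_1) \cdots P(E_k)$ | Multiplication rules for independent events. |
| $P(E \cup F) = P(E) + P(F) - P(E \cap F)$ | The general addition rule for two events. |
| $P(E \cap F) = P(E \mid F)P(F)$ | The general multiplication rule for two events. |
| $P(E) = P(E \mid B_1)P(B_1) + \cdots + P(E \mid B_k)P(B_k)$ | The law of total probability, where $B_1, B_2, \ldots, B_k$ are disjoint events with $P(B_1) + \cdots + P(B_k) = 1$. |
| $P(B_i \mid E) = \frac{P(E \mid B_i)P(B_i)}{P(E \mid B_1)P(B_1) + \cdots + P(E \mid B_k)P(B_k)}$ | Bayes' rule, where $B_1, B_2, \ldots, B_k$ are disjoint events with $P(B_1) + \cdots + P(B_k) = 1$. |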
| Independent outcomes | Two events are independent if the chance that one event occurs is not affected by knowledge of whether or not the other occurred. |
| Simulation | A technique for estimating probabilities that generates observations by performing an experiment that is similar in structure to the real situation of interest. |
| Random variable: discrete or continuous | A numerical variable with a value determined by the outcome of a chance experiment. It is discrete if its possible values are isolated points along the number line and continuous if its possible values form an entire interval along the number line. |
| Probability distribution p(x) of a discrete random variable x | A formula, table, or graph that gives the probability associated with each x value. Conditions on p(x) are (a) p(x) ≥ 0, and (b) Σ p(x) = 1, where the sum is over all possible x-values. |
| Probability distribution of a continuous random variable x | Specified by a smooth (density) curve for which the total area under the curve is 1. The probability P(a < x < b) is the area under the curve and above the interval from a to b; this is also P(a ≤ x ≤ b). |
| $\mu_x$ and $\sigma_x$ | The mean and standard deviation, respectively, of a random variable x. These quantities describe the center and extent of spread about the center of the variable's probability distribution. |
| $\mu_x = \sum x\,p(x)$ | The mean value of a discrete random variable x; it locates the center of the variable's probability distribution. |
| $\sigma_x^2 = \sum (x - \mu_x)^2 p(x)$, $\sigma_x = \sqrt{\sigma_x^2}$ | The variance and standard deviation, respectively, of a discrete random variable; these are measures of the extent to which the variable's distribution spreads out about $\mu_x$. |
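| Binomial probability distribution: $p(x) = \frac{n!}{x!(n - x)!}\,\pi^x (1 - \pi)^{n - x}$ | This formula gives the probability of observing x successes (x = 0, 1, 2,..., n) among n trials of a binomial experiment, where π is the probability of success on each trial. |
| $\mu_x = n\pi$, $\sigma_x = \sqrt{n\pi(1 - \pi)}$ | The mean and standard deviation of a binomial random variable. |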
| Normal distribution | A continuous probability distribution that has a bell-shaped density curve. A particular normal distribution is determined by specifying the values of μ and σ. |
| Standard normal distribution (z curve) | This is the normal distribution with μ = 0 and σ = 1. The density curve is called the z curve, and z is the letter commonly used to denote a variable having this distribution. Areas under the z curve to the left of various values are given in Appendix table II. |
| z critical value | A number on the z measurement scale that captures a specified tail area or central area. (z*) |
| $z = \frac{x - \mu}{\sigma}$ | z is obtained by "standardizing": subtracting the mean and then dividing by the standard deviation. When x has a normal distribution, z has a standard normal distribution. This fact implies that probabilities involving any normal random variable (any μ or σ) can be obtained from z curve areas. |
| Normal probability plot | A picture used to judge the plausibility of the assumption that a sample has been selected from a normal population distribution. If the plot is reasonably straight, this assumption is reasonable. |
| Normal approximation to the binomial distribution | When both nπ > 10 and n(1 - π) > 10, binomial probabilities are well approximated by corresponding areas under a normal curve with $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1 - \pi)}$. |
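
As a rough illustration of the binomial and normal rows above, the following Python sketch compares an exact binomial probability with its normal approximation. The values n = 50 and π = 0.4 are hypothetical, chosen only so that nπ and n(1 - π) both exceed 10. The continuity correction (widening the interval by 0.5 on each side) is a common refinement that the table above does not mention, so treat it as an assumption of this example.

```python
from math import comb, erf, sqrt


def binomial_pmf(x, n, pi_success):
    """Probability of exactly x successes in n binomial trials."""
    return comb(n, x) * pi_success**x * (1 - pi_success) ** (n - x)


def normal_cdf(x, mu, sigma):
    """Area under the normal curve (mean mu, sd sigma) to the left of x."""
    z = (x - mu) / sigma  # standardize: subtract the mean, divide by the sd
    return 0.5 * (1 + erf(z / sqrt(2)))


# Hypothetical example values (not from the review sheet).
n = 50
pi_success = 0.4

# Check the rule of thumb before trusting the approximation.
assert n * pi_success > 10 and n * (1 - pi_success) > 10

mu = n * pi_success
sigma = sqrt(n * pi_success * (1 - pi_success))

# Exact P(15 <= x <= 25): add up the binomial probabilities.
exact = sum(binomial_pmf(x, n, pi_success) for x in range(15, 26))

# Normal approximation with a continuity correction (assumption, see lead-in).
approx = normal_cdf(25.5, mu, sigma) - normal_cdf(14.5, mu, sigma)

print(f"exact binomial probability: {exact:.4f}")
print(f"normal approximation:       {approx:.4f}")
```

The two printed values should agree closely here. With a much smaller n, or with π close to 0 or 1, the rule of thumb fails and the agreement gets noticeably worse.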
Notes from Mrs. Caso
This midterm exam will be taken on the computer over the network using ExamView Pro. The questions will appear in a random order for each person taking the exam. The choices for the multiple choice questions will also be randomized for each person. The final printouts, however, will all follow the same format and question order.
You may use the online formula sheets and tables while working on the midterm exam. In addition, you will need your calculator, a pen or pencil, and scratch paper. You will also need your student ID number in order to access the test online.
There are 70 questions on the exam which will be worth 70 points total. The questions will be true/false, multiple choice, and numerical computation. There will be no written response questions on this exam. Many of the questions have been taken from AP Statistics Exam preparation books and will follow that format. This is by no means an easy exam. It will be graded as a standardized test, similar to the AP Exam. The grade distribution will be based on where your standardized grade falls on the normal curve:
Interactive Materials
Use the interactive materials sections from the Chapters 1 through 7 web pages.
Schedule
| Timeline: | 5 days |
| Day 84: 01/11/12 | Review Chapters 1-4 |
| Day 85: 01/12/12 | Review Chapter 5 |
| Day 86: 01/13/12 | Review Chapters 6-7 |
| Day 87: 01/17/12 | Exam Day 1: Periods 1, 2, & 3 |
| Day 88: 01/18/12 | Exam Day 2: Periods 4/5, 6/7, & 8/9 |
| Day 89: 01/19/12 | Exam Day 3: Periods 10 & 11 |
