An investigation will typically focus on a well-defined collection of objects constituting a population of interest.
When desired information is available for all objects in the population, we have what is called a census.
A subset of the population -- a sample -- is selected in some prescribed manner.
A variable is any characteristic whose value may change from one object to another in the population.
The number of observations in a single sample will often be denoted by n.
Given a data set consisting of n observations on some variable x, the individual observations will be denoted by x1, x2, x3, ..., xn.
Steps for Constructing a Stem-and-Leaf Display
A dotplot is an attractive summary of numerical data when the data set is reasonably small or there are relatively few distinct data values. Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
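As a rough illustration of the dotplot idea, the sketch below stacks one dot per observation above an integer scale. It assumes integer-valued data; the function name text_dotplot and the text layout are illustrative choices, not part of the original notes.

```python
from collections import Counter


def text_dotplot(data):
    """Return the lines of a text dotplot for integer-valued data (assumption)."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)
    # One fixed-width column per scale value so dots line up with labels.
    width = max(len(str(v)) for v in values) + 1
    rows = []
    # Build rows from the tallest stack down to the first dot.
    for level in range(max(counts.values()), 0, -1):
        rows.append(
            "".join(("." if counts[v] >= level else " ").rjust(width) for v in values)
        )
    # The last row is the horizontal measurement scale.
    rows.append("".join(str(v).rjust(width) for v in values))
    return rows


for line in text_dotplot([1, 2, 2, 4]):
    print(line)
```

Repeated values show up as taller stacks of dots, which is what makes the display readable for small data sets.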
Consider data consisting of observations on a discrete variable x.
Histograms come in a variety of shapes.
For a given set of numbers x1, x2, x3, ..., xn, the most familiar and useful measure of the center is the mean, or arithmetic average, of the set.
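A minimal sketch of the mean as the sum of the observations divided by n; the function name sample_mean and the example values are hypothetical.

```python
def sample_mean(xs):
    """Arithmetic average: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


print(sample_mean([2, 4, 6, 8]))
```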
The word median is synonymous with "middle", and the sample median is indeed the middle value when the observations are ordered from smallest to largest.
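The sketch below orders the observations and picks the middle one. For an even number of observations there is no single middle value; averaging the two middle values is the usual convention and is assumed here. The function name sample_median is illustrative.

```python
def sample_median(xs):
    """Middle value of the ordered observations."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Assumed convention for even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([5, 1, 3]))
print(sample_median([4, 1, 3, 2]))
```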
A trimmed mean is a compromise between the mean and the median. A 10% trimmed mean, for example, would be computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what is left over.
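The trimming procedure can be sketched as follows. The function name trimmed_mean and the sample values are hypothetical, and the code assumes that when the trimming percentage does not correspond to a whole number of observations, the count to drop from each end is rounded down.

```python
def trimmed_mean(xs, percent):
    """Average what remains after dropping `percent`% from each end of the ordered sample."""
    ordered = sorted(xs)
    n = len(ordered)
    # Number of observations removed from each end (rounded down, an assumption).
    k = n * percent // 100
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming percentage removes every observation")
    return sum(kept) / len(kept)


sample = [100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(trimmed_mean(sample, 10))  # drops 1 and 100, then averages 2..9
```

With percent set to 0 this reduces to the ordinary mean, and as the percentage approaches 50 it moves toward the median.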
The simplest measure of variability in a sample is the range, which is the difference between the largest and smallest sample values.
The sample variance, denoted by s^2, is given by s^2 = Σ(xi - x̄)^2 / (n - 1), where x̄ is the sample mean.
The sample standard deviation, denoted by s, is the positive square root of the variance: s = √(s^2).
We will use σ^2 to denote the population variance and σ to denote the population standard deviation.
It is customary to refer to s^2 as being based on n - 1 degrees of freedom (df).
This terminology results from the fact that although s^2 is based on the n quantities x1 - x̄, x2 - x̄, ..., xn - x̄, these sum to 0, so specifying the values of any n - 1 of the quantities determines the remaining value. For example, if n = 4 and x1 - x̄ = 8, x2 - x̄ = -6, and x4 - x̄ = -4, then automatically we have x3 - x̄ = 2, so only 3 of the 4 values of xi - x̄ are freely determined (3 df).
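The following sketch computes the deviations from the mean, the sample variance with divisor n - 1, and the sample standard deviation. The data values 18, 4, 12, 6 are hypothetical, chosen only so that the deviations match the n = 4 example above; the function names are illustrative.

```python
import math


def deviations(xs):
    """Deviations of each observation from the sample mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def sample_variance(xs):
    """Sum of squared deviations divided by n - 1."""
    return sum(d * d for d in deviations(xs)) / (len(xs) - 1)


def sample_std(xs):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [18, 4, 12, 6]          # hypothetical sample with mean 10
devs = deviations(data)
print(devs)                    # deviations 8, -6, 2, -4
print(sum(devs))               # the deviations sum to 0
print(sample_variance(data))   # (64 + 36 + 4 + 16) / 3
print(sample_std(data))
```

Because the deviations must sum to 0, knowing any three of them fixes the fourth, which is the sense in which only n - 1 of them are free.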
After the n observations in a data set are ordered from smallest to largest, the lower fourth and upper fourth are given by:
lower fourth:
upper fourth:
That is, the lower (upper) fourth is the median of the smallest (largest) half of the data, where the median is included in both halves if n is odd. A measure of spread that is resistant to outliers is the fourth spread ƒs, given by:
ƒs = upper fourth - lower fourth
Any observation farther than 1.5ƒs from the closest fourth is an outlier. An outlier is extreme if it is more than 3ƒs from the nearest fourth, and it is mild otherwise.
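The fourths, the fourth spread, and the outlier rule can be sketched together. The function names and the sample values are hypothetical, and the median of an even-sized half is assumed to be the average of its two middle values.

```python
def median(ordered):
    """Median of an already ordered list (two middle values averaged for even n)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def fourths(xs):
    """Lower and upper fourths: medians of the smallest and largest halves."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2  # the median belongs to both halves when n is odd
    return median(ordered[:half]), median(ordered[n - half:])


def classify_outliers(xs):
    """Split outliers into mild and extreme using the fourth spread."""
    lower, upper = fourths(xs)
    fs = upper - lower
    result = {"mild": [], "extreme": []}
    for x in xs:
        # Distance beyond the nearest fourth; not positive for values between the fourths.
        distance = max(lower - x, x - upper)
        if distance > 3 * fs:
            result["extreme"].append(x)
        elif distance > 1.5 * fs:
            result["mild"].append(x)
    return result


sample = [2, 4, 4, 5, 6, 6, 7, 8, 15, 30]
print(fourths(sample))
print(classify_outliers(sample))
```

In this sample the fourths are 4 and 8, so ƒs = 4; the value 15 lies 7 beyond the upper fourth (more than 1.5ƒs = 6) and 30 lies 22 beyond it (more than 3ƒs = 12).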
A comparative or side-by-side boxplot is a very effective way of revealing similarities and differences between two or more data sets consisting of observations on the same variable.