PyDataset Documentation (adopted from R Documentation):
Monthly Airline Passenger Numbers 1949-1960
The classic Box & Jenkins airline data. Monthly totals of international airline passengers, 1949 to 1960.
A monthly time series, in thousands.
Source:
First 5 rows of the dataset:
## time AirPassengers
## 1 1949.000000 112
## 2 1949.083333 118
## 3 1949.166667 132
## 4 1949.250000 129
## 5 1949.333333 121
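The time column stores each month as a fractional year, so 1949.083333 is one twelfth of the way through 1949. A minimal sketch of turning these values back into a calendar year and month follows. The helper name fractional_year_to_month is hypothetical and not part of PyDataset, and the sample values are just the five rows shown above.

```python
def fractional_year_to_month(t):
    """Convert a fractional-year time stamp (e.g. 1949.083333) to (year, month).

    Hypothetical helper, not part of PyDataset. Assumes monthly data in which
    each month advances the index by 1/12 of a year.
    """
    year = int(t)
    # Rounding absorbs the truncation in printed values such as 0.083333 (= 1/12).
    month = round((t - year) * 12) + 1
    return year, month


# The five time values printed in the table above.
times = [1949.000000, 1949.083333, 1949.166667, 1949.250000, 1949.333333]
months = [fractional_year_to_month(t) for t in times]
print(months)
```

The same conversion should apply to the other monthly series in this listing that use a fractional-year time column, such as UKDriverDeaths and USAccDeaths.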
PyDataset Documentation (adopted from R Documentation):
Sales Data with Leading Indicator
The sales time series BJsales and leading indicator BJsales.lead each contain 150 observations. The objects are of class "ts".
Source:
References:
First 5 rows of the dataset:
## time BJsales
## 1 1 200.1
## 2 2 199.5
## 3 3 199.4
## 4 4 198.9
## 5 5 199.0
PyDataset Documentation (adopted from R Documentation):
Biochemical Oxygen Demand
The BOD data frame has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality.
This data frame contains the following columns:
Time: A numeric vector giving the time of the measurement (days).
demand: A numeric vector giving the biochemical oxygen demand (mg/l).
Source:
Originally from:
First 5 rows of the dataset:
## Time demand
## 1 1 8.3
## 2 2 10.3
## 3 3 19.0
## 4 4 16.0
## 5 5 15.6
PyDataset Documentation (adopted from R Documentation):
Determination of Formaldehyde
These data are from a chemical experiment to prepare a standard curve for the determination of formaldehyde by the addition of chromotropic acid and concentrated sulphuric acid and the reading of the resulting purple color on a spectrophotometer.
A data frame with 6 observations on 2 variables.
[,1] carb: numeric, Carbohydrate (ml)
[,2] optden: numeric, Optical Density
Source:
References:
First 5 rows of the dataset:
## carb optden
## 1 0.1 0.086
## 2 0.3 0.269
## 3 0.5 0.446
## 4 0.6 0.538
## 5 0.7 0.626
PyDataset Documentation (adopted from R Documentation):
Hair and Eye Color of Statistics Students
Distribution of hair and eye color and sex in 592 statistics students.
A 3-dimensional array resulting from cross-tabulating 592 observations on 3 variables. The variables and their levels are as follows:
Hair: Black, Brown, Red, Blond
Eye: Brown, Blue, Hazel, Green
Sex: Male, Female
The Hair x Eye table comes from a survey of students at the University of Delaware reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic purposes.
This data set is useful for illustrating various techniques for the analysis of contingency tables, such as the standard chi-squared test or, more generally, log-linear modelling, and graphical methods such as mosaic plots, sieve diagrams or association plots.
Source:
Snee (1974) gives the two-way table aggregated over Sex. The Sex split of the ‘Brown hair, Brown eye’ cell was changed to agree with that used by Friendly (2000).
References:
First 5 rows of the dataset:
## Hair Eye Sex Freq
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
PyDataset Documentation (adopted from R Documentation):
Effectiveness of Insect Sprays
The counts of insects in agricultural experimental units treated with different insecticides.
A data frame with 72 observations on 2 variables.
[,1] count: numeric, Insect count
[,2] spray: factor, The type of spray
Source:
Reference:
First 5 rows of the dataset:
## count spray
## 1 10 A
## 2 7 A
## 3 20 A
## 4 14 A
## 5 14 A
PyDataset Documentation (adopted from R Documentation):
Quarterly Earnings per Johnson & Johnson Share
Quarterly earnings (dollars) per Johnson & Johnson share 1960–80.
A quarterly time series
[,1] time: numeric, The time index (in fractional years)
[,2] value: numeric, Quarterly earnings per share
Source:
First 5 rows of the dataset:
## time JohnsonJohnson
## 1 1960.00 0.71
## 2 1960.25 0.63
## 3 1960.50 0.85
## 4 1960.75 0.44
## 5 1961.00 0.61
PyDataset Documentation (adopted from R Documentation):
Level of Lake Huron 1875–1972
Annual measurements of the level, in feet, of Lake Huron 1875–1972.
A time series of length 98.
[,1] time: numeric, The time index (years)
[,2] value: numeric, Level of Lake Huron (feet)
Sources:
First 5 rows of the dataset:
## time LakeHuron
## 1 1875 580.38
## 2 1876 581.86
## 3 1877 580.97
## 4 1878 580.80
## 5 1879 579.79
PyDataset Documentation (adopted from R Documentation):
Intercountry Life-Cycle Savings Data
Data on the savings ratio 1960–1970.
A data frame with 50 observations on 5 variables:
[,1] sr: numeric, Aggregate personal savings ratio
[,2] pop15: numeric, % of population under 15
[,3] pop75: numeric, % of population over 75
[,4] dpi: numeric, Real per-capita disposable income
[,5] ddpi: numeric, Growth rate of dpi
Under the life-cycle savings hypothesis as developed by Franco Modigliani, the savings ratio (aggregate personal saving divided by disposable income) is explained by per-capita disposable income, the percentage rate of change in per-capita disposable income, and two demographic variables: the percentage of population less than 15 years old and the percentage of the population over 75 years old. The data are averaged over the decade 1960–1970 to remove the business cycle or other short-term fluctuations.
Source:
The data were obtained from Belsley, Kuh and Welsch (1980). They in turn obtained the data from Sterling (1977).
References:
First 5 rows of the dataset:
## sr pop15 pop75 dpi ddpi
## Australia 11.43 29.35 2.87 2329.68 2.87
## Austria 12.07 23.32 4.41 1507.99 3.93
## Belgium 13.17 23.80 4.43 2108.47 3.82
## Bolivia 5.75 41.89 1.67 189.13 0.22
## Brazil 12.88 42.19 0.83 728.47 4.56
PyDataset Documentation (adopted from R Documentation):
Flow of the River Nile
Measurements of the annual flow of the river Nile at Aswan 1871–1970.
A time series of length 100.
[,1] time: numeric, The time index (years)
[,2] value: numeric, Annual flow of the Nile (10^8 m^3)
Source:
References:
First 5 rows of the dataset:
## time Nile
## 1 1871 1120
## 2 1872 1160
## 3 1873 963
## 4 1874 1210
## 5 1875 1160
PyDataset Documentation (adopted from R Documentation):
Potency of Orchard Sprays
An experiment was conducted to assess the potency of various constituents of orchard sprays in repelling honeybees, using a Latin square design.
A data frame with 64 observations on 4 variables.
[,1] decrease: numeric, The response (decrease in bee visits)
[,2] rowpos: numeric, Row position in the orchard
[,3] colpos: numeric, Column position in the orchard
[,4] treatment: factor, Type of spray treatment
Individual cells of dry comb were filled with measured amounts of lime sulphur emulsion in sucrose solution. Seven different concentrations of lime sulphur ranging from a concentration of 1/100 to 1/1,562,500 in successive factors of 1/5 were used as well as a solution containing no lime sulphur.
The responses for the different solutions were obtained by releasing 100 bees into the chamber for two hours, and then measuring the decrease in volume of the solutions in the various cells.
An 8 x 8 Latin square design was used and the treatments were coded as follows:
A: highest level of lime sulphur
B: next highest level of lime sulphur
…
G: lowest level of lime sulphur
H: no lime sulphur
Source:
Reference:
First 5 rows of the dataset:
## decrease rowpos colpos treatment
## 1 57 1 1 D
## 2 95 2 1 E
## 3 8 3 1 B
## 4 69 4 1 H
## 5 92 5 1 G
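The dilution series described for this dataset (seven concentrations from 1/100 down to 1/1,562,500 in successive factors of 1/5, plus a solution with no lime sulphur) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. It assumes the treatment codes A to G map onto the seven concentrations in descending order, as the coding list states, and the variable names are hypothetical.

```python
from fractions import Fraction

# Seven lime sulphur concentrations: start at 1/100, each one a factor of 1/5 weaker.
concentrations = [Fraction(1, 100) * Fraction(1, 5) ** k for k in range(7)]

# Treatment codes A (highest) to G (lowest); H contains no lime sulphur.
treatments = dict(zip("ABCDEFG", concentrations))
treatments["H"] = Fraction(0)

for code, concentration in treatments.items():
    print(code, concentration)
```

Exact fractions are used so that the last concentration comes out as 1/1562500 with no floating-point rounding.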
PyDataset Documentation (adopted from R Documentation):
Results from an Experiment on Plant Growth
Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.
A data frame of 30 cases on 2 variables:
[,1] weight: numeric, Dry weight of the plants
[,2] group: factor, Treatment group (ctrl, trt1, trt2)
Source:
First 5 rows of the dataset:
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
PyDataset Documentation (adopted from R Documentation):
Reaction Velocity of an Enzymatic Reaction
The Puromycin data frame has 23 rows and 3 columns of the reaction velocity versus substrate concentration in an enzymatic reaction involving untreated cells or cells treated with Puromycin.
This data frame contains the following columns:
conc: a numeric vector of substrate concentrations (ppm)
rate: a numeric vector of instantaneous reaction rates (counts/min/min)
state: a factor with levels treated and untreated
Data on the velocity of an enzymatic reaction were obtained by Treloar (1974). The number of counts per minute of radioactive product from the reaction was measured as a function of substrate concentration in parts per million (ppm) and from these counts the initial rate (or velocity) of the reaction was calculated (counts/min/min). The experiment was conducted once with the enzyme treated with Puromycin, and once with the enzyme untreated.
Source:
First 5 rows of the dataset:
## conc rate state
## 1 0.02 76 treated
## 2 0.02 47 treated
## 3 0.06 97 treated
## 4 0.06 107 treated
## 5 0.11 123 treated
PyDataset Documentation (adopted from R Documentation):
Survival of passengers on the Titanic
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival.
A 4-dimensional array resulting from cross-tabulating 2201 observations on 4 variables. The variables and their levels are as follows:
Class: 1st, 2nd, 3rd, Crew
Sex: Male, Female
Age: Child, Adult
Survived: No, Yes
The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts—from the proportions of first-class passengers to the ‘women and children first’ policy, and the fact that that policy was not entirely successful in saving the women and children in the third class—are reflected in the survival rates for various classes of passenger.
These data were originally collected by the British Board of Trade in their investigation of the sinking. Note that there is not complete agreement among primary sources as to the exact numbers on board, rescued, or lost.
Due in particular to the very successful film ‘Titanic’, recent years saw a rise in public interest in the Titanic. Very detailed data about the passengers are now available on the Internet, at sites such as Encyclopedia Titanica (http://www.rmplc.co.uk/eduweb/sites/phind).
Source:
The source provides a data set recording class, sex, age, and survival status for each person on board the Titanic, and is based on data originally collected by the British Board of Trade and reprinted in:
First 5 rows of the dataset:
## Class Sex Age Survived Freq
## 1 1st Male Child No 0
## 2 2nd Male Child No 0
## 3 3rd Male Child No 35
## 4 Crew Male Child No 0
## 5 1st Female Child No 0
PyDataset Documentation (adopted from R Documentation):
The Effect of Vitamin C on Tooth Growth in Guinea Pigs
The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
A data frame with 60 observations on 3 variables.
[,1] len, numeric: Tooth length
[,2] supp, factor: Supplement type (VC or OJ).
[,3] dose, numeric: Dose in milligrams.
Source:
References:
First 5 rows of the dataset:
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
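The figure of 60 observations follows directly from the design described above: 10 guinea pigs for each combination of three doses and two delivery methods. A small sketch of that count follows; the variable names are illustrative, and it assumes that VC denotes ascorbic acid and OJ orange juice.

```python
from itertools import product

doses = [0.5, 1.0, 2.0]        # mg of Vitamin C
supplements = ["OJ", "VC"]     # assumed: OJ = orange juice, VC = ascorbic acid
animals_per_cell = 10          # guinea pigs per dose/supplement combination

# Every (supplement, dose) pair is one cell of the 2 x 3 factorial design.
cells = list(product(supplements, doses))
n_observations = len(cells) * animals_per_cell

print(len(cells), n_observations)
```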
PyDataset Documentation (adopted from R Documentation):
Student Admissions at UC Berkeley
Aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973 classified by admission and sex.
A 3-dimensional array resulting from cross-tabulating 4526 observations on 3 variables. The variables and their levels are as follows:
Admit, factor: Admission status (Admitted or Rejected)
Gender, factor: Male or Female
Dept, factor: Department (A, B, C, D, E, F)
Freq, numeric: Number of applicants with this combination of factors
This data set is frequently used for illustrating Simpson’s paradox; see Bickel et al. (1975). At issue is whether the data show evidence of sex bias in admission practices. There were 2691 male applicants, of whom 1198 (44.5%) were admitted, compared with 1835 female applicants of whom 557 (30.4%) were admitted. This gives a sample odds ratio of 1.83, indicating that males were almost twice as likely to be admitted. In fact, graphical methods or log-linear modelling show that the apparent association between admission and sex stems from differences in the tendency of males and females to apply to the individual departments (females used to apply more to departments with higher rejection rates).
This data set can also be used for illustrating methods for graphical display of categorical data, such as the general-purpose mosaic plot or the fourfold display for 2-by-2-by-k tables. See the home page of Michael Friendly (http://www.math.yorku.ca/SCS/friendly.html) for further information.
References:
First 5 rows of the dataset:
## Admit Gender Dept Freq
## 1 Admitted Male A 512
## 2 Rejected Male A 313
## 3 Admitted Female A 89
## 4 Rejected Female A 19
## 5 Admitted Male B 353
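The aggregate comparison quoted in the description can be checked by hand, and the first four rows of the table are already enough to show the reversal within a single department. The sketch below uses only the counts printed in this entry. The helper name odds is hypothetical.

```python
def odds(successes, total):
    """Odds of success: successes divided by failures."""
    return successes / (total - successes)


# Aggregate figures quoted in the description above.
male_admitted, male_total = 1198, 2691
female_admitted, female_total = 557, 1835

male_rate = male_admitted / male_total
female_rate = female_admitted / female_total
odds_ratio = odds(male_admitted, male_total) / odds(female_admitted, female_total)

# Department A only, taken from the first four rows of the table.
dept_a_male_rate = 512 / (512 + 313)
dept_a_female_rate = 89 / (89 + 19)

print(f"overall: male {male_rate:.1%}, female {female_rate:.1%}, odds ratio {odds_ratio:.2f}")
print(f"Dept A:  male {dept_a_male_rate:.1%}, female {dept_a_female_rate:.1%}")
```

Computed from these counts, the odds ratio comes out at roughly 1.84, close to the 1.83 quoted in the description. Within department A the female admission rate is the higher of the two, which is the pattern Simpson's paradox refers to.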
PyDataset Documentation (adopted from R Documentation):
Road Casualties in Great Britain 1969–84
UKDriverDeaths is a time series giving the monthly totals of car drivers in Great Britain killed or seriously injured Jan 1969 to Dec 1984. Compulsory wearing of seat belts was introduced on 31 Jan 1983.
Seatbelts gives more information on the same problem.
Seatbelts is a multiple time series, with columns
DriversKilled: car drivers killed.
drivers: same as UKDriverDeaths.
front: front-seat passengers killed or seriously injured.
rear: rear-seat passengers killed or seriously injured.
kms: distance driven.
PetrolPrice: petrol price.
VanKilled: number of van (‘light goods vehicle’) drivers.
law: 0/1: was the law in effect that month?
Sources:
Reference:
First 5 rows of the dataset:
## time UKDriverDeaths
## 1 1969.000000 1687
## 2 1969.083333 1508
## 3 1969.166667 1507
## 4 1969.250000 1385
## 5 1969.333333 1632
PyDataset Documentation (adopted from R Documentation):
UK Quarterly Gas Consumption
Quarterly UK gas consumption from 1960Q1 to 1986Q4, in millions of therms.
A quarterly time series of length 108.
Source:
First 5 rows of the dataset:
## time UKgas
## 1 1960.00 160.1
## 2 1960.25 129.7
## 3 1960.50 84.8
## 4 1960.75 120.1
## 5 1961.00 160.1
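The stated length of 108 follows from the date range: 27 years from 1960 to 1986 at four quarters each. The sketch below rebuilds the fractional-year time index on that assumption; the variable names are illustrative and not part of PyDataset.

```python
start_year, end_year = 1960, 1986
quarters_per_year = 4

# 1960Q1 through 1986Q4 inclusive.
n_quarters = (end_year - start_year + 1) * quarters_per_year

# Each quarter advances the index by 0.25 of a year.
time_index = [start_year + q / quarters_per_year for q in range(n_quarters)]

print(n_quarters)       # 108
print(time_index[:5])   # [1960.0, 1960.25, 1960.5, 1960.75, 1961.0]
print(time_index[-1])   # 1986.75
```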
PyDataset Documentation (adopted from R Documentation):
Accidental Deaths in the US 1973–1978
A time series giving the monthly totals of accidental deaths in the USA. The values for the first six months of 1979 are 7798 7406 8363 8460 9217 9316.
First 5 rows of the dataset:
## time USAccDeaths
## 1 1973.000000 9007
## 2 1973.083333 8106
## 3 1973.166667 8928
## 4 1973.250000 9137
## 5 1973.333333 10017
PyDataset Documentation (adopted from R Documentation):
Violent Crime Rates by US State
This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.
A data frame with 50 observations on 4 variables:
[,1] Murder, numeric: Murder arrests (per 100,000)
[,2] Assault, numeric: Assault arrests (per 100,000)
[,3] UrbanPop, numeric: Percent urban population
[,4] Rape, numeric: Rape arrests (per 100,000)
Sources:
Reference:
First 5 rows of the dataset:
## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
## Alaska 10.0 263 48 44.5
## Arizona 8.1 294 80 31.0
## Arkansas 8.8 190 50 19.5
## California 9.0 276 91 40.6
PyDataset Documentation (adopted from R Documentation):
Tipping data
One waiter recorded information about each tip he received over a period of a few months working in one restaurant.
A data frame with 244 rows and 7 variables:
In all, he recorded 244 tips. The data were reported in a collection of case studies for business statistics (Bryant & Smith 1995).
Reference:
First 5 rows of the dataset:
## total_bill tip sex smoker day time size
## 1 16.99 1.01 Female No Sun Dinner 2
## 2 10.34 1.66 Male No Sun Dinner 3
## 3 21.01 3.50 Male No Sun Dinner 3
## 4 23.68 3.31 Male No Sun Dinner 2
## 5 24.59 3.61 Female No Sun Dinner 4