Statistics

Below are links to the key chapters from this area of the course, Statistics: Descriptive and Applications. In addition, here is a growing collection of practice questions for you to use as a review exercise for the whole unit. There are currently 5 exam-style questions worth 38 marks. They should be completed in about 40 minutes, before looking at the solutions.

Shorter P1 style question - 6 marks

The number of days of rain per month in a town in Louisiana was recorded for a period of 12 months. The data are shown below.

10, 8, 8, 7, 8, 12, 13, 12, 10, 8, x, 7

The median value for this data set is 8.5.

a) Find the value of x

b) Find the mean number of days it rains per month

c) Find the standard deviation

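a) With 12 values, the median is the mean of the 6th and 7th values in the ordered list, so these two must average 8.5. Without x, the ordered list is 7, 7, 8, 8, 8, 8, 10, 10, 12, 12, 13, so this can only happen if the 6th value is 8 and the 7th is 9. There is no 9 in the list, so x must be 9.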

b) From the GDC, the mean value is 9.33 (3sf)

c) From the GDC, the standard deviation is 1.97 (3sf)

*Be aware: make sure your calculator is counting a frequency of 1 for each item in list 1 and is not looking for frequencies in list 2.
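The GDC results for this question can be cross-checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative. It assumes the quoted 1.97 is the population standard deviation (the GDC's σx, dividing by n), and shows the sample version alongside for comparison.

```python
import statistics

# Rain days per month from the question, with x = 9 substituted in
days = [10, 8, 8, 7, 8, 12, 13, 12, 10, 8, 9, 7]

median = statistics.median(days)          # mean of the 6th and 7th ordered values
mean = statistics.mean(days)
sd_population = statistics.pstdev(days)   # divides by n (assumed to match the GDC's sigma-x)
sd_sample = statistics.stdev(days)        # divides by n - 1, shown only for comparison

print("median:", median)
print("mean:", round(mean, 2))
print("population sd:", round(sd_population, 2))
print("sample sd:", round(sd_sample, 2))
```

Only the population value agrees with the 1.97 given in the solution, so it is worth checking which of the two your calculator is reporting.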

Shorter P1 style question - 6 marks

Looking for something in the water around Cold Mountain, scientists measured the amount of dissolved oxygen in the water and the temperature of the water on different days to investigate the change. Here are the results.

Temperature	18	7.7	4.2	23.3	14.4	22.1	11.2
Dissolved O2 (mg/l)	4.7	8.6	12.5	4	5	4	7

Using this data, find

a) The Pearson's product moment correlation coefficient

b) The equation of the regression line y on x

Using the equation of the regression line,

c) Estimate the concentration of dissolved oxygen when the temperature is 10°

a) From the GDC, r = -0.925 (3sf)

b) From the GDC, y = -0.402x + 12.3 (3sf)

c) y = -0.402 × 10 + 12.3 = 8.28 mg/l
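For anyone who wants to see where the GDC values come from, here is a minimal sketch that computes r and the regression line of y on x from the standard sums-of-squares formulas. The variable names are illustrative and are not part of the original question.

```python
import math

temperature = [18, 7.7, 4.2, 23.3, 14.4, 22.1, 11.2]   # x values
oxygen = [4.7, 8.6, 12.5, 4, 5, 4, 7]                   # y values, mg/l

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(oxygen) / n

# Sums of squares and cross products about the means
s_xx = sum((x - mean_x) ** 2 for x in temperature)
s_yy = sum((y - mean_y) ** 2 for y in oxygen)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, oxygen))

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson's correlation coefficient
m = s_xy / s_xx                     # gradient of the regression line y on x
c = mean_y - m * mean_x             # y-intercept

estimate_rounded = -0.402 * 10 + 12.3   # using the 3sf coefficients, as in part c)
estimate_full = m * 10 + c              # using the unrounded coefficients

print(round(r, 3), round(m, 3), round(c, 1))
print(round(estimate_rounded, 2), round(estimate_full, 2))
```

The two estimates differ slightly in the third significant figure because the first uses coefficients that have already been rounded. Keeping the full values stored in the calculator avoids this rounding drift.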

Shorter P1 style question - 6 marks

In a prison, a sample of 310 inmates was surveyed about the number of books they read in the last year. The results are shown in the table below.

Number of books	Number of inmates
0 to 4	60
5 to 9	106
10 to 14	93
15 to 19	a
20 to 24	9

a) What is the value of a?

b) In which class interval does the median value lie?

c) Work out an estimate for the mean number of books read by the inmates.

a) The total is 310 (given in the question), so the missing frequency must be 310 - (60 + 106 + 93 + 9) = 42.

b) There are 310 results, so the median lies between the 155th and 156th values. The cumulative frequencies are 60 and then 166, so both of these values are in the second class, 5 to 9.

c) Use the GDC, making sure that it is set up to read list 2 as the frequencies. List 1 should hold the midpoints of the class intervals. In this case, with discrete data, there are 5 possible values in each interval, so the midpoint is the middle one. In the first interval, 0 to 4, the 5 possible values are 0, 1, 2, 3, 4, so the midpoint is 2. The midpoints are therefore 2, 7, 12, 17 and 22, which gives an estimated mean of 9.32 books (3sf).
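The grouped-data steps above can be sketched in a few lines of Python. This is an illustration only: the variable names are made up, and the midpoints play the role of list 1 on the GDC while the frequencies play the role of list 2.

```python
# Class intervals (inclusive) and frequencies; None marks the unknown value a
classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]
frequencies = [60, 106, 93, None, 9]
total = 310

# a) The missing frequency makes the total up to 310
missing = total - sum(f for f in frequencies if f is not None)
frequencies[3] = missing

# b) The median class is the first class whose cumulative frequency
#    reaches the 156th value (the 155th falls in the same class here)
running = 0
for interval, f in zip(classes, frequencies):
    running += f
    if running >= 156:
        median_class = interval
        break

# c) Estimated mean: sum of midpoint x frequency, divided by the total
midpoints = [(low + high) / 2 for low, high in classes]
estimated_mean = sum(mid * f for mid, f in zip(midpoints, frequencies)) / total

print(missing, median_class, round(estimated_mean, 2))
```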

At a nursing home, the following observed frequencies were collected to see whether the likelihood of speaking another language depends on age group.

Age group	Speaks 1 language	Speaks more than 1 language
71 - 80	13	18
81 - 90	43	28
91 - 100	21	14

a) Calculate the expected frequency for the number of people in the 71 - 80 age group who speak more than 1 language

b) Calculate the expected frequency for the number of people in the 91 - 100 age group who speak only one language.

c) Calculate the number of degrees of freedom for this test.

d) Calculate the p-value for this test

e) State the conclusion of the test at a 10% significance level

a) \(\text{Expected frequency} = \frac{\text{Total for 71 - 80}}{\text{Total people}} \times \text{Total speaking more than 1 language}\)

\(= \frac{31}{137} \times 60 = 13.57664234 = 13.6\) (3sf)

b) \(\text{Expected frequency} = \frac{\text{Total for 91 - 100}}{\text{Total people}} \times \text{Total speaking 1 language}\)

\(= \frac{35}{137} \times 77 = 19.67153285 = 19.7\) (3sf)

c) \(\text{dof} = (\text{rows} - 1)(\text{columns} - 1) = (3 - 1)(2 - 1) = 2\)

d) From the GDC, p = 0.190 (3sf)

e) We accept (do not reject) the null hypothesis that speaking more than one language is independent of age group, because the p-value of 0.190 (19%) is higher than the significance level of 10% (this explanation will be expected)
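The whole test can be reproduced without a GDC. The sketch below is illustrative only and its variable names are assumptions. It relies on one shortcut that is valid only because this test has exactly 2 degrees of freedom: in that case the p-value is exp(-χ²/2). For any other number of degrees of freedom, a statistics library or the GDC is needed.

```python
import math

# Observed frequencies: rows are age groups 71-80, 81-90, 91-100;
# columns are "speaks 1 language" and "speaks more than 1 language"
observed = [[13, 18], [43, 28], [21, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form for the upper tail, valid ONLY when dof == 2
p_value = math.exp(-chi_squared / 2)

print(round(expected[0][1], 1), round(expected[2][0], 1))
print(dof, round(chi_squared, 2), round(p_value, 3))
```

Since the p-value is greater than 0.10, the script leads to the same conclusion as part e).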

The heights of a sample of men in 1935 were normally distributed with a mean of 170 cm and a standard deviation of 6 cm.

What is the probability that a man chosen from that sample would be

a) more than 160 cm tall?

b) between 180 cm and 190 cm tall?

In 1999, a similar sample showed a normal distribution with a mean of 178 cm and a standard deviation of 8 cm.

For this sample, the probability of a randomly selected man being more than 'x' cm tall is 0.3.

c) What is the value of x?

The probability of a randomly selected person from the sample being between 'y' and 180 cm is also 0.3.

d) What is the value of y?

a) From the GDC, entering

Lower limit = 160
Upper limit = \(10^{99}\) (or a similarly large number)
\(\mu = 170\)
\(\sigma = 6\)
\(p = 0.952\) (3sf)

b) From the GDC, entering

Lower limit = 180
Upper limit = 190
\(\mu = 170\)
\(\sigma = 6\)
\(p = 0.0474\) (3sf)

c) Using the inverse normal calculator

Tail: left, with area (p) = 0.7, OR tail: right, with area (p) = 0.3
\(\mu = 178\)
\(\sigma = 8\)
Height = 182.195204 = 182 cm (3sf)

d) Two steps

Step 1: Work out the probability of being less than 180 cm

Lower limit = 0
Upper limit = 180
\(\mu = 178\)
\(\sigma = 8\)
\(p = 0.59870632 = 0.599\) (3sf)

Step 2: The 0.3 area lies between 'y' and 180, so the probability of being less than 'y' is 0.599 - 0.3 = 0.299. Using the inverse normal calculator

Tail: left, with area (p) = 0.299
\(\mu = 178\)
\(\sigma = 8\)
Height = 173.78177 = 174 cm (3sf)
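The same four calculations can be sketched with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later), where `cdf` plays the role of the GDC's normal cdf and `inv_cdf` the role of the inverse normal. The variable names are illustrative. For the 1999 sample the sketch assumes σ = 8, because that is the value the worked figures (182.195204 and 0.59870632) correspond to.

```python
from statistics import NormalDist

men_1935 = NormalDist(mu=170, sigma=6)
men_1999 = NormalDist(mu=178, sigma=8)   # sigma = 8 assumed, matching the worked figures

# a) P(height > 160): everything above the lower limit
p_a = 1 - men_1935.cdf(160)

# b) P(180 < height < 190): difference of two cumulative probabilities
p_b = men_1935.cdf(190) - men_1935.cdf(180)

# c) Height with area 0.3 to its right, i.e. area 0.7 to its left
x = men_1999.inv_cdf(1 - 0.3)

# d) Step 1: area to the left of 180. Step 2: remove the 0.3 between y and 180
p_below_180 = men_1999.cdf(180)
y = men_1999.inv_cdf(p_below_180 - 0.3)

print(round(p_a, 3), round(p_b, 4))
print(round(x, 1), round(p_below_180, 3), round(y, 1))
```

In part d) the script carries the unrounded probability through to the inverse normal step. This changes the result only after the third significant figure, so the answer is still 174 cm.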

Cumulative Frequency & Box Plots
This topic is all about these two related tools for helping us look at how a data set is spread out. Learn about filling in cumulative frequency tables, plotting the corresponding curves and using the curves to draw box plots and answer questions...
Correlation and Regression
The following is a series of slides and videos that will help you understand, learn about and review this sub-topic. Keep track of your progress and practice the exam questions on this ACTIVITY LINK. Use these slides...
The Normal Distribution
The normal distribution is a fascinating, naturally occurring phenomenon that has very relevant applications to understanding the world around us. When a data set is normally distributed it has some key properties that allow us to make predictions about th
Chi Squared Independence tests
The chi squared independence test is a widely used technique for looking for a relationship between variables that are categorical. We can use a scatter graph to look for a relationship between GDP and life expectancy, but what about GDP and...
