"Estimate a population mean". modules 7 and 8 TODO. ref p115 top
Estimating a population mean. fm p115 top. module 18
Problem: popuMean = __ ; popuSigma = __ ; the population curve is normal.
Is it surprising to find a sample of size n (sampleSize) with sampleMean >= surpriseX?
Formulation: compute P(sampleMean >= surpriseX).
Process:
    # since the population curve is normal, the sample size does not need to be >= 30
    # popuMean is used to calculate surpriseZ
    # sampleSigma is the standard error: the SD of the sampling distribution of sampleMean
    sampleSigma = popuSigma / sqrt(sampleSize)
    # convert surpriseX to surpriseZ
    surpriseZ = (surpriseX - popuMean) / sampleSigma
    surpriseP = p4z(surpriseZ)   # convert Z into its probability P
    # a z-table gives the area to the LEFT of Z; if p4z does too, P(sampleMean >= surpriseX) = 1 - surpriseP
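A minimal Python sketch of the process above. It assumes p4z returns the area to the left of z, as a standard z-table does; here p4z is a hypothetical stand-in built on `math.erf`, not the course's own helper. The numbers in the usage line (popuMean = 100, popuSigma = 15, n = 9, surpriseX = 110) are made-up placeholders, since the notes leave them blank.

```python
import math

def p4z(z):
    """Hypothetical stand-in for the notes' p4z: area under the standard
    normal curve to the LEFT of z (the value a z-table gives)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_sample_mean_at_least(popu_mean, popu_sigma, sample_size, surprise_x):
    """P(sampleMean >= surpriseX), assuming the population curve is normal."""
    # standard error: SD of the sampling distribution of the sample mean
    sample_sigma = popu_sigma / math.sqrt(sample_size)
    # convert surpriseX to surpriseZ
    surprise_z = (surprise_x - popu_mean) / sample_sigma
    # p4z gives the left tail, so the right tail is 1 - p4z(z)
    return 1 - p4z(surprise_z)

# made-up placeholder values: popuMean = 100, popuSigma = 15, n = 9, surpriseX = 110
print(round(prob_sample_mean_at_least(100, 15, 9, 110), 4))
```

With these placeholders the standard error is 15 / 3 = 5, so 110 is z = 2 and the upper-tail probability is about 0.023; a probability that small means such a sample would be surprising.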
Inference for one proportion, summary (m16):
    p0 = the proportion from some published survey. Propose that conditions have since changed that proportion.
    H0 = "null hypothesis", based on the original article and its data: p = p0
    Ha = "alternative hypothesis": p < p0, p != p0, or p > p0
    alpha = significance level, typically 0.05 (could be different)
    determine n for your sample: bNormal # n*p0 >= 10 and n*(1-p0) >= 10 set the minimum
    collect sample data; extract pHat (the sample proportion)
    stdErr = sqrt(p0 * (1-p0) / n)
    zScore = (pHat - p0) / stdErr
    pValue = p4z(zScore)   # careful: use the tail that matches Ha; need both tails (double the one-tail area) if Ha uses 'notEqual'
    if pValue < alpha: reject H0; the data give evidence that Ha is true
    if pValue >= alpha: fail to reject H0; the data are not strong enough to support Ha (this does not prove H0)
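The summary above, written as formulas. The case split for the p-value is the standard one for a one-proportion z-test and spells out the tail warning in the pValue step; Z denotes a standard normal variable.

```latex
\begin{aligned}
&\text{normality: } n p_0 \ge 10 \ \text{and}\ n(1-p_0) \ge 10 \\
&\mathit{SE} = \sqrt{\frac{p_0(1-p_0)}{n}}, \qquad z = \frac{\hat{p} - p_0}{\mathit{SE}} \\
&\text{p-value} =
\begin{cases}
P(Z \le z) & \text{if } H_a: p < p_0 \\
P(Z \ge z) & \text{if } H_a: p > p_0 \\
2\,P(Z \ge |z|) & \text{if } H_a: p \ne p_0
\end{cases} \\
&\text{reject } H_0 \text{ when p-value} < \alpha
\end{aligned}
```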
## relation = one of 'LT', 'NE', 'GT'
def hypothesize(self, p0, pHat, relation, n, alpha):
    # bNormal(self, n, p0): normality test
    # print H0, Ha
    # calc standard error
    stndErr = stdError(self, n, p0)
    zValue = zScore(self, stndErr, p0, pHat)   # = (pHat - p0) / stndErr
    # TODO: write something to return the area for zValue from the table; include x's as well as y's
    # get area
    # trim/fix area value
    # apply
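A runnable sketch of the hypothesize outline above, written as plain functions rather than methods (the notes' `self` parameter is dropped). The normal-curve area comes from `math.erf` instead of a printed z-table, and `b_normal`, `std_error`, `z_score` and `p4z` are hypothetical adaptations of the notes' helper names, not an existing library.

```python
import math

def p4z(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def b_normal(n, p0):
    """Normality check: at least 10 expected successes and 10 expected failures."""
    return n * p0 >= 10 and n * (1 - p0) >= 10

def std_error(n, p0):
    """Standard error of pHat, computed under H0 (p = p0)."""
    return math.sqrt(p0 * (1 - p0) / n)

def z_score(stnd_err, p0, p_hat):
    return (p_hat - p0) / stnd_err

def hypothesize(p0, p_hat, relation, n, alpha=0.05):
    """One-proportion z-test. relation ('LT', 'NE', 'GT') is the form of Ha.
    Returns (z, p_value, reject_h0)."""
    if relation not in ("LT", "NE", "GT"):
        raise ValueError("relation must be 'LT', 'NE' or 'GT'")
    if not b_normal(n, p0):
        raise ValueError("n*p0 and n*(1-p0) must both be >= 10")
    symbol = {"LT": "<", "NE": "!=", "GT": ">"}[relation]
    print(f"H0: p = {p0}    Ha: p {symbol} {p0}")
    z = z_score(std_error(n, p0), p0, p_hat)
    if relation == "LT":
        p_value = p4z(z)                 # left tail
    elif relation == "GT":
        p_value = 1 - p4z(z)             # right tail
    else:
        p_value = 2 * (1 - p4z(abs(z)))  # both tails
    return z, p_value, p_value < alpha
```

For example, with invented sample numbers, `hypothesize(0.3, 0.38, "GT", 100)` gives z of about 1.75 and a one-tailed p-value of about 0.04, so H0 is rejected at alpha = 0.05; the same data with `"NE"` doubles the p-value to about 0.08, so H0 is not rejected.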
The National Health Survey uses household interviews to describe the health-related habits of U.S. adults. From these interviews they estimate population parameters associated with behaviors such as alcohol consumption, cigarette smoking, and hours of sleep for all U.S. adults.
In the 2005-2007 report, they estimated that 30% of all current smokers started smoking before the age of 16. Imagine that we want to verify this estimate. So we randomly select a sample of 100 smokers and calculate the proportion who started smoking before the age of 16. How much error do we expect in the sample proportions if the 30% is correct for the population overall? Use the applet and give an error based on 2 standard deviations. (Why 2 SDs? Don't we want 'typical'?)
Applet: set these values
    p = 0.3       // population proportion of smokers who started before age 16
    n = 100       // sample size of 100, as instructed
    pHat: slider  // the red line and pHat are not needed for this answer
Sampling distribution (applet output): mean = 0.2999, standard deviation = 0.0458
Answer: 2 x 0.0458 = 0.0916, about 0.09
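We are instructed to use 2 standard deviations for the margin of error. So sample proportions are expected to differ from 0.30 by less than 2 x SD, about 0.09 (roughly 0.21 to 0.39). Why 2 SDs: one SD is the 'typical' error, but when the sampling distribution is roughly normal, about 95% of sample proportions fall within 2 SDs of the mean, so 2 SDs covers nearly all expected errors.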
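The applet's numbers can be checked two ways in Python: the formula for the SD of sample proportions, sqrt(p(1-p)/n), and a small simulation that draws many random samples of 100 smokers. The simulation is only a rough imitation of what the applet does (its exact method is not described in these notes); the seed and the 10,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

p, n = 0.3, 100

# formula for the SD of the sampling distribution of pHat
sd_formula = math.sqrt(p * (1 - p) / n)
margin = 2 * sd_formula
print(f"SD = {sd_formula:.4f}, margin of error = {margin:.4f}")
print(f"expected range: {p - margin:.2f} to {p + margin:.2f}")

# simulation: draw n smokers at a time, each started before 16 with probability p
random.seed(1)
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(10_000)]
print(f"simulated mean = {statistics.mean(p_hats):.4f}, "
      f"SD = {statistics.stdev(p_hats):.4f}")

# share of simulated pHat values within 2 SDs of p (roughly 95% expected)
within = sum(abs(ph - p) <= margin for ph in p_hats) / len(p_hats)
print(f"fraction within 2 SDs: {within:.3f}")
```

The formula gives SD = 0.0458 and a margin of about 0.092, matching the applet's 0.0458 and the 0.09 answer.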
Use the applet to conduct a simulation. Which option gives the right answer and the best explanation?
Settings: p = 0.2, n = 20, pHat = 0.2. The histogram-like plot shows 0.3 just beyond 1 SD from the mean (mean 0.2019, std dev 0.0906), "so an error of 0.10 is not unusual..."
Same Skittles problem, except the bag now has 100 candies, so n = 100 is the only change needed. Now the std dev is 0.04, and a sample proportion of 0.3 (vs. a mean of 0.2) is 2.5 std devs from the mean, so it is 'unusual' (= 'surprising').
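A quick check of the two Skittles cases: how many SDs a sample proportion of 0.3 lies from p = 0.2 when n = 20 versus n = 100. It uses the formula SDs (about 0.089 and 0.04) rather than the applet's simulated ones, and treats "more than 2 SDs away" as unusual, the same 2-SD cutoff the notes use for the smokers problem. The function name `sds_from_mean` is hypothetical.

```python
import math

def sds_from_mean(p, n, p_hat):
    """How many SDs p_hat lies from p in the sampling distribution of sample proportions."""
    sd = math.sqrt(p * (1 - p) / n)
    return (p_hat - p) / sd

for n in (20, 100):
    z = sds_from_mean(0.2, n, 0.3)
    verdict = "unusual" if abs(z) > 2 else "not unusual"
    print(f"n = {n:3d}: 0.3 is {z:.2f} SDs from 0.2 -> {verdict}")
```

Only the sample size changes, yet the same 0.10 error goes from about 1.1 SDs (not unusual) to 2.5 SDs (unusual), because the SD shrinks with sqrt(n).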
Learn by Doing, p77.85 // Would we be surprised if an exit poll sample showed 35% when P = 40%? 'Surprised' implies 'unusual' (to me); I think that works for Them too. n = 100 (again), p = 0.4; ran the applet again. Mean about 0.4, std dev 0.049, or about 0.05. So unusually (low) would be below mean - 2 x 0.05 = 0.4 - 0.1 = 0.3, and the value given is 0.35, so it's not unusual (surprising). ALL wrong: a _local_ exit poll is not a random sample, so no such conclusions can be drawn!