Kelly Martin wrote:
HRIP7 wrote:
The most important thing is the proportion of the population that has the feature of interest (in this case, paid editing). If only one editor in 500 were paid, you'd need a fairly large sample (a lot more than 500) to obtain a reliable estimate of the actual proportion. If every second editor were paid (equivalent to tossing a fair coin), you'd get a fairly good result from just 50 editors. The population size is irrelevant in comparison.
There is a formula (which I don't remember, and my stats reference is buried in a box somewhere) that relates sample size to confidence interval for samples seeking to measure a population proportion. This formula includes the actual population proportion, which, of course, creates a chicken-and-egg problem, since you can't know the population proportion, even imprecisely, until you take the sample. I don't recall the exact formula but I know it includes p and (1-p) as terms, achieves a minimum at p=0.5, and is infinite at p=0 or p=1.
If the sample size is S, and your samples are properly random with replacement, then the number of your targeted subpopulation in your sample will be binomially distributed with mean p S and variance p (1 - p) S. If both S and p S are sufficiently large for the normal approximation to be usable, then a sample size of S = x**2 / (p (1 - p)) will give you a confidence level of c that the number of your targeted subpopulation in your sample lies within x standard deviations of the mean, where c is the probability that a standard normal variate lies between -x and x. I suspect this is the formula you're referring to.
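For anyone who wants to play with the numbers, here is a rough Python sketch of the quantities above: the binomial mean and standard deviation for a given p and S, and the confidence level c that corresponds to a given x under the normal approximation. The function names are mine, invented for illustration, and the sketch assumes the normal approximation is usable.

```python
from statistics import NormalDist


def binomial_mean_sd(p, S):
    """Mean and standard deviation of the count of 'hits' in S draws
    with replacement, when each draw is a hit with probability p."""
    mean = p * S
    sd = (p * (1 - p) * S) ** 0.5
    return mean, sd


def confidence_for(x):
    """Probability that a standard normal variate lies between -x and x."""
    return 2 * NormalDist().cdf(x) - 1


if __name__ == "__main__":
    print(binomial_mean_sd(0.5, 400))
    print(confidence_for(1.96))
```

`NormalDist` comes from the standard library's `statistics` module (Python 3.8 or later), so nothing extra needs installing.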
However, if you want a confidence level c that the number of your targeted subpopulation in your sample lies within a fraction f of the mean, you need x**2 p (1 - p) S = (f p S)**2; i.e. S = (x/f)**2 (1 - p)/p. For 95% confidence, you need x ≈ 1.96, so if p ≈ 0.5 then to have 95% confidence that you're within 10% of the mean, for example, you need S ≈ 19.6**2 ≈ 384. Of course, as you point out, you don't actually know what p is, and the smaller it is, the larger you need to make S to achieve a given confidence level.
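To see how quickly S grows as p shrinks, the relation S = (x/f)**2 (1 - p)/p can be sketched in a few lines of Python. This is only an illustration under the same normal-approximation assumption; the function name and the choice to round up to a whole number of samples are mine.

```python
import math
from statistics import NormalDist


def sample_size_relative(p, f, c=0.95):
    """Sample size S such that, with confidence c, the count in the sample
    lies within a fraction f of its mean p*S (normal approximation)."""
    # x is the two-sided standard normal quantile for confidence level c.
    x = NormalDist().inv_cdf((1 + c) / 2)
    # S = (x/f)**2 * (1 - p)/p, rounded up to a whole number of samples.
    return math.ceil((x / f) ** 2 * (1 - p) / p)


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01, 1 / 500):
        print(p, sample_size_relative(p, f=0.1))
```

Rounding up means the p = 0.5 case comes out one above the "approximately 384" figure, and the one-in-500 case runs well into six figures, which is the point about small p.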