ERDC TN-DOER-C15
July 2000
to as stratified random sampling (Lubin, Williams, and Lin 1995). The sampling method will
ultimately be selected based on the greatest confidence in capturing representative data, quality and
availability of existing information on which to base the method selection, and cost considerations.
Additional discussion can be found in U.S. Environmental Protection Agency/U.S. Army Corps of
Engineers (1995).
Estimating the number of samples required. Ultimately, the number of samples obtained will
be determined by cost considerations. The upper threshold will almost certainly be set by the
number of samples required to determine the desired parameter (e.g., contaminant concentrations,
percent sand) with a specified degree of confidence. If a normally distributed sample can be
assumed, then from the empirical rule, approximately 95 percent of the values will lie within 1.96 s of
the mean, where s is the standard deviation of the sample. Correspondingly, the mean of n samples
will lie within 1.96 s/√n of the true mean at the 95 percent confidence level. An acceptable margin
of error can then be used to estimate the number of samples required. For example, to estimate the
mean concentration of a constituent at a selected depth to within 10 mg/kg at the 95 percent
confidence level, set this margin of error equal to 10:
$$1.96 \, \frac{s}{\sqrt{n}} = 10 \qquad (1)$$
Solving for n gives the number of samples required to determine the mean within 10 mg/kg, at the
95 percent confidence level. Higher or lower confidence levels can be used. Further discussion
can be found in Mendenhall and Beaver (1994). The obvious disadvantage to this method is that
some idea of the variability of the data to be obtained is required prior to sampling. One could use
results from analysis of selected samples taken within the CDF to estimate s and determine how
many additional samples should be analyzed. (The standard deviation for the subsample can be
calculated directly, or the range of the data can be used to estimate s (Appendix I).) If no data are
available, an action level can be used as an estimated value for the variance. Such an iterative
approach is described by Lubin, Williams, and Lin (1995) using a mathematical relation for
estimating sample numbers that does not use the mean, but does incorporate acceptable error levels
(α and β). However, environmental data are typically highly variable (large s), which may result
in unrealistically high numbers of samples required. Additionally, these approaches require the
assumption of a normal distribution, which is not typical of most environmental data. The geometric
alternative variance can be used to estimate required sample size for lognormally distributed data;
this approach is further described in Lubin, Williams, and Lin (1995). Another alternative is to
sample sequentially, evaluating data as they are generated and continuing to sample until a definitive
threshold is achieved at a desired confidence level. The sequential approach and additional methods
for estimating required sample numbers for different grid configurations and confidence levels are
described in Lubin, Williams, and Lin (1995).
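The calculation in Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming normally distributed data and a prior estimate of s; the function name `required_samples` and the value s = 40 mg/kg are hypothetical and are not taken from this note or from STATSS.

```python
import math
from statistics import NormalDist


def required_samples(s, margin, confidence=0.95):
    """Solve z * s / sqrt(n) = margin for n, rounded up to a whole sample.

    s          -- estimated standard deviation (same units as margin)
    margin     -- acceptable margin of error for the mean
    confidence -- two-sided confidence level (0.95 gives z of about 1.96)
    """
    if s <= 0 or margin <= 0:
        raise ValueError("s and margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * s / margin) ** 2)


# Hypothetical prior estimate: s = 40 mg/kg, target margin = 10 mg/kg.
n = required_samples(s=40, margin=10)
print(n)
```

Because n grows with the square of s, doubling the estimated standard deviation roughly quadruples the number of samples, which is why highly variable environmental data can lead to impractically large sample counts.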
Several of the nonparametric data analysis methods require a minimum number of samples and
observations to be valid, or require equally paired numbers of observations between samples to be
compared. For example, the Kruskal-Wallis H-test (nonparametric ANOVA) requires at least three
samples with at least three observations per sample. When there are more than 6 observations per
sample, the distribution of the H statistic is well approximated by the chi-square distribution
(McBean and Rovers 1998). The STATSS document (Lubin, Williams, and Lin 1995) provides simple
guidance for determining the number of samples required for a specified error level or confidence
interval.
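The minimum-size requirements quoted above for the Kruskal-Wallis H-test can be checked before the statistic is computed. The sketch below is an illustration only: the function name `kruskal_wallis_h` is hypothetical, ties receive average ranks, and no tie correction is applied to H, which is a simplification.

```python
def kruskal_wallis_h(groups):
    """Return (H, chi_square_ok) for a list of samples (lists of numbers).

    chi_square_ok is True when every sample has more than 6 observations,
    the condition quoted in the text for the chi-square approximation.
    """
    if len(groups) < 3 or any(len(g) < 3 for g in groups):
        raise ValueError("need at least 3 samples with at least 3 observations each")

    pooled = sorted(value for g in groups for value in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    chi_square_ok = all(len(g) > 6 for g in groups)
    return h, chi_square_ok


# Hypothetical data: three samples of three observations each.
h, chi_square_ok = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h, chi_square_ok)
```

When `chi_square_ok` is False, H would be compared against exact tabulated critical values rather than the chi-square distribution.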