COPYRIGHT: This article is copyrighted and is not to be used without
proper acknowledgment and citation. It will appear as a chapter in
Encyclopedia of Biostatistics, edited by Peter Armitage and
Theodore Colton (Editors-in-Chief), to be published by John Wiley in summer, 1998 as six
volumes. The article will be in a section titled "Design of
Experiments and Sample Surveys", edited by Paul Levy.
INTRODUCTION
In the past ten years many researchers in the health sciences
have become interested in performing secondary analyses using
data from complex sample surveys. These analyses are descriptive,
analytical, hypothesis generating or model building. Sample survey
statisticians are aware that specialized software should be used
to analyze complex sample survey data, particularly when analyses
are descriptive or analytical and the survey design includes clustering
( [6], [7] ). Carlson's [3] article in this volume reviews some
currently available sample survey software packages.
However, some scientists are not aware of the need to use specialized
software or, if aware, prefer not to do so because of the need
to learn a new software package. Secondary data analysts may be
confused when they realize that there is a difference of opinion,
even among sample survey statisticians, as to when it is necessary
to use specialized software for analysis of sample survey data
( [4], [5], [10] ). Finally, many biostatisticians are not able
to offer advice on this topic because they are not familiar with
the specialized data analysis issues for sample survey data.
This paper uses sample survey data from BRFSS (Behavioral Risk
Factor Surveillance System) surveys to illustrate that biased
point estimates, inappropriate standard errors and confidence
intervals, and misleading tests of significance can result from
using standard statistical software packages to analyze sample
survey data. General recommendations are given to indicate situations
in which serious errors are likely to occur with the use of standard
statistical software packages for sample survey data.
CAUTIONS IN USING STANDARD STATISTICAL SOFTWARE PACKAGES
Standard statistical software packages generally do not take into
account four common characteristics of sample survey data: (1)
unequal probability selection of observations, (2) clustering
of observations, (3) stratification and (4) nonresponse and other
adjustments [2]. Point estimates of population parameters are
impacted by the value of the analysis weight for each observation.
These weights depend upon the selection probabilities and other
survey design features such as stratification and clustering.
Hence, standard packages will yield biased point estimates if
the weights are ignored. Estimated variance formulas for point
estimates based on sample survey data are impacted by clustering,
stratification and the weights. By ignoring these aspects, standard
packages generally underestimate the estimated variance of a point
estimate, sometimes substantially so.
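The effect of ignoring the weights on a point estimate can be illustrated with a minimal sketch. The eight respondents and their weights below are hypothetical and are not BRFSS data; the function names are illustrative only. Respondents from a heavily sampled small population carry small weights, and respondents from a lightly sampled large population carry large weights.

```python
# Hypothetical toy sample: (has_diabetes, analysis_weight) per respondent.
# These values are invented for illustration and are not BRFSS data.
sample = [
    (1, 100.0), (0, 100.0), (1, 100.0), (0, 100.0),      # heavily sampled small state
    (0, 5000.0), (0, 5000.0), (0, 5000.0), (1, 5000.0),  # lightly sampled large state
]


def unweighted_prevalence(rows):
    """Percent with the condition, counting every respondent once."""
    return 100.0 * sum(y for y, _ in rows) / len(rows)


def weighted_prevalence(rows):
    """Percent with the condition, counting each respondent by its weight."""
    total_weight = sum(w for _, w in rows)
    return 100.0 * sum(y * w for y, w in rows) / total_weight


print(unweighted_prevalence(sample))
print(weighted_prevalence(sample))
```

In this toy sample the unweighted estimate is 37.5% while the weighted estimate is about 25.5%, because the unweighted calculation gives the small state the same influence as the large one.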
Most standard statistical packages can perform weighted analyses,
usually via a WEIGHT statement added to the program code. Use
of standard statistical packages with a weighting variable may
yield the same point estimates for population parameters as sample
survey software packages. However, the estimated variance often
is not correct and can be substantially wrong, depending upon
the particular program within the standard software package.
DESCRIPTION OF BRFSS SURVEYS
The BRFSS [13] program was established by CDC (Centers for Disease
Control and Prevention) to provide state level data to estimate
the prevalence of risk factors for disease and poor health. States
select a continuous probability sample of the adult noninstitutionalized
population using some type of random digit dialing (RDD) telephone
sampling [9]; the Mitofsky-Waksberg technique [16] is commonly
used. Once a residence is reached, almost all states select one
adult, with equal probability, to undergo a telephone interview.
States generally interview about 1500 to 4500 adults per year.
BRFSS surveys result in an unequal probability sample of adults,
primarily because only one adult per sampled household is selected.
Weighting adjustments generally are done for enumeration nonresponse
(a roster of adults within the household was not obtained) and/or
for interview nonresponse (the selected adult was not interviewed).
Further, poststratification of the observations to U.S. Census
data generally is done. Hence, each observation in the dataset
has a value for the variable FINALWT (final analysis weight).
This value indicates the number of persons in the population
represented by that observation. The value of FINALWT varies
across observations, sometimes considerably, depending upon the
state's sampling plan.
In addition to differential weighting, the statewide sample generally
is clustered by telephone bank (usually defined as a group of
telephone numbers with identical area code, prefix and first two
digits of the suffix). Some states use stratification in their
sampling process. Although BRFSS sampling details differ across
the states, each statewide BRFSS survey typically is weighted
and clustered, and a few are stratified.
This paper uses calendar year 1993 BRFSS data on diabetes for
the six states given in Table 1, yielding a total sample size
of 20,049 observations (completed interviews) over the six states.
Presence/absence of diabetes is defined as a yes/no answer to
"Have you ever been told by a doctor that you have diabetes";
the few observations with other than a yes/no answer are excluded
from all analyses.
METHODS FOR COMPARISON OF TWO SOFTWARE PACKAGES
In order to illustrate the general cautions expressed above, BRFSS
data were analyzed with a standard statistical package (SAS System
for Windows, Version 6.11 [12] ) and a specialized sample survey
software package (SUDAAN, version 7.0, [15] ), hereafter referred
to as SAS and SUDAAN. Analyses with SUDAAN are considered to
provide the correct answers. Sample survey packages other than
SUDAAN would provide the same point estimates as SUDAAN and identical
or very close estimated variances, depending upon the variance
estimation procedure used. Two common variance estimation techniques
are Taylor Series linearization ( [8], [14] ) as in SUDAAN and
replication procedures [11] as in WesVar [1]. Other standard statistical
packages are assumed to provide the same results as SAS for unweighted
analyses and for point estimates for weighted analyses. However,
other standard statistical packages may use different default
formulas for estimating variability in weighted analyses.
Each state's sampling plan was described to SUDAAN in the same
way; within a state no stratification was used and observations
were clustered in their appropriate primary sampling unit (PSU),
usually a telephone bank. In order to perform analyses over
all six states, as well as state-specific analyses, the six state
concatenated dataset was described to SUDAAN as a stratified (by
state) multi-stage clustered survey. The finite population correction
factor was not used in estimated variance calculations. For those
familiar with SUDAAN, the PROC statement included DESIGN = WR,
the NEST statement included the stratification variable STATE
and the clustering (telephone bank) variable PSU, and the WEIGHT
statement included the variable FINALWT.
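For readers who want to see what a design of this kind implies for variance estimation, the following is a minimal sketch of the textbook Taylor Series linearization variance for a weighted mean under a stratified design with PSUs treated as sampled with replacement. It is not SUDAAN's code and is not guaranteed to reproduce SUDAAN output; the function name and record layout are assumptions made for illustration, with stratum, psu and weight playing the roles of STATE, PSU and FINALWT.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(records):
    """Weighted mean and linearization standard error.

    records: iterable of (stratum, psu, weight, y) tuples.
    Assumes with-replacement sampling of PSUs within strata and
    no finite population correction.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Sum the linearized values w * (y - mean) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        z = list(psus.values())
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in z)
    return mean, sqrt(variance)


# Hypothetical records: (state, telephone bank, weight, diabetes 0/1).
toy = [
    ("CA", "b1", 6000.0, 1), ("CA", "b1", 5000.0, 0), ("CA", "b2", 7000.0, 0),
    ("WV", "b3", 300.0, 0), ("WV", "b3", 250.0, 1), ("WV", "b4", 400.0, 0),
]
print(weighted_mean_with_se(toy))
```

As a check on the formula, when every observation is its own PSU, all weights are equal and there is a single stratum, the result reduces to the familiar simple random sampling standard error, the square root of s squared over n.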
SAS analyses were conducted using four different approaches, all
of which ignored the clustering and stratification. The first
SAS approach analyzed the dataset unweighted; this is equivalent
to using the WEIGHT statement with the variable _ONE_ (a variable
whose value is 1.0 for every observation in the dataset). Table
1 (column 2) shows that, with this approach, the CA sample size
is 3719 and it contributes 19% to the total inference population.
The second SAS approach used the WEIGHT statement with the variable
FINALWT. There is great variability in FINALWT; Table 1 (column
3) indicates its range as 70 to 72,663. Table 1 also shows that
using FINALWT implies that the CA sample contributes 50% to the
total inference population, rather than only 19% in an unweighted
analysis.
The third SAS approach used the WEIGHT statement with the variable
NORMWT, a normed weight based on FINALWT. This approach is recommended
by some data analysts as giving results approximately equal to
results from a sample survey package. For person j within state
i, let finalwt(i,j) be the value of the variable FINALWT. Then,
the value of NORMWT for this person is defined as:
normwt(i,j) = (20,049) * [finalwt(i,j)] / (45,452,569)
The figure 45,452,569 is the estimated total adult population
of the six states, which is the sum of the value of FINALWT over
all 20,049 observations (Table 1, column 3). The variable NORMWT
has values less than 1.0 and greater than 1.0, and the sum of
the values of NORMWT over the entire dataset is 20,049, the total
sample size. Table 1 (column 4) shows that, with this approach,
the CA sample contributes 50% to the total inference population.
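The norming step can be sketched in a few lines. The four weights below are hypothetical stand-ins for FINALWT (only the extremes 70 and 72,663 come from the text), and the function names are illustrative. The sketch shows that the normed weights sum to the sample size and that a weighted mean is unchanged by norming, since every weight is multiplied by the same constant.

```python
def normed_weights(final_weights):
    """Rescale weights so that they sum to the number of observations."""
    n = len(final_weights)
    total = sum(final_weights)
    return [n * w / total for w in final_weights]


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: diabetes coded 0/100 as in the PROC MEANS analyses.
finalwt = [70.0, 500.0, 72663.0, 1200.0]
diabetes = [100, 0, 0, 100]

normwt = normed_weights(finalwt)
print(sum(normwt))                       # equals the sample size, up to rounding
print(weighted_mean(diabetes, finalwt))  # point estimate with FINALWT
print(weighted_mean(diabetes, normwt))   # same point estimate with NORMWT
```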
The fourth SAS approach used the WEIGHT statement with the variable
STNORMWT, a second normed weight calculated from FINALWT, where
the norming is done within state. Hence, the sum of the values
of STNORMWT over all observations within a state equals the sample
size for that state (Table 1, column 5). Clearly, the sum of
STNORMWT over the entire sample equals the total sample size 20,049.
Table 1 shows that, with this approach, the CA sample contributes
19% to the total inference population.
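Norming within state can be sketched similarly. The two "states" and their weights below are hypothetical, and the function name is illustrative. The sketch shows why STNORMWT returns each state to its share of the sample rather than its share of the population.

```python
from collections import defaultdict


def state_normed_weights(rows):
    """rows: list of (state, finalwt). Returns STNORMWT values aligned with rows.

    Within each state the weights are rescaled to sum to that state's
    sample size.
    """
    n_by_state = defaultdict(int)
    w_by_state = defaultdict(float)
    for state, w in rows:
        n_by_state[state] += 1
        w_by_state[state] += w
    return [n_by_state[s] * w / w_by_state[s] for s, w in rows]


# Hypothetical sample: a large state with few respondents per person
# represented, and a small state sampled at a much higher rate.
rows = [("CA", 6000.0), ("CA", 4000.0),
        ("WV", 100.0), ("WV", 200.0), ("WV", 300.0)]
stnormwt = state_normed_weights(rows)

ca_share_final = sum(w for s, w in rows if s == "CA") / sum(w for _, w in rows)
ca_share_stnorm = sum(v for (s, _), v in zip(rows, stnormwt) if s == "CA") / sum(stnormwt)
print(stnormwt)
print(ca_share_final, ca_share_stnorm)
```

In this toy example the large state accounts for about 94% of the FINALWT total but only 40% of the STNORMWT total, which is its share of the five respondents. This is the mechanism behind the biased total-population estimate reported for STNORMWT.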
SUDAAN and the four SAS methods were compared first on a descriptive
analysis, i.e. estimation of diabetes prevalence for the total
population (six states combined) and for each state. PROC DESCRIPT
in SUDAAN was used, although PROC CROSSTAB in SUDAAN could also
have been used, where the diabetes variable was coded as 1 or
2. PROC MEANS was used for all four SAS methods to obtain the
estimated prevalence and estimated standard error for the point
estimate; the diabetes variable was coded as 0 (no diabetes)
or 100 (have diabetes).
Second, SUDAAN and the four SAS methods were compared on a chi-square
analysis to test the null hypothesis of no relationship between
gender and diabetes. These analyses were performed for the total
population and for each state. PROC CROSSTAB was used in SUDAAN
and PROC FREQ was used in SAS with diabetes coded as a categorical
variable (1,2).
RESULTS
Descriptive Analyses
Table 2 (columns 2 and 3) shows that unweighted SAS, compared
to SUDAAN, overestimates the prevalence of diabetes by about 10%
of the SUDAAN point estimate for the total population (5.40% versus
4.86%) and for half of the states. Note also that the estimated
standard errors in Table 2 are smaller for unweighted SAS than
for SUDAAN. For the entire population the SUDAAN estimated standard
error is 35% larger than the standard error estimated by unweighted
SAS (.219 versus .160). The combination of the biased point estimate
and underestimation of the standard error could result in quite
misleading confidence intervals for the prevalence of diabetes.
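To make the consequence concrete, the sketch below computes approximate 95% confidence intervals from the total-population estimates and standard errors quoted above. The normal-approximation interval (estimate plus or minus 1.96 standard errors) is an assumption made here for illustration; the article does not state how intervals would be formed.

```python
def ci95(estimate, standard_error, z=1.96):
    """Normal-approximation 95% confidence interval (an assumed method)."""
    half_width = z * standard_error
    return (estimate - half_width, estimate + half_width)


# Total-population prevalence (%) and standard error quoted in the text.
sas_unweighted = ci95(5.40, 0.160)
sudaan = ci95(4.86, 0.219)

print("SAS unweighted:", sas_unweighted)
print("SUDAAN:        ", sudaan)
```

Under this approximation the unweighted SAS interval runs from roughly 5.09% to 5.71%, while the SUDAAN interval runs from roughly 4.43% to 5.29%. The unweighted interval is both narrower and shifted upward, and it excludes the SUDAAN point estimate of 4.86%.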
Table 2 (columns 4 and 5) shows that SAS with FINALWT and SAS with NORMWT
give identical results, with the SAS point estimates the same
as SUDAAN but the SAS estimated standard errors still lower than
SUDAAN. The magnitude of SAS underestimation of the standard
error with FINALWT or with NORMWT is somewhat worse than with
SAS unweighted. The advantage of using SAS with FINALWT or NORMWT,
compared to SAS unweighted, is that correct point estimates are
obtained.
Table 2 (columns 2 and 6) shows that SAS with STNORMWT gives identical
results to SAS with FINALWT or NORMWT for state specific analyses
but yields a biased point estimate for the total population along
with an underestimated standard error.
Chi-Square Analyses
The chi-square analysis tests the null hypothesis that the prevalence
of diabetes is the same for males and females. Table 3 (columns
2 and 3) shows that unweighted SAS, compared to SUDAAN, yields
a higher value for the chi-square statistic for the entire population,
giving a smaller P-value (.003 versus .014). A comparison of
unweighted SAS with SUDAAN, state by state, shows no consistent
pattern; the P-value for unweighted SAS is sometimes higher and
sometimes lower than for SUDAAN. However, since it was noted
above that unweighted SAS yields biased estimates of diabetes
prevalence, unweighted SAS probably should not be given serious
consideration in an analysis to determine if diabetes prevalence
differs by sex.
Table 3 (columns 2 and 4) shows that SAS with FINALWT, compared
to SUDAAN, yields an unreasonably large and suspicious value of
the chi-square statistic for the total population and for each
state. For this reason P-values are not included in Table 3 for
SAS with FINALWT. PROC FREQ in SAS, with FINALWT, considers the
sample size to be the sum of the values of FINALWT (i.e. 45,452,569)
as opposed to the actual sample size of 20,049 for the total
population. This is the reason for the very large values of the
chi-square statistic.
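The size of this inflation follows from the form of the Pearson chi-square statistic: if every cell of a table is multiplied by a constant, the statistic is multiplied by the same constant. The sketch below demonstrates this with a hypothetical 2 x 2 table (not BRFSS counts), scaled by the ratio of the FINALWT total to the sample size given in the text. It illustrates the textbook Pearson statistic, not the internals of PROC FREQ.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are gender, columns are diabetes yes/no.
counts = [[40, 960], [60, 940]]

# Ratio of the sum of FINALWT to the actual sample size (from the text).
factor = 45_452_569 / 20_049
inflated = [[cell * factor for cell in row] for row in counts]

base = pearson_chi_square(counts)
print(base)
print(pearson_chi_square(inflated) / base)  # approximately equal to factor
```

With a scaling factor above 2,000, even a modest chi-square value becomes enormous, which is consistent with the suspicious values reported for SAS with FINALWT.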
Table 3 (columns 2 and 5) shows that SAS with NORMWT, compared
to SUDAAN, yields a chi-square value for the six state area which
is twice as large (12.64 versus 6.07). However, this relationship
between SUDAAN and SAS with NORMWT does not hold for each of the
six states in Table 3. Compared to SUDAAN, SAS with NORMWT yields
a larger chi-square statistic value for some states but a smaller
value for other states. This occurs because, within each state,
SAS considers the sample size as the sum of NORMWT. Hence, the
sample size for CA is artificially inflated to 10,049 from 3719,
whereas the sample size for WV is artificially deflated to 598
from 2425 (see Table 1). Thus, the chi-square statistic using
SAS with NORMWT, compared to SUDAAN, is much larger for CA but
much smaller for WV.
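The same scaling argument gives a rough sense of the distortion within a state. Within any one state, NORMWT is a constant multiple of a weight normed to the state's actual sample size, so the Pearson chi-square computed with NORMWT is that same multiple of the statistic computed with weights summing to the actual sample size. The sketch below simply evaluates that multiple from the figures quoted above; the function name is illustrative.

```python
def chi_square_scale_factor(sum_of_normwt, actual_sample_size):
    """Factor by which the Pearson chi-square is scaled when the weights
    sum to sum_of_normwt instead of the actual sample size."""
    return sum_of_normwt / actual_sample_size


# Sums of NORMWT and actual sample sizes quoted in the text.
ca_factor = chi_square_scale_factor(10049, 3719)
wv_factor = chi_square_scale_factor(598, 2425)

print(round(ca_factor, 2))
print(round(wv_factor, 2))
```

On these figures the CA statistic is inflated by a factor of about 2.7 and the WV statistic is shrunk to about one quarter, relative to a weighted analysis that respects the state sample sizes.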
Table 3 (columns 2 and 6) shows that SAS with STNORMWT, compared
with SUDAAN, yields a chi-square statistic for the total population
about twice as large as SUDAAN (13.20 versus 6.07). For each
state the chi-square statistic based on SAS with STNORMWT is about
15% to 20% larger than provided by SUDAAN. Because STNORMWT
is normed within a state, the sum of the weights reflects the statewide
sample size. Hence, SAS with STNORMWT shows the common pattern
that SAS generally calculates a larger value of the chi-square
statistic than does SUDAAN.
DISCUSSION
Unweighted Analyses with Standard Statistical Software
Although the empirical evidence in this paper is based only on
one type of survey (BRFSS), only on six states and only on 1993
data, the findings are consistent with other similar investigations
[7]. Using a standard statistical package with unweighted analyses
to analyze sample survey data generally will yield (1) biased
point estimates of population parameters, (2) underestimates of
the standard error for point estimates, (3) confidence intervals
on population parameters which are too narrow, and (4) tests of
significance which are too likely to reject the null hypothesis
because the standard errors or variability in the data generally
are underestimated.
The extent of the bias in unweighted point estimates will depend
upon the particular dataset and is related to the variability
of the FINALWT variable. If FINALWT has little variability in
the dataset, then an unweighted point estimate will be close to
a weighted point estimate. In the six state BRFSS dataset, the
value of FINALWT ranged from 70 to 72,663 over the six states.
This extreme variability in the value of FINALWT primarily is
due to varying sampling fractions across the states, i.e. a small
variation in state sample size (2400 to 4400) but widely different
statewide populations (1.4 to 22.8 million).
Another factor which contributes to the bias of estimates based
on unweighted analyses is the relationship between the value of
FINALWT and the variable being analyzed. In the dataset used
here the value of FINALWT is primarily influenced by the sampling
fraction in each state; in effect, certain states are
"oversampled". If state were strongly related to the
analysis variable (diabetes), then point estimates of diabetes
prevalence from unweighted analyses could be seriously biased.
In this dataset, the estimated statewide prevalences of diabetes
do not differ dramatically, ranging from 4% to 6%. If blacks
had been oversampled within each state to a large extent, then
the bias in estimated diabetes prevalence using unweighted analyses
would be substantial and positive, since blacks have a higher
prevalence of diabetes than do whites.
In addition to potentially biased point estimates from unweighted
analyses, standard errors and other measures of variability generally
are underestimated due to clustering and variability in FINALWT.
The intracluster correlation coefficients in BRFSS datasets generally
are positive but not substantial. This might be expected from
the Mitofsky-Waksberg RDD technique and the fact that most states
only have about three completed interviews per PSU (telephone
bank). Variability in FINALWT, rather than clustering, is probably
the most important factor contributing to the higher estimated
variances from SUDAAN in this BRFSS dataset. If other sample
survey datasets had been used with a higher degree of intra-cluster
correlation, unweighted analyses would have produced even smaller
estimates of variability, compared to SUDAAN.
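One common way to gauge how much weight variability alone might inflate variance is Kish's approximate design effect due to unequal weighting, n times the sum of squared weights divided by the square of the sum of weights (equivalently, one plus the squared coefficient of variation of the weights). This approximation is not used in the analyses reported here; it is offered only as a rough diagnostic, and the weights below are hypothetical.

```python
def kish_weighting_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    n * sum(w^2) / (sum(w))^2. Equals 1.0 when all weights are equal."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


equal = [250.0, 250.0, 250.0, 250.0]
variable = [1.0, 1.0, 1.0, 3.0]

print(kish_weighting_deff(equal))
print(kish_weighting_deff(variable))
```

A value well above 1 suggests that standard errors from an analysis which ignores weight variability are likely to be too small, even before clustering is considered.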
Weighted Analyses with Standard Statistical Software
Using weighted analyses with FINALWT or NORMWT produces unbiased
point estimates of prevalence for the entire population over all
six states and for any strata (states) of interest. Although
not illustrated, these weighted analyses also yield unbiased point
estimates of diabetes prevalence among subpopulations based on
other characteristics, such as race or gender, where the subpopulations
contain observations from all or some strata.. Hence, either
of these two weighted techniques are fine if only point estimates
of prevalence are desired. Weighted SAS using FINALWT or NORMWT
tends to underestimate the standard error of estimated prevalences.
The degree of underestimation depends upon the size of the intra-cluster
correlation coefficient for the variables being analyzed. The
higher the intra-cluster correlation, the more serious the underestimation
of the variability. Weighted analyses using NORMWT or FINALWT
can be a reasonable analytical approach for point estimates of
population parameters under the following condition: all intra-cluster
correlation coefficients are near zero.
However, SAS with FINALWT in PROC FREQ gives substantially incorrect
results because the sample size is assumed to be the population
size. Whether this is true in other standard statistical packages
depends upon the packages' default options for weighted analyses
in chi-square tests.
SAS with NORMWT in PROC FREQ gives a larger chi-square statistic
than does SUDAAN for the entire population, about twice as large.
However, this procedure yields substantially incorrect chi-square
statistics for state-specific analyses. The state specific analyses
are wrong because the incorrect sample size is assumed for the
state analyses. This will occur also whenever subpopulations
are analyzed using NORMWT and the variable which defines the subpopulation
is related to the value of FINALWT.
SAS with the second normed weight, STNORMWT, gives more reasonable
values for the chi-square statistic for state level analyses,
although the chi-square statistics were always larger than with
SUDAAN. However, if the weight STNORMWT is used for analyses
over the entire population, a biased point estimate is obtained
for population parameters.
Conclusions
In searching for an approach to analyze sample survey data with standard statistical software, two reasonable criteria are:
Based on the empirical results above, weighted analyses with
FINALWT or NORMWT are the only approaches of the four considered
which yield unbiased point estimates for populations and subpopulations.
FINALWT should not be used with SAS PROC FREQ because the sample
size is interpreted to be the population size. Hence, this leaves
only the option of a weighted analysis using NORMWT. However,
as shown above, weighted analyses with NORMWT can yield
quite misleading results in subpopulation analyses.
It is recommended that sample survey software be used to analyze
sample survey data, especially for estimation of population parameters,
descriptive analyses and analytical analyses. Under certain circumstances,
standard statistical packages can be used to provide results approximately
equal to the results obtained from survey software. However,
recognition of these circumstances and awareness of the potential
pitfalls of using standard statistical packages requires detailed
information about the characteristics of the survey dataset (e.g.
sampling plan, weighting scheme, intracluster correlation) as
well as knowledge of the particular formulas and default options
used by the standard software package for weighted analyses.
In the end, it seems easier and less time consuming to use a sample
survey software package.
REFERENCES
[1] Brick JM, Broene P, James P and Severynse J (1996). A
User's Guide to WesVarPC, Westat, Inc., Rockville, MD.
[2] Brick JM and Kalton G (1996). Statistical Methods in
Medical Research, 5, 215-238.
[3] Carlson B. An article in EOB about software for variance
estimation in sample surveys.
[4] Graubard BI and Korn EL (1996). Statistical Methods in
Medical Research, 5, 263-281.
[5] Groves RM (1989). Survey Errors and Survey Costs,
John Wiley, New York.
[6] Korn EL and Graubard BI (1991). American Journal of Public
Health, 81(9), 1166-1173.
[7] Landis JR, Lepkowski JM, Eklund SA, and Stehouwer SA (1982).
A Statistical Methodology for Analyzing Data from a Complex Survey:
the First National Health and Nutrition Examination Survey. Vital
and Health Statistics, 2(92), DHEW, Washington, DC.
[8] LaVange LM, Stearns SC, Lafata JE, Koch GG, and Shah BV (1996).
Statistical Methods in Medical Research, 5, 311-329.
[9] Lepkowski JM, (1988). In Telephone Survey Methodology,
RM Groves, PP Biemer, LE Lyberg, JT Massey, WL Nicholls, and J
Waksberg, eds. John Wiley, New York, pp. 73-98.
[10] Pfeffermann D (1996). Statistical Methods in Medical
Research, 5, 239-261.
[11] Rust KF and Rao JNK (1996). Statistical Methods in Medical
Research, 5, 283-310.
[12] SAS Institute Inc. (1993). SAS Companion for the Microsoft
Windows Environment, Version 6. SAS Institute Inc., Cary,
NC.
[13] Siegel PZ, Brackbill RM, Frazier EL, Mariolis P, Sanderson
LM and Waller MN, (1991). MMWR CDC Surveillance Summaries
40(4), 1-23.
[14] Shah. EOB article on Taylor Series linearization approach
for variance estimation.
[15] Shah BV, Barnwell BG and Bieler GS, (1996). SUDAAN User's
Manual: Release 7.0, Research Triangle Institute, Research
Triangle Park, NC.
[16] Waksberg J (1978). Journal of the American Statistical
Association, 73(361), 40-46.
ACKNOWLEDGEMENT
This work was partially supported by CDC via the Division of Diabetes
Translation and the Division's 1996 Conference Planning Committee.
An invited paper based on this work was presented at the 1996
Diabetes Translation Conference, "Health Care in Transition:
Diabetes as a Model for Public Health", held in Washington
DC on March 31-April 3, 1996. All statements are the sole responsibility
of the author.
BIOGRAPHY
Donna Brogan received her Ph.D. in statistics in 1967 from Iowa State University. She has worked in sample surveys throughout her career, especially in design and analysis strategies. She conducts workshops on using sample survey software for data analysis. Currently she is Professor of Biostatistics at the Rollins School of Public Health at Emory University in Atlanta.
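TABLE 1. Sample size and sum of weights, by state, under the four SAS approaches.
Columns: (1) State; (2) unweighted (_ONE_); (3) FINALWT; (4) NORMWT; (5) STNORMWT.
Rows: California, Florida, Maryland, Minnesota, Tennessee, West Virginia, 6 State Total.
(Cell values are not available in this copy.)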
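TABLE 2. Estimated prevalence of diabetes and estimated standard error, by state.
Columns: (1) State; (2) SUDAAN; (3) SAS unweighted; (4) SAS with FINALWT; (5) SAS with NORMWT; (6) SAS with STNORMWT.
Rows: California, Florida, Maryland, Minnesota, Tennessee, West Virginia, 6 State Total.
(Cell values are not available in this copy.)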
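TABLE 3. Chi-square statistic and P-value for the test of no relationship between gender and diabetes, by state.
Columns: (1) State; (2) SUDAAN; (3) SAS unweighted; (4) SAS with FINALWT; (5) SAS with NORMWT; (6) SAS with STNORMWT.
Rows: California, Florida, Maryland, Minnesota, Tennessee, West Virginia, 6 State Total.
(Cell values are not available in this copy.)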