Summary of survey software: Estimands and Statistical Analyses Accommodated
This is a summary of the information included under the heading
"Types of estimands and statistical analyses that can be accommodated"
for each of the software packages described on these pages. Select the
appropriate title for more information on any package.
AM Software was developed particularly for analysis of data from educational surveys (such as the National Assessment of Educational Progress). It includes a number of analyses under the rubric of Marginal Maximum Likelihood (MML),
based on test theory (Item Response Theory) and used particularly to
analyze data in which different subjects complete different subscales
of a test. Procedures in this group include MML Regression, MML Means,
MML Table (Ordinal), MML Table (Nominal), MML Composite Means, MML
Composite Regression, NALS Table, and NAEP Table.
It also includes more standard analyses including Frequencies,
Descriptives, Correlations, (Linear) Regression, Percentiles, Probit and
Logit Regression.
Data manipulation capabilities include ability to recode or calculate
variables.
New features currently in a Beta-test version (available for download)
include:
- Graphics: bar charts, line charts, and the new Sectioned Density Plot
designed to compare distributions.
- New import/export facilities that enable you to easily import/export data to/from nearly 150 different data file formats, as well as over ODBC.
- Sample-design consistent Wald tests of model fit for all regression models, including the point-and-click ability to test the significance of subsets of regressors.
- A new Mantel-Haenszel stratified chi-square test of the type typically used to evaluate differential item functioning on tests. As with all AM procedures, this one provides significance tests that are appropriate for complex sample designs.
Bascula computes adjustment weights using auxiliary variables. It
incorporates various weighting techniques. If only categorical
auxiliary variables are used, the simplest technique is complete
poststratification. For incomplete poststratification, Bascula
offers a choice between linear weighting (based on the general
regression estimator) and multiplicative weighting (based on iterative
proportional fitting). Linear weighting can also be applied if one or
more of the auxiliary variables is a quantitative variable.
The program can calculate estimates of population totals, means, and
ratios.
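The multiplicative weighting described above rests on iterative proportional fitting (raking). The following is a minimal sketch of that general technique for two categorical auxiliary variables, not Bascula's own implementation; the function name `rake`, the cell layout, and the example margins are all hypothetical, and every cell is assumed to have a positive weighted count.

```python
def rake(cells, row_targets, col_targets, tol=1e-9, max_iter=200):
    """Iterative proportional fitting on a two-way table.

    cells:       dict mapping (row_level, col_level) -> weighted sample count
                 (assumed strictly positive in every cell)
    row_targets: dict mapping row_level -> known population total
    col_targets: dict mapping col_level -> known population total
    Returns a dict mapping each cell to the factor by which the weights
    of the sample units in that cell should be multiplied.
    """
    fitted = dict(cells)
    for _ in range(max_iter):
        # Scale to the row margins, then to the column margins.
        for axis, targets in ((0, row_targets), (1, col_targets)):
            for level, target in targets.items():
                total = sum(v for k, v in fitted.items() if k[axis] == level)
                for k in fitted:
                    if k[axis] == level:
                        fitted[k] *= target / total
        # Column margins now match; stop once the row margins match too.
        worst = max(
            abs(sum(v for k, v in fitted.items() if k[0] == level) - target)
            for level, target in row_targets.items()
        )
        if worst < tol:
            break
    return {k: fitted[k] / cells[k] for k in cells}


# Hypothetical example: sex by age group, population size 1000.
sample_cells = {("m", "young"): 20.0, ("m", "old"): 30.0,
                ("f", "young"): 25.0, ("f", "old"): 25.0}
factors = rake(sample_cells,
               {"m": 480.0, "f": 520.0},
               {"young": 400.0, "old": 600.0})
```

With complete poststratification the same factors would simply be the population cell count divided by the weighted sample cell count; raking is needed only when the margins are known but the full cross-classification is not.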
Totals, means, ratios, proportions for total population and domains;
output includes estimated value of the parameter, standard error,
coefficient of variation, 95% confidence interval, design effect
(DEFF), and number of observations upon which the estimate is based.
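To make the listed output quantities concrete, here is a rough sketch that computes them for a mean under stratified simple random sampling. It is an illustration only, not the package's algorithm: the function name is hypothetical, the confidence interval uses the normal quantile 1.96, and the design effect compares the design variance with an approximate simple-random-sampling variance estimated from the weighted sample.

```python
import math


def stratified_mean_summary(strata):
    """strata: list of (N_h, sample_values), assuming simple random
    sampling without replacement within each stratum (n_h >= 2)."""
    N = sum(N_h for N_h, _ in strata)
    n = sum(len(y) for _, y in strata)

    estimate = 0.0
    variance = 0.0
    for N_h, y in strata:
        n_h = len(y)
        ybar = sum(y) / n_h
        s2 = sum((v - ybar) ** 2 for v in y) / (n_h - 1)
        share = N_h / N
        estimate += share * ybar
        variance += share ** 2 * (1 - n_h / N_h) * s2 / n_h

    # Approximate SRS variance for the same n, using the weighted sample
    # to estimate the population variance (one common approximation).
    s2_pop = sum(
        (N_h / len(y)) * (v - estimate) ** 2 for N_h, y in strata for v in y
    ) / (N - 1)
    variance_srs = (1 - n / N) * s2_pop / n

    se = math.sqrt(variance)
    return {
        "estimate": estimate,
        "se": se,
        "cv": se / estimate,
        "ci95": (estimate - 1.96 * se, estimate + 1.96 * se),
        "deff": variance / variance_srs,
        "n": n,
    }


# Hypothetical data: two strata of sizes 600 and 400.
summary = stratified_mean_summary([
    (600, [10, 12, 14, 16]),
    (400, [20, 22, 24, 26]),
])
```

In this made-up example the strata differ sharply in level, so the design effect comes out below 1: stratification is more efficient here than simple random sampling of the same size.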
Computes sampling errors and derived statistics such as design effects and
intra-cluster correlations for ratios and their differences over population
subclasses.
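The link between a design effect and an intra-cluster correlation can be sketched with Kish's well-known approximation deff = 1 + (b - 1) * roh, where b is the average number of sample elements per cluster. The helper names below are hypothetical and the snippet is only an illustration of that relationship, not of how this program computes its output.

```python
def roh_from_deff(deff, avg_cluster_size):
    """Rate of homogeneity implied by a design effect, using
    deff = 1 + (b - 1) * roh. Requires avg_cluster_size > 1."""
    if avg_cluster_size <= 1:
        raise ValueError("average cluster size must exceed 1")
    return (deff - 1.0) / (avg_cluster_size - 1.0)


def deff_from_roh(roh, avg_cluster_size):
    """Inverse relationship: design effect implied by roh."""
    return 1.0 + (avg_cluster_size - 1.0) * roh
```

For instance, a design effect of 2.0 with an average of 11 sample elements per cluster corresponds to a roh of about 0.1.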
Means, proportions, odds ratios, risk ratios, risk differences.
EpiInfo also includes a wide variety of other estimation modules, not
necessarily designed for survey data estimation, and there is a related
mapping program, EpiMap.
The focus of this software is on calibration estimation using generalized regression (GREG) estimator theory.
- Main functions are: calculation of sample design weights,
calculation of g-weights under a calibration approach, calculation of
calibration estimates, and calculation of synthetic estimates.
- Estimation of totals, averages, and ratios, for universe or domains.
- Auxiliary variables are used for estimation through the Generalized
Regression (GREG) approach. This framework permits a large family of
estimators including the traditional separate, combined and post-stratified
estimators.
- Synthetic estimates can also be produced from auxiliary
information for each domain of interest.
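As an illustration of the g-weights mentioned in the list above, the sketch below applies linear calibration with a single auxiliary variable, no intercept, and unit variance factors. This is a deliberately simplified special case of the GREG framework, written under those assumptions; the function names are hypothetical and it is not the software's own code.

```python
def g_weights(design_weights, x, x_total):
    """Linear calibration g-weights for one auxiliary variable.

    g_k = 1 + x_k * (T_x - sum(d * x)) / sum(d * x^2)
    so that sum(d_k * g_k * x_k) reproduces the known total T_x.
    """
    x_hat = sum(d * xk for d, xk in zip(design_weights, x))
    denom = sum(d * xk * xk for d, xk in zip(design_weights, x))
    adjustment = (x_total - x_hat) / denom
    return [1.0 + xk * adjustment for xk in x]


def calibrated_total(design_weights, g, y):
    """Calibration estimate of the total of y: sum(d_k * g_k * y_k)."""
    return sum(d * gk * yk for d, gk, yk in zip(design_weights, g, y))


# Hypothetical sample of four units with design weight 10 each.
d = [10.0, 10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
g = g_weights(d, x, x_total=110.0)
y_total = calibrated_total(d, g, y)
```

Because y is exactly proportional to x in this made-up data, the calibrated estimate of the y total is exactly twice the known x total.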
- Descriptive statistics including means, proportions, subgroup differences, linear contrasts.
- Multiple imputation for missing data.
- A variety of SAS procedures can be run under IVEware, including
CALIS, CATMOD, GENMOD, LIFEREG, MIXED, NLIN, PHREG, and PROBIT for
linear, logistic, Poisson, survival, and polytomous regression models.
These runs incorporate survey design-based variance estimation and/or
multiple imputation analysis for missing data.
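The multiple imputation analysis mentioned above ends by combining the results from the separately analysed imputed data sets. A minimal sketch of the standard combining rules (Rubin's rules) follows; it illustrates the general technique rather than IVEware's internals, and the function name and inputs are hypothetical.

```python
import math


def combine_imputations(estimates, variances):
    """Combine M completed-data estimates and their variances.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two matching estimates and variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)
```

With survey data, each per-imputation variance would itself be a design-based variance estimate, which is how the two features in the list above fit together.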
Constructs estimates and standard errors for totals,
means, quantiles, ratios, difference of ratios and entries in
two-way tables. Weighted regression equations can also be estimated.
Add-on modules calculate logistic regressions and estimation with
poststratification.
- mean, quantiles, variance, tables, ratios, totals
- graphics: scatterplots, smoothers, boxplots, barcharts
- generalised linear models (e.g. linear regression, logistic regression, Poisson models, etc.)
- proportional hazards models
- proportional odds and other cumulative link models
- survival curves
- post-stratification, raking, and calibration
- tests of association in two-way tables
- loglinear models for multiway tables
SAS/STAT Software provides the SURVEYSELECT procedure for sample
selection and the SURVEYMEANS and SURVEYREG procedures for producing
descriptive statistics and regression estimates, respectively. These
three procedures are available in SAS versions 8 and higher. Beginning
with SAS 9, SAS/STAT also includes the SURVEYFREQ procedure for computing
crosstabulations and tests of association, and the SURVEYLOGISTIC
procedure for performing logistic regression. The analysis procedures
can accommodate complex survey designs that include stratification,
clustering, and unequal weighting.
- The SURVEYSELECT procedure provides a variety of methods for
selecting probability-based random samples. The procedure can select a
simple random sample, or samples with design features such as
stratification, clustering or multistage sampling, or unequal
probabilities of selection. It can accommodate very large sampling
frames. It can draw a replicated sample, i.e. a sample composed of a
set of replicates, each selected in the same way.
PROC SURVEYSELECT accepts the sampling frame as a SAS data set.
Control language specifies the selection methods, the desired sample size
or sampling rate, and other parameters. The output data set contains
the selected units, with selection probabilities and sampling weights.
- The SURVEYMEANS procedure estimates population totals, means,
and ratios (SAS 8.2 and later), with estimates of their variances,
confidence limits, and other descriptive statistics, under sample
designs that may include stratification, clustering, and unequal
weighting.
- The SURVEYREG procedure estimates regression coefficients by
generalized least squares, using elementwise regression, assuming that
the regression coefficients are the same across strata and PSUs.
- The SURVEYLOGISTIC procedure fits logistic regression models
for discrete response survey data by maximum likelihood, incorporating
the sample design into the analysis.
- The SURVEYFREQ procedure produces one-way to n-way frequency
and crosstabulation tables from sample survey data. These tables include
estimates of population totals, population proportions, and corresponding
standard errors. Confidence limits, coefficients of variation, and
design effects are also available, as are tests of independence (Wald test,
Rao-Scott likelihood ratio test, Rao-Scott chi-square test).
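The sample selection step described for SURVEYSELECT above (a frame goes in; selected units with selection probabilities and sampling weights come out) can be sketched in a few lines. This is a Python illustration of stratified simple random sampling under assumed inputs, not SAS syntax and not the procedure's algorithm; all names are hypothetical.

```python
import random


def select_stratified_srs(frame, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample without replacement in each stratum.

    frame:         iterable of sampling units
    stratum_of:    function mapping a unit to its stratum label
    n_per_stratum: dict mapping stratum label -> desired sample size
    Returns a list of dicts with the unit, its stratum, its selection
    probability, and its sampling weight (1 / probability).
    """
    rng = random.Random(seed)
    by_stratum = {}
    for unit in frame:
        by_stratum.setdefault(stratum_of(unit), []).append(unit)

    sample = []
    for label, units in by_stratum.items():
        n_h = min(n_per_stratum.get(label, 0), len(units))
        if n_h == 0:
            continue
        prob = n_h / len(units)
        for unit in rng.sample(units, n_h):
            sample.append({"unit": unit, "stratum": label,
                           "prob": prob, "weight": 1.0 / prob})
    return sample


# Hypothetical frame of 100 units: 60 in stratum A, 40 in stratum B.
selected = select_stratified_srs(
    range(100),
    stratum_of=lambda u: "A" if u < 60 else "B",
    n_per_stratum={"A": 6, "B": 8},
    seed=1,
)
```

The weights in each stratum sum to the stratum's frame count, which is the property the analysis procedures rely on when estimating population totals.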
- Complex Samples Plan module: Specifies design information
for sample selection and/or analysis. (See "Designs" section for designs that are supported.) The file created by this module is used by all other modules.
- Complex Samples Selection module: Chooses units according
to a sample design specified by Complex Samples Plan.
- Complex Samples Frequencies module: Cell counts
and proportions with standard errors.
- Complex Samples Descriptives module: Estimates sums, means, and ratios with standard errors and design effects, for whole population or subpopulations.
- Complex Samples Crosstabs module: One- or two-way
tabulations with standard errors, design effects, coefficients of variation,
odds ratios and/or relative risks, and tests of independence, taking into
account the complex survey design.
- Complex Samples General Linear Model module: Linear regression models including analysis of variance and analysis of covariance models. Model parameters with design-corrected standard errors, t-tests and Wald F and chi-square tests, adjustments for multiple comparisons.
- Complex Samples Logistic Regression module: Binary and
multinomial logistic regression models, with similar options for linear
predictor specification to CSGLM.
Note: above information is for SPSS 13.0; SPSS 12 supports a more
restricted set of features.
There are currently about 50 Stata commands for various analyses
of survey data, including the following analyses and others:
- Estimation of means, totals, ratios, and proportions.
- Linear regression, logistic regression, and probit; also, tobit,
interval, censored, instrumental variables, multinomial logit, ordered
logit and probit, and Poisson. Point estimates, associated standard
errors, confidence intervals, and design effects for the full population
or subpopulations are displayed. Auxiliary commands will display all
this information for linear combinations (e.g., differences) of
estimators, and conduct hypothesis tests.
- Contingency tables with Rao-Scott corrections of
chi-squared tests. (The tobit, interval, censored, instrumental
variables, multinomial logit, ordered logit and probit, and Poisson
commands listed above are new survey-corrected regression commands.)
SUDAAN includes the following statistical procedures:
- MULTILOG: Fits multinomial logistic regression models to ordinal and
nominal categorical data and computes hypothesis tests for model
parameters. Estimates odds ratios and their 95% confidence intervals
for each model parameter. Has GEE (Generalized Estimating Equation)
modeling capabilities for efficient parameter estimation.
- REGRESS: Fits linear regression models to continuous outcomes and
performs hypothesis tests concerning the model parameters.
- LOGISTIC: Fits logistic regression models to binary data and computes
hypothesis tests for model parameters. Estimates odds ratios and their
95% confidence intervals for each model parameter.
- SURVIVAL: Fits proportional hazards (Cox regression) models to
failure time data. Estimates hazard ratios and their 95% confidence
intervals for each model parameter.
- CROSSTAB: Computes frequencies, percentage distributions, odds
ratios, relative risks, and their standard errors (or confidence
intervals) for user-specified cross-tabulations, as well as chi-square
tests of independence and the Cochran-Mantel-Haenszel chi-square test
for stratified two-way tables.
- DESCRIPT: Computes estimates of means, totals, proportions,
percentages, geometric means, quantiles, and their standard errors.
Also computes standardized estimates and tests of single
degree-of-freedom contrasts among levels of a categorical variable.
- RATIO: Computes estimates and standard errors of generalized ratios
of the form (Summation y) / (Summation x), where x and y are observed
variables. Also computes standardized estimates and tests
single-degree-of-freedom contrasts among levels of a categorical
variable.
- The EFFECT statement allows users to specify contrasts of regression
coefficients and hypothesis tests using simple effect names.
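A generalized ratio of the kind the RATIO procedure reports, (Summation y) / (Summation x), together with a standard error, can be sketched as follows. The sketch uses Taylor linearization with a with-replacement approximation for PSUs within strata, which is one common approach for such estimates; this summary does not state which variance method SUDAAN applies, and the function name and record layout are hypothetical.

```python
import math
from collections import defaultdict


def ratio_with_se(records):
    """records: iterable of (stratum, psu, weight, y, x) tuples.

    Returns (ratio, standard error), where ratio is the weighted
    sum of y divided by the weighted sum of x.
    """
    records = list(records)
    y_hat = sum(w * y for _, _, w, y, _ in records)
    x_hat = sum(w * x for _, _, w, _, x in records)
    ratio = y_hat / x_hat

    # Linearized variable, totalled within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, x in records:
        psu_totals[stratum][psu] += w * (y - ratio * x) / x_hat

    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        mean = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum(
            (z - mean) ** 2 for z in totals.values()
        )
    return ratio, math.sqrt(variance)
```

When y is an exact multiple of x for every record, the linearized values are all zero and the standard error collapses to zero, which is a convenient sanity check.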
VPLX calculates summary statistics (means, proportions, and totals for
the entire sample or by subclasses) and their standard errors. It can
be used to calculate a valid t-test. Arithmetical transformations of
the data can be specified in the command language, which means that
standard errors can be calculated for arbitrary sums, differences,
products, and quotients.
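One general way to obtain standard errors for arbitrary functions of estimates is replication: recompute the statistic on perturbed copies of the sample and measure how much it varies. The sketch below uses a delete-one-PSU jackknife for an unstratified design as an illustration. This summary does not describe VPLX's internal method, so treat the code as an assumption-laden example with hypothetical names, not as VPLX's procedure.

```python
import math


def jackknife_se(psus, statistic):
    """Delete-one-PSU jackknife for an unstratified design.

    psus:      list of PSUs, each a list of (weight, value) pairs
    statistic: function taking a list of (weight, value) pairs and
               returning a number (any sum, product, quotient, ...)
    Returns (full-sample statistic, jackknife standard error).
    """
    n = len(psus)
    if n < 2:
        raise ValueError("need at least two PSUs")
    full = statistic([pair for psu in psus for pair in psu])
    factor = n / (n - 1)  # reweight the PSUs that are kept

    replicates = []
    for dropped in range(n):
        kept = [(w * factor, v)
                for i, psu in enumerate(psus) if i != dropped
                for w, v in psu]
        replicates.append(statistic(kept))

    variance = (n - 1) / n * sum((t - full) ** 2 for t in replicates)
    return full, math.sqrt(variance)


def weighted_mean(pairs):
    """Example statistic: a quotient of two weighted totals."""
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)
```

For single-element PSUs with equal weights, the jackknife standard error of the mean reduces to the familiar sample standard deviation divided by the square root of n.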
- Estimates from tables (up to 8-way), including totals, means, percentages,
test of independence, and user-specified functions of variables or
estimates in cells of the table.
- Estimates of medians and other quantiles.
- Regression analysis: linear regression, logistic (dichotomous and
polychotomous) regression, and ANOVA. Parameter estimates and tests of
hypotheses.