This is the main function to estimate population average controlled difference (ACD), or under stronger assumptions, the population average treatment effect (PATE), for a given outcome between levels of a binary treatment, exposure, or other group membership variable of interest for clustered, stratified survey samples where sample selection depends on the comparison group.
svycdiff(
df,
id_form,
a_form,
s_form,
y_form,
y_fam = NULL,
strata = NULL,
cluster = NULL
)
a `data.frame` or `tibble` containing the variables in the models.
a `string` indicating which identification formula to be used.
Options include "OM"
, "IPW1"
, "IPW2"
, or "DR"
.
See 'Details' for information.
an object of class `formula` which describes the propensity score model to be fit.
an object of class `formula` which describes the selection model to be fit.
an object of class `formula` which describes the outcome model
to be fit. Only used if id_form = "OM"
, else y_form = y ~ 1
.
a `family` function. Only used if id_form = "OM"
, else
y_fam = NULL
. Current options include gaussian
,
binomial
, or poisson
.
a `string` indicating strata, else strata = NULL
for no strata.
a `string` indicating cluster IDs, else cluster = NULL
for no clusters.
`svycdiff` returns an object of class "svycdiff" which contains:
A string denoting Which method was selected for estimation
A named vector containing the point estimate (est), standard error (err), lower confidence limit (lcl), upper confidence limit (ucl), and p-value (pval) for the estimated controlled difference
An object of class inheriting from "glm" corresponding to the outcome model fit, or NULL for IPW1 and IPW2
An object of class inheriting from "glm" corresponding to the propensity model fit
An object of class inheriting from "glm" corresponding to the weighted propensity model fit
An object of class "betareg" corresponding to the selection model fit, or NULL if the selection mechanism is known
The argument id_form
takes possible values "OM"
,
"IPW1"
, "IPW2"
, or "DR"
, corresponding to the four
formulas presented in Salerno et al. "OM"
refers to the method that
uses outcome modeling and direct standardization to estimate the controlled
difference, while "IPW1"
and "IPW2"
are inverse probability
weighted methods. "IPW1"
and "IPW2"
differ with respect to how
the joint propensity and selection mechanisms are factored (see Salerno et
al. for additional details). "DR"
refers to the doubly robust form of
estimator, which essentially combines "OM"
and "IPW2"
.
For id_form = "IPW1"
or id_form = "IPW2"
, y_form
should
be of the form Y ~ 1
.
For known selection mechanisms, s_form
should be of the form
pS ~ 1
, where pS
is the variable corresponding to the
probability of selection (e.g., inverse of the selection weight), and there
should be two additional variables in the dataset: P_S_cond_A1X
and
P_S_cond_A0X
, corresponding to the known probability of selection
conditional on \(A = 1\) or \(0\) and \(X = x\), respectively. If
these quantities are not known, s_form
should contain the variables
which affect sample selection on the right hand side of the equation,
including the comparison group variable of interest.
N <- 1000
dat <- simdat(N)
S <- rbinom(N, 1, dat$pS)
samp <- dat[S == 1,]
y_mod <- Y ~ A * X1
a_mod <- A ~ X1
s_mod <- pS ~ A + X1
fit <- svycdiff(samp, "DR", a_mod, s_mod, y_mod, "gaussian")
fit
#>
#> Outcome Model:
#> glm(formula = y_form, family = y_fam, data = df)
#>
#> Treatment Model:
#> glm(formula = a_form, family = "quasibinomial", data = df)
#>
#> Selection Model:
#> betareg(formula = s_form, data = df)
#>
#> CDIFF:
#> CDIFF SE LCL UCL P-Value
#> 0.9998 0.1356 0.7341 1.2656 0.0000
summary(fit)
#>
#> CDIFF:
#> CDIFF SE LCL UCL P-Value
#> 0.9998 0.1356 0.7341 1.2656 0.0000
#>
#> Outcome Model:
#>
#> Treatment Model:
#>
#> Selection Model:
#>
#> Call:
#> betareg(formula = s_form, data = df)
#>
#> Quantile residuals:
#> Min 1Q Median 3Q Max
#> -3.3345 -0.6186 0.0328 0.6709 3.2765
#>
#> Coefficients (mean model with logit link):
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.013234 0.006230 2.124 0.0336 *
#> A 0.976878 0.008361 116.834 <2e-16 ***
#> X1 0.998339 0.005393 185.121 <2e-16 ***
#>
#> Phi coefficients (precision model with identity link):
#> Estimate Std. Error z value Pr(>|z|)
#> (phi) 619.90 35.86 17.29 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Type of estimator: ML (maximum likelihood)
#> Log-likelihood: 1654 on 4 Df
#> Pseudo R-squared: 0.9927
#> Number of iterations: 98 (BFGS) + 1 (Fisher scoring)