This is the main function to estimate population average controlled difference (ACD), or under stronger assumptions, the population average treatment effect (PATE), for a given outcome between levels of a binary treatment, exposure, or other group membership variable of interest for clustered, stratified survey samples where sample selection depends on the comparison group.

svycdiff(
  df,
  id_form,
  a_form,
  s_form,
  y_form,
  y_fam = NULL,
  strata = NULL,
  cluster = NULL
)

Arguments

df

a data frame or tibble containing the variables in the models.

id_form

a string indicating which identification formula to be used. Options include "OM", "IPW1", or "IPW2". See 'Details' for more information.

a_form

an object of class `formula` which describes the propensity score model to be fit.

s_form

an object of class `formula` which describes the selection model to be fit.

y_form

an object of class `formula` which describes the outcome model to be fit. Only used if `id_form` = "OM", else `y_form = y ~ 1`.

y_fam

a family function. Only used if `id_form` = "OM", else `y_fam = NULL`.

strata

a string indicating strata, else `strata = NULL` for no strata.

cluster

a string indicating cluster IDs, else `cluster = NULL` for no clusters.

Value

`svycdiff` returns an object of class "svycdiff" which contains:

id_form

A string denoting Which method was selected for estimation

cdiff

A named vector containing the point estimate (est), standard error (err), lower confidence limit (lcl), upper confidence limit (ucl), and p-value (pval) for the estimated controlled difference

fit_y

An object of class inheriting from "glm" corresponding to the outcome model fit, or NULL for IPW1 and IPW2

fit_a

An object of class inheriting from "glm" corresponding to the propensity model fit

wtd_fit_a

An object of class inheriting from "glm" corresponding to the weighted propensity model fit

fit_s

An object of class "betareg" corresponding to the selection model fit, or NULL if the selection mechanism is known

Details

The argument id_form takes possible values "OM", "IPW1", or "IPW2", corresponding to the three formulas presented in Salerno et al. "OM" refers to the method that uses outcome modeling and direct standardization to estimate the controlled difference, while IPW1 and IPW2 are inverse probability weighted methods. IPW1 and IPW2 differ with respect to how the joint propensity and selection mechanisms are factored (see Salerno et al. for additional details). "OM", "IPW1", or "IPW2" are useful in different settings, which warrants some brief discussion here. "OM" requires you to specify an outcome regression model, whereas "IPW1" and "IPW2" do not require estimation, nor do they assume additivity or interactivity. However, while OM, IPW1, and IPW2 are consistent, OM is most efficient if correctly specified.

For IPW1/IPW2, y_form should be of the form Y ~ 1.

For known S, s_form should be of the form S ~ 1, where S is the variable corresponding to the probability of selection. There should be two additional variables in the dataset: P_S_cond_A1X and P_S_cond_A0X, corresponding to the known probability of selection conditional on A = 1 or 0 and X = x, respectively. If these quantities are not known, s_form should contain the variables which affect sample selection on the right hand side of the equation, including the comparison group variable of interest.

Examples


N <- 1000

dat <- simdat(N)

S <- rbinom(N, 1, dat$P_S_cond_AX)

samp <- dat[S == 1,]

y_mod <- Y ~ A * X

a_mod <- A ~ X

s_mod <- P_S_cond_AX ~ A + X

fit <- svycdiff(samp, "OM", a_mod, s_mod, y_mod, "gaussian")

fit
#> 
#> Outcome Model:  
#> glm(formula = y_form, family = y_fam, data = df)
#> 
#> Treatment Model:  
#> glm(formula = a_form, family = "quasibinomial", data = df)
#> 
#> Selection Model:  
#> betareg(formula = s_form, data = df)
#> 
#> CDIFF:  
#>   CDIFF      SE     LCL     UCL P-Value 
#>  1.0833  0.0946  0.8980  1.2687  0.0000 

summary(fit)
#> 
#> CDIFF:  
#>   CDIFF      SE     LCL     UCL P-Value 
#>  1.0833  0.0946  0.8980  1.2687  0.0000 
#> 
#> Outcome Model:  
#> 
#> Treatment Model:  
#> 
#> Selection Model:  
#> 
#> Call:
#> betareg(formula = s_form, data = df)
#> 
#> Quantile residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.9217 -0.5517  0.0146  0.5745  4.6954 
#> 
#> Coefficients (mean model with logit link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 0.015923   0.006394    2.49   0.0128 *  
#> A           0.983983   0.007764  126.74   <2e-16 ***
#> X           0.991856   0.004474  221.71   <2e-16 ***
#> 
#> Phi coefficients (precision model with identity link):
#>       Estimate Std. Error z value Pr(>|z|)    
#> (phi)   808.32      40.79   19.82   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> Type of estimator: ML (maximum likelihood)
#> Log-likelihood:  2457 on 4 Df
#> Pseudo R-squared: 0.9923
#> Number of iterations: 61 (BFGS) + 3 (Fisher scoring)