Simulates data from specified relationships among a generated (continuous) outcome, the variable of interest, a confounder, and a selection mechanism.

simdat(
  N,
  X_dist = "continuous",
  S_known = FALSE,
  tau_0 = 0,
  tau_X = 1,
  beta_0 = 0,
  beta_A = 1,
  beta_X = 1,
  hetero = TRUE,
  alpha_0 = 0,
  alpha_X = 1,
  alpha_A = 1,
  alpha_AX = 0.1
)

Arguments

N

int - Number of observations to be generated

X_dist

string - Distribution of the confounding variable, X: "continuous" (the default) for a N(1, 1) variable, or "binary" for a Bernoulli(0.5) variable

S_known

boolean - Logical for whether the selection mechanism is treated as known (TRUE; deterministic) or must be estimated (FALSE; simulated with Gaussian error). Defaults to FALSE

tau_0

double - Intercept for propensity model (defaults to 0)

tau_X

double - Coefficient for X in propensity model (defaults to 1)

beta_0

double - Intercept for selection model (defaults to 0)

beta_A

double - Coefficient for A in selection model (defaults to 1)

beta_X

double - Coefficient for X in selection model (defaults to 1)

hetero

boolean - Logical for heterogeneous treatment effect in the outcome model (defaults to TRUE)

alpha_0

double - Intercept for outcome model (defaults to 0)

alpha_X

double - Coefficient for X in outcome model (defaults to 1)

alpha_A

double - Coefficient for A in outcome model (defaults to 1)

alpha_AX

double - Coefficient for interaction between A and X in outcome model (only used if hetero == TRUE; defaults to 0.1)

Value

A data.frame with N observations of 8 variables:

Y

Observed outcome (continuous)

A

Comparison group variable of interest (binary)

X

Confounding variable (continuous or binary)

P_A_cond_X

True probability of A = 1 conditional on X (continuous)

P_S_cond_AX

True probability of selection (S = 1) conditional on A and X (continuous)

P_S_cond_A1X

True probability of selection (S = 1) conditional on A = 1 and X (continuous)

P_S_cond_A0X

True probability of selection (S = 1) conditional on A = 0 and X (continuous)

CDIFF

True controlled difference in outcomes by comparison group (double)

Details

The data are generated as follows. For a user-specified number of observations, N, in our so-called super population, we first generate a confounding variable, X, which is related to our outcome, Y, our variable of interest, A, and our selection indicator, S. We generate population-level data with X ~ N(1,1) or X ~ Bern(0.5), depending on whether the distribution of X is set to X_dist = "continuous" or X_dist = "binary", respectively.

We then generate the remaining data from three models:

1. Propensity Model
2. Selection Model
3. Outcome Model
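
The three models can be pictured with the rough sketch below. It is not the package's implementation: simdat_sketch is a hypothetical stand-in, and the logistic links, the noise scales sd_S and sd_Y, the linear form of the outcome model, and the formula used for CDIFF are all assumptions made for illustration. Only the argument names, their defaults, and the returned column names come from this page.

```r
# Hypothetical illustration only; not the code behind simdat().
# Assumed (not stated on this page): logistic links, sd_S, sd_Y,
# the linear outcome form, and the CDIFF formula.
simdat_sketch <- function(N, X_dist = "continuous", S_known = FALSE,
                          tau_0 = 0, tau_X = 1,
                          beta_0 = 0, beta_A = 1, beta_X = 1,
                          hetero = TRUE,
                          alpha_0 = 0, alpha_X = 1, alpha_A = 1, alpha_AX = 0.1,
                          sd_S = 0.1, sd_Y = 1) {
  # Confounder: N(1, 1) or Bernoulli(0.5), as described in Details
  X <- if (X_dist == "continuous") rnorm(N, 1, 1) else rbinom(N, 1, 0.5)

  # 1. Propensity model: P(A = 1 | X), then draw A
  P_A_cond_X <- plogis(tau_0 + tau_X * X)
  A <- rbinom(N, 1, P_A_cond_X)

  # 2. Selection model: P(S = 1 | A, X); Gaussian error unless S_known
  noise <- if (S_known) 0 else rnorm(N, 0, sd_S)
  P_S_cond_A1X <- plogis(beta_0 + beta_A + beta_X * X + noise)
  P_S_cond_A0X <- plogis(beta_0 + beta_X * X + noise)
  P_S_cond_AX <- ifelse(A == 1, P_S_cond_A1X, P_S_cond_A0X)

  # 3. Outcome model: the A-by-X interaction is dropped when hetero is FALSE
  b_AX <- if (hetero) alpha_AX else 0
  Y <- alpha_0 + alpha_X * X + alpha_A * A + b_AX * A * X + rnorm(N, 0, sd_Y)

  # Assumed definition: difference in outcomes by A, averaged over X
  CDIFF <- alpha_A + b_AX * mean(X)

  data.frame(Y, A, X, P_A_cond_X, P_S_cond_AX,
             P_S_cond_A1X, P_S_cond_A0X, CDIFF)
}

set.seed(1)
sketch <- simdat_sketch(1000)
str(sketch)
```

The sketch returns the same column names listed under Value, so it can stand in for simdat() when experimenting with the structure of the output, but its numbers should not be expected to match the package.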

Examples


N <- 100000

dat <- simdat(N)

head(dat)
#>            Y A          X P_A_cond_X P_S_cond_AX P_S_cond_A1X P_S_cond_A0X
#> 1 -0.9374768 0 -0.4000435  0.4013019   0.3830190    0.6279066    0.3830190
#> 2  1.7171265 1  1.2553171  0.7782189   0.9076775    0.9076775    0.7834017
#> 3 -2.5860869 0 -1.4372636  0.1919695   0.1848481    0.3813458    0.1848481
#> 4  2.1034929 0  0.9944287  0.7299618   0.7233937    0.8766799    0.7233937
#> 5  5.0639509 1  1.6215527  0.8350092   0.9421912    0.9421912    0.8570581
#> 6  2.4688280 1  2.1484116  0.8955203   0.9587912    0.9587912    0.8953901
#>      CDIFF
#> 1 1.102055
#> 2 1.102055
#> 3 1.102055
#> 4 1.102055
#> 5 1.102055
#> 6 1.102055
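
As a quick reading aid for these columns, the check below re-enters the six printed rows by hand (the data frame ex is a hypothetical helper, not part of the package) and confirms two patterns visible in the output: P_S_cond_AX equals whichever of P_S_cond_A1X or P_S_cond_A0X matches the observed A, and, with the defaults tau_0 = 0 and tau_X = 1, the printed P_A_cond_X agrees with the logistic function of X to the printed precision. This is an observation about these six rows, not a statement about how the function works internally.

```r
# First six rows, copied from the printed output above
ex <- data.frame(
  A = c(0, 1, 0, 0, 1, 1),
  X = c(-0.4000435, 1.2553171, -1.4372636, 0.9944287, 1.6215527, 2.1484116),
  P_A_cond_X   = c(0.4013019, 0.7782189, 0.1919695, 0.7299618, 0.8350092, 0.8955203),
  P_S_cond_AX  = c(0.3830190, 0.9076775, 0.1848481, 0.7233937, 0.9421912, 0.9587912),
  P_S_cond_A1X = c(0.6279066, 0.9076775, 0.3813458, 0.8766799, 0.9421912, 0.9587912),
  P_S_cond_A0X = c(0.3830190, 0.7834017, 0.1848481, 0.7233937, 0.8570581, 0.8953901)
)

# P_S_cond_AX takes the A = 1 column when A is 1, the A = 0 column otherwise
picked <- ifelse(ex$A == 1, ex$P_S_cond_A1X, ex$P_S_cond_A0X)
stopifnot(isTRUE(all.equal(picked, ex$P_S_cond_AX)))

# Printed propensities match plogis(0 + 1 * X) up to rounding in the printout
stopifnot(isTRUE(all.equal(plogis(ex$X), ex$P_A_cond_X, tolerance = 1e-6)))
```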