Package 'trtswitch'

Title: Treatment Switching
Description: Implements rank preserving structural failure time model (RPSFTM), iterative parameter estimation (IPE), inverse probability of censoring weights (IPCW), marginal structural model (MSM), simple two-stage estimation (TSEsimp), and improved two-stage estimation with g-estimation (TSEgest) methods for treatment switching in randomized clinical trials.
Authors: Kaifeng Lu [aut, cre] (ORCID: <https://orcid.org/0000-0002-6160-7119>)
Maintainer: Kaifeng Lu <[email protected]>
License: GPL (>= 2)
Version: 0.2.7
Built: 2026-06-04 17:00:00 UTC
Source: https://github.com/kaifenglu/trtswitch

Help Index


Treatment Switching

Description

Implements rank preserving structural failure time model (RPSFTM), iterative parameter estimation (IPE), inverse probability of censoring weights (IPCW), marginal structural model (MSM), simple two-stage estimation (TSEsimp), and improved two-stage estimation with g-estimation (TSEgest) methods for treatment switching in randomized clinical trials.

Details

To enable bootstrapping of the parameter estimates, we implements many standard survival analysis methods in C++. These include but are not limited to Kaplan-Meier estimates of the survival curves, log-rank tests, accelerated failure time (AFT) models, and Cox proportional hazards models.

All treatment switching adjustment methods allow treatment switching in both treatment arms, stratification and covariates adjustment. For the AFT models, stratification factors are included as covariates (main effects only or all-way interactions) because SAS PROC LIFEREG does not have the strata statement. The RPSFTM, IPE and TSE methods can be used with or without recensoring. The IPCW and MSM methods can produce stabilized and truncated weights.

The treat variable adopts a treatment-before-control order, except with 1/0 or TRUE/FALSE coding.

Author(s)

Kaifeng Lu, [email protected]

References

James M. Robins and Anastasios A. Tsiatis. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics. 1991;20(8):2609-2631.

Ian R. White, Adbel G. Babiker, Sarah Walker, and Janet H. Darbyshire. Randomization-based methods for correcting for treatment changes: Examples from the CONCORDE trial. Statistics in Medicine. 1999;18:2617-2634.

Michael Branson and John Whitehead. Estimating a treatment effect in survival studies in which patients switch treatment. Statistics in Medicine. 2002;21:2449-2463.

Ian R White. Letter to the Editor: Estimating treatment effects in randomized trials with treatment switching. Statistics in Medicine. 2006;25:1619-1622.

James M. Robins and Dianne M. Finkelstein. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56:779-788.

Nicholas R. Latimer, Keith R. Abrams, Paul C. Lambert, Michael K. Crowther, Allan J. Wailoo, Jonathan P. Morden, Ron L. Akehurst, and Michael J. Campbell. Adjusting for treatment switching in randomised controlled trials - A simulation study and a simplified two-stage method. Statistical Methods in Medical Research. 2017;26(2):724-751.

Nicholas R. Latimer, Ian R. White, Kate Tilling, and Ulrike Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.

James M. Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550-560.

Miguel Angel Hernan, Babette Brumback, and James M. Robins. Marginal structural modesl to estimate the causual effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561-570.


Baseline subject-level data

Description

This data set contains baseline subject-level data. Of note, PDDT can be derived from the ADT variable of the ADTTE data set by selecting PARAMCD == "INPFS" & CNSR == 0 & EVNTDESC == "PROGRESSIVE DISEASE". Additionally, OSDT and DIED can be derived from the ADT and CNSR variables of the ADTTE data set by selecting PARAMCD == "OS".

Usage

adsl

Format

An object of class tbl_df (inherits from tbl, data.frame) with 412 rows and 12 columns.

Details

SUBJID

subject ID

SEX

sex: "M" or "F"

STRAT1V

stratification factor 1: ECOG PS

STRAT2V

stratification factor 2: inv. chosen chemotherapy

RANDDT

randomization date

TRT01P

planned treatment: Active or Placebo

TRTSDT

treatment start date

PDDT

date of disease progression

XODT

date of treatment crossover

OSDT

date of death or censoring

DIED

whether the patient died

DCUTDT

date of data cut


Longitudinal time-dependent covariate data

Description

This data set contains longitudinal time-dependent covariate data on ECOG101 and LDH.

Usage

adtdc

Format

An object of class tbl_df (inherits from tbl, data.frame) with 9813 rows and 4 columns.

Details

SUBJID

subject ID

PARAMCD

parameter code

ADT

analysis date

AVAL

covariate value


Acute myelogenous leukemia survival data from the survival package

Description

Survival in patients with acute myelogenous leukemia.

time

Survival or censoring time

status

censoring status

x

maintenance chemotherapy given or not

Usage

aml

Format

An object of class data.frame with 23 rows and 3 columns.


Assess Proportional Hazards Assumption Based on Supremum Test

Description

Obtains the standardized score processes and the simulated distribution under the null hypothesis as well as the p-values for the supremum tests.

Usage

assess_phregr(object, resample = 1000, seed = 12345)

Arguments

object

The output from the phregr call.

resample

The number of simulation samples for the supremem test.

seed

The random seed for the simulations.

Details

The supremum test corresponds to the ASSESS statement with ph option of SAS PROC PHREG.

Value

A list with the following components:

  • time the unique event times.

  • score_t the observed standardized score process.

  • score_t_list a list of simulated standardized score processes under the null hypothesis.

  • max_abs_value the supremum of the absolute value of the observed standardized score process for each covariate and the supremum of the sum of absolute values of the observed standardized score processes across all covariates.

  • p_value the p-values for the supremum tests for each covariate and the global test.

Author(s)

Kaifeng Lu, [email protected]

References

D. Y. Lin, L. J. Wei, and Z. Ying. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 1993; 80:557-572.

Examples

fit <- phregr(data = liver, time = "Time", event = "Status", 
              covariates = c("log(Bilirubin)", "log(Protime)", 
                             "log(Albumin)", "Age", "Edema"),
              ties = "breslow")
              
aph <- assess_phregr(fit, resample = 1000, seed = 314159)
  
aph

plot(aph, nsim = 20)

Stanford heart transplant data from the survival package

Description

Survival of patients on the waiting list for the Stanford heart transplant program.

start, stop, event

entry and exit time and status for the time interval

age

age-48 years

year

year of acceptance (in years after Nov 1, 1967)

surgery

prior bypass surgery 1=yes, 0=no

transplant

received transplant 1=yes, 0=no

id

patient id

Usage

heart

Format

An object of class data.frame with 172 rows and 8 columns.


Simulated CONCORDE trial data from the rpsftm package

Description

Patients were randomly assigned to receive treatment immediately or deferred, and those in the deferred arm could cross over and receive treatment. The primary endpoint was time to disease progression.

id

Patient identification number

def

Indicator that the participant was assigned to the deferred treatment arm

imm

Indicator that the participant was assigned to the immediate treatment arm

censyrs

The censoring time, in years, corresponding to the close of study minus the time of entry for each patient

xo

Indicator that crossover occurred

xoyrs

The time, in years, from entry to switching, or 0 for patients in the immediate arm

prog

Indicator of disease progression (1), or censoring (0)

progyrs

Time, in years, from entry to disease progression or censoring

entry

The time of entry into the study, measured in years from the date of randomisation

Usage

immdef

Format

An object of class data.frame with 1000 rows and 9 columns.


The binary data from Cox and Snell (1989, pp. 10-11).

Description

The dataset consits of the number of ingots not ready for rolling and the number of ingots ready for rolling for a number of combinations of heating time and soaking time.

Usage

ingots

Format

An object of class tbl_df (inherits from tbl, data.frame) with 25 rows and 4 columns.

Details

Heat

The heating time

Soak

The soaking time

NotReady

Response indicator, with a value 1 for units not ready for rolling (event) and a value of 0 for units ready for rolling (nonevent)

Freq

The frequency of occurrence of each combination of Heat, Soak, and NotReady


Inverse Probability of Censoring Weights (IPCW) for Treatment Switching

Description

Excludes data after treatment switching and fits a switching model to estimate the probability of not switching. The inverse of these probabilities (inverse probability of censoring weights) are then used as weights in a weighted Cox model to obtain the adjusted hazard ratio.

Usage

ipcw(
  data,
  id = "id",
  stratum = "",
  tstart = "tstart",
  tstop = "tstop",
  event = "event",
  treat = "treat",
  swtrt = "swtrt",
  swtrt_time = "swtrt_time",
  base_cov = "",
  numerator = "",
  denominator = "",
  logistic_switching_model = FALSE,
  strata_main_effect_only = TRUE,
  ns_df = 3,
  firth = FALSE,
  flic = FALSE,
  stabilized_weights = TRUE,
  trunc = 0,
  trunc_upper_only = TRUE,
  swtrt_control_only = TRUE,
  alpha = 0.05,
  ties = "efron",
  boot = FALSE,
  n_boot = 1000,
  seed = 0,
  nthreads = 0
)

Arguments

data

The input data frame that contains the following variables:

  • id: The id to identify observations belonging to the same subject for counting process data with time-dependent covariates.

  • stratum: The stratum.

  • tstart: The starting time of the time interval for counting-process data with time-dependent covariates.

  • tstop: The stopping time of the time interval for counting-process data with time-dependent covariates.

  • event: The event indicator, 1=event, 0=no event.

  • treat: The randomized treatment indicator, 1=treatment, 0=control.

  • swtrt: The treatment switch indicator, 1=switch, 0=no switch.

  • swtrt_time: The time from randomization to treatment switch.

  • base_cov: The baseline covariates (excluding treat) used in the outcome model.

  • numerator: The baseline covariates (excluding treat) used in the numerator switching model for stabilized weights.

  • denominator: The baseline (excluding treat) and time-dependent covariates used in the denominator switching model.

id

The name of the id variable in the input data.

stratum

The name(s) of the stratum variable(s) in the input data.

tstart

The name of the tstart variable in the input data.

tstop

The name of the tstop variable in the input data.

event

The name of the event variable in the input data.

treat

The name of the treatment variable in the input data.

swtrt

The name of the swtrt variable in the input data.

swtrt_time

The name of the swtrt_time variable in the input data.

base_cov

The names of baseline covariates (excluding treat) in the input data for the Cox model.

numerator

The names of baseline covariates (excluding treat) in the input data for the numerator switching model for stabilized weights.

denominator

The names of baseline (excluding treat) and time-dependent covariates in the input data for the denominator switching model.

logistic_switching_model

Whether a pooled logistic regression switching model is used.

strata_main_effect_only

Whether to only include the strata main effects in the logistic regression switching model. Defaults to TRUE, otherwise all possible strata combinations will be considered in the switching model.

ns_df

Degrees of freedom for the natural cubic spline for visit-specific intercepts of the pooled logistic regression model. Defaults to 3 for two internal knots at the 33 and 67 percentiles of the treatment switching times.

firth

Whether the Firth's bias reducing penalized likelihood should be used.

flic

Whether to apply intercept correction to obtain more accurate predicted probabilities.

stabilized_weights

Whether to use the stabilized weights. The default is TRUE.

trunc

The truncation fraction of the weight distribution. Defaults to 0 for no truncation in weights.

trunc_upper_only

Whether to truncate the weights from the upper end of the weight distribution only. Defaults to TRUE, otherwise the weights will be truncated from both the lower and upper ends of the distribution.

swtrt_control_only

Whether treatment switching occurred only in the control group. The default is TRUE.

alpha

The significance level to calculate confidence intervals.

ties

The method for handling ties in the Cox model, either "breslow" or "efron" (default).

boot

Whether to use bootstrap to obtain the confidence interval for hazard ratio. Defaults to FALSE.

n_boot

The number of bootstrap samples.

seed

The seed to reproduce the bootstrap results.

nthreads

The number of threads to use in bootstrapping (0 means the default RcppParallel behavior).

Details

The hazard ratio and confidence interval under a no-switching scenario are obtained as follows:

  • Exclude all observations after treatment switch.

  • Define the crossover and event indicators for the last time interval of each subject.

  • For time-dependent Cox switching models, replicate unique event times across treatment arms within each subject.

  • Fit the denominator switching model (and numerator model for stabilized weights) to estimate inverse probability of censoring weights. Either a Cox model with time-dependent covariates or a pooled logistic regression model can be used.

    • For the pooled logistic regression model, the probability of remaining uncensored (i.e., not switching) is calculated as 1p^switch1 - \hat{p}_{\text{switch}} and accumulated over time up to the start of each interval.

    • For the time-dependent Cox model, the probability of remaining unswitched is derived from the estimated baseline hazard and predicted risk score up to the end of each interval.

  • Fit a weighted Cox model to the outcome survival times (excluding data after switching) to estimate the hazard ratio.

  • Construct the p-value and confidence interval for the hazard ratio using either robust sandwich variance or bootstrapping. When bootstrapping is used, the confidence interval and p-value are based on a t-distribution with n_boot - 1 degrees of freedom.

Value

A list with the following components:

  • pvalue: The two-sided p-value.

  • pvalue_type: The type of two-sided p-value for treatment effect, i.e., "Cox model" or "bootstrap".

  • hr: The estimated hazard ratio from the Cox model.

  • hr_CI: The confidence interval for hazard ratio.

  • hr_CI_type: The type of confidence interval for hazard ratio, either "Cox model" or "bootstrap".

  • event_summary: A data frame containing the count and percentage of deaths and switches by treatment arm.

  • data_switch: A list of input data for the switching models by treatment group. The variables include id, stratum, "tstart", "tstop", "cross", denominator, swtrt, and swtrt_time. For logistic switching models, stratum variables are converted to dummy variables, and natural cubic spline basis variables are created for the visit-specific intercepts.

  • fit_switch: A list of fitted switching models for the denominator and numerator by treatment group.

  • data_outcome: The input data for the outcome Cox model including the inverse probability of censoring weights. The variables include id, stratum, "tstart", "tstop", "event", "treated", "unstablized_weight", "stabilized_weight", base_cov, and treat.

  • weight_summary: A data frame summarizing the weights by treatment arm.

  • km_outcome: The Kaplan-Meier estimates of the survival functions for the treatment and control groups based on the weighted outcome data.

  • lr_outcome: The log-rank test results for the treatment effect based on the weighted outcome data.

  • fit_outcome: The fitted outcome Cox model.

  • fail: Whether a model fails to converge.

  • settings: A list containing the input parameter values.

  • fail_boots: The indicators for failed bootstrap samples if boot is TRUE.

  • fail_boots_data: The data for failed bootstrap samples if boot is TRUE.

  • hr_boots: The bootstrap hazard ratio estimates if boot is TRUE.

Author(s)

Kaifeng Lu, [email protected]

References

James M. Robins and Dianne M. Finkelstein. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56(3):779-788.

Examples

# Example 1: pooled logistic regression switching model

sim1 <- tssim(
  tdxo = TRUE, coxo = TRUE, allocation1 = 1, allocation2 = 1,
  p_X_1 = 0.3, p_X_0 = 0.3, 
  rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, 
  gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8,
  zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, 
  alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, 
  theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2,
  rate_C = 0.0000855, accrualIntensity = 20/30,
  fixedFollowup = FALSE, plannedTime = 1350, days = 30,
  n = 500, NSim = 100, seed = 314159)
  
fit1 <- ipcw(
  sim1[[1]], id = "id", tstart = "tstart", 
  tstop = "tstop", event = "event", treat = "trtrand", 
  swtrt = "xo", swtrt_time = "xotime", 
  base_cov = "bprog", numerator = "bprog", 
  denominator = c("bprog", "L"),
  logistic_switching_model = TRUE, ns_df = 3,
  swtrt_control_only = TRUE, boot = FALSE)
  
fit1 

# Example 2: time-dependent covariates Cox switching model

fit2 <- ipcw(
  shilong, id = "id", tstart = "tstart", tstop = "tstop", 
  event = "event", treat = "bras.f", swtrt = "co", 
  swtrt_time = "dco", 
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", 
               "pathway.f"),
  numerator = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", 
                "pathway.f"),
  denominator = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                  "pathway.f", "ps", "ttc", "tran"),
  swtrt_control_only = FALSE, boot = FALSE)

fit2

Iterative Parameter Estimation (IPE) for Treatment Switching

Description

Estimates the causal parameter by iteratively fitting an accelerated failure time (AFT) model to counterfactual unswitched survival times, and derives the adjusted hazard ratio from the Cox model using counterfactual unswitched survival times based on the estimated causal parameter.

Usage

ipe(
  data,
  id = "id",
  stratum = "",
  time = "time",
  event = "event",
  treat = "treat",
  rx = "rx",
  censor_time = "censor_time",
  base_cov = "",
  aft_dist = "weibull",
  strata_main_effect_only = TRUE,
  low_psi = -2,
  hi_psi = 2,
  treat_modifier = 1,
  recensor = TRUE,
  admin_recensor_only = TRUE,
  autoswitch = TRUE,
  root_finding = "brent",
  alpha = 0.05,
  ties = "efron",
  tol = 1e-06,
  boot = FALSE,
  n_boot = 1000,
  seed = 0,
  nthreads = 0
)

Arguments

data

The input data frame that contains the following variables:

  • id: The subject id.

  • stratum: The stratum.

  • time: The survival time for right censored data.

  • event: The event indicator, 1=event, 0=no event.

  • treat: The randomized treatment indicator, 1=treatment, 0=control.

  • rx: The proportion of time on active treatment.

  • censor_time: The administrative censoring time. It should be provided for all subjects including those who had events.

  • base_cov: The baseline covariates (excluding treat).

id

The name of the id variable in the input data.

stratum

The name(s) of the stratum variable(s) in the input data.

time

The name of the time variable in the input data.

event

The name of the event variable in the input data.

treat

The name of the treatment variable in the input data.

rx

The name of the rx variable in the input data.

censor_time

The name of the censor_time variable in the input data.

base_cov

The names of baseline covariates (excluding treat) in the input data for the causal AFT model and the outcome Cox model.

aft_dist

The assumed distribution for time to event for the AFT model. Options include "exponential", "weibull" (default), "loglogistic", and "lognormal".

strata_main_effect_only

Whether to only include the strata main effects in the AFT model. Defaults to TRUE, otherwise all possible strata combinations will be considered in the AFT model.

low_psi

The lower limit of the causal parameter.

hi_psi

The upper limit of the causal parameter.

treat_modifier

The optional sensitivity parameter for the constant treatment effect assumption.

recensor

Whether to apply recensoring to counterfactual survival times. Defaults to TRUE.

admin_recensor_only

Whether to apply recensoring to administrative censoring times only. Defaults to TRUE. If FALSE, recensoring will be applied to the actual censoring times for dropouts.

autoswitch

Whether to exclude recensoring for treatment arms with no switching. Defaults to TRUE.

root_finding

Character string specifying the univariate root-finding algorithm to use. Options are "brent" (default) for Brent's method, or "bisection" for the bisection method.

alpha

The significance level to calculate confidence intervals.

ties

The method for handling ties in the Cox model, either "breslow" or "efron" (default).

tol

The desired accuracy (convergence tolerance) for psi for the root finding algorithm.

boot

Whether to use bootstrap to obtain the confidence interval for hazard ratio. Defaults to FALSE, in which case, the confidence interval will be constructed to match the log-rank test p-value.

n_boot

The number of bootstrap samples.

seed

The seed to reproduce the bootstrap results.

nthreads

The number of threads to use in bootstrapping (0 means the default RcppParallel behavior).

Details

Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:

  • Estimate the causal parameter ψ\psi by iteratively fitting an AFT model to the observed survival times for the treatment arm and the counterfactual survival times for the control arm:

    Ui,ψ=TCi+eψTEiU_{i,\psi} = T_{C_i} + e^{\psi}T_{E_i}

  • Compute counterfactual survival times for control patients using the estimated ψ\psi.

  • Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.

  • Obtain the confidence interval for the hazard ratio using either the ITT log-rank test p-value or bootstrap. When bootstrapping, the interval and p-value are derived from a t-distribution with n_boot - 1 degrees of freedom.

Value

A list with the following components:

  • psi: The estimated causal parameter.

  • psi_CI: The confidence interval for psi.

  • psi_CI_type: The type of confidence interval for psi, i.e., "log-rank p-value" or "bootstrap".

  • pvalue: The two-sided p-value.

  • pvalue_type: The type of two-sided p-value for treatment effect, i.e., "log-rank" or "bootstrap".

  • hr: The estimated hazard ratio from the Cox model.

  • hr_CI: The confidence interval for hazard ratio.

  • hr_CI_type: The type of confidence interval for hazard ratio, either "log-rank p-value" or "bootstrap".

  • event_summary: A data frame containing the count and percentage of deaths and switches by treatment arm.

  • Sstar: A data frame containing the counterfactual untreated survival times and event indicators for each treatment group. The variables include id, stratum, "t_star", "d_star", "treated", base_cov, and treat.

  • kmstar: A data frame containing the Kaplan-Meier estimates based on the counterfactual untreated survival times by treatment arm.

  • data_aft: The input data for the AFT model for estimating psi. The variables include id, stratum, "t_star", "d_star", "treated", base_cov, and treat.

  • fit_aft: The fitted AFT model for estimating psi.

  • res_aft: The deviance residuals from the fitted AFT model.

  • data_outcome: The input data for the outcome Cox model of counterfactual unswitched survival times. The variables include id, stratum, "t_star", "d_star", "treated", base_cov, and treat.

  • km_outcome: The Kaplan-Meier estimates of the survival functions for the treatment and control groups based on the counterfactual unswitched survival times.

  • lr_outcome: The log-rank test results for the treatment effect based on the counterfactual unswitched survival times.

  • fit_outcome: The fitted outcome Cox model.

  • fail: Whether a model fails to converge.

  • psimissing: Whether the psi parameter cannot be estimated.

  • settings: A list containing the input parameter values.

  • fail_boots: The indicators for failed bootstrap samples if boot is TRUE.

  • fail_boots_data: The data for failed bootstrap samples if boot is TRUE.

  • hr_boots: The bootstrap hazard ratio estimates if boot is TRUE.

  • psi_boots: The bootstrap psi estimates if boot is TRUE.

Author(s)

Kaifeng Lu, [email protected]

References

Michael Branson and John Whitehead. Estimating a treatment effect in survival studies in which patients switch treatment. Statistics in Medicine. 2002;21(17):2449-2463.

Ian R White. Letter to the Editor: Estimating treatment effects in randomized trials with treatment switching. Statistics in Medicine. 2006;25(9):1619-1622.

Examples

library(dplyr)

# Example 1: one-way treatment switching (control to active)

data <- immdef %>% mutate(rx = 1-xoyrs/progyrs)

fit1 <- ipe(
  data, id = "id", time = "progyrs", event = "prog", treat = "imm", 
  rx = "rx", censor_time = "censyrs", aft_dist = "weibull",
  boot = FALSE)

fit1

# Example 2: two-way treatment switching (illustration only)

# the eventual survival time
shilong1 <- shilong %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(-c("ps", "ttc", "tran"))

shilong2 <- shilong1 %>%
  mutate(rx = ifelse(co, ifelse(bras.f == "MTA", dco/ady, 
                                1 - dco/ady),
                     ifelse(bras.f == "MTA", 1, 0)))

fit2 <- ipe(
  shilong2, id = "id", time = "tstop", event = "event",
  treat = "bras.f", rx = "rx", censor_time = "dcut",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
               "pathway.f"),
  aft_dist = "weibull", boot = FALSE)

fit2

Estimate of Milestone Survival Difference

Description

Obtains the estimate of milestone survival difference between two treatment groups.

Usage

kmdiff(
  data,
  stratum = "",
  treat = "treat",
  time = "time",
  time2 = "",
  event = "event",
  weight = "",
  milestone = 0,
  survDiffH0 = 0,
  conflev = 0.95
)

Arguments

data

The input data frame that contains the following variables:

  • stratum: The stratum.

  • treat: The treatment.

  • time: The follow-up time for right censored data, or the left end of each interval for counting process data.

  • time2: The right end of each interval for counting process data. Intervals are assumed to be open on the left and closed on the right, and event indicates whether an event occurred at the right end of each interval.

  • event: The event indicator, 1=event, 0=no event.

  • weight: The weight for each observation.

stratum

The name of the stratum variable in the input data.

treat

The name of the treatment variable in the input data.

time

The name of the time variable or the left end of each interval for counting process data in the input data.

time2

The name of the right end of each interval for counting process data in the input data.

event

The name of the event variable in the input data.

weight

The name of the weight variable in the input data.

milestone

The milestone time at which to calculate the survival probability.

survDiffH0

The difference in milestone survival probabilities under the null hypothesis. Defaults to 0 for superiority test.

conflev

The level of the two-sided confidence interval for the difference in milestone survival probabilities. Defaults to 0.95.

Value

A data frame with the following variables:

  • milestone: The milestone time relative to randomization.

  • survDiffH0: The difference in milestone survival probabilities under the null hypothesis.

  • surv1: The estimated milestone survival probability for the treatment group.

  • surv2: The estimated milestone survival probability for the control group.

  • survDiff: The estimated difference in milestone survival probabilities.

  • vsurv1: The variance for surv1.

  • vsurv2: The variance for surv2.

  • sesurvDiff: The standard error for survDiff.

  • survDiffZ: The Z-statistic value.

  • survDiffPValue: The two-sided p-value.

  • lower: The lower bound of confidence interval.

  • upper: The upper bound of confidence interval.

  • conflev: The level of confidence interval.

Author(s)

Kaifeng Lu, [email protected]

Examples

kmdiff(data = rawdata[rawdata$iterationNumber == 1, ],
       stratum = "stratum", treat = "treatmentGroup",
       time = "timeUnderObservation", event = "event",
       milestone = 12)

Kaplan-Meier Estimates of Survival Curve

Description

Obtains the Kaplan-Meier estimates of the survival curve.

Usage

kmest(
  data,
  stratum = "",
  time = "time",
  time2 = "",
  event = "event",
  weight = "",
  conftype = "log-log",
  conflev = 0.95,
  keep_censor = FALSE
)

Arguments

data

The input data frame that contains the following variables:

  • stratum: The stratum.

  • time: The follow-up time for right censored data, or the left end of each interval for counting process data.

  • time2: The right end of each interval for counting process data. Intervals are assumed to be open on the left and closed on the right, and event indicates whether an event occurred at the right end of each interval.

  • event: The event indicator, 1=event, 0=no event.

  • weight: The weight for each observation.

stratum

The name(s) of the stratum variable(s) in the input data.

time

The name of the time variable or the left end of each interval for counting process data in the input data.

time2

The name of the right end of each interval for counting process data in the input data.

event

The name of the event variable in the input data.

weight

The name of the weight variable in the input data.

conftype

The type of the confidence interval. One of "none", "plain", "log", "log-log" (the default), or "arcsin". The arcsin option bases the intervals on asin(sqrt(survival)).

conflev

The level of the two-sided confidence interval for the survival probabilities. Defaults to 0.95.

keep_censor

Whether to retain the censoring time in the output data frame.

Value

A data frame with the following variables:

  • size: The number of subjects in the stratum.

  • time: The event time.

  • nrisk: The number of subjects at risk.

  • nevent: The number of subjects having the event.

  • ncensor: The number of censored subjects.

  • surv: The Kaplan-Meier estimate of the survival probability.

  • sesurv: The standard error of the estimated survival probability based on the Greendwood formula.

  • lower: The lower bound of confidence interval if requested.

  • upper: The upper bound of confidence interval if requested.

  • conflev: The level of confidence interval if requested.

  • conftype: The type of confidence interval if requested.

  • stratum: The stratum.

Author(s)

Kaifeng Lu, [email protected]

Examples

kmest(data = aml, stratum = "x", time = "time", event = "status")

Parametric Regression Models for Failure Time Data

Description

Obtains the parameter estimates from parametric regression models with uncensored, right censored, left censored, or interval censored data.

Usage

liferegr(
  data,
  stratum = "",
  time = "time",
  time2 = "",
  event = "event",
  covariates = "",
  weight = "",
  offset = "",
  id = "",
  dist = "weibull",
  init = NA_real_,
  robust = FALSE,
  plci = FALSE,
  alpha = 0.05,
  maxiter = 50,
  eps = 1e-09
)

Arguments

data

The input data frame that contains the following variables:

  • stratum: The stratum.

  • time: The follow-up time for right censored data, or the left end of each interval for interval censored data.

  • time2: The right end of each interval for interval censored data.

  • event: The event indicator, 1=event, 0=no event.

  • covariates: The values of baseline covariates.

  • weight: The weight for each observation.

  • offset: The offset for each observation.

  • id: The optional subject ID to group the score residuals in computing the robust sandwich variance.

stratum

The name(s) of the stratum variable(s) in the input data.

time

The name of the time variable or the left end of each interval for interval censored data in the input data.

time2

The name of the right end of each interval for interval censored data in the input data.

event

The name of the event variable in the input data for right censored data.

covariates

The vector of names of baseline covariates in the input data.

weight

The name of the weight variable in the input data.

offset

The name of the offset variable in the input data.

id

The name of the id variable in the input data.

dist

The assumed distribution for time to event. Options include "exponential", "weibull", "lognormal", and "loglogistic" to be modeled on the log-scale, and "normal" and "logistic" to be modeled on the original scale.

init

A vector of initial values for the model parameters, including regression coefficients and the log scale parameter. By default, initial values are derived from an intercept-only model. If this approach fails, ordinary least squares (OLS) estimates, ignoring censoring, are used instead.

robust

Whether a robust sandwich variance estimate should be computed. In the presence of the id variable, the score residuals will be aggregated for each id when computing the robust sandwich variance estimate.

plci

Whether to obtain profile likelihood confidence interval.

alpha

The two-sided significance level.

maxiter

The maximum number of iterations.

eps

The tolerance to declare convergence.

Details

There are two ways to specify the model, one for right censored data through the time and event variables, and the other for interval censored data through the time (lower) and time2 (upper) variables. For the second form, we follow the convention used in SAS PROC LIFEREG:

  • If lower is not missing, upper is not missing, and lower is equal to upper, then there is no censoring and the event occurred at time lower.

  • If lower is not missing, upper is not missing, and lower < upper, then the event time is censored within the interval (lower, upper).

  • If lower is missing, but upper is not missing, then upper will be used as the left censoring value.

  • If lower is not missing, but upper is missing, then lower will be used as the right censoring value.

  • If lower is not missing, upper is not missing, but lower > upper, or if both lower and upper are missing, then the observation will not be used.

Value

A list with the following components:

  • sumstat: The data frame of summary statistics of model fit with the following variables:

    • n: The number of observations.

    • nevents: The number of events.

    • loglik0: The log-likelihood under null.

    • loglik1: The maximum log-likelihood.

    • niter: The number of Newton-Raphson iterations.

    • dist: The assumed distribution.

    • p: The number of parameters, including the intercept, regression coefficients associated with the covariates, and the log scale parameters for the strata.

    • nvar: The number of regression coefficients associated with the covariates (excluding the intercept).

    • robust: Whether the robust sandwich variance estimate is requested.

    • fail: Whether the model fails to converge.

  • parest: The data frame of parameter estimates with the following variables:

    • param: The name of the covariate for the parameter estimate.

    • beta: The parameter estimate.

    • sebeta: The standard error of parameter estimate.

    • z: The Wald test statistic for the parameter.

    • expbeta: The exponentiated parameter estimate.

    • lower: The lower limit of confidence interval.

    • upper: The upper limit of confidence interval.

    • p: The p-value from the chi-square test.

    • method: The method to compute the confidence interval and p-value.

    • sebeta_naive: The naive standard error of parameter estimate if robust variance is requested.

  • linear_predictors: The vector of linear predictors.

  • p: The number of parameters.

  • nvar: The number of columns of the design matrix excluding the intercept.

  • param: The parameter names.

  • beta: The parameter estimate.

  • vbeta: The covariance matrix for parameter estimates.

  • vbeta_naive: The naive covariance matrix for parameter estimates.

  • terms: The terms object.

  • xlevels: A record of the levels of the factors used in fitting.

  • settings: A list containing the input parameter values.

Author(s)

Kaifeng Lu, [email protected]

References

John D. Kalbfleisch and Ross L. Prentice. The Statistical Analysis of Failure Time Data. Wiley: New York, 1980.

Examples

library(dplyr)

# right censored data
(fit1 <- liferegr(
  data = rawdata %>% filter(iterationNumber == 1) %>% 
         mutate(treat = (treatmentGroup == 1)),
  stratum = "stratum",
  time = "timeUnderObservation", event = "event",
  covariates = "treat", dist = "weibull"))

# tobit regression for left censored data
(fit2 <- liferegr(
  data = tobin %>% mutate(time = ifelse(durable>0, durable, NA)),
  time = "time", time2 = "durable",
  covariates = c("age", "quant"), dist = "normal"))

The liver data used in SAS PROC PHREG documentation examples.

Description

This data set contains information on 418 patients with primary biliary cirrhosis.

Usage

liver

Format

An object of class tbl_df (inherits from tbl, data.frame) with 418 rows and 7 columns.

Details

Time

The follow-up time in years from the time of registration. The event could be liver transplantation, death, or the end of the study, whichever came first

Status

A censoring indicator, where a value of 1 indicating a death event, and 0 indicating a censored observation (the patient survived past the observation time)

Age

The patient's age in years

Albumin

Serum albumin level in g/dl

Bilirubin

Serum bilirubin level in mg/dl

Edema

Edema status, where a value of 0 indicates no edema, 0.5 indicates edema successfully treated with diuretics, and 1 indicates edema despite diuretic therapy

Protime

Prothrombin time in seconds


Logistic Regression Models for Binary Data

Description

Obtains the parameter estimates from logistic regression models with binary data.

Usage

logisregr(
  data,
  event = "event",
  covariates = "",
  freq = "",
  weight = "",
  offset = "",
  id = "",
  link = "logit",
  init = NA_real_,
  robust = FALSE,
  firth = FALSE,
  flic = FALSE,
  plci = FALSE,
  alpha = 0.05,
  maxiter = 50,
  eps = 1e-09
)

Arguments

data

The input data frame that contains the following variables:

  • event: The event indicator, 1=event, 0=no event.

  • covariates: The values of baseline covariates.

  • freq: The frequency for each observation.

  • weight: The weight for each observation.

  • offset: The offset for each observation.

  • id: The optional subject ID to group the score residuals in computing the robust sandwich variance.

event

The name of the event variable in the input data.

covariates

The vector of names of baseline covariates in the input data.

freq

The name of the frequency variable in the input data. The frequencies must be the same for all observations within each cluster as indicated by the id. Thus freq is the cluster frequency.

weight

The name of the weight variable in the input data.

offset

The name of the offset variable in the input data.

id

The name of the id variable in the input data.

link

The link function linking the response probabilities to the linear predictors. Options include "logit" (default), "probit", and "cloglog" (complementary log-log).

init

A vector of initial values for the model parameters. By default, initial values are derived from an intercept-only model.

robust

Whether a robust sandwich variance estimate should be computed. In the presence of the id variable, the score residuals will be aggregated for each id when computing the robust sandwich variance estimate.

firth

Whether the firth's bias reducing penalized likelihood should be used. The default is FALSE.

flic

Whether to apply intercept correction to obtain more accurate predicted probabilities. The default is FALSE.

plci

Whether to obtain profile likelihood confidence interval.

alpha

The two-sided significance level.

maxiter

The maximum number of iterations.

eps

The tolerance to declare convergence.

Details

Fitting a logistic regression model using Firth's bias reduction method is equivalent to penalization of the log-likelihood by the Jeffreys prior. Firth's penalized log-likelihood is given by

l(β)+12log(det(I(β)))l(\beta) + \frac{1}{2} \log(\mbox{det}(I(\beta)))

and the components of the gradient g(β)g(\beta) are computed as

g(βj)+12trace(I(β)1I(β)βj)g(\beta_j) + \frac{1}{2} \mbox{trace}\left(I(\beta)^{-1} \frac{\partial I(\beta)}{\partial \beta_j}\right)

The Hessian matrix is not modified by this penalty.

Firth's method reduces bias in maximum likelihood estimates of coefficients, but it introduces a bias toward one-half in the predicted probabilities.

A straightforward modification to Firth’s logistic regression to achieve unbiased average predicted probabilities involves a post hoc adjustment of the intercept. This approach, known as Firth’s logistic regression with intercept correction (FLIC), preserves the bias-corrected effect estimates. By excluding the intercept from penalization, it ensures that we don't sacrifice the accuracy of effect estimates to improve the predictions.

Value

A list with the following components:

  • sumstat: The data frame of summary statistics of model fit with the following variables:

    • n: The number of subjects.

    • nevents: The number of events.

    • loglik0: The (penalized) log-likelihood under null.

    • loglik1: The maximum (penalized) log-likelihood.

    • niter: The number of Newton-Raphson iterations.

    • p: The number of parameters, including the intercept, and regression coefficients associated with the covariates.

    • link: The link function.

    • robust: Whether a robust sandwich variance estimate should be computed.

    • firth: Whether the firth's penalized likelihood is used.

    • flic: Whether to apply intercept correction.

    • fail: Whether the model fails to converge.

    • loglik0_unpenalized: The unpenalized log-likelihood under null.

    • loglik1_unpenalized: The maximum unpenalized log-likelihood.

  • parest: The data frame of parameter estimates with the following variables:

    • param: The name of the covariate for the parameter estimate.

    • beta: The parameter estimate.

    • sebeta: The standard error of parameter estimate.

    • z: The Wald test statistic for the parameter.

    • expbeta: The exponentiated parameter estimate.

    • lower: The lower limit of confidence interval.

    • upper: The upper limit of confidence interval.

    • p: The p-value from the chi-square test.

    • method: The method to compute the confidence interval and p-value.

    • sebeta_naive: The naive standard error of parameter estimate.

  • fitted: The data frame with the following variables:

    • linear_predictors: The linear fit on the link function scale.

    • fitted_values: The fitted probabilities of having an event, obtained by transforming the linear predictors by the inverse of the link function.

  • p: The number of parameters.

  • link: The link function.

  • param: The parameter names.

  • beta: The parameter estimate.

  • vbeta: The covariance matrix for parameter estimates.

  • vbeta_naive: The naive covariance matrix for parameter estimates.

  • linear_predictors: The linear fit on the link function scale.

  • fitted_values: The fitted probabilities of having an event.

  • terms: The terms object.

  • xlevels: A record of the levels of the factors used in fitting.

  • settings: A list containing the input parameter values.

Author(s)

Kaifeng Lu, [email protected]

References

David Firth. Bias Reduction of Maximum Likelihood Estimates. Biometrika 1993; 80:27–38.

Georg Heinze and Michael Schemper. A solution to the problem of separation in logistic regression. Statistics in Medicine 2002;21:2409–2419.

Rainer Puhr, Georg Heinze, Mariana Nold, Lara Lusa, and Angelika Geroldinger. Firth's logistic regression with rare events: accurate effect estimates and predictions? Statistics in Medicine 2017; 36:2302-2317.

Examples

(fit1 <- logisregr(
  ingots, event = "NotReady", covariates = "Heat*Soak", freq = "Freq"))

Log-Rank Test of Survival Curve Difference

Description

Obtains the log-rank test using the Fleming-Harrington family of weights.

Usage

lrtest(
  data,
  stratum = "",
  treat = "treat",
  time = "time",
  time2 = "",
  event = "event",
  weight = "",
  weight_readj = FALSE,
  rho1 = 0,
  rho2 = 0
)

Arguments

data

The input data frame or list of data frames that contains the following variables:

  • stratum: The stratum.

  • treat: The treatment.

  • time: The follow-up time for right censored data, or the left end of each interval for counting process data.

  • time2: The right end of each interval for counting process data. Intervals are assumed to be open on the left and closed on the right, and event indicates whether an event occurred at the right end of each interval.

  • event: The event indicator, 1=event, 0=no event.

  • weight: The weight for each observation.

stratum

The name(s) of the stratum variable(s) in the input data.

treat

The name of the treatment variable in the input data.

time

The name of the time variable or the left end of each interval for counting process data in the input data.

time2

The name of the right end of each interval for counting process data in the input data.

event

The name of the event variable in the input data.

weight

The name of the weight variable in the input data.

weight_readj

Whether the weight variable at each event time will be readjusted to be proportional to the number at risk by treatment group. Defaults to FALSE.

rho1

The first parameter of the Fleming-Harrington family of weighted log-rank test. Defaults to 0 for conventional log-rank test.

rho2

The second parameter of the Fleming-Harrington family of weighted log-rank test. Defaults to 0 for conventional log-rank test.

Value

A data frame with the following variables:

  • uscore: The numerator of the log-rank test statistic.

  • vscore: The variance of the log-rank score test statistic.

  • logRankZ: The Z-statistic value.

  • logRankPValue: The two-sided p-value.

  • weight_readj: Whether the weight variable will be readjusted.

  • rho1: The first parameter of the Fleming-Harrington weights.

  • rho2: The second parameter of the Fleming-Harrington weights.

Author(s)

Kaifeng Lu, [email protected]

Examples

lrtest(rawdata[rawdata$iterationNumber == 1, ],
       stratum = "stratum", treat = "treatmentGroup",
       time = "timeUnderObservation", event = "event",
       rho1 = 0.5, rho2 = 0)

Marginal Structural Model (MSM) for Treatment Switching

Description

Excludes data after treatment switching when fitting the switching model to estimate the probabilities of not switching and then switching. The inverse of these probabilities (inverse probability of treatment weights) are then used as weights in a Cox model including data after switching to estimate the adjusted hazard ratio.

Usage

msm(
  data,
  id = "id",
  stratum = "",
  tstart = "tstart",
  tstop = "tstop",
  event = "event",
  treat = "treat",
  swtrt = "swtrt",
  swtrt_time = "swtrt_time",
  base_cov = "",
  numerator = "",
  denominator = "",
  strata_main_effect_only = TRUE,
  ns_df = 3,
  firth = FALSE,
  flic = FALSE,
  stabilized_weights = TRUE,
  trunc = 0,
  trunc_upper_only = TRUE,
  swtrt_control_only = TRUE,
  treat_alt_interaction = TRUE,
  alpha = 0.05,
  ties = "efron",
  boot = FALSE,
  n_boot = 1000,
  seed = 0,
  nthreads = 0
)

Arguments

data

The input data frame that contains the following variables:

  • id: The id to identify observations belonging to the same subject for counting process data with time-dependent covariates.

  • stratum: The stratum.

  • tstart: The starting time of the time interval for counting-process data with time-dependent covariates.

  • tstop: The stopping time of the time interval for counting-process data with time-dependent covariates.

  • event: The event indicator, 1=event, 0=no event.

  • treat: The randomized treatment indicator, 1=treatment, 0=control.

  • swtrt: The treatment switch indicator, 1=switch, 0=no switch.

  • swtrt_time: The time from randomization to treatment switch.

  • base_cov: The baseline covariates (excluding treat) used in the outcome model.

  • numerator: The baseline covariates (excluding treat) used in the numerator switching model for stabilized weights.

  • denominator: The baseline (excluding treat) and time-dependent covariates used in the denominator switching model.

id

The name of the id variable in the input data.

stratum

The name(s) of the stratum variable(s) in the input data.

tstart

The name of the tstart variable in the input data.

tstop

The name of the tstop variable in the input data.

event

The name of the event variable in the input data.

treat

The name of the treatment variable in the input data.

swtrt

The name of the swtrt variable in the input data.

swtrt_time

The name of the swtrt_time variable in the input data.

base_cov

The names of baseline covariates (excluding treat) in the input data for the Cox model.

numerator

The names of baseline covariates (excluding treat) in the input data for the numerator switching model for stabilized weights.

denominator

The names of baseline (excluding treat) and time-dependent covariates in the input data for the denominator switching model.

strata_main_effect_only

Whether to only include the strata main effects in the logistic regression switching model. Defaults to TRUE, otherwise all possible strata combinations will be considered in the switching model.

ns_df

Degrees of freedom for the natural cubic spline for visit-specific intercepts of the pooled logistic regression model. Defaults to 3 for two internal knots at the 33 and 67 percentiles of the treatment switching times.

firth

Whether the Firth's bias reducing penalized likelihood should be used.

flic

Whether to apply intercept correction to obtain more accurate predicted probabilities.

stabilized_weights

Whether to use the stabilized weights. The default is TRUE.

trunc

The truncation fraction of the weight distribution. Defaults to 0 for no truncation in weights.

trunc_upper_only

Whether to truncate the weights from the upper end of the weight distribution only. Defaults to TRUE, otherwise the weights will be truncated from both the lower and upper ends of the distribution.

swtrt_control_only

Whether treatment switching occurred only in the control group. The default is TRUE.

treat_alt_interaction

Whether to include an interaction between randomized and alternative treatments in the outcome model when both randomized arms can switch to alternative treatment.

alpha

The significance level to calculate confidence intervals.

ties

The method for handling ties in the Cox model, either "breslow" or "efron" (default).

boot

Whether to use bootstrap to obtain the confidence interval for hazard ratio. Defaults to FALSE.

n_boot

The number of bootstrap samples.

seed

The seed to reproduce the bootstrap results.

nthreads

The number of threads to use in bootstrapping (0 means the default RcppParallel behavior).

Details

The hazard ratio and confidence interval under a no-switching scenario are obtained as follows:

  • Exclude observations after treatment switch when fitting the switching model.

  • Define crossover indicators for the last time interval of each subject.

  • Fit the denominator switching model (and numerator model for stabilized weights) using a pooled logistic regression model to estimate the inverse probability of treatment weights (IPTWs).

    • The probability of remaining unswitched is calculated as 1p^switch1 - \hat{p}_{\text{switch}} and multiplied over time before treatment switch.

    • At the time of switching, this product is multiplied by the predicted probability of switching.

    • After treatment switch, the IPTW remains constant.

    • The inverse of the probability at the start of each interval is used as the interval weight.

  • Fit a weighted Cox model to the outcome survival times, including data after treatment switch, to estimate the hazard ratio.

  • Construct the p-value and confidence interval for the hazard ratio using either robust sandwich variance or bootstrapping. When bootstrapping is used, the confidence interval and p-value are based on a t-distribution with n_boot - 1 degrees of freedom.

Value

A list with the following components:

  • pvalue: The two-sided p-value.

  • pvalue_type: The type of two-sided p-value for treatment effect, i.e., "Cox model" or "bootstrap".

  • hr: The estimated hazard ratio from the Cox model.

  • hr_CI: The confidence interval for hazard ratio.

  • hr_CI_type: The type of confidence interval for hazard ratio, either "Cox model" or "bootstrap".

  • event_summary: A data frame containing the count and percentage of deaths and switches by treatment arm.

  • data_switch: A list of input data for the switching models by treatment group. The variables include id, stratum, "tstart", "tstop", "cross", denominator, swtrt, and swtrt_time. In addition, stratum variables are converted to dummy variables, and natural cubic spline basis variables are created for the visit-specific intercepts.

  • fit_switch: A list of fitted switching models for the denominator and numerator by treatment group.

  • data_outcome: The input data for the outcome Cox model including the inverse probability of censoring weights. The variables include id, stratum, "tstart", "tstop", "event", "treated", "crossed", "unstablized_weight", "stabilized_weight", base_cov, and treat. If treat_alt_interaction is TRUE, the data set also includes the "treated_crossed" variable.

  • weight_summary: A data frame summarizing the weights by treatment arm.

  • km_outcome: The Kaplan-Meier estimates of the survival functions for the treatment and control groups based on the weighted outcome data truncated at time of treatment switching.

  • lr_outcome: The log-rank test results for the treatment effect based on the weighted outcome data truncated at time of treatment switching.

  • fit_outcome: The fitted outcome Cox model.

  • fail: Whether a model fails to converge.

  • settings: A list containing the input parameter values.

  • fail_boots: The indicators for failed bootstrap samples if boot is TRUE.

  • fail_boots_data: The data for failed bootstrap samples if boot is TRUE.

  • hr_boots: The bootstrap hazard ratio estimates if boot is TRUE.

Author(s)

Kaifeng Lu, [email protected]

References

James M. Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550-560.

Miguel Angel Hernan, Babette Brumback, and James M. Robins. Marginal structural modesl to estimate the causual effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561-570.

Jing Xu, Guohui Liu, and Bingxia Wang. Bias and Type I error control in correcting treatment effect for treatment switching using marginal structural models in Phase III oncology trials. Journal of Biopharmaceutical Statistics. 2022;32(6):897-914.

Examples

sim1 <- tssim(
  tdxo = 1, coxo = 1, allocation1 = 1, allocation2 = 1,
  p_X_1 = 0.3, p_X_0 = 0.3, 
  rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, 
  gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8,
  zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, 
  alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, 
  theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2,
  rate_C = 0.0000855, accrualIntensity = 20/30,
  fixedFollowup = FALSE, plannedTime = 1350, days = 30,
  n = 500, NSim = 100, seed = 314159)

fit1 <- msm(
  sim1[[1]], id = "id", tstart = "tstart", 
  tstop = "tstop", event = "event", treat = "trtrand", 
  swtrt = "xo", swtrt_time = "xotime", 
  base_cov = "bprog", numerator = "bprog", 
  denominator = c("bprog", "L"), 
  ns_df = 3, swtrt_control_only = TRUE, boot = FALSE)
  
fit1

Proportional Hazards Regression Models

Description

Obtains the hazard ratio estimates from the proportional hazards regression model with right censored or counting process data.

Usage

phregr(
  data,
  stratum = "",
  time = "time",
  time2 = "",
  event = "event",
  covariates = "",
  weight = "",
  offset = "",
  id = "",
  ties = "efron",
  init = NA_real_,
  robust = FALSE,
  est_basehaz = TRUE,
  est_resid = TRUE,
  firth = FALSE,
  plci = FALSE,
  alpha = 0.05,
  maxiter = 50,
  eps = 1e-09
)

Arguments

data

The input data frame that contains the following variables:

  • stratum: The stratum.

  • time: The follow-up time for right censored data, or the left end of each interval for counting process data.

  • time2: The right end of each interval for counting process data. Intervals are assumed to be open on the left and closed on the right, and event indicates whether an event occurred at the right end of each interval.

  • event: The event indicator, 1=event, 0=no event.

  • covariates: The values of baseline covariates (and time-dependent covariates in each interval for counting process data).

  • weight: The weight for each observation.

  • offset: The offset for each observation.

  • id: The optional subject ID for counting process data with time-dependent covariates.

stratum

The name(s) of the stratum variable(s) in the input data.

time

The name of the time variable or the left end of each interval for counting process data in the input data.

time2

The name of the right end of each interval for counting process data in the input data.

event

The name of the event variable in the input data.

covariates

The vector of names of baseline and time-dependent covariates in the input data.

weight

The name of the weight variable in the input data.

offset

The name of the offset variable in the input data.

id

The name of the id variable in the input data.

ties

The method for handling ties, either "breslow" or "efron" (default).

init

The vector of initial values. Defaults to zero for all variables.

robust

Whether a robust sandwich variance estimate should be computed. In the presence of the id variable, the score residuals will be aggregated for each id when computing the robust sandwich variance estimate.

est_basehaz

Whether to estimate the baseline hazards. Defaults to TRUE.

est_resid

Whether to estimate the martingale residuals. Defaults to TRUE.

firth

Whether to use Firth’s penalized likelihood method. Defaults to FALSE.

plci

Whether to obtain profile likelihood confidence interval.

alpha

The two-sided significance level.

maxiter

The maximum number of iterations.

eps

The tolerance to declare convergence.

Value

A list with the following components:

  • sumstat: The data frame of summary statistics of model fit with the following variables:

    • n: The number of observations.

    • nevents: The number of events.

    • loglik0: The (penalized) log-likelihood under null.

    • loglik1: The maximum (penalized) log-likelihood.

    • scoretest: The score test statistic.

    • niter: The number of Newton-Raphson iterations.

    • ties: The method for handling ties, either "breslow" or "efron".

    • p: The number of columns of the Cox model design matrix.

    • robust: Whether to use the robust variance estimate.

    • firth: Whether to use Firth's penalized likelihood method.

    • fail: Whether the model fails to converge.

    • loglik0_unpenalized: The unpenalized log-likelihood under null.

    • loglik1_unpenalized: The maximum unpenalized log-likelihood.

  • parest: The data frame of parameter estimates with the following variables:

    • param: The name of the covariate for the parameter estimate.

    • beta: The log hazard ratio estimate.

    • sebeta: The standard error of log hazard ratio estimate.

    • z: The Wald test statistic for log hazard ratio.

    • expbeta: The hazard ratio estimate.

    • lower: The lower limit of confidence interval.

    • upper: The upper limit of confidence interval.

    • p: The p-value from the chi-square test.

    • method: The method to compute the confidence interval and p-value.

    • sebeta_naive: The naive standard error of log hazard ratio estimate if robust variance is requested.

  • basehaz: The data frame of baseline hazards with the following variables (if est_basehaz is TRUE):

    • time: The observed event time.

    • nrisk: The number of patients at risk at the time point.

    • nevent: The number of events at the time point.

    • haz: The baseline hazard at the time point.

    • varhaz: The variance of the baseline hazard at the time point assuming the parameter beta is known.

    • gradhaz: The gradient of the baseline hazard with respect to beta at the time point (in the presence of covariates).

    • stratum: The stratum.

  • residuals: The martingale residuals.

  • linear_predictors: The vector of linear predictors.

  • p: The number of parameters.

  • param: The parameter names.

  • beta: The parameter estimate.

  • vbeta: The covariance matrix for parameter estimates.

  • vbeta_naive: The naive covariance matrix for parameter estimates.

  • terms: The terms object.

  • xlevels: A record of the levels of the factors used in fitting.

  • settings: A list containing the input parameter values.

Author(s)

Kaifeng Lu, [email protected]

References

Per K. Anderson and Richard D. Gill. Cox's regression model for counting processes, a large sample study. Annals of Statistics 1982; 10:1100-1120.

Terry M. Therneau and Patricia M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer-Verlag, 2000.

Examples

library(dplyr)

# Example 1 with right-censored data
(fit1 <- phregr(
  data = rawdata %>% filter(iterationNumber == 1) %>% 
    mutate(treat = 1*(treatmentGroup == 1)),
  stratum = "stratum",
  time = "timeUnderObservation", event = "event",
  covariates = "treat", est_basehaz = FALSE, est_resid = FALSE))

# Example 2 with counting process data and robust variance estimate
(fit2 <- phregr(
  data = heart %>% mutate(rx = as.numeric(transplant) - 1),
  time = "start", time2 = "stop", event = "event",
  covariates = c("rx", "age"), id = "id",
  robust = TRUE, est_basehaz = TRUE, est_resid = TRUE))

Prepare Survival Data With Time-Dependent Covariates

Description

This function prepares a counting-process style survival dataset for analyses with time-dependent covariates. It merges baseline and longitudinal data, fills in missing covariate values using last-observation-carried-forward (LOCF), restricts to time points where covariates change (optional), and constructs tstart, tstop, and event variables suitable for use in survival models.

Usage

preptdc(
  adsl,
  adtdc,
  id = "SUBJID",
  randdt = "RANDDT",
  trtsdt = "TRTSDT",
  pddt = "PDDT",
  xodt = "XODT",
  osdt = "OSDT",
  died = "DIED",
  dcutdt = "DCUTDT",
  paramcd = "PARAMCD",
  adt = "ADT",
  aval = "AVAL",
  nodup = TRUE,
  offset = TRUE,
  nthreads = 0
)

Arguments

adsl

A data set containing baseline subject-level information. It should include, at a minimum, subject ID (id), randomization date (randdt), treatment start date (trtsdt), progression date (pddt), treatment switch date (xodt), survival outcome (osdt, died), and data cut-off date (dcutdt).

adtdc

A data set containing longitudinal time-dependent covariate data, with subject ID (id), parameter code (paramcd), analysis date (adt), and covariate value (aval).

id

Character string specifying the column name for subject ID.

randdt

Character string specifying the column name for randomization date.

trtsdt

Character string specifying the column name for treatment start date.

pddt

Character string specifying the column name for progression date.

xodt

Character string specifying the column name for treatment crossover/switch date.

osdt

Character string specifying the column name for overall survival date (death date or last known alive date).

died

Character string specifying the column name for death indicator (0 = alive/censored, 1 = died).

dcutdt

Character string specifying the column name for data cut-off date.

paramcd

Character string specifying the column name for parameter code (identifying different covariates).

adt

Character string specifying the column name for analysis date in the time-dependent covariate dataset.

aval

Character string specifying the column name for analysis value (covariate values).

nodup

Logical; if TRUE (default), only rows where at least one covariate changes compared to the previous row (within each subject) are retained, along with the first row per subject (baseline).

offset

Logical; if TRUE (default), add 1-day offset when computing analysis day variables (ady, osdy, etc.).

nthreads

Integer number of threads to use for ‘data.table’ (0 means the default data.table behavior).

Details

The function performs the following steps:

  1. Merge adsl and adtdc to obtain randomization date and treatment start date.

  2. Define adt2 as adt if adt > trtsdt, and randdt if adt <= trtsdt (i.e., baseline time point). This ensures that the baseline covariate value is the last non-missing value at or before the treatment start date. Post-baseline covariate values are anchored at their actual analysis dates. The first record per subject corresponds to survival time zero at randomization and ensures availability of baseline covariates at randomization.

  3. Keep the last record per subject, adt2, and paramcd.

  4. Construct a complete skeleton so all covariates are present for each subject and time point.

  5. Fill missing covariate values using LOCF.

  6. Pivot to wide format with one row per subject and time point.

  7. Optionally drop rows without covariate changes (nodup = TRUE).

  8. Merge survival outcomes from adsl.

  9. Compute time-to-event variables (ady, osdy, etc.), as well as counting-process style variables tstart, tstop, and event.

Value

A data set with one row per subject and time interval, including:

  • tstart, tstop — interval start and stop times (days from randomization).

  • event — event indicator (0/1).

  • Covariates expanded to wide format.

  • Auxiliary variables such as progression indicator (pd), treatment switch indicator (swtrt), and administrative censoring time.

Examples

surv_data <- preptdc(adsl, adtdc, nodup = TRUE, nthreads = 1)
head(surv_data)

A simulated time-to-event data set with 10 replications

Description

A simulated data set with stratification and delayed treatment effect:

iterationNumber

The iteration number

arrivalTime

The enrollment time for the subject

stratum

The stratum for the subject

treatmentGroup

The treatment group for the subject

timeUnderObservation

The time under observation since randomization

event

Whether the subject experienced the event

dropoutEvent

Whether the subject dropped out

Usage

rawdata

Format

An object of class data.frame with 4910 rows and 7 columns.


Simulation Study to Evaluate Recensoring Rules in RPSFTM

Description

Simulates datasets to evaluate the performance of various recensoring strategies under the Rank Preserving Structural Failure Time Model (RPSFTM) for handling treatment switching in survival analysis.

Usage

recensor_sim_rpsftm(
  nsim = 100L,
  n = 400L,
  shape = 1.5,
  scale = 553.9,
  gamma = 0.001,
  tfmin = 407.5,
  tfmax = 407.5,
  psi = -0.4621,
  omega = 0,
  pswitch = 0.7,
  a = 2,
  b = 4,
  low_psi = -5,
  hi_psi = 5,
  treat_modifier = 1,
  recensor_type = 1L,
  admin_recensor_only = TRUE,
  autoswitch = TRUE,
  alpha = 0.05,
  ties = "efron",
  tol = 1e-06,
  boot = TRUE,
  n_boot = 100L,
  seed = 0L
)

Arguments

nsim

Number of simulated datasets.

n

Number of subjects per simulation.

shape

Shape parameter of the Weibull distribution for time to death.

scale

Scale parameter of the Weibull distribution for time to death in the control group.

gamma

Rate parameter of the exponential distribution for random dropouts in the control group.

tfmin

Minimum planned follow-up time (in days).

tfmax

Maximum planned follow-up time (in days).

psi

Log time ratio of death time for control vs experimental treatment.

omega

Log time ratio of dropout time for control vs experimental treatment.

pswitch

Probability of treatment switching at disease progression.

a

Shape parameter 1 of the Beta distribution for time to disease progression as a fraction of time to death.

b

Shape parameter 2 of the Beta distribution for time to disease progression.

low_psi

Lower bound for the search interval of the causal parameter ψ\psi.

hi_psi

Upper bound for the search interval of the causal parameter ψ\psi.

treat_modifier

Sensitivity parameter modifying the constant treatment effect assumption.

recensor_type

Type of recensoring to apply:

  • 0: No recensoring

  • 1: Recensor all control-arm subjects

  • 2: Recensor only switchers in the control arm

  • 3: Recensor only control-arm switchers whose counterfactual survival exceeds the planned follow-up time

admin_recensor_only

Logical. If TRUE, recensoring is applied only to administrative censoring times. If FALSE, it is also applied to dropout times.

autoswitch

Logical. If TRUE, disables recensoring in arms without any treatment switching.

alpha

Significance level for confidence interval calculation (default is 0.05).

ties

Method for handling tied event times in the Cox model. Options are "efron" (default) or "breslow".

tol

Convergence tolerance for root-finding in estimation of ψ\psi.

boot

Logical. If TRUE, bootstrap is used to estimate the confidence interval for the hazard ratio. If FALSE, the confidence interval is matched to the log-rank p-value.

n_boot

Number of bootstrap samples, used only if boot = TRUE.

seed

Optional. Random seed for reproducibility.

Value

A data frame summarizing the simulation results, including:

  • recensor_type, admin_recensor_only: Settings used in the simulation.

  • Event rates: p_event_1, p_dropout_1, p_admin_censor_1, p_event_0, p_dropout_0, p_admin_censor_0.

  • Progression and switching: p_pd_0, p_swtrt_0, p_recensored_0.

  • Causal parameter (ψ\psi) estimates: psi, psi_est, psi_bias, psi_se, psi_mse.

  • Log hazard ratio estimates: loghr, loghr_est, loghr_se, loghr_mse.

  • Hazard ratio metrics: hr, hr_est (geometric mean), hr_pctbias (percent bias).

  • Standard errors of log hazard ratio: loghr_se_cox, loghr_se_lr, loghr_se_boot.

  • Coverage probabilities: hr_ci_cover_cox, hr_ci_cover_lr, hr_ci_cover_boot.

Author(s)

Kaifeng Lu, [email protected]

Examples

result <- recensor_sim_rpsftm(
  nsim = 10, n = 400, shape = 1.5, scale = exp(6.3169),
  gamma = 0.001, tfmin = 407.5, tfmax = 407.5,
  psi = log(0.5) / 1.5, omega = log(1), pswitch = 0.7,
  a = 2, b = 4, low_psi = -5, hi_psi = 5,
  treat_modifier = 1, recensor_type = 1,
  admin_recensor_only = TRUE, autoswitch = TRUE,
  alpha = 0.05, tol = 1e-6, boot = TRUE,
  n_boot = 10, seed = 314159)

Residuals for Parametric Regression Models for Failure Time Data

Description

Obtains the response, martingale, deviance, dfbeta, and likelihood displacement residuals for a parametric regression model for failure time data.

Usage

residuals_liferegr(
  object,
  type = c("response", "martingale", "deviance", "dfbeta", "dfbetas", "working",
    "ldcase", "ldresp", "ldshape", "matrix"),
  collapse = FALSE,
  weighted = (type %in% c("dfbeta", "dfbetas"))
)

Arguments

object

The output from the phregr call.

type

The type of residuals desired, with options including "response", "martingale", "deviance", "dfbeta", "dfbetas", "working", "ldcase", "ldresp", "ldshape", and "matrix".

collapse

Whether to collapse the residuals by id.

weighted

Whether to compute weighted residuals.

Details

The algorithms follow the residuals.survreg function in the survival package, except for martingale residuals, which are defined only for event or right-censored data for exponential, weibull, lognormal, and loglogistic distributions.

Value

Either a vector or a matrix of residuals, depending on the specified type:

  • response residuals are on the scale of the original data.

  • martingale residuals are event indicators minus the cumulative hazards for event or right-censored data.

  • working residuals are on the scale of the linear predictor.

  • deviance residuals are on the log-likelihood scale.

  • dfbeta residuals are returned as a matrix, where the ii-th row represents the approximate change in the model coefficients resulting from the inclusion of subject ii.

  • dfbetas residuals are similar to dfbeta residuals, but each column is scaled by the standard deviation of the corresponding coefficient.

  • matrix residuals are a matrix of derivatives of the log-likelihood function. Let LL be the log-likelihood, pp be the linear predictor (XβX\beta), and ss be log(σ)log(\sigma). Then the resulting matrix contains six columns: LL, L/p\partial L/\partial p, 2L/p2\partial^2 L/\partial p^2, L/s\partial L/\partial s, 2L/s2\partial^2 L/\partial s^2, and L2/ps\partial L^2/\partial p\partial s.

  • ldcase residulas are likelihood displacement for case weight perturbation.

  • ldresp residuals are likelihood displacement for response value perturbation.

  • ldshape residuals are likelihood displacement related to the shape parameter.

Author(s)

Kaifeng Lu, [email protected]

References

Escobar, L. A. and Meeker, W. Q. Assessing influence in regression analysis with censored data. Biometrics 1992; 48:507-528.

Examples

library(dplyr)

fit1 <- liferegr(
  data = tobin %>% mutate(time = ifelse(durable>0, durable, NA)),
  time = "time", time2 = "durable",
  covariates = c("age", "quant"), dist = "normal")

resid <- residuals_liferegr(fit1, type = "response")
head(resid)

Residuals for Proportional Hazards Regression Models

Description

Obtains the martingale, deviance, score, or Schoenfeld residuals for a proportional hazards regression model.

Usage

residuals_phregr(
  object,
  type = c("martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas",
    "scaledsch"),
  collapse = FALSE,
  weighted = (type %in% c("dfbeta", "dfbetas"))
)

Arguments

object

The output from the phregr call.

type

The type of residuals desired, with options including "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas", and "scaledsch".

collapse

Whether to collapse the residuals by id. This is not applicable for Schoenfeld type residuals.

weighted

Whether to compute weighted residuals.

Details

For score and Schoenfeld type residuals, the proportional hazards model must include at least one covariate. The algorithms for deviance, dfbeta, dfbetas, and scaledsch residuals follow the residuals.coxph function in the survival package.

Value

For martingale and deviance residuals, the result is a vector with one element corresponding to each subject (without collapse). For score residuals, the result is a matrix where each row represents a subject and each column corresponds to a variable. The row order aligns with the input data used in the original fit. For Schoenfeld residuals, the result is a matrix with one row for each event and one column per variable. These rows are sorted by time within strata, with the attributes stratum and time included.

Score residuals represent each individual's contribution to the score vector. Two commonly used transformations of this are dfbeta, which represents the approximate change in the coefficient vector if the observation is excluded, and dfbetas, which gives the approximate change in the coefficients scaled by the standard error of the coefficients.

Author(s)

Kaifeng Lu, [email protected]

References

Terry M. Therneau, Patricia M. Grambsch, and Thomas M. Fleming. Martingale based residuals for survival models. Biometrika 1990; 77:147-160.

Patricia M. Grambsch and Terry M. Therneau. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994; 81:515-26.

Examples

library(dplyr)

# Example 1 with right-censored data
fit1 <- phregr(data = rawdata %>% filter(iterationNumber == 1) %>%
                 mutate(treat = 1*(treatmentGroup == 1)),
               stratum = "stratum",
               time = "timeUnderObservation", event = "event",
               covariates = "treat")

ressco <- residuals_phregr(fit1, type = "score")
head(ressco)

# Example 2 with counting process data
fit2 <- phregr(data = heart %>% mutate(rx = as.numeric(transplant) - 1),
               time = "start", time2 = "stop", event = "event",
               covariates = c("rx", "age"), id = "id", robust = TRUE)

resssch <- residuals_phregr(fit2, type = "scaledsch")
head(resssch)

Estimate of Restricted Mean Survival Time Difference

Description

Obtains the estimate of restricted mean survival time difference between two treatment groups.

Usage

rmdiff(
  data,
  stratum = "",
  treat = "treat",
  time = "time",
  event = "event",
  milestone = 0,
  rmstDiffH0 = 0,
  conflev = 0.95,
  biascorrection = FALSE
)

Arguments

data

The input data frame that contains the following variables:

  • stratum: The stratum.

  • treat: The treatment.

  • time: The possibly right-censored survival time.

  • event: The event indicator.

stratum

The name of the stratum variable in the input data.

treat

The name of the treatment variable in the input data.

time

The name of the time variable in the input data.

event

The name of the event variable in the input data.

milestone

The milestone time at which to calculate the restricted mean survival time.

rmstDiffH0

The difference in restricted mean survival times under the null hypothesis. Defaults to 0 for superiority test.

conflev

The level of the two-sided confidence interval for the difference in restricted mean survival times. Defaults to 0.95.

biascorrection

Whether to apply bias correction for the variance estimate of individual restricted mean survival times. Defaults to no bias correction.

Value

A data frame with the following variables:

  • milestone: The milestone time relative to randomization.

  • rmstDiffH0: The difference in restricted mean survival times under the null hypothesis.

  • rmst1: The estimated restricted mean survival time for the treatment group.

  • rmst2: The estimated restricted mean survival time for the control group.

  • rmstDiff: The estimated difference in restricted mean survival times.

  • vrmst1: The variance for rmst1.

  • vrmst2: The variance for rmst2.

  • sermstDiff: The standard error for rmstDiff.

  • rmstDiffZ: The Z-statistic value.

  • rmstDiffPValue: The two-sided p-value.

  • lower: The lower bound of confidence interval.

  • upper: The upper bound of confidence interval.

  • conflev: The level of confidence interval.

  • biascorrection: Whether to apply bias correction for the variance estimate of individual restricted mean survival times.

Author(s)

Kaifeng Lu, [email protected]

Examples

rmdiff(data = rawdata[rawdata$iterationNumber == 1, ],
       stratum = "stratum", treat = "treatmentGroup",
       time = "timeUnderObservation", event = "event",
       milestone = 12)

Estimate of Restricted Mean Survival Time

Description

Obtains the estimate of restricted means survival time for each stratum.

Usage

rmest(
  data,
  stratum = "",
  time = "time",
  event = "event",
  milestone = 0,
  conflev = 0.95,
  biascorrection = FALSE
)

Arguments

data

The input data frame that contains the following variables:

  • stratum: The stratum.

  • time: The possibly right-censored survival time.

  • event: The event indicator.

stratum

The name of the stratum variable in the input data.

time

The name of the time variable in the input data.

event

The name of the event variable in the input data.

milestone

The milestone time at which to calculate the restricted mean survival time.

conflev

The level of the two-sided confidence interval for the survival probabilities. Defaults to 0.95.

biascorrection

Whether to apply bias correction for the variance estimate. Defaults to no bias correction.

Value

A data frame with the following variables:

  • stratum: The stratum variable.

  • size: The number of subjects in the stratum.

  • milestone: The milestone time relative to randomization.

  • rmst: The estimate of restricted mean survival time.

  • stderr: The standard error of the estimated rmst.

  • lower: The lower bound of confidence interval if requested.

  • upper: The upper bound of confidence interval if requested.

  • conflev: The level of confidence interval if requested.

  • biascorrection: Whether to apply bias correction for the variance estimate.

Author(s)

Kaifeng Lu, [email protected]

Examples

rmest(data = aml, stratum = "x",
      time = "time", event = "status", milestone = 24)

Rank Preserving Structural Failure Time Model (RPSFTM) for Treatment Switching

Description

Estimates the causal treatment effect parameter using g-estimation based on the log-rank test, Cox model, or parametric survival/accelerated failure time (AFT) model. The method uses counterfactual untreated survival times to estimate the causal parameter and derives the adjusted hazard ratio from the Cox model using counterfactual unswitched survival times.

Usage

rpsftm(
  data,
  id = "id",
  stratum = "",
  time = "time",
  event = "event",
  treat = "treat",
  rx = "rx",
  censor_time = "censor_time",
  base_cov = "",
  psi_test = "logrank",
  aft_dist = "weibull",
  strata_main_effect_only = TRUE,
  low_psi = -2,
  hi_psi = 2,
  n_eval_z = 101,
  treat_modifier = 1,
  recensor = TRUE,
  admin_recensor_only = TRUE,
  autoswitch = TRUE,
  gridsearch = TRUE,
  root_finding = "brent",
  alpha = 0.05,
  ties = "efron",
  tol = 1e-06,
  boot = FALSE,
  n_boot = 1000,
  seed = 0,
  nthreads = 0
)

Arguments

data

The input data frame that contains the following variables:

  • id: The subject id.

  • stratum: The stratum.

  • time: The survival time for right censored data.

  • event: The event indicator, 1=event, 0=no event.

  • treat: The randomized treatment indicator, 1=treatment, 0=control.

  • rx: The proportion of time on active treatment.

  • censor_time: The administrative censoring time. It should be provided for all subjects including those who had events.

  • base_cov: The baseline covariates (excluding treat).

id

The name of the id variable in the input data.

stratum

The name(s) of the stratum variable(s) in the input data.

time

The name of the time variable in the input data.

event

The name of the event variable in the input data.

treat

The name of the treatment variable in the input data.

rx

The name of the rx variable in the input data.

censor_time

The name of the censor_time variable in the input data.

base_cov

The names of baseline covariates (excluding treat) in the input data for the outcome Cox model. These covariates will also be used in the Cox model for estimating psi when psi_test = "phreg" and in the parametric survival regression/AFT model for estimating psi when psi_test = "lifereg".

psi_test

The survival function to calculate the Z-statistic, e.g., "logrank" (default), "phreg", or "lifereg".

aft_dist

The assumed distribution for time to event for the AFT model when psi_test = "lifereg". Options include "exponential", "weibull" (default), "loglogistic", and "lognormal".

strata_main_effect_only

Whether to only include the strata main effects in the AFT model. Defaults to TRUE, otherwise all possible strata combinations will be considered in the AFT model.

low_psi

The lower limit of the causal parameter.

hi_psi

The upper limit of the causal parameter.

n_eval_z

The number of points between low_psi and hi_psi (inclusive) at which to evaluate the Z-statistics.

treat_modifier

The optional sensitivity parameter for the constant treatment effect assumption.

recensor

Whether to apply recensoring to counterfactual survival times. Defaults to TRUE.

admin_recensor_only

Whether to apply recensoring to administrative censoring times only. Defaults to TRUE. If FALSE, recensoring will be applied to the actual censoring times for dropouts.

autoswitch

Whether to exclude recensoring for treatment arms with no switching. Defaults to TRUE.

gridsearch

Whether to use grid search to estimate the causal parameter psi. Defaults to TRUE, otherwise, a root finding algorithm will be used.

root_finding

Character string specifying the univariate root-finding algorithm to use. Options are "brent" (default) for Brent's method, or "bisection" for the bisection method.

alpha

The significance level to calculate confidence intervals.

ties

The method for handling ties in the Cox model, either "breslow" or "efron" (default).

tol

The desired accuracy (convergence tolerance) for psi for the root finding algorithm.

boot

Whether to use bootstrap to obtain the confidence interval for hazard ratio. Defaults to FALSE, in which case, the confidence interval will be constructed to match the log-rank test p-value.

n_boot

The number of bootstrap samples.

seed

The seed to reproduce the bootstrap results.

nthreads

The number of threads to use in bootstrapping (0 means the default RcppParallel behavior).

Details

Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:

  • Estimate the causal parameter ψ\psi using g-estimation based on the log-rank test (default), Cox model, or parametric survival/AFT model, using counterfactual untreated survival times for both arms:

    Ui,ψ=TCi+eψTEiU_{i,\psi} = T_{C_i} + e^{\psi}T_{E_i}

  • Compute counterfactual survival times for control patients using the estimated ψ\psi.

  • Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.

  • Obtain the confidence interval for the hazard ratio using either the ITT log-rank test p-value or bootstrap. When bootstrapping, the interval and p-value are derived from a t-distribution with n_boot - 1 degrees of freedom.

If grid search is used to estimate ψ\psi, the estimated ψ\psi is the one with the smallest absolute value among those at which the Z-statistic is zero based on linear interpolation. If root finding is used, the estimated ψ\psi is the solution to the equation where the Z-statistic is zero.

Value

A list with the following components:

  • psi: The estimated causal parameter.

  • psi_roots: Vector of psi values at which the Z-statistic is zero, identified using grid search and linear interpolation.

  • psi_CI: The confidence interval for psi.

  • psi_CI_type: The type of confidence interval for psi, i.e., "grid search", "root finding", or "bootstrap".

  • pvalue: The two-sided p-value.

  • pvalue_type: The type of two-sided p-value for treatment effect, i.e., "log-rank" or "bootstrap".

  • hr: The estimated hazard ratio from the Cox model.

  • hr_CI: The confidence interval for hazard ratio.

  • hr_CI_type: The type of confidence interval for hazard ratio, either "log-rank p-value" or "bootstrap".

  • event_summary: A data frame containing the count and percentage of deaths and switches by treatment arm.

  • eval_z: A data frame containing the Z-statistics for treatment effect evaluated at a sequence of psi values. Used to plot and check if the range of psi values to search for the solution and limits of confidence interval of psi need be modified.

  • Sstar: A data frame containing the counterfactual untreated survival times and event indicators for each treatment group. The variables include id, stratum, "t_star", "d_star", "treated", base_cov, and treat.

  • kmstar: A data frame containing the Kaplan-Meier estimates based on the counterfactual untreated survival times by treatment arm.

  • data_outcome: The input data for the outcome Cox model of counterfactual unswitched survival times. The variables include id, stratum, "t_star", "d_star", "treated", base_cov, and treat.

  • km_outcome: The Kaplan-Meier estimates of the survival functions for the treatment and control groups based on the counterfactual unswitched survival times.

  • lr_outcome: The log-rank test results for the treatment effect based on the counterfactual unswitched survival times.

  • fit_outcome: The fitted outcome Cox model.

  • fail: Whether a model fails to converge.

  • psimissing: Whether the psi parameter cannot be estimated.

  • settings: A list containing the input parameter values.

  • fail_boots: The indicators for failed bootstrap samples if boot is TRUE.

  • fail_boots_data: The data for failed bootstrap samples if boot is TRUE.

  • hr_boots: The bootstrap hazard ratio estimates if boot is TRUE.

  • psi_boots: The bootstrap psi estimates if boot is TRUE.

Author(s)

Kaifeng Lu, [email protected]

References

James M. Robins and Anastasios A. Tsiatis. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics. 1991;20(8):2609-2631.

Ian R. White, Adbel G. Babiker, Sarah Walker, and Janet H. Darbyshire. Randomization-based methods for correcting for treatment changes: Examples from the CONCORDE trial. Statistics in Medicine. 1999;18(19):2617-2634.

Examples

library(dplyr)

# Example 1: one-way treatment switching (control to active)

data <- immdef %>% mutate(rx = 1-xoyrs/progyrs)

fit1 <- rpsftm(
  data, id = "id", time = "progyrs", event = "prog", treat = "imm",
  rx = "rx", censor_time = "censyrs", boot = FALSE)

fit1

# Example 2: two-way treatment switching (illustration only)

# the eventual survival time
shilong1 <- shilong %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(-c("ps", "ttc", "tran"))

shilong2 <- shilong1 %>%
  mutate(rx = ifelse(co, ifelse(bras.f == "MTA", dco/ady, 
                                1 - dco/ady),
                     ifelse(bras.f == "MTA", 1, 0)))

fit2 <- rpsftm(
  shilong2, id = "id", time = "tstop", event = "event",
  treat = "bras.f", rx = "rx", censor_time = "dcut",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
               "pathway.f"),
  low_psi = -3, hi_psi = 3, boot = FALSE)

fit2

Urinary tract infection data from the logistf package

Description

This data set deals with urinary tract infection in sexually active college women, along with covariate information on age an contraceptive use. The variables are all binary and coded in 1 (condition is present) and 0 (condition is absent).

Usage

sexagg

Format

An object of class data.frame with 36 rows and 9 columns.

Details

case

urinary tract infection, the study outcome variable

age

>= 24 years

dia

use of diaphragm

oc

use of oral contraceptive

vic

use of condom

vicl

use of lubricated condom

vis

use of spermicide


The randomized clinical trial SHIVA data in long format from the ipcwswitch package

Description

The original SHIdat data set contains an anonymized excerpt of data from the SHIVA01 trial. This was the first randomized clinical trial that aimed at comparing molecularly targeted therapy based on tumor profiling (MTA) versus conventional therapy (CT) for advanced cancer. Patients were randomly assigned to receive the active or control treatment and may switch to the other arm or subsequent anti-cancer therapy upon disease progression. The restructured data is in the long format.

id

The patient's identifier

tstart

The start of the time interval

tstop

The end of the time interval

event

Whether the patient died at the end of the interval

agerand

The patient's age (in years) at randomization

sex.f

The patients' gender, either Male or Female

tt_Lnum

The number of previous lines of treatment

rmh_alea.c

The Royal Marsden Hospital score segregated into two categories

pathway.f

The molecular pathway altered (the hormone receptors pathway, the PI3K/ AKT/mTOR pathway, and the RAF/MEK pathway)

bras.f

The patient's randomized arm, either MTA or CT

ps

The ECOG performance status

ttc

The presence of concomitant treatments

tran

The use of platelet transfusions

dpd

The relative day of a potential progression

dco

The relative day of treatment switching

ady

The relative day of the latest news

dcut

The relative day of administrative cutoff

pd

Whether the patient had disease progression

co

Whether the patient switched treatment

Usage

shilong

Format

An object of class data.frame with 602 rows and 19 columns.


The repeated measures data from the "Six Cities" study of the health effects of air pollution (Ware et al. 1984).

Description

The data analyzed are the 16 selected cases in Lipsitz et al. (1994). The binary response is the wheezing status of 16 children at ages 9, 10, 11, and 12 years. A value of 1 of wheezing status indicates the occurrence of wheezing. The explanatory variables city of residence, age, and maternal smoking status at the particular age.

Usage

six

Format

An object of class tbl_df (inherits from tbl, data.frame) with 64 rows and 6 columns.

Details

case

case id

city

city of residence

age

age of the child

smoke

maternal smoking status

wheeze

wheezing status


Survival Curve for Proportional Hazards Regression Models

Description

Obtains the predicted survivor function for a proportional hazards regression model.

Usage

survfit_phregr(
  object,
  newdata,
  sefit = TRUE,
  conftype = "log-log",
  conflev = 0.95
)

Arguments

object

The output from the phregr call.

newdata

A data frame with the same variable names as those that appear in the phregr call. For right-censored data, one curve is produced per row to represent a cohort whose covariates correspond to the values in newdata. For counting-process data, one curve is produced per id in newdata to present the survival curve along the path of time-dependent covariates at the observed event times in the data used to fit phregr.

sefit

Whether to compute the standard error of the survival estimates.

conftype

The type of the confidence interval. One of "none", "plain", "log", "log-log" (the default), or "arcsin". The arcsin option bases the intervals on asin(sqrt(surv)).

conflev

The level of the two-sided confidence interval for the survival probabilities. Defaults to 0.95.

Details

If newdata is not provided and there is no covariate, survival curves based on the basehaz data frame will be produced.

Value

A data frame with the following variables:

  • id: The id of the subject for counting-process data with time-dependent covariates.

  • time: The observed times in the data used to fit phregr.

  • nrisk: The number of patients at risk at the time point in the data used to fit phregr.

  • nevent: The number of patients having event at the time point in the data used to fit phregr.

  • cumhaz: The cumulative hazard at the time point.

  • surv: The estimated survival probability at the time point.

  • sesurv: The standard error of the estimated survival probability.

  • lower: The lower confidence limit for survival probability.

  • upper: The upper confidence limit for survival probability.

  • conflev: The level of the two-sided confidence interval.

  • conftype: The type of the confidence interval.

  • covariates: The values of covariates based on newdata.

  • stratum: The stratum of the subject.

Author(s)

Kaifeng Lu, [email protected]

References

Terry M. Therneau and Patricia M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer-Verlag, 2000.

Examples

library(dplyr)

# Example 1 with right-censored data
fit1 <- phregr(data = rawdata %>% filter(iterationNumber == 1) %>%
                 mutate(treat = 1*(treatmentGroup == 1)),
               stratum = "stratum",
               time = "timeUnderObservation", event = "event",
               covariates = "treat")

surv1 <- survfit_phregr(fit1,
                        newdata = data.frame(
                          stratum = as.integer(c(1,1,2,2)),
                          treat = c(1,0,1,0)))
head(surv1)

# Example 2 with counting process data and robust variance estimate
fit2 <- phregr(data = heart %>% mutate(rx = as.numeric(transplant) - 1),
               time = "start", time2 = "stop", event = "event",
               covariates = c("rx", "age"), id = "id", robust = TRUE)

surv2 <- survfit_phregr(fit2,
                        newdata = data.frame(
                          id = c(4,4,11,11),
                          age = c(-7.737,-7.737,-0.019,-0.019),
                          start = c(0,36,0,26),
                          stop = c(36,39,26,153),
                          rx = c(0,1,0,1)))
head(surv2)

Brookmeyer-Crowley Confidence Interval for Quantiles of Right-Censored Time-to-Event Data

Description

Obtains the Brookmeyer-Crowley confidence interval for quantiles of right-censored time-to-event data.

Usage

survQuantile(
  time,
  event,
  cilevel = 0.95,
  transform = "loglog",
  probs = as.numeric(c(0.25, 0.5, 0.75))
)

Arguments

time

The vector of possibly right-censored survival times.

event

The vector of event indicators.

cilevel

The confidence interval level. Defaults to 0.95.

transform

The transformation of the survival function to use to construct the confidence interval. Options include "linear" (alternatively "plain"), "log", "loglog" (alternatively "log-log" or "cloglog"), "asinsqrt" (alternatively "asin" or "arcsin"), and "logit". Defaults to "loglog".

probs

The vector of probabilities to calculate the quantiles. Defaults to c(0.25, 0.5, 0.75).

Value

A data frame containing the estimated quantile and confidence interval corresponding to each specified probability. It includes the following variables:

  • prob: The probability to calculate the quantile.

  • quantile: The estimated quantile.

  • lower: The lower limit of the confidence interval.

  • upper: The upper limit of the confidence interval.

  • cilevel: The confidence interval level.

  • transform: The transformation of the survival function to use to construct the confidence interval.

Author(s)

Kaifeng Lu, [email protected]

Examples

survQuantile(
  time = c(33.7, 3.9, 10.5, 5.4, 19.5, 23.8, 7.9, 16.9, 16.6,
           33.7, 17.1, 7.9, 10.5, 38),
  event = c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1),
  probs = c(0.25, 0.5, 0.75))

Tobin's tobit data from the survival package

Description

Data from Tobin's original paper.

durable

Durable goods purchase

age

Age in years

quant

Liquidity ratio (x 1000)

Usage

tobin

Format

An object of class data.frame with 20 rows and 3 columns.


Two-Stage Estimation with g-Estimation (TSEgest) for Treatment Switching

Description

Estimates the causal parameter using g-estimation by fitting a pooled logistic regression switching model that includes counterfactual unswitched survival times and time-dependent confounders as covariates. The adjusted hazard ratio is then obtained from the Cox model using counterfactual unswitched survival times based on the estimated causal parameter.

Usage

tsegest(
  data,
  id = "id",
  stratum = "",
  tstart = "tstart",
  tstop = "tstop",
  event = "event",
  treat = "treat",
  censor_time = "censor_time",
  pd = "pd",
  pd_time = "pd_time",
  swtrt = "swtrt",
  swtrt_time = "swtrt_time",
  base_cov = "",
  conf_cov = "",
  strata_main_effect_only = TRUE,
  ns_df = 3,
  firth = FALSE,
  flic = FALSE,
  low_psi = -2,
  hi_psi = 2,
  n_eval_z = 101,
  recensor = TRUE,
  admin_recensor_only = TRUE,
  swtrt_control_only = TRUE,
  gridsearch = TRUE,
  root_finding = "brent",
  alpha = 0.05,
  ties = "efron",
  tol = 1e-06,
  offset = 1,
  boot = TRUE,
  n_boot = 1000,
  seed = 0,
  nthreads = 0
)

Arguments

data

The input data frame that contains the following variables:

  • id: The id to identify observations belonging to the same subject for counting process data with time-dependent covariates.

  • stratum: The stratum.

  • tstart: The starting time of the time interval for counting-process data with time-dependent covariates.

  • tstop: The stopping time of the time interval for counting-process data with time-dependent covariates.

  • event: The event indicator, 1=event, 0=no event.

  • treat: The randomized treatment indicator, 1=treatment, 0=control.

  • censor_time: The administrative censoring time. It should be provided for all subjects including those who had events.

  • pd: The disease progression indicator, 1=PD, 0=no PD.

  • pd_time: The time from randomization to disease progression.

  • swtrt: The treatment switch indicator, 1=switch, 0=no switch.

  • swtrt_time: The time from randomization to treatment switch.

  • base_cov: The baseline covariates (excluding treat).

  • conf_cov: The confounding variables (excluding treat) for predicting treatment switching.

id

The name of the id variable in the input data.

stratum

The name(s) of the stratum variable(s) in the input data.

tstart

The name of the tstart variable in the input data.

tstop

The name of the tstop variable in the input data.

event

The name of the event variable in the input data.

treat

The name of the treatment variable in the input data.

censor_time

The name of the censor_time variable in the input data.

pd

The name of the pd variable in the input data.

pd_time

The name of the pd_time variable in the input data.

swtrt

The name of the swtrt variable in the input data.

swtrt_time

The name of the swtrt_time variable in the input data.

base_cov

The names of baseline covariates (excluding treat) in the input data for the Cox model.

conf_cov

The names of confounding variables (excluding treat) in the input data for the logistic regression switching model.

strata_main_effect_only

Whether to only include the strata main effects in the logistic regression switching model. Defaults to TRUE, otherwise all possible strata combinations will be considered in the switching model.

ns_df

Degrees of freedom for the natural cubic spline for visit-specific intercepts of the pooled logistic regression model. Defaults to 3 for two internal knots at the 33 and 67 percentiles of the treatment switching times.

firth

Whether the Firth's bias reducing penalized likelihood should be used.

flic

Whether to apply intercept correction to obtain more accurate predicted probabilities.

low_psi

The lower limit of the causal parameter.

hi_psi

The upper limit of the causal parameter.

n_eval_z

The number of points between low_psi and hi_psi (inclusive) at which to evaluate the Wald statistics for the coefficient of the counterfactual in the logistic regression switching model.

recensor

Whether to apply recensoring to counterfactual survival times. Defaults to TRUE.

admin_recensor_only

Whether to apply recensoring to administrative censoring times only. Defaults to TRUE. If FALSE, recensoring will be applied to the actual censoring times for dropouts.

swtrt_control_only

Whether treatment switching occurred only in the control group. The default is TRUE.

gridsearch

Whether to use grid search to estimate the causal parameter psi. Defaults to TRUE, otherwise, a root finding algorithm will be used.

root_finding

Character string specifying the univariate root-finding algorithm to use. Options are "brent" (default) for Brent's method, or "bisection" for the bisection method.

alpha

The significance level to calculate confidence intervals.

ties

The method for handling ties in the Cox model, either "breslow" or "efron" (default).

tol

The desired accuracy (convergence tolerance) for psi for the root finding algorithm.

offset

The offset to calculate the time from disease progression to death or censoring, the time from disease progression to treatment switch, and the time from treatment switch to death or censoring. We can set offset equal to 0 (no offset), and 1 (default), 1/30.4375, or 1/365.25 if the time unit is day, month, or year, respectively.

boot

Whether to use bootstrap to obtain the confidence interval for hazard ratio. Defaults to TRUE.

n_boot

The number of bootstrap samples.

seed

The seed to reproduce the bootstrap results.

nthreads

The number of threads to use in bootstrapping (0 means the default RcppParallel behavior).

Details

Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:

  • Fit a pooled logistic regression switching model among control-arm patients who experienced disease progression:

    logit(p(Eik))=αUi,ψ+jβjxijk\text{logit}(p(E_{ik})) = \alpha U_{i,\psi} + \sum_{j} \beta_j x_{ijk}

    where EikE_{ik} is the switch indicator for subject ii at observation kk,

    Ui,ψ=TCi+eψTEiU_{i,\psi} = T_{C_i} + e^{\psi}T_{E_i}

    is the counterfactual survival time given a specific ψ\psi, and xijkx_{ijk} represents the time-dependent confounders. Natural cubic splines of time can be included to model time-varying baseline hazards. Ui,ψU_{i,\psi} is defined relative to the secondary baseline at disease progression and represents post-progression counterfactual survival, where TCiT_{C_i} and TEiT_{E_i} correspond to time spent after progression on control and experimental treatments, respectively. Martingale residuals may be used in place of counterfactual survival times to account for censoring.

  • Identify the value of ψ\psi for which the Z-statistic of α\alpha is approximately zero. This value is the causal parameter estimate.

  • Compute counterfactual survival times for control patients using the estimated ψ\psi.

  • Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.

  • When bootstrapping is used, derive the confidence interval and p-value for the hazard ratio from a t-distribution with n_boot - 1 degrees of freedom.

If treatment switching occurs before or in the absence of recorded disease progression, the patient is considered to have progressed at the time of treatment switching.

If grid search is used to estimate ψ\psi, the estimated ψ\psi is the one with the smallest absolute value among those at which the Z-statistic is zero based on linear interpolation. If root finding is used, the estimated ψ\psi is the solution to the equation where the Z-statistic is zero.

Value

A list with the following components:

  • psi: The estimated causal parameter for the control group.

  • psi_roots: Vector of psi values for the control group at which the Z-statistic is zero, identified using grid search and linear interpolation.

  • psi_CI: The confidence interval for psi.

  • psi_CI_type: The type of confidence interval for psi, i.e., "grid search", "root finding", or "bootstrap".

  • logrank_pvalue: The two-sided p-value of the log-rank test for the ITT analysis.

  • cox_pvalue: The two-sided p-value for treatment effect based on the Cox model applied to counterfactual unswitched survival times. If boot is TRUE, this value represents the bootstrap p-value.

  • hr: The estimated hazard ratio from the Cox model.

  • hr_CI: The confidence interval for hazard ratio.

  • hr_CI_type: The type of confidence interval for hazard ratio, either "Cox model" or "bootstrap".

  • event_summary: A data frame containing the count and percentage of deaths, disease progressions, and switches by treatment arm.

  • data_switch: The list of input data for the time from disease progression to switch by treatment group. The variables include id, stratum, "swtrt", and "swtrt_time". If swtrt == 0, then swtrt_time is censored at the time from disease progression to death or censoring.

  • km_switch: The list of Kaplan-Meier plot data for the time from disease progression to switch by treatment group.

  • eval_z: The list of data by treatment group containing the Wald statistics for the coefficient of the counterfactual in the logistic regression switching model, evaluated at a sequence of psi values. Used to plot and check if the range of psi values to search for the solution and limits of confidence interval of psi need be modified.

  • data_nullcox: The list of input data for counterfactual survival times for the null Cox model by treatment group. The variables include id, stratum, "t_star" and "d_star".

  • fit_nullcox: The list of fitted null Cox models for counterfactual survival times by treatment group, which contains the martingale residuals.

  • data_logis: The list of input data for pooled logistic regression models for treatment switching using g-estimation. The variables include id, stratum, "tstart", "tstop", "cross", "counterfactual", conf_cov, ns, pd_time, swtrt, and swtrt_time.

  • fit_logis: The list of fitted pooled logistic regression models for treatment switching using g-estimation.

  • data_outcome: The input data for the outcome Cox model of counterfactual unswitched survival times. The variables include id, stratum, "t_star", "d_star", "treated", base_cov and treat.

  • km_outcome: The Kaplan-Meier estimates of the survival functions for the treatment and control groups based on the counterfactual unswitched survival times.

  • lr_outcome: The log-rank test results for the treatment effect based on the counterfactual unswitched survival times.

  • fit_outcome: The fitted outcome Cox model.

  • fail: Whether a model fails to converge.

  • psimissing: Whether the psi parameter cannot be estimated.

  • settings: A list containing the input parameter values.

  • psi_trt: The estimated causal parameter for the experimental group if swtrt_control_only is FALSE.

  • psi_trt_roots: Vector of psi_trt values for the experimental group at which the Z-statistic is zero, identified using grid search and linear interpolation, if swtrt_control_only is FALSE.

  • psi_trt_CI: The confidence interval for psi_trt if swtrt_control_only is FALSE.

  • fail_boots: The indicators for failed bootstrap samples if boot is TRUE.

  • fail_boots_data: The data for failed bootstrap samples if boot is TRUE.

  • hr_boots: The bootstrap hazard ratio estimates if boot is TRUE.

  • psi_boots: The bootstrap psi estimates if boot is TRUE.

  • psi_trt_boots: The bootstrap psi_trt estimates if boot is TRUE and swtrt_control_only is FALSE.

Author(s)

Kaifeng Lu, [email protected]

References

NR Latimer, IR White, K Tilling, and U Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.

Examples

library(dplyr)

sim1 <- tsegestsim(
  n = 500, allocation1 = 2, allocation2 = 1, pbprog = 0.5, 
  trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, 
  scale1 = 360, shape2 = 1.7, scale2 = 688, 
  pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, 
  pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, 
  catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, 
  ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, 
  milestone = 546, seed = 2000)
  
data1 <- sim1$paneldata %>%
  mutate(visit7on = ifelse(progressed, tstop > timePFSobs + 105, 0))

fit1 <- tsegest(
  data = data1, id = "id", 
  tstart = "tstart", tstop = "tstop", event = "event", 
  treat = "trtrand", censor_time = "censor_time", 
  pd = "progressed", pd_time = "timePFSobs", 
  swtrt = "xo", swtrt_time = "xotime", 
  base_cov = "bprog", 
  conf_cov = c("bprog*cattdc", "timePFSobs", "visit7on"), 
  ns_df = 3, low_psi = -1, hi_psi = 1, n_eval_z = 101,
  recensor = TRUE, admin_recensor_only = TRUE, 
  swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", 
  tol = 1.0e-6, offset = 0, boot = FALSE)
  
fit1

Simulate Survival Data for Two-Stage Estimation with g-estimation

Description

Obtains the simulated data for baseline prognosis, disease progression, treatment switching, death, and time-dependent covariates.

Usage

tsegestsim(
  n = 500L,
  allocation1 = 2L,
  allocation2 = 1L,
  pbprog = 0.5,
  trtlghr = -0.5,
  bprogsl = 0.3,
  shape1 = 1.8,
  scale1 = 360,
  shape2 = 1.7,
  scale2 = 688,
  pmix = 0.5,
  admin = 5000,
  pcatnotrtbprog = 0.5,
  pcattrtbprog = 0.25,
  pcatnotrt = 0.2,
  pcattrt = 0.1,
  catmult = 0.5,
  tdxo = 1,
  ppoor = 0.1,
  pgood = 0.04,
  ppoormet = 0.4,
  pgoodmet = 0.2,
  xomult = 1.4188308,
  milestone = 546,
  seed = 0L
)

Arguments

n

The total sample size for two treatment arms combined.

allocation1

The number of subjects in the active treatment group in a randomization block.

allocation2

The number of subjects in the control group in a randomization block.

pbprog

The probability of having poor prognosis at baseline.

trtlghr

The treatment effect in terms of log hazard ratio.

bprogsl

The poor prognosis effect in terms of log hazard ratio.

shape1

The shape parameter for the Weibull event distribution for the first component.

scale1

The scale parameter for the Weibull event distribution for the first component.

shape2

The shape parameter for the Weibull event distribution for the second component.

scale2

The scale parameter for the Weibull event distribution for the second component.

pmix

The mixing probability of the first component Weibull distribution.

admin

The administrative censoring time.

pcatnotrtbprog

The probability of developing metastatic disease on control treatment with poor baseline prognosis.

pcattrtbprog

The probability of developing metastatic disease on active treatment with poor baseline prognosis.

pcatnotrt

The probability of developing metastatic disease on control treatment with good baseline prognosis.

pcattrt

The probability of developing metastatic disease on active treatment with good baseline prognosis.

catmult

The impact of metastatic disease on shortening remaining survival time.

tdxo

Whether treatment crossover depends on time-dependent covariates between disease progression and treatment switching.

ppoor

The probability of switching for poor baseline prognosis with no metastatic disease.

pgood

The probability of switching for good baseline prognosis with no metastatic disease.

ppoormet

The probability of switching for poor baseline prognosis after developing metastatic disease.

pgoodmet

The probability of switching for good baseline prognosis after developing metastatic disease.

xomult

The direct effect of crossover on extending remaining survival time.

milestone

The milestone to calculate restricted mean survival time.

seed

The seed to reproduce the simulation results.

Value

A list with two data frames.

  • sumdata: A summary data frame with the following variables:

    • simtrueconstmean: The true control group restricted mean survival time (RMST).

    • simtrueconstlb: The lower bound for control group RMST.

    • simtrueconstub: The upper bound for control group RMST.

    • simtrueconstse: The standard error for control group RMST.

    • simtrueexpstmean: The true experimental group restricted mean survival time (RMST).

    • simtrueexpstlb: The lower bound for experimental group RMST.

    • simtrueexpstub: The upper bound for experimental group RMST.

    • simtrueexpstse: The standard error for experimental group RMST.

    • simtrue_coxwbprog_hr: The treatment hazard ratio from the Cox model adjusting for baseline prognosis.

    • simtrue_cox_hr: The treatment hazard ratio from the Cox model without adjusting for baseline prognosis.

    • simtrue_aftwbprog_af: The average acceleration factor from the Weibull AFT model adjusting for baseline prognosis.

    • simtrue_aft_af: The average acceleration factor from the Weibull AFT model without adjusting for baseline prognosis.

  • paneldata: A counting process style subject-level data frame with the following variables:

    • id: The subject ID.

    • trtrand: The randomized treatment arm.

    • bprog: Whether the patient had poor baseline prognosis.

    • tstart: The left end of time interval.

    • tstop: The right end of time interval.

    • event: Whether the patient died at the end of the interval.

    • timeOS: The observed survival time.

    • died: Whether the patient died during the study.

    • progressed: Whether the patient had disease progression.

    • timePFSobs: The observed time of disease progression at regular scheduled visits.

    • progtdc: The time-dependent covariate for progression.

    • catevent: Whether the patient developed metastatic disease.

    • cattime: When the patient developed metastatic disease.

    • cattdc: The time-dependent covariate for cat event.

    • xo: Whether the patient switched treatment.

    • xotime: When the patient switched treatment.

    • xotdc: The time-dependent covariate for treatment switching.

    • censor_time: The administrative censoring time.

Author(s)

Kaifeng Lu, [email protected]

References

NR Latimer, IR White, K Tilling, and U Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.

Examples

sim1 <- tsegestsim(
  n = 500, allocation1 = 2, allocation2 = 1, pbprog = 0.5, 
  trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, 
  scale1 = 360, shape2 = 1.7, scale2 = 688, 
  pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, 
  pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, 
  catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, 
  ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, 
  milestone = 546, seed = 2000)

Simple Two-Stage Estimation (TSEsimp) for Treatment Switching

Description

Estimates the causal parameter by fitting an accelerated failure time (AFT) model comparing post-progression survival between switchers and non-switchers, and derives the adjusted hazard ratio from the Cox model using counterfactual unswitched survival times based on the estimated causal parameter.

Usage

tsesimp(
  data,
  id = "id",
  stratum = "",
  time = "time",
  event = "event",
  treat = "treat",
  censor_time = "censor_time",
  pd = "pd",
  pd_time = "pd_time",
  swtrt = "swtrt",
  swtrt_time = "swtrt_time",
  base_cov = "",
  base2_cov = "",
  aft_dist = "weibull",
  strata_main_effect_only = TRUE,
  recensor = TRUE,
  admin_recensor_only = TRUE,
  swtrt_control_only = TRUE,
  alpha = 0.05,
  ties = "efron",
  offset = 1,
  boot = TRUE,
  n_boot = 1000,
  seed = 0,
  nthreads = 0
)

Arguments

data

The input data frame that contains the following variables:

  • id: The subject id.

  • stratum: The stratum.

  • time: The survival time for right censored data.

  • event: The event indicator, 1=event, 0=no event.

  • treat: The randomized treatment indicator, 1=treatment, 0=control.

  • censor_time: The administrative censoring time. It should be provided for all subjects including those who had events.

  • pd: The disease progression indicator, 1=PD, 0=no PD.

  • pd_time: The time from randomization to disease progression.

  • swtrt: The treatment switch indicator, 1=switch, 0=no switch.

  • swtrt_time: The time from randomization to treatment switch.

  • base_cov: The baseline covariates (excluding treat).

  • base2_cov: The baseline and secondary baseline covariates (excluding treat).

id

The name of the id variable in the input data.

stratum

The name(s) of the stratum variable(s) in the input data.

time

The name of the time variable in the input data.

event

The name of the event variable in the input data.

treat

The name of the treatment variable in the input data.

censor_time

The name of the censor_time variable in the input data.

pd

The name of the pd variable in the input data.

pd_time

The name of the pd_time variable in the input data.

swtrt

The name of the swtrt variable in the input data.

swtrt_time

The name of the swtrt_time variable in the input data.

base_cov

The names of baseline covariates (excluding treat) in the input data for the outcome Cox model.

base2_cov

The names of baseline and secondary baseline covariates (excluding treat) in the input data for the AFT model for post-progression survival.

aft_dist

The assumed distribution for time to event for the AFT model. Options include "exponential", "weibull" (default), "loglogistic", and "lognormal".

strata_main_effect_only

Whether to only include the strata main effects in the AFT model. Defaults to TRUE, otherwise all possible strata combinations will be considered in the AFT model.

recensor

Whether to apply recensoring to counterfactual survival times. Defaults to TRUE.

admin_recensor_only

Whether to apply recensoring to administrative censoring times only. Defaults to TRUE. If FALSE, recensoring will be applied to the actual censoring times for dropouts.

swtrt_control_only

Whether treatment switching occurred only in the control group. The default is TRUE.

alpha

The significance level to calculate confidence intervals.

ties

The method for handling ties in the Cox model, either "breslow" or "efron" (default).

offset

The offset to calculate the time disease progression to death or censoring. We can set offset equal to 0 (no offset), and 1 (default), 1/30.4375, or 1/365.25 if the time unit is day, month, or year, respectively.

boot

Whether to use bootstrap to obtain the confidence interval for hazard ratio. Defaults to TRUE.

n_boot

The number of bootstrap samples.

seed

The seed to reproduce the bootstrap results.

nthreads

The number of threads to use in bootstrapping (0 means the default RcppParallel behavior).

Details

Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:

  • Estimate the causal parameter ψ\psi by fitting an AFT model comparing post-progression survival between switchers and non-switchers in the control group who experienced disease progression.

  • Compute counterfactual survival times for control patients using the estimated ψ\psi.

  • Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.

  • When bootstrapping is used, derive the confidence interval and p-value for the hazard ratio from a t-distribution with n_boot - 1 degrees of freedom.

If treatment switching occurs before or in the absence of recorded disease progression, the patient is considered to have progressed at the time of treatment switching.

Value

A list with the following components:

  • psi: The estimated causal parameter for the control group.

  • psi_CI: The confidence interval for psi.

  • psi_CI_type: The type of confidence interval for psi, i.e., "AFT model" or "bootstrap".

  • pvalue: The two-sided p-value.

  • pvalue_type: The type of two-sided p-value for treatment effect, i.e., "Cox model" or "bootstrap".

  • hr: The estimated hazard ratio from the Cox model.

  • hr_CI: The confidence interval for hazard ratio.

  • hr_CI_type: The type of confidence interval for hazard ratio, either "Cox model" or "bootstrap".

  • event_summary: A data frame containing the count and percentage of deaths, disease progressions, and switches by treatment arm.

  • data_aft: A list of input data for the AFT model by treatment group. The variables include id, stratum, "pps", "event", "swtrt", base2_cov, pd_time, swtrt_time, and time.

  • fit_aft: A list of fitted AFT models by treatment group.

  • res_aft: A list of deviance residuals from the AFT models by treatment group.

  • data_outcome: The input data for the outcome Cox model of counterfactual unswitched survival times. The variables include id, stratum, "t_star", "d_star", "treated", base_cov, and treat.

  • km_outcome: The Kaplan-Meier estimates of the survival functions for the treatment and control groups based on the counterfactual unswitched survival times.

  • lr_outcome: The log-rank test results for the treatment effect based on the counterfactual unswitched survival times.

  • fit_outcome: The fitted outcome Cox model.

  • fail: Whether a model fails to converge.

  • psimissing: Whether the psi parameter cannot be estimated.

  • settings: A list containing the input parameter values.

  • psi_trt: The estimated causal parameter for the experimental group if swtrt_control_only is FALSE.

  • psi_trt_CI: The confidence interval for psi_trt if swtrt_control_only is FALSE.

  • fail_boots: The indicators for failed bootstrap samples if boot is TRUE.

  • fail_boots_data: The data for failed bootstrap samples if boot is TRUE.

  • hr_boots: The bootstrap hazard ratio estimates if boot is TRUE.

  • psi_boots: The bootstrap psi estimates if boot is TRUE.

  • psi_trt_boots: The bootstrap psi_trt estimates if boot is TRUE and swtrt_control_only is FALSE.

Author(s)

Kaifeng Lu, [email protected]

References

Nicholas R Latimer, KR Abrams, PC Lambert, MK Crowther, AJ Wailoo, JP Morden, RL Akehurst, and MJ Campbell. Adjusting for treatment switching in randomised controlled trials - A simulation study and a simplified two-stage method. Statistical Methods in Medical Research. 2017;26(2):724-751.

Examples

library(dplyr)

# modify pd and dpd based on co and dco
shilong <- shilong %>%
  mutate(dpd = ifelse(co & !pd, dco, dpd),
         pd = ifelse(co & !pd, 1, pd)) %>%
  mutate(dpd = ifelse(pd & co & dco < dpd, dco, dpd))
  
# the eventual survival time
shilong1 <- shilong %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(-c("ps", "ttc", "tran"))

# the last value of time-dependent covariates before pd
shilong2 <- shilong %>%
  filter(pd == 0 | tstart <= dpd) %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(bras.f, id, ps, ttc, tran)

# combine baseline and time-dependent covariates
shilong3 <- shilong1 %>%
  left_join(shilong2, by = c("bras.f", "id"))

# apply the two-stage method
fit1 <- tsesimp(
  data = shilong3, id = "id", time = "tstop", event = "event",
  treat = "bras.f", censor_time = "dcut", pd = "pd",
  pd_time = "dpd", swtrt = "co", swtrt_time = "dco",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f"),
  base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f", "ps", "ttc", "tran"),
  aft_dist = "weibull", alpha = 0.05,
  recensor = TRUE, swtrt_control_only = FALSE, offset = 1,
  boot = FALSE)

fit1

Simulate Data for Treatment Switching

Description

Simulates data for studies involving treatment switching, incorporating time-dependent confounding. The generated data can be used to evaluate methods for handling treatment switching in survival analysis.

Usage

tssim(
  tdxo = FALSE,
  coxo = TRUE,
  allocation1 = 1L,
  allocation2 = 1L,
  p_X_1 = 0.3,
  p_X_0 = 0.3,
  rate_T = 0.002,
  beta1 = -0.5,
  beta2 = 0.3,
  gamma0 = 0.3,
  gamma1 = -0.9,
  gamma2 = 0.7,
  gamma3 = 1.1,
  gamma4 = -0.8,
  zeta0 = -3.5,
  zeta1 = 0.5,
  zeta2 = 0.2,
  zeta3 = -0.4,
  alpha0 = 0.5,
  alpha1 = 0.5,
  alpha2 = 0.4,
  theta1_1 = -0.4,
  theta1_0 = -0.4,
  theta2 = 0.2,
  rate_C = 8.55e-05,
  accrualTime = 0L,
  accrualIntensity = NA_real_,
  followupTime = NA_real_,
  fixedFollowup = FALSE,
  plannedTime = 1350,
  days = 30,
  n = 500L,
  NSim = 1000L,
  seed = 0L
)

Arguments

tdxo

Logical indicator for timing of treatment switching:

  • 1: Treatment switching can occur at or after disease progression.

  • 0: Treatment switching is restricted to the time of disease progression.

coxo

Logical indicator for arm-specific treatment switching:

  • 1: Treatment switching occurs only in the control arm.

  • 0: Treatment switching is allowed in both arms.

allocation1

Number of subjects in the active treatment group in a randomization block. Defaults to 1 for equal randomization.

allocation2

Number of subjects in the control group in a randomization block. Defaults to 1 for equal randomization.

p_X_1

Probability of poor baseline prognosis in the experimental arm.

p_X_0

Probability of poor baseline prognosis in the control arm.

rate_T

Baseline hazard rate for time to death.

beta1

Log hazard ratio for randomized treatment (R).

beta2

Log hazard ratio for baseline covariate (X).

gamma0

Intercept for the time-dependent covariate model (L).

gamma1

Coefficient for lagged treatment switching (Alag) in the L model.

gamma2

Coefficient for lagged L (Llag) in the L model.

gamma3

Coefficient for baseline covariate (X) in the L model.

gamma4

Coefficient for randomized treatment (R) in the L model.

zeta0

Intercept for the disease progression model (Z).

zeta1

Coefficient for L in the Z model.

zeta2

Coefficient for baseline covariate (X) in the Z model.

zeta3

Coefficient for randomized treatment (R) in the Z model.

alpha0

Intercept for the treatment switching model (A).

alpha1

Coefficient for L in the A model.

alpha2

Coefficient for baseline covariate (X) in the A model.

theta1_1

Negative log time ratio for A for the experimental arm.

theta1_0

Negative log time ratio for A for the control arm.

theta2

Negative log time ratio for L.

rate_C

Hazard rate for random (dropout) censoring.

accrualTime

A vector that specifies the starting time of piecewise Poisson enrollment time intervals. Must start with 0, e.g., c(0, 3) breaks the time axis into 2 accrual intervals: [0, 3) and [3, Inf).

accrualIntensity

A vector of accrual intensities. One for each accrual time interval.

followupTime

Follow-up time for a fixed follow-up design.

fixedFollowup

Whether a fixed follow-up design is used. Defaults to 0 for variable follow-up.

plannedTime

The calendar time for the analysis.

days

Number of days in each treatment cycle.

n

Number of subjects per simulation.

NSim

Number of simulated datasets.

seed

Random seed for reproducibility.

Details

The simulation algorithm is adapted from Xu et al. (2022) and simulates disease progression status while incorporating the multiplicative effects of both baseline and time-dependent covariates on survival time. The design options tdxo and coxo specify the timing of treatment switching and the study arm eligibility for switching, respectively. Time is measured in days.

In a fixed follow-up design, all subjects share the same follow-up duration. In contrast, under a variable follow-up design, follow-up time also depends on each subject's enrollment date. The number of treatment cycles for a subject is determined by dividing the follow-up time by the number of days in each cycle.

  1. At randomization, subjects are assigned to treatment arms using block randomization, with allocation1 patients assigned to active treatment and allocation2 to control within each randomized block. A baseline covariate is also generated for each subject:

    XiBernoulli(p1Ri+p0(1Ri))X_i \sim \mbox{Bernoulli}(p_1 R_i + p_0 (1-R_i))

  2. The initial survival time is drawn from an exponential distribution with hazard:

    λTexp(β1Ri+β2Xi)\lambda_T \exp(\beta_1 R_i + \beta_2 X_i)

    We define the event indicator at cycle jj as

    Yi,j=I(Tij×days)Y_{i,j} = I(T_i \leq j\times days)

  3. The initial states are set to Li,0=0L_{i,0} = 0, Zi,0=0Z_{i,0} = 0, Ai,0=0A_{i,0} = 0, Yi,0=0Y_{i,0} = 0. For each treatment cycle j=1,,Jj=1,\ldots,J, we set tstart=(j1)×dayststart = (j-1) \times days.

  4. Generate time-dependent covariates:

    logitP(Li,j=1history)=γ0+γ1Ai,j1+γ2Li,j1+γ3Xi+γ4Ri\mbox{logit} P(L_{i,j}=1|\mbox{history}) = \gamma_0 + \gamma_1 A_{i,j-1} + \gamma_2 L_{i,j-1} + \gamma_3 X_i + \gamma_4 R_i

  5. If Tij×daysT_i \leq j \times days, set tstop=Titstop = T_i and Yi,j=1Y_{i,j} = 1, which completes data generation for subject ii.

  6. If Ti>j×daysT_i > j \times days, set tstop=j×dayststop = j\times days, Yi,j=0Y_{i,j} = 0, and perform the following before proceeding to the next cycle for the subject.

  7. Generate disease progression status: If Zi,j1=0Z_{i,j-1} = 0,

    logitP(Zi,j=1history)=ζ0+ζ1Li,j+ζ2Xi+ζ3Ri\mbox{logit} P(Z_{i,j}=1 | \mbox{history}) = \zeta_0 + \zeta_1 L_{i,j} + \zeta_2 X_i + \zeta_3 R_i

    Otherwise, set Zi,j=1Z_{i,j} = 1.

  8. Generate alternative therapy status: If Ai,j1=0A_{i,j-1} = 0, determine switching eligibility based on design options. If switching is allowed:

    logitP(Ai,j=1history)=α0+α1Li,j+α2Xi\mbox{logit} P(A_{i,j} = 1 | \mbox{history}) = \alpha_0 + \alpha_1 L_{i,j} + \alpha_2 X_i

    If switching is now allowed, set Ai,j=0A_{i,j} = 0. If Ai,j1=1A_{i,j-1} = 1, set Ai,j=1A_{i,j} = 1 (once switched to alternative therapy, remain on alternative therapy).

  9. Update survival time based on changes in alternative therapy status and time-dependent covariates:

    Ti=j×days+(Tij×days)exp{(θ1,1Ri+θ1,0(1Ri))(Ai,jAi,j1)θ2(Li,jLi,j1)}T_i = j\times days + (T_i - j\times days) \exp\{ -(\theta_{1,1}R_i + \theta_{1,0}(1-R_i))(A_{i,j} - A_{i,j-1}) -\theta_2 (L_{i,j} - L_{i,j-1})\}

Additional random censoring times are generated from an exponential distribution with hazard rate λC\lambda_C.

An extra record is generated when the minimum of the latent survival time, the random censoring time, and the administrative censoring time is greater than the number of regular treatment cycles times days per cycle.

Finally we apply the lag function so that Zi,jZ_{i,j} and Ai,jA_{i,j} represent the PD status and alternative therapy status at the start of cycle jj (and thus remain appplicable for the entire cycle jj) for subject ii.

To estimate the true treatment effect in a Cox marginal structural model, one can set α0=\alpha_0 = -\infty, which effectively forces Ai,j=0A_{i,j} = 0 (disabling treatment switching). The coefficient for the randomized treatment can then be estimated using a Cox proportional hazards model.

Value

A list of data frames, each containing simulated longitudinal covariate, pd status, alternative therapy status, and event history data with the following variables:

  • id: Subject identifier.

  • arrival_time: The enrollment time for the subject.

  • trtrand: Randomized treatment assignment (0 = control, 1 = experimental)

  • bprog: Baseline prognosis (0 = good, 1 = poor).

  • tpoint: Treatment cycle index.

  • tstart: Start day of the treatment cycle.

  • tstop: End day of the treatment cycle.

  • L: Time-dependent covariate at tstart predicting survival and switching; affected by treatment switching.

  • Llag: Lagged value of L.

  • Z: Disease progression status at tstart.

  • A: Treatment switching status at tstart.

  • Alag: Lagged value of A.

  • event: Death indicator at tstop.

  • timeOS: Observed time to death or censoring.

  • died: Indicator of death by end of follow-up.

  • progressed: Indicator of disease progression by end of follow-up.

  • timePD: Observed time to progression or censoring.

  • xo: Indicator for whether treatment switching occurred.

  • xotime: Time of treatment switching (if applicable).

  • censor_time: Administrative censoring time.

Author(s)

Kaifeng Lu, [email protected]

References

Jessica G. Young, and Eric J. Tchetgen Tchetgen. Simulation from a known Cox MSM using standard parametric models for the g-formula. Statistics in Medicine. 2014;33(6):1001-1014.

NR Latimer, IR White, K Tilling, and U Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.

Jing Xu, Guohui Liu, and Bingxia Wang. Bias and type I error control in correcting treatment effect for treatment switching using marginal structural models in Phse III oncology trials. Journal of Biopharmaceutical Statistics. 2022;32(6):897-914.

Examples

library(dplyr)

simulated.data <- tssim(
  tdxo = 1, coxo = 1, allocation1 = 1, allocation2 = 1,
  p_X_1 = 0.3, p_X_0 = 0.3, 
  rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, 
  gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8,
  zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, 
  alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, 
  theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2,
  rate_C = 0.0000855, accrualIntensity = 20/30, 
  fixedFollowup = FALSE, plannedTime = 1350, days = 30,
  n = 500, NSim = 100, seed = 314159)
  
simulated.data[[1]] %>% filter(id == 1)

Assess Proportional Hazards Assumption Based on Scaled Schoenfeld Residuals

Description

Obtains the scaled Schoenfeld residuals and tests the proportional hazards assumption using a score test for the interaction between each covariate and a transformed time variable.

Usage

zph_phregr(object, transform = "km")

Arguments

object

The output from the phregr call.

transform

A character string indicating how survival times should be transformed before the test is performed. Supported values include "identity", "log", "rank", and "km" (default).

Details

This corresponds to the cox.zph function from the survival package with terms = FALSE and global = TRUE.

Value

A list with the following components:

  • table A matrix with one row for each parameter and a final row for the global test. The columns contain the score test for adding the time-dependent term, the degrees of freedom, and the two-sided p-value.

  • x The transformed time values.

  • time The original (untransformed) event times, with tied event times repeated.

  • strata The stratum index for each event.

  • y The matrix of scaled Schoenfeld residuals, with one column for each parameter and one row for each event. Column names correspond to the parameter names.

  • var An approximate covariance matrix of the scaled Schoenfeld residuals, used to construct an approximate standard error band for plots.

  • transform the transformation applied to the time values.

Author(s)

Kaifeng Lu, [email protected]

References

Patricia M. Grambsch and Terry M. Therneau. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994; 81:515-26.

Examples

fit <- phregr(data = liver, time = "Time", event = "Status", 
              covariates = c("log(Bilirubin)", "log(Protime)", 
                             "log(Albumin)", "Age", "Edema"),
              ties = "breslow")
              
zph <- zph_phregr(fit, transform = "log")
  
zph$table