| Title: | Treatment Switching |
|---|---|
| Description: | Implements rank preserving structural failure time model (RPSFTM), iterative parameter estimation (IPE), inverse probability of censoring weights (IPCW), marginal structural model (MSM), simple two-stage estimation (TSEsimp), and improved two-stage estimation with g-estimation (TSEgest) methods for treatment switching in randomized clinical trials. |
| Authors: | Kaifeng Lu [aut, cre] (ORCID: <https://orcid.org/0000-0002-6160-7119>) |
| Maintainer: | Kaifeng Lu <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.2.7 |
| Built: | 2026-06-04 17:00:00 UTC |
| Source: | https://github.com/kaifenglu/trtswitch |
Implements rank preserving structural failure time model (RPSFTM), iterative parameter estimation (IPE), inverse probability of censoring weights (IPCW), marginal structural model (MSM), simple two-stage estimation (TSEsimp), and improved two-stage estimation with g-estimation (TSEgest) methods for treatment switching in randomized clinical trials.
To enable bootstrapping of the parameter estimates, we implements many standard survival analysis methods in C++. These include but are not limited to Kaplan-Meier estimates of the survival curves, log-rank tests, accelerated failure time (AFT) models, and Cox proportional hazards models.
All treatment switching adjustment methods allow treatment switching in both treatment arms, stratification and covariates adjustment. For the AFT models, stratification factors are included as covariates (main effects only or all-way interactions) because SAS PROC LIFEREG does not have the strata statement. The RPSFTM, IPE and TSE methods can be used with or without recensoring. The IPCW and MSM methods can produce stabilized and truncated weights.
The treat variable adopts a treatment-before-control order,
except with 1/0 or TRUE/FALSE coding.
Kaifeng Lu, [email protected]
James M. Robins and Anastasios A. Tsiatis. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics. 1991;20(8):2609-2631.
Ian R. White, Adbel G. Babiker, Sarah Walker, and Janet H. Darbyshire. Randomization-based methods for correcting for treatment changes: Examples from the CONCORDE trial. Statistics in Medicine. 1999;18:2617-2634.
Michael Branson and John Whitehead. Estimating a treatment effect in survival studies in which patients switch treatment. Statistics in Medicine. 2002;21:2449-2463.
Ian R White. Letter to the Editor: Estimating treatment effects in randomized trials with treatment switching. Statistics in Medicine. 2006;25:1619-1622.
James M. Robins and Dianne M. Finkelstein. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56:779-788.
Nicholas R. Latimer, Keith R. Abrams, Paul C. Lambert, Michael K. Crowther, Allan J. Wailoo, Jonathan P. Morden, Ron L. Akehurst, and Michael J. Campbell. Adjusting for treatment switching in randomised controlled trials - A simulation study and a simplified two-stage method. Statistical Methods in Medical Research. 2017;26(2):724-751.
Nicholas R. Latimer, Ian R. White, Kate Tilling, and Ulrike Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.
James M. Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550-560.
Miguel Angel Hernan, Babette Brumback, and James M. Robins. Marginal structural modesl to estimate the causual effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561-570.
This data set contains baseline subject-level data. Of note,
PDDT can be derived from the ADT variable of the ADTTE data set
by selecting PARAMCD == "INPFS" & CNSR == 0 & EVNTDESC == "PROGRESSIVE DISEASE". Additionally,
OSDT and DIED can be derived from the ADT and CNSR variables of
the ADTTE data set by selecting PARAMCD == "OS".
adsladsl
An object of class tbl_df (inherits from tbl, data.frame) with 412 rows and 12 columns.
SUBJIDsubject ID
SEXsex: "M" or "F"
STRAT1Vstratification factor 1: ECOG PS
STRAT2Vstratification factor 2: inv. chosen chemotherapy
RANDDTrandomization date
TRT01Pplanned treatment: Active or Placebo
TRTSDTtreatment start date
PDDTdate of disease progression
XODTdate of treatment crossover
OSDTdate of death or censoring
DIEDwhether the patient died
DCUTDTdate of data cut
This data set contains longitudinal time-dependent covariate data on ECOG101 and LDH.
adtdcadtdc
An object of class tbl_df (inherits from tbl, data.frame) with 9813 rows and 4 columns.
SUBJIDsubject ID
PARAMCDparameter code
ADTanalysis date
AVALcovariate value
Survival in patients with acute myelogenous leukemia.
timeSurvival or censoring time
statuscensoring status
xmaintenance chemotherapy given or not
amlaml
An object of class data.frame with 23 rows and 3 columns.
Obtains the standardized score processes and the simulated distribution under the null hypothesis as well as the p-values for the supremum tests.
assess_phregr(object, resample = 1000, seed = 12345)assess_phregr(object, resample = 1000, seed = 12345)
object |
The output from the |
resample |
The number of simulation samples for the supremem test. |
seed |
The random seed for the simulations. |
The supremum test corresponds to the ASSESS statement with ph
option of SAS PROC PHREG.
A list with the following components:
time the unique event times.
score_t the observed standardized score process.
score_t_list a list of simulated standardized score processes
under the null hypothesis.
max_abs_value the supremum of the absolute value of the observed
standardized score process for each covariate and the supremum of
the sum of absolute values of the observed standardized score processes
across all covariates.
p_value the p-values for the supremum tests for each covariate
and the global test.
Kaifeng Lu, [email protected]
D. Y. Lin, L. J. Wei, and Z. Ying. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 1993; 80:557-572.
fit <- phregr(data = liver, time = "Time", event = "Status", covariates = c("log(Bilirubin)", "log(Protime)", "log(Albumin)", "Age", "Edema"), ties = "breslow") aph <- assess_phregr(fit, resample = 1000, seed = 314159) aph plot(aph, nsim = 20)fit <- phregr(data = liver, time = "Time", event = "Status", covariates = c("log(Bilirubin)", "log(Protime)", "log(Albumin)", "Age", "Edema"), ties = "breslow") aph <- assess_phregr(fit, resample = 1000, seed = 314159) aph plot(aph, nsim = 20)
Survival of patients on the waiting list for the Stanford heart transplant program.
start, stop, evententry and exit time and status for the time interval
ageage-48 years
yearyear of acceptance (in years after Nov 1, 1967)
surgeryprior bypass surgery 1=yes, 0=no
transplantreceived transplant 1=yes, 0=no
idpatient id
heartheart
An object of class data.frame with 172 rows and 8 columns.
Patients were randomly assigned to receive treatment immediately or deferred, and those in the deferred arm could cross over and receive treatment. The primary endpoint was time to disease progression.
idPatient identification number
defIndicator that the participant was assigned to the deferred treatment arm
immIndicator that the participant was assigned to the immediate treatment arm
censyrsThe censoring time, in years, corresponding to the close of study minus the time of entry for each patient
xoIndicator that crossover occurred
xoyrsThe time, in years, from entry to switching, or 0 for patients in the immediate arm
progIndicator of disease progression (1), or censoring (0)
progyrsTime, in years, from entry to disease progression or censoring
entryThe time of entry into the study, measured in years from the date of randomisation
immdefimmdef
An object of class data.frame with 1000 rows and 9 columns.
The dataset consits of the number of ingots not ready for rolling and the number of ingots ready for rolling for a number of combinations of heating time and soaking time.
ingotsingots
An object of class tbl_df (inherits from tbl, data.frame) with 25 rows and 4 columns.
HeatThe heating time
SoakThe soaking time
NotReadyResponse indicator, with a value 1 for units not ready for rolling (event) and a value of 0 for units ready for rolling (nonevent)
FreqThe frequency of occurrence of each combination of
Heat, Soak, and NotReady
Excludes data after treatment switching and fits a switching model to estimate the probability of not switching. The inverse of these probabilities (inverse probability of censoring weights) are then used as weights in a weighted Cox model to obtain the adjusted hazard ratio.
ipcw( data, id = "id", stratum = "", tstart = "tstart", tstop = "tstop", event = "event", treat = "treat", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", numerator = "", denominator = "", logistic_switching_model = FALSE, strata_main_effect_only = TRUE, ns_df = 3, firth = FALSE, flic = FALSE, stabilized_weights = TRUE, trunc = 0, trunc_upper_only = TRUE, swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )ipcw( data, id = "id", stratum = "", tstart = "tstart", tstop = "tstop", event = "event", treat = "treat", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", numerator = "", denominator = "", logistic_switching_model = FALSE, strata_main_effect_only = TRUE, ns_df = 3, firth = FALSE, flic = FALSE, stabilized_weights = TRUE, trunc = 0, trunc_upper_only = TRUE, swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )
data |
The input data frame that contains the following variables:
|
id |
The name of the id variable in the input data. |
stratum |
The name(s) of the stratum variable(s) in the input data. |
tstart |
The name of the tstart variable in the input data. |
tstop |
The name of the tstop variable in the input data. |
event |
The name of the event variable in the input data. |
treat |
The name of the treatment variable in the input data. |
swtrt |
The name of the swtrt variable in the input data. |
swtrt_time |
The name of the swtrt_time variable in the input data. |
base_cov |
The names of baseline covariates (excluding treat) in the input data for the Cox model. |
numerator |
The names of baseline covariates (excluding treat) in the input data for the numerator switching model for stabilized weights. |
denominator |
The names of baseline (excluding treat) and time-dependent covariates in the input data for the denominator switching model. |
logistic_switching_model |
Whether a pooled logistic regression switching model is used. |
strata_main_effect_only |
Whether to only include the strata main
effects in the logistic regression switching model. Defaults to
|
ns_df |
Degrees of freedom for the natural cubic spline for visit-specific intercepts of the pooled logistic regression model. Defaults to 3 for two internal knots at the 33 and 67 percentiles of the treatment switching times. |
firth |
Whether the Firth's bias reducing penalized likelihood should be used. |
flic |
Whether to apply intercept correction to obtain more accurate predicted probabilities. |
stabilized_weights |
Whether to use the stabilized weights.
The default is |
trunc |
The truncation fraction of the weight distribution. Defaults to 0 for no truncation in weights. |
trunc_upper_only |
Whether to truncate the weights from the upper
end of the weight distribution only. Defaults to |
swtrt_control_only |
Whether treatment switching occurred only in
the control group. The default is |
alpha |
The significance level to calculate confidence intervals. |
ties |
The method for handling ties in the Cox model, either "breslow" or "efron" (default). |
boot |
Whether to use bootstrap to obtain the confidence
interval for hazard ratio. Defaults to |
n_boot |
The number of bootstrap samples. |
seed |
The seed to reproduce the bootstrap results. |
nthreads |
The number of threads to use in bootstrapping (0 means the default RcppParallel behavior). |
The hazard ratio and confidence interval under a no-switching scenario are obtained as follows:
Exclude all observations after treatment switch.
Define the crossover and event indicators for the last time interval of each subject.
For time-dependent Cox switching models, replicate unique event times across treatment arms within each subject.
Fit the denominator switching model (and numerator model for stabilized weights) to estimate inverse probability of censoring weights. Either a Cox model with time-dependent covariates or a pooled logistic regression model can be used.
For the pooled logistic regression model, the probability of
remaining uncensored (i.e., not switching) is calculated as
and accumulated over time up to the start of each interval.
For the time-dependent Cox model, the probability of remaining unswitched is derived from the estimated baseline hazard and predicted risk score up to the end of each interval.
Fit a weighted Cox model to the outcome survival times (excluding data after switching) to estimate the hazard ratio.
Construct the p-value and confidence interval for the hazard ratio
using either robust sandwich variance or bootstrapping. When
bootstrapping is used, the confidence interval and p-value are
based on a t-distribution with n_boot - 1 degrees of freedom.
A list with the following components:
pvalue: The two-sided p-value.
pvalue_type: The type of two-sided p-value for treatment effect,
i.e., "Cox model" or "bootstrap".
hr: The estimated hazard ratio from the Cox model.
hr_CI: The confidence interval for hazard ratio.
hr_CI_type: The type of confidence interval for hazard ratio,
either "Cox model" or "bootstrap".
event_summary: A data frame containing the count and percentage
of deaths and switches by treatment arm.
data_switch: A list of input data for the switching models by
treatment group. The variables include id, stratum,
"tstart", "tstop", "cross", denominator,
swtrt, and swtrt_time. For logistic switching models,
stratum variables are converted to dummy variables, and
natural cubic spline basis variables are created for the visit-specific
intercepts.
fit_switch: A list of fitted switching models for the
denominator and numerator by treatment group.
data_outcome: The input data for the outcome Cox model
including the inverse probability of censoring weights.
The variables include id, stratum, "tstart",
"tstop", "event", "treated",
"unstablized_weight", "stabilized_weight",
base_cov, and treat.
weight_summary: A data frame summarizing the weights by
treatment arm.
km_outcome: The Kaplan-Meier estimates of the survival
functions for the treatment and control groups based on the
weighted outcome data.
lr_outcome: The log-rank test results for the treatment
effect based on the weighted outcome data.
fit_outcome: The fitted outcome Cox model.
fail: Whether a model fails to converge.
settings: A list containing the input parameter values.
fail_boots: The indicators for failed bootstrap samples
if boot is TRUE.
fail_boots_data: The data for failed bootstrap samples
if boot is TRUE.
hr_boots: The bootstrap hazard ratio estimates
if boot is TRUE.
Kaifeng Lu, [email protected]
James M. Robins and Dianne M. Finkelstein. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56(3):779-788.
# Example 1: pooled logistic regression switching model sim1 <- tssim( tdxo = TRUE, coxo = TRUE, allocation1 = 1, allocation2 = 1, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 0.0000855, accrualIntensity = 20/30, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500, NSim = 100, seed = 314159) fit1 <- ipcw( sim1[[1]], id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "trtrand", swtrt = "xo", swtrt_time = "xotime", base_cov = "bprog", numerator = "bprog", denominator = c("bprog", "L"), logistic_switching_model = TRUE, ns_df = 3, swtrt_control_only = TRUE, boot = FALSE) fit1 # Example 2: time-dependent covariates Cox switching model fit2 <- ipcw( shilong, id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "bras.f", swtrt = "co", swtrt_time = "dco", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), numerator = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), denominator = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f", "ps", "ttc", "tran"), swtrt_control_only = FALSE, boot = FALSE) fit2# Example 1: pooled logistic regression switching model sim1 <- tssim( tdxo = TRUE, coxo = TRUE, allocation1 = 1, allocation2 = 1, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 0.0000855, accrualIntensity = 20/30, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500, NSim = 100, seed = 314159) fit1 <- ipcw( sim1[[1]], id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "trtrand", swtrt = "xo", swtrt_time = "xotime", base_cov = "bprog", numerator = "bprog", denominator = c("bprog", "L"), logistic_switching_model = TRUE, ns_df = 3, swtrt_control_only = TRUE, boot = FALSE) fit1 # Example 2: time-dependent covariates Cox switching model fit2 <- ipcw( shilong, id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "bras.f", swtrt = "co", swtrt_time = "dco", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), numerator = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), denominator = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f", "ps", "ttc", "tran"), swtrt_control_only = FALSE, boot = FALSE) fit2
Estimates the causal parameter by iteratively fitting an accelerated failure time (AFT) model to counterfactual unswitched survival times, and derives the adjusted hazard ratio from the Cox model using counterfactual unswitched survival times based on the estimated causal parameter.
ipe( data, id = "id", stratum = "", time = "time", event = "event", treat = "treat", rx = "rx", censor_time = "censor_time", base_cov = "", aft_dist = "weibull", strata_main_effect_only = TRUE, low_psi = -2, hi_psi = 2, treat_modifier = 1, recensor = TRUE, admin_recensor_only = TRUE, autoswitch = TRUE, root_finding = "brent", alpha = 0.05, ties = "efron", tol = 1e-06, boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )ipe( data, id = "id", stratum = "", time = "time", event = "event", treat = "treat", rx = "rx", censor_time = "censor_time", base_cov = "", aft_dist = "weibull", strata_main_effect_only = TRUE, low_psi = -2, hi_psi = 2, treat_modifier = 1, recensor = TRUE, admin_recensor_only = TRUE, autoswitch = TRUE, root_finding = "brent", alpha = 0.05, ties = "efron", tol = 1e-06, boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )
data |
The input data frame that contains the following variables:
|
id |
The name of the id variable in the input data. |
stratum |
The name(s) of the stratum variable(s) in the input data. |
time |
The name of the time variable in the input data. |
event |
The name of the event variable in the input data. |
treat |
The name of the treatment variable in the input data. |
rx |
The name of the rx variable in the input data. |
censor_time |
The name of the censor_time variable in the input data. |
base_cov |
The names of baseline covariates (excluding treat) in the input data for the causal AFT model and the outcome Cox model. |
aft_dist |
The assumed distribution for time to event for the AFT model. Options include "exponential", "weibull" (default), "loglogistic", and "lognormal". |
strata_main_effect_only |
Whether to only include the strata main
effects in the AFT model. Defaults to |
low_psi |
The lower limit of the causal parameter. |
hi_psi |
The upper limit of the causal parameter. |
treat_modifier |
The optional sensitivity parameter for the constant treatment effect assumption. |
recensor |
Whether to apply recensoring to counterfactual
survival times. Defaults to |
admin_recensor_only |
Whether to apply recensoring to administrative
censoring times only. Defaults to |
autoswitch |
Whether to exclude recensoring for treatment arms
with no switching. Defaults to |
root_finding |
Character string specifying the univariate
root-finding algorithm to use. Options are |
alpha |
The significance level to calculate confidence intervals. |
ties |
The method for handling ties in the Cox model, either "breslow" or "efron" (default). |
tol |
The desired accuracy (convergence tolerance) for |
boot |
Whether to use bootstrap to obtain the confidence
interval for hazard ratio. Defaults to |
n_boot |
The number of bootstrap samples. |
seed |
The seed to reproduce the bootstrap results. |
nthreads |
The number of threads to use in bootstrapping (0 means the default RcppParallel behavior). |
Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:
Estimate the causal parameter by iteratively fitting an
AFT model to the observed survival times for the treatment arm and
the counterfactual survival times for the control arm:
Compute counterfactual survival times for control patients using
the estimated .
Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.
Obtain the confidence interval for the hazard ratio using either
the ITT log-rank test p-value or bootstrap. When bootstrapping,
the interval and p-value are derived from a t-distribution
with n_boot - 1 degrees of freedom.
A list with the following components:
psi: The estimated causal parameter.
psi_CI: The confidence interval for psi.
psi_CI_type: The type of confidence interval for psi,
i.e., "log-rank p-value" or "bootstrap".
pvalue: The two-sided p-value.
pvalue_type: The type of two-sided p-value for treatment effect,
i.e., "log-rank" or "bootstrap".
hr: The estimated hazard ratio from the Cox model.
hr_CI: The confidence interval for hazard ratio.
hr_CI_type: The type of confidence interval for hazard ratio,
either "log-rank p-value" or "bootstrap".
event_summary: A data frame containing the count and percentage
of deaths and switches by treatment arm.
Sstar: A data frame containing the counterfactual untreated
survival times and event indicators for each treatment group.
The variables include id, stratum,
"t_star", "d_star", "treated", base_cov,
and treat.
kmstar: A data frame containing the Kaplan-Meier estimates
based on the counterfactual untreated survival times by treatment arm.
data_aft: The input data for the AFT model for
estimating psi. The variables include id, stratum,
"t_star", "d_star", "treated", base_cov,
and treat.
fit_aft: The fitted AFT model for estimating psi.
res_aft: The deviance residuals from the fitted AFT model.
data_outcome: The input data for the outcome Cox model of
counterfactual unswitched survival times.
The variables include id, stratum, "t_star",
"d_star", "treated", base_cov, and treat.
km_outcome: The Kaplan-Meier estimates of the survival
functions for the treatment and control groups based on the
counterfactual unswitched survival times.
lr_outcome: The log-rank test results for the treatment
effect based on the counterfactual unswitched survival times.
fit_outcome: The fitted outcome Cox model.
fail: Whether a model fails to converge.
psimissing: Whether the psi parameter cannot be estimated.
settings: A list containing the input parameter values.
fail_boots: The indicators for failed bootstrap samples
if boot is TRUE.
fail_boots_data: The data for failed bootstrap samples
if boot is TRUE.
hr_boots: The bootstrap hazard ratio estimates
if boot is TRUE.
psi_boots: The bootstrap psi estimates
if boot is TRUE.
Kaifeng Lu, [email protected]
Michael Branson and John Whitehead. Estimating a treatment effect in survival studies in which patients switch treatment. Statistics in Medicine. 2002;21(17):2449-2463.
Ian R White. Letter to the Editor: Estimating treatment effects in randomized trials with treatment switching. Statistics in Medicine. 2006;25(9):1619-1622.
library(dplyr) # Example 1: one-way treatment switching (control to active) data <- immdef %>% mutate(rx = 1-xoyrs/progyrs) fit1 <- ipe( data, id = "id", time = "progyrs", event = "prog", treat = "imm", rx = "rx", censor_time = "censyrs", aft_dist = "weibull", boot = FALSE) fit1 # Example 2: two-way treatment switching (illustration only) # the eventual survival time shilong1 <- shilong %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(-c("ps", "ttc", "tran")) shilong2 <- shilong1 %>% mutate(rx = ifelse(co, ifelse(bras.f == "MTA", dco/ady, 1 - dco/ady), ifelse(bras.f == "MTA", 1, 0))) fit2 <- ipe( shilong2, id = "id", time = "tstop", event = "event", treat = "bras.f", rx = "rx", censor_time = "dcut", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), aft_dist = "weibull", boot = FALSE) fit2library(dplyr) # Example 1: one-way treatment switching (control to active) data <- immdef %>% mutate(rx = 1-xoyrs/progyrs) fit1 <- ipe( data, id = "id", time = "progyrs", event = "prog", treat = "imm", rx = "rx", censor_time = "censyrs", aft_dist = "weibull", boot = FALSE) fit1 # Example 2: two-way treatment switching (illustration only) # the eventual survival time shilong1 <- shilong %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(-c("ps", "ttc", "tran")) shilong2 <- shilong1 %>% mutate(rx = ifelse(co, ifelse(bras.f == "MTA", dco/ady, 1 - dco/ady), ifelse(bras.f == "MTA", 1, 0))) fit2 <- ipe( shilong2, id = "id", time = "tstop", event = "event", treat = "bras.f", rx = "rx", censor_time = "dcut", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), aft_dist = "weibull", boot = FALSE) fit2
Obtains the estimate of milestone survival difference between two treatment groups.
kmdiff( data, stratum = "", treat = "treat", time = "time", time2 = "", event = "event", weight = "", milestone = 0, survDiffH0 = 0, conflev = 0.95 )kmdiff( data, stratum = "", treat = "treat", time = "time", time2 = "", event = "event", weight = "", milestone = 0, survDiffH0 = 0, conflev = 0.95 )
data |
The input data frame that contains the following variables:
|
stratum |
The name of the stratum variable in the input data. |
treat |
The name of the treatment variable in the input data. |
time |
The name of the time variable or the left end of each interval for counting process data in the input data. |
time2 |
The name of the right end of each interval for counting process data in the input data. |
event |
The name of the event variable in the input data. |
weight |
The name of the weight variable in the input data. |
milestone |
The milestone time at which to calculate the survival probability. |
survDiffH0 |
The difference in milestone survival probabilities under the null hypothesis. Defaults to 0 for superiority test. |
conflev |
The level of the two-sided confidence interval for the difference in milestone survival probabilities. Defaults to 0.95. |
A data frame with the following variables:
milestone: The milestone time relative to randomization.
survDiffH0: The difference in milestone survival probabilities
under the null hypothesis.
surv1: The estimated milestone survival probability for
the treatment group.
surv2: The estimated milestone survival probability for
the control group.
survDiff: The estimated difference in milestone survival
probabilities.
vsurv1: The variance for surv1.
vsurv2: The variance for surv2.
sesurvDiff: The standard error for survDiff.
survDiffZ: The Z-statistic value.
survDiffPValue: The two-sided p-value.
lower: The lower bound of confidence interval.
upper: The upper bound of confidence interval.
conflev: The level of confidence interval.
Kaifeng Lu, [email protected]
kmdiff(data = rawdata[rawdata$iterationNumber == 1, ], stratum = "stratum", treat = "treatmentGroup", time = "timeUnderObservation", event = "event", milestone = 12)kmdiff(data = rawdata[rawdata$iterationNumber == 1, ], stratum = "stratum", treat = "treatmentGroup", time = "timeUnderObservation", event = "event", milestone = 12)
Obtains the Kaplan-Meier estimates of the survival curve.
kmest( data, stratum = "", time = "time", time2 = "", event = "event", weight = "", conftype = "log-log", conflev = 0.95, keep_censor = FALSE )kmest( data, stratum = "", time = "time", time2 = "", event = "event", weight = "", conftype = "log-log", conflev = 0.95, keep_censor = FALSE )
data |
The input data frame that contains the following variables:
|
stratum |
The name(s) of the stratum variable(s) in the input data. |
time |
The name of the time variable or the left end of each interval for counting process data in the input data. |
time2 |
The name of the right end of each interval for counting process data in the input data. |
event |
The name of the event variable in the input data. |
weight |
The name of the weight variable in the input data. |
conftype |
The type of the confidence interval. One of "none", "plain", "log", "log-log" (the default), or "arcsin". The arcsin option bases the intervals on asin(sqrt(survival)). |
conflev |
The level of the two-sided confidence interval for the survival probabilities. Defaults to 0.95. |
keep_censor |
Whether to retain the censoring time in the output data frame. |
A data frame with the following variables:
size: The number of subjects in the stratum.
time: The event time.
nrisk: The number of subjects at risk.
nevent: The number of subjects having the event.
ncensor: The number of censored subjects.
surv: The Kaplan-Meier estimate of the survival probability.
sesurv: The standard error of the estimated survival
probability based on the Greendwood formula.
lower: The lower bound of confidence interval if requested.
upper: The upper bound of confidence interval if requested.
conflev: The level of confidence interval if requested.
conftype: The type of confidence interval if requested.
stratum: The stratum.
Kaifeng Lu, [email protected]
kmest(data = aml, stratum = "x", time = "time", event = "status")kmest(data = aml, stratum = "x", time = "time", event = "status")
Obtains the parameter estimates from parametric regression models with uncensored, right censored, left censored, or interval censored data.
liferegr( data, stratum = "", time = "time", time2 = "", event = "event", covariates = "", weight = "", offset = "", id = "", dist = "weibull", init = NA_real_, robust = FALSE, plci = FALSE, alpha = 0.05, maxiter = 50, eps = 1e-09 )liferegr( data, stratum = "", time = "time", time2 = "", event = "event", covariates = "", weight = "", offset = "", id = "", dist = "weibull", init = NA_real_, robust = FALSE, plci = FALSE, alpha = 0.05, maxiter = 50, eps = 1e-09 )
data |
The input data frame that contains the following variables:
|
stratum |
The name(s) of the stratum variable(s) in the input data. |
time |
The name of the time variable or the left end of each interval for interval censored data in the input data. |
time2 |
The name of the right end of each interval for interval censored data in the input data. |
event |
The name of the event variable in the input data for right censored data. |
covariates |
The vector of names of baseline covariates in the input data. |
weight |
The name of the weight variable in the input data. |
offset |
The name of the offset variable in the input data. |
id |
The name of the id variable in the input data. |
dist |
The assumed distribution for time to event. Options include "exponential", "weibull", "lognormal", and "loglogistic" to be modeled on the log-scale, and "normal" and "logistic" to be modeled on the original scale. |
init |
A vector of initial values for the model parameters, including regression coefficients and the log scale parameter. By default, initial values are derived from an intercept-only model. If this approach fails, ordinary least squares (OLS) estimates, ignoring censoring, are used instead. |
robust |
Whether a robust sandwich variance estimate should be computed. In the presence of the id variable, the score residuals will be aggregated for each id when computing the robust sandwich variance estimate. |
plci |
Whether to obtain profile likelihood confidence interval. |
alpha |
The two-sided significance level. |
maxiter |
The maximum number of iterations. |
eps |
The tolerance to declare convergence. |
There are two ways to specify the model, one for right censored data through the time and event variables, and the other for interval censored data through the time (lower) and time2 (upper) variables. For the second form, we follow the convention used in SAS PROC LIFEREG:
If lower is not missing, upper is not missing, and lower is equal to upper, then there is no censoring and the event occurred at time lower.
If lower is not missing, upper is not missing, and lower < upper, then the event time is censored within the interval (lower, upper).
If lower is missing, but upper is not missing, then upper will be used as the left censoring value.
If lower is not missing, but upper is missing, then lower will be used as the right censoring value.
If lower is not missing, upper is not missing, but lower > upper, or if both lower and upper are missing, then the observation will not be used.
A list with the following components:
sumstat: The data frame of summary statistics of model fit
with the following variables:
n: The number of observations.
nevents: The number of events.
loglik0: The log-likelihood under null.
loglik1: The maximum log-likelihood.
niter: The number of Newton-Raphson iterations.
dist: The assumed distribution.
p: The number of parameters, including the intercept,
regression coefficients associated with the covariates, and
the log scale parameters for the strata.
nvar: The number of regression coefficients associated
with the covariates (excluding the intercept).
robust: Whether the robust sandwich variance estimate
is requested.
fail: Whether the model fails to converge.
parest: The data frame of parameter estimates with the
following variables:
param: The name of the covariate for the parameter estimate.
beta: The parameter estimate.
sebeta: The standard error of parameter estimate.
z: The Wald test statistic for the parameter.
expbeta: The exponentiated parameter estimate.
lower: The lower limit of confidence interval.
upper: The upper limit of confidence interval.
p: The p-value from the chi-square test.
method: The method to compute the confidence interval and
p-value.
sebeta_naive: The naive standard error of parameter estimate
if robust variance is requested.
linear_predictors: The vector of linear predictors.
p: The number of parameters.
nvar: The number of columns of the design matrix excluding
the intercept.
param: The parameter names.
beta: The parameter estimate.
vbeta: The covariance matrix for parameter estimates.
vbeta_naive: The naive covariance matrix for parameter estimates.
terms: The terms object.
xlevels: A record of the levels of the factors used in fitting.
settings: A list containing the input parameter values.
Kaifeng Lu, [email protected]
John D. Kalbfleisch and Ross L. Prentice. The Statistical Analysis of Failure Time Data. Wiley: New York, 1980.
library(dplyr) # right censored data (fit1 <- liferegr( data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = (treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat", dist = "weibull")) # tobit regression for left censored data (fit2 <- liferegr( data = tobin %>% mutate(time = ifelse(durable>0, durable, NA)), time = "time", time2 = "durable", covariates = c("age", "quant"), dist = "normal"))library(dplyr) # right censored data (fit1 <- liferegr( data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = (treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat", dist = "weibull")) # tobit regression for left censored data (fit2 <- liferegr( data = tobin %>% mutate(time = ifelse(durable>0, durable, NA)), time = "time", time2 = "durable", covariates = c("age", "quant"), dist = "normal"))
This data set contains information on 418 patients with primary biliary cirrhosis.
liverliver
An object of class tbl_df (inherits from tbl, data.frame) with 418 rows and 7 columns.
TimeThe follow-up time in years from the time of registration. The event could be liver transplantation, death, or the end of the study, whichever came first
StatusA censoring indicator, where a value of 1 indicating a death event, and 0 indicating a censored observation (the patient survived past the observation time)
AgeThe patient's age in years
AlbuminSerum albumin level in g/dl
BilirubinSerum bilirubin level in mg/dl
EdemaEdema status, where a value of 0 indicates no edema, 0.5 indicates edema successfully treated with diuretics, and 1 indicates edema despite diuretic therapy
ProtimeProthrombin time in seconds
Obtains the parameter estimates from logistic regression models with binary data.
logisregr( data, event = "event", covariates = "", freq = "", weight = "", offset = "", id = "", link = "logit", init = NA_real_, robust = FALSE, firth = FALSE, flic = FALSE, plci = FALSE, alpha = 0.05, maxiter = 50, eps = 1e-09 )logisregr( data, event = "event", covariates = "", freq = "", weight = "", offset = "", id = "", link = "logit", init = NA_real_, robust = FALSE, firth = FALSE, flic = FALSE, plci = FALSE, alpha = 0.05, maxiter = 50, eps = 1e-09 )
data |
The input data frame that contains the following variables:
|
event |
The name of the event variable in the input data. |
covariates |
The vector of names of baseline covariates in the input data. |
freq |
The name of the frequency variable in the input data. The frequencies must be the same for all observations within each cluster as indicated by the id. Thus freq is the cluster frequency. |
weight |
The name of the weight variable in the input data. |
offset |
The name of the offset variable in the input data. |
id |
The name of the id variable in the input data. |
link |
The link function linking the response probabilities to the linear predictors. Options include "logit" (default), "probit", and "cloglog" (complementary log-log). |
init |
A vector of initial values for the model parameters. By default, initial values are derived from an intercept-only model. |
robust |
Whether a robust sandwich variance estimate should be computed. In the presence of the id variable, the score residuals will be aggregated for each id when computing the robust sandwich variance estimate. |
firth |
Whether the firth's bias reducing penalized likelihood
should be used. The default is |
flic |
Whether to apply intercept correction to obtain more
accurate predicted probabilities. The default is |
plci |
Whether to obtain profile likelihood confidence interval. |
alpha |
The two-sided significance level. |
maxiter |
The maximum number of iterations. |
eps |
The tolerance to declare convergence. |
Fitting a logistic regression model using Firth's bias reduction method is equivalent to penalization of the log-likelihood by the Jeffreys prior. Firth's penalized log-likelihood is given by
and the components of the gradient are computed as
The Hessian matrix is not modified by this penalty.
Firth's method reduces bias in maximum likelihood estimates of coefficients, but it introduces a bias toward one-half in the predicted probabilities.
A straightforward modification to Firth’s logistic regression to achieve unbiased average predicted probabilities involves a post hoc adjustment of the intercept. This approach, known as Firth’s logistic regression with intercept correction (FLIC), preserves the bias-corrected effect estimates. By excluding the intercept from penalization, it ensures that we don't sacrifice the accuracy of effect estimates to improve the predictions.
A list with the following components:
sumstat: The data frame of summary statistics of model fit
with the following variables:
n: The number of subjects.
nevents: The number of events.
loglik0: The (penalized) log-likelihood under null.
loglik1: The maximum (penalized) log-likelihood.
niter: The number of Newton-Raphson iterations.
p: The number of parameters, including the intercept,
and regression coefficients associated with the covariates.
link: The link function.
robust: Whether a robust sandwich variance estimate should
be computed.
firth: Whether the firth's penalized likelihood is used.
flic: Whether to apply intercept correction.
fail: Whether the model fails to converge.
loglik0_unpenalized: The unpenalized log-likelihood under null.
loglik1_unpenalized: The maximum unpenalized log-likelihood.
parest: The data frame of parameter estimates with the
following variables:
param: The name of the covariate for the parameter estimate.
beta: The parameter estimate.
sebeta: The standard error of parameter estimate.
z: The Wald test statistic for the parameter.
expbeta: The exponentiated parameter estimate.
lower: The lower limit of confidence interval.
upper: The upper limit of confidence interval.
p: The p-value from the chi-square test.
method: The method to compute the confidence interval and
p-value.
sebeta_naive: The naive standard error of parameter estimate.
fitted: The data frame with the following variables:
linear_predictors: The linear fit on the link function scale.
fitted_values: The fitted probabilities of having an event,
obtained by transforming the linear predictors by the inverse of
the link function.
p: The number of parameters.
link: The link function.
param: The parameter names.
beta: The parameter estimate.
vbeta: The covariance matrix for parameter estimates.
vbeta_naive: The naive covariance matrix for parameter estimates.
linear_predictors: The linear fit on the link function scale.
fitted_values: The fitted probabilities of having an event.
terms: The terms object.
xlevels: A record of the levels of the factors used in fitting.
settings: A list containing the input parameter values.
Kaifeng Lu, [email protected]
David Firth. Bias Reduction of Maximum Likelihood Estimates. Biometrika 1993; 80:27–38.
Georg Heinze and Michael Schemper. A solution to the problem of separation in logistic regression. Statistics in Medicine 2002;21:2409–2419.
Rainer Puhr, Georg Heinze, Mariana Nold, Lara Lusa, and Angelika Geroldinger. Firth's logistic regression with rare events: accurate effect estimates and predictions? Statistics in Medicine 2017; 36:2302-2317.
(fit1 <- logisregr( ingots, event = "NotReady", covariates = "Heat*Soak", freq = "Freq"))(fit1 <- logisregr( ingots, event = "NotReady", covariates = "Heat*Soak", freq = "Freq"))
Obtains the log-rank test using the Fleming-Harrington family of weights.
lrtest( data, stratum = "", treat = "treat", time = "time", time2 = "", event = "event", weight = "", weight_readj = FALSE, rho1 = 0, rho2 = 0 )lrtest( data, stratum = "", treat = "treat", time = "time", time2 = "", event = "event", weight = "", weight_readj = FALSE, rho1 = 0, rho2 = 0 )
data |
The input data frame or list of data frames that contains the following variables:
|
stratum |
The name(s) of the stratum variable(s) in the input data. |
treat |
The name of the treatment variable in the input data. |
time |
The name of the time variable or the left end of each interval for counting process data in the input data. |
time2 |
The name of the right end of each interval for counting process data in the input data. |
event |
The name of the event variable in the input data. |
weight |
The name of the weight variable in the input data. |
weight_readj |
Whether the weight variable at each event time
will be readjusted to be proportional to the number at risk by
treatment group. Defaults to |
rho1 |
The first parameter of the Fleming-Harrington family of weighted log-rank test. Defaults to 0 for conventional log-rank test. |
rho2 |
The second parameter of the Fleming-Harrington family of weighted log-rank test. Defaults to 0 for conventional log-rank test. |
A data frame with the following variables:
uscore: The numerator of the log-rank test statistic.
vscore: The variance of the log-rank score test statistic.
logRankZ: The Z-statistic value.
logRankPValue: The two-sided p-value.
weight_readj: Whether the weight variable will be readjusted.
rho1: The first parameter of the Fleming-Harrington weights.
rho2: The second parameter of the Fleming-Harrington weights.
Kaifeng Lu, [email protected]
lrtest(rawdata[rawdata$iterationNumber == 1, ], stratum = "stratum", treat = "treatmentGroup", time = "timeUnderObservation", event = "event", rho1 = 0.5, rho2 = 0)lrtest(rawdata[rawdata$iterationNumber == 1, ], stratum = "stratum", treat = "treatmentGroup", time = "timeUnderObservation", event = "event", rho1 = 0.5, rho2 = 0)
Excludes data after treatment switching when fitting the switching model to estimate the probabilities of not switching and then switching. The inverse of these probabilities (inverse probability of treatment weights) are then used as weights in a Cox model including data after switching to estimate the adjusted hazard ratio.
msm( data, id = "id", stratum = "", tstart = "tstart", tstop = "tstop", event = "event", treat = "treat", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", numerator = "", denominator = "", strata_main_effect_only = TRUE, ns_df = 3, firth = FALSE, flic = FALSE, stabilized_weights = TRUE, trunc = 0, trunc_upper_only = TRUE, swtrt_control_only = TRUE, treat_alt_interaction = TRUE, alpha = 0.05, ties = "efron", boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )msm( data, id = "id", stratum = "", tstart = "tstart", tstop = "tstop", event = "event", treat = "treat", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", numerator = "", denominator = "", strata_main_effect_only = TRUE, ns_df = 3, firth = FALSE, flic = FALSE, stabilized_weights = TRUE, trunc = 0, trunc_upper_only = TRUE, swtrt_control_only = TRUE, treat_alt_interaction = TRUE, alpha = 0.05, ties = "efron", boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )
data |
The input data frame that contains the following variables:
|
id |
The name of the id variable in the input data. |
stratum |
The name(s) of the stratum variable(s) in the input data. |
tstart |
The name of the tstart variable in the input data. |
tstop |
The name of the tstop variable in the input data. |
event |
The name of the event variable in the input data. |
treat |
The name of the treatment variable in the input data. |
swtrt |
The name of the swtrt variable in the input data. |
swtrt_time |
The name of the swtrt_time variable in the input data. |
base_cov |
The names of baseline covariates (excluding treat) in the input data for the Cox model. |
numerator |
The names of baseline covariates (excluding treat) in the input data for the numerator switching model for stabilized weights. |
denominator |
The names of baseline (excluding treat) and time-dependent covariates in the input data for the denominator switching model. |
strata_main_effect_only |
Whether to only include the strata main
effects in the logistic regression switching model. Defaults to
|
ns_df |
Degrees of freedom for the natural cubic spline for visit-specific intercepts of the pooled logistic regression model. Defaults to 3 for two internal knots at the 33 and 67 percentiles of the treatment switching times. |
firth |
Whether the Firth's bias reducing penalized likelihood should be used. |
flic |
Whether to apply intercept correction to obtain more accurate predicted probabilities. |
stabilized_weights |
Whether to use the stabilized weights.
The default is |
trunc |
The truncation fraction of the weight distribution. Defaults to 0 for no truncation in weights. |
trunc_upper_only |
Whether to truncate the weights from the upper
end of the weight distribution only. Defaults to |
swtrt_control_only |
Whether treatment switching occurred only in
the control group. The default is |
treat_alt_interaction |
Whether to include an interaction between randomized and alternative treatments in the outcome model when both randomized arms can switch to alternative treatment. |
alpha |
The significance level to calculate confidence intervals. |
ties |
The method for handling ties in the Cox model, either "breslow" or "efron" (default). |
boot |
Whether to use bootstrap to obtain the confidence
interval for hazard ratio. Defaults to |
n_boot |
The number of bootstrap samples. |
seed |
The seed to reproduce the bootstrap results. |
nthreads |
The number of threads to use in bootstrapping (0 means the default RcppParallel behavior). |
The hazard ratio and confidence interval under a no-switching scenario are obtained as follows:
Exclude observations after treatment switch when fitting the switching model.
Define crossover indicators for the last time interval of each subject.
Fit the denominator switching model (and numerator model for stabilized weights) using a pooled logistic regression model to estimate the inverse probability of treatment weights (IPTWs).
The probability of remaining unswitched is calculated as
and multiplied
over time before treatment switch.
At the time of switching, this product is multiplied by the predicted probability of switching.
After treatment switch, the IPTW remains constant.
The inverse of the probability at the start of each interval is used as the interval weight.
Fit a weighted Cox model to the outcome survival times, including data after treatment switch, to estimate the hazard ratio.
Construct the p-value and confidence interval for the hazard ratio
using either robust sandwich variance or bootstrapping. When
bootstrapping is used, the confidence interval and p-value are
based on a t-distribution with n_boot - 1 degrees of freedom.
A list with the following components:
pvalue: The two-sided p-value.
pvalue_type: The type of two-sided p-value for treatment effect,
i.e., "Cox model" or "bootstrap".
hr: The estimated hazard ratio from the Cox model.
hr_CI: The confidence interval for hazard ratio.
hr_CI_type: The type of confidence interval for hazard ratio,
either "Cox model" or "bootstrap".
event_summary: A data frame containing the count and percentage
of deaths and switches by treatment arm.
data_switch: A list of input data for the switching models by
treatment group. The variables include id, stratum,
"tstart", "tstop", "cross", denominator,
swtrt, and swtrt_time. In addition,
stratum variables are converted to dummy variables, and
natural cubic spline basis variables are created for the visit-specific
intercepts.
fit_switch: A list of fitted switching models for the
denominator and numerator by treatment group.
data_outcome: The input data for the outcome Cox model
including the inverse probability of censoring weights.
The variables include id, stratum, "tstart",
"tstop", "event", "treated", "crossed",
"unstablized_weight", "stabilized_weight",
base_cov, and treat.
If treat_alt_interaction is TRUE,
the data set also includes the "treated_crossed" variable.
weight_summary: A data frame summarizing the weights by
treatment arm.
km_outcome: The Kaplan-Meier estimates of the survival
functions for the treatment and control groups based on the
weighted outcome data truncated at time of treatment switching.
lr_outcome: The log-rank test results for the treatment
effect based on the weighted outcome data truncated at time of
treatment switching.
fit_outcome: The fitted outcome Cox model.
fail: Whether a model fails to converge.
settings: A list containing the input parameter values.
fail_boots: The indicators for failed bootstrap samples
if boot is TRUE.
fail_boots_data: The data for failed bootstrap samples
if boot is TRUE.
hr_boots: The bootstrap hazard ratio estimates
if boot is TRUE.
Kaifeng Lu, [email protected]
James M. Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550-560.
Miguel Angel Hernan, Babette Brumback, and James M. Robins. Marginal structural modesl to estimate the causual effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561-570.
Jing Xu, Guohui Liu, and Bingxia Wang. Bias and Type I error control in correcting treatment effect for treatment switching using marginal structural models in Phase III oncology trials. Journal of Biopharmaceutical Statistics. 2022;32(6):897-914.
sim1 <- tssim( tdxo = 1, coxo = 1, allocation1 = 1, allocation2 = 1, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 0.0000855, accrualIntensity = 20/30, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500, NSim = 100, seed = 314159) fit1 <- msm( sim1[[1]], id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "trtrand", swtrt = "xo", swtrt_time = "xotime", base_cov = "bprog", numerator = "bprog", denominator = c("bprog", "L"), ns_df = 3, swtrt_control_only = TRUE, boot = FALSE) fit1sim1 <- tssim( tdxo = 1, coxo = 1, allocation1 = 1, allocation2 = 1, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 0.0000855, accrualIntensity = 20/30, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500, NSim = 100, seed = 314159) fit1 <- msm( sim1[[1]], id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "trtrand", swtrt = "xo", swtrt_time = "xotime", base_cov = "bprog", numerator = "bprog", denominator = c("bprog", "L"), ns_df = 3, swtrt_control_only = TRUE, boot = FALSE) fit1
Obtains the hazard ratio estimates from the proportional hazards regression model with right censored or counting process data.
phregr( data, stratum = "", time = "time", time2 = "", event = "event", covariates = "", weight = "", offset = "", id = "", ties = "efron", init = NA_real_, robust = FALSE, est_basehaz = TRUE, est_resid = TRUE, firth = FALSE, plci = FALSE, alpha = 0.05, maxiter = 50, eps = 1e-09 )phregr( data, stratum = "", time = "time", time2 = "", event = "event", covariates = "", weight = "", offset = "", id = "", ties = "efron", init = NA_real_, robust = FALSE, est_basehaz = TRUE, est_resid = TRUE, firth = FALSE, plci = FALSE, alpha = 0.05, maxiter = 50, eps = 1e-09 )
data |
The input data frame that contains the following variables:
|
stratum |
The name(s) of the stratum variable(s) in the input data. |
time |
The name of the time variable or the left end of each interval for counting process data in the input data. |
time2 |
The name of the right end of each interval for counting process data in the input data. |
event |
The name of the event variable in the input data. |
covariates |
The vector of names of baseline and time-dependent covariates in the input data. |
weight |
The name of the weight variable in the input data. |
offset |
The name of the offset variable in the input data. |
id |
The name of the id variable in the input data. |
ties |
The method for handling ties, either "breslow" or "efron" (default). |
init |
The vector of initial values. Defaults to zero for all variables. |
robust |
Whether a robust sandwich variance estimate should be computed. In the presence of the id variable, the score residuals will be aggregated for each id when computing the robust sandwich variance estimate. |
est_basehaz |
Whether to estimate the baseline hazards.
Defaults to |
est_resid |
Whether to estimate the martingale residuals.
Defaults to |
firth |
Whether to use Firth’s penalized likelihood method.
Defaults to |
plci |
Whether to obtain profile likelihood confidence interval. |
alpha |
The two-sided significance level. |
maxiter |
The maximum number of iterations. |
eps |
The tolerance to declare convergence. |
A list with the following components:
sumstat: The data frame of summary statistics of model fit
with the following variables:
n: The number of observations.
nevents: The number of events.
loglik0: The (penalized) log-likelihood under null.
loglik1: The maximum (penalized) log-likelihood.
scoretest: The score test statistic.
niter: The number of Newton-Raphson iterations.
ties: The method for handling ties, either "breslow" or
"efron".
p: The number of columns of the Cox model design matrix.
robust: Whether to use the robust variance estimate.
firth: Whether to use Firth's penalized likelihood method.
fail: Whether the model fails to converge.
loglik0_unpenalized: The unpenalized log-likelihood under null.
loglik1_unpenalized: The maximum unpenalized log-likelihood.
parest: The data frame of parameter estimates with the
following variables:
param: The name of the covariate for the parameter estimate.
beta: The log hazard ratio estimate.
sebeta: The standard error of log hazard ratio estimate.
z: The Wald test statistic for log hazard ratio.
expbeta: The hazard ratio estimate.
lower: The lower limit of confidence interval.
upper: The upper limit of confidence interval.
p: The p-value from the chi-square test.
method: The method to compute the confidence interval and
p-value.
sebeta_naive: The naive standard error of log hazard ratio
estimate if robust variance is requested.
basehaz: The data frame of baseline hazards with the following
variables (if est_basehaz is TRUE):
time: The observed event time.
nrisk: The number of patients at risk at the time point.
nevent: The number of events at the time point.
haz: The baseline hazard at the time point.
varhaz: The variance of the baseline hazard at the time point
assuming the parameter beta is known.
gradhaz: The gradient of the baseline hazard with respect to
beta at the time point (in the presence of covariates).
stratum: The stratum.
residuals: The martingale residuals.
linear_predictors: The vector of linear predictors.
p: The number of parameters.
param: The parameter names.
beta: The parameter estimate.
vbeta: The covariance matrix for parameter estimates.
vbeta_naive: The naive covariance matrix for parameter estimates.
terms: The terms object.
xlevels: A record of the levels of the factors used in fitting.
settings: A list containing the input parameter values.
Kaifeng Lu, [email protected]
Per K. Anderson and Richard D. Gill. Cox's regression model for counting processes, a large sample study. Annals of Statistics 1982; 10:1100-1120.
Terry M. Therneau and Patricia M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer-Verlag, 2000.
library(dplyr) # Example 1 with right-censored data (fit1 <- phregr( data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = 1*(treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat", est_basehaz = FALSE, est_resid = FALSE)) # Example 2 with counting process data and robust variance estimate (fit2 <- phregr( data = heart %>% mutate(rx = as.numeric(transplant) - 1), time = "start", time2 = "stop", event = "event", covariates = c("rx", "age"), id = "id", robust = TRUE, est_basehaz = TRUE, est_resid = TRUE))library(dplyr) # Example 1 with right-censored data (fit1 <- phregr( data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = 1*(treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat", est_basehaz = FALSE, est_resid = FALSE)) # Example 2 with counting process data and robust variance estimate (fit2 <- phregr( data = heart %>% mutate(rx = as.numeric(transplant) - 1), time = "start", time2 = "stop", event = "event", covariates = c("rx", "age"), id = "id", robust = TRUE, est_basehaz = TRUE, est_resid = TRUE))
This function prepares a counting-process style survival dataset
for analyses with time-dependent covariates. It merges baseline
and longitudinal data, fills in missing covariate values using
last-observation-carried-forward (LOCF), restricts to time points
where covariates change (optional), and constructs tstart, tstop,
and event variables suitable for use in survival models.
preptdc( adsl, adtdc, id = "SUBJID", randdt = "RANDDT", trtsdt = "TRTSDT", pddt = "PDDT", xodt = "XODT", osdt = "OSDT", died = "DIED", dcutdt = "DCUTDT", paramcd = "PARAMCD", adt = "ADT", aval = "AVAL", nodup = TRUE, offset = TRUE, nthreads = 0 )preptdc( adsl, adtdc, id = "SUBJID", randdt = "RANDDT", trtsdt = "TRTSDT", pddt = "PDDT", xodt = "XODT", osdt = "OSDT", died = "DIED", dcutdt = "DCUTDT", paramcd = "PARAMCD", adt = "ADT", aval = "AVAL", nodup = TRUE, offset = TRUE, nthreads = 0 )
adsl |
A data set containing baseline subject-level information.
It should include, at a minimum, subject ID ( |
adtdc |
A data set containing longitudinal
time-dependent covariate data, with subject ID ( |
id |
Character string specifying the column name for subject ID. |
randdt |
Character string specifying the column name for randomization date. |
trtsdt |
Character string specifying the column name for treatment start date. |
pddt |
Character string specifying the column name for progression date. |
xodt |
Character string specifying the column name for treatment crossover/switch date. |
osdt |
Character string specifying the column name for overall survival date (death date or last known alive date). |
died |
Character string specifying the column name for death indicator (0 = alive/censored, 1 = died). |
dcutdt |
Character string specifying the column name for data cut-off date. |
paramcd |
Character string specifying the column name for parameter code (identifying different covariates). |
adt |
Character string specifying the column name for analysis date in the time-dependent covariate dataset. |
aval |
Character string specifying the column name for analysis value (covariate values). |
nodup |
Logical; if |
offset |
Logical; if |
nthreads |
Integer number of threads to use for ‘data.table’ (0 means the default data.table behavior). |
The function performs the following steps:
Merge adsl and adtdc to obtain randomization date and
treatment start date.
Define adt2 as adt if adt > trtsdt,
and randdt if adt <= trtsdt (i.e., baseline time point).
This ensures that the baseline covariate value is the last
non-missing value at or before the treatment start date.
Post-baseline covariate values are anchored at their actual
analysis dates. The first record per subject corresponds to
survival time zero at randomization and ensures availability
of baseline covariates at randomization.
Keep the last record per subject, adt2, and paramcd.
Construct a complete skeleton so all covariates are present for each subject and time point.
Fill missing covariate values using LOCF.
Pivot to wide format with one row per subject and time point.
Optionally drop rows without covariate changes (nodup = TRUE).
Merge survival outcomes from adsl.
Compute time-to-event variables (ady, osdy, etc.), as well as
counting-process style variables tstart, tstop, and event.
A data set with one row per subject and time interval, including:
tstart, tstop — interval start and stop times
(days from randomization).
event — event indicator (0/1).
Covariates expanded to wide format.
Auxiliary variables such as progression indicator (pd),
treatment switch indicator (swtrt), and administrative
censoring time.
surv_data <- preptdc(adsl, adtdc, nodup = TRUE, nthreads = 1) head(surv_data)surv_data <- preptdc(adsl, adtdc, nodup = TRUE, nthreads = 1) head(surv_data)
A simulated data set with stratification and delayed treatment effect:
iterationNumberThe iteration number
arrivalTimeThe enrollment time for the subject
stratumThe stratum for the subject
treatmentGroupThe treatment group for the subject
timeUnderObservationThe time under observation since randomization
eventWhether the subject experienced the event
dropoutEventWhether the subject dropped out
rawdatarawdata
An object of class data.frame with 4910 rows and 7 columns.
Simulates datasets to evaluate the performance of various recensoring strategies under the Rank Preserving Structural Failure Time Model (RPSFTM) for handling treatment switching in survival analysis.
recensor_sim_rpsftm( nsim = 100L, n = 400L, shape = 1.5, scale = 553.9, gamma = 0.001, tfmin = 407.5, tfmax = 407.5, psi = -0.4621, omega = 0, pswitch = 0.7, a = 2, b = 4, low_psi = -5, hi_psi = 5, treat_modifier = 1, recensor_type = 1L, admin_recensor_only = TRUE, autoswitch = TRUE, alpha = 0.05, ties = "efron", tol = 1e-06, boot = TRUE, n_boot = 100L, seed = 0L )recensor_sim_rpsftm( nsim = 100L, n = 400L, shape = 1.5, scale = 553.9, gamma = 0.001, tfmin = 407.5, tfmax = 407.5, psi = -0.4621, omega = 0, pswitch = 0.7, a = 2, b = 4, low_psi = -5, hi_psi = 5, treat_modifier = 1, recensor_type = 1L, admin_recensor_only = TRUE, autoswitch = TRUE, alpha = 0.05, ties = "efron", tol = 1e-06, boot = TRUE, n_boot = 100L, seed = 0L )
nsim |
Number of simulated datasets. |
n |
Number of subjects per simulation. |
shape |
Shape parameter of the Weibull distribution for time to death. |
scale |
Scale parameter of the Weibull distribution for time to death in the control group. |
gamma |
Rate parameter of the exponential distribution for random dropouts in the control group. |
tfmin |
Minimum planned follow-up time (in days). |
tfmax |
Maximum planned follow-up time (in days). |
psi |
Log time ratio of death time for control vs experimental treatment. |
omega |
Log time ratio of dropout time for control vs experimental treatment. |
pswitch |
Probability of treatment switching at disease progression. |
a |
Shape parameter 1 of the Beta distribution for time to disease progression as a fraction of time to death. |
b |
Shape parameter 2 of the Beta distribution for time to disease progression. |
low_psi |
Lower bound for the search interval of the causal
parameter |
hi_psi |
Upper bound for the search interval of the causal
parameter |
treat_modifier |
Sensitivity parameter modifying the constant treatment effect assumption. |
recensor_type |
Type of recensoring to apply:
|
admin_recensor_only |
Logical. If |
autoswitch |
Logical. If |
alpha |
Significance level for confidence interval calculation (default is 0.05). |
ties |
Method for handling tied event times in the Cox model.
Options are |
tol |
Convergence tolerance for root-finding in estimation of
|
boot |
Logical. If |
n_boot |
Number of bootstrap samples, used only if
|
seed |
Optional. Random seed for reproducibility. |
A data frame summarizing the simulation results, including:
recensor_type, admin_recensor_only: Settings
used in the simulation.
Event rates: p_event_1, p_dropout_1,
p_admin_censor_1, p_event_0,
p_dropout_0, p_admin_censor_0.
Progression and switching: p_pd_0, p_swtrt_0,
p_recensored_0.
Causal parameter () estimates: psi,
psi_est, psi_bias,
psi_se, psi_mse.
Log hazard ratio estimates: loghr, loghr_est,
loghr_se, loghr_mse.
Hazard ratio metrics: hr, hr_est (geometric mean),
hr_pctbias (percent bias).
Standard errors of log hazard ratio: loghr_se_cox,
loghr_se_lr, loghr_se_boot.
Coverage probabilities: hr_ci_cover_cox,
hr_ci_cover_lr, hr_ci_cover_boot.
Kaifeng Lu, [email protected]
result <- recensor_sim_rpsftm( nsim = 10, n = 400, shape = 1.5, scale = exp(6.3169), gamma = 0.001, tfmin = 407.5, tfmax = 407.5, psi = log(0.5) / 1.5, omega = log(1), pswitch = 0.7, a = 2, b = 4, low_psi = -5, hi_psi = 5, treat_modifier = 1, recensor_type = 1, admin_recensor_only = TRUE, autoswitch = TRUE, alpha = 0.05, tol = 1e-6, boot = TRUE, n_boot = 10, seed = 314159)result <- recensor_sim_rpsftm( nsim = 10, n = 400, shape = 1.5, scale = exp(6.3169), gamma = 0.001, tfmin = 407.5, tfmax = 407.5, psi = log(0.5) / 1.5, omega = log(1), pswitch = 0.7, a = 2, b = 4, low_psi = -5, hi_psi = 5, treat_modifier = 1, recensor_type = 1, admin_recensor_only = TRUE, autoswitch = TRUE, alpha = 0.05, tol = 1e-6, boot = TRUE, n_boot = 10, seed = 314159)
Obtains the response, martingale, deviance, dfbeta, and likelihood displacement residuals for a parametric regression model for failure time data.
residuals_liferegr( object, type = c("response", "martingale", "deviance", "dfbeta", "dfbetas", "working", "ldcase", "ldresp", "ldshape", "matrix"), collapse = FALSE, weighted = (type %in% c("dfbeta", "dfbetas")) )residuals_liferegr( object, type = c("response", "martingale", "deviance", "dfbeta", "dfbetas", "working", "ldcase", "ldresp", "ldshape", "matrix"), collapse = FALSE, weighted = (type %in% c("dfbeta", "dfbetas")) )
object |
The output from the |
type |
The type of residuals desired, with options including
|
collapse |
Whether to collapse the residuals by |
weighted |
Whether to compute weighted residuals. |
The algorithms follow the residuals.survreg function in the
survival package, except for martingale residuals, which
are defined only for event or right-censored data for exponential,
weibull, lognormal, and loglogistic distributions.
Either a vector or a matrix of residuals, depending on the specified type:
response residuals are on the scale of the original data.
martingale residuals are event indicators minus the cumulative
hazards for event or right-censored data.
working residuals are on the scale of the linear predictor.
deviance residuals are on the log-likelihood scale.
dfbeta residuals are returned as a matrix, where the
-th row represents the approximate change in the model
coefficients resulting from the inclusion of subject .
dfbetas residuals are similar to dfbeta residuals, but
each column is scaled by the standard deviation of the
corresponding coefficient.
matrix residuals are a matrix of derivatives of the
log-likelihood function. Let be the log-likelihood, be
the linear predictor (), and be .
Then the resulting matrix contains six columns: ,
, ,
, , and
.
ldcase residulas are likelihood displacement for case weight
perturbation.
ldresp residuals are likelihood displacement for response value
perturbation.
ldshape residuals are likelihood displacement related to the
shape parameter.
Kaifeng Lu, [email protected]
Escobar, L. A. and Meeker, W. Q. Assessing influence in regression analysis with censored data. Biometrics 1992; 48:507-528.
library(dplyr) fit1 <- liferegr( data = tobin %>% mutate(time = ifelse(durable>0, durable, NA)), time = "time", time2 = "durable", covariates = c("age", "quant"), dist = "normal") resid <- residuals_liferegr(fit1, type = "response") head(resid)library(dplyr) fit1 <- liferegr( data = tobin %>% mutate(time = ifelse(durable>0, durable, NA)), time = "time", time2 = "durable", covariates = c("age", "quant"), dist = "normal") resid <- residuals_liferegr(fit1, type = "response") head(resid)
Obtains the martingale, deviance, score, or Schoenfeld residuals for a proportional hazards regression model.
residuals_phregr( object, type = c("martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas", "scaledsch"), collapse = FALSE, weighted = (type %in% c("dfbeta", "dfbetas")) )residuals_phregr( object, type = c("martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas", "scaledsch"), collapse = FALSE, weighted = (type %in% c("dfbeta", "dfbetas")) )
object |
The output from the |
type |
The type of residuals desired, with options including
|
collapse |
Whether to collapse the residuals by |
weighted |
Whether to compute weighted residuals. |
For score and Schoenfeld type residuals, the proportional hazards model
must include at least one covariate. The algorithms for deviance,
dfbeta, dfbetas, and scaledsch residuals follow
the residuals.coxph function in the survival package.
For martingale and deviance residuals, the result is a vector
with one element corresponding to each subject (without collapse).
For score residuals, the result is a matrix where each row represents
a subject and each column corresponds to a variable. The row order
aligns with the input data used in the original fit. For Schoenfeld
residuals, the result is a matrix with one row for each event and
one column per variable. These rows are sorted by time within strata,
with the attributes stratum and time included.
Score residuals represent each individual's contribution to the score
vector. Two commonly used transformations of this are dfbeta,
which represents the approximate change in the coefficient vector
if the observation is excluded, and dfbetas, which gives the
approximate change in the coefficients scaled by the standard error
of the coefficients.
Kaifeng Lu, [email protected]
Terry M. Therneau, Patricia M. Grambsch, and Thomas M. Fleming. Martingale based residuals for survival models. Biometrika 1990; 77:147-160.
Patricia M. Grambsch and Terry M. Therneau. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994; 81:515-26.
library(dplyr) # Example 1 with right-censored data fit1 <- phregr(data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = 1*(treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat") ressco <- residuals_phregr(fit1, type = "score") head(ressco) # Example 2 with counting process data fit2 <- phregr(data = heart %>% mutate(rx = as.numeric(transplant) - 1), time = "start", time2 = "stop", event = "event", covariates = c("rx", "age"), id = "id", robust = TRUE) resssch <- residuals_phregr(fit2, type = "scaledsch") head(resssch)library(dplyr) # Example 1 with right-censored data fit1 <- phregr(data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = 1*(treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat") ressco <- residuals_phregr(fit1, type = "score") head(ressco) # Example 2 with counting process data fit2 <- phregr(data = heart %>% mutate(rx = as.numeric(transplant) - 1), time = "start", time2 = "stop", event = "event", covariates = c("rx", "age"), id = "id", robust = TRUE) resssch <- residuals_phregr(fit2, type = "scaledsch") head(resssch)
Obtains the estimate of restricted mean survival time difference between two treatment groups.
rmdiff( data, stratum = "", treat = "treat", time = "time", event = "event", milestone = 0, rmstDiffH0 = 0, conflev = 0.95, biascorrection = FALSE )rmdiff( data, stratum = "", treat = "treat", time = "time", event = "event", milestone = 0, rmstDiffH0 = 0, conflev = 0.95, biascorrection = FALSE )
data |
The input data frame that contains the following variables:
|
stratum |
The name of the stratum variable in the input data. |
treat |
The name of the treatment variable in the input data. |
time |
The name of the time variable in the input data. |
event |
The name of the event variable in the input data. |
milestone |
The milestone time at which to calculate the restricted mean survival time. |
rmstDiffH0 |
The difference in restricted mean survival times under the null hypothesis. Defaults to 0 for superiority test. |
conflev |
The level of the two-sided confidence interval for the difference in restricted mean survival times. Defaults to 0.95. |
biascorrection |
Whether to apply bias correction for the variance estimate of individual restricted mean survival times. Defaults to no bias correction. |
A data frame with the following variables:
milestone: The milestone time relative to randomization.
rmstDiffH0: The difference in restricted mean survival times
under the null hypothesis.
rmst1: The estimated restricted mean survival time for
the treatment group.
rmst2: The estimated restricted mean survival time for
the control group.
rmstDiff: The estimated difference in restricted mean
survival times.
vrmst1: The variance for rmst1.
vrmst2: The variance for rmst2.
sermstDiff: The standard error for rmstDiff.
rmstDiffZ: The Z-statistic value.
rmstDiffPValue: The two-sided p-value.
lower: The lower bound of confidence interval.
upper: The upper bound of confidence interval.
conflev: The level of confidence interval.
biascorrection: Whether to apply bias correction for the
variance estimate of individual restricted mean survival times.
Kaifeng Lu, [email protected]
rmdiff(data = rawdata[rawdata$iterationNumber == 1, ], stratum = "stratum", treat = "treatmentGroup", time = "timeUnderObservation", event = "event", milestone = 12)rmdiff(data = rawdata[rawdata$iterationNumber == 1, ], stratum = "stratum", treat = "treatmentGroup", time = "timeUnderObservation", event = "event", milestone = 12)
Obtains the estimate of restricted means survival time for each stratum.
rmest( data, stratum = "", time = "time", event = "event", milestone = 0, conflev = 0.95, biascorrection = FALSE )rmest( data, stratum = "", time = "time", event = "event", milestone = 0, conflev = 0.95, biascorrection = FALSE )
data |
The input data frame that contains the following variables:
|
stratum |
The name of the stratum variable in the input data. |
time |
The name of the time variable in the input data. |
event |
The name of the event variable in the input data. |
milestone |
The milestone time at which to calculate the restricted mean survival time. |
conflev |
The level of the two-sided confidence interval for the survival probabilities. Defaults to 0.95. |
biascorrection |
Whether to apply bias correction for the variance estimate. Defaults to no bias correction. |
A data frame with the following variables:
stratum: The stratum variable.
size: The number of subjects in the stratum.
milestone: The milestone time relative to randomization.
rmst: The estimate of restricted mean survival time.
stderr: The standard error of the estimated rmst.
lower: The lower bound of confidence interval if requested.
upper: The upper bound of confidence interval if requested.
conflev: The level of confidence interval if requested.
biascorrection: Whether to apply bias correction for the
variance estimate.
Kaifeng Lu, [email protected]
rmest(data = aml, stratum = "x", time = "time", event = "status", milestone = 24)rmest(data = aml, stratum = "x", time = "time", event = "status", milestone = 24)
Estimates the causal treatment effect parameter using g-estimation based on the log-rank test, Cox model, or parametric survival/accelerated failure time (AFT) model. The method uses counterfactual untreated survival times to estimate the causal parameter and derives the adjusted hazard ratio from the Cox model using counterfactual unswitched survival times.
rpsftm( data, id = "id", stratum = "", time = "time", event = "event", treat = "treat", rx = "rx", censor_time = "censor_time", base_cov = "", psi_test = "logrank", aft_dist = "weibull", strata_main_effect_only = TRUE, low_psi = -2, hi_psi = 2, n_eval_z = 101, treat_modifier = 1, recensor = TRUE, admin_recensor_only = TRUE, autoswitch = TRUE, gridsearch = TRUE, root_finding = "brent", alpha = 0.05, ties = "efron", tol = 1e-06, boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )rpsftm( data, id = "id", stratum = "", time = "time", event = "event", treat = "treat", rx = "rx", censor_time = "censor_time", base_cov = "", psi_test = "logrank", aft_dist = "weibull", strata_main_effect_only = TRUE, low_psi = -2, hi_psi = 2, n_eval_z = 101, treat_modifier = 1, recensor = TRUE, admin_recensor_only = TRUE, autoswitch = TRUE, gridsearch = TRUE, root_finding = "brent", alpha = 0.05, ties = "efron", tol = 1e-06, boot = FALSE, n_boot = 1000, seed = 0, nthreads = 0 )
data |
The input data frame that contains the following variables:
|
id |
The name of the id variable in the input data. |
stratum |
The name(s) of the stratum variable(s) in the input data. |
time |
The name of the time variable in the input data. |
event |
The name of the event variable in the input data. |
treat |
The name of the treatment variable in the input data. |
rx |
The name of the rx variable in the input data. |
censor_time |
The name of the censor_time variable in the input data. |
base_cov |
The names of baseline covariates (excluding
treat) in the input data for the outcome Cox model.
These covariates will also be used in the Cox model for estimating
|
psi_test |
The survival function to calculate the Z-statistic, e.g., "logrank" (default), "phreg", or "lifereg". |
aft_dist |
The assumed distribution for time to event for the AFT
model when |
strata_main_effect_only |
Whether to only include the strata main
effects in the AFT model. Defaults to |
low_psi |
The lower limit of the causal parameter. |
hi_psi |
The upper limit of the causal parameter. |
n_eval_z |
The number of points between |
treat_modifier |
The optional sensitivity parameter for the constant treatment effect assumption. |
recensor |
Whether to apply recensoring to counterfactual
survival times. Defaults to |
admin_recensor_only |
Whether to apply recensoring to administrative
censoring times only. Defaults to |
autoswitch |
Whether to exclude recensoring for treatment arms
with no switching. Defaults to |
gridsearch |
Whether to use grid search to estimate the causal
parameter |
root_finding |
Character string specifying the univariate
root-finding algorithm to use. Options are |
alpha |
The significance level to calculate confidence intervals. |
ties |
The method for handling ties in the Cox model, either "breslow" or "efron" (default). |
tol |
The desired accuracy (convergence tolerance) for |
boot |
Whether to use bootstrap to obtain the confidence
interval for hazard ratio. Defaults to |
n_boot |
The number of bootstrap samples. |
seed |
The seed to reproduce the bootstrap results. |
nthreads |
The number of threads to use in bootstrapping (0 means the default RcppParallel behavior). |
Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:
Estimate the causal parameter using g-estimation based on
the log-rank test (default), Cox model, or parametric survival/AFT
model, using counterfactual untreated survival times for
both arms:
Compute counterfactual survival times for control patients using
the estimated .
Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.
Obtain the confidence interval for the hazard ratio using either
the ITT log-rank test p-value or bootstrap. When bootstrapping,
the interval and p-value are derived from a t-distribution
with n_boot - 1 degrees of freedom.
If grid search is used to estimate , the estimated
is the one with the smallest absolute value among those at which
the Z-statistic is zero based on linear interpolation.
If root finding is used, the estimated is
the solution to the equation where the Z-statistic is zero.
A list with the following components:
psi: The estimated causal parameter.
psi_roots: Vector of psi values at which the Z-statistic
is zero, identified using grid search and linear interpolation.
psi_CI: The confidence interval for psi.
psi_CI_type: The type of confidence interval for psi,
i.e., "grid search", "root finding", or "bootstrap".
pvalue: The two-sided p-value.
pvalue_type: The type of two-sided p-value for treatment effect,
i.e., "log-rank" or "bootstrap".
hr: The estimated hazard ratio from the Cox model.
hr_CI: The confidence interval for hazard ratio.
hr_CI_type: The type of confidence interval for hazard ratio,
either "log-rank p-value" or "bootstrap".
event_summary: A data frame containing the count and percentage
of deaths and switches by treatment arm.
eval_z: A data frame containing the Z-statistics for treatment
effect evaluated at a sequence of psi values. Used to plot and
check if the range of psi values to search for the solution and
limits of confidence interval of psi need be modified.
Sstar: A data frame containing the counterfactual untreated
survival times and event indicators for each treatment group.
The variables include id, stratum,
"t_star", "d_star", "treated", base_cov,
and treat.
kmstar: A data frame containing the Kaplan-Meier estimates
based on the counterfactual untreated survival times by treatment arm.
data_outcome: The input data for the outcome Cox model of
counterfactual unswitched survival times.
The variables include id, stratum, "t_star",
"d_star", "treated", base_cov, and treat.
km_outcome: The Kaplan-Meier estimates of the survival
functions for the treatment and control groups based on the
counterfactual unswitched survival times.
lr_outcome: The log-rank test results for the treatment
effect based on the counterfactual unswitched survival times.
fit_outcome: The fitted outcome Cox model.
fail: Whether a model fails to converge.
psimissing: Whether the psi parameter cannot be estimated.
settings: A list containing the input parameter values.
fail_boots: The indicators for failed bootstrap samples
if boot is TRUE.
fail_boots_data: The data for failed bootstrap samples
if boot is TRUE.
hr_boots: The bootstrap hazard ratio estimates
if boot is TRUE.
psi_boots: The bootstrap psi estimates
if boot is TRUE.
Kaifeng Lu, [email protected]
James M. Robins and Anastasios A. Tsiatis. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics. 1991;20(8):2609-2631.
Ian R. White, Adbel G. Babiker, Sarah Walker, and Janet H. Darbyshire. Randomization-based methods for correcting for treatment changes: Examples from the CONCORDE trial. Statistics in Medicine. 1999;18(19):2617-2634.
library(dplyr) # Example 1: one-way treatment switching (control to active) data <- immdef %>% mutate(rx = 1-xoyrs/progyrs) fit1 <- rpsftm( data, id = "id", time = "progyrs", event = "prog", treat = "imm", rx = "rx", censor_time = "censyrs", boot = FALSE) fit1 # Example 2: two-way treatment switching (illustration only) # the eventual survival time shilong1 <- shilong %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(-c("ps", "ttc", "tran")) shilong2 <- shilong1 %>% mutate(rx = ifelse(co, ifelse(bras.f == "MTA", dco/ady, 1 - dco/ady), ifelse(bras.f == "MTA", 1, 0))) fit2 <- rpsftm( shilong2, id = "id", time = "tstop", event = "event", treat = "bras.f", rx = "rx", censor_time = "dcut", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), low_psi = -3, hi_psi = 3, boot = FALSE) fit2library(dplyr) # Example 1: one-way treatment switching (control to active) data <- immdef %>% mutate(rx = 1-xoyrs/progyrs) fit1 <- rpsftm( data, id = "id", time = "progyrs", event = "prog", treat = "imm", rx = "rx", censor_time = "censyrs", boot = FALSE) fit1 # Example 2: two-way treatment switching (illustration only) # the eventual survival time shilong1 <- shilong %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(-c("ps", "ttc", "tran")) shilong2 <- shilong1 %>% mutate(rx = ifelse(co, ifelse(bras.f == "MTA", dco/ady, 1 - dco/ady), ifelse(bras.f == "MTA", 1, 0))) fit2 <- rpsftm( shilong2, id = "id", time = "tstop", event = "event", treat = "bras.f", rx = "rx", censor_time = "dcut", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), low_psi = -3, hi_psi = 3, boot = FALSE) fit2
This data set deals with urinary tract infection in sexually active college women, along with covariate information on age an contraceptive use. The variables are all binary and coded in 1 (condition is present) and 0 (condition is absent).
sexaggsexagg
An object of class data.frame with 36 rows and 9 columns.
caseurinary tract infection, the study outcome variable
age>= 24 years
diause of diaphragm
ocuse of oral contraceptive
vicuse of condom
vicluse of lubricated condom
visuse of spermicide
The original SHIdat data set contains an anonymized excerpt of data from the SHIVA01 trial. This was the first randomized clinical trial that aimed at comparing molecularly targeted therapy based on tumor profiling (MTA) versus conventional therapy (CT) for advanced cancer. Patients were randomly assigned to receive the active or control treatment and may switch to the other arm or subsequent anti-cancer therapy upon disease progression. The restructured data is in the long format.
idThe patient's identifier
tstartThe start of the time interval
tstopThe end of the time interval
eventWhether the patient died at the end of the interval
agerandThe patient's age (in years) at randomization
sex.fThe patients' gender, either Male or Female
tt_LnumThe number of previous lines of treatment
rmh_alea.cThe Royal Marsden Hospital score segregated into two categories
pathway.fThe molecular pathway altered (the hormone receptors pathway, the PI3K/ AKT/mTOR pathway, and the RAF/MEK pathway)
bras.fThe patient's randomized arm, either MTA or CT
psThe ECOG performance status
ttcThe presence of concomitant treatments
tranThe use of platelet transfusions
dpdThe relative day of a potential progression
dcoThe relative day of treatment switching
adyThe relative day of the latest news
dcutThe relative day of administrative cutoff
pdWhether the patient had disease progression
coWhether the patient switched treatment
shilongshilong
An object of class data.frame with 602 rows and 19 columns.
The data analyzed are the 16 selected cases in Lipsitz et al. (1994). The binary response is the wheezing status of 16 children at ages 9, 10, 11, and 12 years. A value of 1 of wheezing status indicates the occurrence of wheezing. The explanatory variables city of residence, age, and maternal smoking status at the particular age.
sixsix
An object of class tbl_df (inherits from tbl, data.frame) with 64 rows and 6 columns.
casecase id
citycity of residence
ageage of the child
smokematernal smoking status
wheezewheezing status
Obtains the predicted survivor function for a proportional hazards regression model.
survfit_phregr( object, newdata, sefit = TRUE, conftype = "log-log", conflev = 0.95 )survfit_phregr( object, newdata, sefit = TRUE, conftype = "log-log", conflev = 0.95 )
object |
The output from the |
newdata |
A data frame with the same variable names as those that
appear in the |
sefit |
Whether to compute the standard error of the survival estimates. |
conftype |
The type of the confidence interval. One of |
conflev |
The level of the two-sided confidence interval for the survival probabilities. Defaults to 0.95. |
If newdata is not provided and there is no covariate, survival
curves based on the basehaz data frame will be produced.
A data frame with the following variables:
id: The id of the subject for counting-process data with
time-dependent covariates.
time: The observed times in the data used to fit
phregr.
nrisk: The number of patients at risk at the time point in the
data used to fit phregr.
nevent: The number of patients having event at the time point
in the data used to fit phregr.
cumhaz: The cumulative hazard at the time point.
surv: The estimated survival probability at the time point.
sesurv: The standard error of the estimated survival probability.
lower: The lower confidence limit for survival probability.
upper: The upper confidence limit for survival probability.
conflev: The level of the two-sided confidence interval.
conftype: The type of the confidence interval.
covariates: The values of covariates based on newdata.
stratum: The stratum of the subject.
Kaifeng Lu, [email protected]
Terry M. Therneau and Patricia M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer-Verlag, 2000.
library(dplyr) # Example 1 with right-censored data fit1 <- phregr(data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = 1*(treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat") surv1 <- survfit_phregr(fit1, newdata = data.frame( stratum = as.integer(c(1,1,2,2)), treat = c(1,0,1,0))) head(surv1) # Example 2 with counting process data and robust variance estimate fit2 <- phregr(data = heart %>% mutate(rx = as.numeric(transplant) - 1), time = "start", time2 = "stop", event = "event", covariates = c("rx", "age"), id = "id", robust = TRUE) surv2 <- survfit_phregr(fit2, newdata = data.frame( id = c(4,4,11,11), age = c(-7.737,-7.737,-0.019,-0.019), start = c(0,36,0,26), stop = c(36,39,26,153), rx = c(0,1,0,1))) head(surv2)library(dplyr) # Example 1 with right-censored data fit1 <- phregr(data = rawdata %>% filter(iterationNumber == 1) %>% mutate(treat = 1*(treatmentGroup == 1)), stratum = "stratum", time = "timeUnderObservation", event = "event", covariates = "treat") surv1 <- survfit_phregr(fit1, newdata = data.frame( stratum = as.integer(c(1,1,2,2)), treat = c(1,0,1,0))) head(surv1) # Example 2 with counting process data and robust variance estimate fit2 <- phregr(data = heart %>% mutate(rx = as.numeric(transplant) - 1), time = "start", time2 = "stop", event = "event", covariates = c("rx", "age"), id = "id", robust = TRUE) surv2 <- survfit_phregr(fit2, newdata = data.frame( id = c(4,4,11,11), age = c(-7.737,-7.737,-0.019,-0.019), start = c(0,36,0,26), stop = c(36,39,26,153), rx = c(0,1,0,1))) head(surv2)
Obtains the Brookmeyer-Crowley confidence interval for quantiles of right-censored time-to-event data.
survQuantile( time, event, cilevel = 0.95, transform = "loglog", probs = as.numeric(c(0.25, 0.5, 0.75)) )survQuantile( time, event, cilevel = 0.95, transform = "loglog", probs = as.numeric(c(0.25, 0.5, 0.75)) )
time |
The vector of possibly right-censored survival times. |
event |
The vector of event indicators. |
cilevel |
The confidence interval level. Defaults to 0.95. |
transform |
The transformation of the survival function to use to construct the confidence interval. Options include "linear" (alternatively "plain"), "log", "loglog" (alternatively "log-log" or "cloglog"), "asinsqrt" (alternatively "asin" or "arcsin"), and "logit". Defaults to "loglog". |
probs |
The vector of probabilities to calculate the quantiles. Defaults to c(0.25, 0.5, 0.75). |
A data frame containing the estimated quantile and confidence interval corresponding to each specified probability. It includes the following variables:
prob: The probability to calculate the quantile.
quantile: The estimated quantile.
lower: The lower limit of the confidence interval.
upper: The upper limit of the confidence interval.
cilevel: The confidence interval level.
transform: The transformation of the survival function to use
to construct the confidence interval.
Kaifeng Lu, [email protected]
survQuantile( time = c(33.7, 3.9, 10.5, 5.4, 19.5, 23.8, 7.9, 16.9, 16.6, 33.7, 17.1, 7.9, 10.5, 38), event = c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1), probs = c(0.25, 0.5, 0.75))survQuantile( time = c(33.7, 3.9, 10.5, 5.4, 19.5, 23.8, 7.9, 16.9, 16.6, 33.7, 17.1, 7.9, 10.5, 38), event = c(0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1), probs = c(0.25, 0.5, 0.75))
Data from Tobin's original paper.
durableDurable goods purchase
ageAge in years
quantLiquidity ratio (x 1000)
tobintobin
An object of class data.frame with 20 rows and 3 columns.
Estimates the causal parameter using g-estimation by fitting a pooled logistic regression switching model that includes counterfactual unswitched survival times and time-dependent confounders as covariates. The adjusted hazard ratio is then obtained from the Cox model using counterfactual unswitched survival times based on the estimated causal parameter.
tsegest( data, id = "id", stratum = "", tstart = "tstart", tstop = "tstop", event = "event", treat = "treat", censor_time = "censor_time", pd = "pd", pd_time = "pd_time", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", conf_cov = "", strata_main_effect_only = TRUE, ns_df = 3, firth = FALSE, flic = FALSE, low_psi = -2, hi_psi = 2, n_eval_z = 101, recensor = TRUE, admin_recensor_only = TRUE, swtrt_control_only = TRUE, gridsearch = TRUE, root_finding = "brent", alpha = 0.05, ties = "efron", tol = 1e-06, offset = 1, boot = TRUE, n_boot = 1000, seed = 0, nthreads = 0 )tsegest( data, id = "id", stratum = "", tstart = "tstart", tstop = "tstop", event = "event", treat = "treat", censor_time = "censor_time", pd = "pd", pd_time = "pd_time", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", conf_cov = "", strata_main_effect_only = TRUE, ns_df = 3, firth = FALSE, flic = FALSE, low_psi = -2, hi_psi = 2, n_eval_z = 101, recensor = TRUE, admin_recensor_only = TRUE, swtrt_control_only = TRUE, gridsearch = TRUE, root_finding = "brent", alpha = 0.05, ties = "efron", tol = 1e-06, offset = 1, boot = TRUE, n_boot = 1000, seed = 0, nthreads = 0 )
data |
The input data frame that contains the following variables:
|
id |
The name of the id variable in the input data. |
stratum |
The name(s) of the stratum variable(s) in the input data. |
tstart |
The name of the tstart variable in the input data. |
tstop |
The name of the tstop variable in the input data. |
event |
The name of the event variable in the input data. |
treat |
The name of the treatment variable in the input data. |
censor_time |
The name of the censor_time variable in the input data. |
pd |
The name of the pd variable in the input data. |
pd_time |
The name of the pd_time variable in the input data. |
swtrt |
The name of the swtrt variable in the input data. |
swtrt_time |
The name of the swtrt_time variable in the input data. |
base_cov |
The names of baseline covariates (excluding treat) in the input data for the Cox model. |
conf_cov |
The names of confounding variables (excluding treat) in the input data for the logistic regression switching model. |
strata_main_effect_only |
Whether to only include the strata main
effects in the logistic regression switching model. Defaults to
|
ns_df |
Degrees of freedom for the natural cubic spline for visit-specific intercepts of the pooled logistic regression model. Defaults to 3 for two internal knots at the 33 and 67 percentiles of the treatment switching times. |
firth |
Whether the Firth's bias reducing penalized likelihood should be used. |
flic |
Whether to apply intercept correction to obtain more accurate predicted probabilities. |
low_psi |
The lower limit of the causal parameter. |
hi_psi |
The upper limit of the causal parameter. |
n_eval_z |
The number of points between |
recensor |
Whether to apply recensoring to counterfactual
survival times. Defaults to |
admin_recensor_only |
Whether to apply recensoring to administrative
censoring times only. Defaults to |
swtrt_control_only |
Whether treatment switching occurred only in
the control group. The default is |
gridsearch |
Whether to use grid search to estimate the causal
parameter |
root_finding |
Character string specifying the univariate
root-finding algorithm to use. Options are |
alpha |
The significance level to calculate confidence intervals. |
ties |
The method for handling ties in the Cox model, either "breslow" or "efron" (default). |
tol |
The desired accuracy (convergence tolerance) for |
offset |
The offset to calculate the time from disease progression
to death or censoring, the time from disease progression to treatment
switch, and the time from treatment switch to death or censoring.
We can set |
boot |
Whether to use bootstrap to obtain the confidence
interval for hazard ratio. Defaults to |
n_boot |
The number of bootstrap samples. |
seed |
The seed to reproduce the bootstrap results. |
nthreads |
The number of threads to use in bootstrapping (0 means the default RcppParallel behavior). |
Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:
Fit a pooled logistic regression switching model among control-arm patients who experienced disease progression:
where is the switch indicator for subject at
observation ,
is the counterfactual
survival time given a specific , and
represents the time-dependent confounders.
Natural cubic splines of time can be included to model time-varying
baseline hazards. is defined relative to the
secondary baseline at disease progression and represents
post-progression counterfactual survival, where and
correspond to time spent after progression on control
and experimental treatments, respectively.
Martingale residuals may be used in place of counterfactual survival
times to account for censoring.
Identify the value of for which the Z-statistic of
is approximately zero. This value is the causal
parameter estimate.
Compute counterfactual survival times for control patients using
the estimated .
Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.
When bootstrapping is used, derive the confidence interval and
p-value for the hazard ratio from a t-distribution with
n_boot - 1 degrees of freedom.
If treatment switching occurs before or in the absence of recorded disease progression, the patient is considered to have progressed at the time of treatment switching.
If grid search is used to estimate , the estimated
is the one with the smallest absolute value among those at which
the Z-statistic is zero based on linear interpolation.
If root finding is used, the estimated is
the solution to the equation where the Z-statistic is zero.
A list with the following components:
psi: The estimated causal parameter for the control group.
psi_roots: Vector of psi values for the control group
at which the Z-statistic is zero, identified using grid search and
linear interpolation.
psi_CI: The confidence interval for psi.
psi_CI_type: The type of confidence interval for psi,
i.e., "grid search", "root finding", or "bootstrap".
logrank_pvalue: The two-sided p-value of the log-rank test
for the ITT analysis.
cox_pvalue: The two-sided p-value for treatment effect based on
the Cox model applied to counterfactual unswitched survival times.
If boot is TRUE, this value represents the
bootstrap p-value.
hr: The estimated hazard ratio from the Cox model.
hr_CI: The confidence interval for hazard ratio.
hr_CI_type: The type of confidence interval for hazard ratio,
either "Cox model" or "bootstrap".
event_summary: A data frame containing the count and percentage
of deaths, disease progressions, and switches by treatment arm.
data_switch: The list of input data for the time from
disease progression to switch by treatment group. The variables
include id, stratum, "swtrt",
and "swtrt_time". If swtrt == 0, then swtrt_time
is censored at the time from disease progression to death or censoring.
km_switch: The list of Kaplan-Meier plot data for the
time from disease progression to switch by treatment group.
eval_z: The list of data by treatment group containing
the Wald statistics for the coefficient of the counterfactual
in the logistic regression switching model, evaluated at
a sequence of psi values. Used to plot and check
if the range of psi values to search for the solution
and limits of confidence interval of psi need be modified.
data_nullcox: The list of input data for counterfactual
survival times for the null Cox model by treatment group.
The variables include id, stratum,
"t_star" and "d_star".
fit_nullcox: The list of fitted null Cox models for
counterfactual survival times by treatment group, which contains
the martingale residuals.
data_logis: The list of input data for pooled logistic
regression models for treatment switching using g-estimation.
The variables include id, stratum,
"tstart", "tstop", "cross",
"counterfactual", conf_cov, ns,
pd_time, swtrt, and swtrt_time.
fit_logis: The list of fitted pooled logistic regression
models for treatment switching using g-estimation.
data_outcome: The input data for the outcome Cox model
of counterfactual unswitched survival times.
The variables include id, stratum, "t_star",
"d_star", "treated", base_cov and treat.
km_outcome: The Kaplan-Meier estimates of the survival
functions for the treatment and control groups based on the
counterfactual unswitched survival times.
lr_outcome: The log-rank test results for the treatment
effect based on the counterfactual unswitched survival times.
fit_outcome: The fitted outcome Cox model.
fail: Whether a model fails to converge.
psimissing: Whether the psi parameter cannot be estimated.
settings: A list containing the input parameter values.
psi_trt: The estimated causal parameter for the experimental
group if swtrt_control_only is FALSE.
psi_trt_roots: Vector of psi_trt values for the
experimental group at which the Z-statistic is zero, identified using
grid search and linear interpolation, if swtrt_control_only
is FALSE.
psi_trt_CI: The confidence interval for psi_trt if
swtrt_control_only is FALSE.
fail_boots: The indicators for failed bootstrap samples
if boot is TRUE.
fail_boots_data: The data for failed bootstrap samples
if boot is TRUE.
hr_boots: The bootstrap hazard ratio estimates
if boot is TRUE.
psi_boots: The bootstrap psi estimates
if boot is TRUE.
psi_trt_boots: The bootstrap psi_trt estimates
if boot is TRUE and swtrt_control_only is
FALSE.
Kaifeng Lu, [email protected]
NR Latimer, IR White, K Tilling, and U Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.
library(dplyr) sim1 <- tsegestsim( n = 500, allocation1 = 2, allocation2 = 1, pbprog = 0.5, trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, scale1 = 360, shape2 = 1.7, scale2 = 688, pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, milestone = 546, seed = 2000) data1 <- sim1$paneldata %>% mutate(visit7on = ifelse(progressed, tstop > timePFSobs + 105, 0)) fit1 <- tsegest( data = data1, id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "trtrand", censor_time = "censor_time", pd = "progressed", pd_time = "timePFSobs", swtrt = "xo", swtrt_time = "xotime", base_cov = "bprog", conf_cov = c("bprog*cattdc", "timePFSobs", "visit7on"), ns_df = 3, low_psi = -1, hi_psi = 1, n_eval_z = 101, recensor = TRUE, admin_recensor_only = TRUE, swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", tol = 1.0e-6, offset = 0, boot = FALSE) fit1library(dplyr) sim1 <- tsegestsim( n = 500, allocation1 = 2, allocation2 = 1, pbprog = 0.5, trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, scale1 = 360, shape2 = 1.7, scale2 = 688, pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, milestone = 546, seed = 2000) data1 <- sim1$paneldata %>% mutate(visit7on = ifelse(progressed, tstop > timePFSobs + 105, 0)) fit1 <- tsegest( data = data1, id = "id", tstart = "tstart", tstop = "tstop", event = "event", treat = "trtrand", censor_time = "censor_time", pd = "progressed", pd_time = "timePFSobs", swtrt = "xo", swtrt_time = "xotime", base_cov = "bprog", conf_cov = c("bprog*cattdc", "timePFSobs", "visit7on"), ns_df = 3, low_psi = -1, hi_psi = 1, n_eval_z = 101, recensor = TRUE, admin_recensor_only = TRUE, swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", tol = 1.0e-6, offset = 0, boot = FALSE) fit1
Obtains the simulated data for baseline prognosis, disease progression, treatment switching, death, and time-dependent covariates.
tsegestsim( n = 500L, allocation1 = 2L, allocation2 = 1L, pbprog = 0.5, trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, scale1 = 360, shape2 = 1.7, scale2 = 688, pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, milestone = 546, seed = 0L )tsegestsim( n = 500L, allocation1 = 2L, allocation2 = 1L, pbprog = 0.5, trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, scale1 = 360, shape2 = 1.7, scale2 = 688, pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, milestone = 546, seed = 0L )
n |
The total sample size for two treatment arms combined. |
allocation1 |
The number of subjects in the active treatment group in a randomization block. |
allocation2 |
The number of subjects in the control group in a randomization block. |
pbprog |
The probability of having poor prognosis at baseline. |
trtlghr |
The treatment effect in terms of log hazard ratio. |
bprogsl |
The poor prognosis effect in terms of log hazard ratio. |
shape1 |
The shape parameter for the Weibull event distribution for the first component. |
scale1 |
The scale parameter for the Weibull event distribution for the first component. |
shape2 |
The shape parameter for the Weibull event distribution for the second component. |
scale2 |
The scale parameter for the Weibull event distribution for the second component. |
pmix |
The mixing probability of the first component Weibull distribution. |
admin |
The administrative censoring time. |
pcatnotrtbprog |
The probability of developing metastatic disease on control treatment with poor baseline prognosis. |
pcattrtbprog |
The probability of developing metastatic disease on active treatment with poor baseline prognosis. |
pcatnotrt |
The probability of developing metastatic disease on control treatment with good baseline prognosis. |
pcattrt |
The probability of developing metastatic disease on active treatment with good baseline prognosis. |
catmult |
The impact of metastatic disease on shortening remaining survival time. |
tdxo |
Whether treatment crossover depends on time-dependent covariates between disease progression and treatment switching. |
ppoor |
The probability of switching for poor baseline prognosis with no metastatic disease. |
pgood |
The probability of switching for good baseline prognosis with no metastatic disease. |
ppoormet |
The probability of switching for poor baseline prognosis after developing metastatic disease. |
pgoodmet |
The probability of switching for good baseline prognosis after developing metastatic disease. |
xomult |
The direct effect of crossover on extending remaining survival time. |
milestone |
The milestone to calculate restricted mean survival time. |
seed |
The seed to reproduce the simulation results. |
A list with two data frames.
sumdata: A summary data frame with the following variables:
simtrueconstmean: The true control group restricted mean
survival time (RMST).
simtrueconstlb: The lower bound for control group RMST.
simtrueconstub: The upper bound for control group RMST.
simtrueconstse: The standard error for control group RMST.
simtrueexpstmean: The true experimental group restricted
mean survival time (RMST).
simtrueexpstlb: The lower bound for experimental group RMST.
simtrueexpstub: The upper bound for experimental group RMST.
simtrueexpstse: The standard error for experimental group
RMST.
simtrue_coxwbprog_hr: The treatment hazard ratio from the
Cox model adjusting for baseline prognosis.
simtrue_cox_hr: The treatment hazard ratio from the Cox
model without adjusting for baseline prognosis.
simtrue_aftwbprog_af: The average acceleration factor from
the Weibull AFT model adjusting for baseline prognosis.
simtrue_aft_af: The average acceleration factor from
the Weibull AFT model without adjusting for baseline prognosis.
paneldata: A counting process style subject-level data frame
with the following variables:
id: The subject ID.
trtrand: The randomized treatment arm.
bprog: Whether the patient had poor baseline prognosis.
tstart: The left end of time interval.
tstop: The right end of time interval.
event: Whether the patient died at the end of the interval.
timeOS: The observed survival time.
died: Whether the patient died during the study.
progressed: Whether the patient had disease progression.
timePFSobs: The observed time of disease progression at
regular scheduled visits.
progtdc: The time-dependent covariate for progression.
catevent: Whether the patient developed metastatic disease.
cattime: When the patient developed metastatic disease.
cattdc: The time-dependent covariate for cat event.
xo: Whether the patient switched treatment.
xotime: When the patient switched treatment.
xotdc: The time-dependent covariate for treatment
switching.
censor_time: The administrative censoring time.
Kaifeng Lu, [email protected]
NR Latimer, IR White, K Tilling, and U Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.
sim1 <- tsegestsim( n = 500, allocation1 = 2, allocation2 = 1, pbprog = 0.5, trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, scale1 = 360, shape2 = 1.7, scale2 = 688, pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, milestone = 546, seed = 2000)sim1 <- tsegestsim( n = 500, allocation1 = 2, allocation2 = 1, pbprog = 0.5, trtlghr = -0.5, bprogsl = 0.3, shape1 = 1.8, scale1 = 360, shape2 = 1.7, scale2 = 688, pmix = 0.5, admin = 5000, pcatnotrtbprog = 0.5, pcattrtbprog = 0.25, pcatnotrt = 0.2, pcattrt = 0.1, catmult = 0.5, tdxo = 1, ppoor = 0.1, pgood = 0.04, ppoormet = 0.4, pgoodmet = 0.2, xomult = 1.4188308, milestone = 546, seed = 2000)
Estimates the causal parameter by fitting an accelerated failure time (AFT) model comparing post-progression survival between switchers and non-switchers, and derives the adjusted hazard ratio from the Cox model using counterfactual unswitched survival times based on the estimated causal parameter.
tsesimp( data, id = "id", stratum = "", time = "time", event = "event", treat = "treat", censor_time = "censor_time", pd = "pd", pd_time = "pd_time", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", base2_cov = "", aft_dist = "weibull", strata_main_effect_only = TRUE, recensor = TRUE, admin_recensor_only = TRUE, swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", offset = 1, boot = TRUE, n_boot = 1000, seed = 0, nthreads = 0 )tsesimp( data, id = "id", stratum = "", time = "time", event = "event", treat = "treat", censor_time = "censor_time", pd = "pd", pd_time = "pd_time", swtrt = "swtrt", swtrt_time = "swtrt_time", base_cov = "", base2_cov = "", aft_dist = "weibull", strata_main_effect_only = TRUE, recensor = TRUE, admin_recensor_only = TRUE, swtrt_control_only = TRUE, alpha = 0.05, ties = "efron", offset = 1, boot = TRUE, n_boot = 1000, seed = 0, nthreads = 0 )
data |
The input data frame that contains the following variables:
|
id |
The name of the id variable in the input data. |
stratum |
The name(s) of the stratum variable(s) in the input data. |
time |
The name of the time variable in the input data. |
event |
The name of the event variable in the input data. |
treat |
The name of the treatment variable in the input data. |
censor_time |
The name of the censor_time variable in the input data. |
pd |
The name of the pd variable in the input data. |
pd_time |
The name of the pd_time variable in the input data. |
swtrt |
The name of the swtrt variable in the input data. |
swtrt_time |
The name of the swtrt_time variable in the input data. |
base_cov |
The names of baseline covariates (excluding treat) in the input data for the outcome Cox model. |
base2_cov |
The names of baseline and secondary baseline covariates (excluding treat) in the input data for the AFT model for post-progression survival. |
aft_dist |
The assumed distribution for time to event for the AFT model. Options include "exponential", "weibull" (default), "loglogistic", and "lognormal". |
strata_main_effect_only |
Whether to only include the strata main
effects in the AFT model. Defaults to |
recensor |
Whether to apply recensoring to counterfactual
survival times. Defaults to |
admin_recensor_only |
Whether to apply recensoring to administrative
censoring times only. Defaults to |
swtrt_control_only |
Whether treatment switching occurred only in
the control group. The default is |
alpha |
The significance level to calculate confidence intervals. |
ties |
The method for handling ties in the Cox model, either "breslow" or "efron" (default). |
offset |
The offset to calculate the time disease progression to
death or censoring. We can set |
boot |
Whether to use bootstrap to obtain the confidence
interval for hazard ratio. Defaults to |
n_boot |
The number of bootstrap samples. |
seed |
The seed to reproduce the bootstrap results. |
nthreads |
The number of threads to use in bootstrapping (0 means the default RcppParallel behavior). |
Assuming one-way switching from control to treatment, the hazard ratio and confidence interval under a no-switching scenario are obtained as follows:
Estimate the causal parameter by fitting an AFT model
comparing post-progression survival between switchers and non-switchers
in the control group who experienced disease progression.
Compute counterfactual survival times for control patients using
the estimated .
Fit a Cox model to the observed survival times for the treatment group and the counterfactual survival times for the control group to estimate the hazard ratio.
When bootstrapping is used, derive the confidence interval and
p-value for the hazard ratio from a t-distribution with
n_boot - 1 degrees of freedom.
If treatment switching occurs before or in the absence of recorded disease progression, the patient is considered to have progressed at the time of treatment switching.
A list with the following components:
psi: The estimated causal parameter for the control group.
psi_CI: The confidence interval for psi.
psi_CI_type: The type of confidence interval for psi,
i.e., "AFT model" or "bootstrap".
pvalue: The two-sided p-value.
pvalue_type: The type of two-sided p-value for treatment effect,
i.e., "Cox model" or "bootstrap".
hr: The estimated hazard ratio from the Cox model.
hr_CI: The confidence interval for hazard ratio.
hr_CI_type: The type of confidence interval for hazard ratio,
either "Cox model" or "bootstrap".
event_summary: A data frame containing the count and percentage
of deaths, disease progressions, and switches by treatment arm.
data_aft: A list of input data for the AFT model by treatment
group. The variables include id, stratum, "pps",
"event", "swtrt", base2_cov, pd_time,
swtrt_time, and time.
fit_aft: A list of fitted AFT models by treatment group.
res_aft: A list of deviance residuals from the AFT models
by treatment group.
data_outcome: The input data for the outcome Cox model
of counterfactual unswitched survival times.
The variables include id, stratum, "t_star",
"d_star", "treated", base_cov, and treat.
km_outcome: The Kaplan-Meier estimates of the survival
functions for the treatment and control groups based on the
counterfactual unswitched survival times.
lr_outcome: The log-rank test results for the treatment
effect based on the counterfactual unswitched survival times.
fit_outcome: The fitted outcome Cox model.
fail: Whether a model fails to converge.
psimissing: Whether the psi parameter cannot be estimated.
settings: A list containing the input parameter values.
psi_trt: The estimated causal parameter for the experimental
group if swtrt_control_only is FALSE.
psi_trt_CI: The confidence interval for psi_trt
if swtrt_control_only is FALSE.
fail_boots: The indicators for failed bootstrap samples
if boot is TRUE.
fail_boots_data: The data for failed bootstrap samples
if boot is TRUE.
hr_boots: The bootstrap hazard ratio estimates
if boot is TRUE.
psi_boots: The bootstrap psi estimates
if boot is TRUE.
psi_trt_boots: The bootstrap psi_trt estimates
if boot is TRUE and swtrt_control_only is
FALSE.
Kaifeng Lu, [email protected]
Nicholas R Latimer, KR Abrams, PC Lambert, MK Crowther, AJ Wailoo, JP Morden, RL Akehurst, and MJ Campbell. Adjusting for treatment switching in randomised controlled trials - A simulation study and a simplified two-stage method. Statistical Methods in Medical Research. 2017;26(2):724-751.
library(dplyr) # modify pd and dpd based on co and dco shilong <- shilong %>% mutate(dpd = ifelse(co & !pd, dco, dpd), pd = ifelse(co & !pd, 1, pd)) %>% mutate(dpd = ifelse(pd & co & dco < dpd, dco, dpd)) # the eventual survival time shilong1 <- shilong %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(-c("ps", "ttc", "tran")) # the last value of time-dependent covariates before pd shilong2 <- shilong %>% filter(pd == 0 | tstart <= dpd) %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(bras.f, id, ps, ttc, tran) # combine baseline and time-dependent covariates shilong3 <- shilong1 %>% left_join(shilong2, by = c("bras.f", "id")) # apply the two-stage method fit1 <- tsesimp( data = shilong3, id = "id", time = "tstop", event = "event", treat = "bras.f", censor_time = "dcut", pd = "pd", pd_time = "dpd", swtrt = "co", swtrt_time = "dco", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f", "ps", "ttc", "tran"), aft_dist = "weibull", alpha = 0.05, recensor = TRUE, swtrt_control_only = FALSE, offset = 1, boot = FALSE) fit1library(dplyr) # modify pd and dpd based on co and dco shilong <- shilong %>% mutate(dpd = ifelse(co & !pd, dco, dpd), pd = ifelse(co & !pd, 1, pd)) %>% mutate(dpd = ifelse(pd & co & dco < dpd, dco, dpd)) # the eventual survival time shilong1 <- shilong %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(-c("ps", "ttc", "tran")) # the last value of time-dependent covariates before pd shilong2 <- shilong %>% filter(pd == 0 | tstart <= dpd) %>% arrange(bras.f, id, tstop) %>% group_by(bras.f, id) %>% slice(n()) %>% select(bras.f, id, ps, ttc, tran) # combine baseline and time-dependent covariates shilong3 <- shilong1 %>% left_join(shilong2, by = c("bras.f", "id")) # apply the two-stage method fit1 <- tsesimp( data = shilong3, id = "id", time = "tstop", event = "event", treat = "bras.f", censor_time = "dcut", pd = "pd", pd_time = "dpd", swtrt = "co", swtrt_time = "dco", base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f"), base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c", "pathway.f", "ps", "ttc", "tran"), aft_dist = "weibull", alpha = 0.05, recensor = TRUE, swtrt_control_only = FALSE, offset = 1, boot = FALSE) fit1
Simulates data for studies involving treatment switching, incorporating time-dependent confounding. The generated data can be used to evaluate methods for handling treatment switching in survival analysis.
tssim( tdxo = FALSE, coxo = TRUE, allocation1 = 1L, allocation2 = 1L, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 8.55e-05, accrualTime = 0L, accrualIntensity = NA_real_, followupTime = NA_real_, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500L, NSim = 1000L, seed = 0L )tssim( tdxo = FALSE, coxo = TRUE, allocation1 = 1L, allocation2 = 1L, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 8.55e-05, accrualTime = 0L, accrualIntensity = NA_real_, followupTime = NA_real_, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500L, NSim = 1000L, seed = 0L )
tdxo |
Logical indicator for timing of treatment switching:
|
coxo |
Logical indicator for arm-specific treatment switching:
|
allocation1 |
Number of subjects in the active treatment group in a randomization block. Defaults to 1 for equal randomization. |
allocation2 |
Number of subjects in the control group in a randomization block. Defaults to 1 for equal randomization. |
p_X_1 |
Probability of poor baseline prognosis in the experimental arm. |
p_X_0 |
Probability of poor baseline prognosis in the control arm. |
rate_T |
Baseline hazard rate for time to death. |
beta1 |
Log hazard ratio for randomized treatment ( |
beta2 |
Log hazard ratio for baseline covariate ( |
gamma0 |
Intercept for the time-dependent covariate model
( |
gamma1 |
Coefficient for lagged treatment switching ( |
gamma2 |
Coefficient for lagged |
gamma3 |
Coefficient for baseline covariate ( |
gamma4 |
Coefficient for randomized treatment ( |
zeta0 |
Intercept for the disease progression model ( |
zeta1 |
Coefficient for |
zeta2 |
Coefficient for baseline covariate ( |
zeta3 |
Coefficient for randomized treatment ( |
alpha0 |
Intercept for the treatment switching model ( |
alpha1 |
Coefficient for |
alpha2 |
Coefficient for baseline covariate ( |
theta1_1 |
Negative log time ratio for |
theta1_0 |
Negative log time ratio for |
theta2 |
Negative log time ratio for |
rate_C |
Hazard rate for random (dropout) censoring. |
accrualTime |
A vector that specifies the starting time of
piecewise Poisson enrollment time intervals. Must start with 0,
e.g., |
accrualIntensity |
A vector of accrual intensities. One for each accrual time interval. |
followupTime |
Follow-up time for a fixed follow-up design. |
fixedFollowup |
Whether a fixed follow-up design is used. Defaults to 0 for variable follow-up. |
plannedTime |
The calendar time for the analysis. |
days |
Number of days in each treatment cycle. |
n |
Number of subjects per simulation. |
NSim |
Number of simulated datasets. |
seed |
Random seed for reproducibility. |
The simulation algorithm is adapted from Xu et al. (2022) and simulates
disease progression status while incorporating the multiplicative
effects of both baseline and time-dependent covariates on survival time.
The design options tdxo and coxo specify
the timing of treatment switching and the study arm eligibility
for switching, respectively. Time is measured in days.
In a fixed follow-up design, all subjects share the same follow-up duration. In contrast, under a variable follow-up design, follow-up time also depends on each subject's enrollment date. The number of treatment cycles for a subject is determined by dividing the follow-up time by the number of days in each cycle.
At randomization, subjects are assigned to treatment arms
using block randomization, with allocation1 patients
assigned to active treatment and allocation2 to control
within each randomized block. A baseline covariate is
also generated for each subject:
The initial survival time is drawn from an exponential distribution with hazard:
We define the event indicator at cycle as
The initial states are set to
, , ,
. For each treatment cycle ,
we set .
Generate time-dependent covariates:
If , set and
, which completes data generation
for subject .
If , set ,
, and perform the following before proceeding to
the next cycle for the subject.
Generate disease progression status:
If ,
Otherwise, set .
Generate alternative therapy status:
If , determine switching eligibility
based on design options.
If switching is allowed:
If switching is now allowed, set .
If , set (once switched to
alternative therapy, remain on alternative therapy).
Update survival time based on changes in alternative therapy status and time-dependent covariates:
Additional random censoring times are generated from an exponential
distribution with hazard rate .
An extra record is generated when the minimum of the latent survival time, the random censoring time, and the administrative censoring time is greater than the number of regular treatment cycles times days per cycle.
Finally we apply the lag function so that and
represent the PD status and alternative therapy status
at the start of cycle (and thus remain appplicable for the
entire cycle ) for subject .
To estimate the true treatment effect in a Cox marginal
structural model, one can set , which
effectively forces (disabling treatment switching).
The coefficient for the randomized treatment can then be estimated
using a Cox proportional hazards model.
A list of data frames, each containing simulated longitudinal covariate, pd status, alternative therapy status, and event history data with the following variables:
id: Subject identifier.
arrival_time: The enrollment time for the subject.
trtrand: Randomized treatment assignment (0 = control,
1 = experimental)
bprog: Baseline prognosis (0 = good, 1 = poor).
tpoint: Treatment cycle index.
tstart: Start day of the treatment cycle.
tstop: End day of the treatment cycle.
L: Time-dependent covariate at tstart predicting
survival and switching; affected by treatment switching.
Llag: Lagged value of L.
Z: Disease progression status at tstart.
A: Treatment switching status at tstart.
Alag: Lagged value of A.
event: Death indicator at tstop.
timeOS: Observed time to death or censoring.
died: Indicator of death by end of follow-up.
progressed: Indicator of disease progression by end of
follow-up.
timePD: Observed time to progression or censoring.
xo: Indicator for whether treatment switching occurred.
xotime: Time of treatment switching (if applicable).
censor_time: Administrative censoring time.
Kaifeng Lu, [email protected]
Jessica G. Young, and Eric J. Tchetgen Tchetgen. Simulation from a known Cox MSM using standard parametric models for the g-formula. Statistics in Medicine. 2014;33(6):1001-1014.
NR Latimer, IR White, K Tilling, and U Siebert. Improved two-stage estimation to adjust for treatment switching in randomised trials: g-estimation to address time-dependent confounding. Statistical Methods in Medical Research. 2020;29(10):2900-2918.
Jing Xu, Guohui Liu, and Bingxia Wang. Bias and type I error control in correcting treatment effect for treatment switching using marginal structural models in Phse III oncology trials. Journal of Biopharmaceutical Statistics. 2022;32(6):897-914.
library(dplyr) simulated.data <- tssim( tdxo = 1, coxo = 1, allocation1 = 1, allocation2 = 1, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 0.0000855, accrualIntensity = 20/30, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500, NSim = 100, seed = 314159) simulated.data[[1]] %>% filter(id == 1)library(dplyr) simulated.data <- tssim( tdxo = 1, coxo = 1, allocation1 = 1, allocation2 = 1, p_X_1 = 0.3, p_X_0 = 0.3, rate_T = 0.002, beta1 = -0.5, beta2 = 0.3, gamma0 = 0.3, gamma1 = -0.9, gamma2 = 0.7, gamma3 = 1.1, gamma4 = -0.8, zeta0 = -3.5, zeta1 = 0.5, zeta2 = 0.2, zeta3 = -0.4, alpha0 = 0.5, alpha1 = 0.5, alpha2 = 0.4, theta1_1 = -0.4, theta1_0 = -0.4, theta2 = 0.2, rate_C = 0.0000855, accrualIntensity = 20/30, fixedFollowup = FALSE, plannedTime = 1350, days = 30, n = 500, NSim = 100, seed = 314159) simulated.data[[1]] %>% filter(id == 1)
Obtains the scaled Schoenfeld residuals and tests the proportional hazards assumption using a score test for the interaction between each covariate and a transformed time variable.
zph_phregr(object, transform = "km")zph_phregr(object, transform = "km")
object |
The output from the |
transform |
A character string indicating how survival times should be transformed before the test is performed. Supported values include "identity", "log", "rank", and "km" (default). |
This corresponds to the cox.zph function from the survival
package with terms = FALSE and global = TRUE.
A list with the following components:
table A matrix with one row for each parameter and a final
row for the global test. The columns contain the score test
for adding the time-dependent term, the degrees of freedom,
and the two-sided p-value.
x The transformed time values.
time The original (untransformed) event times, with tied event
times repeated.
strata The stratum index for each event.
y The matrix of scaled Schoenfeld residuals, with one column
for each parameter and one row for each event. Column names correspond
to the parameter names.
var An approximate covariance matrix of the scaled Schoenfeld
residuals, used to construct an approximate standard error band for
plots.
transform the transformation applied to the time values.
Kaifeng Lu, [email protected]
Patricia M. Grambsch and Terry M. Therneau. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994; 81:515-26.
fit <- phregr(data = liver, time = "Time", event = "Status", covariates = c("log(Bilirubin)", "log(Protime)", "log(Albumin)", "Age", "Edema"), ties = "breslow") zph <- zph_phregr(fit, transform = "log") zph$tablefit <- phregr(data = liver, time = "Time", event = "Status", covariates = c("log(Bilirubin)", "log(Protime)", "log(Albumin)", "Age", "Edema"), ties = "breslow") zph <- zph_phregr(fit, transform = "log") zph$table