How to fit a proportional hazards model in R?

Fit Proportional Hazards Regression Model

Description

Fits a Cox proportional hazards regression model. Time dependent variables, time dependent strata, multiple events per subject, and other extensions are incorporated using the counting process formulation of Andersen and Gill.

Examples

# coxph, Surv, strata and the example data sets come from the survival package

library(survival)

# Create the simplest test data set

test1 <- list(time=c(4,3,1,1,2,2,3),

status=c(1,1,1,0,1,1,0),

x=c(0,2,1,1,1,0,0),

sex=c(0,0,0,0,1,1,1))

# Fit a stratified model

coxph(Surv(time, status) ~ x + strata(sex), test1)

# Create a simple data set for a time-dependent model

test2 <- list(start=c(1,2,5,2,1,7,3,4,8,8),

stop=c(2,3,6,7,8,9,9,9,14,17),

event=c(1,1,1,1,1,1,1,0,0,0),

x=c(1,0,0,1,0,1,1,1,0,0))

summary(coxph(Surv(start, stop, event) ~ x, test2))

# Fit a stratified model, clustered on patients

bladder1 <- bladder[bladder$enum < 5, ]

coxph(Surv(stop, event) ~ (rx + size + number) * strata(enum),

cluster = id, bladder1)

# Fit a time transform model using current age

coxph(Surv(time, status) ~ ph.ecog + tt(age), data=lung,

tt=function(x,t,...) pspline(x + t/365.25))

Usage

coxph(formula, data=, weights, subset,

na.action, init, control,

ties=c("efron","breslow","exact"),

singular.ok=TRUE, robust,

model=FALSE, x=FALSE, y=TRUE, tt, method=ties,

id, cluster, istate, statedata, nocenter=c(-1, 0, 1), ...)

Arguments

formula

a formula object, with the response on the left of a ~ operator, and the terms on the right. The response must be a survival object as returned by the Surv function.

data

a data.frame in which to interpret the variables named in the formula, or in the subset and the weights arguments.

weights

vector of case weights, see the note below. For a thorough discussion of these see the book by Therneau and Grambsch.

subset

expression indicating which subset of the rows of data should be used in the fit. All observations are included by default.

na.action

a missing-data filter function. This is applied to the model.frame after any subset argument has been used. Default is options()$na.action.

init

vector of initial values of the iteration. Default initial value is zero for all variables.

control

Object of class coxph.control specifying iteration limit and other control options. Default is coxph.control(...).
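
As an illustration of how init and control interact, the sketch below starts the iteration at a chosen value and then repeats the fit with the iteration limit set to zero. It assumes the lung data set shipped with the survival package; the starting value 0.02 and the object names are arbitrary choices made for this example.

```r
library(survival)

# Start the iteration at a chosen value instead of the default of zero.
fit_init <- coxph(Surv(time, status) ~ age, data = lung, init = 0.02)

# With iter.max = 0 no iteration steps are taken, so the returned
# coefficient is the initial value itself.
fit_fixed <- coxph(Surv(time, status) ~ age, data = lung, init = 0.02,
                   control = coxph.control(iter.max = 0))

coef(fit_init)   # converged estimate
coef(fit_fixed)  # stays at the supplied starting value
```

Fixing the coefficient this way can be handy for evaluating the partial likelihood at a hypothesized value rather than at the estimate.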

ties

a character string specifying the method for tie handling. If there are no tied death times all the methods are equivalent. Nearly all Cox regression programs use the Breslow method by default, but not this one. The Efron approximation is used as the default here; it is more accurate when dealing with tied death times and is as efficient computationally. The “exact partial likelihood” is equivalent to a conditional logistic model, and is appropriate when the times are a small set of discrete values. See further below.
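
The statement that all methods agree when there are no tied death times can be checked with a small sketch. The data frame untied below is a hypothetical toy data set invented for this illustration, not one of the package's data sets.

```r
library(survival)

# Hypothetical toy data with no tied event times.
untied <- data.frame(time   = 1:8,
                     status = c(1, 1, 0, 1, 1, 1, 0, 1),
                     x      = c(1, 0, 1, 1, 0, 0, 1, 0))

fit_efron   <- coxph(Surv(time, status) ~ x, data = untied, ties = "efron")
fit_breslow <- coxph(Surv(time, status) ~ x, data = untied, ties = "breslow")
fit_exact   <- coxph(Surv(time, status) ~ x, data = untied, ties = "exact")

# Without ties the three estimates should agree up to numerical tolerance.
c(efron   = coef(fit_efron),
  breslow = coef(fit_breslow),
  exact   = coef(fit_exact))
```

On data with tied death times the three calls would generally give different estimates, which is where the choice of method matters.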

singular.ok

logical value indicating how to handle collinearity in the model matrix. If TRUE, the program will automatically skip over columns of the X matrix that are linear combinations of earlier columns. In this case the coefficients for such columns will be NA, and the variance matrix will contain zeros. For ancillary calculations, such as the linear predictor, the missing coefficients are treated as zeros.
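
A minimal sketch of the default behavior, assuming the lung data set from the survival package: the column age2 is a made-up exact multiple of age, added only to force collinearity.

```r
library(survival)

# age2 is an exact multiple of age, so the two columns are collinear.
lung2 <- transform(lung, age2 = 2 * age)

fit <- coxph(Surv(time, status) ~ age + age2, data = lung2)

# The later, redundant column is skipped and its coefficient is NA.
coef(fit)
```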

robust

should a robust variance be computed. The default is TRUE if: there is a cluster argument, there are case weights that are not 0 or 1, or there are id values with more than one event.

id

optional variable name that identifies subjects. Only necessary when a subject can have multiple rows in the data, and there is more than one event type. This variable will normally be found in data.

cluster

optional variable which clusters the observations, for the purposes of a robust variance. If present, it implies robust. This variable will normally be found in data.
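
To see what the cluster argument changes, the following sketch fits a stratified model to the bladder data and compares the two variance estimates stored in the result. The model here is a simplified variant chosen for this illustration (main effects only).

```r
library(survival)

bladder1 <- bladder[bladder$enum < 5, ]

fit <- coxph(Surv(stop, event) ~ rx + size + number + strata(enum),
             cluster = id, data = bladder1)

# With a cluster term the robust (sandwich) variance is stored in fit$var,
# and the model-based variance is kept alongside it in fit$naive.var.
sqrt(diag(fit$var))        # robust standard errors
sqrt(diag(fit$naive.var))  # model-based standard errors
```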

istate

optional variable giving the current state at the start of each interval. This variable will normally be found in data.

statedata

optional data set used to describe multistate models.

model

logical value: if TRUE, the model frame is returned in component model.

x

logical value: if TRUE, the x matrix is returned in component x.

y

logical value: if TRUE, the response vector is returned in component y.

tt

optional list of time-transform functions.

method

alternate name for the ties argument.

nocenter

columns of the X matrix whose values lie strictly within this set are not recentered. Remember that a factor variable becomes a set of 0/1 columns.

...

Other arguments will be passed to coxph.control.

Details

The proportional hazards model is usually expressed in terms of a single survival time value for each person, with possible censoring. Andersen and Gill reformulated the same problem as a counting process; as time marches onward we observe the events for a subject, rather like watching a Geiger counter. The data for a subject is presented as multiple rows or "observations", each of which applies to an interval of observation (start, stop].
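
The equivalence of the two data layouts can be sketched as follows. The example reuses the values of test1 and splits one subject's follow-up by hand; the object names and the split point of 2 are choices made for this illustration.

```r
library(survival)

# One row per subject: the usual (time, status) layout.
single <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                     status = c(1, 1, 1, 0, 1, 1, 0),
                     x      = c(0, 2, 1, 1, 1, 0, 0))

# The same data in counting process form, every subject starting at 0.
counting <- data.frame(start = 0, stop = single$time,
                       event = single$status, x = single$x)

# Split the first subject's interval (0, 4] into (0, 2] and (2, 4].
# The event belongs to the last interval only.
first <- counting[1, ]
counting <- rbind(transform(first, stop = 2, event = 0),
                  transform(first, start = 2),
                  counting[-1, ])

fit_single   <- coxph(Surv(time, status) ~ x, data = single)
fit_counting <- coxph(Surv(start, stop, event) ~ x, data = counting)

# Both layouts describe the same risk sets, so the estimates should match.
coef(fit_single)
coef(fit_counting)
```

Splitting a row changes nothing by itself; it becomes useful once a covariate takes different values in the different intervals.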

The routine internally scales and centers data to avoid overflow in the argument to the exponential function. These actions do not change the result, but lead to more numerical stability. Any column of the X matrix whose values all lie within the nocenter list is not recentered. The practical consequence of the default is to not recenter dummy variables corresponding to factors. However, arguments to offset are not scaled, since there are situations where a large offset value is purposefully used. In general, however, users should avoid very large numeric values for an offset due to possible loss of precision in the estimates.