\documentclass{article}[11pt]
\usepackage{Sweave}
\usepackage{amsmath}
\addtolength{\textwidth}{1in}
\addtolength{\oddsidemargin}{-.5in}
\setlength{\evensidemargin}{\oddsidemargin}
%\VignetteIndexEntry{Multi-state models and competing risks}

\SweaveOpts{keep.source=TRUE, fig=FALSE}
% Ross Ihaka suggestions
\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em}
\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
\fvset{listparameters={\setlength{\topsep}{0pt}}}
\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}

% I had been putting figures in the figures/ directory, but the standard
%  R build script does not copy it and then R CMD check fails
\SweaveOpts{prefix.string=compete,width=6,height=4}
\newcommand{\myfig}[1]{\includegraphics[height=!, width=\textwidth]
                        {compete-#1.pdf}}
\setkeys{Gin}{width=\textwidth}
<<echo=FALSE>>=
options(continue="  ", width=60)
options(SweaveHooks=list(fig=function() par(mar=c(4.1, 4.1, .3, 1.1))))
pdf.options(pointsize=10) #text in graph about the same as regular text
options(contrasts=c("contr.treatment", "contr.poly")) #ensure default

require("survival")
@

\title{Multi-state models and competing risks}
\author{Terry Therneau \and Cynthia Crowson \and Elizabeth Atkinson}
\newcommand{\code}[1]{\texttt{#1}}

\begin{document}
\maketitle

\section{Multi-state models}
\begin{figure}

<<sfig1, fig=TRUE, echo=FALSE>>=
# A note to readers of this code: drawing multi-state figures in this
#  way via polygon and arrows statements is a major PITA.  Don't mimic
#  the code below, instead do yourself a favor and use a package
#  designed for the task such as diagram, DiagrammeR, shape or Gmisc.
# Survival is a recommended package that is included by lots of others so
#  I try to limit dependencies in the survival vignettes.
#
par(mar=c(.1, .1, .1, .1))
frame()
oldpar <- par(usr=c(0,100,0,100))
# first figure
xx <- c(0, 10, 10, 0)
yy <- c(0, 0, 10, 10)
polygon(xx +10, yy+70)
polygon(xx +30, yy+70)
arrows( 22, 75, 28, 75, length=.1)
text(c(15, 35), c(75,75), c("Alive", "Dead"))

# second figure
polygon(xx +60, yy+70)  
for (j in c(55, 70, 85)) {
    polygon(xx +80, yy+j)
    arrows(72, (5*75 +j+5)/6, 78, (100+j*5)/6, length=.1)
}
text(c(65, 85,85,85), c(70,55,70,85)+5, c("A", "D3", "D2", "D1")) 

# third figure
polygon(xx+10, yy+25)
for (j in c(15,35)) {
    polygon(xx +30, yy+j)
    arrows(22, (5*30 +j+4)/6, 28, (54+j*5)/6, length=.1)
}
arrows(28, 2+(65 + 35*5)/6, 22, 2+ (160 + 40)/6, length=.1)
arrows(35, 33, 35, 27, length=.1)
text(c(15, 35,35), c(30, 20, 40), c("Health", "Death", "Illness"))

# fourth
for (i in c(50, 68)) polygon(xx+i, yy+25)
arrows(62, 30, 67, 30, length=.1)
arrows(80, 30, 84, 30, length=.1)
text(90, 30, "...", cex=2)
text(c(55, 73), c(30, 30), c("0", "1"))
par(oldpar)
@
  \caption{Four multi-state models.  The upper left panel depicts simple survival,
 the upper right is an example of competing risks, 
 the lower left panel is a multi-state
 illness-death model, and the lower right panel depicts sequential events.}
 \label{mfig1}
\end{figure}

A multi-state model is used to model a process where subjects transition from one state to the next. For instance, a standard survival curve can be thought of as a simple multi-state model with two states (alive and dead) and one transition between those two states.  A diagram illustrating this process is shown in the top left corner of figure \ref{mfig1}. In these types of diagrams, each box is a state and each arrow is a possible transition. The top right diagram depicts a classic competing risk analysis, where
all subjects start on the left and each subject can make a single transition
to one of 3 terminal states.
The bottom left diagram shows a common multi-state situation known as
the illness-death model with recovery.
Finally, the lower right diagram represents sequential events, such as repeated
infections in the CGD study. In that case one subject had 8
events so there are 9 states corresponding to entry into the study 
(0 infections) and the first, second, \ldots, eighth events.

As will be shown below, there are often multiple choices for the
state and transition diagram, and for some data sets it is revealing to
look at a problem from multiple views. In addition to deciding the diagram that best matches the research questions, the two other primary decisions are the choice of time scale for
the fits, e.g., time from entry in the study vs. time from entry in
the state, and what covariates will be used.

\section{Multi-state curves}
\subsection{Aalen-Johansen estimate}
As a starting point for the analysis, it is important to compute and plot 
estimates of $p(t)$, which is a vector containing the probability of being
in each of the states at time $t$.
If there is no censoring then $p$ becomes a simple tabulation at time
$t$ of all the states.
For the general case, we compute this using the Aalen-Johansen estimate via the 
\code{survfit} function.

Mathematically the estimate is simple.
For each unique time where an event occurs, form the
transition matrix $T(t)$ with elements or rates of 
 $\lambda_{ij}(t) =$ the fraction of subjects who transition from
state $i$ to $j$ at time $t$, among those in state $i$ just prior to 
time $t$.
($T$ is equal to the identity matrix at any time point without an
observed transition.) 
Then
\begin{equation}
  p(t) = p(0) \prod_{s \le t} T(s)  \label{AJ}
\end{equation}
where $p(0)$ is the initial distribution of subjects.

Let's work this out for the simple two-state alive $\rightarrow$ {death} model.
Let $n(t)$ be the number of subjects still at risk at time $t$ and
$d(t)$ the number of deaths at that time point.
All subjects start in the alive state and thus $p(0) = (1,0)$ and
the transition matrix is 
\begin{equation*}
  T(s) = \left( \begin{array}{cc} 
    \frac{n(s)- d(s)}{n(s)}  & \frac{n(s)}{d(s)} \\\\ 0 & 1 \end{array}
  \right)
\end{equation*}
The second row corresponds to the fact that death is an absorbing state.

Writing out the matrices for the first few transitions and multiplying
them leads to
\begin{equation}
  p_1(t) = \prod_{s \le t} \left[n(s) - d(s)\right] /n(s) \label{km}
\end{equation}
which we recognize as the Kaplan-Meier estimate of survival.
For the two state alive-dead model the Aalen-Johansen (AJ) estimate
has reprised the KM.
In the competing risks case $p(t)$ 
has an alternate form known as the \emph{cumulative incidence} (CI) function
\begin{equation}
  CI_k(t) = \int_0^t \lambda_k(u) S(u-) du \label{cuminc}
\end{equation}
where $\lambda_k$ is the incidence function for outcome $k$, and $S$ is the
overall survival curve for ``time to any endpoint''.  
(The label ``cumulative incidence''  is one of the more unfortunate ones in 
the survival lexicon,
since we normally use `incidence' and `hazard' as interchangeable synonyms but 
the CI is \emph{not} a cumulative hazard.)
Repeating the same matrix exercise for the competing risks, i.e. writing
out the Aalen-Johansen computation, exactly recovers the CI formula.
The CI is also a special case of the Aalen-Johansen.
The AJ estimate is very flexible; subjects can visit multiple states
during the course of a study, subjects can start after time 0 (delayed
entry), and they can start in any of the states. 
The \code{survfit} function implements the AJ estimate and 
will handle all these cases.

The standard error of the estimates is computed using an
infinitesimal jackknife.
Let $D(t)$ be a matrix with one row per subject and one column per state.
Each row contains the \emph{change} in $p(t)$ corresponding to subject $i$,
i.e., the derivative of $p$ with respect to the $i$th subject's case
weight $dp(t)/dw_i$. Then $V(t) = D'WD$ is the estimated variance-covariance
matrix of the estimates at time $t$, where $W$ is a diagonal matrix of
observation weights.
If a single subject is represented by multiple rows in the data set, then
$D$ is first collapsed to have one row per subject, the new row for subject
$i$ is the sum of the rows for the observations that represented the 
subject.  This is essentially the same algorithm as the robust variance for
a Cox model.
For simple two state alive -> dead model, the IJ estimate of variance 
is identical to the traditional Greenwood estimate for the variance of the
surivival curve $S$. 
(This was a surprise when we first observed it; proving the equivalence
was not straightforward.)


The $p(t)$ vector obeys the obvious constraint that its sum at 
any time
is equal to one; each person has to be somewhere.
I originally chose to label this as the \emph{current prevalence} estimate, 
since it estimates what fraction of the subjects are in any
given state across time.  
However the word ``prevalence'' is certain to generate confusion whenever
death is one of the states, due to its traditional use as the fraction of
living subjects who have a particular condition.
We will use the phrase \emph{probability in state} or simply $p$
from this point forward.

In the simple two state model Pr(alive) is the 
usual KM survival estimate, and we have 
$p_1(t) = 1- p2(t)$, Pr(alive) = 1 - Pr(dead).  
Plots for the 2 state case sometimes choose to show Pr(alive)
and sometimes Pr(dead). Which one is used often depends on a historical
whim of the disease specialty; 
cardiology journals for instance quite often use Pr(event) resulting in curves
that rise starting from zero, 
while oncology journals invariably use Pr(alive) giving curves that fall 
downhill from 1.
The \code{survfit} routine's historical default for the 2 state case is to 
print and plot Pr(alive)= $p_1(t)$, which reflects that the
author of the routine was working primarily in cancer trials at the time 
said default was chosen.

For simple survival we have gotten used to the idea of using Pr(dead) and
Pr(alive) interchangeably, but that habit needs to be left behind for
multi-state models, as curves of $1-p_k(t)$ 
= probability(any other state than $k$) are not useful.
In the multi-state case, individual curves can go both up and down.
For competing risks the curve for the initial state (leftmost
in the diagram) is rarely included in the final plot. Since the curves sum to
1, the full set is redundant.  Pr(nothing yet) is usually the least 
interesting of the set and so it is left off to make the plot less busy.
The remaining curves in the competing risks case rise from 0.
(This bothers some researchers as it 'just looks wrong' to them.)

\subsection{Examples}

\begin{figure}
<<crfig2, echo=FALSE, fig=TRUE>>=
par(mar=c(.1, .1, .1, .1))
frame()
oldpar <- par(usr=c(0,100,0,100))
# first figure
xx <- c(0, 10, 10, 0)
yy <- c(0, 0, 10, 10)
polygon(xx +10, yy+70) 
temp <- c(60, 80) 
for (j in 1:2) {
    polygon(xx + 30, yy+ temp[j])
    arrows(22, 70 + 3*j, 28, temp[j] +5, length=.1)
}
text(c(15, 35, 35), c(75, 65, 85),c("Entry", "Death", "PCM")) 
text(25, 55, "Competing Risk")

# Second figure
polygon(xx +60, yy+70)  
for (j in 1:2) {
    polygon(xx + 80, yy+ temp[j])
    arrows(72, 70+ 3*j, 78, temp[j] +5, length=.1)
}
text(50+ c(15, 35, 35), c(75, 65, 85),c("Entry", "Death", "PCM")) 
arrows(85, 78, 85, 72, length=.1)
text(75, 55, "Multi-state 1")

# third figure
polygon(xx+10, yy+25)
temp <- c(15, 35) 
for (j in 1:2) {
    polygon(2*xx +30, yy + temp[j])
    arrows(22, 25 + 3*j, 28, temp[j] +5, length=.1)
}
text(c(15, 40, 40), c(30, 20, 40),c("Entry", "Death w/o PCM", "PCM")) 
polygon(2*xx + 60, yy+temp[2])
arrows(52, 40, 58, 40, length=.1)
text(70, 40, "Death after PCM")
text(40, 10, "Multi-state 2")
@ 
\caption{Three models for the MGUS data.}
\label{mfig2}
\end{figure} 

Start with a simple competing risks problem as illustrated in the first diagram of figure \ref{mfig2}.
The \code{mgus2} data set contains the time to plasma cell malignancy (PCM)
and/or death
for 1384 subjects diagnosed with monoclonal gammopathy of undetermined
significance (MGUS).  Survival and progression time are in months. 
The code below creates an ordinary Kaplan-Meier curve of
post-diagnosis survival for these subjects, along with a histogram of
age at diagnosis.
The mean age at diagnosis is just over 70 years.

<<mgus1, fig=TRUE>>=
oldpar <- par(mfrow=c(1,2))
hist(mgus2$age, nclass=30, main='', xlab="Age")
with(mgus2, tapply(age, sex, mean))

mfit1 <- survfit(Surv(futime, death) ~ sex, data=mgus2)
mfit1
plot(mfit1, col=c(1,2), xscale=12, mark.time=FALSE, lwd=2,
     xlab="Years post diagnosis", ylab="Survival")
legend("topright", c("female", "male"), col=1:2, lwd=2, bty='n')
par(oldpar)
@ 

The xscale and yscale arguments to \code{plot.survfit} affect only the
axis labels, not the data.  Further additions to the plot region
such as \code{legend}, \code{lines}, or \code{text} remain in the 
original scale.  This simplifies programmatic additions such as adding
another curve to the plot, while making interactive additions such as a 
legend somewhat less simple.

As a second model for these subjects we will use competing risks
with PCM and death without malignancy
as the two terminal states, as shown in the upper left of figure \ref{mfig2}.
For this model we are only interested in the first event for each
subject.
Formally we are treating progression to a PCM
as an \emph{absorbing state}, i.e., one 
that subjects never exit.  
We create a variable \code{etime} containing the time to 
the first of progression, death, or last follow-up along with an event
variable that contains the outcome.
The starting data set \code{mgus2} has two pairs of variables
\code{(ptime, pstat)} that contain the time to progression
and \code{(futime, status)} that contain the time to death or last known
alive.
The code below creates the necessary \code{etime} and \code{event}
variables, then computes and plots the competing risks estimate.

<<mgus2, echo=TRUE, fig=TRUE>>=
etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
event <- factor(event, 0:2, labels=c("censor", "pcm", "death"))
table(event)

mfit2 <- survfit(Surv(etime, event) ~ sex, data=mgus2)
print(mfit2, rmean=240, scale=12)
mfit2$transitions

plot(mfit2, col=c(1,2,1,2), lty=c(2,2,1,1),
     mark.time=FALSE, lwd=2,  xscale=12,
     xlab="Years post diagnosis", ylab="Probability in State")
legend(240, .6, c("death:female", "death:male", "pcm:female", "pcm:male"), 
       col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')
@ 

The \code{mfit2} call is nearly identical to that for an ordinary Kaplan-Meier, 
with the exception of the \code{event} variable.
\begin{enumerate}
  \item The event variable was created as a \emph{factor}, 
    whereas for ordinary single
    state survival the status is either 0/1 or TRUE/FALSE.  
    The first level of the factor must be censoring, which is the status code
    for those whose follow-up terminated without reaching a new endpoint.
    Codes for the remaining states can be in any order. The labels for the
    states are unrestricted, e.g., the first one does not have to be ``censor''.
    (It will however be treated as censoring, whatever the name.)
  \item A simple print of the \code{mfit2} object shows the order in
    which the curves will be displayed.  This information was used to 
    choose the line types and colors for the curves.
  \item The \code{mfit2} object contains curves for all the states, but 
    by default the entry state will not be plotted.
    The remaining curves all start at 0.
  \item The transitions component of the result is useful as a
    data check, e.g., if it showed a transition from death to PCM.
  \item Each subject's initial state is specified by the \code{istate}
    argument.  When this is omitted all subjects 
    are assumed to start from an entry state named `` '' (the empty
    string), as seen in the printout above.
\end{enumerate}

The printout shows that a male subject will spend, on average,
8.7 of his first 20 years post diagnosis in the entry state, 1.1
years in the PCM state and 10.3 of those 20 in the death state.
If a cutoff time is not given the default is to use the maximum observed
time for all curves, which is 424 months in this case.

The result of a multi-state \code{survfit} is a matrix of probabilities with
one row per time and one column per state.
First are the states found in the event variable (excluding
censoring) and then the states found in the \code{istate} variable,
removing any duplicates.  
By default any unnamed state is not plotted -- point 3 above --
for the simple reason that multiple event curves can 
very quickly get overcrowded with all the multiple lines.
Since the three MGUS states of entry/pcm/death must sum to 1 at any
given time (everyone has to be somewhere), one of the
three curves is redundant and the ``fraction still in the entry state''
curve is normally the least interesting. 
One can easily add this last state to the plot if desired, e.g.,
\code{lines(mfit2[,3], col=4, lty=1:2)}, since entry is the third state
in the printout.
(One can use, e.g., \code{mfit2[, 'pcm']} to select a state as
well, but an empty string does not work as the subscript.)

A common mistake with competing risks is to use the Kaplan-Meier
separately on each
event type while treating other event types as censored.
The next plot is an example of this for the PCM endpoint.
<<mgus3, fig=TRUE>>=
pcmbad <- survfit(Surv(etime, pstat) ~ sex, data=mgus2)
plot(pcmbad[2], mark.time=FALSE, lwd=2, fun="event", conf.int=FALSE, xscale=12,
     xlab="Years post diagnosis", ylab="Fraction with PCM")
lines(mfit2[2,1], lty=2, lwd=2, mark.time=FALSE, conf.int=FALSE)
legend(0, .25, c("Males, PCM, incorrect curve", "Males, PCM, competing risk"),
       col=1, lwd=2, lty=c(1,2), bty='n')
@ 

There are two problems with the \code{pcmbad} fit.  
The first is that it attempts to estimate the expected occurence of
plasma cell malignancy (PCM)
if all other causes of death were to be disallowed.
In this hypothetical world it is indeed true that many more subjects would
progress to PCM (the incorrect curve is higher), but it is also 
not a world that any of us will ever inhabit.
This author views the result in much the same light as discussions of
survival after the zombie apocalypse.
The second problem is that the computation for this
hypothetical case is only correct if all of the competing endpoints
are independent, a situation which is almost never true.
We thus have an unreliable estimate of an uninteresting quantity.
The competing risks curve, on the other hand,
estimates the fraction of MGUS subjects who \emph{will experience} 
PCM, a quantity sometimes known as the lifetime risk,
and one which is actually observable.

The last example chose to plot only a subset of the curves, something that is
often desirable in competing risks problems to avoid a
``tangle of yarn'' plot that simply has too many elements.
This is done by subscripting the \code{survfit} object.
For subscripting, multi-state curves behave as a matrix
with the outcomes as the second subscript. 
The columns are in order of the levels of \code{event},
i.e., as displayed by our earlier call to \code{table(event)}.   
The first subscript indexes the groups formed by the right hand side of
the model formula, and will be in the same order as simple survival curves.
Thus \code{mfit2[2,1]} corresponds to males (2) and the PCM endpoint (1).
Curves are listed and plotted in the usual matrix order of R.

A third example using the MGUS data treats it as a multi-state model
and it shown in the upper right of figure \ref{mfig2}.
In this version a subject can have multiple transitions and thus multiple
rows in the data set.  In this case it is necessary to identify which data rows go
with which subject via the \code{id} argument of \code{survfit};
valid estimates of the curves and their standard errors both depend on this.
Our model looks like the illness-death model of figure \ref{mfig1} but
with ``PCM'' as the upper state and no arrow for
a return from that state to health.  
The necessary data set will have two rows for any subject who has further
follow-up after a PCM and a single row for all others.
The data set is created below using the \code{tmerge} function, which is
discussed in detail in another vignette.

We need to decide what to do with the 9 subjects who have PCM
and death declared at the same month. 
(Some of these were cancer cases discovered at autopsy.)
They slipped through without comment in the earlier competing risks analysis;
only when setting up this second data set did we notice the ties.  Looking
back at the code, the prior example counted these subjects as a progression. 
In retrospect this is a defensible choice: even though
undetected before death, 
the disease must have been present for some amount of time previous and
so progression did occur first.
For the multi-state model we need to be explicit in how this is
coded since a sojourn time of 0 within a state is not allowed.
Below we push the progression time back by .1 month when there is a tie, but
that amount is entirely arbitrary.

<<mgus4, echo=TRUE>>=
ptemp <- with(mgus2, ifelse(ptime==futime & pstat==1, ptime-.1, ptime))
newdata <- tmerge(mgus2, mgus2,  id=id, death=event(futime, death),
                  pcm = event(ptemp, pstat))
newdata <- tmerge(newdata, newdata, id, enum=cumtdc(tstart))
with(newdata, table(death, pcm))
@ 
The table above shows that there are no observations in \code{newdata}
that have both a PCM and death, i.e., the ties have been resolved.
The last \code{tmerge} line above creates a variable \code{enum} which
simply counts rows for each person; it will be used later.

<<mgus4g, fig=TRUE>>=
temp <- with(newdata, ifelse(death==1, 2, pcm))
newdata$event <- factor(temp, 0:2, labels=c("censor", "pcm", "death"))  
mfit3 <- survfit(Surv(tstart, tstop, event) ~ sex, data=newdata, id=id)
print(mfit3, rmean=240, digits=2)
mfit3$transitions
plot(mfit3[,1], mark.time=FALSE, col=1:2, lty=1:2, lwd=2,
     xscale=12,
     xlab="Years post MGUS diagnosis", ylab="Fraction in the PCM state")
legend(48, .04, c("female", "male"), lty=1:2, col=1:2, lwd=2, bty='n') 
@ 
This plot is quite different in that it shows the fraction of subjects
\emph{currently} in the PCM state. 
Looking at our multi-state diagram this is the fraction
of subjects in the upper right PCM box.
The curve goes up whenever someone enters the box (progression) and down when
they leave (death).
Myeloma survival was quite short during the era of this study and the 
proportion currently in the PCM state rarely rises above 2 percent.

The result of \code{print(mfit3)} reveals, as expected, less time spent
in the PCM state.  In the prior \code{mfit2} model, subjects who enter
that state remain there for the duration; in this one they quickly
pass through.
It is worthwhile to check the \code{transitions} table in the output 
simply as a data check. In this case it shows subjects going from the
entry (unnamed) state to PCM and death along with transitions from PCM
to death.  This is as expected.  
An error in creating the input data can lead to surprising counts
and an even more surprising curve.

We have often found the three curve display below useful in the case
of a transient state.
It combines the results from competing risk model used above
along with a second fit that treats death
after PCM as a separate state from death before progression,
the \emph{multi-state 2} model of figure \ref{mfig2}.
In this plot the fraction of subjects currently in the PCM state
is shown by the distance between the two curves.
Only males are shown in the plot to minimize overlap.
<<mgus5, echo=TRUE, fig=TRUE>>=
# Death after PCM will correspond to data rows with
#  enum = 2 and event = death 
d2 <- with(newdata, ifelse(enum==2 & event=='death', 4, as.numeric(event)))
e2 <- factor(d2, labels=c("censor", "pcm", "death w/o pcm", 
                          "death after pcm"))
mfit4 <- survfit(Surv(tstart, tstop, e2) ~ sex, data=newdata, id=id)
plot(mfit2[2,], lty=c(1,2),
     xscale=12, mark.time=FALSE, lwd=2, 
     xlab="Years post diagnosis", ylab="Probability in State")
lines(mfit4[2,3], mark.time=FALSE, col=2, lty=1, lwd=2,
      conf.int=FALSE)

legend(200, .5, c("Death w/o PCM", "ever PCM", 
                  "Death after PCM"), col=c(1,1,2), lty=c(2,1,1), 
             lwd=2, bty='n', cex=.82)
@ 

\subsection{Further notes}
The Aalen-Johansen method used by \code{survfit} does not account for 
interval censoring, also known as panel data, 
where a subject's current state is recorded at some fixed time such as a
medical center visit but the actual times of transitions are unknown.
Such data requires further assumptions about the transition process in
order to model the outcomes and has a more complex likelihood. 
The \code{msm} package, for instance, deals with data of this type.
If subjects reliably come in at regular intervals then the 
difference between the two results can be small, e.g., the 
\code{msm} routine estimates time until progression \emph{occurred}
whereas \code{survfit} estimates time until progression was \emph{observed}.

\begin{itemize}
  \item When using multi-state data to create Aalen-Johansen estimates, 
    individuals
    are not allowed to have gaps in the middle of their time line.
    An example of this would be a data set with 
    (0, 30, pcm] and (50,70, death] as the two observations
    for a subject where the time from 30-70 is not accounted for.
  \item Subjects must stay in the same group over their entire observation
    time, i.e., variables on the right hand side of the equation cannot be
    time-dependent. 
  \item A transition to the same state is allowed, e.g., observations of
    (0,50, 1], (50, 75, 3], (75, 89, 4], (89, 93, 4] and (93, 100, 4] 
    for a subject who goes
    from entry to state 1, then to state 3, and finally to state 4.
    However, a warning message is issued for the data set in this case, since
    stuttering may instead be the result of a coding mistake.
    The same result is obtained if 
    the last three observations were collapsed to a single row of 
    (75, 100, 4].
\end{itemize}

\section{Rate models}
For simple two-state survival, the Cox model leads to three relationships
\begin{align}
  \lambda(t)  &= \lambda_0(t) e^{X\beta}  \label{hazard} \\
  \Lambda(t)  &= \Lambda_0(t) e^{X\beta} \label{cumhaz}\\
  S(t)       &= \exp(-\Lambda(t))  \label{surv}
  \end{align}
where $\lambda$, $\Lambda$ and $S$ are the hazard, cumulative hazard
and survival functions, respectively.
There is a single linear predictor which governs
both the rate $\lambda$ (the arrow in figure \ref{mfig1}) and probability
of residing in the left hand box of the figure.
For multi-state models this simplicity no longer holds; proportional
hazards does not lead to proportional $p(t)$ curves.
The task before us is more complex.

The analysis of multi-state data has four
key steps.  In order of importance:
\begin{enumerate}
  \item Draw a box and arrow figure describing the model.
  \item Think through the rates (arrows).
    \begin{enumerate}
      \item Which covariates should be attached to each rate? 
        Sometimes a covariate is important for one transition, but not for
        another.
      \item For which transitions should one or more of the covariates be 
        constrained to have the same coefficient? 
        Sometimes there will be a biologic rationale for this.  For other 
        studies an equivalence is forced simply because we have too many 
        unknowns and cannot accommodate them all.
        (This is the often the same reason that models
        contain very few interaction terms).
      \item Which, if any, of the transitions should share the same baseline 
          hazard?  Most of the time the baseline rates are all assumed to
          be different.
       \item Should there be random effects, and if so what is an appropriate
         correlation structure? 
         Do some pairs of transitions have a shared effect,  
         some pairs separate effects and others no random effect?   
         Mixed effects Cox models tend to need larger sample size --- does 
         the data set have enough events? 
    \end{enumerate}
  \item Build an appropriate data set.
  \item Fit the data.  Examine multiple summaries of the
    model fit, including the predicted occupancy curves.
\end{enumerate}
Step 1 is key to the entire endeavor.
We saw in figure \ref{mfig2} and the examples above that multiple views of a
multi-state process can be useful, and this will hold for modeling as well.
Step 3 will often
be the one that demands the most attention to detail.

\subsection{MGUS example}
Start with the simplest model for the MGUS data: 
a competing risks model (upper left diagram of figure \ref{mfig2}), distinct baseline
hazards for the two rates, no shared coefficients, and three covariates.
<<cfit1>>=
options(show.signif.stars = FALSE)  # display intelligence
cfit2 <- coxph(Surv(etime, event=="death") ~ age + sex + mspike, mgus2)
summary(cfit2, scale=c(10, 1, 1))   # scale age in decades 

@ 
The effect of age and sex on non-PCM mortality is profound, which is not
a surprise given the median starting age of \Sexpr{median(mgus2$age)}. Risk rises \Sexpr{round(exp(10*coef(cfit2)[1]),1)} fold per decade of age and
the death rate for males is \Sexpr{round(exp(coef(cfit2)[2]),1)} times as great
as that for females.  
The size of the serum monoclonal spike has almost no impact on non-PCM 
mortality.
A 1 unit increase changes mortality by only 6\%.

<<cfit2>>=
cfit1 <- coxph(Surv(etime, event=="pcm") ~ age + sex + mspike, mgus2)
cfit1
quantile(mgus2$mspike, na.rm=TRUE)
@ 
The mspike size has a major impact on progression, however; each 1 gram
change increases risk by \Sexpr{round(exp(coef(cfit1)[3]) ,1)} fold.
The interquartile range of \code{mspike} is 0.9 gram so this risk increase
is clinically important.
The effect of age on the progression rate is much less pronounced,
with a coefficient only 1/4 that for mortality, while the effect of sex
on progression is completely negligible.

The effect of sex on the \emph{lifetime} probability of PCM is not zero,
however.  Because of a longer lifetime, a female with MGUS will on average
spend more total years at risk for PCM than the average male, and so has
a larger lifetime risk of PCM.  
The average rate of progression is about 1\% per year, as shown below,
while the mean post diagnosis lifetime is 19 months longer for 
females. 
The overall effect is a 1.6\% increase in lifetime risk.
<<mpyears>>=
pfit1 <- pyears(Surv(ptime, pstat) ~ sex, mgus2, scale=12)
round(100* pfit1$event/pfit1$pyears, 1)  # PCM rate per year

temp <- summary(mfit1, rmean="common")  #print the mean survival time
round(temp$table[,1:6], 1)
@ 

Notice that each \code{coxph} fit essentially ignores the other event type(s).
In the figure, each rate (arrow) depends only on the box from which
it originates and the events which it enumerates.
Rates are instantaneous quantities, and depend only on the set of subjects
who are at risk at at a given  moment;
if someone is not at risk it really does not matter why.

When computing $p(t)$, on the other hand, all the
rates must be considered at once.  
The Aalen-Johansen estimate applies as before, but now the individual
entries $\lambda_{ij}(t)$ in each cell of the transition matrix are taken
from the relevant fit.
As is also the case with predicted survival curves from a simple 
Cox model, predicted probability-in-state curves correspond to a set of
prespecified covariate values.  
As an example we will generate the curves for four hypothetical
subjects: male and female, age 60 and 80, and serum m-spike of 1.2 grams.
These are the approximate quartiles of age, and the median mspike.

The Aalen-Johansen estimate for this simple 3-state competing risks setup
works with a matrix of this form:
\begin{equation*}
  \left( \begin{array}{ccc}  & \lambda_{12}{t} & \lambda_{13}(t) \\
       0 &  & 0 \\ 0 & 0 &  \end{array} \right)
\end{equation*}
As before, the diagonal elements are chosen so that each row sums to 1.  
Standard survival curve calculations for a Cox model can be used to
obtain $\lambda_{12}$, the rate of transition to the PCM state for our
four subjects, and $\lambda_{13}$ = the rate of transition to the ``death
before PCM'' state.  These are placed into a matrix and 
combined using a third call.
The standard errors from the individual curves won't be used and the
survfit routine is a bit faster if we skip them.

<<PCMcurve, fig=TRUE>>=
newdata <- expand.grid(sex=c("F", "M"), age=c(60, 80), mspike=1.2)
newdata

temp <- matrix(list(), 3,3)
dimnames(temp) <- list(from=c("Entry", "PCM", "Death"),
                       to  =c("Entry", "PCM", "Death"))
temp[1,2] <- list(survfit(cfit1, newdata, std.err=FALSE))
temp[1,3] <- list(survfit(cfit2, newdata, std.err=FALSE))
csurv  <- survfit(temp, p0 =c(1,0,0))
plot(csurv[,2], xmax=25*12, xscale=12, 
     xlab="Years after MGUS diagnosis", ylab="PCM",
     col=1:2, lty=c(1,1,2,2), lwd=2)
legend(10, .14, outer(c("female", "male   "), 
                     c("diagnosis at age 60", "diagnosis at age 80"), 
                      paste, sep=", "),
       col=1:2, lty=c(1,1,2,2), bty='n', lwd=2)
@ 
The individual survival curves that result from \code{survfit(cfit1)}
and \code{survfit(cfit2)} are not actually of interest, since 
each is a Cox model analog of the pcmbad curve we criticized earlier.
The cumulative hazard portion of the results is what is used to
build an Aalen-Johansen estimate.
(Calling \code{survfit} on a set of \code{survfit} objects is, I admit, a
bit confusing.  
It would perhaps have been better to name the second routine
``AalenJohansen'', but we use this often and didn't want to type that 
long a name.)

Sex has nearly no effect on the hazard of PCM, i.e., on any given day the
risk of conversion for a male is essentially the same as for a female of the
same age.  
Yet we see above that the fitted Cox models predict a higher lifetime risk 
for females, and an age effect on lifetime risk that is far from proportional.
Very few subjects acquire PCM more than 15 years after a MGUS diagnosis at
age 80 for the obvious reason: very few of them will still 
be alive.

Creating the `list' form matrix above is quite easy, in particular we do
not need to fill in elements on the diagonal, nor those for which no 
transitions occur, e.g., from death back to the entry state. 
The resulting \code{survfit} object is easy to plot or print using standard calls.
The approach has a number of caveats, however.
\begin{itemize}
  \item It does not produce standard errors for the curves, as a consequence
    of being two steps removed from the data.
  \item It is easy to ``fool'' the program.  For instance if you were to get 
    curves for females and males from \code{cfit1}, but the curves from 
    \code{cfit2} were in the reverse order of male then female, results will
    still be produced but they would not be valid.  The user is responsible
    for setting the problem up correctly.
  \item The R syntax for a matrix of lists is rather fussy, e.g., you can't
    leave the \code{list} function out of the lines that assign elements
    to \code{temp} above.
\end{itemize}
The \code{mstate} package addresses these issues, at the price of a 
somewhat more complex syntax.
 
<<year20>>=
# Print out a M/F results at 20 years
temp <- summary(csurv, time=20*12)$pstate
cbind(newdata, PCM= round(100*temp[,2], 1))
@ 
The above table shows that females are modeled to 
have a higher risk of 20 year progression, even
though their hazard at any given moment is nearly identical to males.
The difference at 20 years is on the order of our ``back of the napkin''
person-years estimate of 1\% progression per year * 1.7 more years of life
for the females, but the progression fraction varies substantially by group.  

\section{Fine-Gray model}
For the competing risk case the Fine-Gray model provides an alternate way of
looking at the data. 
As we saw above, the impact of a particular covariate on the final
values $P$ can be complex, even if the models for the hazards are relatively
simple. 
The primary idea of the Fine-Gray approach is to turn the multi-state 
problem into a collection of two-state ones. 
In the upper right diagram of figure 1, draw a circle around all of the
states except the chosen endpoint and collapse them into a single meta-state.
For the MGUS data these are 
\begin{itemize}
  \item Model 1
    \begin{itemize}
      \item left box: All subjects in the entry or ``death first'' state
      \item right box: PCM
    \end{itemize}
  \item Model 2
    \begin{itemize}
      \item left box: All subjects in the entry or ``PCM first'' state
      \item right box: Death (without PCM)
    \end{itemize}
\end{itemize}

An interesting aspect of this is that the fit can be done as a
two stage process: the first stage creates a special data set while the
second fits a weighted \code{coxph} or \code{survfit} model to the data.
The data set can be created using the \code{finegray} command.

<<fg1>>=
# (first three lines are identical to an earlier section)
etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
event <- factor(event, 0:2, labels=c("censor", "pcm", "death"))

pcmdat <- finegray(Surv(etime, event) ~ ., data=mgus2,
                   etype="pcm")
pcmdat[1:4, c(1:3, 11:14)]

deathdat <- finegray(Surv(etime, event) ~ ., data=mgus2,
                     etype="death")
dim(pcmdat)
dim(deathdat)
dim(mgus2)
@ 
The \code{finegray} command has been used to create two data sets:
one for the PCM endpoint and one for the death endpoint.
In each, four new variables have been added containing a survival
time \code{(fgstart, fgstop, fgstatus)} with an `ordinary' status of
0/1, along with a case weight and a large number of new rows.
We can use this new data set as yet another way to compute
multi-state survival curves, though there is no good reason to use this
rather roundabout approach instead of 
the simpler \code{survfit(Surv(etime, event) \textasciitilde sex)}.
<<pfit2>>=
# The PCM curves of the multi-state model
pfit2 <- survfit(Surv(fgstart, fgstop, fgstatus) ~ sex,
                data=pcmdat, weight=fgwt)
# The death curves of the multi-state model
dfit2 <- survfit(Surv(fgstart, fgstop, fgstatus) ~ sex, 
                  data=deathdat, weight=fgwt)
@ 
The two new curves are almost identical to the prior estimates, 
and in fact would be identical if we
had accounted for the slightly different censoring patterns in males and
females (by adding \code{strata(sex)} to the right hand side of the
\code{finegray} formulas).


A Cox model fit to the constructed data set yields the Fine-Gray
models for PCM and for death.  
<<fg2, fig=TRUE>>=
fgfit1 <- coxph(Surv(fgstart, fgstop, fgstatus) ~ sex, data=pcmdat,
               weight= fgwt)
summary(fgfit1)
fgfit2 <- coxph(Surv(fgstart, fgstop, fgstatus) ~ sex, data=deathdat,
               weight= fgwt)
fgfit2

mfit2 <- survfit(Surv(etime, event) ~ sex, data=mgus2) #reprise the AJ
plot(mfit2[,1], col=1:2,
     lwd=2,  xscale=12,
     conf.times=c(5, 15, 25)*12,
     xlab="Years post diagnosis", ylab="Fraction with PCM")
ndata <- data.frame(sex=c("F", "M"))
fgsurv1 <- survfit(fgfit1, ndata)
lines(fgsurv1, fun="event", lty=2, lwd=2, col=1:2)
legend("topleft", c("Female, Aalen-Johansen", "Male, Aalen-Johansen",
                 "Female, Fine-Gray", "Male, Fine-Gray"),
       col=1:2, lty=c(1,1,2,2), bty='n')

# rate models with only sex
pfitr <- coxph(Surv(etime, event=="pcm") ~ sex, mgus2)
dfitr <- coxph(Surv(etime, event=="death") ~ sex, mgus2)
temp <- matrix(list(), 3,3)
temp[1,2] <- list(survfit(pfitr, ndata, std.err=FALSE))
temp[1,3] <- list(survfit(dfitr, ndata, std.err=FALSE))
rcurve <- survfit(temp, p0=c(entry=1, pcm=0, death=0))
@ 
The FG model states that males have a less \emph{observed} PCM, 
by a factor of \Sexpr{round(exp(coef(fgfit1)), 2)}, and that
this hazard ratio is constant over time.
An overlaid plot of the non-parametric Aalen-Johansen estimates for the
PCM state (from \code{survfit}) along with predicted curves from the 
Fine-Gray model show that proportional hazards is not unreasonable for 
this particular fit.
The predicted values from the rate model, computed just above but not 
plotted on the curve, also fit well with the data.

When there is only a single categorical 0/1 covariate the Fine-Gray model
reduces to Gray's test of the subdistribution function, in the same way
that a \code{coxph} fit with a single categorical predictor is equivalent
to the log-rank test.

The mathematics behind the Fine-Gray estimate starts with
the functions $F_k(t) = p_k(t)$, where $p$ is the probability in state
function estimated by the AJ estimate.  This can
be thought of as the distribution function for the improper random variable
$T^*= I(\mbox{event type}=k)T + I(\mbox{event type}\ne k)\infty$.
Fine and Gray refer to $F_k$ as a subdistribution function.
In analogy to the survival probability in the two state model define
\begin{equation}
  \gamma_k(t) = - d \log[1-F_k(t)]/dt \label{FG}I 
\end{equation}
and assume that $\gamma_k(t;x) = \gamma_{k0}(t) \exp(X\beta)$.
In a 2 state alive $\longrightarrow$ death model, $\gamma$ becomes the
usual hazard function $\lambda$.
In the same way that our multivariate Cox model \code{cfit2} made the
simplifying assumption that the impact of male sex is to increase the
hazard for death
by a factor of \Sexpr{round(exp(coef(cfit2)['sexM']), 2)}, 
independent of the subject's age or serum mspike value,
the Fine-Gray model assumes that each covariate's effect on $\log(1-F)$ 
is a constant,
independent of other variables.  
Both model's assumptions are wonderfully simplifying with respect to 
understanding
a covariate, since we can think about each one separately from the others.
This is, of course, under the assumption that the model is correct:
additivity across covariates, linearity, and proportional hazards 
all hold.
In a multi-state model, however, these assumptions cannot hold for 
both the per-transition and Fine-Gray models formulations at the same time;
if it is true for one, it will not be true for the other.

Now consider a multivariate fit on age, sex, and serum m-spike.
<<fg3>>=
fgfit2a <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex + mspike,
                 data=pcmdat, weight=fgwt)

fgfit2b <-  coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex + mspike,
                 data=deathdat, weight=fgwt)
round(rbind(PCM= coef(fgfit2a), death=coef(fgfit2b)), 3)
@ 
The Fine-Gray fits show an effect of all three  variables on the subdistribution
rates.  Males have a lower lifetime risk of PCM before death and a higher 
risk of death before PCM, while a high serum m-spike works in the opposite
direction.  
The Cox models showed no effect of sex on the instantaneous hazard 
of PCM and none
for serum m-spike on the death rate.
However, as shown in the last section, the Cox models do predict 
a greater lifetime risk for females.
We had also seen that older subjects are less likely to experience PCM
due to the competing risk of death;
this is reflected in the FG model as a negative coefficient for age.

Now compute predicted survival curves for the model, and show them
alongside the predictions from the multi-state model.

<<finegray2, fig=TRUE>>=
oldpar <- par(mfrow=c(1,2))
newdata <- expand.grid(sex= c("F", "M"), age=c(60, 80), mspike=1.2)
fsurv1 <- survfit(fgfit2a, newdata)  # time to progression curves
plot(fsurv1, xscale=12, col=1:2, lty=c(1,1,2,2), lwd=2, fun='event',
     xlab="Years", ylab="Fine-Gray predicted", 
     xmax=12*25, ylim=c(0, .15))
legend(1, .15, c("Female, 60", "Male, 60","Female: 80", "Male, 80"),
       col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')

plot(csurv[,2], xscale=12, col=1:2, lty=c(1,1,2,2), lwd=2,
     xlab="Years", ylab="Multi-state predicted", 
     xmax=12*25, ylim=c(0, .15))
legend(1, .15, c("Female, 60", "Male, 60","Female: 80", "Male, 80"),
       col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')
par(oldpar)
@ 

The predictions as a function of age group are quite different for the
Fine-Gray model: new PCM cases are predicted 20+ years after diagnosis
in both the old and young age groups, while they are predicted to cease
in the multi-state fit.  
The average of the curves is nearly the same at each age, but 
the global proportional hazards assumption of
the FG model forces the curves to remain parallel.

We can check the proportional hazards assumption of the models using
the \code{cox.zph} function,
linearity of the continuous variables age and mspike by using
non-linear terms such as \code{pspline} or \code{ns}, and
additivity by exploring interactions.  
All are obvious and important next steps.  For instance, the proportional hazards assumption for age shows clear violations.

<<finegray-check, fig=TRUE>>=
zph.fgfit2a <- cox.zph(fgfit2a)
zph.fgfit2a
plot(zph.fgfit2a[1])
abline(h=coef(fgfit2a)[1], lty=2, col=2)
@

A weakness of the Fine-Gray approach is that since the two endpoints
are modeled separately, the results do not have to be consistent.
Below is a graph of the predicted fraction who have experienced neither
endpoint.  For subjects diagnosed at age 80 the Fine-Gray models predict that more than 
100\% will either progress or die by 30 years.
Predictions based on the Aalen-Johansen approach do not have this issue.
<<finegray3, fig=TRUE>>=
fsurv2 <- survfit(fgfit2b, newdata)  # time to progression curves
xtime <- 0:(30*12)  #30 years
y1a <- 1 - summary(fsurv1, times=xtime)$surv  #predicted pcm
y1b <- 1 - summary(fsurv2, times=xtime)$surv #predicted deaths before pcm
y1  <- (y1a + y1b)  #either

matplot(xtime/12, y1, col=1:2, lty=c(1,1,2,2), type='l',
        xlab="Years post diagnosis", ylab="FG: either endpoint")
abline(h=1, col=3)
legend("bottomright", c("Female, 60", "Male, 60","Female: 80", "Male, 80"),
       col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')
@ 


The primary strength of the Fine-Gray model with respect to the Cox model
approach is that if lifetime risk is a primary question, then 
the model has given
us a simple and digestible answer to that question:
``females have a 1.2 fold higher lifetime risk of PCM, after adjustment for age 
and serum m-spike''.
This simplicity is not without a price, however, and these authors are 
not proponents of the approach.
There are five issues.
\begin{enumerate}
  \item The attempt to capture a complex process as a single value is
    grasping for a simplicity that does not exist for many (perhaps most)
    data sets.
     The necessary assumptions in a multivariate Cox model of proportional
    hazards, linearity of continuous variables, and no interactions are
    strong ones.  For the FG model these need to hold for a combined process
    --- the mixture of transition rates to each endpoint --- which turns out
    to be a more difficult barrier. 
  \item The sum of predictions need not be consistent.
  \item From the per-transition models one can work forward and compute
    $p(t)$, the occupancy probabilities for each state over time;
    both the hazard ratios and $p$ are useful summaries of the data.
    We don't have tools to work backwards from a Fine-Gray fit to the
    per transition hazards.
  \item The approach is viable only for competing risks and not for
    other multi-state models.
  \item The risk sets are odd.
\end{enumerate}
The last of these is perhaps the most frequently listed issue with the Fine-Gray
model, but it is actually a minor complaint.  
The state probabilities $p(t)$ in a multi-state model are implicitly 
fractions of the total population
we started with: someone who dies in month 1 is still a part of the denominator
for the fraction of subjects with PCM at 20 years.  
In the Fine-Gray formulas this subject explicitly appears in risk set 
denominators at a later time, which looks odd but is more of an artifact.

The first issue is substantial, however,
and checking the model assumptions of a Fine-Gray fit is
mandatory.  The second point is alarming, but it does not have a practical
impact unless there is long follow-up.

\section{Stacked data sets}
How does one fit risk models that have shared coefficients or baseline hazards?
One approach is to
fit the set of Cox models for the rates `all at once' on a combined
data set.  For the simple competing risks MGUS fit above, assume that
we wanted to add hemoglobin to the fit, with a common coefficient for
both the PCM and death endpoint.  (Anemia is a feature of both PCM and old
age.)  Create a stacked data set with $2n$ observations.  The first $n$
rows are the data set we would use for a time to PCM analysis, 
with a simple 0/1 status variable encoding the PCM outcome.
The second $n$ rows are the data set we would have used for the 
`death before PCM' fits, with status encoding the death-before-PCM endpoint.
A last variable, \code{group},  is 'pcm' for the first $n$ 
observations and
'death' for the remainder.  Then fit a model
<<pcmstack, echo=TRUE>>=
temp1 <- data.frame(mgus2, time=etime, status=(event=="pcm"), group='pcm')
temp2 <- data.frame(mgus2, time=etime, status=(event=="death"), group="death")
stacked <- rbind(temp1, temp2)
allfit <- coxph(Surv(time, status) ~ hgb + (age + sex)*strata(group),
                 data=stacked)
@ 
This fits a common effect for hemoglobin (hgb) but separate age and sex
effects for the two endpoints, along with separate baseline hazards.


\section{Other software}
\subsection{The mstate package}
As the number of states + transitions (arrows + boxes) gets larger then
the `by hand' approach used above for creating a stacked data set, labeling
coefficients, and producing multi-state curves becomes a challenge.
(It is still fairly easy to do, just not as easy to ensure it has
been done \emph{correctly}.)
The \code{mstate} package starts with a definition of the matrix of possible
transitions and uses that to drive tools that build and analyze the stacked
data set in a more automated fashion. 
We recommend it for more complex models.  (The tutorial above is about at
our personal threshold.)
A second advantage of \code{mstate} is that all the Cox
model fits are now in one well indexed object, which allows for calculation 
of proper confidence intervals for the state probabilities $p(t)$.
(Since all of the steps used the same transition matrix template, the
necessary computations are scripted and reliable.)

\subsection{The \code{msm} package}
There are two broad classes of multi-state data:
\begin{itemize}
  \item Panel data arises when subjects have regular visits, with the
    current state assessed at each visit.  We don't know when the transitions
    between states occur, or if other states may have been visited in the
    interim --- only the subject's state at specific times.  
  \item Survival data arises when we observe the transition times;
    death, for example.
\end{itemize}

The overall model (boxes and arrows), the quantities of interest (transition
rates and $p(t)$), and the desired printout and graphs are identical for
the two cases.  Much of the work in
creating a data set is also nearly the same.  
The underlying likelihood equations and resulting analytical methods for
solving the problem are, however, completely different.  
The \code{msm} package addresses panel data, while \code{survival}, \code{mstate}, and a host of
others are devoted to survival data.

\section{Conclusions}
When working with acute diseases,  such as advanced cancer or end-stage liver
disease, there is often a single dominating endpoint.
Ordinary single event Kaplan-Meier curves and Cox models are then
efficient and sufficient tools for much of the analysis.  
Such data was the primary use case for survival analysis earlier in the
authors' careers.  
Data with multiple important endpoints is now common, and multi-state
methods are an important addition to the statistical toolbox.  
As shown above, they are now readily available and easy to use.

It is sometimes assumed that the presence of competing risks 
\emph{requires} the use of a Fine-Gray model (we have seen it in referee
reports), but this is not correct.  
The model may often be useful, but is one available option among many.
Grasping the big picture for a multi-state data set is always a challenge
and we should make use of as many tools as possible.
We are often reminded of the story of a centenarian on his 100th birthday
proclaiming that he was looking forward to many more years ahead because
``I read the obituaries every day, and you almost never see someone 
over 100 listed there''.  
It is not always easy to jump between observed
deaths, hazard rates, and lifetime risk.

%\bibliographystyle{plain}
%\bibliography{refer}

\end{document}