Parametric Models in Survival Analysis for Lung Cancer Patients

The aim of this study is to estimate the survival function for data on lung cancer patients using parametric methods (Weibull, Gumbel, exponential and log-logistic). The proposed estimation methods are compared using three statistical indicators: the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC). The comparison shows that the Gumbel distribution model gives the best estimate of the survival function for lung cancer. The expected values of the survival function under all of the estimation methods proposed in this study decrease gradually as the failure times of lung cancer patients increase, which means that there is an inverse relationship between failure time and the survival function.

Gumbel showed that the Weibull distribution and the type III smallest extreme value distribution are the same [4]. The log-logistic distribution is a very important reliability model, as it fits well in many applied situations of reliability data analysis. Another advantage of the log-logistic distribution lies in the closed-form expressions for its survival and failure rate functions, which gives it an advantage over the log-normal distribution [5]. Salman and Farhan [6] estimated the survival function for lung cancer patients; they used several nonparametric estimation methods and concluded that the shrinkage method was the best.
The main objective of this research is to estimate the survival function for the data of lung cancer patients using parametric methods, and to determine the best and most efficient distribution.

Survival Analysis
The survival function is the probability that a system or component will survive without failure during a specified time interval $[0, t]$ under given operating conditions. It is denoted by $S(t)$ and is defined as [7]:
$$S(t) = P(T > t),$$
where $T$ is a random variable denoting the time of death.
The survival function $S(t)$ is the probability that the patient will survive until time $t$. The survival probability is usually assumed to approach zero as age increases, with $S(0) = 1$ and $\lim_{t \to \infty} S(t) = 0$; $S(t)$ is decreasing and continuous from the right. It is linked to the failure distribution function $F(t)$ and is in fact its complement, i.e.:
$$S(t) = 1 - F(t).$$

Life Time Distribution Function
The lifetime distribution function is defined as the complement of the survival function [1]:
$$F(t) = P(T \le t) = 1 - S(t).$$
If $F(t)$ is differentiable, then its derivative, which is the density function of the lifetime distribution, is
$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}.$$
The function $f(t)$ is sometimes called the event density; it is the rate of death or failure events per unit time.

Hazard Function
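The hazard function is also known as the instantaneous failure rate [8]. The conditional probability that an item will fail in the time interval $[t, t + \Delta t]$, given that the item is functioning at time $t$, is
$$P(t \le T \le t + \Delta t \mid T > t) = \frac{F(t + \Delta t) - F(t)}{S(t)}.$$
By dividing this probability by the length of the time interval, $\Delta t$, and letting $\Delta t \to 0$, we get the rate function $h(t)$, which is defined as:
$$h(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t \, S(t)} = \frac{f(t)}{S(t)},$$
where $f(t)$ is the failure density function and $S(t)$ is the survival function. Then,
$$h(t) = -\frac{d}{dt} \ln S(t).$$
So the hazard is the instantaneous mortality rate conditional on previous survival, and its integrated form is the cumulative hazard
$$H(t) = \int_0^t h(u) \, du = -\ln S(t).$$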
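The relations between the survival function, the hazard and the cumulative hazard can be checked numerically. The following is a minimal sketch, not part of the original study: the function names and the constant-hazard example (rate 0.5 per unit time) are assumptions chosen only for illustration.

```python
import math

def hazard_from_survival(S, t, dt=1e-6):
    """Approximate h(t) = -d/dt ln S(t) with a central finite difference."""
    return -(math.log(S(t + dt)) - math.log(S(t - dt))) / (2 * dt)

def cumulative_hazard(S, t):
    """Cumulative hazard H(t) = -ln S(t)."""
    return -math.log(S(t))

# Hypothetical example: a constant hazard of 0.5 per unit time,
# which corresponds to S(t) = exp(-0.5 * t).
rate = 0.5
S = lambda t: math.exp(-rate * t)

h_at_2 = hazard_from_survival(S, 2.0)   # close to the constant rate 0.5
H_at_2 = cumulative_hazard(S, 2.0)      # close to 0.5 * 2 = 1.0
```

With a constant hazard, the numerical derivative recovers the rate itself and the cumulative hazard grows linearly in time, which is what the two formulas predict.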

Weibull Distribution
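The Weibull distribution is a continuous distribution. It is one of the most widely applied life distributions in reliability analysis [9] and [10].
The probability density function is:
$$f(t) = \frac{\beta}{\alpha} \left(\frac{t}{\alpha}\right)^{\beta - 1} \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right], \quad t \ge 0,$$
where $\beta > 0$ is the shape parameter and $\alpha > 0$ is the scale parameter of the distribution. The mean and variance of the Weibull distribution are respectively:
$$E(T) = \alpha \, \Gamma\left(1 + \frac{1}{\beta}\right), \qquad Var(T) = \alpha^{2} \left[\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\left(1 + \frac{1}{\beta}\right)\right].$$
The cumulative distribution function is defined as:
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right].$$
The survival function is defined as:
$$S(t) = \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right].$$
The failure rate (or hazard rate) function is given by:
$$h(t) = \frac{\beta}{\alpha} \left(\frac{t}{\alpha}\right)^{\beta - 1}.$$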
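As an illustration, the Weibull functions can be written directly in Python. This is a sketch under the assumption that the scale parameter is called alpha and the shape parameter beta; the function names are hypothetical and the parameter values in the usage lines are made up, not estimates from the lung cancer data.

```python
import math

def weibull_pdf(t, alpha, beta):
    """Density f(t) with scale alpha > 0 and shape beta > 0."""
    return (beta / alpha) * (t / alpha) ** (beta - 1) * math.exp(-((t / alpha) ** beta))

def weibull_survival(t, alpha, beta):
    """Survival function S(t) = exp(-(t / alpha) ** beta)."""
    return math.exp(-((t / alpha) ** beta))

def weibull_cdf(t, alpha, beta):
    """Cumulative distribution function F(t) = 1 - S(t)."""
    return 1.0 - weibull_survival(t, alpha, beta)

def weibull_hazard(t, alpha, beta):
    """Hazard h(t) = f(t) / S(t)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)

def weibull_mean(alpha, beta):
    return alpha * math.gamma(1 + 1 / beta)

def weibull_variance(alpha, beta):
    return alpha ** 2 * (math.gamma(1 + 2 / beta) - math.gamma(1 + 1 / beta) ** 2)

# Hypothetical parameter values, for illustration only.
alpha, beta = 10.0, 1.5
s4 = weibull_survival(4.0, alpha, beta)
h4 = weibull_hazard(4.0, alpha, beta)
```

A shape parameter above 1 gives an increasing hazard, below 1 a decreasing hazard, and exactly 1 a constant hazard.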

Gumbel Distribution
Gumbel (1958) referred to this distribution as the type I distribution of the smallest extreme; it is called the Gumbel distribution of the smallest extreme [8]. The Gumbel distribution is a very common distribution because of its wide applicability in several fields [11] and [13]. The probability density function is:
$$f(t) = \frac{1}{\sigma} \exp\left(z - e^{z}\right), \quad -\infty < t < \infty,$$
where $z = \frac{t - \mu}{\sigma}$, $\mu$ is the location parameter, and $\sigma > 0$ is the scale parameter of the distribution.
The mean and variance of the distribution are respectively:
$$E(T) = \mu - \gamma \sigma, \qquad Var(T) = \frac{\pi^{2} \sigma^{2}}{6},$$
where $\gamma$ is Euler's constant (0.577215). The cumulative distribution function is defined as:
$$F(t) = 1 - \exp\left(-e^{z}\right).$$
The survival function is defined as:
$$S(t) = \exp\left(-e^{z}\right).$$
The failure rate (or hazard rate) function is given by:
$$h(t) = \frac{1}{\sigma} e^{z}.$$
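The Gumbel distribution of the smallest extreme can be sketched in the same way. The names mu (location) and sigma (scale), the function names and the numerical values below are assumptions made for illustration; they are not taken from the fitted model.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def gumbel_min_pdf(t, mu, sigma):
    """Density of the smallest-extreme Gumbel distribution."""
    z = (t - mu) / sigma
    return math.exp(z - math.exp(z)) / sigma

def gumbel_min_survival(t, mu, sigma):
    """Survival function S(t) = exp(-exp(z)), z = (t - mu) / sigma."""
    z = (t - mu) / sigma
    return math.exp(-math.exp(z))

def gumbel_min_hazard(t, mu, sigma):
    """Hazard h(t) = exp(z) / sigma, which increases exponentially in t."""
    z = (t - mu) / sigma
    return math.exp(z) / sigma

def gumbel_min_mean(mu, sigma):
    return mu - EULER_GAMMA * sigma

def gumbel_min_variance(mu, sigma):
    return (math.pi * sigma) ** 2 / 6

# Hypothetical parameter values, for illustration only.
mu, sigma = 30.0, 8.0
s_at_mu = gumbel_min_survival(mu, mu, sigma)   # equals exp(-1) at t = mu
```

At the location parameter the survival probability is always exp(-1), about 0.368, regardless of the scale.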

Exponential Distribution
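The exponential distribution is a special case of the two-parameter Weibull distribution (or of the gamma distribution) when the shape parameter is $\beta = 1$ in equation 10. The probability density function is then [11]:
$$f(t) = \frac{1}{\alpha} \exp\left(-\frac{t}{\alpha}\right), \quad t \ge 0,$$
where $\alpha > 0$ is the scale parameter of the distribution. The mean and variance of the exponential distribution are respectively:
$$E(T) = \alpha, \qquad Var(T) = \alpha^{2}.$$
The cumulative distribution function is defined as:
$$F(t) = 1 - \exp\left(-\frac{t}{\alpha}\right).$$
The survival function is defined as:
$$S(t) = \exp\left(-\frac{t}{\alpha}\right).$$
The failure rate (or hazard rate) is given by:
$$h(t) = \frac{1}{\alpha}.$$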
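For the exponential distribution the maximum likelihood estimate of the scale parameter has a closed form when every failure time is observed (no censoring): it is the sample mean. The sketch below assumes complete data and uses made-up failure times; whether the lung cancer data contain censored observations is not stated here, so this is an illustration rather than the procedure of the study.

```python
import math

def exp_survival(t, alpha):
    """Survival function S(t) = exp(-t / alpha) with scale alpha > 0."""
    return math.exp(-t / alpha)

def exp_hazard(t, alpha):
    """Constant hazard 1 / alpha (t is unused but kept for a uniform signature)."""
    return 1.0 / alpha

def exp_scale_mle(times):
    """Closed-form MLE of the scale for complete (uncensored) data."""
    return sum(times) / len(times)

def exp_loglik(times, alpha):
    """Log-likelihood of complete data under the exponential model."""
    return sum(-math.log(alpha) - t / alpha for t in times)

# Hypothetical failure times, for illustration only.
times = [1.0, 2.0, 3.0]
alpha_hat = exp_scale_mle(times)   # sample mean of the failure times
```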

Log-Logistic Distribution
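The log-logistic distribution is widely used in survival analysis when the failure rate function has a unimodal shape [5]. The probability density function is:
$$f(t) = \frac{(\beta / \alpha) (t / \alpha)^{\beta - 1}}{\left[1 + (t / \alpha)^{\beta}\right]^{2}}, \quad t \ge 0,$$
where $\alpha > 0$ is the scale parameter and $\beta > 0$ is the shape parameter of the distribution.
The mean and variance are respectively [12]:
$$E(T) = \frac{\alpha \pi / \beta}{\sin(\pi / \beta)}, \quad \beta > 1, \qquad Var(T) = \alpha^{2} \left[\frac{2 \pi / \beta}{\sin(2 \pi / \beta)} - \frac{(\pi / \beta)^{2}}{\sin^{2}(\pi / \beta)}\right], \quad \beta > 2.$$
The cumulative distribution function is defined as:
$$F(t) = \frac{(t / \alpha)^{\beta}}{1 + (t / \alpha)^{\beta}}.$$
The survival function is defined as:
$$S(t) = \frac{1}{1 + (t / \alpha)^{\beta}}.$$
The hazard function is given by:
$$h(t) = \frac{(\beta / \alpha) (t / \alpha)^{\beta - 1}}{1 + (t / \alpha)^{\beta}}.$$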
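The closed-form survival and hazard functions make the log-logistic model easy to evaluate. The sketch below assumes scale alpha and shape beta; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def loglogistic_survival(t, alpha, beta):
    """Survival function S(t) = 1 / (1 + (t / alpha) ** beta)."""
    return 1.0 / (1.0 + (t / alpha) ** beta)

def loglogistic_pdf(t, alpha, beta):
    """Density f(t) with scale alpha > 0 and shape beta > 0."""
    x = (t / alpha) ** beta
    return (beta / alpha) * (t / alpha) ** (beta - 1) / (1.0 + x) ** 2

def loglogistic_hazard(t, alpha, beta):
    """Hazard h(t) = f(t) / S(t); unimodal in t when beta > 1."""
    x = (t / alpha) ** beta
    return (beta / alpha) * (t / alpha) ** (beta - 1) / (1.0 + x)

def loglogistic_mean(alpha, beta):
    """Mean, which exists only for beta > 1."""
    if beta <= 1:
        raise ValueError("the mean is undefined for beta <= 1")
    return alpha * (math.pi / beta) / math.sin(math.pi / beta)

# Hypothetical parameter values, for illustration only.
alpha, beta = 12.0, 2.0
median_survival = loglogistic_survival(alpha, alpha, beta)  # S(alpha) = 0.5
```

The scale parameter is the median survival time, since the survival function equals one half at t = alpha.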

Goodness of Fit Test
In order to compare the distributions, we consider criteria such as the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC) for the real data set [2]. The best distribution corresponds to the lowest AIC, AICc, and BIC values [7] and [13]:
$$AIC = -2 \ln L + 2k,$$
$$AICc = AIC + \frac{2k(k + 1)}{n - k - 1},$$
$$BIC = -2 \ln L + k \ln n,$$
where $k$ is the number of parameters in the statistical model, $n$ is the sample size, and $L$ is the maximized value of the likelihood function for the estimated model.
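The three criteria are simple functions of the maximized log-likelihood. The sketch below uses hypothetical function names and a made-up log-likelihood value of -100 for a two-parameter model; only the sample size of 118 comes from the study.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    """Corrected AIC; requires n > k + 1."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical maximized log-likelihood; n = 118 patients, k = 2 parameters.
example = (aic(-100.0, 2), aicc(-100.0, 2, 118), bic(-100.0, 2, 118))
```

Because every criterion adds a penalty that grows with the number of parameters, the one-parameter exponential model is penalized less than the three two-parameter models.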

Data Analysis and Results
The dataset used in this study consists of a sample of 118 lung cancer patients obtained from Salman and Farhan [6] and given in Table 2.
The estimated values of the parameters and their confidence intervals for each distribution are obtained using the maximum likelihood estimation method.

Figure 4: The curve of log logistic distribution for survival function

The results in Table 3 indicate that the Gumbel distribution has lower AIC, AICc, and BIC values than the Weibull, log-logistic, and exponential distributions. Hence the Gumbel distribution leads to a better fit than the other three distributions.
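The maximum likelihood step can be illustrated for the Weibull model. The sketch below is not the procedure used to produce Table 3: it assumes complete (uncensored) failure times that are not all equal, uses made-up data rather than the values in Table 2, solves the profile likelihood equation for the shape parameter by bisection, and does not compute confidence intervals. All function names are hypothetical.

```python
import math

def weibull_mle(times, tol=1e-10):
    """MLE of (alpha, beta) for complete, positive failure times."""
    n = len(times)
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / n
    t_max = max(times)

    def score(beta):
        # Profile likelihood equation in beta. Weights are rescaled by
        # t_max so that t ** beta cannot overflow.
        w = [(t / t_max) ** beta for t in times]
        weighted = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted - 1.0 / beta - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:          # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:          # bisection; score is increasing in beta
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = t_max * (sum((t / t_max) ** beta for t in times) / n) ** (1.0 / beta)
    return alpha, beta

def weibull_loglik(times, alpha, beta):
    """Log-likelihood of complete data under the Weibull model."""
    return sum(
        math.log(beta / alpha) + (beta - 1) * math.log(t / alpha) - (t / alpha) ** beta
        for t in times
    )

# Hypothetical failure times, for illustration only.
times = [2.0, 5.0, 7.0, 11.0, 13.0, 20.0]
alpha_hat, beta_hat = weibull_mle(times)
max_loglik = weibull_loglik(times, alpha_hat, beta_hat)
```

The maximized log-likelihood obtained this way is the quantity that enters the AIC, AICc, and BIC formulas; the same comparison would need an analogous fit for each of the other distributions.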

Conclusions
From the practical work, it is concluded that the Gumbel distribution has lower AIC, AICc, and BIC values than the Weibull, exponential, and log-logistic distributions. We therefore conclude that the Gumbel distribution model gives the best estimate of the survival function for lung cancer. In addition, the expected values of the survival function under all of the estimation methods proposed in this article decrease progressively as the failure times of lung cancer patients increase; this means that there is an inverse relationship between failure time and the survival function.