Nonparametric predictive inference with parametric copula for survival analysis

Many real-world problems of statistical inference involve dependent bivariate data, including problems in survival analysis. This paper presents new nonparametric methods for predictive inference in survival analysis involving a future bivariate observation. The method combines Nonparametric Predictive Inference (NPI) for the marginals with a parametric copula to take the dependence structure into account. The proposed method uses a discretized version of the parametric copula. NPI is used for the marginals and requires only very straightforward computations. NPI is a frequentist approach that makes inferences about a future observation based on past data. The imprecision resulting from the proposed method provides robustness with regard to the assumed parametric copula for prediction. This is practical for small data sets, and for such data sets we suggest using a basic parametric copula. We investigate and discuss the performance of these methods through simulation studies. The method is further illustrated by an application in survival analysis using data sets from the literature.


Introduction
Survival analysis is a set of methods for analysing data in which the variable of interest is the time until the occurrence of an event of interest [1,2,3,4]. Examples include time to healing, time to death, length of stay in hospital, time to marriage and time to divorce, and the methods have also been applied to quantities such as money paid by health insurance or viral load measurements.
In this study, we focus on time-to-event data, such as the onset of disease in medicine and the time to failure of a mechanical system in engineering. Many methods have been developed to analyse such data, for example proportional hazards and accelerated failure time models. However, these methods assume that the observed failure times are independent [5,6,7]. When the observed failure times are dependent, as with clustered survival data or parallel events, these methods are unsuitable because the dependence in the data is not accounted for. For example, clustered survival data arise when event times belonging to the same cluster are correlated. Consider the diagnosis of healed hip fractures in dogs from [8]. In that study, the time to diagnosis is measured by two different imaging techniques: radiography (RX) and ultrasound (US). This results in two clustered diagnosis times, whose dependence structure should be taken into account.
Therefore, in this paper, we analyse survival data using Nonparametric Predictive Inference (NPI) for the marginals, combined with a parametric copula to take the dependence into account. NPI is a frequentist statistical framework for inference on a future observation based on past data observations [9]. In NPI, uncertainty is quantified through imprecision, that is, through lower and upper probabilities, and the method relies on only a few assumptions. Imprecise probability generalizes classical probability theory by allowing partial probability specifications, which is useful when the available information is insufficient or difficult to obtain. In this paper, we focus on predictive inference involving a future bivariate observation for survival data.
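The imprecision in NPI can be illustrated in the univariate case. The sketch below assumes Hill's assumption A(n), on which NPI is based. Under A(n), for n observations without ties, the future observation falls in each of the n + 1 intervals between consecutive ordered observations with probability 1/(n+1). The function name `npi_survival_bounds` and the default lower end 0 (suitable for positive survival times) are illustrative assumptions.

```python
import math


def npi_survival_bounds(data, t, lower_end=0.0):
    """NPI lower and upper probabilities for the event X_{n+1} > t.

    Assumes Hill's A(n): X_{n+1} lies in each of the n + 1 open intervals
    (x_(i-1), x_(i)), i = 1, ..., n + 1, with probability 1 / (n + 1),
    where x_(0) = lower_end and x_(n+1) = infinity. Ties are assumed absent.
    """
    n = len(data)
    xs = [lower_end] + sorted(data) + [math.inf]
    intervals = list(zip(xs[:-1], xs[1:]))  # the n + 1 intervals
    # Lower probability: mass of intervals lying entirely inside (t, inf).
    lower = sum(1 for a, _ in intervals if a >= t) / (n + 1)
    # Upper probability: mass of intervals that intersect (t, inf).
    upper = sum(1 for _, b in intervals if b > t) / (n + 1)
    return lower, upper


# Hypothetical survival times, for illustration only.
times = [2.0, 5.0, 9.0]
print(npi_survival_bounds(times, 4.0))  # (0.5, 0.75)
print(npi_survival_bounds(times, 5.0))  # (0.5, 0.5)
```

Between two observations the lower and upper probabilities differ by exactly 1/(n+1), so this imprecision shrinks as n grows. At an observed value the two bounds coincide.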

Bivariate data
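Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be bivariate random quantities with ordered observed values $x_{(1)} < \cdots < x_{(n)}$ and $y_{(1)} < \cdots < y_{(n)}$. Let $X_{n+1}$ and $Y_{n+1}$ represent the future observations of the random quantities $X$ and $Y$, respectively. Following [10] and [11], let $\tilde{X}_{n+1}$ and $\tilde{Y}_{n+1}$ represent the transformations of $X_{n+1}$ and $Y_{n+1}$ given by

$$\tilde{X}_{n+1} \in \left(\frac{i-1}{n+1}, \frac{i}{n+1}\right) \iff X_{n+1} \in \left(x_{(i-1)}, x_{(i)}\right), \qquad \tilde{Y}_{n+1} \in \left(\frac{j-1}{n+1}, \frac{j}{n+1}\right) \iff Y_{n+1} \in \left(y_{(j-1)}, y_{(j)}\right),$$

where $i, j = 1, 2, \ldots, n+1$, $x_{(0)} = y_{(0)} = -\infty$ and $x_{(n+1)} = y_{(n+1)} = \infty$. The transformation is from the real plane $\mathbb{R}^2$ into $[0, 1]^2$. The method that follows is applied to the transformed data.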
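The mapping of a future observation to a transformed interval can be sketched as follows. This is a minimal illustration that assumes no ties. It follows the interval form of the transformation in [10] and [11]: a future value falling between two consecutive ordered observations is mapped to an interval of width 1/(n+1) in [0, 1]. The function name is hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction


def npi_transformed_interval(observations, value):
    """Map X_{n+1} in (x_(i-1), x_(i)) to ((i - 1)/(n + 1), i/(n + 1)).

    `value` is a hypothetical future value that must not coincide with any
    observation; x_(0) and x_(n+1) are taken as -infinity and +infinity.
    Returns the index i and the transformed interval as exact fractions.
    """
    xs = sorted(observations)
    n = len(xs)
    if value in xs:
        raise ValueError("value must differ from every observation")
    # bisect_left counts the observations below `value`, which equals i - 1.
    i = bisect_left(xs, value) + 1
    return i, (Fraction(i - 1, n + 1), Fraction(i, n + 1))


print(npi_transformed_interval([2.0, 5.0, 9.0], 4.0))
# (2, (Fraction(1, 4), Fraction(1, 2)))
```

Exact fractions are used so that the interval endpoints i/(n+1) are reproduced without rounding.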

Copula
A copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. Copulas are used to describe the dependence between random variables. By Sklar's theorem [12], every joint cumulative distribution function $F$ of continuous random quantities $(X, Y)$ can be written as $F(x, y) = C(F_X(x), F_Y(y))$ for all $(x, y) \in \mathbb{R}^2$. Here $F_X$ and $F_Y$ are the continuous marginal distributions, and $C : [0, 1] \times [0, 1] \to [0, 1]$ is the unique copula associated with this joint distribution $F(x, y)$. Thus, a copula is a joint cumulative distribution function whose marginals are uniformly distributed on [0, 1] [12,13]. Using the NPI marginal cumulative distribution functions gives discretized uniform marginal distributions on [0, 1], which therefore correspond directly to the uniform marginals of a copula [10,11]. Consequently, the NPI marginals can easily be combined with any parametric copula to reflect the dependence structure.
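Sklar's representation can be illustrated with a closed-form copula. The sketch below uses the Clayton copula and exponential marginals purely as illustrative assumptions, since the method does not prescribe either. It builds a joint CDF as F(x, y) = C(F_X(x), F_Y(y)).

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if theta <= 0:
        raise ValueError("this sketch only covers theta > 0")
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def exponential_cdf(x, rate):
    """Marginal CDF of an exponential survival time with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def joint_cdf(x, y, rate_x, rate_y, theta):
    """Sklar's theorem: F(x, y) = C(F_X(x), F_Y(y))."""
    return clayton_copula(exponential_cdf(x, rate_x), exponential_cdf(y, rate_y), theta)


fx, fy = exponential_cdf(1.0, 1.0), exponential_cdf(2.0, 0.5)
f = joint_cdf(1.0, 2.0, 1.0, 0.5, theta=2.0)
# Clayton with theta > 0 lies between independence (u * v) and min(u, v).
print(fx * fy <= f <= min(fx, fy))  # True
```

Because C(u, 1) = u and C(1, v) = v, the joint CDF recovers each exponential marginal when the other argument tends to infinity.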

Combining NPI with a parametric copula
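The NPI marginals can be combined with a parametric copula whose parameter $\theta$ is estimated from the data, with estimate $\hat{\theta}$, as follows [10]:

$$h_{ij}(\hat{\theta}) = P_C\left(\tilde{X}_{n+1} \in \left(\frac{i-1}{n+1}, \frac{i}{n+1}\right),\ \tilde{Y}_{n+1} \in \left(\frac{j-1}{n+1}, \frac{j}{n+1}\right) \,\middle|\, \hat{\theta}\right),$$

where $i, j = 1, \ldots, n+1$ and $P_C(\cdot \mid \hat{\theta})$ represents the copula-based probability with estimated parameter $\hat{\theta}$. For a copula $C(\cdot, \cdot; \hat{\theta})$, this block probability equals

$$h_{ij}(\hat{\theta}) = C\left(\tfrac{i}{n+1}, \tfrac{j}{n+1}\right) - C\left(\tfrac{i-1}{n+1}, \tfrac{j}{n+1}\right) - C\left(\tfrac{i}{n+1}, \tfrac{j-1}{n+1}\right) + C\left(\tfrac{i-1}{n+1}, \tfrac{j-1}{n+1}\right).$$

As given in [10], the probabilities $h_{ij}(\hat{\theta})$ can be used to make inferences about an event $E(X_{n+1}, Y_{n+1})$ that involves the next observation $(X_{n+1}, Y_{n+1})$. Let $B_{ij} = (x_{(i-1)}, x_{(i)}) \times (y_{(j-1)}, y_{(j)})$. The lower probability for $E$ is the sum of $h_{ij}(\hat{\theta})$ over the blocks that lie inside $E$, and the upper probability is the sum over the blocks that intersect $E$:

$$\underline{P}(E) = \sum_{(i,j) \in L} h_{ij}(\hat{\theta}), \qquad \overline{P}(E) = \sum_{(i,j) \in U} h_{ij}(\hat{\theta}).$$

For example, suppose we are interested in the event $D_{n+1} = X_{n+1}/Y_{n+1} > d$, where $Y > 0$ is assumed. Take positive data with $x_{(0)} = y_{(0)} = 0$ and $x_{(n+1)} = y_{(n+1)} = \infty$. Then the index sets are $L = \{(i, j) : x_{(i-1)}/y_{(j)} \geq d\}$ and $U = \{(i, j) : x_{(i)}/y_{(j-1)} > d\}$, with $x/0 = \infty$. As functions of $d$, these lower and upper probabilities are lower and upper survival functions for $D_{n+1}$.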
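The combination step can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation in [10]:

- It uses the Clayton copula as a closed-form stand-in for the Normal copula used in the example below.
- It estimates θ by inverting Kendall's tau (θ = 2τ/(1 - τ) for Clayton). This estimation method is our assumption.
- It computes each h_ij as the copula probability of a block of width 1/(n+1).
- It sums h_ij over the blocks that lie inside, or intersect, the event X_{n+1}/Y_{n+1} > d, assuming positive data with x_(0) = y_(0) = 0.

All function names are hypothetical.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula, used here as a closed-form stand-in (theta > 0)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def kendall_tau(xs, ys):
    """Sample Kendall's tau (no tie correction)."""
    n = len(xs)
    s = 0
    for a in range(n):
        for b in range(a + 1, n):
            p = (xs[a] - xs[b]) * (ys[a] - ys[b])
            s += (p > 0) - (p < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def block_probabilities(n, theta):
    """h_ij = copula probability of ((i-1)/(n+1), i/(n+1)) x ((j-1)/(n+1), j/(n+1))."""
    if theta <= 0:
        raise ValueError("the Clayton stand-in needs theta > 0 (positive tau)")
    g = [k / (n + 1) for k in range(n + 2)]
    c = [[clayton_copula(gu, gv, theta) for gv in g] for gu in g]
    return [[c[i][j] - c[i - 1][j] - c[i][j - 1] + c[i - 1][j - 1]
             for j in range(1, n + 2)] for i in range(1, n + 2)]


def ratio_bounds(x, y, d, h):
    """Lower/upper probability of X_{n+1} / Y_{n+1} > d for positive data."""
    n = len(x)
    xs = [0.0] + sorted(x) + [math.inf]  # x_(0) = 0 for positive times
    ys = [0.0] + sorted(y) + [math.inf]
    lower = upper = 0.0
    for i in range(1, n + 2):
        for j in range(1, n + 2):
            # Smallest and largest ratio over the open block B_ij.
            smallest = xs[i - 1] / ys[j]
            largest = math.inf if ys[j - 1] == 0.0 else xs[i] / ys[j - 1]
            if smallest >= d:  # B_ij lies inside the event (set L)
                lower += h[i - 1][j - 1]
            if largest > d:  # B_ij intersects the event (set U)
                upper += h[i - 1][j - 1]
    return lower, upper


# Hypothetical paired survival times, for illustration only.
x = [1.2, 2.5, 3.1, 4.0, 5.5]
y = [1.0, 2.9, 2.2, 4.4, 6.0]
tau = kendall_tau(x, y)
theta = 2.0 * tau / (1.0 - tau)  # Clayton: tau = theta / (theta + 2)
h = block_probabilities(len(x), theta)
print(ratio_bounds(x, y, 1.0, h))
```

Replacing `clayton_copula` with any other copula CDF leaves the rest of the sketch unchanged, which mirrors how the NPI marginals can be paired with different parametric copulas.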

Predictive performance
We performed a simulation study to obtain indications of the predictive performance of this approach. Following [10], quantities are defined for $q \in (0, 1)$, and the proposed method is considered to perform well if two associated inequalities hold. The data were simulated from a copula family, and the proposed method with a parametric copula was applied to the simulated data.
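The data-generation step of such a study can be sketched as follows. The Clayton family, the exponential marginals, the parameter values and the function names are illustrative assumptions, not the settings of the study reported here. The sampler uses the standard conditional-inversion method. It draws u and w uniformly, then solves C(v | u) = w for v, which has a closed form for the Clayton copula.

```python
import math
import random


def _open_unit(rng):
    """Uniform draw from the open interval (0, 1)."""
    while True:
        r = rng.random()
        if r > 0.0:
            return r


def simulate_clayton_pairs(n, theta, rate_x, rate_y, seed=None):
    """Simulate n pairs with a Clayton(theta) copula and exponential marginals."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, w = _open_unit(rng), _open_unit(rng)
        # Closed-form inverse of the Clayton conditional distribution C(v | u).
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        v = min(v, math.nextafter(1.0, 0.0))  # guard against rounding to 1
        # Inverse exponential CDFs map the uniforms to survival times.
        pairs.append((-math.log1p(-u) / rate_x, -math.log1p(-v) / rate_y))
    return pairs


data = simulate_clayton_pairs(10, theta=2.0, rate_x=1.0, rate_y=0.5, seed=1)
```

The exponential inverse CDFs are increasing, so the simulated survival times keep the copula's dependence. Kendall's tau is θ/(θ + 2), which is 0.5 for θ = 2.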

Example: survival analysis data
These data describe the lengths of time required for patients with headaches to achieve relief. Each patient received a standard treatment and a new treatment on separate occasions, and the times are recorded to the nearest tenth of a minute [1]. Let $(X, Y)$ denote the bivariate variable, where $X$ is the time to relief of the $i$th patient on the first treatment and $Y$ is the time to relief of the $i$th patient on the second treatment. Suppose that we are interested in the ratio of these two values for the next observation, that is, in the event $D_{n+1} = Y_{n+1}/X_{n+1} > d$, where $X > 0$ is assumed.
For these data, we used the bivariate Normal copula $C(\cdot, \cdot; \theta)$, and the resulting lower and upper probabilities for the event are presented in Fig. 1.
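The Normal copula has no closed form, so the sketch below evaluates it numerically using only the Python standard library. In practice one would typically rely on a statistics library instead. For the Normal copula, the parameter θ is a correlation ρ. The sketch does three things:

- It computes C(u, v; ρ) from the standard bivariate normal distribution.
- It offers the Kendall's-tau inversion ρ = sin(πτ/2) as one possible estimator. This is an assumption, because the estimator used here is not stated.
- It forms the block probabilities h_ij on the grid of width 1/(n+1).

The integration range, step count and function names are illustrative choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def bivariate_normal_cdf(a, b, rho, steps=1000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho.

    Integrates phi(z) * Phi((b - rho z) / sqrt(1 - rho^2)) over z from -10 to a
    with composite Simpson's rule (steps must be even).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    lo = -10.0  # Phi(-10) is about 7.6e-24, so the truncated tail is negligible
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    step = (a - lo) / steps

    def f(z):
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((b - rho * z) / s)

    total = f(lo) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * step)
    return total * step / 3.0


def normal_copula(u, v, rho):
    """Normal copula C(u, v; rho) = Phi_2(Phi^-1(u), Phi^-1(v); rho)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return min(v, 1.0)
    if v >= 1.0:
        return u
    return bivariate_normal_cdf(STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v), rho)


def rho_from_kendall_tau(tau):
    """Invert tau = (2 / pi) * arcsin(rho), which holds for the Normal copula."""
    return math.sin(math.pi * tau / 2.0)


def normal_block_probabilities(n, rho):
    """h_ij for the (n + 1) x (n + 1) grid of blocks of width 1 / (n + 1)."""
    g = [k / (n + 1) for k in range(n + 2)]
    c = [[normal_copula(gu, gv, rho) for gv in g] for gu in g]
    return [[c[i][j] - c[i - 1][j] - c[i][j - 1] + c[i - 1][j - 1]
             for j in range(1, n + 2)] for i in range(1, n + 2)]


print(round(normal_copula(0.5, 0.5, 0.5), 6))  # 0.333333
```

The printed value can be checked against the known orthant probability C(1/2, 1/2; ρ) = 1/4 + arcsin(ρ)/(2π), which equals 1/3 for ρ = 0.5.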

Conclusion
The main conclusion we draw from the predictive performance study is that this method performs well for small values of n. For larger data sets, a nonparametric copula can be used in order to learn more about the dependence structure from the data. The imprecision of the proposed method provides sufficient robustness to give good frequentist properties, specifically for predictive inference. The results depend on the parametric copula used, the random quantity of interest and the percentiles studied. The imprecision decreases as the sample size increases.