Odds and Probabilities Estimation for the Survival of Breast Cancer Patients with Cancer Stages 2 & 3

Breast cancer is one of the leading causes of death in the Philippines. One out of four patients diagnosed with breast cancer dies within the first five years, and no less than 40 percent die within 10 years, a proportion that continues to rise over time. It is therefore very important to identify the factors that improve the survival of patients. The purpose of this study is to identify the best possible treatment or combination of treatments. The researchers considered four independent variables: Completed Surgery, Completed Chemotherapy, Completed Hormonotherapy and Completed Radiotherapy. The study is limited to 295 patients, 160 with stage 2 and 135 with stage 3 disease, whose data were gathered from three hospitals in Metro Manila. The names of the hospitals are withheld to protect the confidentiality of the data. Logistic Regression Analysis was used to identify the best treatment or combination of treatments and to estimate the odds, probabilities and odds ratios of patient survival.


Introduction
According to the Philippine Breast Cancer Network (PBCN), the Philippines has the highest incidence rate of breast cancer in Asia and registered the highest increase, 589%, among 187 countries over the 30-year period from 1980 to 2010. Breast cancer is the leading killer of women ages 35 to 54 worldwide.[1] Also, according to the Philippine Society of Medical Oncology in 2012, an estimated 3 out of 100 Filipino women will contract the disease before age 75 and 1 out of 100 will die before the age of 75.[2]

Breast cancer is a malignant tumor that starts in the cells of the breast. A malignant tumor is a group of cancer cells that can grow into surrounding tissues or spread to distant areas of the body. The disease occurs almost entirely in women, but men can get it, too.[3] The first symptom of breast cancer that most women notice is a lump or an area of thickened tissue in the breast. Most breast lumps are not cancerous, but it is always best to have them checked by a doctor. A woman should see her doctor if she notices any of the following: a lump or area of thickened tissue in either breast; a change in the size or shape of one or both breasts; discharge from either nipple; a lump or swelling in either armpit; dimpling of the skin of the breast; a rash on or around the nipple; or a change in the appearance of the nipple. Breast pain is not usually a symptom of breast cancer.[4]

Women with breast cancer may have questions about their prognosis and survival. Prognosis and survival depend on many factors. Only a doctor familiar with a person's medical history, type of cancer, stage, characteristics of the cancer, treatments chosen and response to treatment can put all of this information together with survival statistics to arrive at a prognosis. The researchers considered Completed Surgery, Completed Chemotherapy, Completed Hormonotherapy and Completed Radiotherapy as predictive factors because doctors believe that these factors can affect the survival rate of patients with breast cancer.[5]

Some treatments remove or destroy the disease within the breast and nearby tissues, such as lymph nodes. These include surgery, radiation therapy, chemotherapy, hormone therapy and targeted therapy; the researchers excluded targeted therapy from the study because too few patients underwent this treatment.[6] Surgery aims to remove the breast cancer with a margin (border) of normal tissue to reduce the risk of the cancer coming back in the breast (known as local recurrence) and to try to stop any spread elsewhere in the body. The amount of tissue removed depends on the area of the breast affected and the size of the cancer. Surgery is usually the first treatment for breast cancer, although sometimes chemotherapy or hormone therapy is offered first, either to begin treating the whole body or to shrink the cancer so that surgery may be less extensive.[7] Chemotherapy uses medicine to weaken and destroy cancer cells in the body, including cells at the original cancer site and any cancer cells that may have spread to another part of the body.[8] Hormone therapy (also called hormonal therapy, hormone treatment, or endocrine therapy) slows or stops the growth of hormone-sensitive tumors by blocking the body's ability to produce hormones or by interfering with hormone action. Tumors that are hormone-insensitive do not respond to hormone therapy. Hormone therapy for breast cancer is not the same as menopausal hormone therapy or female hormone replacement therapy, in which hormones are given to reduce the symptoms of menopause.[9] Radiotherapy is a treatment for cancer that uses carefully measured and controlled high-energy X-rays. In primary breast cancer it aims to destroy any cancer cells that may be left behind in the breast area after surgery.[10]

Breast cancer remains the leading cancer among women, accounting for 28 percent of the total cases. One out of four patients diagnosed with breast cancer dies within the first five years, and no less than 40 percent die within 10 years.[11] It is therefore very important to know the factors that improve the survival rate of patients with breast cancer.

Objective of the Study
The main objective of the study is to build a model that generates the probabilities and odds of surviving breast cancer using Logistic Regression Analysis. The study also aims to identify which of the treatments or combinations of treatments, namely surgery, chemotherapy, hormonal therapy and radiotherapy, are significant variables that strongly affect breast cancer survival.

Statement of the Problem
The study aims to use logistic regression to estimate the odds of survival of an individual with breast cancer, based on the factors considered to be significant contributors to surviving the disease. Specifically, it seeks answers to the following questions:

Scope and Limitation
The study is limited to the health records of 295 female breast cancer patients, 160 with stage 2 and 135 with stage 3 disease. The data were gathered from three hospitals in Metro Manila. The names of the hospitals are withheld to protect the confidentiality of the data. The researchers applied logistic regression analysis to identify key determinants of breast cancer survival from the qualifying health survey data and clinicopathological predictors of the patients.

Population and Covariates
The model for the odds and probabilities of patient survival was obtained from data on female breast cancer patients, gathered from three hospitals in Metro Manila and compiled by specialists and experts. The names of the hospitals are withheld to protect the confidentiality of the data. About 95% of the 160 patients with stage 2 disease survived and 5% did not. Of the 135 patients with stage 3 disease, 87.4% survived and 12.6% did not. Four treatments were considered for the overall model: Completed Surgery, Completed Chemotherapy, Completed Hormonotherapy and Completed Radiotherapy.
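The survival percentages above can be turned back into approximate patient counts and observed (unadjusted) odds of survival. The following is only an illustrative sketch: the counts are reconstructed by rounding the reported percentages and are not figures reported by the study, and the function names are hypothetical.

```python
def survival_counts(n_patients, pct_survived):
    """Approximate (survived, died) counts from a reported percentage.

    Assumption: the counts are recovered by rounding; the paper reports
    only percentages, so these are reconstructions, not source data.
    """
    survived = round(n_patients * pct_survived / 100)
    return survived, n_patients - survived


def observed_odds(survived, died):
    """Observed odds of survival = survivors / non-survivors."""
    return survived / died


stage2 = survival_counts(160, 95.0)   # stage 2: about 95% of 160 survived
stage3 = survival_counts(135, 87.4)   # stage 3: 87.4% of 135 survived

for label, (survived, died) in (("stage 2", stage2), ("stage 3", stage3)):
    print(label, survived, died, round(observed_odds(survived, died), 2))
```

These observed odds describe each stage as a whole; they do not separate patients by treatment, which is what the logistic model is for.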

Statistical Tool
The researchers used the Statistical Package for the Social Sciences (SPSS) to identify the significant predictors of patient survival and to compute the odds, probabilities and odds ratios by means of Logistic Regression. SPSS is one of the most popular statistical packages and can perform highly complex data manipulation and analysis with simple instructions.[12]

Logistic Regression
Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable, that is, one with only two possible values.[13] Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms.

Firstly, it does not need a linear relationship between the dependent and independent variables. Logistic regression can handle many kinds of relationships because it applies a non-linear log transformation to the predicted odds. Secondly, the independent variables do not need to be multivariate normal, although multivariate normality yields a more stable solution. The error terms (the residuals) do not need to be multivariate normally distributed either. Thirdly, homoscedasticity is not needed: logistic regression does not require the variances to be equal for each level of the independent variables. Lastly, it can handle ordinal and nominal data as independent variables; the independent variables do not need to be metric (interval or ratio scaled).

However, some assumptions still apply. First, the dependent variable must be binary for binary logistic regression and ordinal for ordinal logistic regression. Second, the outcome variable needs to be coded properly, that is, 1 for the desired outcome and 0 for the event not occurring. Third, only meaningful variables should be included. Fourth, the observations should be independent of one another, that is, the data points should not come from a dependent-samples design (e.g. before-and-after measurements or matched pairings), and there should be no multicollinearity among the independent variables. Fifth, logistic regression assumes linearity between the independent variables and the log odds. Lastly, it requires a fairly large sample size.[14]

The likelihood of the outcome is expressed through the odds of occurrence of an event. If P is the probability of an event, then (1 - P) is the probability of it not occurring.

\[ \text{Odds of success} = \frac{P}{1 - P} \]
The simple logistic model has the form:

\[ \mathrm{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta X \tag{1} \]

Taking the antilog of Equation 1 on both sides, one derives an equation to predict the probability of the occurrence of the outcome of interest as follows:

\[ \pi = P(Y = \text{outcome of interest} \mid X) = \frac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}} \tag{2} \]

where \(\pi\) is the probability of the outcome of interest or "event", \(\alpha\) is the Y intercept, \(\beta\) is the regression coefficient and \(e \approx 2.71828\) is the base of the system of natural logarithms. X can be continuous or categorical, but Y is always categorical. Extending the logic of the simple logistic regression to multiple predictors, one can construct a complex logistic regression for Y as follows:

\[ \mathrm{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{3} \]

Therefore,

\[ \pi = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}} \tag{4} \]

where \(\pi\) is once again the probability of the event, \(\alpha\) is the Y intercept, the \(\beta\)s are the regression coefficients, and the Xs are a set of predictors.[23]

An odds ratio is a measure of association between an exposure and an outcome. The odds ratio represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. Odds ratios are most commonly used in case-control studies; however, they can also be used in cross-sectional and cohort study designs.[15] The odds ratio is given by [16]:

\[ \mathrm{OR} = \frac{\text{odds}_2}{\text{odds}_1} \tag{5} \]

where \(\text{odds}_2\) is the odds of success at condition 2 and \(\text{odds}_1\) is the odds of success at condition 1.

The odds of survival of a stage 2 breast cancer patient who underwent neither surgery nor chemotherapy are given by \(e^{-2.866} \approx 0.057\), which also means that her probability of surviving breast cancer is Odds / (1 + Odds) = 0.057 / (1 + 0.057) ≈ 5.4%.
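The relationships between probability, odds and the logistic model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names are hypothetical, and the intercept of -2.866 and the coefficients 4.812 and 2.173 are the stage 2 values that appear elsewhere in this paper.

```python
import math


def odds_from_probability(p):
    """Odds = P / (1 - P), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Inverse of the above: P = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def logistic_probability(alpha, betas, xs):
    """Probability of the event for intercept alpha, coefficients betas
    and predictor values xs: pi = e^z / (1 + e^z), z = alpha + sum(b * x)."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return math.exp(z) / (1.0 + math.exp(z))


# Stage 2 patient with neither surgery nor chemotherapy (both predictors 0):
# only the intercept contributes to the log odds.
baseline_odds = math.exp(-2.866)
baseline_probability = probability_from_odds(baseline_odds)
print(f"odds = {baseline_odds:.4f}, probability = {baseline_probability:.1%}")
```

Calling `logistic_probability(-2.866, [4.812, 2.173], [0, 0])` gives the same baseline probability, since with every predictor set to 0 the exponent reduces to the intercept.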

Significant Parameters
By running the regression with an alpha of 0.05, the researchers found that, for stage 2, only two of the four initial independent variables are significant: Completed Surgery and Completed Chemotherapy.

For stage 3, the odds of survival of a patient who did not undergo radiotherapy are given by \(e^{1.329} \approx 3.777\), which also means that her probability of surviving breast cancer is 3.777 / (1 + 3.777) ≈ 79.1%.

Significant Factors
Using Logistic Regression Analysis, it was concluded that for stage 2 breast cancer patients the significant predictors are completed surgery and completed chemotherapy, with p-values of 0.001 and 0.015 respectively, while for stage 3 radiotherapy is the only significant factor, with a p-value of 0.05. The odds of survival of a stage 3 breast cancer patient who completed radiotherapy only are \(e^{1.329 + 1.022(1)} = 10.4961\), which corresponds to a probability of 10.4961 / (1 + 10.4961) ≈ 91.3%.
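The stage 3 computation can be reproduced with a short script. It is a sketch only: the intercept 1.329 and the radiotherapy coefficient 1.022 are taken from the expression above, while the constant and function names are hypothetical.

```python
import math

ALPHA_STAGE3 = 1.329        # intercept of the stage 3 model
BETA_RADIOTHERAPY = 1.022   # coefficient of Completed Radiotherapy


def stage3_odds(completed_radiotherapy):
    """Odds of survival; completed_radiotherapy is 1 (yes) or 0 (no)."""
    return math.exp(ALPHA_STAGE3 + BETA_RADIOTHERAPY * completed_radiotherapy)


def probability(odds):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


for flag in (0, 1):
    odds = stage3_odds(flag)
    print(f"radiotherapy={flag}: odds={odds:.4f}, probability={probability(odds):.1%}")
```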

Odds Ratio of Single and Combination of Treatments
The model predicts the odds ratio for completed surgery to be \(\mathrm{OR} = e^{-2.866 + 4.812(1)} / e^{-2.866} = 122.978\). Therefore, the odds of surviving stage 2 breast cancer for a patient who completed surgery are about 123 times those of a patient who underwent neither of these treatments. The model also predicts the odds ratio for completed chemotherapy to be \(\mathrm{OR} = e^{-2.866 + 2.173(1)} / e^{-2.866} = 8.7846\). Among the single treatments and their combination, the combination of both treatments performs best, with an odds ratio of more than a thousand: the odds of survival of a patient who completed both surgery and chemotherapy are more than a thousand times those of a patient who underwent neither treatment. It is computed as \(\mathrm{OR} = e^{-2.866 + 4.812(1) + 2.173(1)} / e^{-2.866} = 1080.3064\).
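The stage 2 odds ratios can be checked with the sketch below. It assumes the intercept -2.866 and the surgery coefficient 4.812 from the expressions above, and the chemotherapy coefficient 2.173 reported for the final model, which is the value that reproduces the stated odds ratios of 8.7846 and 1080.3064. The names are hypothetical.

```python
import math
from itertools import product

ALPHA = -2.866        # intercept of the stage 2 model
BETA_SURGERY = 4.812  # coefficient of Completed Surgery
BETA_CHEMO = 2.173    # coefficient of Completed Chemotherapy


def odds(surgery, chemo):
    """Odds of survival for 0/1 indicators of completed treatments."""
    return math.exp(ALPHA + BETA_SURGERY * surgery + BETA_CHEMO * chemo)


# Reference group: patients who completed neither treatment.
baseline = odds(0, 0)

# Odds ratio of every treatment combination against the reference group.
odds_ratios = {
    (s, c): odds(s, c) / baseline for s, c in product((0, 1), repeat=2)
}

for (s, c), ratio in odds_ratios.items():
    print(f"surgery={s} chemo={c} OR={ratio:.4f}")
```

Because the model has no interaction term, the combined odds ratio equals the product of the two single-treatment odds ratios.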

Odds Ratio of Single Treatment
The model predicts the odds ratio for completed radiotherapy to be \(\mathrm{OR} = e^{1.329 + 1.022(1)} / e^{1.329} \approx 2.779\). Therefore, the odds of surviving stage 3 breast cancer for a patient who completed radiotherapy only are almost three times those of a patient who did not undergo any treatment.
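The ratio above simplifies because the intercept cancels. A short derivation for a model with a single binary predictor X (here assumed to be coded 1 for completed radiotherapy and 0 otherwise):

\[ \mathrm{OR} = \frac{\text{odds}(X = 1)}{\text{odds}(X = 0)} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = e^{\beta}, \qquad e^{1.022} \approx 2.779 \]

The same cancellation holds with several binary predictors and no interaction terms, where the odds ratio for completing two treatments is \(e^{\beta_1 + \beta_2} = e^{\beta_1} e^{\beta_2}\), the product of the two single-treatment odds ratios.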

Conclusions and Recommendations
The researchers conclude that for stage 2 the combination of surgery and chemotherapy is the best treatment for giving a patient a better chance of surviving the disease, while for stage 3 radiotherapy is preferred. The study found that, for stage 2 patients, the odds of survival of a patient who underwent the combination of surgery and chemotherapy are more than a thousand times those of a patient who underwent neither of these treatments. For stage 3, the odds of survival of a patient who underwent radiotherapy are almost three times those of a patient who did not undergo any treatment.

The researchers recommend adding more independent variables, such as age, menopausal status, type of cancer, histological grade, lymphatic and vascular invasion, family history and size of the primary tumor, to enhance the model, because some studies have shown that these factors influence the survival rate of patients with breast cancer. The researchers also recommend including more participating hospitals in order to increase the sample size and obtain a better model.

Table 1. Profile of Stage 2 & 3 Breast Cancer Patients
The table shows the profile of the breast cancer patients with stage 2 or stage 3 disease. Majority of the stage 2

Running the Logistic Regression again, this time removing all the insignificant factors and leaving only the significant variables in the final model (Forward Stepwise Method), the results implied that Surgery and Chemotherapy are the significant variables, with regression coefficients of 4.812 and 2.173 respectively.

3.2. Logistic Model for Stage 2 Breast Cancer Patients
3.2.1. Significant Parameters
By running the Logistic Regression Analysis, only the completed surgery appears to be significant but the