Traditional Chinese Medicine (TCM) Diagnosis Model Building Based on Multi-label Classification

In this study, we propose a TCM diagnosis model that performs multi-label classification and gives clear diagnoses, together with the basis for diagnosis and differentiation, when the symptoms correspond to multiple diseases or syndromes. The model is implemented in three steps. First, a machine learning algorithm is chosen to train the TCM diagnosis model; the features of the training data are symptoms and the labels are diseases or syndromes. Second, given a number α (α > 1, α ∈ Z+), the model outputs the α diagnoses with the highest probability for the input symptoms as candidate diagnoses. Finally, differential diagnosis rules determine which candidate diagnoses should be reserved, thereby completing the multi-label classification. On our test dataset, with 10-fold cross-validation, the single-label classification achieved an average accuracy of 0.882, average precision of 0.974, average recall of 1.000 and average F1 score of 0.967; the multi-label classification achieved an average accuracy of 0.706, average micro precision of 0.934, average micro recall of 0.941 and average Hamming loss of 0.060. These results suggest that the model has good potential for auxiliary decision-making in clinical diagnosis and treatment.


Introduction
In traditional Chinese medicine (TCM) diagnosis, practitioners analyze the syndromes of patients and perform differential diagnosis based on information obtained through the four diagnostic methods of TCM, in order to ensure diagnostic accuracy and avoid misdiagnosis and missed diagnosis.
Machine learning can be used to build an intelligent auxiliary model of TCM clinical diagnosis and treatment for this process. Such a model can help practitioners apply complex medical knowledge to medical problems more efficiently during clinical decision-making, avoid overlooking important information and clues, and find more solutions for difficult and complicated diseases [1].
In recent years, advances in machine learning have provided many new methods for building auxiliary models of TCM clinical diagnosis. For example, neural networks and random forests have been used for modeling and show high accuracy in multi-class clinical diagnosis [2][3][4]. However, because multi-class classification outputs only one diagnosis, such models have difficulty handling clinical symptoms that correspond to multiple diagnoses, and their outputs lack understandable diagnostic evidence and differential diagnosis.
Therefore, following the symptom-based diagnosis and identification process of TCM, that is, inferring the corresponding TCM disease and syndrome from the clinical symptoms and then performing differential diagnosis, we propose a TCM diagnosis model with multi-label classification. The model provides the multiple TCM diagnoses that the patient's clinical symptoms may map to, performs differential diagnosis and removes wrong results, which makes it more consistent with the clinical diagnosis process.

Multi-label classification
In the traditional classification problem, one instance (feature vector) is associated with one label [5], that is, single instance, single label. For example, in the Treatise on Febrile Diseases, the clinical symptoms of floating pulse, headache, stiff neck and aversion to cold correspond to the disease of the Taiyang channel.
In reality, however, things are complex, and one instance can be associated with multiple labels at the same time [5], that is, single instance, multiple labels. For example, in Chinese Internal Medicine [6], the clinical symptoms of fever, slight aversion to wind, sweating, headache, red face, cough, sputum, nasal congestion, red tongue tip and rapid pulse correspond to a cold with the syndrome of exterior attacked by wind-heat. The diagnosis for this group of clinical symptoms is therefore multi-label, and its labels can be organized into two levels: the first level is the TCM disease, i.e., "cold", and the second level is the syndrome, i.e., "syndrome of exterior attacked by wind-heat". This example uses only two levels; in practice, more levels can be built as needed. Compared with single-label classification, a TCM diagnosis model based on multi-label classification therefore has a wider scope of application.
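One common way to feed such two-level labels to a learning algorithm is a multi-hot target vector with one position per label. The sketch below assumes a small hypothetical vocabulary (the wind-cold syndrome and "consumptive thirst" are added only to make the vector non-trivial), and the helper names `encode_labels` and `level_of` are illustrative, not part of the model described here.

```python
from typing import Dict, List

# Hypothetical two-level label vocabulary; the order fixes each label's
# position in the binary target vector.
LABEL_LEVELS: Dict[str, List[str]] = {
    "disease": ["cold", "consumptive thirst"],
    "syndrome": [
        "syndrome of exterior attacked by wind-heat",
        "syndrome of exterior attacked by wind-cold",
    ],
}
ALL_LABELS: List[str] = [lab for labels in LABEL_LEVELS.values() for lab in labels]


def encode_labels(assigned: List[str]) -> List[int]:
    """Return a multi-hot vector: 1 for every assigned label, 0 otherwise."""
    unknown = set(assigned) - set(ALL_LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if lab in assigned else 0 for lab in ALL_LABELS]


def level_of(label: str) -> str:
    """Look up which level (disease or syndrome) a label belongs to."""
    for level, labels in LABEL_LEVELS.items():
        if label in labels:
            return level
    raise KeyError(label)


# Single-label sample (disease only) vs multi-label sample (disease + syndrome).
single = encode_labels(["cold"])
multi = encode_labels(["cold", "syndrome of exterior attacked by wind-heat"])
print(single)  # [1, 0, 0, 0]
print(multi)   # [1, 0, 1, 0]
```

Keeping the level of each label alongside the vector is what later allows differential diagnosis to compare a candidate only with diagnoses at the same level.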

Related work
In recent years, great progress has been made in clinical diagnosis and treatment models based on multi-label classification. For example, reference [7] trained a TCM diagnosis and treatment model with a BP neural network using the sigmoid function as the output function. The features of the training samples were symptoms and the labels were traditional Chinese medicines, so after training the model could output several traditional Chinese medicines for a given set of symptoms. Reference [8] used a recurrent neural network for multi-label diagnostic modeling, providing diagnoses based on dynamically changing phenotypic features; since both the training and test samples were multi-label, this model also used a sigmoid output function. Reference [9] used deep learning with the one-vs-rest strategy for multi-label diagnostic modeling of the syndromes of TCM spleen and stomach diseases and achieved good results.
Compared with the above models, the model proposed in this paper follows the symptom-based diagnosis and differential diagnosis process of TCM. It first provides multiple possible candidate diagnoses for the input symptoms, then performs differential diagnosis and excludes the candidate diagnoses it considers wrong, thereby achieving multi-label diagnostic classification.
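References [7] and [8] both rely on a sigmoid output function for multi-label prediction. A minimal sketch of why a sigmoid output suits multi-label prediction better than a softmax output: softmax makes the labels compete for one unit of probability, whereas sigmoid scores each label independently, so several labels can pass a threshold at once. The logits and the 0.5 threshold are hypothetical, not values from those references.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Multi-class output: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: List[float]) -> List[float]:
    """Multi-label output: each label gets its own independent probability."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


logits = [2.0, 1.5, -3.0]  # hypothetical scores for three labels
print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in sigmoid(logits)])

# With sigmoid outputs, every label reaching the threshold is returned.
selected = [i for i, p in enumerate(sigmoid(logits)) if p >= 0.5]
print(selected)  # [0, 1]
```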

Diagnosis model of TCM with multi-label classification
In traditional Chinese medicine, symptoms are the main basis for determining diseases and identifying syndromes, and are therefore of great importance in TCM diagnosis [10]. Accordingly, this model takes clinical symptoms as the key basis for identifying diseases and syndromes, and extends the decision-making process of a TCM diagnosis model built with a single machine learning algorithm with symptom-based differential diagnosis to achieve multi-label classification.
In short, the decision-making process of the model is as follows. First, output α (α > 1, α ∈ Z+) diagnoses as candidate diagnoses based on the input clinical symptoms. Second, extract the symptoms corresponding to each candidate diagnosis from the input symptoms. Finally, perform differential diagnosis, that is, determine whether the symptoms corresponding to each candidate diagnosis also correspond to any possible diagnosis other than that candidate. If so, the candidate diagnosis is excluded; if not, it is reserved. The main task of this model is therefore to answer the following questions:
• Which candidate diagnoses may the currently input symptoms map to?
• Which of the current input symptoms correspond to each candidate diagnosis?
• Do the input symptoms corresponding to each candidate diagnosis also correspond to any possible diagnosis other than that candidate?

Methods
The model consists of two major parts: a TCM diagnosis model constructed by machine learning, and differential diagnosis controlled by logic rules. Its operation involves three processing steps: multiple diagnostic outputs, reverse extraction and result identification.
To explain the operation of the model in detail, we use the Syndrome of Ephedra Decoction, Syndrome of Daqinglong Decoction, Syndrome of Puerariae Decoction and Syndrome of Gui-zhi Decoction from the Treatise on Febrile Diseases as examples.

Multiple diagnostic outputs
Multiple diagnostic outputs are produced by the machine learning part, for example a diagnosis model built with a BP neural network. The features of the training set are symptoms, and each label is a disease or syndrome. For example, one sample has the label "Syndrome of Gui-zhi Decoction" and the features "fever, aversion to wind, headache, sweating, slow pulse". After training, the model outputs the α diagnoses with the highest probability for the input symptoms as candidate diagnoses. This process is described by Equations (1) to (3). Equation (1) is a binarization function that converts a set of clinical symptoms into X ∈ {0,1}^{1×N}, where N is the number of symptoms. Equation (2) is the output function of the TCM diagnosis model constructed by a machine learning algorithm and gives the probability of each diagnosis for X. Equation (3) sorts the diagnoses output by the model by probability and takes the α most probable ones as the candidate diagnoses.
For example, take as input t the main symptoms of Syndrome of Ephedra Decoction: [aversion to wind, fever, sweating, headache, rapid pulse], so X can be [ ... ]. Note, however, that this process effectively lowers the output threshold in order to output multiple diagnoses, so the candidates usually contain wrong diagnoses, especially when there is only one correct result.
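Equations (1) and (3) can be sketched as follows, assuming a hypothetical seven-symptom vocabulary. The probabilities are made-up numbers standing in for the output of Equation (2), since no trained model is reproduced here; `binarize` and `top_alpha` are illustrative names.

```python
from typing import Dict, List, Sequence


def binarize(symptoms: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Map a set of symptoms to a 0/1 vector X of length N = len(vocabulary)."""
    present = set(symptoms)
    return [1 if s in present else 0 for s in vocabulary]


def top_alpha(probs: Dict[str, float], alpha: int) -> List[str]:
    """Return the alpha diagnoses with the highest probability, best first."""
    if alpha <= 1:
        raise ValueError("alpha must be an integer greater than 1")
    return sorted(probs, key=lambda d: probs[d], reverse=True)[:alpha]


# Hypothetical symptom vocabulary (N = 7).
VOCAB = ["aversion to wind", "fever", "sweating", "headache",
         "rapid pulse", "slow pulse", "stiff neck"]
t = ["aversion to wind", "fever", "sweating", "headache", "rapid pulse"]
X = binarize(t, VOCAB)
print(X)  # [1, 1, 1, 1, 1, 0, 0]

# Hypothetical probabilities standing in for the trained model's output.
probs = {
    "Syndrome of Ephedra Decoction": 0.46,
    "Syndrome of Gui-zhi Decoction": 0.31,
    "Syndrome of Puerariae Decoction": 0.15,
    "Syndrome of Daqinglong Decoction": 0.08,
}
C = top_alpha(probs, alpha=2)
print(C)  # ['Syndrome of Ephedra Decoction', 'Syndrome of Gui-zhi Decoction']
```

Because both candidates are always returned for α = 2, the second one may well be wrong; that is exactly what the later differential diagnosis step has to catch.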

Reverse extraction
Reverse extraction takes the candidate diagnoses from the previous step and extracts, from the input symptoms, the symptoms that support each candidate. Equation (4) computes the intersection between the main symptoms of each candidate diagnosis and the input symptoms, yielding the symptoms Z_i that support candidate diagnosis C_i. T is a binary-coded matrix of the main symptoms of diseases and syndromes, which can be provided by domain experts; its rows follow the same order as the target classes of the machine learning model. C_i denotes the ith element of the candidate diagnoses C, and T_{C_i} is the binary code of the main symptoms of C_i. The combined process of (1) and (2) is shown in Figure 1.
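Since X and each row of T are binary vectors over the same symptom vocabulary, the intersection in Equation (4) can be read as an element-wise AND. In the minimal sketch below, the symptom order is hypothetical; the Ephedra row reuses the main-symptom list from the input example, the Gui-zhi row loosely follows the training-sample example, and `reverse_extract` is an illustrative name.

```python
from typing import Dict, List

Vector = List[int]


def reverse_extract(x: Vector, t_row: Vector) -> Vector:
    """Z_i = X AND T_{C_i}: keep only input symptoms that are main symptoms of C_i."""
    if len(x) != len(t_row):
        raise ValueError("X and the rows of T must share one symptom vocabulary")
    return [a & b for a, b in zip(x, t_row)]


# Hypothetical symptom order: aversion to wind, fever, sweating, headache,
# rapid pulse, slow pulse, stiff neck.
# Illustrative rows of T; in the model, T is provided by domain experts.
T: Dict[str, Vector] = {
    "Syndrome of Ephedra Decoction": [1, 1, 1, 1, 1, 0, 0],
    "Syndrome of Gui-zhi Decoction": [1, 1, 1, 1, 0, 1, 0],
}
X = [1, 1, 1, 1, 1, 0, 0]
candidates = ["Syndrome of Ephedra Decoction", "Syndrome of Gui-zhi Decoction"]

Z = {c: reverse_extract(X, T[c]) for c in candidates}
for c, z in Z.items():
    print(c, z)
```

Here the Gui-zhi candidate keeps only four supporting symptoms, because "rapid pulse" is not among its main symptoms in this illustrative T.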

Result identification
Result identification returns to the machine learning diagnosis model and then carries out the second part of the model, the differential diagnosis. First, Z_i is input into the diagnosis model to reacquire the probability of C_i, as shown in Figure 2. Next, the differential diagnosis shown in Figure 3 is activated to judge whether the symptoms Z_i correspond to any other possible diagnosis at the same level as C_i, which determines whether C_i is reserved. For example, C_0 (Syndrome of Ephedra Decoction) is at the syndrome level, so differential diagnosis is performed only among syndromes. The main processing is described by Equations (5) to (9). Equation (6) updates the probability of C_i based on y. Equation (8) obtains the intersection of Z_i and T, which is input into Equation (9) to determine whether Z_i also corresponds to diagnoses other than C_i. Equation (9) is the differential diagnosis rule shown in Figure 3 and outputs the determined diagnosis, where θ is a threshold on the output probability. In the differential diagnosis of Figure 3, the Hamming distance between Z_i and H_d (the intersection from Equation (8) for another diagnosis d) is computed. When the two are identical, the Hamming distance is 0, which shows that the symptoms supporting candidate diagnosis C_i equally support another possible diagnosis, so the model excludes C_i. Through differential diagnosis, the model determines which candidate diagnoses are reserved and thus achieves multi-label classification of TCM diagnosis.
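The result identification step can be sketched as a single decision function, under several assumptions that the text does not pin down: θ is applied to the re-scored probability of C_i, H_d is taken as Z_i AND T_d for every other diagnosis d at the same level, and a Hamming distance of 0 excludes the candidate. `keep_candidate` and `toy_proba` are hypothetical names, and `toy_proba` (normalized symptom overlap) is only a stand-in for the trained random forest or BP neural network.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[int]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def keep_candidate(
    candidate: str,
    z: Vector,
    T: Dict[str, Vector],
    level: Dict[str, str],
    predict_proba: Callable[[Vector], Dict[str, float]],
    theta: float,
) -> bool:
    """Return True to reserve the candidate diagnosis, False to exclude it."""
    # Re-score the candidate from its supporting symptoms Z_i only.
    if predict_proba(z)[candidate] < theta:
        return False
    for other, t_row in T.items():
        # Differential diagnosis only among diagnoses at the candidate's level.
        if other == candidate or level[other] != level[candidate]:
            continue
        h = [a & b for a, b in zip(z, t_row)]  # H_d = Z_i AND T_d
        if hamming(z, h) == 0:
            # Every supporting symptom is also a main symptom of `other`,
            # so these symptoms cannot distinguish the candidate.
            return False
    return True


# Illustrative data (hypothetical symptom order as in the earlier sketches).
T: Dict[str, Vector] = {
    "Syndrome of Ephedra Decoction": [1, 1, 1, 1, 1, 0, 0],
    "Syndrome of Gui-zhi Decoction": [1, 1, 1, 1, 0, 1, 0],
    "Syndrome of Puerariae Decoction": [1, 1, 0, 1, 0, 0, 1],
}
LEVEL = {d: "syndrome" for d in T}


def toy_proba(x: Vector) -> Dict[str, float]:
    """Hypothetical stand-in for the trained model: normalized symptom overlap."""
    scores = {d: sum(a & b for a, b in zip(x, row)) for d, row in T.items()}
    total = sum(scores.values()) or 1
    return {d: s / total for d, s in scores.items()}


X = [1, 1, 1, 1, 1, 0, 0]
candidates = ["Syndrome of Ephedra Decoction", "Syndrome of Gui-zhi Decoction"]
reserved = [
    c for c in candidates
    if keep_candidate(c, [a & b for a, b in zip(X, T[c])], T, LEVEL, toy_proba, theta=0.3)
]
print(reserved)  # ['Syndrome of Ephedra Decoction']
```

In this toy run the Gui-zhi candidate passes the probability threshold but is excluded, because all four of its supporting symptoms are also main symptoms of the Ephedra syndrome.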
For example, substitute Z_0 = [1, 1, 0, 1, 0, 1, ..., 1, ..., 0] from the above example, i.e., the binary code of the symptoms corresponding to Syndrome of Ephedra Decoction, into Equation (5) [ ... ]. Finally, substituting these results into Equation (9) gives an output of 0, meaning that the diagnosis of Syndrome of Ephedra Decoction is reserved.
In summary, the TCM diagnosis model proposed in this study decomposes multi-label classification into multiple single-instance single-label classification problems. It not only outputs diagnosis results but also performs differential diagnosis with strict control of the diagnostic basis, allowing physicians to understand from multiple aspects how the model reached its identification and diagnosis. The model is not limited to BP neural networks; other algorithms such as random forest can also be used. The modeling procedure is shown in Table 1.

Table 1. Algorithm
Step 1: Build model
Input: feature matrix X, label matrix Y
Select a machine learning algorithm and build the model from the input.
Step 2: Predict
Input: X, α, T, θ

Experiment
Because no public TCM disease and syndrome dataset is available, we evaluated the diagnostic performance of the model on a dataset of 103 clinical samples collected from experts of the Endocrinology Department of Nephropathy, which was used to train and test the model. The diseases in the dataset are consumptive thirst (8 cases), kidney water (7 cases) and Guange (dysuria and vomiting, 6 cases). The syndromes are kidney-yang deficiency (11 cases), water-rheum collecting internally (17 cases), blood stasis (21 cases) and deficiency of both qi and blood (33 cases). The samples cover two levels, disease and syndrome, and labels are established at each level. The disease-level labels are consumptive thirst, kidney water and Guange; the syndrome-level labels are kidney-yang deficiency, water-rheum collecting internally, blood stasis and deficiency of both qi and blood. The features of each sample are a set of clinical symptoms that corresponds to exactly one of these labels. For example, if the features are "ache of waist, nocturia, fatigue, white fur, deep pulse", the label is "kidney-yang deficiency".
The samples in the dataset are single-instance single-label: each sample corresponds to exactly one identified disease or syndrome label. Compared with samples that carry multiple labels, this gives every label clear symptoms as its basis and, because there is only one label, avoids omitted or baseless labels, which ensures the reliability of the dataset. The dataset was subject-indexed with the TCM Thesaurus in the Chinese medicine prescription intelligent analysis system (CPIAS), which normalizes its content so that features and labels with the same meaning are expressed in the same way. The symptoms, number of samples and category labels of the dataset are shown in Table 2 and Table 3.
In this study, single-label classification performance was evaluated with 10-fold cross-validation, using clinical symptoms as input and diagnosis results as output. Multi-label classification performance was evaluated on 15 collected samples whose labels follow the disease + syndrome pattern, for example the multi-label consumptive thirst + kidney-yang deficiency + blood stasis.
The proposed model was built with two machine learning algorithms for training the TCM diagnosis model: random forest and a BP neural network. In addition, using the same training set, we built baseline random forest and BP neural network models with the one-vs-rest strategy. All models use the same parameters; after repeated screening of the main modeling parameters, the best-performing set is shown in Table 4.
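The text does not say whether the folds were shuffled or stratified, so the following is only a sketch of plain shuffled 10-fold splitting for the 103 samples; `k_fold_splits` and the fixed seed are illustrative choices. With 103 = 10 × 10 + 3, three folds hold 11 test samples and seven hold 10, and every sample is tested exactly once.

```python
import random
from typing import List, Tuple


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Plain (unstratified) k-fold: shuffle indices once, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits(103, k=10)
print([len(test) for _, test in splits])  # [11, 11, 11, 10, 10, 10, 10, 10, 10, 10]
```

With only 6 to 33 cases per label, a stratified split that keeps label proportions similar across folds could be a reasonable alternative.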

Results
The single-label classification results were evaluated with accuracy, precision, recall and F1 score. Multi-label classification capability was evaluated with macro precision, macro recall, micro precision, micro recall and Hamming loss. All tests use 10-fold cross-validation.
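For reference, the multi-label metrics can be computed from binary label matrices as sketched below, using the standard definitions: micro averaging pools the counts over all labels, macro averaging takes the mean of per-label scores, and Hamming loss is the fraction of wrong (sample, label) entries. Treating 0/0 as 0.0 is an assumption; the label columns and matrices are hypothetical. The results should match scikit-learn's `precision_score`/`recall_score` with `average="micro"` or `"macro"` and `hamming_loss` on the same matrices.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def label_counts(y_true: Matrix, y_pred: Matrix, j: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for label column j."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[j], p_row[j]
        tp += int(t == 1 and p == 1)
        fp += int(t == 0 and p == 1)
        fn += int(t == 1 and p == 0)
    return tp, fp, fn


def ratio(num: int, den: int) -> float:
    """Safe division; a 0/0 ratio is reported as 0.0 (an assumption)."""
    return num / den if den else 0.0


def multilabel_metrics(y_true: Matrix, y_pred: Matrix) -> Dict[str, float]:
    n_labels = len(y_true[0])
    counts = [label_counts(y_true, y_pred, j) for j in range(n_labels)]
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    mismatches = sum(
        t != p for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)
    )
    return {
        # Micro averaging pools the counts of all labels before dividing.
        "micro_precision": ratio(tp, tp + fp),
        "micro_recall": ratio(tp, tp + fn),
        # Macro averaging computes each label's score, then takes the plain mean.
        "macro_precision": sum(ratio(c[0], c[0] + c[1]) for c in counts) / n_labels,
        "macro_recall": sum(ratio(c[0], c[0] + c[2]) for c in counts) / n_labels,
        # Hamming loss: fraction of (sample, label) entries predicted wrongly.
        "hamming_loss": mismatches / (len(y_true) * n_labels),
    }


# Hypothetical columns: consumptive thirst, kidney water,
# kidney-yang deficiency, blood stasis.
y_true = [[1, 0, 1, 1], [0, 1, 0, 1]]
y_pred = [[1, 0, 1, 0], [0, 1, 1, 1]]
print(multilabel_metrics(y_true, y_pred))
```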

Single-instance single-label classification
In single label classification, the average accuracy, average precision, average recall and average f1 score of 10-fold cross-validation are shown in Table 5.

Single-instance multi-label classification
The multi-label classification capabilities are evaluated by the average macro precision, average macro recall, average micro precision, average micro recall, and average hamming loss after 10-fold cross-validation. See Table 6 and Table 7.

Performance and analysis of the model
The one-vs-rest strategy can be used for multi-label classification. However, in the samples collected for this study, every sample corresponds to only one label. Because no similar sample appears during training, it is difficult to set the probability output threshold when the input symptoms correspond to multiple labels. If the threshold is too low, too many results are output and some of them are wrong diagnoses; if it is too high, diagnoses are omitted and the advantages of multi-label classification are lost.
In contrast, the model proposed in this paper does not rely entirely on the output probabilities to make the final diagnosis; instead, it improves the model's judgment method through algorithm adaptation. In the simulation, the model narrows the diagnosis scope, identifies the main symptoms of each syndrome and performs differential diagnosis according to the clinical symptoms. It selects the diseases or syndromes corresponding to the clinical symptoms, performs differential diagnosis and excludes the results it considers wrong, thereby determining the number of labels. In this way, it learns from single-instance single-label samples yet performs multi-label classification. In addition, when labeling the samples in this study, we annotated each label with its level (TCM disease or TCM syndrome), so that both manual labeling and model judgment operate at a clear level.
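The threshold trade-off can be illustrated with a small sketch. The per-label probabilities are hypothetical, and the true labels follow the multi-label example from the Experiment section; `one_vs_rest_output` and `error_counts` are illustrative names. Lowering θ admits a wrong diagnosis, while raising it drops correct ones.

```python
from typing import Dict, Set, Tuple


def one_vs_rest_output(probs: Dict[str, float], theta: float) -> Set[str]:
    """Output every label whose own one-vs-rest probability reaches theta."""
    return {label for label, p in probs.items() if p >= theta}


def error_counts(predicted: Set[str], truth: Set[str]) -> Tuple[int, int]:
    """Return (wrong diagnoses output, true diagnoses missed)."""
    return len(predicted - truth), len(truth - predicted)


# Hypothetical per-label probabilities for one multi-label input.
probs = {
    "consumptive thirst": 0.62,
    "kidney-yang deficiency": 0.41,
    "blood stasis": 0.18,
    "kidney water": 0.12,
    "deficiency of both qi and blood": 0.09,
}
truth = {"consumptive thirst", "kidney-yang deficiency", "blood stasis"}

for theta in (0.1, 0.3, 0.5):
    wrong, missed = error_counts(one_vs_rest_output(probs, theta), truth)
    print(f"theta={theta}: wrong={wrong}, missed={missed}")
# theta=0.1: wrong=1, missed=0
# theta=0.3: wrong=0, missed=1
# theta=0.5: wrong=0, missed=2
```

No single θ gives zero errors here, which is the situation the proposed differential diagnosis rules are meant to avoid.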

Problems of the model in multi-label classification and proposed solutions
Although the TCM diagnosis model proposed in this paper performs well in multi-label classification, it has a drawback: a heavy workload, because experts need to label the main symptoms corresponding to each diagnosis.
To address this, the model could be combined with the Gibbs sampling method used in topic models [11] to quantify the importance of symptoms for diagnosing diseases or syndromes, which would help experts label more easily and could reduce their labeling workload. Another problem is that the training set is small, which carries a risk of overfitting or underfitting. We will therefore explore small-sample learning methods such as one-shot learning [12] to address small-sample learning in TCM.

Conclusion
The intelligent TCM diagnosis model proposed in this paper not only provides the probabilities of results but also performs differential diagnosis and extracts syndrome evidence on the basis of multi-label classification, which makes its results more interpretable. It also provides an effective method for further building a syndrome differentiation and treatment model based on the principle "if the manifestations of a syndrome are present, its corresponding prescription can be used", giving such a model potential for auxiliary clinical diagnosis and treatment.