Tibetan interrogative sentence recognition and classification based on phrase features

The recognition of Tibetan interrogative sentences is a basic task in natural language processing, with wide application value in Tibetan syntactic analysis, semantic analysis, intelligent question answering, search engines and other research fields. Using interrogative pronouns as an entry point, this paper analyzes the phrase features before and after them, designs a Tibetan interrogative sentence recognition and classification model, and proposes a recognition and classification method based on phrase features. Experimental results show that the recognition accuracy, recall rate and F value of this method are 98.21%, 100.00% and 99.10%, respectively, and the average classification accuracy, recall rate and F value are 96.98%, 100.00% and 98.39%, respectively.


Introduction
With the development of computer technology, research on Tibetan natural language processing has gradually moved from the word level to the sentence level. The interrogative sentence is a common sentence pattern in Tibetan, and its recognition and classification is one of the key technologies for Tibetan syntactic analysis, semantic analysis, intelligent question answering, search engines and other tasks.
Methods for recognizing sentences and sentence patterns are commonly rule-based, statistical, or a combination of rules and statistics. Chinese sentence pattern recognition has been studied extensively: studies [1][2][3][4] use different methods to identify and classify Chinese subjective sentences, explanatory opinion sentences, opinion sentences and graceful sentences, all with good experimental results. For Tibetan, because sentences have no obvious boundary symbol, current research on sentences and sentence patterns focuses mainly on sentence boundary recognition [5][6][7][8][9][10][11][12][13][14], which provides a theoretical basis for sentence-level research on Tibetan. Research on Tibetan sentence pattern recognition and classification has not yet been reported. Prior research shows that identifying and classifying different sentence patterns can improve the performance of question answering systems. This paper therefore analyzes the phrase features before and after interrogative pronouns and uses them to recognize and classify Tibetan interrogative sentences.

Tibetan interrogative sentence recognition and classification model
In written Tibetan, every interrogative sentence contains at least one interrogative pronoun with distinct structural features. Taking interrogative pronouns as the starting point, this paper designs the Tibetan interrogative sentence recognition and classification model based on phrase features shown in Fig. 1.
The model consists of two modules: a phrase feature analysis module and an interrogative sentence recognition module. The phrase feature analysis module has two parts, interrogative word recognition and phrase feature analysis. The interrogative word recognition part identifies interrogative pronouns and labels each one as ry1, ry2 or ry3. The phrase feature analysis part analyzes the phrases before and after each ry to obtain the phrase feature of the corresponding sentence, one of Feature1 to Feature8. The interrogative sentence recognition module then uses these phrase features to recognize and classify Tibetan interrogative sentences.

An analysis of the features of Tibetan interrogative sentences
The interrogative sentence is a sentence pattern defined by the mood of the sentence: it asks others about the type and nature of the thing in question [15][16][17][18]. Compared with declarative, imperative and exclamatory sentences, Tibetan interrogative sentences differ clearly in mood and emotional color. However, current techniques cannot identify interrogative sentences from mood and emotional color. By analyzing the structural features of Tibetan interrogative sentences, we find that every interrogative sentence contains at least one interrogative word (tagged ry, interrogative pronoun, in the part-of-speech tag set, and called an interrogative pronoun below). Tibetan interrogative pronouns are clearly defined and few in number. To analyze the features of interrogative sentences, we divide interrogative pronouns into three categories, as shown in Table 1. As Table 1 shows, every pronoun except "ནམ" belongs to exactly one category, so there is no multi-category problem. The category of "ནམ" can be determined from its position and context: when it appears after a verb, adjective or auxiliary verb, it belongs to ry1; otherwise it belongs to ry2.
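The categorization rule above can be sketched in Python. This is an illustration under assumptions, not the paper's code: tokens are (word, POS tag) pairs, the tag names "v", "a" and "aux" for verb, adjective and auxiliary verb are hypothetical, "after" is read as "immediately after", and the `lexicon` argument is a hypothetical stand-in for Table 1, whose entries are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

NAM = "ནམ"
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)
# Assumed tag names for verb, adjective and auxiliary verb; adjust to the real tag set.
RY1_CONTEXT_TAGS = {"v", "a", "aux"}


def ry_category(tokens: List[Tuple[str, str]], index: int, lexicon: Dict[str, str]) -> str:
    """Return 'ry1', 'ry2' or 'ry3' for the interrogative pronoun at tokens[index]."""
    # Strip a trailing tsheg so that "ནམ་" and "ནམ" are treated alike.
    word = tokens[index][0].rstrip(TSHEG)
    if word == NAM:
        # ནམ is the only ambiguous pronoun: ry1 directly after a verb,
        # adjective or auxiliary verb, ry2 everywhere else.
        prev_tag: Optional[str] = tokens[index - 1][1] if index > 0 else None
        return "ry1" if prev_tag in RY1_CONTEXT_TAGS else "ry2"
    # Every other pronoun has exactly one category, looked up in the Table 1 lexicon.
    return lexicon[word]
```

Keeping the single ambiguous case as an explicit branch and everything else as a dictionary lookup mirrors the observation that only "ནམ" needs context.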
Using interrogative pronouns as an entry point, we analyze the grammatical and structural characteristics of Tibetan interrogative sentences. According to how interrogative pronouns combine with their contexts, we divide Tibetan interrogative sentences into eight types: general interrogative sentences (TIS1), emphatic interrogative sentences (TIS2), specific interrogative sentences (TIS3), alternative interrogative sentences (TIS4), yes-no interrogative sentences (TIS5), ཨེ interrogative sentences (TIS6), self-questioning and self-answering sentences (TIS7) and sentences with multiple interrogative pronouns (TIS8). The eight types differ in the phrases that can occur before and after the interrogative pronoun; Table 2 shows the phrase features of each type, obtained statistically.
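A minimal Python sketch of the type inventory can make the labels concrete. The enum names follow the text; the helper `type_for_feature` and its assumption that phrase feature Feature<k> identifies type TIS<k> are illustrative and not stated explicitly in the source.

```python
from enum import Enum


class TIS(Enum):
    """The eight Tibetan interrogative sentence types described in the text."""
    TIS1 = "general"
    TIS2 = "emphatic"
    TIS3 = "specific"
    TIS4 = "alternative"
    TIS5 = "yes-no"
    TIS6 = "ཨེ interrogative"
    TIS7 = "self-questioning and self-answering"
    TIS8 = "multiple interrogative pronouns"


def type_for_feature(feature: str) -> TIS:
    """Map a phrase feature label such as 'Feature3' to its sentence type."""
    # Assumption: Feature<k> identifies TIS<k>; this pairing is hypothetical.
    if not feature.startswith("Feature"):
        raise ValueError(f"not a feature label: {feature!r}")
    return TIS["TIS" + feature[len("Feature"):]]
```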

Tibetan interrogative sentence recognition and classification based on phrase features
Interrogative pronouns and their surrounding phrases have different features in different types of Tibetan interrogative sentences. Taking the interrogative pronoun ry as the entry point, we analyze the phrase features before and after ry and design two algorithms: a Tibetan interrogative phrase feature analysis algorithm (Algorithm 1) and a Tibetan interrogative sentence recognition and classification algorithm based on phrase features (Algorithm 2). Algorithm 1 (Feature_analysis) analyzes the phrase features before and after the interrogative pronoun ry and matches them against the phrase features of the eight interrogative sentence types in Table 2. Algorithm 2 calls Algorithm 1 for each sentence in the Tibetan text Input_file that contains an interrogative pronoun ry, obtains the corresponding phrase feature Feature, recognizes the interrogative sentence from the returned Feature, and classifies it into the corresponding interrogative sentence library, TIS1 to TIS8.
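The control flow of the two algorithms can be sketched in Python as follows. Everything concrete here is an assumption for illustration, not the paper's implementation: a sentence is a list of (surface, label) chunks, interrogative pronouns carry ry1/ry2/ry3, other chunks carry hypothetical phrase labels such as NP or VP, `PHRASE_RULES` is a hypothetical stand-in for Table 2, sentences with several ry chunks are mapped to Feature8 (after the TIS8 type name), and Feature<k> is routed to library TIS<k>.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Chunk = Tuple[str, str]  # (surface text, phrase label or ry type)
RY_TYPES = {"ry1", "ry2", "ry3"}

# HYPOTHETICAL stand-in for Table 2: (phrase before ry, ry type, phrase after ry)
# -> feature label. Real entries would be copied from Table 2.
PHRASE_RULES: Dict[Tuple[str, str, str], str] = {
    ("NP", "ry2", "VP"): "Feature3",
    ("VP", "ry1", "<EOS>"): "Feature1",
}


def feature_analysis(sentence: List[Chunk]) -> Optional[str]:
    """Sketch of Algorithm 1: match the phrases around ry against the rules."""
    positions = [i for i, (_, label) in enumerate(sentence) if label in RY_TYPES]
    if not positions:
        return None
    if len(positions) > 1:
        # Assumption: several interrogative pronouns signal Feature8 (TIS8).
        return "Feature8"
    i = positions[0]
    before = sentence[i - 1][1] if i > 0 else "<BOS>"
    after = sentence[i + 1][1] if i + 1 < len(sentence) else "<EOS>"
    return PHRASE_RULES.get((before, sentence[i][1], after))


def recognize_and_classify(corpus: List[List[Chunk]]) -> Dict[str, List[List[Chunk]]]:
    """Sketch of Algorithm 2: route recognized sentences into libraries TIS1-TIS8."""
    libraries: Dict[str, List[List[Chunk]]] = defaultdict(list)
    for sentence in corpus:
        feature = feature_analysis(sentence)  # None: not recognized as interrogative
        if feature is not None:
            # Assumption: Feature<k> corresponds to library TIS<k>.
            libraries["TIS" + feature[len("Feature"):]].append(sentence)
    return dict(libraries)
```

A sentence whose context matches no rule is treated as non-interrogative, which is also where a declarative sentence with an interrogative pronoun and matching phrase context would be misrecognized.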

Experimental data description and experimental design
To verify experimentally the performance of the phrase-feature-based Tibetan interrogative sentence recognition and classification method, 5,200 sentences, including declarative, interrogative, exclamatory and imperative sentences, were selected as the experimental corpus from the Tibetan corpus built by our research group. Of these, 1,100 are interrogative sentences. The distribution of interrogative sentence types is shown in Fig. 2.

Fig. 2. Distribution of interrogative sentence types in the experimental corpus.
Because the phrase-feature-based method recognizes Tibetan interrogative sentences and classifies them at the same time, we designed two groups of experiments to evaluate its recognition performance and its classification performance. Since no research on Tibetan sentence pattern recognition has been reported, we could not compare our experimental results with those of other studies. The results of the two groups are shown in Tables 3 and 4, respectively.

In Tables 3 and 4, C denotes the interrogative sentence type, Si the number of interrogative sentences of the i-th type (i = 1, 2, ..., 8) in the experimental corpus, and Mi the number of sentences of the i-th type identified by our algorithm. Si∩Mi denotes the intersection of Si and Mi, that is, the number of sentences the algorithm classifies correctly.

Experiment 1. Table 3 shows that the phrase-feature-based recognition method achieves good recognition performance that largely meets practical needs. Some declarative sentences contain interrogative pronouns and have the same phrase features as interrogative sentences; such declarative sentences are misidentified as interrogative, which is why Mi > Si occurs.

Experiment 2. According to the experimental data in Table 4, the average classification accuracy, recall rate and F value for Tibetan interrogative sentences reach 96.83%, 100% and 98.38%, respectively.
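For reference, the per-type metrics can be written with the notation above, treating S_i and M_i as sets of sentences. This assumes the standard definitions; the reported "accuracy" appears to correspond to precision, since false positives (Mi > Si) lower it while recall stays at 100%. The last line checks the recognition F value from the reported accuracy and recall.

```latex
P_i = \frac{|S_i \cap M_i|}{|M_i|}, \qquad
R_i = \frac{|S_i \cap M_i|}{|S_i|}, \qquad
F_i = \frac{2 P_i R_i}{P_i + R_i}

F_{\text{recognition}} = \frac{2 \times 0.9821 \times 1.0000}{0.9821 + 1.0000} \approx 0.9910
```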
For all types except TIS3, TIS7 and TIS8, every classification evaluation metric reaches 100%, which indicates that phrase-feature-based classification of Tibetan interrogative sentences also performs well. Classification of TIS3, TIS7 and TIS8 is affected because the phrase features of these three types coincide with those of declarative sentences that contain interrogative pronouns.
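The averaged classification figures can be computed from per-type counts |Si|, |Mi| and |Si∩Mi|. The sketch below assumes a macro average, i.e. the unweighted mean of per-type precision, recall and F value; the text does not state which averaging scheme was used, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def prf(gold: int, predicted: int, correct: int) -> Tuple[float, float, float]:
    """Precision, recall and F value from |S_i|, |M_i| and |S_i ∩ M_i|."""
    p = correct / predicted if predicted else 0.0
    r = correct / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Unweighted mean of per-type P, R and F (assumed averaging scheme)."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    mean_p = sum(s[0] for s in scores) / n
    mean_r = sum(s[1] for s in scores) / n
    mean_f = sum(s[2] for s in scores) / n
    return mean_p, mean_r, mean_f
```

Note that the mean of per-type F values is generally not equal to the F value computed from the averaged precision and recall, so the chosen scheme affects the reported average F.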

Conclusion
Identifying different sentence patterns and studying them in a targeted way is a key technology and prerequisite for syntactic analysis, text classification and sentiment analysis. This paper designs a Tibetan interrogative sentence recognition model based on phrase features, analyzes the phrase features before and after interrogative pronouns, and proposes a Tibetan interrogative sentence recognition and classification method based on phrase features. To verify the effectiveness of this method, we designed two groups of experiments to examine the performance of interrogative sentence recognition and classification. The experimental results show that the recognition accuracy, recall rate and F value for interrogative sentences are 98.21%, 100.00% and 99.10%, respectively, and the average classification accuracy, recall rate and F value are 96.83%, 100% and 98.38%, respectively.
The National Natural Science Foundation of China (618666032,61966031,61662061,61063033), National Key Research and Development Project, Projects funded by the Department of science and