Sentence Similarity Research Based on Chinese FrameNet and Semantic Dependency Parsing

To address the problems in comparing the similarity of Chinese sentences, this paper proposes a sentence similarity computing method that combines the advantages of the Chinese FrameNet method and the semantic dependency parsing method. The method is based on frame semantics. First, it analyzes and calculates the similarity of the two frames. Next, it analyzes the semantic dependency relationships among the core frame elements from two aspects: word meaning and the overall dependency relation. Finally, it puts forward a fused sentence similarity calculation formula. Experimental results show that, compared with methods based on the vector space model, HowNet, and frame semantic parsing, this method achieves higher accuracy in judging the similarity of Chinese sentences.


Introduction
Sentence similarity research has shown great potential in fields such as machine translation [1], deep information retrieval [2], and automatic question answering [3]. Starting from the Chinese FrameNet, this paper first identifies the target words of the given sentences; next, it calculates the similarity of the frame names of the two sentences; then it calculates the similarity of the semantic dependency relations among the core frame elements; finally, it measures the similarity of the two sentences.

Related Work
At present, research on Chinese sentence similarity falls mainly into two categories: one is the word-based vector space model, and the other is the semantic sentence similarity method based on words and syntactic structure. Ideally, text similarity would be studied on the basis of both syntax and semantics, but purely semantic research is difficult. Therefore, some experts have proposed semantic similarity methods based on grammatical structure. Among these, the more mature methods are the Chinese FrameNet method and the Semantic Dependency Parsing method.

Chinese FrameNet (CFN) is developed by Shanxi University. Taking Fillmore's frame semantics theory [4] and the FrameNet project of the University of California, Berkeley [5] as references, it constructs a frame semantic resource based on a Chinese corpus, and it includes frames, word elements, frame relationships, examples and chapters [6].
Figure 1 shows the example "The beggar pursues a dog.", annotated with the CFN automatic semantic role labeling tool. The target word (Tgt) in the example is "pursue" and the frame name is "following". "Beggar" is the Theme of this frame, which is usually a living entity, and the Cotheme represents another moving object.
Semantic Dependency Parsing (SDP) [7] analyzes the semantic associations between linguistic components within a sentence and represents these associations as dependency structures. Figure 2 shows the semantic dependency parse of the example "he hears the explosion". The main dependency relations in this example are Affection (Aft) and Content (Cont), namely "hears→he" and "hears→explosion".
MATEC Web of Conferences 139, 00028 (2017)

Sentence similarity computation based on Chinese FrameNet and Semantic Dependency Parsing
The above analysis shows that the Chinese FrameNet method has difficulty comparing the similarity between different elements. For example, the example sentence shown in Figure 1 and "a dog chases a beggar." both have the frame "following", but the meanings of the two sentences are completely different. The SDP method, in turn, lacks a macro understanding of the whole sentence. For example, the example sentence shown in Figure 2

Similarity between frames
The CFN method first determines the target word of the sentence, and then determines the frame that the target word evokes. In practical applications, we find that the library of 323 frames accumulated over the years by the Chinese FrameNet and semantic computing research team of Shanxi University contains many similar frames. If the target words of two sentences evoke similar frames, the two sentences may also be highly similar.
Li Feng [8] proposed a semantic similarity algorithm for words, given as formula (1).
The similarity between frames can be calculated in accordance with formula (1). Because CFN classifies frames when constructing its frame semantic resources and groups word elements of the same type into one frame, most frames are semantically distinct. As a result, apart from a few highly similar frames, the similarity between most frame names is less than 0.4, and the similarity between more than half of the frames is less than 0.1.

Similarity of sentence core frame elements based on Semantic dependency
If the two sentences have the same frame, or their frames have a high degree of similarity as calculated above (see "Similarity between frames"), the semantic dependency relationships among the core frame elements within the sentences need to be examined further.
The similarity judgment of intra-sentential semantic dependency is divided into two parts: single semantic dependency computation based on the meaning of words, and whole semantic dependency relation computation based on the vector space model (VSM).

Single semantic dependency computation based on the meaning of words
Semantic dependency parsing can be performed with the Language Technology Platform of Harbin Institute of Technology [9]. Figure 3 shows the analysis results for the example sentence shown in Figure 1 and "a dog chases a beggar.". Both sentences belong to the "following" frame, and their target words are "pursue/v" and "chase/v".
The core frame elements are "beggar" and "dog". According to the analysis of semantic dependency relations, the Agt of the two sentences are "beggar" and "dog" respectively; the Pat of the sentence in Figure 1 is "dog", and the Datv of "a dog chases a beggar." is "beggar".
For the two sentences, the similarity of the specific words in the same or similar semantic dependency relations can be computed with formula (1) and is denoted Sim(R_1i, R_2i). Here "1" and "2" denote the first and second sentence respectively, and "i" denotes the i-th corresponding or similar semantic dependency in the two sentences.
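The pairing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it pairs words only when the two sentences share the same dependency label (the "similar relation" case is left out), and `toy_word_sim` is a hypothetical stand-in for the word similarity of formula (1), which is not reproduced here.

```python
from typing import Callable, Dict


def pairwise_dependency_similarity(
    deps1: Dict[str, str],
    deps2: Dict[str, str],
    word_sim: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return Sim(R_1i, R_2i) for every dependency label found in both sentences.

    deps1 / deps2 map a dependency label (e.g. "Agt") to the core frame
    element that fills it. Only identical labels are paired (assumption).
    """
    shared_labels = sorted(set(deps1) & set(deps2))
    return {label: word_sim(deps1[label], deps2[label]) for label in shared_labels}


def toy_word_sim(a: str, b: str) -> float:
    # Hypothetical placeholder for formula (1): exact match only.
    return 1.0 if a == b else 0.0


# Dependencies of the two example sentences as listed in the text.
sentence1 = {"Agt": "beggar", "Pat": "dog"}   # "The beggar pursues a dog."
sentence2 = {"Agt": "dog", "Datv": "beggar"}  # "a dog chases a beggar."

print(pairwise_dependency_similarity(sentence1, sentence2, toy_word_sim))
```

With the exact-match placeholder, the only shared label is Agt and its fillers differ, which reflects the point of the example: the two sentences share a frame but not the roles of their core elements.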

The whole semantic dependency relation computation based on VSM
The number of times a semantic dependency "r" appears in a sentence is taken as its weight "W". All the semantic dependencies in sentences S_1 and S_2 are then represented by the vectors R_1 and R_2, as shown in formula (2): each semantic dependency corresponds to one dimension of the vector space model, and the value in that dimension is the weight "W" of the dependency.
According to the cosine method of the vector space model [10], the similarity of the semantic dependencies of the two sentences can be approximated by the cosine of the angle between R_1 and R_2, as in formula (3).
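The vector construction and cosine comparison can be sketched in a few lines. This is an illustrative sketch under the assumption that each sentence is given as a flat list of dependency labels; the function name and input format are not from the paper.

```python
import math
from collections import Counter
from typing import Iterable


def dependency_vector_similarity(rels1: Iterable[str], rels2: Iterable[str]) -> float:
    """Cosine similarity between dependency-count vectors R_1 and R_2.

    Each distinct dependency label is one dimension; its value is the
    number of times the label occurs in the sentence (the weight W).
    """
    w1, w2 = Counter(rels1), Counter(rels2)
    dot = sum(w1[label] * w2[label] for label in w1.keys() & w2.keys())
    norm1 = math.sqrt(sum(v * v for v in w1.values()))
    norm2 = math.sqrt(sum(v * v for v in w2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty sentence shares nothing with any other
    return dot / (norm1 * norm2)


# Labels taken from the examples in the text.
print(dependency_vector_similarity(["Agt", "Pat"], ["Agt", "Datv"]))
print(dependency_vector_similarity(["Aft", "Cont"], ["Aft", "Cont"]))
```

Because only label counts are compared, two sentences with identical dependency labels score the maximum here even if their words differ, which is why the text combines this measure with the word-level comparison.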

Sentence similarity analysis
If the structure of the sentence were taken into account, the time and cost would be enormous, so this paper only
This paper adopts the detection method proposed in [11]. Each time, one sentence is extracted from the standard set and its similarity to every sentence in the test set is calculated; the resulting similarities are then ranked from largest to smallest. If the 2~3 sentences with the highest similarity are similar sentences of the standard sentence, the result is judged correct. The correct rate is calculated as shown in formula (5).
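The detection procedure and the correct-rate ratio of formula (5) can be sketched as below. The reading of "2~3 sentences" as "at least `min_hits=2` of the top `top_k=3` ranked sentences" is an assumption, as are the function names; `jaccard` is only a toy similarity used to make the sketch runnable and is not one of the compared methods.

```python
from typing import Callable, Dict, List, Set


def correct_rate(
    standard_set: List[str],
    test_set: List[str],
    similar: Dict[str, Set[str]],
    sim: Callable[[str, str], float],
    top_k: int = 3,
    min_hits: int = 2,
) -> float:
    """Fraction of standard sentences whose top-ranked test sentences are correct."""
    if not standard_set:
        return 0.0
    correct = 0
    for sentence in standard_set:
        # Rank the whole test set by similarity, largest first.
        ranked = sorted(test_set, key=lambda t: sim(sentence, t), reverse=True)
        gold = similar.get(sentence, set())
        hits = sum(1 for t in ranked[:top_k] if t in gold)
        if hits >= min_hits:
            correct += 1
    return correct / len(standard_set)


def jaccard(a: str, b: str) -> float:
    # Toy word-overlap similarity, for demonstration only.
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


standard = ["a b c"]
tests = ["a b c d", "a b", "a x c", "x y z", "q r"]
gold = {"a b c": {"a b c d", "a b", "a x c"}}
print(correct_rate(standard, tests, gold, jaccard))
```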
r = (number of sentences with correct test results) / (number of sentences tested)    (5)
We use the VSM method, the HowNet calculation method proposed in [12], the FrameNet semantic analysis method proposed in [13], and the method

Conclusion
This paper presents a calculation method of sentence similarity,which combines Chinese

Figure 2. Semantic dependency analysis
and "I smell dynamite.". The main semantic dependencies in both examples are Aft and Cont, but their frames are different, and the semantics of the two sentences are completely different. Therefore, we propose a sentence similarity method based on Chinese FrameNet and Semantic Dependency Parsing.

Figure 3. Semantic dependency analysis results of two example sentences
considers the semantic frame and the dependency relations among the core frame elements of the sentences. The dependency relation between the semantic frame and the core frame elements is called an effective collocation. Restricting the analysis to effective collocations not only eliminates the interference of non-core frame elements with the calculation results, but also reduces the computational complexity. Based on the analysis in 3.1 and 3.2, this paper presents a similarity computation for Chinese sentences that combines Chinese FrameNet and Semantic Dependency Parsing, given as formula (4).
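Formula (4) itself is not reproduced in this text. One plausible reading, assumed here and not confirmed by the source, is a weighted linear combination of the frame similarity, the mean of the n per-dependency word similarities, and the VSM similarity, with the weights α, β and γ reported in the text. The sketch below illustrates that assumed form only.

```python
from typing import Sequence


def fused_similarity(
    frame_sim: float,
    pair_sims: Sequence[float],
    vsm_sim: float,
    alpha: float = 0.3,
    beta: float = 0.3,
    gamma: float = 0.4,
) -> float:
    """Assumed form: alpha*Sim(F_1,F_2) + beta*mean_i Sim(R_1i,R_2i) + gamma*Sim(R_1,R_2)."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    # n is the number of paired dependencies; with no pairs the term is 0.
    mean_pair = sum(pair_sims) / len(pair_sims) if pair_sims else 0.0
    return alpha * frame_sim + beta * mean_pair + gamma * vsm_sim


# Same frame, one mismatched dependency pair, half-overlapping label vectors.
print(fused_similarity(frame_sim=1.0, pair_sims=[0.0], vsm_sim=0.5))
```

Under this reading, two sentences that share a frame but swap their core frame elements are penalized through the second and third terms rather than being scored as identical.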
SIM(S_1, S_2) represents the similarity of sentences S_1 and S_2; F_1 and F_2 represent the frames of the two sentences, and Sim(F_1, F_2) represents the similarity between the two sentence frames; "n" represents the number of corresponding or similar semantic dependency pairs in the two sentences; α, β and γ are empirical parameters that represent the contributions of the sentence frame, the single semantic dependency relationship based on word sense, and the semantic dependency relationship based on VSM, respectively. Their values are determined by a large number of experiments, subject to 0≤α≤1, 0≤β≤1, 0≤γ≤1 and α+β+γ=1. Considering the semantic dependency relationship between the frames and the core frame elements and the contribution of the semantic similarity relation to sentence similarity, and after a large number of value comparison experiments, we set α=0.3, β=0.3, γ=0.4. This not only reflects the frame's semantic expression of the whole sentence, but also stresses the importance of semantic dependency within the sentence.

Experiment and result analysis
The final test corpus contains 790 Chinese sentences, of which 90 form the experimental standard set and 700 form the experimental test set; the test set consists of a matching set of 270 sentences and a noise set of 430 sentences. Each Chinese sentence in the standard set has 3 similar sentences in the matching set.
method goes beyond the surface grammatical structure and more accurately uncovers the deep semantic relations of sentences. When analyzing sentence components by semantic dependency, the method analyzes only the core frame elements of a sentence and ignores the non-core frame elements, which avoids interference from unimportant sentence components in the calculation results. Next, we will further optimize the similarity calculation method and propose an improved algorithm for

proposed in this paper to test the Chinese sentences in the test corpus. A test example is shown in Table 1, and the test results are shown in Table 2.

Table 1. Test case

Table 2. Test results