Conflict resolution and missing completion in the fusion of domain ontology in cyber-physical systems

CPS integrates information services, human resource services, and physical equipment services, and is typically supported by ontologies from multiple domains. Because ontologies from different domains may be inconsistent, their fusion can produce conflicts and missing data. Therefore, this paper provides a method for conflict resolution and missing completion in the fusion of domain ontologies.


Introduction
CPS (Cyber-Physical Systems) refers to an engineering system composed of a group of highly integrated physical and software components. The ontology of a certain field refers to the organic combination of the information and relations in that field. CPS integrates information services, human resource services, and physical equipment services, and the information services may involve the collection of data or the integration of ontologies in multiple domains. Due to the inconsistency of ontologies in different domains, the result of ontology fusion may contain conflicts and deficiencies [1]. The sources of ontology are diverse. With the development of data collection technology, the scale of data to be managed has grown rapidly, and the forms and sources of data generation have diversified. The form of data organization is no longer single, and the relationships between data are no longer the simple one-to-one relationships of the past but complex, multi-dimensional relationships. However, an increase in the amount of data does not imply an improvement in data quality: it is accompanied by an increase in the probability of data quality problems, and new data relationships also lead to new data quality problems. Data quality issues cause different problems in different fields, but they share the potential for very serious consequences. For example, they may cause huge property losses in the financial field, and in the medical field they may lead to improper treatment of patients' conditions and even damage the doctor-patient relationship. Therefore, it is necessary to analyse data quality problems in the context of big data and explore reasonable solutions [2]. The advent of the data age means that previous methods of processing data are no longer fully applicable, and the role of data quality research within big data research has become more prominent.
Generally, high-quality data refers to data that is available, legible, complete, and so on. Inferior data, in a narrow sense, usually means data with one or more problems such as missing values, redundancy, or conflicts; in a broad sense, the credibility, relevance and other aspects of the data should also be considered. The process of turning inferior data into high-quality data through a series of methods and means is usually called data repair. Different data problems call for different repair methods. For data quality problems in relational databases, repairing with functional dependencies combined with conditional constraints is a feasible method. For data quality problems in non-relational databases, whose data organization structures are completely different, relying solely on traditional functional dependencies combined with conditional constraints is not enough, because different types of data, such as key-value pairs, graphs, and documents, are involved.
In terms of the expression mechanism of data availability, the advantages of relational databases are mainly reflected in the good maintenance of data consistency. Because the data are highly coupled, most data quality problems can be solved through traditional functional dependencies and conditional constraints. For non-relational databases, data quality problems are not only reflected in data loss, redundancy, conflicts, and so on, but may also involve data security or relevance issues [3]. For this type of data quality problem, the traditional method of restoring data quality with functional dependencies and conditional constraints often offers no starting point. Therefore, this article argues that a promising research direction is to redefine the expression form and processing mechanism of functional dependencies and conditional constraints. A functional dependency expresses a mapping relationship between two attribute sets. Let R denote a relation, let A and B be two attribute sets of R, and write A → B for a functional dependency: if any two tuples of R have equal values on set A, then their values on set B are also equal.
Based on this theoretical foundation, the basic idea of using functional dependencies to repair data quality problems is: define a set of functional dependencies on R; if all functional dependencies in the set are satisfied, the data quality of R is judged to meet the standard; otherwise, there is a data quality problem [1]. Therefore, building on the traditional repair of relational data using functional dependencies, this paper proposes a method that uses dependency theory to repair relationship quality problems in graph data.
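The basic idea above can be sketched in Python. This is a minimal illustration, not the paper's implementation; the function names (`check_fd`, `satisfies_all`) and the sample attributes are assumptions introduced here.

```python
def check_fd(tuples, lhs, rhs):
    """Return True if every pair of tuples that agrees on the attributes
    in `lhs` also agrees on the attributes in `rhs` (i.e. lhs -> rhs holds)."""
    seen = {}
    for t in tuples:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # two tuples agree on A but differ on B: violation
        seen[key] = val
    return True

def satisfies_all(tuples, fds):
    """The relation meets the quality standard iff every FD in the set holds."""
    return all(check_fd(tuples, lhs, rhs) for lhs, rhs in fds)

rows = [
    {"emp": "P1", "company": "Ali"},
    {"emp": "P2", "company": "Ali"},
    {"emp": "P1", "company": "Tencent"},  # conflicts with the first row
]
fds = [(("emp",), ("company",))]
print(satisfies_all(rows, fds))  # False: a violation signals a quality problem
```

A violated dependency does not by itself tell us which tuple is wrong; that is exactly the ambiguity the later sections address with conditional constraints and a dependency model.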
In addition, the concept of a graph in this article refers to data stored in a graph structure, not to image data [4]. A database used to store graph data is called a graph database. The graph database is a type of non-relational database (NoSQL database), usually used to process large amounts of graph-based data; it suits the rapidly growing scientific data of recent years and similarly structured data such as social networks [5]. Graph databases have been widely used in various fields. For example, they can process the large number of configuration files generated by nodes in a network telemetry system: stored in a relational database, these files take up a lot of disk space and reduce efficiency, whereas a graph database handles them conveniently [6]. As another example, map construction information can be extracted and inferred from acquired quality information to achieve correlation analysis, and the results can be stored in a graph database [7]. This article proposes a graph refinement method for data pre-processing, whose purpose is to facilitate subsequent operations.
The organization of this paper is as follows: Section 2 reviews related work and states the problems and research directions. Section 3 gives relevant definitions. Section 4 analyses examples and gives a repair model. Section 5 summarizes the work of this article.

Related work
Li Jianzhong et al. [8] summarized the research progress in various fields of big data availability in the article Research Progress in Big Data Usability and looked forward to future research directions. In the paper Data Repair Method Based on Functional Dependence and Conditional Constraints, Jin Cheqing and others proposed an algorithm that combines functional dependencies and conditional constraints to repair data in relational databases, and designed corresponding experiments [4]. They first analysed traditional repair strategies based on functional dependencies and divided them into two main strategies: one is to directly delete records that do not meet the requirements; the other is to add or delete no records and only modify certain fields. Their analysis found that although functional dependencies are important and effective, some important constraints (hard constraints, quantity-related constraints, equivalence constraints, non-equivalence constraints, etc.) cannot be described by functional dependencies alone. Therefore, a data repair method combining functional dependencies with conditional constraints was proposed. This method overcomes the shortcomings of relying only on functional dependencies, but its own limitation is that it can handle only relational data.
Hamza et al. [9] addressed the problem of missing data in the Internet of Medical Things (IoMT) and proposed constructing a dynamic layer recurrent neural network (Dynamic L-RNN) to predict missing data. The core idea is to train a deep learning model on a complete data set and use the trained model to predict the missing values in an incomplete data set, thereby repairing the missing data. This idea is applicable to most data-missing problems in relational or non-relational data. Its disadvantage is that only the problem of missing data is analysed; other traditional data quality problems (such as data conflicts) are not covered.
M. Talha et al. [10] raised the question of how to weigh data quality against data security in the big data environment and analysed the conflicts and challenges that may arise. The authors creatively put forward the view, which may be overlooked by most scholars, that data security can become an obstacle to data quality (and vice versa). The conflict between the two systems makes the complexity more prominent. In the context of big data, flexible read and write permissions are necessary to implement a data quality management system, but this leaves data security risks, because such permissions may be maliciously used for illegal gains. For this reason, the authors suggest implementing or extending a fine-grained access control model to avoid conflicts between data security and data quality; available models include TBAC (Task Based Access Control), RBAC (Role Based Access Control), etc.
Danilo Ardagna and others [11] proposed a data quality service for context-sensitive data quality evaluation in big data, which evaluates the data quality of large data sets through parallel computing and can choose the amount of data to analyse under time and resource constraints. They noted that handling heterogeneous sources requires an adaptive method that triggers an appropriate quality assessment based on the data type and context. The authors also considered that in some cases, due to performance and time constraints, it is impossible to evaluate the quality of the entire data set; they therefore proposed evaluating only part of the data set, and introduced confidence as a reliability metric of the quality evaluation program to measure the resulting loss of accuracy. From the results, this method effectively improves the efficiency of data quality evaluation, but it inevitably loses some accuracy, although the introduced confidence measure compensates for this loss to a certain extent.
Maryam and others [12] proposed using structured learning theory combined with a data quality framework to explore the impact of big data processing on the quality of company decision-making, the mediating role of data quality and data diagnostics in this relationship, and how big data utilization improves corporate decision-making quality and revenue generation through data quality and data diagnostics. The authors found that although there are many related theoretical studies, no empirical study had explored the impact of big data utilization on the quality of corporate decision-making. Therefore, they analysed data from more than 130 companies, put forward hypotheses about the impact of big data on data quality, data diagnostics, etc., and verified them through quantitative experiments.
Fan Wenfei et al. [13][14] proposed a type of consistency constraint called conditional functional dependency, which captures data consistency by enforcing semantically related value bindings. They further proposed a model that specifies the timeliness of data through partial temporal orders and time constraints, and strengthens data timeliness through invariant conditional functional dependencies [15][16]. Carlo and others [17] combined the detection and repair of data timeliness errors: by combining temporal functional dependencies with approximate functional dependencies, they proposed temporal approximate functional dependencies and gave their basic definition along with some related data mining techniques.
Data sampling technology is usually used to improve the performance of a learner when the data are unbalanced. If the data quality is too low or the training data set is too small, the training results will be unreliable. Jason Van Hulse conducted a comprehensive study on the characteristics of different training data sets, and the results showed that multiple indicators affect the training results; therefore, multiple indicators need to be considered when analysing a data set [18].
Data dependencies and fuzzy data dependencies play a great role in maintaining data consistency and preventing data redundancy. P. C. Saxena et al. standardized the concepts in the Type-2 fuzzy relational database and defined new fuzzy functional dependencies [19].
Tu Feifei and others [20] summarized the data quality problems in software development support tools such as issue tracking systems and version control systems, identified nine data quality problems, and further proposed correcting them by using redundant data and by mining user behaviour patterns. The authors analysed data quality problems across the three stages of data generation, data collection and data use, including: problem reports, incorrect creation times, and time issues in version control data in the data generation stage; incomplete data crawling and incomplete data caused by security and privacy in the data collection stage; and future data leakage issues, email address issues, and issues about authors and submitters in version control data in the data use stage.
From the above analysis, it can be seen that the method of functional dependencies combined with conditional constraints has traditionally been used mainly to repair relational data, while data storage has undergone tremendous changes in the context of big data [21], and non-relational storage has gradually become mainstream. The old conditional functional dependency theory is no longer fully applicable, so this article studies a new processing mechanism for functional dependencies combined with conditional constraints in the big data context, in order to solve graph data quality problems.

Domain ontology
An ontology is usually represented by a directed graph G. Suppose there is a data node set V, a label set P, and an attribute name set Attrs; for an attribute a in Attrs, its domain can be denoted as Dom(a). The definition of the knowledge ontology is given below [22].
Definition 1. A domain ontology G with data information can be represented by a two-tuple (V, F), where V is a finite set of data vertices and F is a function that assigns attribute values to nodes. Suppose that a directed graph G contains m nodes and n edges, where m > 0 and n > 0. An edge between vertices V1 and V2 may be bidirectional, and it cannot be changed when it is a one-way edge. Assuming that each node in V(G) has a unique identifier (label), its function is to ensure that each node and each edge are independent and unique.
Conditional constraints are added on the basis of functional dependencies to enrich the expression range [23]. The definitions of conditional constraints in four graph data scenarios are given below.
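Definition 1 can be made concrete with a small Python sketch of a labelled directed graph. The class and field names (`OntologyGraph`, `nodes`, `edges`) are illustrative assumptions, not notation from the paper; the example nodes P1, P2 and relationship R1 mirror the examples used later.

```python
class OntologyGraph:
    """A domain ontology as a directed graph: uniquely labelled nodes
    carrying attribute values, plus labelled directed edges."""

    def __init__(self):
        self.nodes = {}   # unique label -> {attribute name: attribute value}
        self.edges = []   # (source label, relationship, target label)

    def add_node(self, label, attrs=None):
        if label in self.nodes:
            raise ValueError(f"label {label!r} must be unique")
        self.nodes[label] = dict(attrs or {})

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

g = OntologyGraph()
g.add_node("P1", {"a4": "Ali"})
g.add_node("P2", {"a4": "Tencent"})
g.add_edge("P1", "R1", "P2")  # R1: e.g. a colleague relationship
```

The unique-label check enforces the assumption in Definition 1 that every node (and hence every edge) is independently identifiable.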

Functional dependency and conditional constraints
Definition 5. (EV constraint, EV(R1, R2, a)). Several vertices with the same relationship R1, together with the edge bearing another relationship R2, all point to the same vertex a.
Definition 6. (semantics constraint, SC). It has no fixed form; it is usually the common sense or logic of daily life, which can be abstracted into a corresponding form according to the specific context.
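A constraint of the EV(R1, R2, a) form can be checked mechanically over the edge set. The sketch below is an assumption about how such a check might look (the function name `violates_ev` and the sample relationship names are introduced here, not taken from the paper); it simply collects every R1- or R2-labelled edge that fails to point at the required vertex a.

```python
def violates_ev(edges, r1, r2, a):
    """edges: (source, relationship, target) triples.
    Return the edges labelled r1 or r2 that do NOT point to vertex a."""
    return [e for e in edges if e[1] in (r1, r2) and e[2] != a]

edges = [
    ("P1", "company-is", "Ali"),
    ("P2", "employer-is", "Ali"),
    ("P3", "company-is", "Tencent"),  # this edge violates the constraint
]
print(violates_ev(edges, "company-is", "employer-is", "Ali"))
```

An empty result means the constraint holds; a non-empty result pinpoints the edges to be reset during repair.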

Conflict resolution and missing completion in the fusion of domain ontology
The conflict resolution and missing completion in the fusion of domain ontologies will be illustrated by the following examples.
Example 1. In Figure 1, nodes P1 and P2 are connected by the relationship R1 (a colleague relationship), but their values of the attribute a4 differ. Therefore, it can be inferred that there is a data conflict between the two nodes. Through analysis, two situations are possible: one is that the relationship between P1 and P2 is correct, so there is obviously a data quality problem with the a4 attribute in P1 and P2; the other is that the a4 attributes of P1 and P2 are correct, so the relationship R1 between P1 and P2 is wrong. In these two cases, the next processing step would yield two different results, which is inconsistent with the expected unique repair result, so the two situations must be handled to make the result unique. The cause of the two situations is the lack of a standard library that can be referred to, that is, the lack of a precondition that unifies them. The solution provided by this article is to establish a standard library for reference, that is, the dependency model.
Example 2. As in Example 1, the relationship between nodes P1 and P3 is also R1 (a colleague relationship) in Figure 1. From node P3, it can be intuitively seen that the value of attribute a4 is empty; obviously, node P3 has a data quality problem of missing data. However, the statement that the node visibly has a missing-data problem lacks a convincing basis. Therefore, comparing with Example 1, we know that P1 and P3 are in a colleague relationship, the attribute a4 of P1 is "Ali", and the attribute a4 of P3 is empty. Obviously, under "R1 = colleague relationship" the company attribute of one of the two persons cannot be empty, so P3 has a missing-data problem. It can further be deduced that the missing attribute a4 of P3 should also be "Ali".
In this way, a method to determine the missing-data problem and the corresponding repair method are obtained. Similarly, the dependency model is needed to confirm that the precondition (that the attribute a4 really exists in P1) is correct. As a result, the conflict resolution and missing completion proceed in three steps: first, refine the directed graph in which the fused ontology is stored; then perform conflict resolution; finally, perform missing completion.

Directed Graph Refinement
The structure of the entire graph can be simply divided into two parts: nodes and edges. Each node has a unique identifier; the inside of a node contains the attributes that describe it, and an edge stores the relationship between two nodes. As mentioned in Section 3.1, the internal attributes of a node are not easy to handle. To solve this problem, some simple pre-processing is performed on the graph data: the internal attributes of the original node are separated into independent nodes and edges, and only the identifier of the original node is retained; for ease of expression and description, the label is used instead. Each separated node is the value of the original attribute, and the corresponding edge is the name of the original attribute. The refinement process is given as Algorithm 1.

Algorithm 1. Directed graph refinement.
Input: directed graph G1(V1, E1), relation set R1
Output: directed graph G2(V2, E2), relation set R2
1. Initialize the three output sets to empty sets.
...
5. for all ...
...
13. Add the newly added edges to E1 as E2, and obtain G2.
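The refinement step can be sketched as follows. This is a minimal illustration of the idea described above, not the paper's Algorithm 1 itself; the function name `refine` and the data layout (dicts for nodes, triples for edges) are assumptions introduced here.

```python
def refine(nodes, edges):
    """nodes: {label: {attribute: value}}; edges: [(src, relation, dst)].
    Each internal attribute becomes a separate node and edge; the original
    node keeps only its identifier. Returns the refined nodes and edges."""
    new_nodes = {label: {} for label in nodes}   # keep identifiers only
    new_edges = list(edges)
    for label, attrs in nodes.items():
        for attr, value in attrs.items():
            # the attribute value becomes a node, the attribute name an edge
            new_nodes.setdefault(value, {})
            new_edges.append((label, attr, value))
    return new_nodes, new_edges

nodes = {"P1": {"company": "Ali"}, "P3": {}}
edges = [("P1", "work-with", "P3")]
v2, e2 = refine(nodes, edges)
print(e2)  # the company attribute is now an explicit edge P1 -> Ali
```

After refinement, attribute-level quality problems become edge-level problems, which is what allows the later steps to reason uniformly over edges.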

Functional dependency confidence
The domain ontology data is transformed according to the storage relation: the data is converted into relational tables clustered by type according to the storage scheme, so that the original graph data structure is converted into relationally stored data (i.e. relational data). The concept of confidence in AFD (Approximate Functional Dependency) can be used to detect the reliability of a functional dependency [24]. For example, if person and work-with are related through foreign keys, then work-with in the table header induces a functional dependency. A functional dependency obtained in this way is not necessarily correct, so we adopt the confidence concept of the classic AFD to measure whether a functional dependency is credible. After calculating the confidence of a functional dependency, we decide whether to adopt it by specifying a validity range for it. The standard form of the confidence con(φ) of a functional dependency φ on a relation r is as follows [25]: con(φ) = max{|s| : s ⊆ r, s satisfies φ} / |r|, that is, the size of a largest sub-relation of r that satisfies φ, divided by the size of r.
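The classic AFD confidence can be computed by grouping tuples on the left-hand side of the dependency and, within each group, keeping the tuples carrying the most frequent right-hand-side value. The sketch below is an illustrative implementation of that standard measure (the function name `afd_confidence` and the sample table are assumptions, not taken from the paper).

```python
from collections import Counter

def afd_confidence(tuples, lhs, rhs):
    """con(X -> Y): size of a largest sub-relation satisfying the FD,
    divided by the size of the relation."""
    groups = {}
    for t in tuples:
        key = tuple(t[a] for a in lhs)
        groups.setdefault(key, Counter())[tuple(t[a] for a in rhs)] += 1
    # within each X-group, keep the tuples with the most common Y value
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / len(tuples)

rows = [
    {"person": "P1", "company": "Ali"},
    {"person": "P1", "company": "Ali"},
    {"person": "P1", "company": "Tencent"},
    {"person": "P2", "company": "Ali"},
]
print(afd_confidence(rows, ("person",), ("company",)))  # 0.75
```

A dependency with confidence 1.0 holds exactly; a dependency whose confidence falls below the specified validity range is rejected rather than used for repair.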

Conflict resolution and missing completion
After refining the graph structure, we can start the next step, that is, resolving the conflicts and missing values in the fusion of domain ontologies.
The abstract process of resolving the conflict problems is as follows:
1. Given a directed graph G as the fused ontology, with relationship set R and TFD set Φ, put all nodes into the set B.
2. Refine the original directed graph to obtain a new directed graph G1 and a new relation set R1; in G1, the node corresponding to node V of the original graph is V1 (matched through the unique identifier).
3. Pick an initial node V2, traverse all the nodes associated with it, and put them into the set C.
4. Take a node V3 from the set C and delete V3 from C. For its counterpart V4 in G2, consider the relationships on all edges associated with V1 and V4: traverse the functional dependency set and compare each functional dependency, reset all edges that do not satisfy the conditional dependencies so that the relationship pointing to the same node corresponds to the other relationship that also points to that node, and update the graph G1.
5. Deal with the hard constraint HC(U, a): reset all the edges in the set U so that they all point to the same node a, and update the graph G1.
6. Process the equivalence constraint EC(U): reset all the edges in the set U so that they all point to the same node, and update the graph G1.
7. Repeat steps 4, 5 and 6 until C = ∅.
8. Repeat steps 3, 4, 5 and 6 until B = ∅.
9. Return the directed graph G1 after the data conflict problem has been handled.
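Step 5 above, enforcing a hard constraint HC(U, a), can be sketched as a single edge-rewriting pass. This is an illustrative fragment of the process, under the assumption that edges are (source, relation, target) triples and that the constrained set U is given as (source, relation) pairs; the name `apply_hard_constraint` is introduced here.

```python
def apply_hard_constraint(edges, constrained, target):
    """edges: [(src, relation, dst)]; constrained: set of (src, relation)
    pairs covered by HC(U, a); target: the node a they must all point to.
    Returns the edge list with every constrained edge reset to target."""
    return [
        (src, rel, target) if (src, rel) in constrained else (src, rel, dst)
        for src, rel, dst in edges
    ]

edges = [
    ("P1", "company-is", "Ali"),
    ("P2", "company-is", "Tencent"),  # conflicting edge to be reset
]
fixed = apply_hard_constraint(edges, {("P2", "company-is")}, "Ali")
print(fixed)
```

Steps 4 and 6 follow the same shape: identify the offending edges via the dependency or constraint, rewrite their targets, and update G1.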
The data conflict repair algorithm is given as Algorithm 2, which obtains G3. Next, the problem of missing data is fixed. After the graph structure is refined, it can be found that the company-is edge of node P3 points to an empty node. In general, a node may be empty, but in this case the nodes P1 and P3 are also in a work-with relationship, so the empty node is invalid and indicates a missing-data problem. Similarly, the above hard constraints and conclusions are applied here, that is, the company-is edge of node P3 should point to the Ali node. The process of repairing the missing-data problem is as follows:
1. For the directed graph G1 in which the data conflict problem has been repaired, traverse all the nodes, find the empty nodes among them, and put them into the set E.

Algorithm 2. Conflict Resolution
2. Take any node V5 from E and traverse all the nodes associated with V5. If there is only one node associated with V5, denote it V6 and denote the relationship between V5 and V6 as R1; traverse all the nodes associated with V6, select those that also stand in the R1 relationship, and compare all the functional dependencies of the dependency set one by one to find a matching functional dependency. If a matching functional dependency exists, fill in the missing part according to it; if not, continue processing the hard constraints and equivalence constraints until a matching condition is found. If no repair method is found after all the conditions have been traversed, mark the empty node. If there are multiple nodes associated with V5, mark the empty node directly. Delete node V5, update the directed graph G1, and detect all remaining empty nodes.

...
4. If a certain functional dependency (or conditional constraint) is met,
5. the node is inferred based on that functional dependency (or conditional constraint).
6. end for
7. Obtain the node set F and the edge set H.
8. G4 = (F, H).

Combining the repair results for the three data quality problems of data conflict, data redundancy and data missing, the article gives the repaired graph data.
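The missing-completion step for the running example can be sketched as follows: when the target of a value edge is empty, the value is borrowed from a peer node that stands in the same relationship, as licensed by the matching dependency. This is an illustrative simplification of the process above; the function name `complete_missing` and the edge layout are assumptions introduced here.

```python
def complete_missing(edges, peer_relation, value_relation):
    """edges: [(src, relation, dst)]. An empty target (None) on a
    value_relation edge is filled from a peer reached via peer_relation."""
    values = {s: d for s, r, d in edges if r == value_relation and d}
    peers = [(s, d) for s, r, d in edges if r == peer_relation]
    repaired = []
    for s, r, d in edges:
        if r == value_relation and not d:
            # look for a peer whose own value edge is present
            for a, b in peers:
                other = b if a == s else (a if b == s else None)
                if other and other in values:
                    d = values[other]  # fill the missing value
                    break
        repaired.append((s, r, d))
    return repaired

edges = [
    ("P1", "work-with", "P3"),
    ("P1", "company-is", "Ali"),
    ("P3", "company-is", None),  # missing value to be completed
]
print(complete_missing(edges, "work-with", "company-is"))
```

On this input the empty company-is edge of P3 is completed to Ali, matching the repair derived from the colleague relationship in Example 2.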
From the repaired graph data, it can be seen that nodes P1, P2 and P3 have three edges with the same relationship pointing to the same node, which means that these relationships among the nodes are consistent. This is consistent with the conditions specified in the functional dependency set and the conditional constraints, and furthermore it means that the graph data is now clean.

Summary
In the process of fusing multi-domain ontologies, conflict and missing problems are inevitable. This paper proposes a method that describes a domain ontology with a directed graph, thereby transforming the problem of conflict resolution and missing completion between domain ontologies into the handling of relationships between directed graphs.
The main contributions are as follows: we designed the processing flow of conflict resolution and missing completion in the process of multi-domain ontology fusion; and we abstracted the multi-domain ontology fusion process into relationships between directed graphs, then used directed graph processing methods to deal with the conflicts and deficiencies that may occur in the process.
Future work mainly includes two aspects: on the one hand, mining more special constraints with the characteristics of directed graphs; on the other hand, considering whether the data quality of other types of non-relational data (such as column-based data, key-value data, etc.) can be handled with the same (or similar) method. Other kinds of special constraint characteristics can be derived from the connectivity of the directed graph, and weights can be considered to describe the relationships represented by the edges; the relationships newly generated by the refinement operation could also be distinguished, to simplify the handling of quality problems. For other types of non-relational data, taking key-value pairs as an example, we can consider the connection between key-value pairs and directed graphs. By analogy with the method of operating on directed graphs, one line of thinking is to consider whether directed graphs and key-value data can be converted into each other (or whether a corresponding relationship exists), and another is to expand the dependency theory to adapt to new scenarios.