Methods of structural analysis of business processes

The article outlines selected methods for analyzing business processes, covering both their definitions and instances. Methods for the analytical processing of processes form a component of the Business Intelligence environment of process warehouses, including methods for the analytical processing and exploration of the collected process definitions and instances, i.e. process mining. One of the main elements of process analysis is determining the similarity between processes. In systems that analyze large sets of elements, the method of determining similarity should be efficient, because it is the basis for other analysis methods, e.g. clustering and classification. A method for analyzing the structural similarity of business processes, based on the similarity of sequences of genetic tags of such processes, is presented using similarity analysis methods based on the edit distance and the developed structural similarity measures: GNM, DNM, GCM and DCM. The presented similarity methods were used to cluster processes and to determine the central element of a cluster. The developed methods form the basis for extending similarity analysis to aspects of the semantic similarity of business processes and to further methods of process analysis and exploration.


Introduction
Process automation systems, especially workflow systems, are increasingly used in business. After the standard for modeling business processes, BPMN v1 and v2 (Business Process Model and Notation) [1][2][3], developed by the BPMI (Business Process Management Initiative) and the OMG (Object Management Group), was introduced, systems based on the process approach started to develop dynamically. The graphical notation of a process defined in the BPMN standard is human-readable. For the purpose of exchange with unambiguous interpretation, the following formats are used: XML Process Definition Language (XPDL) [4,5], developed by the Workflow Management Coalition (WfMC) and designed to exchange process definitions saved in this format between systems, and the BPDM (Business Process Definition Metamodel) format, developed by the OMG, which describes metamodels defining the terms, relationships and semantics used to exchange business process models between various process modeling tools. Workflow systems built on this basis are currently among the most common solutions applied in management support systems. This group also includes process automation systems, in particular Workflow Management Systems (WfMS) [4,5,6]. The main purpose of such IT systems is to process process instances according to the modeled business process definition. It should be noted that the main task realized by WfMS systems is the processing of processes in accordance with their definitions, not the activities described by them. Implementing systems or specialized services, independent of the WfMS, are responsible for performing the activities and work described in the processes.

Warehouse of business processes and process mining
Systems for business process analysis are categorized as Business Intelligence (BI) systems that are based on, or significantly supported by, business processes [6][7][8]. One of the main tasks of BI systems is to analyze the collected definitions and instances of business processes in order to increase their efficiency and effectiveness in an ever-changing business environment through modification, improvement and optimization. The variability of the environment, legislation, organization and economic situation makes it necessary to continuously introduce changes and upgrades to processes so as to increase their efficiency in a broad context. The improved processes then feed the process automation systems (workflow systems) as well as process analysis and exploration systems [9][10][11][12][13][14][15][16][17]. The use of a process warehouse, including process exploration, makes it possible to perform in-depth analysis of the processes gathered in the warehouse [7][8][9][10][11] by applying special process exploration tools, which may use various methods, e.g. analytical, forecasting, simulation [12], structural, semantic, etc. The process warehouse as well as process exploration tools should be included in the group of Business Intelligence systems.
Fig. 1. Architecture of process warehouse [15]

The functioning of the process warehouse in an organization is aimed at streamlining management and decision-making processes by providing current data and information on the implementation of business processes. The main tasks of the process warehouse include, among other things, the following [15]:
• integration with the process runtime environment, allowing data to be transferred in time cycles and via an online model;
• extraction and upload of processes and process instances;
• processing of data containing the process definitions and instances derived from the process runtime environments, in cycle and event (online) modes;
• storage of data on business processes, including:
  • a collection of business process definitions and instances, unified in terms of structure and semantics,
  • a collection of metadata and semantic data concerning business processes,
  • a collection of intermediate data and results of analytical processing and process exploration,
  in forms that allow the data to be processed analytically with the expected performance, in particular:
  • analysis of deviations of process instances from template paths,
  • identification of deadlocks and bottlenecks in process instances,
  • analysis of the execution times of process instances and their particular components,
  • analysis of the decisions made during process execution,
  • identification of so-called dead processes, process fragments and resources, which are never or very rarely executed,
  • definition and analysis of process similarity measures as well as event sequences,
  • identification of new processes on the basis of process execution logs,
  • identification of patterns as well as rare and frequent event sequences;
• provision of apparatus and tools for the analytical processing and exploration of processes.
The main tasks of the business process warehouse as a decision-making support tool in the field of process exploration concern the analysis of the collected process definitions and instances as resources of the process warehouse, i.e.:
• analysis of process definitions:
  • proposing patterns of process definitions during modeling, on the basis of already used process elements and sets of patterns,
  • measures of the semantic similarity of process definitions or their fragments,
  • measures of the structural similarity of process definitions or their fragments,
  • validation of the correctness of a process definition by analyzing the graph created on its basis,
  • analysis of the structure of a process definition by analyzing the graph created on its basis:
    • the level of detail of the definition, defined as: clarity (the number of vertices of a single graph and its component subgraphs) and complexity (the number of resources assigned to the graph or a component graph),
    • whether a deadlock can occur, i.e. two activities each waiting for the other's completion,
    • whether all cases have been included in the definition;
• analysis of process implementation (instances, logs of executed processes):
  • process mining: discovery of new process definitions on the basis of process execution instances,
  • measures of the probability of event sequences,
  • analysis of process adequacy, i.e. whether a given process produces the expected results,
  • identification of problems (e.g. deadlocks) and their causes.
The process warehouse is a system whose software, analysis methods and process exploration methods allow efficient analytical processing and exploration of the collected business processes and process instances. Fig. 3 shows the general architecture model of the process warehouse, in which the following basic subsystems may be distinguished:
• extraction, transformation and loading, from different sources and various formats, of the process definitions and executed instances of business processes, together with their structural and semantic unification;
• the process warehouse as a thematic composition of the unified processes and process instances;
• analysis and exploration of processes and process instances;
• access to the process warehouse allowing interactive querying, with answers in the form of reports, tables, charts, etc.

Methods of similarity analysis of business processes
One of the main groups of process analysis methods comprises methods for examining similarities between business process definitions and instances. This group includes methods based directly on graph and network theory as well as mixed methods [18][19][20]. The structure of a business process is not simple but complex, consisting of many components that have a significant impact on the completeness and precision of its description. Generally speaking, when determining the similarity of business processes as graphs, two processes can be considered similar if their selected features are similar. The level of similarity of business processes is expressed as a number calculated according to the rules defined by a similarity measure. However, there is no single universal measure, since the evaluation of graph similarity should depend on its purpose. Therefore, there are many methods for analyzing the similarity of business processes, each aimed at different aspects of process structure and purpose.
Assuming that BI systems gather large numbers of definitions, and in particular instances of executed processes, it is clear that the use of complex similarity analysis methods may be very time-consuming computationally. It is therefore important that methods for analyzing process similarity, as some of the most frequently used analysis methods, are not computationally expensive. In such cases, a "multilayer sieve" approach can be applied, in which fast similarity analysis methods of low computational complexity are used first, even though they are often coarse and inaccurate.
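The multilayer-sieve idea above can be sketched as a two-stage pipeline. The code below is an illustrative example, not the paper's implementation: the function names (`tag_count_similarity`, `sieve_search`) and the 0.5 threshold are assumptions, with a cheap tag-count overlap acting as the first sieve and the classic Levenshtein edit distance as the expensive second stage.

```python
# Hypothetical two-stage "sieve" for similarity search over genetic tag
# sequences: a cheap tag-count filter prunes candidates before an
# expensive edit-distance comparison is applied to the survivors.
from collections import Counter


def tag_count_similarity(a, b):
    """Cheap first-stage measure: overlap of tag-type counts, in [0, 1]."""
    ca, cb = Counter(a), Counter(b)
    shared = sum(min(ca[t], cb[t]) for t in ca)
    total = max(len(a), len(b))
    return shared / total if total else 1.0


def levenshtein(a, b):
    """Expensive second-stage measure: classic edit distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def sieve_search(query, candidates, threshold=0.5):
    """Keep only candidates passing the cheap filter, then rank by edit distance."""
    coarse = [c for c in candidates if tag_count_similarity(query, c) >= threshold]
    return sorted(coarse, key=lambda c: levenshtein(query, c))
```

The coarse filter may discard some truly similar processes (it ignores tag order entirely), which is exactly the trade-off the sieve approach accepts in exchange for speed.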
The issue of normalizing the results of such measures is crucial for developing a similarity analysis method. Without normalization, it would not be possible to compare the results produced by various methods for pairs of processes: the value ranges obtained by different similarity analysis methods would not be useful for analyzing the properties of those methods or for comparing the results obtained for the same compared processes. Therefore, normalizing each of the applied similarity measures is an important issue. The normalization function in its general form is expressed by the following formula:

d_NORM(P_A, P_B) = (d(P_A, P_B) − d_MIN(P_A, P_B)) / (d_MAX(P_A, P_B) − d_MIN(P_A, P_B)) (1)

where: d(P_A, P_B) – non-normalized measure value for the P_A and P_B processes, d_NORM(P_A, P_B) – normalized measure value for the P_A and P_B processes, d_MIN(P_A, P_B) – minimum possible measure value for the P_A and P_B processes, d_MAX(P_A, P_B) – maximum possible measure value for the P_A and P_B processes.

The similarity of business processes may be analyzed in terms of its constituent aspects [18]:
• syntactic similarity – only the syntax of labels is taken into account,
• semantic similarity – the words in the labels are detached from the syntax and only their semantics are analyzed,
• contextual similarity – not only the labels of elements are taken into account, but also the context in which these elements occur,
and also:
• structural similarity – based on the similarity of the process structures,
• goal similarity – measuring whether the processes serve the same purpose.

https://doi.org/10.1051/matecconf/201821004016 CSCC 2018

When a process is transformed into a linear sequence of tags, the genetic sequence of the process [14,15], it is possible to use similarity analysis methods based on the edit distance. The edit distance is the cost of transforming one sequence into another. The cost is measured by the number of basic operations necessary to make the two sequences identical; evidently, the more the sequences differ, the more operations must be performed. The basic operations of sequence modification include adding, removing and modifying an element or subsequence. These operations need not have the same cost: the level of difficulty of an operation is its modification cost, and each operation has a modification-cost value assigned to it. Basic measures for determining the distance and similarity of tag sequences include, among others, the Levenshtein distance:
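A minimal sketch of such an edit-distance measure with per-operation costs, together with min-max normalization, might look as follows. The specific cost values (`c_ins`, `c_del`, `c_sub`) are assumptions chosen for illustration; the paper does not prescribe them.

```python
# Illustrative weighted edit distance over genetic tag sequences, with
# min-max normalization of the result. Operation costs are configurable,
# reflecting that add/remove/modify operations need not cost the same.

def weighted_edit_distance(a, b, c_ins=1.0, c_del=1.0, c_sub=1.0):
    """Dynamic-programming edit distance with configurable operation costs."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * c_del                      # delete all of a's prefix
    for j in range(1, n + 1):
        d[0][j] = j * c_ins                      # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else c_sub
            d[i][j] = min(d[i - 1][j] + c_del,   # remove element
                          d[i][j - 1] + c_ins,   # add element
                          d[i - 1][j - 1] + sub) # modify (or keep) element
    return d[m][n]


def normalize(d, d_min, d_max):
    """Min-max normalization to [0, 1]; returns 0 when the range is degenerate."""
    return 0.0 if d_max == d_min else (d - d_min) / (d_max - d_min)
```

With unit costs this reduces to the Levenshtein distance; normalizing by the maximum possible distance (e.g. the length of the longer sequence) makes results of different measures comparable, as required by the normalization formula.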

By normalizing the individual measures and calculating distances using various methods, it is possible to obtain results that allow the similarity of processes saved as genetic sequences to be analyzed (1).
Similarity measures based on the quantitative structure constitute an additional set of measures for determining the similarity of processes saved as genetic sequences [15]:
• General numerical similarity measure (GNM) – a measure that defines process similarity as a quotient of the occurrences of the tag types in the two genetic tag sequences:

GNM(P_w, P_p) = Σ_i min(w_i, p_i) / Σ_i max(w_i, p_i) (3)

where: w_i – number of tag instances (triples) of type i in the base genetic tag sequence, p_i – number of tag instances (triples) of type i in the pattern genetic tag sequence.
• Detailed numerical similarity measure (DNM) -the measure which is similar to the general numerical similarity measure, except that the tags of the same type are those which have the same type of the initial active elements, including their properties, the same type of transition between the active elements and the same type of the final active element, including their properties.
• General chain similarity measure (GCM) – a measure aimed at defining the level of similarity between two business processes noted as genetic tag sequences, based on the number of similar subchains. The result is the ratio of the number of positive subchain comparisons to the total number of compared subchains:

GCM(P_w, P_p) = k_s / k_a (4)

where: k_s – number of positively compared subchains, k_a – total number of compared subchains. Subchains compare positively if all of their corresponding triples are of the same type.
• Detailed chain similarity measure (DCM) – a measure similar to the general chain similarity measure, except that, when comparing particular tags in subchains, tags of the same type are those which have the same type of initial active element (including its properties), the same type of transition between the active elements and the same type of final active element (including its properties).
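A sketch of the GNM and GCM measures, following only their textual descriptions above, could look as follows. This is an assumption-laden illustration: the treatment of tag types as single symbols, the subchain length `k=2`, and position-by-position subchain comparison are choices made for the example, not details fixed by the paper.

```python
# Sketch of the numerical (GNM) and chain (GCM) similarity measures.
# GNM compares per-type tag counts; GCM compares fixed-length subchains
# position by position. Tag types are modeled here as single characters.
from collections import Counter


def gnm(base, pattern):
    """Quotient of shared tag-type occurrences over all occurrences."""
    w, p = Counter(base), Counter(pattern)
    types = set(w) | set(p)
    num = sum(min(w[t], p[t]) for t in types)
    den = sum(max(w[t], p[t]) for t in types)
    return num / den if den else 1.0


def gcm(base, pattern, k=2):
    """Share of length-k subchains that match at corresponding positions."""
    n = min(len(base), len(pattern)) - k + 1
    if n <= 0:
        return 0.0
    hits = sum(base[i:i + k] == pattern[i:i + k] for i in range(n))
    total = max(len(base), len(pattern)) - k + 1
    return hits / total
```

The detailed variants (DNM, DCM) would differ only in the notion of tag-type equality, comparing the full triple of initial element, transition and final element with their properties rather than a coarse type symbol.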
The method based on the normalized Euclidean distance is used to group the processes:

d_E(P_A, P_B) = sqrt( Σ_k (d_k_NORM(P_A, P_B))^2 ) (6)

where d_k_NORM(P_A, P_B) are the normalized values of the individual similarity measures. For the purpose of process clustering, it is possible to use the method of determining a permissible neighborhood, understood as the set of processes whose normalized distance from a given process is smaller than a defined threshold n:

N(P_A) = { P_B : d_E(P_A, P_B) < n } (7)

as well as the method of defining the central element of a group G, i.e. the process for which the sum of the distances to the other processes in the group is smallest:

P_C = arg min over P_A in G of Σ over P_B in G of d_E(P_A, P_B) (8)
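The neighborhood and central-element definitions above translate directly into code. The sketch below assumes a precomputed symmetric matrix of normalized pairwise distances; the function names and the example matrix are illustrative only.

```python
# Sketch of neighborhood clustering and central-element selection over a
# precomputed normalized distance matrix: a neighborhood collects the
# processes closer than a threshold n, and the central element minimizes
# its summed distance to the other members of the group.

def neighborhood(dist, i, n):
    """Indices of processes within normalized distance n of process i."""
    return [j for j, d in enumerate(dist[i]) if j != i and d < n]


def central_element(dist, group):
    """Index in `group` whose total distance to the other members is smallest."""
    return min(group, key=lambda i: sum(dist[i][j] for j in group if j != i))
```

For instance, with three mutually close processes and one outlier, the neighborhood of the first process at n = 0.25 contains the other two close processes, and the central element of that group is the process most equidistant from the rest.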

Model similarity analysis of genetic sequences of business processes
By transforming processes into genetic tag sequences in linear form, it is possible to apply methods for calculating similarity measures to the resulting tag sequences. The similarity of the processes as genetic tag sequences was analyzed using the similarity measures for tag sequences outlined in chapter 3, points 1–19. Sample results of the similarity measures were obtained for the processes shown in Fig. 2 – Fig. 7, using the Levenshtein similarity metric, as well as the results of the similarity measures of the gtP1–gtP6 processes for the general numerical similarity measure (GNM) and the general chain similarity measure (GCM). (The result tables are not reproduced here.)
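A pairwise Levenshtein-similarity matrix of the kind tabulated for the gtP1–gtP6 processes can be computed as follows. The sequences passed in are stand-ins; the actual genetic tag sequences of gtP1–gtP6 are not reproduced in this text.

```python
# Illustrative computation of a pairwise Levenshtein-similarity matrix
# for a set of genetic tag sequences, with the distance normalized by
# the length of the longer sequence so that similarity lies in [0, 1].

def levenshtein(a, b):
    """Classic edit distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Levenshtein distance normalized to a similarity in [0, 1]."""
    m = max(len(a), len(b))
    return 1.0 if m == 0 else 1.0 - levenshtein(a, b) / m


def similarity_matrix(seqs):
    """Symmetric matrix of pairwise similarities with a unit diagonal."""
    return [[similarity(a, b) for b in seqs] for a in seqs]
```

Such a matrix is exactly the input needed by the neighborhood-based clustering and central-element selection described earlier.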

Conclusions
The above material outlines the results of studies conducted in the field of the analysis of business processes and of systems for business process analysis.
The presented subject relates to business processes, in particular their modeling and analysis, which constitute a relatively new area of research in both IT and management. The application of the described processes in both these fields stimulates their dynamic development, as the increased demand for more efficient operations can be a powerful engine of economic growth.
Transformation of the business process notation into a genetic tag sequence, together with the application of different edit-distance-based methods for analyzing the similarity of business processes, allows efficient analytical processing of large numbers of process genetic tag sequences.
The presented aspects of methods for analyzing the similarity of business processes, which constitute the grounds for methods and tools of analytical process processing, should be considered an introduction to further development of this subject in various directions. These directions include issues related to the warehouse of business processes and the exploration of such processes in terms of both their definitions and instances.

Fig. 8. Clustering of processes for n = 0.25 (green box) and definition of the central process as gtP3 (red box), Levenshtein similarity. The similarity measures and distances were calculated analogously, as average values of the similarity methods outlined in chapter 3, points 1–19.