Analytical processing system of business processes, methods of exploration of business process definitions and instances

The article addresses systems, methods and mechanisms for storing process definitions and instances for the purpose of their analytical processing: process warehouses and their general architecture, including tools for extract, transform, load (ETL) operations as well as for analytical processing and exploration of the stored process definitions and instances, i.e. process mining. Furthermore, the article describes a mechanism for storing processes with the use of a network database, and one of the methods for examining process similarity through genetic tagging of processes.


Introduction
The presently applied methods of modeling, designing and creating IT systems are based on three basic approaches: structural, object and process. In the structural and object approach, the system structure and resources come to the fore, whereas the issue describing the processing dynamics stays in the background, even though this aspect is equally important in terms of a complete and consistent description of the modeled system. The situation is different in case of the process approach, where the priority is the description of the processing dynamics in the form of business process models, whereas the system models as well as the descriptions of the system structures and resources recede into the background.
The most popular business process notation currently in use is the BPMN v2.0 (Business Process Model and Notation) [1][2][3] standard, developed by the BPMI (Business Process Management Initiative) and OMG (Object Management Group) in January 2011. The standard defines a set of symbols and their semantics that make it possible to model a business process definition as a diagram (Fig. 1). The graphical notation of the process presented in the BPMN standard is human-readable. For a notation that allows unambiguous interpretation, the following formats are used: XPDL (XML Process Definition Language) [4], developed by the WfMC (Workflow Management Coalition) and adapted for exchanging process definitions between various systems, and BPDM (Business Process Definition Metamodel), developed by OMG, which describes metamodels defining the concepts, their correlations and their semantics needed to exchange business process models between different modeling tools. Both formats are based on data notation in XML (Extensible Markup Language), defined using XSD (XML Schema Definition) and XMI (XML Metadata Interchange).
The modeling of business processes is the stage in which the model describing the operations of a given organization, or of merged or collaborating fragments of the modeled reality, is defined. To carry out the modeled activities in a way that allows both their actual, physical realization and the necessary or desired efficiency, it is crucial to apply automation tools. This group includes process automation systems (workflow systems), in particular workflow management systems (WFMS) [5]. The main purpose of such IT systems is to process the process instances according to the modeled business process definition. It should be noted that the main task of a WFMS is to process the processes in accordance with their definitions, not to perform the activities they describe. Implementing systems independent of the WFMS, or specialized services, are responsible for performing the activities and work described in the processes.
The basic scope and tasks of IT systems for processing business processes have been presented by the WfMC, which defines workflow in the following way: "The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules" [6].
The workflow management system (WFMS), in turn, is defined by the WfMC as: "A system that defines, creates and manages the execution of workflows through the use of software, running on one or more workflow engines, which is able to interpret the process definition, interact with workflow participants and, where required, invoke the use of IT tools and applications" [6].

Systems analysis of business processes
The system for business process analysis (OLAP-BP) is categorized as Business Intelligence (BI) for solutions based on, or significantly supported by, business processes [7][8][9][10]. Process modeling is an activity that requires constant modification, improvement and optimization of the processes. The variability of the environment, legislation, organization and economic situation makes it necessary to continuously change and upgrade a process in order to increase its broadly understood efficiency. The PDCA model [11] is used for business process management. The model describes the process life cycle and consists of four main steps: Plan, Do, Check, Act. It presents a closed cycle of process management, in which the processes are continuously modified and upgraded. The developed processes may be automated, analyzed and explored through process automation systems (workflow systems) (Fig. 2) [1,3], [5][6][7][8][9][10], [17][18][19][20][21][22][23][24][25][26], [28,33]. The use of a process warehouse, including process exploration, makes it possible to perform an in-depth analysis of the processes gathered in the warehouse [7,8] by applying special process exploration tools, which may use various methods, e.g. analytical, forecasting, simulation [12], structural or semantic ones. The process warehouse and the process exploration tools should be included in the group of Business Intelligence systems.

Process mining
Fig. 2. Execution (workflow system) and analysis (process system) architecture [18]

The process warehouse in an organization is aimed at streamlining management and decision-making by providing current data and information on the implementation of business processes. The main tasks of the process warehouse include, among others [18]:
• integration with the process runtime environment, allowing data to be transferred both in time cycles and online;
• extraction and upload of processes and process instances;
• processing of data containing the process definitions and instances derived from the process runtime environments, in cyclic and event (online) modes;
• storage of data on business processes, including, among others:
  • collection of business process definitions and instances unified in terms of structure and semantics,
  • collection of metadata and semantic data concerning business processes,
  • collection of intermediate data and results of analytical processing and process exploration;
• storage of these data in forms that allow their analytical processing with the expected performance, in particular:
  • analysis of deviations of the process instances from sample paths,
  • identification of deadlocks and bottlenecks in the process instances,
  • analysis of execution times of the process instances and their particular components,
  • analysis of the decisions made during the process implementation,
  • identification of so-called dead processes, or process fragments and resources, which are not executed or are executed very rarely,
  • definition and analysis of process similarity measures as well as event sequences,
  • identification of new processes on the basis of the process execution logs,
  • identification of patterns as well as rare and frequent event sequences;
• provision of the apparatus and tools for analytical processing and exploration of processes.
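Two of the analytical tasks listed above, the analysis of execution times and the analysis of deviations from sample paths, can be illustrated with a minimal Python sketch. The event log layout (instance identifier, activity name, ISO timestamp), the sample data and the function names are assumptions made for illustration only; they are not taken from the described system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (instance_id, activity, ISO 8601 timestamp).
EVENT_LOG = [
    ("i1", "register", "2024-01-01T09:00:00"),
    ("i1", "verify", "2024-01-01T09:30:00"),
    ("i1", "approve", "2024-01-01T11:00:00"),
    ("i2", "register", "2024-01-01T10:00:00"),
    ("i2", "approve", "2024-01-01T10:20:00"),
]

# Assumed sample (reference) path of the process definition.
REFERENCE_PATH = ["register", "verify", "approve"]


def traces(log):
    """Group events by process instance and order them by timestamp."""
    by_instance = defaultdict(list)
    for instance_id, activity, timestamp in log:
        by_instance[instance_id].append((datetime.fromisoformat(timestamp), activity))
    return {instance_id: sorted(events) for instance_id, events in by_instance.items()}


def execution_times(log):
    """Return the total duration of every instance in seconds."""
    return {
        instance_id: (events[-1][0] - events[0][0]).total_seconds()
        for instance_id, events in traces(log).items()
    }


def deviating_instances(log, reference_path):
    """Return the instances whose activity sequence differs from the sample path."""
    return sorted(
        instance_id
        for instance_id, events in traces(log).items()
        if [activity for _, activity in events] != reference_path
    )


if __name__ == "__main__":
    print(execution_times(EVENT_LOG))
    print(deviating_instances(EVENT_LOG, REFERENCE_PATH))
```

In this sample log, instance i2 skips the "verify" activity, so it is the only one reported as deviating. A real process warehouse would read such events from the unified process instance store rather than from an in-memory list.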

Process analysis system
The process warehouse is a system which, through its software, analysis methods and process exploration, allows efficient analytical processing and exploration of the collected business processes and process instances. Fig. 3 shows the general architecture model of the process warehouse, in which the following basic subsystems may be distinguished:
• extract, transform and load of process definitions and executed business process instances from different sources and in various notation formats, together with their structural and semantic unification,
• process warehouses as a thematic composition of the unified processes and process instances,
• analysis and exploration of processes and process instances,
• access to the process warehouse, allowing interactive querying with answers in the form of reports, tables, charts, etc.

Process warehouse -analytical composition of the process definitions and instances
The process warehouse is an IT tool used for collecting, accessing and efficient processing of the business processes and process instances for the purpose of their analysis and exploration. The physical composition of the process warehouse includes the components of the process definitions and instances in the form of unified data, metadata, semantic data and analytical data (Fig. 4).
Warehouse of the unified processes and instances: a component responsible for storing data on the process definitions and instances in accordance with current standards, e.g. XPDL [4,14], MXML [16], XES [15]. It is the basis for generating the data on processes stored in the analytical data warehouse. As part of the preliminary work, two variants of the warehouse structure were developed and tested: the original variant, based on a physical warehouse of matrix process structures, and a warehouse using the Neo4j graph database [31].
Warehouse of metadata and semantic data: a component responsible for storing semantic models expressed in ontological languages. Among other things, it makes it possible to express complex links between the model elements, to express the logical equivalence of terms and instances, and to apply the open world assumption by introducing a specific manner of interpreting the lack of knowledge in the model.
Correlations between various elements of the process warehouse: a component that implements the metamodel of the process warehouse and is responsible for storing data on the links binding the individual elements of the process warehouse, ensuring its coherence.
Analytical data and process warehouses: a component responsible for storing data on the processes in the form of graphs and networks represented as matrices, using network databases. This way of representing data on the process definitions and instances makes it possible to apply a mathematical apparatus to the process analyses and improves the performance of analytical processing.
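The matrix representation of a process graph mentioned above can be sketched as follows. This is a minimal illustration in plain Python, assuming a hypothetical process with six active elements; the element names and helper functions are not taken from the described warehouse.

```python
# Hypothetical process definition: active elements and control-flow edges.
ELEMENTS = ["start", "check order", "gateway", "ship", "reject", "end"]
FLOWS = [
    ("start", "check order"),
    ("check order", "gateway"),
    ("gateway", "ship"),
    ("gateway", "reject"),
    ("ship", "end"),
    ("reject", "end"),
]


def adjacency_matrix(elements, flows):
    """Build a 0/1 matrix where matrix[i][j] == 1 means a flow from element i to j."""
    index = {name: position for position, name in enumerate(elements)}
    matrix = [[0] * len(elements) for _ in elements]
    for source, target in flows:
        matrix[index[source]][index[target]] = 1
    return matrix


def out_degrees(matrix):
    """Number of outgoing flows per element (row sums)."""
    return [sum(row) for row in matrix]


def in_degrees(matrix):
    """Number of incoming flows per element (column sums)."""
    return [sum(column) for column in zip(*matrix)]


if __name__ == "__main__":
    matrix = adjacency_matrix(ELEMENTS, FLOWS)
    for name, row in zip(ELEMENTS, matrix):
        print(f"{name:12s} {row}")
    print("out-degrees:", out_degrees(matrix))
    print("in-degrees: ", in_degrees(matrix))
```

Once the process is held as a matrix, structural properties such as branching points (rows with more than one outgoing flow) reduce to simple row and column operations, which is the kind of mathematical apparatus the matrix form is meant to enable.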

Process similarity method by genetic tagging
One of the main groups of process analysis methods consists of methods for examining similarities between business process definitions and instances. This group includes methods based directly on graph and network theory [9,27,29,30] as well as mixed methods. These methods are usually computationally complex, which makes their application to a large number of process definitions and instances costly. The process similarity method by genetic tagging is therefore aimed at increasing the efficiency of the similarity analysis for a significant number of processes. The method is based on transforming the process from a graph-based form into a sequence of elements (genes) that corresponds to the model. The similarity analysis then comes down to comparing two "genetic" sequences of the processes.
The first step is to transform the notation of the process definition from the graph form into a linear tag sequence. The transformation must be based on an ordering relation that maps two different notations of the same process definition onto one identical process tag sequence.

G = (V, E)    (1)

The process is defined as a graph G, in which the set of vertices V is the set of active process elements (among others: processes, activities, gates, events) and the set E represents the flow of control and messages between the active process elements, including the definition f of such a flow. An element of the set of vertices V is defined as an ordered pair v = (s, g), where s is the semantically unified feature of the process element and g is the type of the active process element.
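The transformation from the graph form to a linear tag sequence can be sketched in Python as below. The sketch assumes that a tag is built from one flow as a triple (initial element, flow type f, final element) and that the ordering relation is a plain lexicographic sort; both are illustrative assumptions, since the ordering actually used by the method is not specified here. The names Element, Tag and tag_sequence are hypothetical.

```python
from typing import NamedTuple


class Element(NamedTuple):
    s: str  # semantically unified feature of the active element
    g: str  # type of the active element (activity, gate, event, ...)


class Tag(NamedTuple):
    src: Element  # initial active element
    f: str        # type of flow between the elements
    dst: Element  # final active element


def tag_sequence(flows):
    """Turn the flows of a process graph into a canonically ordered tag sequence.

    Assumption: lexicographic order on (type, feature) of the initial element,
    the flow type, and (type, feature) of the final element.
    """
    tags = [Tag(src, f, dst) for src, f, dst in flows]
    return sorted(tags, key=lambda t: (t.src.g, t.src.s, t.f, t.dst.g, t.dst.s))


# Two notations of the same hypothetical process, listing the flows in a different order.
START = Element("start", "event")
CHECK = Element("check order", "activity")
END = Element("end", "event")
NOTATION_A = [(START, "sequence", CHECK), (CHECK, "sequence", END)]
NOTATION_B = [(CHECK, "sequence", END), (START, "sequence", CHECK)]

if __name__ == "__main__":
    print(tag_sequence(NOTATION_A) == tag_sequence(NOTATION_B))
```

Because the order depends only on the content of the tags, both notations yield the same sequence, which is the property the transformation is required to have.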
Each element (tag) of the process genetic tag sequence is a triple made up of an initial active element, the transition between the active elements and a final active element; the process genetic tag sequence is a well-ordered sequence of such tags.

[Figure: transformation of the process shown in Fig. 5]

For example, a slight modification of the process presented in Fig. 5 may be introduced by adding one activity, which yields the process shown in Fig. 9.

[Figure: process genetic tag sequence for Fig. 9]

The process genetic tag sequence prepared in this way is the linear input data for establishing the similarity measures. The basic process similarity measures have been developed on the basis of the similarities established for the corresponding process genetic tag sequences:
• General numerical similarity measure: defines the process similarity as a quotient of the occurrences of the tag types in the particular process genetic tag sequences.
• Detailed numerical similarity measure: similar to the general numerical similarity measure, except that tags are considered to be of the same type only if they have the same type of initial active element, including its properties, the same type of transition between the active elements and the same type of final active element, including its properties.
• General chain similarity measure: defines the level of similarity between two business processes noted as genetic tag sequences, based on the number of similar subchains. The result is the ratio of the number of positive subchain comparisons to the total number of compared subchains. A comparison of two subchains is positive if all of their corresponding triples are of the same type.
• Detailed chain similarity measure: similar to the general chain similarity measure, except that, when comparing particular tags in the subchains, tags are considered to be of the same type only if they have the same type of initial active element, including its properties, the same type of transition between the active elements and the same type of final active element, including its properties.
The values of these similarity measures are scalars in the range [0,1], where 1 means that the compared processes are identical in terms of the given similarity measure. Applying the aforementioned process similarity measures, based on the similarities of the genetic tags, to the processes shown in Fig. 5 and Fig. 6 gives the following values:
• general numerical similarity measure: 0.9091,
• detailed numerical similarity measure: 0.9091,
• general chain similarity measure: 0.7532,
• detailed chain similarity measure: 0.7532.
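The four measures can be approximated with the following Python sketch. The exact formulas behind the values above are not given in the text, so the sketch relies on stated assumptions: a tag is a tuple (initial type, initial feature, flow type, final type, final feature); the numerical measure is the number of shared tags (counted as a multiset) divided by the length of the longer sequence; the chain measure is the share of subchains of length k of the longer sequence that also occur in the other one. The sample sequences are hypothetical and are not the processes from Fig. 5 and Fig. 6.

```python
from collections import Counter


def general_key(tag):
    """General variant: compare only element types and the flow type."""
    src_type, _, flow_type, dst_type, _ = tag
    return (src_type, flow_type, dst_type)


def detailed_key(tag):
    """Detailed variant: compare types together with element properties."""
    return tag


def numerical_similarity(seq_a, seq_b, key):
    """Shared tags (as a multiset) divided by the length of the longer sequence."""
    if not seq_a and not seq_b:
        return 1.0
    count_a = Counter(key(tag) for tag in seq_a)
    count_b = Counter(key(tag) for tag in seq_b)
    shared = sum((count_a & count_b).values())
    return shared / max(len(seq_a), len(seq_b))


def chain_similarity(seq_a, seq_b, key, k=2):
    """Share of length-k subchains of the longer sequence found in the other one."""
    def subchains(sequence):
        keys = [key(tag) for tag in sequence]
        return [tuple(keys[i:i + k]) for i in range(len(keys) - k + 1)]

    longer, shorter = sorted((subchains(seq_a), subchains(seq_b)), key=len, reverse=True)
    if not longer:  # both sequences are shorter than k
        return 1.0
    available = set(shorter)
    return sum(chain in available for chain in longer) / len(longer)


# Hypothetical process and its variant with one added activity "X".
P1 = [
    ("event", "start", "sequence", "activity", "A"),
    ("activity", "A", "sequence", "activity", "B"),
    ("activity", "B", "sequence", "event", "end"),
]
P2 = [
    ("event", "start", "sequence", "activity", "A"),
    ("activity", "A", "sequence", "activity", "X"),
    ("activity", "X", "sequence", "activity", "B"),
    ("activity", "B", "sequence", "event", "end"),
]

if __name__ == "__main__":
    for name, key in (("general", general_key), ("detailed", detailed_key)):
        print(name, numerical_similarity(P1, P2, key), chain_similarity(P1, P2, key))
```

For these two sample sequences the general numerical measure is 0.75 and the detailed one is 0.5, which shows how the detailed variant penalizes the added activity more strongly because element properties take part in the comparison.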

Conclusions
The above material presents the results of studies carried out in the field of modeling, implementation and analysis of systems based on the process approach, as well as of systems for the analysis and automation of business processes. The subject of business processes, in particular their modeling, analysis and automation, is a relatively new area of research in both IT and management. However, the implementation of the described processes in both these fields stimulates their development, as the increased demand for more efficient activities may be a powerful engine for economic growth.
The presented aspects of the systems, tools and methods of analytical business processing should be considered as an introduction to further development of this subject matter in various directions. The directions include issues related to the tools and methods for collecting data on the process definitions and instances as well as tools and methods for analytical processing in different dimensions and at different levels.