Detecting Concept Drift in Resource-Service Sequence for Collaborative Task

To discover business bottlenecks caused by market changes, especially in collaborative tasks, an approach is proposed to detect concept drift in the usage of resource services in business processes. First, the influence degree between features of resource services in a Resource-Service Sequence (RSS) is measured. Second, by mining business data, the influence relationships between resource services are resolved for different time windows. Then, the influence degrees are grouped into clusters, each of which is treated as a concept; a change of concept between time windows is called concept drift. Finally, RSSs with concept drift are derived from the feature sequences. Simulation results show the validity of the proposed approach.


Introduction
Advances in cloud computing and the Internet of Things are gradually changing the modes of business management and work, turning activities carried out by a single organization into collaborative tasks across multiple organizations [1]. In a collaborative task, a workflow management system is used to integrate enterprise business, information, and resources to accomplish common tasks. Taking manufacturing as an example, in the collaborative manufacturing mode the manufacturing process is no longer owned and operated by a single enterprise; many enterprises participate in it, including product design companies, component suppliers, and sales companies. In a collaborative task system, resources are published as services on a public cloud platform so that they can be invoked conveniently during the execution of business activities; such resources are called resource services. As business activities execute, temporal relationships arise between the resource services they use. A sequence of resource services with temporal relationships is called a Resource-Service Sequence (RSS). Discovering the set of RSSs is important for optimizing the selection of resource services.
Current research on the relationships between resource services focuses on the following areas. One important approach is based on parameter matching, in which a relationship between services is built by matching their input and output parameters. Based on these relationships, a service-network model is established to provide more customized services for users [2]. A very different approach considers the dependency relationships determined by the functional semantics of web services. These dependency relationships can be organized into AND/OR graphs and applied to service composition and similar problems [3]. With the development of semantic technology, semantic-based relationships among services have become a hot topic [4]. In summary, current approaches to establishing service-relationship models are mainly based on service parameters and semantics. However, these models do not describe the dynamic nature of service relationships. In fact, the use of resource services is reflected by indicators called features. Changes in feature values can be mapped directly to the influence degree between resource services in an RSS. Therefore, by analyzing business data, changes in the influence degree between upstream and downstream resource services in an RSS can be detected. Such a change means that concept drift [5] has arisen in the RSS. Detecting concept drift in RSSs can help enterprises discover bottlenecks in inter-organizational business processes.
This paper is organized as follows. Section 2 analyzes the concept drift problem and presents a mathematical description of it. Section 3 describes the method in detail. Section 4 presents the design of the experiments and the results. Finally, Section 5 summarizes the paper and discusses future work.

Problem analysis
In a collaborative task system, decentralized resources provided by multiple organizations are consolidated on a unified platform to achieve shared tasks [6][7]. The platform uses a workflow management system to coordinate the execution of the business activities in a task, while the distributed resource services are invoked by the workflow system to serve the business activities in an orderly manner, ensuring that the business process completes successfully. As shown in Fig. 1, on the collaborative task platform the process is shared by multiple organizations because software, hardware, human, and technical resources are concentrated there. The relevant definitions for RSS are given as follows.
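Definition 3 (Feature sequence with concept drift, FScd). Suppose $T_1, T_2, \ldots, T_n$ represent a series of time windows, and the influence degrees between features $f_{i-1,j-1}$ and $f_{i,j}$ at $T_{k-1}$ and $T_k$ are expressed as $I_{k-1}(f_{i-1,j-1} \rightarrow f_{i,j})$ and $I_k(f_{i-1,j-1} \rightarrow f_{i,j})$, respectively. If these two influence degrees are classified into two different concepts, where each cluster is treated as a concept, then there is a concept drift for $(f_{i-1,j-1} \rightarrow f_{i,j})$. The FScd consisting of the two features is expressed as $\langle f_{i-1,j-1}, f_{i,j} \rangle$.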
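Definition 3 amounts to comparing the concepts assigned to the same feature pair in consecutive time windows. The following is a minimal sketch of that comparison, assuming each time window has already been assigned a concept (cluster) label; the function name `find_concept_drifts` and the label values are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def find_concept_drifts(concepts: Sequence[Hashable]) -> List[Tuple[int, int]]:
    """Return index pairs (k-1, k) of consecutive time windows whose concepts differ.

    concepts[k] is the concept (cluster) label assigned to the influence degree
    of one feature pair in the k-th time window (0-based).
    """
    drifts = []
    for k in range(1, len(concepts)):
        if concepts[k] != concepts[k - 1]:
            drifts.append((k - 1, k))
    return drifts


# Illustrative labels only (not data from the paper).
window_concepts = ["C1", "C1", "C2", "C2", "C1"]
print(find_concept_drifts(window_concepts))
```

A feature pair would form an FScd in this sketch whenever the returned list is non-empty.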

Algorithm statement
This section introduces the approach, Detecting Concept Drift in Resource-Service Sequences (DCDRSS). In general, it detects whether the influence degrees from upstream resources to downstream resources change within an RSS.

Calculating the influence degree
To compute the influence degrees between features of upstream and downstream resources in an RSS, the change rate is used as the measure of the influence degree. In applied mathematics, the change rate describes how the dependent variable changes as the independent variable changes: the larger the absolute value of the change rate, the greater the effect of the independent variable on the dependent variable, and vice versa. Taking the complete workflow execution process as a unit, the influence degree between features $f_{i-1,j-1}$ and $f_{i,j}$ in the process is defined by formula (1),
where $f_{i-1,j-1}(t)$ and $f_{i,j}(t)$ are the values of the features $f_{i-1,j-1}$ and $f_{i,j}$ at time $t$, respectively. In the business data, the features of resource services vary over time. In other words, the values of each feature form a time series over the course of a workflow execution with a long execution cycle. Consequently, the values $I(f_{i-1,j-1}(t) \rightarrow f_{i,j}(t))$ also form a series of influence degrees.
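As an illustration of using a change rate as the influence degree, the sketch below computes a finite-difference ratio between a downstream and an upstream feature series. This is an assumption made for illustration: the exact form of formula (1) is not reproduced here, and the function name `influence_degrees` and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence


def influence_degrees(
    upstream: Sequence[float], downstream: Sequence[float]
) -> List[Optional[float]]:
    """Change rate of the downstream feature with respect to the upstream feature.

    Assumed form (not taken from the paper): for consecutive samples,
    degree = (downstream[t] - downstream[t-1]) / (upstream[t] - upstream[t-1]).
    None is returned where the upstream feature did not change.
    """
    if len(upstream) != len(downstream):
        raise ValueError("both series must have the same length")
    degrees: List[Optional[float]] = []
    for t in range(1, len(upstream)):
        dx = upstream[t] - upstream[t - 1]
        dy = downstream[t] - downstream[t - 1]
        degrees.append(dy / dx if dx != 0 else None)
    return degrees


# Illustrative feature series only (not data from the paper).
upstream_feature = [1.0, 2.0, 4.0, 4.0]
downstream_feature = [0.5, 1.5, 2.5, 3.0]
print(influence_degrees(upstream_feature, downstream_feature))
```

Returning `None` for an unchanged upstream value is a design choice of this sketch; how such points should be treated would depend on the actual definition.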
Normalization is an important data preprocessing step in data analysis. Because different resource-service features often have different units of measure, the detection results can be greatly affected. To eliminate this impact, it is necessary to standardize the feature data. Here, the following formula can be adopted to unify the metrics. Formula (2) maps all values of the resource-service features into [-1, 1] through a linear transformation.
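A linear mapping into [-1, 1] can be sketched as follows. The min-max form used here is one common linear transformation with that target range and is an assumption; it is not necessarily identical to formula (2). The function name `normalize_to_unit_range` is hypothetical.

```python
from typing import List, Sequence


def normalize_to_unit_range(values: Sequence[float]) -> List[float]:
    """Linearly map a non-empty series into [-1, 1] (assumed min-max form).

    x* = 2 * (x - min) / (max - min) - 1
    A constant series carries no scale information and is mapped to 0.0.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Illustrative values only.
print(normalize_to_unit_range([10.0, 15.0, 20.0]))
```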
The clustering proceeds as follows. The dot-density of an influence degree is computed within its interval $Q_m$, where $s$ is the range of each interval $Q_m$, $I_t, I_k \in Q_m$, and $t \neq k$. The average dot-density is then taken over the influence degrees in $Q_m$, where $n$ is the number of influence degrees in $Q_m$.
III. Amalgamate the intervals $Q_m$ by the following rules: if $|\mathrm{Density}(Q_{m-1}) - \mathrm{Density}(Q_m)| < \mathrm{Density}(C_z)$, where $Q_{m-1} \subseteq C_z$, then $Q_m$ is merged into the cluster $C_z$. For intervals with sparse density, it must be determined whether their influence degrees can be amalgamated into the nearest cluster or should be treated as discrete values.
IV. The cluster center is the influence degree with the highest dot-density value in the cluster.
The value of parameter s in the clustering algorithm depends on the range of $I(f_{i-1,j-1}(t) \rightarrow f_{i,j}(t))$ and its distribution.
The smaller the range of $I(f_{i-1,j-1}(t) \rightarrow f_{i,j}(t))$ and the more concentrated its distribution is, the smaller m is.
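The interval-merging idea of step III can be sketched as below. This is a simplified sketch under stated assumptions, not the paper's algorithm: the interval density is taken as the count of values divided by the interval width, the cluster density used as the merge threshold is the mean density of the intervals already in the cluster, an empty interval always closes the current cluster, and the sparse-interval rule and the cluster-center step are omitted. The function name `cluster_by_interval_density` and the sample values are hypothetical.

```python
from typing import List, Sequence


def cluster_by_interval_density(values: Sequence[float], width: float) -> List[List[float]]:
    """Group values by merging adjacent equal-width intervals of similar density."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not values:
        return []
    lo = min(values)
    n_bins = int((max(values) - lo) // width) + 1
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for v in values:
        idx = min(int((v - lo) // width), n_bins - 1)
        bins[idx].append(v)

    clusters: List[List[float]] = []
    cluster_densities: List[float] = []  # densities of intervals in the open cluster
    prev_density = 0.0
    for interval in bins:
        if not interval:
            cluster_densities = []  # an empty interval closes the open cluster
            continue
        density = len(interval) / width
        if cluster_densities:
            threshold = sum(cluster_densities) / len(cluster_densities)
            if abs(prev_density - density) < threshold:
                clusters[-1].extend(interval)  # merge into the open cluster
                cluster_densities.append(density)
                prev_density = density
                continue
        clusters.append(list(interval))  # start a new cluster
        cluster_densities = [density]
        prev_density = density
    return clusters


# Illustrative influence degrees only (not data from the paper).
degrees = [0.0, 0.2, 0.4, 0.6, 1.2, 1.4, 5.0, 5.2, 5.4]
for cluster in cluster_by_interval_density(degrees, width=1.0):
    print(cluster)
```

In this sketch each returned list plays the role of one concept; the choice of `width` has the same kind of effect as the interval parameter discussed above, with narrower intervals separating values more finely.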

Detecting concept drift in RSS
In collaborative tasks, resource services usually have many features, which makes it more challenging to calculate the influence degrees between resource-service features and to detect concept drift in an RSS.
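One way to derive drifting sub-sequences of an RSS from drifting pairs of adjacent resource services is to join consecutive drifting pairs into maximal contiguous runs. The sketch below illustrates that idea only; it is an assumption and is not the paper's own procedure. The function name `splice_drifting_pairs`, the resource identifiers, and the set of drifting pairs are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def splice_drifting_pairs(
    rss: Sequence[Hashable], drifting_pairs: Set[Tuple[Hashable, Hashable]]
) -> List[List[Hashable]]:
    """Join adjacent (upstream, downstream) pairs with drift into maximal runs."""
    result: List[List[Hashable]] = []
    current: List[Hashable] = []
    for up, down in zip(rss, rss[1:]):
        if (up, down) in drifting_pairs:
            if not current:
                current = [up]  # open a new run at the upstream service
            current.append(down)
        else:
            if current:
                result.append(current)  # close the run at the first stable pair
            current = []
    if current:
        result.append(current)
    return result


# Illustrative sequence and drifting pairs only (not results from the paper).
sequence = ["a", "b", "c", "d", "e"]
pairs_with_drift = {("a", "b"), ("b", "c"), ("d", "e")}
print(splice_drifting_pairs(sequence, pairs_with_drift))
```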

Experiments and results
The collaborative design and manufacturing process of electronic products is used as an example to illustrate the effectiveness of the proposed approach. It covers the whole business process, including the design, manufacture, assembly, and parts supply of electronic products, as shown in Fig. 1. Product design includes hardware design, software design, and mechanical structure design, and mainly uses human resources and technical resources. Product processing largely consists of hardware production and machining. Hardware production, machining, and parts procurement employ hardware resources, human resources, and technical resources, and supply raw materials for product assembly. The business activities of the collaborative design and manufacturing process invoke the resource services shown in Table 1. The features of each resource service are shown in Table 2.

Table 1. Resource services invoked by business activities (resource-service number in parentheses)
…: Overall designer (2)
Hardware design: Overall design scheme (3), Hardware designer (4)
Software design: Overall design scheme (3), Software designer (5)
Mechanical design: Overall design scheme (3), Mechanical designer (6)
Hardware and software debugging: Test engineer (7), Hardware design scheme (8), Software design scheme (9)
Part configuration: Part configuration scheme (10)
Hardware production: Software design scheme (9), Parts (13), Hardware production equipment (14)
Parts procurement: …

Based on the workflow model shown in Fig. 1, the simulation data are generated according to the following rule: the workflow logs, which reflect the execution frequency of business activities, are created by assigning corresponding weights to five execution paths. For comparison, the five paths are executed according to the weight vectors (0.1, 0.2, 0.2, 0.1, 0.4) and (0.2, 0.2, 0.2, 0.2, 0.2), respectively. The SPADE algorithm [11] is then used to mine the frequent RSSs. Testing shows that RSSs with concept drift (RSScds) are found in each frequent RSS. Taking < 1 , 3 , 10 , 19 , 21 > as an example, the detection results are shown in Tables 3 and 4.
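The data-generation rule described above, in which five execution paths are chosen according to assigned weights, can be sketched as follows. This is a minimal sketch that only samples which path each simulated case follows; the function name `simulate_path_choices`, the number of cases, and the seed are hypothetical, and the content of each path (the activities and resource services it invokes) is not modeled.

```python
import random
from collections import Counter
from typing import List, Sequence


def simulate_path_choices(weights: Sequence[float], n_cases: int, seed: int = 0) -> List[int]:
    """Sample the execution path (1-based index) followed by each simulated case."""
    rng = random.Random(seed)  # fixed seed so the simulated log is reproducible
    paths = list(range(1, len(weights) + 1))
    return rng.choices(paths, weights=weights, k=n_cases)


# The two weight vectors used in the experiment.
skewed = simulate_path_choices([0.1, 0.2, 0.2, 0.1, 0.4], n_cases=10000)
uniform = simulate_path_choices([0.2, 0.2, 0.2, 0.2, 0.2], n_cases=10000)

print(sorted(Counter(skewed).items()))
print(sorted(Counter(uniform).items()))
```

Each sampled path index would then be expanded into the sequence of resource services on that path to form the log that is mined for frequent RSSs.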
In Tables 3 and 4, the influence degree between the product project book and the overall design scheme changes by 5.26, because the influence degrees from the cost of the product project book to the overall design scheme and to the credibility of the overall design scheme change by 8.43 and 2.08, respectively. The other entries are interpreted similarly. Running Algorithm RSSCD_Splice yields another RSScd, < 1 , 3 , 10 >. Using the same data, the concept drift detection results of the algorithm in this paper are compared with those of DivideSlope [9]. As shown in Fig. 2, the number of RSScds detected by the proposed algorithm is significantly larger than the number detected by DivideSlope. The main advantage of the proposed algorithm is that influence degrees whose values change greatly in certain time windows are retained and assigned to separate clusters. In contrast, the influence degrees in DivideSlope are averages of the steady change rate. Because non-stationary change rates are neglected, most influence degrees change little over time and are assigned to the same concept.

Conclusion and future work
We propose an approach for detecting concept drift in resource-service sequences for collaborative tasks, based on the influence relationships between features. The change rate between features is computed in different time windows to construct a time series of influence degrees; the influence degrees are then grouped by a clustering algorithm, and each cluster forms a concept. Based on the obtained concepts, the approach detects whether concept drift has arisen in an RSS. Finally, an experiment verifies that the approach detects concept drift in an RSS well. In the future, we intend to investigate other approaches for two-way influence relationships between resource services.