A Novel Approach to Improve Quality Control by Comparing the Tagged Sequences of Product Traceability

Quality control is an essential issue in manufacturing, especially as manufacturing moves toward intelligent manufacturing, which combines the "Internet of Things" (IOT) and "Artificial Intelligence" (AI) to speed up product lines automatically. To monitor product quality automatically, it is necessary to collect and monitor the data generated by sensors, to record the parameters set by machine operators, or to save the types (brands) of materials used when producing products. This study assumes that the traceability sequences of unqualified products differ from those of qualified ones, and that the differing values (or points) within the sequences are what make the products qualified or unqualified. The approach extracts maximal repeats from the tagged sequences of product traceability and, at the same time, computes the class frequency distribution of these repeats, where the classes, e.g. "qualified" or "unqualified", are derived from the tags. Instead of inspecting all of the sequences of product traceability aimlessly, quality control engineers can select the maximal repeats whose frequency distributions are unique to specific classes and then check only the processes corresponding to these repeats. From a practical point of view, however, extracting these maximal repeats and computing their class frequency distributions from a huge amount of tagged sequential data should be regarded as a big-data problem. To make this work practical, this study uses a previous work that is based on the Hadoop MapReduce programming model and for which a U.S. patent application has been filed (US Patent App. 15/208,994). It is therefore expected to be able to handle a huge amount of sequences of product traceability.
Because this approach can narrow down the range for identifying false points (processes) within the product line, it is expected to improve quality control by comparing tagged sequences of product traceability in the future.


Introduction
To speed up production and to reduce the cost of human labor, factories are moving toward Industry 4.0 [1], which adopts robots and combines them with the internet of things (IOT) [2]. It is predictable that the increasing production rate in Industry 4.0 will overwhelm the ability of quality control engineers to respond in time when defective products occur and they need to identify which factors caused those defects. To raise the ability of quality control [3] to meet the requirements of Industry 4.0, it is necessary to set up sensors that automatically collect parameters from both the environment and the machines. Quality control (assurance) engineers can then use these data to trace back the production processes of defective products and to identify false steps within the product line when abnormal products are detected or reported. Because of the increasing production rate achieved with robots, it is beyond the ability of quality control engineers to efficiently identify the false processes that result in these abnormal (false) products without extra analysis tools. A novel approach is desired to overcome this big-data computational problem [4] and to strengthen industrial quality control in the future.
e-mail: jdwang@asia.edu.tw
To improve quality control by reducing the response time needed to identify false points within the product line, this study first collects the sequences of the traceability of products. The traceability of one product, as shown in Fig. 1, is an ordered list (sequence) of elements or records that consist of data generated by sensors, parameters of machines, or the types of materials, collected sequentially while that product is produced. This study assumes that the tag (class) of one product, given by quality control, is determined by its traceability. In other words, all of the traceability sequences of false or abnormal products with the same tag are supposed to contain some specific subsequences, called "markers", that result in the tag of those products. Intuitively, the sequence of product traceability of one product determines the outcome of that product; the markers, some subsequences hidden within its traceability, are the keys that decide its tag, which is the same as the label given by quality control. One class marker, therefore, is expected to be highly correlated with its own class (tag), so that it can provide clues to quality control engineers, who can inspect and identify false points from the subsequence representing that marker instead of scanning the whole sequence of its traceability.
The goal of this paper is to extract class markers by comparing the class frequency distribution of maximal repeats extracted from the tagged sequences of product traceability via the scalable approach of the previous work [5]. The approach proposed in this study reduces the range of the search for factors that result in unqualified products by comparing the tagged sequences of product traceability, where the traceability of one product is the sequential data generated while that product is produced, e.g. values collected by sensors, machine parameters, materials or components used, or the identifiers of operators; the tags are labels given by quality control staff. These class markers, if they can be extracted efficiently, are believed to be valuable to quality control engineers, who can focus on inspecting the steps corresponding to these markers within the sequences of product traceability and thereby locate the false points within the product line faster, instead of surveying all of the steps aimlessly.
The remainder of this paper is organized as follows. Section 2 describes how to generate the tagged sequences for experiments and the method to extract class markers. Section 3 shows experimental results. Section 4 gives discussions and Section 5 gives conclusions and future work.
ICI 2017, https://doi.org/10.1051/matecconf/201820105002

Materials and Methods
This study assumes that each product is labeled with a tag (ClassID) by quality control (assurance) engineers, e.g. one of the five classes "normal", "abnormal I", "abnormal II", "abnormal III" or "abnormal IV", and that a sequence of traceability is generated and collected while that product is produced. Most importantly, for simplicity, one "abnormal" tagged sequence of product traceability is expected to contain one specific subsequence, called a "marker" in this study, that results in the same class label as the one assigned by quality control engineers. That is, whenever the traceability sequence of a product contains one of these markers, its class label is determined by which marker it contains. In the following, Section 2.1 shows how to generate tagged sequences for experiments and Section 2.2 briefly describes the approach to extract maximal repeats.

Tagged Sequences of Product traceability
To obtain sequences of product traceability as an experimental resource for demonstration, 50 sequences of product traceability are generated randomly and one class marker is then inserted into each of these sequences manually. Table 1 shows, for example, 50 sequences (N=50) of product traceability containing 20 steps (k=20), in which each step takes one of the five symbols "a", "b", "c", "d" and "e". For example, the value of the 2nd step in the traceability sequence of the first product "P1" is "S2,b". Each of these sequences is associated with one "ClassID" among "C1", "C2", "C3" and "C4" and contains one hidden marker, as shown in Table 2. For example, class "C3" contains 10 products, and the marker "S2,b S3,a S4,a" is hidden in all of them. Note that these markers are supposed to be unknown in advance and hidden within the sequences of product traceability, and that the lengths of the markers are not fixed but variable.
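The generation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `make_sequence`, `insert_marker` and `generate_dataset` are hypothetical, the insertion is automated here although the study inserts markers manually, and only the C3 entry (10 products, marker "S2,b S3,a S4,a") is taken from the text. All other class sizes and markers are invented placeholders.

```python
import random

SYMBOLS = ["a", "b", "c", "d", "e"]

# Only the C3 entry (10 products, marker "S2,b S3,a S4,a") comes from the text;
# every other size and marker below is a hypothetical placeholder.
CLASS_SIZES = {"C1": 20, "C2": 10, "C3": 10, "C4": 10}
MARKERS = {
    "C1": ["S6,c", "S7,d"],
    "C2": ["S10,e", "S11,e", "S12,a", "S13,b"],
    "C3": ["S2,b", "S3,a", "S4,a"],
    "C4": ["S16,d", "S17,c"],
}


def make_sequence(k, rng):
    """Return k step tokens such as 'S2,b' (step index, random symbol)."""
    return [f"S{step},{rng.choice(SYMBOLS)}" for step in range(1, k + 1)]


def insert_marker(sequence, marker):
    """Overwrite the steps named by the marker tokens, e.g. 'S2,b' sets step 2."""
    result = list(sequence)
    for token in marker:
        step = int(token.split(",")[0][1:])  # 'S2,b' -> 2
        result[step - 1] = token
    return result


def generate_dataset(class_sizes, markers, k=20, seed=0):
    """Return a list of (class_id, tokens) pairs, one per product."""
    rng = random.Random(seed)
    dataset = []
    for class_id, size in class_sizes.items():
        for _ in range(size):
            tokens = insert_marker(make_sequence(k, rng), markers[class_id])
            dataset.append((class_id, tokens))
    return dataset


dataset = generate_dataset(CLASS_SIZES, MARKERS)
```

Because the remaining steps are random, a sequence of one class may by chance also contain the marker of another class; a real experiment would need to check for this after generation.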

Extracting Class Markers from Tagged Sequences
A maximal repeat [8] is regarded as a marker of one class if its class frequency distribution is highly biased, e.g. it appears only in that class and most of the sequences belonging to that class contain it. In other words, a marker of one specific class is a representative pattern for that class. In this study, the sequences of product traceability for experiments are generated randomly, and the markers are inserted into these sequences manually according to their corresponding tags.
To extract these distinctive markers hidden in the classes derived from the tags, this study adopts a scalable approach developed in the previous work [5] that is based on the Hadoop MapReduce programming model [6,7]. That approach has been shown to be capable of extracting maximal repeats from a huge amount of sequential data [9,10], and a patent application has been filed for it [11]. To make the discussion systematic, definitions are given in Section 2.2.1 and the approach of maximal repeat extraction is described briefly in Section 2.2.2.

Definition: Class Marker
In this study, products with their sequences of traceability are divided into k classes $C_i$, $1 \le i \le k$, where the classes are derived from the tags according to which frequency distributions of subsequences are expected to be biased among the classes. Let the length of the sequence of one product traceability be "k". Let $CF(MR_j)$ be the number of classes that contain the maximal repeat $MR_j$. The number of product traceability sequences in class $C_i$ is denoted $n_{C_i}$, and the total number of product traceability sequences is $N = \sum_{i=1}^{k} n_{C_i}$. Let $df(MR_j, C_i)$ be the number of product traceability sequences of class $C_i$ that contain $MR_j$; let $tf(MR_j, C_i)$ be the number of occurrences of $MR_j$ in the product traceability sequences of class $C_i$. Note that $MR_j$ may appear more than once in one product traceability if there are repeated processes within the product line.
Table 1. 50 products with their traceability and tags (ClassID)
Table 2. The statistics of four classes with markers hidden in their corresponding traceability
Let $WC(MR_j, C_i)$, the coverage of one maximal repeat $MR_j$ within class $C_i$, be the percentage of the traceability sequences within $C_i$ that contain $MR_j$. Let $DF(MR_j) = \sum_{i=1}^{k} df(MR_j, C_i)$ and $TF(MR_j) = \sum_{i=1}^{k} tf(MR_j, C_i)$. Intuitively, $MR_j$ is representative for class $C_i$ when the value of $CF(MR_j)$ is 1 and the value of $WC(MR_j, C_i)$ is as high as possible. That is, $MR_j$ appears in only one class $C_i$ and the majority of the product traceability sequences in $C_i$ contain $MR_j$. In this study, for simplicity, one maximal repeat $MR_j$ is regarded as a "marker" of class $C_i$ if $CF(MR_j) = 1$ and $WC(MR_j, C_i) = 100\%$. That is, $MR_j$ appears only in class $C_i$ and all of the traceability sequences in $C_i$ contain $MR_j$.
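The quantities defined above can be computed directly for a given pattern. The following is a minimal sketch on a hypothetical toy data set; the function names `count_occurrences`, `class_statistics` and `marker_class` are illustrative and are not taken from the previous work [5]. WC is computed as the document frequency of the pattern in a class divided by the number of sequences in that class, times 100.

```python
from collections import Counter


def count_occurrences(tokens, pattern):
    """Count occurrences of pattern as a contiguous run inside tokens."""
    m = len(pattern)
    return sum(1 for i in range(len(tokens) - m + 1) if tokens[i:i + m] == pattern)


def class_statistics(dataset, pattern):
    """dataset: list of (class_id, tokens). Returns n, df, tf, WC (percent) and CF."""
    n, df, tf = Counter(), Counter(), Counter()
    for class_id, tokens in dataset:
        n[class_id] += 1
        hits = count_occurrences(tokens, pattern)
        if hits:
            df[class_id] += 1      # sequences of this class containing the pattern
            tf[class_id] += hits   # total occurrences within this class
    wc = {c: 100.0 * df[c] / n[c] for c in n}
    return {"n": dict(n), "df": dict(df), "tf": dict(tf), "WC": wc, "CF": len(df)}


def marker_class(stats):
    """Return the class for which the pattern is a marker (CF = 1, WC = 100%), else None."""
    if stats["CF"] != 1:
        return None
    (class_id,) = stats["df"].keys()
    return class_id if stats["df"][class_id] == stats["n"][class_id] else None


# Hypothetical toy data: two classes with two short sequences each.
toy = [
    ("C1", ["S1,a", "S2,b", "S3,c"]),
    ("C1", ["S1,d", "S2,b", "S3,c"]),
    ("C2", ["S1,a", "S2,e", "S3,c"]),
    ("C2", ["S1,a", "S2,b", "S3,a"]),
]
stats_bc = class_statistics(toy, ["S2,b", "S3,c"])
stats_a = class_statistics(toy, ["S1,a"])
```

In this toy data the pattern "S2,b S3,c" occurs in both C1 sequences and in no C2 sequence, so it has CF = 1 and WC = 100% for C1 and qualifies as a marker, while "S1,a" occurs in both classes (CF = 2) and does not.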

Maximal Repeat Extraction
To identify these markers from a huge amount of sequences of product traceability tagged with a ClassID, this paper adopts the previous work in [5], a scalable approach based on the Hadoop MapReduce programming model [7]. This approach can extract the maximal repeats [8] from a huge amount of tagged sequential data and, at the same time, compute the class frequency distribution of these repeats within the data, where the classes are derived from the tags specified by the users or observers for the purpose of comparison. To mine specific subsequences, called markers in this study, that can be used to distinguish one class from another, this study selects the maximal repeats that appear in only one class and occur in the majority of the product traceability sequences of that class.
To make this study more self-contained, the approach of maximal repeat extraction is described briefly in the following. Under the Hadoop MapReduce programming model, the input tagged sequences are split into several fixed-size partitions, which the Hadoop system distributes to the mappers evenly and automatically. First, each mapper generates all of the suffixes of each sequence and attaches to each suffix the tag of the sequence it is derived from. Secondly, the mappers output "key-value" pairs to the reducers, where the "key" is the fixed-length prefix of each suffix and the "value" is the tagged suffix. By the nature of MapReduce programming, each reducer receives the values, i.e. the tagged suffixes, that share the same key, and these suffixes are sorted in lexicographical order. Finally, the reducers scan the sorted suffixes and extract candidate maximal repeats with "right" boundary validation; at the same time, they compute the class frequency distribution of these repeats via a stack with push/pop operations [5]. Maximal repeats with "left" boundary validation are obtained by the same process as described above, but with the reversed sequences as the input. The maximal repeats are those candidate maximal repeats that pass both the "right" and the "left" boundary validation.
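The map, shuffle and reduce steps described above can be imitated in memory to make the data flow concrete. The sketch below is not the implementation of [5] and does not use Hadoop. Under these assumptions, the names `mapper`, `shuffle` and `reducer` are hypothetical, the key length is set to one token, the end of a sequence is treated as a distinct boundary, and the stack-based counting is replaced by a plain scan for clarity. It relies on the fact that, in a sorted list of suffixes, the common prefix of two adjacent suffixes is a repeat that cannot be extended to the right, which is the "right" boundary condition only.

```python
from collections import defaultdict


def mapper(tag, tokens, key_len=1):
    """Emit (key, (suffix, tag)) for every suffix; the key is a fixed-length prefix."""
    for i in range(len(tokens)):
        suffix = tuple(tokens[i:])
        yield suffix[:key_len], (suffix, tag)


def shuffle(pairs):
    """Group values by key and sort the suffixes, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in groups.items()}


def common_prefix(a, b):
    """Longest common prefix of two token tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def reducer(sorted_values):
    """Return right-validated candidate repeats with per-class occurrence counts."""
    candidates = {
        common_prefix(x[0], y[0]) for x, y in zip(sorted_values, sorted_values[1:])
    }
    result = {}
    for cand in candidates:
        freq = defaultdict(int)
        for suffix, tag in sorted_values:
            if suffix[:len(cand)] == cand:
                freq[tag] += 1  # one occurrence of cand starts at this suffix
        result[cand] = dict(freq)
    return result


# Hypothetical toy input: (tag, tokens) records.
records = [
    ("C1", ["S1,a", "S2,b", "S3,c"]),
    ("C1", ["S1,d", "S2,b", "S3,c"]),
    ("C2", ["S1,a", "S2,b", "S3,e"]),
]
pairs = [kv for tag, tokens in records for kv in mapper(tag, tokens)]
candidates = {}
for key, values in shuffle(pairs).items():
    candidates.update(reducer(values))
```

Running the same steps on the reversed sequences would give the "left" boundary validation; only candidates that pass both checks are maximal repeats.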

Results
In this study, 50 sequences of product traceability are generated and tagged, with specific markers inserted manually, as the experimental resource. Table 3 gives the distribution of the 413 maximal repeats extracted from the 50 tagged sequences in Table 1 according to their "CF" values and lengths. It is observed that there are 99 maximal repeats whose "CF" value is 1, and the lengths of these repeats range from 1 to 8.
To further inspect the WC values of these 99 maximal repeats (CF = 1) for the class in which they appear, Table 4 sorts them by WC value and lists the top 18 maximal repeats whose WC values for one class are greater than or equal to 25%. It is observed that the four class markers shown in Table 2 are extracted and are the same as the first four repeats, whose WC values are 100%. That is, the experimental results show that the previous approach [5] did extract and identify the class markers hidden within the sequences of product traceability.

Discussion
There is still room to make this study more robust and practical in the future. First of all, in this study, one class marker is defined as a consecutive (contiguous) subsequence. This assumption may be too restrictive in real cases, because the false points may not be consecutive and may form several short segments rather than one complete segment. More effort or another strategy is needed to overcome this problem. In addition, the condition that a marker appears in only one class and that all of the sequences in that class contain it can be relaxed by selecting the markers for which the entropy [13] of the class frequency distribution is lower than a given threshold.
Secondly, the value generated at each step in the sequence of product traceability is set to be one of the five symbols "a", "b", "c", "d" and "e". In real practical data, these values may be numeric rather than a small set of symbols. It is necessary to learn how to partition such numeric values into symbols precisely [14], because different transformations might give totally different results.
Thirdly, the experimental resource used in this study contains only 50 sequences of product traceability, and the markers are created and inserted manually. From a practical point of view, tagged sequences from a real-world factory are expected to be used for experiments. Furthermore, the approach proposed in this study is expected to improve quality control in the future by reducing the search range of the candidate false points that result in abnormal products.
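As an illustration of the relaxed condition mentioned above, the Shannon entropy of a class frequency distribution can be computed as follows. This is a sketch under stated assumptions: the function name `class_entropy`, the example counts and the threshold value are hypothetical, and the study does not specify which counts (df or tf) or which threshold would be used.

```python
import math


def class_entropy(class_counts):
    """Shannon entropy in bits of a class frequency distribution given as counts."""
    total = sum(class_counts.values())
    probs = [count / total for count in class_counts.values() if count > 0]
    return sum(-p * math.log2(p) for p in probs)


THRESHOLD = 0.5  # hypothetical cut-off, in bits

only_one_class = class_entropy({"C1": 10})           # fully biased distribution
mostly_one_class = class_entropy({"C1": 9, "C2": 1})
evenly_spread = class_entropy({"C1": 5, "C2": 5})

relaxed_marker = mostly_one_class < THRESHOLD
```

A repeat that appears in exactly one class has entropy 0, and a repeat spread evenly over two classes has entropy 1 bit, so a threshold between these values admits repeats that are strongly but not exclusively tied to one class.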
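One simple way to turn numeric readings into the five symbols is equal-width binning, sketched below. This is only one possible transformation chosen for illustration, not the method of [14]; the function name `equal_width_symbols` and the sample readings are hypothetical.

```python
def equal_width_symbols(values, symbols="abcde"):
    """Map each numeric value to a symbol using equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(symbols) or 1.0  # avoid a zero width when all values are equal
    result = []
    for value in values:
        index = min(int((value - lo) / width), len(symbols) - 1)
        result.append(symbols[index])
    return result


# Hypothetical sensor readings for five steps of one product.
readings = [0.0, 2.5, 5.0, 7.5, 10.0]
step_symbols = equal_width_symbols(readings)
```

Other choices, such as quantile-based bins or bins set by engineering tolerances, would assign different symbols to the same readings, which is why the transformation has to be chosen with care.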

Conclusions
This study provides a novel approach to speed up the process of locating false points within the product line by comparing the tagged sequences of product traceability. The extracted class markers can provide clues for locating false points in the product line more efficiently and can furthermore be stored in a database as false patterns for quality assurance analysis in the future. As the era of Industry 4.0 combined with the internet of things (IOT) and artificial intelligence (AI) is coming, it is worth embedding this approach into quality control services as a package or tool so that monitoring the product line and searching for false points can be done automatically in the future. It is expected that the approach proposed in this study can provide modern industry with a new research direction for quality control (assurance), especially in drug and semiconductor manufacturing.

Figure 1. The traceability of one product attached with a tag.

Table 3. The statistics of maximal repeats extracted from the tagged sequences in Table 1

Table 4. Examples: the statistics of 18 maximal repeats whose CF values are equal to 1 and whose WC values for one class are ≥ 25%