A Method for Recommending Bug Fixer Using Community Q&A Information

It is a very time-consuming task to assign a bug report to the most suitable fixer in large open source software projects. Therefore, it is necessary to propose an effective recommendation method for bug fixers. Most research in this area translates the task into a text classification problem and uses machine learning or information retrieval methods to recommend bug fixers. These methods are complex and over-dependent on fixers' prior bug-fixing activities. In this paper, we propose a more effective bug fixer recommendation method which uses community Q&A platforms (such as Stack Overflow) to measure fixers' expertise and uses fixed bugs to measure the timeliness of fixers' fixing work. The experimental results show that the proposed method is more accurate than most current recommendation methods.


Introduction
In the process of software development, bugs are unavoidable, and it is important to fix them in time. To guarantee software quality, many projects use bug reports to collect and record bugs. A bug triager then assigns each bug report to a list of developers who are qualified to understand and fix it, ranking them according to their expertise [1]. However, with the rapid development of software (especially open source software), vast numbers of bug reports have been produced. According to statistics, from October 2001 to December 2017 the number of bug reports for Eclipse accumulated to about 530 thousand, an average of about 100 bug reports per day. If these bug reports are handled manually by a bug triager alone, it is a very time-consuming and tedious task, which consequently impacts the efficiency of bug fixing and thus increases the cost of the entire project.
To solve the above problems, many researchers have proposed various methods to recommend bug fixers. Most of the methods are based on machine learning and can be divided into three parts: characterizing bug reports, training a classifier, and adjusting the recommendation result. Different researchers have studied different parts. For the first part, Baysal et al. [2] and Zhang T et al. [3] applied the Vector Space Model (VSM) and Latent Dirichlet Allocation (LDA), respectively, to characterize bug reports in a feature space and thereby reduce dimensionality. For the second part, Fabrizio et al. [4], Anvik J [5], and Zhang W et al. [6] used Naïve Bayes, Support Vector Machine (SVM), and k-Nearest Neighbor (kNN), respectively, to train a classifier. For the third part, Jonsson et al. [7] chose the k developers with the highest probability to build a potential fixer recommendation list. Jeong et al. [8] and Bhattacharya et al. [9] built a bug tossing graph from fixed bugs' tossing information and then updated the recommendation list using the graph to recommend a better potential fixer. However, these methods suffer from the obsolete-training-set problem, and the recommended fixer must have previously fixed bug reports similar to the new one.
Information retrieval has also been applied to fixer recommendation. Matter et al. [10] proposed a fixer recommendation method based on the contribution rate, collected from developers' source code commits and bug report keywords. Alenezi et al. [11] investigated the use of five term-selection techniques to select the most discriminative terms in bug reports, with the aim of reducing dimensionality and saving recommendation time. However, these methods cannot recommend a new developer.
Many developers access and contribute information to open community question-answering platforms (CQA), such as Java Forum, Yahoo!, and Stack Overflow. These platforms have collected a lot of information that reflects developers' professional capabilities. Sajedi et al. [12] proposed a method named RA_SSA_Z_score that uses question-answering (Q&A) information on the Stack Overflow platform as evidence of fixers' expertise to recommend fixers. Building on the research of Sajedi et al., this paper proposes a more optimized and efficient fixer recommendation method, which further studies how to use CQA platform information to measure developers' expertise and how to use fixed bugs to measure the timeliness of candidate fixers' fixing work. A bug report is a software document that describes software bugs and helps developers locate and fix them quickly, so as to ensure the quality of software projects. The main components of a typical bug report include the bug ID, its title, the resolution status of the bug (e.g., open, resolved), when it was reported and modified, its reporter, its fixer, and the description of the report [13].

Stack Overflow
Stack Overflow is a popular IT technology Q&A platform. On this platform, users can submit questions, browse questions, and search related content free of charge. Fig 1 shows a question from Stack Overflow. We can see that, for any question, Stack Overflow uses specific tags to identify the domain of the question, which helps users locate questions quickly. Each question carries the number of answers, the number of upvotes, and information about the users who answered it. This Q&A information (also called posts), together with tags and upvotes, has great potential for measuring a developer's expertise.

Fixer recommendation method
Compared with traditional software development, software development today is more like a community activity. Developers often work on projects hosted on large-scale software repository platforms that support code sharing [14], such as GitHub. Developers also hold technical exchanges on a number of community Q&A platforms when they encounter technical problems [15], such as Stack Overflow. Therefore, these two platforms have gathered a lot of information that can reflect developers' expertise [16]. Considering the cross-information of common users who are active on both platforms, this paper introduces in detail how to make more reasonable and efficient use of Q&A information on Stack Overflow to recommend bug fixers for GitHub's bug reports. The framework is shown in Fig 2. As Fig 2 shows, the core problem to be solved is how to calculate the expertise score and the time-aware score of candidate fixers.

A method for calculating the expertise score
A developer's answers on Stack Overflow are often evidence that he or she knows the relevant field well. Conversely, a developer's questions on Stack Overflow are evidence of a lack of relevant expertise. Based on this, suppose a developer makes n = a + q posts, q of them questions and a of them answers. To eliminate the influence of different developers' different activity levels, this paper takes the ratio of (a − q) to n as a measure of expertise and obtains QA_score:

QA_score = (a − q) / n    (1)

Given a bug report, QA_score uses the number of posts to reflect developers' expertise to a certain degree, but it ignores the correlation between the posts and the bug report. This paper therefore uses Stack Overflow tags to cross-reference bug reports in GitHub with posts in Stack Overflow. Stack Overflow tags are technical labels, such as Java, JavaScript, etc. These tags represent the main content of a question, and their main function is to divide questions into well-defined categories so that users can search areas of interest. Using Stack Overflow tags to match the text of GitHub bug reports effectively provides developers with a common vocabulary for exchanging information, without requiring preprocessing of the text such as stop-word removal.
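As a minimal sketch (our own illustration, not code from the original study), the activity-normalized expertise measure of formula (1) can be computed as follows:

```python
def qa_score(num_answers: int, num_questions: int) -> float:
    """Formula (1): (a - q) / n with n = a + q.

    Positive values indicate a developer who mostly answers (evidence
    of expertise); negative values indicate one who mostly asks.
    """
    n = num_answers + num_questions
    if n == 0:
        return 0.0  # no posts: no evidence either way (our assumption)
    return (num_answers - num_questions) / n
```

For example, a developer with 3 answers and 1 question gets (3 − 1)/4 = 0.5, while one with 1 answer and 3 questions gets −0.5, regardless of their total activity level.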
In addition, different tags have different relevance to a bug report, so the importance of a tag in the bug report's text varies; the tags cannot be measured by a single uniform standard. This paper therefore sets different weights for different tags. Furthermore, the number of upvotes reflects the quality of a post: the more upvotes a post collects, the more expertise the post is assumed to represent. Based on this, this paper redefines a and q as A_score and Q_score:

A_score = Σ_a (upV_a + 1) · Σ_i w_{a_i, b}    (2)

Q_score = Σ_q (upV_q + 1) · Σ_i w_{q_i, b}    (3)

A_score represents the score of all answers by the candidate fixer on Stack Overflow that are associated with the bug report. upV_a represents the number of upvotes obtained by each answer (plus one for the answer itself), i indexes the tags of the answer, and w_{a_i, b} represents the weight of tag i in bug report b. Q_score represents the score of all questions by the candidate fixer on Stack Overflow that are associated with the bug report. upV_q represents the number of upvotes obtained by each question (plus one for the question itself).
i indexes the tags of the question, and w_{q_i, b} represents the weight of tag i in bug report b. Substituting A_score and Q_score for a and q, respectively, in formula (1), we obtain the measure of a candidate fixer's expertise, ER_score (Expertise Ranking Score):

ER_score = (A_score − Q_score) / (A_score + Q_score)    (4)

Note that the Q&A information used to measure expertise must have been created earlier than the bug report to be fixed; otherwise it is meaningless.
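The expertise score just described might be sketched as follows. This is our own hedged illustration: the post and tag-weight data structures are hypothetical, and the exact form of the per-post sum is an assumption consistent with the definitions above.

```python
def post_score(posts, tag_weights):
    """Formulas (2)/(3): sum over posts of (upvotes + 1) times the total
    weight of the post's tags that occur in the bug report.

    `posts` is a list of (upvotes, tags) pairs; `tag_weights` maps a tag
    to its weight in the bug report (0 for tags that do not match).
    """
    total = 0.0
    for upvotes, tags in posts:
        weight = sum(tag_weights.get(tag, 0.0) for tag in tags)
        total += (upvotes + 1) * weight
    return total


def er_score(answers, questions, tag_weights):
    """Formula (4): Expertise Ranking Score, (A - Q) / (A + Q)."""
    a = post_score(answers, tag_weights)
    q = post_score(questions, tag_weights)
    if a + q == 0:
        return 0.0  # no relevant posts at all (our assumption)
    return (a - q) / (a + q)
```

For instance, a candidate with one 4-upvote answer and one 0-upvote question, both tagged `java` with weight 1.0, gets A_score = 5, Q_score = 1, and ER_score = 4/6 ≈ 0.67.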

A method for calculating the time-aware score
Different software projects require different developer expertise, and as developers work on different projects, their expertise shifts over time [17]. Motivated by the intuition that "more recent evidence of a developer's expertise is more relevant," this paper uses a developer's historical fixing work to measure its timeliness. We define TA_score (Time-Aware Score) as:

TA_score = Σ_j 1 / (d(t_b, t_j) + 1)    (5)

where t_b represents the creation date of the bug report to be fixed, t_j represents the creation date of the j-th bug report fixed by the candidate fixer, and d(t_b, t_j) represents the difference in days between the j-th bug report fixed by the candidate fixer and the current bug report to be fixed. If the two bug reports were created on the same day, d(t_b, t_j) is 0. If a candidate fixer has not fixed any bug report before the current one, the candidate's TA_score is 0.
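A sketch of the time-aware score follows. Note that the reciprocal summand 1/(d + 1) is our assumption, reconstructed from the definitions above (it makes same-day fixes, where d = 0, contribute the maximum value 1 and older fixes decay smoothly):

```python
from datetime import date

def ta_score(bug_date: date, fixed_dates: list) -> float:
    """Formula (5), reconstructed: each bug previously fixed by the
    candidate contributes 1 / (d + 1), where d is the age in days of
    that fix relative to the new bug report. More recent fixes count more.
    """
    score = 0.0
    for t_j in fixed_dates:
        if t_j > bug_date:
            continue  # only fixes dated before the new report count
        d = (bug_date - t_j).days
        score += 1.0 / (d + 1)
    return score
```

A candidate with no prior fixes scores 0, matching the rule stated above; a fix created on the same day as the new report contributes exactly 1.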

Final ranking
The final bug fixer recommendation score ER_TA_score is obtained from formula (4) and formula (5). We define ER_TA_score as:

ER_TA_score = α · ER_score + (1 − α) · TA_score    (6)

In formula (6), α is the parameter that adjusts the relative weight of ER_score and TA_score; this paper restricts α to the range (0, 1). We use ER_TA_score to calculate the matching score of each candidate fixer, rank the candidates by score from high to low, and finally select the top k ranked candidates as potential bug fixers.
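The ranking step can be sketched as below. This is our own illustration; the convex-combination form of the final score is an assumption consistent with a single weight parameter in (0, 1), and the candidate data structure is hypothetical:

```python
def recommend(candidates: dict, alpha: float, k: int) -> list:
    """Rank candidates by formula (6),
    ER_TA_score = alpha * ER_score + (1 - alpha) * TA_score,
    and return the top-k names as potential fixers.

    `candidates` maps a fixer name to an (er_score, ta_score) pair.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: alpha * item[1][0] + (1 - alpha) * item[1][1],
        reverse=True,  # highest matching score first
    )
    return [name for name, _ in ranked[:k]]
```

With a large α the expertise score dominates; with a small α the time-aware score dominates, so tuning α trades off the two kinds of evidence.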

Experiment
In this section, we introduce the experimental process and present the experimental results. We also compare the performance of our approach with that of previous studies.

Data Preparation
To demonstrate the effectiveness of the proposed approach, we use the experimental data of Sajedi et al. [12]. Table 1 shows the details of the two data sets, which were selected from GitHub and Stack Overflow. The bug report data set comes from the 20 GitHub projects (Table 2) with the highest numbers of community members and bug assignees. All community members (CM) must have at least one Stack Overflow activity.

Evaluation measure
This paper uses the average top-k accuracy and MAP (Mean Average Precision) to evaluate the effectiveness of the proposed method. The average top-k accuracy is, over all bug reports to be fixed, the probability that the real fixer ranks in the top k of the recommended list. MAP is a comprehensive, rank-based evaluation measure: the higher the overall rankings of the recommended fixers, the higher the MAP value.
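As a sketch of these two measures (our own illustration, for the common case where each report has exactly one true fixer):

```python
def top_k_accuracy(recommendations: list, true_fixers: list, k: int) -> float:
    """Fraction of bug reports whose real fixer appears in the top-k
    of its recommended list."""
    hits = sum(
        1
        for recs, fixer in zip(recommendations, true_fixers)
        if fixer in recs[:k]
    )
    return hits / len(true_fixers)


def mean_average_precision(recommendations: list, true_fixers: list) -> float:
    """MAP with one relevant item per report: the mean over reports of
    1 / rank of the real fixer (0 when the fixer is absent from the list)."""
    total = 0.0
    for recs, fixer in zip(recommendations, true_fixers):
        if fixer in recs:
            total += 1.0 / (recs.index(fixer) + 1)  # ranks are 1-based
    return total / len(true_fixers)
```

For example, if the real fixer ranks first for one report and second for another, top-1 accuracy is 0.5, top-2 accuracy is 1.0, and MAP is (1 + 0.5)/2 = 0.75.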

Results and analysis
Each bug report used in this experiment has been assigned to a project member; that is, every bug report corresponds to a real bug fixer. In the experiment, we apply the proposed method to every bug report, calculate ER_TA_score, and rank each project member to obtain a recommended list. Finally, we evaluate the effectiveness of the method according to the recommendation results.
This experiment is divided into two parts. In the first part, we use three projects (Julia, www.html5rocks.com and edx-platform in Table 3) with a total of 490 bug reports. The purpose is to select the optimal tag-weight function in formula (2) and formula (3) and to adjust the parameter in formula (6). We set the number of tags matching each bug report and its Q&A information to num, and set the same weight function to w. Table 3 shows the results of this experiment. To further verify the effectiveness of the proposed method, we used the previously tuned optimal parameters and expanded the experimental data set (the remaining 17 projects in Table 2, with a total of 6654 bug reports). In addition, we compared the performance of our approach with that of previous studies, including 1NN, 3NN, 5NN, Naïve Bayes, SVM, and RA_SSA_Z_score. The experimental results are shown in Table 4. From Table 4, we can see that the top-k accuracies of our approach for k = 1, 5, and 10 are 45.51%, 89.80% and 97.13%, respectively. We also obtained a MAP of 0.640. Among the other fixer recommendation methods, 3NN obtained the highest top-1 accuracy at 46.48%. However, for the more important top-5 and top-10 accuracies and MAP, the proposed method obtained the best results in Table 4.
Further analysis shows that the top-5 accuracy reached 89.80%, which indicates that most fixers recommended by our approach rank within the top 5. If we recommend these 5 fixers to the bug triager, compared with the original dozens or even hundreds of candidate fixers, this greatly reduces the triager's workload and saves a lot of time. Besides, compared with other fixer recommendation methods, the implementation of our approach is much simpler: it needs neither training data nor stop-word removal from bug report text. These results fully demonstrate the stability and effectiveness of our approach.

Conclusion
This paper presents an optimized method of recommending software bug fixers using community Q&A information, fully exploiting the cross-information between GitHub and Stack Overflow. Our approach first uses the Q&A information, tags, and upvotes on the Stack Overflow platform to measure candidate fixers' expertise, then uses the creation dates of fixed bug reports to measure the timeliness of fixing work; finally, combining the two, we recommend bug fixers for a total of 7144 bug reports from 20 open source projects on GitHub. The experimental results show that the overall performance of the proposed method is superior to most existing recommendation methods in the field, and it has great application value.