Environmental performance assessment using the evidential reasoning approach: The case of logistics service providers

The demand for environmental performance assessment is increasing among business practitioners, and it has nowadays become one of the key factors for a company’s self-improvement as well as for selecting suppliers and logistics providers. The assessment is, in essence, a multiple criteria decision analysis (MCDA) problem comprised of many quantitative and qualitative criteria. Frequently, the assessment data of some criterion is inevitably imprecise and/or incomplete since the nature of environmental assessment relies heavily on professional and complex methods which might not be fully available for every company. Also, qualitative criteria can only be assessed based upon human judgment. This paper, therefore, proposes an application of the evidential reasoning (ER) approach to the assessment of environmental performance for logistics service providers. The lists of criteria and indicators are adapted from ISO 14031. The ER approach is able to logically aggregate all assessment information, although different forms of data (precise or imprecise; complete or incomplete) are obtained. For this paper, assessment data from two logistics companies were gathered and analysed to illustrate the implementation process. The results are in the form of aggregated belief distributions on a unified set of evaluation grades, and the company can use this information for performance improvement and benchmarking.


Introduction
Due to growing public awareness of environmental issues and stricter laws and regulations in many countries, product manufacturers nowadays intensively focus on auditing and assessing environmental performance of their suppliers, including logistics partners, in order to improve the overall supply chain performance [1]. Since transportation activities have been claimed as one of the major sources of global warming and the depletion of non-renewable energy, one of the emerging criteria for selecting and auditing suppliers is their abilities to preserve the environment [2][3][4]. The logistics industry, therefore, has slightly shifted from pricing competition to value-adding services in terms of green and sustainability capabilities [2,5].
It is generally recognised that performance measurement is crucial for business success. The assessment helps promote continuous improvement, presents achievements of corporate goals of environmental protection, provides data for internal decision making, and ensures that a firm's environmental performance meets or goes beyond legislative requirements, so that the risks of legal fines and penalties are reduced [4,6,7]. ISO 14031 is one of the most well-known standards for environmental management that specifically focuses on performance evaluation. The growing popularity of ISO 14031 can be explained by a number of reasons. Firstly, it can be adopted by any organisation, regardless of size or type of industry. Secondly, since it is a standardised framework, comparison with others is possible. Thirdly, it facilitates the communication of environmental performance in a systematic way and provides comprehensive information for policy decision-making [7,8]. However, the application of ISO 14031 to the logistics industry is still limited in the literature.
Although ISO 14031 could be a choice for logistics firms to start assessing and reporting their environmental performance, a number of criticisms have been made. Firstly, most aspects of performance evaluation suggested in ISO 14031 are still multi-dimensional and qualitative, such as "Wastes" or "Community relations". That means the evaluation still relies on subjectivity and may vary according to personal interpretation. Secondly, since a large number of indicators are exemplified within the standard, it is likely that monitoring too many indicators simultaneously can distract practitioners from the main focuses of their organisations [1]. From this, a method to aggregate various indicators is needed in order for firms to be able to monitor the overall picture of their environmental performance. The obtained aggregated number is also open for business benchmarking in a credible way.
MATEC Web of Conferences 192, 01021 (2018) https://doi.org/10.1051/matecconf/201819201021 ICEAST 2018
Another issue is that ISO 14031 does not provide a solution to deal with uncertainty and/or incompleteness of the assessment information. A number of studies in green supply management implied that many firms feel uncomfortable when reporting information relating to environmental issues [9,10]. This might be because such information relates to pollution and the negative impacts that their operations release into the environment. Furthermore, some firms may lack the capability to conduct a self-assessment in relation to some indicators, particularly the ones that require specific technology, laboratories, or skilled personnel. Moreover, a firm's performance on some indicators may fluctuate over time or vary across its branches or sites. In these cases, the average may not always be a good choice since it may cause the loss of meaningful information. For some qualitative aspects, in addition, the assessor may lack confidence in giving a precise judgement due to the unavailability or incompleteness of evidence. Although a set of evaluation grades or rating scales can be used to transform subjective opinions into numerical data, each assessor may interpret the meaning of each grade or scale differently. Also, assessors might not be confident when concluding that a situation being considered matches a particular grade, and more than one grade may suit the actual practice. Lastly, among the various aspects or criteria in ISO 14031, practitioners within the same organisation, or within the same chain of benchmarking partners, may perceive the importance levels of those aspects differently. This may cause disagreement with regard to the aggregated score obtained. These issues indicate that the method used to determine the overall environmental performance must be able to handle incompleteness and uncertainty of the information in a sensible way.
Furthermore, in the logistics industry, the lack of studies focusing on managers' attitudes toward the importance, or the weight, of each criterion still represents a gap in the literature.
The aim of this study is therefore to propose a logical method to assess and analyse environmental performance for logistics operations. The tool must be able to perform even when uncertainty and/or incompleteness of the assessment information, as well as uncertainty of the criteria weights, exist. To respond to these requirements, the evidential reasoning (ER) approach is employed. ER can theoretically cope with the limitations of other MCDA methods generally used to determine a composite index. For example, the simple additive weighting (SAW) method, which relies on the additive approach, assumes that all indicators are preferentially independent [11,12]. This assumption may not be realistic in conducting an environmental performance assessment. The aggregation based on the weighted geometric mean (WGM) method [13], on the other hand, does not allow for any compensation among indicators. That means the overall performance will be zero when at least one indicator has zero value. This assumption may not always be acceptable in business performance evaluation since a company's weakness in one indicator may be compensated by a very strong point in another, which helps to build a competitive advantage. For data envelopment analysis (DEA) [14] and the technique for order preference by similarity to an ideal solution (TOPSIS) [15], the composite score or the overall performance of a company is determined by comparison with the performance of other companies. They might therefore be applicable only for ranking alternatives but not for a single company's self-assessment [16]. SAW, WGM, DEA, and TOPSIS are also not designed to deal with qualitative indicators or with any uncertainty in the data.
This paper is organised as follows. Following the introduction, ISO 14031 is briefly described in Section 2. Section 3 then explains the algorithms of the ER approach. Section 4 gives details of the research methodology. The applicability of the ER approach is then demonstrated using data from two logistics companies. The results and discussion are given in Section 5, and Section 6 is the conclusion.
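The contrast drawn in the Introduction between SAW and WGM can be illustrated with a minimal sketch. The scores and weights below are hypothetical values chosen only for illustration, and the function names `saw` and `wgm` are not taken from the cited references.

```python
import math

def saw(scores, weights):
    """Simple additive weighting: weighted arithmetic mean of the scores."""
    return sum(w * s for s, w in zip(scores, weights))

def wgm(scores, weights):
    """Weighted geometric mean: product of the scores raised to their weights."""
    return math.prod(s ** w for s, w in zip(scores, weights))

# Hypothetical normalised scores in [0, 1] and weights that sum to one.
weights = [0.5, 0.3, 0.2]
one_zero = [1.0, 1.0, 0.0]   # excellent on two indicators, zero on the third

print("SAW:", saw(one_zero, weights))   # the two strong indicators still count
print("WGM:", wgm(one_zero, weights))   # a single zero score forces the result to zero
```

Under SAW the two strong indicators still contribute to the composite score, whereas under WGM the single zero score drives the composite to zero, which is the behaviour criticised above.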

ISO 14031
ISO 14031 is an international standard that provides a guideline for assessing and monitoring an organisation's environmental performance [1,17]. It describes two dimensions of performance indicators: (i) Management performance indicators (MPIs) and (ii) Operational performance indicators (OPIs). MPIs indicate management capability and efforts to improve an organisation's environmental performance, while OPIs provide direct measurements of such performance (e.g., air emissions; the use of energy and water). MPIs are also linked to economic and social dimensions, such as indicators that show an organisation's capability to educate or encourage staff to follow environmental protection guidelines, and the costs and investments spent on this matter. MPIs are divided into four criteria: (i) Implementation of environmental policies and programmes, (ii) Regulatory compliance, (iii) Financial performance associated with environmental performance, and (iv) Community relations. OPIs, on the other hand, are broken down into nine criteria: (i) Materials, (ii) Energy, (iii) Services supporting organisation's operations, (iv) Physical facilities and equipment, (v) Supply and delivery, (vi) Products, (vii) Services provided by the organisation, (viii) Wastes, and (ix) Emissions [18]. For this study, however, the initial set of OPIs was adapted to suit the characteristics of the logistics industry. As a result, only seven criteria remain to represent OPIs: (i) Materials, (ii) Energy, (iii) Services supporting organisation's operations, (iv) Physical facilities and equipment, (v) Products and services, (vi) Wastes, and (vii) Emissions.
As suggested by ISO 14031, it is not necessary for practitioners to include all criteria of MPIs and OPIs in their evaluation system. Each firm normally has its own interests or concerns that relate to the environment. To implement ISO 14031, for each firm, suitable indicators (and measurement units) under each criterion should be identified [18].

The evidential reasoning approach
The ER approach is one MCDA technique that has been applied to many areas of performance assessment and decision making, such as the self-assessment following  [19], the assessment of new product design [20], the assessment based upon the balanced scorecard [21], the evaluation and selection of research projects [22], and the prioritisation of energy sources [23].
The ER approach was first proposed by Yang and Singh [24] as a procedure to combine multiple criteria based upon the evidence theory, which employs a belief distribution to describe the assessment. The algorithm developed in their article is based on a recursive approach that requires pairwise aggregations. Wang, et al. [25] then proposed an analytical ER algorithm that presents an explicit aggregation function combining all indicators simultaneously. The assessment of each indicator, $e_i$, is conducted towards a set of grades $H = \{H_1, H_2, \ldots, H_n, \ldots, H_N\}$, where $H_{n+1}$ is preferred to $H_n$ ($H_1$ and $H_N$ denote the worst and the best grades, respectively). All grades must be defined to be mutually exclusive and collectively exhaustive [26].
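Suppose that the total performance is assessed through $L$ indicators, $e_i$ ($i = 1, \ldots, L$). The assessment result ($S$) for a company $M$ in terms of each indicator $e_i$ can be expressed in the form of a belief distribution, as shown below.
$$S(e_i(M)) = \{(H_n, \beta_{n,i}(M)),\ n = 1, \ldots, N\}, \quad i = 1, \ldots, L \qquad (1)$$
where $\beta_{n,i}(M) \geq 0$ denotes the degree of belief to which $M$ is assessed to the grade $H_n$ on the indicator $e_i$, and $\sum_{n=1}^{N} \beta_{n,i}(M) \leq 1$ [26]. Wang, et al. [27] extended the ability of the ER approach to handle interval belief degrees as a result of the inclusion of interval data from the assessment. Guo, et al. [28] then demonstrated a case in which interval weights of criteria were inserted into the ER algorithm. When such kinds of uncertainty exist, the belief distribution of company $M$ towards the indicator $e_i$ is also expressed in the interval form, as shown below.
$$S(e_i(M)) = \{(H_n, [\beta_{n,i}^{-}(M), \beta_{n,i}^{+}(M)]),\ n = 1, \ldots, N\}, \quad i = 1, \ldots, L \qquad (2)$$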
In response to a case in which interval belief degrees and interval weights are involved, the analytical ER algorithm is handled through the optimisation functions, as shown below. The next explanations of the ER algorithm follow Wang, et al. [27], Guo, et al. [28], and Sureeyatanapas, et al. [16]. Note that n = 1,…, N and i = 1,…,L.
Objective functions:
where $w_i$ represents the relative weight of the indicator $e_i$. The weights of all indicators must be normalised to sum to one; $m_{n,i}$ is the weighted belief which supports the assessment of $M$ to the grade $H_n$ based on the indicator $e_i$. The probability mass unassigned to any grade after all $N$ grades have been considered is a combination of two parts: $\bar{m}_{H,i}$ and $\tilde{m}_{H,i}$. $\bar{m}_{H,i}$ is the remaining probability mass caused by the fact that an indicator $e_i$ only partly contributes to the assessment relative to its weight. $\bar{m}_{H,i}$ is equal to zero if $e_i$ dominates the assessment or its relative weight is equal to one. $\tilde{m}_{H,i}$ is an unassigned probability mass resulting from the incompleteness of the assessment $S(e_i)$. $\tilde{m}_{H,i}$ is zero if the assessment is complete, or $\sum_{n=1}^{N} \beta_{n,i} = 1$.
For this model, two sets of variables, namely the intervals for (i) criteria weights and (ii) belief degrees, are the inputs of the optimisation process. When a precise assessment is given for some indicators, the minimum and maximum values of the interval are equal to that precise value. After combining all $L$ indicators, the total belief degrees, $\beta_n$ and $\beta_H$, can be generated. $\beta_H$ denotes the degree of belief unassigned to any single grade; it reflects the total degree of incompleteness in the assessment. The aggregated result for each company $M$, or $S(y(M))$, can be described as follows. The number of nonlinear optimisation models required to aggregate the belief degrees of each company is equal to $2(N+1)$ for one combination process (a maximisation and a minimisation for each of the $N$ grades and for $\beta_H$).
The obtained result shows a belief distribution presenting a panoramic view of the overall performance. However, if the aim of the assessment is to rank or compare alternatives, the aggregated belief distribution cannot straightforwardly support those purposes. The determination of the expected utilities is then suggested. Utility reflects the assessor's preference for the value of the indicator being considered [12].
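The aggregation step described above can be sketched for the simplest case in which all weights and belief degrees are precise. The sketch follows the commonly published form of the analytical ER algorithm and is an assumption about the model used here rather than a reproduction of Equations (3)-(17). The function name `er_aggregate` and the sample inputs are hypothetical. With interval inputs, each $\beta_n$ and $\beta_H$ would instead be maximised and minimised over the intervals with a nonlinear solver, which this sketch does not attempt.

```python
from math import prod

def er_aggregate(beliefs, weights):
    """Analytical ER aggregation for precise inputs (a sketch, not the interval model).

    beliefs[i][n] is the belief degree of indicator i for grade n (row sums <= 1).
    weights[i] is the normalised weight of indicator i (weights sum to one).
    Returns (beta, beta_H): aggregated belief per grade and the unassigned belief.
    """
    L, N = len(beliefs), len(beliefs[0])
    # m_{n,i} = w_i * beta_{n,i}: weighted belief supporting grade n
    m = [[w * b for b in row] for row, w in zip(beliefs, weights)]
    # bar m_{H,i} = 1 - w_i: mass left over because indicator i is only one of several
    m_bar = [1.0 - w for w in weights]
    # tilde m_{H,i} = w_i * (1 - sum_n beta_{n,i}): mass left over due to incompleteness
    m_til = [w * (1.0 - sum(row)) for row, w in zip(beliefs, weights)]
    m_H = [a + b for a, b in zip(m_bar, m_til)]

    prod_all = [prod(m[i][n] + m_H[i] for i in range(L)) for n in range(N)]
    prod_H = prod(m_H)
    prod_bar = prod(m_bar)
    k = 1.0 / (sum(prod_all) - (N - 1) * prod_H)   # normalisation factor

    m_n = [k * (p - prod_H) for p in prod_all]     # combined mass for each grade
    m_til_H = k * (prod_H - prod_bar)              # combined mass due to incompleteness
    m_bar_H = k * prod_bar                         # combined mass due to weights only

    beta = [x / (1.0 - m_bar_H) for x in m_n]
    beta_H = m_til_H / (1.0 - m_bar_H)
    return beta, beta_H

# Hypothetical example: two indicators on five grades, the second one incomplete.
beliefs = [[0.0, 0.2, 0.8, 0.0, 0.0],
           [0.0, 0.0, 0.3, 0.5, 0.0]]   # 0.2 of the belief is unassigned
weights = [0.6, 0.4]
beta, beta_H = er_aggregate(beliefs, weights)
print([round(b, 4) for b in beta], round(beta_H, 4))
```

The aggregated degrees of belief and the unassigned degree always sum to one, so incompleteness in the inputs stays visible in the output instead of being averaged away.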
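The utility of a grade $H_n$, or $u(H_n)$, lies between zero and one, and $u(H_{n+1}) > u(H_n)$ since $H_{n+1}$ is preferred to $H_n$. The expected utility of $M$ can be determined, when no uncertainty of any kind is involved, through equation (19).
$$u(M) = \sum_{n=1}^{N} \beta_n\, u(H_n) \qquad (19)$$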
Nevertheless, the expected utility of $M$ will be in the form of an interval if the assessment involves incomplete or uncertain information. The interval is determined by simply assuming that the unassigned belief degree, $\beta_H$, is transferred to the best and the worst grades to represent the upper and lower bounds of the interval, respectively. To determine the maximum and minimum expected utilities, denoted by $u_{\max}(M)$ and $u_{\min}(M)$, the following pair of optimisation models is employed. The comparison of alternatives can be conducted based on the average utility, or $u_{avg}(M)$, which is defined as the midpoint between $u_{\max}(M)$ and $u_{\min}(M)$. The input variables for this model are the aggregated degrees of belief and the utilities of each grade.
Objective functions:
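A minimal sketch of the utility interval described above is given here. It assumes that the aggregated belief degrees lie within given bounds and sum to one together with $\beta_H$, and that $\beta_H$ is valued at the best grade for the maximum and at the worst grade for the minimum. Because the problem is then linear with a single equality constraint, a greedy allocation is sufficient. The function names and the sample numbers are hypothetical and are not taken from Table 4 or Table 5.

```python
def _extreme(lo, hi, coef, maximise):
    """Optimise sum(coef * x) subject to lo <= x <= hi and sum(x) = 1 (greedy)."""
    if sum(lo) > 1.0 + 1e-9 or sum(hi) < 1.0 - 1e-9:
        raise ValueError("belief intervals cannot sum to one")
    x = list(lo)
    remaining = 1.0 - sum(lo)
    # Give the free mass to the most (or least) valuable variables first.
    order = sorted(range(len(coef)), key=lambda j: coef[j], reverse=maximise)
    for j in order:
        step = max(0.0, min(hi[j] - lo[j], remaining))
        x[j] += step
        remaining -= step
    return sum(c * v for c, v in zip(coef, x))

def utility_interval(beta_lo, beta_hi, beta_h_lo, beta_h_hi, utils):
    """Return (u_min, u_max, u_avg) for interval beliefs on grades plus beta_H."""
    lo = list(beta_lo) + [beta_h_lo]
    hi = list(beta_hi) + [beta_h_hi]
    # beta_H counts as the best grade for the maximum, the worst grade for the minimum.
    u_max = _extreme(lo, hi, list(utils) + [utils[-1]], maximise=True)
    u_min = _extreme(lo, hi, list(utils) + [utils[0]], maximise=False)
    return u_min, u_max, (u_min + u_max) / 2.0

# Hypothetical aggregated result with precise beliefs and 0.1 left unassigned.
utils = [0.0, 0.25, 0.5, 0.75, 1.0]
beta = [0.1, 0.2, 0.3, 0.2, 0.1]
u_min, u_max, u_avg = utility_interval(beta, beta, 0.1, 0.1, utils)
print(round(u_min, 4), round(u_max, 4), round(u_avg, 4))
```

When every belief degree is precise, the interval width equals $\beta_H\,(u(H_N) - u(H_1))$, so a fully complete assessment collapses the interval to the single value given by equation (19).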

Research methodology
The assessment process starts by investigating the importance (or the weights) of each criterion in respect of the logistics industry. It is considered that the weights should not be individually determined by a single organisation, although the assessment results are only monitored internally for that firm. This is because, for each firm, the image of "being green" or sustainability does not depend only on their own perception, but it is also judged by the external community. Taking into account how people within the same industry perceive which criteria significantly contribute to the overall environmental image can help firms improve their own performance in a more efficient way. For this study, the weights were elicited from 10 industry experts (hereafter referred to as the "decision makers", or "DMs") based on interviews and the direct rating technique. The 10 DMs include two academic lecturers in logistics management and eight senior managers from different truck fleet logistics companies in Thailand. There are several product types delivered by the eight companies, including metals and steel, cement, concrete, parcels, agricultural products, frozen foods, beverages, etc.
The direct rating is one of the most straightforward weighting methods available. It is highly recommended when the performance evaluation relies on a large number of criteria and indicators, and when a respondent does not feel comfortable using complex weighting methods [29,30]. The direct rating only requires a DM to directly assign the weight to each criterion through a simple scale, such as 0-10 points ranging from "extremely unimportant" to "extremely important" [31]. For this study, each DM was asked to assign the weights to each MPI criterion, followed by each criterion under OPIs. Finally, the weights were assigned to the overall picture of MPIs and then to that of OPIs. The DMs were also asked to provide supporting reasons. The weights of criteria belonging to the same group were then normalised relative to each other (so that they sum to one). Indicators under the same criterion were assumed to take equal weights in the aggregation processes. This is because the list of indicators under each criterion is likely to change depending on a firm's current focuses and strategies.
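The normalisation step can be sketched as follows. The ratings are hypothetical, and taking the minimum and maximum normalised weight across DMs as the weight interval is an assumption about how the ranges were formed, since the text only states that intervals and averages are reported.

```python
def normalise(ratings):
    """Scale one DM's direct ratings of a criteria group so that they sum to one."""
    total = sum(ratings)
    return [r / total for r in ratings]

def weight_intervals(ratings_by_dm):
    """Per criterion: (minimum, maximum, average) normalised weight across all DMs."""
    normalised = [normalise(r) for r in ratings_by_dm]
    columns = zip(*normalised)   # regroup the weights by criterion
    return [(min(c), max(c), sum(c) / len(c)) for c in columns]

# Hypothetical 0-10 ratings of the four MPI criteria given by three DMs.
ratings = [
    [8, 10, 6, 4],
    [7, 9, 5, 3],
    [9, 9, 6, 6],
]
for name, (low, high, mean) in zip(["MPI1", "MPI2", "MPI3", "MPI4"], weight_intervals(ratings)):
    print(name, round(low, 3), round(high, 3), round(mean, 3))
```

Normalising within each DM before comparing across DMs keeps a respondent who rates every criterion highly from dominating the resulting ranges.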
To demonstrate the applicability of the ER approach to the ISO 14031-based environmental assessment, indicators representing each criterion were identified. Each criterion could be assessed through quantitative indicators, qualitative indicators, or both, and the number of indicators did not need to be equal among criteria. A measurement unit was specified for each quantitative indicator, and a set of evaluation grades was determined for each qualitative one. A clear definition referring to objective evidence or feasible situations was attached to each grade in order to standardise the assessment and minimise subjective bias. For each qualitative indicator, the set of grades should cover all feasible practices. The number of grades might differ among indicators, depending on how many distinct levels of performance can be classified for each indicator [32]. Then, actual data sets from two logistics companies were employed to demonstrate the analysis. The data was gathered through interviews with the company managers, who were two of the eight managers providing the weight information. The interviews were conducted during January 2018. For each quantitative indicator, the managers were asked to provide the data for the year 2017. They were informed, however, that they could give interval values if they were uncertain about the answers, or if the data fluctuated over the year. Another option was to skip an indicator if they completely lacked information about it, or if they did not want to disclose such information. In terms of qualitative assessment, the managers were asked to describe the extent of their practices regarding each particular indicator, without being shown the evaluation grades. Belief degrees were then assigned to each grade by the researchers based upon the evidence or the accounts provided by the managers.
Since the aggregation of indicators according to the ER approach requires that all indicators be assessed on the same scale, five general grades (None, Poor, Fair, Good, and Excellent) were employed for this study. To satisfy this condition, the rule-based transformation technique, introduced by Yang [26] and further developed by Wang, et al. [27] and Guo, et al. [28], was employed to transform the various forms of original assessment data into belief distributions attached to the general grades. Due to the word limit, the transformation processes cannot be fully presented in this paper. After the transformation, the results of all indicators were arranged into a form suitable for the ER algorithm to derive the overall performance of each company, using the weight ranges determined by the 10 industry experts. Note that the nonlinear optimisation here was performed using LINGO software.

The importance of the criteria (the weights)
Table 1 shows the relative weights of the criteria elicited from the 10 experts. The weights are presented in the form of intervals to reflect different attitudes towards the importance or the contribution of each criterion to the overall environmental performance. The averages are also presented. For MPIs, in general, most DMs assigned a high weight to MPI2, indicating that they were well aware of the impact of legal fines and penalties on the image of their environmental performance. Many of them stressed that, for a firm to sustain itself in the logistics business, regulations relating to freight and transportation must be strictly observed. A bad image regarding this point would destroy customers' loyalty and decrease prospective sales. MPI1, on average, seemed to be the second-most important management criterion. Many DMs stated that, to be successful in green programmes, environmental concerns must be embedded into a company's policies and strategies.
Energy consumption (OPI2) and Emissions (OPI7) were the two most important operational criteria for the logistics industry. According to the interviews, this sector consumes a large amount of diesel, which leads to the emission of various greenhouse gases. Most of the green logistics practices, therefore, focus on minimising fuel consumption and air pollution. The cost of diesel also represents the main cost of a transportation service, which needs to be minimised. Wastes (OPI6) also received a high weight from many DMs due to a strong association between logistics operations and many kinds of solid wastes, such as used tyres, pallets, and paper boxes.
Regarding the two main aspects of environmental performance, on average, the weight of OPIs seemed to be only slightly higher than that of MPIs. Many DMs believed that actual outputs and outcomes better reflected a company's successful implementation of green initiatives than merely displaying management strategies. On the other hand, a number of DMs expressed the contrasting opinion that management should be the most important since it is the driving mechanism for a company to reach an expected outcome. However, the weight ranges of MPIs and OPIs greatly overlapped each other, indicating that the ranking based on the averages could not be generalised. The intervals of the weights were then input into the optimisation model in order to analyse the environmental performance of the two companies.

Indicators and the transformation into belief distributions
To demonstrate the applicability of the ER approach, 20 indicators (belonging to the four MPIs and the seven OPIs) were generated, as shown in Table 2. They were adapted from the examples of indicators listed in ISO 14031 [18]. A measurement unit was specified for each quantitative indicator, and a set of evaluation grades (with clear definitions attached to each grade) was determined for each qualitative one. Equivalent rules for the transformation were also established for each indicator by referring to the industry's best and worst cases, based mainly on the review of the literature and the interviews. Unfortunately, this information is not displayed in this article due to the word limit.
For this paper, MPI3.1 and MPI1.3 were used to demonstrate the transformation of assessment data into belief distributions for quantitative and qualitative indicators, respectively. The measurement unit of MPI3.1 was a percentage (%). Figure 1 shows the equivalent rule used for the transformation. The value 20% was considered here to be equivalent to the grade "Excellent", while "None" was assigned to 0%. The concept of a linear utility function was then applied to the remaining grades. Let $h$ be the assessment data of MPI3.1 for company M, with $h \in [3.9, 4.9]$, an interval value covered by two adjacent grades ($H_n$ and $H_{n+1}$), as shown in Figure 1. Following Wang, et al. [27], the degrees of belief assigned to $H_n$ and $H_{n+1}$ are then derived as the intervals $[\beta_n^-, \beta_n^+]$ and $[\beta_{n+1}^-, \beta_{n+1}^+]$, and the data was finally transformed into the general form.
For MPI1.3, the equivalent rule shows that the local grade "C" was equivalent to the general grade "Poor" to a degree of only around 0.666, while the remaining 0.334 falls within the grade "Fair". The same concept is also applied to the grade "B". According to the account given by the manager of company N and the definitions attached to the assessment grades A-D for MPI1.3, the company's performance on this indicator falls within the grade "B" to a degree of 60% and within the grade "C" to a degree of 40%. Then, following Yang [26], the assessment data was transformed into the general form. The transformed assessment data are presented in Table 3. Note that $(H_H, 1)$ means that information was lacking, either because sufficient data was unavailable or completely unknown, or because the company did not want to disclose it.
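The two transformations described above can be sketched as follows. Several inputs are assumptions made only for illustration: the intermediate grade values of MPI3.1 (5%, 10%, 15%) are inferred from the evenly spaced linear utility between 0% and 20%, and the rule for local grade "B" is assumed to mirror the stated rule for "C". The function names are hypothetical.

```python
GRADES = ["None", "Poor", "Fair", "Good", "Excellent"]

def quantitative_to_beliefs(h_lo, h_hi, grade_values):
    """Map an interval value lying between two adjacent grade values to interval beliefs."""
    for n in range(len(grade_values) - 1):
        lower, upper = grade_values[n], grade_values[n + 1]
        if lower <= h_lo and h_hi <= upper:
            width = upper - lower
            # The closer the value is to a grade's reference value, the higher the belief.
            worse = ((upper - h_hi) / width, (upper - h_lo) / width)
            better = ((h_lo - lower) / width, (h_hi - lower) / width)
            return {GRADES[n]: worse, GRADES[n + 1]: better}
    raise ValueError("the interval must lie between two adjacent grade values")

def qualitative_to_beliefs(local_beliefs, rules):
    """Spread each local grade's belief over the general grades using equivalence rules."""
    general = {g: 0.0 for g in GRADES}
    for local_grade, belief in local_beliefs.items():
        for general_grade, share in rules[local_grade].items():
            general[general_grade] += belief * share
    return general

# MPI3.1 of company M: h in [3.9, 4.9] percent. Grade values 5, 10, 15 are assumed.
mpi31 = quantitative_to_beliefs(3.9, 4.9, [0.0, 5.0, 10.0, 15.0, 20.0])
print({g: (round(a, 2), round(b, 2)) for g, (a, b) in mpi31.items()})

# MPI1.3 of company N: 60% grade B, 40% grade C.
rules = {
    "C": {"Poor": 0.666, "Fair": 0.334},   # stated in the text
    "B": {"Fair": 0.334, "Good": 0.666},   # assumed mirror image of the rule for C
}
mpi13 = qualitative_to_beliefs({"B": 0.6, "C": 0.4}, rules)
print({g: round(v, 4) for g, v in mpi13.items()})
```

Under these assumptions the interval for MPI3.1 falls between "None" and "Poor", and the two interval beliefs always pair up so that any value of $h$ inside the interval yields belief degrees summing to one.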

Application of the ER approach
The data from Table 3 was then used to analyse the environmental performance of each company under uncertain and incomplete information through the ER nonlinear optimisation model, or Equations (3)-(17). For each company, the aggregation of all indicators under the same criterion was performed first. This was followed by the aggregation of all MPIs. Then, all OPIs were combined. Finally, the overall performance was determined by aggregating the two main aspects. Table 4 shows the aggregated interval degrees of belief for the overall MPI, OPI, and the environmental performance of the two companies. The intervals of the expected utilities were derived by Equations (20)-(24). The results are shown in Table 5. For this study, the utilities of the five general grades (None, Poor, Fair, Good, and Excellent), $u(H_1)$, $u(H_2)$, $u(H_3)$, $u(H_4)$, and $u(H_5)$, were set to 0, 0.25, 0.5, 0.75, and 1, respectively. Table 5 shows that company M receives the higher average utility in terms of the overall environmental performance. However, when considering the ranges of the utilities, the maximum utility of company N (0.74) is greater than the minimum of company M (0.42). This means M does not absolutely dominate N. If all information were precisely known, N's performance could turn out to be greater than M's.
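The dominance check used in this comparison can be written as a one-line rule. The sketch below uses only the two figures quoted above (the minimum of M and the maximum of N), and the function name is hypothetical.

```python
def absolutely_dominates(u_min_a, u_max_b):
    """Alternative A absolutely dominates B only if A's worst case beats B's best case."""
    return u_min_a > u_max_b

# Overall environmental performance: minimum utility of M vs maximum utility of N.
print(absolutely_dominates(0.42, 0.74))   # the utility intervals overlap
```

When the check fails, the ranking rests on the average utilities alone and could change if the missing or imprecise data were resolved.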
On average, M is ranked first in both MPIs and OPIs. When considering MPIs, the expected utility of M completely dominates that of N (the minimum of M is greater than the maximum of N). This is mainly due to the fact that M performs much better than N in both MPI1 and MPI2, which receive high weights from the industry representatives. For M, there are plenty of programmes and projects that relate to the environment. The manager of M also stressed, during the interview, that the company strictly conformed to legal requirements and, as such, the amount of fines and the number of sanctions/warnings were considerably lower than those of N. There are a few indicators on which N performs better than M, such as MPI4.2, which is assessed on a linguistic scale (grades A-E). Company N is able to show evidence of its efforts to improve the quality of life of the community, such as sponsoring local religious activities and contributing to public infrastructure. CSR activities have been embedded in company N's policies, whereas they have only been conducted unofficially by company M. The weight of MPI4, unfortunately, is not high enough to push N above M in terms of the overall management performance.
For OPIs, the ranges of the expected utilities of the two companies greatly overlap each other, and the average of M (0.67) is just slightly higher than that of N (0.50). In general, company M seems to perform better than N, particularly in OPI3 and OPI4. M has a policy of waste segregation. Food waste is used to produce organic fertilisers. Recyclable waste is sold to external parties. There is also a campaign to replace the use of printed documents with email. M, moreover, has a greater percentage of trucks that utilise liquefied petroleum gas (LPG), a kind of low-carbon energy source, when compared to N. On the other hand, when considering OPI2, one of the OPI criteria receiving the highest weight, the manager of company N gave information indicating that the average rate of company N's fuel consumption (diesel) fluctuated between 0.167 and 0.25 litres/kilometre, while the manager of M only gave a single number (around 0.24 litres/kilometre). This implies that M performs slightly poorer than N in terms of energy consumption.

Conclusion
This paper presents a unique application of the ER approach to the assessment of environmental performance towards ISO 14031 for logistics operations. It shows that the ER algorithm is able to facilitate the combination of various criteria and indicators even when uncertainties in criteria weights and belief assignments (assessment data) are involved. Uncertainties and incompleteness in the assessment data are still preserved in the aggregated results. This enhances transparency and minimises any inaccuracies of the results that might be caused by too much effort being put into the process of converting imprecise or uncertain data into crisp numbers. The ER approach also suggests the design of evaluation grades in the assessment of qualitative criteria by linking the grades to pieces of evidence or feasible practices in order to minimise subjectivity. The relative weights of criteria are employed in the ER algorithm, and this indicates a compensatory approach among multiple criteria which is consistent with the nature of business performance evaluation and decision-making. This study, also, is the first effort of its kind to elicit the weights or the importance of ISO 14031 criteria perceived by the logistics industry. Although it is difficult to reach a complete consensus within the industry regarding this matter, the obtained ranges of the weights are able to reveal some general viewpoints with major agreement. Overall, in terms of the management performance (MPIs), the industry tends to praise high contributions of conformance to legal regulations and the establishment of official environmental policies and programmes to the image of environmental management. In contrast, focusing only on building community relations tends to have only a small impact. In terms of operational performance (OPIs), fuel consumption and air emissions seem to be at the top of the most important criteria used to reflect the performance of a logistics operator. 
Understanding major concerns and perceptions of the industry could help create better guidelines for logistics companies to initiate their environmental improvement programmes in a way that is powerful and well-recognised by the community.
The ER approach is effective not only for performance comparison across different companies but also for a single company's self-assessment. The aggregated degrees of belief obtained help companies establish improvement plans. For example, regarding the two sample companies, company M may choose to focus on indicator MPI1.1 (due to its high weight and the fact that the company itself still has only a "Fair" performance) by establishing a team to investigate the feasibility of conducting more environmental initiatives. Company N, on the other hand, still has a low performance in many indicators. In this case, high-weight indicators, such as MPI2.1 and OPI7, should be considered first. For MPI2.1, the company should place emphasis on educating its staff and drivers about the updated laws and regulations that relate to their operations in order to avoid being subject to legal fines and penalties. In terms of OPI7, both companies completely lack the capability to properly estimate the carbon footprints of their transportation services, which nowadays has become one of the key aspects considered by shippers or product manufacturers in their selection of logistics service providers. For unknown information, the ER approach assumes that the actual performance could be any value within the entire range of possible cases, and this lowers the minimums of the expected utilities.
This study does not aim to standardise the list of indicators under each ISO 14031 criterion for the logistics industry. The list of indicators shown in Table 2 only represents the examples for demonstrating the implementation process of the ER approach. Subsequent studies may consider investigating practical indicators that are generally recognised and implemented among logistics service providers in order to promote better performance comparison within the industry, using a general set of indicators.