Forecasting of steel consumption with use of nearest neighbors method

In the process of building a steel construction, its design is usually commissioned to the design office. Then a quotation is made and the finished offer is delivered to the customer. Its final shape is influenced by steel consumption to a great extent. Correct determination of the potential consumption of this material most often determines the profitability of the project. Because of a long waiting time for a final project from the design office, it is worthwhile to pre-analyze the project's profitability and feasibility using historical data on already realized orders. The paper presents an innovative approach to decision-making support in one of the Polish construction companies. The authors have defined and prioritized the most important factors that differentiate the executed orders and have the greatest impact on steel consumption. These are, among others: height and width of steel structure, number of aisles, type of roof, etc. Then they applied and adapted the method of k-nearest neighbors to the specificity of the discussed problem. The goal was to search a set of historical orders and find the most similar to the analyzed one. On this basis, consumption of steel can be estimated. The method was programmed within the EXPLOR application.


Introduction
The development of trade, globalization, the diversified needs of clients, and the development of modern production and computer technologies mean that modern manufacturing and service companies compete in a global market in which the client is the most important. That is why it is important to ensure high quality of products and services, suited to the client's needs and expectations, and to supply them in an acceptable time. From this point of view, improvement of processes inside a company [1,2,3], limitation of waste [4,5], shortening of the production cycle [6] and changeover times, minimization of distribution costs, and improvement of the quality of offered services have become the primary goals of organizations. These goals should be pursued by applying the newest achievements in science and technology: new technologies on the one hand, and methods of decision-making on the other [7,8,9].
Corresponding author: michal.rogalewicz@put.poznan.pl
Large amounts of data gathered in big databases in the course of a company's daily functioning are today not always appropriately used for building knowledge about processes and products. Wasting such a detailed source of information, both about the company itself and about client requirements, preferences, habits and needs, is a fundamental mistake made in certain organizations. It seems that in the era of Industry 4.0, proper acquisition, processing, analysis and archiving of data gain importance and enable a faster reaction to dynamically changing client requirements, a shorter time to market for a product, better estimation of the costs of undertakings, and higher client satisfaction [10,11].
To face these challenges, methods can be used that allow analysis of large datasets and/or automatic extraction of rules and patterns that are not obvious or trivial to a database user: the so-called Data Mining methods.
This paper presents the application of one of the Data Mining methods, the k-nearest neighbors method, to a quick, preliminary estimation of steel consumption for a construction ordered by a client. It facilitates presenting a preliminary offer to a client without the need to prepare a detailed project, which could take as long as several days. Given that a company receives several inquiries daily and commissions the preparation of a detailed project from a design office, it is important to estimate the costs preliminarily and present them to the client. Only after initial acceptance is the actual, detailed project prepared. The studies were carried out in one of the Polish construction companies.

The k-nearest neighbors method
The Data Mining methods fulfill several functions. They enable description of data/objects, classification of objects (assigning them to classes of a dependent variable on the basis of the values of the independent variables describing a given object), grouping of objects (creating groups of objects most similar to each other on the basis of the values of independent variables), search for associations (finding interdependencies between data, based on the co-occurrence of values of particular variables), and realization of the regression task (search for a model of the relation between independent variables and a dependent variable in numerical form) [12,13].
An example of a method that allows both classification and regression tasks to be carried out is the k-nearest neighbors method. This paper focuses on using this method for a preliminary estimation of steel consumption for a quotation inquiry sent by a client, so the regression task is the one being carried out. The k-nearest neighbors method finds a specified number k of objects located closest to a new object and calculates the value of the unknown dependent variable on their basis (usually it is simply the average of the dependent variable over the selected k objects). For quantitative variables, the distance between objects in a multidimensional space is defined using one of several metrics, e.g. Euclidean, Manhattan or Chebyshev. The first is the most commonly used, as it represents the standard way in which humans perceive distance in the real world. For qualitative variables, the "different from" function is used, which takes the value 0 when two objects have the same value of a feature and 1 when the values differ [13,14,15].
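The regression variant described above can be illustrated with a minimal sketch. It is not the EXPLOR implementation: the field names (`width`, `height`, `roof`, `steel_index`), the sample values, and the way the 0/1 "different from" term is added to the squared Euclidean terms are assumptions made for illustration, and the numeric features are assumed to be already scaled to comparable ranges.

```python
import math

def mixed_distance(a, b, numeric_keys, categorical_keys):
    """Euclidean distance on numeric features plus a 0/1 mismatch term
    for each qualitative feature (assumed way of combining the two)."""
    total = 0.0
    for key in numeric_keys:
        total += (a[key] - b[key]) ** 2
    for key in categorical_keys:
        # "different from" function: 0 if the values match, 1 otherwise
        total += 0.0 if a[key] == b[key] else 1.0
    return math.sqrt(total)

def knn_estimate(query, records, k, numeric_keys, categorical_keys, target):
    """Average the target value over the k records closest to the query."""
    ranked = sorted(
        records,
        key=lambda r: mixed_distance(query, r, numeric_keys, categorical_keys),
    )
    neighbors = ranked[:k]
    return sum(r[target] for r in neighbors) / len(neighbors)

# Hypothetical, already-scaled historical records (not data from the paper).
history = [
    {"width": 0.2, "height": 0.25, "roof": "gable", "steel_index": 30.0},
    {"width": 0.3, "height": 0.25, "roof": "gable", "steel_index": 34.0},
    {"width": 0.9, "height": 0.80, "roof": "penthouse", "steel_index": 60.0},
]
new_offer = {"width": 0.25, "height": 0.25, "roof": "gable"}

estimate = knn_estimate(
    new_offer, history, k=2,
    numeric_keys=["width", "height"],
    categorical_keys=["roof"],
    target="steel_index",
)
print(estimate)
```

With k = 2, the two gable-roof records are the nearest neighbors, so the estimate is the mean of their index values.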
The k-nearest neighbors method requires normalization of variables, because otherwise a variable taking large values could completely dominate the distance metric and suppress the influence of the other variables. It is also worth ensuring that many less important variables, on which the studied objects differ greatly, do not dominate highly relevant variables on which the distance is very small. To achieve this, so-called axis stretching in the multidimensional space can be applied, meaning the choice of appropriate weights for the more important independent variables [13].
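A short sketch may help to show how normalization and weighting interact. The source does not state which normalization the authors used, so min-max scaling is an assumption here, as are the function names and the example weights. Applying a weight w to a squared difference is equivalent to stretching that axis by the square root of w.

```python
import math

def min_max_normalize(column):
    """Rescale a list of values to the range [0, 1] (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information about distance.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def weighted_euclidean(a, b, weights):
    """Euclidean distance with a weight applied to each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical raw values: hall width [m] and roof load [kN/m^2].
widths = [12.0, 24.0, 36.0]
loads = [0.5, 1.0, 1.5]

norm_widths = min_max_normalize(widths)
norm_loads = min_max_normalize(loads)

# Illustrative weights: width is treated as four times as important as load.
weights = [4.0, 1.0]

# The same normalized difference (0.5) counts for more on the weighted axis.
d_width = weighted_euclidean([0.0, 0.0], [0.5, 0.0], weights)
d_load = weighted_euclidean([0.0, 0.0], [0.0, 0.5], weights)
print(d_width, d_load)
```

After normalization both features span the same range, so the weights alone decide how strongly each one influences the choice of neighbors.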
The remainder of this paper presents an example of using the k-nearest neighbors method to estimate steel consumption for quotation inquiries coming from clients. The estimation is based on already realized, similar projects archived in a database. The consumption is determined by an index defined a priori. As part of the work, a dedicated software tool, EXPLOR, was designed. It provides the following functions: 1C, MultiC and k-neighbors. They are presented further on, using specific examples.

Assumptions for the used method and application
The aim of the authors' work was to develop a method of supporting decision-making in the estimation of steel consumption for a construction that results from initial discussions with a client. The following assumptions were made regarding the method used and the designed software application:
• the application uses a database of projects already realized by the construction company; at the moment of launching the application, the database had approx. 300 records (where a single record contained data from one specific order realized for a client),
• the most important features, differentiating the projects and having the greatest influence on the level of steel consumption, are:
  - hall width [m],
  - hall height, from ground to bottom of roof construction [m],
  - number of aisles [1 or 2],
  - roof type [penthouse or gable roof],
  - roof load [kN/m²],
• the records regarding halls with a gantry and without it,
• in the case of the k-neighbor function, the user was given a choice of the number of objects ("neighbors") on the basis of which the estimation is conducted; the Euclidean distance was used,
• the application realizes three tasks, which are presented in greater detail in chapter 3.2.
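The features listed above suggest a simple record layout. The sketch below is one possible representation, not the actual EXPLOR database schema: the class name, the field names, and the `steel_index` field for the dependent variable are hypothetical, and the units of the index are not given in the source.

```python
from dataclasses import dataclass

@dataclass
class HallRecord:
    """One historical order, described by the features named in the text."""
    width_m: float            # hall width [m]
    height_m: float           # ground to bottom of roof construction [m]
    aisles: int               # 1 or 2
    roof_type: str            # "penthouse" or "gable"
    roof_load_kn_m2: float    # roof load [kN/m^2]
    has_gantry: bool          # hall with or without a gantry
    steel_index: float        # dependent variable (hypothetical field name)

    def __post_init__(self):
        # Reject values outside the ranges stated in the assumptions.
        if self.aisles not in (1, 2):
            raise ValueError("number of aisles must be 1 or 2")
        if self.roof_type not in ("penthouse", "gable"):
            raise ValueError("roof type must be 'penthouse' or 'gable'")

# Hypothetical example record (values are illustrative only).
record = HallRecord(
    width_m=24.0, height_m=6.0, aisles=1,
    roof_type="gable", roof_load_kn_m2=1.0,
    has_gantry=False, steel_index=32.0,
)
print(record)
```

Validating the categorical fields on construction keeps invalid records out of the distance calculation.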

Exemplary use of the application
The authors prepared a software application that performs three tasks aimed at matching the value of the dependent variable from already realized orders to a new offer, on the basis of given values of independent variables. The first two tasks (1C, MultiC) mostly serve rapid comparison of new offers with already realized projects, while the k-neighbor method uses weights of the independent variables selected by the authors. The panel of the EXPLOR software is shown in Figure 1 and Figure 2.
The first function of the EXPLOR program (named 1C) uses only one criterion for matching results: a specific value of an independent variable, e.g. finding records with a given value of hall width. This function is particularly useful when a company must analyze the types of already realized projects by a specific, single criterion. The algorithm will often return a large number of results (in comparison to the other two functions), which may then be used for more in-depth exploration, e.g. to indicate the most profitable of the selected orders. As presented, the steel consumption is determined on the basis of the 7 nearest neighbors of the analyzed case ("new offer", see Figure 3). The results are returned in a separate window, and the nearest neighbors are ordered starting from the smallest distance. It is noteworthy that in this case results can be returned even if the found value of the independent variable is not exactly the same as the one indicated by the user. Figure 3 presents a results window. The result of the analysis is the average value of the steel consumption index, as well as its standard deviation.
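A single-criterion search with a summary of the results might look like the sketch below. It is an illustration under stated assumptions, not the EXPLOR code: the `tolerance` parameter is a hypothetical way of admitting values that are not exactly equal to the requested one, the field names and data are invented, and the source does not say whether the sample or the population standard deviation is reported (the sample version is used here).

```python
import statistics

def one_criterion_search(records, feature, value, tolerance=0.0):
    """Return records whose feature lies within `tolerance` of `value`,
    ordered from the smallest distance to the largest."""
    matches = [r for r in records if abs(r[feature] - value) <= tolerance]
    return sorted(matches, key=lambda r: abs(r[feature] - value))

def summarize(records, target="steel_index"):
    """Mean and sample standard deviation of the target over the records."""
    values = [r[target] for r in records]
    mean = statistics.mean(values)
    # stdev needs at least two values; report 0.0 for a single match.
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, spread

# Hypothetical historical records (not data from the paper).
history = [
    {"width": 20.0, "steel_index": 30.0},
    {"width": 21.0, "steel_index": 34.0},
    {"width": 30.0, "steel_index": 50.0},
]

found = one_criterion_search(history, "width", 20.0, tolerance=1.0)
mean_index, std_index = summarize(found)
print(len(found), mean_index, std_index)
```

A wide standard deviation among the returned records signals that the single criterion alone does not pin down the steel consumption well.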

Conclusions
The paper presents an innovative approach to supporting decision-making in the Bidding and Contract Preparation department, using the example of a construction company. The authors, together with employees of the company, defined the factors that have the greatest influence on steel consumption and best differentiate steel constructions, and ordered them by importance. Possible ways of exploring the database were presented, in order to find already realized orders that are closest to an analyzed order on the basis of feature values defined by the user.
Facing tight deadlines and difficulties with preparing even preliminary projects for several or more than a dozen clients at the same time, it was decided to implement the solution proposed by the authors. Quick preparation of an offer using a steel consumption index estimated from data on already realized orders allowed the company to calculate the profitability of undertakings in virtually no time, as well as to limit the costs of project preparation.
The historical data, appropriately processed and analyzed, have turned out to be very useful in the daily functioning of the company and in building process knowledge. The tool developed by the authors is universal and may be used by other companies after modifying the most important independent variables and the values of their weights.

Fig. 3. Result of the k-nearest function. The result is the 7 closest neighbors.