Supervised Machine Learning Models for Forecasting Fuel Consumption by Vehicles During the Grain Crops Delivery

Abstract. The work analyses the possibilities of applying computational intelligence, namely machine learning models, to the delivery of grain crops from agricultural enterprises to the elevator. The expediency of using machine learning regression models to forecast fuel consumption by vehicles during grain crop delivery is established. Based on the enterprise's historical data on the execution of grain delivery orders, which include the key factors influencing fuel consumption, the article forecasts fuel consumption by vehicles using the following models: Generalized Linear Model, Neural Network Model, Decision Tree Model and Random Forest Model. The developed models were evaluated according to efficiency criteria: mean absolute error, root mean square error, mean absolute percentage error (MAPE), total time and training time. According to the modelling results, the most accurate and relatively fast forecast of fuel consumption by vehicles is obtained with the Random Forest model, with a MAPE of 7.8%.


Introduction
Year after year, intelligent decision support systems are gaining wider use in transport logistics [1,2]. The task facing researchers is to improve the precision of transport process models. This is important not only for raising cargo delivery efficiency and mitigating risks but also for identifying the essential efficiency factors. Computational intelligence stands out as a potent method for improving the effectiveness of these models [3,4].
The transportation of agricultural cargo has its own characteristics for a transport company, namely the use of specialized vehicles (for example, grain trucks) and the need to take into account the properties of the cargo (grain, corn, flour, and others), such as flowability, moisture, respiration and self-heating. In addition, the carrier should take into account the influence of the human factor and weather conditions on the transport process. Typically, farms choose a carrier for their products based on the cost of services. Therefore, the company needs to plan the transport process in order to rationalize costs and form a market price for its services.

Statement of the Problem and Analysis of Recent Research
The scientific and applied task of substantiating and optimising an algorithmic machine learning model for predicting fuel consumption by vehicles is currently quite relevant for motor transport enterprises engaged in the delivery of grain crops from agricultural enterprises to the grain elevator. Recently, the number of studies [5-16] related to the use of computational intelligence, in particular machine learning, in logistics systems has increased. For example, machine learning algorithms are used to predict the type of transportation [6,7,8], the duration of transportation operations [9,10,11], the choice of vehicle type [12], fuel consumption [13,14,15], the cost of an order [16], etc. However, we found no publications on the algorithmic modelling of a machine learning model for predicting fuel consumption by vehicles during the delivery of grain crops from agricultural enterprises.

Aim and Objectives of the Study
The aim of the work is to substantiate and optimise an algorithmic machine learning model for predicting fuel consumption by vehicles during the delivery of grain crops from agricultural enterprises to the grain elevator.
To achieve this aim, the following tasks need to be solved:
─ to carry out structural settings of the selected supervised machine learning models and to forecast the fuel consumption by vehicles during the delivery of grain crops from agricultural enterprises to the elevator;
─ to select the most efficient machine learning model for predicting fuel consumption by vehicles in the delivery of grain crops from agricultural enterprises to the grain elevator based on the selected criteria.

Methods
Accurate forecasts of fuel consumption by vehicles during grain crop delivery from farms to elevators can be obtained with algorithmic machine learning models. However, they require large samples of data for previous periods described by stationary distributions (Table 1). When using algorithmic machine learning models, we deal with separate information about the states of the research object, which is generated by machine learning algorithms from training samples of data about the research object. Based on this information (Table 2) about the states of the study object, the algorithmic models are further substantiated. Changes in the states of the study object are used to forecast the target variable, the fuel consumed by vehicles during grain crop delivery from agricultural enterprises to the grain elevator (Fig. 1). The intelligent approach in algorithmic machine learning models follows a general pattern of application, which is implemented in several stages. On the basis of these and other studies, we have designed our own algorithmic machine learning models for predicting fuel consumption by vehicles during the delivery of grain crops from agricultural enterprises to the elevator.
The algorithmic modelling of grain crop delivery from agricultural enterprises to the elevator uses data on the implementation of the relevant transport processes in previous periods. The optimal model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order is considered to be the one that ensures minimisation of the efficiency criteria:

$$EM(SFC_{ri}) = f(\mathrm{MAE}, \mathrm{RMSE}, \mathrm{MAPE}, \text{Total Time}, \text{Training Time}) \to \min \qquad (1)$$

Based on the selected machine learning model, existing means of improving forecast quality are used to predict the specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order. This is done by adding adaptive amplifications, increasing the depth of certain types of decision trees, proposing different forecasting rules, etc.
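Criterion (1) leaves the function f unspecified. As a minimal sketch, the three error measures can be computed as below, and one possible (assumed, not the authors') choice of f is a weighted sum of min-max normalised criteria. The function names, the weights and the toy numbers are illustrative assumptions only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_model(scores, weights):
    """Pick the model with the lowest weighted sum of normalised criteria.

    scores:  {model_name: {criterion_name: value}}, lower is better.
    weights: {criterion_name: weight}; an assumed form of f in criterion (1).
    """
    total = {name: 0.0 for name in scores}
    for criterion, weight in weights.items():
        values = [scores[name][criterion] for name in scores]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid division by zero when all values match
        for name in scores:
            total[name] += weight * (scores[name][criterion] - low) / span
    return min(total, key=total.get)


# Toy numbers for illustration only, not results from the study.
toy_scores = {
    "A": {"MAPE": 10.0, "Total Time": 1.0},
    "B": {"MAPE": 8.0, "Total Time": 2.0},
}
best = select_model(toy_scores, {"MAPE": 0.8, "Total Time": 0.2})
```

With these toy weights accuracy dominates, so the slower but more accurate hypothetical model "B" is selected; other weightings would rank the models differently.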

Generalised Linear Model
The GLM algorithm makes it possible to select regression models that accurately describe individual factors of fuel consumption by vehicles during the delivery of grain crops from agricultural enterprises to the elevator, which are described by one of the distributions belonging to the exponential family [17]. The exponential family of distributions includes the normal, binomial, Poisson, geometric, negative binomial, exponential, gamma, and inverse normal distributions.
Forecasting the specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order for grain crop delivery from agricultural enterprises to the grain elevator is performed taking into account a set of factors $\{\xi_{ri}\}$. In this case, the specified set of factors $\{\xi_{ri}\}$ is considered as an ordinary multiple regression. The deviation of the desired variable (the specific fuel consumption $SFC_{ri}$ by vehicles of the r brand during the service of the i order) from its average value ($SFC_{ri}^{ave}$) when the k independent factors of specific fuel consumption change is within $\xi_{r1}, \ldots, \xi_{rk}$. In this case, the average value ($SFC_{ri}^{ave}$) of the specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order is:

where $\xi_r = (\xi_{r1}, \ldots, \xi_{rk})$ is a row vector representing the independent factors of specific fuel consumption by vehicles of the r brand during the servicing of the i order; $\tau_r = (\tau_1, \ldots, \tau_n)$ is a column vector representing the regression coefficients; $SFC_{ri}^{ave}$ is the average value of specific fuel consumption by vehicles of the r brand, which is considered to be independent of the factors of specific fuel consumption by vehicles.
Forecasting the specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand when servicing agricultural enterprises using classical linear regression involves finding the deviation from the average values ($SFC_{ri}^{ave}$) of specific fuel consumption by vehicles, which are the same for the i orders for the delivery of grain crops from agricultural enterprises to the elevator:

where $V$ is a vector of unknown factors of specific fuel consumption by vehicles of the r brand during the service of the i order.
The vector of unknown factors ($V$) of specific fuel consumption by vehicles of the r brand during the service of the i order is:

At the same time, the use of the GLM implies the assumption that there are identifiable errors:

In this case, the possible errors $\eta_r$ are considered independent and normally distributed with the average value ($SFC_{ri}^{ave}$) of specific fuel consumption by vehicles and variance $\sigma^2$. Taking this into account, the GLM for forecasting specific fuel consumption by vehicles is as follows:

$$SFC_{ri} \mid \tau, \sigma^2, \xi \sim N_r(\xi\tau, \sigma^2 I) \qquad (6)$$

where $N_r(\xi\tau, \sigma^2 I)$ is the multivariate normal distribution of specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order. The obtained coefficients of the GLM for forecasting fuel consumption ($SFC_{ri}$) by vehicles are presented in Table 3.

The main advantage of the GLM is that the regression is not limited to normally distributed data: the response may follow any distribution of the exponential family. This makes it possible to represent both frequency and binary indicators when studying the components of transport processes [17]. It is also possible to use both correlated and uncorrelated samples of independent factors, which is typical for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order.
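Under the Gaussian assumption of Eq. (6), the maximum-likelihood regression coefficients coincide with the ordinary least-squares solution. The sketch below illustrates this special case only: it fits the deviation from the average value by solving the normal equations in pure Python. The function names and the two toy factors (which could stand for, say, distance and payload) are assumptions for illustration, not the factors or code used in the study.

```python
def fit_gaussian_glm(X, y):
    """Least-squares fit of y ~ y_ave + (x - x_ave) . tau.

    X is a list of factor rows, y a list of targets. Assumes the factors
    are not collinear (otherwise a pivot below is zero).
    """
    n, k = len(X), len(X[0])
    x_ave = [sum(row[j] for row in X) / n for j in range(k)]
    y_ave = sum(y) / n
    Xc = [[row[j] - x_ave[j] for j in range(k)] for row in X]  # centred factors
    yc = [v - y_ave for v in y]                                # centred target
    # Augmented normal equations: (Xc^T Xc) tau = Xc^T yc
    A = [[sum(r[i] * r[j] for r in Xc) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(Xc, yc))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    tau = [A[i][k] / A[i][i] for i in range(k)]
    return y_ave, x_ave, tau


def predict_glm(model, x):
    """Forecast as the average value plus the weighted factor deviations."""
    y_ave, x_ave, tau = model
    return y_ave + sum((xi - ai) * ti for xi, ai, ti in zip(x, x_ave, tau))


# Toy data generated from y = 20 + 0.5 * x1 + 2 * x2 (illustrative only).
X_toy = [[10.0, 1.0], [20.0, 2.0], [30.0, 1.0], [40.0, 3.0], [50.0, 2.0]]
y_toy = [27.0, 34.0, 37.0, 46.0, 49.0]
glm = fit_gaussian_glm(X_toy, y_toy)
```

For the other members of the exponential family, the fit is usually obtained iteratively rather than in closed form, which this sketch does not cover.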

Neural Network
A fully connected multi-layer neural network (Multilayer Perceptron, MLP) model belongs to the class of artificial neural networks in machine learning; such networks are built and operate in a similar way and have many layers. Each layer provides a different interpretation of the data submitted for processing, which is handled non-linearly in the different layers of the network (Fig. 2).
The development of the MLP model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles is divided into two stages: 1) determination of the form of the functional relationship between the forecasted value ($SFC_{ri}$) and the set of factors $\{\xi_{ri}\}$ that influence it:

The authors of [1] showed that the effectiveness of an MLP model is determined by its ability to represent a set of compositional dependencies that are interpreted as separate outputs of hidden layers of neurons:

$$f(\xi_{r1}, \ldots, \xi_{rn}) = h_1(h_2(h_3(\ldots h_j(\xi_{r,n-k}, \ldots, \xi_{rn})))) \qquad (9)$$

where $h_j$ is an interpretation of individual outputs of hidden layers of neurons.
A neural network with a large number of hidden layers can provide a much more accurate approximation of complex functional dependencies, which supports an accurate forecast of specific fuel consumption ($SFC_{ri}$) by vehicles. The number of layers in the neural network is increased by means of bypass connections [2] between individual layers of hidden neurons. In this case, the output of a separate unit with a bypass connection is written as:

$$y = W_2 \circ S(W_1 \circ \xi_r) + \xi_r \qquad (10)$$

where $y$ is the output of a separate unit with a bypass connection; $W_1$, $W_2$ are the weighting coefficient matrices of individual hidden layers in the neural network; $S$ is the logistic sigmoid function; $\xi_r$ is the input vector of the first hidden layer of the neural network; $\circ$ is the convolution operation.
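Expression (10) can be illustrated with a small forward pass. The sketch below assumes, for a fully connected layer, that the operation applied to the weights is an ordinary matrix-vector product and that the unit preserves the input dimension so that the bypass sum is defined; the function names and the weights are illustrative assumptions.

```python
import math


def sigmoid(vector):
    """Logistic sigmoid S applied element-wise."""
    return [1.0 / (1.0 + math.exp(-v)) for v in vector]


def matvec(matrix, vector):
    """Matrix-vector product (assumed stand-in for the operation in Eq. (10))."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def bypass_unit(W1, W2, xi):
    """Output of one unit with a bypass connection: y = W2 S(W1 xi) + xi."""
    hidden = sigmoid(matvec(W1, xi))
    transformed = matvec(W2, hidden)
    return [t + x for t, x in zip(transformed, xi)]


# Toy weights for illustration: with W1 = 0 the hidden activations are all 0.5.
W1_toy = [[0.0, 0.0], [0.0, 0.0]]
W2_toy = [[1.0, 1.0], [2.0, 0.0]]
y_toy_unit = bypass_unit(W1_toy, W2_toy, [3.0, -1.0])
```

When the second weight matrix is zero the unit returns its input unchanged, which is why stacking such units does not, by itself, make the network harder to train as depth grows.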
The deep learning of the neural network for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles was performed in the following sequence:
1) The input dataset (BD) (Table 1) was divided into training and test samples.
2) The loss function was determined and the data sets were created to test the MLP model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles.
3) Candidate architectures of the deep neural network were tested by algorithmic modelling, and the best one was selected according to the criteria of mean square error (MSE) and mean absolute error (MAE).
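The sequence above can be sketched as follows. This is a simplified illustration, not the authors' H2O workflow: the 20% test share, the fixed seed and the candidate models (plain Python callables standing in for trained network architectures) are all assumptions.

```python
import random


def train_test_split(rows, test_share=0.2, seed=0):
    """Shuffle (features, target) rows and split them into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]


def mse(model, rows):
    """Mean square error of a callable model on (features, target) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def mae(model, rows):
    """Mean absolute error of a callable model on (features, target) rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def select_best(candidates, test_rows):
    """Name of the candidate with the lowest MSE, ties broken by MAE."""
    return min(
        candidates,
        key=lambda name: (mse(candidates[name], test_rows),
                          mae(candidates[name], test_rows)),
    )


# Toy rows and hypothetical candidates, for illustration only.
toy_rows = [(x, 2.0 * x + 1.0) for x in range(10)]
train_rows, test_rows = train_test_split(toy_rows, test_share=0.2, seed=1)
toy_candidates = {
    "candidate_a": lambda x: 2.0 * x + 1.0,
    "candidate_b": lambda x: 2.0 * x,
}
best_candidate = select_best(toy_candidates, test_rows)
```

In practice each candidate would be a network trained on `train_rows` with a different number or width of hidden layers; only the comparison step is shown here.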
As a result of the research, the MLP model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles was created using the H2O machine learning and predictive analytics platform [18] (Fig. 3).

Fig. 3. MLP model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles
The advantage of the obtained MLP model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles is that it ensures high accuracy given a sufficient and complete sample of initial data on the set of factors $\{\xi_{ri}\}$ that affect the obtained value ($SFC_{ri}$).

Decision Tree and Random Forest Models
The Decision Tree (DT) model and the Random Forest (RF) model are decision tree-based supervised machine learning models. The DT model involves the construction of a decision tree for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles. The tree has a root, branches that lead through internal nodes, and leaves that are not split further. The internal nodes of the decision tree are attributes, while the branches that connect individual nodes carry the values of these attributes. The leaves hold the labels (class labels in classification, predicted values in regression) that are used to make decisions when forecasting specific fuel consumption ($SFC_{ri}$) by vehicles.
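The tree structure described above can be sketched as a small regression tree. The splitting rule used here (choose the attribute threshold that minimises the summed squared error of the two branches, with leaves holding the mean target) is a common choice assumed for illustration; the source does not state which rule the authors' DT model uses, and the function names and toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, depth=0, max_depth=2, min_leaf=1):
    """Build a regression tree from (features, target) rows.

    Internal nodes are dicts (attribute index, threshold, branches);
    leaves are floats holding the mean target of their rows.
    """
    targets = [y for _, y in rows]
    leaf_value = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf_value
    best = None
    for j in range(len(rows[0][0])):
        # Candidate thresholds: every distinct value except the largest.
        for threshold in sorted({x[j] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][j] <= threshold]
            right = [r for r in rows if r[0][j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None or best[0] >= sse(targets):
        return leaf_value  # no split improves on the leaf
    _, j, threshold, left, right = best
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, min_leaf),
        "right": build_tree(right, depth + 1, max_depth, min_leaf),
    }


def predict_tree(tree, x):
    """Follow branches from the root until a leaf value is reached."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy rows (one attribute, illustrative values only).
toy_tree = build_tree([([10.0], 30.0), ([12.0], 30.0), ([30.0], 40.0), ([32.0], 40.0)])
```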
The Random Forest (RF) model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles involves the construction of ensembles of regression trees. Each of these decision trees is characterised by high variance $\sigma_i$. However, when they are systematically combined in parallel, the resulting variance $\Sigma\sigma$ is low. This is because each decision tree is fully trained on its own data sample, and the result is obtained not from one individual decision tree but from the whole set of trees.
In classification, the final result depends on the class that receives the majority of the individual trees' votes. In our case, the forecast is obtained as the average of the forecasts of specific fuel consumption ($SFC_{ri}$) by vehicles produced by the individual decision trees. This step is called aggregation [19].
The RF model involves building several decision trees, which serve as the base learners, to forecast specific fuel consumption ($SFC_{ri}$) by vehicles.
The main definitions for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles using the DT model and the RF model were described in previous research [20,21]. As a result of the research, the DT model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles was created, as shown in Fig. 4.
Based on the research, the RF model was developed to forecast specific fuel consumption ($SFC_{ri}$) by vehicles; a part of it is shown in Fig. 5.
The forecast values of specific fuel consumption ($SFC_{ri}$) by vehicles produced by the RF model are the averages of the forecasts obtained from each of the 50 decision trees of the ensemble. Compared with the values of an individual tree in the DT model, the RF forecast of specific fuel consumption ($SFC_{ri}$) by vehicles is less prone to overfitting and gives a more flexible decision boundary for vehicle fuel consumption ($SFC_{ri}$).
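The averaging over 50 trees can be sketched as bootstrap aggregation. To stay short, the sketch below uses one-split trees on a single attribute and omits the random attribute selection at each split that a full Random Forest also performs; the function names, the seed and the toy data are assumptions for illustration.

```python
import random


def fit_stump(rows):
    """Fit a one-split regression tree to (x, y) pairs and return a predictor."""
    overall = sum(y for _, y in rows) / len(rows)
    best = (float("inf"), None, overall, overall)
    for threshold in sorted({x for x, _ in rows})[:-1]:
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        cost = (sum((y - left_mean) ** 2 for y in left)
                + sum((y - right_mean) ** 2 for y in right))
        if cost < best[0]:
            best = (cost, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    if threshold is None:  # all x values equal: predict the overall mean
        return lambda x: overall
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(rows, n_trees=50, seed=0):
    """Train each tree on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(trees, x):
    """Aggregation: the forecast is the average of the individual tree forecasts."""
    return sum(tree(x) for tree in trees) / len(trees)


# Toy data: two groups of orders with different consumption (illustrative only).
toy_orders = ([(float(x), 30.0) for x in range(5)]
              + [(float(x), 40.0) for x in range(10, 15)])
forest = fit_forest(toy_orders, n_trees=50, seed=0)
low_forecast = predict_forest(forest, 0.0)
high_forecast = predict_forest(forest, 14.0)
```

Because each tree sees a different resample, individual forecasts vary, while their average is more stable than any single tree.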

Results and Discussion
The justification of a rational model for forecasting fuel consumption ($SFC_{ri}$) by vehicles during grain crop delivery from agricultural enterprises to the elevator was based on data collected for 2019-2022 from an enterprise that delivers agricultural products (Lutsk, Volyn region) and prepared in advance (Table 1). To do this, we wrote program code in Python 3.9 [22] using the Scikit-Learn library [23] for the algorithmic regression models described above (GLM, MLP, DT and RF).
These models for forecasting fuel consumption ($SFC_{ri}$) by vehicles during the delivery of grain crops from agricultural enterprises to the elevator were evaluated according to separate criteria (mean absolute error; root mean square error; mean absolute percentage error; total time; training time). The obtained results are presented in Table 4.
Based on the obtained data (Table 4) on the evaluation of the models, the best results in terms of absolute error were obtained using the RF (Random Forest) model, which provides a mean absolute error of 3.341 l/100 km. The data in Table 4 also show that the RF model gives the best forecast of fuel consumption ($SFC_{ri}$) by vehicles during the delivery of grain crops from agricultural enterprises to the elevator, with a mean absolute percentage error of 7.8% and a total machine learning time of 1.7 s.
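The time criteria can be measured alongside the error criteria as in the sketch below. It assumes that "training time" is the duration of fitting and that "total time" covers fitting plus scoring on the test sample; the source does not define these terms precisely. The `MeanModel` class is a hypothetical stand-in with a minimal fit/predict interface, not one of the four models of the study.

```python
import time


def evaluate(model, train_rows, test_rows):
    """Return error and time criteria for a model with fit() and predict()."""
    start = time.perf_counter()
    model.fit(train_rows)
    trained = time.perf_counter()
    predictions = [model.predict(x) for x, _ in test_rows]
    errors = [abs(p - y) for p, (_, y) in zip(predictions, test_rows)]
    mae = sum(errors) / len(errors)
    # MAPE in percent; assumes no zero targets in the test sample.
    mape = 100.0 * sum(e / abs(y) for e, (_, y) in zip(errors, test_rows)) / len(errors)
    done = time.perf_counter()
    return {
        "MAE": mae,
        "MAPE": mape,
        "Training Time": trained - start,
        "Total Time": done - start,
    }


class MeanModel:
    """Hypothetical baseline that always predicts the training mean."""

    def fit(self, rows):
        self.mean_ = sum(y for _, y in rows) / len(rows)

    def predict(self, x):
        return self.mean_


# Toy rows for illustration only.
report = evaluate(MeanModel(), [([1.0], 40.0), ([2.0], 50.0)], [([3.0], 50.0)])
```

Running the same function over each candidate model yields one row of criteria per model, in the same layout as a comparison table.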

Conclusion
The article substantiates the expediency of forecasting fuel consumption by vehicles during grain crop delivery from agricultural enterprises to the elevator using machine learning methods. Structural settings were carried out for the selected machine learning regression models, namely the GLM (Generalized Linear Model), MLP (Neural Network), DT (Decision Tree) and RF (Random Forest) models.
The most accurate forecasts of fuel consumption by vehicles during grain crop delivery from agricultural enterprises to the elevator are obtained using the RF model. The mean absolute error of the model is 3.341 l/100 km, the mean absolute percentage error is 7.8%, and the total machine learning time is 1.7 s.

Fig. 1. Structural machine learning model for forecasting specific fuel consumption of vehicles

We have analysed the use of algorithmic modelling in various applied fields [1, 2, 3, 4]. It has been established that the task of forecasting fuel consumption by vehicles during the delivery of grain crops from agricultural enterprises to the elevator is a multiple regression task. The most common and fairly accurate algorithmic regression models are:
1. GLM - Generalised Linear Model;
2. NN - Neural Network;
3. DT - Decision Tree model;
4. RF - Random Forest model.
For each of the presented machine learning models, the effectiveness of forecasting the specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order is evaluated according to the following criteria:
− mean absolute error (MAE);
− root mean square error (RMSE);
− mean absolute percentage error (MAPE);
− total time (Total Time);
− training time (Training Time).

Fig. 2. Structural model of multi-layer neural network (MLP) for forecasting specific fuel consumption ($SFC_{ri}$) of vehicles

2) assessment of the parameters $\Theta$ of the model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles:

Fig. 4. Part of the DT model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles

Fig. 5. Part of the RF model for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles

Table 1. Part of data for predicting fuel consumption by vehicles during the grain crops delivery from agricultural enterprises to the grain elevator

Table 3. GLM for forecasting specific fuel consumption ($SFC_{ri}$) by vehicles of the r brand during the service of the i order

Table 4. Results of evaluation of models for forecasting fuel consumption ($SFC_{ri}$) by vehicles during the grain crops delivery from agricultural enterprises to the elevator