Modeling of acoustic pressure variability at thoroughfare

This work presents an analysis of the variability of acoustic pressure, calculated from equivalent sound level values recorded at a monitoring station. Models of acoustic pressure variability were built in the form of regression trees and random forests, and the accuracy of the obtained models was analysed. The models were subsequently used to reconstruct the equivalent sound level for periods of monitoring station inactivity.


Introduction
Vehicle noise is produced by the engine, the exhaust system, the contact of the tires with the road surface, and the contact of the vehicle body with air (aerodynamic noise). The level of road traffic noise depends on vehicle speed, road geometry, the type and condition of the road surface, the number of trucks and other "noisy" vehicles (including motorcycles with defective mufflers), and many other parameters [1]. Monitoring stations located near streets register a sound level that depends on vehicle noise and on sounds coming from other sources in the vicinity of the road ("background noise"). Monitoring stations sometimes stop registering data, and models of the sound level course can be applied to impute the missing values. Our aim was to create a method for constructing an equivalent sound level model for any street. Using the registered values, we calculate selected descriptors of the registered signal, use them to build the training dataset, create the model (using regression trees or random forests), and finally apply the model to reconstruct missing data for single days (preferably no more than one day per calendar week). Moreover, some of the obtained models are very transparent and easy to interpret.

Measurement data
The equivalent sound level values were recorded by a noise monitoring terminal comprising a sound level meter (SVAN 958A, a digital, four-channel, class-1 vibration and sound meter) and a weather station [2], located at Krakowska Street in Kielce, Poland. Measurements were run continuously (24 hours a day), and the RMS of the A-level sound was saved in the buffer at 1-second intervals with a resolution of 0.1 dB. The measurement results were recorded every minute. Based on these measurements, equivalent sound levels were calculated for the following sub-intervals of the 24-hour day: day (6-18), evening (18-22) and night (22-6). The noise annoyance caused by long-term exposure to noise is often expressed by the A-weighted equivalent sound level (LAeq,T) in decibels (dB), defined by the formula [3]:

$$L_{Aeq,T} = 10 \log_{10}\left(\frac{1}{T}\int_{0}^{T}\frac{p_A^2(t)}{p_0^2}\,dt\right)$$

where $p_0$ is the reference pressure equal to 20 µPa, and $p_A(t)$ is the A-weighted acoustic pressure course. According to the ISO standard, the value of LAeq,T can also be calculated from [4]:

$$L_{Aeq,T} = 10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n}10^{0.1 L_{Aeq,i}}\right)$$

where $L_{Aeq,i}$ is the A-weighted acoustic pressure level measured in the $i$-th time interval, and $n$ is the number of time intervals. The recorded values of LAeq,T calculated for the day sub-interval (6-18) in the year 2013 are shown in Fig. 1. The figure shows only "almost complete" calendar weeks (with at most one day without sound level records), excluding "irregular" weeks (those containing at least one of 13 national holidays). The data shown in this figure are used for construction of the sound level models.

Fig. 1. LAeq,T calculated from measurements for the day sub-interval (6-18) for "regular" weeks in the year 2013
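The energy averaging of sound levels described above, together with the conversion between a level in decibels and acoustic pressure in pascals, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the averaging assumes that all time intervals have equal length; only the reference pressure of 20 µPa is taken from the text.

```python
import math

P0 = 20e-6  # reference pressure in pascals (20 µPa), as stated in the text


def level_to_pressure(level_db):
    """Convert a sound level in dB to RMS acoustic pressure in Pa."""
    return P0 * 10 ** (level_db / 20.0)


def pressure_to_level(pressure_pa):
    """Convert RMS acoustic pressure in Pa to a sound level in dB."""
    return 20.0 * math.log10(pressure_pa / P0)


def equivalent_level(levels_db):
    """Energy-average levels (dB) measured over equal-length intervals.

    Assumption: every interval has the same duration, so each term
    10**(0.1 * L_i) receives the same weight 1/n.
    """
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10 ** (0.1 * level) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


if __name__ == "__main__":
    # Illustrative values only, not measurements from the monitoring station.
    print(round(equivalent_level([60.0, 70.0]), 2))
    print(round(level_to_pressure(94.0), 4))
```

Because the average is taken over energies rather than over decibel values, louder intervals dominate the result: the equivalent level of 60 dB and 70 dB is much closer to 70 dB than to their arithmetic mean of 65 dB.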

Models of sound pressure and its level
Computational intelligence methods such as artificial neural networks, fuzzy systems, random forest regression, or regression trees can be applied to construct the model. Two of them, regression trees and random forests, are used in this work. Every training dataset used for construction of the models consisted of 231 records, each describing one day d. Each record contained one output attribute y1 (db_A: the LAeq,T value for the day sub-interval) and at least one input attribute x1 (day_of_week: the number of the day of the week for day d, x1 ∈ {1, 2, …, 7}, where 1 denotes Monday and 7 denotes Sunday). Other descriptors of the A-weighted equivalent sound level were also calculated and subsequently used as input attributes. The first of them is x2 (Leq_tbd: the LAeq,T value calculated for the other 6 days of the calendar week containing day d, that is, excluding day d itself); when no LAeq,T value is available for one of these six days, the value from the same day of the week in the nearest week is used instead. Three other descriptors, x3 (dn2n1), x4 (dn1p1), and x5 (dp1p2), are calculated in the following way:

Model no. 1
The accuracy of these models, as well as of other models built by Cubist (including committees of 5 or 100 models) and RandomForest, is shown in Table 1. RandomForest models (always built with x1 defined as a numeric attribute, and with various numbers k of randomly selected input attributes) are rather "overfitted" to the training data and have worse 10-fold cross-validation accuracy than the Cubist models.
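Accuracy of the models on the training and test sets was defined by the mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_1(i) - \overline{y_1(i)}\right|$$

where $y_1(i)$ denotes the real $y_1$ value taken from the $i$-th record, $\overline{y_1(i)}$ denotes the $y_1$ value calculated by the model, and $n$ is the number of records.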
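The MAE criterion and the 10-fold cross-validation procedure mentioned above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the per-weekday mean predictor is a hypothetical stand-in for the Cubist and RandomForest models (it is not the method used in the paper), the errors are pooled over all held-out records before averaging, and the data in the usage example are synthetic.

```python
import random


def mean_absolute_error(actual, predicted):
    """MAE: mean of |real value - model value| over all records."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_mae(records, fit, k=10, seed=0):
    """Estimate MAE by k-fold cross validation.

    records: list of (x, y) pairs; fit(train) must return a predict(x) function.
    Each record is predicted exactly once by a model that never saw it.
    """
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    actual, predicted = [], []
    for fold in folds:
        held_out = set(fold)
        train = [records[i] for i in indices if i not in held_out]
        predict = fit(train)
        for i in fold:
            x, y = records[i]
            actual.append(y)
            predicted.append(predict(x))
    return mean_absolute_error(actual, predicted)


def fit_weekday_mean(train):
    """Hypothetical baseline: predict the mean y of the same day of week."""
    groups = {}
    for x, y in train:
        groups.setdefault(x, []).append(y)
    means = {x: sum(ys) / len(ys) for x, ys in groups.items()}
    overall = sum(y for _, y in train) / len(train)

    def predict(x):
        return means.get(x, overall)  # fall back to the overall mean

    return predict


if __name__ == "__main__":
    # Synthetic records (day_of_week, level in dB), for illustration only.
    rng = random.Random(1)
    data = [(d, 65.0 - (2.0 if d >= 6 else 0.0) + rng.uniform(-0.5, 0.5))
            for _ in range(33) for d in range(1, 8)]
    print(k_fold_mae(data, fit_weekday_mean, k=10))
```

Evaluating a model on the same records it was trained on typically gives an optimistic error, which is why a cross-validated MAE is usually treated as the more reliable estimate.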

Model no. 2
The model no. 2 uses a dataset consisting of records containing only the {y2, x1, x6} (in other words, pA, day_of_week, and pA_tbd) attribute values. This means that sound pressure levels in decibels were replaced by acoustic pressure in pascals. At first, x1 was defined as a numeric attribute (model 2a), and the Cubist regression tree software produced the model shown in Fig. 3. Accuracy results are given in Table 2. RandomForest models (always built with x1 defined as a numeric attribute) are worse than the Cubist models when comparing 10-fold cross-validation accuracy. After conversion from acoustic pressure in pascals to its level in decibels, it turned out that the accuracies of all models no. 2 (Table 3) were slightly worse than the accuracies of their counterparts in model 1.

Model no. 3
The accuracy of these models is shown in Table 4. Again, RandomForest models (e.g. Fig. 5) are better than the Cubist models (Fig. 4) on the training set, but worse when comparing 10-fold cross-validation accuracy. Model 3 contains more input attributes, which allows RandomForest to build a more accurate system than in models 1 and 2 (both in terms of 10-fold cross-validation accuracy and training data accuracy).

Conclusions
The presented models can be successfully applied to the imputation of one missing LAeq,T value of the day sub-interval in every week. To calculate all descriptors used in the models, it is recommended to wait until the end of the week containing the missing value. When this condition is not met, the x2 value will be calculated using data taken partially from another week, and the x4 and x5 attributes will have unknown values; the models presented in this paper will still allow imputation of missing LAeq,T values, presumably with lower accuracy. Cubist achieves better accuracy than RandomForest when evaluated by 10-fold cross validation, but worse accuracy when evaluated on the training dataset. The best accuracy on the training dataset (mean absolute error of 0.168 dB) is achieved by model 3a produced by RandomForest with random selection of k = 5 input attributes.
However, 10-fold cross validation gives a better estimate of accuracy, and here the best result (MAE of 0.439 dB) is achieved by model 1b produced by a Cubist committee of 100 regression trees. The best transparency is obtained by Cubist model 1b, which contains only 2 simple rules.