A new spectrum feature discovered for a category of signals produced by massive and random micro-sources

A statistical study is carried out on the short-time spectrum of one main category of random signals. For signals produced by massive and random micro-sources, a new statistical feature of the short-time amplitude spectrum is discovered, which reveals the relationship between the average of the amplitude and its standard deviation for each frequency component. Moreover, the association between the amplitude distributions of different frequency components is also studied. A model representing this association is presented, which accords well with the statistical feature discovered. The analysis result has potential application in signal classification, and also in the study of the system characteristics underlying the observed signal.


Introduction
From the viewpoint of computing, a signal processing method is a computing process that takes the signal as input. Because signal types vary widely, it is important to identify the characteristics (or category) of a signal in order to select a proper method for a specific task [1,2]. Signals are the observed output of a system. In practice, a system usually consists of a massive number of micro elements, each of which contributes to the generation of the signal. These elements are named "micro-sources" in this paper. The observed signal is the accumulated effect of all the micro-sources. The relationship between these micro-sources determines the basic characteristics of the signal, which can act as the basis for signal classification. In some systems, the micro-sources are strongly associated, such as the different parts of a tuning fork moving synchronously in vibration. Other systems have loosely coupled micro-sources that move randomly, such as air turbulence generating sound, or friction producing sound [3][4][5][6][7].
In this study, the statistical characteristics of the spectrum are studied for the latter category of signals. For this type of signal, the signal source can be decomposed into massive and random micro-sources. A novel statistical feature of the short-time amplitude spectrum is discovered for this category of signals. The feature is named the "consistent standard deviation coefficient". Moreover, the relationship between the amplitude probability distributions of two different frequency components is investigated, and a model representing this relationship is proposed. The validity of the new model is verified by analysis, with the discovered statistical feature as direct evidence.

Statistical study of spectrum for a category of random signals
The signals studied are sustained signals captured in the real world, with sufficient length for statistical study. The signals include the sound of wind, a plane engine room, an electric fan, rain, a river, etc. In the study, each signal is divided into frames, a Hamming window is applied to each frame, and the DFT is performed on each windowed frame. In other words, the short-time Fourier transform (STFT) is used to gather sufficient spectrum data for the statistical study. The amplitude spectrum is studied. The discrete spectrum has a finite number of discrete components, and the statistical study is performed for each frequency component individually.
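The framing, windowing and DFT steps described above can be sketched as follows. This is a minimal illustration in Python rather than the Matlab program used in the study. The function names, the non-overlapping default hop and the naive O(N^2) DFT are assumptions made to keep the sketch dependency-free; the paper does not state the frame overlap.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (common 0.54 / 0.46 symmetric form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def amplitude_spectrum(frame):
    """Amplitudes |X[k]| for k = 0..N/2, computed with a naive DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def short_time_amplitudes(signal, frame_len=512, hop=None):
    """Split the signal into frames, window each frame, and return one
    amplitude spectrum per frame (frames[t][k])."""
    hop = hop or frame_len  # assumption: non-overlapping frames by default
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append(amplitude_spectrum([w * x for w, x in zip(win, chunk)]))
    return frames
```

For real recordings an FFT routine would replace the naive DFT; the output layout (one list of amplitudes per frame) is what the per-frequency statistics operate on.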
The sampling frequency of the signals is 22 kHz. In the STFT of each signal, the frame length is set to 512 samples, which corresponds to a time interval of 23.3 ms at a 22 kHz sampling frequency. The study is implemented in Matlab. Let $\omega_k$ denote the k-th frequency component in the STFT. Due to the randomness of the signal, the amplitude of $\omega_k$ varies randomly from frame to frame. Let $\mu(\omega_k)$ and $\sigma^2(\omega_k)$ represent the estimated average and variance of the amplitude of $\omega_k$, respectively. The estimated standard deviation $\sigma(\omega_k)$ is the square root of $\sigma^2(\omega_k)$. Mathematically, $\mu(\omega_k)$ and $\sigma(\omega_k)$ are two functions of frequency, and their curves can be drawn after $\mu$ and $\sigma$ are estimated for each frequency $\omega_k$. Some of the resulting curves of $\mu(\omega_k)$ and $\sigma(\omega_k)$ are shown in Fig. 1 to Fig. 6. There is a clear similarity between the curves of $\mu(\omega_k)$ and $\sigma(\omega_k)$. Such similarity also exists in the results for all the other signals studied (signals of massive and random micro-sources), which motivates the following investigation of the relationship between $\mu(\omega_k)$ and $\sigma(\omega_k)$.

Fig. 6. The result of $\mu(\omega_k)$ and $\sigma(\omega_k)$ for the signal of scratching paper.

Besides the above experimental results, the relationship between $\mu(\omega_k)$ and $\sigma(\omega_k)$ is quantitatively verified by calculating the correlation coefficient between the two curves. The correlation coefficient is calculated in a discrete form:

$$r = \frac{\sum_{k=1}^{N} \left(\mu(\omega_k) - \bar{\mu}\right)\left(\sigma(\omega_k) - \bar{\sigma}\right)}{\sqrt{\sum_{k=1}^{N} \left(\mu(\omega_k) - \bar{\mu}\right)^2}\,\sqrt{\sum_{k=1}^{N} \left(\sigma(\omega_k) - \bar{\sigma}\right)^2}} \qquad (1)$$

where N is the number of discrete frequencies in the discrete spectrum, and $\bar{\mu}$ and $\bar{\sigma}$ are the averages of $\mu(\omega_k)$ and $\sigma(\omega_k)$ over the N frequencies. Some of the experimental results are shown in Table 1, which lists the correlation coefficients between $\mu(\omega_k)$ and $\sigma(\omega_k)$ calculated for different signals. All of them are very close to 1.0.
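The per-frequency statistics and the correlation check can be illustrated with a short sketch. The function names are hypothetical, the population form of the standard deviation is an assumption (the paper does not say which estimator was used), and the input is synthetic data drawn from a Weibull distribution with shape 2, not one of the recorded signals.

```python
import math
import random
from statistics import fmean, pstdev


def mean_and_std_curves(frames):
    """frames[t][k] is the amplitude of frequency k in frame t.
    Returns the curves (mu, sigma), one value per frequency."""
    columns = list(zip(*frames))
    mu = [fmean(col) for col in columns]
    sigma = [pstdev(col) for col in columns]  # assumption: population estimator
    return mu, sigma


def pearson(x, y):
    """Discrete correlation coefficient between two equally long curves."""
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Synthetic stand-in for STFT data: each frequency has its own scale,
# but all amplitudes share one distribution shape.
rng = random.Random(0)
scales = [1.0, 2.0, 5.0, 3.0, 0.5]
demo_frames = [[s * rng.weibullvariate(1.0, 2.0) for s in scales] for _ in range(2000)]
demo_mu, demo_sigma = mean_and_std_curves(demo_frames)
print("correlation between mean and std curves:", pearson(demo_mu, demo_sigma))
```

With real data, `frames` would be the list of short-time amplitude spectra, one per frame.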
Considering the unavoidable error caused by the instability of the signals, and also the noise introduced in the signal capture process, the results indicate that $\mu(\omega_k)$ and $\sigma(\omega_k)$ are strongly related by a linear proportional relationship, which is a new statistical feature discovered for this category of signals. Because the "standard deviation coefficient" is the ratio of $\sigma$ to $\mu$, this statistical feature is named the "consistent standard deviation coefficient". In other words, for the signals studied here, the proportional coefficient between the standard deviation and the expectation is the same for all the frequency components in the short-time amplitude spectrum. This feature can also be expressed by:

$$\sigma(\omega_k) = c_s\,\mu(\omega_k) \qquad (2)$$

where $c_s$ is the consistent standard deviation coefficient of amplitude for all frequency components. The subscript s means that the value of $c_s$ belongs to one specific signal; for another signal, the value of $c_s$ may be different. Because the expectation and the standard deviation are two basic statistics of a random variable, the feature of "consistent standard deviation coefficient" indicates that there is a certain association between the amplitude probability distributions of different frequency components, which is studied in the next section.
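One way to obtain a single coefficient from the two estimated curves is a least-squares fit of a line through the origin. This is only a sketch under that assumption; the paper does not state how the coefficient was estimated, and the function names are hypothetical.

```python
def estimate_cs(mu, sigma):
    """Least-squares slope c of sigma = c * mu, with the line forced
    through the origin (an assumed estimator, not the paper's)."""
    return sum(m * s for m, s in zip(mu, sigma)) / sum(m * m for m in mu)


def ratio_spread(mu, sigma):
    """Smallest and largest per-frequency ratio sigma / mu. A narrow
    range suggests the coefficient is consistent across frequencies."""
    ratios = [s / m for m, s in zip(mu, sigma) if m > 0]
    return min(ratios), max(ratios)


# Toy curves that are exactly proportional.
toy_mu = [1.0, 2.0, 4.0]
toy_sigma = [0.5, 1.0, 2.0]
print(estimate_cs(toy_mu, toy_sigma), ratio_spread(toy_mu, toy_sigma))
```

The spread of the per-frequency ratios gives a quick view of how well one coefficient describes all frequency components.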

The statistical relationship between different frequency components
Based on the spectrum data obtained by STFT, the histogram of amplitude is computed for each frequency component $\omega_k$. The histogram reflects the distribution of the random amplitude data for each $\omega_k$, which is closely related to the amplitude probability distribution. Therefore, the amplitude histogram of each $\omega_k$ is compared to those of the other frequencies, in order to study the relationship between them. To compare the amplitude distributions of different $\omega_k$ without the influence of their different average values, the normalized histogram is also computed for each $\omega_k$. First, the average amplitude of $\omega_k$ is computed. Then, as a preprocessing step, each amplitude value of $\omega_k$ is divided by that average. The normalized histogram is computed from the preprocessed data. In the results, the normalized histogram curves clearly converge to one central curve, which indicates a strong association between the amplitude distributions of different $\omega_k$.
Based on these results, a model of the amplitude distribution in the frequency domain is proposed for the signals studied here. For each signal, the amplitude distributions of different $\omega_k$ are assumed to be of the same type, but with different expectation (or average) values. In other words, there is a prototype distribution function $p_0(a_0)$, from which the amplitude distribution of any $\omega_k$ can be derived by varying the expectation.
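The normalization step can be sketched as below. The bin count, the upper limit of 4 times the average and the function name are illustrative assumptions, and the data are synthetic (Weibull with shape 2), chosen only to show that two components differing by a scale factor give the same normalized histogram.

```python
import random
from statistics import fmean


def normalized_histogram(amplitudes, bins=20, upper=4.0):
    """Histogram of amplitude / mean(amplitude) on the shared range
    [0, upper), returned as a density estimate per bin."""
    mean = fmean(amplitudes)
    width = upper / bins
    counts = [0] * bins
    for a in amplitudes:
        idx = int(a / mean / width)
        if idx < bins:  # values above `upper` times the mean are dropped
            counts[idx] += 1
    return [c / (len(amplitudes) * width) for c in counts]


# Two synthetic "frequency components" that differ only by scale.
rng = random.Random(0)
base = [rng.weibullvariate(1.0, 2.0) for _ in range(5000)]
h_small = normalized_histogram(base)
h_large = normalized_histogram([3.0 * a for a in base])
print(max(abs(x - y) for x, y in zip(h_small, h_large)))
```

Using the same bin edges for every frequency component is what makes the normalized curves directly comparable.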
The above model can also be described mathematically as follows. As a random variable $a$, the amplitude of some $\omega_k$ is modeled as a scaled version of a prototype random variable $a_0$, whose expectation is 1:

$$a = k\,a_0 \qquad (3)$$

where $k$ is the scaling parameter (a scale factor, not the frequency index). Equation (3) is a mathematical description of the proposed model. In the model, $a_0$ is the same for each frequency component, but the scaling parameter $k$ may differ between frequency components $\omega_k$. The proposed model is also supported by the newly discovered statistical feature in Section 2. In the following, the feature of "consistent standard deviation coefficient" is derived theoretically from the proposed model; in other words, the model accords well with the feature discovered in the experiments.

First, consider the probability distribution of $a$ in Equation (3), given that $p_0(a_0)$ is the probability distribution of $a_0$. According to Equation (3), the expectation of $a$ is:

$$\mu = k\,\mu_0 \qquad (4)$$

where $\mu_0$ is the expectation of $a_0$. Based on the rule in probability theory for the pdf (probability density function) of a function of a random variable, the probability distribution of $a$ can be deduced as:

$$p(a) = \frac{1}{k}\,p_0\!\left(\frac{a}{k}\right) \qquad (5)$$

Consider the standard deviation coefficient of $a$:

$$c_a = \frac{\sigma}{\mu} = \frac{\sqrt{\int_0^{\infty} (a - \mu)^2\,p(a)\,da}}{\mu} \qquad (6)$$

Considering Equations (4) and (5), Equation (6) can be rewritten as:

$$c_a = \frac{\sqrt{\int_0^{\infty} (a - k\mu_0)^2\,\frac{1}{k}\,p_0\!\left(\frac{a}{k}\right)da}}{k\,\mu_0} \qquad (7)$$

Then apply the variable substitution $a = k a_0$ (so that $da = k\,da_0$) to the integral on the right side of Equation (7):

$$c_a = \frac{\sqrt{\int_0^{\infty} (k a_0 - k\mu_0)^2\,\frac{1}{k}\,p_0(a_0)\,k\,da_0}}{k\,\mu_0} = \frac{\sqrt{k^2 \int_0^{\infty} (a_0 - \mu_0)^2\,p_0(a_0)\,da_0}}{k\,\mu_0} \qquad (8)$$

Recall that the variables $a$ and $a_0$ represent amplitude values, which are non-negative. Therefore, $k$ is also non-negative, so that $\sqrt{k^2} = k$. Then Equation (8) can be simplified as:

$$c_a = \frac{\sqrt{\int_0^{\infty} (a_0 - \mu_0)^2\,p_0(a_0)\,da_0}}{\mu_0} \qquad (9)$$

Notice that the numerator of the right side of Equation (9) is just the standard deviation of $a_0$, denoted $\sigma_0$. Therefore,

$$c_a = \frac{\sigma_0}{\mu_0} \qquad (10)$$

Notice that the right side of Equation (10) is constant given the prototype distribution $p_0(a_0)$. Therefore, the standard deviation coefficient of $a$ is the same whatever the scaling factor $k$ is, and it is equal to that of the prototype variable $a_0$. This accords well with the experimental results shown in Section 2.
Therefore, the feature of "consistent standard deviation coefficient" supports the model proposed here.
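The conclusion of the derivation can also be checked numerically. The sketch below is an illustration under stated assumptions: the prototype variable is drawn from a Weibull distribution with shape 2 (an arbitrary choice, since the paper does not fix the prototype distribution), and the scaling values are arbitrary.

```python
import random
from statistics import fmean, pstdev


def std_coefficient(samples):
    """Standard deviation coefficient: standard deviation divided by mean."""
    return pstdev(samples) / fmean(samples)


# Prototype variable a0, rescaled so that its sample mean is 1.
rng = random.Random(1)
raw = [rng.weibullvariate(1.0, 2.0) for _ in range(10000)]
raw_mean = fmean(raw)
a0 = [x / raw_mean for x in raw]

# Scaled variables a = k * a0 for several hypothetical scaling parameters.
for k in (0.2, 1.0, 3.5, 40.0):
    a = [k * x for x in a0]
    print(f"k={k}: mean={fmean(a):.3f}, std coefficient={std_coefficient(a):.4f}")
```

The mean follows the scaling parameter, while the standard deviation coefficient stays the same for every scale up to floating-point rounding, as the derivation predicts.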

Conclusion
The study in this paper indicates the possibility of classifying signals by their statistical characteristics in the frequency domain. The study focuses on the short-time amplitude spectrum, which is obtained by applying the STFT to signals of massive and random micro-sources. A new feature, the "consistent standard deviation coefficient", is revealed for this category of signals. This feature indicates a strong association between the amplitude distributions of different frequency components. A new model is proposed to represent this association. In this model, the random variables representing the amplitude of every frequency component belong to the same pdf type, but they have different expectations. Moreover, mathematical analysis shows that this model accords well with the feature of "consistent standard deviation coefficient". These results may facilitate signal classification and signal detection, which will be studied in further research. In future work, a specific pdf type will be studied to suit the short-time amplitude spectrum data for