Tackling Failure through Discovery of Semantic Neighbors Nodes in WSNs

Wireless Sensor Networks (WSNs) are attracting active and increasing research interest in various application fields, including industrial control and environmental applications. In such cases accurate event detection is of utmost importance, so sensors' data need to be fused through a sophisticated process (i.e. data mining algorithms). In this context, establishing semantic correlations between sensor nodes and forming semantic clusters is critical, as it enables the fusion of specific sensor data regardless of proximity criteria. Traditional clustering schemes aim to extend the sensor network's lifetime using criteria such as received signal strength, while semantic correlation is frequently omitted. In this paper, two novel techniques for discovering semantic neighbors are proposed: Diffusion Algorithm for Discover Semantic Neighbors (DA-DSN) and Trace Route Algorithm for Discover Semantic Neighbors (TRA-DSN). Design and development efforts are analysed, while the evaluation results offer a useful guideline on which technique fits better in different WSN deployments.


Introduction
In recent years, Wireless Sensor Networks (WSNs) have attracted high research interest in a wide variety of application domains, including industrial and mechanical engineering control [1]. The primary advantage of exploiting WSNs in such applications is accurate and reliable failure detection in highly dynamic and, in many cases, hostile industrial deployments. This objective is achieved through ad-hoc network formation and rapid connectivity adaptation, while diverse sensor values are processed to identify irregular situations or malfunction events. Existing data mining techniques can offer algorithmic solutions. However, conventional implementations require high processing capabilities and abundant memory in order to meet specific execution time restrictions. Such assumptions contradict typical WSN characteristics, where the sensor nodes suffer from limited processing power and available memory. These restrictions, in combination with the error-prone nature of wireless communications, highlight the challenge of designing data mining algorithms for WSNs that are distributed and highly efficient, yet of low complexity and low resource demand. The distributed implementation of such computationally intensive algorithms can be highly beneficial for balancing CPU load among several nodes. Specifically, the distributed implementation of an algorithm increases on-site processing and can potentially reduce the number of data packet transmissions, leading to bandwidth conservation, lighter network data transfer and lower energy consumption. Applying these implementations in a real WSN deployment requires an event dissemination strategy in which the nodes that are semantically correlated with respect to a specific event are discovered [2].
Due to the critical resource constraints of sensor networks, it is important to design techniques that minimize resource utilization and consequently extend network lifetime. Clustering has been used in ad-hoc networks and WSNs as an effective technique for achieving extended network lifetime, scalability and increased reliability [3]. The conventional idea is to perform cluster formation based on received signal strength and to use local Cluster Heads (CHs) as routers that forward the data gathered by the sensors in their clusters towards the sink node. However, although in the majority of WSN applications there is some semantic correlation between sensor nodes grouped in the same cluster, this correlation is frequently not exploited, and the overall performance is degraded as a result. In such cases it is of cornerstone importance to accurately define the semantic correlation between the types of sensors considered, as it enables the correlation of specific sensor data in a sophisticated process [4].
In this paper we define the logical or semantic neighborhood of a sensor node as the set of nodes that monitor the data values defining a multimodal event, which a data mining algorithm uses to decide whether the event has occurred. Hence, semantic clustering refers to the process of discovering the semantic neighbors related to a specific multimodal event. The formation of such clusters can lead to better resource utilization and better balancing of communication and computational load, as the event detection algorithm is executed only when a node within the semantic cluster reports an abnormal value.
Consider a WSN of hundreds of heterogeneous nodes used to monitor different scenarios; e.g. temperature and humidity sensors are used for environmental purposes, while at the same time motion sensors and cameras are used for security reasons. If the semantic correlation between the aforementioned types of sensors is not taken into consideration, then cluster formation is based solely on proximity and geographical criteria. Intuitively, however, when e.g. a temperature sensor detects an abnormal value, it should immediately correlate its value with the humidity value in order to detect an abnormal situation. Similarly, the detection of motion should be combined with the captured image in order to determine any human presence. The geographical neighborhood of a sensor node may nevertheless be totally different from its semantic neighborhood.
In this paper we propose two different protocols for discovering semantic neighbors: Diffusion Algorithm for Discover Semantic Neighbors (DA-DSN) and Trace Route Algorithm for Discover Semantic Neighbors (TRA-DSN) (Fig. 1).

Methodology
Assuming a multimodal event (MME) based on the sensor types S_1, S_2, ..., S_k, a semantic query SQ = TS_1, TS_2, ..., TS_k is constructed. The semantic neighborhood formation of sensor S_k is the discovery of its k-1 semantic neighboring nodes as well as the routing paths to these nodes. The core idea of the proposed approach is that the root node formulates a packet including the above information and passes it on to all of its 1-hop nodes.

Diffusion Algorithm for Discover Semantic Neighbors (DA-DSN)
If any of these nodes is semantically related to the event, it sends its response directly to the initial node, removes the corresponding sensor type from the query and transmits the updated query to its own 1-hop nodes. The query is also propagated when the node is not semantically related to the multimodal event. The nodes that receive such a message know the whole route path from the initial source node. Hence, if they are semantically related to the event, they can easily reply to the source node by following the already known route path; otherwise they just forward the message to their children nodes. If a node receives a query message with the same multimodal event id more than once, it drops the packet, preventing useless transmissions. As the replies from the nodes arrive at the source node, its logical neighborhood is formed. Under DA-DSN, query flooding therefore continues until all sensor types are discovered. The state diagram of the above-mentioned algorithm is depicted in the following Figure (
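
The diffusion procedure described above can be illustrated with a minimal simulation sketch. It is not the authors' Contiki implementation: the function name `da_dsn`, the `links` and `sensor_type` dictionaries and the example topology are hypothetical. It also assumes that a node rebroadcasts only while its copy of the query still lists an undiscovered sensor type, and that every hop of a unicast reply counts as one transmission.

```python
from collections import deque

def da_dsn(links, sensor_type, root, query):
    """Flood a semantic query from `root` (illustrative sketch only).

    links:       node id -> list of 1-hop neighbor ids (hypothetical input)
    sensor_type: node id -> sensor type of that node
    query:       sensor types that define the multimodal event
    Returns (paths, broadcasts, unicast_hops).
    """
    paths = {}        # semantic neighbor -> route path from the root
    seen = {root}     # nodes that already handled this event id
    queue = deque([(root, [root], set(query) - {sensor_type[root]})])
    broadcasts = 0
    unicast_hops = 0
    while queue:
        node, path, remaining = queue.popleft()
        if not remaining:
            continue  # assumption: nothing left to discover, no rebroadcast
        broadcasts += 1  # one broadcast reaches all 1-hop neighbors
        for nxt in links[node]:
            if nxt in seen:
                continue  # same event id received again: packet dropped
            seen.add(nxt)
            route = path + [nxt]
            left = set(remaining)
            if sensor_type[nxt] in left:
                paths[nxt] = route              # reply follows the known route
                unicast_hops += len(route) - 1  # one unicast per hop back
                left.discard(sensor_type[nxt])  # remove the type from the query
            queue.append((nxt, route, left))
    return paths, broadcasts, unicast_hops

# Hypothetical 4-node chain: 1 - 2 - 3 - 4
links = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
sensor_type = {1: "temperature", 2: "motion", 3: "humidity", 4: "camera"}
paths, broadcasts, unicast_hops = da_dsn(
    links, sensor_type, 1, ["temperature", "humidity", "camera"])
print(paths, broadcasts, unicast_hops)
```

In this sketch the `seen` set plays the role of the duplicate check on the multimodal event id, and each queued entry carries the route path that a replying node would follow back to the source.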

Trace Route Algorithm for Discovering Semantic Neighbors (TRA-DSN)
The second approach proposed and explored in this work (TRA-DSN) is based on the well-known Trace Route technique [5]. Trace Route is a computer network diagnostic tool for discovering and displaying the route (path) of packets across an Internet Protocol (IP) network and for measuring their transit delays. Trace Route determines the path taken to a destination by sending ICMP Echo Request messages with varying Time to Live (TTL) values to the destination. Each router along the path is required to decrement the TTL in an IP packet by at least 1 before forwarding it. Effectively, the TTL is a maximum link counter. When the TTL on a packet reaches 0, the router is expected to return an ICMP Time Exceeded message to the source computer. Trace Route determines the path by sending the first Echo Request message with a TTL of 1 and incrementing the TTL by 1 on each subsequent transmission until the target responds or the maximum number of hops is reached. In TRA-DSN, the root node that initiates the discovery of its logical neighbors transmits the query packet with a TTL value equal to 1. The 1-hop neighbors decrement the TTL and send either a positive response or a Time Exceeded message back to the root node. The root node then retransmits the query, increasing the TTL value each time, until all semantic neighbors are found. In this case, if the semantic neighbors are close to the source node, the number of transmissions is reduced, as the query is not advertised to the rest of the network. The state diagram of the above-mentioned procedure is depicted in
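
The TTL-driven search can be sketched in the same illustrative way. The code below is a simplified model rather than the authors' implementation: `tra_dsn`, `links`, `sensor_type` and `max_ttl` are hypothetical names, only positive responses are modelled (Time Exceeded replies are omitted), and the query is assumed to reach exactly the nodes within `ttl` hops of the root in each round.

```python
def tra_dsn(links, sensor_type, root, query, max_ttl=8):
    """Discover semantic neighbors by re-sending the query with a growing TTL.

    links:       node id -> list of 1-hop neighbor ids (hypothetical input)
    sensor_type: node id -> sensor type of that node
    query:       sensor types that define the multimodal event
    Returns (found, rounds), where rounds is the number of queries the
    root had to issue.
    """
    wanted = set(query) - {sensor_type[root]}
    found = {}   # semantic neighbor -> route path from the root
    rounds = 0
    for ttl in range(1, max_ttl + 1):
        if not wanted:
            break            # all sensor types discovered: stop early
        rounds += 1
        # Walk outwards hop by hop until the TTL is exhausted.
        frontier = [(root, [root])]
        visited = {root}
        for _ in range(ttl):
            next_frontier = []
            for node, path in frontier:
                for nb in links[node]:
                    if nb not in visited:
                        visited.add(nb)
                        next_frontier.append((nb, path + [nb]))
            frontier = next_frontier
        if not frontier:
            break            # no nodes at this distance: network exhausted
        # Nodes exactly `ttl` hops away answer if they match the query.
        for node, path in frontier:
            if sensor_type[node] in wanted:
                found[node] = path
                wanted.discard(sensor_type[node])
    return found, rounds

# Hypothetical topology: both semantic neighbors are 1 hop from the root.
links = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
sensor_type = {1: "temperature", 2: "humidity", 3: "camera", 4: "motion"}
found, rounds = tra_dsn(links, sensor_type, 1,
                        ["temperature", "humidity", "camera"])
print(found, rounds)
```

When every semantic neighbor is one hop away, the search ends after the first round and node 4 never sees the query, which is the behaviour the text attributes to the absence of network-wide advertisement.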

Evaluation Results
Aiming to offer an objective evaluation based on state-of-the-art components, the Contiki OS is selected as a prominent open source operating system for networked wireless devices with limited resources. The simulations were conducted using Cooja, the network simulator provided by Contiki OS. It must be noted that Cooja is a valid and widely used emulation environment for Contiki-based code, offering a highly accurate representation of behavior and performance. The application code has been compiled for TelosB sensor platforms, which are emulated by the Cooja simulator. TelosB is based on an 8 MHz 16-bit RISC TI MSP430 processor with 10 kB RAM, 16 kB of configuration EEPROM and 48 kB of flash memory. It draws 1.8 mA in active mode and only 5.1 μA in sleep mode. It carries the widely utilized CC2420 radio chip, and its communication is compliant with the 2.4 GHz IEEE 802.15.4 specification used by the most dominant WSN platforms. The MAC layer protocol considered is the low power CSMA protocol, which is one of the most common medium access control algorithms; it is used without duty cycling so as to isolate the effect of the implemented discovery algorithms. Based on this framework, and in order to reach useful conclusions, a comprehensive evaluation is presented with respect to parameterized network characteristics, while considering all required performance metrics.
The number of transmissions, the energy consumption per node and the total energy are the extracted evaluation metrics. Aiming to highlight the pros and cons of each technique, two different topologies have been used in the evaluation analysis. In both cases, the originator of the discovery process is the node with id 1. In the first case, the depth of the routing path beginning from this node is 3 hops (Fig. 4), while in the second one it is equal to 1 hop (Fig. 5). Table 1 presents the data flows of the semantic query packet under the DA-DSN protocol for both topologies. More specifically, 7 broadcast transmissions are needed to find the semantic neighbors of node id 1 in Topology 1, while the unicast transmissions from these nodes towards the originator node are equal to 9. When the initial packet arrives at each node, the routing path is known, hence each of the semantic neighbors uses this path to reply to the initial node. Similarly, in Topology 2, where the semantic neighbors are 1 hop from the source node, the number of broadcast transmissions is equal to 10, while the number of unicast transmissions is equal to 5. As can easily be observed, the initial packets are forwarded to all nodes, even though the semantic neighbors can be reached with the first transmission.
Table 2
In a typical Wireless Sensor Network platform, the communication interface is probably the most power-hungry component. Consequently, minimizing the respective energy consumption, and more precisely selecting the algorithms that lead to this minimization, is of paramount importance for conserving such a scarce resource. However, in order to offer a comprehensive evaluation, the respective efforts must focus both on the individual sensor and on network-wide performance.
Therefore, the presented evaluation takes into consideration i) the individual radio energy consumption, ii) the aggregate consumption, iii) the average consumption and iv) the mean deviation of energy consumption. The latter is a critical metric highlighting the dispersion of the values and thus the fairness of power consumption among the network nodes. Table 3 depicts all measured and calculated radio energy consumption figures for the first topology, where there is a 3-hop route path. As is clearly shown, when a route path comprising multiple hops is considered, the Diffusion Algorithm offers distinct benefits in all cases. Specifically, considering network-wide energy consumption, TRA-DSN yields a 32% higher aggregate energy consumption, while individually an analogous 32% increase is recorded on average in each node. However, the metric that provides the most interesting results is the mean deviation: TRA-DSN produced much more dispersed measurements, with dispersion increasing by 60% when the DSN algorithm changed from the DA-based to the TRA-based approach. This measurement pertains to the fairness of the energy consumption distribution and consequently to the expected network lifetime.
Table 4 reports the same measurements for the second topology. In this case the main differences are that i) the route path is only 1 hop long and ii) the network comprises 10 nodes, compared to the 7 nodes considered in the first topology. Here the derived conclusions are not as straightforward as in the first case. The first valuable observation is that in the TRA-DSN case there are nodes that did not actively participate in the discovery process and thus show zero radio power dissipation (nodes 7-10 in our case). This is because the discovery of the semantic neighbors is achieved in steps (TTL values), and at a TTL value of 1 all semantic neighbors have been found and the procedure terminates.
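
The four metrics listed above can be computed from per-node radio energy readings as in the following sketch. It assumes that "mean deviation" denotes the mean absolute deviation from the average, which the text does not state explicitly. The helper names `energy_summary` and `relative_change` and the sample readings are hypothetical and are not taken from Tables 3 and 4.

```python
def energy_summary(per_node_energy):
    """Return (aggregate, average, mean_deviation) for per-node readings.

    mean_deviation is taken here as the mean absolute deviation from the
    average, which is an assumption about the paper's definition.
    """
    n = len(per_node_energy)
    aggregate = sum(per_node_energy)
    average = aggregate / n
    mean_deviation = sum(abs(e - average) for e in per_node_energy) / n
    return aggregate, average, mean_deviation

def relative_change(new, old):
    """Percentage change of `new` with respect to `old`."""
    return (new - old) / old * 100.0

# Hypothetical per-node radio energy readings (arbitrary units).
readings_a = [1.0, 2.0, 3.0]
readings_b = [0.0, 2.0, 4.0]
agg_a, avg_a, dev_a = energy_summary(readings_a)
agg_b, avg_b, dev_b = energy_summary(readings_b)
print(agg_a, avg_a, dev_a)
print(relative_change(dev_b, dev_a))
```

A lower mean deviation at a similar average indicates that the radio energy cost is spread more evenly across nodes, which is the fairness aspect discussed in the text.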
Consequently, the Trace Route based approach appears to be more efficient in this case, since network-wide a decrease on the order of 34% is recorded for both the aggregate and the average per-node dissipated radio energy. However, once again DA-DSN appears to achieve a more uniform distribution of energy