Reliable Channels for Systems in the Presence of Byzantine Faults

Consider a distributed system that delivers each message from a process to its destination if the message transmission does not experience any faults, and that delivers only messages sent by a non-faulty system process. Such a system is referred to as a reliable message passing system. Implementing a reliable message passing system requires a reliable channel: a communication channel between a pair of processes that always detects a fault in message transmission and for which each detected fault is an actual fault. In this paper, we first identify the necessary conditions to detect a restricted form of Byzantine Faults in a message passing system where n disjoint paths exist between each pair of endpoints. We consider Byzantine Faults (BF) whose effect is limited to the modification of message metadata, omission faults, and message replay. We then present a protocol implementing a reliable channel in message passing systems in the presence of n - 1 Byzantine Faults using n disjoint paths between each pair of communication endpoints, where the paths with faults are not known a priori. The proposed protocol detects Byzantine Faults such that each detected fault is an actual fault, authenticates message origins, identifies faulty paths, and classifies faults in the presence of multiple messages sent by various system processes.


Introduction
Communication reliability and security are essential properties in computer networks. Two important issues in communication reliability and security are secrecy and authenticity; this work is concerned with the latter. A communication that provides secrecy ensures that the message contents cannot be discovered by intermediate processes and channels. An authenticated network, on the other hand, transmits each message sent, and only those sent, by the source process to the destination process intact. A channel with communication authenticity, also referred to as an authenticated channel, is implemented by recovering from lost and altered messages and by detecting spurious messages, i.e., messages generated in the system by sources other than the expected/intended source process. A reliable channel, by contrast, transmits only those messages sent by the source process to the destination process intact.
A reliable channel ensures the following two properties. Completeness: whenever a node observes the effects of a faulty message, the system eventually generates evidence against the faulty message. Accuracy: the system never generates valid evidence against a correctly delivered message.
Distributed systems are subject to a variety of faults, including Byzantine Faults (BF) [1]. A Byzantine Fault is the most general form of fault, in which a system component may exhibit arbitrary behaviour such as corruption of its state or program, sending arbitrary messages, or modifying system messages [2]. Therefore, many security attacks including censorship, misrouting, data corruption, software defects, and virus attacks can be considered as Byzantine Faults.
Byzantine Fault detection is a fundamental problem in achieving system availability, safety, and security. It refers to the ability of a system to monitor and identify faulty system behaviour, rather than masking the effect of the fault. Byzantine Fault detection is important because, upon detection of a fault, the system can take appropriate actions.
Known approaches to deal with Byzantine Faults include service replication [3], cryptographic techniques, message authentication codes [3], network coding and error detecting/recovering code techniques [4], and signed digests [5]. For instance, [4] shows that Byzantine modification detection capability can be added to a multicast scheme based on random linear block network coding, with modest additional computational and communication overhead, by incorporating a simple polynomial hash/check value in each packet. With this approach, a sink node can detect Byzantine modifications with high probability, provided that these modifications have not been designed with knowledge of the random coding combinations present in all other packets obtained at the sink: the only essential condition is the adversary's incomplete knowledge of the random network code seen by the sink.
Existing schemes that detect or tolerate Byzantine Faults perturbing message content and variables deal with end-to-end communication [6,7,8], but fall short of tolerating or detecting faults in the form of spurious and replay messages and perturbation of message parameters during message routing. When message parameters are perturbed and spurious and replay messages are sent, additional fault types are introduced, including routing a message to an unintended destination, mixing up shares of different messages, replacement messages, and false positives (a non-faulty message detected as a faulty message). Replacement messages are messages whose parameters are altered in such a way that they are regarded as other messages. Cryptographic schemes, schemes based on network coding, message digests, and hashing techniques available in the literature do not deal with these types of faults. These techniques also make strong assumptions such as perfect cryptography, and cause delays due to ciphering/coding at the sending end and deciphering/decoding at the receiving end. Furthermore, most authenticated channel implementations rely on computational bounds on the adversaries and on an authentication step to exchange a key.
In this paper, we consider a system where processes communicate via message passing over disjoint paths in the presence of Byzantine Faults. We first identify the necessary conditions to detect the presence of Byzantine Faults in the system using messages exchanged via disjoint paths between communicating endpoints. We then present a protocol to implement reliable channels in message passing systems in the presence of n - 1 Byzantine Faults using n disjoint paths between each pair of communication endpoints, where the paths with faults are not known. The proposed algorithm deals with a large variety of Byzantine Faults, including spurious messages, message omission and replay, and modification/corruption of message metadata (parameters). The protocol allows multiple messages to be sent simultaneously in the system, where multiple faults in the metadata of the messages may occur and introduce additional types of Byzantine Faults, such as replacement messages, that are not possible otherwise. The algorithm also performs sender authentication: the destination process detects changes in the sender id and messages sent by an adversary (imposter) process claiming the id of another process. It also identifies faulty paths and the fault types, allowing various recovery methods to be adopted. The algorithm does not use cryptographic or error correcting/recovering codes. Therefore, there is no message latency due to network coding or cryptographic encodings, and there is no need to agree on a coding/decoding scheme beforehand, leading to timely discovery of faults affecting message transmission.
The proposed algorithm can be used for reliable message passing and fault detection in computer networks containing multiple disjoint paths between communication endpoints, in particular in P2P networks, whose overlay network topologies often contain many disjoint paths between endpoints. The protocol has the overhead of sending n copies of the same message. However, the overhead can be reduced if the number of faults decreases. Using the proposed protocol, perfectly reliable message passing can be implemented, where a message is delivered if no fault is detected and discarded otherwise.
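The deliver-or-discard rule can be illustrated with a minimal sketch. This is not the protocol of this paper: it assumes, hypothetically, that the destination collects at most one copy per path before some timeout, that a copy is a `(sender, seq, payload)` tuple, and that at most n - 1 of the n paths are faulty. Under those assumptions at least one path carries the genuine copy, so if all n copies arrive and agree, they must all equal the genuine message. The names `Copy` and `check_copies` are illustrative only.

```python
from typing import NamedTuple, Optional, Sequence


class Copy(NamedTuple):
    """Hypothetical message copy: metadata (sender, seq) plus payload."""
    sender: str
    seq: int
    payload: bytes


def check_copies(copies: Sequence[Optional[Copy]]) -> Optional[Copy]:
    """Return the message if no fault is detected, else None (discard).

    copies[i] is the copy received on path i, or None if nothing arrived
    on that path before the (assumed) timeout.
    """
    if not copies:
        return None  # no paths, nothing to deliver
    first = copies[0]
    if first is None:
        return None  # omission on path 0
    for copy in copies[1:]:
        if copy != first:
            return None  # omission (None) or a copy that differs
    return first


good = Copy("p1", 7, b"hello")
forged = Copy("p2", 7, b"hello")  # sender id altered on one path

delivered = check_copies([good, good, good])      # all copies agree
discarded = check_copies([good, forged, good])    # metadata mismatch
omitted = check_copies([good, None, good])        # one copy missing
```

The sketch only decides between delivery and discard; identifying which path is faulty and classifying the fault require more information than a simple equality check provides.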
In addition, the proposed protocol can be combined with network coding techniques when the protocol is modified to send n shares of a message using network coding instead of sending n copies of the same message. In this case, network coding can be used to detect/tolerate faults in message content, while the proposed algorithm accurately detects various other forms of Byzantine Faults during the routing of messages.
The paper is organized as follows. In Section 2, we present the background of the proposed work and the related work available in the literature. Section 3 defines the necessary conditions required for the detection of Byzantine Faults. We conclude the paper in Section 4 with some final remarks.

Background and Related Work
A survey of information theoretic research in the area of secure network communication in the presence of Byzantine adversaries is given in [9]. Two important issues in secure network communication are secrecy and authenticity. In the network coding context, the problem of ensuring secrecy in the presence of a wiretap adversary has been considered in [10] and [11]. The problem of correcting adversarial errors to build authenticated channels has been studied in [12] and [13].
The concept of failure detection was introduced by Chandra and Toueg [14]. Failure detectors were originally defined for crash failures, but Malkhi and Reiter [15] later extended them to special classes of Byzantine failures. In a more general manner, Doudou et al. [16] have introduced muteness failure detectors dealing with nodes that prematurely stop sending messages. Ho et al. [4] present a Byzantine modification detection scheme based on random network coding techniques. Kihlstrom et al. [17] have introduced several classes of failure detectors that expose detectable Byzantine failures. However, they consider classes of algorithms in which all messages are broadcast, and in which processes know when to expect messages from other processes. State machine replication [18] is a classical technique for masking a limited number of Byzantine Faults. BFT techniques, e.g. [3], are based on this idea. Although these techniques perfectly protect the system from Byzantine failures, they are usually not intended to detect the faulty nodes, and they are inherently expensive and not scalable. The BAR model [19] extends the BFT approach to tolerate selfish behaviour of rational nodes while providing a mechanism for detecting certain application-specific misbehaviour. Alvisi et al. [20] introduced a technique that monitors quorum systems and raises an alarm if the failure assumptions are about to be violated. This technique is probabilistic and cannot identify which nodes are faulty. Intrusion detection systems (IDS) [21] can detect certain types of protocol violations; however, the heuristics used in IDS tend to produce false positives, false negatives, or both. Reputation systems such as EigenTrust [22] can be used against Byzantine failures, but they cannot prevent a coalition of malicious nodes from denouncing a correct node. Finally, trusted computing platforms like TCG/Palladium can detect failures that involve software modifications, but force users to exclusively run certified software.
A related problem, Byzantine consensus in asynchronous message-passing systems, has been shown to require at least 3f + 1 processes to be solvable in several system models (e.g., with failure detectors, partial synchrony, or randomization) [23]. Recently, a couple of solutions that implement Byzantine fault-tolerant state machine replication using only 2f + 1 replicas have appeared [24]. This reduction from 3f + 1 to 2f + 1 is possible with a hybrid system model, i.e., by extending the system model with trusted/trustworthy components that constrain the behaviours that faulty processes can exhibit.
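To make the two bounds concrete, the short sketch below tabulates the minimum number of replicas needed to withstand f Byzantine processes under each model. The function names are illustrative and do not come from the cited works.

```python
def replicas_classic(f: int) -> int:
    """Minimum processes for Byzantine consensus in the classic models: 3f + 1."""
    return 3 * f + 1


def replicas_hybrid(f: int) -> int:
    """Minimum replicas with trusted components (hybrid model): 2f + 1."""
    return 2 * f + 1


# Rows of (f, classic bound, hybrid bound) for f = 1, 2, 3
table = [(f, replicas_classic(f), replicas_hybrid(f)) for f in range(1, 4)]
for f, classic, hybrid in table:
    print(f"f={f}: classic={classic}, hybrid={hybrid}")
```

The saving is f replicas in every case, which is the practical motivation for the hybrid model.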
Our algorithm differs from previous work in using disjoint paths to detect Byzantine Faults between two nodes (sender and destination). The proposed algorithm deals with a large variety of Byzantine Faults, including spurious messages, message omission and replay, and modification/corruption of message metadata (parameters), that are not dealt with by existing schemes. Unlike most work on Byzantine Faults, this work focuses on detecting Byzantine Faults rather than tolerating them; therefore, we have a weaker set of assumptions, i.e., one non-faulty path is enough for detection. The same algorithm can be extended to tolerate faults, but with stronger assumptions. The algorithm uses disjoint paths to detect faults between two nodes only, unlike other work [25] that finds all faulty nodes. This allows more than n/3 faulty nodes in the network. The proposed algorithm deals with a large class of faults, including message metadata and content alteration, omission faults, spurious messages, and message replay. Unlike cryptographic coding techniques, the proposed algorithm does not require high computational power.
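The difference between detecting and tolerating faults can be seen in a small sketch. It assumes, purely for illustration, that tolerance is attempted by majority voting over the copies received on the n paths; the paper does not say how its algorithm would be extended, so `majority_value` is a stand-in, and both function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def detect_fault(copies: Sequence[Optional[Hashable]]) -> bool:
    """True if a copy is missing (None) or the copies do not all agree."""
    return None in copies or len(set(copies)) != 1


def majority_value(copies: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Value carried by a strict majority of the paths, or None if there is none."""
    present = [c for c in copies if c is not None]
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(copies) / 2 else None


# n = 3 paths; two faulty paths collude and one path is non-faulty.
copies = ["forged", "forged", "genuine"]

fault_seen = detect_fault(copies)   # the single genuine copy exposes the fault
voted = majority_value(copies)      # voting is fooled by the colluding paths
```

With n - 1 faulty paths the disagreement is still visible, so detection succeeds, whereas voting returns the forged value. Masking by majority needs a majority of non-faulty paths, which is a stronger assumption than the single non-faulty path needed for detection.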

Properties of BFD Protocol
In this section, we first define the properties that any system that detects Byzantine Faults needs to satisfy. We then provide detection conditions for Byzantine Faults in the system and prove that they are necessary and sufficient.
A system that detects a BF satisfies the following liveness and safety properties following [14], [17], and [26].

Eventual Strong Completeness (Liveness)
Every message sent by a non-faulty node and perturbed (but not omitted) by an intermediate node or a link is eventually received by the destination. In addition, every non-faulty node that sends a message perturbed or omitted by an intermediate node or a link eventually discovers that its message has experienced a fault prior to reaching its destination.

Eventual Strong Accuracy (Safety)
No non-faulty node receives a perturbed message and wrongfully considers it as a non-perturbed message. In addition, no non-faulty node wrongfully discovers that a message has been perturbed.
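One way to read the detection side of these two properties is as checks over a finished execution trace. The sketch below is only an illustration under simplifying assumptions: the trace is finite (so "eventually" is taken to mean "by the end of the trace"), and each hypothetical `Event` records the ground truth (`perturbed`) next to the detector's verdict (`flagged`). It does not model the delivery requirement of the completeness property.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical trace entry for one message."""
    perturbed: bool  # ground truth: the message was perturbed or omitted
    flagged: bool    # verdict: a fault was reported for the message


def is_complete(trace: Sequence[Event]) -> bool:
    """Every perturbed message is reported as faulty by the end of the trace."""
    return all(e.flagged for e in trace if e.perturbed)


def is_accurate(trace: Sequence[Event]) -> bool:
    """No unperturbed message is reported as faulty."""
    return all(not e.flagged for e in trace if not e.perturbed)


ok_trace = [Event(perturbed=True, flagged=True), Event(perturbed=False, flagged=False)]
missed = [Event(perturbed=True, flagged=False)]        # violates completeness
false_alarm = [Event(perturbed=False, flagged=True)]   # violates accuracy
```

A detector satisfies both properties on a trace only when `is_complete` and `is_accurate` both hold, as they do for `ok_trace`.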

Concluding Remarks
We presented a new approach that uses disjoint paths to detect Byzantine Faults (BFs) in computer networks in order to implement reliable channels, and showed the necessary conditions to detect Byzantine Faults. Using disjoint paths to detect other forms of Byzantine Faults remains an open problem. Combining the proposed algorithm with existing techniques for dealing with Byzantine Faults, such as finite state duplication and cryptographic and network coding techniques, is also an open problem.