Two Multicasting Schemes for Irregular 3D Mesh-based Bufferless NoCs

In this paper, the authors propose two multicast routing algorithms for irregular 3D mesh-based bufferless interconnection networks. The DRM-1 scheme sends a multicast packet along a non-deterministic path. The DRM-2 scheme breaks a multicast packet into multiple single packets. Compared with the DRM-1 multicast routing algorithm, the DRM-2 scheme achieves higher system performance under four different synthetic workloads.


INTRODUCTION
With current technology, applications and programming models for Multi-Processor System-on-Chip (MPSoC), multicast and broadcast communications are becoming common. In a multiprocessor system, cache-coherent shared-memory protocols (such as directory-based or token-based protocols) require one-to-many or broadcast communication to acquire shared data or invalidate shared data on diverse cache blocks [1]. It has been stated that multicast traffic amounting to only 3-5% of the total network traffic has a serious impact on system performance [1]. Thus, supporting multicast communication in the NoC can enhance system performance significantly. However, existing multicast router designs [2,3,4,5] all target buffered routers. There are almost no router designs that support multicasting on bufferless routers.
On the other hand, the following issues make multicast communication more complicated. The first issue is the irregular topology. Applying virtualization [6] allows multiple applications to be mapped to different sub-networks at the NoC level. Such a sub-network is often irregular, which rules out regular routing algorithms such as odd-even routing and XY routing [7]. The second issue is the variety of application communication behaviors, for example in embedded systems and desktop applications. As a result, customized multicast routing approaches, such as those using routing tables, may not suit them.
Several multicast approaches have been proposed for Networks-on-Chip on the general mesh topology. The Low Distance multicast (LD) [3] algorithm is a path-based method that optimizes the order of the multicast destination nodes and uses adaptive routing for multicast packets in the network. Virtual Circuit Tree Multicasting (VCTM) [4] is a representative tree-based multicast routing method. This routing algorithm sends a setup packet to build a multicast tree before sending the multicast packet. Similar to the VCTM multicast algorithm, the two-phase multicast tree algorithm proposed in [5] consumes less power than the VCTM algorithm. Recursive Partitioning Multicast (RPM) [14] divides the network into several partitions; the multicast packet selects intermediate nodes at which to replicate, which minimizes the packet replication time. However, the above approaches [3,4,5,14] cannot support multicasting in irregular networks. The bLBDR routing [15] supports multicasting in irregular sub-networks, but this scheme routes on buffered routers.

Shaojun Wei
The Department of Electronic Systems, Tsinghua University, Beijing, China
In the open literature, there has been no work addressing bufferless multicasting in irregular 3D NoC systems. Even multicast routing algorithms for 3D bufferless NoCs are very rare. Feng [16] proposed three bufferless multicast routing algorithms for the 2D general mesh NoC architecture. In this paper, two multicast strategies oriented to irregular 3D bufferless NoCs are proposed for the first time: the deflection routing based multicast scheme one (DRM-1) and the deflection routing based multicast scheme two (DRM-2).

PROPOSED TWO MULTICAST SCHEMES FOR 3D IRREGULAR ARCHITECTURE

3D bufferless hybrid router architecture
The bufferless NoC architecture in this paper extends a 2D mesh topology. Figure 2 shows the bufferless router architecture. In the 3D mesh NoC architecture, each router has 7 input/output ports. Each input port has only one input register, so packets are not buffered in the router. The router adopts deflection routing to route packets. When two or more packets compete for a common optimal port, that is, a port leading to a shortest path to the destination, only one packet can gain the optimal port, and the other packet(s) are deflected to non-optimal ports. The port allocator sorts arriving packets in order to limit the number of misroutings and thus avoid livelock. The Priority Sort module ranks packets by the number of hops they have already been routed in the network when granting the optimal port, so the packet with the most hops has the highest priority.
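The port allocation described above can be illustrated with a minimal Python sketch. It is not the router's actual logic: the `Packet` class, the `allocate_ports` function and the port labels are hypothetical names introduced here, the list of preferred (optimal) ports per packet is assumed to be given, and the local port is not modeled. The sketch only shows the idea that packets are sorted by hop count and that a losing packet is deflected to some free port instead of waiting.

```python
from dataclasses import dataclass

# Hypothetical labels for the six inter-router ports of a 3D mesh router
# (the seventh, local port is not modeled in this sketch).
PORTS = ["east", "west", "north", "south", "up", "down"]

@dataclass
class Packet:
    name: str
    hops: int        # hop counter: number of hops already routed
    preferred: list  # optimal output ports, best first (assumed given)

def allocate_ports(packets, available=PORTS):
    """Give every packet an output port; packets with more hops choose first."""
    free = list(available)
    assignment = {}
    # Priority sort: the packet with the most hops has the highest priority.
    for pkt in sorted(packets, key=lambda p: p.hops, reverse=True):
        # Take the first free optimal port, otherwise deflect to any free port.
        port = next((p for p in pkt.preferred if p in free), free[0])
        free.remove(port)
        assignment[pkt.name] = port
    return assignment

result = allocate_ports([
    Packet("A", hops=3, preferred=["east"]),
    Packet("B", hops=9, preferred=["east"]),
    Packet("C", hops=1, preferred=["up"]),
])
print(result)
```

In this run packets A and B compete for the same port; B has been routed for more hops, so it wins the port and A is deflected. The sketch assumes that the number of arriving packets never exceeds the number of output ports, which is what allows a bufferless router to forward every packet each cycle.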

Routing strategy
Our two proposed multicast algorithms for the irregular 3D bufferless NoC architecture are DRM-1 and DRM-2. The DRM-1 scheme first collects the destination set into a group, then computes the distance of each destination node to the source based on the Manhattan distance. This algorithm is a non-deterministic path-based multicast scheme. Unlike buffered path-based multicast, however, the multicast packet travels along a non-deterministic path and is routed to each destination in turn. When a multicast packet arrives at a router, the router always selects, from the remaining destination nodes, the destination with the minimum Manhattan distance to the current router. Because the packet may be deflected away from the shortest path to the destination, the best destination node in the multicast changes dynamically during the routing process. Figure 1 shows an example of multicasting, using the same scenario, for the irregular 3D mesh NoC architecture with the DRM-1 multicast routing algorithm. In this example, the source node is node 16 and the destination set is D = {4, 11, 22}.
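The destination selection step of DRM-1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, routers are identified by (x, y, z) coordinates, and the coordinates below are invented for the example and do not reproduce the node layout of the figure.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y, z) coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b))

def next_destination(current, remaining):
    """Pick the remaining destination closest to the current router."""
    return min(remaining, key=lambda node: manhattan(current, remaining[node]))

# Hypothetical destination coordinates (not the paper's node numbering).
dests = {"d1": (0, 0, 0), "d2": (2, 1, 1), "d3": (1, 2, 2)}

# Choice made at router (1, 1, 1).
first = next_destination((1, 1, 1), dests)

# If the packet is deflected to router (1, 2, 1), the choice is recomputed
# there and may differ from the earlier one.
after_deflection = next_destination((1, 2, 1), dests)
print(first, after_deflection)
```

Because the selection is re-evaluated at every router, a single deflection is enough to change which destination is served next, which is why the path is non-deterministic.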
Src_addr field (18 bits): Denotes the relative address to the destination node (6 bits for the row address, 6 bits for the column address, 6 bits for the layer address).
Dst_addr field: Uses bit-string encoding; the number of bits equals the number of nodes in the architecture. A bit of '1' in the string means that the corresponding node is a destination node.
Hop counter field (11 bits): Records the number of hops that the packet has been routed.
Payload field: The payload has 80 bits, which can be extended to contain more bits for different application requirements.
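As a rough illustration of the header fields listed above, the sketch below packs them into a single integer. The field widths (6 + 6 + 6 address bits, 11 hop-counter bits, one destination bit per node) come from the text; the field order, the function names and the network size of 27 nodes are assumptions made only for this example.

```python
ADDR_BITS = 6   # bits per row, column and layer address (18 bits in total)
HOP_BITS = 11   # width of the hop counter field

def pack_header(row, col, layer, destinations, hops, num_nodes):
    """Pack the header fields into one integer.

    Assumed layout, most significant field first:
    Src_addr | Dst_addr bit string | hop counter.
    """
    src_addr = (row << (2 * ADDR_BITS)) | (col << ADDR_BITS) | layer
    dst_addr = 0
    for node in destinations:
        dst_addr |= 1 << node  # bit i set to 1: node i is a destination
    return (((src_addr << num_nodes) | dst_addr) << HOP_BITS) | hops

def unpack_destinations(header, num_nodes):
    """Recover the destination node numbers from a packed header."""
    dst_addr = (header >> HOP_BITS) & ((1 << num_nodes) - 1)
    return [node for node in range(num_nodes) if (dst_addr >> node) & 1]

NUM_NODES = 27  # hypothetical network size
header = pack_header(row=1, col=2, layer=1, destinations=[4, 11, 22],
                     hops=0, num_nodes=NUM_NODES)
header_width = 3 * ADDR_BITS + NUM_NODES + HOP_BITS
print(header_width, unpack_destinations(header, NUM_NODES))
```

With bit-string encoding the header grows by one bit per node in the network, so the header width depends on the network size rather than on the number of destinations in a given multicast.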
In multicast routing, when a multicast packet arrives at a destination node, the bit in the destination bit string corresponding to that destination node is reset to '0', and the message is copied and sent together with its new header address to the next destination node. Our proposed routing algorithm is shown in the next subsection. The hop counter field of the packet is incremented by 1. Both operations are shown in Figure 4.
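The header update on arrival can be sketched in a few lines. This is an illustration only: `on_arrival` is a hypothetical name, the destination bit string is held in a Python integer, and the sketch assumes that the hop counter is incremented at every router the packet visits.

```python
def on_arrival(dst_bits, hops, node):
    """Update the header state when the packet reaches router `node`.

    Returns the new destination bit string, the new hop count, and whether
    a copy of the message is delivered at this router.
    """
    delivered = bool((dst_bits >> node) & 1)
    if delivered:
        dst_bits &= ~(1 << node)  # reset this destination's bit to '0'
    return dst_bits, hops + 1, delivered

# Destinations are nodes 1, 2 and 4 (bits 1, 2 and 4 set).
bits, hops = 0b10110, 0
bits, hops, got_copy = on_arrival(bits, hops, node=2)
print(bin(bits), hops, got_copy)
```

When the bit string reaches zero, every destination has received its copy and the packet has no further destination to visit.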

DEADLOCK AND LIVELOCK AVOIDANCE
Deflection routing is inherently deadlock-free because packets never have to wait in a router. However, when a packet does not acquire the optimal port, it moves further away from its destination. Thus, livelock must be avoided by limiting the number of misroutings. In the multicast routing algorithm, multicast packets are prioritized based on their age, that is, the number of hops already routed in the network. The age-based priority mechanism lets the oldest packet win link arbitration first and go directly to its destination. Once the oldest packet reaches its destination, another packet becomes the oldest. In our 3D mesh multicast routing algorithm, the switch arbiter selects only one top-priority multicast packet to acquire the optimal port, through which it moves toward its destination node in that cycle; the other multicast packets are deflected within the 3D mesh based on their age. Thus livelock is avoided.

PERFORMANCE EVALUATION
We evaluate the performance of the proposed multicasting mechanisms for the irregular 3D mesh NoC interconnection network. The baseline router structure is the Nostrum [17] router, evaluated with a cycle-accurate NoC simulator developed in VHDL. The arbitration scheme for the switch allocator is age-based.
The performance of the network was evaluated using curves of latency as a function of the packet injection rate. Packet latency is the time from when the packet is created at the source node to when it is delivered to the destination node. To perform the simulations, a packet generator produces packets and uses a FIFO to buffer them when there is no free output port to route a packet. A combination of multicast (20%) and unicast (80%) traffic was used. For the unicast portion of the traffic, we use the following synthetic traffic patterns: uniform random, transpose, bit complement, bit reverse, shuffle and tornado. In uniform random traffic, each resource node sends packets randomly to other nodes. For bit complement traffic, the four-bit source node ID $\{s_i \mid i \in [0,3]\}$ sends packets to destination $\{\bar{s}_i \mid i \in [0,3]\}$. For transpose traffic, a resource node (x, y, z), excluding nodes with x = y = z, sends packets to the destination (z, y, x). If the four-bit source address is $\{s_3, s_2, s_1, s_0\}$, the bit reverse traffic destination address is $\{s_0, s_1, s_2, s_3\}$. If the four-bit source address is $\{s_3, s_2, s_1, s_0\}$, the destination address for shuffle traffic is $\{s_2, s_1, s_0, s_3\}$. For tornado traffic, each (radix-k) digit $D_x$ of the destination address is a function of a digit $S_x$ of the source address, namely $D_x = (S_x + (k/2 - 1)) \bmod k$. For multicast packets, the destination positions are uniformly distributed.
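The deterministic unicast patterns defined above can be written as small destination functions. The sketch below is an illustration rather than the simulator's packet generator: the function names are hypothetical, addresses are four-bit integers as in the text, and the tornado function assumes an even radix k so that k/2 is an integer.

```python
def bit_complement(src, bits=4):
    """Invert every address bit: s_i -> not s_i."""
    return ~src & ((1 << bits) - 1)

def bit_reverse(src, bits=4):
    """{s3, s2, s1, s0} -> {s0, s1, s2, s3}."""
    return int(format(src, f"0{bits}b")[::-1], 2)

def shuffle(src, bits=4):
    """Rotate left by one bit: {s3, s2, s1, s0} -> {s2, s1, s0, s3}."""
    mask = (1 << bits) - 1
    return ((src << 1) & mask) | (src >> (bits - 1))

def transpose(x, y, z):
    """(x, y, z) -> (z, y, x)."""
    return (z, y, x)

def tornado_digit(s, k):
    """One radix-k digit: D = (S + (k/2 - 1)) mod k, assuming even k."""
    return (s + (k // 2 - 1)) % k

print(bit_complement(0b0101), bit_reverse(0b0001), shuffle(0b1001))
print(transpose(1, 2, 3), tornado_digit(6, 8))
```

Each function maps a source address to exactly one destination, so these patterns stress fixed sets of links, in contrast to uniform random traffic, which spreads load evenly.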
In our experiment, the number of destination nodes has been set to 8. The average packet latency curves for uniform random (that is, 80% uniform random unicast plus 20% uniform multicast), transpose, bit complement and bit reverse are shown in Figure 7. It can be observed for all four traffic patterns that the DRM-2 multicast scheme achieves lower average latency.

MATEC Web of Conferences

CONCLUSION AND FUTURE WORK
In this paper, we propose two multicast routing strategies for the irregular 3D mesh architecture. Our simulations with different traffic profiles show that the DRM-2 multicast routing algorithm achieves significant performance improvements over the DRM-1 multicast routing algorithm. In future work, we will extend the simulations with a set of realistic workloads and introduce faulty links into the proposed architecture. We will also measure the area and power consumption of the architecture.

Figure 2. 3D bufferless hybrid router architecture

In Figure 3(a), node 16 sends a multicast packet to nodes 4, 11 and 22. Node 22 is chosen as the first best candidate since it has the minimum Manhattan distance, 2, to the source node 16. After the packet is sent to node 22, node 11 is chosen as the second best candidate. Without contention, the minimum multicast latency is 7 hops. The path shown in Figure 3(a) is not the only possible path, since the packet may be deflected due to contention.

Figure 3. Example of the two multicast routing schemes where the source is at node 16

Figure 7. Latency versus average packet injection rate with 8 multicast nodes