By Admin on

Design of Smart Power-Saving Architecture for Network on Chip

In network-on-chip (NoC), transferring data through virtual channels avoids data loss and deadlock. A router includes many virtual channels on each input or output port. Because a router has five I/O ports, the power consumed by its virtual channels is a major concern. In this paper, a novel architecture, namely, Smart Power-Saving (SPS), is proposed for low power consumption and low area in the virtual channels of NoC. The SPS architecture adapts to different environmental factors to dynamically save power and optimize area in NoC. Compared with related works, the proposed method reduces power consumption by 37.31%, 45.79%, and 19.26% and area by 49.4%, 25.5%, and 14.4%, respectively.

1. Introduction:

  • In recent years, 3-dimensional IC and TSV (Through-Silicon Via) technologies have been proposed to solve area issues. Intel's Ivy Bridge processor, built with 3-D (tri-gate) transistors, and a 16-core multicore architecture have been implemented in 22 nm. As a result, multicore and heterogeneous systems have become a popular research topic in SoC (system-on-chip). These architectures require high throughput and performance to transfer data within a multicore SoC. NoC (network-on-chip) was proposed to meet this requirement, but it introduces new problems such as power consumption and area.
  • The NoC architecture [1] consists of processing elements (PEs), network interfaces (NIs), routers, and a topology, as shown in Figure 1. A PE passes information to its NI, which packages the information into flits and passes them to the router. There are three kinds of routers: corner routers (CR), edge routers (ER), and inner routers (R), which have three, four, and five I/O ports, respectively; each port includes virtual channels. A router includes the transmission channel, routing computation (RC), virtual channel arbiter (VA), switch arbiter (SA), and crossbar (XBAR). A packet consists of header, body, and tail flits; the header flit carries the PE priority, source address, destination address, and so forth. The RC uses the header flit and the routing algorithm to find the transmission path. The VA uses two-stage arbitration to select the highest-priority packet and then signs up (reserves) a transmission channel for it. The SA uses two-stage arbitration to select which body flits enter the XBAR for transmission. The VA operates when a packet arrives, and the SA operates when a flit arrives. The tail flit is the last flit of a packet; when it passes, the router unregisters the transmission channel. Common router topologies include mesh, star, and fat tree.

Figure 1: NoC architecture.
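The three-, four-, and five-port counts follow from where a router sits in a 2-D mesh: each router has one local port to its PE plus one port per neighbouring router that exists. The Python sketch below assumes a width × height mesh addressed by (x, y) coordinates; the function names `router_ports` and `router_kind` are hypothetical and only illustrate the CR/ER/R classification.

```python
def router_ports(x: int, y: int, width: int, height: int) -> int:
    """Count the I/O ports of the router at (x, y) in a width x height mesh.

    Every router has one local port to its PE plus one port for each
    neighbouring router (west, east, south, north) that exists.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("router coordinates lie outside the mesh")
    neighbours = 0
    neighbours += x > 0             # west neighbour exists
    neighbours += x < width - 1     # east neighbour exists
    neighbours += y > 0             # south neighbour exists
    neighbours += y < height - 1    # north neighbour exists
    return neighbours + 1           # +1 for the local PE port


def router_kind(x: int, y: int, width: int, height: int) -> str:
    """Classify a router as corner (CR), edge (ER), or inner (R) by port count."""
    if width < 2 or height < 2:
        raise ValueError("this sketch assumes at least a 2 x 2 mesh")
    return {3: "CR", 4: "ER", 5: "R"}[router_ports(x, y, width, height)]
```

For the 2 × 2 mesh used later in this paper, every router is a corner router with three ports; larger meshes add edge and inner routers.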

2. Power Issue with Virtual Channels:

  • Multicore architectures and big-data communication are becoming increasingly popular. Traditional communication technologies cannot handle the large amount of traffic on multicore and heterogeneous chips. NoC solves this issue by using a network transmission method that lets different cores communicate at the same time. However, although NoC solves the communication issue, the heavy data access increases power consumption.
  • The router, composed of an arbitration unit and a transmission unit [16], is illustrated in Figure 2. The arbitration unit selects the highest-priority packet to send to the next router. It includes routing computation (RC), the VC arbiter (VA), and the switch arbiter (SA). The RC calculates routing paths and priorities. The VA contains several two-stage arbiters that select packets and sign up VCs. The first stage selects the locally highest-priority packet from the input VCs to the crossbar and signs up VCs. The second stage selects the globally highest-priority packet from the crossbar inputs to the output VCs and signs up VCs. The SA likewise contains several two-stage arbiters that select flits for transmission. Its first stage selects the locally highest-priority flits from the input VCs to the crossbar, and its second stage selects the globally highest-priority flits from the crossbar inputs to the output VCs. The VA operates per packet, and the SA operates per flit.

Figure 2: Router architecture with NoC.

  • The router's transmission unit is illustrated in Figure 3. It contains VCs that buffer large packets between the input physical channel and the output physical channel. The power consumption of the VCs is calculated by (1):

    $$P_{VC} = \alpha \cdot f \cdot C \cdot V^{2} \qquad (1)$$

    where α is the number of packets or flits accessed in the VCs, f is the access frequency of the VCs, C is the capacitance of the VCs, and V is their supply voltage. Nicopoulos et al. [2] and Katabami et al. [17] proposed clock gating to reduce this power.

Figure 3: Transmission unit.
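Equation (1) can be evaluated directly to see what clock gating buys. The sketch below treats α as a normalized activity factor (the fraction of clock cycles in which the VC actually switches), which is an assumption about how the access count enters the formula; all quantities are in SI units, and `vc_dynamic_power` is a hypothetical helper name. Lowering the effective activity lowers power linearly, while the supply voltage enters quadratically.

```python
def vc_dynamic_power(alpha: float, freq_hz: float, cap_f: float, vdd: float) -> float:
    """Dynamic power of a VC in watts, P = alpha * f * C * V**2.

    alpha   -- normalized activity (assumed: fraction of cycles that switch)
    freq_hz -- clock frequency delivered to the VC, in Hz
    cap_f   -- switched capacitance, in farads
    vdd     -- supply voltage, in volts
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is treated as a fraction in this sketch")
    return alpha * freq_hz * cap_f * vdd ** 2


# Example: a VC clocked at 1 GHz with 1 pF switched capacitance at 1.0 V.
always_clocked = vc_dynamic_power(1.0, 1e9, 1e-12, 1.0)   # about 1 mW
gated_quarter = vc_dynamic_power(0.25, 1e9, 1e-12, 1.0)   # about 0.25 mW
```

If gating leaves the VC clock running in only a quarter of the cycles, the modelled dynamic power drops by the same factor.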

  • In this paper, we propose dynamic control of each virtual channel's clock under different transmission environments. By tracking whether each packet transfer is complete, SPS can effectively reduce power consumption without affecting transmission performance.

3. Router and Topology with SPS:

3.1. Relation of Topology and Router:

The relation between topology and router is illustrated in Figure 4. The router uses a different transmission mode for each topology; for example, the mesh uses XY routing. The XY routing flow chart for a 2 × 2 mesh is illustrated in Figure 5. When the MSB of the destination router address equals the MSB of the current router address and the LSBs of the two addresses are also equal, the flits have arrived. Otherwise, the XY routing algorithm proceeds in two stages. In stage one, the flits are sent along the X-axis routers until the current router's X coordinate equals that of the destination. In stage two, the flits are sent to the destination along the Y-axis routers. When a packet is transmitted between two routers, a virtual channel is first signed up (initialized) for it; the procedure is shown in Algorithm 1.

Sign-up Algorithm
Input: the incoming flit and its virtual channel.
(1)  while (a flit arrives) do
(2)    if (the flit is a header flit and the channel is free)
(3)      { sign up the channel and select it for output }
(4)    else if (the flit is a body flit and its channel matches the signed-up channel)
(5)      { select the channel for output }
(6)    else if (the flit is a tail flit and its channel matches the signed-up channel)
(7)      { clear the channel and select it for output }
(8)    else
(9)      { read the flit back into the virtual channel }
(10) end while

Algorithm 1: Channel sign up algorithm.
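Algorithm 1 can be modelled in software to check its behaviour on a flit trace. The Python sketch below assumes that every flit carries a packet tag and that a VC remembers which packet signed it up; `Flit`, `VirtualChannel`, `packet_id`, and `sign_up_step` are hypothetical names, and the comparisons in lines (4) and (6) are interpreted as "this flit belongs to the packet that owns the channel".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FlitKind(Enum):
    HEAD = "head"
    BODY = "body"
    TAIL = "tail"


@dataclass
class Flit:
    kind: FlitKind
    packet_id: int                  # hypothetical tag naming the owning packet


@dataclass
class VirtualChannel:
    owner: Optional[int] = None     # packet currently signed up on this VC

    def is_free(self) -> bool:
        return self.owner is None


def sign_up_step(flit: Flit, vc: VirtualChannel) -> str:
    """One loop iteration of the sign-up procedure; returns the action taken."""
    if flit.kind is FlitKind.HEAD and vc.is_free():
        vc.owner = flit.packet_id   # header signs up the free channel
        return "output"
    if flit.kind is FlitKind.BODY and vc.owner == flit.packet_id:
        return "output"             # body follows the channel its header reserved
    if flit.kind is FlitKind.TAIL and vc.owner == flit.packet_id:
        vc.owner = None             # tail clears the channel
        return "output"
    return "read_back"              # flit waits in the virtual channel


vc = VirtualChannel()
trace = [Flit(FlitKind.HEAD, 7), Flit(FlitKind.BODY, 7),
         Flit(FlitKind.HEAD, 9), Flit(FlitKind.TAIL, 7)]
actions = [sign_up_step(f, vc) for f in trace]
```

Here the second header (packet 9) arrives while packet 7 owns the channel, so it is read back into the VC as line (9) prescribes; the tail of packet 7 then frees the channel for the next packet.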

Figure 4: Topology and router relation with SPS.

Figure 5: XY routing flow chart.
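The two stages of XY routing described above (correct the X coordinate first, then the Y coordinate) can be sketched as a per-hop port decision. The sketch works on (x, y) coordinates; `addr_to_xy` assumes that the LSB of a 2-bit router address is the X coordinate and the MSB is the Y coordinate, which is a hypothetical mapping for illustration rather than one taken from Figure 5. Because a flit never turns from the Y axis back to the X axis, this dimension-ordered routing is deadlock-free on a mesh.

```python
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_route(cur, dst) -> str:
    """Output port chosen by XY routing at router `cur` for a flit headed to `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:                    # stage one: travel along the X axis
        return "EAST" if dx > cx else "WEST"
    if cy != dy:                    # stage two: travel along the Y axis
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"                  # both coordinates match: the flit has arrived


def xy_path(src, dst):
    """Routers visited from src to dst, inclusive."""
    path, cur = [src], src
    while (port := xy_route(cur, dst)) != "LOCAL":
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path


def addr_to_xy(addr: int):
    """Hypothetical mapping of a 2-bit router address: LSB = x, MSB = y."""
    return addr & 1, (addr >> 1) & 1
```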

  • The arbiter's control method is designed around the different transmission modes: the routing computation unit is designed according to the topology, while the VC arbiter and the switch arbiter are designed according to packet priority. Algorithm 2 implements the two-stage VC arbitration, which is performed per packet. Stage 1 (lines 3 to 4 and lines 8 to 10) selects the highest-priority packet from the local VCs (input VCs) to enter the crossbar. Stage 2 (lines 5 to 6 and lines 11 to 13) selects the most important packet to transmit from the global VCs (output VCs).
Virtual channel arbitration
Input: header flits
/* Control signal enable */
(1)  while (header flits arrive) do
(2)    use lottery arbitration to select the local and global highest-priority flits
(3)    if (local)
(4)      { record the local input virtual channel address }
(5)    if (global)
(6)      { record the global input virtual channel address }
(7)  end while
/* Channel switch */
(8)  case
(9)    select the local packet of the recorded local VC address
(10) end case
(11) case
(12)   select the global packet of the recorded global VC address
(13) end case

Algorithm 2: VC arbitration algorithm.
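Lottery arbitration [18] gives each requester a number of tickets and draws one at random, so higher-priority requesters win more often without starving the others. The sketch below applies it in the two stages of Algorithm 2: first one VC per input port (local), then one input port per output port (global). Using the packet priority directly as the ticket count, and the names `lottery` and `two_stage_vc_arbitration`, are assumptions of this sketch.

```python
import random
from collections import defaultdict


def lottery(tickets: dict, rng: random.Random):
    """Return one key of `tickets`, chosen with probability proportional to its count."""
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("lottery needs at least one positive ticket count")
    draw = rng.randrange(total)                 # index of the winning ticket
    for requester, count in tickets.items():
        if draw < count:
            return requester
        draw -= count
    raise AssertionError("unreachable for non-negative ticket counts")


def two_stage_vc_arbitration(requests, rng: random.Random) -> dict:
    """Two-stage lottery VC arbitration.

    requests -- iterable of (input_port, vc_id, output_port, priority) tuples,
                one per header flit waiting in an input VC; priority is an int >= 1.
    Returns {output_port: (input_port, vc_id)} for every granted output.
    """
    tickets_by_input = defaultdict(dict)        # input_port -> {vc_id: priority}
    wanted_output = {}                          # (input_port, vc_id) -> output_port
    for inp, vc, out, prio in requests:
        tickets_by_input[inp][vc] = prio
        wanted_output[(inp, vc)] = out

    # Stage 1 (local): each input port nominates one of its VCs.
    local_winner = {inp: lottery(vcs, rng) for inp, vcs in tickets_by_input.items()}

    # Stage 2 (global): each output port picks one of the nominating input ports.
    tickets_by_output = defaultdict(dict)       # output_port -> {input_port: priority}
    for inp, vc in local_winner.items():
        tickets_by_output[wanted_output[(inp, vc)]][inp] = tickets_by_input[inp][vc]
    grants = {}
    for out, inputs in tickets_by_output.items():
        winner = lottery(inputs, rng)
        grants[out] = (winner, local_winner[winner])
    return grants
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when the same arbitration sequence has to be replayed as test vectors.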

  • Algorithm 3 implements the two-stage switch arbitration, which is performed per flit. Stage 1 (lines 3 to 4 and lines 8 to 10) selects the highest-priority flit from the local VCs (input VCs) to enter the crossbar. Stage 2 (lines 5 to 6 and lines 11 to 13) selects the most important flit to transmit from the global VCs (output VCs).
Switch arbitration
Input: body and tail flits
/* Control signal enable */
(1)  while (body or tail flits arrive) do
(2)    use the channel sign-up register to select the local and global highest-priority flits
(3)    if (local)
(4)      { record the local input virtual channel address }
(5)    if (global)
(6)      { record the global input virtual channel address }
(7)  end while
/* Channel switch */
(8)  case
(9)    select the local packet of the recorded local VC address
(10) end case
(11) case
(12)   select the global packet of the recorded global VC address
(13) end case

Algorithm 3: Switch arbitration algorithm.

  • In the transmission channel architecture, the router has four directional ports that connect to other routers and one local physical channel that connects to the PE. Every physical channel except the local one contains VCs. The switch bar (crossbar) transmits the most important packet to the output channel. SPS controls the power consumption of each VC whenever the channel status changes. The SPS architecture is introduced in the next section.

3.2. Topology Architecture:

  • The topology defines the packet transmission paths between routers and links. The router connection topology architectures are shown in Figure 6; they include star, mesh, ring, and tree topologies. In the arbitration unit, the RC algorithm depends on the topology, while the VA and SA algorithms depend on packet priority. In this paper, the topology is a 2 × 2 mesh, the RC algorithm is XY routing, and the VA and SA algorithms use lottery arbitration [18].

Figure 6: Router connection topology architectures: (a) star, (b) mesh, (c) ring.


  • The router's connection to the PE is shown in Figure 7. For the PE and router to exchange information, a network interface (NI) is used; it handles the information passed between the router and the PE. The NI is designed in two levels [19], as shown in Figure 8, and contains three modules that meet the specifications of the different layers. The shell module must meet the IP specification, and the kernel module must meet the NoC topology specification.

Figure 7: Router connection with PE.

Figure 8: NI breakdown into Shell, Kernel, and interface.

3.3. Flits with Router Architecture:

  • The flit specification of the router is shown in Figure 9. A 2-bit flit type of 00 represents a single-flit packet; this flit type does not sign up VCs. Type 01 represents the header flit, which includes routing information and addresses; this flit type always signs up a channel. Type 10 represents a body flit, which carries transmission information; its payload records a segment of the packet. Type 11 represents the tail flit, the last piece of transmission information; it not only records the last segment of the packet but also clears the VCs.

Figure 9: Flit types of the router.
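The 2-bit type field can be packed into and extracted from a flit word with shifts and masks. The sketch below assumes a hypothetical 32-bit flit whose two most significant bits hold the type and whose remaining bits hold the payload; the real flit width and field layout are not given in the text, and `pack_flit`/`unpack_flit` are illustrative names.

```python
from enum import IntEnum
from typing import Tuple


class FlitType(IntEnum):
    SINGLE = 0b00   # one-flit packet: does not sign up a VC
    HEADER = 0b01   # routing information and addresses: signs up a VC
    BODY = 0b10     # segment of the packet payload
    TAIL = 0b11     # last segment: clears the VC


FLIT_WIDTH = 32                 # hypothetical flit width in bits
TYPE_SHIFT = FLIT_WIDTH - 2     # type occupies the two most significant bits


def pack_flit(ftype: FlitType, payload: int) -> int:
    """Place the 2-bit type above the payload bits."""
    if not 0 <= payload < (1 << TYPE_SHIFT):
        raise ValueError("payload does not fit in the flit")
    return (int(ftype) << TYPE_SHIFT) | payload


def unpack_flit(flit: int) -> Tuple[FlitType, int]:
    """Split a flit word back into (type, payload)."""
    return FlitType(flit >> TYPE_SHIFT), flit & ((1 << TYPE_SHIFT) - 1)


def signs_up_vc(ftype: FlitType) -> bool:
    return ftype is FlitType.HEADER


def clears_vc(ftype: FlitType) -> bool:
    return ftype is FlitType.TAIL
```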

4. SPS with Router Design

Each VC contains many slots to store data, which leads to extra power consumption. In this paper, we propose the SPS architecture to reduce this power consumption.

4.1. Router with SPS Architecture

  • The proposed router with the SPS architecture is illustrated in Figure 10. The physical channels (PCs) connect to other routers and carry information. The input VCs (IVCs) store information arriving from the PCs; they are implemented as FIFOs or other sequential logic. The arbiter decides flit priorities and controls the input switch logic (ISL) and output switch logic (OSL) to transmit flits; it includes the RC, VA, and SA. The crossbar (CR) connects the IVCs to the OVCs, with its switch signals coming from the arbiter. The output VCs (OVCs) store information from the crossbar. The proposed SPS uses the transmission channel status to dynamically enable the IVC and OVC clocks only when operation is essential.

Figure 10: Router with SPS architecture.

  • The VCs with the SPS architecture are illustrated in Figure 11. SPS controls the system clock delivered to the input and output VCs to reduce power consumption. In this architecture, each VC contains a number of slots, indexed from 0, to store data.

Figure 11: VCs with SPS architecture.

4.2. Design of SPS Control Timing

  • The VC access timing diagrams of the SPS architecture are illustrated in Figure 12. Clock Block A indicates that the VCs have no information to transmit, Clock Block B indicates that the VCs are being written, and Clock Block C indicates that the data in the VCs are waiting to be transmitted. The power consumption of an architecture without clock gating is given by (2):

    $$P = P_{access} + P_{full} + P_{empty} + P_{other} \qquad (2)$$

    where P_access is the power consumed by slot accesses, P_full and P_empty are the power consumed while the slots are full and empty, respectively, and P_other is the power consumed by everything other than P_access, P_full, and P_empty. An architecture without clock gating does not control the clock of the sequential logic in the VCs, so this logic keeps consuming power, which becomes significant in a high-throughput transmission structure.

Figure 12: VCs power with clock diagram.
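Equation (2) charges clock power to the VC's sequential logic in every clock block, because nothing stops the clock. A rough way to see what gating can save is to count clocked cycles over a trace. The sketch below labels each cycle with the block letters of Figure 12 and adds a hypothetical "R" label for cycles in which the VC is read; it assumes that clock power is proportional to the number of clocked cycles and that a gated VC needs its clock only in write and read cycles. Both are simplifying assumptions, not the paper's power model.

```python
# Per-cycle labels: "A" = no data to transmit, "B" = being written,
# "C" = data waiting to transmit, "R" = being read (hypothetical extra label).
ACTIVE = {"B", "R"}   # assumed: only these cycles need a clock edge when gated


def clock_cycles(trace: str, gated: bool) -> int:
    """Cycles in which the VC's sequential logic receives a clock edge."""
    if not gated:
        return len(trace)               # free-running clock: every cycle toggles
    return sum(1 for state in trace if state in ACTIVE)


def clock_power_saving(trace: str) -> float:
    """Fractional clock-power saving of gating, assuming power scales with clocked cycles."""
    return 1 - clock_cycles(trace, gated=True) / clock_cycles(trace, gated=False)


trace = "AAAABBBBCCCCRRRR"   # idle, write, wait, then read out
saving = clock_power_saving(trace)
```

Under these assumptions, the idle (A) and waiting (C) cycles account for the saving, which corresponds to the terms in (2) that an ungated VC pays even when no data moves.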

4.3. Design of SPS:

  • The first CFSM (Figure 13) has four states: initial, empty, full, and waiting. Initial: when the VC is reset, it enters the initial state and stays there until a flit arrives. Empty: the VC enters this state when the user resets the VCs or when its flits are transferred to the next storage unit. Full: the VC is full of stored flits. Waiting: the VC enters this state when the user resets the VCs or when storing the flit is complete.
  • The VCs with SPS are described in Algorithm 4. In line 3, the VC count and flags are initialized. In lines 4 to 9, the VC count changes as flits are accessed whenever a channel packet or an arbiter signal arrives. In lines 10 to 17, the VC flags are updated whenever the VC count changes.

Figure 13: CFSM of SPS with VCs.

VCs with SPS Algorithm
Input: VCs clock, channel packet, arbiter signal, and reset.
Output: channel packet, channel status
(1)  VCcount is an integer with range 1 ≤ VCcount ≤ the VC's maximum count
(2)  VCflag includes a full flag and an empty flag
(3)  initialize VCcount and VCflag
(4)  while (a channel packet or an arbiter signal arrives) do
(5)    if (a channel packet arrives and full flag != 1)
(6)      { VCcount = VCcount + 1 and store the packet in the VCs }
(7)    if (an arbiter signal arrives and empty flag != 1)
(8)      { VCcount = VCcount - 1 and read the packet from the VCs }
(9)  end while
(10) while (VCcount changes) do
(11)   if (VCcount = the VC's maximum count)
(12)     { assign full flag to 1 }
(13)   else if (VCcount = 1)
(14)     { assign empty flag to 1 }
(15)   else
(16)     { assign full flag and empty flag to 0 }
(17) end while

Algorithm 4: VCs with SPS algorithm.
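A behavioral model makes the count and flag rules of Algorithm 4 easy to exercise. The Python sketch below follows the listing's convention that the count is 1 when the VC is empty, so a VC whose maximum count is `max_count` holds `max_count - 1` packets. The class name `SPSVirtualChannel` and the default `max_count` are hypothetical, since the listing does not state the upper bound.

```python
class SPSVirtualChannel:
    """Behavioral model of a VC whose count and flags follow Algorithm 4."""

    def __init__(self, max_count: int = 5):   # hypothetical default bound
        if max_count < 2:
            raise ValueError("need room for at least one packet")
        self.max_count = max_count
        self.slots = []             # stored packets, oldest first
        self.count = 1              # line (3): count starts at the empty value
        self._update_flags()

    def _update_flags(self) -> None:
        # Lines (10)-(17): derive the flags from the current count.
        self.full = self.count == self.max_count
        self.empty = self.count == 1

    def write(self, packet) -> bool:
        # Lines (5)-(6): store only while the full flag is clear.
        if self.full:
            return False
        self.slots.append(packet)
        self.count += 1
        self._update_flags()
        return True

    def read(self):
        # Lines (7)-(8): read only while the empty flag is clear.
        if self.empty:
            return None
        self.count -= 1
        self._update_flags()
        return self.slots.pop(0)
```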

SPS Algorithm
Input: system clock, channel packet, arbiter signal, and reset.
Output: VCs clock
(1)  VCgroup is the group of VCs of each of the 4 direction ports
(2)  VCflag includes a full flag and an empty flag
(3)  initialize the VCs clock, the access VCs count, and the stage flag
(4)  follow LCR to arrange the priority of all slots
(5)  each VC in a VCgroup has its own VC clock
(6)  for example, VCgroup = East port
(7)  initialize every VC clock to 0
(8)  while (a virtual channel is written) do
(9)    if (VCflag = empty)
(10)     { VC clock = system clock }
(11)   if (VCflag = full flag)
(12)     { VC clock = 0 and next VC clock = system clock }
(13) end while
(14) while (a virtual channel is read) do
(15)   if (empty flag = 1)
(16)     { VC clock = 0 }
(17) end while

Algorithm 5: SPS algorithm.
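The clock-enable decisions of Algorithm 5 can be modelled as one enable bit per VC in a port's VCgroup. The sketch below is one reading of the listing: a write arriving at an empty VC enables its clock, a VC that becomes full is gated while the next VC in the group is enabled, and a read that leaves a VC empty gates it again. Choosing the "next" VC round-robin, and the name `SPSClockController`, are assumptions of this sketch.

```python
class SPSClockController:
    """One clock-enable bit per VC in a port's VCgroup (a reading of Algorithm 5)."""

    def __init__(self, num_vcs: int):
        if num_vcs < 1:
            raise ValueError("a VCgroup needs at least one VC")
        self.enable = [False] * num_vcs      # line (7): every VC clock starts gated

    def on_write(self, vc: int, empty: bool, full: bool) -> None:
        # Lines (9)-(10): a write arriving at an empty VC starts its clock.
        if empty:
            self.enable[vc] = True
        # Lines (11)-(12): a full VC is gated and the next VC is clocked instead
        # (round-robin choice of "next" is an assumption of this sketch).
        if full:
            self.enable[vc] = False
            self.enable[(vc + 1) % len(self.enable)] = True

    def on_read(self, vc: int, empty: bool) -> None:
        # Lines (15)-(16): once a read leaves the VC empty, gate its clock.
        if empty:
            self.enable[vc] = False

    def active_clocks(self) -> int:
        """Number of VCs currently receiving the system clock."""
        return sum(self.enable)
```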

5. Experimental Results

  • In this section, we propose an autotesting architecture for the router with SPS. This architecture includes four autotesting modules. The first module is the test-vector generator (TVG), whose FSM is illustrated in Figure 14. In the Idle state, the TVG waits for a request to start testing; when the request arrives, the TVG changes state from Idle to Generator. When the request is cancelled, the state changes from Generator back to Idle. The Generator state generates the test vectors and compare vectors, as illustrated in Figure 15. At control step 1, we use a programming language to generate lottery arbitration [18] in the test vectors. At control step 2, we use HDL to design the conventional router, which takes the test vectors as input patterns and generates the compare vectors. When the compare-vector and test-vector functions are complete, the state changes from Generator to Vector Output (VO) at control step 3. The VO state converts the test vectors and compare vectors into Xilinx memory IP files, through which the memory outputs the data so that testing and comparison take only one clock cycle.

Figure 14: Test-vector generator (TVG) module FSM.

Figure 15: Generator status control and data flow graphs.
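The TVG state machine of Figure 14 can be sketched as a next-state function. The states and transitions follow the description above; giving a cancelled request priority over completion in the Generator state, and returning from Vector Output to Idle once the request is dropped, are assumptions because the text does not specify them. The `mismatches` helper is a hypothetical stand-in for the step that compares router outputs against the compare vectors.

```python
from enum import Enum, auto


class TVGState(Enum):
    IDLE = auto()
    GENERATOR = auto()
    VECTOR_OUTPUT = auto()


def tvg_next(state: TVGState, start_req: bool, vectors_done: bool) -> TVGState:
    """Next state of the test-vector generator (TVG) FSM."""
    if state is TVGState.IDLE:
        return TVGState.GENERATOR if start_req else TVGState.IDLE
    if state is TVGState.GENERATOR:
        if not start_req:              # request cancelled (assumed to take priority)
            return TVGState.IDLE
        if vectors_done:               # control step 3: vectors are complete
            return TVGState.VECTOR_OUTPUT
        return TVGState.GENERATOR
    # VECTOR_OUTPUT: stay while the request is held, then return to IDLE (assumed).
    return TVGState.VECTOR_OUTPUT if start_req else TVGState.IDLE


def mismatches(observed, expected):
    """Indices where router outputs differ from the compare vectors."""
    if len(observed) != len(expected):
        raise ValueError("output and compare vectors differ in length")
    return [i for i, (o, e) in enumerate(zip(observed, expected)) if o != e]
```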

6. Conclusions

  • The Smart Power-Saving (SPS) architecture for network-on-chip has been presented. A clock control circuit and the SPS algorithm were demonstrated to reduce power consumption in the NoC architecture. The experimental results show that the proposed SPS architecture reduces power consumption more efficiently than IntelliBuffer [1], adaptive data compression [3], and buffer clock-gating [10] in the NoC architecture.
