# Power Aware Based on Voltage Islands for X-Clock Tree Construction

# Chia-Chun Tsai

Abstract—This paper proposes a power-aware algorithm based on voltage islands for constructing an X-clock tree with double via insertion. Different supply voltages are assigned to multiple voltage islands to reduce total power consumption while keeping the clock delay under control. A high rate of double via insertion is achieved to avoid via defects and improve reliability. We first partition a clock network into a number of voltage islands, such as L-type or T-type, and construct the X-clock tree for each voltage island with double via insertion. Then, we combine these X-clock trees through a well-defined connection with inserted level shifters to minimize the power. The delay effect due to the inserted double vias is also taken into account. Ten benchmarks are tested with our approach. Compared with a single voltage island, experimental results show that our X-clock tree based on multi-voltage islands saves 21.58%, 4.75%, and 33.8% on average in power, delay, and running time, respectively.

*Index Terms*—X-clock tree, voltage island, clock delay, power consumption.

#### I. INTRODUCTION

Reducing power consumption is a critical issue in contemporary chip design for nanometer process technologies. An SoC (system-on-a-chip) integrates a number of functional blocks and usually has multi-mode operation, in which different sets of blocks work at different times [1]. If all the blocks are supplied with a uniform voltage, the same power is consumed throughout the working time. To save power, the voltage-island design methodology [2]-[4] assigns multiple supply voltages to the functional blocks of a system. For instance, performance-critical blocks, e.g., processors, require the highest supply voltage, while noncritical blocks, e.g., control logic and peripheral units, can operate at lower voltages.

Signals often need to be transmitted among the functional blocks of voltage islands via various buses or a clock at different times in a chip. We need to guarantee that these signals are transferred correctly among voltage islands. A level shifter (LS) has to be inserted into any interconnection that transmits a signal from a low-voltage island to a high-voltage one, because a circuit may suffer from excessive leakage energy when low-voltage gates directly drive high-voltage ones [1], [4]. In contrast, a level shifter is not required when a signal is transferred from a high-voltage island to a low-voltage one.

Many works on voltage islands have been reported. Lee *et al.* [5] proposed voltage-island partitioning and floorplanning under timing constraints. Dong and Goto [6] presented a multi-voltage and level-shifter driven floorplanning approach. A power-driven global routing for multi-voltage islands [7] was proposed for the evaluation of power reduction. Many studies [8]-[13] concentrated on clock tree construction for a single voltage island. Tsai *et al.* [14] proposed a clock tree construction on multi-mode multi-voltage islands. Based on a binary clustering approach, they inserted buffers and adjusted their locations to minimize the clock delay and skew for matching different modes. Su *et al.* [15] and Lin *et al.* [16] improved this approach by replacing some inserted buffers with adjustable delay buffers (ADBs) for well-defined control of the clock delay under boundary skew. Kim [17] extended clock tree synthesis to 3D stacked ICs. Under the zero-skew requirement, the clock tree construction depends on the trade-off between the number of TSVs and the total wire length. Deferred layer embedding (DLE) is used to reduce the number of TSVs, while deferred merge embedding (DME) is employed to minimize the total wire length. Chen *et al.* [18] proposed a 3D-IC clock tree that is constructed on the ASIC layer in association with the pre-defined clock network on the platform layer. The TSVs on the ASIC layer projected from the platform layer were controlled to minimize the clock delay and skew. Lee *et al.* [19] extended the voltage islanding technique to the concurrent optimization of power and temperature in 3D-stacked ICs. They adjusted the two-layer 3D floorplan to partition several voltage islands for better heat flow and lower temperature.

In this paper, we propose an approach to construct an X-clock tree on multi-voltage islands. Given a pre-defined partition of a clock network according to the different supply voltages, the clock tree for each voltage island is constructed individually, and the trees are then combined with level shifters to obtain the best trade-off between power and delay under zero skew. Compared with the original clock network (one island), our experiments show that the proposed approach reduces the power consumption under a reasonable clock delay.

The remainder of this paper is organized as follows. Section II describes the problem formulation. Section III states the power and delay estimation for a wiring model with level shifter and double via structures. Section IV presents the proposed algorithm of X-clock tree construction with double via insertion for power minimization. Experimental results on benchmarks are shown in Section V, followed by a conclusion in Section VI.

#### II. PROBLEM FORMULATION

For a clock network problem with multiple islands using different supply voltages, there are generally two straightforward approaches. The first is to complete the whole clock tree routing and then insert level shifters in a post-refinement step to minimize the clock delay and compensate the skew to zero. The other is to reduce the routing problem to a number of sub-problems: the clock network is first divided into multi-voltage islands, a subclock tree is routed for each voltage island, and all the subclock trees are then combined into a complete tree with level-shifter insertion.

This work is supported by MOST 103-2221-E-343-005 and NHU-104 project.

**Chia-Chun Tsai**, Department of Computer Science and Information Engineering, Nanhua University, Dalin Township, Chiayi County, Taiwan, R.O.C.

Fig. 1 illustrates the two approaches. As shown in Fig. 1(a), a chip consists of three voltage islands and twelve clock sinks. Islands 1, 2, and 3 operate at 1.0 V, 1.1 V, and 1.2 V, respectively. Fig. 1(b) shows the first approach, in which the clock tree is constructed directly to connect the twelve sinks located on the three voltage islands. Six level shifters are required for the interconnections from the low-voltage island (Island 1) to the two higher-voltage islands (Islands 2 and 3). Fig. 1(c) shows the second approach, in which the clock tree combines three subclock trees ($CLK_1$, $CLK_2$, and $CLK_3$) with two level shifters that connect Island 1 (1.0 V) to Island 2 (1.1 V) and Island 3 (1.2 V), respectively. Compared with the first approach shown in Fig. 1(b), the second one saves four level shifters.



Fig. 1. Clock routings on (a) three voltage islands constructed from (b) approach A with six LSs and (c) approach B with two LSs.

We employ the second approach to construct a clock tree on multi-voltage islands. As shown in Fig. 1(c), the system clock source adopts the 1.0 V supply of Island 1, and two level shifters are required for driving Islands 2 and 3. That is, the two subclocks $CLK_2$ and $CLK_3$ are combined first, and then $CLK_1$ is integrated with them through two level shifters. Alternatively, we may select the 1.1 V supply of Island 2 as the supply voltage of the system clock source, such that the subclocks $CLK_1$ and $CLK_3$ are combined first and a level shifter is inserted when $CLK_2$ is integrated with them. Following this discussion, we can evaluate different combinations of subclocks, each of which yields a whole-chip clock routing with its own number of inserted level shifters, power consumption, and clock delay. Experimentally, power consumption and clock delay always trade off against each other; that is, a higher clock speed costs more power.

Owing to advanced lithography technologies, metal wires in a chip can be routed at arbitrary angles, in particular diagonal ($\pm 45^{\circ}$) wires assigned to metal layers 3 and 4. The X-architecture combines diagonal, horizontal, and vertical wires and achieves improvements of 10%, 20%, 30%, and 20% in chip performance, power consumption, die cost, and wirelength, respectively, compared with the Manhattan architecture [20]. Our clock tree construction adopts the X-architecture.

Moreover, redundant-via insertion (RVI) is a well-known and effective method highly recommended by semiconductor foundries for reducing via failures. As shown in Fig. 2(a), if a via-open defect ($v_2$) exists, the clock signal cannot be delivered to the clock sink $s_i$. RVI is also called double via insertion (DVI) because a redundant via $r_i$ is inserted next to a single via $v_i$ to form a double via structure, as shown in Fig. 2(b). When using the DVI method, we must follow the via width ($v_w$) and rule space ($RS$) to avoid creating any design-rule violation, as shown in Fig. 2(c). Double via insertion is also considered in our clock tree construction to improve yield and reliability.



Fig. 2. (a) The cross-section view of a double via. (b) The clock signal cannot be delivered to clock sink $s_i$ if a via ($v_2$) is defective. (c) Design rules for single and redundant vias.

In summary, our problem of clock routing on multi-voltage islands is defined as follows.

Given a set of clock sinks on a set of multi-voltage islands, the objective is to construct a zero-skew X-clock tree with considering double via insertion for power minimization.

To our knowledge, no published work addresses clock tree construction based on the X-architecture except our PMXF approach, Tsai *et al.* [11], for constructing an X-clock tree. However, that approach considered only a single voltage island with double via insertion. The purpose of this paper is to extend the work to multiple voltage islands to minimize the power consumption under a reasonable clock delay.

#### III. POWER AND DELAY ESTIMATION

To evaluate the power consumption of a clock tree based on single voltage island, the calculation of total power should include the equivalent wire capacitances of interconnections, input capacitances of inserted level shifters, and loading capacitances of multi-voltage islands. Thus, the total power  $P_{total}$  is formulated as follows.

$$P_{total} = \sum_{\forall e_i} C_{load,i} F_{clk} V_{dd}^2 , \qquad (1)$$

where $C_{load,i}$, $F_{clk}$, and $V_{dd}$ are the capacitance of the sink $i$ (or node $i$), the clock frequency, and the supply voltage, respectively, and $e_i$ is the set of clock tree edges along the path from the clock root to the sink $i$.
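As a rough illustration of (1), the following Python sketch sums $C F_{clk} V_{dd}^2$ over a list of capacitive loads. Pairing each load with its own island voltage, the function name, and the sample capacitances are assumptions made for this example and are not values from the experiments.

```python
def total_power(loads, f_clk):
    """Sum C * F_clk * Vdd^2 over all loads, following Eq. (1).

    loads: iterable of (capacitance_in_farads, vdd_in_volts) pairs.
    f_clk: clock frequency in hertz.
    Returns the power in watts.
    """
    return sum(c * f_clk * vdd ** 2 for c, vdd in loads)


# Hypothetical loads: 2 pF on a 1.0 V island and 1 pF on a 1.2 V island.
example_loads = [(2e-12, 1.0), (1e-12, 1.2)]
p_total = total_power(example_loads, f_clk=100e6)  # F_clk = 100 MHz
print(f"P_total = {p_total * 1e6:.1f} uW")
```

Because the voltage enters quadratically, moving a load from 1.2 V to 1.0 V reduces its contribution by roughly 30%, which is the motivation for assigning lower voltages to noncritical islands.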

The fitted Elmore delay (FED) model [21] is widely used for the wire delay calculation in clock tree construction. A wire $j$ with width $w_j$ and length $l_j$ based on the FED model is shown in Fig. 3(a), where $r$, $c_a$, and $c_f$ are the sheet resistance, unit area capacitance, and fringing capacitance, respectively. The delay of the wire $j$ with a loading capacitance $C_{L,i}$ at the sink $i$ is formulated as follows.

$$Delay(i) = (rl_j / w_j) \left[ 0.5(Dc_a w_j + Ec_f) l_j + FC_{L,i} \right], \quad (2)$$

where coefficients *D*, *E*, and *F* are obtained by using the curve fitting techniques [21].
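A minimal Python sketch of (2) is given below. It assumes the 130 nm parameters listed in Table I, with lengths in µm and capacitances in fF, so that the product of Ω and fF gives femtoseconds. The function name, the wire dimensions, and the load in the example are hypothetical.

```python
import math

# Parameters taken from Table I (130 nm); units as listed there.
R_SHEET = 0.623            # r
C_AREA = 0.00598           # c_a (fF)
C_FRINGE = 0.043           # c_f (fF)
D = 1.12673 * math.log(2)  # fitted coefficients of the FED model
E = 1.10463 * math.log(2)
F = 1.04836 * math.log(2)


def fed_wire_delay(length, width, c_load):
    """Fitted Elmore delay of one wire, Eq. (2), in femtoseconds.

    length, width: wire length l_j and width w_j in um.
    c_load: loading capacitance C_{L,i} in fF.
    """
    wire_res = R_SHEET * length / width
    wire_cap = (D * C_AREA * width + E * C_FRINGE) * length
    return wire_res * (0.5 * wire_cap + F * c_load)


# Hypothetical wire: 1000 um long, 1 um wide, driving a 50 fF sink.
delay_fs = fed_wire_delay(1000.0, 1.0, 50.0)
print(f"wire delay = {delay_fs / 1000.0:.2f} ps")
```

The delay grows quadratically with the wire length through the product of wire resistance and wire capacitance, and linearly through the load term.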

#### International Journal of Engineering and Applied Sciences (IJEAS) ISSN: 2394-3661, Volume-2, Issue-11, November 2015

A level shifter (LS) is inserted at the interface from a low-voltage island to a high-voltage island. As reported in [1], a level shifter consumes power and affects the delay. Fig. 3(b) shows that the equivalent circuit of a level shifter contains the intrinsic delay $T_{LS}$, input capacitance $c_{LS}$, and output resistance $r_{LS}$. When a level shifter drives the wire $j$ with a loading capacitance $C_{L,i}$ shown in Fig. 3(a), the delay is formulated as follows.

$$\operatorname{Delay}(i) = T_{LS} + (r_{LS} + rl_j / w_j) \left[ 0.5(Dc_a w_j + Ec_f) l_j + FC_{L,i} \right]. \quad (3)$$
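The effect of the level shifter in (3) can be sketched in Python as follows. The sketch assumes the Table I values, takes $T_{LS}$ in ps, and converts the Ω·fF product from fs to ps. The function name and the wire and load values in the example are hypothetical.

```python
import math

# Table I values (130 nm).
R_SHEET, C_AREA, C_FRINGE = 0.623, 0.00598, 0.043
D, E, F = (k * math.log(2) for k in (1.12673, 1.10463, 1.04836))
R_LS = 250.0   # level-shifter output resistance (ohm)
T_LS = 54.4    # level-shifter intrinsic delay (ps)


def ls_wire_delay_ps(length, width, c_load):
    """Delay of a level shifter driving wire j, Eq. (3), in picoseconds."""
    rc_term = 0.5 * (D * C_AREA * width + E * C_FRINGE) * length + F * c_load
    delay_fs = (R_LS + R_SHEET * length / width) * rc_term  # ohm * fF = fs
    return T_LS + delay_fs / 1000.0


# With a zero-length wire only T_LS and r_LS * F * C_L remain.
print(f"{ls_wire_delay_ps(0.0, 1.0, 50.0):.2f} ps")
print(f"{ls_wire_delay_ps(1000.0, 1.0, 50.0):.2f} ps")
```

Compared with (2), the level shifter adds a fixed intrinsic delay and places its output resistance in series with the wire resistance.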

Fig. 3. Equivalent circuits of (a) a wire *j* and (b) a level shifter.

For a double via, the inserted redundant via is always in parallel with the single via, as shown in Fig. 4(a). Hence, the resistance and capacitance of a double via are half and double those of a single via, respectively. Fig. 4(b) shows the equivalent circuit of a double via, where $k$ is two [20]. The delay is calculated in the same way as in (2).
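The parallel combination can be written out explicitly. Writing $R_v$ and $C_v$ for the resistance and capacitance of a single via, and $R_{dv}$ and $C_{dv}$ for those of the double via (symbols introduced here for illustration only), $k = 2$ gives:

$$R_{dv} = \frac{R_v \cdot R_v}{R_v + R_v} = \frac{R_v}{k}, \qquad C_{dv} = C_v + C_v = k\,C_v, \qquad k = 2.$$

The RC product of the via itself is therefore unchanged, while the lower via resistance and higher via capacitance both enter the delay of the path that contains the via.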



Fig. 4. (a) The cross-section view of a double via and (b) its equivalent circuit.

#### IV. ALGORITHM OF POWER AWARE BASED ON VOLTAGE ISLANDS FOR X-CLOCK TREE CONSTRUCTION

With multi-voltage islands, each island can operate at a specified supply voltage such that the total power consumption of a chip can be reduced. We propose an algorithm, multi-voltage-island-based X-clock tree construction with double via insertion (MuVIX-DVI), to integrate all the voltage-island X-clock trees for power minimization. Fig. 5 shows the proposed MuVIX-DVI algorithm.

| Algorithm: *MuVIX-DVI* (Multi-voltage-island-based X-clock tree construction with double via insertion) |
|---|
| Input: A set of voltage islands $VI$ and a set of supplying voltages $SV$ |
| Output: A multi-voltage-island-based X-clock tree with considering double via insertion for power minimization. |
| $SV_{sys} \leftarrow$ Determine the supplying voltage of system clock source. |
| $PMXF(VI)$; /* Construct X-clock tree for each voltage island $\in VI$. */ |
| *DVI-X*; /* Insert double via into X-clock tree for each voltage island $\in VI$. */ |
| Let each constructed X-clock tree be a leaf-node. |
| $CS(VI) \leftarrow$ Obtain the connection sequences of $VI$ based on $SV_{sys}$. |
| for each voltage island $vi \in CS(VI)$ |
| { $CS(LS) \leftarrow$ Obtain the connection sequences for level-shifter insertion. |
| do |
| { Make combination for each $ls \in CS(LS)$ } |
| while (*power* is improved) |
| } |

Fig. 5. The proposed MuVIX-DVI algorithm.



In the algorithm, given a set of multi-voltage islands, $VI$, and a set of supply voltages, $SV$, the supply voltage of the system clock source, denoted as $SV_{sys}$, is determined first. Then, our PMXF algorithm [11] constructs the X-clock tree for each voltage island in $VI$, and our DVI-X algorithm [12] inserts double vias into each tree to improve yield and reliability. Third, we mark these constructed X-clock trees as leaf nodes. To integrate these leaf nodes of island-based X-clock trees for power minimization, all possible connection sequences of the voltage islands are enumerated, denoted as $CS(VI)$. For a connection sequence $vi \in CS(VI)$ associated with the $SV_{sys}$ of these islands, level shifters must be inserted at the interfaces from low-voltage to high-voltage islands. The set of connection sequences with level-shifter insertion is denoted as $CS(LS)$. After that, we estimate the power consumption for each connection sequence $ls \in CS(LS)$. Finally, we obtain a multi-voltage-island-based X-clock tree whose connection sequence gives the minimum power consumption.

#### A. Determination of System Clock Supplying Voltage

Before constructing the system clock tree which connects all the island-based subclock trees, we first define the supplying voltage  $SV_{sys}$  for the system clock source as follows.

$$SV_{sys} = \min_{vi_k \in VI} SV_k \tag{4}$$

Each voltage island $vi_k \in VI$ can operate at several supply voltages $SV_k = \{sv_1, sv_2, ...\}$. In this work, we set the lowest supply voltage among all the voltage islands as $SV_{sys}$ in the expectation of minimum power consumption, although level shifters are then required at the interfaces from low-voltage to high-voltage islands.
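The choice in (4) amounts to taking a minimum over all islands and all of their candidate voltages. A small Python sketch follows; the function name and the dictionary layout are assumptions for illustration, and the voltages are those of the three-island example used in this paper.

```python
def system_supply_voltage(island_voltages):
    """Return SV_sys, the lowest supply voltage over all islands, as in Eq. (4).

    island_voltages: dict mapping an island name to the list of supply
    voltages (in volts) that the island can operate at.
    """
    return min(min(voltages) for voltages in island_voltages.values())


# The three islands of the running example (1.0 V, 1.1 V, 1.2 V).
islands = {"Island 1": [1.0], "Island 2": [1.1], "Island 3": [1.2]}
sv_sys = system_supply_voltage(islands)
print(sv_sys)  # 1.0
```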

#### B. PMXF: X-Clock Tree Construction on Single Voltage Island

We apply our PMXF algorithm [11] to construct an X-clock tree for each voltage island. Fig. 6 outlines the PMXF algorithm; owing to limited space, the details are given in [11]. An example of X-clock tree construction with 8 sinks, $s_1$ to $s_8$, using the PMXF algorithm is shown in Fig. 7.

| Algorithm: *PMXF* (Pattern-Matching X-architecture clock routing with X-Flip) |
|---|
| Input: A set of $n$ clock sinks $S$ and an X-pattern library for a pair of points |
| Output: An X-architecture clock tree with zero-skew and minimal delay |
| for $h \leftarrow 0$ to $\lceil \log_2 n \rceil$ |
| while (un-paired-node in $S_h$) /* Check $\lvert S_h \rvert > 1$ or not. */ |
| $(s_i, s_j) \leftarrow DPPG(S_h)$; /* Determine a pair of sinks/points with GMA. */ |
| $PMX(s_i, s_j)$; /* Select the X-pattern of $s_i$ and $s_j$. */ |
| $P_t \leftarrow DCTP(s_i, s_j, PTN(s_i, s_j), x)$; /* Obtain tapping point $P_t$ and zero-skew ratio $x$ of $(s_i, s_j)$. */ |
| X-Flip($s_i, s_j$); /* Use X-Flip technique to reduce wire length of $(s_i, s_j)$. */ |
| if ($x<0$) WireSizing($s_i, s_j$); /* Size wire width $w_r$. */ |
| if ($x>1$) WireSizing($s_i, s_j$); /* Size wire width $w_i$. */ |
| Insert($S_{h+1}$, $P_t$); /* Insert $P_t$ into the set of points at the $(h+1)$th level. */ |

Fig. 6. The PMXF algorithm constructs an X-clock tree for each voltage island.

Fig. 7. An example of X-clock tree with 8 sinks (legend: Metal 1, Metal 2, Metal 3, Metal 4).

#### C. DVI-X: Double Via Insertion for X-Clock Tree

Our DVI-X algorithm [12] performs double via insertion on an X-clock tree. Fig. 8 outlines the DVI-X algorithm; owing to limited space, the details are given in [12]. Examples of a partial X-clock tree layout before and after double via insertion are shown in Figs. 9(a) and 9(b), respectively.

| Algorithm: DVI-X (Double-via insertion for X-architecture clock routing) |
|---|
| Input: An X-architecture clock routing with the sets of wires $W$ and vias $V$ |
| Output: An X-architecture clock routing with double-via insertion |
| $R = ARVC(V, W)$; /* Arrange redundant-via candidates (RVCs). */ |
| $P = Partition(V, R)$; /* Make partitions with the intersecting RVCs. */ |
| for each $p_i(V_i, R_i) \in P$ |
| $G_{b,i} = CBG(p_i)$; /* Construct bipartite graph $G_{b,i}$ of $p_i$, where $G_{b,i} = (V_i \cup R_i, E_i)$. */ |
| $G_{c,i} = CCG(R_i)$; /* Construct the conflict graph $G_{c,i}$ of $R_i$, where $G_{c,i} = (R_i, E_{c,i})$. */ |
| $C_{max,i} = MCQ(G_{c,i})$; /* Obtain the set of maximal cliques $C_{max,i}$ of $G_{c,i}$. */ |
| $M_i = MBGC(G_{b,i}, C_{max,i})$; /* Match $G_{b,i}$ and $C_{max,i}$ to insert double vias into $p_i$. */ |

Fig. 8. The DVI-X algorithm is for inserting double via into X-clock tree.



Fig. 9. An example of X-clock tree partial layout (a) before and (b) after inserting double vias.

#### D. Combination of X-Clock Trees among Voltage Islands

From Subsections IV.B and IV.C, given a chip with several voltage islands as shown in Fig. 10(a), PMXF first constructs the X-clock subtree for each voltage island, and DVI-X then inserts double vias into the subtree. Fig. 10(b) shows the X-clock subtree for Island 2. The system clock source enters the clock source $CLK_2$ of Island 2 to drive all its clock sinks synchronously. Here, we let the clock source $CLK_2$ be a leaf node, denoted as *Leaf-node*<sub>2</sub>, as shown in Fig. 10(c), and denote its supply voltages as $SV_2 = \{sv_1, sv_2, ...\}$. Similarly, *Leaf-node*<sub>3</sub> and $SV_3$ represent the clock source and supply voltages of Island 3, respectively.



Fig. 10. (a) Given three voltage islands and (b) PMXF and DVI-X construct the X-clock subtree of Island 2 with double via insertion and (c) its subtree is labelled as a leaf node.

To construct a multi-voltage-island-based X-clock tree, we need to know how to connect these islands with different supply voltages to achieve minimum power consumption. Since the construction of an X-clock tree is based on a binary tree structure, there are $k!$ connection sequences for $k$ leaf nodes. The three voltage islands shown in Fig. 10(a) are labelled *Leaf-node*<sub>1</sub> for Island 1, *Leaf-node*<sub>2</sub> for Island 2, and *Leaf-node*<sub>3</sub> for Island 3, with supply voltages of 1.0 V, 1.1 V, and 1.2 V, respectively. Hence, there are six connection sequences (i.e., 3!), denoted as $CS(VI) = \{vi_1, vi_2, vi_3, vi_4, vi_5, vi_6\}$. For $vi_1 \in CS(VI)$, as shown in Fig. 11(a), *Leaf-node*<sub>1</sub> and *Leaf-node*<sub>2</sub> are connected first and are then joined with *Leaf-node*<sub>3</sub> to complete the voltage-island-based X-clock tree. Fig. 11(b) shows another connection sequence, $vi_6 \in CS(VI)$.
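The $k!$ connection sequences can be enumerated as orderings of the leaf nodes. The Python sketch below does this for the three-island example. The numbering it prints follows the order produced by `itertools.permutations`, which is an assumption and need not match the labels $vi_1$ to $vi_6$ used in the figures beyond the first sequence.

```python
from itertools import permutations

leaf_nodes = ["Leaf-node1", "Leaf-node2", "Leaf-node3"]

# Every ordering of the k leaf nodes is one candidate connection sequence:
# the first two nodes are joined first, then the next node is attached.
connection_sequences = list(permutations(leaf_nodes))

for index, sequence in enumerate(connection_sequences, start=1):
    print(f"sequence {index}: " + " -> ".join(sequence))
```

For $k = 3$ this prints six sequences, the first of which joins Leaf-node1 and Leaf-node2 before attaching Leaf-node3.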



Fig. 11. Connection sequences of the voltage islands: (a) the first one $vi_1$ and (b) the sixth one $vi_6$.

After determining the supply voltage of the system clock, $SV_{sys}$, and the supply voltage $V_{dd}$ of each island, e.g., $SV_{sys} = 1.0 \text{ V}$, $V_{dd1} = 1.0 \text{ V}$, $V_{dd2} = 1.1 \text{ V}$, and $V_{dd3} = 1.2 \text{ V}$, we can integrate a multi-voltage-island-based X-clock tree by combining the three leaf nodes and inserting the required level shifters. For $vi_1 \in CS(VI)$ shown in Fig. 12(a), $SV_{sys}$ is 1.0 V because Islands 1-3 operate at 1.0 V, 1.1 V, and 1.2 V, respectively. The inserted level shifters $LS_1$ and $LS_2$ deliver the system clock source at 1.0 V to *Leaf-node*<sub>3</sub> at 1.2 V and *Leaf-node*<sub>2</sub> at 1.1 V, respectively. This is the connection sequence for $vi_1$ with level-shifter insertion. Figs. 12(b) and 12(c) show two connection sequences for $vi_6$ with different level-shifter insertions. In Fig. 12(c), $LS_1$ delivers the system clock source at 1.0 V to *Leaf-node*<sub>2</sub> at 1.1 V, and $LS_2$ converts the clock signal from 1.1 V to 1.2 V. Hence, each connection sequence of multi-voltage islands has at least one connection sequence with level-shifter insertion.
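The two level-shifter arrangements described above, one shifter per higher-voltage island driven directly from $SV_{sys}$ and a chain that steps through the voltages, can be sketched in Python as follows. Both function names are hypothetical, and the sketch only lists the (input voltage, output voltage) pair of each shifter; it does not place the shifters in a tree.

```python
def direct_level_shifters(sv_sys, island_vdd):
    """One shifter from the system clock to every island above SV_sys."""
    return [(sv_sys, vdd) for vdd in sorted(island_vdd.values()) if vdd > sv_sys]


def chained_level_shifters(sv_sys, island_vdd):
    """Shift step by step through the increasing island voltages."""
    shifters, current = [], sv_sys
    for vdd in sorted(set(island_vdd.values())):
        if vdd > current:
            shifters.append((current, vdd))
            current = vdd
    return shifters


island_vdd = {"Island 1": 1.0, "Island 2": 1.1, "Island 3": 1.2}
print(direct_level_shifters(1.0, island_vdd))   # [(1.0, 1.1), (1.0, 1.2)]
print(chained_level_shifters(1.0, island_vdd))  # [(1.0, 1.1), (1.1, 1.2)]
```

Both arrangements use two level shifters in this example, but they differ in the voltage step of each shifter and in which wires run at which voltage, so their power and delay estimates differ.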



Fig. 12. The connection sequences with inserted level shifters for (a) $vi_1$ and for (b), (c) $vi_6$.

#### E. Power and Delay Estimation with Inserted Level Shifters

When the clock signal is delivered from a voltage island operating at $SV_{sys}$ to another island whose supply voltage is higher than $SV_{sys}$, a level shifter has to be inserted. Fig. 13(a) shows Islands 1 and 2 operating at 1.0 V and 1.1 V, respectively. When the two islands are connected, a level shifter is inserted to deliver the clock signal from the system clock source to *Leaf-node*<sub>2</sub> of Island 2. Fig. 13(b) shows the equivalent model of Fig. 13(a) based on the FED model, which is used to calculate the clock delays from the system clock source to *Leaf-node*<sub>1</sub> and *Leaf-node*<sub>2</sub> with (2) and (3), respectively.




Fig. 13. (a) Two connected voltage islands with level-shifter insertion and (b) their equivalent circuit for delay calculation.

As discussed above, different connection sequences with inserted level shifters result in different clock delays. Likewise, different power consumptions are obtained from (1). Finally, we select the connection sequence with minimum power under a reasonable clock delay.
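The final selection step can be sketched in Python as a constrained minimum. Interpreting "reasonable clock delay" as an upper bound `max_delay` is an assumption of this sketch, as are the function name and the candidate values, which are made-up placeholders rather than measured results.

```python
def select_min_power(candidates, max_delay):
    """Pick the lowest-power candidate whose delay does not exceed max_delay.

    candidates: list of (name, power, delay) tuples.
    Returns the chosen tuple, or None if no candidate meets the bound.
    """
    feasible = [c for c in candidates if c[2] <= max_delay]
    return min(feasible, key=lambda c: c[1]) if feasible else None


# Hypothetical (name, power in mW, delay in ps) estimates, not measured data.
candidates = [("ls_a", 10.2, 480.0), ("ls_b", 9.1, 530.0), ("ls_c", 9.6, 495.0)]
best = select_min_power(candidates, max_delay=500.0)
print(best)  # ('ls_c', 9.6, 495.0)
```

Here `ls_b` has the lowest power but violates the delay bound, so the feasible candidate with the next lowest power is chosen.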

#### F. Analysis of Time Complexity

Given a set of *n* clock sinks in a set of *m* voltage islands, the proposed MuVIX-DVI algorithm shown in Fig. 5 completes the design of a multi-voltage-island-based X-clock tree. PMXF [11] constructs the X-clock tree for each voltage island in O(*n* log *n*). DVI-X [12] inserts double vias for each voltage island in O($k^3$), where *k* is the number of single vias. For each connection sequence, it takes O(*m* log *m*) to combine *m* leaf nodes with inserted level shifters. Because we always choose the lowest supply voltage as the supply voltage of the system clock source, the number of connection sequences searched for the minimum power consumption is less than *m*!. Moreover, $m \ll n$, $m \ll k$, and *k* is proportional to *n*. Hence, the time complexity of the MuVIX-DVI algorithm is O(*n* log *n*)+O($k^3$).

#### V. EXPERIMENTAL RESULTS

The proposed MuVIX-DVI algorithm for power-aware X-clock tree construction based on voltage islands has been implemented in C/C++ and run on an MS-Windows 8.1 machine with a dual-core Intel i7 CPU@2.2GHz and 8GB RAM. Table I lists the technology parameters of the FED delay model [21] and the level shifter (LS) under a 130nm process [22] used for power and delay calculation. The tested benchmarks are IBM *r*1-*r*5 [8], ISCAS89 s1423, s5378, and s15850 [9], and MCNC Primary1-2 [10].

Table I. Technology parameters of FED delay model [21] and a level shifter [22] under 130nm.

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| $r$ (Ω/µm) | 0.623 | $D$ | 1.12673 ln 2 | $r_{LS}$ (Ω) | 250 |
| $c_a$ (fF/µm) | 0.00598 | $E$ | 1.10463 ln 2 | $c_{LS}$ (fF) | 23.5 |
| $c_f$ (fF/µm) | 0.043 | $F$ | 1.04836 ln 2 | $T_{LS}$ (ps) | 54.4 |
| | | | | $F_{clk}$ (Hz) | 100M |

In all the experiments, a benchmark is first partitioned into two voltage islands (L-type) or three voltage islands (T-type1 or T-type2). The X-clock tree of each partitioned voltage island is constructed using the PMXF algorithm [11], and the DVI-X algorithm [12] then performs double-via insertion with skew tuning for skew minimization. After that, we extend the PMXF algorithm to integrate all the island-based sub-X-clock trees, inserting a level shifter wherever the clock signal is delivered from a low-voltage island to a high-voltage island. This yields several different connections, depending on the sequence of islands with their supply voltages and level shifters. Finally, we select the connection whose power consumption is minimized under a reasonable clock delay. Fig. 14 presents the PMXF integrated environment used for these experiments.



Fig. 14. The platform of PMXF integrated environment.

Fig. 15 shows the three partition types, L-type, T-type1, and T-type2, for a benchmark. An L-type consists of two voltage islands, island 1 at 1.0 V and island 2 at 1.2 V, and the height of island 2 may vary in the range of $1/2\sim2/3\,h$, where $h$ is the height of the benchmark. A T-type1 consists of three voltage islands, island 1 at 1.0 V, island 2 at 1.1 V, and island 3 at 1.2 V, and the height of island 2 or island 3 may vary in the range of $1/3\sim2/3\,h$. A T-type2 also consists of three voltage islands, island 1 at 1.0 V, island 2 at 1.1 V, and island 3 at 1.2 V.



Fig. 15. The partition of (a) L-type, (b) T-type1 voltage islands, and (c) T-type2 voltage islands.

Table II shows the results of single voltage-island-based X-clock trees with and without double via insertion in terms of vias, power consumption, clock delay, and CPU time. In the experiments, the rate (#Dvia / #via) of double via insertion (DVI) is 91.81% on average. This high DVI rate can improve yield and reliability during manufacturing. The power and delay with double via insertion (wDVI) are always slightly larger than those without double via insertion (woDVI). The ratio (wDVI / woDVI) in the table is defined as the power (delay) with DVI divided by that without DVI. In summary, the power and delay with DVI increase by 0.084% and 0.095% on average, respectively. The CPU time of PMXF is always larger than that of DVI-X, and both CPU times are proportional to the number of sinks or inserted double vias.

Table III compares the L-type with the single voltage island, both with double via insertion. From the table, the L-type saves 21.85% (i.e., 1 - 0.7815), 6.08% (i.e., 1 - 0.9392), and 19.19% (i.e., 1 - 0.8181) on average in power, delay, and CPU time, respectively, while the number of vias increases by 0.87% (i.e., 1.0087 - 1).

Table IV shows the results of T-type1. Because each benchmark is divided into three voltage islands, it has different combined connections that offer different power and delay. The table presents three cases for each benchmark against its single voltage island: lowest power first, smallest delay first, and balanced power and delay first. From the table, we find that no single case is always the best for all the benchmarks. Thus, the rule for determining the best one is that the power consumption is minimized as far as possible under the delay control. The best result in power and delay for each benchmark is marked in bold in the table. Similar results for T-type2 are shown in Table V.

The best results selected from Tables IV and V are listed in Tables VI and VII, respectively. We combine the results of Tables VI and VII and compare them with those of the single voltage island. On average, the T-type partitions save 21.3% (i.e., [(1 - 0.7939) + (1 - 0.7802)] / 2) in power, 3.43% (i.e., [(1 - 1.0011) + (1 - 0.9304)] / 2) in delay, and 48.4% (i.e., [(1 - 0.4146) + (1 - 0.6175)] / 2) in CPU time, at the cost of 0.22% more vias (i.e., [(1.0002 - 1) + (1.0042 - 1)] / 2).
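The bracketed expressions above all take the same form: the mean of (1 - ratio) over the T-type1 and T-type2 average ratios. A minimal sketch of that arithmetic follows, using the "Average" rows of Tables VI and VII. The helper name `average_saving` is hypothetical.

```python
def average_saving(ratios):
    """Mean of (1 - ratio) over the given average ratios."""
    return sum(1.0 - r for r in ratios) / len(ratios)


# Average ratios: T-type1 from Table VI, T-type2 from Table VII.
power_saving = average_saving([0.7939, 0.7802])
delay_saving = average_saving([1.0011, 0.9304])
cpu_saving = average_saving([0.4146, 0.6175])
# A ratio above 1 is an overhead, so the sign is flipped for vias.
via_overhead = -average_saving([1.0002, 1.0042])

print(f"power saving : {power_saving * 100:.3f}%")
print(f"delay saving : {delay_saving * 100:.3f}%")
print(f"CPU saving   : {cpu_saving * 100:.3f}%")
print(f"via overhead : {via_overhead * 100:.3f}%")
```

A negative term, such as (1 - 1.0011) for the T-type1 delay, simply reduces the combined saving.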

Finally, we compare the overall results of multi-voltage islands (L-type and T-type) with those of the single voltage island. On average, multi-voltage islands save 21.58% (i.e., (21.85% + 21.3%) / 2) in power, 4.75% (i.e., (6.08% + 3.43%) / 2) in delay, and 33.8% (i.e., (19.19% + 48.4%) / 2) in CPU time, at the cost of 0.55% more vias (i.e., (0.87% + 0.22%) / 2).

Fig. 16 presents the X-clock trees of (a) L-type, (b) T-type1, and (c) T-type2 voltage islands with double via insertion for benchmark r5.



Fig. 16. X-clock trees of the benchmark r5 that are based on (a) L-type, (b) T-type1, and (c) T-type2 voltage islands.

## VI. CONCLUSION

The proposed X-clock tree construction considers one of the DFM issues, double via insertion (DVI), for via-effect avoidance, and the DVI rate is always over 90% for reliable manufacturing. With multi-voltage-island modes such as L-type or T-type applied in advance to the DVI-based X-clock tree construction, power consumption is efficiently reduced, as are clock delay and running time, compared with a single voltage island. Future work is to consider different partition models depending on different voltages and clock sink distributions on a 3D chip [17]-[19], [23] and to integrate them into a well-defined clock tree under power and delay control.

#### ACKNOWLEDGMENT

The author would like to thank Dr. Chung-Chieh Kuo for supporting the voltage-island platform.

#### REFERENCES

- [1] W. K. Mak and J. W. Chen, "Voltage island generation under performance requirement for SoC designs," *IEEE Design Automation Conference in Asia and South Pacific*, Jan. 2007, pp. 798-803.
- [2] M. C. Lu, M. C. Wu, H. M. Chen, and H. R. Jiang, "Performance constraints aware voltage islands generation in SoC floorplan design," *IEEE International SOC Conference*, Sept. 2006, pp. 211-214.
- [3] J. Hu, Y. Shin, N. Dhanwada, and R. Marculescu, "Architecting voltage islands in core-based system-on-a-chip designs," *IEEE International Symposium on Low Power Electronics and Design*, 2004, pp. 180-185.
- [4] W. P. Lee, H. Y. Liu, and Y. W. Chang, "An ILP algorithm for post-floorplanning voltage-island generation considering power-network planning," *IEEE/ACM International Conference on Computer-Aided Design*, Nov., 2007, pp. 650-655.
- [5] W.-P. Lee, H.-Y. Liu, and Y.-W. Chang, "Voltage-island partitioning and floorplanning under timing constraints," *IEEE Trans. on CAD.*, vol. 28, no. 5, May 2009, pp. 690-702.
- [6] B. Yu, S. Dong, and S. Goto, "Multi-voltage and level-shifter assignment driven floorplanning," *IEEE Int. Conf. on ASIC*, Oct. 2009, pp. 1264-1267.
- [7] T.-H. Wu, A. Davoodi, and J. T. Linderoth, "Power-driven global routing for multi-supply voltage domains," *DATE*, Mar. 2011.
- [8] R. S. Tsay, "Exact zero skew," *IEEE International Conference on Computer-Aided Design*, 1991, pp. 336-339.
- [9] J. G. Xi and W. W.-M. Dai, "Useful-skew clock routing with gate sizing for low power design," *ACM/IEEE Design Automation Conference*, June 1996, pp. 383-388.
- [10] M. A. B. Jackson, A. Srinivasan, and E. S. Kuh, "Clock routing for high performance ICs," *ACM/IEEE Design Automation Conference*, June 1990, pp. 573-579.
- [11] C. C. Tsai, C. C. Kuo, J. O. Wu, T. Y. Lee, and R. S. Hsiao, "X-clock routing based on pattern matching," *IEEE International SOC Conference*, Sept. 2008, pp. 357-360.
- [12] C. C. Tsai, C. C. Kuo, L. J. Gu, and T. Y. Lee, "Double-via insertion enhanced X-architecture clock routing for reliability," *IEEE Int. Sym. on Circuits and Systems*, May 2010, pp. 3413-3416.
- [13] C. C. Tsai, C. C. Kuo, and T. Y. Lee, "High performance buffered X-architecture zero-skew clock tree construction with via delay consideration," *International Journal of Innovative Computing, Information and Control*, vol. 7, no. 9, Sept. 2011, pp. 5145-5161.
- [14] C. C. Tsai, T. H. Lin, S. H. Tsai, and H. M. Chen, "Clock planning for multi-voltage and multi-mode designs," *IEEE International Symposium on Quality Electronic Design*, Mar. 2011, pp. 654-658.
- [15] Y. S. Su, W. K. Hon, C. C. Yang, S. C. Chang, and Y. J. Chang, "Clock skew minimization in multi-voltage mode designs using adjustable delay buffers," *IEEE Trans. on CAD of Integrated Circuits and Systems*, vol. 29, no. 12, Dec. 2010.
- [16] K. Y. Lin, H. T. Lin, and T. Y. Ho, "An efficient algorithm of adjustable delay buffer insertion for clock skew minimization in multiple dynamic supply voltage designs," *IEEE 16th ASP-DAC*, Jan. 2011, pp. 825-830.
- [17] T. Y. Kim and T. Kim, "Clock tree synthesis for TSV-based 3D IC designs," *ACM Transactions on Design Automation of Electronic Systems*, vol. 16, no. 4, Oct. 2011, Article 48.
- [18] F. W. Chen and T. T. Hwang, "Clock tree synthesis with methodology of re-use in 3D IC," *IEEE/ACM Design Automation Conference*, June 2012, pp. 1094-1099.
- [19] B. Lee, E. Y. Chung, and H. J. Lee, "Voltage islanding technique for concurrent power and temperature optimization in 3D-stacked ICs," *Int. Technical Conference on Circuit/Systems Computers and Communications*, July 2014.
- [20] T. Y. Ho, Y. W. Chang, S. J. Chen, and D. T. Lee, "Crosstalk- and performance-driven multilevel full-chip routing," *IEEE Trans. on CAD*, vol. 24, no. 6, June 2005, pp. 869-878.
- [21] A. I. Abou-Seido, B. Nowak, and C. Chu, "Fitted Elmore delay: a simple and accurate interconnect delay model," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 12, no. 7, July 2004, pp. 691-696.
- [22] T. C. Chen, S. R. Pan, and Y. W. Chang, "Timing modeling and optimization under the transmission line model," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 12, no. 1, Jan. 2004, pp. 28-41.
- [23] S. J. Wang, C. H. Lin, and K. S. M. Li, "Synthesis of 3D clock tree with pre-bond testability," *IEEE ISCAS*, 2013, pp. 2654-2657.

# International Journal of Engineering and Applied Sciences (IJEAS) ISSN: 2394-3661, Volume-2, Issue-11, November 2015



**Chia-Chun Tsai** received his B.S. degree in Industrial Education from National Taiwan Normal University, Taipei, Taiwan, ROC, in 1978, and both M.S. and Ph.D. degrees in Electrical Engineering from National Taiwan University, Taipei, Taiwan, ROC, in 1987 and 1991, respectively. From 1978 to 1989, he was a specialist teacher of electronic maintenance at Taiwan Provincial Hsin-Hua and Taipei Municipal Nan-Kang Technology High Schools, respectively. From 1989 to 2005, he served at the Department of Electronic Engineering, National Taipei University of Technology, Taipei, Taiwan, ROC. From 1994 to 1995 and 2000 to 2001, he was on leave from National Taipei University of Technology and worked on postdoctoral research at University of California, San Diego, and North Carolina State University, respectively. Since 2005 he has been with the Department of Computer Science and Information Engineering, Nanhua University, Chiayi, Taiwan, ROC, where he is a Full Professor. His current research interests include VLSI design automation and mixed-signal IC designs.

Table II. Results of single voltage-island X-clock trees with/without double via insertion in via count, power, delay, and CPU time.

| Benchmark | #Sinks | #via | #Dvia | Dvia ratio | Power woDVI (mW) | Power wDVI (mW) | Power ratio | Delay woDVI (ns) | Delay wDVI (ns) | Delay ratio | CPU time PMXF (s) | CPU time DVI-X (s) |
|------------|---------|-------|-------------|--------|----------|----------------|----------|----------|---------------|---------|--------------|------------------|
| r1         | 267     | 1222  | 1113        | 0.9108 | 80.12    | 80.237         | 1.00146  | 278.176  | 278.317       | 1.00051 | 35.71        | 0.374            |
| r2         | 598     | 2840  | 2583        | 0.9095 | 194.889  | 195.091        | 1.00104  | 858.143  | 858.636       | 1.00057 | 163.9        | 1.357            |
| r3         | 862     | 3995  | 3676        | 0.9202 | 261.838  | 261.996        | 1.00060  | 1452.224 | 1453.014      | 1.00054 | 29.68        | 2.574            |
| r4         | 1903    | 9166  | 8401        | 0.9165 | 612.497  | 612.901        | 1.00066  | 4099.221 | 4101.838      | 1.00064 | 270.753      | 11.014           |
| r5         | 3101    | 14665 | 13446       | 0.9169 | 1012.692 | 1013.366       | 1.00067  | 6934.2   | 6938.616      | 1.00064 | 974.752      | 28.314           |
| Primary1   | 269     | 1195  | 1122        | 0.9389 | 169.068  | 169.102        | 1.00020  | 38.369   | 38.412        | 1.00112 | 2.92         | 0.375            |
| Primary2   | 603     | 2335  | 2159        | 0.9246 | 417.055  | 417.13         | 1.00018  | 220.477  | 220.591       | 1.00052 | 16.98        | 1.42             |
| s1423      | 74      | 350   | 323         | 0.9229 | 6.907    | 6.914          | 1.00101  | 6.507    | 6.518         | 1.00169 | 0.11         | 0.032            |
| s5378      | 179     | 771   | 698         | 0.9053 | 17.549   | 17.569         | 1.00114  | 14.437   | 14.459        | 1.00152 | 4.53         | 0.203            |
| s15850     | 597     | 2704  | 2475        | 0.9153 | 65.062   | 65.155         | 1.00143  | 43.441   | 43.517        | 1.00175 | 9.77         | 1.282            |
| Average    |         |       |             | 0.9181 |          |                | 1.00084  |          |               | 1.00095 |              |                  |

Table III. Comparison of L-type vs single voltage-island X-clock trees with double via insertion in via count, power, delay, and CPU time.

| Benchmark | Total via: single island | Total via: L-type island | Total via: ratio | Power (mW): single island | Power (mW): L-type island | Power: ratio | Delay (ns): single island | Delay (ns): L-type island | Delay: ratio | CPU time (s): single island | CPU time (s): L-type island | CPU time: ratio |
|-----------|-------|-------|--------|----------|---------|--------|----------|----------|--------|---------|---------|--------|
| r1        | 2335  | 2362  | 1.0116 | 80.237   | 59.901  | 0.7466 | 278.317  | 284.837  | 1.0234 | 16.056  | 3.705   | 0.2308 |
| r2        | 5423  | 5453  | 1.0055 | 195.091  | 147.617 | 0.7567 | 858.636  | 853.203  | 0.9937 | 87.101  | 15.211  | 0.1746 |
| r3        | 7671  | 7919  | 1.0323 | 261.996  | 211.568 | 0.8075 | 1453.014 | 1344.603 | 0.9254 | 16.347  | 18.034  | 1.1032 |
| r4        | 17567 | 17843 | 1.0157 | 612.901  | 491.303 | 0.8016 | 4101.838 | 3011.99  | 0.7343 | 194.833 | 132.573 | 0.6804 |
| r5        | 28111 | 26581 | 0.9456 | 1013.366 | 813.438 | 0.8027 | 6938.616 | 6377.82  | 0.9192 | 523.916 | 654.573 | 1.2494 |
| Primary1  | 2317  | 2243  | 0.9681 | 169.102  | 129.026 | 0.7630 | 38.412   | 44.539   | 1.1595 | 2.489   | 2.492   | 1.0012 |
| Primary2  | 4494  | 4373  | 0.9731 | 417.13   | 322.762 | 0.7738 | 220.591  | 150.233  | 0.6812 | 11.109  | 7.593   | 0.6835 |
| s1423     | 673   | 705   | 1.0476 | 6.914    | 5.312   | 0.7683 | 6.518    | 6.147    | 0.9431 | 0.142   | 0.152   | 1.0704 |
| s5378     | 1469  | 1608  | 1.0946 | 17.569   | 15.008  | 0.8542 | 14.459   | 13.571   | 0.9386 | 4.733   | 6.333   | 1.3381 |
| s15850    | 5179  | 5145  | 0.9934 | 65.155   | 48.266  | 0.7408 | 43.517   | 46.732   | 1.0739 | 11.052  | 7.18    | 0.6497 |
| Average   |       |       | 1.0087 |          |         | 0.7815 |          |          | 0.9392 |         |         | 0.8181 |

Table IV. Comparison of T-type1 vs single voltage-island X-clock trees with double via insertion in power and delay.

| Benchmark | Single island: P (mW) | Single island: delay (ns) | Lower power first: P (mW) | Lower power first: P ratio | Lower power first: delay (ns) | Lower power first: delay ratio | Smallest delay first: P (mW) | Smallest delay first: P ratio | Smallest delay first: delay (ns) | Smallest delay first: delay ratio | Balanced power&delay: P (mW) | Balanced power&delay: P ratio | Balanced power&delay: delay (ns) | Balanced power&delay: delay ratio |
|-----------|---------|-----------|---------|----------|---------------|--------|---------|-----------|----------------|--------|----------------------------------|--------|-----------|--------|
| r1        | 80.237  | 278.317   | 59.691  | 0.7439   | 427.582       | 1.5363 | 62.207  | 0.7753    | 288.858        | 1.0379 | 62.116                           | 0.7742 | 299.385   | 1.0757 |
| r2        | 195.091 | 858.636   | 148.002 | 0.7586   | 1250.522      | 1.4564 | 150.909 | 0.7735    | 921.493        | 1.0732 | 150.005                          | 0.7689 | 1078.489  | 1.256  |
| r3        | 261.996 | 1453.014  | 211.714 | 0.8081   | 1591.417      | 1.0953 | 217.006 | 0.8283    | 1336.342       | 0.9197 | 212.448                          | 0.8109 | 1595.594  | 1.0981 |
| r4        | 612.901 | 4101.838  | 488.249 | 0.7966   | 4708.278      | 1.1478 | 500.119 | 0.816     | 3539.556       | 0.8629 | 495.312                          | 0.8081 | 3972.278  | 0.9684 |
| r5        | 1013.37 | 6938.616  | 782.342 | 0.772    | 7953.024      | 1.1462 | 808.245 | 0.7976    | 7091.42        | 1.022  | 790.549                          | 0.7801 | 7668.044  | 1.1051 |
| primary1  | 169.102 | 38.412    | 132.392 | 0.7829   | 70.796        | 1.8431 | 135.808 | 0.8031    | 42.198         | 1.0986 | 135.332                          | 0.8003 | 47.789    | 1.2441 |
| primary2  | 417.13  | 220.591   | 319.622 | 0.7662   | 220.743       | 1.0007 | 333.061 | 0.7985    | 167.825        | 0.7608 | 329.334                          | 0.7895 | 175.559   | 0.7959 |
| s1423     | 6.914   | 6.518     | 5.179   | 0.7491   | 9.342         | 1.4333 | 5.44    | 0.7868    | 6.98           | 1.0709 | 5.44                             | 0.7868 | 7.038     | 1.0798 |
| s5378     | 17.569  | 14.459    | 13.535  | 0.7704   | 20.23         | 1.3991 | 14.112  | 0.8032    | 15.068         | 1.0421 | 13.992                           | 0.7964 | 16.882    | 1.1676 |
| s15850    | 65.155  | 43.517    | 48.224  | 0.7401   | 61.844        | 1.4211 | 50.532  | 0.7756    | 42.468         | 0.9759 | 50.397                           | 0.7735 | 42.714    | 0.9815 |
| Average   |         |           |         | 0.7688   |               | 1.3479 |         | 0.7958    |                | 0.9864 |                                  | 0.7889 |           | 1.0772 |

Table V. Comparison of T-type2 vs single voltage-island X-clock trees with double via insertion in power and delay.

| Benchmark | Single island: P (mW) | Single island: delay (ns) | Lower power first: P (mW) | Lower power first: P ratio | Lower power first: delay (ns) | Lower power first: delay ratio | Smallest delay first: P (mW) | Smallest delay first: P ratio | Smallest delay first: delay (ns) | Smallest delay first: delay ratio | Balanced power&delay: P (mW) | Balanced power&delay: P ratio | Balanced power&delay: delay (ns) | Balanced power&delay: delay ratio |
|-----------|---------|-----------|-------------------------------|--------|-----------|--------|----------------------------------|--------|-----------|--------|----------------------------------|--------|-----------|--------|
| r1        | 80.237  | 278.317   | 59.851                        | 0.7459 | 300.465   | 1.0796 | 65.776                           | 0.8198 | 285.819   | 1.027  | 65.809                           | 0.8202 | 285.82    | 1.027  |
| r2        | 195.091 | 858.636   | 140.134                       | 0.7183 | 823.275   | 0.9588 | 150.371                          | 0.7708 | 630.627   | 0.7345 | 149.624                          | 0.7669 | 631.169   | 0.7351 |
| r3        | 261.996 | 1453.014  | 205.05                        | 0.7826 | 1378.38   | 0.9486 | 216.096                          | 0.8248 | 1160.669  | 0.7988 | 211.387                          | 0.8068 | 1196.014  | 0.8231 |
| r4        | 612.901 | 4101.838  | 479.296                       | 0.782  | 3933.765  | 0.959  | 505.203                          | 0.8243 | 3550.251  | 0.8655 | 494.468                          | 0.8068 | 3624.769  | 0.8837 |
| r5        | 1013.37 | 6938.616  | 768.672                       | 0.7585 | 6424.236  | 0.9259 | 807.773                          | 0.7971 | 5538.822  | 0.7983 | 790.973                          | 0.7805 | 5612.758  | 0.8089 |
| primary1  | 169.102 | 38.412    | 124.731                       | 0.7376 | 57.195    | 1.489  | 131.744                          | 0.7791 | 42.847    | 1.1155 | 130.898                          | 0.7741 | 44.059    | 1.147  |
| primary2  | 417.13  | 220.591   | 308.293                       | 0.7391 | 209.02    | 0.9475 | 327.941                          | 0.7862 | 167.037   | 0.7572 | 324.382                          | 0.7777 | 167.721   | 0.7603 |
| s1423    | 6.914  | 6.518   | 5.31    | 0.768  | 8.317  | 1.276  | 5.815   | 0.841  | 5.799   | 0.8897 | 5.761   | 0.8332 | 5.871   | 0.9007 |
| s5378    | 17.569 | 14.459  | 12.839  | 0.7308 | 15.598 | 1.0788 | 14.038  | 0.799  | 11.204  | 0.7749 | 13.887  | 0.7904 | 11.265  | 0.7791 |
| s15850   | 65.155 | 43.517  | 48.416  | 0.7431 | 46.194 | 1.0615 | 52.137  | 0.8002 | 32.254  | 0.7412 | 52.051  | 0.7989 | 32.26   | 0.7413 |
| Average  |        |         |         | 0.7506 |        | 1.0725 |         | 0.8042 |         | 0.8502 |         | 0.7956 |         | 0.8606 |

Table VI. Comparison of T-type1 vs single voltage-island X-clock trees with double via insertion in via count, power, delay, and CPU time.

| Benchmark | Total via: single island | Total via: T-type1 island | Total via: ratio | Power (mW): single island | Power (mW): T-type1 island | Power: ratio | Delay (ns): single island | Delay (ns): T-type1 island | Delay: ratio | CPU time (s): single island | CPU time (s): T-type1 island | CPU time: ratio |
|------------|------------------|-------------------|--------|------------------|-------------------|--------|---------------|-------------------|--------|------------------|-------------------|--------|
| r1         | 2335             | 2355              | 1.0086 | 80.237           | 62.207            | 0.7753 | 278.317       | 288.858           | 1.0379 | 16.056           | 2.664             | 0.1659 |
| r2         | 5423             | 5398              | 0.9954 | 195.091          | 150.909           | 0.7735 | 858.636       | 921.493           | 1.0732 | 87.101           | 6.198             | 0.0712 |
| r3         | 7671             | 7937              | 1.0347 | 261.996          | 217.006           | 0.8283 | 1453.014      | 1336.342          | 0.9197 | 16.347           | 6.838             | 0.4183 |
| r4         | 17567            | 18322             | 1.0430 | 612.901          | 495.312           | 0.8081 | 4101.838      | 3972.278          | 0.9684 | 194.833          | 48.904            | 0.2510 |
| r5         | 28111            | 27530             | 0.9793 | 1013.366         | 808.245           | 0.7976 | 6938.616      | 7091.42           | 1.0220 | 523.916          | 235.56            | 0.4496 |
| Primary1   | 2317             | 2214              | 0.9556 | 169.102          | 135.808           | 0.8031 | 38.412        | 42.198            | 1.0986 | 2.489            | 1.723             | 0.6923 |
| Primary2   | 4494             | 4285              | 0.9535 | 417.13           | 329.334           | 0.7895 | 220.591       | 175.559           | 0.7959 | 11.109           | 5.226             | 0.4704 |
| s1423      | 673              | 662               | 0.9837 | 6.914            | 5.44              | 0.7868 | 6.518         | 6.98              | 1.0709 | 0.142            | 0.18              | 1.2676 |
| s5378      | 1469             | 1560              | 1.0620 | 17.569           | 14.112            | 0.8032 | 14.459        | 15.068            | 1.0421 | 4.733            | 0.204             | 0.0431 |
| s15850     | 5179             | 5110              | 0.9867 | 65.155           | 50.397            | 0.7735 | 43.517        | 42.714            | 0.9816 | 11.052           | 3.5               | 0.3167 |
| Average    |                  |                   | 1.0002 |                  |                   | 0.7939 |               |                   | 1.0011 |                  |                   | 0.4146 |

Table VII. Comparison of T-type2 vs single voltage-island X-clock trees with double via insertion in via count, power, delay, and CPU time.

| Benchmark | Total via: single island | Total via: T-type2 island | Total via: ratio | Power (mW): single island | Power (mW): T-type2 island | Power: ratio | Delay (ns): single island | Delay (ns): T-type2 island | Delay: ratio | CPU time (s): single island | CPU time (s): T-type2 island | CPU time: ratio |
|------------|------------------|-------------------|--------|------------------|-------------------|--------|---------------|-------------------|--------|------------------|-------------------|--------|
| r1         | 2335             | 2384              | 1.0210 | 80.237           | 65.776            | 0.8198 | 278.317       | 285.819           | 1.0270 | 16.056           | 4.818             | 0.3001 |
| r2         | 5423             | 5533              | 1.0203 | 195.091          | 140.134           | 0.7183 | 858.636       | 823.275           | 0.9588 | 87.101           | 3.558             | 0.0409 |
| r3         | 7671             | 7685              | 1.0018 | 261.996          | 205.05            | 0.7827 | 1453.014      | 1378.38           | 0.9486 | 16.347           | 8.444             | 0.5166 |
| r4         | 17567            | 17486             | 0.9954 | 612.901          | 479.296           | 0.7820 | 4101.838      | 3933.765          | 0.9590 | 194.833          | 68.26             | 0.3504 |
| r5         | 28111            | 28162             | 1.0018 | 1013.366         | 768.672           | 0.7585 | 6938.616      | 6424.236          | 0.9259 | 523.916          | 263.118           | 0.5022 |
| Primary1   | 2317             | 2288              | 0.9875 | 169.102          | 131.744           | 0.7791 | 38.412        | 42.847            | 1.1155 | 2.489            | 2.676             | 1.0751 |
| Primary2   | 4494             | 4597              | 1.0229 | 417.13           | 308.293           | 0.7391 | 220.591       | 209.02            | 0.9476 | 11.109           | 7.25              | 0.6526 |
| s1423      | 673              | 651               | 0.9673 | 6.914            | 5.761             | 0.8332 | 6.518         | 5.871             | 0.9007 | 0.142            | 0.296             | 2.0845 |
| s5378      | 1469             | 1519              | 1.0340 | 17.569           | 13.887            | 0.7904 | 14.459        | 11.265            | 0.7791 | 4.733            | 0.182             | 0.0385 |
| s15850     | 5179             | 5129              | 0.9904 | 65.155           | 52.051            | 0.7989 | 43.517        | 32.26             | 0.7413 | 11.052           | 6.789             | 0.6143 |
| Average    |                  |                   | 1.0042 |                  |                   | 0.7802 |               |                   | 0.9304 |                  |                   | 0.6175 |