An Improved Matrix Generation Framework for Thermal Aware Placement in VLSI

Since hotspots and temperature gradients are reliability and performance-critical issues in processors, thermal awareness finds a vital place in the processor design cycle. Incorporating thermal awareness at the level of physical design, this work proposes a new, fast, and efficient thermal aware placement algorithm called the Thermal Aware Matrix Placement Optimizer (TAMPO) for gate arrays. The algorithm TAMPO is composed of the following components: an improved heat diffusion aware cell arrangement technique called the Initial Matrix Generator (IMaGe), a unique stochastic thermal model based on a thermally improved interpretation of the well known Matrix Synthesis Problem (MSP) and a Simulated Annealing (SA) engine for finding the global optimum solution. TAMPO targets to reduce the peak temperature while maintaining improved values of temperature gradients and the standard deviation in cell temperature with respect to the average chip temperature. This work also presents a methodology, the Co-optimized TAMPO, which extends the concept of TAMPO to simultaneously optimize the thermal attributes and the wirelength of a chip. The proposed algorithms realize a placement in matrix arrangement and upon experimentation on the ISCAS89 benchmark circuits encouraging results have been obtained.


I. INTRODUCTION
Thermal management has always been a challenge in VLSI design, since processors containing billions of transistors and operating at gigahertz frequencies generate a huge amount of heat in small areas and often suffer from hotspots and high temperature gradients. Hotspots degrade the reliability of chips by aggravating failure mechanisms like electromigration, stress migration, gate oxide breakdown, and thermal cycling [1]. Increasing junction temperature exponentially increases the standby leakage power dissipation, which may even cause thermal runaway and subsequently permanent damage to the IC [2]. Moreover, temperature gradients lead to problems like clock skew and crosstalk-induced noise in interconnects [3]. These factors impose a substantial cost overhead for implementing cooling solutions. Data centers expend a massive 40% of their energy on cooling [4]. Hence it becomes imperative to implement thermal aware techniques at the different levels of abstraction of the VLSI design cycle.
The associate editor coordinating the review of this manuscript and approving it for publication was Yuh-Shyan Hwang.
For power-constrained ICs, placement and floorplanning are vital levels where the thermal awareness can be further incorporated, in addition to the optimization of traditional design objectives.
The thermal aware placement of gate array ICs has been recognized in [5] as a combinatorial optimization problem referred to as the Matrix Synthesis Problem (MSP), which aims to minimize the temperature of the hottest region and also tries to ensure an even heat distribution throughout the gate array IC. The MSP is about generating an optimal arrangement of a given set of numbers in a matrix such that the maximum sum of submatrices of a particular size (t x t) is minimized. It considers the numbers as the heat dissipated by cells located at the corresponding matrix locations and the highest submatrix sum as the hottest region in the chip. The Simple Approximation algorithm, also termed algorithm A1, has been proposed in [5] as a solution to the MSP. Algorithm A1 sorts the cells in decreasing order of heat and distributes them successively into four groups, viz. G_0 = {L_0}, G_1 = {L_1}, G_2 = {L_2}, G_3 = {L_3}, such that the order of the groups on the basis of the heat of the constituent elements is G_0 ≥ G_1 ≥ G_2 ≥ G_3. The placement has been realized as a square matrix in which every (t x t) submatrix (with t = 2) is constructed by randomly selecting one element from each of the four groups, thereby ensuring that every high power cell is placed along with moderate and low power cells. This grouping and submatrix formation results in a uniform heat or power distribution throughout the IC and a reduction in hotspots. Electronic Design Automation (EDA) tools like Hotspot [6], [7], [28] provide an accurate temperature estimation of functional blocks by solving a lumped thermal RC network of the stacked-layer package scheme of the IC. However, the Hotspot tool incurs a considerable time overhead in solving for the temperature from the compact thermal model.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this regard, the MSP placement methodology proposed in [5] avoids the expensive computation of actual temperature estimation during the optimization process and provides a fast solution to the thermal aware placement problem of gate array ICs. However, according to [9] and [10], the temperature of a functional block is influenced by the length of its shared boundary as well as the power density differences with the neighboring blocks. Moreover, according to [8], the die boundary wall also influences the local thermal characterization. The die boundary wall has an adiabatic influence on the internally generated heat, and the temperature of a functional block is characterized by its relative position from the die boundary wall. These thermal considerations have been ignored by the works done to date on MSP [5], [11]-[14] for thermal aware placement. Moreover, the experimental works in [5], [11]-[14] have not quantified the thermal improvements in terms of temperature. They also consider only square matrix formation for cell placement, thereby incurring more dummy cells to make up a square number of total cells. Dummy cells are proxy blocks having zero power dissipation, and the inclusion of more dummy cells increases the chip area. Hence in this work, we have modified the Matrix Synthesis Problem (MSP) by incorporating the missing thermal considerations and developed a fast, new, and efficient placement algorithm capable of generating an optimal square matrix as well as a minimum-cell matrix (having fewer dummy cells) thermal aware placement for gate array ICs.
Contributions: Our work makes the following salient contributions.
1) It presents a transformation algorithm, the Gate Array Packer (GAP) which maps a logic circuit into a gate array architecture composed of basic cells and clusters.
2) It depicts an algorithm, the Initial Matrix Generator (IMaGe), for a heat diffusion aware even power distribution scheme and the construction of initial placement of cells. The IMaGe generates square matrix and minimum-cell matrix placements.
3) It presents an improved thermal model, designed by modifying the thermal metric of the MSP viz. local summative heat of a submatrix region or thermal zone along with its reflective heat component from the adiabatic die boundary wall.
4) It presents the placement algorithm, Thermal Aware Matrix Placement Optimizer (TAMPO) which incorporates the IMaGe, the proposed thermal model, and a Simulated Annealing (SA) engine to give a fast optimization in peak temperature, temperature gradient, and the standard deviation in temperature of cells in matrix placement.
5) It further extends the concept of TAMPO to generate a placement strategy called the Co-optimized TAMPO which optimizes the thermal attributes and the wirelength simultaneously.
6) Finally, it reconstructs a few reference placement algorithms, one based on the Hotspot tool [6], [7], [28], one based on the Simple Approximation algorithm [5], and one based on the thermal aware placement algorithm [11], to validate the performance of the proposed placement algorithms.

II. RELATED WORKS
Some of the efforts made towards the development of efficient thermal aware placement and floorplan algorithms are discussed as follows. Paper [5] introduces the Matrix Synthesis Problem (MSP) for thermal aware placement of gate arrays. Thermal aware placement of standard cells and gate arrays has also been presented in [11]- [13] by implementing the MSP where the proposed algorithms assume square matrix placement and try to minimize the peak (t x t) submatrix sum to reduce the hotspots. The algorithms in [11] consider the matrix elements as the power dissipation of cells and also show an approach to simultaneously optimize the wirelength and hotspots. Authors in [12] and [13] assume the matrix elements as the temperatures of cells and every (t x t) submatrix as a window. Work done in [12] implements a multiobjective optimization heuristic based on the game theory to simultaneously minimize the maximum window temperature and the wirelength. The methodology in [13] also employs a game theory based approach to minimize the maximum window temperature and the deviation of maximum temperature. The MSP has been further applied for the thermal aware 3D IC placement of standard cells in [14]. Using a simulated annealing based approach and considering every active layer as a square matrix with matrix elements as the power density of cells, the algorithm mitigates the hotspots by reducing the maximum aggregate of every (t x t) submatrix. The algorithm in [14] also simultaneously optimizes the wire length and the TSV. However, the temperature of functional blocks is placement dependent and cannot be taken as input to the placement problem as has been considered in [12] and [13]. Moreover, the submatrix sum of quantities like heat, power, and power density considered in [5], [11], and [14] respectively alone cannot account for the degree of hotness of a region. Work done in [15] proposes a 3D MSP cube model for the thermal aware mapping of 3D NOC architecture. 
Also utilizing a genetic algorithm approach, it achieves improvements in temperature deviation, power, and delay. Work done in [16] shows a thermal aware-placer based on thermal force and thermal padding methods for optimizing the peak temperature and the temperature gradient. It also uses the modified nodal analysis to estimate the temperature from the equivalent thermal circuit of the chip.
Authors in [17] present a pre-RTL tool framework based on the simulated annealing heuristic to optimize the peak temperature and chip area of SoC and chip multiprocessor floorplan. A thermal aware floorplan algorithm has also been presented in [18] for optimizing the temperature-dependent wire delay, routing congestion, reliability factors, area, and peak temperature of the chip based on the HotFloorplan tool. Work done in [19] also portrays a thermal aware hybrid PSO-GA based floorplan algorithm for optimizing the area, wirelength, and temperature of the chip. Hotspot tool has been used in [17]- [19] for the temperature estimation. However, the temperature estimation methods employed in [16]- [19] during the optimization process incur a large computational budget and execution time. A relatively faster conjugate gradient method has been proposed in [20] for computing the temperature from the thermal model of the Hotspot tool. The floorplan algorithm in [8] avoids the computational expense of exact temperature estimation and employs a heat diffusion method to give a fast optimization of the peak temperature, area, and wirelength. A fast fixed outline thermal aware multilevel floorplan algorithm has also been presented in [21] which uses a power blurring analytical method to estimate the temperature and simultaneously optimizes the temperature and wire length of the chip.

III. MOTIVATION
As an example, let us consider the matrix placement of a test case circuit containing 33 functional cells, each of identical height (H = 0.0006 m) and width (W = 0.0008 m), just like gate array cells. The functional cells and their corresponding power dissipation values have been shown in Fig. 1. In order to obtain a thermal aware solution according to the Simple Approximation or algorithm A1, the cells have first been sorted in descending order of power in a linear cell array and then grouped as shown in Fig. 2. Now considering a square matrix placement, the nearest square number which can accommodate 33 functional cells is 6^2 = 36. The number of dummy cells (of zero power dissipation) to be added to the placement matrix is 36 - 33 = 3. The minimum or the best possible peak submatrix sum with t = 2 in a 6 x 6 placement matrix with the given set of cells (functional {C_i} and dummy {D_i}) is 1.5 W. As shown in Fig. 3, some matrix placements have been constructed where the dotted envelopes in red and blue denote the submatrix regions with the peak sum for t = 2 and t = 3 respectively. Consider Placement-1 as shown in Fig. 3a, where the cells have been randomly distributed. According to the MSP, Placement-1 is a poor solution due to its higher peak submatrix sum (3.6 W with t = 2) and uneven distribution in power dissipation. Following algorithm A1, one element (cell) is selected from each cell group G_k = {L_k} within the linear cell array (in Fig. 2) and placed in the (t x t) submatrix regions such that the peak submatrix sum is minimized, and Placement-2 in Fig. 3b is obtained. The peak submatrix sum in Placement-2 is 1.8 W (with t = 2), and this is the minimum value achievable by algorithm A1 (with t = 2), but it is not the best value (1.5 W). Hence Placement-2 is one of the optimal solutions of algorithm A1 according to MSP with t = 2. Two more solutions, viz. Placement-3 in Fig. 3c and Placement-4 in Fig. 3d, have been constructed in accordance with the placement scheme of algorithm A1. Finally, Placement-5 in Fig. 3e has been constructed according to our proposed Updated Placement Scheme (UPS) (refer to Section V-C). All the solutions, viz. Placement-3, 4, and 5, are inferior solutions according to MSP since they have a higher peak submatrix sum with respect to Placement-2 for both t = 2 and t = 3. The peak submatrix sum of the placements is available in Table 1.
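The MSP metric used in this comparison can be illustrated with a minimal sketch. The function name `peak_submatrix_sum` and the two 4 x 4 example matrices are hypothetical and are not the placements of Fig. 3; the sketch only shows how the peak (t x t) submatrix sum separates a clustered arrangement from a spread one for the same set of numbers.

```python
from typing import Sequence


def peak_submatrix_sum(matrix: Sequence[Sequence[float]], t: int) -> float:
    """Return the largest sum over all contiguous t x t submatrices."""
    rows, cols = len(matrix), len(matrix[0])
    if not 1 <= t <= min(rows, cols):
        raise ValueError("t must fit inside the matrix")
    # 2-D prefix sums: prefix[i][j] holds the sum of matrix[0:i][0:j].
    prefix = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (matrix[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    best = float("-inf")
    for i in range(rows - t + 1):
        for j in range(cols - t + 1):
            window = (prefix[i + t][j + t] - prefix[i][j + t]
                      - prefix[i + t][j] + prefix[i][j])
            best = max(best, window)
    return best


# Hypothetical data: the same sixteen values arranged in two ways.
clustered = [[9, 8, 1, 1],
             [7, 6, 1, 1],
             [1, 1, 1, 1],
             [1, 1, 1, 1]]
spread = [[9, 1, 8, 1],
          [1, 1, 1, 1],
          [7, 1, 6, 1],
          [1, 1, 1, 1]]

print(peak_submatrix_sum(clustered, 2))  # hot values share one window
print(peak_submatrix_sum(spread, 2))     # hot values are kept apart
```

The prefix-sum table keeps each window evaluation constant-time, which matters only for large matrices; a direct double loop over every window would give the same values.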
Through experiments based on the placement matrices described in Fig. 3, the corresponding thermal maps and thermal attributes have been obtained and presented in Fig. 4 and Table 1 respectively. The description of the terms used for the analysis of the placement solutions are as follows: 'Max. Zonal Sum' represents the peak (t x t) submatrix sum, 'Avg. Temp.' is the average temperature of cells, 'Peak Temp.' is the maximum on-chip temperature, 'Min. Temp.' is the minimum temperature on the chip, 'Temp. Grad.' is the temperature gradient on the chip and 'Std. Dev. Temp.' is the standard deviation in cell temperature denoting the extent of the temperature variation of all cells from the average chip temperature. A lesser value of standard deviation in cell temperature implies an improved even temperature distribution throughout the chip. On the basis of the thermal maps and the experimental data in Table 1, the following observations have been made. (a) Placement-1 has the highest peak temperature, temperature gradient, and standard deviation in cell temperature. It is the worst placement and hence experimental data agree well with the MSP interpretation. (b) Placement-2 gives an improvement of about 9.22% in peak temperature, 53.4% in temperature gradient, and 59.2% in the standard deviation in cell temperature over Placement-1. It is a much better solution than Placement-1 and hence again experimental data agree well with the MSP interpretation. (c) Placement-3 gives an improvement of about 15.52% in peak temperature, 76.82% in temperature gradient, and 75.37% in the standard deviation in cell temperature over Placement-1. This makes Placement-3 even a better solution than Placement-2. (d) Placement-4 gives an improvement of about 15.29 % in peak temperature, 75.08% in temperature gradient, and 75.3% in the standard deviation in cell temperature over Placement-1. Hence Placement-4 is also a better solution than Placement-2. 
(e) Finally, Placement-5 gives an improvement of about 15.85 % in peak temperature, 84.78% in temperature gradient, and 87.48% in standard deviation in cell temperature over Placement-1 which makes Placement-5 the best placement. The experimental observations (c), (d), and (e) are unexplainable by the MSP interpretation.
To find an answer to the experimental observations, let us focus on the thermo-resistive IC package model presented in [22] and [23], where every heat-generating point on the die is surrounded by a network of resistive heat flow paths which transport the dissipated heat away from the heat-generating points. It can be observed from the model that the number of resistive heat flow paths decreases as the heat-generating point approaches the die boundary. For instance, in the case of a heat-generating point on the die periphery, the lateral heat flow paths facing the die boundary get blocked. The decrease in heat flow paths results in the accumulation of the dissipated heat, thereby leading to a rise in temperature of the heat-generating point. Hence the die boundary behaves adiabatically to the heat flux generated inside the chip. The high power cells C_21, C_22, and C_23 are the critical players in producing hotspots. In Placement-1, since all the high power cells are sitting together and very close to the adiabatic die wall, the submatrix sum and the peak on-chip temperature are very high. The uneven power distribution on account of the random cell placement results in the high temperature gradient and high standard deviation in cell temperature. In Placement-2, since the distribution of power is even, the submatrix sum is much lower and hence the peak temperature, temperature gradient, and the standard deviation in cell temperature are lower than in Placement-1. But in Placement-2, since the critical cell C_23 is at the corner of the die periphery, with its lateral heat flow paths facing the adjacent north and west die boundaries blocked, the peak temperature is much higher than in Placement-3, 4, and 5. On the contrary, since the critical cell C_23 is relatively far away from the die boundary and the power distribution is also even, the peak temperature, temperature gradient, and the standard deviation in cell temperature are better in Placement-3, 4, and 5.
Moreover, in Placement-4 and 5, the number of hotspots is lower than in Placement-3 since the other critical cells C_21 and C_22 are relatively far away from the die periphery. On account of the proposed UPS technique, Placement-5 gives an even better power distribution than Placement-3 and 4 (constructed by the algorithm A1 scheme), since UPS allows the lowest-in-order power cell within each submatrix region to share the maximum cell boundary of the highest power cell, thereby facilitating improved diffusion of heat.
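The boundary effect described above can be pictured with a deliberately simplified sketch. It is not the package model of [22] and [23]: it only counts how many of the four lateral directions of a cell still lead to a neighbouring cell inside the die. The function name `lateral_paths` and the grid indexing (row 0 and column 0 on the die edge) are assumptions made for illustration.

```python
def lateral_paths(row: int, col: int, rows: int, cols: int) -> int:
    """Count the lateral directions (N, S, W, E) that stay inside the die."""
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return sum(0 <= r < rows and 0 <= c < cols for r, c in neighbours)


# On a 6 x 6 placement matrix:
corner = lateral_paths(0, 0, 6, 6)    # two directions are blocked by die walls
edge = lateral_paths(0, 3, 6, 6)      # one direction is blocked
interior = lateral_paths(2, 3, 6, 6)  # no direction is blocked
print(corner, edge, interior)
```

Under this simplification a corner position, such as the one occupied by C_23 in Placement-2, keeps the fewest lateral paths, which is consistent with the higher peak temperature reported for that placement.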
To further assess the effectiveness of the proposed placement technique, we have manually optimized the placements for peak submatrix sum and also for peak on-chip temperature. Placement-6 in Fig. 5a has been optimized manually for the minimum peak submatrix sum (1.5 W) with t = 2, and it is the best placement according to the definition of MSP. However, from the thermal maps (Fig. 4e and Fig. 6a) and the experimental data in Table 1, it has been observed that Placement-5 (the solution of the proposed algorithm TAMPO), despite having a comparatively higher peak submatrix sum of power, gives an improvement of 4.65% in peak temperature, 53.7% in temperature gradient, and 57.2% in the standard deviation in cell temperature over Placement-6. The solution of the proposed algorithm TAMPO (Placement-5) is therefore superior to the manually constructed solution (Placement-6) optimized for the peak submatrix sum of power. Hence it has again been observed that minimization of the peak submatrix sum (of power or heat) alone does not lead to a thermal-aware solution. Now, Placement-7 in Fig. 5b has been manually optimized for the minimum peak temperature. From the thermal maps (Fig. 4e and Fig. 6b) and the experimental data in Table 1, it has been observed that Placement-7 gives an improvement of 1.42% in peak temperature, 27.41% in temperature gradient, and 4.23% in the standard deviation in cell temperature over Placement-5 (the solution of the proposed algorithm TAMPO). Hence the manually constructed temperature-optimized solution (Placement-7) obtains a small improvement over the solution of the proposed algorithm TAMPO (Placement-5). However, manual optimization is time-consuming and feasible only for placements with a small number of cells, whereas algorithm TAMPO is capable of handling placements with a large number of cells and also gives fast and efficient thermal-aware optimization.
Hence we find that the Matrix Synthesis Problem (MSP) is not adequate to ensure a thermal aware placement since it attributes the degree of hotness only to the peak (t x t) submatrix sum of heat. Therefore we have modified the concept of MSP by addressing the relative position of a sub-matrix (thermal zone) from the adiabatic die boundaries along with the submatrix sum of power for computing the thermal metric of placement and also by enhancing the heat diffusion through the Updated Placement Scheme (UPS).

IV. GATE ARRAY PACKER (GAP)
To emulate the situation of handling a gate array mapped circuit we have constructed a very basic gate array mapping algorithm called the Gate Array Packer (GAP) as shown in Fig. 7. The input to the GAP is the logic circuit composed of basic gates and flipflops and the output is the same circuit composed of a set of rectangular cells of equal number of transistors and identical dimensions and these cells act as input to the proposed thermal aware matrix placement algorithm.

A. ASSUMPTIONS OF GAP
A gate array is made of an array of transistors wherein the fundamental building block is a basic cell enveloping an equal number 'k' of nMOS and pMOS transistors. The algorithm further considers 'Z' contiguous basic cells to form a cluster such that every cluster has an equal number 'kZ' of nMOS and pMOS transistors and hence a total transistor count of '2kZ'. The algorithm packs the logic circuit with the objective of maximizing the area utilization, or minimizing the cluster resource, without generating any new inter-cluster connection other than the original nets. As a consequence, every cluster accommodates only an integral number of functional blocks, and the portion of the circuit which gets mapped to a particular cluster remains stationary in it for later stages of the placement process.

B. DEFINITION OF TERMS RELATED TO GAP
"Queue list": It is a set {B_g} wherein the element at the g-th position is a functional block B_g which has a higher or equal number of transistors T_g than its successor sitting at the (g+1)-th position. The queue list is obtained from the set {B_q} of N_0 functional blocks of the logic circuit by arranging the blocks B_q in descending order of total transistor count T_q.
"Queue count": It is a counter 'g' to denote the position of the functional block to be selected next from the queue list {B_g}.
"Cluster count": It is a counter 'i' to indicate the next fresh cluster C_i to be selected for allocating the functional blocks.
"Utilized transistor count": It denotes two counters (e_i, h_i), where e_i is the number of nMOS and h_i the number of pMOS transistors within a cluster C_i which have already been utilized for mapping functional blocks. Initially, when no functional block has been mapped to a cluster, e_i = h_i = 0 and it is called a fresh cluster. If 0 < e_i < kZ or 0 < h_i < kZ or both, the cluster is partially utilized. If e_i = kZ and h_i = kZ, the cluster is fully utilized.

C. PROCESS
The algorithm first constructs the Queue list, selects the functional blocks according to the serial number (i.e. Queue count) in the list, and packs them into the clusters. The basic idea of GAP is to pack the "largest block first", so that the smaller blocks can later be fitted within the leftover spaces of the partially utilized clusters. During the packing process, the algorithm first scans the partially utilized clusters and maps a functional block to such a cluster only if the unutilized transistors in it are sufficient to accommodate the total transistors of the functional block. Otherwise, a fresh cluster is allocated to the functional block. The clusters finally utilized in realizing the complete logic circuit are termed "functional clusters", and they play the role of the set of rectangular cells {C_i} to be placed by the proposed thermal aware placement algorithm. The output of GAP also includes a set of inter-cluster nets connecting the functional clusters.
In our experimental work we have considered: (a) every cluster to be composed of Z = 10 basic cells and every basic cell to be made up of an equal number k = 4 of pMOS and nMOS transistors, (b) cluster height H_cell = 0.0002 m and cluster width W_cell = 0.0004 m, (c) every D-flipflop in the circuit to be composed of 3 NOT gates and 2 nMOS transistors (as per the description in ISCAS89 circuits [29]). Further realizing the NOT gates in CMOS logic, a D-flipflop has finally been realized with 5 nMOS and 3 pMOS transistors. The rest of the gates in the circuit have been realized in CMOS logic. An illustration of the mapping process has been shown in Fig. 8, where a test circuit containing 9 functional blocks is mapped to clusters C_1 and C_2, each composed of Z = 4 basic cells, with each basic cell composed of an equal number k = 4 of pMOS and nMOS transistors. The functional blocks in the test circuit in Fig. 8a have been numbered according to the Queue list and mapped to the clusters in Fig. 8b according to the serial number 'SL', which is equal to the Queue count 'g'.
FIGURE 5. Manual synthesis of placement matrix for the test case circuit modules with row = column = 6 and t = 2. (a) Placement-6 is the best placement according to the MSP philosophy and obtained by manual design. (b) Placement-7 is an optimal solution for peak temperature obtained by manual design. The dotted red and blue envelopes denote the regions with peak submatrix sum for t = 2 and t = 3 respectively.
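The packing process described above can be sketched as a first-fit, largest-block-first procedure. This is only an illustrative reading of the GAP description, not the algorithm of Fig. 7: the class `Cluster`, the function `pack`, the block names, and the choice to check the nMOS and pMOS counts separately are assumptions. The gate transistor counts follow standard static CMOS (for example 2 nMOS and 2 pMOS for a 2-input NAND), and the D-flipflop uses the 5 nMOS and 3 pMOS stated in the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cluster:
    capacity: int                 # kZ nMOS and kZ pMOS transistors per cluster
    nmos_used: int = 0            # counter e_i
    pmos_used: int = 0            # counter h_i
    blocks: List[str] = field(default_factory=list)

    def fits(self, nmos: int, pmos: int) -> bool:
        return (self.nmos_used + nmos <= self.capacity
                and self.pmos_used + pmos <= self.capacity)

    def add(self, name: str, nmos: int, pmos: int) -> None:
        self.nmos_used += nmos
        self.pmos_used += pmos
        self.blocks.append(name)


def pack(blocks: List[Tuple[str, int, int]], k: int, z: int) -> List[Cluster]:
    """Pack (name, nMOS, pMOS) blocks into clusters, largest block first."""
    capacity = k * z
    # Queue list: descending order of total transistor count.
    queue = sorted(blocks, key=lambda b: b[1] + b[2], reverse=True)
    clusters: List[Cluster] = []
    for name, nmos, pmos in queue:
        if nmos > capacity or pmos > capacity:
            raise ValueError(f"block {name} does not fit in any cluster")
        # Scan the clusters already in use before opening a fresh one.
        target = next((c for c in clusters if c.fits(nmos, pmos)), None)
        if target is None:
            target = Cluster(capacity)
            clusters.append(target)
        target.add(name, nmos, pmos)
    return clusters


# Hypothetical circuit, packed with k = 4 and Z = 4 as in the Fig. 8 setup.
circuit = [("DFF1", 5, 3), ("DFF2", 5, 3), ("NAND2_a", 2, 2),
           ("NOR3_a", 3, 3), ("NOT_a", 1, 1), ("NAND4_a", 4, 4),
           ("DFF3", 5, 3)]
for i, cluster in enumerate(pack(circuit, k=4, z=4), start=1):
    print(f"C_{i}", cluster.blocks, cluster.nmos_used, cluster.pmos_used)
```

Because each block is placed whole into a single cluster, no new inter-cluster nets are created, which matches the assumption stated in Section IV-A.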

V. INITIAL MATRIX GENERATOR (IMaGe)
We propose a technique called the Initial Matrix Generator (IMaGe) for constructing improved initial placement solutions by modifying the placement scheme of the Simple Approximation algorithm [5]. The functional clusters C_i generated by the Gate Array Packer (GAP) are treated by IMaGe as the rectangular cells to be placed in the initial matrix placement. Hereafter the functional clusters are termed functional cells and the unused (dummy) clusters dummy cells. The inputs to the algorithm IMaGe are the set of N_0 functional cells (clusters) {C_i}, the set of power dissipation values {p_i} of the corresponding cells (clusters), the height H_cell and width W_cell of each cell (cluster), and the maximum bound 'r' on the aspect ratio of the die. The initial placement solution generated by IMaGe is further optimized by the Simulated Annealing (SA) engine within the proposed thermal aware placement algorithm to obtain the final solution.

A. ORDER OF PLACEMENT MATRIX
The numbers of rows and columns of the placement matrix are determined according to Step 2 and Step 3 of the algorithm shown in Fig. 9. The algorithm IMaGe generates a "square matrix" as well as a "minimum-cell matrix" initial placement solution as follows. A square matrix realizes the placement with the minimum equal number of rows and columns. In square matrix placement the aspect ratio of the die takes a default value that is fixed by the cell dimensions, since the numbers of rows and columns are equal. A minimum-cell matrix placement realizes the placement matrix with the minimum number of cells while maintaining an aspect ratio within the defined upper bound r and the lower bound 1/r. In this case the minimum integral values of the numbers of rows and columns that satisfy these bounds are selected. For the case of minimum-cell matrix placement, in our work we have considered r = 2, thus allowing the aspect ratio of the die to vary between 0.5 and 2.
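The two ways of choosing the matrix order can be sketched as follows. The exact expressions of Fig. 9 are not reproduced here, so several points are assumptions: the die aspect ratio is taken as die height over die width, that is (rows x H_cell) / (columns x W_cell), the search is a plain exhaustive scan, and ties between orders with the same cell count are broken in favour of the more nearly square one. The function names are hypothetical.

```python
import math
from typing import Tuple


def square_order(n_cells: int) -> Tuple[int, int]:
    """Smallest n such that an n x n matrix holds n_cells cells."""
    side = math.isqrt(n_cells)
    if side * side < n_cells:
        side += 1
    return side, side


def minimum_cell_order(n_cells: int, h_cell: float, w_cell: float,
                       r: float) -> Tuple[int, int]:
    """Rows and columns with the fewest cells and aspect ratio in [1/r, r]."""
    best_key, best_order = None, None
    for rows in range(1, n_cells + 1):
        cols = -(-n_cells // rows)            # ceiling division
        # Widen the matrix until the die is no longer too tall.
        while (rows * h_cell) / (cols * w_cell) > r:
            cols += 1
        aspect = (rows * h_cell) / (cols * w_cell)
        if aspect < 1.0 / r:
            continue                          # die is too wide for this row count
        key = (rows * cols, abs(rows - cols))
        if best_key is None or key < best_key:
            best_key, best_order = key, (rows, cols)
    if best_order is None:
        raise ValueError("no matrix order satisfies the aspect ratio bound")
    return best_order


# Test case of Section III: 33 cells, H = 0.0006 m, W = 0.0008 m, r = 2.
print(square_order(33))                              # 6 x 6, 3 dummy cells
print(minimum_cell_order(33, 0.0006, 0.0008, 2.0))   # fewer dummy cells
```

For the 33-cell test case the square order needs 36 cells, as stated in Section III, while under the assumptions above the scan settles on a 35-cell matrix, which illustrates why the minimum-cell option reduces the number of dummy cells.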

B. CELL GROUPING
A set of total cells {E_i}, composed of the functional cells {C_i} and dummy cells {D_i}, has been constructed and further arranged in descending order of power dissipation as shown in Step 5 and Step 6 of Fig. 9. Further, the cells have been divided among the four cell-groups G_k = {L_k}, 0 ≤ k ≤ 3, as shown in Step 7 of Fig. 9. An example of cell grouping has also been shown in Fig. 2 of Section III. The variables Q_0, Q_1, Q_2, Q_3 shown in Fig. 9 denote the number of cells in the corresponding cell-groups G_0, G_1, G_2, G_3. The variables Q_k have been computed with the consideration that the cell height H_cell is less than or equal to its width W_cell and that the L_0, L_1, L_2, L_3 cells are placed in every (t x t) submatrix in accordance with the Updated Placement Scheme (UPS) as illustrated in Fig. 3e of Section III. Referring to the algorithm in Fig. 9, when the height of the cell is greater than its width, the numbers of cells Q_2 (of the G_2 cell group) and Q_3 (of the G_3 cell group) have to be exchanged, and the positions of the L_2 and L_3 cells in every (t x t) submatrix have to be swapped. The proposed UPS technique for cell placement is discussed as follows.
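The grouping step can be sketched as below. The expressions for Q_k in Fig. 9 are not available in the text, so the group sizes here are an assumption: the matrix is tiled with 2 x 2 submatrices starting at the top-left corner, L_0 takes the top-left slot of each tile, L_2 the slot to its right, L_3 the slot below it, and L_1 the diagonal slot, with H_cell ≤ W_cell. The function name `group_cells` and the power values are hypothetical.

```python
from typing import List


def group_cells(powers: List[float], rows: int, cols: int) -> List[List[float]]:
    """Pad with dummy cells, sort by power, and split into G_0..G_3."""
    total = rows * cols
    if len(powers) > total:
        raise ValueError("matrix is too small for the given cells")
    # Dummy cells dissipate zero power and fill the matrix up to rows x cols.
    ordered = sorted(powers + [0.0] * (total - len(powers)), reverse=True)
    up_rows, up_cols = (rows + 1) // 2, (cols + 1) // 2
    sizes = [
        up_rows * up_cols,            # Q_0: top-left slot of every tile
        (rows // 2) * (cols // 2),    # Q_1: diagonal slot
        up_rows * (cols // 2),        # Q_2: slot to the right of L_0
        (rows // 2) * up_cols,        # Q_3: slot below L_0
    ]
    groups, start = [], 0
    for size in sizes:
        groups.append(ordered[start:start + size])
        start += size
    return groups


# Hypothetical powers for 33 functional cells in a 6 x 6 matrix.
cell_powers = [0.05 * (i + 1) for i in range(33)]
groups = group_cells(cell_powers, rows=6, cols=6)
print([len(g) for g in groups])
```

With an even number of rows and columns all four groups have the same size; the separate Q_k values only matter when the matrix has an odd number of rows or columns, as in a minimum-cell placement.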

C. UPDATED PLACEMENT SCHEME (UPS)
For two adjoining cells in a chip floor sharing a boundary of length L, having power densities d_i and d_j respectively, the heat diffusion H between them according to [9] and [10] is given by,
$$H = (d_i - d_j)\,L \tag{3}$$
Since in our work all the cells have equal area, the power densities d_i and d_j have been replaced by the powers of the cells p_i and p_j respectively, and (3) has been modified as,
$$H = (p_i - p_j)\,L \tag{4}$$
Similarly, the total heat diffusion H_T of a cell with all its neighbors given in [10] can be modified as,
$$H_T = \sum_{j} (p_i - p_j)\,L_{ij} \tag{5}$$
where the sum runs over the neighbors j of cell i and L_{ij} is the length of the boundary shared with neighbor j. An increase in the total heat diffusion H_T from a high power cell helps in lowering its temperature. Now consider the placement scheme of the Simple Approximation algorithm or algorithm A1 as shown in Fig. 3b, Fig. 3c, and Fig. 3d.
Here, inside a (t x t) submatrix region, the highest power cell L_0 shares its smaller edge (height) with L_1 (the second highest power cell), its larger edge (width) with L_2 (the third highest power cell), and no boundary with the lowest power cell L_3. Hence according to (4) and (5) there will be poor diffusion of heat from L_0 in this placement scheme. We therefore modify the placement scheme of algorithm A1 and propose the Updated Placement Scheme (UPS) where, inside a (t x t) submatrix region, L_0 shares its larger edge with L_3, its smaller edge with L_2, and no boundary with L_1, as shown in Fig. 3e of Section III. This placement scheme facilitates an improved diffusion of heat from the critically high power cells belonging to the G_0 = {L_0} cell group and also gives an improved balance in power distribution inside every (t x t) submatrix region.
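The advantage of UPS over the algorithm A1 scheme can be checked numerically with a small sketch. It assumes that the pairwise diffusion is the power difference multiplied by the shared boundary length, and it counts only the two neighbours of L_0 inside one 2 x 2 submatrix, ignoring cells of adjacent submatrices. The function name and the power values are hypothetical.

```python
from typing import Tuple


def diffusion_from_hottest(powers: Tuple[float, float, float, float],
                           h_cell: float, w_cell: float, scheme: str) -> float:
    """In-tile heat diffusion from L_0 for the 'A1' or 'UPS' arrangement.

    powers = (p0, p1, p2, p3) are the powers of L_0..L_3 in descending order.
    The side neighbour shares the cell height, the vertical neighbour shares
    the cell width (h_cell <= w_cell is assumed).
    """
    p0, p1, p2, p3 = powers
    if scheme == "A1":
        side, vertical = p1, p2    # L_1 on the height edge, L_2 on the width edge
    elif scheme == "UPS":
        side, vertical = p2, p3    # L_2 on the height edge, L_3 on the width edge
    else:
        raise ValueError("scheme must be 'A1' or 'UPS'")
    return (p0 - side) * h_cell + (p0 - vertical) * w_cell


# Hypothetical powers in watts, cell size of the Section III test case.
tile = (0.9, 0.3, 0.2, 0.1)
a1 = diffusion_from_hottest(tile, 0.0006, 0.0008, "A1")
ups = diffusion_from_hottest(tile, 0.0006, 0.0008, "UPS")
print(a1, ups)
```

Because p_2 ≤ p_1 and p_3 ≤ p_2, both power differences seen by L_0 are at least as large under UPS, so under these assumptions the in-tile diffusion from L_0 is never smaller than with the A1 arrangement.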

VI. PROPOSED THERMAL MODEL
We propose a thermal model that avoids the expensive computational budget for exact temperature estimation and adopts a stochastic method for finding an estimate of the degree of hotness of a partial placement solution generated during the optimization process. The design assumptions held by the proposed thermal model are as follows.
The center of heat map for the placement in Fig. 3e is shown in Fig. 10. The designated critically hot cells in the placement (Fig. 3e) are C_21, C_22, and C_23, and any thermal zone encompassing a critically hot cell is a critical thermal zone. In the center of heat map (Fig. 10), a few critical thermal zones Z_1, Z_2, and Z_3 with t = 2 have been shown, with their centers of heat positioned at the geometrical centers of the respective critically hot cells. For every center of heat in the map, ξ denotes its summative power and (δ_x, δ_y) denote its nearest distances from the adiabatic die boundary.

D. HEAT COMPONENTS
The thermal model characterizes every critical thermal zone by affixing with it two thermal constituents, viz. the summative heat component and the reflective heat component described as follows.

1) SUMMATIVE HEAT COMPONENT
It accounts for the heating effect in a thermal zone due to the heat collectively generated by all the cells present within it. The model quantifies the summative heat component $\xi_k$ for the $k$th critical thermal zone as the aggregate power (like the aggregate heat in [5]) as follows.
$$\xi_k = \sum_{C_i \in Z_k} p_i \qquad (6)$$
Here $p_i$ indicates the power dissipated by each cell $C_i$ present in the $(t \times t)$ submatrix region $Z_k$. The model further attributes the aggregate power $\xi_k$ as the power dissipation of the center of heat of the $k$th critical thermal zone. As the summative heat component of a thermal zone increases, its degree of hotness also increases.
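As an illustration only, the aggregate power in (6) can be sketched in Python as follows. The function name `summative_heat`, the indexing of a zone by its top-left cell, and the sample power values are assumptions made for this sketch and are not taken from the original implementation.

```python
def summative_heat(power, row, col, t=2):
    """Sum the power of every cell in the t x t zone whose top-left cell is (row, col)."""
    return sum(power[r][c]
               for r in range(row, row + t)
               for c in range(col, col + t))


# Hypothetical 3 x 3 power matrix (arbitrary units).
power = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]

# Zone covering rows 1-2 and columns 1-2, i.e. cells 0.5, 0.6, 0.8 and 0.9.
xi = summative_heat(power, 1, 1)
print(xi)
```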

2) REFLECTIVE HEAT COMPONENT
It accounts for the influence on the heating effect in a critical thermal zone due to the proximity of the adiabatic die walls. The model assumes that the heat dissipated by a center of heat travels up to the nearest die walls along the orthogonal x and y directions and that the die walls, being adiabatic, reflect the incident heat back to the source. The reflected heat influx results in heat accumulation at the center of heat and subsequently increases its temperature. For simplicity, the model considers that rate of heat generation or power dissipation from a center of heat = rate of heat incidence on the die wall = rate of heat reflection from the die wall = rate of heat accumulation at the center of heat. Now, according to the Fourier heat flow equation, the rate of heat flow $Q$ between two points a distance $\delta$ apart, normal to the cross-sectional area $A$, having thermal conductivity $K$ and temperature difference $\theta$ is given by,
$$Q = \frac{K A \theta}{\delta} \qquad (7)$$
According to [10], the temperature difference $\theta$ between two heat exchanging points is directly proportional to their power density difference $d$. Since the thermal zones have identical dimensions, the associated surface area on the die and the cross-sectional area $A$ perpendicular to the die are constant. Since power density = power / surface area, and the area is a constant parameter, the power density difference $d$ between thermal zones can be substituted with their power difference $p$. As $\theta \propto d$ and $K$ is constant, it follows from (7) that,
$$Q \propto \frac{p}{\delta} \qquad (8)$$
since $d \propto p$ and $A$ is a constant. Now considering one of the heat exchanging points to be the center of heat (having power $\xi_k$) of the $k$th critical thermal zone and the other a heat reflecting point (having power $p_j = 0$) on the die boundary wall a distance $\delta$ apart, the power difference is $p = \xi_k - p_j = \xi_k - 0 = \xi_k$.
Hence the rate of heat flow $Q_k$ from the center of heat of the $k$th critical zone to the die wall according to (8) is given by,
$$Q_k \propto \frac{\xi_k}{\delta} \qquad (9)$$
The heat gets reflected from the die wall and accumulates at the center of heat at a rate of $Q_k$. The model computes the total reflective heat component $\hat{o}_k$ as the aggregate rate of heat accumulation at a center of heat along the x and y directions as,
$$\hat{o}_k = \frac{\xi_k}{\delta_x} + \frac{\xi_k}{\delta_y} \qquad (10)$$
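The reflective heat component can be illustrated with a short Python sketch. It assumes a proportionality constant of unity, a die whose lower-left corner is the origin, and the hypothetical names `reflective_heat`, `die_w` and `die_h`; none of these are specified in the text above.

```python
def reflective_heat(xi, x, y, die_w, die_h):
    """Aggregate reflected heat at a center of heat located at (x, y).

    delta_x and delta_y are the distances to the nearest die wall along
    the x and y directions; the proportionality constant is assumed to be 1.
    """
    delta_x = min(x, die_w - x)
    delta_y = min(y, die_h - y)
    return xi / delta_x + xi / delta_y


# Hypothetical center of heat with summative power 2.0 on a 10 x 10 die.
o_k = reflective_heat(2.0, x=1.0, y=4.0, die_w=10.0, die_h=10.0)
print(o_k)  # 2.0 / 1.0 + 2.0 / 4.0
```

Under these assumptions a center of heat closer to a die wall receives a larger reflective component, which matches the qualitative behaviour described above.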

E. SATURATION THERMAL ZONE
The model defines an imaginary thermal zone (submatrix) having the maximum possible heat $\xi_{sat}$ by considering all of its $t^2$ cells (clusters) to have the maximum power, $\max\{p_i\}$. Hence the power dissipation of the center of heat of the saturation zone is given according to (6) by,
$$\xi_{sat} = t^2 \cdot \max\{p_i\} \qquad (11)$$
Similarly, the maximum reflective heat accumulation for the saturation thermal zone occurs when its center of heat is positioned at a minimum distance $\delta_{min}$ from the die walls, and the associated maximum reflective heat accumulation rate $Q_{sat}$ according to (9) is given by,
$$Q_{sat} \propto \frac{\xi_{sat}}{\delta_{min}} \qquad (12)$$
Along the x direction, $\delta_{min} = W_{cell}/2$ and along the y direction, $\delta_{min} = H_{cell}/2$. Hence the maximum possible reflective heat component in the center of heat of the saturation thermal zone upon reflections along the x and y directions according to (10) and (12) is given by,
$$\hat{o}_{sat} = \frac{\xi_{sat}}{W_{cell}/2} + \frac{\xi_{sat}}{H_{cell}/2} = 2\,\xi_{sat}\left(\frac{1}{W_{cell}} + \frac{1}{H_{cell}}\right) \qquad (13)$$
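A minimal Python sketch of the saturation thermal zone is given below. The function name `saturation_components`, the unit proportionality constant, and the sample cell dimensions and powers are assumptions for illustration only.

```python
def saturation_components(powers, t, w_cell, h_cell):
    """Return (xi_sat, o_sat) for the imaginary saturation thermal zone."""
    # All t*t cells are assumed to carry the maximum cell power.
    xi_sat = t * t * max(powers)
    # The center of heat sits half a cell away from the die walls in x and y.
    o_sat = xi_sat / (w_cell / 2) + xi_sat / (h_cell / 2)
    return xi_sat, o_sat


# Hypothetical cell powers and cell dimensions (arbitrary units).
xi_sat, o_sat = saturation_components([0.1, 0.25, 0.5], t=2, w_cell=4.0, h_cell=2.0)
print(xi_sat, o_sat)
```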

F. CRITICAL THERMAL METRIC
The model redefines the thermal metric $\mu_t$ described in [5] by characterizing every critical thermal zone of the placement matrix with the critical thermal metric function $\mu_{CTM}$, given for the $k$th critical thermal zone by,
$$\mu_{CTM} = \lambda_1 \frac{\xi_k}{\xi_{sat}} + \lambda_2 \frac{\hat{o}_k}{\hat{o}_{sat}} \qquad (14)$$
Parameters $\lambda_1$ and $\lambda_2$ are the weights specifying the relative importance of the normalized heat components in defining the thermal metric. In our experimental work we have obtained good results by configuring $\lambda_1 = 1$ and $\lambda_2 = 0.2$. The target of the proposed thermal aware placement algorithm is to minimize the critical thermal metric $\mu_{CTM}$ such that the peak on-chip temperature is minimized and the power dissipation and temperature are evenly distributed over the entire placement.
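The weighted combination of the normalized heat components can be sketched in Python as follows. Taking the maximum over all critical thermal zones to obtain a single value for a placement is an assumption of this sketch, made by analogy with the maximum submatrix sum of [5]; the function name and the sample zone values are likewise hypothetical.

```python
def critical_thermal_metric(zones, xi_sat, o_sat, lam1=1.0, lam2=0.2):
    """Largest weighted sum of normalized heat components over the zones.

    `zones` is a list of (summative, reflective) pairs, one per critical
    thermal zone. The max() aggregation is an assumption of this sketch.
    """
    return max(lam1 * xi / xi_sat + lam2 * o / o_sat for xi, o in zones)


# Two hypothetical critical thermal zones.
zones = [(1.0, 1.5), (1.6, 0.6)]
mu_ctm = critical_thermal_metric(zones, xi_sat=2.0, o_sat=3.0)
print(mu_ctm)
```

With the default weights the first zone evaluates to 0.5 + 0.2 × 0.5 and the second to 0.8 + 0.2 × 0.2, so the second zone dominates.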

VII. PROPOSED THERMAL AWARE MATRIX PLACEMENT OPTIMIZER (TAMPO) ALGORITHM
The proposed Thermal Aware Matrix Placement Optimizer (TAMPO) algorithm, as described in Fig. 11, integrates the Initial Matrix Generator (IMaGe), the proposed thermal model, and the Simulated Annealing (SA) heuristics [24], [25] to optimize and finally obtain the thermal aware placement solution from the global optimum in the solution space. The important design aspects of the proposed algorithm TAMPO are as follows.

A. PROBLEM DEFINITION
Given a set of rectangular cells $\{C_i\}$ with an associated set of power dissipations $\{p_i\}$ and identical cell dimensions, the objective of the proposed algorithm is to distribute the cells in a matrix arrangement such that the temperature is evenly distributed and the peak on-chip temperature of the arrangement is minimized, subject to the satisfaction of certain shape constraints. Shape constraints: For square matrix placement the aspect ratio of the chip equals the aspect ratio of a rectangular cell $C_i$. For a minimum-cell matrix placement with a defined upper bound $r$ the aspect ratio of the chip varies between $1/r$ and $r$.
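The two shape constraints can be illustrated with the following Python sketch. It assumes that the chip aspect ratio is measured as chip width over chip height, that unused matrix positions are filled with dummy cells, and that only the tightest column count is tried for each row count; the function names are hypothetical.

```python
import math


def square_dims(n):
    """Smallest square matrix (rows == cols) that holds n cells."""
    side = math.isqrt(n - 1) + 1  # integer ceiling of sqrt(n) for n >= 1
    return side, side


def min_cell_dims(n, cell_w, cell_h, r):
    """Fewest-cell matrix holding n cells with chip aspect ratio in [1/r, r]."""
    best = None
    for rows in range(1, n + 1):
        cols = math.ceil(n / rows)  # tightest column count for this row count
        aspect = (cols * cell_w) / (rows * cell_h)  # assumed: width / height
        if 1 / r <= aspect <= r:
            if best is None or rows * cols < best[0] * best[1]:
                best = (rows, cols)
    return best


print(square_dims(10))               # 4 x 4 matrix, 6 dummy cells
print(min_cell_dims(10, 1.0, 1.0, 2))  # 3 x 4 matrix, 2 dummy cells
```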

B. PLACEMENT ENCODING
The algorithm TAMPO reads a geometrical placement as shown in Fig. 3e

C. PERTURBATION FUNCTION
It is the function Perturb() defined by algorithm TAMPO, which randomly chooses any one of the following mechanisms to operate on an existing placement solution and generate a new partial placement solution. (a) Swapping of cells belonging to identical cell-groups $G_k$ between two randomly selected $(t \times t)$ submatrices. (b) Swapping of two randomly selected rows containing cells belonging to identical cell-groups $G_k$. (c) Swapping of two randomly selected columns containing cells belonging to identical cell-groups $G_k$. The existing or present solution is denoted by Present_matrix and the new solution generated by Perturb() is denoted by New_matrix.
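A hedged Python sketch of the three perturbation mechanisms is given below. It assumes that the cell group of a position is fixed by its offset inside its $(t \times t)$ submatrix, so that swaps between equal offsets keep every cell within its own group, and that the matrix dimensions are multiples of $t$. These assumptions and the name `perturb` are illustrative and are not confirmed by the text above.

```python
import random


def perturb(matrix, t=2, rng=random):
    """Return a perturbed copy of `matrix`; the input is left unchanged."""
    new = [row[:] for row in matrix]
    block_rows, block_cols = len(new) // t, len(new[0]) // t
    move = rng.choice(("cells", "rows", "columns"))
    if move == "cells":
        # (a) swap the cells at the same in-block offset of two random submatrices
        dr, dc = rng.randrange(t), rng.randrange(t)
        r1 = rng.randrange(block_rows) * t + dr
        c1 = rng.randrange(block_cols) * t + dc
        r2 = rng.randrange(block_rows) * t + dr
        c2 = rng.randrange(block_cols) * t + dc
        new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    elif move == "rows":
        # (b) swap two rows that share the same in-block row offset
        dr = rng.randrange(t)
        r1 = rng.randrange(block_rows) * t + dr
        r2 = rng.randrange(block_rows) * t + dr
        new[r1], new[r2] = new[r2], new[r1]
    else:
        # (c) swap two columns that share the same in-block column offset
        dc = rng.randrange(t)
        c1 = rng.randrange(block_cols) * t + dc
        c2 = rng.randrange(block_cols) * t + dc
        for row in new:
            row[c1], row[c2] = row[c2], row[c1]
    return new


# Hypothetical 4 x 4 placement of cell identifiers 0..15.
present_matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
new_matrix = perturb(present_matrix, t=2, rng=random.Random(0))
```

The two random picks may coincide, in which case the returned placement equals the present one; a production implementation would probably redraw in that case.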

D. COST FUNCTION
The algorithm TAMPO defines a function Cost () to evaluate the fitness of a placement solution as follows.
$$\text{Cost} = \frac{\text{Thermal metric of new placement}}{\text{Thermal metric of initial placement}} \times 1000 \qquad (15)$$
The thermal metric is the $\mu_{CTM}$ defined in (14). The fitness of a solution is the reciprocal of its cost; hence the higher the cost, the lower the fitness. The thermal metrics of the Simple Approximation based, the Hotspot tool based and the proposed TAMPO placement algorithms differ from each other in order of magnitude and dimension. The thermal metric has been normalized in (15) so that the cost metric has the same order of magnitude in all three placement algorithms and the same simulated annealing engine may be implemented with identical optimization parameters. Moreover, since the normalized cost is a small fractional number, it has been scaled up 1000 times to help set the other simulation parameters effectively and obtain a better optimization. The difference in cost between the new solution New_matrix and the present solution Present_matrix is given by,
$$h = \text{Cost}(\text{New\_matrix}) - \text{Cost}(\text{Present\_matrix}) \qquad (16)$$
If $h < 0$, the new solution is superior, and if $h > 0$, the new solution is inferior to the present solution.
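The normalized cost and the cost difference can be sketched in Python as follows. The function names and the sample metric values are hypothetical and serve only to illustrate the arithmetic.

```python
def cost(metric, initial_metric, scale=1000.0):
    """Thermal metric normalized by the initial placement and scaled up."""
    return metric / initial_metric * scale


def cost_difference(new_metric, present_metric, initial_metric):
    """h < 0 means the new solution is superior to the present one."""
    return cost(new_metric, initial_metric) - cost(present_metric, initial_metric)


# Hypothetical thermal metrics: the initial and present placements score 0.80,
# the perturbed placement scores 0.76.
h = cost_difference(new_metric=0.76, present_metric=0.80, initial_metric=0.80)
print(h)  # negative, so the new placement is superior
```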

E. METROPOLIS ACCEPTANCE CRITERIA
This is a probability-based criterion [24], [25] for accepting a new placement solution, described as follows. (a) In case $h < 0$, i.e. when the new solution is superior, the probability of acceptance is $P = 1$. (b) In case $h > 0$, i.e. when the new solution is inferior, the probability of acceptance is,
$$P = e^{-h/T} \qquad (17)$$
The inferior solution is accepted if a randomly generated number varying between 0 and 1 is less than $P$. Also, the corresponding annealing temperature $T$ can be derived from (17) as,
$$T = -\frac{h}{\ln P} \qquad (18)$$
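The acceptance rule and the temperature relation can be illustrated in Python as below. The function names are hypothetical, and the treatment of $h = 0$ (accepted, since the acceptance probability evaluates to 1) is an assumption because that case is not stated above.

```python
import math
import random


def accept(h, temperature, rng=random):
    """Metropolis rule: always accept a superior solution (h < 0), otherwise
    accept with probability exp(-h / T)."""
    if h < 0:
        return True
    return rng.random() < math.exp(-h / temperature)


def temperature_for(h, p):
    """Annealing temperature at which an uphill move h is accepted with probability p."""
    return -h / math.log(p)


# Temperature at which a cost increase of 10 is accepted half of the time.
T = temperature_for(10.0, 0.5)
print(T, math.exp(-10.0 / T))
```

A relation of this kind is commonly used to derive initial and final annealing temperatures from chosen acceptance probabilities.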

F. SIMULATION PARAMETERS
The list of simulation parameters and their configured values related with the Simulated Annealing based proposed thermal aware placement algorithm TAMPO has been shown in Table 2.

G. TERMINATION CRITERIA OF OPTIMIZATION PROCESS
The overall optimization process consists of several annealing cycles, where every annealing cycle in turn comprises a number of iterations. For terminating the process, the following criteria have been adopted from [25].

H. WIRELENGTH AND AREA
The set of inter-cluster nets associated with the circuit is generated by the Gate Array Packer (GAP) algorithm. The total inter-cluster wirelength associated with a placement solution of the circuit is determined according to the Half Perimeter Wirelength (HPWL) model given in [25] and [26]. Also, the area of the placement is computed as,
$$\text{Area of chip} = \text{No. of rows} \times \text{No. of columns} \times \text{Area of a cell} \qquad (19)$$
Experimental data in Table 3 show the impact of varying the control parameters $(P_i, P_f, \rho)$ on the performance of the SA based proposed algorithm TAMPO, denoted by the optimal cost and the annealing cycles (global cycles) required for convergence. Here the results have been obtained for the s5378 circuit, and it has been observed that combination COM-8 gives the best results. Other benchmark circuits show a similar trend. In the convergence graph of TAMPO for this circuit, the cost function is the normalized value of the critical thermal metric $\mu_{CTM}$ defined in (14) and (15). The convergence of the cost function has been shown with respect to iterations defined by the number of annealing cycles (global cycles). Here the initial cost is 1000 and after the first annealing cycle the cost is 914.535. The minimum cost achieved upon optimization is 830.65. Here the convergence occurs at the 578th annealing cycle and continues up to the 610th annealing cycle.
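The wirelength and area computations of this subsection can be sketched in Python as follows. The sketch uses the common bounding-box form of the HPWL model; the data layout (a dictionary of cell coordinates and a list of nets) and all sample values are assumptions for illustration.

```python
def hpwl(nets, position):
    """Total half perimeter wirelength over all nets.

    `position` maps a cell name to its (x, y) coordinates and each net
    is a list of cell names.
    """
    total = 0.0
    for net in nets:
        xs = [position[cell][0] for cell in net]
        ys = [position[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def chip_area(rows, cols, cell_w, cell_h):
    """Area of chip = no. of rows x no. of columns x area of a cell, as in (19)."""
    return rows * cols * cell_w * cell_h


# Hypothetical placement of three clusters and two inter-cluster nets.
position = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
nets = [["a", "b"], ["a", "b", "c"]]
print(hpwl(nets, position))         # each net spans a 3 x 4 bounding box
print(chip_area(3, 4, 2.0, 1.5))
```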

I. CO-OPTIMIZATION OF THERMAL METRIC AND WIRELENGTH
The algorithm TAMPO has been further extended to simultaneously optimize the critical thermal metric $\mu_{CTM}$ in (14) and the half perimeter wirelength (HPWL) of placement solutions by redefining its cost function as follows.
$$\text{Cost} = W_1 \times \frac{\text{Thermal metric of new placement}}{\text{Thermal metric of initial placement}} + W_2 \times \frac{\text{Wire length of new placement}}{\text{Wire length of initial placement}} \qquad (20)$$
Here $W_1$ and $W_2$ are the weights related to the relative refinement of the thermal metric $\mu_{CTM}$ and the wirelength respectively. The weights $W_1$ and $W_2$ are defined such that $0 \le (W_1, W_2) \le 1$ and $W_1 + W_2 = 1$. This extended form of TAMPO, which uses (20) as the objective function for optimization, is the 'Co-optimized TAMPO'.
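The weighted objective can be sketched in Python as follows. The function name and the sample values are hypothetical, and no additional scaling factor is applied in this sketch because none is recoverable from the text above.

```python
def co_optimized_cost(tm_new, tm_init, wl_new, wl_init, w1=0.5, w2=0.5):
    """Weighted sum of the normalized thermal metric and the normalized wirelength."""
    if not (0 <= w1 <= 1 and 0 <= w2 <= 1 and abs(w1 + w2 - 1) < 1e-9):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return w1 * tm_new / tm_init + w2 * wl_new / wl_init


# Hypothetical placement: thermal metric improved by 10%, wirelength by 30%.
c = co_optimized_cost(tm_new=0.9, tm_init=1.0, wl_new=700.0, wl_init=1000.0)
print(c)
```

Setting `w1=1, w2=0` reduces this objective to the normalized thermal metric alone.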

VIII. REFERENCE PLACEMENT ALGORITHMS
The dimension and power specification of the cells used and the temperature of the final optimized placement solutions are not defined in previous works [5], [11]-[13]. Hence we have constructed three more placement algorithms: (i) one based on the Hotspot tool [6], [7], [28], (ii) the second based on the Simple Approximation algorithm [5], and (iii) the third based on the thermal-aware placement algorithm [11], for the purpose of performance validation of the proposed algorithms TAMPO and Co-optimized TAMPO.
The Hotspot tool based placement algorithm generates a random initial square matrix placement and optimizes the solution using the Simulated Annealing (SA) heuristics similar to that implemented by TAMPO but with the following changes. (a) The perturbation function considers a random swap of cells or a random swap of matrix rows or a random swap of matrix columns. (b) The thermal metric is the peak temperature estimated by the Hotspot tool. (c) On account of the extremely high run time incurred due to exact temperature estimation, the number of iterations in an annealing cycle has been limited to 50 for all the circuits and the total annealing cycles have been limited to 100 for s5378, s9234, s13207, s15850 and to 10 for s38417 and s38584. The parameters of the Hotspot tool [28] and their configured values utilized in this paper are specified in Table 4.
The Simple Approximation based placement algorithm generates the initial square matrix solution according to algorithm A1 in [5] and optimizes the solution using the same Simulated Annealing (SA) heuristic with identical simulation parameters as implemented by TAMPO. The cost function here is the thermal metric defined as the maximum $(t \times t)$ submatrix sum of power (instead of heat) in the placement matrix, i.e. $\mu_t$ as described by algorithm A1 in [5]. Since this algorithm is built on the same SA heuristics, parameter variations show a similar trend of performance impact as that of TAMPO. Hence the same parameter combination COM-8 (mentioned in Table 3) has been maintained. The convergence curve of the optimization process obtained from the Simple Approximation based placement algorithm for the s5378 benchmark circuit is shown in Fig. 13. Here the cost function is the normalized value of the thermal metric $\mu_t$ according to (15). The convergence of the cost function has been shown with respect to iterations defined by the number of annealing cycles (global cycles).
The initial cost here is 1000 and after the first annealing cycle, the cost is 982.652. The minimum cost achieved by optimization is 905.566. Here the convergence occurs at the 553rd annealing cycle and continues up to the 610th annealing cycle.
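The thermal metric $\mu_t$ used by the Simple Approximation based reference algorithm can be illustrated with the following Python sketch. It evaluates every $(t \times t)$ window of the placement matrix, including overlapping ones, which is our reading of the maximum submatrix sum and should be treated as an assumption; the function name and the sample matrix are hypothetical.

```python
def max_zonal_sum(power, t=2):
    """Largest sum of power over all t x t submatrices (sliding window)."""
    rows, cols = len(power), len(power[0])
    return max(
        sum(power[r + i][c + j] for i in range(t) for j in range(t))
        for r in range(rows - t + 1)
        for c in range(cols - t + 1)
    )


# Hypothetical 3 x 3 power matrix (arbitrary units).
power_matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(max_zonal_sum(power_matrix))  # bottom-right window: 5 + 6 + 8 + 9
```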
The thermal-aware placement algorithm [11] (refer Fig. 2 of [11]), which aims to minimize the hotspots, distributes the functional blocks or modules in a square matrix arrangement and optimizes the maximum aggregate power dissipation, or the critical threshold function, in thermal regions denoted by the $(t \times t)$ submatrices. It initially generates a random placement and further improves it by two levels of iteration, viz. an outer loop and an inner loop. Every outer loop identifies a local optimum through the different inner loop operations and finally, the global best is obtained from all such local optima. Perturbations: (a) For every inner loop a new solution is generated by swapping the highest power
In the case of placement algorithm [11], we have experimented with different combinations of the control parameters (∈, max_ascent) to test the impact on the performance, denoted by the optimal cost and the outer iterations (global cycles) required to converge. Also, in order to ensure equal maximum iteration limits in the outer and inner cycles, similar to TAMPO, here we have configured outer_iter = 1000 (which is equal to global_cycle of TAMPO) and inner_iter = 2E T (which is equal to the local_steps of TAMPO). Table 5 shows the results obtained for the s5378 benchmark circuit with placement algorithm [11], and it has been observed that combination COM-7 gives the best outcome. A similar trend has been observed for other benchmark circuits as well.
The convergence curve for the optimization process of the thermal-aware placement algorithm [11] is shown in Fig. 14. Here the cost is the critical threshold function defined as the maximum $(t \times t)$ submatrix sum of power dissipation in a matrix placement. The convergence of the cost function has been shown with respect to the number of outer iterations (global cycles). Here the initial cost is 0.642687 W and after the 1st outer iteration, the cost is 0.420266 W. The minimum cost achieved after the optimization is 0.346629 W. Here the convergence occurs at the 883rd outer iteration and continues up to the 999th outer iteration.

FIGURE 14.
Convergence curve of the thermal-aware placement algorithm [11] experimented on the s5378 benchmark circuit.

IX. EXPERIMENTAL RESULTS AND ANALYSIS
The proposed algorithms TAMPO and the Co-optimized TAMPO have been evaluated on the ISCAS89 [29] benchmark circuits listed in Table 6. The reference placement algorithms based on (i) the Hotspot tool [28], (ii) the Simple Approximation algorithm [5], and (iii) the thermal-aware placement algorithm [11] have also been implemented on the benchmark circuits. A power model is required to estimate the power of the clusters (rectangular cells) generated by the Gate Array Packer (GAP) algorithm. However, the heat generated by cells in the placement matrix has been generated randomly in [5]. We have likewise generated the power of the rectangular cells (clusters) randomly, according to [27], with power density varying between $0.22 \times 10^6$ W/m$^2$ and $4.06 \times 10^6$ W/m$^2$ for a 90 nm processor, to test the effectiveness of the placement algorithms. The placement algorithms have been written in the C language and experiments have been conducted on a Linux system running on a 3 GHz Intel Core i5 processor. The Hotspot tool has been used to find the exact temperature of the final optimized solutions with the configuration mentioned in Table 4 of section VIII. Table 6 shows the average power dissipation and other particulars of the benchmark circuits in the original and post gate array map. Moreover, the functional clusters obtained in Table 6 are the functional cells mentioned in Table 7 and Table 10. Table 7 shows the attributes of the benchmark circuits, common to the square matrix solutions given in Table 8, Table 9 and also in Table 11. Table 8 presents the attributes of the optimized square matrix placements achieved by the Hotspot tool based placement algorithm. Similar to [5], in our work also the Simple Approximation algorithm has been implemented for generating square matrix placements with $t = 2$.
Table 9 presents the particulars of the optimized square matrix solutions achieved by (a) the Simple Approximation based, (b) the thermal-aware placement algorithm [11] based, and (c) the proposed TAMPO placement algorithms with $t = 2$. Table 10 depicts the attributes of the optimized minimum-cell matrix solutions attained by the proposed placement algorithm TAMPO with $t = 2$. The half perimeter wirelength (HPWL) reported here relates to the inter-cluster nets generated by the algorithm GAP. In this work the thermal quality of placement has been assessed on the basis of the peak temperature 'Peak Temp.', temperature gradient 'Temp. Grad.', and the standard deviation in cell temperature 'Std. Dev. Temp.'. The peak $(t \times t)$ submatrix sum has been denoted by 'Max. Zonal Sum'. From Table 8 and Table 9 it can be observed that the Hotspot tool based placement algorithm gives better thermal results than the placement algorithms based on Simple Approximation and [11]. However, both the placement algorithms based on Simple Approximation and [11] give an enormous 99% (approximate) improvement in execution time over the Hotspot tool based algorithm, thereby stressing the importance of adopting alternative algorithms like the Simple Approximation and [11] for designing placement algorithms. From Table 8 and Table 9 it can also be observed that the proposed algorithm TAMPO gives solutions that are thermally almost equivalent to those of the Hotspot tool based algorithm while giving an enormous 99% improvement in execution time over it. From Table 8 and Table 9 it can be observed that the average temperature (Avg. Temp.) of the square matrix solutions of the corresponding circuits obtained from the placement algorithms is almost the same.

TABLE 9.
Particulars of the optimized square matrix placement solutions obtained from the Simple Approximation based placement algorithm, placement based on thermal-aware algorithm [11], and the proposed TAMPO algorithm.

FIGURE 16.
Average percentage improvement obtained from (a) the square matrix solutions of TAMPO over the square matrix solutions of thermal-aware placement algorithm [11], (b) the minimum-cell matrix solutions of TAMPO over the square matrix solutions of thermal-aware placement algorithm [11]. The temperature is in °C.

TABLE 11.
Attributes of the optimized square matrix and minimum-cell matrix placement solutions obtained from the proposed Co-optimized TAMPO algorithm with equal weightage ($W_1 = 0.5$, $W_2 = 0.5$) for thermal metric ($\mu_{CTM}$) and the wirelength (HPWL) refinement.
From the results in Table 9, an analysis has been done in Fig. 15a between the square matrix solutions of the Simple Approximation based placement algorithm and those of TAMPO. In this case, the placement algorithm based on Simple Approximation gives an average improvement of 2.38% in peak submatrix sum over TAMPO. However, TAMPO gives an average improvement of 7.07% in peak temperature, 58.73% in temperature gradient, and 46.23% in standard deviation in cell temperature compared to the former. TAMPO incurs an average overhead of 4.02% in execution time and a marginal 0.028% average overhead in wirelength (HPWL) with respect to the Simple Approximation based placement algorithm. Again, utilizing the results in Table 9 and Table 10, an analysis has been made in Fig. 15b between the square matrix solutions of the Simple Approximation based placement algorithm and the minimum-cell matrix solutions of TAMPO. In this case, the placement algorithm based on Simple Approximation gives an improvement of 5.26% in the peak submatrix sum. But TAMPO gives an average improvement of 5.14% in peak temperature, 52.68% in temperature gradient, and 37.05% in standard deviation in cell temperature with respect to the former. TAMPO also gives an average improvement of 4% in dummy cells, 4.58% in area, 4.65% in half perimeter wirelength (HPWL), and 1.03% in execution time over the Simple Approximation based placement algorithm. However, the solutions generated by TAMPO have a slightly higher average temperature than those of the former.
Now from the results in Table 9, an analysis has been done in Fig. 16a between the square matrix solutions of the thermal-aware placement algorithm [11] and those of TAMPO. In this case, the thermal-aware placement algorithm [11] gives an average improvement of 1.27% in the peak submatrix sum, or the critical threshold, over TAMPO. On the contrary, TAMPO gives an average improvement of 6.04% in peak temperature, 53.79% in temperature gradient, and 34.47% in standard deviation in cell temperature compared to the former. TAMPO gives a marginal 0.05% average improvement in half perimeter wirelength (HPWL) over the thermal-aware placement algorithm [11]. However, TAMPO incurs an average overhead of 38.86% in execution time with respect to the former. Again, from the results in Table 9 and Table 10, an analysis has been made in Fig. 16b between the square matrix solutions of the thermal-aware placement algorithm [11] and the minimum-cell matrix solutions of the proposed algorithm TAMPO. In this case, the thermal-aware placement algorithm [11] gives an average improvement of 4.23% in the peak submatrix sum, or the critical threshold, over TAMPO, whereas TAMPO gives an average improvement of 4.09% in peak temperature, 47.09% in temperature gradient, and 24.26% in standard deviation in cell temperature compared to the former. TAMPO also gives an average improvement of 4% in dummy cells, 4.58% in area, and 4.71% in half perimeter wirelength (HPWL) over the thermal-aware placement algorithm [11]. However, TAMPO incurs an average overhead of 32.1% in execution time with respect to the former.
Table 11 shows the results of the solutions obtained from the proposed Co-optimized TAMPO (discussed in section VII-I) where equal weightage ($W_1 = 0.5$, $W_2 = 0.5$) has been given to the optimization of the thermal metric and the wirelength (HPWL). A comparative analysis has been done between the results of TAMPO (thermal aware only) and Co-optimized TAMPO given in Table 9 and Table 11 respectively, relating to the square matrix as well as the minimum-cell matrix solutions. In the case of square matrix solutions, Co-optimized TAMPO gives an average improvement of 34.31% in wirelength (HPWL) over TAMPO. However, Co-optimized TAMPO incurs an average overhead of 2.46% in peak temperature, 52.35% in temperature gradient, and 86.26% in standard deviation in cell temperature with respect to TAMPO. In this case, TAMPO executes on average 95.97% faster than Co-optimized TAMPO. In the case of the minimum-cell matrix solutions, Co-optimized TAMPO gives an average improvement of 32.93% in wirelength (HPWL) over TAMPO. But again Co-optimized TAMPO incurs an average overhead of 1.97% in peak temperature, 38.62% in temperature gradient, and 41.58% in standard deviation in cell temperature in comparison to TAMPO. In this case, TAMPO executes on average 95.78% faster than Co-optimized TAMPO. The excess runtime overhead in Co-optimized TAMPO is due to the additional wirelength (HPWL) estimation of the partial placement in every iteration step of the optimization process. Since TAMPO only tries to optimize the thermal metric, it gives thermally superior solutions in comparison to Co-optimized TAMPO, where 50% of the weightage has been given to the thermal improvement. However, Co-optimized TAMPO gives a substantial improvement in the wirelength and still gives thermally improved solutions in comparison to the placement algorithms based on Simple Approximation [5] and the thermal-aware placement algorithm [11].
The experimentally generated thermal profiles showing the temperature distribution of the different optimized matrix placement solutions of the s38417 circuit, synthesized with the reference placement algorithms as well as the proposed placement algorithms, are shown in Fig. 17. In summary, the placement algorithms based on [5] and [11] give more improvement in the peak submatrix sum of power, or the critical threshold value, than the proposed algorithm TAMPO; however, TAMPO gives better thermal aware solutions. Also, the proposed algorithm Co-optimized TAMPO gives solutions that are improved in peak temperature, temperature gradient, and wirelength with respect to the placement algorithms based on [5] and [11].

X. CONCLUSION
In this paper, the proposed Thermal Aware Matrix Placement Optimizer (TAMPO) has proved to be an efficient framework for generating improved thermal aware matrix placements of gate arrays. Experimental results suggest that TAMPO has successfully improved the philosophy of the Matrix Synthesis Problem (MSP) to generate solutions that are thermally superior in terms of peak temperature, temperature gradient, and the standard deviation in cell temperature with respect to the existing methodologies of Simple Approximation and [11]. The work done in this paper quantifies the thermal improvements in terms of temperature, which the previous works on thermal aware matrix placement lack. Since the framework avoids the expensive overhead of exact temperature estimation during the placement synthesis, it is fast; hence, while maintaining almost the same quality of solutions, it gives a significant run time improvement over the Hotspot tool based placement algorithm. Experimental results also show that the extended version of TAMPO, termed the Co-optimized TAMPO, gives efficient thermal aware placement of cells while achieving a reasonable reduction in the wirelength as well. As future scope, the work can be extended to incorporate metrics like routing congestion and interconnect delay along with the thermal metrics of a chip in a multi-objective optimization problem. It can also be extended to the placement problem of 3D ICs.