A Genetic Algorithm-Based Metaheuristic Approach for Test Cost Optimization of 3D SIC

Demand for small, multi-functional, high-performance electronic products with low power consumption is increasing rapidly. To meet this demand, IC design has shifted from the two-dimensional integrated circuit (2D-IC) to the three-dimensional integrated circuit (3D-IC), where multiple device layers are stacked together to create a stacked integrated circuit (SIC). This increases the complexity of the 3D SIC architecture and the number of fault sites, so testing of SICs has become complicated. The test data volume also grows in proportion to the number of cores in the SIC, since each core is associated with one or more tests, which leads to longer test times. The test cost of an IC depends on the test time, the hardware needed to test the cores and the power dissipated during test; it can be represented as a weighted sum of test time and test hardware, with power treated as a test constraint. An efficient test plan is therefore required to co-optimize test time and hardware under a given power constraint. The objective of our work is to design an efficient test plan both for non-stacked ICs (i.e. SIC with a single chip) and 3D stacked ICs (i.e. SIC with multiple chips) under a power constraint, where each chip is provided with the IEEE 1149.1 architecture. An existing cost model is used for calculating the test cost. Initially we propose a First Fit based two-dimensional (2D) Bin Packing optimization algorithm for minimizing the test cost of non-stacked ICs. However, the method produces sub-optimal results in comparison to earlier reported work. Considering the complexity of 3D SICs, a Genetic algorithm based metaheuristic approach is next proposed in this paper. It is applied to several ITC'02 benchmark circuits, and the experimental results show the efficacy of the proposed algorithm in comparison to earlier works.


I. INTRODUCTION
Over the years, integrated circuit (IC) design has gained importance among researchers and industry practitioners, since it aims at high functionality and performance with low power consumption. To achieve high functionality and performance, IC designs have become complex, and interconnect has become a major source of circuit delay and power consumption. To reduce circuit delay and power consumption, the three-dimensional integrated circuit (3D IC), in which several layers are stacked together and some horizontal interconnect wires are replaced by vertical connections, has been introduced.
Exploring the use of the vertical dimension (3D) on both memory and logic [1], a wide range of applications of 3D ICs in commercial products are seen in the market. Applying 3D technology, memories have already been successfully manufactured and commercialized [2]- [4]. A field-programmable gate array routing switch using a 4-tier Monolithic 3D IC was designed by researchers from Stanford University in 2014 [5]. In 2015, researchers from the National NanoDevice Laboratory of Taiwan built a 6T SRAM cell in a 2-tier Monolithic 3-D IC using 50-nm transistors and 20-nm channel [6]. SONY also announced the world's first stacked chip CIS camera system [7]. 3D ICs also provide a suitable framework for mobile applications, which require faster response, small form factors, high data bandwidth and processing power [8]. The 3D IC is considered a desirable architecture to support a neuromorphic computing system (based on neurobiological architecture), which requires a highly parallel and connected environment [9]. To improve the design of neuromorphic computing systems in 3D ICs, a 3D floorplan based framework is proposed in [10]. Various integration technologies have helped to achieve more functionality with increased performance, lower power and reduced production cost: System-on-Chip (SOC), in which multiple cores are integrated in a single chip; multi-chip modules (MCMs) (Marinissen and Zorian [11]), in which chips are placed laterally; system in packages (SiPs), in which chips are stacked vertically and connected by bonding wires; and stacked integrated circuits (SICs) [11]- [14], in which chips are stacked vertically and connected by interconnects known as Through-Silicon Vias (TSVs).

The associate editor coordinating the review of this manuscript and approving it for publication was Dušan Grujić.

VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
All these integration technologies have their respective pros and cons. For example, unlike SOC, the MCM, SiP and SIC technologies provide heterogeneous system integration. A TSV based three-dimensional stacked integrated circuit (3D-SIC) integrates multiple dies vertically, thereby creating a smaller footprint and higher transistor density and providing enhanced speed and reduced wire length [20]. During IC fabrication, a TSV may become defective, and such a TSV must be identified to maintain the performance of the system. To resolve this problem, many faulty TSV detection techniques have been introduced [15]- [19].
Although the introduction of 3D-SIC has reduced the cost of production, it has increased the design complexity, due to which ICs have become prone to defects during manufacturing. As a result, test cost increases. Generally, the test flow of a 2D IC comprises two tests: (i) Wafer Sort (performed after wafer fabrication and before assembly and packaging) and (ii) Final Test or Package Test [11] (see Figure 1). The name wafer sort indicates testing each chip to sort out known good dies (KGDs). According to Marinissen et al. [11], a 3D IC consisting of n dies has 2n test flows: (i) n tests for individual dies before packaging, (ii) (n − 2) tests for intermediate stacking, (iii) one for the final stack and (iv) one for packaging.
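As a quick check of the count quoted from [11], the four groups of tests can simply be added up; the instance n = 6 is used here only as an illustration of the arithmetic.

```latex
\[
\underbrace{n}_{\text{individual dies}}
+ \underbrace{(n-2)}_{\text{intermediate stacks}}
+ \underbrace{1}_{\text{final stack}}
+ \underbrace{1}_{\text{package}}
= 2n,
\qquad
n = 6:\; 6 + 4 + 1 + 1 = 12 .
\]
```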
Here die and chip can be used interchangeably. An example with six dies is shown in Figure 2, where the total number of test flows is 12. Several studies [21]- [25] have been carried out on 3D SIC. Among these, [22] and [23] are related to yield improvement during testing. Wafer matching and layer redundancy are considered important methods for improving the yield in [22]. In [23], test-cost optimization for 3D ICs is done by developing a cost model that takes into account various test costs at each step of the stacking process. In [24], Manjari et al. address the problem of finding the best possible stacking sequence of dies such that the total stacking time is minimum and a given TAM width constraint is satisfied, assuming the test time and TAM width of each die are already given. Basically, there are two ways to test the dies: sequentially or concurrently. A concurrent schedule requires less test time but more TAM width, whereas a serial schedule requires more test time and less TAM width. Since there are various possible ways to stack the dies sequentially or concurrently under a given TAM width, the objective is to find the best one.
Since testing each core dissipates power, an efficient test schedule is required to test n cores with different power values such that the power generated during testing does not exceed a certain power limit $w_{max}$ and the time taken to test all cores is minimized. If the objective is to reduce power only, the cores can be tested sequentially. If the target is to reduce test time only, the cores can be tested concurrently, but the testing power may then exceed $w_{max}$ and damage the IC. A common approach for reducing the test time of core-based ICs is to perform concurrent core tests. New work on 3D SIC addressing power in the IEEE 1149.1 framework is proposed in [21], [25]. Unlike [24], they considered the following integrations: (i) single chip per package, in which each chip can have multiple cores, and (ii) multiple chips per package, in which each chip can have multiple cores. Since testing always dissipates power, the power generated at the time of test is an important factor. According to Sengupta et al. [21], [25], the two main contributors to the test cost of ICs are (i) test time and (ii) the design for testability (DFT) hardware. The test time of an IC is the total time taken to execute the applied test schedule, that is, the order in which the various logic blocks (i.e. cores) of an IC are tested.
While test time can be represented as the sum of wafer test time and package test time, test cost can be formulated as the weighted sum of test time and the DFT hardware (in terms of the number of test data registers (TDRs)). Moreover, scheduling tests under power constraints, with the objective of minimizing test cost, is NP-hard [26].
Observing the complexity of the problem, a Simulated Annealing based method is applied in [25]. The main drawback of Simulated Annealing is that it takes a single solution and tries to improve it, at the cost of high computational time. In this work, our objective is to obtain an efficient test plan so that test cost is minimum. If we minimize the test time only, the number of TDRs increases, while minimizing the number of TDRs increases the overall test time. Therefore, optimizing only one aspect of test cost (i.e. test time or TDRs) may prevent us from obtaining optimized results overall. Initially, we propose a 2D Bin Packing optimization algorithm for core-based 3D non-stacked ICs to minimize the test cost. Since the method produces sub-optimal results in comparison to earlier reported work, and considering the complexity of 3D SICs, we apply a Genetic algorithm based metaheuristic approach to obtain near-optimal test cost.
We compare our work with an earlier work [25]. The work in [25] used IEEE 1149.1 which is also extensively used in other works. For this reason, we limit our work to IEEE 1149.1 architecture only. Also, TSV interconnect test contributes a constant term to the test time in IEEE 1149.1. Therefore, TSV interconnect test is not considered when addressing test scheduling in the research work.
The remaining part of the paper is organized as follows. Section II describes the preliminaries and background of SIC and Non SIC in the IEEE 1149.1 JTAG test architecture and the cost model. Problem formulation for both Non SIC and SIC is discussed in Section III. The proposed work is discussed in Section IV. Experimental results are discussed in Section V. Finally, the paper concludes with observations in Section VI.

II. PRELIMINARIES
Among various design for testability (DFT) architectures, IEEE 1149.1 is an important scan-based architecture for testing, commonly known as the JTAG (Joint Test Action Group) architecture. In IEEE 1149.1, the input and output terminals are connected to work as a shift register to improve the testability of the IC.

A. TEST ARCHITECTURE
In this section we discuss two important design architectures of 3D IC: (i) Non stacked IC (i.e. Non SIC) and (ii) Stacked IC (i.e. SIC).

1) NON SIC
The test architecture of a Non SIC based on the IEEE 1149.1 architecture is shown in Figure 3. A chip consists of three cores $c_1$, $c_2$ and $c_3$, and the test access port (TAP) consists of five terminals, namely Test Data Input (TDI), Test Data Output (TDO), Test Mode Select (TMS), Test Clock (TCK) and an optional Test Reset (TRST).
The TAP controller accesses the scan chain of each core via test data registers (TDRs); the test data is sent from the TAP to the cores through a TDR. If more than one core is tested concurrently, the cores are tested in a single session and share a single TDR. In Figure 3, cores $c_1$ and $c_2$ are tested in the same session, so they are tested via a single TDR, i.e. TDR1. If cores are tested in sequence, in different sessions, they are tested via different TDRs. According to Figure 3, $c_3$ is tested in a different session, so an extra TDR, TDR2, is required for testing.

The basic principle of the test application process is as follows. In a 3D SIC, to test the cores, the test patterns are shifted into the TDRs through the respective TDI_down of the chips and then applied to the cores, and the response is captured in the TDRs. The TDO_up of a chip acts as the TDI_down of the chip above. The captured test response is shifted out through the TDO_up of the topmost chip and exits each chip through its TDO_down. Each test response exits the 3D stacked IC via the TDO_down of the lowermost chip.

B. COST MODEL
The existing test cost model [25] is used here. Let the number of TAP terminals be δ = 5 (i.e. TDI, TDO, TMS, TCK, TRST), and let C cores, each with scan chain length $l_i$, number of patterns $p_i$ and power $w_i$, be tested in the j-th session. The test time $t_j$ of a session with more than one core is calculated using equation 1, where $l_c$ and $p_c$ indicate the length of the scan chain and the number of patterns respectively.
If C = 1 (a single core is tested in a session), equation 1 reduces to equation 2.
If C cores are tested in session j, the power generated during the j-th session, $w_j$, is the sum of the power dissipated by each core tested in that session (equation 3).
If a particular test plan has q sessions, the maximum power $w_{max}$ of that test plan is given by equation 4. According to [25], the cost of testing can be calculated using equation 5, where α and β are weight constants specified by the designer, and T and TDR are the total test time and the number of test data registers respectively. The constant α is set to 1 and β is set to a positive value greater than 1, such that the DFT hardware and the test time are of a similar order of magnitude, as in [25].
Since α is 1 in [25], equation 5 reduces to equation 6. The total time T is calculated as shown in equation 7, where $T_w$ and $T_{pt}$ are the wafer sort time and the package test time respectively.
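Read directly from the verbal definitions in this subsection, equations 3 to 7 can be sketched as follows. This is a reconstruction under that reading, not a copy of the formulas in [25]; the set $S_j$ (the cores tested in session j) is notation introduced here for illustration, and equations 1 and 2 are omitted because their exact form is not determined by the description.

```latex
\begin{align}
w_j     &= \sum_{i \in S_j} w_i                      \tag{3} \\
w_{max} &= \max_{1 \le j \le q} w_j                  \tag{4} \\
cost    &= \alpha \cdot T + \beta \cdot TDR          \tag{5} \\
cost    &= T + \beta \cdot TDR \quad (\alpha = 1)    \tag{6} \\
T       &= T_w + T_{pt}                              \tag{7}
\end{align}
```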
However, the procedure to find $T_w$ and $T_{pt}$ differs between Non SIC and SIC. Since a Non SIC involves a single chip (with a single core or multiple cores), $T_w$ and $T_{pt}$ are identical. So, if k sessions are involved in Non SIC testing, the total test time T is twice the sum of the k session times (equation 8). The test time $T_w$ consisting of k sessions is calculated using equation 9, and the test time $T_{pt}$ consisting of k sessions of SIC is calculated using equation 10.

Most of the research work has addressed test-related problems and their solutions in the system-on-chip (SOC) environment. Among the various works on SIC, only a limited number of papers address the test scheduling problem under resource and power constraints on Non SIC.

C. NON SIC
The entire description of a Non SIC with three cores (see Figure 3) is shown in Table 1, where each core is described by its scan chain length l, number of patterns p and power value w. In the benchmarks [34], no unit is specified for it. There are various ways to test the three cores (see Table 2).

TABLE 1. Example data used in Non SIC [25].
Various possible test plans and their corresponding costs for β = 2000 are shown in Tables 3 and 4 respectively. In Table 3, the first, second, third and fourth columns indicate test ... According to [25], test plan 5 is the best test plan, since its test cost of 34260 is the minimum among all five test costs and $w_{pt}$ is within the power boundary.
Test costs for the same test plans (as shown in Table 3) for β = 3000 are shown in Table 5.

IV. PROPOSED WORK
In the Bin Packing problem, a given set of items must be packed such that the minimum number of bins is used. It is a well-known technique for various optimization problems and has a wide range of applications in fields such as operations research, computer science and engineering. An asymptotically optimal parallel Bin Packing algorithm is discussed in [27]. Many classical Bin Packing algorithms are studied in [28], and two-dimensional packing problems are studied in [29]. The 2D Bin Packing problem can be described as follows: given n items of different weights $w_1, \ldots, w_n$ and a maximum bin capacity $w_{max}$ ($\geq \max(w_1, \ldots, w_n)$), assign each item to a bin such that the total number of bins $c_{bin}$ is minimized. For example, suppose there are 5 items of weights 4, 8, 1, 4 and 2 and the bin capacity is 10. The objective is to assign these 5 items to bins of the given capacity such that the total number of bins is minimum. Solving the problem, we find that a minimum of 2 bins is required to hold all 5 items, and the resultant bins are {8, 2} and {4, 4, 1}. Various techniques can be used to solve this problem, such as (i) First Fit Decreasing and (ii) Best Fit Decreasing. Test cost optimization for core-based ICs under a power constraint can be transformed into a 2D Bin Packing problem. Bin Packing is widely used for various problems in 2D SOC architectures, such as test time optimization of 2D core-based SOC designs [30], resource allocation and test scheduling of core-based SOCs [31], co-optimization of TAM and wrapper in 2D SOCs [32] and core-based SOC test scheduling with a power constraint [33]. An application of Bin Packing in 3D architecture is reported in [25]. Most of the earlier reported works ([30]- [33] and [25]) are based on the Best Fit Bin Packing algorithm, while our work is based on the First Fit Decreasing algorithm.
Moreover, the earlier reported works [30]- [33] address the 2D SOC architecture, whereas the proposed work addresses a more complicated problem based on the 3D architecture, in which multiple device layers are stacked together (3D SIC). Unlike [30]- [33] (where a Bin Packing based heuristic algorithm is applied to the 2D SOC problem), we apply a Genetic algorithm based metaheuristic approach to solve the problem of 3D SIC.
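The five-item example given earlier in this section can be reproduced with a short First Fit Decreasing routine. The following is a minimal sketch of the textbook technique, not the authors' C++ implementation; the function name `first_fit_decreasing` and its error handling are assumptions made for illustration.

```python
from typing import List


def first_fit_decreasing(weights: List[int], capacity: int) -> List[List[int]]:
    """Pack weights into bins of the given capacity using First Fit Decreasing."""
    bins: List[List[int]] = []   # contents of each open bin
    remaining: List[int] = []    # free space left in each open bin
    for w in sorted(weights, reverse=True):
        if w > capacity:
            raise ValueError(f"item {w} exceeds bin capacity {capacity}")
        for j, free in enumerate(remaining):
            if w <= free:        # first bin with enough free space wins
                bins[j].append(w)
                remaining[j] -= w
                break
        else:                    # no open bin fits: open a new one
            bins.append([w])
            remaining.append(capacity - w)
    return bins


packed = first_fit_decreasing([4, 8, 1, 4, 2], capacity=10)
print(packed)  # [[8, 2], [4, 4, 1]]
```

The `for ... else` construct opens a new bin only when the inner loop finishes without finding a fit, which is the "first fit" rule.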

A. 2D BIN PACKING FOR NON SIC
We now discuss in detail the First Fit Decreasing based Bin Packing algorithm for minimizing test cost in Non SIC. In the First Fit Decreasing algorithm, items are arranged in decreasing order, and each item in turn is packed into the first bin in which it fits. Let there be five cores, Core 1, Core 2, Core 3, Core 4 and Core 5, indicated by $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$ respectively, in a Non SIC (see Table 6). If we apply the Heuristic of [25], the cores are sorted in descending order of their test patterns, so the resultant order of cores is Core 3, Core 4, Core 1, Core 2, Core 5, the resultant test plan is {$c_3$, $c_4$, $c_5$} and {$c_1$, $c_2$}, and the number of TDRs is 2. Hence, the resultant test cost is 32860.
However, further optimization of the test cost is achieved by applying the proposed First Fit Decreasing based Bin Packing algorithm, where cores are sorted in descending order according to their power values. Algorithm 1 shows the procedure to create the bins and the test schedule, and Algorithm 2 shows the procedure for calculating test time and test cost.
In the proposed First Fit Decreasing based Bin Packing algorithm, cores are the items to be placed inside bins, and each bin is a session with bin capacity $w_{max}$. The goal is a test plan that places all cores in the minimum number of bins and thereby minimizes the test cost.
Initially, cores are sorted in descending order according to their power values (see Figure 5).
Let the bin capacity be 100 (which is the same as the maximum power boundary $w_{max}$). Since Core 1 is the largest item according to power value, it is placed in the first bin (see Figure 6). Therefore the available space of the bin is 50. Next, the power value of Core 2 is compared with the available bin space to check whether it fits in the bin. Since $w_2$ is 40, which is less than the available space 50, Core 1 and Core 2 can be tested in the same session using a single TDR (i.e. concurrent testing), as depicted in Figure 7. Now, Core 3 cannot fit, since its power value $w_3$ is 40, which is greater than the available space in the bin, i.e. 10. So Core 3 is placed in another bin, as shown in Figure 8, and tested using a different TDR. Next, Core 4 needs to be inserted.

Algorithm 1 For Creating Bin and Test Schedule
  Input: c_1, ..., c_n, l_1, ..., l_n, p_1, ..., p_n, w_1, ..., w_n, α, β and w_max
  Output: m bins and resultant schedule
  begin
    Sort w_1, ..., w_n power values in decreasing order.
    m = 0; bins b_j, j = 1, 2, ..., n    /* maintain an array of size n to store remaining space in bins */
    for i = 1, n do                       /* all w_i */
      placed = false
      for j = 1, m do                     /* all open bins b_j */
        if b_j ≥ w_i then                 /* remaining capacity of the j-th bin is at least the power of the i-th core */
          b_j = b_j − w_i                 /* b_j holds the remaining bin capacity */
          placed = true
          break the loop
        end if
      end for
      if not placed then                  /* value did not fit in any available bin */
        m = m + 1; b_m = w_max − w_i      /* open a new bin, i.e. a new session */
      end if
    end for
    Calculate t_j using equation 1        /* if more than one core in single session */
Since we are following First Fit based Bin Packing, we have to check from the first bin. Since the remaining space of the first bin is 10 and the power value of Core 4 is 20, Core 4 cannot fit in the first bin. The next bin is checked; its remaining space is 60, so Core 4 fits in that bin (see Figure 9). Next, Core 5 needs to be placed. Since the power requirement of Core 5 is 10 and the remaining space of the first bin is 10, Core 5 fits in the first bin. In this way we obtain the resultant test plan with two sessions, i.e. {$c_1$, $c_2$, $c_5$} and {$c_3$, $c_4$}, and two TDRs. The resultant test times for sessions {$c_1$, $c_2$, $c_5$} and {$c_3$, $c_4$} are 2940 time units and 10290 time units respectively (applying equation 1), and the total test time T becomes 2 × 13230 = 26460 (applying equation 8). (Here, $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$ indicate Core 1, Core 2, Core 3, Core 4 and Core 5 respectively.) Since two TDRs are involved, the total cost becomes 30460 (applying equation 5). Therefore we obtain an improved test cost in Non SIC compared to the cost of 32860 obtained with the earlier work reported in [25]. The running time of the First Fit Decreasing based Bin Packing algorithm is $O(n^2)$.
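The arithmetic of this walk-through can be checked with a few lines of code. The sketch below is only an illustration: the core power values (50, 40, 40, 20, 10) are the ones used in the walk-through, the session times are the quoted results of equation 1 rather than values computed here, β = 2000 is inferred from the quoted cost, and the helper names `session_power` and `non_sic_cost` are hypothetical.

```python
from typing import Dict, List, Tuple


def session_power(session: List[str], power: Dict[str, int]) -> int:
    """Power of one session: the sum of the powers of the cores tested in it."""
    return sum(power[core] for core in session)


def non_sic_cost(session_times: List[int], num_tdr: int,
                 alpha: int = 1, beta: int = 2000) -> Tuple[int, int]:
    """Return (T, cost) for a Non SIC, where wafer sort and package test times coincide."""
    total_time = 2 * sum(session_times)         # T = T_w + T_pt with T_w == T_pt
    cost = alpha * total_time + beta * num_tdr  # weighted sum of time and hardware
    return total_time, cost


power = {"c1": 50, "c2": 40, "c3": 40, "c4": 20, "c5": 10}
sessions = [["c1", "c2", "c5"], ["c3", "c4"]]   # plan obtained in the walk-through
session_times = [2940, 10290]                   # quoted results of equation 1
w_max = 100

# Every session must respect the power boundary.
assert all(session_power(s, power) <= w_max for s in sessions)

total_time, cost = non_sic_cost(session_times, num_tdr=len(sessions))
print(total_time, cost)  # 26460 30460
```

One TDR is counted per session, which matches the statement that cores tested in the same session share a single TDR.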

B. GENETIC ALGORITHM
The test cost reduction problem in 3D stacked IC is considered as NP complete problem as it is equivalent to the multiprocessor scheduling problem which itself is a well-known NP complete problem according to [26]. From the discussion, it is clear that the problem is basically an optimization problem. Again, stacking multiple chips together makes the problem more complicated, so it is hard to achieve the best solution using classical heuristic technique such as Bin Packing. In comparison to classical technique, Genetic Algorithm (GA) based meta heuristic technique performs better because classical method searches for a single point or solution, GA always operates on a whole population of points or solutions. This contributes the robustness of GA, that is, it increases the chance of reaching global optimum and reduces the risk of becoming trapped in a local stationary point. The basic idea of GA is to maintain a population of chromosomes, which represents the potential solutions for a particular problem which evolves according to the Darwin's principle of natural selection and survival of the fittest. GA for a particular problem must have following five components.
a. Chromosome representation using proper encoding, b. Cost function for fitness evaluation, c. Selection operator, d. Crossover operator, e. Mutation operator. Algorithm 3 indicates the proposed Genetic algorithm for calculating test time and test cost of SIC. The different steps taken in Genetic algorithm are described as follows.

1) ENCODING
Proposed encoding scheme for stacked integrated circuit is discussed elaborately in this section. In the proposed encoding scheme cores are assigned some values, the range of the value varies from 1 to n, where n indicates number of cores. Same value assigned to some cores indicates cores are tested in same session with same TDR and different value assigned to different cores indicates cores are tested in series with different TDRs.
A stacked IC consists of multiple chips, where each chip consists of multiple cores. The chromosome representation of a stacked IC is shown in Figure 11, where Chip 1 consists of 3 cores, Chip 2 consists of 2 cores, and the package field consists of 5 cores (the package field represents the package test, in which all 5 (i.e. 3+2) cores are tested). Here, ''1 1 1'' indicates that the three cores $c_1$, $c_2$, $c_3$ are tested in a single session using a single TDR during Chip 1 testing; ''1 1'' indicates that the two cores $c_4$, $c_5$ of Chip 2 are tested in a single session using a single TDR during Chip 2 testing; and ''1 1 1 2 2'' indicates that, during package test, the three cores $c_1$, $c_2$, $c_3$ of Chip 1 are tested in one session and the two cores $c_4$, $c_5$ of Chip 2 are tested in another.
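To make the encoding concrete, the sketch below decodes a chromosome string into sessions per chip and for the package test. It is an illustration under stated assumptions: fields are separated by spaces, each gene is a single digit (so at most 9 session labels), and the names `decode_field` and `decode_chromosome` are hypothetical.

```python
from typing import Dict, List


def decode_field(genes: List[int], cores: List[str]) -> List[List[str]]:
    """Group cores by session label; cores with equal labels share a session (and a TDR)."""
    sessions: Dict[int, List[str]] = {}
    for core, label in zip(cores, genes):
        sessions.setdefault(label, []).append(core)
    return [sessions[label] for label in sorted(sessions)]


def decode_chromosome(chromosome: str, chips: List[List[str]]) -> List[List[List[str]]]:
    """Decode 'chip1 chip2 ... package' fields into a list of sessions per field."""
    fields = chromosome.split()
    if len(fields) != len(chips) + 1:
        raise ValueError("expected one field per chip plus one package field")
    all_cores = [core for chip in chips for core in chip]
    plan = []
    for field, cores in zip(fields, chips + [all_cores]):
        genes = [int(ch) for ch in field]   # assumption: one digit per gene
        if len(genes) != len(cores):
            raise ValueError("field length must equal the number of cores")
        plan.append(decode_field(genes, cores))
    return plan


chips = [["c1", "c2", "c3"], ["c4", "c5"]]
plan = decode_chromosome("111 11 11122", chips)
print([len(field_sessions) for field_sessions in plan])  # [1, 1, 2]
```

The number of sessions in each field equals the number of TDRs needed for that test, so the printed list gives the TDR count for Chip 1, Chip 2 and the package test.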

2) FITNESS FUNCTION
Algorithm 3
    [Start] Initial population is generated randomly.
 5: [Evaluation] Calculate fitness of each chromosome /* using equations 1-2 and equations 6-10 */
 6: for j ← 1, p do
 7:   cost = (cost + cost(x_j)) /* cost(x_j) indicates cost of any j-th test plan */
 8: end for
 9: Fitness f(x_j) = cost(x_j)/cost
10: [New Population] Repeat the following steps until p offspring have been created:
11: Chromosomes are sorted in descending order with respect to f(x_j).
12: [Selection]
13: for j ← 1, p do
14:   Select chromosome with highest f(x_j) value and lowest f(x_j) value for crossover
15: end for
16: [Crossover] Single point crossover is performed
17: [Mutation] A random value p_m is generated and one random position r is selected from chromosome x_j.
18: if p_m = 0 then
19:   r = r + 1.
    [Replace]
26: if Iteration = 10 then
27:   Stop.

Initially, the population is generated randomly. Each chromosome (i.e. $x_j$) in the population is associated with a fitness, which reflects the degree of goodness of the chromosome and determines which chromosomes are used for forming new ones. In this problem the fitness of each chromosome (i.e. $f(x_j)$) is calculated using equations 1-2 and equations 6-10. A set of random chromosomes is considered as the population (see Table 6), and the population is sorted on the basis of fitness value. As discussed earlier, in chromosome ''112 11 11222'' the first field describes the sessions of Chip 1, the second field those of Chip 2, and the last field the package test, in which all cores of Chip 1 and Chip 2 are tested. The overall test time of chromosome ''112 11 11222'' is 14430 (using equation 7), its overall test cost is 35210 (refer to the third row of Table 4), and its fitness $f(x_j)$ (see line number 9 of Algorithm 3) is 0.2049 (i.e. the individual cost divided by the total cost of the population).
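The normalisation used for the fitness (individual cost divided by the total cost of the population) can be sketched as follows. The cost values in the example are hypothetical round numbers chosen for readability, not data from the paper's tables, and the function name `fitness_values` is an assumption.

```python
from typing import List


def fitness_values(costs: List[float]) -> List[float]:
    """Fitness of each chromosome: its test cost divided by the population's total cost."""
    total = sum(costs)
    if total <= 0:
        raise ValueError("total cost of the population must be positive")
    return [c / total for c in costs]


# Hypothetical test costs of a population of four chromosomes.
costs = [400.0, 300.0, 200.0, 100.0]
fitness = fitness_values(costs)
print(fitness)  # [0.4, 0.3, 0.2, 0.1]
```

With this definition the fitness values of a population sum to 1, and a larger fitness corresponds to a larger test cost.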

3) SELECTION
To maintain the diversity of the population, two chromosomes, one with the highest fitness value and another with the lowest fitness value, are selected for crossover in the proposed work.
Here the fitness value corresponds to test cost. According to Table 6, chromosome ''112 11 11222'', with the highest $f(x_j)$ value 0.2049, and chromosome ''112 12 11223'', with the lowest $f(x_j)$ value 0.1962, are selected for crossover.

4) CROSSOVER
In crossover, the features of the parent chromosomes are combined to form two offspring, with the possibility that an offspring is better than both parents if it inherits the best characteristics from each of them. Among various crossover techniques such as uniform, single point and multipoint, single point crossover is used in our proposed work. One example of single point crossover is shown in Figure 12.
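Selection of the highest- and lowest-fitness chromosomes followed by single point crossover can be sketched as below. The two parent chromosomes and their fitness values are the ones quoted in the selection step; the middle population entry, the crossover point of 5 and the helper names are hypothetical, and the paper's own example in Figure 12 may use a different point.

```python
from typing import List, Tuple

Genes = List[int]


def to_genes(chromosome: str) -> Genes:
    """Flatten a chromosome such as '112 11 11222' into a list of genes."""
    return [int(ch) for ch in chromosome if not ch.isspace()]


def select_parents(population: List[Tuple[str, float]]) -> Tuple[str, str]:
    """Pick the chromosomes with the highest and the lowest fitness value."""
    highest = max(population, key=lambda item: item[1])[0]
    lowest = min(population, key=lambda item: item[1])[0]
    return highest, lowest


def single_point_crossover(a: Genes, b: Genes, point: int) -> Tuple[Genes, Genes]:
    """Swap the tails of two equal-length parents after the given point."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(a):
        raise ValueError("crossover point must lie strictly inside the chromosome")
    return a[:point] + b[point:], b[:point] + a[point:]


population = [
    ("112 11 11222", 0.2049),
    ("111 11 11122", 0.2000),   # hypothetical entry, for illustration only
    ("112 12 11223", 0.1962),
]
high, low = select_parents(population)
child_1, child_2 = single_point_crossover(to_genes(high), to_genes(low), point=5)
print(child_1)  # [1, 1, 2, 1, 1, 1, 1, 2, 2, 3]
print(child_2)  # [1, 1, 2, 1, 2, 1, 1, 2, 2, 2]
```

Because both parents have the same field layout (3 + 2 + 5 genes), each child can be split back into Chip 1, Chip 2 and package fields at the same positions.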

5) MUTATION
To prevent premature convergence of the GA to sub-optimal solutions, a mutation operator is applied to one or more genes of selected chromosomes. To perform mutation, a small value $p_m$ is generated randomly. Another random value r is generated, which indicates the position in the chromosome to be changed during mutation. If $p_m$ = 0, r is incremented by 1; otherwise r is left unchanged.

C. RESULT APPLYING GENETIC ALGORITHM
Applying the Genetic algorithm, we obtain the optimized test plan shown in Table 8, where cores $c_1$, $c_2$ of Chip 1 are tested in one session and core $c_3$ of Chip 1 is tested in another session, so 2 TDRs are required to test the three cores of Chip 1. The two cores $c_4$, $c_5$ of Chip 2 are tested in one session, so one TDR is required to test the two cores of Chip 2. Altogether, three TDRs are required to test the two chips during wafer sort. During package test, cores $c_1$, $c_2$ are tested in a single session, $c_3$, $c_4$ are tested in a single session and $c_5$ is tested in another session. The resultant schedule in package test is shown in Table 8. The details of the test time, TDRs and test cost of the resultant test plan are shown in Table 9. According to Table 9, the test cost is 33460, which is less than the test cost 33710 (applying Simulated Annealing in [25]). Therefore we obtain an improved test cost in SIC compared to the earlier reported work in [25]. The time complexity of the Genetic Algorithm cannot be stated in closed form; it depends mainly on two factors, namely the number of generations (iterations) and the population size. In our experiment the population size is restricted to 900 and the number of iterations is restricted to 10. The space complexity depends on the population size.

V. EXPERIMENTAL RESULT
In this section we discuss the experimental results of 2D Bin Packing (for Non SIC) and the Genetic algorithm (for SIC) applied to six representative SOCs, p22810, p93791, g1023, d695, h953 and d281, from the ITC'02 SOC benchmarks, and compare them with the Heuristic Algorithm (HE) and Simulated Annealing (SE) of [25]. The proposed algorithm is developed in C++ and executed on an Intel processor with 4GB RAM.

A. NON SIC
The comparison of the proposed 2D Bin Packing approach for Non SIC with the Heuristic and Simulated Annealing of [25] is shown in Table 10. According to Table 10, test cost reduction is achieved in all benchmarks in comparison to Heuristic [25]. In particular, in benchmarks p93791, d695 and d281, we achieve very good results in comparison to [25].

TABLE 11. Comparison of test cost of SIC (with 2 chips and 3 chips) between Simulated Annealing in [25] and Proposed Genetic Algorithm.
According to Table 10, test cost reduction is achieved in all benchmarks except p22810 in comparison to Simulated Annealing [25]. We conclude that the proposed Bin Packing is more efficient than the Heuristic method proposed in [25] and than Simulated Annealing, except in the case of p22810 (where Simulated Annealing in [25] achieves a better result). Moreover, Simulated Annealing achieves its best test cost at the expense of longer computation time compared to the proposed Bin Packing approach.

B. SIC
As discussed earlier, in a 3D Stacked IC or SIC more than one chip is stacked together. The experimental results of SIC for 2 chips and 3 chips are shown in Table 11. It can be observed that, in the case of SIC with 2 chips, there are $\binom{6}{2}$ (i.e. 15) different ways to test. Among the 15 instances, 5 instances are reported in the literature; however, 10 more instances are possible, and they are shown in Table 11. We achieve better results in all 5 of the reported instances, with the best result in the SIC with chips 5 and 6 (benchmarks h953 and d281 are involved in stacking). Also, we achieve better results in all 3 instances of SIC with 3 chips, and the best result is achieved in the SIC with chips 2, 3 and 4 (benchmarks p93791, g1023 and d695).
SIC with 2 chips is further illustrated in Table 12, where all 15 instances and their total costs are shown. In Table 12, columns 1, 2, 3 and 4 indicate the serial number, the design of the two-chip stacking, the individual test costs and the total test cost respectively. Among all 15 possible stackings with 2 chips, the total cost is minimized if we test 1 and 2 together, then 3 and 4 together and finally 5 and 6 together. The reason is that benchmarks 1 and 2 generate comparatively high test cost and benchmarks 3 and 4 generate comparatively low test cost (see Table 10); when the benchmarks with high test cost are tested together (such as 1, 2), the low cost benchmarks are tested together (such as 3, 4) and the remaining benchmarks (5 and 6) are tested together, the overall test cost is greatly reduced. Therefore, proper test planning with 2 chips per stack can reduce the overall test cost from 1680149 to 1510234.
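The count of 15 two-chip stackings, and the search over ways of grouping six chips into three stacks of two, can be sketched as follows. The cost function used in the example is a hypothetical stand-in (the actual costs are those of Table 12, which are not reproduced here), and the names `pairings` and `best_pairing` are assumptions.

```python
from itertools import combinations
from math import comb
from typing import Callable, Iterator, List, Tuple

Pair = Tuple[int, int]


def pairings(items: List[int]) -> Iterator[List[Pair]]:
    """Yield every way of splitting an even-sized list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def best_pairing(items: List[int], pair_cost: Callable[[Pair], float]) -> List[Pair]:
    """Return the pairing with the minimum total cost under the given pair cost."""
    return min(pairings(items), key=lambda p: sum(pair_cost(pair) for pair in p))


chips = [1, 2, 3, 4, 5, 6]
two_chip_stacks = list(combinations(chips, 2))
print(len(two_chip_stacks), comb(6, 2))   # 15 15

# Hypothetical cost: stacking chips with neighbouring indices is cheapest.
plan = best_pairing(chips, pair_cost=lambda pair: abs(pair[0] - pair[1]))
print(plan)  # [(1, 2), (3, 4), (5, 6)]
```

With real data, `pair_cost` would look up the total test cost of each two-chip stack, and `best_pairing` would return the grouping with the lowest overall cost.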

C. VARIATION OF TEST TIME AND TEST COST WITH POWER
The variation of test time and test cost with different power boundaries is shown in Table 13 for Non SIC, in which columns 1, 2, 3 and 4 indicate the serial number, power boundary, time and cost respectively. It is observed that test time and test cost decrease or remain the same as the power boundary increases.
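The trend can be illustrated with a simplified power-constrained first-fit scheduler. This is only a sketch under stated assumptions: cores in one session are assumed to run concurrently, a session lasts as long as its longest core test, and the session power is the sum of the core powers. The function names, the core list `toy_cores` and its values are hypothetical, and the sketch models neither the 2D Bin Packing of this paper nor its cost model.

```python
def first_fit_sessions(cores, w_max):
    """Group cores into test sessions with first fit.

    cores: list of (name, test_time, power) tuples.
    A core joins the first session whose total power stays within w_max.
    """
    sessions = []
    # Longest tests first, a common first-fit-decreasing ordering (assumed).
    for name, t, power in sorted(cores, key=lambda c: c[1], reverse=True):
        if power > w_max:
            raise ValueError(f"core {name} alone exceeds the power boundary")
        for s in sessions:
            if s["power"] + power <= w_max:
                s["cores"].append(name)
                s["power"] += power
                s["time"] = max(s["time"], t)
                break
        else:
            # No existing session has enough power headroom: open a new one.
            sessions.append({"cores": [name], "power": power, "time": t})
    return sessions


def total_test_time(cores, w_max):
    """Sessions run one after another, so their durations add up."""
    return sum(s["time"] for s in first_fit_sessions(cores, w_max))


# HYPOTHETICAL cores: (name, test time, test power).
toy_cores = [("A", 100, 40), ("B", 80, 30), ("C", 60, 30),
             ("D", 40, 20), ("E", 20, 10)]
times = [total_test_time(toy_cores, w) for w in (50, 70, 100, 130)]
print(times)
```

For this toy input the total test time is 240, 160, 140 and 100 for power boundaries 50, 70, 100 and 130, so it never increases as the boundary is relaxed. The sketch does not prove that a first-fit heuristic behaves this way on every input.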

D. VARIATION OF TEST COST WITH NUMBER OF ITERATIONS
The variation of test cost with the number of iterations is shown in Figures 13, 14 and 15. During our experiments we observe that test cost gradually decreases from generation to generation, which indicates that the algorithm is robust and that the quality of the solution set is improving. However, test cost becomes stable after iteration 10 in most of the SICs where the stacking designs are (1,2), (2,3) and (3,4). Hence 10 is taken as the maximum number of iterations for our experiments.

E. VARIATION OF TEST COST WITH MUTATION
Since mutation is the key to changing the region of the search space being explored, the mutation probability has an impact on finding solutions of good quality. With the crossover probability (1-point crossover) fixed, the population size fixed at 900 and the number of iterations fixed at 10, the mutation probability is varied from 0.05 to 0.8. The resulting test costs are shown in Figures 16, 17 and 18, and a similar trend is observed in the SICs where the stacking designs are (1,2), (2,3) and (3,4). In our experiments, increasing the mutation probability leads to increasing test cost, so we restrict the mutation probability to 0.05.
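A minimal sketch of a per-gene mutation operator makes the role of the mutation probability concrete. The chromosome encoding used here (gene i holds the index of the test session assigned to core i) is an assumption made for illustration, as are the names `mutate`, `p_mut` and `n_sessions`; the operator actually used in this work may differ.

```python
import random


def mutate(chromosome, p_mut, n_sessions, rng):
    """Return a copy in which each gene is resampled with probability p_mut.

    ASSUMED encoding: gene i is the session index (0 .. n_sessions - 1)
    assigned to core i.
    """
    return [
        rng.randrange(n_sessions) if rng.random() < p_mut else gene
        for gene in chromosome
    ]


rng = random.Random(0)  # seeded only to make the sketch repeatable
parent = [0, 0, 1, 1, 2, 2, 3, 3]

unchanged = mutate(parent, 0.0, 4, rng)   # p_mut = 0 never alters a gene
low = mutate(parent, 0.05, 4, rng)        # the setting retained above
high = mutate(parent, 0.8, 4, rng)        # largest value explored above
print(unchanged, low, high)
```

In this sketch a probability of 0.05 resamples on average one gene in twenty and so keeps most of a good schedule intact, whereas 0.8 resamples four genes in five and behaves close to a random restart. That is one plausible reading of why higher mutation probabilities gave higher test cost.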

F. VARIATION OF TEST COST WITH CROSSOVER
With the mutation probability fixed at 0.05, the population size at 900 and the number of iterations at 10, the test cost difference between 2-point and 1-point crossover is shown in Figure 19. Here a positive value indicates that 1-point crossover produces a lower test cost than 2-point crossover. According to Figure 19, it is clear that the best test cost reduction is achieved for small chips (for example, the SIC design in which chips 5 and 6 are involved).
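The two crossover variants compared here can be sketched as follows. The list-based chromosome and the explicit cut positions passed as arguments are assumptions for illustration; in a full GA the cut points would normally be drawn at random.

```python
def one_point_crossover(a, b, cut):
    """Swap the tails of two equal-length parents after position `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def two_point_crossover(a, b, lo, hi):
    """Swap only the middle segment [lo, hi) between two parents."""
    return a[:lo] + b[lo:hi] + a[hi:], b[:lo] + a[lo:hi] + b[hi:]


# Parents of all zeros and all ones make the exchanged segments visible.
p1 = [0, 0, 0, 0, 0, 0]
p2 = [1, 1, 1, 1, 1, 1]

c1, c2 = one_point_crossover(p1, p2, 2)
d1, d2 = two_point_crossover(p1, p2, 2, 4)
print(c1, c2)
print(d1, d2)
```

With a cut at position 2, 1-point crossover yields `[0, 0, 1, 1, 1, 1]` and `[1, 1, 0, 0, 0, 0]`; with cuts at 2 and 4, 2-point crossover yields `[0, 0, 1, 1, 0, 0]` and `[1, 1, 0, 0, 1, 1]`.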

G. COMPARISON OF EXECUTION TIME
The CPU times taken to execute the SA in [25] and the proposed GA are shown in Table 14 and Figure 20. In Table 14, columns 1, 2 and 3 indicate the number of chips, the execution time of SA [25] and the execution time of GA respectively. It can be observed that Simulated Annealing requires considerably longer computation time than the Genetic Algorithm to arrive at the desired test plan.

VI. CONCLUSION
DFT hardware and test time play a crucial role in increasing the test cost of SICs. Test time can be reduced by testing the cores concurrently, but this demands higher power, so the power constraint plays a significant role. To reduce the test cost, co-optimization of test time and DFT hardware under a given power boundary is therefore required. In our method we have considered core-based Non SIC and SICs based on the IEEE 1149.1 test architecture standard as the systems under test. Along with an existing cost model, a First fit based 2D Bin Packing algorithm is applied to minimize the test cost by properly scheduling the cores of SICs while meeting power constraints. In the case of Non SIC, where the same test schedule is applied during wafer sort and package test, cores are placed together in a bin only if their concurrent tests do not exceed the power boundary w_max.
However, it has been observed that the 2D Bin Packing approach is not efficient for large cores, and its performance therefore becomes worse when multiple chips are stacked together. It is consequently hard to achieve the best solution using the 2D Bin Packing based optimization technique. Thus a Genetic Algorithm based metaheuristic approach is applied for SIC, where each chip is tested individually during wafer sort and jointly during package test.
The test cost is minimized by efficiently applying selection, crossover and mutation operators to the population. The algorithm is also designed so that it can be applied to SICs with any number of chips and for any power boundary.