Thermal management implications of heterogeneous integrated packaging - FAST TURN PCB

As the semiconductor industry has moved to smaller process nodes, silicon designers have struggled to keep Moore’s Law delivering the gains of previous generations. Growing the die size of a system-on-chip (SoC) design is no longer economically feasible. Decomposing a single SoC into dedicated dies, called chiplets, offers significant advantages in cost, yield, and performance. Chiplets let manufacturers shrink the process node only for the components that benefit, while keeping other components on a more economical node. Several power-related factors also come into play. Heterogeneous integration thus provides an alternative path to keeping up with Moore’s Law without relying solely on traditional node scaling. However, heterogeneous packages pose greater thermal management challenges than traditional monolithic structures because of their increased power density, physical size, and geometry.

Heterogeneous integration allows components with different process nodes and functions to be packaged into a single module. There are several approaches, including but not limited to the multi-chip module (MCM), system-in-package (SiP), 2.5D silicon interposer with through-silicon vias (TSVs), and high-density fan-out (HDFO). Some of these technologies have been around for a long time, but only recently has it become popular to decompose large monolithic SoC dies into smaller sub-components, or chiplets, that are packaged into a single module. This type of heterogeneous packaging allows companies to keep pace with the scaling and economics of Moore’s Law. The benefits of heterogeneous packaging are numerous. First, designers are no longer limited to a single node. Individual functions can stay on an older node when migrating them to the latest node does not make economic sense. Second, smaller dies allow more dies per wafer, with less wasted area around the wafer edge. Yield also improves because a defective sub-component no longer forces rejection of the entire SoC. While there are many applications for heterogeneous packaging, this blog post focuses specifically on high performance computing (HPC).
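The wafer-utilization and yield argument above can be illustrated with a rough calculation. The following is a minimal sketch, not a foundry model: it uses a common first-order dies-per-wafer estimate and the simple Poisson yield model, and the die areas and the defect density `D0` are illustrative assumptions that do not come from the article. It also ignores assembly yield and the cost of testing chiplets before packaging.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: gross wafer area term minus an edge-loss term."""
    d = wafer_diameter_mm
    gross = math.pi * (d / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * d / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


WAFER_MM = 300.0
D0 = 0.1  # assumed defect density in defects/cm^2 (illustrative only)

# Assumed die sizes: one large monolithic SoC vs. one of four chiplets.
for label, area in (("monolithic SoC", 600.0), ("chiplet", 150.0)):
    count = dies_per_wafer(WAFER_MM, area)
    y = poisson_yield(area, D0)
    good_area = count * area * y
    print(f"{label}: {count} dies/wafer, yield {y:.1%}, "
          f"good silicon {good_area:.0f} mm^2 per wafer")
```

Under these assumptions the smaller die both packs the wafer edge more efficiently and loses less silicon to each defect, so the area of good silicon per wafer rises noticeably.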


Power consumption in semiconductors

Complementary metal oxide semiconductor (CMOS) devices dissipate power in three main ways: dynamic power, short-circuit dissipation, and leakage. Dynamic power is dissipated by switching activity within the circuit, as load capacitances are charged and discharged. Historically, this mechanism has been the largest source of power consumption in CMOS devices, but leakage current has come to play a more important role at smaller nodes. For advanced nodes at 65 nm and below, leakage power has grown to 50% of the total power consumption of the device.
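As a rough illustration of these contributions, the sketch below uses the standard first-order expressions for dynamic power (activity factor × switched capacitance × supply voltage squared × clock frequency) and leakage power (supply voltage × leakage current). Short-circuit dissipation is left out for simplicity, and every numeric value is an assumption chosen only to show the arithmetic, not data from the article.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def leakage_power_w(vdd_v: float, leakage_current_a: float) -> float:
    """Static power drawn by leakage current: P = Vdd * I_leak."""
    return vdd_v * leakage_current_a


# Illustrative (assumed) values for a large, advanced-node die.
ALPHA = 0.1           # average switching activity factor
C_TOTAL_F = 100e-9    # total switchable capacitance in farads
VDD_V = 1.0           # supply voltage in volts
F_CLK_HZ = 3e9        # clock frequency in hertz
I_LEAK_A = 30.0       # total leakage current in amperes

p_dyn = dynamic_power_w(ALPHA, C_TOTAL_F, VDD_V, F_CLK_HZ)
p_leak = leakage_power_w(VDD_V, I_LEAK_A)
share = p_leak / (p_dyn + p_leak)
print(f"dynamic {p_dyn:.1f} W, leakage {p_leak:.1f} W, "
      f"leakage share {share:.0%}")

# Dynamic power scales with Vdd squared, so a 20% voltage reduction
# cuts switching power to roughly 64% of its original value.
ratio = dynamic_power_w(ALPHA, C_TOTAL_F, 0.8, F_CLK_HZ) / p_dyn
print(f"dynamic power at 0.8 V relative to 1.0 V: {ratio:.2f}")
```

The quadratic dependence on supply voltage is why voltage scaling has been the main lever on dynamic power, while leakage has to be addressed through device and process choices.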

Power consumption is widely considered the biggest limiting factor in chip design. Although more and more cores can be packed onto a single silicon die, thermal limits make it infeasible to run them all at full performance at the same time. Instead, cores must be throttled or disabled to prevent overheating. This unusable silicon, often referred to as “dark silicon”, consists of areas that must be deactivated for thermal reasons, and it keeps the overall device from reaching its full performance and efficiency. Power consumption is not a problem that is going away. Even as the energy efficiency of each computation in silicon continues to improve, overall package power density keeps rising. Rising package power density requires careful design and material selection to optimize thermal performance.

Source: Intel

Power density trend

Before the rise of hyperscale data centers and artificial intelligence (AI) computing, power densities of 3-5 kW per rack were considered the norm. At this level, the chips in a rack can be cooled with air-cooled heat sinks. The heat sinks reject heat into the air of the data center aisles, and chillers or refrigeration units eventually extract it. Today, AI and other new HPC applications demand more power per chip, with some chips exceeding 500 W. Cooling these racks by moving air is no longer effective, or even feasible, at standard rack sizes. In fact, data center rack power density is expected to keep rising and to reach 15 to 30 kW per rack in the near future. This level of power density requires other forms of cooling.
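A simple energy balance shows why moving air stops scaling at these rack powers. The sketch below estimates the volumetric airflow needed to carry a given heat load from the relation airflow = power / (air density × specific heat × air temperature rise). The air properties are approximate room-temperature values, and the 15 K inlet-to-outlet temperature rise is an assumption for illustration.

```python
AIR_DENSITY_KG_M3 = 1.2    # approximate density of air near room temperature
AIR_CP_J_KG_K = 1005.0     # approximate specific heat of air
M3S_TO_CFM = 2118.88       # cubic meters per second to cubic feet per minute


def required_airflow_m3s(power_w: float, delta_t_k: float) -> float:
    """Airflow needed to remove power_w with an air temperature rise of delta_t_k."""
    return power_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K * delta_t_k)


DELTA_T_K = 15.0  # assumed air temperature rise across the rack

for rack_kw in (5, 15, 30):
    flow = required_airflow_m3s(rack_kw * 1000.0, DELTA_T_K)
    print(f"{rack_kw:>2} kW rack: {flow:.2f} m^3/s "
          f"(about {flow * M3S_TO_CFM:.0f} CFM)")
```

Because the required airflow grows linearly with power, a 30 kW rack needs six times the air of a 5 kW rack through the same physical opening, which drives up fan power, noise, and pressure drop.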

Many advanced cooling solutions are being developed today. The ideal cooling solution can be deployed on existing infrastructure without radically changing the data center environment. Heat pipes and vapor chambers use phase-change heat transfer in closed loops to achieve effective thermal conductivities significantly higher than those of copper or aluminum. These technologies are widely used today, but they face the same limits once they are built into heat sinks that must themselves be cooled by flowing air. The next advance in cooling is liquid cooling, which comes in two forms: indirect cooling using cold plates, or direct cooling by immersion. The latter is the more exotic form.

Google Tensor Processing Unit (TPUv3) AI machine learning board with liquid cold plate.
Cray’s Shasta direct liquid cooling system was used on the first exascale supercomputer.

In immersion cooling systems, the equipment is in direct contact with a dielectric coolant. Depending on the coolant and configuration, immersion systems can operate with either a single-phase or a two-phase liquid. Two-phase immersion cooling has the advantage of holding every device in the coolant bath at a constant temperature, but these systems are more challenging to implement than single-phase systems. Immersion cooling requires a completely different data center environment because almost the entire rack must be sealed to contain the coolant. Because this is so different from today’s practices, significant hurdles must be overcome to make immersion cooling economically viable. Even so, there is a lot of interest in both direct and indirect liquid cooling. The Open Compute Project (OCP) has two projects focused on developing standardized solutions for immersion and cold plate cooling. The mainstream air cooling solutions widely used today will not support the future needs of HPC and AI.

Two phase immersion cooling system. (Source: AnandTech — Gigabyte)

Thermal challenges of heterogeneous packaging

From a heat conduction point of view, there is little difference between most forms of heterogeneous packaging. Stacking dies or components on top of high-power dies is uncommon in high-power packages, so this discussion considers only 2.5D or MCM-style designs. Almost all of these configurations share the same basic heat flow path through the top of the package: from the junction, heat conducts through the silicon and the thermal interface material (TIM), then into the heat spreader, and finally dissipates into the system cooling solution. However, many packaging options for heterogeneous integration have their own process and physical characteristics that indirectly affect thermal performance through package warpage and its effect on the thermal interface materials.

The heat spreader

For most semiconductor packages, a heat spreader (lid) improves thermal performance while also protecting the silicon and controlling warpage. In some cases, however, exposing the silicon directly to the system cooling solution can give better thermal performance than a lidded package. When the package is attached with a very low resistance TIM II (the thermal interface material between the package and the system heat sink) to a high-performance heat sink (such as direct liquid cooling), very little heat actually spreads laterally within the lid. In this case the heat travels mainly straight up from the silicon, so it can be beneficial to remove the thermal resistance of the heat spreader and TIM I from this heat flow path. However, bare silicon is not without its own challenges.

In the alternative case of a high-resistance TIM II and a lower-performance heat sink, such as a simple air-cooled aluminum heat sink, a heat spreader typically provides a thermal advantage because the heat spreads over a larger area before leaving the package. The higher the thermal resistance of the system cooling solution, the more the heat spreads within the package. Since thermal resistance is a function of area, transferring heat to the TIM II and heat sink over a larger area effectively “lowers” their resistance. Heterogeneous packages usually have large differences in power density across the total package area, which makes the potential advantage of a heat spreader greater still. In addition, the greater the temperature gradient throughout the package,

Package with integrated heat spreader
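The trade-off between a bare die and a lidded package can be sketched with a one-dimensional resistance estimate. The model below is deliberately idealized: it assumes heat spreads uniformly over the full lid before crossing TIM II, it charges the lid's conduction resistance over the die footprint only, and all dimensions and material values are assumptions made for illustration. Real packages need a spreading-resistance or finite-element analysis.

```python
DIE_AREA_M2 = 20e-3 * 20e-3    # assumed 20 mm x 20 mm die
LID_AREA_M2 = 50e-3 * 50e-3    # assumed 50 mm x 50 mm heat spreader

# Specific resistances (thickness / conductivity) in K*m^2/W, assumed values.
TIM1_SPECIFIC = 50e-6 / 4.0    # 50 um polymer TIM I at 4 W/(m*K)
LID_SPECIFIC = 2e-3 / 390.0    # 2 mm copper lid at 390 W/(m*K)


def die_to_heat_sink_resistance(tim2_specific: float, lidded: bool) -> float:
    """Resistance in K/W from the top of the die to the heat sink base."""
    if not lidded:
        # Bare die: TIM II only, but over the small die footprint.
        return tim2_specific / DIE_AREA_M2
    r_tim1 = TIM1_SPECIFIC / DIE_AREA_M2
    r_lid = LID_SPECIFIC / DIE_AREA_M2      # conservative: no spreading credit
    r_tim2 = tim2_specific / LID_AREA_M2    # idealized: full lid area is used
    return r_tim1 + r_lid + r_tim2


CASES = {
    "low-resistance TIM II (e.g. liquid cold plate)": 5e-6,   # assumed
    "high-resistance TIM II (e.g. air-cooled sink)": 1e-4,    # assumed
}

for label, tim2 in CASES.items():
    bare = die_to_heat_sink_resistance(tim2, lidded=False)
    lid = die_to_heat_sink_resistance(tim2, lidded=True)
    better = "bare die" if bare < lid else "lidded"
    print(f"{label}: bare {bare:.3f} K/W, lidded {lid:.3f} K/W -> {better}")
```

With these assumed numbers the bare die wins when TIM II is very good, and the lid wins when TIM II is poor, which matches the qualitative argument in the text.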

Heterogeneous packaging also brings the challenge of components within a single package that differ in height, whether because of manufacturing variability or simply because they are different types of components (e.g., chiplets versus high bandwidth memory (HBM) modules). Integrated heat spreaders can be manufactured with different cavity depths to compensate for these height differences. When considering the tolerance stack-up for stacked dies, it is critical to maintain the minimum TIM bond line on the highest-power dies. The heat spreader cavity design should therefore take this into account.

Thermal interface material

For most high-performance computing scenarios, more than 95% of the device's total power dissipates through the top of the package into the system-level cooling solution. In a package (excluding 3D), the only components along this path are silicon, thermal interface material, and copper (the heat spreader material); a bare-die package has only the silicon. Since silicon is the required semiconductor and copper is already one of the best thermal conductors, the only real variable in material selection is the thermal interface material. Although the thermal interface material is at least an order of magnitude thinner than the silicon and the heat spreader, it typically contributes more than 50% of the thermal resistance along the path.
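A quick per-unit-area estimate shows how such a thin layer can dominate the path. The sketch below treats the silicon, TIM I, and copper lid as a one-dimensional series stack, where each layer contributes thickness divided by thermal conductivity. The thicknesses and conductivities are typical-order assumptions rather than values from the article, and contact resistances and lateral spreading are ignored.

```python
# (layer name, thickness in meters, thermal conductivity in W/(m*K))
# All values are assumed, typical-order numbers for illustration only.
LAYERS = [
    ("silicon die", 775e-6, 120.0),
    ("TIM I (polymer)", 50e-6, 3.0),
    ("copper heat spreader", 2e-3, 390.0),
]


def specific_resistance(thickness_m: float, conductivity: float) -> float:
    """One-dimensional conduction resistance per unit area, in K*m^2/W."""
    return thickness_m / conductivity


resistances = {name: specific_resistance(t, k) for name, t, k in LAYERS}
total = sum(resistances.values())

for name, r in resistances.items():
    print(f"{name:<22} {r * 1e6:6.2f} K*mm^2/W  ({r / total:.0%} of path)")

tim_share = resistances["TIM I (polymer)"] / total
```

Even though the TIM I layer in this example is more than ten times thinner than the silicon, its low conductivity makes it the largest single contributor to the stack.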

The choice of TIM I is critical for high-power packaging. The material not only needs low thermal resistance, but must also withstand the conditions the package experiences during assembly and over its service life. When the device heats up or cools down during reflow or operation, the TIM is subjected to considerable stress from the mismatch in coefficient of thermal expansion (CTE) among the copper, silicon, and organic materials within the package. The ability to maintain adhesion and cohesion through these stress cycles is as important as the bulk thermal conductivity. Balancing these properties is challenging, and so far the available materials tend to sit at opposite ends of the spectrum. Gel and grease type TIMs consist of a polymer matrix filled with conductive particles, such as aluminum or silver. These materials have low elastic moduli, but their thermal conductivity is still low compared to metals. Metal solder TIMs (e.g., indium) provide very high thermal conductivity at the expense of a very high modulus, which challenges the TIM’s processability and reliability.

Heterogeneous packaging provides a unique environment for the TIM. Depending on the package, the TIM may contact not only multiple components but also multiple material types. In addition, the TIM may be subjected to different stresses than it would be on one large monolithic die. One benefit of heterogeneous packaging is that different TIMs can be used on different components. For example, central processing unit (CPU) dies can have a high-performance TIM, while low-power HBM modules can use an adhesive TIM to reduce package warpage.

The TIM resistance is a function of its thickness, its thermal conductivity, and the contact resistance at its interfaces. By nature, heterogeneous packages are usually very large, which means a large TIM surface area. The TIM's thickness is several orders of magnitude smaller than the dimensions of its contact area. As a result, the bulk conductivity of the material plays a relatively small role in the total thermal resistance of the TIM. Thus, although advanced metal solder TIMs have extremely high thermal conductivity relative to polymer-based materials, their thermal benefit is only incremental over the large surface area of heterogeneous package types. In addition, because such large packages warp, these high-modulus metal TIMs are subjected to a great deal of stress.
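The relationship described above can be written as a bulk term plus a contact term, both divided by area. The sketch below is a simplified model with assumed values: the same bond line thickness and the same contact resistance are used for both materials, which will not hold in practice (solder TIMs are usually applied thicker and wet the surfaces differently), so the numbers only illustrate the trend.

```python
def tim_resistance(bond_line_m: float, conductivity: float,
                   area_m2: float, contact_specific: float) -> float:
    """TIM resistance in K/W: bulk term t/(k*A) plus contact term r_c/A."""
    bulk = bond_line_m / (conductivity * area_m2)
    contact = contact_specific / area_m2
    return bulk + contact


BOND_LINE_M = 50e-6        # assumed bond line thickness
CONTACT_SPECIFIC = 5e-6    # assumed total contact resistance, K*m^2/W
K_POLYMER = 4.0            # assumed polymer TIM conductivity, W/(m*K)
K_SOLDER = 80.0            # assumed indium-type solder conductivity, W/(m*K)

AREAS = {
    "small die (10 mm x 10 mm)": 10e-3 * 10e-3,
    "large package (50 mm x 50 mm)": 50e-3 * 50e-3,
}

gains = {}
for label, area in AREAS.items():
    r_poly = tim_resistance(BOND_LINE_M, K_POLYMER, area, CONTACT_SPECIFIC)
    r_sold = tim_resistance(BOND_LINE_M, K_SOLDER, area, CONTACT_SPECIFIC)
    gains[label] = r_poly - r_sold
    print(f"{label}: polymer {r_poly:.4f} K/W, solder {r_sold:.4f} K/W, "
          f"gain {gains[label]:.4f} K/W")
```

Under these assumptions the absolute resistance saved by switching to solder is far smaller on the large package than on the small die, while the stress on the high-modulus material grows with package size.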

Detailed view of polymer thermal interface material containing particles.

Component and chiplet placement

The first line of defense in thermal management is the silicon itself. Silicon has relatively high thermal conductivity and is excellent at mitigating hot spots. Because heterogeneous packages decompose functions into individual components, they lose the heat-spreading advantage of one large piece of silicon. However, this can actually benefit thermal performance because the heat-generating components are spread apart, which reduces thermal crosstalk between them.

Thermally aware placement of components and chiplets offers significant opportunities for thermal optimization of the package. Chip and package designers should carefully weigh the electrical and thermal trade-offs of component placement, especially when high power is involved. Where possible, high-power components should be spread out to distribute power more evenly across the package area. However, the edges and corners of the package are limited in their ability to spread heat, so regions of high power density should not sit too close to the perimeter.

System integration

Heterogeneous packages can include more silicon area without being limited by the reticle size. As a result, their overall body size also tends to grow. MCM packages larger than 70 mm x 70 mm are not uncommon today. This relatively large package size presents challenges for system-level integration with TIM II and heat sinks. To keep the thermal resistance at the TIM II interface adequately low, pressure must be applied. Over the surface area of a large MCM, considerable force is required to meet this pressure requirement. This creates stress not only on the package, but also on the system motherboard. High forces may require additional reinforcement of the motherboard and/or heat sink mounting hardware, driving up costs. If sufficient pressure is not applied to the TIM II, the thermal performance of the device will degrade. This problem demonstrates another benefit of using heat spreaders in large heterogeneous packages: the heat spreader can compensate for variability in the heat sink and the TIM II application, giving more consistent thermal performance.
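The mounting force follows directly from force = pressure × area. The sketch below compares a 40 mm package with a 70 mm package; the 30 psi TIM II pressure target is an assumed figure for illustration and is not taken from the article or from any TIM datasheet.

```python
PSI_TO_PA = 6894.757     # pascals per psi
N_PER_KGF = 9.80665      # newtons per kilogram-force


def clamp_force_n(side_mm: float, pressure_psi: float) -> float:
    """Force in newtons to hold pressure_psi over a square lid of side side_mm."""
    area_m2 = (side_mm / 1000.0) ** 2
    return pressure_psi * PSI_TO_PA * area_m2


TARGET_PRESSURE_PSI = 30.0  # assumed TIM II pressure requirement

for side in (40.0, 70.0):
    force = clamp_force_n(side, TARGET_PRESSURE_PSI)
    print(f"{side:.0f} mm x {side:.0f} mm package: {force:.0f} N "
          f"(about {force / N_PER_KGF:.0f} kgf)")
```

Because force scales with the square of the package side, going from 40 mm to 70 mm roughly triples the load that the motherboard and mounting hardware must carry.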

Impact of package-level thermal performance enhancement

As discussed in the previous section, the package contributes a growing share of the total system thermal resistance as system-level cooling solutions advance. There are several ways to take advantage of a thermally enhanced package. At the most basic level, an improvement in package thermal resistance reduces the junction temperature correspondingly. A general rule of thumb for semiconductor devices is that operating life is halved for every 10°C increase in junction temperature. The theoretical operating life of a device can therefore be increased significantly by lowering the junction temperature through package-level thermal enhancement.
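The rule of thumb can be expressed as a lifetime multiplier of 2 raised to the power of (temperature reduction / 10°C). The short sketch below applies it to a few temperature changes. It encodes only the rule of thumb quoted above, not a physics-based reliability model, and the function name is chosen here for illustration.

```python
def relative_lifetime(delta_tj_c: float) -> float:
    """Lifetime multiplier for a junction temperature change of delta_tj_c.

    Rule of thumb: life halves for every 10 C increase, so a negative
    delta (a cooler junction) gives a multiplier greater than 1.
    """
    return 2.0 ** (-delta_tj_c / 10.0)


for delta in (10.0, -5.0, -10.0, -15.0):
    print(f"junction temperature change {delta:+.0f} C -> "
          f"lifetime x{relative_lifetime(delta):.2f}")
```

Under this rule, even a 5°C reduction from a better package extends the theoretical life by roughly 40%.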

Alternatively, the thermally enhanced package can operate at higher power, because the lower package resistance lets the same junction temperature be maintained under an additional thermal load. Since junction temperature typically limits chip performance, this is an obvious way to exploit a package with improved thermal resistance.

A package with features added to enhance its thermal performance, run at the original power, still emits the same amount of heat as the original design, but the temperature difference between the junction and the ambient is reduced. At the system level, reducing the thermal resistance of the package has many benefits. Instead of lowering the junction temperature, it is possible to raise the ambient temperature or relax the cooling solution while keeping the original junction temperature. In air-cooled data centers, this can yield significant cost savings.
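These three options all follow from the steady-state relation junction temperature = ambient temperature + power × junction-to-ambient thermal resistance. The sketch below works through each option with assumed values for power, ambient temperature, and the package and system resistances; none of the numbers come from the article.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_ja: float) -> float:
    """Steady-state junction temperature: Tj = Ta + P * theta_ja."""
    return ambient_c + power_w * theta_ja


# Assumed values for illustration only.
THETA_SYSTEM = 0.05         # system cooling resistance, K/W
THETA_PKG_ORIGINAL = 0.07   # original package resistance, K/W
THETA_PKG_ENHANCED = 0.05   # thermally enhanced package resistance, K/W
POWER_W = 500.0
AMBIENT_C = 35.0

theta_old = THETA_PKG_ORIGINAL + THETA_SYSTEM
theta_new = THETA_PKG_ENHANCED + THETA_SYSTEM
tj_old = junction_temp_c(AMBIENT_C, POWER_W, theta_old)

# Option 1: same power and ambient, lower junction temperature.
tj_new = junction_temp_c(AMBIENT_C, POWER_W, theta_new)

# Option 2: same junction temperature and ambient, higher power.
power_new = (tj_old - AMBIENT_C) / theta_new

# Option 3: same junction temperature and power, warmer ambient.
ambient_new = tj_old - POWER_W * theta_new

print(f"original junction temperature: {tj_old:.0f} C")
print(f"option 1, cooler junction:     {tj_new:.0f} C")
print(f"option 2, higher power:        {power_new:.0f} W")
print(f"option 3, warmer ambient:      {ambient_new:.0f} C")
```

In this example a 0.02 K/W package improvement can be spent as a 10°C cooler junction, 100 W of extra power, or a 10°C warmer ambient, and the third option is where the data center cost savings come from.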

