Shimeng Yu, Yuan-Chun Luo, Tae-Hyeon Kim, Omkar Phadke
Compute-in-memory (CIM) has emerged as a compelling approach to address the ever-increasing demand for energy-efficient computing for edge artificial intelligence (AI) applications. Nonvolatile capacitive synapses have been recently proposed to further boost the energy and area efficiency of CIM hardware with the following attractive features when compared with resistive synapses: negligible static power consumption, selectorless access, low interconnect voltage drop and exemption from sneak-path leakage, limited read disturb owing to the small-signal read manner, and great 3D stacking potential. In this article, the operating principle of the capacitive crossbar array for neural network inference is first explained, followed by a survey of state-of-the-art nonvolatile capacitive synapses that are mostly implemented by the ferroelectric devices with asymmetric capacitance–voltage characteristics. Finally, the design guidelines on capacitive device parameters are discussed toward optimizing the array-level performance metrics.
The conventional computing paradigm places processing units and memories separately, so data loading and storing become the bottleneck of the overall system’s throughput and energy efficiency. The situation has deteriorated as the data size in modern AI algorithms has increased significantly in recent years, resulting in more congestion and overhead in data movement. In contrast, CIM [1], as a new computing paradigm, performs the processing in the same physical location where the data are stored, thereby significantly reducing the data transfer latency and power consumption. The overall operations are performed inside the memory arrays, possibly in the mixed-signal mode with high input/output parallelism.
CIM specializes in accelerating the multiply-and-accumulate (MAC) function for vector–matrix multiplication, which is the dominant operation in modern AI algorithms. The operating principle of CIM involves one matrix (e.g., synaptic weights of a neural network), typically stored as the conductance of the memory cells inside the memory arrays, and an input vector (e.g., input feature map of a neural network), represented by wordline (WL) voltages as shown in Figure 1. With WL voltages as the input vector, the resulting current from each cell becomes an entry of a dot product. These currents/charges are automatically summed up at the end of the memory array as the weighted sum of the MAC operation. The bitline (BL) currents are later converted into the digital outputs by the following analog-to-digital converter (ADC).
Figure 1. The operating principle of CIM with resistive and capacitive synapses.
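As a minimal sketch of the MAC operation described above, the following Python snippet computes the BL currents of an ideal resistive crossbar as the sum over rows of WL voltage times cell conductance. The function name, the 0.2 V read voltage, and the conductance values are illustrative assumptions rather than values from a reported macro, and wire resistance and sneak paths are ignored.

```python
def crossbar_mac(v_wl, g):
    """Ideal crossbar MAC: returns one summed current per bitline (column).

    v_wl : list of wordline voltages (V), one per row
    g    : list of rows, each a list of cell conductances (S)
    """
    n_cols = len(g[0])
    return [sum(v_wl[i] * g[i][j] for i in range(len(v_wl)))
            for j in range(n_cols)]


# Hypothetical 3 x 2 array: binary inputs encoded as 0 V or 0.2 V read pulses,
# weights stored as assumed conductances of 100 uS (on) or 1 uS (off).
v_in = [0.2, 0.0, 0.2]
g_cells = [[100e-6, 1e-6],
           [100e-6, 100e-6],
           [1e-6, 100e-6]]
i_bl = crossbar_mac(v_in, g_cells)  # column currents in amperes
```

In a capacitive array the same structure applies, with capacitances in place of conductances and charges in place of currents.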
Nonvolatile tunable conductance is an essential element that represents weight in a CIM system. It can be achieved by various types of resistive memories, such as resistive random access memory (RRAM) [2], phase-change memory (PCM) [3], spin-transfer-torque magnetic random access memory (STT-MRAM) [4], and ferroelectric field-effect transistor (FeFET) [5]. Although CIM prototype macros have been experimentally demonstrated with resistive memories [6], [7], several grand challenges exist for large-scale implementations. First, the low-resistance-state resistance (RLRS) of the resistive memories is typically only a few kΩ, so the array draws a large static current and power during read, especially when multiple WLs must be turned on simultaneously for parallel MAC operations. Second, the finite cell resistance leads to an interconnect voltage (IR) drop along the WL/BL wires: because the wire resistance (Rwire) increases at advanced technology nodes, a significant portion of the applied voltage drops across the WL/BL wires instead of fully across the selected memory cell. This introduces errors into the weighted sum because the effective input voltage is nonideal for both read and write operations. Third, the finite cell resistance allows sneak-path currents to flow within the crossbar array, which add further errors to the summed current at the end of the BL. Fourth, the nonzero read voltage of the resistive memories could gradually drift the stored synaptic weights, causing a read disturb issue after consecutive read operations. Hence, a periodic refresh operation with energy overhead may be required to restore the memory states after a certain number of read operations.
Finally, the access transistor for the resistive memories in the 1-transistor-1-resistor (1T1R) configuration may need to accommodate the high write currents; thus, it is usually sized up from the minimum width, which decreases the area efficiency of the memory arrays. In addition, the access transistor is generally implemented in the front-end-of-line (FEOL) silicon substrate, limiting the integration of resistive memories at the back-end-of-line (BEOL) to only one tier.
To overcome these challenges of resistive memories, nonvolatile capacitive memories for CIM designs have recently been proposed [9], [10], [11], [12], [13], [14], [15], [16]. The capacitive synapse uses a tunable small-signal capacitance instead of a conductance as the synaptic weight. The operating principle of the capacitive crossbar array shares many similarities with that of its resistive counterpart, as shown in Figure 1. The primary difference is that the resistive synapses compute in the current domain, while the capacitive synapses compute in the charge domain. The capacitive read-out process is typically implemented in two phases, as illustrated in Figure 2. Phase I charges each capacitor along each column by raising the WL voltages whose corresponding inputs are nonzero. Here a WL voltage as small as about 100 mV, which is well below the coercive voltage of the ferroelectric layer and therefore allows nondestructive reads, can be used. The resulting charge stored on each nonvolatile capacitor (Cnv) represents one product term of the dot product. In this phase, the negative input and the output of the operational amplifier (OPAMP) are connected to the common-mode voltage (VCM) through the switches; because the WLs are biased back at VCM in phase II, the effective dc read voltage across the Cnv cell is 0 V. In phase II, the goal is to transfer all the charges to the reference capacitor (Cref), which shunts the OPAMP. To achieve this, the switches are turned off, so that the negative input of the OPAMP acts as an analog ground at the same potential as the positive input at VCM. Subsequently, the inputs are switched back to VCM, resulting in zero voltage across the capacitors in the array. This causes the charges stored on the capacitive synapses to flow out of the array. The charges on the WL electrode side flow back toward the WL driver, while the charges on the BL electrode side flow toward Cref along the only available path.
Here the total charge on Cref determines Vout. With more inputs turned on or more capacitive synapses in the high-capacitance state (HCS), more charge is transferred and thus Vout is higher. The ideal weighted sum in the charge domain is shown in (1). The full Vout swing is defined as the difference between Vout when all the capacitors in the array are in the HCS and Vout when all the capacitors are in the low-capacitance state (LCS). \[{V}_{\text{out},j} = \frac{\sum\limits_{i}{{V}_{\text{in},i}\times{C}_{\text{nv},i,j}}}{{C}_{\text{ref}}} \tag{1} \]
Figure 2. Capacitive CIM and its operating principle.
where i is the row number and j is the column number.
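To make (1) concrete, the sketch below evaluates the ideal weighted sum of one column and the full Vout swing. All numerical values (100 mV read amplitude, 120 aF HCS, an on/off ratio of 25, and a Cref sized to equal the all-HCS column capacitance) are assumptions chosen for illustration, and Vout is taken relative to VCM.

```python
def column_vout(v_in, c_nv_column, c_ref):
    """Ideal charge-domain weighted sum of one column, following (1)."""
    total_charge = sum(v * c for v, c in zip(v_in, c_nv_column))
    return total_charge / c_ref


def full_swing(v_read, n_rows, c_hcs, c_lcs, c_ref):
    """Vout(all cells in HCS) minus Vout(all cells in LCS), all inputs on."""
    inputs = [v_read] * n_rows
    return (column_vout(inputs, [c_hcs] * n_rows, c_ref)
            - column_vout(inputs, [c_lcs] * n_rows, c_ref))


# Assumed example values (not taken from a measured array):
V_READ = 0.1          # 100 mV read amplitude
C_HCS = 120e-18       # 120 aF high-capacitance state
C_LCS = C_HCS / 25    # on/off ratio of 25
C_REF = 128 * C_HCS   # reference capacitor equal to the all-HCS column

swing = full_swing(V_READ, 128, C_HCS, C_LCS, C_REF)  # volts
```

Under these assumptions the swing is V_READ × (1 − 1/25), which shows directly how a low on/off ratio compresses the usable output range.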
Figure 3 compares the resistive CIM and capacitive CIM. Since no steady-state current flows once the capacitors are fully charged, the capacitive weights consume very limited static power. Furthermore, a capacitor presents a very high dc resistance, so it mitigates the issues of sneak-path leakage and interconnect voltage drop along WL/BL wires. Another benefit is that the capacitive read-out could take advantage of the small-signal responses with read voltages below 100 mV [8], and such low read voltages suppress the drift issue under continuous read stress. Since the capacitive approach has the potential to avoid read disturb, IR drop, and sneak-path currents, it could eliminate the need for an access transistor as usually required in the resistive approach. As a result, the capacitive approach can potentially lead to a smaller cell area and employ a true crossbar array architecture that supports multitier 3D stacking at the BEOL, where the peripheral circuits could be hidden underneath the memory array following the concept of CMOS under array. The capacitive synapses could potentially leverage the industrial development of the FeFET technologies as discussed in the section “Device Candidates for Nonvolatile Capacitive Synapses.” Subnanosecond polarization switching speeds have been reported in FeFETs [17], [18], which are comparable to the switching speeds of representative resistive synapses such as RRAM [19], PCM [20], and STT-MRAM [21]. Furthermore, FeFET has already been demonstrated in the 22-nm technology node [22], which is also comparable to the scaling status of its resistive counterparts such as 28-nm RRAM [23], 20-nm PCM [24], and 16-nm STT-MRAM [25].
Figure 3. (a) A comparison between resistive and capacitive CIM. (b) Capacitive CIM achieves low IR drop, low read disturbance, no sneak-path current, and low static power consumption.
It is noted that volatile capacitive designs for CIM also exist, where the weighted sums are likewise represented as charges [9]. The reported volatile capacitive CIM designs usually use an SRAM with charge sharing and redistribution as the read-out mechanism. If the WL inputs are on, the charges on the storage node of the SRAM cells will be shared with the Cref at the edge of the array. The SRAM bit cell and the peripheral circuits of the array need to be customized with additional circuitry to accommodate such a sensing scheme, which is different from the conventional voltage-mode sensing for memory access. The usage of SRAM increases the leakage power consumption and requires reloading the weight data after reboot. Hence, we focus on recent advances of the nonvolatile capacitive synapses in this survey. First, we introduce different types of nonvolatile capacitive synapses, including metal-ferroelectric-metal (MFM) and metal-ferroelectric-semiconductor (MFS) structures, charge shielding, and one-capacitor one-RRAM, and explore their structure and operation principles. Then, important device parameters for array-level metrics are discussed. Overall, this review article aims to provide an overview of the emerging capacitive synapses and to highlight their potential for CIM designs.
This section focuses on four different concepts for demonstrating nonvolatile capacitive synapses, including MFM [10], [11], MFS [12], [13], charge shielding [14], and one-capacitor one-RRAM [15].
Luo et al. [10] and Hur et al. [11] report one of the earliest prototypes of a capacitive synapse with the MFM structure, a TiN/Hf0.5Zr0.5O2 (HZO)/TiN stack, as shown in Figure 4(a). The stack was fabricated with an in situ plasma-enhanced atomic layer deposition process with a relatively low annealing temperature of 450 °C for BEOL compatibility. The small-signal capacitance–voltage (C–V) curve of the device is shown in Figure 4(b), where an ac small-signal excitation (up to 100 mV amplitude) on top of a quasi-static dc sweep is applied to extract the capacitance. Due to the asymmetry in the C–V curve, two distinct memory states at 0 V dc bias are observed. The asymmetric C–V can be attributed to the oxygen vacancies that are concentrated at the interface of the MFM structure as shown in Figure 4(c) [16]. The oxygen vacancies introduce the domain wall pinning effect, which creates more domain walls after programming with positive voltage and fewer domain walls after erasing with negative voltage. The higher number of domain walls responds to the small-signal excitation with more charge differences on the two plate electrodes and contributes to a larger capacitance in HCS. On the other hand, the smaller number of domain walls after erasing results in a smaller capacitance in LCS. This initial demonstration of a nonvolatile capacitive synapse features 10-year retention by extrapolation, >10³ cycling endurance with strong 3 V/1 ms pulses, and limited read and write disturb, but it suffers from a low HCS/LCS on/off ratio of ∼1.125. Rather than relying on the intrinsic interfacial defects of a symmetric structure, the C–V asymmetry can be enhanced by making the capacitor stack asymmetric, such as the Mo + MoOx/HZO/TiN stack, where a larger on/off ratio of ∼1.175 was reported by using electrode work function engineering to shift the C–V curve horizontally [26].
Figure 4. (a) A schematic and (b) the small-signal C–V curve of the MFM capacitor based on the TiN/HZO/TiN stack. (c) An illustration of the physical origins of CHCS and CLCS thanks to the excessive oxygen vacancies at the bottom electrode interface [11]. TE: top electrode; FE: ferroelectric; BE: bottom electrode.
The read operation for the MFM stack is usually performed at 0 V dc bias (with a small signal) to avoid disturbing the memory states during consecutive read-out. In contrast, for RRAM, although the typical read voltage (0.5–1 V) is below the typical write voltage (>2 V) and is unable to fully flip the memory state, the nonzero read voltage could gradually drift the memory state, degrading the long-term reliability. Hence, the 0 V dc read is desirable in the capacitive synapse.
Write disturb is also a concern for the crossbar array. When a selected cell is being programmed, the relatively high voltages applied on its WL and BL may disturb the memory states of the unselected cells that share those wires. Write disturb can be mitigated by adding an access transistor or a selector. However, selector design is known to be difficult, especially when the selector must be compatible with a capacitive synapse, and an access transistor takes up a large silicon footprint and limits the multitier BEOL stacking potential. Therefore, it is important to apply a smart write scheme, such as a 1/3 V write scheme [27], where the unselected cells experience only 1/3 of the write voltage. The results in [10] show limited write disturb, amounting to <10% of the total margin between CHCS and CLCS. With low read/write disturb and freedom from sneak-path currents and interconnect voltage drop, the capacitive crossbar array can potentially achieve high area compactness without the need for an access transistor. The integration density can be further improved by building the capacitors as nonplanar structures (e.g., the stacked capacitors used in the DRAM process).
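The bias bookkeeping behind a 1/3 V write scheme can be sketched as follows. The specific line assignment used here (selected WL at the write voltage, selected BL at 0 V, unselected WLs at 1/3 and unselected BLs at 2/3 of the write voltage) is the commonly used convention and is an assumption; the exact biasing in [27] may differ. The function name and the 3 V, 4 × 4 example are hypothetical.

```python
def v3_cell_voltages(v_write, n_rows, n_cols, sel_row, sel_col):
    """Voltage across every cell (WL minus BL) under a V/3 write scheme.

    Assumed biasing: selected WL = v_write, selected BL = 0,
    unselected WLs = v_write / 3, unselected BLs = 2 * v_write / 3.
    """
    wl = [v_write if r == sel_row else v_write / 3 for r in range(n_rows)]
    bl = [0.0 if c == sel_col else 2 * v_write / 3 for c in range(n_cols)]
    return [[wl[r] - bl[c] for c in range(n_cols)] for r in range(n_rows)]


cells = v3_cell_voltages(3.0, 4, 4, sel_row=1, sel_col=2)
# Largest voltage magnitude seen by any cell other than the selected one.
worst_unselected = max(abs(cells[r][c])
                       for r in range(4) for c in range(4)
                       if (r, c) != (1, 2))
```

Only the selected cell sees the full write voltage; every other cell sees a magnitude of one third of it, with either polarity depending on its position.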
A small-scale 12 × 12 crossbar array has been recently demonstrated as shown in Figure 5(a) and (b) [10]. The array-level MAC operations with the MFM capacitors have been experimentally measured with the peripheral circuits integrated on the board level. Figure 5(c) and (d) show the measured output weighted sum (Vout) with respect to the number of turned-on inputs and the number of capacitors in the HCS, respectively. The endurance and retention at the array level were also characterized: There is still a distinct Vout window after thousands of strong 3 V/1 ms stress pulses. The retention test at 85 °C shows that a clear memory window can be observed after 17 hours with a possible extrapolation to 10 years.
Figure 5. (a) Capacitive crossbar array images with unit MFM capacitor area = 2 × 2 µm². (b) The array core. The experimental MAC results with an increasing number of (c) turned-on inputs and (d) CFE in the HCS [10].
To increase the HCS/LCS on/off ratio of the nonvolatile capacitive synapse, an MFS stack [12] can be used, in which a silicon layer is added between the ferroelectric layer and the bottom electrode, as shown in Figure 6(a). An inversion-type ferroelectric capacitive memory has been demonstrated, which has an MFS structure and a heavily doped n+ region. The device achieves a high CHCS/CLCS ratio of ∼125. CHCS is achieved with the formation of an inversion layer, and CLCS is dominated by the depletion capacitance formed by the p/n junction between the heavily doped n+ region and the p− substrate. For switching from CHCS to CLCS, the device relies on band-to-band tunneling and trap-assisted tunneling. On the other hand, switching from CLCS to CHCS is facilitated by the supply of minority carriers from the heavily doped n+ region. Based on this concept, a 32 × 32 crossbar array was demonstrated with a sense margin of 0.4 V as Vout. One drawback of this work is the requirement of large switching voltages (>6 V), which incurs an overhead on the peripheral circuit design.
Figure 6. (a) A schematic of the MFS structure. (b) The typical C–V curves with different programming voltages when the capacitance is measured between the gate and the shorted source/drain with a floating body on GlobalFoundries’ 28-nm FeFET platform. (c) The CHCS, CLCS, and on/off ratio versus different programming voltages. (d) An illustration of the physical origin behind CHCS and CLCS of the capacitive FeFET [13].
Another variant of MFS-based synapse is to leverage the gate source/gate drain capacitance of a regular FeFET structure [13]. A representative C–V curve is measured from the GlobalFoundries 28-nm FeFET [28] platform’s test structure, where the source/drain is internally connected, as shown in Figure 6(b). The erasing voltage is fixed at −3.5 V, while the programming voltage is swept from 2 V to 3.5 V. The on/off ratios at 0 V dc versus the programming voltages are shown in Figure 6(c), where an on/off ratio > 20 can be achieved with a moderate programming voltage at 2.5 V. Increasing CHCS with rising programming voltages also implies the potential of the multilevel cell for the capacitive MFS structure. The multilevel cell is difficult to achieve with a low on/off ratio in the MFM case because the difference between two adjacent capacitance states is small and would be vulnerable to noises and variations. The underlying mechanism for the nonvolatile capacitance is shown in Figure 6(d) and is explained as follows.
In the C–V curve of a FeFET, there are three types of primary capacitances: 1) oxide capacitance, which can be accessed after the formation of an inversion layer or an accumulation layer; 2) overlap capacitance, formed by the overlap between the gate and the source/drain region; and 3) depletion capacitance (which forms in the absence of an inversion layer). The oxide capacitance is usually the largest, followed by depletion capacitance and then overlap capacitance. If the threshold voltage of the transistor is modulated (e.g., by the polarization switching of the ferroelectric gate stack), the C–V curve shifts horizontally. Hence, it is possible to achieve a high capacitance on/off ratio at 0 V gate bias in a FeFET. After the positive write voltage is applied on the gate, the ferroelectric dipoles point downward, which makes the threshold voltage negative and allows for the formation of an inversion layer at 0 V gate bias; thus, a CHCS can be measured between the gate and the source/drain terminals. On the other hand, after the negative erase voltage is applied to the gate, the dipoles point upward, which makes the threshold voltage positive. Since the body terminal of the FeFET is floating, holes cannot be supplied at high frequency, preventing the formation of the accumulation layer. Hence, the overlap capacitance between the gate and the source/drain dominates the CLCS. Since the read operation is performed at 0 V gate bias while applying a small signal, the readout operation is nondestructive. Further optimization of the on/off ratio has been reported in [13], where CLCS is reduced by a smaller gate-to-source/drain overlap area, while CHCS can be increased by a larger gate area of the FeFET.
A memcapacitor device based on the principle of charge shielding is another approach to realizing the nonvolatile capacitive synapse [14]. The device structure and operating principle are shown in Figure 7. There is a p+/n–/n+ lateral junction that acts as the charge-shielding layer, with a memory dielectric and gate above the n– region and a dielectric and n+ readout electrode below as illustrated in Figure 7(a). The p+- and n+-doped regions are reservoirs that supply holes and electrons, respectively, and a charge-trapping layer and a ferroelectric layer can be used as the memory dielectric. In the read operation, the n+- and p+-doped regions are connected to ground, an alternating voltage signal with a specific bias voltage is input to the gate electrode, and the readout electrode senses the capacitance as shown in Figure 7(b). At this time, the readout small-signal capacitance depends on the mode of the n− region (accumulation, depletion, and inversion) as depicted in Figure 7(c)–(e), where Cox is the memory dielectric capacitance, CSi is the capacitance of n− Si, Cbox is the buried oxide dielectric capacitance, and Cn+ and Cp+ are the nonlinear accumulation and inversion capacitances. In the accumulation (inversion) mode, the n− region is supplied with electrons (holes) from the n+ (p+) region, inducing a high capacitance of Cn+ (Cp+) between the n+ (p+) and n– regions as shown in Figure 7(c) [Figure 7(d)]. Therefore, a strong shielding layer is formed in the n– region, which interferes with the response of the charge on the readout electrode and results in a low readout capacitance. On the other hand, in the depletion mode, the shielding efficiency of the n− region is reduced due to the low charge density, and the charges on the readout electrode respond to the small-signal input voltage and cause a high readout capacitance. 
This capacitive synapse utilizes a small-signal capacitance difference at the readout voltage by adjusting the flat-band voltage and the threshold through the memory dielectric to switch among the modes. Both the charge-trapping layer and ferroelectric layer can be used as memory dielectric, but the polarity of the write voltage may differ.
Figure 7. (a) A schematic of the charge-shielding capacitive synapse. (b) The equivalent capacitance model for read operation. Illustrations of the physical origins of CHCS and CLCS when the n– region is in (c) accumulation, (d) inversion, and (e) depletion modes.
Another approach of nonvolatile capacitive synapse has been demonstrated with one RRAM in series with one dielectric capacitor [15]. The device schematic and its operating principle are shown in Figure 8. When the RRAM is in the LRS, the low resistance overshadows the impedance of the parasitic capacitance (Cp) of the RRAM, resulting in the dielectric capacitance (Cdi) dominating the overall capacitance. On the other hand, when the RRAM is in the high-resistance state (HRS), Cp cannot be neglected. Hence, the equivalent capacitance of the stack becomes Cp and Cdi in series, which can be reduced to Cp since Cp is typically much smaller than Cdi. Since Cp is small, the on/off ratio (the ratio between Cdi and Cp) could be very high. One of the weaknesses of this design is that it relies on the impedance matching between RRAM resistance and Cp. The impedance matching determines the CHCS and CLCS, and it becomes unreliable when the resistance drifts over time or when there are multiple frequency components present in the applied voltages. This design has been demonstrated with the application of spiking neural networks instead of the general CIM application described in Figure 1.
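The frequency and resistance dependence of the 1C1R synapse can be illustrated with a simple lumped model: the RRAM is treated as a resistance in parallel with Cp, and this pair is placed in series with Cdi. The sketch below extracts the series-equivalent small-signal capacitance from the imaginary part of the total impedance. The component values (Cdi = 100 fF, Cp = 1 fF, 10 kΩ LRS, 10 GΩ HRS) and the probe frequencies are assumptions for illustration only, not values from [15].

```python
import math


def effective_capacitance(r_rram, c_p, c_di, freq_hz):
    """Series-equivalent capacitance of (R || Cp) in series with Cdi."""
    w = 2 * math.pi * freq_hz
    z_rram = 1 / (1 / r_rram + 1j * w * c_p)  # RRAM resistance in parallel with Cp
    z_total = z_rram + 1 / (1j * w * c_di)    # dielectric capacitor in series
    return -1 / (w * z_total.imag)


# Assumed, illustrative component values.
C_DI, C_P = 100e-15, 1e-15
R_LRS, R_HRS = 10e3, 10e9

c_on = effective_capacitance(R_LRS, C_P, C_DI, 1e6)        # close to Cdi
c_off = effective_capacitance(R_HRS, C_P, C_DI, 1e6)       # close to Cp in series with Cdi
c_off_slow = effective_capacitance(R_HRS, C_P, C_DI, 1e3)  # HRS no longer blocks at low frequency
```

With these assumed values the on/off ratio at 1 MHz is roughly Cdi/Cp, but at 1 kHz the same HRS device reads much closer to Cdi, which is one way to see why resistance drift or multiple frequency components make the two states unreliable.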
Figure 8. (a) A schematic of the 1C1R capacitive synapse and illustrations of the physical origins of (b) CHCS and (c) CLCS in the 1C1R capacitive synapse.
The key features of the MFM capacitor, the MFS capacitor, capacitive FeFET, charge shielding, and 1C1R are compared in Table 1. The MFM capacitor has high BEOL compatibility and a small footprint but a poor on/off ratio, resulting in poor tolerance to noise and variation. The two MFS types have larger on/off ratios than that of MFM, enabling multilevel operation, but they have low BEOL compatibility. If the MFS capacitor can be engineered toward BEOL-compatible fabrication, and the write voltage is reduced by adjusting the thickness of the ferroelectric layer, it would be a compelling candidate among the nonvolatile capacitive synapses. The charge-shielding-based synapse has a larger on/off ratio than those of the MFS types, but it also has a large footprint, low BEOL compatibility, and high write voltage. The 1C1R has the largest on/off ratio among the nonvolatile capacitive synapses and high BEOL compatibility. However, the multilevel potential of the 1C1R design is limited because the capacitive parallel path is turned either on or off by the shunting resistor’s state, resulting in a discrete change of the overall capacitance of the device.
Table 1. A comparison of various capacitive synapses.
This section aims at connecting the device parameters to array-level performance metrics. Let us consider a 128 × 128 crossbar array with a column of 128 ferroelectric-based capacitors. As explained in Figure 2 earlier, the capacitive synapses can be operated by two-step charge summation. First, charge the capacitors corresponding to nonzero inputs through WLs. Second, transfer the stored charges through the BL. The BL is connected to the input of an OPAMP to provide an analog ground for summing the transferred charges. In this section, “charge transfer time” refers to the time it takes for the charge in phase II to be transferred.
To increase the Vout signal swing, Cnv and Cref in the crossbar array should be cooptimized. According to (1), if the ratio between Cnv and Cref is too small, the Vout voltage swing will be too small. On the other hand, if the ratio between Cnv and Cref is too large, the voltage swing would increase so much that it hits the ceiling of the supply voltage of the OPAMP, resulting in distortion of the Vout linearity. Therefore, the Cnv-to-Cref ratio should be designed within a proper range for reliable capacitive synapse behavior.
After a proper ratio between Cnv and Cref has been identified, their absolute values must be determined. Lower overall capacitances mean faster charge transfer, as the resistance-capacitance (RC) delay for the charge transfer phase (phase II) is shorter. However, low capacitances would make the system more susceptible to temporal noise, especially thermal (i.e., kT/C) noise. A smaller capacitor may also suffer more from process variation. Therefore, a balanced design is required to meet a specific design specification.
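The kT/C tradeoff mentioned above can be quantified with the standard expression for the rms thermal noise voltage sampled on a capacitor, sqrt(kT/C). The sketch below evaluates it for two assumed capacitor sizes at an assumed temperature of 300 K; which capacitance in the read path sets the noise in a real design depends on the circuit and is not specified here.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def ktc_noise_vrms(c_farads, temp_kelvin=300.0):
    """RMS thermal noise voltage sampled on a capacitor: sqrt(kT/C)."""
    return math.sqrt(K_B * temp_kelvin / c_farads)


noise_1ff = ktc_noise_vrms(1e-15)      # assumed 1 fF capacitor
noise_100ff = ktc_noise_vrms(100e-15)  # assumed 100 fF capacitor
```

A 100× larger capacitor lowers the rms noise by 10×, at the cost of a proportionally longer RC delay for the same resistance.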
The device-to-device (D2D) variation in RRAM has been intensively studied for its impact on system performance and inference accuracy. On the other hand, the effect of the D2D variation in the capacitive array has not been fully investigated. The impact of D2D variation and temporal noise on Vout is illustrated in Figure 9. D2D variation results in both vertical and lateral shifts of the transient output voltage curve, while the temporal noise results in fluctuation around the mean steady-state Vout values. Both nonidealities could distort the weighted sum computation and decrease its linearity. Initial studies by [13] suggest ∼1%–2% of D2D variation at the µm scale for capacitive FeFET. Less than 5% of variation can be projected at sub-100-nm dimensions if we assume the variation is inversely proportional to the logarithm of the area. To quantify the impact of the D2D variation at the array level, simulations such as Monte Carlo simulations could be performed in future work based on the distribution of the measured data. The voltage standard deviation at the output could be treated as an equivalent noise, and the effect of temporal noise could be combined with that of the D2D variation. Initial studies show that the temporal noise dominates the overall effect at Vout when D2D variation is 1% [29]. However, when the D2D variation increases to 10%, the two effects become comparable and need to be considered jointly. It should be noted that these results are specific to the settings used and may differ under different device and circuit designs.
Figure 9. The output signal increases with charge transfer time. D2D variation causes a vertical shift of the output signal, while the temporal noise results in fluctuation along the mean values. Both nonidealities distort the weighted sum and will result in different inference accuracy, where low to high accuracy are represented by red to green colors. (a) With small D2D variation and (b) with large D2D variation.
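A Monte Carlo estimate of the kind suggested above can be sketched in a few lines. The example draws each Cnv of a 128-row column from a Gaussian distribution and records the spread of the ideal steady-state Vout from (1). The Gaussian model, the 120 aF mean, the 100 mV input, the choice of Cref, and the trial count are all assumptions for illustration; a real study would sample from the measured device distribution, and temporal noise is not included.

```python
import random
import statistics


def simulate_vout(n_rows, c_mean, sigma_rel, c_ref, v_read, rng):
    """One Monte Carlo draw of a column's Vout with Gaussian D2D variation."""
    charge = sum(v_read * rng.gauss(c_mean, sigma_rel * c_mean)
                 for _ in range(n_rows))
    return charge / c_ref


def vout_sigma(sigma_rel, n_trials=2000, seed=0):
    """Standard deviation of Vout over many randomly drawn columns."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    c_hcs = 120e-18            # assumed mean HCS capacitance
    samples = [simulate_vout(128, c_hcs, sigma_rel, 128 * c_hcs, 0.1, rng)
               for _ in range(n_trials)]
    return statistics.pstdev(samples)


sigma_1pct = vout_sigma(0.01)   # 1% D2D variation
sigma_10pct = vout_sigma(0.10)  # 10% D2D variation
```

Because the cell errors are independent in this model, the output spread scales as the relative variation divided by the square root of the number of rows, so the resulting sigma can be compared directly against the temporal noise as an equivalent noise term.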
The effective number of bits (ENOB) in the capacitive read-out context determines how much information can be extracted from the weighted sum, represented by Vout. An application with less error tolerance may require a higher ENOB, and ENOB is also an important factor in determining the resolution required for the following ADC. To calculate ENOB, the signal range and the equivalent noise at the output are required. The signal range is the difference between the highest and lowest possible Vout values, while the noise is the standard deviation of the combined effect of temporal noise and D2D variation. Initial studies show that with an on/off ratio of 25, a saturated ENOB can reach 7 bits when D2D variation is low (1%). ENOB drops by 0.5 bits when the D2D variation becomes higher (10%). A column with 128 rows carries a full precision of 7 bits; thus, the 0.5-bit loss of information is usually acceptable for deep neural networks [30].
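One way to turn a signal range and an equivalent noise into an ENOB is sketched below. The definition used here is an assumption: it equates the output noise with the rms error of an ideal quantizer (LSB divided by the square root of 12), and the cited studies may use a different convention. Temporal noise and D2D-induced spread are assumed independent and are combined in quadrature; the function names are hypothetical.

```python
import math


def combined_sigma(sigma_temporal, sigma_d2d):
    """Independent noise sources add in quadrature."""
    return math.hypot(sigma_temporal, sigma_d2d)


def enob(v_range, sigma_noise):
    """ENOB assuming the noise equals ideal quantization noise.

    An ideal quantizer with step LSB has rms error LSB / sqrt(12), so the
    equivalent number of levels is v_range / (sqrt(12) * sigma_noise).
    """
    return math.log2(v_range / (math.sqrt(12) * sigma_noise))


# Assumed example: 96 mV swing, 0.2 mV temporal noise, 0.1 mV D2D spread.
bits = enob(0.096, combined_sigma(2e-4, 1e-4))
```

Under this convention, every doubling of the equivalent noise costs exactly one bit, so a fractional-bit loss corresponds to a modest increase in the combined noise.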
As shown in Figure 9, the charge transfer time presents a tradeoff between latency and precision. A longer charge transfer time allows a higher ENOB. Assuming a device with an area of 75 nm × 75 nm (which is technically achievable on an industrial platform for capacitive FeFET [28]) and a dielectric constant of 25, the CHCS is set to 120 aF. In this case, the charge transfer time is about 5–10 ns in a 128 × 128 array with all values of Cnv programmed to CHCS. The charge transfer time is also affected by the design of the OPAMP. However, the most important factor is the size of the capacitors, which determines the RC delay and thus the tradeoff between ENOB and speed, as discussed in the section “Effective Number of Bits.”
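The 120 aF figure can be sanity-checked with the parallel-plate formula C = ε0·εr·A/t. The dielectric thickness is not stated in the text; the 10 nm used below is an assumption chosen as a plausible ferroelectric film thickness, and the parallel-plate model itself is a simplification of the actual FeFET gate capacitance.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def parallel_plate_capacitance(eps_r, width_m, length_m, thickness_m):
    """Ideal parallel-plate capacitance, ignoring fringing fields."""
    return EPS0 * eps_r * width_m * length_m / thickness_m


# 75 nm x 75 nm area and dielectric constant 25 from the text;
# the 10 nm thickness is an assumed value.
c_hcs = parallel_plate_capacitance(25, 75e-9, 75e-9, 10e-9)
c_column = 128 * c_hcs  # total capacitance of an all-HCS column
```

With the assumed thickness, the estimate lands within a few percent of 120 aF, and the all-HCS column totals on the order of 16 fF.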
The subarray energy without the OPAMP is negligible, as it is contributed only by the energy of charging the capacitors. Compared with a representative RRAM array, the capacitive array could show >1,000× energy improvement for one MAC operation for the array core only [31]. However, the OPAMP in the peripheral circuits consumes considerable energy. Therefore, the OPAMP design needs to be energy efficient to maintain the energy benefit of the capacitive design. The OPAMP also needs to be compact enough to fit the capacitive array under an 8-BL-sharing-one-OPAMP or a 4-BL-sharing-one-OPAMP setting. On the other hand, if the gain of the OPAMP is too low, some charges will remain on the CFE in the crossbar array and cause error in the Vout readout. The key is to apply a rather compact design with a sufficient gain (preferably >100) to maintain the analog ground at the negative input of the OPAMP for charge transfer. Luo et al. [31] apply a 9T OPAMP and achieve >99% charge transfer with <1% error. With the peripheral circuits included, the capacitive array-level energy benefit is reduced to >50× compared with RRAM.
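The effect of finite OPAMP gain on charge transfer can be estimated with a static charge-conservation model of the integrator: with open-loop gain A, the summing node settles at −Vout/A rather than at the ideal analog ground, so a fraction of the charge stays on the array capacitance. This gives a transferred fraction of 1 / (1 + (1 + Carray/Cref)/A). The model ignores bandwidth, offset, and parasitics, and the capacitance ratio used below is an assumption; it is a sketch, not the analysis of [31].

```python
def charge_transfer_fraction(gain, c_array, c_ref):
    """Fraction of the array charge that ends up on Cref for finite gain.

    Static (dc) model: the summing node sits at -Vout/gain instead of the
    ideal analog ground, so some charge remains on the array capacitance.
    """
    return 1.0 / (1.0 + (1.0 + c_array / c_ref) / gain)


# Assumed case: total array capacitance on the BL equal to Cref.
frac_gain_100 = charge_transfer_fraction(100, 1.0, 1.0)
frac_gain_500 = charge_transfer_fraction(500, 1.0, 1.0)
```

In this model the residual error grows with the ratio of array capacitance to Cref, so a larger column or a smaller Cref calls for a higher gain to reach the same transfer fraction.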
The nonvolatile capacitive synapses are promising to implement an energy- and area-efficient CIM accelerator. In this review, operating principles of the capacitive approach are introduced, followed by the discussion of four types of the reported device designs and their tradeoffs, including MFM, MFS, charge shielding, and 1C1R devices. The array-level performance metrics and their dependency on device parameters are also discussed to inspire future device–circuit codesign. Design strategies are to be explored to mitigate the technological challenges such as D2D variation, thermal noise, and mismatch between Cnv and Cref in the future endeavors of the research community.
The authors acknowledge the following federal sponsors and industrial collaborators for supporting the research on ferroelectric capacitive synapses: IARPA, IMEC, GlobalFoundries, and PRISM, one of the SRC/DARPA JUMP 2.0 centers.
Shimeng Yu (shimeng.yu@ece.gatech.edu) is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.
Yuan-Chun Luo (yluo369@gatech.edu) is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.
Tae-Hyeon Kim (thkim@gatech.edu) is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.
Omkar Phadke (omkarphadke@gatech.edu) is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.
[1] S. Yu, H. Jiang, S. Huang, X. Peng, and A. Lu, “Compute-in-memory chips for deep learning: Recent trends and prospects,” IEEE Circuits Syst. Mag., vol. 21, no. 3, pp. 31–56, thirdquarter 2021, doi: 10.1109/MCAS.2021.3092533.
[2] Y. Chen, “ReRAM: History, status, and future,” IEEE Trans. Electron Devices, vol. 67, no. 4, pp. 1420–1433, Apr. 2020, doi: 10.1109/TED.2019.2961505.
[3] T. Kim and S. Lee, “Evolution of phase-change memory for the storage-class memory and beyond,” IEEE Trans. Electron Devices, vol. 67, no. 4, pp. 1394–1406, Apr. 2020, doi: 10.1109/TED.2020.2964640.
[4] S. Ikegawa, F. B. Mancoff, J. Janesky, and S. Aggarwal, “Magnetoresistive random access memory: Present and future,” IEEE Trans. Electron Devices, vol. 67, no. 4, pp. 1407–1419, Apr. 2020, doi: 10.1109/TED.2020.2965403.
[5] T. Mikolajick, U. Schroeder, and S. Slesazeck, “The past, the present, and the future of ferroelectric memories,” IEEE Trans. Electron Devices, vol. 67, no. 4, pp. 1434–1443, Apr. 2020, doi: 10.1109/TED.2020.2976148.
[6] C.-X. Xue et al., “A 22nm 2Mb ReRAM compute-in-memory macro with 121-28TOPS/W for multibit MAC computing for tiny AI edge devices,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2020, pp. 244–246, doi: 10.1109/ISSCC19947.2020.9063078.
[7] H. Jiang, W. Li, S. Huang, and S. Yu, “A 40nm analog-input ADC-free compute-in-memory RRAM macro with pulse-width modulation between sub-arrays,” in Proc. IEEE Symp. VLSI Technol. Circuits, 2022, pp. 266–267, doi: 10.1109/VLSITechnologyandCir46769.2022.9830211.
[8] J. Okuno et al., “SoC compatible 1T1C FeRAM memory array based on ferroelectric Hf0.5Zr0.5O2,” in Proc. IEEE Symp. VLSI Technol., 2020, pp. 1–2, doi: 10.1109/VLSITechnology18217.2020.9265063.
[9] J. Lee, H. Valavi, Y. Tang, and N. Verma, “Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs,” in Proc. IEEE Symp. VLSI Circuits, 2021, pp. 1–2, doi: 10.23919/VLSICircuits52068.2021.9492444.
[10] Y.-C. Luo et al., “Experimental demonstration of non-volatile capacitive crossbar array for in-memory computing,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2021, pp. 1–4, doi: 10.1109/IEDM19574.2021.9720508.
[11] J. Hur et al., “Nonvolatile capacitive crossbar array for in-memory computing,” Adv. Intell. Syst., vol. 4, no. 8, Aug. 2022, Art. no. 2100258, doi: 10.1002/aisy.202100258.
[12] Z. Zhou et al., “Experimental demonstration of an inversion-type ferroelectric capacitive memory and its 1 kbit crossbar array featuring high CHCS/CLCS, fast speed, and long retention,” in Proc. IEEE Symp. VLSI Technol. Circuits, 2022, pp. 357–358, doi: 10.1109/VLSITechnologyandCir46769.2022.9830291.
[13] T.-H. Kim et al., “Tunable non-volatile gate-to-source/drain capacitance of FeFET for capacitive synapse,” submitted for publication.
[14] K.-U. Demasius, A. Kirschen, and S. Parkin, “Energy-efficient memcapacitor devices for neuromorphic computing,” Nature Electron., vol. 4, no. 10, pp. 748–756, Oct. 2021, doi: 10.1038/s41928-021-00649-y.
[15] Z. Wang et al., “Capacitive neural network with neuro-transistors,” Nature Commun., vol. 9, no. 1, Aug. 2018, Art. no. 3208, doi: 10.1038/s41467-018-05677-5.
[16] Y.-C. Luo, J. Hur, P. Wang, A. I. Khan, and S. Yu, “Non-volatile, small-signal capacitance in ferroelectric capacitors,” Appl. Phys. Lett., vol. 117, no. 7, Aug. 2020, Art. no. 073501, doi: 10.1063/5.0018937.
[17] H. Bae et al., “Sub-ns polarization switching in 25nm FE FinFET toward post CPU and spatial-energetic mapping of traps for enhanced endurance,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2020, pp. 31.3.1–31.3.4, doi: 10.1109/IEDM13553.2020.9372076.
[18] W. Chung, M. Si, P. R. Shrestha, J. P. Campbell, K. P. Cheung, and P. D. Ye, “First direct experimental studies of Hf0.5Zr0.5O2 ferroelectric polarization switching down to 100-picosecond in sub-60mV/dec germanium ferroelectric nanowire FETs,” in Proc. IEEE Symp. VLSI Technol., 2018, pp. 89–90, doi: 10.1109/VLSIT.2018.8510652.
[19] H. Y. Lee et al., “Evidence and solution of over-RESET problem for HfO based resistive memory with sub-ns switching speed and high endurance,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2010, pp. 19.7.1–19.7.4, doi: 10.1109/IEDM.2010.5703395.
[20] N. Saxena and A. Manivannan, “Sub-nanosecond threshold switching dynamics in GeSb2Te4 phase change memory device,” J. Phys. D, Appl. Phys., vol. 53, no. 2, 2020, Art. no. 025103, doi: 10.1088/1361-6463/ab4c1b.
[21] G. Jan et al., “Achieving sub-ns switching of STT-MRAM for future embedded LLC applications through improvement of nucleation and propagation switching mechanisms,” in Proc. IEEE Symp. VLSI Technol., 2016, pp. 1–2, doi: 10.1109/VLSIT.2016.7573362.
[22] S. Dunkel et al., “A FeFET based super-low-power ultra-fast embedded NVM technology for 22nm FDSOI and beyond,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2017, pp. 19.7.1–19.7.4, doi: 10.1109/IEDM.2017.8268425.
[23] H. Lv et al., “BEOL based RRAM with one extra-mask for low cost, highly reliable embedded application in 28 nm node and beyond,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2017, pp. 2.4.1–2.4.4, doi: 10.1109/IEDM.2017.8268312.
[24] M. J. Kang et al., “PRAM cell technology and characterization in 20nm node size,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2011, pp. 3.1.1–3.1.4, doi: 10.1109/IEDM.2011.6131478.
[25] P.-H. Lee et al., “A 16nm 32Mb embedded STT-MRAM with a 6ns read-access time, a 1M-cycle write endurance, 20-year retention at 150°C and MTJ-OTP solutions for magnetic immunity,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2023, pp. 494–496, doi: 10.1109/ISSCC42615.2023.10067837.
[26] S. Mukherjee et al., “Capacitive memory window with non-destructive read in ferroelectric capacitors,” IEEE Electron Device Lett., vol. 44, no. 7, pp. 1092–1095, Jul. 2023, doi: 10.1109/LED.2023.3278599.
[27] S. Yu and P.-Y. Chen, “Emerging memory technologies: Recent trends and prospects,” IEEE Solid-State Circuits Mag., vol. 8, no. 2, pp. 43–56, Spring 2016, doi: 10.1109/MSSC.2016.2546199.
[28] M. Trentzsch et al., “A 28nm HKMG super low power embedded NVM technology based on ferroelectric FETs,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2016, pp. 11.5.1–11.5.4, doi: 10.1109/IEDM.2016.7838397.
[29] Y.-C. Luo, J. Read, A. Lu, and S. Yu, “A cross-layer framework for design space and variation analysis of non-volatile ferroelectric capacitor-based compute-in-memory accelerators,” submitted for publication.
[30] H. Jiang, W. Li, S. Huang, S. Cosemans, F. Catthoor, and S. Yu, “Analog-to-digital converter design exploration for compute-in-memory accelerators,” IEEE Des. Test, vol. 39, no. 2, pp. 48–55, Apr. 2022, doi: 10.1109/MDAT.2021.3050715.
[31] Y.-C. Luo, A. Lu, J. Hur, S. Li, and S. Yu, “Design and optimization of non-volatile capacitive crossbar array for in-memory computing,” IEEE Trans. Circuits Syst., II, Exp. Briefs, vol. 69, no. 3, pp. 784–788, Mar. 2022, doi: 10.1109/TCSII.2021.3108148.
Digital Object Identifier 10.1109/MED.2023.3293060
Date of current version: 15 September 2023