Does DDR skew balancing scare you?

21 May 2012  | Vineet Gupta, Swati Gupta, Naveen Raina, Sunit Bansal

The design under consideration has a maximum frequency of 1GHz, 10 clock domains, and 96 modes and corners, and is implemented in 32nm using SOC Encounter from Cadence. The paper highlights the factors, such as delay variation of the PHYs and placement restrictions, that make skew balancing a complex affair. The proposed method reduces the turnaround time and the number of iterations needed for convergence when implementing four DDR blocks by more than 50%. The stringent skew target of 40ps was achieved in all corners, compared with 70ps in previous projects.

Double data rate (DDR) designs have become common in many ASICs and interfaces. They offer higher performance than traditional single data rate (SDR) memory designs. DDR uses both edges of the clock to transfer data and can thus deliver twice the data rate of SDR at the same clock frequency. This increased data bus performance is attributed to source-synchronous data strobes, which permit data to be captured on both the rising and falling edges of the clock. Although better performance can be achieved with this design, the designer has to be careful during the schematic and layout stages to ensure that the desired performance is attained. Smaller setup/hold times, a cleaner reference voltage and the need for proper termination bring challenges that were not faced in SDR.

For an eight-bit DDR PHY to send data correctly to the memory, all eight bits should arrive simultaneously. Likewise, to receive the data bits, the pad-to-PHY delays need to be matched, which is not possible in practice given the routing and placement challenges. A certain amount of skew is therefore allowed; in this case it was 40ps. Even attaining this skew is a design challenge owing to the complex placement of the pads with respect to the Data PHYs in DDR. In addition, there were multiple modes and corners, so the delays as well as the skews vary with PVT. The variation could be as much as 20% in some cases.
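The skew requirement above can be sketched as a simple per-corner check. This is a minimal illustration, not the authors' script: the 40ps budget and the 20% variation figure come from the text, while the bit names, the arrival times and the uniform scaling used to mimic a slow corner are assumptions made up for the example.

```python
# Minimal sketch: skew of a group of bits = latest arrival - earliest arrival.
SKEW_BUDGET_PS = 40.0  # skew target quoted in the article


def skew_ps(arrivals_ps):
    """Return the skew (ps) across a {bit_name: arrival_time_ps} mapping."""
    return max(arrivals_ps.values()) - min(arrivals_ps.values())


def check_skew_all_corners(corners, budget_ps=SKEW_BUDGET_PS):
    """Return {corner_name: (skew_ps, within_budget)} for every corner."""
    report = {}
    for name, arrivals in corners.items():
        s = skew_ps(arrivals)
        report[name] = (s, s <= budget_ps)
    return report


# Hypothetical arrival times for an eight-bit data group (not measured data).
typical = {f"dq{i}": 100.0 + 3.0 * i for i in range(8)}
# Assumption: model a slow corner as a uniform 20% increase in every delay.
slow = {bit: t * 1.2 for bit, t in typical.items()}

report = check_skew_all_corners({"typical": typical, "slow": slow})
for corner, (s, ok) in report.items():
    print(f"{corner}: skew = {s:.1f} ps, within budget: {ok}")
```

Note that scaling every delay by the same factor scales the skew by that factor too, which is one reason a skew that passes in one corner can still fail in another.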

Along with the Data PHY, the design also contains an Address PHY. This PHY is responsible for generating the read and write signals and for sending the address to the memory. Just as all the data bits are sent simultaneously to the external memory, the address bits should also be sent simultaneously, so there is a skew requirement for these bits as well. There is also a timing consideration: how to check, at the DDR design level, whether the address bits will reach the memory in time. To emulate this scenario, setup and hold from the PHY to the pads, and on the reverse path from the pads to the PHY, are checked with respect to the DDR control clock. These setup and hold requirements become stringent because the design must function across multiple corners.
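The setup and hold checks described above follow the usual slack arithmetic, sketched below for a single-cycle path. This is a generic textbook formulation, not the authors' sign-off setup: the function names, the 2000ps period and all delay values are hypothetical, and the worst case is simply taken as the minimum slack over the corners.

```python
# Minimal sketch of single-cycle setup/hold slack, evaluated per corner.

def setup_slack_ps(data_arrival, clock_arrival, period, t_setup):
    """Data must arrive t_setup before the next capturing clock edge."""
    return (clock_arrival + period - t_setup) - data_arrival


def hold_slack_ps(data_arrival, clock_arrival, t_hold):
    """Data must stay stable for t_hold after the current clock edge."""
    return data_arrival - (clock_arrival + t_hold)


def worst_slacks(corners, period, t_setup, t_hold):
    """Return (worst setup slack, worst hold slack) over all corners."""
    setups = [setup_slack_ps(c["data"], c["clock"], period, t_setup)
              for c in corners.values()]
    holds = [hold_slack_ps(c["data"], c["clock"], t_hold)
             for c in corners.values()]
    return min(setups), min(holds)


# Hypothetical arrival times in ps for one address bit in two corners.
corners = {
    "slow": {"data": 600, "clock": 100},
    "fast": {"data": 250, "clock": 80},
}
worst_setup, worst_hold = worst_slacks(corners, period=2000,
                                       t_setup=150, t_hold=100)
print(worst_setup, worst_hold)
```

In this made-up example the slow corner limits setup and the fast corner limits hold, which is the typical reason such checks have to be run in every corner rather than in one.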

This paper explains the methodology by which the desired skew values were achieved along with proper timing for the Data Phy signals.

What is DDR?
DDR SDRAM (double data rate synchronous dynamic random access memory) is a class of memory integrated circuit used in computers. It achieves greater bandwidth than the preceding single data rate SDRAM by transferring data on the rising and falling edges of the clock signal (double pumped). Effectively, it doubles the transfer rate without increasing the frequency of the clock.

The DDR controller, which includes the DDR PHY, PLL and soft logic, converts the input data received from the CPU at the HDR clock and transfers it to the SDRAM at the DDR clock. The frequency of the DDR clock is four times that of the HDR clock. The SDRAM captures the data on both edges of the SDR clock. The frequency of the SDR clock is twice that of the HDR clock.
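The clock relationships stated above can be written out as arithmetic. The sketch below takes the 4x and 2x ratios from the text; the function name and the 100MHz HDR frequency are assumptions chosen only to make the numbers easy to follow, and reading the per-pin transfer rate as two transfers per SDR clock cycle is an interpretation of the text.

```python
# Minimal sketch of the HDR / SDR / DDR clock ratios described in the text.

def ddr_clock_plan(hdr_mhz):
    """Derive the SDR and DDR clock frequencies from the HDR clock."""
    sdr_mhz = 2 * hdr_mhz  # SDR clock is twice the HDR clock
    ddr_mhz = 4 * hdr_mhz  # DDR clock is four times the HDR clock
    # Data is captured on both edges of the SDR clock, so each data pin
    # sees two transfers per SDR clock cycle.
    transfers_per_us = 2 * sdr_mhz
    return {
        "hdr_mhz": hdr_mhz,
        "sdr_mhz": sdr_mhz,
        "ddr_mhz": ddr_mhz,
        "transfers_per_us": transfers_per_us,
    }


plan = ddr_clock_plan(100.0)  # hypothetical HDR frequency
print(plan)
```

Under this reading, the per-pin transfer rate equals the DDR clock frequency numerically, which is consistent with the two ratios given in the text.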

The major constraint in implementing the DDR controller is balancing the stringent skew requirement between all the data bits of the PHY. The clock tree method was used to meet this constraint. This paper discusses the older methodologies and compares them with the proposed method.

Several methodologies were followed earlier for balancing the data and address signals. These include the buffer tree synthesis approach and the manual buffer placement approach using HFN (High Frequency nano Router in SOC Encounter). These methodologies are time consuming, involving several loops of manual iteration before converging on the final figures. Also, the HF router approach was not supported by SOC Encounter from version 10.1 onward, and moving back to an old version and exchanging databases was quite a painful task. Moreover, the frequency of the design is 466MHz, compared with 433MHz in previous samples.

On top of this, the multi-corner scenario from 32nm onward has made manual iteration quite challenging. A tool-based CTS flow was therefore used to get the right skew and to balance the paths. However, just defining a clock sink for every end point and grouping them did not solve the problem. The following is the methodology, and the precautions adopted, to build an automated flow that helped achieve very good skew numbers in all the corners.

A clock spec file was generated with the PHY data/address pins as clocks and the pads' input pins as sinks, and with the pad output pins as clocks and the PHY data input pins as sinks. To prepare the design for CTS, some precautions were taken at the placement stage. Placement blockages were added in the region between the PHY and the pads so that the tool does not place any buffer or register there and sufficient area remains available for clock buffers. The signals were declared as clocks in a spec file that was read in the pre-placement stages so that they were not touched, which prevents buffering for transition fixing. These blockages were removed before CTS and reinstated once CTS was completed. The reinstated blockages have to be soft blockages, or else the tool will move all the CTS-added buffers in the post-CTS stage.

Clock groups in MMMC: In the first experiment, all the clocks for each Data and Address PHY were grouped together so as to attain minimum skew among the data bits of each PHY. This approach was tried in multi-mode, multi-corner CTS. Reasonably achievable values for max and min delay, based on previous experience, were given, and clock inverters were used for CTS. The results were not good. The min delay for some paths was as low as 2ps whereas the max delay was as high as 200ps, leading to very high skew values. The reason was that some pads were very close to the data bits while others were far away, leading to different routing and buffer placements. The tool did not seem to respect the clock grouping.

Min delay approach with high max delay: The clock grouping was removed and the min delay was used as the prime constraint. The min delay was kept at 0ps so as to see the worst min delay that the tool could achieve. The max delay was kept at its previous high value. With this run, the longest path, i.e. the path with the worst min delay, was found to be at 120ps. For the next iteration, this value was used as the min delay so that the tool would try to reach a common min delay for all the paths. The results were not good, as some bits again had a high max delay.

Min delay approach with same max delay: The max delay was set equal to the min delay so that the skew target given to the tool was forced to 0ps. With this, the results turned out to be good. All the bits had reasonable insertion delays with a skew of only 30ps, well within the budget.
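The reasoning behind the last two experiments can be sketched as arithmetic. This is not a model of the CTS engine. It only assumes that delay can be added to a path by buffering but not removed, so the common target has to be the delay of the slowest path. The 120ps figure comes from the text; the path names and the other delay values are hypothetical.

```python
# Minimal sketch: choose one common delay target and see how much delay
# each path would need to have added in order to meet it.

def pick_common_delay_target(unconstrained_delay_ps):
    """With min delay at 0ps, the slowest path sets the common target."""
    return max(unconstrained_delay_ps.values())


def padding_needed(unconstrained_delay_ps, target_ps):
    """Extra delay each path needs so that all paths meet the target."""
    return {path: target_ps - d for path, d in unconstrained_delay_ps.items()}


# Hypothetical per-bit delays from a run with min delay = 0ps.
# Only the 120ps worst case is taken from the article.
unconstrained = {"dq0": 35, "dq1": 60, "dq2": 120, "dq3": 85}

target = pick_common_delay_target(unconstrained)   # used as min AND max delay
padding = padding_needed(unconstrained, target)

print("min delay = max delay =", target, "ps")
for path, extra in padding.items():
    print(f"{path}: add {extra} ps")
```

Setting only the min delay to the target leaves the upper bound loose, which matches the observation that some bits still ended up with a high max delay. Setting min and max to the same value asks for zero skew, and the tool then gets as close as it can.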

Once CTS was done, the removed blockages were added back before starting post-CTS execution. Checking the results at the sign-off stage needs some precautions, because the paths analyzed were not real paths. Skew values were analyzed via the PHY-to-pad paths and vice versa. For timing, paths were analyzed from the Address PHY only, by enabling the LB pin as a clock at the PHY output. A script performed the following key analysis:

Balancing: The skew balancing was analyzed among all the signals of the Data PHY. A script generates the arrival times from the PHYs to PAD_OUTPUT and from PAD_INPUT to the PHYs. The reports can be generated in both Encounter and ETS and analyzed in an Excel worksheet. From this script, an Excel sheet was generated and an output similar to Table 1 was obtained. Table 1 shows the delays and skew for one of the Data PHY instances. The first column gives the path for which the skew is analyzed, which in this case is PHY to pad. The other columns give the rise/fall insertion delays at the PHY output, the delays to the pad (including the PHY delay) and the path delay, respectively.
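A post-processing step of this kind could look like the sketch below. It is not the authors' script and it does not parse any Encounter or ETS report format: it assumes the arrival times have already been extracted into a dictionary, and the path names, column names and all numbers are hypothetical. The path delay is taken as the delay to the pad minus the insertion delay at the PHY output, following the column description above.

```python
import csv
import io


def build_skew_rows(arrivals):
    """Turn {path: arrival times in ps} into table rows with path delays."""
    rows = []
    for path, a in arrivals.items():
        rows.append({
            "path": path,
            "phy_rise_ps": a["phy_rise"],
            "phy_fall_ps": a["phy_fall"],
            "pad_rise_ps": a["pad_rise"],
            "pad_fall_ps": a["pad_fall"],
            # Path delay = delay to pad minus insertion delay at PHY output.
            "path_rise_ps": a["pad_rise"] - a["phy_rise"],
            "path_fall_ps": a["pad_fall"] - a["phy_fall"],
        })
    return rows


def skew(rows, column):
    """Skew of one column = largest value minus smallest value."""
    values = [r[column] for r in rows]
    return max(values) - min(values)


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical PHY-to-pad arrival times in ps (not data from the article).
arrivals = {
    "phy/dq0 -> pad_dq0": {"phy_rise": 50, "phy_fall": 52, "pad_rise": 170, "pad_fall": 175},
    "phy/dq1 -> pad_dq1": {"phy_rise": 55, "phy_fall": 54, "pad_rise": 180, "pad_fall": 182},
    "phy/dq2 -> pad_dq2": {"phy_rise": 48, "phy_fall": 50, "pad_rise": 160, "pad_fall": 168},
}

rows = build_skew_rows(arrivals)
print(to_csv(rows))
print("skew at pad (rise):", skew(rows, "pad_rise_ps"), "ps")
print("skew at pad (fall):", skew(rows, "pad_fall_ps"), "ps")
```

Keeping the rise and fall columns separate means the skew can be checked per edge, which matters for a DDR interface where data is transferred on both clock edges.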
