EDN Asia, IC/Board/Systems Design

Robust timing closure in scan shift using sequential gates

18 Jun 2012


All modern SoCs use scan structures to detect manufacturing faults in the design. Scan chains, designed for testing, connect the sequential elements of a chip in serial order. Because there is little or no combinational logic between scan elements, these chains are prone to hold failures. Moreover, in sub-90nm technologies, on-chip variation (OCV) has a large impact on timing margins. So unless the design is timing signed off across multiple corners, there is a very high chance of hold failures, especially in hold-critical paths such as scan chains. These hold failures make the chip unusable in real applications, even though it may be fully operational in functional mode. If found on silicon, such failures lead to yield loss and hence significant revenue loss for design companies. We therefore need to design a robust scan structure that tackles these problems.

In this article, we start with a quick recap of the timing basics of flops and latches. In the next section, we discuss scan chains and the timing-closure challenges associated with them. We then explain how latches and flops can be used in scan chains to create a robust scan structure that is immune to timing failures in sub-90nm technologies. Finally, we cover the best solution for meeting timing requirements for every combination of sequential elements in a scan chain.

QUICK RECAP OF SETUP/HOLD TIMING

Flip-flops and latches are the two basic building blocks of a sequential circuit. A flip-flop changes its state on the active edge (positive or negative) of the applied clock and simply retains its output when there is no active clock edge. A latch, on the other hand, is a level-sensitive device: while its enable signal is at the active level (high or low), it continuously samples its input and changes its output correspondingly. A flip-flop has a master-slave configuration of two cascaded latches working on opposite active levels, so its area is almost double that of a latch.
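To make the edge-versus-level distinction concrete, here is a minimal sampled-time sketch in Python; the class and function names are hypothetical, not from any EDA tool. It models a positive-level latch, a positive-edge flop, and a flop built from two latches on opposite levels, and drives all three with the same waveform. It assumes d is stable across each rising edge, which is what meeting the setup and hold checks guarantees.

```python
class Latch:
    """Level-sensitive latch: transparent while `en` equals `active_level`."""

    def __init__(self, active_level=1, q=0):
        self.active_level = active_level
        self.q = q

    def step(self, d, en):
        if en == self.active_level:
            self.q = d  # transparent: output follows the input
        return self.q   # opaque: keep the last value


class EdgeFlop:
    """Positive-edge-triggered flip-flop: samples `d` only on a 0 -> 1 clock transition."""

    def __init__(self, q=0):
        self.q = q
        self.prev_clk = 0

    def step(self, d, clk):
        if self.prev_clk == 0 and clk == 1:
            self.q = d
        self.prev_clk = clk
        return self.q


class MasterSlaveFlop:
    """Positive-edge flop built from two cascaded latches on opposite active levels."""

    def __init__(self, q=0):
        self.master = Latch(active_level=0, q=q)  # transparent while clk is low
        self.slave = Latch(active_level=1, q=q)   # transparent while clk is high

    def step(self, d, clk):
        return self.slave.step(self.master.step(d, clk), clk)


def run(cell, clk, d):
    """Drive a cell with sampled clock/data waveforms and record its output."""
    return [cell.step(di, ci) for ci, di in zip(clk, d)]


# Hypothetical stimulus: d also changes while clk is high (steps 3, 7 and 11).
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
d   = [0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]

flop_out = run(EdgeFlop(), clk, d)
ms_out = run(MasterSlaveFlop(), clk, d)
latch_out = run(Latch(active_level=1), clk, d)
print(flop_out)   # [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
print(latch_out)  # [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
```

The two-latch flop produces exactly the edge-triggered output, while the lone latch lets every change during the high phase through.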

To build synchronous designs, we need to ensure that the outputs of flops and latches do not become metastable. This is ensured by meeting the setup and hold checks in the design.

Figure 1


For single-cycle operation between two flops (Figure 1), 1-1 is the hold check and 1-3 is the setup check. We need to make sure that data launched by flop1 is captured by flop2 before the next active edge (setup). At the same time, we need to make sure that data launched by flop1 is not captured by flop2 on the same active edge (hold).
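The two checks can be written as slack formulas. The sketch below assumes a positive-positive flop pair, times in picoseconds, and hypothetical delay values chosen to resemble a scan-shift path with no logic between the flops; positive slack means the check passes.

```python
def setup_slack_ps(period, launch_lat, clk_to_q, data_delay, capture_lat, setup_time):
    """Single-cycle setup check (edge 1 -> edge 3): data must arrive before the NEXT capture edge."""
    arrival = launch_lat + clk_to_q + data_delay
    required = period + capture_lat - setup_time
    return required - arrival


def hold_slack_ps(launch_lat, clk_to_q, data_delay, capture_lat, hold_time):
    """Zero-cycle hold check (edge 1 -> edge 1): data must not arrive before the SAME
    capture edge plus the hold time."""
    arrival = launch_lat + clk_to_q + data_delay
    required = capture_lat + hold_time
    return arrival - required


# Hypothetical scan-shift path: no logic between the flops, capture clock 200 ps late.
args = dict(launch_lat=1000, clk_to_q=100, data_delay=0, capture_lat=1200)
print(setup_slack_ps(period=50_000, setup_time=50, **args))  # 50050: setup is trivially met
print(hold_slack_ps(hold_time=20, **args))                   # -120: hold fails
```

Even a modest skew turns the logic-free path into a hold violation while leaving setup with a huge margin, which is the situation the rest of the article addresses.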

Figure 2


If the second flop is negative-edge triggered, the setup check is 1-2 (Figure 2), while the hold check is against the previous negative edge (Figure 2). This means that data launched by flop1 must not be captured by the previous falling edge of flop2, which in practice cannot happen unless the clock skew exceeds half a cycle.

Thus, in positive-positive or negative-negative flop pairs, the setup check is by default one cycle and the hold check is zero cycle, while in positive-negative or negative-positive flop pairs, the setup check is by default half a cycle and the hold check is half a cycle backwards. We will set aside the timing checks for latches for the time being.
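The default check positions for all four edge pairings fit in a small lookup, sketched below assuming an ideal 50% duty-cycle clock. The helper names and their default clk-to-Q and hold values are hypothetical; the example shows why a half-cycle-backwards hold check tolerates a skew that a zero-cycle check cannot.

```python
def default_checks_half_periods(launch_edge, capture_edge):
    """Default (setup, hold) capture-edge positions in half periods, relative to the launch edge."""
    if launch_edge == capture_edge:  # pos-pos or neg-neg
        return 2, 0                  # setup one full cycle later, hold on the same edge
    return 1, -1                     # pos-neg or neg-pos: setup half a cycle later, hold half a cycle back


def bare_pair_hold_slack_ps(period_ps, launch_edge, capture_edge, skew_ps,
                            clk_to_q_ps=100, hold_ps=20):
    """Hold slack for a logic-free flop pair; skew_ps > 0 means the capture clock arrives late.
    The default clk-to-Q and hold values are hypothetical."""
    _, hold_half = default_checks_half_periods(launch_edge, capture_edge)
    hold_edge_ps = hold_half * (period_ps // 2)
    return clk_to_q_ps - (hold_edge_ps + skew_ps + hold_ps)


# 50 ns shift clock with 5 ns of skew between launch and capture flops.
for pair in [("pos", "pos"), ("neg", "neg"), ("pos", "neg"), ("neg", "pos")]:
    print(pair, bare_pair_hold_slack_ps(50_000, *pair, skew_ps=5_000))
# same-edge pairs: -4920 ps; opposite-edge pairs: 20080 ps
```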

SCAN CHAINS

Scan chains are used in SoCs for testing. All registers of the design are connected in serial order; stimulus is shifted in from outside the chip, and the response is then observed by shifting these chains out to detect stuck-at and transition faults. Modern SoCs are quite complex and have multiple clock domains on a single chip. When scan stitching a design after logic synthesis, care is generally taken to stitch flops with the same clock structure into the same scan chain. But because only a limited number of scan input/output ports are available at the top level, mixing registers from different clock domains is inevitable. Scan chains of unbalanced length are also not a good idea, because they increase the overall test time. This scan structure therefore leads to timing-closure problems in later stages of design. Since scan shifting is done at a slower frequency and there is minimal logic, if any, between flop pairs, setup closure is not a problem. However, these paths are very hold critical because of the minimal logic and the skew between the flops of each pair. As discussed above, since flops from different domains are mixed in a scan chain, there are many cases with large skew between the launch and capture flops. Many marginal hold violations can pop up during late stages of design due to noise effects, and the resulting hold buffering can disturb an otherwise stable, closed design.
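A toy stitcher illustrates why domain mixing is hard to avoid. This sketch assumes a hypothetical 12-flop design with three clock domains and three scan ports; it sorts flops by domain and then cuts them into equal-length chains. Because the domain sizes are not multiples of the chain length, one chain must cross domains.

```python
def stitch_balanced(flops, num_chains):
    """Sort flops by clock domain, then cut the order into num_chains chains of near-equal length."""
    ordered = sorted(flops, key=lambda f: f[1])  # f = (name, clock_domain); sort is stable
    base, extra = divmod(len(ordered), num_chains)
    chains, start = [], 0
    for i in range(num_chains):
        length = base + (1 if i < extra else 0)
        chains.append(ordered[start:start + length])
        start += length
    return chains


def domain_crossings(chain):
    """Count adjacent (launch, capture) pairs whose clock domains differ."""
    return sum(1 for a, b in zip(chain, chain[1:]) if a[1] != b[1])


# Hypothetical design: 12 flops in three clock domains, only 3 scan ports available.
flops = ([(f"a{i}", "clkA") for i in range(5)]
         + [(f"b{i}", "clkB") for i in range(3)]
         + [(f"c{i}", "clkC") for i in range(4)])
chains = stitch_balanced(flops, num_chains=3)
print([domain_crossings(c) for c in chains])  # [0, 1, 0]
```

Balancing the chains forces the crossing a4 -> b0; making the chains unbalanced to avoid it would lengthen test time instead.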

Worse still, our derate margins may not be sufficient, and hold failures may show up only on silicon. This can happen if the uncommon clock path is long and the actual variation on silicon is higher than the estimated variation. As we move further into sub-90nm CMOS technologies, variation effects become increasingly dominant and can result in many hold violations on silicon. Any hold failure in a scan shift path has severe consequences: it takes a lot of debugging time to identify the failing chain on silicon, and the situation worsens when the design also has scan compression logic. Even after the failing chain is identified, it has to be blocked, which reduces test coverage.

In short, a hold failure in a scan chain is very risky, and the design must be robust enough to handle these uncertainties.

There are methodologies, such as scan chain reordering, that rearrange scan chains according to the spatial location of registers. These techniques are handy and designers should explore them, but, as discussed above, there are cases where a scan chain crossing between two clock domains is unavoidable.

A better way to solve this problem is to act proactively and address these issues at the logic synthesis stage itself, where the scan chains are built. All flops driven by the same clock-gating logic should be stitched together, and at the end of each such group of flops a lockup latch can be inserted to avoid any hold failure from the last flop of one clock domain to the first flop of the next.
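The stitching rule can be sketched as a pass over an already-ordered chain. This is a hypothetical helper, not a synthesis-tool command: it assumes all flops are positive-edge triggered (so each lockup latch is negative level) and clocks each latch from the launching flop's clock-gating cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanCell:
    name: str
    clock: str           # clock-gating cell (or clock buffer) driving this element
    kind: str = "flop"   # "flop" or "lockup_latch"


def insert_lockup_latches(chain):
    """Add a lockup latch after the last flop of each clock-gating group in a stitched chain.

    Assumes every flop is positive-edge triggered, so each latch is negative level.
    The latch takes the *launching* flop's clock.
    """
    out = []
    for cur, nxt in zip(chain, chain[1:] + [None]):
        out.append(cur)
        if nxt is not None and nxt.clock != cur.clock:
            out.append(ScanCell(f"lockup_{cur.name}", cur.clock, "lockup_latch"))
    return out


# Hypothetical chain loosely following the flop3 -> flop4 example in the text.
chain = [ScanCell("flop1", "cg1"), ScanCell("flop2", "cg1"), ScanCell("flop3", "cg1"),
         ScanCell("flop4", "cg2"), ScanCell("flop5", "cg2")]
for cell in insert_lockup_latches(chain):
    print(cell.name, cell.clock, cell.kind)
```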

Let us understand this concept with the example shown in Figure 3.

Figure 3


If the clock period is 50ns and the skew is 5ns, we have to insert hold buffers worth 5ns plus the derate margin between flop3 and flop4 at later stages of design. As discussed above, due to OCV in sub-90nm designs, our standard derates may not be sufficient once the uncommon clock path grows beyond certain limits. For example, only 5ps of variation per clock buffer (over and above the derated value) on a capture path with 10 extra clock buffers leads to a 50ps violation. So even this margin may not be sufficient, because OCV can make the skew larger than 5ns.
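The arithmetic of this example can be checked directly. The 30ps derate margin below is a hypothetical value; the 5ns skew, 10 extra clock buffers, and 5ps per-buffer variation come from the text.

```python
def hold_buffer_delay_ps(skew_ps, derate_margin_ps):
    """Delay the hold buffers must add on a logic-free scan path: skew plus derate margin."""
    return skew_ps + derate_margin_ps


def unmodeled_variation_ps(extra_clock_buffers, variation_per_buffer_ps):
    """Extra capture-clock delay on silicon beyond what the derates already cover."""
    return extra_clock_buffers * variation_per_buffer_ps


margin = 30                                  # hypothetical derate margin, in ps
print(hold_buffer_delay_ps(5_000, margin))   # 5030 ps of hold buffering
extra = unmodeled_variation_ps(10, 5)        # 10 buffers x 5 ps = 50 ps
print(margin - extra)                        # -20 ps: the path still fails on silicon
```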

The solution to this problem is to insert a lockup latch at the output of flop3, with the lockup latch having the same clock latency as flop3.

Figure 4


As the waveform in Figure 4 shows, inserting a lockup latch between flop3 and flop4 breaks the timing path into two stages:

1. From flop3 to the lockup latch
The hold check is 1-1, which is still a zero-cycle check but is much more relaxed and easy to meet because there is no skew. The default setup check is 1-2.

2. From the lockup latch to flop4
The hold check is 2-1. This is the major advantage of, and motivation for, inserting a lockup latch: the hold check is shifted half a cycle backwards, so even if the clock skew is as large as half the shift clock period, we have sufficient margin. This guarantees that there will be no hold violation in this case.

The setup check is 2-3. The latch is transparent during 2-3, and any data captured during this phase is transferred to flop4 by edge 3 (minus the setup time of the flop). The setup check from flop3 to the lockup latch can be relaxed as well: 1-2 is the default check, but since the latch is transparent for the whole half cycle, in the ideal case the setup check can shift toward edge 3. (This concept is called latch borrowing.)

Another important point is that the lockup latch should be clocked by the same clock as the launching flop, not the capturing flop. As we saw above, the hold check from flop3 to the latch is still 1-1 (a zero-cycle check); if the lockup latch shared the capturing flop's clock, that check would see the full skew and we would gain no advantage. Ideally, therefore, the launch flop and the lockup latch should be driven by the same clock buffer in the clock tree.
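The two-stage hold analysis, and the effect of clocking the latch from the wrong side, can be sketched numerically. The function below is a hypothetical model assuming an ideal 50% duty-cycle clock, the 50ns period and 5ns skew from the example, and made-up clk-to-Q, latch enable-to-Q, and hold values (one hold value is used for both the latch and flop4). It ignores setup and latch borrowing.

```python
def lockup_path_hold_slacks(period_ps, skew_ps, clk_to_q_ps, latch_en_to_q_ps, hold_ps,
                            latch_on_launch_clock=True):
    """Hold slacks for flop3 -> negative-level lockup latch -> flop4 (positive-edge shift clock).

    skew_ps is flop4's clock latency minus flop3's. The latch's latency equals flop3's when it
    shares the launch clock, or flop4's when it is (wrongly) placed on the capture clock.
    Returns (stage1_slack, stage2_slack) in ps.
    """
    half = period_ps // 2
    latch_lat = 0 if latch_on_launch_clock else skew_ps
    # Stage 1: flop3 launches on rising edge 1; the latch closes on that same edge (1-1, zero cycle).
    stage1 = clk_to_q_ps - (latch_lat + hold_ps)
    # Stage 2: the latch opens and launches on falling edge 2; flop4's hold is checked
    # against the previous rising edge 1 (2-1, half a cycle backwards).
    stage2 = (latch_lat + half + latch_en_to_q_ps) - (skew_ps + hold_ps)
    return stage1, stage2


print(lockup_path_hold_slacks(50_000, 5_000, 100, 80, 20))
# (80, 20060): both stages pass
print(lockup_path_hold_slacks(50_000, 5_000, 100, 80, 20, latch_on_launch_clock=False))
# (-4920, 25060): the skew lands on the zero-cycle stage-1 check
```

On the launch clock, stage 2 keeps passing until the skew approaches half the period; on the capture clock, the hold problem simply moves to the flop3-to-latch check.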

The above example shows that a latch is an effective way of fixing hold in scan shift paths. Some might argue that hold buffers or delay cells could fix these violations too. However, a quick comparison of the areas of hold buffers, delay cells and latches suggests that hold buffers are appropriate for fixing small hold violations, but once the violation is even moderately large, a latch has the advantage over buffers in both area and delay. With delay cells, there is always a risk of large variation from one operating condition to another, so these cells should be used selectively and carefully. A latch, on the other hand, always guarantees a half-cycle delay independent of operating conditions.

In the last section, we consider various cases to find the most suitable candidate for fixing hold when there is large clock skew between the launch and capture flops in a scan chain.

DIFFERENT CASES
Case 1: Between positive and positive edge triggered flops
We covered this case in the example above: a negative-level lockup latch can be used.

Case 2: Between negative and negative edge triggered flops
By the same reasoning, a positive-level lockup latch is the suitable candidate.

Case 3: Between negative and positive edge triggered flops
We know that hold is already quite relaxed here, since the hold check is half a cycle backwards. No lockup element is required.

Case 4: Between positive edge and negative edge triggered flops
This is a very interesting case. It is not a problem from a timing point of view, but it is an illegal connection for scan shifting. Since ATPG assumes a return-to-zero clock waveform (the clock rests low once shifting is complete), if we allow this type of crossing, we will find that after scan shifting every such positive-negative pair holds the same value after each clock pulse. This leads to a drop in test coverage because the flops are not independently controllable. Such a crossing should therefore be avoided during stitching, but sometimes it is unavoidable because of compression logic or hard macros.
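The coverage loss can be reproduced with a small sketch of return-to-zero shifting. The helper names are hypothetical; each shift pulse first updates all positive-edge flops, then all negative-edge flops, and the code enumerates every state reachable from an all-zero chain.

```python
from itertools import product


def shift_pulse(edges, values, scan_in):
    """Apply one return-to-zero shift pulse (clock 0 -> 1 -> 0) to a scan chain.

    edges[i] is "pos" or "neg" for element i; values[i] is its current Q.
    """
    # Rising edge: all positive-edge flops sample their D inputs simultaneously.
    d = [scan_in] + values[:-1]
    values = [d[i] if e == "pos" else values[i] for i, e in enumerate(edges)]
    # Falling edge: all negative-edge flops sample D, which may have just changed.
    d = [scan_in] + values[:-1]
    return [d[i] if e == "neg" else values[i] for i, e in enumerate(edges)]


def reachable_states(edges, shifts):
    """Every chain state that can be loaded from all zeros with `shifts` shift pulses."""
    states = set()
    for bits in product([0, 1], repeat=shifts):
        values = [0] * len(edges)
        for b in bits:
            values = shift_pulse(edges, values, b)
        states.add(tuple(values))
    return states


print(sorted(reachable_states(["pos", "pos"], 2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(reachable_states(["pos", "neg"], 2)))  # [(0, 0), (1, 1)]
```

With a positive-positive pair all four states are loadable; with the positive-negative pair only matching values are. Placing a positive-edge dummy flop between them, i.e. `["pos", "pos", "neg"]`, makes the first and last flops independently loadable again, with the dummy always matching the last one.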

We can insert a negative-level lockup latch between the positive and negative flops. This solves the ATPG problem but introduces a timing problem, because the hold checks from the positive flop to the lockup latch and from the latch to the negative-edge flop would both again be zero-cycle checks.

Another solution is to insert a dummy flop, working on either the positive or the negative edge of the clock, between these flops. Note that after shifting, the dummy flop will still hold the same value as the first or the second flop, depending on whether we make it positive- or negative-edge triggered. This causes no problem, because it is not a functional flop and is never used to capture data in any pattern. If we insert a positive-edge dummy flop, its clock latency should match that of the launch flop, because the launch-to-dummy hold check is a zero-cycle check while the dummy-to-capture check is a half-cycle hold check. Similarly, if we insert a negative-edge dummy flop, its latency should match that of the capture flop.
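The four cases can be summarized as a lookup table. This is a hypothetical sketch that assumes large skew between the launch and capture flops; for each case it records the element to insert and whose clock latency that element should match.

```python
# (launch edge, capture edge) -> (element to insert, flop whose clock latency it must match)
LOCKUP_RULES = {
    ("pos", "pos"): ("negative-level lockup latch", "launch"),  # Case 1
    ("neg", "neg"): ("positive-level lockup latch", "launch"),  # Case 2
    ("neg", "pos"): (None, None),                               # Case 3: hold already half a cycle back
    ("pos", "neg"): ("positive-edge dummy flop", "launch"),     # Case 4 (alternatively a negative-edge
                                                                # dummy flop matched to the capture flop)
}


def lockup_element(launch_edge, capture_edge):
    """Return (element, latency_reference) for a scan crossing with large skew."""
    return LOCKUP_RULES[(launch_edge, capture_edge)]


for pair in LOCKUP_RULES:
    print(pair, lockup_element(*pair))
```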

This covers all four cases that can exist between flops in a design, but sometimes these cases are not so obvious. One word of caution concerns scan stitching in a design with prestitched hard macros. Often the netlist, SPEF, and timing constraints for these hard macros are not available, so it is advisable to insert a lockup latch before them in our design, in case the hard macro owner has missed it. Another example is burn-in mode, where the scan chains of the design are concatenated in order to toggle all the flops at the same time. Here, too, the last element of one chain and the first element of the next may form a timing-critical path or an invalid positive-to-negative crossing. Such scenarios should ideally be handled in the RTL itself, because the designer knows the order of scan elements best when concatenating chains. If this is not done, it is good practice to insert an appropriate lockup latch at the end of each chain.

Using the above techniques and guidelines, a designer can ensure a robust scan structure in the chip. After a setup failure, the design can still operate at a lower frequency, but after a critical hold failure, the behavior of the logic is unpredictable. Hold failures in scan shift are therefore very critical and can cause a large loss of coverage during testing. We need a robust scan structure that addresses potential scan shift failures like those discussed above. An appropriate lockup element is an ideal solution, because it guarantees a half-cycle delay independent of operating conditions.

Freescale Semiconductor



