1
Original Acclaimed LMO500: Height 14”, Width 17.5”, Depth 18”, Weight 60 lbs
Unprecedented small-system AC & DC parametric accuracy and throughput
2
Add AtSpex’s FASTPINS Best in Class:
- FAST 100/200/400MHz Vector Rates
- DEEP up to 100+Million Vectors
- WIDE 768 Channels
3
The Fast Men Group!
System Design Mgr Rich Lee, CTO & Architect Tim Michels, GM Danny O'Neill
4
IT Randall Ader, CAD Layout / Mechanical Designer Jay Turkovsky and At-Spex SW Eng. Mgr Jim Perry (from left to right)
5
FAST Pin Count and Vector Depth Configuration Summary
6
Pin Mixing vs Pattern Rate
7
Vector Depth Calculations
  • Example 1 S100 Mode: 384 I/O’s populated w/ 512MB 72-Bit DIMMs per PEG


    • Each Quadrant has 96 pins per DIMM
    • 3 bits per pin times 96 pins = 288 bits per vector
    • 288 bits/vector divided by 72 bits/word = 4 words/vector
    • 512MB DIMM is actually comprised of nine 64Mx8 (512Mb) components, or
    • 9 x 64 MB = 576 MB = 604 million bytes (decimal)
    • 604 million bytes divided by 9 bytes/word = 67.11 million words
    • 67.11 million words divided by 4 words/vector = 16.78 million vectors


  • Example 2 S200P+: 256 I+O @ 200MHz w/ 1GB DIMM’s (128Mx72)
    • 2 bits per pin x 64 pins/PEG Quadrant = 128 bits/Quadrant, rounded up to 144 bits (2 words of 72 bits)
    • 1GB DIMM is actually comprised of nine 128Mx8 (1Gb) components = 1152 MB or 1208 million bytes (decimal)
    • 1208 million bytes divided by 9 bytes/word = 134.2 million words
    • 134.2 million words divided by 2 words/vector = 67.1M vectors
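The two depth examples above can be checked with a short script. This is a minimal sketch, not AtSpex code: the function name `vector_depth` and its parameters are made up for illustration, and it assumes the slide's figures (72-bit words, nine x8 components per DIMM, binary megabytes).

```python
import math

WORD_BITS = 72                     # DIMM word width used on the slide
BYTES_PER_WORD = WORD_BITS // 8    # 9 bytes per 72-bit word


def vector_depth(pins_per_quadrant, bits_per_pin, component_mbits, components=9):
    """Return (words_per_vector, vectors) for one quadrant's DIMM.

    component_mbits is the density of each x8 component in Mbit (512 or 1024).
    All names here are hypothetical helpers for the arithmetic on the slide.
    """
    bits_per_vector = pins_per_quadrant * bits_per_pin
    # A vector occupies a whole number of 72-bit words (zero filled).
    words_per_vector = math.ceil(bits_per_vector / WORD_BITS)
    # Nine x8 components: bytes = 9 * (Mbit / 8) * 2**20
    total_bytes = components * (component_mbits // 8) * 2**20
    words = total_bytes // BYTES_PER_WORD
    return words_per_vector, words // words_per_vector


if __name__ == "__main__":
    print(vector_depth(96, 3, 512))    # Example 1: S100 mode, 512MB DIMM
    print(vector_depth(64, 2, 1024))   # Example 2: S200P+, 1GB DIMM
```

Both results land on the slide's rounded figures: 16,777,216 vectors (16.78 million) and 67,108,864 vectors (67.1M).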



8
Pin Mixing-Pattern Rate MHz Calculations:
6 Words Per Vector vs 4 Words vs 2 Words
  • Rule: an entire PEG must be either all I/O's (96 max per PEG) or any combination of dedicated I's and O's (192 max per PEG)


  • Pin Mixing: if 1 PEG’s VMM were configured as 96 I/O's and the other 3 PEGs as 192 I's or O's, then the customer would have 672 total DUT signal pins, comprised of 96 I/O and 192x3 = 576 dedicated I's and O's


  • 6 words required as follows: 192 channels x 2 bits/channel in I+O mode = 384 bits
  • 384 bits divided by 72 bits/word = 5.33 words, rounded up to 6 words (zero fill)
  • At 400 Mega Words/sec, divide by 6 words per vector, hence about a 66 MHz pattern rate


  • In the case of I/O: 3 bits/pin per Vector
  • 384 I/O divided by 4 PEGs= 96 I/O channels per DIMM
  • 72 Bits/Word divided by 3 bits/channel= 24 channels/word per vector
  • therefore 96 channels divided by 24 Channels/Word= 4 words per Vector
  • In the S100 Mode, the 200 MHz DDR2 DIMM VMM delivers double-data rate at 400 Mega Words per sec, or four 72-bit words per 100 MHz vector cycle


  • In the + Speed Mode (200/267 MHz vector rate), only 2 words are available at 400/533 Mega Word transfer rate:
  • thus @ 2 Words x 72 bits/Word= 144 bits
  • and 144 Bits divided by 2 Bits/CH= 72 Channels per PEG. Round down to 64.  64 Channels x 4 PEGs = 256 I's or O's
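The words-per-vector and pattern-rate arithmetic on this slide reduces to one round-up and one division. The sketch below is illustrative only; `words_per_vector` and `pattern_rate_mhz` are hypothetical helper names, and the 72-bit word and 400 Mega Words/sec figures are taken from the bullets above.

```python
import math

WORD_BITS = 72  # bits per DIMM word, from the slide


def words_per_vector(channels, bits_per_channel):
    """Whole 72-bit words needed to hold one vector for one PEG."""
    return math.ceil(channels * bits_per_channel / WORD_BITS)


def pattern_rate_mhz(channels, bits_per_channel, mega_words_per_sec=400):
    """Pattern rate when each vector costs words_per_vector memory words."""
    return mega_words_per_sec / words_per_vector(channels, bits_per_channel)


if __name__ == "__main__":
    # 192 dedicated I's and O's at 2 bits each: 6 words per vector
    print(words_per_vector(192, 2), round(pattern_rate_mhz(192, 2), 1))
    # 96 I/O at 3 bits each: 4 words per vector (S100 mode)
    print(words_per_vector(96, 3), pattern_rate_mhz(96, 3))
    # 64 channels at 2 bits each: 2 words per vector (+ speed mode)
    print(words_per_vector(64, 2), pattern_rate_mhz(64, 2))
```

The three cases give roughly 66.7 MHz, 100 MHz and 200 MHz, which is the 6 vs 4 vs 2 words trade-off named in the slide title.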
9
Architecture: 192 channels per PEG, 4 PEGs per FAST (768 Pins Max)
each PEG is a Xilinx Leading Edge Virtex-4 90nm, 1148 pin flip-chip 35x35 1mm fBGA
10
PEG Quadrant Schematic Zoomed
11
DDR2 DIMM Controller
12
Sustained 267MHz/533 Mega Word Transfers/sec 
DDR2 Controller Zoomed (Visio)
13
Analog Switch Module:
One Precision pin per two FAST PINS
384:768
14
ASM Zoomed
15
Hi Freq Connector
16
Hi Freq Connector Zoomed
17
Seamless integration of 1 GHz bipolar CLKs with 500 MHz CMOS Pin Electronics (Visio)
via differential signalling, multi-level I/O
18
System Overview
19
FAST Block Diagram
20
PEG Block Diagram
21
PCI North and South Bridges
22
QUICK, DEEP
Pattern Loads
via
Ethernet MAC
DMA over PCI
North Bridge
23
DDR2 Functional Block Diagram
24
DDR2 Enabling Technology
25
DDR2 DIMM Controller (Visio)
26
Linux PCI North Bridge DMA Driver
27
Bipolar Quad 400MHz Clock Module (1/4 shown)
28
One Precision Pin per Two Fast Pins 384:768
29
Ethernet to the Test Head for
Deep Fast Pattern Loads
30
PEG Quadrant DUT-Facing Pin Electronics
31
DDR2 Waveforms
32
DDR2 240-pin DIMM Module Dimensions
33
Blue Thunder
34
Tranny Lines Rules
  • All DUT Lines Exactly 50 Ohms…
  • All DUT Lines Are Equal Length…
  • All DUT Lines Are Spread-out to minimize crosstalk...
  • Transmission line theory: if the round trip time of the line is less than the rise time, then you do not have a tranny
  • Example: 3 inch, 50 ohm line at 170 ps/inch = 510 ps one way and about 1 ns round trip, which equals a 1 ns edge, so it is a tranny
  • If the line were 2 inches, then 680 ps round trip and no tranny for a 1 ns edge


  • Rule of Thumb: output impedance of the part must be 1/10 of the line’s Zo


  • Stray capacitance wreaks havoc with tranny equations


  • To achieve 50 ohms, stray capacitance must be reduced to 5 pF


  • Each connector a signal travels through adds several pF (“puff”); each node it touches, such as relays and switches, adds some more


  • DUT line spacing rules to maintain 50 ohms mean more layers


  • Even a foot of good 50 ohm line adds 10 pF


  • To avoid stubs, snuggle the switches in close to isolate the existing system’s electronics
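The round-trip rule from this slide can be written as a two-line check. This is a sketch under the slide's own numbers (170 ps per inch, 1 ns edge); the function names are hypothetical and the rule is the rule of thumb stated above, not a full signal-integrity analysis.

```python
PS_PER_INCH = 170  # propagation delay per inch, figure taken from the slide


def round_trip_ps(length_in):
    """Out-and-back delay of a line of the given length in inches."""
    return 2 * length_in * PS_PER_INCH


def is_transmission_line(length_in, rise_time_ps):
    """Rule of thumb from the slide: the line behaves as a transmission
    line once its round trip time is no longer less than the rise time."""
    return round_trip_ps(length_in) >= rise_time_ps


if __name__ == "__main__":
    for inches in (2, 3):
        print(inches, round_trip_ps(inches), is_transmission_line(inches, 1000))
```

With a 1 ns edge the 3 inch line (1020 ps round trip) qualifies and the 2 inch line (680 ps) does not, matching the two examples on the slide.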



35
Fastest Small System Ever @ Speed and Measurement/Datalogging Thruput
  • LMO500 upgradable plug-in module (17” x 14”, 23 layer PCB) called FAST_PINS for at-speed testing:
  • 100/200/400MHz vector pattern rate modes and fmax shmooing in 1MHz increments
  • 768 signal pins
    • Reconfigurable, user-selectable ASIC I/O pin levels: all 3.3V, 2.5V, 1.8V, 1.5V and 1.2V single-ended and differential standards
  • Uses the existing 384 pin LMO Precision DC Parametric Measurement Unit (PMU), delivered thru a 1:2 transparent ASM (Analog Switch Module)
  • Also deploys 4 User DUT Power Supplies, -3V to +7V (1.5 A)
  • Thruput matters in production, of course, but also when debugging and datalogging while measuring many pins for IIH, IIL, IZH, IZL, VOH, VOL at VCC core and VCC I/O min and max corners


36
Precision DC Pin
37
 400 MHz Clock Module 5-4-3-2-1 !!
38
Edge 693 Drivers Bipolar 500MHz Clock Drivers
39
Bipolar Deskew Verniers
Every Precision Channel’s Driver & Comparator are Aligned to less than 100ps error during calibration
40
Competitive
41
Paradigm Shift
42
Appropriate Mix of CMOS & Bipolar ECL
43
Teradyne J750 versus LMO500 FAST_PINS
44
Competitive Matrix
45
Catalyst vs. Quartet
46
Long in the Tooth
  • 50/100MHz die shrink to 200 MHz abandoned in 2001
  • 70-90 MHz Voltage Comparator Problems
  • 512 pins approx $500K, AtSpex 768 $100-200K
  • Accuracy
  • AtSpex: 90 nm PEG (Pin Electronics Gate Array) for emerging
    • - Scale to 65nm in 2006
  • Added low end MSO
  • Precision LMO Pin is Bipolar, AC parametrics 10 ps resolution, per-pin calibration deskews EPA to < 100ps (Edge Placement Accuracy)
  • Per-Pin CMOS 80 ps resolution
  • Requires
47
Competition
48
Nextest
49
Added Low-end Mixed Signal Option 9-17-01 20MHz & over 20KHz 104 dB SNR
  • The Enhanced Digital Channel Board improves J750 edge placement accuracy from 500 ps to 325 ps, enabling customers to improve yields, increase device specifications, and achieve significant savings in manufacturing.
  • Semiconductors continue to move toward higher data rates for an ever-widening array of devices. As chip speeds increase, so does the need for accuracy in determining propagation delays and other measurements.
  • Test a wider range of devices
  • The improved edge placement of the Enhanced Digital Channel Board expands the range of the J750 to test a wider array of devices than ever before. It's the ideal tool for testing microcontrollers, baseband controllers, and FPGAs of up to 100 MHz.
  • Easy to upgrade and use
  • Current J750 users can extend capability and accuracy of their existing systems at a low cost. The Enhanced Digital Channel Board functions like the standard board - users will not notice any difference, other than the improved accuracy. New customers can take advantage of the many improvements made to the J750 including increased accuracy with the Enhanced Digital Channel Board.
50
Teradyne J750
51
Agilent 93000
52
AtSpex Pricing Strategy
  • Competitive strategy is to drop the price from $2,300 to $1,700 on 100MHz model as they introduce the 200 MHz model.
  • AtSpex base price for 384 maximum pins is $145,000 or $377 per pin.
  • FAST_PINS upgrade adds $75,000 - $100,000 across 384 pins, which at $75,000 brings the total to $573 per pin.


53
Last time it was 200MHz in 131 per-pin, 7.5W ECL Gate Arrays, this time it’s Not Your Grandmother’s FPGA’s! : 90 nm, 1148 pin Flip Chip, striped I/O fBGA and 1GB 400MB/s DDR3200’s & 533MB/s DDR2 DIMM 4300’s
54
LMA 750
55
 
56
 
57
"The 768-200 architecturally adds fmax..."
  • The 768-200 architecturally adds fmax 100/200/400 MHz Functional Vector Data Rate all pins can be clock or data
  • Speed.  Pattern Rates to 267 MHz BEST IN CLASS


  • Deep.  Vector Depth to 100M BEST IN CLASS


  • Wide.  768 tester channels: split-pins 384 Drivers, 384 Receivers


  • Quick. Vector Loads enabled by Ethernet-to-the-Tester PCI DMA’ing over the NorthBridge directly to memory


  • Precision: Standard Pin AC parametrics: 100 ps Driver/Comparator edge placement accuracy; DC parametrics: 12 bit resolution, 1/4 LSB max linearity error, calibrated to a NIST-traceable 5 ½ digit DVM


  • Fast.  Mainframe-competitive AC and DC parametric test times, e.g. a 100 pin DUT w/ 25,000 vectors containing 4,000 transitions: 4 sec total to measure and datalog to a file.  Time search algorithm averages 11 iterations to acquire an edge to 10 ps resolution.


  • User tools: IEEE 1149 JTAG SCAN Programming Module, MS Windows user interface, Windows WinCHARacterization per-pin digitizing scope, per-pin Curve tracer; Self-test auto-verifies and datalogs all channels for functional Drive/Compare and AC/DC source and measure


  • And of course, Small: not only the world’s only known 100MHz benchtop IC tester, but the only 200/400MHz one.  Approx 15 inches cubed and 500W at 120V, auto-cooling fans to 37 degrees C.  Previous 200MHz CMOS architecture was in 4 square meters and 16,000 W.


  • Microprogrammable FPGA-based tester hardware now includes the vector processor unit (VPU), pin formatters, error logic and pipelines and DUT I/O.  AtSpex IP includes PCI DMA Controller thru the N Bridge, 533 MB/s DDR2 1GB Gate Array PEG, and embedded-Linux DMA2PEG device driver.


  • Like a space satellite whose entire mission needs to be changed in mid-flight, the Pin Electronics Gate Array is reconfigurable via the test program loading a new image file. Examples: testing all the ASIC I/O standards a device is capable of, or configuring the VPU into an Algorithmic Pattern Generator for embedded memory test
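The "11 iterations to 10 ps" figure in the Fast bullet is what a plain binary search would give over a window of about 20 ns, since 2**11 x 10 ps = 20.48 ns. The slides do not say which search the tester actually uses, so the sketch below is only an assumption for illustration; `find_edge_ps` and the `passes_at` callback are hypothetical names.

```python
def find_edge_ps(passes_at, lo_ps, hi_ps, resolution_ps=10):
    """Binary search for the strobe time where the comparator result flips.

    passes_at(t) must be False before the edge and True at or after it,
    with passes_at(lo_ps) False and passes_at(hi_ps) True on entry.
    Returns (edge estimate in ps, number of iterations used).
    """
    iterations = 0
    while hi_ps - lo_ps > resolution_ps:
        mid = (lo_ps + hi_ps) // 2
        if passes_at(mid):
            hi_ps = mid      # edge is at or before mid
        else:
            lo_ps = mid      # edge is after mid
        iterations += 1
    return hi_ps, iterations


if __name__ == "__main__":
    # Simulated DUT edge at 7333 ps inside a 20.48 ns search window.
    edge, n = find_edge_ps(lambda t: t >= 7333, 0, 20480)
    print(edge, n)
```

In this simulated case the search settles within 10 ps of the edge after 11 halvings of the window.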


58
 
59
100Mb/1Gig Ethernet to the Test Head
60
 
61
 
62
 
63
 
64
Vector Memory Module DDR Controller
65
Write Enables for the RDF’s
66
 
67
Rx Frame Timing
68
FIFO
69
 
70
 
71
64 Meg x8 or 512Mb DDR2 component w/ ODT (on-die termination)
72
Virtex-4 Data Sheet
DC I/O Specifications
73
 
74
 
75
Big Picture
76
"The 768-200 architecturally adds fmax..."
  • The 768-200 architecturally adds fmax AC Functional Vector Data Rate across all channels, Vector Depth and increased Pin Count as well as providing the original features of the acclaimed LMO500:


  • Fast.  Vector rates to 100/200/400MHz
  • Deep.  Up to 100+M vector pattern depth
  • Wide.  768 tester channels
  • Quick.  Deep Vector loads via Ethernet PCI DMA
  • And of course, Small: not only the world’s only known 100MHz benchtop IC tester, but the only 200/400MHz one.  Approx 500W at 120V, auto-cooling fans to 37 degrees C


  • Precision: Standard Pin AC parametrics: 100 ps Driver/Comparator EPA (edge placement accuracy); DC parametrics: 12 bit resolution, 1/4 LSB max linearity error, calibrated to a NIST-traceable 5 ½ digit DVM


  • Fast.  Mainframe-competitive AC and DC parametric test times, e.g. a 100 pin DUT w/ 25,000 vectors containing 4,000 transitions: 4 sec total to measure and datalog to a file.  Time search algorithm averages 11 iterations to acquire an edge to 10 ps resolution.


  • Modern enabling technologies: 90 nm 1152 pin fBGA FPGA’s w/ state of the art clock management DCMtm/DLL’s; 400/533 MB/s DDR2 PC4300 DIMMs; GB file transfer via Gigabit Ethernet; PCI DMA IP, Embedded Linux


  • User tools: IEEE 1149 JTAG SCAN Programming Module, MS Windows user interface, Windows WinCHARacterization per-pin digitizing scope, per-pin Curve tracer; Self-test auto-verifies and datalogs all channels for functional Drive/Compare and AC/DC source and measure


  • Microprogrammable FPGA-based tester hardware now includes the vector processor unit (VPU), pin formatters, error logic and pipelines and DUT I/O.  AtSpex IP includes PCI DMA Controller thru the N Bridge, 400-533 MB/s DDR Controller, 200/400MHz Vector Processor and Pin Electronics Gate Array (PEG), and embedded-Linux DMA2PEG device driver.  Like a space satellite whose entire mission needs to be changed in mid-flight, the Pin Electronics Gate Array is reconfigurable via the test program loading a new image file. Examples: testing all the ASIC I/O standards a device is capable of, or configuring the VPU into an Algorithmic Pattern Generator for embedded memory test
  • tm trademark of Xilinx
77
Linux SBC
78
CMOS Feature Geometry
79
China
80
Japan
81
Your Author
82
 
83
Virtex-4 IDELAY Enabling Technology: Per-Pin IDELAY to Auto-Deskew each Quadrant’s 267MHz DDR2 72-bit data lines with respect to the center-aligned DQS: 64 taps, 80 ps resolution
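As a rough illustration of what a 64 tap, 80 ps delay line buys per pin, the sketch below quantizes a measured skew to the nearest tap and reports what is left over. It models only the arithmetic; the function names are hypothetical and nothing here represents the Xilinx primitive's interface or the AtSpex calibration routine.

```python
TAPS = 64     # tap count quoted on the slide
TAP_PS = 80   # tap resolution quoted on the slide


def idelay_tap(skew_ps):
    """Nearest tap setting (0..63) for a requested delay in ps."""
    tap = round(skew_ps / TAP_PS)
    return max(0, min(TAPS - 1, tap))   # clamp to the available taps


def residual_ps(skew_ps):
    """Skew left uncorrected after applying the chosen tap."""
    return skew_ps - idelay_tap(skew_ps) * TAP_PS


if __name__ == "__main__":
    print(idelay_tap(1010), residual_ps(1010))
    print((TAPS - 1) * TAP_PS)   # largest delay the line can insert, in ps
```

Within the adjustable range the leftover error stays within half a tap (40 ps), and the line tops out at 63 x 80 ps = 5040 ps.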
84
3D Rendering
85
Windows GUI Interface
86
AC Datalogging
87
Fast Config
88
Shmoo and DC Accuracy Spec
89
Algorithmic Pattern Generation
90
DDR2 Read Timing Margin
91
DDR Device Test Vectors
92
 
93
Linux PCI Northbridge DMA Driver: 10/100/1000 Ethernet-to-the-Testhead for DEEP WIDE FAST VECTOR Loads
94
Striped I/O Flip-Chip
95
 
96
Universal 1 mm fine pitch fBGA DUT Card
97
CMOS Enabling Technology: Clock Manager Deskew Adjust
98
SOC Testing
99
Precision DC Parametric I-Force V-Measure and V-Force I-Measure
100
System Power Distribution and Twisted Pair for Bipolar Differential Signalling Distribution
101
DC Parametric
102
 240 pin Unbuffered DDR2 DIMM Power-up Init
103
512 MB 240-pin DDR2 Unbuffered UDIMM IDD Spec
104
Multiple clock domains
105
DDR2 DIMM DC Test Conditions
106
DDR-2 DC Operating Conditions (SSTL_1.8)
107
DDR-2 AC DC Test Conditions
108
CKT_3 Enable Timing Diagram
109
Ex Verilog Test CKT_3: Clocked I/O schematic
110
 
111
Virtex4 Power Supply Test Conditions
112
DC Power Supply Test Source for Virtex 4 LX60ff 1148 pin fBGA
113
DDR2 Component DC’s
114
500MHz CMOS Digital Clock Manager DCM

  • 1 MHz to 500 MHz range
  • 20 ps phase precision
  • Jitter < 10% of clock period
  • Duty cycle distortion < 5% of clock period


  • Zero delay buffer
  • Frequency synthesis
  • 90-degree shifted output clocks
  • Phase Shift control
    • Specify clock period fraction
    • Directly control delay line tap
  • Dynamic Reconfiguration Port (DRP)
    • Adjust Multiply/Divide and Phase Shift values without reconfiguring device
      • DCM reset required for M/D change




115
DDR2 Read Analysis @ 267 MHz
116
Flip Chip
  • Uses solder bump connections across the chip’s surface
  • Traditional I/O pads are arranged around the perimeter, with the chip’s pads wirebonded to the package
  • Flip chip eliminates the need for a pad ring, which considerably reduces chip area
  • Allows the I/O pads to be optimally placed in the middle of the configurable logic array, which shortens the signal paths and reduces L and C
  • ASMBL tm –Striped I/O
  • 11 levels of Cu metallization provide better signal routing and configurability
117
 
118
Paradigm Shift: FAST Pin Drivers implemented in FPGA SELECT I/O
119
Powerful but Hot- Way Too hot for the End User

  • 973 200/400MHz (multiplexed) CMOS and Bipolar → J2 400/800 MHz SiGe, cancelled during the downturn of 2002
    • 973 43 KiloWatts

  • Hi-end SOC Testers Mix of Bipolar and CMOS: Example  LTX Fusion HFi
    • 4 Digital channels comprised of a CMOS ASIC, 2 GaAs timing chips, 2 Dual Channel DR/CMP




  • Cancelled Credence ValStar (2001) was 1st CMOS 200MHz machine
    • 7 Watts/channel had to be H2O cooled



  • Our Bipolar ECLinPS LMA750 200MHz 256 pins
    • 131 pin ECL Gate Array per pin Timing Controller  7.5 Watts/pin
    • Edge Semi or Analog Devices DR/CMP/LD: 5 Watts/pin
    • 84 pins took 190 cfm fan- temp of system 70 degrees C



120
"DUT facing Pin Drivers/Rec’vrs"
  • DUT facing Pin Drivers/Rec’vrs:
    • 192 @ 100MHz (96 I/O) VCCO = 3.3, 2.5, 1.8, 1.5


  •  PCI pins:
    • 50 @ 33MHz VCCOPCI3_




  •  Test Vector DIMM


  • DUT Misc Pins: 7 enable loop, clock loop, local loop


  • FPGA Sync 24 position run en, fail count




121
240 pin 1GB Un-buffered DIMM Test Pin Requirement:

  • 200/267 MHz I/O Pins Required
    • DQ’s: 72
    • 8 differential pairs of bidirectional Data Strobes (8 CLK’s and 8 Klunks): 16
    • Subtotal: 88 FAST I/O
  • Dedicated Input Pins
    • Command Bus (/CS, /RAS, /CAS, /WE): 4
    • 3 differential CLK pairs (3 CLK’s and 3 Klunks): 6
    • Nine Write Data Masks, 1 per byte: 9
    • Clock Enable: 1
    • Multiplexed Address Bus: 14
    • Subtotal: 34 FAST IN’s
  • Specialty I/O Pins
    • SDA: 1
    • Check Bits: 8
    • Subtotal: 9 I/O’s
  • Specialty In Pins
    • ODT, SCL, SA: 5 I’s
  • Total: 97 I/O’s and 39 I’s
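The pin budget above is a simple tally, and a few lines of code make the subtotals easy to re-check when the DIMM type changes. The counts are copied from the slide; the dictionary layout and the `tally` function are only an illustrative sketch.

```python
# Pin counts copied from the slide; grouping names are illustrative only.
FAST_IO = {"DQ": 72, "data strobes (8 differential pairs)": 16}
FAST_IN = {
    "command bus (/CS, /RAS, /CAS, /WE)": 4,
    "CLK (3 differential pairs)": 6,
    "write data masks": 9,
    "clock enable": 1,
    "multiplexed address bus": 14,
}
SPECIALTY_IO = {"SDA": 1, "check bits": 8}
SPECIALTY_IN = {"ODT, SCL, SA": 5}


def tally():
    """Return (total bidirectional pins, total input-only pins)."""
    io = sum(FAST_IO.values()) + sum(SPECIALTY_IO.values())
    ins = sum(FAST_IN.values()) + sum(SPECIALTY_IN.values())
    return io, ins


if __name__ == "__main__":
    print(sum(FAST_IO.values()), sum(FAST_IN.values()))
    print(tally())
```

The subtotals come out to 88 FAST I/O and 34 FAST inputs, and the grand totals to 97 I/O and 39 inputs, in agreement with the slide.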


122
"("
  • (

        input  [2:0]  CLK,
        input  [2:0]  CLK_N,
        input         CKE,
        input         CS_N,
        input         RAS_N,
        input         CAS_N,
        input         WE_N,
        inout  [8:0]  DM_RDQS,
        input  [1:0]  BA,
        input  [13:0] ADDR,
        inout  [71:0] DQ,
        inout  [8:0]  DQS,
        inout  [8:0]  DQS_N,
        output [8:0]  RDQS_N,
        input         ODT
    );


123
DDR2 DIMM Test Program Development from the Verilog Model
124
Training FIFO
125
Training FIFO V-code
126
 
127
Our 90’s 100/200 MHz Bipolar machine
 7.5W per pin Timing/format, 3 W per PEC Drive/Cmp/Load
128
2 channels per PEC- one on each side- 6 Watts, 128 of these arranged radially in a 256 pin head
129
 
130
 
131
Appropriate Mix of CMOS & Bipolar ECL
132
CMOS Delay Sensitivities Erode the Data Eye
  • Propagation delay (tpd) is a function of frequency and can vary by several nanoseconds (ns) - killer.
    •  TPD HL versus TPD LH variance up to several ns due to PMOS/NMOS totem pole mismatch.
    •  Poor pulse width control, so bad as a clock generator.
    •  Pin to pin skew up to several ns, and this is an improvement over last year!
    •  Power dissipated only when switching: Voltage squared x frequency x C Load.
    •  drift in tpd 80 picoseconds (ps) per degree Celsius.
    •  LMO creates a convection oven w/ thermal 36 C.


    • CMOS: stabilization of device junction temperature ensures consistent edge placement
    • Bipolar: uniform power density, low temperature coefficient
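Two of the bullets above are formulas in disguise: switching power is voltage squared times frequency times load capacitance, and tpd drifts 80 ps per degree Celsius. The sketch below just evaluates them; the function names and the example operating point (3.3 V, 100 MHz, 10 pF, 5 degree swing) are assumptions chosen for illustration, not figures from the deck.

```python
def dynamic_power_w(vdd_v, freq_hz, c_load_f):
    """CMOS switching power per the slide: V squared x frequency x C load."""
    return vdd_v ** 2 * freq_hz * c_load_f


def tpd_drift_ps(delta_t_c, ps_per_deg_c=80):
    """Propagation delay drift for a temperature change, at 80 ps per deg C."""
    return delta_t_c * ps_per_deg_c


if __name__ == "__main__":
    # Hypothetical operating point: 3.3 V swing, 100 MHz, 10 pF load.
    print(dynamic_power_w(3.3, 100e6, 10e-12))
    # Hypothetical 5 deg C change in junction temperature.
    print(tpd_drift_ps(5))
```

Under these assumed numbers a pin burns about 11 mW, and a 5 degree swing moves an edge by 400 ps, which is why the slide stresses holding junction temperature steady.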

133
"Liquid cooling"
  • Liquid cooling
    • Plumbing, pumps, heat exchange units
    • Each PCB has cold plates that physically contact the hottest and most critical devices, plus heat sinks and thermal grease
134
 
135
 
136
FAST Orcad PCB Layout 17” x14” actual- 20 layers
137
 
138
 
139
 
140
Let’s Build a user-affordable versatile
  • Build the At-Speed w/ 90 nm CMOS
    • Differential LVPECL signalling from Bipolar programmable frequency synthesizer
    • Near perfect analog switches to isolate the Precision AC/DC
    • Zero ON impedance, <5 pf
    • 768 100/200/400 MHz FAST PINS
    • 80 ps resolution, 64-tap PVT-impervious per-pin programmable deskew
    • At 100+MHz it's all about preserving the data eye and centering locked clocks



  • Build the Precision AC/DC w/ mature Linear and off-the-shelf Bipolar Analog (Timing Generators, VComp) & Digital (ECLinPS)
    • Per-pin deskew at 10 ps resolution; calibration of each Driver (CMOS, programmable from -0.5V to +5.5V) and Comparator (linear w/ ECL digital out, < 50 ps dispersion) across 384 I/O to less than 100 ps EPA
    • ATE SOC component makers Brooktree/Edge and Vitesse (GaAs) both exited the market, leaving ATE high and dry




141
What You Don’t Want to do in Windows:
 Deep, Fast Pattern Loads
  • User Interface- simply a broadband connection
    • Implemented via an embedded Single Board Computer (SBC) with an Ethernet NIC MAC (Media Access Controller) to handle the IP Stack
    • AtSpex Software
      • FAST
        • Linux PCI Ethernet DMA Driver for Deep, Fast Pattern Loads
        • Remote Command CLI
      • Precision- What You Do Want to do in Windows
        • Windows GUI for tester operations, AC/ DC programming
  • What You Don’t Want to do in Windows:
    • Store and Edit Deep, Fast Pattern Loads
    • Example:
      • 288 bits per PEG x 4 PEGs = 1152 bits for 768 channels = 144 Bytes per vector
      • Now 10 Million vectors times 144 Bytes per vector = 1440 MB or 1.4 GB
      • Broadband to the test head: 100Mb/sec Ethernet tuned to PCI DMA 33MHz/32 bit
        • Must move across the PCI Bus twice: 1st from customer network to SBC and 2nd from SBC memory to the Deep Pattern Burst DDR DIMM’s. Assume a 10 MB/sec vector load rate after overhead, so 1.4 GB divided by 10 MB per sec
        • WORST CASE: ALL PINS and 10 Million patterns
          • 1.4 x10**9/10**7 = 1.4 x10**2 sec; 140 sec divided by 60 sec/min = about 2.3 min
        • TYPICAL CASE: 200 PINS (96 I/O at 3 bits per pin and 96 I+O) and 100,000 patterns, so 2 PEGs x 288 bits per PEG or 576 bits = 72 Bytes per vector
          • Now 100,000 times 72 Bytes per vector = 7.2 x10**6 bytes divided by 10**7 = .72 secs
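The load-time estimate is bytes per vector times vector count, divided by the sustained transfer rate. The sketch below assumes the slide's figures (288 bits per PEG, an effective 10**7 bytes per second after overhead); the function names are hypothetical.

```python
def bytes_per_vector(pegs, bits_per_peg=288):
    """Pattern bytes stored per vector across the PEGs in use."""
    return pegs * bits_per_peg // 8


def load_time_sec(vectors, pegs, bytes_per_sec=10**7):
    """Time to push a pattern to the test head at the assumed load rate."""
    return vectors * bytes_per_vector(pegs) / bytes_per_sec


if __name__ == "__main__":
    # Worst case: all 4 PEGs, 10 million vectors.
    print(bytes_per_vector(4), load_time_sec(10_000_000, 4))
    # Typical case: 2 PEGs, 100,000 vectors.
    print(bytes_per_vector(2), load_time_sec(100_000, 2))
```

Without rounding 1.44 GB down, the worst case works out to 144 seconds (about 2.4 minutes) and the typical case to 0.72 seconds.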



142
 
143
Broadband to the Test Head via Linux SBC
144
Multiple Clock Domains
145
Vector Addressing
146
Pi Network 50 Ohms
147
Realizable 450MHz FIFO
148