Long-Wire Data Acquisition Specification

© 2002-2016 Kevan Hashemi, Brandeis University HEP Electronics Shop

Contents

Introduction
Architecture
Cables
Devices
Multiplexers
Repeaters
Drivers
Servers
Software
Transmit Signals
Receive Signals
Direct Clocking
Power Supplies
Device Types
Command Bits
Address Bits
Element Numbers
Driver Jobs
TCPIP Messages
version_read
byte_write
byte_read
stream_read
byte_poll
login
config_read
config_write
stream_delete
echo
Design
Problems
Reset Failure
Cold Start
Mask Burn-Out
Incorrect Pull-Up

Introduction

The name LWDAQ stands for Long-Wire Data Acquisition, and refers to the length of the cables that connect LWDAQ devices, LWDAQ multiplexers, LWDAQ Repeaters, and LWDAQ drivers. These cables can be up to 130 m long. They supply power and control signals from the drivers to the devices, and carry analog and digital signals from the devices to the drivers. A LWDAQ makes only one measurement at a time. Large LWDAQ systems perform thousands of measurements in sequence, and take thousands of seconds to complete their entire data acquisition cycle.

All LWDAQ cables are interchangeable. Each cable is a network cable, very similar or identical to a CAT-5 cable, that contains eight wires. All eight wires take part in the connection. Four carry analog and digital power. The remaining four make up two twisted pairs. One pair carries digital communication from the driver. The other pair carries digital or analog communication from the device. Both the digital and analog signals are LVDS (low-voltage differential signals).

The LWDAQ transmits sixteen-bit addresses to its multiplexers and sixteen-bit commands to its devices. Each sixteen-bit transmission takes 4 μs. Some devices turn on lasers or LEDs when they receive the correct commands. Some return analog voltages that the driver digitizes and records. Image-capture devices allow the driver to read out pixel voltage levels directly, and digitize them with correlated double-sampling. Devices that produce digital data instead of analog data transmit their data to the driver at the driver's command one byte at a time. Data transfer in this way takes place at roughly 1 MByte/s.

The LWDAQ performs correlated double-sampling at up to 2 MSPS over a 130-m cable by transmitting a 2 MHz pixel clock from the driver to the device. The device returns a new pedestal voltage on the HI portion of the clock and a new sample voltage on the LO portion. The driver digitizes the difference between the sample and the pedestal and stores the eight-bit result in its own memory. Each sample takes 500 ns, but the time taken for the clock signal to travel to the device along a 100-m cable, plus the time taken for the analog signal to return over the same cable to the driver, is 1000 ns. The driver synchronizes its digitization with the returned analog signal by measuring the loop time of the cable before it begins clocking the analog signal.

All contemporary LWDAQ systems use some form of LWDAQ server. The server makes the LWDAQ available over TCPIP using the LWDAQ message protocol. We provide software that communicates over TCPIP with LWDAQ servers, and we invite you to download this software from our website. Other users have used Visual C++ and LabView to communicate with LWDAQ servers.

This specification may not be exactly the document you need to answer your questions about the LWDAQ. The following table lists several other likely documents.

Document                  Description
User Manual               How to use the LWDAQ hardware and software.
Optical Alignment System  Installation of a large LWDAQ system.
Cable Manual              Performance and construction of LWDAQ cables.
Driver Manual             Manual and data sheet for the A2037 LWDAQ driver.
Driver Manual             Manual and data sheet for the A2071 LWDAQ driver.
Software                  LWDAQ software download page.
Table: LWDAQ Documents

If you are a user of the LWDAQ, your best place to start is the LWDAQ User Manual.

Architecture

The LWDAQ protocol supports the architecture shown below. LWDAQ drivers connect to repeaters, multiplexers, devices and sensors. Sensors connect to devices. We use the term "sensor" to include not only sensors, but also light sources, actuators, and any other outputs a device might have. Devices are slaves of the driver. Communication between the driver and its devices can pass through repeaters and multiplexers. Repeaters restore outgoing control signals mid-way along lengthy cables. Repeaters also allow the driver to switch off power on the other side of the repeater. Multiplexers are simple switches that direct communication to one of several devices. The following figure shows the various ways in which LWDAQ components can be connected together.


Figure: LWDAQ architecture. Connection to the outside world is via TCPIP.

As you can see from the diagram, a single device can control multiple sensors, light sources, actuators, and electrical outputs. We refer to all device inputs and outputs as sensors.

Example: The Resistive Sensor Head (A2053) is a LWDAQ device that provides eleven resistive sensor connections. The A2053A allows us to connect eleven 1000-Ω RTDs. The LWDAQ reads the sensors out sequentially with precision 0.02 mK. The twisted-pair wires to the sensors can be several meters long.

Example: The Polar BCAM Head (A2051) is a LWDAQ device that controls four lasers and reads out two TC255P image sensors. The lasers and image sensors are arranged on two auxiliary boards connected to the A2051 by flex cables.

The LWDAQ connects to the outside world over a TCPIP network, such as the Internet or a Local Area Network. We use our LWDAQ Software to control our LWDAQ systems, but some users write their own control software. One user has written a LabView interface.

All LWDAQ cables are interchangeable. Two wires carry LVDS serial transmissions from the driver, two carry acquired data transmissions from the device, and four carry power. The driver can transmit sixteen-bit address words, or sixteen-bit command words. Multiplexers respond to address words, and devices respond to command words. A single TCPIP socket might give access to only a few devices, or it might give access to thousands.

Example: The LWDAQ Driver with Ethernet Interface (A2037E), first produced in 2003, has one TCPIP connection and eight driver sockets. The data download speed from the driver is 160 kBytes/s over 10-Base-T ethernet.

Example: The LWDAQ Driver with Ethernet Interface (A2071E), first produced in 2011, also has one TCPIP connection and eight driver sockets. Data download speed is 1.4 MBytes/s over 100-Base-T ethernet.

Example: The TCPIP-VME Interface (A2064F) sits in a VME crate and provides a single TCPIP connection. The A2064 gives access to as many LWDAQ Driver with VME Interface (A2037A) as you can fit in the crate. In ATLAS, our crates each contain 20 A2037As, each with eight driver sockets, so that each TCPIP socket controls 160 driver sockets. Each driver socket connects to a LWDAQ Multiplexer (A2046) at the end of a cable over 100 m long, and each multiplexer connects to up to 10 devices. That makes 1600 devices under the control of a single TCPIP socket. The average device contains a few sensors and light sources, so we have around 10,000 sensors-actuators under the control of the TCPIP socket. Image data download speed from the TCPIP-VME Interface is 340 kBytes/s over 100-Base-T.
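The fan-out arithmetic in the ATLAS example can be checked with a few lines of Python. This is only an illustrative sketch: the function name fan_out and the constant names are our own, and the figures are the ones quoted in the example above.

```python
# Fan-out of one TCPIP socket, using the figures quoted in the ATLAS example.
DRIVERS_PER_CRATE = 20        # A2037A drivers in one VME crate
SOCKETS_PER_DRIVER = 8        # driver sockets on each A2037A
DEVICES_PER_MULTIPLEXER = 10  # branch sockets on each A2046 multiplexer


def fan_out(drivers, sockets_per_driver, devices_per_multiplexer):
    """Return (driver sockets, devices) reachable through one TCPIP socket.

    Assumes one multiplexer on every driver socket, as in the example.
    """
    driver_sockets = drivers * sockets_per_driver
    devices = driver_sockets * devices_per_multiplexer
    return driver_sockets, devices


driver_sockets, devices = fan_out(
    DRIVERS_PER_CRATE, SOCKETS_PER_DRIVER, DEVICES_PER_MULTIPLEXER
)
print(driver_sockets, "driver sockets,", devices, "devices")
```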

All the cables and sockets in LWDAQ systems have names.


Figure: LWDAQ cable and socket names. Cable-mounting plugs are named after the sockets with which they mate.

Any number of multiplexers and devices may be connected to a driver, but only fifteen devices may be connected to a single multiplexer, and we cannot connect a multiplexer to a multiplexer. The figure above defines the names of LWDAQ cables and sockets. We name plugs after the sockets with which they mate. When we insert a repeater in a root cable, the cables on either side of the repeater are both called root cables. The one between the repeater and the driver is the upstream root cable, and the other is the downstream root cable.

Cables

All LWDAQ cables are category-five (CAT-5) cables, all plugs are 8-way modular plugs, and all sockets are 8-way modular jacks (RJ-45). The LWDAQ guarantees against ground loops in the data acquisition cables, enclosures, and multiplexers. In order to make this guarantee, some sockets must be shielded, and others must be unshielded.

Rule: All LWDAQ driver and branch sockets must be unshielded, and all root and device sockets must be shielded.

The cables and plugs may be shielded or unshielded, as the system designer sees fit.

Rule: A device, multiplexer, or repeater enclosure may connect to the local circuit's zero-volt potential through a resistance ≥ 1 kΩ and a capacitor ≤ 1 μF.

Recommendation: Connect the shield of a device or repeater socket to the device zero-volt potential with a 10-nF capacitor. Connect the shield of a multiplexer socket to the multiplexer zero-volt potential with a 1 μF capacitor.

Rule: A LWDAQ receiver circuit must operate without error across 100 m of solid-core CAT-5 cable.

Rule: The receiver circuit on a LWDAQ multiplexer or LWDAQ device must operate without error across 10 m of stranded-core CAT-5 cable.

We specify solid-core CAT-5 in the first of the two rules above because the dispersion and resistance specifications for CAT-5 cable are stricter for solid-core cables than they are for stranded-core cables. Accordingly, we have a relaxed specification for stranded-core cables. We find that the LWDAQ functions perfectly with a fully-loaded ten-slot multiplexer at the end of a 130-m shielded, solid-wire cable. The devices begin to fail at around 150 m, because the control signals transmitted by the driver are not received properly by the multiplexers and devices. Stranded cables, on the other hand, must be far shorter. With a single device on the end of a stranded cable, we can operate reliably up to a length of 13 m. We begin to see occasional failures at 15 m, and frequent failures at 20 m.

Repeaters allow you to extend the range of the LWDAQ by restoring the outgoing logic signal from the driver at some point along the root cable before the signal has degraded significantly. If we want the length of cable from the driver to the multiplexer to be 200 m, the best place to put the repeater is 100 m from the driver.

Devices, repeaters, and multiplexers may be mounted in metal enclosures, with shielded connectors making direct contact with the enclosures through their shields. To avoid ground loops, there can be no low-impedance low-frequency connection between a cable shield and the zero-volt supply on either a multiplexer or a device.

For more information about cables, see the Cable-Making Manual. There you will find a graph showing data acquisition quality with total cable length. When a driver clocks analog data out of a device, it synchronizes its digitization of the signal returned from the device by first measuring the round-trip propagation delay to the device. We call this propagation delay the loop time, and it increases with cable length by 10 ns/m, which is 5 ns/m in each direction.

Pin  Signal  Wire Color             Description
1    T+      Brown                  LVDS Transmit Positive from Driver
2    T−      Brown and White        LVDS Transmit Negative from Driver
3    R+      Orange                 LVDS Receive Positive from Device
4    R−      Orange and White       LVDS Receive Negative from Device
5    +5V     Green                  5-V Power
6    0V      Green and White        0-V Power
7    +15V    Blue                   +15-V Power
8    −15V    Blue and White         −15-V Power
S    Shield  Coaxial Foil or Braid  Electric shield for high frequency noise.
Table: Connector Pin-Out and Color Codes. LVDS is for "low voltage differential signal".

Rule: The plug and socket pin-outs, and wire color-codes, for all LWDAQ connectors and cables, must conform to the Connector Pin-Out and Color Codes Table.

Devices

The LWDAQ specification allows up to fifteen devices per multiplexer. When these devices draw current, their power supply voltages drop because of the resistance of the wires that connect the multiplexer to the driver. Although one device may tolerate a large drop in the supply voltages, another may not. The LWDAQ specifies maximum ranges for power supplies at the devices. We do not want inactive devices to waste the current-delivering capacity of the multiplexer.

Rule: All LWDAQ devices must provide a sleep state, in which consumption from each of the three supply voltages (+5V, +15V, −15V) is less than 5 mA.

Rule: All LWDAQ devices must power-up asleep.

Rule: All LWDAQ devices must be such that it is impossible to damage them by any sequence of transmissions from the driver.

We reset a LWDAQ by turning off the power and turning it on again. When the devices turn on, they will be in the sleep state, and they will not over-load the power supplies.

Multiplexers

To select a device, a LWDAQ driver disables all its driver sockets except the one connected to the device. We call this the active driver socket. The driver transmits an address word through the active socket. Assuming there is a multiplexer connected to this socket, the multiplexer will disable all its branch sockets except the one specified by the address word, which becomes the active branch socket. By means of the address transmission, the driver selects a unique device as the target of its next command word transmission. We call this unique device the target device.

Rule: On a LWDAQ multiplexer, address bit DA1 enables branch socket 1, DA2 enables socket 2, and so on up to bit DA15 enabling socket 15.

The A2046 multiplexer provides ten branch sockets, while the A2085 provides fourteen, with a fifteenth dummy socket for local loop-back.

Repeaters

A repeater performs two functions. It receives and re-transmits all logic signals from the driver. This allows us to place multiplexers and devices farther from the driver. The repeater also allows us to turn off power to its multiplexer or device. When we transmit address word $0001, which is the same as selecting branch socket zero, the repeater recognizes the zero socket selection and switches off its downstream power.
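The relationship between branch sockets and address words can be sketched as follows. This is a minimal illustration, not driver firmware: the function name address_word is hypothetical, and we assume that address bit DAn is bit n of the sixteen-bit word with bit zero least significant, which is consistent with $0001 selecting branch socket zero.

```python
REPEATER_POWER_OFF = 0x0001  # address word $0001: branch socket zero


def address_word(branch_socket):
    """Return a 16-bit address word with only bit DA<branch_socket> set.

    ASSUMPTION: DAn is bit n of the word, bit zero least significant.
    """
    if not 0 <= branch_socket <= 15:
        raise ValueError("branch socket must be in the range 0 to 15")
    return 1 << branch_socket


# Select branch socket 3 on a multiplexer, then compare socket zero
# with the repeater power-off word.
print(hex(address_word(3)))
print(address_word(0) == REPEATER_POWER_OFF)
```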

Example: The LWDAQ Repeater (A2058) is a single repeater. The Patch Panel (A2059) is a six-way repeater circuit.

Drivers

A large LWDAQ may consist of many drivers, each controlling many multiplexers, repeaters, and devices.

Example: The LWDAQ Driver with VME Interface (A2037A) is a one-slot wide, double-height VME card that uses the +5V and ±12V supplies on the VME backplane to power its multiplexers and devices.

Other drivers combine the functions of a driver with those of a server.

Example: The LWDAQ Driver with Ethernet Interface (A2037E) and the LWDAQ Driver with Ethernet Interface (A2071E) are stand-alone black boxes that connect to the local ethernet. They allow the LWDAQ to be controlled by LWDAQ Software over the internet.

Servers

A server makes the LWDAQ available over TCPIP. Clients connect to the server and control it by means of LWDAQ messages. We define the LWDAQ Message protocol in the TCPIP Communication section of the A2037E Manual.

When we have many drivers that do not include the server function, we use a dedicated server to provide TCPIP access to all the drivers.

Example: The VME-TCPIP Interface (A2064) is a one-slot wide, double-height VME card that allows all the A2037As in a VME crate to be controlled over the internet by LWDAQ Software.

Software

Our LWDAQ software is an accessory to the LWDAQ. You are welcome to use it, but some LWDAQ users prefer to write their own data acquisition software. Our software uses the LWDAQ messaging protocol to acquire data from a LWDAQ server over TCPIP. You can download our software from the LWDAQ software download page. We describe the installation and operation of the software in our LWDAQ User Manual.

Transmit Signals

The transmit signal, T, is the signal carried from the driver by the T+ and T− wires. The transmit signal is always a logic signal, never an analog signal. The driver uses the transmit signal to switch on repeaters, select sockets on multiplexers, and control devices. The termination of the transmit signal at the receiver end is mandatory.

Rule: The transmit signal must be terminated with 100 Ω on every multiplexer, repeater, and device.

The transmit signal carries information with sequences of transmit bits. Each transmit bit begins with a 50-ns LO followed by a 50-ns HI. The following figure shows the three valid transmit bit patterns. Each pattern occupies 375 ns if we measure from the rising edge that marks the start of the pattern.


Figure: Transmit Bit Patterns. The horizontal axis is time in nanoseconds. Time zero is the rising edge of the bit pattern, which must be preceded by at least 50 ns of LO.

Rule: All rising edges on the transmit signal must be part of a valid transmit bit pattern, as defined in the Transmit Bit Patterns figure. The timing of the transmit bit pattern must be correct to within ±5 ns at every edge.

The driver combines transmit bits to perform the following control functions.

Signal                Description                                    Function
Command Word          1 one bit, 16 one and zero bits, 1 stop bit.   Transmit sixteen command bits to a device.
Address Word          1 zero bit, 16 one and zero bits, 1 stop bit.  Transmit sixteen address bits to a repeater or multiplexer.
Data Synchronization  Sequences of stop bits.                        Synchronize return of analog or digital data on R+/R−.
Idle                  Logic HI or Z.                                 No activity.
Table: Transmit Signals

Address and command words are both control words. A control word transmission must follow either a stop bit transmission or a power-up reset. The control word begins with a type bit. If the type bit is a one bit, the control word is a command word. If the type bit is a zero bit, the control word is an address word. Sixteen data bits follow the type bit. Each data bit can be a one bit or a zero bit. A complete word transmission takes 6.8 μs. Each of the eighteen bits takes 375 ns, and we need 50 ns to set up the first rising edge of the transmission.
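The structure and duration of a control word can be sketched in Python. This is an illustrative model only: the function names and the "0", "1", "S" bit symbols are our own, and we assume the sixteen data bits are sent most significant first, following the bit ordering given later in this section.

```python
BIT_NS = 375    # duration of each transmit bit pattern
SETUP_NS = 50   # LO time needed before the first rising edge


def control_word(is_command, value):
    """Return the transmit bits of one control word as a list of symbols.

    The word is a type bit ("1" for command, "0" for address), sixteen
    data bits (ASSUMED most significant first), and a stop bit "S".
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in sixteen bits")
    type_bit = "1" if is_command else "0"
    return list(type_bit + format(value, "016b")) + ["S"]


def word_duration_ns(bits):
    """Return the transmission time of a control word in nanoseconds."""
    return SETUP_NS + BIT_NS * len(bits)


bits = control_word(True, 0x0080)
print(len(bits), "bits,", word_duration_ns(bits) / 1000, "us")
```

Eighteen bits of 375 ns plus the 50-ns set-up time give the 6.8 μs quoted above.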

Repeaters and multiplexers respond only to address words. Devices respond only to command words. The circuit on a device or multiplexer that receives command or address words is a control receiver. It may be a command receiver, an address receiver, or both at the same time.

Every time a control receiver sees a LO-to-HI transition on T, it must determine which of the three bit patterns follows the transition. A stop bit always restores the receiver to its rest state. If the receiver is already in its rest state, the stop bit does nothing. When a receiver powers up, it must enter rest state.

When a command receiver sees the first bit of an address word, it enters a passive state, in which it remains until a stop bit arrives. An address receiver responds to a command word in the same way.

When a command receiver sees the first bit of a command word, it enters its active state. In its active state, the receiver records the subsequent command bits. We number the command bits from sixteen down to one. The driver transmits bit sixteen first and bit one last. The last command bit is followed by a stop bit. The receiver does not apply the command bits to the device until it receives the stop bit that marks the end of the command word. When it sees the stop bit, the command receiver applies all the command bits at the same time.

An address receiver responds to an address word in the same way. But we number the address bits from fifteen down to zero. The driver transmits bit fifteen first and bit zero last.

Note: A receiver need not count data bits as they come in. The driver is responsible for making sure there are exactly sixteen bits.

Note: A receiver need not save all the data bits. It can discard the bits that it does not use.
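The rest, active, and passive states described above can be modeled with a small state machine. This is a behavioral sketch under our own assumptions, not a description of any LWDAQ circuit: the class name CommandReceiver and the "0", "1", "S" bit symbols are hypothetical, and the model keeps the last sixteen bits it sees, as a shift register would, without counting them.

```python
class CommandReceiver:
    """Behavioral model of a command receiver (illustrative only)."""

    def __init__(self):
        self.state = "rest"   # a receiver enters its rest state on power-up
        self.shift = 0        # shift register for incoming data bits
        self.command = None   # last command word applied to the device

    def receive(self, bit):
        """Process one transmit bit: "0", "1", or "S" for a stop bit."""
        if bit == "S":
            # A stop bit applies the command if we were active, and
            # always restores the rest state.
            if self.state == "active":
                self.command = self.shift & 0xFFFF
            self.state = "rest"
        elif self.state == "rest":
            # First bit after rest is the type bit: one means command,
            # zero means address, which a command receiver ignores.
            self.state = "active" if bit == "1" else "passive"
            self.shift = 0
        elif self.state == "active":
            self.shift = (self.shift << 1) | int(bit)
        # In the passive state, data bits are discarded.


rx = CommandReceiver()
for bit in "1" + format(0xA5C3, "016b") + "S":
    rx.receive(bit)
print(hex(rx.command), rx.state)
```

An address receiver would follow the same pattern with the roles of the type bit reversed.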

We designed the transmit bit patterns so that they could be received without a clock oscillator. The receiver needs only two delayed versions of the transmit signal's rising edge to identify incoming bits. One delayed edge occurs after 125±25 ns and the other after 250±50 ns. We can generate these delayed edges with a monostable flip-flop, such as the 74VHC123. The 74VHC123's quiescent current is only 20 μA.

Example: The Proximity Mask Head (A2045) uses standard logic chips to implement a command receiver. The receiver consists of two 74VHC74s, one 74VHC123, and one 74VHC594. These are U8-U11 in the schematic. The A2045 uses the 74VHC594 to latch the bottom eight bits of the incoming command. It throws away the top eight bits. The VHC123 provides the two 125-ns delays necessary to sample T at the required moments after the rising edge that marks the commencement of bit activity. The two VHC74s distinguish between address and command transmissions, and act upon the stop bit that terminates a command reception. The logic that performs this termination is a little hard to understand. Two flip-flops reset one another and latch the shift register at the same time. The quiescent current consumption of the A2045 is less than 1 mA. The command receiver itself consumes less than 100 μA.

Example: The Polar BCAM Head (A2051) adds another 74VHC594 shift register to the command receiver used in the A2045, and so retains all sixteen command bits.

Example: The Inclinometer Head A2065 uses programmable logic with a ring oscillator to receive commands. When the logic sees a rising edge on T, the ring oscillator starts up and provides timing for reception. The command receiver consumes only 11 mA, but this still exceeds the LWDAQ limit of 5 mA when asleep. You will find the logic program here.

A command or address receiver in its rest state ignores stop bits. Consequently, the driver is free to transmit stop bits as often as it likes to a receiver in its rest state. Some devices allow the driver to synchronize their data return with stop bits. We call this use of stop bits data synchronization. For more on data synchronization, see Receive Signals.

Recommendation: Use the SN65LVDM180D transceiver, manufactured by Texas Instruments, to send and receive low-voltage differential logic signals on T+/T− and R+/R−. These transceivers operate from a 3.3-V supply. With the driver portion disabled, their typical current consumption is only 1.7 mA. The driver and receiver have separate enable lines. When the driver is disabled, it enters a high-impedance state that allows analog circuits in a device to drive the R+/R− lines.

Recommendation: Do not use the SN65LVDS180 transceiver in devices that use R+/R− for analog data return. Because of an error on the part of Texas Instruments, the SN65LVDS180 does not enter its high-impedance state properly.

Note: The SN65LVDS180D transceiver works fine in devices that do not use R+/R− for analog data return, and has the advantage of lower current consumption.

Example: The Proximity Mask Head (A2045) uses the SN65LVDS180 transceiver. Even though this chip does not enter its high-impedance state properly, we do not mind because we transmit only LVDS logic back from the A2045. We could instead use the LVDM chip, but its quiescent current is slightly higher.

Rule: The circuits driving T+ and T− must assert a defined logic level on the two lines at all times.

The above rule guarantees that the circuit receiving T+ and T− does not receive spurious commands. If the circuit driving T+ and T− allows the T+ and T− lines to float, spurious commands can be received at the other end of a long cable, even if the receiving circuit has its own pull-up resistors. The SN65LVDM180D uses 300 kΩ pull-up resistors to define its state when its input floats, but these are not adequate at the end of an 80-m cable. But 10-kΩ pull-up and pull-down resistors on T+ and T− are sufficient to guarantee a logic HI.

Recommendation: Connect the source of T+ to a voltage 2.4-5.0 V with a 10 kΩ resistor. Connect the source of T− to 0V with a 10 kΩ resistor.

Pull-up and pull-down resistors at the source of the transmit signal guarantee that the transmit logic level will remain HI during and after the LVDS driver enters its high-impedance state.

Receive Signals

The receive signal, R, is the signal carried from a device by the R+ and R− wires. Unlike the transmit signal, the receive signal can be either analog or digital. Either way, it is always a low-voltage differential signal.

Rule: The driver's differential-mode input range for DC-coupled signals must include the range −0.5 to +0.5 V.

Rule: The driver's common-mode input range for DC-coupled signals must include the range −0.7 to +5 V.

The driver must provide an active clamp that marks a particular voltage level in the data stream according to a schedule of correlated double-sampling for such applications as image sensor readout. Following the clamp interval, the differential voltage on R will change, and the driver must measure this change with respect to the clamped voltage.

Rule: The LWDAQ driver dynamic range for synchronously-clamped signals must include the range 0 to 1 V.

The devices produce the R+/R− signal. They are responsible for making sure these signals do not exceed their limits.

Rule: All devices must clamp R+ and R− to the range −0.7 to +5.0 V when influenced by a 100-mA current source or current sink. The lower end of the signal range should include 0 V, and the upper end should include 3.3 V.

Drivers and multiplexers may clamp R+ and R− also, but this is optional. The A2082 device clamps all four transmit and receive lines to 0 V and 3.5 V with diodes. This power supply is in turn clamped with a 4.1-V zener diode. The clamp starts to turn on when the transmit signals reach 4.8 V. The clamp passes 100 mA when the signals reach 5.0 V. On the lower end, the clamp passes 100 mA when the signals reach −0.7 V. The A2046 multiplexer terminates all incoming R+/R− signals with 100-Ω resistors, and buffers them with op-amps before sending them on to the driver. It clamps R+/R− and T+/T− to 0 V and 3.9 V with diodes, with a resulting 100-mA clamping range of −0.7 to 4.7 V. The A2071 terminates all R+/R− signals with 100 Ω. Instead of clamping the signals, it pulls them up to +3.3 V with a 10 kΩ resistor. The A2085 multiplexer does not terminate any receive signals, nor does it clamp them. All it does is pass them to the driver with analog switches. There is no buffering. But the A2085 analog switches tolerate R+/R− lying anywhere in the range −0.7 to 5.0 V.

Rule: When the transmit and receive signals are in use, they must be terminated with 100 Ω. Their common mode voltage when thus terminated must lie in the range 1.0 to 2.0 V, and their differential voltages must lie in the range −0.7 to +0.7 V.

The driver puts a device into its loop-back state by transmitting a command word with bit seven set to 1. When in the loop-back state a device drives the transmit signal onto the receive signal, so that whatever logic level arrives from the driver will be returned to the driver.

Rule: Every LWDAQ device must implement the loop-back state.

The return of T on R allows the driver to check the functionality of the device, and also to measure the propagation time of a signal traveling to the device and back again. We call this round-trip propagation time the loop time. The driver uses its knowledge of the loop time to synchronize its reception of a stream of data arriving at one sample per 500 ns, even when the clock signal it transmits takes 500 ns to reach the device along a 100-m cable, and the data takes another 500 ns to come back again.

A device can switch from digital to analog R by disabling its LVDS driver.

Example: The Polar BCAM Head (A2051) uses the SN65LVDM180D to drive R+/R− with a logic value (U1 in the schematic). But R+/R− are also connected through 100-Ω resistors to two op-amps (U19 in the schematic). With U1's driver enabled, the op-amps cannot alter the logic level on R+/R−. But once U1 is disabled, the op-amps drive an analog voltage onto the R+/R−. Thus the act of disabling U1's driver switches the circuit between logic and analog return. No analog switch is required.

By transmitting stop bits, a driver can synchronize the return of analog data from a device. We call this direct clocking, and describe it in more detail below.

Devices can transmit data bytes to the driver using the receive signal. The driver synchronizes this transfer with stop bits. First the driver sets the device into a state where it is ready to transmit bytes. In this state, the device enables its logic-level LVDS driver and asserts a logic HI on the receive signal. After that, each stop bit the device receives from the driver will provoke a byte transfer on the receive signal. The byte transfer consists of ten 50-ns bits. The first bit is a zero, and is called the start bit. The next eight bits are the data bits. The most significant data bit is first, and the least significant is last. After the last data bit is a one, which is called the stop bit. The entire byte transfer takes 500 ns, suggesting a maximum transfer rate of 2 MBytes/s. In practice, however, the device needs 425 ns to receive the stop bit before it begins its byte transfer. Loop time also adds to the byte transfer period.

Example: Block byte transfers from the ADC Tester (A2100) to the LWDAQ Driver (A2037E) take place at 1.1 MByte/s.
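The ten-bit byte frame and the resulting transfer rate can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the rate estimate simply adds the 425-ns stop-bit reception, the 500-ns frame, and an optional loop time, which is our reading of the paragraph above rather than a measured model.

```python
FRAME_BIT_NS = 50           # duration of each bit in the byte transfer
STOP_BIT_RECEIVE_NS = 425   # time the device needs to receive the stop bit


def frame_byte(value):
    """Return the ten frame bits: start 0, eight data bits MSB first, stop 1."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in eight bits")
    data = [(value >> i) & 1 for i in range(7, -1, -1)]
    return [0] + data + [1]


def unframe_byte(bits):
    """Recover the byte from a ten-bit frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("invalid frame")
    value = 0
    for bit in bits[1:9]:
        value = (value << 1) | bit
    return value


def byte_rate_per_s(loop_time_ns=0):
    """Estimate bytes per second (ASSUMED: the three delays simply add)."""
    period_ns = STOP_BIT_RECEIVE_NS + 10 * FRAME_BIT_NS + loop_time_ns
    return 1e9 / period_ns


print(frame_byte(0xA5), round(byte_rate_per_s() / 1e6, 2), "MByte/s")
```

With zero loop time the estimate is a period of 925 ns, which is consistent with the 1.1 MByte/s quoted in the example above.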

Rule: The timing of every feature of a byte transfer must be accurate to ±10 ns.

Because the ten-bit transfer takes 500 ns, and the final bit must be accurate to 10 ns, we see that the clock used on the device to generate the byte transfer timing must be accurate to 2%. This precludes the use of self-calibrated ring oscillators such as those used in the Inclinometer Head (A2065). A low-power device that performs byte transfer must turn on a precision oscillator when the driver wakes it up, and turn off this oscillator when the driver sends it to sleep.

Direct Clocking

The driver can send stop bits on T to any multiplexer or device and be sure that no address or command receiver will respond. All stop bits are ignored. A stop bit, as shown here, consists of a low period on T of duration at least 50 ns, followed by a high period of at least 325 ns. We call the repetition of stop bits direct clocking, because it allows the driver to deliver a clock signal directly to components on a device.

Example: A driver transmits 425-ns stop bits continuously. The stop bits form a 2.35-MHz clock that a device can use for 2.35-MHz data synchronization. For another device, the driver inserts a 575-ns LO period between the stop bits, and so generates a 1-MHz clock. Each clock period consists of a 375-ns HI followed by a 625-ns LO.

Direct clocking allows the LWDAQ to deliver to any device a clock signal that is synchronous with its own internal clock. Direct clocking avoids the need for a precise clock to be installed upon the device, where it will take up space, consume current, and in any event be asynchronous with the driver.

Example: Device types TC255, TC237, KAF0400, and ICX424 use direct clocking at 2 MHz to control the read-out of image pixels at 2 MSPS. The stop bits consist of 125-ns low periods and 375-ns high periods, which creates a period of exactly 500 ns, for a frequency of 2 MHz. These devices return the pixel intensity during the 125-ns low pulse, and the driver digitizes the pixel value upon its arrival at the eight-bit ADC. The high period allows 125 ns for a reset pulse to be delivered to the CCD, to clear its output gate, followed by a 125 ns black-level clamp in the driver circuit.
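The clock frequency produced by repeated stop bits follows directly from the low and high periods. The sketch below is illustrative only: the function name direct_clock_mhz is hypothetical, and the minimum durations are the ones given for a stop bit at the start of this section.

```python
MIN_LOW_NS = 50     # minimum low period of a stop bit
MIN_HIGH_NS = 325   # minimum high period of a stop bit


def direct_clock_mhz(low_ns, high_ns):
    """Return the clock frequency in MHz for repeated stop bits."""
    if low_ns < MIN_LOW_NS or high_ns < MIN_HIGH_NS:
        raise ValueError("periods too short to form a valid stop bit")
    return 1000.0 / (low_ns + high_ns)


# Continuous 425-ns stop bits: 50 ns low, 375 ns high.
print(round(direct_clock_mhz(50, 375), 2), "MHz")
# Image readout with a 500-ns period: 125 ns low, 375 ns high.
print(direct_clock_mhz(125, 375), "MHz")
```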

When the driver digitizes a signal that it clocks out itself, it can be assured that the returned analog signal will be synchronous with its own clock, but there will be a phase shift between the returned signal and the outgoing clock. This phase shift is approximately equal to 10 ns per meter of cable between the driver and the device, plus a 50-ns offset. In order to place its sampling instant to within 25 ns of the correct phase of the returned signal, the driver delays digitization by the loop time, which it measures with the help of the loop-back state of any LWDAQ device.

Example: A Black Polar BCAM (circuit A2051L) sits at the end of a 120-m cable. The driver measures the loop time and saves it in its loop time register. The loop time register contains the value 50, which indicates a loop time of 1250 ns. The driver sends repeated command words to the BCAM to clear its front-facing TC255P image sensor and set up the horizontal pixel register for pixel readout. Now the driver starts transmitting stop bits at 2 MHz. At time 1250 ns later, the pixel intensity clocked out by the falling edge of the first stop bit arrives at the driver. Another 50 ns later, the driver digitizes the signal, and 500 ns after that it digitizes the next pixel.
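
The arithmetic of this example can be sketched in Python as follows. The 10 ns/m slope and 50-ns offset come from the text above. The 25-ns register resolution is our own inference from the pairing of the value 50 with 1250 ns, and the function names are purely illustrative.

```python
NS_PER_METER = 10   # approximate phase shift per meter of cable (from the text)
OFFSET_NS = 50      # fixed offset (from the text)
NS_PER_COUNT = 25   # ASSUMED register resolution: 1250 ns / 50 counts


def loop_time_ns(cable_m):
    """Approximate loop time for a cable of the given length in meters."""
    return NS_PER_METER * cable_m + OFFSET_NS


def loop_register(cable_m):
    """Loop time expressed in (assumed) 25-ns register counts."""
    return round(loop_time_ns(cable_m) / NS_PER_COUNT)


def sample_instants_ns(cable_m, n_pixels, clock_period_ns=500, settle_ns=50):
    """Digitization times measured from the first falling edge on T."""
    first = loop_time_ns(cable_m) + settle_ns
    return [first + i * clock_period_ns for i in range(n_pixels)]


print(loop_time_ns(120), loop_register(120), sample_instants_ns(120, 3))
```

For the 120-m cable the sketch reproduces the figures in the example: a loop time of 1250 ns, a register value of 50, and a first sample 50 ns after the pixel arrives.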

In order to allow direct clocking to take effect upon a device, it must be enabled by a command bit. The CCD devices use DC1 as direct clock enable, or DCEN. When we want to disable direct clocking, which any CCD device will do at the end of a line of pixels, we send another command with DCEN cleared. We note, however, that while transmitting the command, the transitions present on T might be forwarded to the circuit for which direct clocking was enabled, until the end of the command transmission, when the DCEN will be unasserted. It is possible to use the command strobe, CS, signal in a receiver to stop such spurious clocking by the command transmission, but in the case of CCD devices, the spurious clocking does nothing except retrieve black pixels from the sensor.

Power Supplies

A LWDAQ Driver supplies +5 V and ±15 V power to devices, multiplexers, and repeaters. All LWDAQ Drivers must be able to deliver at least 300 mA on each of these power supplies. The A2037E and A2071E can supply up to 2 A from +5 V and 500 mA from ±15 V.

The voltage drop in the resistance of a CAT-5 cable limits the number of devices that may share the same root cable. The following table gives the maximum current consumption of a device when asleep and awake, and the range of power supply voltages that must be acceptable to a LWDAQ receiver.

Power Supply  Voltage at Device  Max Sleep Current  Max Awake Current
+15 V         +13 V to +16 V     5 mA               200 mA
−15 V         −13 V to −16 V     5 mA               200 mA
+5 V          +3.1 V to +6 V     5 mA               20 mA
Table: Device Power Supply Constraints. Voltages are specified with respect to zero volts, which is pin six of the device socket.

With 20-mA current consumption, the drop across a low-power, low drop-out regulator is less than 50 mV. But regulators that rely upon internal band-gap references are vulnerable to ionizing radiation, and the feedback loop that controls their output can latch up when hit by a high-energy proton. For radiation resistance, therefore, the LWDAQ specification allows us to use an emitter-follower logic supply. The incoming +5 V supply connects to the collector of an NPN transistor, and the base receives current through a resistor from the +15 V supply (for an example schematic, see the A2075). The saturation voltage of an NPN transistor is around 100 mV. The SN65LVDS-series chips will operate with 3-V supplies, as will low-voltage logic chips. Thus the minimum logic supply voltage is +3.1 V.

Rule: Every LWDAQ device must obey the limits given in the Device Power Supply Constraints Table.

Power Supply  Voltage at Multiplexer  Max Sleep Current  Max Awake Current
+15 V         +13 V to +16 V          20 mA              20 mA
−15 V         −13 V to −16 V          20 mA              20 mA
+5 V          +3.1 V to +6 V          20 mA              20 mA
Table: Multiplexer and Repeater Power Supply Constraints. Voltages are specified with respect to the local zero volts, which is pin six of the root socket.

Rule: Every LWDAQ multiplexer and repeater must obey the limits given in the Multiplexer and Repeater Power Supply Constraints Table.

The CAT-5 specification limits the resistance of solid CAT-5 conductors to 10 Ω per 100 m. This resistance, combined with the maximum device current consumption and power voltage ranges, limits the number of devices that may share a root cable. With the limits given in the Device Power Supply Constraints Table and the Multiplexer and Repeater Power Supply Constraints Table, sixteen sleeping devices and two waking devices can share the same ninety-meter root cable and remain operational.

Rule: It must be impossible to damage a LWDAQ component by exceeding the current consumption limits at another LWDAQ component.

Signal          Absolute Minimum  Absolute Maximum
+15 V           −0.5 V            +17 V
−15 V           −17 V             +0.5 V
+5 V            −0.5 V            +6 V
T+, T−, R+, R−  −0.5 V            +4 V
Table 4: Absolute Maximum Ratings.

Rule: All devices, repeaters and multiplexers must tolerate the power supply voltages given in the Absolute Maximum Ratings table.

Recommendation: Clamp the T+, T−, R+, and R− signals to 3-V logic power and to 0 V with silicon diodes on every root, branch, and device socket. In addition, clamp the 3-V logic power supply to 0 V with a zener diode.

Components suitable for clamping are diode arrays such as the BAV99DW. To clamp the 3-V logic supply we use the LM4050-4.1 radiation-tolerant 4.1-V shunt regulator in parallel with a 1.0-μF capacitor. Clamping makes the circuits less vulnerable to static electricity and to power surges that occur when we plug a live cable into the device.

Subtle problems arise in large LWDAQ systems when we make a device's receiver logic power supply dependent upon anything other than the incoming +5V power from the device socket.

Example: The Proximity Mask Head (A2045) uses +15 V to supply base current to a radiation-tolerant 3.3-V regulator, as shown here. The 3.3-V logic supply depends upon a 300-μA flow of current into the regulator from +15 V. Suppose the mask is faulty, so that turning it on shorts the +15 V supply. Now the logic supply fails also. As soon as the mask turns off again, the +15 V supply rises and powers the logic once more. Unless there is a perfect power-up reset, it's possible that the mask will turn on again, beginning an endless cycle. It so happens that the A2045 does not have a perfect power-up reset. Its power-up reset is produced by a capacitor and resistor. The problem with power-up reset circuits is that they tend to be vulnerable to radiation, just like low drop-out regulators. For more on large-system problems see Large Systems.

Recommendation: Make the power supply used by command and address receivers dependent only upon the incoming +5V supply.

Every device and multiplexer will need to decouple its power supplies with capacitors. The LWDAQ Driver power supplies must charge all the decoupling capacitors when the driver turns on. We may have eighty devices and eight multiplexers connected to a single driver. When we plug a root cable into the driver, we may be plugging ten devices and a multiplexer into LWDAQ power at once. The LWDAQ must be able to continue functioning while we are connecting and disconnecting cables. The converters we use on our LWDAQ drivers maintain their output voltages to within 10% when we connect a 10-μF capacitor, but not when we connect a 100-μF capacitor.

Rule: The maximum decoupling capacitance on LWDAQ power supplies at devices, repeaters and multiplexers is 1 μF unless the capacitance is isolated from the power supplies by a 1-kΩ resistor when we plug the power into the device.

Rule: If a device consumes current pulses such that the derivative of current versus time exceeds 100 mA/μs, the device must include 10-Ω decoupling resistors in series with its decoupling capacitors to avoid contaminating the LWDAQ power supplies with voltage transients.

The Contact Injector (A2080) provides 20-μF decoupling of ±15 V for its buck converters. When we plug the device into the LWDAQ, however, the 20-μF capacitors are connected to ±15 V only through 1-kΩ resistors. The capacitors charge with a time constant of 20 ms, drawing no more than 15 mA from the LWDAQ Driver. When we wake the board, a transistor switch closes and now the capacitors are connected to the LWDAQ Driver through 10-Ω decoupling resistors.
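
The A2080 figures follow from the usual RC charging relations, as the short Python sketch below shows. The function name is our own, and we treat the series resistor as the only thing limiting the charging current, which is a simplification.

```python
def soft_start(capacitance_f, resistance_ohm, supply_v):
    """Return (time constant in s, peak charging current in A) for a
    capacitor charged from a supply through a series resistor."""
    tau_s = resistance_ohm * capacitance_f
    peak_a = supply_v / resistance_ohm  # current at the instant of connection
    return tau_s, peak_a


# 20 uF isolated from a 15-V supply by 1 kOhm, as in the A2080.
tau_s, peak_a = soft_start(20e-6, 1e3, 15.0)
print(f"time constant {tau_s * 1e3:.0f} ms, peak current {peak_a * 1e3:.0f} mA")
```

The same relations show why the 1-kΩ isolation matters: the same capacitor connected through only 10 Ω would demand an initial current of 1.5 A from the driver.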

Device Types

Whenever we execute a device-dependent job in a driver, we must let the driver know the type of device the job is operating upon. We give the driver a device type number. Associated with each device type is a command bit allocation.

Device Name  Value  Behavior
Null         0      device-dependent jobs take 125 ns
LED          1      no device-dependent jobs
                    particular assignment of command bits to light sources
TC255        2      read job clocks out 244×344 pixels
                    particular assignment of command bits to light sources
Data         3      read job causes byte transfer to driver when DTX=1 (DC5=1)
                    DC16-DC9 is data byte to device when DRX=1 (DC6=1)
KAF0400      4      read job clocks out 520×800 pixels
                    particular assignment of command bits to light sources
TC237        5      read job clocks out 500×690 pixels
                    particular assignment of command bits to light sources
ICX424       6      read job clocks out 520×700 pixels
                    particular assignment of command bits to light sources
ICX424Q      7      read job clocks out 260×350 pixels
                    particular assignment of command bits to light sources
KAF0261      8      read job clocks out 520×520 pixels
                    particular assignment of command bits to light sources
Future       9-63   undefined
Table: Reserved Device Types.

Rule: All LWDAQ drivers must respect the device types given in the Reserved Device Types Table.

Command Bits

A device's command bit allocation is the use it makes of each bit in its most recent command word. We name the device command bits DC1 to DC16. A manual describing a LWDAQ device will tell you its command bit allocation, and a multiplexer or repeater manual will tell you its address bit allocation.

Rule: Command bit DC8 is the wake bit (WAKE or !SLEEP on schematics). The device wakes up when the wake bit is one, and goes to sleep when the wake bit is zero.

Rule: Command bit DC7 is the loop-back bit (LB on the schematic). When the loop-back bit is one, the device drives onto R+/R− the logic level it receives on lines T+/T−.

The LB and WAKE bits must be respected by all devices. If WAKE is 0, the device must enter its lowest-power state. A device can ignore WAKE if it enters its lowest-power state when it receives a command of all zeros. If LB is 1, the device must drive the R+/R− lines with the same logic value it receives on T. But a device can ignore LB if it always drives T onto R.

Rule: All command and address bits on all devices and multiplexers must reset to zero on power-up.

Some command bits have reserved functions depending upon the device type that receives them. The table below lists reserved command bits for various device types. Some command bits have two possible purposes, depending upon how the device is set up. Some devices we can operate with more than one device type.

Example: The HBCAM Head (A3025A) uses DC14 and DC15 for ON5 and ON6, and it is device type ICX424 or ICX424Q. The Bar Head (A2082A) uses DC14 and DC15 for VDS0 and VDS1 instead, and is also device type ICX424 or ICX424Q. Because we use the same device types in the LWDAQ Driver for both devices, we could instruct the driver to flash device element 5 in the A2082A, but no such source would flash. Or we could try to select virtual device number three in the A3025A in the hope of reading out a third or fourth image sensor, but we would instead turn on sources 5 and 6 in the A3025A and read out image sensors 1 and 2.

Name       Bit Number  Device Types                                     Meaning
ON1-ON6    DC1-DC6     LED                                              Turn On A Light Source
DCEN       DC1         TC255, TC237, KAF0400, KAF0261                   Direct Clock Enable
RDP        DC1         ICX424, ICX424Q                                  Read Pulse
SRGD       DC2         TC255, TC237                                     Serial Register Gate Inverted
HD         DC2         KAF0400, KAF0261                                 Horizontal Clock Inverted
H          DC2         ICX424, ICX424Q                                  Horizontal Clock Enable
SAGD       DC3         TC255, TC237                                     Storage Area Gate Digital
V1         DC3         KAF0400, KAF0261, ICX424, ICX424Q                Vertical Clock Phase One
IAGD       DC4         TC255, TC237                                     Image Area Gate Digital
V2         DC4         KAF0400, KAF0261, ICX424, ICX424Q                Vertical Clock Phase Two
ABGD       DC5         TC255                                            Anti-Blooming Gate Digital
V3D        DC5         ICX424, ICX424Q                                  Vertical Clock Phase Three
DTX        DC5         Data                                             Data Transmit from Device to Driver
ABEN       DC6         TC255                                            Anti-Blooming Enable
SUB        DC6         ICX424, ICX424Q                                  Substrate Clock
DRX        DC6         Data                                             Data Receive by Device from Driver
LB         DC7         All                                              Loop Back
WAKE       DC8         All                                              Wake Up Device
ON7-ON14   DC9-DC16    LED                                              Turn On A Light Source
CCD1       DC9         TC255, KAF0400, KAF0261, TC237, ICX424, ICX424Q  Select the First of Two Sensors
ON1-ON4    DC10-DC13   TC255, KAF0400, KAF0261, TC237, ICX424, ICX424Q  Turn On A Light Source
ON5-ON6    DC14-DC15   TC255, KAF0400, KAF0261, TC237, ICX424, ICX424Q  Turn On A Light Source
VDS0-VDS1  DC14-DC15   TC255, KAF0400, KAF0261, TC237, ICX424, ICX424Q  Virtual Device Select
PXBN       DC16        ICX424, ICX424Q                                  Pixel Bin Enable
Table: Reserved Command Bits that Apply to Device-Specific Jobs.

Rule: All LWDAQ drivers must respect the command bit allocations given in the Reserved Command Bits Table, even if they do not implement all the bits for all devices.

A device may assign its unused command bits to its own purposes. We set such bits by writing directly to the command register with the driver's command job.

Address Bits

A multiplexer or repeater's address bit allocation is the use it makes of its most recent address word. We name the address bits DA0 to DA15.

Address words instruct multiplexers in the same way that command words instruct devices. Each of the sixteen address bits selects one of sixteen hypothetical branch sockets on a multiplexer. If multiple bits are set, then multiple sockets will be enabled. In this way, it is possible to send command words to any subset of devices attached to a multiplexer simultaneously, or to only one device. Repeaters make use of the DA0 bit to turn off downstream power.

Rule: Multiplexers use bits DA1 to DA15 to select branch sockets 1 to 15. Repeaters use bit DA0 to turn off power to multiplexers and devices.
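
As an illustration of the rule above, the following Python sketch assembles a sixteen-bit address word from a list of branch sockets. We assume that address bit DAn occupies bit n of the word, with DA0 as the least significant bit. The specification excerpt does not state this mapping, and the function name is our own.

```python
def address_word(branch_sockets=(), repeater_power_off=False):
    """Build a sixteen-bit address word.

    branch_sockets: branch socket numbers (1 to 15) to enable.
    repeater_power_off: set DA0, which repeaters use to turn off
    downstream power.
    ASSUMPTION: bit DAn is bit n of the word, DA0 least significant.
    """
    word = 0
    for socket in branch_sockets:
        if not 1 <= socket <= 15:
            raise ValueError(f"branch socket {socket} is outside 1..15")
        word |= 1 << socket
    if repeater_power_off:
        word |= 1
    return word


print(hex(address_word([1])), hex(address_word([1, 15])))
```

Setting several bits at once selects several branch sockets, which is how a single command word can reach a subset of the devices on a multiplexer.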

Element Numbers

When a target device has multiple sensors and transmitters, we select individual sensors and transmitters with a device element number. Just as a driver must interpret a device-dependent job with the help of a device type number, it must interpret the element number with the help of a device type number also.

Element  Value  Description
OUT      x      x'th transmitter
IN       x      x'th sensor
Table: Element Numbers.

Rule: All LWDAQ drivers must respect the element numbers given in the Element Numbers Table.

Example: A flash job with device type 2 and element number 1 causes the driver to set bit 10 in the command word to turn on source number 1. A flash job with device type 1 and element number 1 causes the driver to set bit 1 to turn on source number 1.
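
A driver's translation from device type and element number to a command bit might look like the Python sketch below. It covers only sources ON1 to ON6, using the allocations in the Reserved Command Bits Table: DC1 to DC6 for the LED type, DC10 to DC15 for the image-sensor types. The function and constant names are our own, and we make no claim that driver firmware is organized this way.

```python
LED = 1
# TC255, KAF0400, TC237, ICX424, ICX424Q, KAF0261
IMAGE_SENSOR_TYPES = {2, 4, 5, 6, 7, 8}


def flash_command_bit(device_type, element):
    """Return n such that command bit DCn turns on source `element`.

    Only ON1-ON6 are handled here. Some devices use DC14 and DC15 for
    virtual device select instead of ON5 and ON6, so consult the device
    manual before relying on elements 5 and 6.
    """
    if not 1 <= element <= 6:
        raise ValueError("this sketch handles elements 1 to 6 only")
    if device_type == LED:
        return element          # ON1-ON6 are DC1-DC6
    if device_type in IMAGE_SENSOR_TYPES:
        return element + 9      # ON1-ON6 are DC10-DC15
    raise ValueError(f"no light-source allocation for device type {device_type}")


print(flash_command_bit(2, 1), flash_command_bit(1, 1))
```

With device type 2 and element 1 the sketch selects DC10, and with device type 1 and element 1 it selects DC1, matching the example above.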

Driver Jobs

We control LWDAQ drivers by instructing them to perform LWDAQ jobs. The following table is a list of job numbers and their associated names. We use these names in our LWDAQ driver firmware and software.

Job Name     Value  Device Dependent  Description                                        Other Names
null         0      no                does nothing                                       none
wake         1      no                wakes up the device                                expose
move         2      yes               moves data within the device                       clear
read         3      yes               transfers data to driver                           none
fast_toggle  4      no                toggles outgoing logic level                       none
alt_move     5      yes               alternative move transfer                          none
flash        6      yes               flashes a transmitter                              none
sleep        7      no                sends the device to sleep                          none
toggle       8      yes               toggles a logic signal in the device               ab_expose
loop         9      no                measures cable loop time                           none
command      10     no                sends specified command to device                  none
adc16        11     no                digitizes to sixteen bits and stores               none
adc8         12     no                digitizes to eight bits and stores                 none
delay        13     no                waits for a specified time                         none
fast_adc     15     no                digitizes to eight bits and stores in minimum time none
reserved     16-63                    reserved for future use                            none
Table: Driver Jobs. Job number 14 is unused.

Rule: All LWDAQ drivers must respect the job numbers and functions given in the Driver Jobs Table.

A device-dependent job is any job whose implementation depends upon the target device. Examples of device-independent jobs are sleep, wake, and loop. Examples of device-dependent jobs are flash and read. When a driver executes a flash job, it must turn on and then turn off a transmitter in the target device. The command bits the driver must set to turn on and off the first transmitter on a device vary with the device type.

Example: The first transmitter on a BCAM Head (A2051) turns on with command bit ten. The first transmitter on an Inplane Mask Head (A2052) turns on with command bit one.
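
Client software can capture the Driver Jobs Table in an enumeration, as in the Python sketch below. The job names and numbers are taken from the table. The class name, the set of device-dependent jobs as a Python object, and the helper function are our own illustration.

```python
from enum import IntEnum


class Job(IntEnum):
    """Driver job numbers from the Driver Jobs Table (14 is unused)."""
    NULL = 0
    WAKE = 1
    MOVE = 2
    READ = 3
    FAST_TOGGLE = 4
    ALT_MOVE = 5
    FLASH = 6
    SLEEP = 7
    TOGGLE = 8
    LOOP = 9
    COMMAND = 10
    ADC16 = 11
    ADC8 = 12
    DELAY = 13
    FAST_ADC = 15


# Jobs marked "yes" in the Device Dependent column.
DEVICE_DEPENDENT = {Job.MOVE, Job.READ, Job.ALT_MOVE, Job.FLASH, Job.TOGGLE}


def needs_device_type(job):
    """True if the driver must know the device type to execute the job."""
    return Job(job) in DEVICE_DEPENDENT


print(needs_device_type(Job.FLASH), needs_device_type(Job.LOOP))
```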

TCPIP Messages

All LWDAQ drivers provide a TCPIP interface for communication between the devices and the data acquisition computer. The LWDAQ driver acts as a TCPIP server, listening upon a particular port for connections from TCPIP clients. We call it the LWDAQ server. The clients are LWDAQ clients. The client-server message protocol runs on top of TCPIP and defines the way in which the client will control the server. Starting with driver software version thirteen (file name P2037E13.c), LWDAQ servers support two TCPIP message protocols. The first is the original LWDAQ Message Protocol, which we refer to as the LWDAQ protocol. The second is the newer Simple Instruction-Answer Protocol, which we refer to as the SIAP protocol. A LWDAQ server chooses the SIAP whenever we configure its server port to lie within the range 30,000 to 40,000. Otherwise the server uses the LWDAQ protocol.

The two protocols are similar. A message contains a message identifier and a content length, both in big-endian byte order. A LWDAQ message has the following format.


Figure: LWDAQ Message Format. Messages use big-endian format, meaning the most significant byte of any multi-byte variable comes first, and the least significant byte comes last.

The first byte of a LWDAQ message is the start byte, which has value $A5 (that's hexadecimal A5). The final byte of the message is the end byte, which has value $5A. Following the start byte is the message identifier, a 32-bit integer with the most significant byte first. Following the message identifier is the content length, another 32-bit integer that gives the number of bytes in the content of the message. If we send a message that begins with any byte other than the $A5 start byte, the driver will close the TCPIP socket. Whenever we close a LWDAQ socket, we transmit an end of transmission character (EOT, value $04) to force immediate closure of the socket at the server end.

A SIAP message has no prefix or suffix bytes. The content length comes first, and includes the length of the message identifier. The message identifier follows the content length, and the content itself comes after that. When we open a SIAP socket, we wait for the SIAP server to send the SIAP acceptance string, which is "DONE". When we close a SIAP socket, we simply close it. We don't send a termination character.
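
The two message layouts described above can be sketched in Python with the standard struct module. The LWDAQ field widths come from the text. For SIAP we assume the content length and message identifier are also 32-bit big-endian integers, which the text implies but does not state outright. The function names are our own.

```python
import struct

START_BYTE = 0xA5
END_BYTE = 0x5A


def lwdaq_message(message_id, content=b""):
    """Start byte, 32-bit identifier, 32-bit content length, content, end byte."""
    header = struct.pack(">II", message_id, len(content))
    return bytes([START_BYTE]) + header + content + bytes([END_BYTE])


def siap_message(message_id, content=b""):
    """Content length first, counting the four identifier bytes, then the
    identifier, then the content.
    ASSUMPTION: both header fields are 32-bit big-endian integers."""
    return struct.pack(">II", len(content) + 4, message_id) + content


print(lwdaq_message(0).hex(), siap_message(11, b"hi").hex())
```

An empty version_read message, identifier 0, therefore occupies ten bytes in the LWDAQ protocol and eight bytes in our reading of the SIAP.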

Example: The A2037E is a server. Its TCPIP stack and Ethernet interface run on an RCM2200 embedded processor. We implement the server functions with a C program. Today's version of the program (September 2008) is P2037E13.c. The program supports RCM2200, RCM3200, and RCM4200 embedded modules, and implements both LWDAQ and SIAP message protocols.

Example: The LWDAQ Software runs on a Windows, Linux, or MacOS computer, and implements a client with graphical user interface and automatic data acquisition and recording to disk. The software uses Tcl to open TCPIP sockets to a server. It uses Tcl to send and receive messages. The script called Driver.tcl contains the Tcl code that implements the client. This script is one of the files included in our software, which you can download here.

The message identifier tells the client or server how to interpret the remainder of the message, and what action to take in response to the message. Here are the message identifiers currently defined by the message protocol.

Message Identifier  Name           Function                                                  MSV
0                   version_read   read relay software version                               1
1                   byte_write     write to controller location                              1
2                   byte_read      read from controller location and return result           1
3                   stream_read    read repeatedly from controller location                  1
4                   data_return    message contains block of data                            1
5                   byte_poll      poll a controller byte until it equals specified value    4
6                   login          send password to relay to attain higher access privilege  7
7                   config_read    read relay configuration file                             7
8                   config_write   re-write relay configuration file                         7
9                   mac_read       read relay MAC address                                    7
10                  stream_delete  write repeatedly to controller location                   9
11                  echo           receive and return the message contents                   13
Table: Message Identifiers Defined by the LWDAQ and SIAP Message Protocols. The MSV column gives the minimum driver software version required for implementation.

We describe each of the message identifiers in the sections below. There are three components to the TCPIP communication. One is the client computer, which we call the master. The center component is the embedded computer on the server, which we call the relay. The final part is the hardware and memory on the far side of the relay, which we call the controller, or controllers when there is more than one separate controller circuit. Some messages communicate only with the relay. Others read from and write to bytes in the controller address space.

Example: The A2037E is a server containing one relay and one controller. The A2064 is a server that contains only a relay. The relay conveys messages to one or more VME-resident controllers, such as the A2037A.

Most messages instruct the relay to write to or read from locations in the controller address space. This address space uses 32-bit addresses and provides byte-wise access to locations. All interactions with the controller take the form of reading from or writing to locations in the controller address space.

version_read

When a server receives a version_read message, it transmits a data_return message. The content of the data_return message is four bytes long. These four bytes contain a 32-bit integer giving the server software version.
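
A client might unpack the reply to version_read as in the Python sketch below, which handles the LWDAQ message format only. We assume the reply arrives as one complete message and that the version integer is big-endian like the header fields. The function names are our own.

```python
import struct

DATA_RETURN = 4


def parse_lwdaq_message(raw):
    """Split one complete LWDAQ-format message into (identifier, content)."""
    if len(raw) < 10 or raw[0] != 0xA5 or raw[-1] != 0x5A:
        raise ValueError("missing start or end byte")
    message_id, length = struct.unpack(">II", raw[1:9])
    if len(raw) != 10 + length:
        raise ValueError("content length does not match message size")
    return message_id, raw[9:9 + length]


def version_from_reply(raw):
    """Extract the server software version from a data_return reply.
    ASSUMPTION: the 32-bit version is big-endian."""
    message_id, content = parse_lwdaq_message(raw)
    if message_id != DATA_RETURN or len(content) != 4:
        raise ValueError("not a four-byte data_return message")
    return struct.unpack(">I", content)[0]


reply = bytes.fromhex("a5 00000004 00000004 0000000d 5a")
print(version_from_reply(reply))
```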

byte_write

When it receives a byte_write message, the relay writes to a controller location. The first four bytes of the message content contain the controller address. The fifth byte gives the value to write. The server transmits no message in response to a byte_write.

byte_read

When it receives a byte_read message, the relay reads a single byte from a controller location. The first four bytes of the message content contain the controller address. The server returns the byte it reads in a data_return message. The content of the data_return message is one byte long. This byte is the byte the relay read from the controller.
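
The content of byte_write and byte_read messages can be packed as shown in this Python sketch. Only the message content is built here. The surrounding start byte, identifier, length and end byte are omitted. The function names are our own, and the address and value are arbitrary illustrations.

```python
import struct

BYTE_WRITE = 1  # message identifiers from the table above
BYTE_READ = 2


def byte_write_content(address, value):
    """Four-byte big-endian controller address followed by the byte to write."""
    return struct.pack(">IB", address, value)


def byte_read_content(address):
    """Four-byte big-endian controller address."""
    return struct.pack(">I", address)


print(byte_write_content(0x3F, 0xFF).hex(), byte_read_content(0x3F).hex())
```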

stream_read

When it receives a stream_read message, the relay reads repeatedly from a single controller location and returns all the bytes it reads in a data_return message. The first four bytes of the stream_read message content contain the controller address. The next four bytes give the number of times the relay should read from the controller location.

Example: We use the stream_read message to read images out of A2037E controller RAM. Address $3F (decimal 63) is the RAM portal address. The value we read from the RAM portal is the value pointed to by the controller's internal data address. Each read from the RAM portal increments the data address, so the stream_read from the RAM portal ends up being a block move out of the controller memory. The stream_read is more efficient because it requires no change in the address presented by the relay to the controller. To read a block of RAM, we set the data address to point to the first byte of the block, and send the stream_read message with the block length.
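
As a sketch of the block read described in this example, the Python below builds the content of a stream_read that fetches one TC255 image through the RAM portal. We assume one byte per pixel, which suits the eight-bit ADC mentioned earlier but is not stated here. The step that sets the controller's data address is left out, because this excerpt does not give the register involved.

```python
import struct

STREAM_READ = 3    # message identifier from the table above
RAM_PORTAL = 0x3F  # A2037E RAM portal address, from the example


def stream_read_content(address, count):
    """Four-byte controller address, then four-byte read count, big-endian."""
    return struct.pack(">II", address, count)


# ASSUMPTION: one byte per pixel for a 244x344 TC255 image.
image_bytes = 244 * 344
content = stream_read_content(RAM_PORTAL, image_bytes)
print(image_bytes, content.hex())
```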

byte_poll

When it receives a byte_poll message, the relay goes into a loop waiting for a controller location to assume a particular value. The first four bytes of the message content contain the controller address. The fifth byte contains the value the relay should wait for. The relay will drop out of this loop when the client closes its socket to the server.

With the byte_poll message, you can send a list of instructions to the server in one Ethernet packet, and have them executed in the most efficient way by the relay. The last instruction can be a stream_read, which causes the A2037E to send data back to the data acquisition computer.

Example: To obtain an image from a camera, the client sends the sequence of instructions required to obtain and return the image and waits for the image data to arrive. Instead of polling the controller's BUSY bit over TCPIP, we use byte_poll to instruct the relay to poll the BUSY bit for us.
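
The batching idea can be sketched in Python by concatenating several LWDAQ-format messages into one buffer for a single send. The register addresses JOB_REGISTER and BUSY_REGISTER below are hypothetical placeholders, not real A2037E locations. We also assume that the polled byte reads zero when the controller is idle. A real sequence would include further writes to configure the job.

```python
import struct

BYTE_WRITE, STREAM_READ, BYTE_POLL = 1, 3, 5  # identifiers from the table
RAM_PORTAL = 0x3F      # from the A2037E example
JOB_REGISTER = 0x100   # HYPOTHETICAL address, for illustration only
BUSY_REGISTER = 0x101  # HYPOTHETICAL address, for illustration only
READ_JOB = 3           # job number from the Driver Jobs Table


def lwdaq_message(message_id, content=b""):
    """Start byte, identifier, content length, content, end byte."""
    return (b"\xa5" + struct.pack(">II", message_id, len(content))
            + content + b"\x5a")


def acquire_packet(n_bytes):
    """Start a job, wait for it to finish, then stream the data back."""
    messages = [
        lwdaq_message(BYTE_WRITE, struct.pack(">IB", JOB_REGISTER, READ_JOB)),
        # ASSUMPTION: the polled byte equals zero once the job is done.
        lwdaq_message(BYTE_POLL, struct.pack(">IB", BUSY_REGISTER, 0)),
        lwdaq_message(STREAM_READ, struct.pack(">II", RAM_PORTAL, n_bytes)),
    ]
    return b"".join(messages)


packet = acquire_packet(244 * 344)
print(len(packet), "bytes in one packet")
```

Because the relay does the polling, the client sends once and then simply waits for the data_return message that answers the final stream_read.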

login

Depending upon how a server is configured, it may require a login message with the correct password before it will respond to any other message. One such login is required for each TCPIP connection. If the server security level is 1, the client must send a login message with a valid password in order to execute a config_write. If the security level is 2, the client must send a login message to execute any command. The content of a login message is the ASCII-encoded password.

config_read

When the server receives a config_read, it sends back a RAM-resident copy of its EEPROM-based configuration file. The RAM-resident copy is made after any hardware reset, but does not get modified after it is made, even if the EEPROM copy of the configuration file gets modified by a config_write instruction. The content of a config_read message is empty. The message returned contains the ASCII-encoded characters of the configuration file.

config_write

When the server receives a config_write, it writes the contents of the message to its EEPROM-based configuration file. The contents of the file must be compatible with the server's TCPIP Interface software, or else the server will ignore its contents. The file does not take effect until after a hardware reset, so you cannot complete a re-configuration remotely with config_write alone. You must be able to press the server's hardware reset button or turn off its power supply.

stream_delete

When the server receives a stream_delete, it writes a single byte value repeatedly to a single-byte control location. The first four bytes of the message content give the location to which the relay will write the byte value.

Example: In the A2037E, this address will invariably be $3F (decimal 63), the RAM Portal, because consecutive writes to this location set consecutive locations in the controller's RAM space.

The next four bytes of the stream_delete message content give the number of times the relay will write the value to the control location. The final byte of the nine-byte message content gives the value the relay will write.
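
The nine-byte content of a stream_delete message can be packed as in the Python sketch below. The function name is our own, and the example clears a block whose size we chose arbitrarily.

```python
import struct

STREAM_DELETE = 10  # message identifier from the table above
RAM_PORTAL = 0x3F   # A2037E RAM portal address, from the example


def stream_delete_content(address, count, value):
    """Four-byte address, four-byte write count, one-byte value, big-endian."""
    return struct.pack(">IIB", address, count, value)


# Write zero to 1024 consecutive RAM locations through the portal.
content = stream_delete_content(RAM_PORTAL, 1024, 0x00)
print(len(content), content.hex())
```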

echo

When the server receives an echo message, it extracts the message contents and returns them, unmodified, in a data_return message.

Design

We designed the LWDAQ to meet the demands of a particular data acquisition problem, that of the end-cap alignment system in the ATLAS muon detector, as we describe in The Optical Alignment System of the ATLAS Muon Spectrometer Endcaps. The ATLAS detector is a cylinder forty meters long and twenty meters wide. The alignment system's cameras and light sources are LWDAQ devices.

One of the first problems we faced in the ATLAS detector was how to provide electrical power to our devices. We could not use batteries, because the power consumption of the light sources cannot be reduced below certain limits, and no battery small enough to fit in the light source enclosures could supply the required power for the ten-year operating life of the experiment. If we were to deliver high-voltage power and convert it to low-voltage power, we would have to do so in the strong magnetic field of the detector. This magnetic field would saturate ferrite inductor cores. High-frequency converters with air-core inductors are vulnerable to ionizing radiation. We expect some devices to receive as much as 7 krad of ionizing radiation during the ten-year operating life of ATLAS, and we would like them to be able to endure 20 krad to give us a margin of safety.

We considered delivering power to our devices through a separate, low-resistance cable, while communicating through a network cable. But if we deliver power with one cable, and signals with another, we invite ground loops. We could try coupling the signals into the devices optically, but LEDs (light-emitting diodes) in opto-couplers are vulnerable to neutron radiation. We expect some of our alignment devices to receive as much as 10¹² 1-MeV equivalent n/cm² (1 Tn) during the ten-year operating life of ATLAS, and we would like them to be able to endure 10 Tn to give us a margin of safety. The most neutron-resistant LED we have found can lose up to 90% of its transmitting power after 10 Tn. Aside from the ground-loop problems introduced by separate cables, there is the problem of placing power supplies in the experiment hall to deliver power over these separate cables. Such power supplies would have to operate in a magnetic field, or else the power cables would be tens of meters long.

We decided to deliver power and signals to each alignment device through a single cable. There are no LWDAQ power supplies in the ATLAS detector hall.

But we do not have space in ATLAS to bring a cable into the detector for every one of our thousands of cameras and light sources. We must provide some kind of multiplexer. With multiplexing, our power supply problems become more severe. Some cables from the service hall to the devices in the detector are over 100 m long. Despite their length, such cables must be able to provide power to all devices connected to a multiplexer.

The ATLAS end-cap alignment system is one in which only one or two devices out of the thousands it contains need to operate at one time. We reduce the power consumption of the multiplexers by putting to sleep any devices that are not active. The LWDAQ allows us to put a device to sleep either with a single command from its driver, or by cutting off power to the device and then turning the power on again. With the sleeping power consumption reduced to tens of milliwatts, we can supply power to two active devices on the same multiplexer through a single 130-m solid-wire CAT-5 network cable. We can connect multiplexers to their devices with CAT-5 cables as well. We tend to use stranded-core cables for the shorter connections between multiplexers and devices. Stranded wires are more flexible. Solid wires are stiff, but they are faster. The CAT-5 specification for solid-wire cables is much more strict than for stranded-wire cables. It is more difficult to control the dielectric properties experienced by a signal traveling down a stranded wire. In ATLAS, all our root and branch cables are shielded and halogen-free.

In each cable, the same allocation of conductors applies: one twisted pair transmits commands from the driver, another twisted pair returns data from the devices, and the remaining four wires, which may be twisted or not, carry ±15V, +5V, and 0V power.

Another problem we faced in ATLAS, as we have mentioned already, is the pervasive ionizing and neutron radiation to which our circuits will be subjected for the ten-year operating lifetime of the detector. The highest ionizing dose is approximately 7 krad, and the highest neutron dose is roughly 1 Tn (10¹² 1-MeV equivalent n/cm²). Our most-severely irradiated devices will be inaccessible for years at a time. We would like them to be resistant to radiation, rugged, and long-lived, which suggests that they should be simple. On the other hand, we would like them to be versatile, so that a single device, with a single cable, can perform all the alignment functions needed in its immediate neighborhood.

Neutron radiation damages the image sensors and the infra-red LEDs we use in our alignment devices. Our TC255P image sensor is a CCD (charge-coupled device) from Texas Instruments. It suffers an increase in dark current in neutron radiation. After absorbing 10 Tn, its pixels fill up with dark current in 50 ms. If we are to capture images with these sensors after a dose of 10 Tn, we must capture and read them out in less than 50 ms. There are eighty thousand pixels per image, and we must allow at least 10 ms for exposure to light, so we must retrieve the pixels at a rate no slower than two million per second.
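
The read-out rate requirement follows from simple arithmetic, sketched here in Python. We use the 244×344 pixel count of the TC255 read job as the image size, which the text rounds to eighty thousand. The variable names are our own.

```python
pixels = 244 * 344    # TC255 read job dimensions
fill_time_ms = 50     # time for dark current to fill the pixels after 10 Tn
exposure_ms = 10      # minimum time allowed for exposure to light

readout_ms = fill_time_ms - exposure_ms
min_rate_per_s = pixels / readout_ms * 1000

print(f"{pixels} pixels in {readout_ms} ms needs {min_rate_per_s / 1e6:.1f} million pixels/s")
```

The result is a little over two million pixels per second, which is the origin of the 2-MHz direct clocking rate used for image read-out.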

Our HSDL4400 LED is an infra-red emitter from Hewlett-Packard. The HSDL4400 is more resistant to neutron radiation than any other we tested. Nevertheless, it can lose up to 90% of its optical output power after 10 Tn. We can use these diodes up to a dose of 10 Tn only if the time for which an undamaged diode must be flashed to obtain an adequate image is no more than 1 ms. In order to provide 1-ms flashes, our data acquisition system must be able to turn on and off light sources, and switch between one device and another, in a fraction of a millisecond.

We decided upon a number of policies designed to keep alignment devices simple, but at the same time versatile and fast. All timing signals required by a device are provided by its driver, with the exception of the short pulses required to decode the serial transmissions from the driver. Devices do not digitize analog signals, but transmit them directly to the driver. To preserve the integrity of these analog voltages, they propagate as low-voltage differential signals (LVDS) and all ATLAS-resident LWDAQ cables are shielded. Likewise, the driver transmits its commands as LVDS logic levels.

And so we arrived at the Long-Wire Data Acquisition System, with its generic drivers, multiplexers, repeaters, and cables. The devices can contain as many sources and sensors as we need, provided their waking power consumption remains below the LWDAQ specified limits (see below).

Example: The BCAM Head (A2051), of which there will be several hundred in ATLAS, provides four laser diode light sources and two image sensors. We connect the BCAM Head to the LWDAQ with a single CAT-5 cable. The Bar Head (A2044), of which there will be two hundred in ATLAS, provides four platinum resistance-temperature devices (RTD), two image sensors, and two infra-red light-emitting diode arrays. Assuming a perfect RTD, the Bar Head provides temperature measurement accuracy of 40 mK, and resolution of 20 mK. No digitization takes place in the Bar Head. Instead, the device returns analog voltages, which the driver digitizes for temperature measurements with its sixteen-bit ADC. Both the BCAM Head and the Bar Head have a typical sleeping current consumption of 1.8 mA at +5 V, 400 μA at +15 V, and 100 μA at −15 V. Their sleeping power consumption is therefore 17 mW.
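The sleeping power quoted above is the sum of the products of each supply voltage and its sleeping current. A short sketch of that sum, with names chosen by us for illustration:

```python
# Sleeping currents quoted in the text, as (supply magnitude in V, current in A).
SLEEP_LOADS = [
    (5.0, 1.8e-3),    # +5 V supply, 1.8 mA
    (15.0, 400e-6),   # +15 V supply, 400 uA
    (15.0, 100e-6),   # -15 V supply, 100 uA (magnitude used for power)
]


def total_power_w(loads):
    """Sum the power drawn from each supply, in watts."""
    return sum(volts * amps for volts, amps in loads)


p = total_power_w(SLEEP_LOADS)
print(f"{p * 1e3:.1f} mW")
```

The three contributions are 9 mW, 6 mW, and 1.5 mW, giving 16.5 mW, which rounds to the 17 mW stated.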

The ATLAS LWDAQ Driver with VME Interface (A2037) resides in a VME crate, and provides eight CAT-5 sockets. We can connect a device or a multiplexer to each one of these sockets, although in ATLAS, only multiplexers will be connected directly to the drivers. The ATLAS detector requires nearly eight hundred LWDAQ Ten-Way Multiplexers (A2046) with eight hundred CAT-5 cables running out of the detector and into the service hall, where one hundred VME-resident LWDAQ drivers will receive them.

We might have included a second layer of multiplexing in the ATLAS detector, to reduce the number of cables that run into the service hall. A second-layer multiplexer might provide ten sockets for ten first-layer multiplexers, and thus allow one hundred devices to be connected to a driver with only one cable. But this cable would have to be larger than our existing CAT-5 cables, and would require a larger connector. We would have to design the second-layer multiplexer itself, and test it. In the end, we found that the cost of designing, implementing, building, and testing a second layer of multiplexing was greater than the cost of building, testing, and installing eight hundred cables.


Figure: Block Diagram of ATLAS DAQ.

We are wary of connecting a large number of devices to a single detector-resident circuit, or of making a large number of devices dependent upon any single cable. With a second layer of multiplexing, the failure of a single device, or of a single device cable, could, by shorting the power supplies, cripple a second-layer multiplexer and disable ninety-nine other devices at the same time. A fault on a single device could damage, by its effect on the power supplies, every other device connected to the second-layer multiplexer. To avoid such disasters, a second-layer multiplexer would have to be sophisticated in its distribution and monitoring of the LWDAQ power supplies. Such a circuit would be difficult to design and complicated to test, and its failure at any time during the running of the ATLAS detector would cut off a hundred fully-functional alignment devices from their LWDAQ driver. The LWDAQ, therefore, provides only one layer of multiplexing.

When the LWDAQ was a few years old, we began to suspect that cables in the ATLAS detector hall would be longer than the 100 m for which we initially designed the system. We had to increase the maximum cable length to over 130 m. To this end, we designed the LWDAQ Repeater (A2058). The repeater restores outgoing logic signals. It allows us to extend our operating range to 200 m. The repeater also allows us to shut off power to the downstream circuits. Turning off power to individual multiplexers allows us to increase the proportion of time for which each device is without power. Devices are more resistant to ionizing radiation and single-event upsets when they are without power. The repeaters also allow us to isolate in software any faulty cables, multiplexers, and devices that would otherwise bring down the driver power supplies.

In the following section, we discuss the problems we encountered when installing the ten-thousand-device LWDAQ of the ATLAS end-cap muon spectrometer.

Problems

Here we describe several problems we have encountered in large LWDAQ systems. These problems arise from design mistakes and unforeseen behavior of systems with a large number of long cables and distributed capacitors.

Reset Failure

Symptom: Excessive power consumption after power-up, made obvious by laser light sources turning on at random from one power-up to the next.

Cause: Power-up reset on many devices does not endure for long enough to allow power supplies to settle.


Figure: Rise of 5-V Power with Unmodified A2037E. Time scale is 10 ms per division. We have C39 = 1 μF (see schematic). The top trace is the 5V power on a device at the far side of a 100-m root cable and a multiplexer. The middle trace is the voltage on Q4-3, the gate of the mosfet that turns on the 5-V supply in the driver. The lower trace is the logic level that begins the turn-on.

The power-up reset of registers on our radiation-tolerant devices is performed by an RC network, as you can see in this schematic. The RC network works well if the logic power turns on quickly, in a fraction of the RC time constant. The time constant of our RC network is around 10 ms (1 μF and 10 kΩ). As we can see from the above plot, the 5-V supply takes almost 60 ms to rise to its final value, and it rises in two steps. The first step is due to the earlier turn-on of the +15-V power supply, which feeds into the 5-V supply through the device circuits. The second step is the genuine turn-on. The first step allows the RC reset networks to settle at around 1.6 V, which means they are no longer effective when the power starts up.
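To see why a slow supply defeats the RC reset, we can compare the reset-node voltage at the moment the supply reaches its final value for a fast and a slow turn-on. The sketch below is a simplified model and not the actual device circuit: it assumes an ideal RC low-pass driven by a single linear ramp (the measured rise has two steps), and the 1-ms fast ramp is a hypothetical value chosen for comparison. Only the 10-kΩ, 1-μF, 5-V, and 60-ms figures come from the text.

```python
import math


def reset_node_at_end_of_ramp(ramp_s, v_final, tau_s):
    """Voltage on an ideal RC low-pass at the instant a linear supply
    ramp of duration ramp_s reaches v_final."""
    slope = v_final / ramp_s
    # Response of a first-order RC to a ramp: v(t) = slope*(t - tau*(1 - e^(-t/tau)))
    return slope * (ramp_s - tau_s * (1.0 - math.exp(-ramp_s / tau_s)))


TAU_S = 10e3 * 1e-6  # 10 kOhm x 1 uF = 10 ms

v_fast = reset_node_at_end_of_ramp(ramp_s=1e-3, v_final=5.0, tau_s=TAU_S)
v_slow = reset_node_at_end_of_ramp(ramp_s=60e-3, v_final=5.0, tau_s=TAU_S)
print(f"time constant: {TAU_S * 1e3:.0f} ms")
print(f"1-ms ramp:  reset node at {v_fast:.2f} V when supply reaches 5 V")
print(f"60-ms ramp: reset node at {v_slow:.2f} V when supply reaches 5 V")
```

In this model, the reset node is still only a few tenths of a volt when a fast supply reaches 5 V, so the reset line is held LO while the logic powers up. With a 60-ms ramp, the node has risen above 4 V by the time the supply is complete, so the reset has effectively been released before the logic is fully powered.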

The failure of the reset circuit alone would not be adequate to cause the Reset Failure symptoms we see in large experiments. The spike in the 5-V power supply, which we can see in the plot above, causes false transitions in the low-voltage differential transceivers of the multiplexers. These transitions are misinterpreted as commands by the devices, and so lasers and other functions turn on.

Cure: Speed up the logic power supply by removing capacitor C39 on the LWDAQ Driver (A2037), see schematic. Newer LWDAQ Drivers have fast power supplies and need no such modification.


Figure: Rise of 5-V Power with Modified A2037E. Traces are plotted as in figure above. C39 has been removed.

The above figure shows the dramatically-faster power-up at the device after we remove C39. The plot below compares the 5-V power-up for various values of C39.


Figure: Rise of 5-V Power with Various Values of C39. Traces left to right: 0 μF, 0.4 μF, and 1.0 μF.

With a system of eight multiplexers and eighty BCAMs, we measured the number of lasers that would light up on power-up over the course of ten power-up cycles, and plotted this with respect to the value of C39. The result is shown below for three different A2037E drivers (circuit involved is identical to that of the A2037A).


Figure: Frequency of Laser Turn-Ons at Power Up versus Value of C39 on A2037. There are 80 BCAMs with 160 lasers in total attached to the driver.

History: We first noticed the Reset Failure at CERN when we had cables over 100 m long and more than twenty or thirty devices connected to each driver. When we turn on power to a large LWDAQ, current surges along the 130-m cables to the devices. With fifty or sixty devices attached to a single driver, the ±15V and +5V supplies must fill fifty or sixty 1-μF capacitors at the end of long cables. As we describe in the Power Supplies section of the A2037A Manual, when the driver is heavily-loaded with devices and capacitors, we find that the +5V supply rises too slowly for the RC reset circuit to work properly. The RC reset circuit assumes an immediate turn-on of +5V while the capacitor in the divider holds the reset line LO. So we find that our radiation-tolerant circuits can fail to reset properly on power-up with our A2037 drivers. Light sources might be on. Power may be consumed from the +15V supply.

For several years we worked around the Reset Failure by sending all devices to sleep after we turn on the device power. This is the purpose of the Diagnostic Instrument's sleepall command. All Acquisifier scripts for large LWDAQ systems included a power-cycle at the start, in which we turned on the power and sent all devices to sleep on all drivers, and again at the end of the cycle, in which we turned off the power.

In June 2010, we figured out that it was the slow turn-on of power in the A2037 itself that was at the heart of this problem. By removing the poorly-conceived slow-down capacitor, C39, in the A2037, we are able to stop the power-up reset error from happening altogether. The removal of C39 stops the Reset Failure in all LWDAQ systems, with or without repeaters.

Cold Start

Symptom: Power supplies on LWDAQ Driver (A2037) oscillate at roughly 1 Hz after we have turned them off for more than ten seconds. Hence the term Cold Start problem.

Cause: The 3.3V logic supply on our radiation-tolerant devices is dependent upon the +15V supply (example schematic). When the +15V power is over-loaded, it turns off. When the +15V power turns off, the NPN transistor in our radiation-tolerant 3.3V regulators is deprived of base current. The transistor turns off and disconnects power to the device's logic circuits. This means that any loss of the +15V power switches off the logic. Suppose we turn on ten or fifteen light sources in a LWDAQ system. The current consumption from the +15V supply of our A2037A exceeds its maximum 500 mA and the +15V supply switches off. All the logic circuits lose power. The light sources switch off. The +15V power returns. The logic power returns.

This self-induced power cycle would not occur if our 3.3V regulators were independent of ±15V, as in our radiation-vulnerable A2036 (see schematic). If a self-induced shut-down is going to occur, we would at the very least like the command registers on all the devices to return to a known state, such as all-zeroes. But we now encounter the Reset Failure problem. After power-up, some internal circuits are not asleep. Some light sources are shining. There may be enough light sources shining that the +15V supply fails, and we enter another self-induced power cycle. We now have self-induced power-supply oscillations.

Given that we can have many light sources turning on when we power up a large LWDAQ, before we ever get a chance to execute a sleepall, we see that it is possible to enter power-supply oscillations immediately after turning on the power supplies. The ATLAS and ALICE systems for some years exhibited these power-up oscillations. They occur in large systems that have no repeaters in the root cables and that use unmodified A2037 drivers (C39 has not been removed). The only drivers in the ATLAS system that exhibit power-up oscillations are those without repeaters. There are no repeaters in the ALICE system.

We find that the oscillations occur only if the device power has been turned off for more than ten seconds. If we turn on the power, turn it off again immediately, wait half a second, and turn it on again, we never enter power-up oscillations.

Cure: Same as for the Reset Failure. Speed up the logic power supply by removing capacitor C39 on the LWDAQ Driver (A2037), see schematic. Newer LWDAQ Drivers have fast power supplies and need no such modification. The cure works with and without repeaters.

History: Our initial cure in ALICE and ATLAS was to turn off and on the power supplies repeatedly. We apply the Diagnostic Instrument with "on 500 off 500 on sleepall" as DAQ actions. After these actions, all LWDAQ devices are powered up and asleep. In June 2010 we discovered that removing C39 on the A2037 stopped the Reset Failure and therefore the Cold Start problem as well.

Mask Burn-Out

Symptom: Rasnik masks stop producing light after some days in the apparatus.

Cause: Some of our LED array light sources cannot stay on for more than a few hours without over-heating and suffering damage. Because of the Reset Failure problem, it's possible for these arrays to power-up in the illuminated state, and subsequently burn out. We describe the origins of the burn-out in the A2045 Manual.

Cure: We avoid burning out these vulnerable arrays by making sure that we don't leave on the power to our large LWDAQ systems when we are not taking data. We always perform a sleepall before we start data-taking. We cure the Reset Failure problem by removing C39 of the A2037 circuit. The cure works with and without repeaters.

History: We lost about a dozen masks to this problem during our test stands, and a few in the actual ATLAS alignment system. Once we took care to turn off power while we were not taking data, and to apply sleepall after power-up, we lost no more masks.

Incorrect Pull-Up

Symptom: Cannot capture reliable camera images over a 120-m cable using A2071E drivers with hardware version 0 or 1.

The LWDAQ Driver (A2071E) perpetuates the error we made in the LWDAQ Driver (A2037E), whereby we pull up the outgoing T+ signal to +5 V instead of +3.3 V. This in itself does not cause a problem, but the original design of the A2071E uses 1 kΩ pull-up and pull-down resistors. With a 120-m cable, we see the following traces for T+ and T− at the device during command transmission.


Figure: Transmit Signal at Device, 1-kΩ Pull-Up and Pull-Down, 120-m Cable. Top trace is Transmit Device Command, a trigger. Middle trace is T+, bottom trace is T−, 200 mV/div, 200 ns/div, both with same offset voltage.

The command transmission begins with a 50-ns low pulse. This pulse appears on both T+ and T−, but the two do not cross over, and we detect no start bit. The separation of the T+ and T− signals during a long HI period is 900 mV. For traces of T+ at the device and T+ at the driver, see here. If we switch to 10-kΩ pull-up and pull-down resistors, we see the following.


Figure: Transmit Signal at Device, 10-kΩ Pull-Up and Pull-Down, 120-m Cable. Top trace is Transmit Device Command, a trigger. Middle trace is T+, bottom trace is T−, 200 mV/div, 200 ns/div, both with same offset voltage.

The separation of T+ and T− during a long HI is now 600 mV. The two signals cross during the 50-ns pulse. For traces of T+ at the device and T+ at the driver, see here. In both cases, we see the distortion of the original T+/T− signals by the 120-m cable. The command transmission is made up of high frequencies, represented by the sharp 400-mV transitions, and low frequencies, represented by the shift in the logic HI level of T+ over a few microseconds. Low frequencies travel slower than high frequencies, and so arrive later. But high frequencies are attenuated by their journey down the cable. These two effects combine to produce a failure of the 50-ns pulse when T+ and T− are pulled apart by 900 mV with 1-kΩ resistors. With 10-kΩ resistors, the 600-mV separation permits reception of the command. But we see the same failure taking place with a 160-m cable.