Vector measurements with the rtl-sdr, 2

05.07.2020 10:34

In my last post I've talked about a setup for performing vector reflection measurements with the rtl-sdr. I've come up with an idea for a simple time multiplex hardware so that I could receive both the reference and the measured signal with rtl-sdr's single channel ADC. I did some simulations of the setup and I mentioned that I saw some ±5 degree phase errors. I didn't investigate the source of that error at the time.

After spending some more time thinking about it, it turned out that the phase errors I've seen in simulations are due to switch cross-talk. It's quite obvious in retrospect. The measured and reference signals get mixed in the switches and this changes their phase a little. It's best to show this with a phasor diagram:

Phasor diagram of ideal signals.

These are the ideal signals. The reference Uref and the measured signal Udut that passed through the device under test. Udut has a different phase and amplitude compared to the reference and I want to measure that difference.

Phasor diagram of signals with cross-talk.

Due to switch cross-talk, what I'm actually measuring is U'ref and U'dut. U'ref is a sum of both the ideal reference and ideal measured signals, but the measured signal has been attenuated by k, which is the switch cross-talk in linear scale. Vice versa for U'dut, where the reference is the signal attenuated by k. εref and εdut are the phase errors due to this addition.

\varepsilon = \varepsilon_{ref} + \varepsilon_{dut}

The combined phase error depends on the phase shift α in the signal caused by the device under test and the attenuation of the measured signal. The error is largest when α = 90° and the amplitude of the measured signal is smallest. Some geometry also shows that, for small errors, this maximum phase error ε in radians is roughly the ratio of the switch cross-talk (CT) to the attenuation of the DUT (A, the ratio between Udut and Uref) in linear scale. In decibels this is a subtraction, with both CT and A written as negative dB values:

\varepsilon \approx 10^{\frac{CT_{dB} - A_{dB}}{20}}\qquad\mathrm{[rad]}
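As a quick sanity check of this approximation, here is a minimal Python sketch that adds the cross-talk phasors explicitly and compares the resulting phase error with the formula. It assumes the same attenuation k on both signal paths and uses the convention that both CT and A are negative dB values. The numbers in the example (-40 dB cross-talk, -20 dB DUT attenuation) are made up for illustration and are not taken from the simulation.

```python
import cmath
import math

def phase_error(ct_db, a_db, alpha_deg):
    """Phase error in radians caused by switch cross-talk.

    ct_db:     cross-talk in dB (negative, e.g. -40)
    a_db:      DUT attenuation |Udut/Uref| in dB (negative, e.g. -20)
    alpha_deg: true phase shift of the DUT in degrees
    """
    k = 10 ** (ct_db / 20)
    u_ref = 1 + 0j
    u_dut = 10 ** (a_db / 20) * cmath.exp(1j * math.radians(alpha_deg))
    # Each measured signal is the ideal one plus the other one leaking through.
    m_ref = u_ref + k * u_dut
    m_dut = u_dut + k * u_ref
    # Angle between the measured ratio and the true ratio.
    return cmath.phase((m_dut / m_ref) / (u_dut / u_ref))

def approx_error(ct_db, a_db):
    """Small-error approximation for the worst case (alpha = 90 degrees)."""
    return 10 ** ((ct_db - a_db) / 20)

if __name__ == "__main__":
    exact = abs(phase_error(-40, -20, 90))
    print(f"exact {exact:.4f} rad, approximation {approx_error(-40, -20):.4f} rad")
```

The sign of the exact error depends on the direction of the phase shift, so only magnitudes are compared.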

I expect this error to be smaller in practice than what I had in this simulation. The first reason is that I made an error and only accounted for one switch in the simulation. In reality there will be two switches and hence, at least in theory, double the attenuation on the unused signal path. The second is that I now plan to use Renesas F2933 switches, which have a much better rated isolation than the F2976 I've considered in my simulation.

Measurement phase error versus switch cross-talk.

Given the limited dynamic range of the rtl-sdr, -80 dB cross-talk or less should probably suffice for a reasonable accuracy over the entire measurable range. I also expect this is the kind of error that I can compensate for, at least to some degree, in software with short-open-load-through (SOLT) calibration. I have to look up some of my old notes on the math behind that.

Talking about practice, I have the circuit schematic I want to make roughly drawn up on paper. I've decided on all the components I will use. The digital part for driving the switches will be low-voltage 3.3V CMOS logic, since that's compatible with F2933 inputs. For testing purposes I want to also be able to drive the switches from an external signal source and select the signal path manually. Next step is to draw the circuit in some EDA software and design the PCB layout.

Posted by Tomaž | Categories: Analog | Comments »

Vector measurements with the rtl-sdr

21.06.2020 11:41

Previously I was writing about some experiments with reflection measurements using an rtl-sdr receiver. I used the rtl-sdr as a simple power meter together with an RF bridge to measure VSWR. This was a scalar measurement. All the phase information from the signal was lost and with it also the angle information about the complex impedance of the load I was measuring. Since I was happy with how the method performed in that experiment I was wondering if I could adapt the setup to measure the phase information as well.

With a vector measurement I need a reference signal to compare the phase of the measured signal to. This is a problem, since the rtl-sdr only has one input and can only sample a single signal at a time. My idea was that perhaps I could multiplex the reference and the measured signal onto the single input. Both time and frequency multiplex seemed doable, but time multiplex seemed by far simpler to implement in hardware. Integrated microwave switches with usable characteristics, such as Renesas F2976, are reasonably cheap these days.

Previously I recorded a 2 second array of digital samples of the measured signal for each scalar measurement point. With the time multiplex setup, I could record similar 2 seconds of input, but that input would now contain both the reference and the measured signal. I could then de-multiplex and process it in software. The new setup would look something like the following. The device under test could again be a bridge, or something else:

Block diagram of the vector measurement setup.

The time multiplex board contains two SPDT switches. In one position, the switches direct the reference signal to the rtl-sdr. In the other position the signal passes through the device under test and then back to the rtl-sdr. The switch frequency I'm thinking about is somewhere around 100 Hz. A complex baseband signal recorded by the rtl-sdr would then look something like this:

Plot of the simulated complex baseband signal.

Most of the complexity in this setup would be in software. The software would need to find out which parts of the recording are the reference and which are the measured signal. This is similar to clock recovery in a receiver. It would then compare the two signals and do some filtering. This is a rough block diagram of the processing pipeline:

Block diagram of the signal processing setup.

The reality is a bit more complicated though. Especially clock recovery seems tricky. My original intention was to use auto-correlation of the signal but it turned out much too slow. Right now I'm just using simple amplitude thresholding, which works as long as DUT is attenuating the signal enough compared to the reference. There's also some additional processing required to account for the fact that my delay is an integer number of samples, which introduces an additional random phase shift that needs to be accounted for.
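The amplitude thresholding idea can be illustrated with a minimal Python sketch. This is not the processing pipeline from my simulations: it assumes the test tone sits at an exactly known baseband frequency f0, so that it can be mixed down to DC instead of comparing delayed copies of the signal, and all parameter values (sample rate, tone frequency, noise level) are made up for the example.

```python
import cmath
import math
import random

def simulate(g, fs=48_000, f0=1_000.0, f_switch=100.0, duration=0.2,
             noise=0.01, seed=1):
    """Time-multiplexed baseband signal: reference (gain 1) and DUT (complex gain g)."""
    rng = random.Random(seed)
    half = fs / (2 * f_switch)  # samples spent in each switch position
    samples = []
    for n in range(int(fs * duration)):
        gain = 1.0 if int(n // half) % 2 == 0 else g
        tone = cmath.exp(2j * math.pi * f0 * n / fs)
        samples.append(gain * tone + complex(rng.gauss(0, noise), rng.gauss(0, noise)))
    return samples

def estimate_ratio(samples, fs=48_000, f0=1_000.0):
    """Split samples by amplitude threshold and return mean(DUT) / mean(reference)."""
    mags = [abs(s) for s in samples]
    threshold = (max(mags) + min(mags)) / 2
    ref_sum = dut_sum = 0j
    n_ref = n_dut = 0
    for n, s in enumerate(samples):
        base = s * cmath.exp(-2j * math.pi * f0 * n / fs)  # mix the tone down to DC
        if mags[n] > threshold:
            ref_sum += base
            n_ref += 1
        else:
            dut_sum += base
            n_dut += 1
    return (dut_sum / n_dut) / (ref_sum / n_ref)

if __name__ == "__main__":
    g = 0.3 * cmath.exp(1j * math.radians(60))
    est = estimate_ratio(simulate(g))
    print(f"|g| = {abs(est):.3f}, angle = {math.degrees(cmath.phase(est)):.1f} deg")
```

As noted above, this only works while the DUT attenuates the signal enough for the two amplitude levels to stay clearly separated.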

So far I've only performed some proof-of-concept simulations of this setup using the performance of the rtl-sdr I've seen in my scalar measurements and the properties of the switches from the datasheet. It does seem feasible. Here are simulated vector measurements of three points on the complex plane. For example, these might be complex reflection coefficient measurements on the Smith chart. The gray dots show the true value and the blue dots show the simulated measurements:

Simulated vector measurements of three points on the complex plane.

There is some angle and amplitude error, but otherwise the principle seems to work fine. These are the histograms of the errors over a large number of simulated measurements of random points on the complex plane, where the measured signal was well above the receiver noise floor:

Histogram of the amplitude measurement errors.

Histogram of the angle measurement errors.

I'm not sure yet what part contributes the most to these errors. I'm simulating several hardware imperfections, such as switch cross-talk, frequency and phase inaccuracies and receiver noise. The most complicated part here is the clock recovery and I suspect that has the largest effect on the accuracy of the output. The problems with clock recovery actually made me think that having a four-state "off-dut-off-ref" cycle instead of a two-state "dut-ref" for the switches would be better since that would give a much stronger pattern to match against. Another idea would be to lock the multiplex clock to the actual signal. ERASynth Micro does provide a 10 MHz reference clock out, but dividing it down to 100 Hz would need a huge divider.

Anyway, the simulations so far seem encouraging and I probably can't get much further on paper. I'm sure other factors I haven't thought of will become evident in practice. I plan to make the time multiplexing board in the future and try to do some actual experiments with such a setup.

Posted by Tomaž | Categories: Analog | Comments »

Resistor tolerance and bridge directivity

13.06.2020 14:20

Another short note related to the RF bridge I was writing about previously. The PCB has four resistors soldered on. Two pairs of 100 Ω in parallel. Each pair forms one of the two fixed impedances in the two branches of the bridge circuit. The two variable impedances in the bridge are the device under test (left) and the termination on the REF terminal (right). The black component on the bottom is the RF transformer of the balun circuit.

Four 100 Ω resistors on the RF bridge PCB.

My father pointed out the fact that these resistors don't look particularly high precision. The "101" marking (10 × 10^1 Ω) is typical of 5% tolerance resistors. 1% parts more often have "1001" or the EIA-96 character code. Unfortunately I can't simply measure them in circuit with a multimeter, because the balun forms a DC short circuit across them. I don't want to desolder them. Still, I was wondering how much variance in these resistors would affect the bridge directivity.

Following is the result of a Monte Carlo simulation showing three histograms for bridge directivity. Each was calculated for one possible tolerance class of the 4 resistors used. The assumption was that individual resistor values are uniformly distributed between their maximum tolerances. The effect of two parallel resistors on the final distribution was included. The peak on each histogram shows the value for directivity that is most likely for a bridge constructed out of such resistors.

Directivity histogram calculated using a Monte Carlo method.

Each tolerance class defines the lowest possible directivity (where the two resistors are most mismatched). On the high end the histogram isn't limited. In any tolerance class there exists some small possibility that the resistors end up being perfectly matched. However, the more you move away from the average directivity the less likely that is, as the probability asymptotically approaches zero.

Cumulative distribution function of bridge directivity.

This is the same data shown as an estimate of the cumulative distribution function. The annotations on the graphs show the 90% point. For example, for 5% resistors, 90% of the bridges would have higher than 32.1 dB directivity. You gain approximately 20 dB in directivity each time you reduce the resistor tolerance by a factor of 10.
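The Monte Carlo procedure described above can be sketched in a few lines of Python. The bridge model here is a simplification chosen for the example and not necessarily the one used for the plots: an ideal Wheatstone bridge driven from a voltage source, a floating high-impedance detector, a perfect 50 Ω REF termination, and directivity taken as the ratio of detector voltage with a short on the DUT port to that with a perfect 50 Ω load.

```python
import math
import random

Z0 = 50.0  # system impedance in ohms

def directivity_db(r1, r2, z_ref=Z0):
    """Directivity of an idealized bridge with fixed arms r1 (DUT side) and r2 (REF side)."""
    def detector(z_dut):
        # Voltage between the two bridge midpoints for a 1 V source.
        return z_dut / (r1 + z_dut) - z_ref / (r2 + z_ref)
    v_short = abs(detector(0.0))  # full reflection on the DUT port
    v_load = abs(detector(Z0))    # perfect match, ideally zero output
    return math.inf if v_load == 0 else 20 * math.log10(v_short / v_load)

def parallel_pair(rng, nominal, tol):
    """Two resistors in parallel, each uniformly distributed within its tolerance."""
    a = nominal * (1 + rng.uniform(-tol, tol))
    b = nominal * (1 + rng.uniform(-tol, tol))
    return a * b / (a + b)

def simulate(tol, trials=100_000, seed=0):
    """Sorted list of directivities for bridges built from resistors of one tolerance class."""
    rng = random.Random(seed)
    return sorted(
        directivity_db(parallel_pair(rng, 100.0, tol), parallel_pair(rng, 100.0, tol))
        for _ in range(trials)
    )

def exceeded_by_90_percent(sorted_values):
    """Directivity that 90% of the simulated bridges exceed."""
    return sorted_values[len(sorted_values) // 10]

if __name__ == "__main__":
    for tol in (0.05, 0.01, 0.001):
        d = simulate(tol)
        print(f"{tol * 100:g}%: worst {d[0]:.1f} dB, 90% above {exceeded_by_90_percent(d):.1f} dB")
```

Because the model differs in details, the numbers it prints should only be compared with the annotated graphs as an order-of-magnitude check.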

It's important to note that this was calculated using a low-frequency bridge model. In other words, none of the high-frequency effects that cause the real-life directivity to fall as you go towards higher frequencies are counted. Any effects of the balun circuit and the quality of the REF termination were ignored as well. So the directivity numbers here should be taken as the best possible low-frequency case.

Anyway, I thought this was interesting. Similar results apply to other devices that use a resistor bridge circuit as a directional coupler, such as the NanoVNA and its various variants. Also somewhat related and worth pointing out is this video by W0QE where he talks about resistor matching for calibration loads and how different SMT resistors behave at high frequencies.

Posted by Tomaž | Categories: Analog | Comments »

Experiments with the "Transverters Store" RF bridge

07.06.2020 17:54

The "Transverters Store" RF bridge, for lack of a better name, is a low-cost bridge circuit that can be used to measure return loss or voltage standing wave ratio (VSWR) at radio frequencies. It claims to be usable from 0.1 MHz to 3 GHz. Basic design and operating principle of a similar device is described in "A Simple Wideband Return Loss Bridge Revisited", an article by Paul McMahon from 2005. In it he also gives measurements of its performance up to 500 MHz. The exact device I have seems to be very much related to Paul McMahon's design. It came from a web page called Transverters-Store and shipped from Ukraine. Very similar looking products with more or less exact copies of the PCB layout are available from various other sources. Since there's no product number or a clear name associated with it, most often people refer to these as simply "that cheap bridge from eBay".

The "Transverters Store" RF bridge.

In short, the bridge operates similarly to the typical resistor bridge networks, like the Wheatstone bridge. The network is composed of two 50 Ω resistors on the PCB itself (each made out of two parallel 100 Ω resistors), the reference load on the REF port and the device under test on the DUT port. The biggest difference is the addition of a balun circuit. This makes the detector output referenced to ground, instead of floating between the two bridge branches like in a low-frequency bridge. The balun is implemented here as a high-frequency transformer made out of a row of black ferrite beads and two lengths of coax.

The bridge can in principle be thought of as a directional coupler. The signal power on the OUT port only corresponds to the reflected power coming back in to the DUT port, but (ideally) not the forward power going out of that port to the device under test. Compared to a true directional coupler however the bridge can't be operated in the reverse. You can't measure forward power by connecting a signal source on the OUT port and a detector on the IN port.

Ideal bridge output versus return loss.

This is how the bridge would behave if everything was ideal. The vertical axis shows the power on the OUT port relative to the power on the IN port. The horizontal axis shows return loss of the device under test. Using directional coupler terminology, the bridge has a coupling factor of 16 dB if used with a 50 Ω detector. It's also interesting to see that if using such a detector on the OUT port, the output of the bridge is slightly non-linear with respect to return loss. The difference is small - an open circuit will measure around 1.5 dB too high and a short will measure around 1.5 dB too low. Considering other inaccuracies, this detail probably isn't significant in practice.

Below you can see the setup I used for the experiments. Signal source on the top left is an ERASynth Micro. The detector on top right is an Ezcap DVB-T dongle (using Elonics E4000 tuner) and rtl-power-fftw. Both the source and the detector are controlled from a PC through a USB connection. Above the bridge you can see the terminations I used in the experiments: A borrowed professional Narda Micro-Pad 30 dB attenuator (DC - 18 GHz) which I used as a terminator, a couple of home-made 50 Ω SMA terminators (using two parallel 100 Ω 0603 resistors in a SMA connector), a home-made short and a no-name terminator that has a through-hole metal-film 51 Ω resistor inside.

The setup for experiments with the RF bridge.

Using this setup I tried to measure the directivity of the bridge. Directivity is the measure of how well the bridge selects between the forward and reflected power. The higher the directivity, the higher the return loss (and the lower the VSWR) you can reliably measure with it. Anritsu has a good application note that describes how directivity affects measurement error. I measured directivity by measuring OUT port power twice: once with a short on the DUT port and once with my home-made terminator. Dividing these two values gave an estimate of bridge directivity. I performed two measurements: once with the Narda attenuator on the REF port and once with my home-made terminator using two 0603 SMT resistors.

There is an approximately 200 MHz wide gap in my measurements at around 1.1 GHz because the DVB-T receiver cannot tune on those frequencies.

Measured directivity of the RF bridge.

You can see that my two measurements differ somewhat. In both, the bridge shows good directivity up to around 1 GHz. Above that, it's below 20 dB, which introduces a large error in measured VSWR as you'll see below. The DVB-T receiver I used also shows a decrease in sensitivity above 1 GHz. However, I've repeated this measurement using different input power levels (from -10 dBm to -30 dBm on ERASynth Micro) and they all show similar results, hence I believe the measured decrease in directivity is not due to the limited dynamic range of my measurement setup.

In general, I think the biggest source of errors during these experiments are the terminators I've used. The directivity measurement assumes that the bridge can be measured with perfect termination on both the DUT and REF ports. By testing various combinations of terminators I had at hand I've seen significant differences in output port power, which suggest they are all slightly imperfect in different ways.

My measurements of directivity compared to seller's.

This is how my measurements compare with the measurements published on the Transverters Store website. The blue and orange plots are the same results as above, only rescaled. The red plot is the Transverters' result. Again, my results differ from theirs. Below 500 MHz mine show a bit better directivity. Above 500 MHz mine are worse. Both show a slow decrease and then a sharp fall at around 1 GHz. I'm not claiming their measurements are wrong. My setup is very much improvised and can't compare to professional equipment. It's also very likely that different devices differ due to manufacturing tolerances.

Measured VSWR of the cheap SMA terminator.

Finally, here's an example of a VSWR measurement using this setup. I've measured the bad terminator I've mentioned in my previous blog post that's made using a standard through-hole resistor. Again, the blue and orange plots show measurements using the two different references on the REF port of the bridge. The shaded areas show the error interval of the VSWR measurement due to bridge directivity I measured earlier. The VSWR of the device under test can be anywhere inside the area.
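For reference, here is a minimal Python sketch of how such an error interval can be computed. It uses the common worst-case assumption that the directivity error is a reflection coefficient of unknown phase that adds to the true one, which is not necessarily the exact calculation behind the shaded areas. The function names and the example values (30 dB directivity, 20 dB measured return loss) are made up for illustration.

```python
import math

def directivity_db(p_short_db, p_load_db):
    """Directivity estimate from OUT port power with a short and with a good load on DUT."""
    return p_short_db - p_load_db  # a ratio of powers is a difference in dB

def vswr(rho):
    """VSWR for a reflection coefficient magnitude rho."""
    return math.inf if rho >= 1 else (1 + rho) / (1 - rho)

def vswr_bounds(return_loss_db, directivity):
    """Lower and upper bound on the true VSWR for a measured return loss."""
    rho_measured = 10 ** (-return_loss_db / 20)
    rho_error = 10 ** (-directivity / 20)
    # The error phasor can add to or subtract from the true reflection.
    return vswr(abs(rho_measured - rho_error)), vswr(rho_measured + rho_error)

if __name__ == "__main__":
    low, high = vswr_bounds(return_loss_db=20, directivity=directivity_db(-20, -50))
    print(f"VSWR between {low:.2f} and {high:.2f}")
```

When the directivity approaches the measured return loss, the lower bound falls towards 1 and the interval widens quickly, which is the effect visible above 1 GHz.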

Interestingly, the terminator itself doesn't seem that bad based on this measurement. Both of my measurements show that the upper bound of the VSWR is below 1.5 up to 1 GHz. Of course, it all depends on the application whether that is good enough. You can also see that above 1 GHz the error intervals increase dramatically due to low bridge directivity. The lower bounds do carry some information (e.g. the terminator can't have VSWR below 1.5 at 2 GHz), but the results aren't really useful for anything other than a rough qualitative estimate.


In the end, this seems to be a useful method of measuring return loss and VSWR below 1 GHz. Using it at higher frequencies however doesn't look too promising. The 3 GHz upper limit seems to me like a stretch at this point. The largest practical problem is finding a good 50 Ω load to use on the REF port, a problem which was also identified by others (see for example a comment by James Eagleson below his video here). Such precision loads are expensive to buy and seem hard to build at home.

I was also surprised how well using the DVB-T tuner as a power meter turned out in this case. I was first planning to use a real power meter with this setup, but the device I ordered seemingly got lost in the mail. I didn't see any indication that the dynamic range of the tuner was limiting the accuracy of the measurement. Since all measurements here use only ratios of power, absolute calibration of the detector isn't necessary. With the rtl-sdr device you only need to make sure the automatic gain control is turned off.

Posted by Tomaž | Categories: Analog | Comments »

What's inside cheap SMA terminators

29.05.2020 14:16

I've recently ordered a bag of "YWBL-WH" 50 Ω SMA terminators off Amazon along with some other stuff. Considering they were about 3 EUR per piece and I was paying for shipment anyway, they seemed like a good deal. Unsurprisingly, they turned out less than stellar in practice.

50 Ω SMA terminators I bought off Amazon.

At the time when I bought them, the seller's page listed these specifications, claiming to be usable up to 6 GHz and 2 W of power dissipation. There's no real brand listed and identical-looking ones can be found from other sellers:

Specifications for 50 ohm SMA terminators.

Their DC resistances all measured very close to 51 Ω, which is good enough. However when I tried using them for some RF measurements around 1 GHz I got some unusual results. I thought the terminators could be to blame even though I don't currently have equipment to measure their return loss. If I had bothered to scroll down on that Amazon page, I might have seen a review from Dominique saying that they have only 14 dB return loss at 750 MHz and are hence useless at higher frequencies.

I suspected what's going on because I've seen this before in cheap BNC terminators sold for old Ethernet networks, but I still took one apart.

Cheap SMA terminator taken apart.

Indeed they simply have a standard through-hole axial resistor inside. The center SMA pin is soldered to the lead of the resistor, but the ground lead was just pressed against the inside of the case. According to the resistor's color bands it's rated at 51 Ω, 5% tolerance and 100 ppm/K. I suspect it's a metal film resistor based on the blue casing and low thermal coefficient (if that's what the fifth color band stands for). It might be rated for 2 W, although judging by the size it looks more like 1/2 W to me. In any case, this kind of resistor is useless at RF frequencies because of its helical structure that acts like an inductor.

Again it turned out that cheaping out on lab tooling was just a waste of money.

Posted by Tomaž | Categories: Analog | Comments »

Simple method of measuring small capacitances.

22.05.2020 18:17

I stumbled upon this article on Analog Devices' website while looking for something else. It looks like instructions for a student lab session. What I found interesting about it is that it describes a way of measuring small capacitances (around 1 pF) with only a sine-wave generator and an oscilloscope. I don't remember seeing this method before and it seems useful in other situations as well, so I thought I might write a short note about it. I tried it out and indeed it gives reasonable results.

Breadboard capacitance measurement schematic.

Image by Analog Devices, Inc.

I won't go into details - see original article for a complete explanation and a step-by-step guide. In short, what you're doing is using a standard 10x oscilloscope probe and an unknown, small capacitance (Crow in the schematic above) as an AC voltage divider. From the attenuation of the divider and estimated values of other components it's possible to derive the unknown. Since the capacitance of the probe is usually only around 10 pF, this works reasonably well when the unknown is similarly small. The tricky part is calibrating this measurement, by estimating stray capacitances of wires and more accurately characterizing the resistance and capacitance of the probe. This is done by measuring both gain of the divider and its 3 dB corner frequency.
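The idea can be sketched with a simplified model, which is my own reduction and not the step-by-step procedure from the article: the unknown capacitance Cx is in series with the probe, and the probe is a resistance R in parallel with a capacitance Cp that also absorbs any stray capacitance to ground. This forms a high-pass filter whose flat gain is Cx / (Cx + Cp) and whose 3 dB corner is at 1 / (2πR(Cx + Cp)). The function name and example values are hypothetical.

```python
import math

def unknown_capacitance(gain, f_corner_hz, r_probe_ohm=10e6):
    """Estimate Cx and Cp (in farads) from the divider gain and its 3 dB corner.

    gain:        flat voltage gain Vout/Vin well above the corner frequency
    f_corner_hz: frequency where the output is 3 dB below that flat gain
    r_probe_ohm: DC resistance of the probe to ground (assumed known)
    """
    c_total = 1 / (2 * math.pi * r_probe_ohm * f_corner_hz)  # Cx + Cp
    c_x = gain * c_total
    return c_x, c_total - c_x

if __name__ == "__main__":
    # Hypothetical readings: gain of 1/11 and a corner near 1.45 kHz.
    c_x, c_p = unknown_capacitance(gain=1 / 11, f_corner_hz=1446.9)
    print(f"Cx = {c_x * 1e12:.2f} pF, Cp = {c_p * 1e12:.2f} pF")
```

In this model the probe resistance has to be known separately, for example from a DC measurement.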

Note that the article is talking about using some kind of an instrument that has a network analyzer mode and can directly show a gain vs. frequency plot. This is not necessary and it's perfectly possible to do this measurement with a separate signal generator and a digital oscilloscope. For measuring capacitances of around 1 pF using a 10 pF/10 MΩ probe a signal generator capable of about 100 kHz sine-wave is sufficient. Determining when the amplitude of the signal displayed on the scope falls by 3 dB probably isn't very accurate, but for a rough measurement it seems to suffice.

The measurement depends on the probe having a DC resistance to ground as well as capacitance. I found that on my TDS 2002B scope you need to set the channel to DC coupled, otherwise there is no DC path to ground from the probe tip. It seems obvious in retrospect, but it did confuse me for a moment why I wasn't getting good results.

I also found that my measured signal was being overwhelmed by the 50 Hz mains noise. The solution was to use external synchronization on the oscilloscope and then use the averaging function. This cancels out the noise and gives much better measurements of the signal amplitude at the frequency that the signal generator is set to. You just need to be careful with the attenuator setting so that noise + signal amplitude still falls inside the scope's ADC range.

Posted by Tomaž | Categories: Analog | Comments »

Another SD card postmortem

16.05.2020 11:28

I was recently restoring a Raspberry Pi at work that was running a Raspbian system off a SanDisk Ultra 8 GB micro SD card. It was powered on continuously and managed to survive almost exactly 6 months since I last set it up. I don't know when this SD card first started showing problems, but when the problem became apparent I couldn't log in and Linux didn't even boot up anymore after a power cycle.

SanDisk Ultra 8 GB micro SD card.

I had a working backup of the system, however I was curious how well ddrescue would be able to recover the contents of the failed card. To my surprise, it did quite well, restoring 99.9% of the data after about 30 hours of run time. I only ran the copy and trim phases (--no-scrape). Approximately 8 MB out of 8 GB of data remained unrecovered.

This was enough that fsck was able to recover the filesystem to a good enough state so that it could be mounted. Another interesting thing in the recovered data was the write statistic that is kept in ext4 superblock. The system only had one partition on the SD card:

$ dumpe2fs /dev/mapper/loop0p2 | grep Lifetime
dumpe2fs 1.43.4 (31-Jan-2017)
Lifetime writes:          823 GB

On one hand, 823 GB of writes after 6 months was more than I was expecting. The system was setup in a way to avoid a lot of writes to the SD card and had a network mount where most of the heavy work was supposed to be done. It did have a running Munin master though and I suspect that was where most of these writes came from.

On the other hand, 823 GB on an 8 GB card is only about 100 write cycles per cell, if the card is any good at doing wear leveling. That's awfully low.
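The arithmetic behind that estimate, as a small Python sketch. It assumes perfect wear leveling, treats the "GB" reported by dumpe2fs as 10^9 bytes, takes 6 months as 182 days, and ignores any write amplification inside the card, so the result is only a rough figure.

```python
def wear_estimate(lifetime_writes_bytes, capacity_bytes, days):
    """Return (average write cycles per cell, average write rate in bytes/s)."""
    cycles = lifetime_writes_bytes / capacity_bytes
    rate = lifetime_writes_bytes / (days * 86_400)
    return cycles, rate

if __name__ == "__main__":
    # 823 GB of lifetime writes on a 7948206080 byte card over roughly 6 months.
    cycles, rate = wear_estimate(823e9, 7_948_206_080, 182)
    print(f"about {cycles:.0f} cycles, {rate / 1e3:.0f} kB/s on average")
```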

In addition to a raw data file, ddrescue also creates a log of which parts of the device failed. Very likely a controller in the SD card itself is doing a lot of remapping. Hence a logical address visible from Linux has little to do with where the bits are physically stored in silicon. So regardless of what the log says, it's impossible to say whether errors are related to one failed physical area on a flash chip, or if they are individual bit errors spread out over the entire device. Still, I think it's interesting to look at this visualization:

Visualization of the ddrescue map file.

This image shows the distribution of unreadable sectors reported by ddrescue over the address space of the SD card. The address space has been sliced into 4 MB chunks (8192 blocks of 512 bytes). These slices are stacked horizontally, hence address 0 is on the bottom left and increases up and right in a saw-tooth fashion. The highest address is on the top right. Color shows the percentage of unreadable blocks in that region.
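The per-slice percentages can be computed from the ddrescue map file with a few lines of Python. This is a minimal sketch and not the script used for the image. It assumes the map file layout documented for GNU ddrescue: comment lines start with "#", the first remaining line describes the current position, and each following line holds a start position, a size and a status character, with "+" marking successfully read data. Everything that is not "+" is counted as unreadable.

```python
def parse_mapfile(text):
    """Yield (pos, size, status) tuples from the text of a ddrescue map file."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    # rows[0] is the current position / status line, the rest are data blocks.
    for fields in rows[1:]:
        yield int(fields[0], 0), int(fields[1], 0), fields[2]

def unreadable_fraction(text, slice_bytes=4 * 1024 * 1024):
    """Fraction of unreadable bytes in each slice of the address space."""
    blocks = list(parse_mapfile(text))
    total = max(pos + size for pos, size, _ in blocks)
    n_slices = -(-total // slice_bytes)  # ceiling division
    bad = [0] * n_slices
    for pos, size, status in blocks:
        if status == "+":
            continue
        end = pos + size
        while pos < end:
            # Split areas that straddle a slice boundary.
            index = pos // slice_bytes
            chunk = min(end, (index + 1) * slice_bytes) - pos
            bad[index] += chunk
            pos += chunk
    return [b / slice_bytes for b in bad]

EXAMPLE = """\
# Mapfile. Created by GNU ddrescue
0x00000000     +               1
#      pos        size  status
0x00000000  0x00300000  +
0x00300000  0x00200000  -
0x00500000  0x00300000  +
"""

if __name__ == "__main__":
    print(unreadable_fraction(EXAMPLE))
```

The EXAMPLE text is a made-up 8 MB map, not data from the failed card. The resulting list can then be reshaped into columns and drawn with any plotting tool.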

You can see that small errors are more or less randomly distributed over the entire address space. Keep in mind that summed up, unrecoverable blocks only cover 0.10% of the space, so this image exaggerates them. There are a few hot spots though and one 4 MB slice in particular at around 4.5 GB contains a lot more errors than other regions. It's also interesting that some horizontal patterns can be seen - the upper half of the image appears more error free than the bottom part. I've chosen 4 MB slices exactly because of that. While internal memory organization is a complete black box, it does appear that 4 MB blocks play some role in it.

Just for comparison, here is the same data plotted using a space-filling curve. The black area on the top-left is part of the graph not covered by the SD card address space (the curve covers 2^24 = 16777216 blocks of 512 bytes while the card only stores 15523840 blocks or 7948206080 bytes). This visualization better shows grouping of errors, but hides the fact that 4 MB chunks seem to play some role:

Visualization of the ddrescue map file using a Hilbert curve.
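For anyone wanting to reproduce this kind of plot, the following is a sketch of the well-known iterative mapping from a position along a Hilbert curve to x, y coordinates. It is not the code used for the image above, and the orientation of the curve (which corner block 0 lands in) is an arbitrary choice here. For 2^24 blocks the curve has order 12 and fills a 4096 × 4096 pixel grid.

```python
def d2xy(order, d):
    """Map distance d along a Hilbert curve of the given order to (x, y).

    The curve fills a square grid with 2**order cells on each side.
    """
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate the quadrant so consecutive sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

if __name__ == "__main__":
    # Pixel position of the last 512 byte block on the card.
    print(d2xy(12, 15523840 - 1))
```

Consecutive block numbers always map to neighbouring pixels, which is why this layout shows the grouping of errors better than the stacked slices.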

I quickly also looked into whether failures could be predicted by something like SMART. Even though it appears that some cards do support it, none I tried produced any useful data with smartctl. Interestingly, plugging the SanDisk Ultra into an external USB-connected reader on a laptop does say that the device has a SMART capability:

$ smartctl -d scsi -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-12-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               Generic
Product:              STORAGE DEVICE
Revision:             1206
Compliance:           SPC-4
User Capacity:        7 948 206 080 bytes [7,94 GB]
Logical block size:   512 bytes
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
Serial number:        000000001206
Device type:          disk
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
Local Time is:        Thu May 14 16:36:47 2020 CEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
Device does not support Self Test logging

However I suspect this response comes from the reader, not the SD card. Multiple cards I tried produced the same 1206 serial number. Both a new and a failed card had the "Health Status: OK" line, so that's misleading as well.

This is the second time I was replacing the SD card in this Raspberry Pi. The first time it lasted around a year and a half. It further justifies my opinion that SD cards just aren't suitable for unattended systems or those running continuously. In fact, I suggest avoiding them if at all possible. For example, newer Raspberry Pis support booting from USB-attached storage.

Update: For further reading on SD card internals, Peter mentions an interesting post in his comment below. The flashbench tool appears to be still available here. The Linaro flash survey seems to have been deleted from their Wiki. A copy is available from archive.org.

Posted by Tomaž | Categories: Digital | Comments »

On missing IPv6 router advertisements

03.05.2020 16:58

I've been having problems with Internet connectivity for the past week or so. Randomly connections would timeout and some things would work very slowly or not at all. In the end it turned out to be a problem with IPv6 routing. It seems my Internet service provider is having problems with sending out periodic Router Advertisements and the default route on my router often times out. I've temporarily worked around it by manually adding a route.

I'm running a simple, dual-stack network setup. There's a router serving a LAN. The router is connected over an optical link to the ISP that's doing Prefix Delegation. The problems appeared intermittently. A lot of software seems to gracefully fall back onto IPv4 if IPv6 stops working, but there's usually a more or less annoying delay before it does that. On the other hand some programs don't and seem to assume that there's global connectivity as long as a host has a globally-routable IPv6 address.

The most apparent and reproducible symptom was that IPv6 pings to hosts outside of LAN often weren't working. At the same time, hosts on the LAN had valid, globally-routable IPv6 addresses, and pings inside the LAN would work fine:

$ ping -6 -c3 host-on-the-internet
connect: Network is unreachable
$ ping -6 -c3 host-on-the-LAN
PING ...(... (2a01:...)) 56 data bytes
64 bytes from ... (2a01:...): icmp_seq=1 ttl=64 time=0.404 ms
64 bytes from ... (2a01:...): icmp_seq=2 ttl=64 time=0.353 ms
64 bytes from ... (2a01:...): icmp_seq=3 ttl=64 time=0.355 ms

--- ... ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2026ms
rtt min/avg/max/mdev = 0.353/0.370/0.404/0.032 ms

Rebooting my router seemed to help for a while, but then the problem would reappear. After some debugging I found that the immediate cause of the problems was that the default route on my router would disappear approximately 30 minutes after the router had been rebooted. It would then randomly re-appear and disappear a few times a day.

On my router, the following command would return empty most of the time:

$ ip -6 route | grep default

But immediately after a reboot, or if I got lucky, I would get a route. I'm not sure why there are two nearly identical entries here. The only difference between them is the from field:

$ ip -6 route | grep default
default from 2a01::... via fe80::... dev eth0 proto static metric 512 pref medium
default from 2a01::... via fe80::... dev eth0 proto static metric 512 pref medium

The following graph shows the number of entries returned by the command above over time. You can see that for most of the day the router didn't have a default route:

Number of valid routes obtained from RA over time.
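Something along the following lines is enough to collect data for such a graph. This is only a minimal sketch in Python, assuming the ip tool from iproute2 is available on the router. The function names and the 60 second polling interval are placeholders chosen for illustration.

```python
import subprocess
import time


def count_default_routes(route_output: str) -> int:
    """Count lines of `ip -6 route` output that describe a default route."""
    return sum(
        1 for line in route_output.splitlines()
        if line.split()[:1] == ["default"]
    )


def log_default_routes(interval_s: float = 60.0) -> None:
    """Print a Unix timestamp and the current number of default routes, forever."""
    while True:
        out = subprocess.run(
            ["ip", "-6", "route"], capture_output=True, text=True, check=True
        ).stdout
        print(int(time.time()), count_default_routes(out), flush=True)
        time.sleep(interval_s)
```

Calling log_default_routes() and redirecting its output to a file gives a time series that can be plotted later.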

The thing that was confusing me the most was the fact that the mechanism for getting the default IPv6 route is distinct from the way the prefix delegation is done. This means that every device in the LAN can get a perfectly valid, globally-routable IPv6 address, but at the same time there can be no configured route for packets going outside of the LAN.

The route is automatically configured via Router Advertisement (RA) packets, which are part of the Neighbor Discovery Protocol. When my router first connects to the ISP, it sends out a Router Solicitation (RS). In response to the RS, the ISP sends back an RA. The RA contains the link-local address to which the traffic intended for the Internet should be directed, as well as a Router Lifetime. Router Lifetime sets a time interval for which this route is valid. This lifetime appears to be 30 minutes in my case, which is why rebooting the router seemed to fix the problems for a short while.

The trick is that the ISP should later periodically re-send the RA by itself, refreshing the information and lifetime, hence pushing back the deadline at which the route times out. Normally, a new RA should arrive well before the lifetime of the previous one runs out. However in my case, it seemed that for some reason the ISP suddenly started sending out RAs only sporadically. Hence the route would time out in most cases, and my router wouldn't know where to send the packets that were going outside of my LAN.
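The effect of the lifetime timer can be illustrated with a small simulation. The following Python sketch is a simplified model and not how the kernel actually implements it. It assumes every RA carries the same Router Lifetime and that each received RA simply restarts the timer. The function name and arguments are made up for this example.

```python
def gaps_without_route(ra_times, lifetime, end_time):
    """Return (start, end) intervals during which the default route has expired.

    ra_times: arrival times of RAs in seconds, in any order
    lifetime: advertised Router Lifetime in seconds (same for all RAs)
    end_time: end of the observation window in seconds

    The observation window is taken to start at the first RA.
    """
    gaps = []
    expires = None  # time at which the current default route times out
    for t in sorted(ra_times):
        if t > end_time:
            break
        if expires is not None and t > expires:
            gaps.append((expires, t))
        expires = t + lifetime  # each RA restarts the lifetime timer
    if expires is not None and expires < end_time:
        gaps.append((expires, end_time))
    return gaps
```

With an RA every 10 minutes and a 30 minute lifetime the function returns an empty list. With only two RAs two hours apart, it reports that the route is missing for most of the window.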

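RA packets can be monitored on the router using tcpdump. The filter below selects ICMPv6 packets with type 134 (Router Advertisement). The type field is at offset 40, immediately after the fixed-length IPv6 header: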

$ tcpdump -v -n -i eth0 "icmp6 && ip6[40] == 134"

This should show packets like the following arriving at intervals that should be much shorter than the advertised router lifetime. On a different, correctly working network, I've seen packets arriving roughly once every 10 minutes with a lifetime of 30 minutes:

18:52:01.080280 IP6 (flowlabel 0xb42b9, hlim 255, next-header ICMPv6 (58) payload length: 176)
fe80::... > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 176
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
	...
19:00:51.599538 IP6 (flowlabel 0xb42b9, hlim 255, next-header ICMPv6 (58) payload length: 176)
fe80::... > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 176
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
	...
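Checking a long capture by eye gets tedious, so the comparison between arrival intervals and the advertised lifetime can be automated. The following is a rough Python sketch that assumes the verbose tcpdump output format shown above, with a timestamp on the first line of each packet and "router lifetime ...s" on a later line. The regular expressions and function names are my own and may need adjusting for other tcpdump versions.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) IP6 ")
LIFETIME = re.compile(r"router lifetime (\d+)s")


def parse_ras(lines):
    """Extract (arrival time, router lifetime in seconds) for each RA."""
    ras = []
    stamp = None
    for line in lines:
        m = TIMESTAMP.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            continue
        m = LIFETIME.search(line)
        if m and stamp is not None:
            ras.append((stamp, int(m.group(1))))
            stamp = None
    return ras


def late_ras(ras):
    """Return gaps in seconds that exceeded the previously advertised lifetime."""
    late = []
    for (t0, lifetime), (t1, _) in zip(ras, ras[1:]):
        gap = (t1 - t0).total_seconds()
        if gap < 0:
            gap += 24 * 3600  # timestamps carry no date; assume one midnight wrap
        if gap > lifetime:
            late.append(gap)
    return late
```

For the two packets shown above the gap is just under 9 minutes, well within the 1800 s lifetime, so late_ras() returns an empty list.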

However, in this case this wasn't happening. Similarly to what the graph above shows, these packets only arrived sporadically. As far as I know, this is an indication that something is wrong on the ISP side. Sending an RA in response to an RS seems to work, but periodic RA sending doesn't. Strictly speaking there's nothing that can be done to fix this on my end. My understanding of RFC 4861 is that a downstream host should only send out an RS once, after connecting to the link.

Once the host sends a Router Solicitation, and receives a valid Router Advertisement with a non-zero Router Lifetime, the host MUST desist from sending additional solicitations on that interface, until the next time one of the above events occurs.

Indeed, as far as I can see, Linux doesn't have any provisions for re-sending RS in case all routes from previously received RAs time out. This answer argues that it should, but I can find no references that would confirm this. On the other hand, this answer agrees with me that RS should only be sent when connecting to a link. On that note, I've also found a discussion that mentions blocking multicast packets as a cause of similar problems. I don't believe that is the case here.

In the end I've used an ugly workaround so that things kept working. I've manually added a permanent route that is identical to what is randomly advertised in RA packets:

$ ip -6 route add default via fe80::... dev eth0

Unlike entries originating from RA, this manual entry in the routing table won't time out, at least not until my router gets rebooted. It also doesn't hurt anything if additional, identical routes occasionally get added via RA. Of course, it still goes completely against the IPv6 neighbor discovery mechanism. If anything changes on the ISP side, for example if the link-local address of the router changes, the entry won't get updated and the network will break again. However it does seem to fix my issues at the moment. The fact that it's working also seems to confirm my suspicion that something is only wrong with RA transmissions on the ISP side, and that actual routing on their end works correctly. I've reported my findings to the ISP and hopefully things will get fixed on their end, but in the meantime, this will have to do.

Posted by Tomaž | Categories: Code | Comments »

Measuring some Zener diodes

19.04.2020 12:05

I've been stuck working on a problem for the past few days. I need to protect an analog switch in a circuit from expected over-voltage conditions. Zener diodes seemed like a natural solution, but the border between normal operation and over-voltage is very thin in this particular case. I couldn't find components with a characteristic that would fit based solely on the specifications given in the datasheets. I've been burned before by overestimating the performance of Zener diodes so I decided to do some measurements and get some better feel for how they behave. The results were pretty interesting and I thought they might be useful to share.

The following measurements have all been done with my tiny home-brew curve tracer connected to a Tektronix TDS 2002B oscilloscope. Unfortunately this model only has 8-bit vertical resolution. This caused some visible stair-stepping on the vertical parts of the traces below. Nevertheless the measurements should give a pretty good picture of what's going on. Before doing the measurements I've also checked the DC calibration of the whole setup against my new Keysight U1241C multimeter. The error in measured voltage and current values should not be more than ±3%. Measurements were done roughly at room temperature and at a low frequency (100 Hz).

First measurement is with SZMMBZ5230BLT11G, a 4.7 V Zener diode from ON Semi in a SOT-23 SMT package. I've only measured a single one of these, since soldering leads to the SMT package was time consuming. The figure shows current vs. voltage characteristic in the reverse direction. The narrow, dark blue graph shows the actual measured values. The black dashed line shows the maximum power dissipation limit from the datasheet. I also made a model for the diode based on the minimum and maximum values for VZ and the single ZZT value given in the datasheet. The light blue area is the range of characteristics I predicted with that model.

Voltage vs. current graph for SZMMBZ5230BLT11G

The relevant part of the datasheet for this diode:

Excerpt from the MMBZ52xxBLT1G datasheet.

Image by ON Semiconductor

This is the same measurement repeated for BZX79C4V7, also a 4.7 V Zener diode from ON Semi, but this time in a sealed glass THT package. I've measured 10 of these. All came shipped in the same bag, which might mean they're from the same production batch, but I can't be sure. All 10 measurements are shown overlapped on the same graph.

Voltage vs. current graph for BZX79C4V7.

The relevant part of the datasheet:

Excerpt from the BZX79Cxx datasheet.

Image by ON Semiconductor

It's interesting to see that both of these parts performed significantly better than what their datasheets suggest. They were both in the allowed voltage range at the specified current (note that one is specified at 20 mA and the other at 5 mA). The differential impedance was much lower, however. SZMMBZ5230BLT11G is specified at 19 Ω at 20 mA and I measured around 1 Ω. BZX79C4V7 is specified at 80 Ω at 5 mA and I measured 11 Ω. The datasheet for BZX79C4V7 does say that 80 Ω is the maximum, but the one for SZMMBZ5230BLT11G isn't clear on whether that is a typical or the maximum value. It was also surprising to me how the results I got for all 10 BZX79C4V7 measurements were practically indistinguishable from each other.

A note regarding the models. I used the classic diode equation where I calculated the parameters a and b to fit VZ and ZZ (or ZZT) values from the datasheets.

I = a \left( e^{\frac{U}{b}} - 1 \right)

As far as I know, a and b don't have any physical meaning here. This is in contrast to the forward characteristic, where they represent saturation current and thermal voltage. I wasn't able to find any reference that would explain the physics behind this characteristic and most people just seem to use such empirical models. The Art of Electronics does say that the Zener impedance is roughly inversely proportional to the current, which implies an exponential I-U characteristic.
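Differentiating the model gives dI/dU = (I + a)/b, so the differential impedance is Z = b/(I + a), which is approximately b/I when I is much larger than a. This is the inverse proportionality mentioned above, and it means that one (VZ, ZZT) pair specified at a test current IZT is enough to determine both parameters. The following Python sketch shows one way of doing such a fit with a simple fixed-point iteration. It is an illustration under these assumptions, not necessarily the exact procedure behind the plots above, and the function names are made up.

```python
import math


def fit_zener(vz, izt, zzt, iterations=20):
    """Fit a, b in I = a*(exp(U/b) - 1) to a datasheet test point.

    vz:  Zener voltage in volts at the test current
    izt: test current in amperes
    zzt: differential impedance in ohms at the test current
    """
    a = 0.0
    for _ in range(iterations):
        b = zzt * (izt + a)           # from Z = dU/dI = b / (I + a)
        a = izt / math.expm1(vz / b)  # from I(vz) = izt
    return a, b


def zener_current(u, a, b):
    """Reverse current in amperes at reverse voltage u in volts."""
    return a * math.expm1(u / b)
```

For 4.7 V, 20 mA and 19 Ω, b comes out very close to ZZT·IZT = 0.38 V, since a is many orders of magnitude smaller than the test current. Calling fit_zener() once with the minimum and once with the maximum VZ gives the two curves that bound the predicted range.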

From my rusty understanding of breakdown physics I was expecting that a junction after breakdown wouldn't have much of a non-linear resistance at all. I was expecting that a good enough model would just be a voltage source (representing the junction in breakdown) and a series resistance (representing ohmic contacts and bulk semiconductor). It seems this is not so, at least for the relatively low current conditions I've measured here. The purely exponential model also fits my measurements perfectly, which seems to confirm that this was a correct choice for the model.

Update: I found Zener and avalanche breakdown in silicon alloyed p-n junctions—I: Analysis of reverse characteristics (unfortunately pay-walled). It contains an overview of the various mechanisms behind junction breakdown. In contrast to all other references I've looked at it actually goes into mathematical models and doesn't just stop at hand-waving qualitative descriptions. The mechanisms are complicated and the exponential characteristic I've used is indeed just an empirical approximation.

Finally, it's interesting to also look at how the forward characteristics compare. Here they are plotted against a common signal diode 1N4148. Both Zener diodes are very similar in this plot, despite a different Zener impedance and a differently specified forward voltage in the datasheet. Compared to the signal diode they have the knee at a slightly higher voltage, but also steeper slopes after the knee:

Comparison of forward characteristics.

In conclusion, it's interesting to see what these things look like in practice, beyond just looking at their specifications. Perhaps the largest takeaway for me was the fact that a purely resistive model obviously isn't a good way of thinking about Zener diodes in relation to large signals. Of course, it's dangerous to base a design around such limited measurements. Another batch might be completely different in terms of ZZ and I've only measured a single instance of the SOT-23 diode. Devices might change after aging and so on. After all, the manufacturer only guarantees what's stated in the datasheet. Still, seeing these measurements was useful for correcting my feel for how these parts behave.

Posted by Tomaž | Categories: Analog | Comments »

How a multimeter measures capacitance

13.03.2020 10:57

I've recently bought a Keysight U1241C multimeter. One of the features it has is a capacitance measurement. Since this is my first multimeter that can do that I was curious what method it uses. I was also wondering what voltage is applied to the capacitor under test and whether the probe polarity matters (for example, when measuring electrolytic capacitors).

Figure 2-16 in the User's Guide seems to imply that polarity is important. The red probe (V terminal) is marked as positive and the black probe (COM terminal) is marked as negative:

Figure 2-16: Measuring capacitance from the U1241C User's Guide.

Image by Keysight Technologies

The description of the measurement method is limited to this note and doesn't say what voltages or frequencies are involved, but does give a rough idea of what is going on:

Note about capacitance measurement from the U1241C User's Guide.

Image by Keysight Technologies

Connecting an oscilloscope to a capacitor while it is being measured by the multimeter reveals a triangle waveform. I made the following screenshot with a 47 μF electrolytic capacitor connected to the multimeter set to the 100 μF range. The oscilloscope was set to DC coupling, so the DC level is correctly shown as 0 V at the center of the screen:

Voltage on the 47 μF capacitor during measurement.

Since current into a capacitor is proportional to the time derivative of the voltage, a triangle-shaped voltage means that there is a constant current flowing alternately in and out of the capacitor. Connecting different capacitors revealed that the current and the amplitude of the voltage stay constant for each measurement range, while the period of the signal changes. So the multimeter applies a known constant current I to the probes and measures the time t it takes for the voltage to rise (or fall) by a preset difference Upk-pk. From the measured rise (or fall) time it then calculates capacitance:

C = \frac{I\cdot t}{U_{pk-pk}}

These are the approximate current and voltages used by the multimeter for each range:

Range [μF]    I [μA]    Upk-pk [mV]
1             1.5       800
10            15        800
100           150       800
1000          340       200
10000         340       200

Note that 1000 μF and 10000 μF ranges seem identical in this respect. I'm guessing the only change is how the time is measured internally. Perhaps a different clock is used for the counter.
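The relation between capacitance, ramp time and the per-range values from the table can be written down in a few lines of Python. This is just a sketch for checking the numbers. The values are the approximate ones I measured, and the names are arbitrary.

```python
# Approximate test current and peak-to-peak voltage per range, from the table.
RANGES = {
    # range in uF: (current in A, Upk-pk in V)
    1: (1.5e-6, 0.8),
    10: (15e-6, 0.8),
    100: (150e-6, 0.8),
    1000: (340e-6, 0.2),
    10000: (340e-6, 0.2),
}


def capacitance(current, ramp_time_s, u_pkpk):
    """C = I * t / Upk-pk, with all quantities in SI units."""
    return current * ramp_time_s / u_pkpk


def ramp_time(cap, range_uf):
    """Expected time in seconds for one rising (or falling) ramp."""
    current, u_pkpk = RANGES[range_uf]
    return cap * u_pkpk / current
```

For example, ramp_time(47e-6, 100) gives about 0.25 s for a 47 μF capacitor on the 100 μF range, which corresponds to a triangle wave period of about 0.5 s.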

If a high range is selected while a small capacitor is connected, the voltage on the capacitor can reach much higher amplitudes. The highest I saw was about 2 V peak-to-peak when I had a 4.7 nF capacitor connected while the instrument was set to 100 μF range.

Voltage on the 4.7 nF capacitor during measurement.
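A plausible, but unverified, explanation for the larger amplitude is that with such a small capacitor the voltage ramps so fast that the instrument can't reverse the current quickly enough once the threshold is crossed. The Python sketch below estimates what reaction delay would account for the overshoot. It assumes the nominal 800 mV and 150 μA values for the 100 μF range and an overshoot split equally between the two thresholds. Both are assumptions on my part and the function names are made up.

```python
def slew_rate(current, cap):
    """dU/dt in V/s of a capacitor charged by a constant current."""
    return current / cap


def implied_delay(u_observed, u_nominal, current, cap):
    """Reversal delay in seconds that would explain the extra amplitude.

    Assumes the overshoot is split equally between both thresholds.
    """
    overshoot_per_edge = (u_observed - u_nominal) / 2
    return overshoot_per_edge / slew_rate(current, cap)
```

With 4.7 nF and 150 μA the voltage changes at roughly 32 V/ms, so implied_delay(2.0, 0.8, 150e-6, 4.7e-9) comes out at a little under 20 μs.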

In conclusion, the polarity of the probes isn't important. The signal applied to the capacitor is symmetrical and the capacitor will be alternately polarized in the positive and negative direction regardless of how it is connected to the multimeter. The voltages do seem low enough that they probably don't damage polarized electrolytic capacitors.

Posted by Tomaž | Categories: Analog | Comments »