Vector measurements with the HackRF, 2

13.05.2021 20:10

Over the past year I've been slowly building up a small, one-port vector network analyzer. The last improvement I made to it was replacing the rtl-sdr receiver with the HackRF One. This increased the frequency range, but the dynamic range of the measurement was still quite low at higher frequencies. I suspected that a significant source of noise in the system was phase noise. In this post I describe some measurements I performed to get a better idea of what is going on in regard to phase. I also wanted to have a base reference to compare with when I change things in the future. This way I will be able to see whether I improved the instrument or made it worse.

My small, home-made vector network analyzer, upgraded with a HackRF.

First thing I measured was the apparent phase noise of the stimulus signal in the digital baseband. I manually set my instrument so that the signal coming from the ERASynth Micro synthesizer was routed directly to the HackRF receiver. I then recorded an IQ signal trace from the HackRF and calculated the apparent phase noise of the sine wave. The ERASynth Micro output frequency was set to 1 GHz.

Apparent phase noise of the stimulus signal in the digital baseband.

This is the resulting plot of the phase noise versus frequency offset. It is based on the FFT of the recorded digital baseband signal with 128k points and the Hann window. I verified that the measured noise level is above the spectral leakage due to FFT windowing (for Hann window the leakage falls off by 60 dB per decade). For reference I also plotted the phase noise specification of the ERASynth Micro from its datasheet. That would be the ideal result if HackRF was perfect and didn't contribute any additional noise. In reality, HackRF's internal oscillator is probably much noisier than the ERASynth Micro.

Currently the ERASynth Micro and HackRF are both running free from their own internal oscillators. They are not synchronized to a common reference, hence this graph is the combination of all sorts of effects in both devices: phase noise in both oscillators, jitter from various phase locked loops and probably other effects as well. The noise shown on the plot is not present in any real analog signal anywhere. It shows up on the digital data that comes out of the HackRF's ADC. Since that is the input to all further processing it's the thing I'm most interested in.

Trace phase noise when measuring the open standard.

Another thing I was interested in was the final noise level in the vector measurement. This is the trace phase noise that's usually specified for commercial vector network analyzers in degrees root-mean-square. It's the effective error on the phase coordinate that shows up in the final measurement result, after all the processing has been done. To estimate this for my system I did a zero-span vector measurement of the open calibration standard. I recorded 200 points at 1 GHz. The plot above shows the result on a linear-scale polar plot. The estimated error of the measurement was 1.39 degrees RMS.

Its apparent from the plot that my measurements are smeared more along the phase than the amplitude axis. This is where my initial assumption came from that the phase noise is currently more problematic in my system than inaccuracies in measuring the amplitude of the signal. Just for comparison I looked up some datasheets for commercial network analyzers. It seems a typical value for this would be in the range of 0.1 to 0.01 degrees RMS. Not that I ever expect to reach that level of accuracy with my home-brew instrument, but it's interesting to see how it compares.

Next step for this project is definitely to try to run the HackRF from the 10 MHz TCXO in ERASynth Micro and see how much this improves the metrics I described above. After some research it seems that I need to be careful with how I approach this. HackRF needs a 3.3V CMOS digital signal as a reference while Ref out on ERASynth Micro is a sine wave. I need to design a board that will convert the waveform, however a sloppy conversion can introduce additional jitter. I've been looking at some previous work published by the amazing Osmocom project and I will likely take their osmo-clock-gen and/or osmo-clock-conv designs as a starting point.

Posted by Tomaž | Categories: Analog | Comments »

Measuring interrupt response times, part 3

01.05.2021 18:17

Around five years ago I performed some measurements of interrupt response times in a Raspberry Pi Zero and an Arduino. My goal was to get some rough estimates of what kind of real-time performance you can expect from these systems. I was not interested in pushing them to their limits. I wanted to compare the most straightforward approaches - code you would find in documentation or in examples that pop up on top of web searches. This year the Raspberry Pi Pico was released and it promises to become just as popular. It brings some interesting new features that I wanted to explore, like MicroPython and the programmable I/O (PIO). I thought it would be interesting to repeat my old measurements and see how well it compares to the other two systems.

I only briefly summarize my previous results here. Read my original blog post for a longer introduction, description of the test setup and more in-depth discussion of the first batch of measurements. In the follow up post I also dug a little deeper into the reasons behind some of the more unusual results I got with Arduino and Raspberry Pi Zero.

Raspberry Pi Pico connected to the test setup.

For the purpose of this test, the interrupt response time is the time the system takes to change a state of an output GPIO pin in response to the change in an input GPIO pin. In real applications there is usually some kind of processing involved, so this value represents only the best-case scenario of how fast the software can respond to external events.

This response time was measured using a signal generator and an oscilloscope. A square wave generated by the signal generator was connected to the input pin. The two-channel oscilloscope was connected to both the input pin and the output pin. It was setup to measure the interval between the two state changes. The measurement was automated and repeated 500 times for each setup. Exact settings used are noted here.

To perform the test with the RP2040 processor on the Raspberry Pi Pico I installed a MicroPython firmware, as described in the Getting Started guide. I tried two implementations: A pure Python implementation was using the machine.Pin built-in class to configure a Python function as an interrupt handler. The PIO implementation used the rp2.asm_pio decorator to program the PIO state machine from Python code (see Section 3.9 in the Python SDK manual). After the state machine was programmed, the input was handled purely inside the PIO and the Python interpreter was not involved. You can find exact code I used in the GitHub repo.

Here is how the new measurements with the RP2040 compare with Arduino and the Raspberry Pi Zero:

Histogram of interrupt response time measurements.

The MicroPython implementation on the RP2040 (yellow) has the average response time of around 60 μs. This is around 3.5 times faster than using a CPython implementation on the Zero (cyan) which averages at around 210 μs. It is also more consistent, with less spread between minimum and maximum response times. A surprising result at the first glance, since Zero has a much more capable CPU running at up to 1000 MHz while the ARM core in the Pico only runs at 125 MHz.

The difference is very likely due to all the Linux kernel housekeeping and context switching that happens before the interrupt is propagated from the hardware to the Python process. MicroPython, while quite complex, is still a lightweight interpreter compared to the full CPython on the Zero. This is consistent with the fact that a C implementation that runs in the kernel on the Zero (blue) is much faster than MicroPython on the RP2040.

The following figure zooms in on the left end of the histogram:

Zoomed view of the left end of the response time histogram.

Here you can see that the PIO implementation is amazingly fast compared to all previously tested configurations. With the average response time of 0.043 μs it beats both the polling and the interrupt-driven C++ implementation on the Arduino by two orders of magnitude.

This comparison is a bit unfair though. The specialized PIO state machines on the RP2040 are indeed very fast, with only 8 ns per instruction and an instruction set that is optimized for responding to input events. However, the amount of processing you can do with them is very limited compared to all other approaches I've tested. Each PIO can only process 32 instructions. Most real-life applications beyond interfacing with a simple bus protocol will need a round-trip to MicroPython. This puts the response time back into the hundred-microsecond range.

Still, investigating PIO performance is interesting. Here is another level of zoom to show only the distribution of response times by the PIO implementation:

Histogram of response times for the RP2040 PIO implementation.

The response times should be in the range of 4 to 5 instruction cycles - 2 cycles for the input synchronizer (see 3.5.6.3 in the RP2040 Datasheet), between 1 or 2 cycles for WAIT and 1 cycle for SET. I did not use any clock dividers and used the default 125 MHz system clock, so each instruction takes 8 ns. This gives the range of response times between 32 to 40 ns.

I measured between 38 and 48 ns. Very likely this is a measurement error. Unfortunately my signal generator has a rise-time of around 10 ns. This means that in the nanosecond range the transition between low and high logic level is not well defined and this introduces an error into my measurement. I verified by other means that one PIO instruction indeed takes exactly 8 ns in my setup. It is also possible that I missed something and there is an additional PIO cycle (or two) needed somewhere before the response propagates to the GPIO pin.

On the oscilloscope screenshot below, the blue trace is the stimulus signal from the signal generator and the yellow trace is the response generated by the PIO on the output pin. You can see that the rise times are not insignificant compared to the measured time interval.

Signals on the input and output pins on the RP2040.

In the end this was an interesting exercise. I was surprised by the performance of MicroPython on the Raspberry Pi Pico and how quick the development setup is. I honestly expected Python code to run slower and I was again reminded that my intuition can be wrong sometimes. Unfortunately I didn't have time to setup the C SDK to also try out a native implementation of the same test on the RP2040. Perhaps some other day.

Programmable I/O is certainly the most interesting part of the RP2040. It took me a while to understand the unusual instruction set and how the FIFO buffers work. I like how the integration of the assembler into MicroPython makes it easily accessible for experimentation. I was impressed by the performance and quick response times. On the other hand, I was also surprised by how limited PIOs are in terms of the program size and the choice of instructions. I was expecting something similar to PRUs on the Sitara SoC. PIOs seem indeed very specialized devices for interfacing with digital buses and can't do much more in terms of algorithmic complexity.

Posted by Tomaž | Categories: Digital | Comments »

Exploring an old Belkin UPS

27.04.2021 19:53

Sometime around June 2005 I bought a small Belkin UPS to protect my home server from blackouts. For more than 15 years it has worked flawlessly. It survived several generations of server hardware and required only one battery change during that time. I was using it until a few months ago when it had developed an unusual problem: it still powers the load without issues from the battery when the mains power goes out. However it shuts down the moment the mains voltage returns, even if the battery charge is not yet depleted. I was curious to see what went wrong with it, so I recently had a closer look at its circuit. I couldn't find any obvious problems and it's possible it only needs a new battery. Still, it was an interesting thing to pick apart.

Belkin 650 VA Regulator Pro Silver Series UPS

This is the Belkin 650 VA UPS, model number F6C650uSER-SB. I believe this model was sold under the brand name The Regulator Pro: Silver Series. It uses a 5250 mAh sealed lead-acid battery. The battery provided at most 30 minutes of run time. I suspect the run time is internally limited regardless of battery capacity. Even with a minimal load I never saw it running on battery for longer.

The UPS can be controlled through a RS-232 serial connection. I used it on Debian through the belkin driver in the Network UPS Tools. The only issue I had with the driver that I can remember was that it was impossible to turn off the beeper which is annoyingly loud on this model.

Parts of the plastic enclosure with marked latches.

Accessing the electronics without damage takes some effort. The plastic enclosure consists of two black halves (1, 2) and a silver front panel (3). Obviously, remove the battery and all external connections before opening. There is a single screw holding the two black parts together. This is accessible through a hole in one of the sides. After removing the screw the next step is to remove the silver front panel. The panel is strongly held in place with four latches that are marked on the photo above. I used a spudger to go around the edges and loosen it a bit. Still, removing it was mostly a matter of applying brute force. After detaching the front panel the two black halves split easily.

Disassembled Belkin UPS showing the circuit board and internal wiring.

The circuit board and everything else inside is held in place with plastic latches on one of the halves of the enclosure. I had no problems removing the circuit once the enclosure was open. There are two large, single-side printed circuit boards. The horizontal board on the picture above holds the power conversion electronics. The vertical board contains the control circuit and the optically-isolated serial interface. There are also some parts, like ferrite beads, fuses and so on that are just hanging off the wires in the bottom right corner.

The main controller IC in the Belkin UPS.

The control board is soldered to the power board and isn't easily removable. The main controller appears to be this large IC in a 42-pin DIP package. The chip is marked ST72C334. It seems to be an ST7-series 8-bit microcontroller from STMicroelectronics. The C in the part number tells that the software is stored in flash memory, not factory-coded ROM. Sticker on it reads 5015320501 (probably some internal part number) and date code 0113 (I'm guessing week 13 of 2001 - it must have been already several years old when I bought it in 2005).

A glob of solder hanging off one of the tabs that hold the control board.

I've noticed a glob of solder just barely holding onto one of the tabs that hold the control board in place. It looked like it was just moments away from dropping away and causing a short circuit disaster on the circuit below. The solder joint also had a crack in it, however that tab appears to be only for mechanical support and doesn't have any electrical function. The break couldn't have been a source of the problems I had with the UPS.

Top side of the power circuit board with labeled parts.

Unsurprisingly for its relatively low cost and small size, this is an offline UPS. When mains power is present, the load is powered directly from the input via a relay (1). A battery charger keeps the 12 V battery topped up from the mains AC voltage (2). When mains power is lost, a high-voltage DC-DC boost converter converts the 12 V battery to a high DC voltage (3). The H-bridge 50 Hz inverter then chops that high DC voltage to AC (4) to power the load. There is also a common-mode filter on the power board (5) and a current transformer (6) that is used by the controller to measure the current drawn by the load.

I've traced out the main parts of the circuit. The sections of the schematic are labeled in the same way as the parts of the circuit board in the previous photograph. Obviously there is a lot missing, but the topology of the voltage converters is clearly visible.

A rough schematic with the main components of the Belkin UPS.

There is no isolation in this circuit. Everything is referenced to mains voltage, even the battery and the control circuit. Only the serial interface is separated from the rest of the circuit via optocouplers. This means that it's a very bad idea to connect anything to the battery terminals that's not a battery.

A weird detail in this circuit is the relay with the question mark. It connects the DC-DC boost converter to the battery charger. I'm not sure what its purpose is. It might be there to keep the 400 V capacitor charged while the mains power is present so that the inverter can start faster. However in that case a diode should work just as well. It might also serve as a part of some kind of a self-test function.

I haven't found exactly how the control circuit is powered. There is no obvious DC-DC converter dedicated to it. Very likely there is a linear regulator hidden somewhere that is powered from the 12V battery voltage. I'm certain that the relay coils are powered from the battery. What this means is that if the battery is depleted or degraded to the point where it can't activate the main relay the UPS won't start. The UPS by itself cannot recover from a completely discharged battery since the relays in their idle position disconnect the charger, and everything else, from the mains voltage.


As I mentioned above, this UPS developed a problem where it shuts off when the mains power comes back after an outage. The switch-over from mains to battery power works fine. It's the transition from battery power back to mains that's broken. After an outage you need to long press the button on the front panel to restore power to the load. If the load is still running on battery power when the mains comes back it will lose power without warning (i.e. without a clean shutdown).

I did a few basic checks. Visually everything looks good. Surprisingly, all the big electrolytic capacitors also seem just fine. The pair of 2200 μF 16 V in parallel with the battery measured in-circuit as 4700 μF and 40 mΩ ESR. The 22 μF 400 V for the inverter input voltage measured 20 μF and 1.7 Ω ESR.

The battery I bought in 2014 has a rated capacity of 5250 mAh. After 7 years of use it still retained 2300 mAh when I measured it outside of the UPS. The internal resistance measured around 1 Ω, which does seem a big high. Obviously this battery is ripe for replacement. It might be that the cause of my problems is simply due to the high internal resistance of the battery. When the mains power comes back, the battery must actuate the main relay as well as continue to power the inverter. Perhaps this current spike causes enough of a voltage drop to reset the control circuit.

Unfortunately I don't have an isolation transformer so I can't do much debugging of the circuit while it's live. Connecting an oscilloscope to it is out of the question so I can't check for voltage drops on the battery side. It certainly seems possible though that simply buying a new battery would fix it. When the previous battery went bad the UPS was completely dead. Only reading on the web that this is a typical symptom of a bad battery in these models convinced me to just replace the battery instead of buying a new UPS. I don't need another UPS at the moment though so I don't think I'll go that route. From the last time I remember it wasn't trivial to find a supplier anyway. For now this UPS will just end up in my spare computer parts pile.


I remember this was a pretty expensive gadget when I bought it even though it was the cheapest entry level model I could find. If I knew that it would serve me for 15 years I wouldn't hesitate to buy it. Looking inside it now it also looks well designed with plenty of safety features. I guess it's longevity isn't that surprising though. Even at the start this UPS was never loaded anywhere near its maximum rating. Over the years my server only grew less power hungry with each update. The last computer the UPS was connected to didn't even register on its power meter. It always showed that the output was unloaded.

Posted by Tomaž | Categories: Analog | Comments »

Optimizing an amplitude-shift keying detector

15.04.2021 20:03

Nothing is more permanent than a temporary solution to an engineering problem. Some time ago I was reverse engineering a proprietary wired network protocol. I ended up quickly throwing together a simple audio-frequency amplitude-shift keying detector just so that I could record some traffic. I needed many packet captures before I could begin to understand what is being sent over the line and just taking screenshots on an oscilloscope was too inconvenient. A few months later and almost the exact schematic I made with a few back-of-the-napkin calculations ended up in an actual product. It seemed to work fine in practice so there didn't appear to be any need for additional design work. A year later however and some problems became apparent with my original design. With a better idea about what kind of sensitivity and selectivity was required I got to revisit my old circuit.

The simple amplitude shift keying detector.

The first detector I made for my experiments just used a passive, RC band pass filter to isolate the carrier. It has a single transistor acting both as a demodulator and an amplifier to produce a 5V digital signal. I made it during lockdown last year on a piece of breadboard from spare components I had lying around. The design that ended up in manufacturing switched to smaller, surface mount components but was basically unmodified in function.

To better understand the performance of this circuit I made a simulation model of the detector and the signal being detected. I used Spice OPUS for this task. Although I've also used ngspice a few times in the past, I keep returning to Spice OPUS. I've used it since my undergrad days and at this point I know most of its quirks. I like to use tools that I understand very well and know when I can and when I can't trust the results they give me.

Since a detector is a non-linear circuit I had to use the transient analysis. The basic Spice analysis types don't allow you to run the transient analysis on a range of input signals automatically. I had to write up a short program in Nutmeg, the Spice scripting language that is included in Spice OPUS. I varied the carrier amplitude and frequency on a logarithmic scale and chose 60 points on each axis. This resulted in 3600 separate simulations being run. I chose this number simply because it still returned a result reasonably fast on my computer.

I could use Spice itself to visualize the results, but the plotting capabilities are limited and I much rather work in Python than Nutmeg. Hence I only wrote out raw Spice vectors into a text file and then used Python and matplotlib to visualize the results:

Visualization of the usable region for the old detector design.

On this plot the axes are input carrier frequency and amplitude on a logarithmic scale. The color shows demodulated output signal level. The output is in inverted logic, hence if a carrier is detected the output is low. The hatched area shows where the output is not defined according to the 5V CMOS logic levels. Those input ranges are forbidden since in those cases the output of the detector could be interpreted either as low or high by the digital logic after it.

The circuit reliably detects the carrier for a range of frequencies, but the amplitude must be relatively high. At the frequency with best sensitivity, it would only detect signals with amplitudes of around 1 V or more. The two red dots on the plot are my new design requirements. The system uses a frequency multiplex to carry various channels on one line. I wanted to make a new circuit that would still detect the wanted carrier at 150 mV amplitude at 50 kHz. At the same time it should not be sensitive to an unwanted signal at 1 kHz and up to 2 V amplitude. The existing circuit obviously falls short of the former requirement.

Meeting these requirements wasn't simple. Simply increasing sensitivity of the detector wouldn't work. It would not be selective enough and would react to carriers outside of the desired frequency band. I needed to add better filtering as well as increase gain. Since the original circuit was so compact there was very little spare PCB space left for improvements. I considered a few opamp-based approaches as well as pushing demodulation to the digital domain, but in the end those all turned out to be infeasible.

I ended up adding another transistor with which I implemented both a Sallen-Key active second-order filter as well a small gain stage. Having a transistor pair in a SOT-23 doesn't take any more space than the single one in the original design. I also optimized the circuit so that it had many identical-valued resistors. That made it possible to replace four individual 0603 chip resistors with a single 0805 resistor array, saving more board space.

This is how the simulation of the new design looks like:

Visualization of the usable region for the new detector design.

The plot shows that the new circuit meets both the requirements I was aiming for. It produces a well-defined digital output for weaker signals thanks to the additional gain. The new filter also results in more attenuation outside of the desired frequency band. This means that the stronger, unwanted signal at 1 kHz still does not produce any output.

I was worried that the sensitivity of the detector would depend too much on variations in transistor parameters, so I also did some additional simulations to check that. I varied the transistor model in Spice to include the edge cases for DC gain and base-emitter voltage specified in the datasheet. The results show that even with worst-case transistors the new design should still meet the requirements:

Effect of transistor variations on the usable region.

In conclusion, this was an interesting challenge, but also frustrating at times. I kept thinking that I should come up with a more modern design. This kind of discrete-transistor circuit seemed old-fashioned. I felt there should be an off-the-shelf IC that would do what I needed, but I failed to find one. I looked at the LM567 tone detector and a few similar solutions, but I simply couldn't fit any of them on the available PCB space.

I haven't done any such involved discrete design in a while and I had to dig myself into some reference books first and refresh my knowledge of transistor properties. After a few tries and iterations I ended up with a solution that I think is quite elegant and solves the problem I had in front of me. An opamp design would likely have better properties, but in the end a circuit that you can use is better than one that you can't.

Posted by Tomaž | Categories: Analog | Comments »

Making replacement Chieftec drive rails, 3

03.04.2021 18:30

A couple of months ago I was writing about 3D-printable replacement drive rails for my Chieftec PC enclosure. Back then I've designed and printed some parts that were functional enough to use for mounting a new set of 3.5" hard drives into my computer and I have been using them since. However I was bothered by the fact that the new rails required two parts to be glued together. I've now updated the design so that the latch snaps onto the base part of the rail. It's purely a friction fit and hence assembly of the new rails doesn't require any glue.

A pile of original purple Chieftec drive rails.

Having to glue together 12 rails for the complete set of drives was bothersome. However the main reason why I wanted to avoid using glue was because I couldn't find one that would bond well to the PETG plastic that my parts were printed from. Every glue I tried produced a very weak bond that soon fell apart. I tried to modify the design so that it minimized the stress on the bond, but that didn't really work. I've heard that this specific glue produces good results with the filament I used, however I could not find a shop that would have it in stock.

Replacement Chieftec disk rail render from FreeCAD.

Coming up with a design where the latch just snaps into the base required two more round trips between CAD and the 3D printer. I find that the most troublesome part of any 3D printed design is always the place where two parts need to slide or engage with each other. Each printer has slightly different tolerances. Often this differs even between prints on the same printer. It's hard to find the exact amount of space you need to leave in the STL files between surfaces. Too much and the parts fit too loosely, too little and the parts don't fit together or break when they are assembled.

Replacement drive rails being printed on a Prusa printer.

I've written in one of my earlier posts that I was also worried about the 3D printed rails getting soft when in contact with the warm hard drives. I've been using them now for close to 2 months and so far I haven't seen any signs of deformation due to heat. On the other hand, the winter has barely ended. I expect the drives to reach higher temperatures in summer.

I've put the new designs in place of the old ones. There's now also a README file there that has some condensed instructions for printing.

Replacement rail with the latch mounted on a hard drive.

In the end, this took way more time than I anticipated. 3D printers are fun and convenient, but getting to a design that works well can still be very time consuming. In this particular case it was also mostly due to me being stubborn and wanting a replacement that functions more or less exactly like the original. It turned out that even without the latch the rails function quite well. There are sheet metal springs in the case that grab onto the holes for the screws on the rails. At least in my enclosure, these springs alone provide enough friction that drives are held pretty well even without the additional security of the latch action on the rail itself.

Posted by Tomaž | Categories: Life | Comments »

Vector measurements with the HackRF

27.03.2021 20:09

Over the last year I slowly built up a small, one-port vector network analyzer. The instrument consists of a rtl-sdr USB receiver dongle, the ERASynth Micro frequency synthesizer, an RF bridge, a custom time multiplex board I designed and, of course, a whole lot of software that glues everything together and does final signal processing in the digital domain. In the past months I've written pretty extensively about its development here on this blog and I've also used the system in practice, mainly to measure VSWR and matching networks of various multiband LTE antennas.

The original instrument could perform S11 measurements up to around 2 GHz. However the 2 GHz limit was only due to the E4000 tuner in the rtl-sdr. Since ERASynth Micro can generate signals up to 6.4 GHz and I designed the multiplexer with signals up to 8 GHz in mind, getting measurements at higher frequencies was only a matter of upgrading the receiver. I was tempted to do that since the popular 2.4 GHz ISM band was just out of reach of my measurements. I did a lot of searching around for a suitable SDR that would cover that frequency range. The top of my list was an USRP B200, but amid the lockdowns and the general chaos in international shipping last year I couldn't find a supplier. In the end I settled for a HackRF One.

My small, home-made vector network analyzer, upgraded with a HackRF.

On one hand, moving from the rtl-sdr to HackRF was pretty simple. After I got hackrf_tcp up and running most of my old code just worked. Getting things to work reasonably well took longer though. I lost a lot of time figuring out why the gain in the system varied a lot from one measurement to the other. I would optimize signal levels for best dynamic range, only to try it again next day and find that they have changed considerably. In the end, I found out that the USB hub that I was using could not supply the additional current required by the HackRF. Interestingly, nothing obvious broke with the USB bus voltage wildly out of spec, only the analog performance became erratic.

I wish HackRF came with some hardware settings (jumpers or something) that would allow me to semi-permanently disable some of its more dangerous features. As it is right now I need to use a DC block to protect my multiplex board from a DC bias in case the antenna port power gets enabled by a software bug. I also had to put in a 20 dB attenuator to protect the HackRF in case its pre-amplifier gets turned on, since that would get damaged by the signal levels I'm commonly using.

Error network terms when using Henrik's bridge with HackRF.

Here are the error network terms with the upgraded system. The faded lines are the measurements by Henrik Forstén from whom I copied the RF bridge design. Up to 2 GHz the error terms match pretty well with those I measured with the rtl-sdr. As I noted in my previous blog post, the error terms are a measure of the whole measurement system, not only the bridge. Hence my results differ from Henrik's quite significantly since apart from the bridge his system is quite different.

Unfortunately, my new system still isn't very well behaved beyond 2 GHz. I suspect this has to do with the construction that has a lot of connectors everywhere and RG-316 cables of questionable quality. Every contact introduces some impedance mismatch and in the end it's hard to say what is causing what. I also know that I made an error in calculating the dimensions of the coplanar waveguides on my multiplex board. I'm sure that is not helping things.

Estimated dynamic range of the system.

This is an estimated dynamic range of the vector measurement, using the method described in this post. Let's say it's better than 50 dB below 1.5 GHz and better than 30 dB below 3.5 GHz. It quickly gets worse after that. At around 5.5 GHz I suspect there's some kind of a resonance in the 10 dB attenuator I have on the multiplex board that's made out of 0603 resistors. Beyond that point the signal that passes through the attenuator is stronger than the un-attenuated signal and I can't measure anything anymore.

I really would like to improve the dynamic range in the future. Basically all the practical measurements I did with this system was with the device-under-test being behind a U.FL connector. The problem with that is that the connector introduces a significant mismatch before the device-under-test. You can calibrate this out, but this comes at a cost of the effective dynamic range. Hence it may be true that 30 dB of dynamic range is enough for practical measurements. However as soon as you start moving the measurement plane away from the port of the instrument, you really want to start with as much dynamic range as possible.

I believe most of the total noise in the system now actually comes from the phase noise. Currently the signal source and the receiver use independent clocks and each of those clocks has its own drift and various artifacts introduced by individual phase locked loops. I suspect things would improve significantly if I could run both ERASynth Micro and HackRF from a common clock source, ideally from the ERASynth Micro's low phase noise TCXO. Unfortunately they use incompatible interfaces for reference clock in/out. I need to make an adapter circuit before I can try this out.

In the end, this is all kind of an endless rabbit's hole. Every measurement I take seems like it could be done better and there appears to be a lot of room for improvement. I've already accumulated a wish list of changes for the multiplex circuit. At one point I will likely make a new versions of the bridge and the multiplexer PCBs using the experience from the past year of experimenting.

Posted by Tomaž | Categories: Analog | Comments »

Characterizing the RF Demo Kit

19.03.2021 18:47

The RF Demo Kit is a small printed circuit board with several simple RF circuits on it. There are several variants out there from various sellers. It's cheap and commonly sold together with NanoVNA as a learning tool since it is nicely labeled. Among random circuits like filters and attenuators, it also contains a set of short, open, load and thru (SOLT) standards that can be used for VNA calibration. These are all accessible over U.FL surface mount coaxial connectors.

The "RF Demo Kit" circuit board, NWDZ Rev-01-10.

I've been using the SOLT standards on the Demo Kit for calibrating my home-made vector network analyzer. I've been measuring some circuits with an U.FL connection and I lack a better U.FL calibration kit at the moment.

Of course, for this price it's unrealistic to expect the Demo Kit to come with any detailed characterization of the standards. Initially I just assumed the standards were ideal in my calculations. However I suspected this wasn't very accurate. The most telling sign was that I would often get an S11 measurement of a passive circuit that had the absolute value larger than 1. This means that the circuit reflected more power than it received and that wasn't possible. Hence the calibration must have given my instrument a wrong idea of what a perfect signal reflection looks like.

The "RF Demo Kit" connected to my home-made vector network analyzer.

I suspected that my measurements would be more accurate if I could estimate some stray components of the standards on the Demo Kit and take them into account in my calibration. I did the estimation using a method that is roughly described in the Calibrating Standards for In-Fixture Device Characterization white paper from Agilent.

I did my measurements in the frequency range from 600 MHz to 3 GHz. First I used my SMA cal-kit to calibrate the VNA with the measurement plane on its SMA port. I then measured the S11 of the Demo Kit short standard, using a SMA-to-U.FL coax cable:

S11 for the short standard before port extension.

I used this measurement to calculate the port extension parameters to move the measurement plane from the VNA port to the position of the short on the Demo Kit PCB. I verified that the calculated port extension made sense. Calculated time delay was approximately 0.95 ns. With a velocity factor of 0.7 for the RG-316 cable, this gives a length of 19.9 cm which matches the length of the 20 cm cable I used almost exactly.

S11 for the short standard after port extension.

Using the port extension "unwinds" the Smith chart for the short standard. Ideally the whole graph should be in one spot at Z = 0 (red dot). In reality, noise of the VNA and various other imperfections make it a fuzzy blob around that point.

I then applied the same port extension to the measurement of the open standard. Ideally, this graph should be in one point at Z = infinity (again, marked with a red dot). This graph is still very noisy, like the previous one. However one important difference is that it clearly shows a systematic, frequency dependent deviation, not just random noise around the ideal point. Some calculation shows that this deviation is equivalent to a stray capacitance of about 0.58 pF. The simulated response of an open with 0.58 pF stray capacitance is shown in orange:

S11 for the open standard after port extension.

The Agilent white paper goes further and also estimates the stray inductance of the load standard. This is how my measurement of the Demo Kit load standard looks like with the same port extension applied as before. Again, the ideal value is marked with the red dot:

S11 for the load standard after port extension.

It looks pretty messy, with maximum reflection loss of about -12 dB at frequencies up to 3 GHz. Note that the U.FL connectors themselves are only specified up to about -18 dB reflection loss, so a lot of this is probably due to connector contacts. The white paper describes using time gating to isolate the effect of the load from the effect of contacts, but the mathematics of doing that based on my measurements escape me for the moment. I also suspect that my setup lacks the necessary bandwidth.

I stopped here. I suspect estimating the stray load inductance wouldn't make much difference. Keep in mind that the graph is drawn with port extension in place, which removes the effect of the cable. Hence the circling of the graph must be due to phase delays in the load resistor and the U.FL connector. The phase delay also cannot be due to the short length of microstrip between the U.FL and the 0603 resistor on the RF Demo Kit PCB. That shouldn't account for more than about 15° of phase shift at these frequencies.

At the very least I'm somewhat satisfied that the Figure 7(b) in the white paper shows a somewhat similarly messy measurement for their load standard. I'm sure they were using a much higher quality standard, but they were also measuring up to 20 GHz:

S11 measurement of the load standard from the Agilent white paper (Fig 7b)

Image by Agilent Technologies, Inc.

In the end, here is the practical effect of taking into account the estimated stray capacitance of the open standard when measuring S11 of a device:

S11 measurements calibrated using ideal open or open with stray capacitance.

When the VNA was calibrated with the assumption of a ideal open standard, the log magnitude of the measured S11 went above 0 dB at around 1400 MHz. This messed up all sorts of things when I wanted to use the measurement in some circuit modeling. However, when I took the stray capacitance of the open standard into account, the measured S11 correctly stayed below 0 dB over the entire frequency range. Since I don't have a known-good measurement I can't be sure in general whether one or the other is more accurate. However the fact that the limits seem correct now suggests that taking the estimated stray capacitance into account is an improvement.

Posted by Tomaž | Categories: Analog | Comments »

hackrf_tcp, a rtl_tcp for HackRF

13.03.2021 19:03

rtl_tcp is a small utility that exposes the functionality of a rtl-sdr receiver over a TCP socket. It can be interfaced with a simple Python script. I find it convenient to use when I need to grab some IQ samples from the radio and I don't want to go into the complexity of interfacing with the GNU Radio or the C library. Using it is also much faster compared to repeatedly running the rtl-sdr command-line utility to grab samples into temporary files at different frequencies or gain settings.

I wanted to use something similar for the HackRF. Unfortunately the stock software that comes with it doesn't include anything comparable. After some searching however I did find an old fork of the HackRF software repository by Zefie on GitHub that included a file named hackrf_tcp.c. Upon closer inspection it seemed to be a direct port of rtl_tcp from librtlsdr to libhackrf. It didn't compile out of the box and merging the fork with the latest upstream produced some conflicts, but it did look promising.

I resolved the merge conflicts and fixed the code so that it now compiles cleanly with the latest libhackrf. I also added a few small improvements.

Just like with rtl_tcp, the protocol on the socket is very simple. Upon receiving the client's connection the server initializes the radio and starts sending raw IQ samples. The client optionally sends commands back to the server in the form of a simple structure:

struct command{
	unsigned char cmd;
	unsigned int param;
}

cmd is the command id and param is a parameter for the command. Commands are things like SET_FREQUENCY = 0x01 for setting frequency, SET_SAMPLERATE = 0x02 for setting ADC sample rate and so on.

The original code attempted to keep some backwards compatibility by mapping HackRF's functionality to existing rtl-sdr commands. This included things like using the frequency correction command to enable or disable the HackRF's RF preamplifier. I dropped most of that. Obviously I kept the things that were direct equivalents, like center frequency and sample rate setting. The rest seemed like a dangerous hack. Enabling HackRF's preamp by mistake can damage it if there's a strong signal present on the antenna input.

Instead, there is now a new set of commands starting at 0xb0 that is HackRF exclusive. Unsupported rtl-sdr commands are ignored.

Even with hacks backwards compatibility wasn't that good in the first place and I wasn't interested in keeping it. HackRF produces IQ samples as signed 8 bit values while rtl-sdr uses unsigned. The code makes no attempt to do the conversion. There is also a problem that the 32 bit unsigned parameter to the SET_FREQUENCY = 0x01 command can only be used for frequencies up to around 4.3 GHz, which is less than what HackRF can do. To work around that limitation I added a new command SET_FREQUENCY_HI = 0xb4 that sets the central frequency to parameter value plus 0x100000000.

My updated version of hackrf_tcp is in my hackrf fork on GitHub. It seems reasonably stable, but I've seen it hang occasionally when a client disconnects. I haven't looked into this yet. In that case it usually requires a kill -9 to stop it. In hindsight, separating hackrf_tcp out into its own repository instead of keeping it with the rest of the upstream tools might have been a better idea.

As it is right now, you need to compile the whole libhackrf and the rest of the host tools to get hackrf_tcp. The basic instructions in the README still apply. After installation you can just run hackrf_tcp from a shell without any arguments:

$ hackrf_tcp
Using HackRF HackRF One with firmware 2018.01.1
Tuned to 100000000 Hz.
listening...

You can also specify some initial radio settings and socket settings on the command-line. See what's listed with --help.

Posted by Tomaž | Categories: Code | Comments »

RAID and hard disk read errors

27.02.2021 19:07

I have been preoccupied with data storage issues lately. What I though would be a simple installation of a solid state drive into my desktop PC turned into a month-long project. I found out I need to design and make new drive rails, decided to do some overdue restructuring of the RAID array and then had to replace two failing drives. In any case, a thing that caught my eye recently was a warning about RAID 5 I saw repeated on a few places on the web:

Note: RAID 5 is a common choice due to its combination of speed and data redundancy. The caveat is that if one drive were to fail and another drive failed before that drive was replaced, all data will be lost. Furthermore, with modern disk sizes and expected unrecoverable read error (URE) rates on consumer disks, the rebuild of a 4TiB array is expected (i.e. higher than 50% chance) to have at least one URE. Because of this, RAID 5 is no longer advised by the storage industry.

This is how I understand the second part of the warning: it talks about a RAID 5 array with total usable capacity of 4 TiB. Such an array would typically consist of three 2-terabyte disks. In the described scenario one of the disks has failed and, after adding in a replacement drive, the array is restored by reading the contents of the remaining two disks. This means we need to read out 2 times 2 terabytes of data without errors to successfully restore the array.

I was surprised by the stated higher-than-50% chance of a read error during this rebuild procedure. It seemed too high given my experience. Hence I've looked up the reliability section of the datasheet for the new P300-series, 2 TB Toshiba desktop-class hard drive I just bought:

Reliability section of the datasheet for Toshiba P300 hard drive.

I'm a bit suspicious of the way probability is specified here. Strictly reading the exponential notation, 10E14 means that the probability of an unrecoverable error (URE) is one error per 10⋅1014 bits. Expressed as probability of an error when reading a single bit:

P_{URE} = \frac{1}{10\cdot10^{14}} = 10^{-15}

In another datasheet for a different series of drives (however this time for data center instead of consumer use) the error rate is given as 10 errors per 1016 bits. This again gives the same error probability of 10-15.

Consider this probability for a second. It's such a fantastically low number. I don't remember ever encountering an actual technical specification that would involve a probability that has a one preceded by fifteen zeros - or in other words - fifteen nines of reliability.

The number is just on the edge of what you can represent with the common 64-bit double-precision floating-point format. If using a tool like numpy that only uses double-precision, any calculations with such values need to be done extra carefully to ensure that loss of numerical precision doesn't lead to nonsensical results.

Hard drives tend to use SI prefixes instead of binary, so I'll do the calculation for 4 terabytes instead of 4 tebibytes like it says in the quote:

n = 4 \cdot 10^{12} \cdot 8 \mathrm{bits}

For this calculation it doesn't matter whether we're reading this number of bites from one or two drives since the URE probabilities are assumed independent. The probability of getting at least one error during the rebuild is:

P_{rebuild-error} = 1 - (1 - P_{URE})^n \approx 3.1\%

Note that if I read 10E14 in the original reliability specification as 1014, the probability of a rebuild error goes up to 27%.

This comes out a bit more optimistic than the higher-than-50% figure given in the warning, at least for this specific series of hard drives. I guess whether 3.1% is still too high depends on how much you value your data. However consider that in the original scenario this is the probability of an error given that another hard drive in the array has already catastrophically failed. So the actual probability of data loss is this multiplied with the (unknown) probability of a complete drive failure.

Then again, consider that this is a desktop drive. It is not meant to be put into a RAID array and is typically used without any redundancy. Some people will even scream at you if you use desktop drives in a RAID due to timeout issues. Without any redundancy this probability directly becomes the probability of data loss. And that seems exceedingly high - especially considering that drives up to 8 TB seem to be sold with this same error rate specification. Even with that amazing reliability of reading a single bit, modern drives are simply so large that the vanishingly tiny error probabilities add up.

Posted by Tomaž | Categories: Life | Comments »

Reading RAID stride and stripe_width with dumpe2fs

20.02.2021 20:08

Just a quick note, because I found this confusing today. stride and stripe_width are extended options for ext filesystems that can be used to tune their performance on RAID devices. Many sources on the Internet claim that the values for these settings on existing filesystems can be read out using tune2fs or dumpe2fs.

However it is possible that the output of these commands will simply contain no information that looks related to RAID settings. For example:

$ tune2fs -l /dev/... | grep -i 'raid\|stripe\|stride'
$ dumpe2fs -h /dev/... | grep -i 'raid\|stripe\|stride'
dumpe2fs 1.44.5 (15-Dec-2018)

It turns out that the absence of any lines relating to RAID means that these extended options are simply not defined for the filesystem in question. It means that the filesystem is not tuned to any specific RAID layout and was probably created without the -E stripe=...,stripe_width=... option to mke2fs.

However I've also seen some filesystems that were created without this option still display a default value of 1. I'm guessing this depends on the version of mke2fs that was used to create the filesystem:

$ dumpe2fs -h /dev/... |grep -i 'raid\|stripe\|stride'
dumpe2fs 1.44.5 (15-Dec-2018)
RAID stride:              1

For comparison, here is how the output looks like when these settings have actually been defined:

$ dumpe2fs -h /dev/md/orion\:home |grep -i 'raid\|stripe\|stride'
dumpe2fs 1.44.5 (15-Dec-2018)
RAID stride:              16
RAID stripe width:        32
Posted by Tomaž | Categories: Code | Comments »