On "ap_pass_brigade failed"

25.05.2016 20:37

Related to my recent rant regarding the broken Apache 2.4 in Debian Jessie, another curious thing was the appearance of the following in /var/log/apache2/error.log after the upgrade:

[fcgid:warn] [pid ...:tid ...] (32)Broken pipe: [client ...] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: ...

Each such error is also related to a 500 Internal Server Error HTTP response logged in the access log.

There's a lot of misinformation floating about this on the web. Contrary to the popular opinion, this is not caused by wrong values of various Fcgid... options or the PHP_FCGI_MAX_REQUESTS variable. Actually, I don't know much about PHP (which seems to be the primary use case for FCGI), but I do know how to read the mod_fcgid source code and this error seems to have a very simple cause: clients that close the connection before waiting for the server to respond.

The error is generated on line 407 of fcgid_bridge.c (mod_fcgid 2.3.9):

/* Now pass any remaining response body data to output filters */
if ((rv = ap_pass_brigade(r->output_filters,
                          brigade_stdout)) != APR_SUCCESS) {
    if (!APR_STATUS_IS_ECONNABORTED(rv)) {
        ap_log_rerror(APLOG_MARK, APLOG_WARNING, rv, r,
                      "mod_fcgid: ap_pass_brigade failed in "
                      "handle_request_ipc function");
    }

    return HTTP_INTERNAL_SERVER_ERROR;
}

The comment at the top already suggests the cause of the error message: failure to send the response generated by the FCGI script. The condition is easy to reproduce with a short Python script that sends a request and immediately closes the socket:

import socket, ssl

HOST="..."
# path to some document generated by an FCGI script
PATH="..."

ctx = ssl.create_default_context()
conn = ctx.wrap_socket(socket.socket(socket.AF_INET), server_hostname=HOST)
conn.connect((HOST, 443))
conn.sendall("GET " + PATH + " HTTP/1.0\r\nHost: " + HOST + "\r\n\r\n")
conn.close()

Actually, you can do the same with a browser by mashing refresh and stop buttons. The success somewhat depends on how long the script takes to generate the response - for very fast scripts it's hard to tear down the connection fast enough.

Probably at some point ap_pass_brigade() returned ECONNABORTED when the client broke the connection, hence the if statement in the code above. It appears that now EPIPE is returned and mod_fcgid was not properly updated. I was testing this on apache2 2.4.10-10+deb8u4.

In any case, this error message is benign. Fiddling with the FcgidOutputBufferSize might cause the response to be sent out earlier and reduce the chance that this will be triggered by buggy crawlers and such, but in the end there is nothing you can do about it on the server side. The 500 response in the log is also clearly an artifact in this case, since it's the client that caused the error, not the server, and no error page was actually delivered.

Posted by Tomaž | Categories: Code | Comments »

Jessie upgrade woes

23.05.2016 19:59

Debian 8 (Jessie) was officially released a bit over a year ago. Previous January I mentioned I plan to upgrade my CubieTruck soon which in this case meant 16 months. Doesn't time fly when you're not upgrading software? In any case, here are some assorted notes regarding the upgrade from Debian Wheezy. Most of them are not CubieTruck specific, so I guess someone else might find them useful. Or entertaining.

Jessie armhf comes with kernel 3.16, which supports CubieTruck's Allwinner SoC and most of the peripherals I care about. However, it seems you can't use the built-in NAND flash for booting. It would be nice to get away from the sunxi 3.4 kernel and enjoy kernel updates through apt, but I don't want to get back to messing with SD cards. Daniel Andersen keeps the 3.4 branch reasonably up-to-date and Jessie doesn't seem to have problems with it, so I'll stick with that for the time being.

CubieTruck

Dreaded migration to systemd didn't cause any problems, apart from having to migrate a couple of custom init.d scripts. The most noticeable change is a significant increase in the number of mounted tmpfs filesystems, which makes df output somewhat unwieldy and, by consequence, Munin's disk usage graphs a mess.

SpeedyCGI was a way of making dynamic web pages back in the olden days. In the best Perl tradition it tweaked some low-level parts of the language in order to avoid restarting the interpreter for each HTTP request - like automagically persisting global state and making exit() not actually exit. From a standpoint of a lazy web developer it was an incredibly convenient way to increase performance of plain old CGI scripts. But alas, it remained unmaintained for many years and was finally removed in Jessie.

FCGI and Apache's mod_fcgi (not to be confused with mod_fastcgi, it's non-free and slightly more broken cousin) seemed like natural replacements. While FCGI makes persistence explicit, the programming model is more or less the same and hence the migration required only some minor changes to my scripts - and working around various cases of FCGI's brain damage. Like for instance intentional ignorance of Perl's built-in Unicode support. Or the fact that gracefully stopping worker processes is more or less unsupported. In fact, FCGI's process management seems to be broken on multiple levels, as mod_fcgi has problems maintaining a stand-by pool of workers.

Perl despair

In any case, the new Apache 2.4 is a barrel of fun by itself. It changes the syntax for access control in such a way that config files need to be updated manually. It now also ignores all config files if they don't end in .conf. Incidentally, Apache will serve files from /var/www/html if it has no VirtualHosts defined. This seems to be a hard-coded default, so you can't find why it's doing that by grepping through /etc/apache2.

The default config in Jessie frequently warns about deadlocks in various places:

(35)Resource deadlock avoided: [client ...] mod_fcgid: can't lock process table in pid ...
(35)Resource deadlock avoided: AH00273: apr_proc_mutex_lock failed. Attempting to shutdown process gracefully.
(35)Resource deadlock avoided: AH01948: Failed to acquire OCSP stapling lock

I'm currently using the following in apache2.conf, which so far seems to work around this problem:

# was: Mutex file:${APACHE_LOCK_DIR} default
Mutex sem default

Apache 2.4 in Jessie breaks HTTP ETag caching mechanism. If you're using mod_deflate (it's used by default to compress text-based content like HTML, CSS, RSS), browsers won't be getting 304 Not Modified responses, which means longer load times and higher bandwidth use. The workaround I'm using is the following in mods-available/deflate.conf (you need to also enable mod_headers):

Header edit "Etag" '^"(.*)-gzip"$' '"$1"'

This differs somewhat from the solution proposed in Apache's Bugzilla, but as far as I can see restores the old and tested behavior of Apache 2.2, even if it's not exactly up to HTTP specification.

I wonder whether this state of affairs means that everyone has moved on to nginx or these are just typical problems for a new major release. Anyway, to conclude on a more positive note, Apache now supports OCSP stapling, which is pretty simple to enable.

Finally, rsyslog is slightly broken in Jessie on headless machines that don't have an X server running. It spams the log with lines like:

rsyslogd-2007: action 'action 17' suspended, next retry is Sat May 21 18:12:53 2016 [try http://www.rsyslog.com/e/2007 ]

This can be worked around by commenting-out the following lines in rsyslog.conf:

#daemon.*;mail.*;\
#       news.err;\
#       *.=debug;*.=info;\
#       *.=notice;*.=warn       |/dev/xconsole
Posted by Tomaž | Categories: Code | Comments »

Materialized Munin display

15.05.2016 21:25

Speaking of Munin, here's a thing that I've made recently: A small stand-alone display that cycles through a set of measurements from a Munin installation.

Munin display

(Click to watch Munin display video)

Back when ESP8266 chip was the big new thing I ordered a bag of them from eBay. The said bag then proceeded to gather dust in the corner of my desk for a year or so, as such things unfortunately tend to do these days. I also had a really nice white transflective display left over from another project (suffice to say, it cost around 20 £ compared to ones you can get for a tenth of the price with free shipping on DealExtreme). So something like this looked like a natural thing to make.

The hardware is not worth wasting too many words on: an ESP8266 module handles radio and the networking part. The display is a 2-line LCD panel using the common 16-pin interface. An Arduino Pro Mini acts as glue between the display and the ESP8266. There are also 3.3 V (for ESP8266) and 5 V (for LCD and Arduino) power supplies and a transistor level shifter for the serial line between ESP8266 and the Arduino.

ESP8266 runs stock firmware that exposes a modem-like AT-command interface on a serial line. I could have omitted the Arduino and ran the whole thing from the ESP8266 alone, however the lack of GPIO lines on the module I was using meant that I would have to use some kind of GPIO extender or multiplexer to run the 16-pin LCD interface. Arduino with the WeeESP8266 library just seemed less of a hassle.

Top side of the circuit in the Munin display.

From the software side, the device basically acts as a dumb display. The ESP8266 listens on a TCP socket and Arduino pushes everything that is received on that socket to the LCD. All the complexity is hidden in a Python daemon that runs on my CubieTruck. The daemon uses PyMunin to periodically query Munin nodes, renders the strings to be displayed and sends them to the display.

Speaking of ESP8266, my main complaint would be that there is basically zero official documentation about it. Just getting it to boot means reconciling conflicting information from different blog and forum posts (for me, both CH_PD and RST/GPIO16 needed to be pulled low high). No one mentioned that RX pin has an internal pull-up. I also way underestimated the current consumption (it says 1 mA stand-by on the datasheet after all and the radio is mostly doing nothing in my case). It turns out that a linear regulator is out of the question and a 3.3 V switch-mode power supply is a must.

My module came with firmware that was very unreliable. Getting official firmware updates from a sticky forum post felt kind of shady and it took some time to get an image that worked with 512 kB flash on my module. That said, the module has been working without resets or hangs for a couple of weeks now which is nice and not something that all similar radio modules are capable of.

Inside the Munin display.

Finally, this is also my first 3D printed project and I learned several important lessons. It's better to leave too much clearance than too little between parts that are supposed to fit together. This box took about four hours of careful sanding and cutting before the top part could be inserted into the bottom since the 3D printer randomly decided to make some walls 1 mm thicker than planned. Also, self-tapping screws and automagically hollowed-out plastic parts don't play nice together.

With all the careful measuring and planning required to come up with a CAD drawing, I'm not sure 3D printing saved me any time compared to a simple plywood box which I could make and fit on the fly. Also, relying on the flexibility and precision of a 3D print made me kind of forget about the mechanical design of the circuit. I'm not particularly proud of the way things fit together and how it looks inside, but most of it is hidden away from view anyway and I guess it works well enough for a quick one-off project.

Posted by Tomaž | Categories: Life | Comments »

Power supply voltage shifts

02.05.2016 20:16

I'm a pretty heavy Munin user. In recent years I've developed a habit of adding a graph or two (or ten) for every service that I maintain. I also tend to monitor as many aspects of computer hardware as I can conveniently write a plugin for. At the latest count, my Munin master tracks a bit over 600 variables (not including a separate instance that monitors 50-odd VESNA sensor nodes deployed by IJS).

Monitoring everything and keeping a long history allows you to notice subtle changes that would otherwise be easy to miss. One of the things that I found interesting is the long-term behavior of power supplies. Pretty much every computer these days comes with software-accessible voltmeters on various power supply rails, so this is easy to do (using lm-sensors, for instance).

Take for example voltage on the +5 V rail of an old 500 watt HKC USP5550 ATX power supply during the last months of its operation:

Voltage on ATX +5 V rail versus time.

From the start, this power supply seemed to have a slight downward trend of around -2 mV/month. Then for some reason the voltage jumped up for around 20 mV, was stable for a while and then sharply dropped and started drifting at around -20 mV/month. At that point I replaced it, fearing that it might soon endanger the machine it was powering.

The slow drift looks like aging of some sort - perhaps a voltage reference or a voltage divider before the error amplifier. Considering that it disappeared after the PSU was changed it seems that it was indeed caused by the PSU and not by a drifting ADC reference on the motherboard or some other artifact in the measurements. Abrupt shifts are harder to explain. As far as I can see, nothing important happened at those times. An application note from Linear mentions that leakage currents due to dirt and residues on the PCB can cause output voltage shifts.

It's also interesting that the +12 V rail on the same power supply showed a bit different pattern. The last voltage drop is not apparent there, so whatever caused the drop on the +5 V line seemed to have happened after the point where regulation circuit measures the voltage. The +12 V line isn't separately regulated in this device, so if the regulation circuit would be involved, some change should have been apparent on +12 V as well.

Perhaps it was just a bad solder joint somewhere down the line or oxidation building up on connectors. At 10 A, a 50 mV step only corresponds to around 5 mΩ change in resistance.

Voltage on ATX +12 V rail versus time.

This sort of voltage jumps seem to be quite common though. For instance, here is another one I recently recorded on a 5 V, 2.5 A external power supply that came with CubieTruck. Again, as far as I can tell, there were no external reasons (for instance, power supply current shows no similar change at that time).

Voltage on CubieTruck power supply versus time.

I have the offending HKC power supply opened up on my bench at the moment and nothing looks obviously out of place except copious amounts of dust. While it would be interesting to know what the exact reasons were behind these voltage changes, I don't think I'll bother looking any deeper into this.

Posted by Tomaž | Categories: Analog | Comments »