
Updated: Never — Philip's Blog

Now featuring regular updates!

Thu, 22 Oct 2009

11:33 – Fixing strange DHCP behaviour

Someone -- I thought it was Kristof, but he claims not to have this problem so it must be someone else -- told me a while ago that Telenet's DHCP server "exhibits weird behaviour". That sort of mystery certainly gets the hyperactive mind interested.

For totally unrelated reasons, I found myself looking at a packet capture of DHCP traffic on a Telenet connection. Indeed, there was something very strange in there. The DHCP client would get a perfectly fine lease with perfectly reasonable renewing and rebinding times. When the renewing timer (T1) expired, the client would unicast a DHCPREQUEST to the server and expect a unicast DHCPACK back. Only the DHCPACK would never arrive, and the client would retransmit the unicast DHCPREQUEST messages until the rebinding timer (T2) expired. At that time, the client would broadcast a DHCPREQUEST, after which the DHCPACK would arrive.
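To make the T1/T2 behaviour concrete, here is a minimal sketch of how a client might decide which phase of the lease it is in. The names `dhcp_phase` and `dhcp_phase_at` are made up for illustration and are not taken from dhclient. The sketch assumes the RFC 2131 defaults of T1 at half the lease and T2 at seven eighths of it; a server can override both with options 58 and 59.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only (not from dhclient). */
enum dhcp_phase {
	DHCP_BOUND,	/* before T1: nothing to do */
	DHCP_RENEWING,	/* T1..T2: unicast DHCPREQUEST to the server */
	DHCP_REBINDING,	/* T2..expiry: broadcast DHCPREQUEST */
	DHCP_EXPIRED	/* lease is gone, start over */
};

/*
 * Return the phase a lease is in, assuming the RFC 2131 default
 * timers: T1 = 0.5 * lease, T2 = 0.875 * lease.
 */
enum dhcp_phase
dhcp_phase_at(uint32_t lease_secs, uint32_t elapsed_secs)
{
	uint32_t t1, t2;

	assert(lease_secs > 0);
	t1 = lease_secs / 2;
	/* Widen before multiplying so long leases cannot overflow. */
	t2 = (uint32_t)(((uint64_t)lease_secs * 7) / 8);

	if (elapsed_secs >= lease_secs)
		return (DHCP_EXPIRED);
	if (elapsed_secs >= t2)
		return (DHCP_REBINDING);
	if (elapsed_secs >= t1)
		return (DHCP_RENEWING);
	return (DHCP_BOUND);
}
```

With a one hour lease, the client in the capture would have spent everything between 1800 and 3150 seconds retransmitting unicast requests that were never answered.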

The fact that the DHCPACK messages came through the DHCP relay server put me on a side-track briefly. I discovered that the DHCP server (mentioned in option 54) would not respond to my DHCP requests. While it makes perfect sense to protect a DHCP server from clients, you do want your clients to be able to get packets to it somehow.

I sent some packet captures to a contact inside Telenet (thanks ;-) I couldn't imagine trying to explain this to a helldesk!) wondering if they'd put too sharp an access control list between me and the DHCP server (recently -- because I hadn't seen the problem before). After some digging, they found that I was sending my unicast DHCPREQUEST messages with a random source port number. From my reading of the RFC, this is "allowed", but no one else does it. It turns out that Telenet does some sanity checking (sensible precaution) on DHCP messages before allowing them to go to the DHCP server. This sanity checking does not like (or recognize, presumably) DHCP messages with a source port other than bootpc (68).
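For an ordinary UDP socket, the usual way to get a fixed source port is to bind before sending. The following is only a sketch of that idea, with hypothetical helper names `bootpc_local_addr` and `bind_to_bootpc`; it is not how dhclient does it (dhclient builds its own headers), and binding to a port below 1024 normally needs root.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#define BOOTPC_PORT 68	/* the bootpc port that the filter expects */

/* Hypothetical helper: local address with source port 68. */
struct sockaddr_in
bootpc_local_addr(void)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(BOOTPC_PORT);
	return (sin);
}

/*
 * Hypothetical helper: bind an existing UDP socket so that whatever it
 * sends leaves from port 68 rather than a port picked by the stack.
 * Returns what bind(2) returns.
 */
int
bind_to_bootpc(int fd)
{
	struct sockaddr_in sin = bootpc_local_addr();

	assert(fd >= 0);
	return (bind(fd, (const struct sockaddr *)&sin, sizeof(sin)));
}
```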

FreeBSD's dhclient is a rather old version of ISC's reference implementation, simplified by OpenBSD. I found that OpenBSD has had a patch for a couple of years that purported to fix this behaviour. When I ported this patch to FreeBSD however, I found that sendmsg would return EINVAL, which was not documented to ever happen.

Again I wondered how people without source code to their operating systems get through the day? Do they resort to alcohol and panic at this stage? I used DDB to set a breakpoint on sendmsg and stepped through briefly, expecting it to blow up somewhere quickly when copying in the iovec or so. No such luck however, and I found myself in sosend_generic, which is not so much fun to step through without symbol information, so I set up remote debugging so I could use ddd.

Eventually, I found my way to rip_output and found that my EINVAL came from here:

if (((ip->ip_hl != (sizeof (*ip) >> 2)) && inp->inp_options)
    || (ip->ip_len > m->m_pkthdr.len)
    || (ip->ip_len < (ip->ip_hl << 2))) {
        INP_RUNLOCK(inp);
        m_freem(m);
        return (EINVAL);
}

Oh dear...:

(gdb) p m->M_dat.MH.MH_pkthdr.len
$6 = 328
(gdb) p ip->ip_len
$7 = 18433

Obviously (to the trained -- or strained -- eye which sees this kind of thing often), 18433 and 328 are strikingly similar. Indeed - it helps if you put the bytes in the right order!
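For anyone who does not see it at a glance, the arithmetic is easy to check. The sketch below swaps the two bytes of a 16-bit value and then replays just the two length comparisons from the snippet above. The function names `swap16` and `raw_ip_len_ok` are mine, and the options check from the original condition is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value: 0x0148 <-> 0x4801. */
uint16_t
swap16(uint16_t v)
{
	return ((uint16_t)((v << 8) | (v >> 8)));
}

/*
 * Simplified stand-in for the length checks in the snippet above:
 * returns 1 if the lengths are acceptable, 0 where the kernel would
 * return EINVAL. ip_hl is the header length in 32-bit words.
 */
int
raw_ip_len_ok(uint16_t ip_len, unsigned int ip_hl, unsigned int pkt_len)
{
	assert(ip_hl >= 5);	/* an IPv4 header is at least 20 bytes */
	if (ip_len > pkt_len)
		return (0);
	if (ip_len < (ip_hl << 2))
		return (0);
	return (1);
}
```

328 is 0x0148 and 18433 is 0x4801, so `swap16(328)` gives 18433. With a 328 byte packet, `raw_ip_len_ok(328, 5, 328)` passes, while `raw_ip_len_ok(18433, 5, 328)` trips the first comparison.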

For hysterical raisins, the raw socket interface on BSD-derived network stacks expects the ip_len field of the IP header included when IP_HDRINCL is set to be in host byte order. dhclient used to only send packets with headers through the BPF, which will put the packet on the wire exactly as given (i.e. the ip_len needs to be in network byte order). For reasons which don't seem to be explained in CVS history, OpenBSD decided to change this behaviour in their network stack (making it differ from every other network stack and many books written about sockets).
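In code, the difference between the two send paths comes down to one field. This is a sketch with made-up names (`send_path`, `ip_len_for_path`), assuming a stack that behaves as described here. As far as I know, later FreeBSD releases switched raw sockets to network byte order, so check ip(4) for the version in front of you.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the two ways of getting a hand-built packet out. */
enum send_path {
	SEND_VIA_BPF,			/* bytes go on the wire as given */
	SEND_VIA_RAW_HOST_ORDER		/* IP_HDRINCL socket as described above */
};

/*
 * Return the value to store in ip_len for a packet of total_len bytes.
 * BPF wants the wire format (network byte order); the raw socket
 * convention described above wants host byte order.
 */
uint16_t
ip_len_for_path(uint16_t total_len, enum send_path path)
{
	assert(total_len >= 20);	/* at least a bare IPv4 header */
	if (path == SEND_VIA_RAW_HOST_ORDER)
		return (total_len);
	return (htons(total_len));
}
```

Code written for one path and pointed at the other produces exactly the kind of swapped length that the gdb session shows.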

To make a very long story short: I committed revision 198352 to make dhclient on FreeBSD work in networks which put sharp teeth between DHCP clients and servers. Debugging the problem also kept me out of trouble for a couple of hours.

I'm told that finding the cause of weird errors in the protocol stack is now significantly easier with DTrace. I will have to find some time to play with that. While ddd "works", it's not exactly the most pleasant tool to work with.

Entirely aside: I'm still not convinced that "sharp teeth" should care about the source port of unicast DHCPREQUEST messages, but I'm happy to accept that if everyone uses port 68, there's no reason to gratuitously differ from that. Thanks to the Telenetists for helping me look into this.

That could have been me who informed you about strange Telenet DHCP behavior, but that must have been at least two years ago. I switched to a more sensible ISP 1.5y ago.

Posted by Amedee at Thu Oct 22 15:37:45 2009
