On Subdomain
Thu, January 24, 2008 - 6:45 PM PST
So today at work I'm working on a fairly interesting problem. A customer is having a problem that we think is stemming from a shared library that we ship on one of our products. We cannot reproduce this problem, but the customer sees the issue several times a day, so we are rolling an instrumented binary that'll give us more information on what's happening so that we can address the issue.
This particular shared library is linked to by almost every custom daemon that we ship - and that's like, 20 or 30 fairly big and complex programs. Hence, we'd rather not replace the system-wide library, since any change might have serious unintended consequences, and on a customer's production systems, that would be the very definition of "a bad thing". So I suggest that we force this one program to use our custom version by using some linker trickery, and it's agreed that in this case it makes sense.
So I get to playing around with it all and I notice that when I try to view what this daemon is linked against, I don't get any output. ldd works for almost everything else I try it on, and when I copy the binary to my workstation it works, but not on my test system. I try it on a few others, and they all fail. WTF? I play around with ldd and $LD_TRACE_LOADED_OBJECTS and copy over binaries for readelf and other binutils and I'm just getting pissed off because it just doesn't make any sense. So I bring it up in a meeting and a little while later, a coworker tells me he was hacking around and figured it out: subdomain.
Subdomain is some sort of kernel level policy tool that we ship that polices access to files and other resources. And since this daemon wasn't allowed to, get this, write to my pseudoterminal, the dynamic linker couldn't either when it was invoked, hence no output. Run it from a system console, and it worked fine. Gah.
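For the record, the trick ldd relies on is just setting LD_TRACE_LOADED_OBJECTS and running the binary, so the glibc dynamic linker prints what it resolved instead of starting the program. Here's a rough Python sketch of that, assuming a glibc system. The function names and the sample library names in the usage note are mine, made up for illustration. It captures the output through a pipe rather than the terminal; I have no idea whether that would have gotten past the policy in this case.

```python
import os
import subprocess


def parse_trace_output(text):
    """Turn ldd-style lines into {soname: resolved path, or None if not found}."""
    resolved = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # the vdso and the loader itself usually have no "=>"
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            resolved[name] = None
            continue
        # Drop the trailing "(0x...)" load address.
        path = target.rsplit("(", 1)[0].strip()
        if path:
            resolved[name] = path
    return resolved


def trace_loaded_objects(binary):
    """Ask the dynamic linker what `binary` would load, without running it."""
    env = dict(os.environ)
    env["LD_TRACE_LOADED_OBJECTS"] = "1"
    result = subprocess.run(
        [binary],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    return parse_trace_output(result.stdout)
```

Something like trace_loaded_objects("/bin/ls") should hand back a dict of library names to paths on a typical Linux box. An empty dict for a binary that is obviously dynamically linked is the symptom I was staring at.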
So I'm hacking around and I go to set $LD_LIBRARY_PATH so that I can fool the daemon into using my shared object and not the one in /usr/lib, only it doesn't take it. So I bust out strace and see that it's trying to load it, but the open is returning -1 with errno set to EACCES (Permission denied). So I start checking out permissions and ownership and just really confuse myself. Even with the library sitting in / with 777 permissions, I can't get it to link. And then it hit me: subdomain.
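In hindsight, the tell was an EACCES on a file whose mode bits said anyone could read it. A small check along these lines might have saved me some head-scratching. It's only a sketch: diagnose_open is a made-up helper, and it only looks at the file's own mode bits, not the directories above it, so treat the verdict as a hint and nothing more.

```python
import errno
import os
import stat


def diagnose_open(path):
    """Try a read-only open and say whether a denial matches the mode bits."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno != errno.EACCES:
            # Some other failure, e.g. ENOENT; report its symbolic name.
            return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
        try:
            mode = os.stat(path).st_mode
        except OSError:
            return "denied: cannot even stat the file"
        if mode & stat.S_IROTH:
            # World-readable yet denied: something other than the file's
            # mode bits (a parent directory, or a policy layer) said no.
            return "denied despite world-readable mode bits"
        return "denied: mode bits may explain it"
    os.close(fd)
    return "ok"
```

Run under the same confinement as the daemon, "denied despite world-readable mode bits" would have pointed away from chmod and chown a lot sooner.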
I didn't even fix the problem. I just got up from my desk and left for the day. I can't believe that the same problem tripped me up twice within the same afternoon.
Never again. Never again.

