Uncle Dave’s Proxy Toolkit

I’ve spent much of the last several weeks at work deploying a proxy service. A proxy is a service that can retrieve and cache Web pages on behalf of a large number of users.

In theory, you can use it to save bandwidth and protect your users by stopping viruses and such before they reach the users’ desktops. In practice, it’s mostly used to make sure your employees aren’t screwing around on Facebook at work.

There are two main parts to offering a proxy service: The proxy server itself, and the necessary endpoint configuration.

The easy part was the proxy server. (Okay, three proxy servers. HA and failover, y’know.) We chose Cisco’s Web Security Appliance, which was low-cost (we already do a lot of business with Cisco) and dead-simple to deploy (it’s a VM image; you just have the VM people install it and configure the initial IP address).


The hard part, of course, is the desktop configuration. There are several ways to push proxy configuration to an endpoint, and that’s just thinking about Windows desktops: Group Policy, DHCP, DNS, oh, and everything for Firefox is different from Internet Explorer and Google Chrome. We’re still fine-tuning some of the details, but things are mostly working at this stage.

Disregarding HOW you push the settings, there are two common kinds of settings. The first is a simple “everything goes through proxy X on port Y.” The second is a PAC (Proxy Auto-Configuration) file, which is a small chunk of JavaScript code that the browser runs to determine which proxy or proxies to use, if any, for a given Web request. We’re using a PAC file.

Here are a few links and notes, mostly for my own memory in case something like this comes up again…

  • Find Proxy For URL — Get this URL tattooed on the inside of your eyelids. It details PAC file syntax, covers what functions aren’t likely to work on what browsers, has good working examples. You could almost refer JUST to this site.
  • pacparser — This is a small library and (Unix/Linux) program for testing and validating PAC files. Once our initial deployment is complete and we’ve handled all the squirrely edge cases, I’m going to put our PAC files in version control and try to cook up an automated testing script using this.
  • PacDbg for Windows — more of a Windows person? Here’s the same kind of stuff but wrapped in a nice, functional GUI.

Stupid problems we ran into:

  • One of our common work sites is actually someone else’s, and their proxy settings were overriding ours in most instances. This was hard to track down, because the Internet Explorer settings window still looked “right”. Their network was using, in different places, both DNS WPAD (where your machine is told to download proxy settings simply because a given hostname exists) and DHCP WPAD (where their network sends over a proxy configuration as soon as you connect). Both can be real pains to track down.
  • If your proxy isn’t public (and it really shouldn’t be), there will be times when it’s not reachable. Ours, for instance, is on an internal network, which works great until your users take their laptops home and the proxy becomes unreachable. For the most part, you can work around this with a well-crafted PAC file (make sure you put a “DIRECT” entry at the end of every return line). But if you travel a lot, or switch between networks regularly, Windows may be tricking you by reusing cached proxy results. Read up on how to disable Windows’ proxy results cache. PC resources are so cheap, nobody will even notice. You may also want to consider a PowerShell script that modifies the proxy settings when you change networks (you can trigger it with Windows Task Scheduler).
  • If your machines will regularly be used on a restrictive network (government facility, hospital, things like that), be sure you have sensible defaults. You might want everything in yourdomain.com to route DIRECT, because most of that stuff is on-site, but so many things are hosted in public clouds these days that this may not be optimal. You’ll want to test as much as you can, and periodically re-test to be sure some key service hasn’t moved off-site. Similarly, test from as many networks as you can. Take a test machine home, maybe, or at least to the coffee shop across the street.
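The “DIRECT at the end of every return line” advice is easy to check mechanically. A PAC return value is a semicolon-separated list that the browser tries in order, so a small lint can confirm that the last entry is DIRECT. Here’s a minimal sketch in Node.js; parsePacResult and endsWithDirect are hypothetical helper names, and the proxy hostname is a placeholder.

```javascript
// Split a PAC return string such as "PROXY host:port; DIRECT" into an
// ordered list of { type, target } entries.
function parsePacResult(result) {
    return result
        .split(";")
        .map((entry) => entry.trim())
        .filter((entry) => entry.length > 0)
        .map((entry) => {
            const [type, target] = entry.split(/\s+/);
            return { type: type.toUpperCase(), target: target || null };
        });
}

// True when the final fallback in the chain is DIRECT, meaning the browser
// can still reach the site when every listed proxy is unreachable.
function endsWithDirect(result) {
    const chain = parsePacResult(result);
    return chain.length > 0 && chain[chain.length - 1].type === "DIRECT";
}

console.log(endsWithDirect("PROXY proxy.example.test:3128; DIRECT")); // true
console.log(endsWithDirect("PROXY proxy.example.test:3128"));         // false
```

Running a check like this over every return string in a PAC file catches the case where a laptop off the internal network would otherwise be left with no route at all.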

We ultimately settled on a fairly small PAC file that handles our internal network (all 10.x.x.x IPs) separately, then sends everything else through our proxy. Here’s what it looks like:

function FindProxyForURL(url, host) {
    // Set up a default response
    var retstr = "PROXY our.proxy.server:3128; DIRECT";

    // normalize for pattern matching
    host = host.toLowerCase();
    url = url.toLowerCase();

    // single-element hosts (e.g. http://intranet/)
    if (isPlainHostName(host)) return "DIRECT";

    // If you want to exempt hostnames from the proxy, do so here
    // if (dnsDomainIs(host, '.yourcompany.com')) return "DIRECT";

    // DNS lookups can block, so we wait until after the common cases (above)
    var hostIP = dnsResolve(host);

    // Selected IP ranges are DIRECT, no proxy used
    if (isInNet(hostIP, "10.0.0.0", "255.0.0.0")) return 'DIRECT';      // our internal network
    if (isInNet(hostIP, "172.16.0.0", "255.240.0.0")) return 'DIRECT';  // someone else's internal network
    if (isInNet(hostIP, "127.0.0.0", "255.0.0.0")) return 'DIRECT';     // localhost

    // Capture all the protocols we can actually handle
    if ( (url.substring(0,5) == 'http:') ||
        (url.substring(0,6) == 'https:') ||
        (url.substring(0,4) == 'ftp:') ) {
            return retstr;
    }

    // Default for protocols we can't handle anyway
    return 'DIRECT';
}

 

  • Setting up SSL interception (it’s as creepy as it sounds) was difficult, but that’s mostly on the desktop people. You have to make sure all your machines trust the self-signed certificate you’re using. (And it has to be a self-signed certificate, or at least signed by an internal certification authority, because no legitimate CA will issue you a cert for this.)