[personal profile] roadrunnertwice

So if you have access to an Apache2 server that allows .htaccess overrides and has mod_actions turned on, you can make a single CGI script take over the whole URL hierarchy for an entire site. (Or just for a subtree of it, although the app would need to be aware and ready for that.)

In short, you make a new directory called __internal (or something) at the top of your site, and put your CGI executable in there with a filename of my-app.cgi (or something). Then you make TWO .htaccess files.

The root-level .htaccess disables special handling for bare directories, then tells the server to unconditionally use your CGI script to handle every URL pointing into your site, without consideration for whether a path would otherwise aim at a file on disk.

# Root-level .htaccess file
Options -Indexes
DirectoryIndex disabled
Action my-app "/__internal/my-app.cgi" virtual
SetHandler my-app
# Accepting path info is already the default for CGI scripts, but still
AcceptPathInfo On

That CGI path in the Action directive needs to be a URL path pointing at somewhere reachable on your site, rather than a path on disk. That's kind of odd, and it hung me up for a while when I was trying to get this working! But the upshot is, we now need a second .htaccess in that __internal directory that undoes everything we did in the root-level .htaccess so that the server can actually resolve that script. (Otherwise the request for the script itself gets handed right back to the my-app handler, you end up in a recursive loop, and the site doesn't work.)

# .htaccess file in /__internal
Options +ExecCGI -Indexes
SetHandler None
AddHandler cgi-script .cgi

Ta-daaaa! Now your program can handle all the top-level routing for your site, using CGI vars like REQUEST_URI to reconstruct the original request and do your routing. (And don't worry about needing to keep __internal private or anything, it just needed some kind of weird name to avoid trampling on any of your app's real URL paths.)
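
To make that last bit concrete, here's a minimal sketch of what a front-controller CGI script could look like. It's in Python purely for illustration; the routes (/ and /posts/...) and the handler names are made up, and a real app would want proper method handling, headers, and error pages. The only things it leans on from the setup above are that the server runs the script for every request and passes the original URL in REQUEST_URI.

```python
#!/usr/bin/env python3
"""Sketch of a CGI front controller (a hypothetical my-app.cgi)."""
import os
import sys
from urllib.parse import urlsplit, unquote


def handle_home(path):
    # Hypothetical handler for the site root.
    return "200 OK", "Home page\n"


def handle_post(path):
    # Hypothetical handler: everything after /posts/ is treated as a slug.
    slug = path[len("/posts/"):]
    return "200 OK", f"Post: {slug}\n"


def route(request_uri):
    """Map the original request URI to a (status, body) pair."""
    # Drop the query string and decode percent-escapes before matching.
    path = unquote(urlsplit(request_uri).path)
    if path == "/":
        return handle_home(path)
    if path.startswith("/posts/") and len(path) > len("/posts/"):
        return handle_post(path)
    return "404 Not Found", f"No route for {path}\n"


def main(environ=os.environ, out=sys.stdout):
    # The server sets REQUEST_URI to the URL the client actually asked for,
    # not the /__internal/my-app.cgi path it resolved internally.
    status, body = route(environ.get("REQUEST_URI", "/"))
    # A CGI response is headers, a blank line, then the body.
    out.write(f"Status: {status}\r\n")
    out.write("Content-Type: text/plain; charset=utf-8\r\n\r\n")
    out.write(body)


if __name__ == "__main__":
    main()
```

With something like this saved as __internal/my-app.cgi and marked executable, a request for /posts/hello should land in handle_post, and anything unrecognized gets a 404 from the app itself rather than from the web server.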


A lengthy digression on the nature of Script Soup

Let's distinguish three kinds of website you can serve:

  1. Pure static content. There's Some Fuckin Files and httpd will serve them.
  2. Monolithic app. You write some software that takes over the entire URL hierarchy; it Knows Things About HTTP, it receives every request to the site, and it makes informed central routing decisions about what code to run in response. This is the default modality for modern server-side software.
  3. Script soup, or: "the mildly dynamic website". A free mixture of static files and executable programs. Instead of having an all-knowing router at the top level that decides what to do with each request, the web server makes local, delegated decisions about whether to serve a plain file or run some dynamic code. This can be based on file extensions (.php means it's a script you execute in a PHP interpreter), preconfigured special directories (/cgi-bin/ probably contains CGI scripts), or one-off configuration directives. This is the "traditional" default modality.

Script soup has a lot of really admirable and convenient properties! (So durable and stable! Static portions are bulletproof! Loosely coupled polyglot programming!) It's also low-key incredibly fucking weird, if you encounter it while trying to craft a site as a unified corpus of code. It basically turns an app inside out and detonates it, so that the URL routing dictates the division of sub-executables on disk (and some kinds of URL structures become impossible without fuckery like mod_rewrite rules, which further explode your core app logic out into the server configuration).

It's so different from the environment a monolithic app expects that the hosting models basically don't overlap; if you write software for one paradigm, you're sort of locked out of the other. Which is real unfortunate, because the traditional script soup hosting model is vastly superior for a modestly-technical user who wants to run a site without owning a bunch of infrastructure. (You may have heard this before.)

Personally, I want the power to write monolithic or semi-monolithic software to share with modestly-technical users who can't tolerate infrastructure more complex than a user account on shared hosting. There's also an opposite direction to come from, about trying to make mildly-dynamic content-centric sites more portable, declarative, and encapsulated (because currently they tend to leak their core logic out into server configuration and host-specific assumptions); I think that one's important too, but I don't have as many useful thoughts about it.

I've had some previous success with FastCGI (that's the whole Busriders mk 1 effort that I used to ship the Eardogger rewrite). But subsequent research sez I cannot use this to distribute useful software to normal people. FastCGI is largely forgotten tech, and the number of hosts I could find that definitely support it (in the "will launch arbitrary apps" mode, not the "we technically use this to run your PHP, but you can't access the pipe yourself" mode) was in the low single digits.

That basically leaves CGI as the Entry Point Of The People, but it took me a while to find a way around the "URL hierarchy explodes onto disk" problem. (mod_fcgid had an easier path around that, due to its concept of a "script wrapper.") Now that I've solved that, well, who knows. The concurrent sqlite writers problem is still potentially annoying, but I'm sure there's ways to cope.
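
For what it's worth, one plausible way to cope with concurrent writers (a sketch, not something I've battle-tested under this setup) is to lean on SQLite's busy timeout and grab the write lock up front, so that overlapping CGI processes queue up briefly instead of failing. The table, the function name, and the five-second timeout below are all made up for illustration, and again it's Python just to keep the example short.

```python
import sqlite3


def record_hit(db_path, path):
    """Increment a per-path counter and return the new count."""
    # timeout=5.0 makes this connection wait up to five seconds for a lock
    # held by another process instead of failing immediately.
    # isolation_level=None turns off implicit transactions so that the
    # BEGIN and COMMIT below are the only ones in play.
    conn = sqlite3.connect(db_path, timeout=5.0, isolation_level=None)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hits "
            "(path TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        # BEGIN IMMEDIATE takes the write lock right away, so two writers
        # cannot both start as readers and then collide when upgrading.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("INSERT OR IGNORE INTO hits (path, n) VALUES (?, 0)", (path,))
        conn.execute("UPDATE hits SET n = n + 1 WHERE path = ?", (path,))
        count = conn.execute(
            "SELECT n FROM hits WHERE path = ?", (path,)
        ).fetchone()[0]
        conn.execute("COMMIT")
        return count
    finally:
        # Closing without a COMMIT discards any unfinished transaction.
        conn.close()
```

Since every CGI request is its own short-lived process, each one opens the database, does its write, and closes again; the timeout is what keeps a small burst of simultaneous requests from turning into "database is locked" errors.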