Creating a simple link registry

Posted by Matthias Noback

The problem: if you publish a document as a PDF, in print, etc., and the text contains URLs, there is a chance that one day those URLs will stop working. There's nothing you can do about that; it happens.

Luckily, this is a solved problem. The solution is to link to a stable and trustworthy website, that is, one that you maintain and host (of course, you're trustworthy!). Then in the document you link to that website, and the website redirects visitors to the actual location.

An example: my book contains a link to https://enjoy.gitstore.app/repositories/matthiasnoback/read-with-the-author. When I moved that repository to a new organization on GitHub, this link resulted in a 404 Page not found error. The proper URL is now https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author. Chris from Gitstore was able to save the day by setting up a redirect on their site, but I wanted to make sure this kind of problem would never be a problem for me again.

The ingredients for the solution:

  • A domain name (I registered advwebapparch.com)
  • A simple website that can redirect visitors to the actual locations

I wanted to hook this new website into my existing Docker-based setup, which uses Traefik to forward traffic to the right container based on labels. It turns out that with a simple Nginx image and a bit of custom configuration we can easily set up a website that redirects visitors.

The Dockerfile for such an image:

FROM nginx:stable-alpine
COPY default.conf /etc/nginx/conf.d/default.conf

Where default.conf looks like this:

server {
    listen 80 default_server;
    index index.html;
    root /srv;

    error_page 404 /404.html;

    rewrite /repository https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author redirect;
}

This already works: when I deploy the resulting image to the server that receives traffic for advwebapparch.com, a request for /repository indeed redirects the visitor to https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author using a temporary (302) redirect.

Generating the Nginx configuration from a text file

When I'm working on my book, I don't want to manually update a server configuration file every time I'm adding a URL. Instead, I'd like to work with a simple text file. Let's name this file forwards.txt:

/repository https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author
/blog https://matthiasnoback.nl

And then I want the Docker image build process to add the rewrite rules automatically, so I wrote a little PHP script that does this and runs during the build. Here's what the Dockerfile looks like. It uses a multi-stage build:

FROM php:7.4-alpine as php
# This will copy the build context (build.php, template.conf, forwards.txt) to the image
COPY . .
# This will generate default.conf based on template.conf
RUN php build.php

FROM nginx:stable-alpine
# Copy the default.conf from the php image to the nginx image
COPY --from=php default.conf /etc/nginx/conf.d/default.conf

Here's what happens inside the PHP script:

function insertRewritesInNginxConf(string $conf): string
{
    $rewrites = [];

    foreach (file('forwards.txt') as $line) {
        $line = trim($line);
        if (empty($line)) {
            continue;
        }

        $rewrites[] = '    ' . 'rewrite ' . $line . ' redirect;';
    }

    return str_replace(
        '%INSERT_URL_REWRITES_HERE%',
        implode("\n", $rewrites),
        $conf
    );
}

/*
 * Generate the Nginx configuration which includes all the actual
 * redirect instructions
 */
file_put_contents(
    'default.conf',
    insertRewritesInNginxConf(file_get_contents('template.conf'))
);
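The script refers to template.conf, which isn't shown here. As a rough sketch, assuming it is simply the earlier default.conf with the hard-coded rewrite rule swapped for the placeholder, the substitution step can be tried out in isolation like this. The helper renderRewrites() is a hypothetical variant of insertRewritesInNginxConf() that takes the lines as an argument instead of reading forwards.txt:

```php
<?php
declare(strict_types=1);

// Assumed content of template.conf: the default.conf shown earlier, with the
// hard-coded rewrite rule replaced by the placeholder the build script expects.
$template = <<<'CONF'
server {
    listen 80 default_server;
    index index.html;
    root /srv;

    error_page 404 /404.html;

%INSERT_URL_REWRITES_HERE%
}
CONF;

// Hypothetical helper: same idea as insertRewritesInNginxConf(), but the lines
// are passed in, so the example runs without a forwards.txt file.
function renderRewrites(array $lines, string $conf): string
{
    $rewrites = [];

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }

        $rewrites[] = '    rewrite ' . $line . ' redirect;';
    }

    return str_replace('%INSERT_URL_REWRITES_HERE%', implode("\n", $rewrites), $conf);
}

$generated = renderRewrites(
    [
        '/repository https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author',
        '',
        '/blog https://matthiasnoback.nl',
    ],
    $template
);

echo $generated, "\n";
```

The output is the server block with one rewrite line per non-empty entry in place of the placeholder.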

We should add a bit of validation for the data from the forwards.txt file so we don't end up with a broken Nginx configuration, but otherwise, this works just fine.
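As a minimal sketch of what that validation could look like: the function name parseForwards() and the exact rules (a conservative character set for paths, http(s) targets only) are my assumptions here, not part of the original script.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator: turns the lines of forwards.txt into
 * [source path => target URL] pairs, or throws on the first bad line.
 */
function parseForwards(array $lines): array
{
    $forwards = [];

    foreach ($lines as $index => $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }

        $lineNumber = $index + 1;

        // Exactly two whitespace-separated fields: the path and the target URL
        $parts = preg_split('/\s+/', $line);
        if (count($parts) !== 2) {
            throw new InvalidArgumentException("Line $lineNumber: expected '<path> <url>'");
        }
        [$source, $target] = $parts;

        // Assumed rule: paths only use characters that have no special
        // meaning in an Nginx rewrite rule
        if (preg_match('#^/[A-Za-z0-9/_-]+$#', $source) !== 1) {
            throw new InvalidArgumentException("Line $lineNumber: invalid path '$source'");
        }

        // The target must be an absolute http(s) URL without characters that
        // could end or otherwise break the Nginx directive
        if (filter_var($target, FILTER_VALIDATE_URL) === false
            || preg_match('#^https?://#', $target) !== 1
            || strpbrk($target, ';{}"\'$\\') !== false
        ) {
            throw new InvalidArgumentException("Line $lineNumber: invalid target URL '$target'");
        }

        if (isset($forwards[$source])) {
            throw new InvalidArgumentException("Line $lineNumber: duplicate path '$source'");
        }

        $forwards[$source] = $target;
    }

    return $forwards;
}

$forwards = parseForwards([
    '/repository https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author',
    '/blog https://matthiasnoback.nl',
]);
```

In build.php this could be called as parseForwards(file('forwards.txt')) before anything is generated, so a bad line fails the image build instead of producing a configuration Nginx refuses to load. Having the path and target as separate values would also make it possible to generate an anchored rule such as rewrite ^/blog$ ..., since Nginx treats the first argument of rewrite as a regular expression.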

I don't want to manually check that all the links that are inside the "link registry" still work. Instead, I'd like to use Oh Dear for that, which does uptime monitoring and checks for broken links as well.

For this purpose I added another function to the PHP script which, based on the same forwards.txt file, generates an HTML index page containing all the links. Oh Dear can easily crawl such a page and will send me an email if a link no longer works.

function insertLinksInHtmlTemplate(string $htmlTemplate): string
{
    $links = [];

    foreach (file('forwards.txt') as $line) {
        $line = trim($line);
        if (empty($line)) {
            continue;
        }

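        // Only the source path is needed: the index links to our own
        // redirects, so a crawler follows them to the actual targets
        [$linkSource] = explode(' ', $line, 2);
        $linkSource = htmlspecialchars($linkSource);

        $links[] = '    ' . '<li><a href="' . $linkSource . '">'
            . $linkSource . '</a></li>';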
    }

    return str_replace(
        '%INSERT_LINKS_HERE%',
        implode("\n", $links),
        $htmlTemplate
    );
}

/*
 * Generate the index HTML file including all the redirect links,
 * which can also be used to keep track of broken links.
 */

file_put_contents(
    'index.html',
    insertLinksInHtmlTemplate(file_get_contents('template.html'))
);
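The template.html file isn't shown either. Assuming it is a bare HTML page with the placeholder inside a list, the result can be previewed with a small standalone sketch. The markup and the helper name renderLinks() are illustrative guesses:

```php
<?php
declare(strict_types=1);

// Assumed content of template.html: a minimal page with the placeholder
// inside an unordered list.
$template = <<<'TEMPLATE'
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Links</title>
</head>
<body>
<ul>
%INSERT_LINKS_HERE%
</ul>
</body>
</html>
TEMPLATE;

// Hypothetical helper: builds one list item per source path and puts the
// items where the placeholder is.
function renderLinks(array $lines, string $htmlTemplate): string
{
    $links = [];

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }

        // The first field is the path on our own site
        [$linkSource] = explode(' ', $line, 2);
        $linkSource = htmlspecialchars($linkSource);

        $links[] = '    <li><a href="' . $linkSource . '">' . $linkSource . '</a></li>';
    }

    return str_replace('%INSERT_LINKS_HERE%', implode("\n", $links), $htmlTemplate);
}

$html = renderLinks(
    [
        '/repository https://enjoy.gitstore.app/repositories/read-with-the-author/read-with-the-author',
        '/blog https://matthiasnoback.nl',
    ],
    $template
);

echo $html, "\n";
```

Each list item links to a path on the registry itself rather than to the final target, so a crawler that follows the links exercises the redirects too.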

Now we only need to copy the second build artefact, index.html, to the Nginx image as well, and we're done.

FROM php:7.4-alpine as php
COPY . .
RUN php build.php

FROM nginx:stable-alpine
COPY --from=php default.conf /etc/nginx/conf.d/default.conf
COPY --from=php index.html /srv

As a bonus, by crawling our index page, Oh Dear will also "test" our redirect functionality. If that doesn't work, it will also be seen as a broken link.

Conclusion

It's amazing how much is possible using a few simple but powerful tools like Docker and Traefik, and useful base images like those for Nginx and PHP. I'm pretty happy with the result: a statically generated link registry website that can be rebuilt and redeployed in seconds.
