Problems with Hash Fragment Subscriptions

Superfeedr has an awesome feature that lets you subscribe to fragments of an HTML page rather than the whole thing. It is perfect for cases where the publisher of the page doesn’t send PubSubHubbub pings to tell the hub when content has changed: without fragments, the hub has to poll the page and notifies subscribers on even the smallest change to it.

Julien wrote a blog post detailing exactly how to set this up: Indieweb Microformats Fragment Subscriptions.

As a standalone feature of Superfeedr this is an extremely useful thing to be able to do, and in theory it should also fit right into the new content-agnostic PuSH 0.4 spec:

  • A publisher knows that the hub they ping supports fragment subscriptions
  • Knowing this, when a potential subscriber makes a GET request for the URL they want to subscribe to, with a fragment identifier on it, the publisher returns a response with a Link rel=self header pointing to the URL including the fragment identifier
  • When there’s new content at that URL, the publisher pings the hub about the URL without the fragment
  • The hub figures out that the content identified by the fragment for this particular subscriber has updated, so sends a ping to the subscriber
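
The last step of that flow can be sketched roughly as follows. This is a toy, in-memory illustration rather than how Superfeedr or any real hub works: `FragmentHub`, its method names and the stand-in `first_line_or_all` extractor are all hypothetical, and a real hub would extract the fragment's content from parsed HTML rather than from lines of text.

```python
import hashlib
from urllib.parse import urldefrag


def first_line_or_all(html, fragment):
    # Stand-in extractor (hypothetical): "#first" selects the first line,
    # no fragment selects the whole document.
    return html.splitlines()[0] if fragment == "first" else html


class FragmentHub:
    """Toy hub: `extract` is any callable (html, fragment) -> str."""

    def __init__(self, extract):
        self.extract = extract
        # base URL -> {(fragment, callback): digest of last seen content}
        self.subscriptions = {}

    def subscribe(self, topic, callback):
        # Split "page#fragment" so pings for the bare page can be matched.
        base, fragment = urldefrag(topic)
        self.subscriptions.setdefault(base, {})[(fragment, callback)] = None

    def publish(self, base_url, html):
        """Handle a ping for the fragment-less URL; return callbacks to notify."""
        to_notify = []
        subscribers = self.subscriptions.get(base_url, {})
        for (fragment, callback), old_digest in list(subscribers.items()):
            content = self.extract(html, fragment)
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if digest != old_digest:  # the first ping always counts as a change
                subscribers[(fragment, callback)] = digest
                to_notify.append(callback)
        return to_notify
```

With one subscriber on `https://example.com/notes#first` and another on the bare URL, a ping whose content differs only after the first line would notify just the second subscriber.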

Unfortunately, there are some practical problems with this approach.

Firstly, it adds more complexity to hubs. Whilst on the face of it this isn’t a huge problem, any added complexity will make it harder for people to build and host hubs. Additionally, it’s another thing for people to think about when choosing a hub to use.

The second, more significant problem is that the discovery flow demands that the publisher

  • know that its hub supports fragment subscriptions,
  • be able to tell which requests include fragment identifiers, and
  • be able to adjust the response accordingly.

There are cases in which one or more of these might not be possible, or feasible.

For a publisher to know whether or not their hub supports fragment subscriptions is yet another thing to configure or discover, each of which has its own problems and complexities. Furthermore, a publisher can list more than one hub for any resource, so what if there’s mixed support for fragment subscriptions? One potential answer is for the publisher to detect whether the current request has a fragment identifier and adjust the output accordingly, but that runs into the second and third problems:

It is not always possible for a publisher to determine whether a request was made with a fragment identifier. I tested this using PHP and Django projects I had to hand.

Given a PHP file containing the following code:

<?php
print_r($_SERVER); // dump every request variable the server passed to PHP

a GET request with a fragment identifier results in responses very much like this:

Array
(
    [HTTP_HOST] => waterpigs.co.uk
    [PATH] => …
    [SERVER_SIGNATURE] => 
    [SERVER_SOFTWARE] => …
    [SERVER_NAME] => waterpigs.co.uk
    [SERVER_ADDR] => 94.76.254.5
    [SERVER_PORT] => 80
    [REMOTE_ADDR] => …
    [DOCUMENT_ROOT] => /path/to/waterpigs.co.uk/www/web
    [REQUEST_SCHEME] => http
    [CONTEXT_PREFIX] => 
    [CONTEXT_DOCUMENT_ROOT] => /path/to/waterpigs.co.uk/www/web
    [SERVER_ADMIN] => barnaby@waterpigs.co.uk
    [SCRIPT_FILENAME] => /path/to/waterpigs.co.uk/www/web/frag.php
    [REMOTE_PORT] => 56432
    [GATEWAY_INTERFACE] => CGI/1.1
    [SERVER_PROTOCOL] => HTTP/1.0
    [REQUEST_METHOD] => GET
    [QUERY_STRING] => 
    [REQUEST_URI] => /frag.php
    [SCRIPT_NAME] => /frag.php
    [PHP_SELF] => /frag.php
    [REQUEST_TIME_FLOAT] => 1396021769.276
    [REQUEST_TIME] => 1396021769
)

I tested requests to this file served both via Apache and the built-in PHP server, sent from the cURL command-line tool, the PHP cURL extension, the Python requests library and a web browser. None of the responses included a hash fragment anywhere.

Wondering if this might be a deficiency with PHP, I added this view to a Django project:

from django.http import HttpResponse

def fragment_test(request):
    return HttpResponse(request.path)  # echo the path as Django received it

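Served using the built-in Django server, the results were the same: no fragment. This is consistent with how URLs are defined: the fragment identifier is resolved by the client and is separated from the URL before the request is made (RFC 3986, section 3.5), so it never reaches the server. This means that detecting GET requests with fragments is not possible on the server side, whether the project uses PHP, Django or anything else.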
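
The same check can be reproduced without PHP or Django. The following is a minimal sketch using only the Python standard library: it starts a throwaway local server that echoes back the request target it received, then requests a URL with a fragment using `urllib`. The names `EchoPathHandler` and `path_seen_by_server` are my own for illustration, and other HTTP clients may behave differently from `urllib`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Replies with the request target exactly as the server received it."""

    def do_GET(self):
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def path_seen_by_server(path_with_fragment):
    """Start a local server, GET the given path, return what the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d%s" % (server.server_port, path_with_fragment)
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(path_seen_by_server("/frag.php?x=1#.h-entry"))
```

The query string survives the round trip but the `#.h-entry` part does not, because `urllib` removes the fragment before it builds the request.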

There’s also a small issue with the third point: in some cases, for example static sites, there simply isn’t a way to change the links sent in the HTML, precluding the above approach.

Given this evidence, I suspect that using PuSH to subscribe to fragments of a page is going to remain a feature (albeit an exceedingly useful one) of Superfeedr rather than becoming more widely supported as part of the PuSH protocol and ecosystem.

The original problem remains: how to efficiently subscribe to new h-entry content on a page regardless of whether or not the page supports PuSH? Given that a page that does support PuSH will ping only when new content is available, and that Superfeedr supports hash fragment subscriptions, the following algorithm should suffice:

  • GET the resource you want to subscribe to
  • If it supports PuSH, linking to self and a hub, subscribe to updates at that hub. On new notifications, parse the full HTML page and figure out what’s new.
  • If it doesn’t support PuSH, subscribe to the resource plus a #.h-entry hash fragment using Superfeedr. Superfeedr will ping only when the first item with class~=h-entry changes.
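
The decision in those steps can be sketched roughly as below, under a few loudly stated assumptions: discovery only looks at a simplified HTTP Link header (PuSH 0.4 also allows link elements in the document itself, which this ignores), `SUPERFEEDR_HUB` is my assumption of Superfeedr's hub endpoint and should be checked against their documentation, and the function names are hypothetical. The `hub.mode`, `hub.topic` and `hub.callback` parameters are the ones defined by the PuSH spec.

```python
import re
from urllib.parse import urldefrag, urlencode

# Assumption: Superfeedr's hub endpoint; confirm against their documentation.
SUPERFEEDR_HUB = "https://push.superfeedr.com/"


def parse_link_header(value):
    """Return {rel: url} for a simple Link header value (first URL per rel wins)."""
    links = {}
    for url, params in re.findall(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)', value):
        match = re.search(r'rel\s*=\s*"?([^";]+)"?', params)
        if match:
            for rel in match.group(1).split():
                links.setdefault(rel.lower(), url)
    return links


def plan_subscription(page_url, link_header):
    """Pick the hub and topic to subscribe to for a page we have just fetched."""
    links = parse_link_header(link_header or "")
    if "hub" in links and "self" in links:
        # The page supports PuSH: subscribe to the whole page at its own hub.
        return {"mode": "push", "hub": links["hub"], "topic": links["self"]}
    # No PuSH support: fall back to a fragment subscription via Superfeedr.
    base, _ignored_fragment = urldefrag(page_url)
    return {"mode": "fragment", "hub": SUPERFEEDR_HUB, "topic": base + "#.h-entry"}


def subscription_form(plan, callback_url):
    """Encode the form body to POST to plan['hub'] for a subscription request."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": plan["topic"],
        "hub.callback": callback_url,
    })
```

The returned plan only says where to subscribe. Actually sending the POST, authenticating with Superfeedr and answering the hub's verification request are left out of this sketch.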

Initially, this appears to be the most sensible combination of both approaches and should result in manageable and consistent updates for the consumer, and the easiest possible, static site-compatible workflow for the publisher.

I have implemented a basic publisher workflow on waterpigs.co.uk, and plan to implement subscribers and eventually hubs too. I’ll be documenting what I learn here and on indiewebcamp.com/PubSubHubbub, and encourage you to do the same!