Web Actions

My take on web actions as a new building block for the web. Includes detailed examples and nice drawings.

Background

“By limiting login or social sharing buttons, a publisher assumes that you use one of those services. By not limiting them, they assume you care enough about your content to search through a list.

Both of these are arrogant assumptions, not sustainable in the long run”

Barnaby Walters, 2012

What is a Web Action?

A Web Action framework consists of several loosely coupled pieces of software which allow a user to specify particular services for a user agent to delegate certain actions to.

    Web Action Delegate
    A service which provides the facility to carry out a given verb.
    Web Action Delegate URI
    A URI which provides an interface for carrying out a verb. The URI may accept query string parameters which auto-fill fields. If the interface is web-based, it may be used within an iframe, in which case it is referred to as an Embedded Delegate UI.
    Web Action Provider
    A user agent or user agent extension which allows users to specify, in some config UI, which Web Action Delegate URIs to use for particular actions. It may also expand variables in URI templates (e.g. current URI, date, etc.) and may perform UI injection into web pages.
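
To make the template expansion a little more concrete, here is a minimal Python sketch of what a Web Action Provider might do. The function name expand_template, the {date} variable and the bookmark template are my own assumptions for illustration; nothing here is a specified format.

```python
from datetime import date
from urllib.parse import quote


def expand_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {name} placeholder with its percent-encoded value."""
    expanded = template
    for name, value in variables.items():
        # safe="" also encodes "/" and ":", so a whole URL fits in one parameter.
        expanded = expanded.replace("{" + name + "}", quote(value, safe=""))
    return expanded


# Hypothetical values a provider might supply at click time.
variables = {
    "url": "http://example.com/post/1",
    "date": date(2013, 1, 1).isoformat(),
}
print(expand_template("http://publisher.com/bookmark?url={url}&on={date}", variables))
# http://publisher.com/bookmark?url=http%3A%2F%2Fexample.com%2Fpost%2F1&on=2013-01-01
```

Encoding every value in full means the delegate can recover it with any standard query string parser.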

Design Goals

  • The user must be able to control what services handle what verbs
  • There must be no inter-app authentication/authorisation
  • URI and related standards must be used as the interface between services
  • All parts of the system should be easy to implement

They Already Exist

Web actions already exist, and not only on the web.

  • Twitter’s Web Intents
    • These are UIs (hyperlinks/buttons) and URL templates allowing a 3rd party site to offer the user an interface for liking/replying to/retweeting a tweet, then delegating the task to Twitter.
    • The 3rd party site can specify text, username and URI to pre-load the tweet interface with
    • The 3rd party site never had to authenticate with Twitter, or make any requests to twitter.com
  • mailto: hyperlinks
    • These allow a 3rd party site to offer a proxy to an interface for sending a message
    • The 3rd party site can specify text for the mail interface to be preloaded with
    • The 3rd party site never had to know what OS or mail client the user was using
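
The mailto: case is simple enough to sketch in a few lines of Python. This is only an illustration: mailto_link is a hypothetical helper name and the address and message are placeholders.

```python
from urllib.parse import quote


def mailto_link(to: str, subject: str = "", body: str = "") -> str:
    """Build a mailto: URI with an optional pre-filled subject and body."""
    fields = []
    if subject:
        fields.append("subject=" + quote(subject, safe=""))
    if body:
        fields.append("body=" + quote(body, safe=""))
    query = "?" + "&".join(fields) if fields else ""
    # Keep "@" readable in the address; everything unsafe is percent-encoded.
    return "mailto:" + quote(to, safe="@") + query


print(mailto_link("someone@example.com", subject="Re: your post", body="Nice article!"))
# mailto:someone@example.com?subject=Re%3A%20your%20post&body=Nice%20article%21
```

The site publishing this link knows nothing about which mail client will open it, which is exactly the property web actions are after.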

There are many other examples, including some which publish Web Action Delegates without documenting them (Readability, for example). I am documenting these over at the IndiewebCamp.com wiki. TODO: document them here.

The Flow

Here it is in an abstract form:

  1. “Hey Browser, I want to {verb} this {object}” User
  2. “User: You’ve told me you want {X Web Action Consumer} to handle that sort of verb, I’ll GET you their UI for {verb}ing that”
    “Hey, Web Action Consumer: User wants to {verb} this {object}!” Browser
  3. “Browser: User is authenticated with me, so here’s that UI you wanted” Web Action Consumer
  4. “Great, I’ll fill that in and submit it” User
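
Step 2 of the flow above, the browser's part, can be sketched as a lookup from verb to the user's chosen service. This is a rough Python sketch under my own assumptions: the delegates mapping and the delegate_url function are hypothetical names, and a real provider would load the mapping from its config UI.

```python
from urllib.parse import quote

# Hypothetical user configuration: verb -> delegate URI template.
delegates = {
    "reply": "http://publisher.com/reply?url={url}",
    "follow": "http://feedreader.com/follow?url={url}",
}


def delegate_url(verb: str, object_url: str) -> str:
    """Pick the user's delegate for a verb and build the URL the browser should GET."""
    template = delegates.get(verb)
    if template is None:
        # The user has not chosen a service for this verb, so there is nothing to delegate to.
        raise LookupError(f"no delegate configured for verb {verb!r}")
    return template.replace("{url}", quote(object_url, safe=""))


print(delegate_url("follow", "http://userb.com"))
# http://feedreader.com/follow?url=http%3A%2F%2Fuserb.com
```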

Detailed Example

I’m going to use the example of finding content and publishing a reply to it in a distributed environment (i.e. the web), as that’s a flow which:

  • I’m familiar with; and
  • has had a prototype implemented at the time of writing

Components are marked up in bold, verbs are marked up in italics.

Assume the following components exist:

  • A browser which attaches a reply web action UI to any h-entries it sees in markup
  • A feed aggregator/reader which marks up items as h-entries
  • A publishing client which acts as a reply Web Action Delegate and REST client
    • It serves a UI for the reply action at http://publisher.com/reply?url={url}&text={text}, where:
      • {url} is the URL-encoded URL of the content being replied to; and
      • {text} is any text the web action originator sees fit to auto-populate the reply field with
  • A persistent data store which we’ll assume also presents and syndicates data (e.g. a website)
  • A happy and curious user, always willing to participate in conversation on the web

And that the user has taken the following steps:

  • Subscribed to some feeds of content in the feed aggregator
  • Set up the publishing client to publish to her persistent data store
  • Set up her browser to delegate reply actions to the publishing client

Exactly how these steps might take place is either not relevant to this document, or is described in more detail below.

  1. The user browses to her feed reader using her browser (I won’t bother specifying that any more)
  2. The user sees a piece of content she wants to reply to.
    • Its URL is http://exampleuser.com/note/145
    • It was posted by user http://exampleuser.com, whose @-name is @example
  3. As the content is marked up with h-entry, the browser intelligently inserts/otherwise provides a UI element for replying to the content.
  4. When the user activates the reply UI element, the browser navigates to the following URL: http://publisher.com/reply?url=http%3A%2F%2Fexampleuser.com%2Fnote%2F145&text=%40example
  5. If the user is not logged in to the publishing client, she is presented with a login UI which, on success, redirects back to the web action endpoint
  6. The web action endpoint resolves to a UI for posting a reply, auto-populated with @example
  7. Once the user has written out the reply, she submits the form
  8. The publishing client makes some request to the persistent data store, which stores the reply and syndicates it out to any subscribers, including @example, who is notified of the reply
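
On the publishing client's side, step 6 comes down to reading the two query string parameters back out of the request. Here is a minimal Python sketch, assuming the url and text parameter names from the template above; reply_prefill and the in_reply_to key are hypothetical names of mine.

```python
from urllib.parse import parse_qs, urlsplit


def reply_prefill(request_url: str) -> dict[str, str]:
    """Extract the values a reply delegate would use to pre-populate its form."""
    query = parse_qs(urlsplit(request_url).query)
    # parse_qs returns a list per parameter; take the first value or fall back to "".
    return {
        "in_reply_to": query.get("url", [""])[0],
        "text": query.get("text", [""])[0],
    }


incoming = "http://publisher.com/reply?url=http%3A%2F%2Fexampleuser.com%2Fnote%2F145&text=%40example"
print(reply_prefill(incoming))
# {'in_reply_to': 'http://exampleuser.com/note/145', 'text': '@example'}
```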

That’s about as complicated as it gets.

More Example Flows

Most of these are related to the federated social web, as that’s what floats my boat right now. Web Actions are not limited to social activities, however.

Favourite/Like/Star/Props

  1. The user sees some content they like
  2. They activate the relevant browser-inserted UI element (like button, star button, +1 button, whatever)
  3. The browser directs them to the web action endpoint they’ve chosen to handle the like verb, providing the URL
    • e.g. http://publisher.com/like?url=http%3A%2F%2Fexample.com%2Furl%2Fof%2Fcontent
  4. The web action consumer does whatever backend/social logic is required to like that content

Follow

This is a big (important, not complicated) one which has been the target of IMO oft-misdirected work in the past.

  1. The user sees the profile of someone and decides they want to subscribe to their updates
    • As the profile of the soon-to-be-followed user (call them User B) is marked up with h-card, the browser has provided some follow UI
  2. The user activates that UI
  3. The browser navigates to the web action endpoint which has been registered to handle follows
    • e.g. http://feedreader.com/follow?url=http%3A%2F%2Fuserb.com
  4. The web action consumer does whatever backend/social logic is required to follow that user, for example adding them to an XFN listing and/or subscribing to their feed.
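
All of these flows start with the browser noticing microformats in the page and deciding which verbs to offer. Here is a rough Python sketch of that detection step using only the standard library HTML parser. The VERBS_FOR_CLASS mapping and the ActionFinder class are assumptions of mine, and a real extension would use a proper microformats parser instead of just looking at class names.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a microformat root class to the verbs a browser might offer.
VERBS_FOR_CLASS = {"h-entry": ["reply", "like"], "h-card": ["follow"]}


class ActionFinder(HTMLParser):
    """Collect (tag, microformat class, verbs) for every h-entry or h-card element."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, hence the fallback to "".
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in VERBS_FOR_CLASS:
                self.found.append((tag, name, VERBS_FOR_CLASS[name]))


finder = ActionFinder()
finder.feed(
    '<article class="h-entry">'
    '<a class="p-author h-card" href="http://userb.com">User B</a>'
    "</article>"
)
print(finder.found)
# [('article', 'h-entry', ['reply', 'like']), ('a', 'h-card', ['follow'])]
```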

Existing Implementations

There are many existing services which can act as Web Action Delegates, but very few user agents or user agent extensions which provide users with the means to put them to good use. Amongst them are:

  • My fork of Aaron Parecki’s Indieweb Reply cross-browser extension, which allows users to redirect social sharing buttons to Web Action Delegates. Includes support for multiple verbs and URI Templating.
  • Own Your Comments, my cross-browser extension for hijacking existing comments UIs to delegate to Web Action Delegates. The first example of Embedded Web Action Delegates.

To Do

  • What UI/flow should be used to:
    1. make the web action endpoint known to the user/browser; and
    2. allow the user to specify particular endpoints for particular verbs
    • In my implementations I solve the second problem by providing a UI which clearly maps verbs to text fields, with an example URI as a placeholder, a description of what the verb means and where it can be invoked from, and a list of templates which might be available at click time
  • What about the parsing of data out of the content and into the web action URLs?
    • This is an interesting issue which will require a lot of fallback-friendly coding

For the moment, I am more interested in getting an actual implementation of this system up and running across several sites than making it easy for every user to understand, so these notes are here more as a declaration of “hey, I haven’t overlooked the fact that this isn’t an intuitive system to the end user” and an acknowledgement that these issues exist.

Notes

  • Is this RESTful?
    • I’d be inclined to say yes, it is. The web action URL requests can only be GET requests, but they’re always getting a UI from which some other request could happen, so there’s no HTTP verb abuse going on here.
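
To illustrate the GET/POST split, here is a minimal sketch of a reply delegate as a Python WSGI application. It is an assumption-laden toy: the replies list stands in for the persistent data store, login is left out entirely, and the redirect target is made up. The point is only that the GET request returns a UI and changes nothing, while the form submission is a separate POST.

```python
from html import escape
from urllib.parse import parse_qs

replies: list[dict[str, str]] = []  # stand-in for the persistent data store


def app(environ, start_response):
    """GET returns a pre-filled reply form; POST is the only request that stores anything."""
    method = environ["REQUEST_METHOD"]
    if method == "GET":
        query = parse_qs(environ.get("QUERY_STRING", ""))
        url = query.get("url", [""])[0]
        text = query.get("text", [""])[0]
        # escape() stops parameter values from breaking out of the markup.
        form = (
            '<form method="post">'
            f'<input type="hidden" name="url" value="{escape(url)}">'
            f'<textarea name="text">{escape(text)}</textarea>'
            "<button>Reply</button></form>"
        )
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [form.encode("utf-8")]
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        replies.append({"url": fields.get("url", [""])[0], "text": fields.get("text", [""])[0]})
        start_response("303 See Other", [("Location", "/")])  # hypothetical redirect target
        return [b""]
    start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
    return [b""]
```

It can be served locally with wsgiref.simple_server.make_server from the standard library if you want to poke at it in a browser.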