Web Dataflow Itches

A working list of things I want to be able to do with data on the web

ATOM and RSS, combined with UIs like Yahoo Pipes, made data on the web tangible. You could get data from here and here and here, mash it together and make new data (which could then be combined with more data, ad nauseam).

Microformats2’s flexible, easy syntax and meticulously well-researched vocabularies render ATOM and RSS unnecessary, but despite many independents creating amazing tools like indieweb comments with them, there are few tools which make microformats2 data actionable and tangible.

Here are a bunch of raw, unedited ideas I plan on building in the near future:

Personal Itches

I.E. “I want to do X practical thing with data I’m publishing/reading on the web right now”

  • I want to be notified (push notifications to desktop or mobile device) of new content in any given stream.

  • I want to be able to graph my data over time.

    • mixture of mf2 data (p-count) and raw text parsing, must be configurable
    • could easily be extended to any metric
  • I want to be able to run textual/sentiment analysis over my content and visualise patterns

    • E.G. using the emoome API — I had a listener running but it would be silly to reimplement it for every publishing platform, better to create it as a standalone app which consumes mf2 data
  • I want to be notified of links going dead in previous notes

    • maybe also automate fixing them, though initially at least I don’t trust code to figure out the best way of fixing any given broken link
  • I want to be able to see the contents of a stream in a slick UI (maybe updating in realtime) which makes the content actionable (e.g. webactions, integration with posting UI) and tangible.

  • I want to be able to stop publishing ATOM feeds of my content myself, and generate them automatically from the mf2 data

    • Sandeep already made a tool which does this, which he’s selfdogfooding so probably trustable
  • I want to be able to filter the contents of a stream based on mf2 properties, e.g. “show only content tagged with X”.

    • Aside: keeping this sort of logic in the client reduces the pressure on publishers to provide complex APIs, and removes the need for such interoperable APIs to be designed.
  • I want to be able to combine the contents of multiple streams into one stream, to pipe into any of the above items.

  • I want to be able to create dynamic streams, e.g. “content from every person linked to (h-card or XFN) on this page”
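
The filtering and combining itches above can be sketched in a few lines. This is only a minimal sketch, assuming each stream has already been parsed into the canonical microformats2 JSON shape (a list of items, each with "type" and "properties"). The function names filter_stream and merge_streams are made up for illustration, and sorting on the raw "published" string assumes every item uses the same ISO 8601 format and timezone.

```python
from typing import Any, Dict, Iterable, List

Item = Dict[str, Any]  # one parsed mf2 item, e.g. an h-entry


def filter_stream(items: Iterable[Item], prop: str, value: str) -> List[Item]:
    """Keep only items whose mf2 property `prop` contains `value`."""
    return [
        item for item in items
        if value in item.get("properties", {}).get(prop, [])
    ]


def first_published(item: Item) -> str:
    """Return the first plain-string `published` value, or "" if missing."""
    values = item.get("properties", {}).get("published", [])
    return values[0] if values and isinstance(values[0], str) else ""


def merge_streams(*streams: Iterable[Item]) -> List[Item]:
    """Combine several streams into one, newest first.

    Assumption: `published` values are comparable as strings (same
    ISO 8601 format and timezone). Items without a date sort last.
    """
    merged = [item for stream in streams for item in stream]
    return sorted(merged, key=first_published, reverse=True)
```

Something like filter_stream(merge_streams(mine, theirs), "category", "indieweb") would then give “content tagged with X” across several streams, all in the client.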

Secondary Itches

I.E. “I don’t personally have a need for X but have seen that others need/would benefit from it” or “it would be cool to do X”. Not as high priority as personal use cases.

  • People who publish static sites should be able to point their streams at a service which will poll/subscribe to changes and send webmentions for any links. This reduces the amount of work the publisher has to do: at most, send a simple HTTP POST to a service, with the option of not having to do anything at all (tradeoff: not as realtime, since polling has to be done).
  • ditto but for POSSE? Saves you having to set up twitter/fb/etc API access, just get a hosted service to subscribe to your content, use their UI to connect to silos, then let it automatically POSSE for you.
    • downside: potential lack of POSSE control, lack of feedback in UI, more difficult to get u-syndication data back for original post discovery
    • still potentially useful for people with static sites, and a nice stepping stone (want to POSSE? enter your URL here and log in to twitter)
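
As a rough sketch of the core of such a webmention service: given a page’s URL and HTML, collect its outgoing links and build the form-encoded source/target body that a webmention request carries. Endpoint discovery and the actual HTTP POST are deliberately left out, and the names LinkCollector, outgoing_links and webmention_payloads are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlencode, urljoin


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Skip mailto:, javascript: etc., and avoid duplicates.
        if absolute.startswith(("http://", "https://")) and absolute not in self.links:
            self.links.append(absolute)


def outgoing_links(source_url: str, html: str) -> List[str]:
    """Return every distinct link target found in `html`."""
    collector = LinkCollector(source_url)
    collector.feed(html)
    return collector.links


def webmention_payloads(source_url: str, html: str) -> List[str]:
    """Build one form-encoded body (source + target) per outgoing link.

    Each body would be POSTed to the target's webmention endpoint,
    which still has to be discovered separately.
    """
    return [
        urlencode({"source": source_url, "target": target})
        for target in outgoing_links(source_url, html)
    ]
```

A real service would probably only look at links inside the h-entry rather than the whole page, which is another place where mf2 parsing earns its keep.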

What needs to exist to make this happen

  • microformats2 data published in public
    • Needs to be valid — better validation tools, documentation
  • knowledge of past data: crawlable using rel=prev
  • knowledge of future data: PuSH pings, subscriptions enable realtime stuff
  • better mf2 consuming tools
    • Already making progress on this with mf-cleaner, will continue to spot patterns and figure out consumption best practices as I implement things, codify/standardise them as appropriate
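
For the “knowledge of past data” point, here is a minimal sketch of walking a stream backwards via rel=prev. It assumes a fetch_parsed callable (hypothetical, supplied by the caller) which takes a URL and returns parsed microformats2 JSON with the usual "items" and "rels" keys. crawl_history and max_pages are names invented for this example.

```python
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import urljoin

Parsed = Dict[str, Any]  # parsed mf2 document: {"items": [...], "rels": {...}}


def crawl_history(
    start_url: str,
    fetch_parsed: Callable[[str], Parsed],
    max_pages: int = 50,
) -> List[Dict[str, Any]]:
    """Collect items from `start_url` and every page reachable via rel=prev.

    Stops when there is no rel=prev link, when a page has already been
    visited (guards against loops), or after `max_pages` pages.
    """
    items: List[Dict[str, Any]] = []
    seen = set()
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parsed = fetch_parsed(url)
        items.extend(parsed.get("items", []))
        prev_urls = parsed.get("rels", {}).get("prev", [])
        # Parsers normally resolve rel URLs already; urljoin is a safety net.
        url = urljoin(url, prev_urls[0]) if prev_urls else None
    return items
```

Passing the fetcher in keeps the crawler independent of any particular HTTP client or mf2 parser, and makes it easy to test against canned pages.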

All in all, not really that much we don’t already have.