7 Questions for Superfeedr
Superfeedr is the brainchild of French developer Julien Genestoux, a leading authority on the emerging crop of applications and systems based on delivering data with extremely low latency. The project is the evolution of Notifixious (which I've previously profiled), which sought to help content creators by speeding up the distribution of the RSS and Atom feeds currently in use.
Julien benefited from some very generous and impressive external investment, allowing him to expand his platform to be a service using a variety of real-time technologies including XMPP, PubSubHubbub and SUP to harvest distant-end feeds.
Here are 7 Questions for Julien about Superfeedr:
1. Superfeedr was the driving force (your custom feed-fetching/parsing technology) behind Notifixious, and it's now become the brand of your service. The service's major notable form of growth is notifications via PubSubHubbub, alongside pure XMPP. What's the reaction been to expanding your supported real-time protocols?
We love XMPP and we think it's the most powerful technology for the real-time web. Yet, with a little pragmatism, it's obvious that this protocol scares a lot of people. HTTP is something many people know quite well: how to scale it, cache it, etc.; it's pretty much a required computer science skill. Adopting PubSubHubbub brought us a lot of attention from people who would otherwise never have looked in our direction.
On top of that, PubSubHubbub is a very well-thought-out protocol and covers the publish/subscribe mechanism in a much clearer and more precise way (albeit not as rich in features as XMPP PubSub): they both belong to the real-time web world.
2. You're perhaps only the second major instance of a deployed PubSubHubbub hub, aside from the initial instance the Google engineers who developed it put out there. Have you seen a rise in interest in using such a technology stack for enabling web-based pubsub systems?
Yes, definitely. The pubsub pattern has been known for years. Yet as I said previously, this implementation of it is very elegant and its heavy use of WebHooks makes it very attractive. The interest in the real-time web that emerged at the same time was also a major traction factor.
3. Superfeedr is based on a very clever business plan. Describe it.
For subscribers: stop polling, we'll do that for you. Tell us how much it costs you, we'll match that anyway. The assumption is that we fetch the same data for several people. We're selling the same thing several times, when the cost of the n entries is the same as one entry.
For publishers: we'll make your feeds real-time for free (and we benefit from that for the subscribers, too). If you want cool features (analytics, customization, etc.) then we'll charge you, based on the volume of data which transits through us on your behalf.
4. You've received investment from some pretty well-known sources - Mark Cuban among them. What has the capital infusion allowed you to do in terms of your technology and supporting infrastructure? And explain the composition of your backend - specific languages, servers and technologies.
This was a small seed round. I'm no "Internet superstar" and I don't host parties. Yet, this investment was made for us to invest in servers (we basically tripled our number of servers since then), as well as - more importantly - key hires. Speed is our only competitive advantage, staying small is important to stay fast, but this money also helped us get a few great engineers who are helping us with new features to be announced!
Our architecture is quite simple: almost like a distributed BotNet. We have many parsers that receive feed URLs from dispatchers and then return the parsed content to them. The dispatchers send out the feeds based on a pre-determined 'next-fetch' time. And that's pretty much it. Everybody is connected via XMPP, which brings stuff like presence, querying and XML - convenient when dealing with feeds!
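To make the dispatcher idea concrete, here is a minimal sketch in Python of a queue ordered by 'next-fetch' time. It is an illustration only, not Superfeedr's code: the class names, the fixed per-feed interval and the example URLs are all assumptions, and the real system hands the due URLs to parsers over XMPP rather than returning them from a method.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledFeed:
    next_fetch: float                        # time (in seconds) the feed is next due
    url: str = field(compare=False)
    interval: float = field(compare=False)   # hypothetical fixed polling interval


class Dispatcher:
    """Keeps feeds in a min-heap keyed on their 'next-fetch' time."""

    def __init__(self):
        self._queue = []

    def add_feed(self, url, interval, now=0.0):
        heapq.heappush(self._queue, ScheduledFeed(now + interval, url, interval))

    def due_feeds(self, now):
        """Return the URLs that are due at `now` and reschedule each of them."""
        due = []
        while self._queue and self._queue[0].next_fetch <= now:
            due.append(heapq.heappop(self._queue))
        # Reschedule after the loop so a zero interval cannot spin forever.
        for feed in due:
            heapq.heappush(
                self._queue,
                ScheduledFeed(now + feed.interval, feed.url, feed.interval),
            )
        return [feed.url for feed in due]


dispatcher = Dispatcher()
dispatcher.add_feed("http://example.com/a.xml", interval=60)
dispatcher.add_feed("http://example.com/b.xml", interval=900)
print(dispatcher.due_feeds(now=60))  # only the 60-second feed is due so far
```

A heap keeps the feed with the earliest 'next-fetch' time at the front, so the dispatcher only ever inspects the feeds that are actually due.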
5. I see that some of your more notable clients include Posterous, Tumblr and twitterfeed. In what ways have they leveraged low-latency push notifications and with which technologies? And you've also got FriendFeed as a client, which uses SUP (its own protocol). How is Superfeedr being used for that platform?
We host PubSubHubbub hubs for them. We believe publishers should focus on publishing, and we can help them improve their deliverability by providing them with a hosted solution where the only things they have to do are (1) add some discovery inside their XML feeds, and (2) ping us whenever they update these. We will deal with the subscription and notification processes. Most of them have some kind of ping mechanism in place; we just make sure we translate these pings into a universal protocol, which is PubSubHubbub.
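For readers who have not seen the two publisher-side steps, here is a rough Python sketch of what they look like under the PubSubHubbub draft: a `rel="hub"` link placed in the feed for discovery, and a form-encoded POST with `hub.mode=publish` and `hub.url` as the ping. The hub and feed addresses are made up for illustration, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import quoteattr

HUB_URL = "http://hub.example.com/"             # hypothetical hosted hub
FEED_URL = "http://blog.example.com/atom.xml"   # hypothetical publisher feed


def discovery_link(hub_url):
    """Atom element a publisher adds to its feed to advertise its hub."""
    return "<link rel=\"hub\" href=%s/>" % quoteattr(hub_url)


def build_publish_ping(hub_url, feed_url):
    """Build (but do not send) the 'this feed was updated' ping."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


print(discovery_link(HUB_URL))
ping = build_publish_ping(HUB_URL, FEED_URL)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
# Sending it would be urllib.request.urlopen(ping) against a real hub.
```

Everything after the ping (fetching the feed, working out what changed, notifying subscribers) is the hub's job, which is the division of labour described in the answer above.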
As for FriendFeed, we also use their Simple Update Protocol (SUP), on the subscriber side to know when a feed has been updated. It's not a ping protocol per se, but we consider it as similar - if a publisher already uses SUP, they can very easily turn on a hub at Superfeedr.
6. The service works great for relatively low-volume RSS/Atom feeds. But as we move away from that format towards update-intensive stream-based data, how does your system scale under periods of heavy duress with feeds like Digg's 'Popular Stories' or firehoses?
Well, there's heavy and then there's heavy. I think we could pretty much handle Digg's "Popular Stories" feed, but we couldn't handle Twitter's firehose. PubSubHubbub, given the fact that it's built on HTTP, can't seriously and reliably go under a few seconds of latency. So feeds like the Twitter firehose couldn't really work (at more than 1 entry/sec, it's not reasonable to expect anything just yet). However, all feeds can be seen as an aggregate of many other feeds. The Twitter feed is nothing more than the sum of all the user feeds, and, except maybe for Robert Scoble (kidding!), a user wouldn't update his Twitter feed more often than once every few seconds. And then, we're good to go!
7. Superfeedr now promises to provide notifications to feeds "within 15 minutes, or it's free". What continue to be some of the challenges of trying to deliver instantaneous alerts about content updates?
This 15-minute guarantee comes from the fact that we still have to do some polling in the worst-case scenario, and if we have to do polling, going much less than that is hard to do. So our approach is to decrease the average detection time rather than the maximum. If for 95% of our feeds we can guarantee 1 minute, then who cares about the 5% that are guaranteed to be below 15 minutes?
As long as there is still content that isn't pushed anywhere, we will have no way to get it without polling.
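To put rough numbers on the 95%/5% argument above: if 95% of feeds are detected within 1 minute and the remaining 5% within 15 minutes, the average detection time can be no worse than about 1.7 minutes. The short Python sketch below does that arithmetic; the function name is made up, and the split is the hypothetical one from the answer, not measured data.

```python
def mean_latency_upper_bound(tiers):
    """Upper bound on average detection time, in minutes.

    `tiers` is a list of (share_of_feeds, guaranteed_max_minutes) pairs
    whose shares add up to 1.
    """
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares of feeds must add up to 1")
    return sum(share * minutes for share, minutes in tiers)


# The hypothetical split from the answer: 95% within 1 minute, 5% within 15.
bound = mean_latency_upper_bound([(0.95, 1), (0.05, 15)])
print(round(bound, 2))
```

Moving more feeds into the pushed tier pulls this average down even though the 15-minute worst case stays the same.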
Thanks Julien, and congratulations, again! Good luck in all your work! :-)