An Alternative to Required API Keys

Requiring API keys for JSON services that serve publicly available data is leading us down the unfortunate path to screen scraping and HTML parsing. The justification for requiring API keys is that abuse of these data APIs is rampant, and without keys there would be no way to deal with bad behavior. Rate limiting is one approach, but it is easily gamed. We need a better solution.

Required keys make it much harder for developers to learn new things. I recently created a sample application for Typesafe Activator called Reactive Stocks, which fetches publicly available tweets and then does a sentiment analysis on them. The goal of the sample is to quickly teach developers how to build Reactive applications. If each developer had to go through the process of setting up Twitter API keys just to get the app running, many would give up. Most developers prefer to instantly see something working and then deconstruct it to learn how it works.

Shortly after releasing the first version of the Reactive Stocks sample application, Twitter shut off the publicly available JSON service. After searching for an alternative to tweets as an interesting and changing sample dataset, I found a key-less JSON API for news by Faroo. Shortly after switching to Faroo’s key-less JSON service, they shut it off. So I created a better way to provide tweet data for sample applications which prevents abuse while also teaching developers to think about failure as they use the API.

The Reactive Stocks application now uses a simple JSON service which proxies search requests to Twitter, adding the required key. This means developers learning how to build Reactive apps won't need to set up Twitter API keys. But exposing this API publicly would likely lead to abuse. The solution that ensures the service will only be used for development purposes, without requiring a key, is simply to make it randomly fail and hang.

Services go down and requests hang in real-world usage, but this happens infrequently, so many developers ignore the possibility. Causing about twenty percent of requests to fail or hang forces developers to deal with these real situations. Not only does this solution teach developers to do the right thing, it also prevents abuse, since a production system will not likely want twenty percent of its requests to fail. For production usage, API providers could also simply remove this behavior when a key is used.

To implement the randomly failing Twitter search proxy, I used Play Framework with Scala. The full code is on GitHub, but let's walk through the interesting bits. The app handles requests to /search/tweets by calling the following function:

  def tweets(query: String) = {
    FailFast {
      WaitOneMinute {
        Cached(query, 60 * 15) {
          Action {
            Async {
              Logger.info(s"Cache miss for $query")
              try {
                Twitter.bearerToken.flatMap { bearerToken =>
                  Twitter.fetchTweets(bearerToken, query).map { response =>
                    Ok(response.json)
                  }
                }
              } catch {
                case illegalArgumentException: IllegalArgumentException =>
                  Logger.error("Twitter Bearer Token is missing", illegalArgumentException)
                  Future(InternalServerError("Error talking to Twitter"))
              }
            }
          }
        }
      }
    }
  }

The tweets function uses Action Composition in Play to compose a number of different behaviors together to handle a request. (Note: These could be collapsed down into a single Action.) The FailFast class simply either returns an InternalServerError about one in every ten times or continues on to the next Action:

import play.api.Logger
import play.api.mvc.{Action, Controller, Request, Result}
import scala.util.Random

case class FailFast[A](action: Action[A]) extends Action[A] with Controller {
 
  def apply(request: Request[A]): Result = {
    // fail about once in every 10 times
    if (Random.nextInt(10) == 0) {
      Logger.info("FailFast")
      InternalServerError
    } else {
      action(request)
    }
  }
 
  lazy val parser = action.parser
}

The WaitOneMinute class either waits one minute and then returns a RequestTimeout response, about one in every ten times, or continues to the next Action. Combined with FailFast, two independent one-in-ten chances mean about 1 − 0.9² = 19% of requests fail or hang, the "about twenty percent" mentioned above:

import play.api.Logger
import play.api.libs.concurrent.Promise
import play.api.mvc.{Action, Controller, Request, Result}
import scala.concurrent.duration._
import scala.util.Random

case class WaitOneMinute[A](action: Action[A]) extends Action[A] with Controller {
 
  def apply(request: Request[A]): Result = {
    // wait one minute about once in every 10 times
    if (Random.nextInt(10) == 0) {
      Logger.info("WaitOneMinute")
      Async {
        Promise.timeout(RequestTimeout, 1 minute)
      }
    }
    else {
      action(request)
    }
  }
 
  lazy val parser = action.parser
}

The next step in the chain checks the cache for a response based on the specified query. If there is a cache miss, the response is automatically added to the cache. Finally, if none of the other Actions returned a response, the tweets are fetched and returned (asynchronously and non-blocking).
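The Cached step itself isn't shown above. A minimal sketch of what it could look like, assuming Play 2.1's play.api.cache.Cache API and following the same composition pattern as FailFast (the actual implementation is in the GitHub repo and may differ):

```scala
import play.api.Play.current
import play.api.cache.Cache
import play.api.mvc.{Action, Controller, Request, Result}

// Sketch only: serve a cached Result for `key` if present; otherwise run
// the inner Action and cache its Result for `duration` seconds.
case class Cached[A](key: String, duration: Int)(action: Action[A]) extends Action[A] with Controller {

  def apply(request: Request[A]): Result = {
    Cache.getAs[Result](key).getOrElse {
      val result = action(request)
      Cache.set(key, result, duration)
      result
    }
  }

  lazy val parser = action.parser
}
```

Note the ordering in the tweets function: FailFast and WaitOneMinute wrap the cache, so even cache hits are still subject to the random failures.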

This example does not require an API key, but if an API provider wanted to turn off the failure behavior when a valid key is sent, an authentication Action could easily be added to the chain.
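For example, a hypothetical variant of FailFast could skip the random failure when a valid key is present. (The validKeys set and the apiKey query parameter here are illustrative assumptions, not part of the actual service.)

```scala
import play.api.mvc.{Action, Controller, Request, Result}
import scala.util.Random

// Sketch only: same shape as FailFast, but the random failure is skipped
// when the request carries a valid key. `validKeys` and the `apiKey`
// query parameter are illustrative assumptions.
case class FailFastUnlessKeyed[A](validKeys: Set[String])(action: Action[A]) extends Action[A] with Controller {

  def apply(request: Request[A]): Result = {
    val hasValidKey = request.queryString.get("apiKey").exists(_.exists(validKeys.contains))
    if (!hasValidKey && Random.nextInt(10) == 0) {
      Logger.info("FailFast (no valid key)")
      InternalServerError
    } else {
      action(request)
    }
  }

  lazy val parser = action.parser
}
```

A WaitOneMinute variant could check for the key the same way, so production users with valid keys would see neither failures nor hangs.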

I hope this example inspires API providers to take a new approach to dealing with abuse so that developers can continue to use APIs without resorting to nasty screen-scraping techniques just to get publicly available data. Please let me know what you think!

This entry was posted in Play Framework, REST.
  • Pingback: Play as an ESB « Richard Searle's Blog

  • Suresh

    Nice! Could you please explain more about the issues with the rate-limiting approach?

  • pascal

    If I wanted to abuse your API, I would simply retry the query after a server error, or interrupt and retry a hung connection. Interesting idea, but this is not really better than doing nothing.

    • James Ward

      You can do that but your users would suffer from a poor experience. Also my service caches the data, providing yet another incentive not to use this for production and making it easier for me to handle load.

      • Phil

        There is no need for the black hat to wait for your relatively long 60-second timeout. He can implement, say, a 2-second timeout of his own (based on the assumption that Twitter answers more than 95% of requests in that time), along with a retry mechanism.

        • James Ward

          Absolutely! Then the developer has to think about how to deal with timeouts.

  • Chris Warburton

    I agree that developers often ignore error cases, but this is easily worked around by sending multiple requests and ignoring the failures, which just exacerbates the abuse.

    I think a proof-of-work system like HashCash ( http://en.wikipedia.org/wiki/Hashcash ) is a better idea in general. Basically we require API users to solve a computationally-hard problem, like hash inversion, for each request. This makes API users pay a computational cost which is proportional to the cost of providing the service. Note that this is basically how BitCoin works.

    Whilst a system like HashCash works right now, I would rather see a more productive use of computational power, e.g. hooking it up to the World Community Grid.

    • James Ward

      That is a great idea! Thanks for sharing.

  • Scott Stamp

    I’m probably going to piss off the guys at 8tracks.com, but their internal API (the one their own site uses) accesses the same URLs as their public API (which requires a key) but with the GET argument ?format=jsonh (the “h” is important, from what I can tell JSONH isn’t a format standard). It’ll return normal JSON and not JSONP, and will not require an API key. Documentation can be found on their website. *whistles innocently*

    Correction, this is JSONH: https://github.com/WebReflection/JSONH – but it’s not what they’re using for these requests. JSONH would break *most* (de)constructors.

    • James Ward

      Nice! We’ll see how long that lasts…

      • Scott Stamp

        Lol, don’t jinx it! I’m actually using it, they used to kill my keys a lot because my app (not public I should add) was technically violating their licensing contracts. I doubt they’ll change it, not in a meaningful way I couldn’t work around easily. The only thing they could really do is add some sort of key to the request that the website itself generates, so I could just skim that key.

  • Wolf Garbe (faroo.com)

    Nice idea, but it works only in cases with human end users. Most of the search requests we are seeing come from non-real-time analytics/data-mining tools. They would simply ignore it if twenty percent of their requests failed, and could still game the rate limiting and abuse the API if no API keys are used.

    • James Ward

      In that case occasionally returning bad data might be a better option.


