An Alternative to Required API Keys

2013-07-29

Requiring API keys for JSON services that serve publicly available data is leading us down the unfortunate path to screen scraping and HTML parsing. The justification for requiring API keys is that abuse of these data APIs is rampant, and without keys there would be no way to deal with bad behavior. Rate limiting is one approach, but it is easily gamed. We need a better solution.

Required keys make it much harder for developers to learn new things. I recently created a sample application for Typesafe Activator called Reactive Stocks, which fetches publicly available tweets and then runs a sentiment analysis on them. The goal of the sample is to quickly teach developers how to build Reactive applications. If each developer had to go through the process of setting up Twitter API keys just to get the app running, many would give up. Most developers prefer to see something working right away and then deconstruct it to learn how it works.

Shortly after I released the first version of the Reactive Stocks sample application, Twitter shut off its publicly available JSON service. After searching for an alternative to tweets as an interesting and changing sample dataset, I found a key-less JSON API for news by Faroo. Shortly after I switched to Faroo’s key-less JSON service, they shut it off too. So I created a better way to provide tweet data for sample applications, one that prevents abuse while also teaching developers to think about failure as they use the API.

The Reactive Stocks application now uses a simple JSON service which proxies search requests to Twitter, adding the required key. This means developers learning how to build Reactive apps won’t need to set up Twitter API keys. But exposing this API publicly would likely lead to abuse. The solution that ensures the service will only be used for development purposes, without requiring a key, is simply to make it randomly fail and hang.

Services go down and requests hang in real-world usage, but this happens infrequently enough that many developers never plan for it. Causing about twenty percent of requests to fail or hang forces developers to deal with these real situations. Not only does this solution teach developers to do the right thing, it also prevents abuse, since a production system is unlikely to tolerate twenty percent of its requests failing. For production usage, API providers could simply remove this behavior when a key is used.
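To make "dealing with these situations" concrete, here is a minimal client-side sketch using only the Scala standard library. It is not taken from the Reactive Stocks sample: the `Resilient` object, its method names, and the retry policy are all hypothetical. It treats a request that hangs past a time limit the same as one that fails, and retries a bounded number of times without blocking a thread.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit, TimeoutException}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._

// Hypothetical helper for illustration; not part of the Reactive Stocks sample.
object Resilient {
  // Daemon thread so the timer never keeps the JVM alive on its own.
  private val timer = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "resilient-timer")
      t.setDaemon(true)
      t
    }
  })

  /** Fails the returned future with TimeoutException if `f` is not done within `limit`. */
  def withTimeout[T](f: Future[T], limit: FiniteDuration)(implicit ec: ExecutionContext): Future[T] = {
    val p = Promise[T]()
    val task = timer.schedule(new Runnable {
      def run(): Unit = { p.tryFailure(new TimeoutException(s"no response within $limit")); () }
    }, limit.toMillis, TimeUnit.MILLISECONDS)
    f.onComplete { result =>
      task.cancel(false) // the response arrived, so the timer is no longer needed
      p.tryComplete(result)
    }
    p.future
  }

  /** Tries `call` up to `attempts` times, treating a hang past `limit` as a failure. */
  def retry[T](attempts: Int, limit: FiniteDuration)(call: () => Future[T])(implicit ec: ExecutionContext): Future[T] =
    withTimeout(call(), limit).recoverWith {
      case _ if attempts > 1 => retry(attempts - 1, limit)(call)
    }
}
```

A caller would wrap its HTTP request in the `call` function, for example `Resilient.retry(3, 5.seconds)(() => fetchTweets(query))`, where `fetchTweets` stands in for whatever client the application uses. Against a service that hangs for a minute, the time limit matters at least as much as the retry.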

To implement the randomly failing Twitter search proxy I used Play Framework with Scala. The full code is on GitHub, but let's walk through the interesting bits. The app handles requests to /search/tweets by calling the following function:

def tweets(query: String) = {
  // Each wrapper either answers the request itself or hands it to the Action it wraps
  FailFast {
    WaitOneMinute {
      // Cache the response for each query for 60 * 15 seconds (15 minutes)
      Cached(query, 60 * 15) {
        Action {
          Async {
            Logger.info(s"Cache miss for $query")
            try {
              Twitter.bearerToken.flatMap { bearerToken =>
                Twitter.fetchTweets(bearerToken, query).map { response =>
                  Ok(response.json)
                }
              }
            } catch {
              // Only catches exceptions thrown synchronously while building the Future
              case illegalArgumentException: IllegalArgumentException =>
                Logger.error("Twitter Bearer Token is missing", illegalArgumentException)
                Future(InternalServerError("Error talking to Twitter"))
            }
          }
        }
      }
    }
  }
}

The tweets function uses Action Composition in Play to combine a number of different behaviors to handle a request. (Note: These could be collapsed down into a single Action.) About one time in ten, the FailFast class returns an InternalServerError; otherwise it continues to the next Action:

case class FailFast[A](action: Action[A]) extends Action[A] with Controller {

  def apply(request: Request[A]): Result = {
    // fail about once in every 10 times
    if (Random.nextInt(10) == 0) {
      Logger.info("FailFast")
      InternalServerError
    } else {
      action(request)
    }
  }

  lazy val parser = action.parser
}

About one time in ten, the WaitOneMinute class waits one minute and then returns a RequestTimeout response; otherwise it continues to the next Action:

case class WaitOneMinute[A](action: Action[A]) extends Action[A] with Controller {

  def apply(request: Request[A]): Result = {
    // wait one minute about once in every 10 times
    if (Random.nextInt(10) == 0) {
      Logger.info("WaitOneMinute")
      Async {
        Promise.timeout(RequestTimeout, 1 minute)
      }
    } else {
      action(request)
    }
  }

  lazy val parser = action.parser
}
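
The "about twenty percent" figure follows from the order of the two wrappers: FailFast rejects roughly one request in ten, and WaitOneMinute then hangs roughly one in ten of the requests that got through. The short sketch below works out that arithmetic and replays the two checks in a simulation. The object and method names are hypothetical and exist only for this illustration.

```scala
import scala.util.Random

// Hypothetical illustration of the combined failure rate; not part of the proxy itself.
object DisruptionRate {
  val failFastRate = 1.0 / 10 // FailFast: Random.nextInt(10) == 0
  val hangRate     = 1.0 / 10 // WaitOneMinute: same check, applied only to requests FailFast let through

  // P(disrupted) = P(fail fast) + P(not fail fast) * P(hang) = 0.1 + 0.9 * 0.1 = 0.19
  val expected: Double = failFastRate + (1 - failFastRate) * hangRate

  /** Replays the two checks in order for `n` simulated requests and returns the disrupted fraction. */
  def simulate(n: Int, rnd: Random): Double = {
    val disrupted = (1 to n).count { _ =>
      rnd.nextInt(10) == 0 || // FailFast fires
      rnd.nextInt(10) == 0    // otherwise WaitOneMinute may fire
    }
    disrupted.toDouble / n
  }
}
```

So about nineteen percent of requests are disrupted: roughly ten percent fail immediately and roughly nine percent hang for a minute.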

The next step in the chain is to check the cache for a response based on the specified query. If there is a cache miss then the response will automatically be added to the cache. Finally, if none of the other Actions returned a response, the tweets are fetched and returned asynchronously, without blocking.
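
For readers unfamiliar with this kind of caching, the idea can be sketched in a few lines of plain Scala. This is not how Play's Cached helper is implemented; `TtlCache` and its injectable `now` clock are hypothetical names chosen so the behavior is easy to test. A lookup returns the stored value while it is fresh, and otherwise computes the value and stores it with a new expiry time.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-query cache with a time-to-live; not Play's implementation.
final class TtlCache[V](ttlSeconds: Long, now: () => Long) {
  // key -> (expiry time in seconds, cached value)
  private val entries = mutable.Map.empty[String, (Long, V)]

  /** Returns the cached value for `key` if it has not expired; otherwise computes and stores it. */
  def getOrElseUpdate(key: String)(compute: => V): V = synchronized {
    entries.get(key) match {
      case Some((expiresAt, value)) if now() < expiresAt =>
        value // cache hit: `compute` is never evaluated
      case _ =>
        val value = compute // cache miss or expired entry
        entries.put(key, (now() + ttlSeconds, value))
        value
    }
  }
}
```

With a time-to-live of 60 * 15 seconds, repeated searches for the same query within fifteen minutes would reach Twitter only once, which also limits how much load the proxy can pass on.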

This example does not require an API key but if an API provider wanted to turn off the failure when a valid key is sent, an authentication Action could easily be added to the chain.
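
One possible shape for that extra step is sketched below. To keep it self-contained it models handlers as plain functions rather than Play Actions, so `Request`, `Response`, `Handler`, and `unlessKeyed` are all hypothetical names, and the key check is a simple set lookup. The point is only that the failure-injecting wrapper is skipped when a valid key is present.

```scala
import scala.util.Random

// Hypothetical, simplified model of the chain; these are not Play's types.
object KeyedChain {
  final case class Request(query: String, apiKey: Option[String])
  sealed trait Response
  case object Failed extends Response
  final case class Ok(body: String) extends Response

  type Handler = Request => Response

  /** Fails about one time in ten, in the spirit of FailFast. */
  def failFast(rnd: Random)(next: Handler): Handler =
    req => if (rnd.nextInt(10) == 0) Failed else next(req)

  /** Applies `wrap` only to requests that do not carry a valid key. */
  def unlessKeyed(validKeys: Set[String])(wrap: Handler => Handler)(next: Handler): Handler = {
    val degraded = wrap(next) // the unreliable path for anonymous callers
    req => if (req.apiKey.exists(validKeys.contains)) next(req) else degraded(req)
  }
}
```

Anonymous callers still get the unreliable development-grade service, while a request with a recognized key goes straight to the underlying handler.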

I hope this example inspires API providers to try a new approach to dealing with abuse, so that developers can continue to use APIs to easily get publicly available data without resorting to nasty screen-scraping techniques. Please let me know what you think!