Redirecting and Chunking Around Heroku’s 30 Second Request Timeout

In most cases a web request shouldn’t take more than 30 seconds to return a response, so Heroku has good reason to enforce a 30 second request timeout. But there are times when things just take a while. There are different methods for dealing with this. Where possible, the best solution is to offload the job from the web request queue to a background job queue that can be scaled separately. If the requestor needs the result, it can either poll for it or have the value pushed to it when the background job is complete. Yet there are some cases where this is overkill. For instance, if a web request takes a while but the user interaction must remain blocked (e.g. a modal spinner) until the request is complete, then setting up background jobs for slow requests can be unnecessary.

Let’s look at two different methods for handling long (> 30 seconds) web requests on Heroku. On Heroku the request must start returning some data within 30 seconds or the load balancer will give up. One way to deal with this is to wait about 25 seconds for the result and, if it isn’t ready yet, redirect the client back to the same request so it waits again. The other option is to periodically write “empty” chunks (containing only whitespace) into the response until the actual response can be returned. Each of these methods has tradeoffs, so let’s look at each in more detail. I’ll be using Play Framework and Scala for the examples, but both of these methods could be handled in most frameworks.
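To make the timing rule concrete, here is a small check of it. This is a minimal sketch, assuming the behavior Heroku documents for its router: the first byte of the response must go out within 30 seconds, and after that each byte sent resets a rolling 55-second window. `HerokuWindows` and `keepsOpen` are hypothetical names, and the boundaries are approximations rather than the router’s exact logic.

```scala
object HerokuWindows {
  // Approximate Heroku router limits, in seconds (assumed from Heroku's docs).
  val FirstByteLimit = 30.0 // the first byte must be sent within 30 s of the request
  val RollingLimit = 55.0   // afterwards, no gap between bytes may exceed 55 s

  // byteTimes: seconds after the request arrived at which the app sends data,
  // sorted ascending. Returns true if the router would keep the request open.
  def keepsOpen(byteTimes: Seq[Double]): Boolean =
    byteTimes.headOption.exists(_ <= FirstByteLimit) &&
      byteTimes.zip(byteTimes.drop(1)).forall { case (prev, next) => next - prev <= RollingLimit }

  def main(args: Array[String]): Unit = {
    // Empty Chunking: one space every 5 s for a minute, then the result
    println(keepsOpen((1 to 12).map(_ * 5.0))) // true
    // No data at all until a 60 s job finishes
    println(keepsOpen(Seq(60.0)))              // false
    // Redirect Polling: each request answers within 25 s, so each one passes
    println(keepsOpen(Seq(25.0)))              // true
  }
}
```

Each redirect in the first method starts a fresh request with its own 30-second window, while the second method keeps one request alive by staying inside the rolling window.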

Redirect Polling

The Redirect Polling method of dealing with long web requests responds with a redirect every 25 seconds until the result is available. The downside of this approach is that HTTP clients usually limit how many redirects they will follow, which caps the total amount of time this method can take. The upside is that the final response’s status code can reflect the result (for example, 200 on success or an error status on failure).
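The redirect limit turns into a time budget with simple arithmetic. A minimal sketch, assuming a hypothetical client that follows at most 20 redirects (a limit in that range is common in browsers, but check the client you actually use); `RedirectBudget` and `maxTotalSeconds` are illustrative names.

```scala
object RedirectBudget {
  // Longest total wait when every response waits up to `waitSeconds` and the
  // client follows at most `maxRedirects` redirects: the original request plus
  // each followed redirect gets one full wait before the client gives up.
  def maxTotalSeconds(waitSeconds: Int, maxRedirects: Int): Int =
    waitSeconds * (maxRedirects + 1)

  def main(args: Array[String]): Unit =
    println(maxTotalSeconds(25, 20)) // 525, i.e. just under 9 minutes
}
```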

Ideally the web framework is Reactive / Non-Blocking so that threads are only used when there is active I/O. Often the request is slow because it is waiting on another service that is slow. In that case the web request can be fully Reactive: no thread sits idle while waiting, so the resources that would traditionally be wasted in waiting states are preserved.
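To see why non-blocking waits matter, the sketch below serves many concurrent requests that each wait on a slow “service” using only two worker threads. It is an illustration with the Scala standard library, not Play code: `slowService` is a hypothetical stand-in that completes a Future from a scheduler instead of parking a thread, and the `Await` inside `serve` exists only so the demo can report a result.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object ReactiveWaitDemo {
  // Daemon threads so the demo never keeps the JVM alive.
  private val daemonThreads = new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true)
      t
    }
  }
  private val scheduler = Executors.newSingleThreadScheduledExecutor(daemonThreads)

  // Hypothetical slow service: completes after `delay` without blocking a thread.
  def slowService[A](value: A, delay: FiniteDuration): Future[A] = {
    val p = Promise[A]()
    scheduler.schedule(new Runnable { def run(): Unit = p.success(value) }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }

  // Serves `requests` concurrent calls, each waiting `delay` on the slow service,
  // with only two worker threads. Returns (responses received, elapsed ms).
  def serve(requests: Int, delay: FiniteDuration): (Int, Long) = {
    val pool = Executors.newFixedThreadPool(2, daemonThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val start = System.nanoTime()
    val responses = Future.sequence((1 to requests).map(i => slowService(i, delay).map(_.toString)))
    val results = Await.result(responses, 30.seconds) // blocking only for the demo
    pool.shutdown()
    (results.size, (System.nanoTime() - start) / 1000000)
  }

  def main(args: Array[String]): Unit = {
    val (count, elapsedMs) = serve(1000, 200.millis)
    // A blocking version (Thread.sleep on 2 threads) would need about 100 seconds.
    println(s"$count responses in $elapsedMs ms")
  }
}
```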

To implement Redirect Polling (Reactively) in Play Framework and Scala I’ll use Akka as a place to run a long job off the web request thread. The Actor job could be something that is computationally taxing or a long network request. By using Akka Actors I have a simple way to deal with job distribution, failure, and thread pool assignment & management. Here is my very simple Akka Actor job that takes 60 seconds to complete (full source):

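import akka.actor.Actor
import akka.pattern.pipe
import play.api.libs.concurrent.Promise
import scala.concurrent.Future
import scala.concurrent.duration._

class LongJob extends Actor {

  import context.dispatcher // ExecutionContext used by Promise.timeout and pipeTo

  // Started on the first GetJobResult; later messages reuse the same Future,
  // which Promise.timeout completes after 60 seconds without blocking a thread.
  lazy val jobFuture: Future[String] = Promise.timeout("done!", 60.seconds)

  override def receive = {
    case GetJobResult => jobFuture.pipeTo(sender()) // reply when the job is done
  }

}

case object GetJobResult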

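When this Actor receives its first GetJobResult message, the lazy val creates the job: a Scala Future that completes with a String after 60 seconds. That String is sent (piped) to the sender of the message once it is ready. Because jobFuture is a lazy val, later GetJobResult messages reuse the same Future instead of starting a new job.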
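The lazy val is what lets a later request ask the same Actor again without restarting the 60-second job. Here is a plain-Scala sketch of that behavior without Akka; `LazyJob` is a hypothetical stand-in for LongJob that only counts how often the job starts.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future

// Hypothetical stand-in for LongJob, without Akka: counts how often the job starts.
class LazyJob {
  val starts = new AtomicInteger(0)

  // Evaluated on first access only, like `lazy val jobFuture` in LongJob.
  lazy val jobFuture: Future[String] = {
    starts.incrementAndGet()
    Future.successful("done!")
  }
}

object LazyJobDemo {
  def main(args: Array[String]): Unit = {
    val job = new LazyJob
    val first = job.jobFuture  // first "GetJobResult": starts the job
    val second = job.jobFuture // a later "GetJobResult": same Future, no restart
    println(job.starts.get)    // 1
    println(first eq second)   // true
  }
}
```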

Here is a web request handler that does the Redirect Polling while waiting for a result from the Actor (full source):

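import java.util.UUID
import akka.actor.{ActorNotFound, Props}
import akka.pattern.ask
import akka.util.Timeout
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.mvc._
import scala.concurrent.{Future, TimeoutException}
import scala.concurrent.duration._

// Inside `object Application extends Controller`; `actorSystem` is assumed to be
// an ActorSystem in scope (for example, Play's Akka.system).
def redir(maybeId: Option[String]) = Action.async {

  val (actorRefFuture, id) = maybeId.fold {
    // no id so create a job, naming the actor after a new id
    val id = UUID.randomUUID().toString
    (Future.successful(actorSystem.actorOf(Props[LongJob], id)), id)
  } { id =>
    // an id was given so look up the existing job by its actor name
    (actorSystem.actorSelection(s"user/$id").resolveOne(1.second), id)
  }

  actorRefFuture.flatMap { actorRef =>
    actorRef.ask(GetJobResult)(Timeout(25.seconds)).mapTo[String].map { result =>
      // received the result
      actorSystem.stop(actorRef)
      Ok(result)
    } recover {
      // did not receive the result in time (AskTimeoutException) so redirect
      case e: TimeoutException => Redirect(routes.Application.redir(Some(id)))
    }
  } recover {
    // did not find the actor specified by the id
    case e: ActorNotFound => InternalServerError("Result no longer available")
  }

}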

This request handler uses an optional query string parameter (id) to identify the job. Here is the logic for the request handler:

  1. If the id is not specified, a new LongJob Actor instance is created and named with a new id. Otherwise the existing Actor is looked up by its id.
  2. If either a new Actor was created or an existing Actor was found, the Actor is asked for its result and given 25 seconds to return it. If no Actor exists for the id (for example, because its result was already returned), a 500 error is returned.
  3. If the result is received within the timeout, the result is returned in a 200 response. Otherwise a redirect response is returned that includes the id in the query string, so the client’s next request waits on the same job.

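This is really just automatic polling for a result using redirects. It would be nice if HTTP gave the 202 Accepted response code standard semantics for this kind of thing (such as where and when to poll), so clients could follow it automatically the way they follow redirects.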
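The same three steps can be modeled without Play or Akka, which makes the control flow easy to test. This is a minimal sketch, not the original implementation: a `TrieMap` of Futures stands in for the named LongJob Actors, the hypothetical `Response` type with `Ok`, `Redirect` and `NotAvailable` stands in for Play’s results, and the hypothetical `after` helper plays the role of the 25-second ask timeout.

```scala
import java.util.UUID
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object RedirectPollingModel {
  sealed trait Response
  final case class Ok(body: String) extends Response
  final case class Redirect(id: String) extends Response
  final case class NotAvailable(message: String) extends Response

  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = { val t = new Thread(r); t.setDaemon(true); t }
  })

  // Completes with `value` after `delay` without blocking a thread.
  def after[A](delay: FiniteDuration)(value: A): Future[A] = {
    val p = Promise[A]()
    scheduler.schedule(new Runnable { def run(): Unit = p.success(value) }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }

  // Running jobs keyed by id; plays the role of the LongJob actors named by id.
  private val jobs = TrieMap.empty[String, Future[String]]

  def handle(maybeId: Option[String], startJob: () => Future[String], wait: FiniteDuration)
            (implicit ec: ExecutionContext): Future[Response] = {
    // Step 1: create a job under a new id, or look up the existing one.
    val (id, maybeJob) = maybeId match {
      case None =>
        val newId = UUID.randomUUID().toString
        val job = startJob()
        jobs.put(newId, job)
        (newId, Some(job))
      case Some(existingId) => (existingId, jobs.get(existingId))
    }
    maybeJob match {
      case None => Future.successful(NotAvailable("Result no longer available"))
      case Some(job) =>
        // Steps 2 and 3: whichever finishes first, the job or the wait, decides the response.
        val result = job.map[Response](Ok(_))
        val timeout = after[Response](wait)(Redirect(id))
        Future.firstCompletedOf(Seq(result, timeout)).map { response =>
          if (response.isInstanceOf[Ok]) jobs.remove(id) // like actorSystem.stop(actorRef)
          response
        }
    }
  }

  // First request is redirected, the follow-up gets the result,
  // and a third request with the same id finds nothing.
  def demo(): List[Response] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val slowJob = () => after(300.millis)("done!")
    val first = Await.result(handle(None, slowJob, 100.millis), 2.seconds) // blocking only for the demo
    val id = first match {
      case Redirect(redirectId) => redirectId
      case other => sys.error(s"expected a redirect, got $other")
    }
    val second = Await.result(handle(Some(id), slowJob, 1.second), 2.seconds)
    val third = Await.result(handle(Some(id), slowJob, 1.second), 2.seconds)
    List(first, second, third)
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```

The job is removed from the registry only when its result is actually returned, mirroring actorSystem.stop(actorRef). Removing it as soon as the job completed would break a client that had just been redirected.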

Empty Chunking

In the Empty Chunking method of allowing a request to take more than 30 seconds, the web server uses HTTP/1.1 chunked transfer encoding to send a small filler chunk every few seconds until the actual response is ready. The downside of this method is that the HTTP response status code is sent with the first chunk, before the actual result is available, so a failure can no longer be reported through the status code. The upside is that a single web request can stay open for as long as it needs. To use this method a web framework needs to support chunked responses, and ideally it is Reactive / Non-Blocking so that threads are only used when there is active I/O.
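Since the 200 status is already committed, one option is to report the outcome inside the body. A minimal sketch of a hypothetical convention (neither Play nor HTTP defines it): the final chunk starts with `OK ` or `ERROR `, and the client skips the leading filler spaces before reading it. `PaddedBody`, `finalChunk` and `parse` are illustrative names.

```scala
import scala.util.{Failure, Success, Try}

object PaddedBody {
  // Server side: the status is already 200, so the last chunk carries the outcome.
  def finalChunk(outcome: Try[String]): String = outcome match {
    case Success(value) => "OK " + value
    case Failure(e)     => "ERROR " + e.getMessage
  }

  // Client side: drop the leading filler spaces, then read the prefix.
  def parse(body: String): Either[String, String] = {
    val trimmed = body.dropWhile(_ == ' ')
    if (trimmed.startsWith("OK ")) Right(trimmed.stripPrefix("OK "))
    else if (trimmed.startsWith("ERROR ")) Left(trimmed.stripPrefix("ERROR "))
    else Left("malformed response")
  }

  def main(args: Array[String]): Unit = {
    println(parse("      " + finalChunk(Success("done!"))))                  // Right(done!)
    println(parse("   " + finalChunk(Failure(new RuntimeException("boom"))))) // Left(boom)
  }
}
```

If the body is JSON instead, the filler spaces are harmless on their own, because JSON allows insignificant whitespace before a value; the error still has to be represented in the JSON itself.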

This method doesn’t require an Actor the way the Redirect Polling method does, since no later request ever needs to look the job up by id. A Future could be used instead, but I wanted to keep the job piece the same for both methods. Here is a web request handler that does the empty chunking (full source):

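import akka.actor.Props
import akka.pattern.ask
import akka.util.Timeout
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.concurrent.Promise
import play.api.libs.iteratee.Enumerator
import play.api.mvc._
import scala.concurrent.Future
import scala.concurrent.duration._

// Inside the same controller; `actorSystem` is assumed to be in scope.
def chunker = Action {
  val actorRef = actorSystem.actorOf(Props[LongJob])
  val futureResult = actorRef.ask(GetJobResult)(Timeout(2.minutes)).mapTo[String]
  futureResult.onComplete(_ => actorSystem.stop(actorRef)) // stop the actor

  val enumerator = Enumerator.generateM {
    // output a space every 5 seconds until the future is complete
    if (futureResult.isCompleted) Future.successful(None)
    else Promise.timeout(Some(" "), 5.seconds)
  } andThen {
    // then output the result (if the ask timed out, the stream ends with an error)
    Enumerator.flatten(futureResult.map(Enumerator(_)))
  }

  // the 200 status and headers go out immediately; the body is streamed in chunks
  Ok.chunked(enumerator)
}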

This web request handler does the following when a request comes in:

  1. An instance of the LongJob Actor is created.
  2. The Actor instance (actorRef) is asked for the result of the GetJobResult message and given two minutes to reply; the reply is mapped to a String.
  3. An onComplete handler stops the Actor instance after a result is received or the ask has timed out.
  4. An Enumerator is created that outputs a space every five seconds until the result has been received or the ask has timed out. It then outputs the result and completes (if the ask timed out, the stream ends with an error instead).
  5. An HTTP 200 response is returned that is set up to chunk the output of the Enumerator.

That is it! I’ve used this method in a number of Play Framework apps, in both Scala and Java, and it keeps them fully Reactive. This logic could be wrapped into something more reusable. Let me know if you need that or need a Java example.
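As one hypothetical shape for that reusable piece, here is a framework-free model of the chunker’s behavior using only the Scala standard library, not Play’s Enumerator API: `stream` writes a space every `interval` until the Future completes and then writes the result. `EmptyChunker`, `stream`, `after` and the `write` callback (which would wrap the real response output) are assumptions for illustration.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object EmptyChunker {
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = { val t = new Thread(r); t.setDaemon(true); t }
  })

  // Completes with `value` after `delay`; stands in for the LongJob result.
  def after[A](delay: FiniteDuration)(value: A): Future[A] = {
    val p = Promise[A]()
    scheduler.schedule(new Runnable { def run(): Unit = p.success(value) }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }

  // Writes " " every `interval` until `result` completes, then writes the result.
  // The returned Future completes when streaming is over; it fails if `result`
  // fails, in which case no final chunk is written (like the Enumerator version).
  def stream(result: Future[String], interval: FiniteDuration, write: String => Unit)
            (implicit ec: ExecutionContext): Future[Unit] = {
    val lock = new Object
    var finished = false // guarded by lock so no space can follow the result
    val ticker = scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = lock.synchronized { if (!finished) write(" ") }
    }, interval.toMillis, interval.toMillis, TimeUnit.MILLISECONDS)

    val done = Promise[Unit]()
    result.onComplete { outcome =>
      lock.synchronized {
        finished = true
        ticker.cancel(false)
        outcome.foreach(write) // final chunk only on success
      }
      done.complete(outcome.map(_ => ()))
    }
    done.future
  }

  def demo(): List[String] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val chunks = ListBuffer.empty[String]
    val job = after(350.millis)("done!")
    val streaming = stream(job, 100.millis, chunk => chunks.synchronized { chunks += chunk; () })
    Await.result(streaming, 2.seconds) // blocking only for the demo
    chunks.synchronized { chunks.toList }
  }

  def main(args: Array[String]): Unit =
    println(demo().map(c => if (c == " ") "<space>" else c).mkString(", "))
}
```

The lock ensures no filler space can be written after the result, which a timer that only checks whether the job is done would not guarantee on its own.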

Wrapping Up

As you have seen, it is pretty easy to have traditional web requests that take longer than 30 seconds on Heroku. While this is not a replacement for background jobs, it can be an easy way to deal with situations where it is overkill to implement “queue and push” for long requests. The full source for the Redirect Polling and Empty Chunking methods is on GitHub. Let me know if you have any questions.