Combining Reactive Streams, Heroku Kafka, and Play Framework

Heroku recently announced early access to the new Heroku Kafka service. While I’ve heard great things about Apache Kafka, I hadn’t played with it because I’m too lazy to set that kind of stuff up on my own. Now that I can set up a Kafka cluster just by provisioning a Heroku add-on, I figured it was time to give it a try.

If you aren’t familiar with Kafka, it is kind of a next-generation messaging system. It uses pub-sub, scales horizontally, and has built-in message durability and delivery guarantees. Kafka was originally built at LinkedIn but is now being used by pretty much every progressive enterprise that needs to move massive amounts of data through transformation pipelines.

While learning Kafka I wanted to build something really simple: an event producer that just sends random numbers to a Kafka topic and an event consumer that receives those random numbers and sends them to a browser via a WebSocket. I decided to use Play Framework and the Akka Streams implementation of Reactive Streams.

In Reactive Streams there are the pretty standard “Source” and “Sink” shapes: an event producer is a Source and an event consumer is a Sink. A “Flow” is a pairing between a Source and a Sink with an optional transformation. In my example there are two apps, each with a Flow. A worker process sends random numbers to Kafka, so its Source is periodically generated random numbers and its Sink is Kafka. In the web process the Source is Kafka and the Sink is a WebSocket that pushes the random numbers to the browser.
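As a rough plain-Scala analogy (an illustration only, not the actual Akka Streams API, and without the backpressure that makes Reactive Streams reactive), you can think of the three shapes like this:

```scala
// Rough analogy only: Iterators stand in for the streaming shapes.
// Real Reactive Streams also propagate backpressure from Sink back to Source.
val source: Iterator[Int] = Iterator(1, 2, 3)                    // Source: produces elements
val flow: Iterator[Int] => Iterator[String] = _.map(_.toString)  // Flow: transforms elements
val sink: Iterator[String] => List[String] = _.toList            // Sink: consumes elements

println(sink(flow(source))) // prints List(1, 2, 3)
```

In the real API the pieces are wired together declaratively and nothing moves until the graph is materialized by a Materializer.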

Here is the worker app with some necessary config omitted (check out the full source):

object RandomNumbers extends App {
 
  val tickSource = Source.tick(Duration.Zero, 500.milliseconds, ()).map(_ => Random.nextInt().toString)
 
  kafka.sink("RandomNumbers").map { kafkaSink =>
    tickSource
      .map(new ProducerRecord[String, String]("RandomNumbers", _))
      .to(kafkaSink)
      .run()(app.materializer)
  }
 
}

The tickSource is a Source that generates a new random Int (as a String) every 500 milliseconds. That Source is connected to a Kafka Sink through a Flow that transforms each element into a ProducerRecord for Kafka. This uses the Reactive Kafka library, which provides a Reactive Streams API for working with Kafka.

On the web app side, Play Framework has built-in support for using Reactive Streams with WebSockets, so all we need is a controller method that creates a Source from a Kafka topic and hooks it to a WebSocket Flow (full source):

def ws = WebSocket.acceptOrResult[Any, String] { _ =>
  kafka.source(Set("RandomNumbers")) match {
    case Failure(e) =>
      Future.successful(Left(InternalServerError("Could not connect to Kafka")))
    case Success(source) =>
      val flow = Flow.fromSinkAndSource(Sink.ignore, source.map(_.value))
      Future.successful(Right(flow))
  }
}

Notice that the Flow has a Sink.ignore, which just says to ignore any incoming messages on the WebSocket (those sent from the browser). Play takes care of all the underlying plumbing, and then whenever Kafka gets a message on the “RandomNumbers” topic, it will be sent out via the WebSocket.

And it all works!

Check out the full source for instructions on how to get this example set up on your machine and on Heroku. Let me know how it goes!

The 6 Minute Cloud/Local Dev Roundtrip with Spring Boot

Great developer experiences allow you to go from nothing to something amazing in under ten minutes. So I’m always trying to see how much I can minimize getting-started experiences. My latest attempt is to deploy a Spring Boot app on Heroku, download the source to a developer’s machine, set up & run the app locally, make & test changes, and then redeploy those changes — all in under ten minutes (assuming a fast internet connection). Here is that experience in about six minutes:

To try it yourself, start at the hello-springboot GitHub repo. Let me know how it goes!

Salesforce REST APIs – From Zero to Cloud to Local Dev in Minutes

When getting acquainted with new technologies I believe that users shouldn’t have to spend more than 15 minutes getting something simple up and running. I wanted to apply this idea to building an app on the Salesforce REST APIs, so I built Quick Force (Node). In about 12 minutes you can deploy a Node.js app on Heroku that uses the Salesforce REST APIs, set up OAuth, then pull the app down to your local machine, make and test changes, and then redeploy those changes. Check out a video walkthrough:

Ok, now give it a try yourself by following the instructions in Quick Force (Node)!

I hope this will be the quickest and easiest way you’ve gotten started with the Salesforce REST APIs. Let me know how it goes!

FYI: This *should* work on Windows but I haven’t tested it there yet. So if you have any problems please let me know.

Dreamforce 2015 Video: Tour of Heroku + Salesforce Integration Methods

This year at Dreamforce I presented a session that walked through a few of the ways to integrate Heroku apps with Salesforce. Here is the session description:

Combining customer-facing apps on Heroku with employee-facing apps on Salesforce enables a whole new generation of connected and intelligent experiences. There are four primary ways to do this integration: Heroku Connect, Canvas, Apex / Process Callouts, and the Salesforce REST APIs. Using code and architectural examples, we’ll walk through these different methods. You will walk away knowing when you should use each and how to use them.

Check out the video recording of the session.

To dive deeper into these methods, here are the “Further Learning” resources for each:

I hope this is helpful. Let me know if you have any questions.

Smoothing the Cloud & Local Roundtrip Developer Experience

Getting started with new technologies is usually a huge pain. Often I stumble around for hours trying to get an app’s toolchain set up correctly. Instructions are usually like:

Things get worse when I lead workshops for hundreds of enterprise developers where many are on Windows machines and not very comfortable with cmd.exe.

Experiencing this pain over and over is what led me to create Typesafe Activator as a smooth way to get started with Play Framework, Akka, and Scala. Developers have been thrilled with how easy it is to take their first step with Activator, but I never finished polishing the experience of the second step: App Deployment.

Over the past few months I’ve been working on a set of tools that make the roundtrip between deployment and local development super smooth with zero-CLI and zero-install. Check out a demo:

Here is a summary of the “from scratch” experience:

  1. Deploy the Click, Deploy, Develop app on the cloud
  2. Download the app’s source
  3. Run gulp from a file explorer to download Node, the app’s dependencies, and Atom and then launch the Node / Express server and the Atom code editor
  4. Open the local app in a browser: http://localhost:5000
  5. Make a change in Atom to the app.js file
  6. Test the changes locally
  7. Login to Heroku via Atom
  8. Deploy the changes via Atom

That is one smooth roundtrip!

For more detailed docs on this flow, check out the Click, Deploy, Develop project.

Great dev experience starts with the simplest thing that can possibly work and has layered escape hatches to more complexity.

That kind of developer experience (DX) is something I’ve tried to achieve with this toolchain. It builds on top of tools that can be used directly by advanced users. Underneath the smooth DX is just a normal Node.js / Express app, a Gulp build, and the Atom code editor. Here are the pieces that I’ve built to polish the DX, creating the zero-CLI and zero-install experience:

I hope that others find this useful for helping to give new users a great roundtrip developer experience. Let me know what you think.

Note: Currently gulp-atom-downloader does not support Linux because there isn’t a standalone zip download of Atom for Linux. Hopefully we can get that resolved soon.

Redirecting and Chunking Around Heroku’s 30 Second Request Timeout

In most cases a web request shouldn’t take more than 30 seconds to return a response, so it is for good reason that Heroku has a 30-second request timeout. But there are times when things just take a while. There are different methods for dealing with this. Where possible, the best solution is to offload the job from the web request queue to a background job queue that can be scaled separately. If the requestor needs the result, it can either poll for it or be pushed the value when the background job is complete. Yet there are some cases where this is overkill. For instance, if a web request takes a while but the user interaction must remain blocked (e.g. a modal spinner) until the request is complete, then setting up background jobs for slow requests can be unnecessary.
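The “offload and poll” approach can be sketched with an in-memory job store (hypothetical names; a real setup would use a shared store such as Redis or Postgres so that web and worker dynos see the same state):

```scala
import java.util.UUID
import scala.collection.concurrent.TrieMap

// Sketch of offload-and-poll (hypothetical API): the web request enqueues
// a job and returns an id; the client polls until a worker stores the result.
object JobStore {
  private val results = TrieMap.empty[String, String]

  def submit(): String = UUID.randomUUID().toString      // enqueue a job, return its id

  def complete(id: String, result: String): Unit =       // called by the background worker
    results += (id -> result)

  def poll(id: String): Option[String] = results.get(id) // Some(result) once done, else None
}

val id = JobStore.submit()
println(JobStore.poll(id))     // prints None (still running)
JobStore.complete(id, "done!") // the background worker finishes
println(JobStore.poll(id))     // prints Some(done!)
```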

Let’s look at two different methods for handling long (> 30 seconds) web requests on Heroku. On Heroku the request must start returning some data within 30 seconds or the load balancer will give up. One way to deal with this is to continually wait 25-ish seconds for the result and then redirect the request to do the same thing again. The other option is to periodically dump empty chunks into the response until the actual response can be returned. Each of these methods has tradeoffs, so let’s look at each in more detail. I’ll be using Play Framework and Scala for the examples, but both of these methods could be handled in most frameworks.

Redirect Polling

The Redirect Polling method of dealing with long web requests continuously sends a redirect every 25 seconds until the result is available. Try it out! The downside of this approach is that HTTP clients usually have a maximum number of redirects that they will allow, which limits the total amount of time this method can take. The upside is that the actual response status can be based on the result.
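To get a feel for that limit (hypothetical numbers: the redirect cap varies by client, but around 20 is a common default):

```scala
// Hypothetical numbers: many HTTP clients cap redirects at around 20 by default
val maxRedirects = 20
val secondsPerRequest = 25
// the initial request plus each redirected request can each wait ~25 seconds
val maxTotalSeconds = (maxRedirects + 1) * secondsPerRequest
println(s"Upper bound: about $maxTotalSeconds seconds") // about 525 seconds
```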

Ideally the web framework is Reactive / Non-Blocking so that threads are only used when there is active I/O. In some cases the underlying reason for the long request is another service that is slow. In that case the web request could be fully Reactive, thus preserving resources that would traditionally be wasted in waiting states.

To implement Redirect Polling (Reactively) in Play Framework and Scala I’ll use Akka as a place to run a long job off of the web request thread. The Actor job could be something that is computationally taxing or a long network request. By using Akka Actors I have a simple way to deal with job distribution, failure, and thread pool assignment & management. Here is my very simple Akka Actor job that takes 60 seconds to complete (full source):

class LongJob extends Actor {
 
  lazy val jobFuture: Future[String] = Promise.timeout("done!", 60.seconds)
 
  override def receive = {
    case GetJobResult => jobFuture.pipeTo(sender())
  }
 
}
 
case object GetJobResult

When this Actor receives a GetJobResult message, it creates a job that returns a String after 60 seconds, using a Scala Future. That String is then sent (piped) to the sender of the message.
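The Promise.timeout helper used above comes from Play. A plain-Scala approximation of it (a sketch, not Play’s actual implementation) uses a scheduled executor to complete a Future after a delay without blocking a thread:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future, Promise}

// Sketch of Play's Promise.timeout: complete a Future with a value
// after a delay, without tying up a thread in the meantime.
object DelayedFuture {
  private val scheduler = Executors.newSingleThreadScheduledExecutor { (r: Runnable) =>
    val t = new Thread(r)
    t.setDaemon(true) // don't keep the JVM alive for the scheduler thread
    t
  }

  def timeout[A](value: => A, delay: FiniteDuration): Future[A] = {
    val p = Promise[A]()
    scheduler.schedule(new Runnable {
      def run(): Unit = p.trySuccess(value)
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }
}

// A future that resolves to "done!" after 100 milliseconds
val f = DelayedFuture.timeout("done!", 100.milliseconds)
println(Await.result(f, 1.second)) // prints "done!"
```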

Here is a web request handler that does the Redirect Polling while waiting for a result from the Actor (full source):

def redir(maybeId: Option[String]) = Action.async {
 
  val (actorRefFuture, id) = maybeId.fold {
    // no id so create a job
    val id = UUID.randomUUID().toString
    (Future.successful(actorSystem.actorOf(Props[LongJob], id)), id)
  } { id =>
    (actorSystem.actorSelection(s"user/$id").resolveOne(1.second), id)
  }
 
  actorRefFuture.flatMap { actorRef =>
    actorRef.ask(GetJobResult)(Timeout(25.seconds)).mapTo[String].map { result =>
      // received the result
      actorSystem.stop(actorRef)
      Ok(result)
    } recover {
      // did not receive the result in time so redirect
      case e: TimeoutException => Redirect(routes.Application.redir(Some(id)))
    }
  } recover {
    // did not find the actor specified by the id
    case e: ActorNotFound => InternalServerError("Result no longer available")
  }
 
}

This request handler uses an optional query string parameter (id) as the identifier of the job. Here is the logic for the request handler:

  1. If the id is not specified then a new LongJob Actor instance is created using a new id. Otherwise the Actor is resolved based on its id.
  2. If either a new Actor was created or an existing Actor was found, then the Actor is asked for its result and given 25 seconds to return it. Otherwise an error is returned.
  3. If the result is received within the timeout, the result is returned in a 200 response. Otherwise a redirect response is returned that includes the id in the query string.

This is really just automatic polling for a result using redirects. It would be nice if HTTP had some semantics around the HTTP 202 response code for doing this kind of thing.

Empty Chunking

In the Empty Chunking method of allowing a request to take more than 30 seconds, the web server sends HTTP/1.1 chunks every few seconds until the actual response is ready. Try it out! The downside of this method is that the HTTP response status code must be returned before the actual request’s result is available. The upside is that a single web request can stay open for as long as it needs. To use this method a web framework needs to support chunked responses and ideally is Reactive / Non-Blocking so that threads are only used when there is active I/O.
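For reference, here is roughly what a chunked body looks like on the wire: each chunk is its byte length in hex, a CRLF, the data, and another CRLF, with a zero-length chunk terminating the body (a sketch that assumes ASCII data, where character count equals byte count):

```scala
// Frame a string as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF.
// (Assumes ASCII data so that character count == byte count.)
def chunk(data: String): String = f"${data.length}%x" + "\r\n" + data + "\r\n"

val lastChunk = "0\r\n\r\n" // a zero-length chunk ends the chunked body

// Two keep-alive space chunks, then the real result, then the terminator:
val body = chunk(" ") + chunk(" ") + chunk("done!") + lastChunk
println(body.replace("\r", "\\r").replace("\n", "\\n"))
```

The browser never sees the framing; it just receives a response that trickles in spaces until the real payload arrives.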

This method doesn’t require an Actor like the Redirect Polling method. A Future could be used instead but I wanted to keep the job piece the same for both methods. Here is a web request handler that does the empty chunking (full source):

def chunker = Action {
  val actorRef = actorSystem.actorOf(Props[LongJob])
  val futureResult = actorRef.ask(GetJobResult)(Timeout(2.minutes)).mapTo[String]
  futureResult.onComplete(_ => actorSystem.stop(actorRef)) // stop the actor
 
  val enumerator = Enumerator.generateM {
    // output spaces until the future is complete
    if (futureResult.isCompleted) Future.successful(None)
    else Promise.timeout(Some(" "), 5.seconds)
  } andThen {
    // return the result
    Enumerator.flatten(futureResult.map(Enumerator(_)))
  }
 
  Ok.chunked(enumerator)
}

This web request handler does the following when a request comes in:

  1. An instance of the LongJob Actor is created
  2. The Actor instance (actorRef) is asked for the result of the GetJobResult message and given two minutes to receive a result which is mapped to a String.
  3. An onComplete handler stops the Actor instance after a result is received or the request has timed out.
  4. An Enumerator is created that outputs spaces every five seconds until the result has been received or has timed out, at which point the result itself is output and the stream completes.
  5. An HTTP 200 response is returned that is set up to chunk the output of the Enumerator

That is it! I’ve used this method in a number of places with Scala and Java in Play Framework, making them fully Reactive. This logic could be wrapped into something more reusable. Let me know if you need that or need a Java example.

Wrapping Up

As you have seen, it is pretty easy to have traditional web requests that take longer than 30 seconds on Heroku. While this is not ideal for background jobs, it can be an easy way to deal with situations where implementing “queue and push” for long requests is overkill. The full source for the Redirect Polling and Empty Chunking methods is on GitHub. Let me know if you have any questions.

Auto-Deploy GitHub Repos to Heroku

My favorite new feature on Heroku is the GitHub Integration which enables auto-deployment of GitHub repos. Whenever a change is made on GitHub the app can be automatically redeployed on Heroku. You can even tell Heroku to wait until the CI tests pass before doing the deployment. I now use this on almost all of my Heroku apps because it allows me to move faster and do less thinking (which I’m fond of).

For apps like jamesward.com I just enable deployment straight to production. But for apps that need a less risky setup I have a full Continuous Delivery pipeline that looks like this:

  1. Push to GitHub
  2. CI Validates the build
  3. Heroku deploys changes to staging
  4. Manual testing / validation of staging
  5. Using Heroku Pipelines, promote staging to production

I’m loving the flexibility and simplicity of this new feature! Check out a quick screencast to see how to setup and use Heroku GitHub auto-deployment:

Notice that none of this required a command line! How cool is that?!?

Jekyll on Heroku

Jekyll is a simple static content compiler popularized by GitHub Pages. If you use Jekyll in a GitHub repo, a static website will automatically be created for you by running Jekyll on your content sources (e.g. Markdown). That works well, but there are cases where it is nice to deploy a Jekyll site on Heroku. After trying (and failing) to follow many of the existing blogs about running Jekyll on Heroku, I cornered my coworker Terence Lee and got some help. Turns out it is pretty simple.

Not interested in the details? Skip right to a diff that makes a Jekyll site deployable on Heroku or start from scratch:

Deploy on Heroku

Here are the step by step instructions:

  1. Add a Gemfile in the Jekyll project root containing:

    source 'https://rubygems.org'
    ruby '2.1.2'
    gem 'jekyll'
    gem 'kramdown'
    gem 'rack-jekyll'
    gem 'rake'
    gem 'puma'
  2. Run: bundle install
  3. Create a Procfile telling Heroku how to serve the web site with Puma:

    web: bundle exec puma -t 8:32 -w 3 -p $PORT
  4. Create a Rakefile which tells Heroku’s slug compiler to build the Jekyll site as part of the assets:precompile Rake task:

    namespace :assets do
      task :precompile do
        puts `bundle exec jekyll build`
      end
    end
  5. Add the following lines to the _config.yml file:

    gems: ['kramdown']
    exclude: ['config.ru', 'Gemfile', 'Gemfile.lock', 'vendor', 'Procfile', 'Rakefile']
  6. Add a config.ru file containing:

    require 'rack/jekyll'
    require 'yaml'
    run Rack::Jekyll.new

That is it! When you do the usual git push heroku master deployment, the standard Ruby buildpack will kick off the Jekyll compiler, and when your app runs, Puma will serve the static assets. If you are starting from scratch, just clone my jekyll-heroku repo and you should have everything you need.

To run Jekyll locally using the dependencies in the project, run:

bundle exec jekyll serve --watch

Let me know how it goes.

New Adventures with Play, Scala, and Akka at Typesafe

Today I’m heading out on a new adventure at Typesafe, the company behind Play Framework, Scala, and Akka!

The past year and a half at Heroku has been really amazing. Not only have I enjoyed teaching others about Heroku, I’ve enjoyed my own frequent use of Heroku. It says something when a technology switch makes one never want to go back to the way it was done before. This is the experience that I (and many others) have had with Heroku. I can’t imagine going back to managing servers and painful deployments. I’m certainly a Heroku Evangelist for Life, but it’s time for a new adventure.

Typesafe is assembling a platform that is quickly becoming the foundation for next-generation software. Working with the Typesafe technologies has been a delight and has opened my eyes to the future of scalable software. To build more scalable applications we need better programming models for asynchronous and concurrent code. Web applications are also transitioning to a modern Client/Server architecture where the Client is executed in the browser via JavaScript and the Server is a set of RESTful JSON services.

The Typesafe Stack brings together a scalable foundation and a modern web framework to form a platform that is perfect for the next generation of software. I’m incredibly excited to continue learning Play, Scala, and Akka while also helping other developers do the same!

Working at Heroku has been a magnificent adventure and I’m sad to see it end. I will certainly continue showing off Heroku and using it myself, especially combined with the Typesafe Stack!