Quick & Easy ETL from Salesforce to MySQL with Workflow & Heroku

Data silos that need to share data are an unfortunate but common necessity. The Extract, Transform, and Load (ETL) pattern has been around for a long time to address this need and there are tons of solutions out there. But if you just need a quick and easy way to copy new & updated records in Salesforce to an external data store, a simple Heroku app and a Salesforce Workflow might be the quickest and easiest solution. I’ve put together a sample Node.js application for this: https://github.com/jamesward/salesforce-etl-mysql

Check out a demo:

This sample app uses a Salesforce Workflow that sends created and updated Contact records to an app on Heroku, which inserts or updates those records in a MySQL database. A simple JavaScript transform function makes it easy to customize this app for your own use case. To set up this app for your own use, check out the instructions in the GitHub repo: https://github.com/jamesward/salesforce-etl-mysql
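The transform step can be as small as a single pure function that maps an incoming Salesforce record onto the row you want in MySQL. The sample app itself is Node.js; here is just a minimal Scala sketch of the shape of that idea, and the field names are illustrative assumptions, not the app's actual code:

```scala
// Hypothetical row type for the MySQL side
case class ContactRow(id: String, firstName: String, lastName: String, email: String)

// Map a Salesforce record (here just a Map of field -> value) to a MySQL row.
// The normalization (trim/lowercase on email) stands in for whatever
// transformation your use case needs.
def transform(sfRecord: Map[String, String]): ContactRow =
  ContactRow(
    id        = sfRecord.getOrElse("Id", ""),
    firstName = sfRecord.getOrElse("FirstName", ""),
    lastName  = sfRecord.getOrElse("LastName", ""),
    email     = sfRecord.getOrElse("Email", "").trim.toLowerCase
  )
```

Because the transform is a pure function, it is trivial to unit test independently of both Salesforce and MySQL.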

Let me know how it goes!

Machine Learning on Heroku with PredictionIO

Last week at the TrailheaDX Salesforce Dev Conference we launched the DreamHouse sample application to showcase the Salesforce App Cloud and numerous possible integrations. I built an integration with the open source PredictionIO Machine Learning framework. The use case for ML in DreamHouse is a real estate recommendation engine that learns based on users with similar favorites. Check out a demo and get the source.

For the DreamHouse PredictionIO integration to work I needed to get the PredictionIO service running on Heroku. Since it is a Scala app, everything worked great! Here are the steps to get PredictionIO up and running on Heroku.

First you will need a PredictionIO event server and app defined in the event server:

  1. Deploy:
  2. Create an app:

    heroku run -a <APP NAME> console app new <A PIO APP NAME>
  3. List apps:

    heroku run -a <APP NAME> console app list

Check out the source and local dev instructions for the event server.

Now that you have an event server and app, load some event data:

export URL=http://<YOUR HEROKU APP NAME>.herokuapp.com
export ACCESS_KEY=<YOUR APP ACCESS KEY>
for i in {1..5}; do curl -i -X POST $URL/events.json?accessKey=$ACCESS_KEY -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"user\", \"entityId\" : \"u$i\" }"; done
for i in {1..50}; do curl -i -X POST $URL/events.json?accessKey=$ACCESS_KEY -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"item\", \"entityId\" : \"i$i\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"; done
for i in {1..5}; do curl -i -X POST $URL/events.json?accessKey=$ACCESS_KEY -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u$i\",  \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i$(( ( RANDOM % 50 )  + 1 ))\" }"; done
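The JSON bodies those curl loops send are simple enough to build with a couple of helpers. This Scala sketch (hypothetical helper names) just produces the same `$set` and `view` event payloads shown above:

```scala
// Build a PredictionIO "$set" event for a user entity, matching the
// payloads sent by the curl loops above.
def setUserEvent(userId: String): String =
  s"""{ "event" : "$$set", "entityType" : "user", "entityId" : "$userId" }"""

// Build a "view" event linking a user to an item.
def viewEvent(userId: String, itemId: String): String =
  s"""{ "event" : "view", "entityType" : "user", "entityId" : "$userId", "targetEntityType" : "item", "targetEntityId" : "$itemId" }"""
```

Each payload is POSTed to `/events.json?accessKey=...` with a `Content-Type: application/json` header, exactly as in the shell loops.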

Check out the demo data:

http://<YOUR HEROKU APP NAME>.herokuapp.com/events.json?accessKey=<YOUR APP ACCESS KEY>&limit=-1

Now you need an engine that will learn from a set of training data and then be able to make predictions. With PredictionIO you can use any algorithm you want but often SparkML is a great choice. For this simple example I’m just using single-node Spark and Postgres but the underlying data source and ML engine can be anything.

This example is based on PredictionIO’s Recommendation Template so it uses SparkML’s Alternating Least Squares (ALS) algorithm. To deploy it on Heroku follow these steps:

  1. Deploy:
  2. Attach your PredictionIO Event Server’s Postgres:

    heroku addons:attach <YOUR-ADDON-ID> -a <YOUR HEROKU APP NAME>

    Note: You can find out <YOUR-ADDON-ID> by running:

    heroku addons -a <YOUR EVENT SERVER HEROKU APP NAME>

  3. Train the app:

    heroku run -a <YOUR HEROKU APP NAME> train
  4. Restart the app to load the new training data:

    heroku restart -a <YOUR HEROKU APP NAME>
  5. Check the status of your engine:

    http://<YOUR HEROKU APP NAME>.herokuapp.com

Now you can check out the recommendations for an item (must be an item that has events):

curl -H "Content-Type: application/json" -d '{ "items": ["i11"], "num": 4 }' -k http://<YOUR HEROKU APP NAME>.herokuapp.com/queries.json

Check out the source and local dev instructions for this example engine.

Let me know if you have any questions or problems. Happy ML’ing!

Combining Reactive Streams, Heroku Kafka, and Play Framework

Heroku recently announced early access to the new Heroku Kafka service and while I’ve heard great things about Apache Kafka I hadn’t played with it because I’m too lazy to set that kind of stuff up on my own. Now that I can set up a Kafka cluster just by provisioning a Heroku add-on, I figured it was time to give it a try.

If you aren’t familiar with Kafka, it is kind of a next-generation messaging system. It uses pub-sub, scales horizontally, and has built-in message durability and delivery guarantees. Kafka was originally built at LinkedIn but is now used by pretty much every progressive enterprise that needs to move massive amounts of data through transformation pipelines.

While learning Kafka I wanted to build something really simple: an event producer that just sends random numbers to a Kafka topic, and an event consumer that receives those random numbers and sends them to a browser via a WebSocket. I decided to use Play Framework and the Akka Streams implementation of Reactive Streams.

In Reactive Streams there are the pretty standard concepts of a “Source” and a “Sink”: an event producer is a Source and a consumer is a Sink. A “Flow” is a pairing between a Source and a Sink with an optional transformation. In my example there are two apps, each with a Flow. A worker process sends random numbers to Kafka, so its Source is periodically generated random numbers and its Sink is Kafka. In the web process the Source is Kafka and the Sink is a WebSocket that pushes the random numbers to the browser.
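As a rough analogy (not Akka Streams, just the shape of the idea), you can think of a Source as an Iterator, a Sink as a plain function, and a Flow as the pairing of the two with a transformation in between:

```scala
import scala.collection.mutable.ListBuffer

// Plain-Scala stand-ins to make the Source / Flow / Sink roles concrete:
// a Source produces elements, a Sink consumes them, and a Flow connects
// the two with an optional transformation.
type Source[A] = Iterator[A]
type Sink[A]   = A => Unit

def runFlow[A, B](source: Source[A], transform: A => B)(sink: Sink[B]): Unit =
  source.map(transform).foreach(sink)

// The worker's flow in miniature: numbers -> strings -> "Kafka" (here a buffer)
val sent = ListBuffer.empty[String]
runFlow(Iterator(1, 2, 3), (n: Int) => n.toString)(sent += _)
```

The real Akka Streams versions add backpressure and asynchronous execution, but the roles are the same.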

Here is the worker app with some necessary config omitted (check out the full source):

object RandomNumbers extends App {
  val tickSource = Source.tick(Duration.Zero, 500.milliseconds, Unit).map(_ => Random.nextInt().toString)
  kafka.sink("RandomNumbers").map { kafkaSink =>
    tickSource.map(new ProducerRecord[String, String]("RandomNumbers", _)).to(kafkaSink).run()
  }
}

The tickSource is a Source that generates a new random Int every 500 milliseconds. That Source is connected to a Kafka Sink with a Flow that transforms an Int into a ProducerRecord (for Kafka). This uses the Reactive Kafka library which is a Reactive Streams API for working with Kafka.

On the web app side, Play Framework has built-in support for using Reactive Streams with WebSockets so all we need is a controller method that creates a Source from a Kafka topic and hooks that to a WebSocket Flow (full source):

def ws = WebSocket.acceptOrResult[Any, String] { _ =>
  kafka.source(Set("RandomNumbers")) match {
    case Failure(e) =>
      Future.successful(Left(InternalServerError("Could not connect to Kafka")))
    case Success(source) =>
      val flow = Flow.fromSinkAndSource(Sink.ignore, source.map(_.value))
      Future.successful(Right(flow))
  }
}

Notice that the Flow has a Sink.ignore which just says to ignore any incoming messages on the WebSocket (those sent from the browser). Play takes care of all the underlying stuff and then whenever Kafka gets a message on the “RandomNumbers” topic, it will be sent out via the WebSocket.

And it all works!

Check out the full source for instructions on how to get this example setup on your machine and on Heroku. Let me know how it goes!

The 6 Minute Cloud/Local Dev Roundtrip with Spring Boot

Great developer experiences allow you to go from nothing to something amazing in under ten minutes. So I’m always trying to see how much I can minimize getting-started experiences. My latest attempt is to deploy a Spring Boot app on Heroku, download the source to a developer’s machine, set up & run the app locally, make & test changes, and then redeploy those changes — all in under ten minutes (assuming a fast internet connection). Here is that experience in about six minutes:

To try it yourself, start at the hello-springboot GitHub repo. Let me know how it goes!

Salesforce REST APIs – From Zero to Cloud to Local Dev in Minutes

When getting acquainted with new technologies I believe that users shouldn’t have to spend more than 15 minutes getting something simple up and running. I wanted to apply this idea to building an app on the Salesforce REST APIs so I built Quick Force (Node). In about 12 minutes you can deploy a Node.js app on Heroku that uses the Salesforce REST APIs, set up OAuth, then pull the app down to your local machine, make and test changes, and then redeploy those changes. Check out a video walkthrough:

Ok, now give it a try yourself by following the instructions in Quick Force (Node)!

I hope this will be the quickest and easiest way you’ve gotten started with the Salesforce REST APIs. Let me know how it goes!

FYI: This *should* work on Windows but I haven’t tested it there yet. So if you have any problems please let me know.

Dreamforce 2015 Video: Tour of Heroku + Salesforce Integration Methods

This year at Dreamforce I presented a session that walked through a few of the ways to integrate Heroku apps with Salesforce. Here is the session description:

Combining customer-facing apps on Heroku with employee-facing apps on Salesforce enables a whole new generation of connected and intelligent experiences. There are four primary ways to do this integration: Heroku Connect, Canvas, Apex / Process Callouts, and the Salesforce REST APIs. Using code and architectural examples, we’ll walk through these different methods. You will walk away knowing when you should use each and how to use them.

Check out the video recording of the session.

To dive into these methods here are the “Further Learning” resources for each method:

I hope this is helpful. Let me know if you have any questions.

Smoothing the Cloud & Local Roundtrip Developer Experience

Getting started with new technologies is usually a huge pain. Often I stumble around for hours trying to get an app’s toolchain setup correctly. Instructions are usually like:

Things get worse when I lead workshops for hundreds of enterprise developers where many are on Windows machines and not very comfortable with cmd.exe.

Experiencing this pain over and over is what led me to create Typesafe Activator as a smooth way to get started with Play Framework, Akka, and Scala. Developers have been thrilled with how easy it is to take their first step with Activator, but I never finished polishing the experience of the second step: App Deployment.

Over the past few months I’ve been working on a set of tools that make the roundtrip between deployment and local development super smooth with zero-CLI and zero-install. Check out a demo:

Here is a summary of the “from scratch” experience:

  1. Deploy the Click, Deploy, Develop app on the cloud
  2. Download the app’s source
  3. Run gulp from a file explorer to download Node, the app’s dependencies, and Atom and then launch the Node / Express server and the Atom code editor
  4. Open the local app in a browser: http://localhost:5000
  5. Make a change in Atom to the app.js file
  6. Test the changes locally
  7. Login to Heroku via Atom
  8. Deploy the changes via Atom

That is one smooth roundtrip!

For more detailed docs on this flow, check out the Click, Deploy, Develop project.

Great dev experience starts with the simplest thing that can possibly work and has layered escape hatches to more complexity.

That kind of developer experience (DX) is something I’ve tried to do with this toolchain. It builds on top of tools that can be used directly by advanced users. Underneath the smooth DX is just a normal Node.js / Express app, a Gulp build, and the Atom code editor. Here are the pieces that I’ve built to polish the DX, creating the zero-CLI and zero-install experience:

I hope that others find this useful for helping to give new users a great roundtrip developer experience. Let me know what you think.

Note: Currently gulp-atom-downloader does not support Linux because there isn’t a standalone zip download of Atom for Linux. Hopefully we can get that resolved soon.

Redirecting and Chunking Around Heroku’s 30 Second Request Timeout

In most cases a web request shouldn’t take more than 30 seconds to return a response so it is for good reason that Heroku has a 30 second request timeout. But there are times when things just take a while. There are different methods for dealing with this. Where possible, the best solution is to offload the job from the web request queue and have a background job queue that can be scaled separately. If the requestor needs the result then it can either poll for it or be pushed the value when the background job is complete. Yet there are some cases where this is overkill. For instance, if a web request takes a while but the user interaction must remain blocked (e.g. a modal spinner) until the request is complete, then setting up background jobs for slow requests can be unnecessary.
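The background-job-with-polling alternative described above can be sketched in a few lines. All names here are hypothetical; this just shows the pattern of returning a job id immediately while the work runs off the request thread, with a status lookup the client can poll:

```scala
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.collection.concurrent.TrieMap

// Completed results keyed by job id (a real app would use a durable store)
val results = TrieMap.empty[String, String]

// Kick off the work on a background thread and return the id right away,
// so the web request itself finishes well within the 30 second limit.
def enqueue(id: String)(work: => String): String = {
  Future { results.put(id, work()) }
  id
}

// The client polls this until it returns Some(result)
def status(id: String): Option[String] = results.get(id)
```

This is the more scalable approach for genuinely long jobs, but as the examples below show, it can be overkill when the user is blocked waiting anyway.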

Let’s look at two different methods for handling long (> 30 seconds) web requests on Heroku. On Heroku the request must start returning some data within 30 seconds or the load balancer will give up. One way to deal with this is to continually wait 25ish seconds for the result and then redirect the request to do the same thing again. The other option is to periodically dump empty chunks into the response until the actual response can be returned. Each of these methods has tradeoffs, so let’s look at each in more detail. I’ll be using Play Framework and Scala for the examples but both of these methods could be implemented in most frameworks.

Redirect Polling

The Redirect Polling method of dealing with long web requests continuously sends a redirect every 25 seconds until the result is available. Try it out! The downside of this approach is that HTTP clients usually have a maximum number of redirects that they will allow which limits the total amount of time this method can take. The upside is that the actual response status can be based on the result.

Ideally the web framework is Reactive / Non-Blocking so that threads are only used when there is active I/O. In some cases the underlying reason for the long request is another service that is slow. In that case the web request could be fully Reactive, thus preserving resources that would traditionally be wasted in waiting states.

To implement Redirect Polling (Reactively) in Play Framework and Scala I’ll use Akka as a place to run a long job off of the web request thread. The Actor job could be something that is computationally taxing or a long network request. By using Akka Actors I have a simple way to deal with job distribution, failure, and thread pool assignment & management. Here is my very simple Akka Actor job that takes 60 seconds to complete (full source):

class LongJob extends Actor {
  lazy val jobFuture: Future[String] = Promise.timeout("done!", 60.seconds)
  override def receive = {
    case GetJobResult => jobFuture.pipeTo(sender())
  }
}

case object GetJobResult

When this Actor receives a GetJobResult message, it creates a job that in 60 seconds returns a String using a Scala Future. That String is sent (piped) to the sender of the message.

Here is a web request handler that does the Redirect Polling while waiting for a result from the Actor (full source):

def redir(maybeId: Option[String]) = Action.async {
  val (actorRefFuture, id) = maybeId.fold {
    // no id so create a job
    val id = UUID.randomUUID().toString
    (Future.successful(actorSystem.actorOf(Props[LongJob], id)), id)
  } { id =>
    // resolve the existing job actor from the id
    (actorSystem.actorSelection(s"user/$id").resolveOne(1.second), id)
  }

  actorRefFuture.flatMap { actorRef =>
    actorRef.ask(GetJobResult)(Timeout(25.seconds)).mapTo[String].map { result =>
      // received the result
      Ok(result)
    } recover {
      // did not receive the result in time so redirect
      case e: TimeoutException => Redirect(routes.Application.redir(Some(id)))
    }
  } recover {
    // did not find the actor specified by the id
    case e: ActorNotFound => InternalServerError("Result no longer available")
  }
}

This request handler uses an optional query string parameter (id) as the identifier of the job. Here is the logic for the request handler:

  1. If the id is not specified then a new LongJob Actor instance is created using a new id. Otherwise the Actor is resolved based on its id.
  2. If either a new Actor was created or an existing Actor was found, then the Actor is asked for its result and given 25 seconds to return it. Otherwise an error is returned.
  3. If the result is received within the timeout, the result is returned in a 200 response. Otherwise a redirect response is returned that includes the id in the query string.

This is really just automatic polling for a result using redirects. It would be nice if HTTP had some semantics around the HTTP 202 response code for doing this kind of thing.
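From the client’s point of view, the whole exchange can be sketched with a couple of hypothetical types: each request either returns the result or a redirect carrying the job id, and the client’s redirect limit caps the total wait:

```scala
// Hypothetical response model for the redirect-polling exchange
sealed trait Resp
case class Done(result: String) extends Resp
case class RedirectTo(id: String) extends Resp

// Follow redirects until a result arrives or the redirect limit is hit.
// This is exactly what an HTTP client does automatically, which is why
// its maximum-redirects setting bounds how long this method can take.
def follow(request: Option[String] => Resp, maxRedirects: Int): Option[String] = {
  def loop(id: Option[String], left: Int): Option[String] = request(id) match {
    case Done(r)                      => Some(r)
    case RedirectTo(next) if left > 0 => loop(Some(next), left - 1)
    case _                            => None // redirect limit exceeded
  }
  loop(None, maxRedirects)
}
```

With a 25 second wait per hop and a typical client limit of around 20 redirects, that bounds the total job time to roughly eight minutes.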

Empty Chunking

In the Empty Chunking method of allowing a request to take more than 30 seconds the web server sends HTTP/1.1 chunks every few seconds until the actual response is ready. Try it out! The downside of this method is that the HTTP response status code must be returned before the actual request’s result is available. The upside is that a single web request can stay open for as long as it needs. To use this method a web framework needs to support chunked responses and ideally is Reactive / Non-Blocking so that threads are only used when there is active I/O.

This method doesn’t require an Actor like the Redirect Polling method. A Future could be used instead but I wanted to keep the job piece the same for both methods. Here is a web request handler that does the empty chunking (full source):

def chunker = Action {
  val actorRef = actorSystem.actorOf(Props[LongJob])
  val futureResult = actorRef.ask(GetJobResult)(Timeout(2.minutes)).mapTo[String]
  futureResult.onComplete(_ => actorSystem.stop(actorRef)) // stop the actor

  val enumerator = Enumerator.generateM {
    // output spaces until the future is complete
    if (futureResult.isCompleted) Future.successful(None)
    else Promise.timeout(Some(" "), 5.seconds)
  } andThen {
    // return the result once it is available
    Enumerator.flatten(futureResult.map(Enumerator(_)))
  }

  Ok.chunked(enumerator)
}

This web request handler does the following when a request comes in:

  1. An instance of the LongJob Actor is created
  2. The Actor instance (actorRef) is asked for the result of the GetJobResult message and given two minutes to receive a result which is mapped to a String.
  3. An onComplete handler stops the Actor instance after a result is received or request has timed out.
  4. An Enumerator is created that outputs spaces every five seconds until the result has been received or timed out, at which time the result is outputted and is done.
  5. An HTTP 200 response is returned that is set up to chunk the output of the Enumerator

That is it! I’ve used this method in a number of places with Scala and Java in Play Framework making them fully Reactive. This logic could be wrapped into something more reusable. Let me know if you need that or need a Java example.
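The core of the technique can also be sketched without Play or Akka at all: keep emitting keep-alive chunks until the slow result is ready, then emit the result. This plain-Scala simulation just collects the chunks into a list instead of streaming them as HTTP/1.1 chunks:

```scala
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Emit a " " keep-alive chunk every `tick` until `result` completes,
// then emit the result itself as the final chunk.
def chunkUntilDone(result: Future[String], tick: FiniteDuration): List[String] = {
  def loop(acc: List[String]): List[String] =
    if (result.isCompleted) acc :+ Await.result(result, tick)
    else { Thread.sleep(tick.toMillis); loop(acc :+ " ") }
  loop(Nil)
}
```

In the real version the keep-alive interval just has to be comfortably under the 30 second limit; five seconds leaves plenty of margin.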

Wrapping Up

As you have seen, it is pretty easy to have traditional web requests that take longer than 30 seconds on Heroku. While this is not ideal for background jobs, it can be an easy way to deal with situations where implementing “queue and push” for long requests would be overkill. The full source for the Redirect Polling and Empty Chunking methods is on GitHub. Let me know if you have any questions.

Auto-Deploy GitHub Repos to Heroku

My favorite new feature on Heroku is the GitHub Integration which enables auto-deployment of GitHub repos. Whenever a change is made on GitHub the app can be automatically redeployed on Heroku. You can even tell Heroku to wait until the CI tests pass before doing the deployment. I now use this on almost all of my Heroku apps because it allows me to move faster and do less thinking (which I’m fond of).

For apps like jamesward.com I just enable deployment straight to production. But for apps that need a less risky setup I have a full Continuous Delivery pipeline that looks like this:

  1. Push to GitHub
  2. CI Validates the build
  3. Heroku deploys changes to staging
  4. Manual testing / validation of staging
  5. Using Heroku Pipelines, promote staging to production

I’m loving the flexibility and simplicity of this new feature! Check out a quick screencast to see how to setup and use Heroku GitHub auto-deployment:

Notice that none of this required a command line! How cool is that?!?