Machine Learning on Heroku with PredictionIO

Last week at the TrailheaDX Salesforce Dev Conference we launched the DreamHouse sample application to showcase the Salesforce App Cloud and numerous possible integrations. I built an integration with the open source PredictionIO Machine Learning framework. The use case for ML in DreamHouse is a real estate recommendation engine that learns based on users with similar favorites. Check out a demo and get the source.

For the DreamHouse PredictionIO integration to work I needed to get the PredictionIO service running on Heroku. Since it is a Scala app everything worked great! Here are the steps to get PredictionIO up and running on Heroku.

First you will need a PredictionIO event server and app defined in the event server:

  1. Deploy:
  2. Create an app:

    heroku run -a <APP NAME> console app new <A PIO APP NAME>

  3. List apps:

    heroku run -a <APP NAME> console app list

Check out the source and local dev instructions for the event server.

Now that you have an event server and app, load some event data (the access key for your app is shown in the app list output):

export ACCESS_KEY=<YOUR ACCESS KEY>
export URL=http://<YOUR HEROKU APP NAME>.herokuapp.com
for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"user\", \"entityId\" : \"u$i\" }"; done

for i in {1..50}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"item\", \"entityId\" : \"i$i\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"; done

for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u$i\",  \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i$(( ( RANDOM % 50 )  + 1 ))\" }"; done
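
The escaped quotes in those one-liners are easy to get wrong, so here is a rough sketch of the same "view" event built with printf instead. The function names view_event and post_event are my own hypothetical helpers, not part of PredictionIO, and post_event assumes URL and ACCESS_KEY are exported as above.

```shell
# Hypothetical helper: prints the JSON body for a "view" event that links
# a user ($1) to an item ($2). Field names match the curl commands above.
view_event() {
  printf '{ "event": "view", "entityType": "user", "entityId": "%s", "targetEntityType": "item", "targetEntityId": "%s" }' "$1" "$2"
}

# Hypothetical helper: POSTs the JSON body in $1 to the event server.
# Assumes URL and ACCESS_KEY are already exported.
post_event() {
  curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Example call (needs a running event server, so it is commented out):
# post_event "$(view_event u1 i7)"
```

Keeping the payload builder separate from the network call means you can print and inspect the JSON before sending anything.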

Check out the demo data:

http://<YOUR HEROKU APP NAME>.herokuapp.com/events.json?accessKey=<YOUR APP ACCESS KEY>&limit=-1
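
If you would rather fetch that URL from a terminal than a browser, the & in the query string has to be quoted or the shell will treat it as a background operator. A small sketch, assuming a hypothetical helper named events_url (my own name, not a PredictionIO command):

```shell
# Hypothetical helper: prints the event-listing URL for a Heroku app name ($1),
# an access key ($2) and an optional limit ($3). The limit defaults to -1,
# the same value used in the URL above.
events_url() {
  printf 'http://%s.herokuapp.com/events.json?accessKey=%s&limit=%s' "$1" "$2" "${3:--1}"
}

# Example call (needs a running event server, so it is commented out):
# curl "$(events_url my-event-server "$ACCESS_KEY" 10)"
```

Passing a small limit such as 10 is handy once the event store grows beyond the demo data.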

Now you need an engine that will learn from a set of training data and then be able to make predictions. With PredictionIO you can use any algorithm you want, but SparkML is often a great choice. For this simple example I’m just using single-node Spark and Postgres, but the underlying data source and ML engine can be anything.

This example is based on PredictionIO’s Recommendation Template so it uses SparkML’s Alternating Least Squares (ALS) algorithm. To deploy it on Heroku follow these steps:

  1. Deploy:
  2. Attach your PredictionIO Event Server’s Postgres:

    heroku addons:attach <YOUR-ADDON-ID> -a <YOUR HEROKU APP NAME>

    Note: You can find out <YOUR-ADDON-ID> by running:

    heroku addons -a <YOUR EVENT SERVER HEROKU APP NAME>

  3. Train the app:

    heroku run -a <YOUR HEROKU APP NAME> train

  4. Restart the app to load the new training data:

    heroku restart -a <YOUR HEROKU APP NAME>

  5. Check the status of your engine:

    http://<YOUR HEROKU APP NAME>.herokuapp.com
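
Since you will repeat steps 3 to 5 every time the event data changes, they can be wrapped in a small script. This is only a sketch: run and retrain are hypothetical function names of my own, and the DRY_RUN switch is an assumption added so the commands can be previewed without touching Heroku.

```shell
# Hypothetical wrapper: with DRY_RUN=1 it prints the command instead of
# executing it; otherwise it runs the command as given.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: train, restart, then check the engine app named in $1.
# The && chain stops as soon as one of the local commands exits non-zero.
retrain() {
  app="$1"
  run heroku run -a "$app" train &&
  run heroku restart -a "$app" &&
  run curl -s "http://$app.herokuapp.com"
}

# Example: preview the commands without running them
# DRY_RUN=1 retrain my-engine-app
```

Previewing with DRY_RUN=1 first is a cheap way to confirm the app name before kicking off a training run.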

Now you can check out the recommendations for an item (must be an item that has events):

curl -H "Content-Type: application/json" -d '{ "items": ["i11"], "num": 4 }' -k http://<YOUR HEROKU APP NAME>.herokuapp.com/queries.json
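
To try several items without retyping the JSON, the query can be wrapped in a function. This is a sketch under the assumption that the engine accepts the same query shape as the curl command above; similar_query and recommend are hypothetical names of my own.

```shell
# Hypothetical helper: prints the query body for one item ID ($1) and the
# number of recommendations wanted ($2).
similar_query() {
  printf '{ "items": ["%s"], "num": %d }' "$1" "$2"
}

# Hypothetical helper: sends the query to the engine app named in $1
# for item $2, asking for $3 results.
recommend() {
  curl -s -H "Content-Type: application/json" \
    -d "$(similar_query "$2" "$3")" \
    "http://$1.herokuapp.com/queries.json"
}

# Example call (needs a trained, running engine, so it is commented out):
# recommend my-engine-app i11 4
```

Remember that the item you pass must be one that has events, otherwise the engine has nothing to base a recommendation on.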

Check out the source and local dev instructions for this example engine.

Let me know if you have any questions or problems. Happy ML’ing!

  • holografix

    Fantastic work James, I’m trying to get Prediction.io running locally as per their installation instructions and I’m struggling to get the Classification template working – I keep getting an unresolved dependency issue with io itself.
    Any hints on replicating what you’ve done here but instead for the Classification engine? My goal is to have it running on Heroku as well of course.

    • Sorry I missed this comment! It is a bit tricky to fit the out-of-the-box templates into the format I’m using here. I think the PIO team is working on making this easier over time. Until then you should be able to take the actual engine code and merge it with my engine project. If you use IntelliJ it should make that a bit easier. Let me know if I can help!

  • Mehdi

    Hello,
    I tried to follow the steps. In step 1, I am asked to fill in required config variables, but I don’t see which values should be set.
    Could you explain how I can find these values?
    Thank you in advance,
    Mehdi

    • Hi Mehdi,

      Sorry for the hassles. There are a lot of pieces that need to fit together with this one and I’m sure my instructions could be better! Is this for the engine app? If so, here are the values you need:
      ACCESS_KEY & APP_NAME = These should have been created in a previous step. They come from the event server.

      EVENT_SERVER_IP = The server name of the event server, like: foobar.herokuapp.com

      EVENT_SERVER_PORT = Just use “80” for the port since I’m not sure it works with https.

      Let me know if that helps.

      -James

      • Mehdi

        Hi James,

        Thank you for your help, I was able to create an engine server. I reached step 3 (Train the app) and got an error: “[ERROR] [Common$] Invalid app name application-mehdi”
        I tried to change the variable APP_NAME in the engine server but I got the same error:
        “[ERROR] [Common$] Invalid app name eventserver-mehdi”.

        Mehdi

        • Can you do:
          heroku run -a <APP NAME> console app list

          That should give you the app name you need to use in the engine.

          • Mehdi

            Hi James,

            Thank you for your feedback. I tried the name from the CLI command you indicated, but it still gives the error. I guess I have to redo the whole process one more time; I probably missed a step.

            Thank you a lot for your support, great job with PredictionIO

            Mehdi

          • Maybe the problem is actually due to one of the config vars being wrong. If you go to your engine app in https://dashboard.heroku.com you can go to the app settings and check that they are correct.

  • @jlward4th What Heroku dynos are you using? I am trying an experiment that runs fine on my local machine, but on Heroku the build fails with a memory exception. Are you running this on a free dyno or a performance dyno?

    • I think it worked fine on a regular dyno but I had to change the stack size: https://github.com/jamesward/pio-engine-heroku/blob/master/app.json#L23

      That should have been done automatically if you deployed via Heroku Button.

      • Thanks @jlward4th. I guess the difference is that I am using one of the standard templates, and your engine code differs from the one that comes OOB with PredictionIO. Also, the buildpack instructions say it requires performance dynos.

        • Ah. Yeah, if you are using the PIO Buildpack then you will need to do things differently.