Machine Learning on Heroku with PredictionIO

Last week at the TrailheaDX Salesforce Dev Conference we launched the DreamHouse sample application to showcase the Salesforce App Cloud and numerous possible integrations. I built an integration with the open source PredictionIO Machine Learning framework. The use case for ML in DreamHouse is a real estate recommendation engine that learns based on users with similar favorites. Check out a demo and get the source.

For the DreamHouse PredictionIO integration to work I needed to get the PredictionIO service running on Heroku. Since it is a Scala app everything worked great! Here are the steps to get PredictionIO up and running on Heroku.

First you will need a PredictionIO event server and app defined in the event server:

  1. Deploy:
  2. Create an app:

    heroku run -a <APP NAME> console app new <A PIO APP NAME>

  3. List apps:

    heroku run -a <APP NAME> console app list

Check out the source and local dev instructions for the event server.

Now that you have an event server and app, load some event data:

export ACCESS_KEY=<YOUR ACCESS KEY>
export URL=http://<YOUR HEROKU APP NAME>.herokuapp.com
# Create five users (u1..u5)
for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"user\", \"entityId\" : \"u$i\" }"; done

# Create fifty items (i1..i50), each tagged with two categories
for i in {1..50}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"item\", \"entityId\" : \"i$i\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"; done

# Record one view per user against a randomly chosen item
for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u$i\",  \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i$(( ( RANDOM % 50 )  + 1 ))\" }"; done
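
The backslash-escaped quotes in those loops are easy to mistype. Here is a minimal sketch of one way to build the same payloads with printf instead, assuming bash. The function names user_event, view_event, and post_event are made up for this example and are not part of PredictionIO. The ids are inserted as-is, so this only works for simple ids like u1 and i7.

```shell
# Hypothetical helpers (not part of PredictionIO or the Heroku CLI).

# JSON for a "$set" event that creates a user, e.g. user_event u1
user_event() {
  printf '{"event":"$set","entityType":"user","entityId":"%s"}' "$1"
}

# JSON for a "view" event from a user to an item, e.g. view_event u1 i7
view_event() {
  printf '{"event":"view","entityType":"user","entityId":"%s","targetEntityType":"item","targetEntityId":"%s"}' "$1" "$2"
}

# POST one payload to the event server.
# Assumes URL and ACCESS_KEY are exported as in the commands above.
post_event() {
  curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" \
    -H "Content-Type: application/json" -d "$1"
}
```

With these defined, the first loop could be written as:

for i in {1..5}; do post_event "$(user_event "u$i")"; done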

Check out the demo data:

http://<YOUR HEROKU APP NAME>.herokuapp.com/events.json?accessKey=<YOUR APP ACCESS KEY>&limit=-1

Now you need an engine that will learn from a set of training data and then be able to make predictions. With PredictionIO you can use any algorithm you want, but Spark MLlib is often a great choice. For this simple example I’m just using single-node Spark and Postgres, but the underlying data source and ML engine can be anything.

This example is based on PredictionIO’s Recommendation Template, so it uses Spark MLlib’s Alternating Least Squares (ALS) algorithm. To deploy it on Heroku, follow these steps:

  1. Deploy:
  2. Attach your PredictionIO Event Server’s Postgres:

    heroku addons:attach <YOUR-ADDON-ID> -a <YOUR HEROKU APP NAME>

    Note: You can find out <YOUR-ADDON-ID> by running:

    heroku addons -a <YOUR EVENT SERVER HEROKU APP NAME>

  3. Train the app:

    heroku run -a <YOUR HEROKU APP NAME> train

  4. Restart the app to load the new training data:

    heroku restart -a <YOUR HEROKU APP NAME>

  5. Check the status of your engine:

    http://<YOUR HEROKU APP NAME>.herokuapp.com
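
Steps 3 and 4 need to be repeated whenever you want the engine to pick up new events. Here is a minimal sketch of a wrapper for that, assuming bash and the Heroku CLI. The names step and retrain and the DRY_RUN variable are made up for this example.

```shell
# Hypothetical wrapper around the train and restart commands from steps 3 and 4.

# Print a command, then run it unless DRY_RUN is set to a non-empty value.
step() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# Train the engine app, then restart it if the train command returned success.
retrain() {
  step heroku run -a "$1" train && step heroku restart -a "$1"
}
```

Running it as DRY_RUN=1 retrain <YOUR HEROKU APP NAME> only prints the two commands. Note that the restart is gated on the exit status of the heroku command itself, so it is still worth reading the training output before trusting the result.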

Now you can check out the recommendations for an item (must be an item that has events):

curl -H "Content-Type: application/json" -d '{ "items": ["i11"], "num": 4 }' -k http://<YOUR HEROKU APP NAME>.herokuapp.com/queries.json

Check out the source and local dev instructions for this example engine.
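
The engine app finds the event server through config vars. Going by the replies in the comments below, these are ACCESS_KEY, APP_NAME (the PIO app name shown by console app list, not the Heroku app name), EVENT_SERVER_IP, and EVENT_SERVER_PORT. Here is a minimal sketch for setting them from the command line instead of the dashboard, assuming the Heroku CLI's config:set command. The function name engine_config_cmd is made up for this example, and the port is fixed at 80 as suggested in the comments.

```shell
# Hypothetical helper: print the config:set command for an engine app so it can
# be reviewed before it is run.
# Usage: engine_config_cmd <engine app> <access key> <pio app name> <event server app>
engine_config_cmd() {
  printf 'heroku config:set ACCESS_KEY=%s APP_NAME=%s EVENT_SERVER_IP=%s.herokuapp.com EVENT_SERVER_PORT=80 -a %s\n' \
    "$2" "$3" "$4" "$1"
}
```

For example, engine_config_cmd my-engine MY_KEY my-pio-app my-event-server prints a single heroku config:set line that you can check and then paste into your terminal. All four arguments here are placeholder values.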

Let me know if you have any questions or problems. Happy ML’ing!

  • holografix

    Fantastic work James, I’m trying to get Prediction.io running locally as per their installation instructions and I’m struggling to get the Classification template working – I keep getting an unresolved dependency issue with io itself.
    Any hints on replicating what you’ve done here but instead for the Classification engine? My goal is to have it running on Heroku as well of course.

    • Sorry I missed this comment! It is a bit tricky to fit the out-of-the-box templates into the format I’m using here. I think the PIO team is working on making this easier over time. Until then you should be able to take the actual engine code and merge it with my engine project. If you use IntelliJ it should make that a bit easier. Let me know if I can help!

  • Mehdi

    Hello,
    I tried to follow the steps. In step 1, I am asked to fill in required config variable fields, but I don’t see which values should be set.
    Could you explain how I can find these values?
    Thank you in advance,
    Mehdi

    • Hi Mehdi,

      Sorry for the hassles. There are a lot of pieces that need to fit together with this one and I’m sure my instructions could be better! Is this for the engine app? If so, here are the values you need:
      ACCESS_KEY & APP_NAME = These should have been created in a previous step. They come from the event server.

      EVENT_SERVER_IP = The server name of the event server, like: foobar.herokuapp.com

      EVENT_SERVER_PORT = Just use “80” for the port since I’m not sure it works with https.

      Let me know if that helps.

      -James

      • Mehdi

        Hi James,

        Thank you for your help, I was able to create an engine server. When I reached step 3 (Train the app), I got an error: “[ERROR] [Common$] Invalid app name application-mehdi”
        I tried changing the APP_NAME variable in the engine server but got the same error:
        “[ERROR] [Common$] Invalid app name eventserver-mehdi”.

        Mehdi

        • Can you do:
          heroku run -a <YOUR EVENT SERVER HEROKU APP NAME> console app list

          That should give you the app name you need to use in the engine.

          • Mehdi

            Hi James,

            Thank you for your feedback. I tried the name from the CLI command you indicated, but it still gives the error. I guess I have to redo the whole process one more time; I have probably missed a step.

            Thanks a lot for your support, and great job with PredictionIO.

            Mehdi

          • Maybe the problem is actually due to one of the config vars being wrong. If you go to your engine app in https://dashboard.heroku.com you can go to the app settings and check that they are correct.

  • @jlward4th:disqus What Heroku dynos are you using? I am trying an experiment that runs fine on my local machine, but on Heroku the build fails with a memory exception. Are you running this on a free dyno or a performance dyno?

    • I think it worked fine on a regular dyno but I had to change the stack size: https://github.com/jamesward/pio-engine-heroku/blob/master/app.json#L23

      That should have been done automatically if you deployed via Heroku Button.

      • Thanks @jlward4th:disqus. I guess I am using one of the standard templates, and the engine code that you have and the one that comes OOB from PredictionIO are different. Also, the buildpack instructions say it requires performance dynos.

        • Ah. Yeah, if you are using the PIO Buildpack then you will need to do things differently.

  • pankaj mehra

    Hi James,

    How can I use a table from a Postgres database instead of the dummy data? I am using Heroku Connect with the same Postgres database and will send suggestions to Salesforce.

    Do I need to change the code as well?

  • Pravin Choubey

    Hi James,

    When I run the command below (second section, step 3), I get the error below.
    Command: $ heroku run -a enginebabyengine train

    [NativeCodeLoader] Unable to load native-hadoop library for your platform…
    (a few more lines in between, then)
    [ERROR] [Common$] Invalid app name piobabyapp

    For the engine deployment, I tried setting the APP_NAME variable first to the event server app name (first section, step 1) and then to the PIO app name (first section, step 2). In both cases I get the messages below when running the train command.

    [Common$] Invalid app name runbabyrun

    [ERROR] [Common$] Invalid app name piobabyapp

    Here runbabyrun is the deployed event server app name and piobabyapp is the app created in step 2.
    enginebabyengine is the name of the deployed engine app on my Heroku account.
    Platform: Git Bash on Windows.

    Please help me resolve it.

    • Can you run: heroku run -a YOUR_EVENT_SERVER_HEROKU_APP_NAME console app list
      And verify the app exists with the right name?

      • Pravin Choubey

        Hi James,

        When I ran the command you mentioned, it shows the app name.
        heroku run -a YOUR_EVENT_SERVER_HEROKU_APP_NAME console app list
        It lists piobabyapp with its access key details.

        Here is a summary of the steps I went through.
        Event server section:
        1. Deployed the event server on Heroku with app name: runbabyrun (step 1)
        2. Created the PIO app with name: piobabyapp (step 2)

        Engine deployment section:
        1. Deployed the engine package on Heroku with app name: enginebabyengine

        During engine deployment I set the variables below:

        1.ACCESS_KEY=FKtv0UoT1D1qvhPhjmVc1ErPrtrRBNIs3kkgTs-I-DN4RJCus7LLO0QaofMKMJIC
        2.APP_NAME=piobabyapp
        3.DATABASE_URL=postgres://nuowytragbqaye:f…:5432/d79qcc8vdrcg2m

        4.EVENT_SERVER_IP=.runbabyrun.hero…
        5.EVENT_SERVER_PORT=80
        6.HEROKU_POSTGRESQL_TEAL_URL=postgres://zjiuyswhrsoxma:7…:5432/dbi6vfcjk8iht6

        7.JAVA_OPTS=-Xss4m

        I am unable to figure out the issue. Please help.

        • Can you verify that EVENT_SERVER_IP=runbabyrun.herokuapp.com ?

          • Pravin Choubey

            Hi James,

            Yes, EVENT_SERVER_IP = runbabyrun.herokuapp.com
            I tried again after removing everything but still get the same error.
            [Error: 1.Unable to load native-hadoop library for your platform… using builtin-java classes where applicable.
            2. Invalid app name runbabyrun]

            Below are config variables which we are setting on engine deployment.
            * ACCESS_KEY
            PredictionIO Event Server App Access Key =
            Wo9re5A0iLTX-t9LB98-L8ToYob1aWzhFKJKk8Fi6S809gAJ6HrmtbOoi_gDxSOG

            * APP_NAME
            PredictionIO Event Server App Name = runbabyrun (tried both piobabyapp and runbabyrun, but got the error in both cases)

            * EVENT_SERVER_IP
            PredictionIO Event Server Name = runbabyrun.herokuapp.com

            * EVENT_SERVER_PORT
            PredictionIO Event Server Port = 80

          • Can you email me? j.ward@salesforce.com Then we can work through this more easily.

          • Pravin Choubey

            Thanks, I emailed you the details.

  • Pravin Choubey

    Hi James,

    I even tried a second time.
    I get an error about being unable to load the Hadoop library when running the train command below.
    $ heroku run -a pure-dusk-49572 train

    I created the apps below:
    Event server deployed app name: sheltered-sierra-30907
    PIO app name: pioappname
    Engine deployed app name: pure-dusk-49572

    During engine deployment I set the variables below:

    1.ACCESS_KEY=FKtv0UoT1D1qvhPhjmVc1ErPrtrRBNIs3kkgTs-I-DN4RJCus7LLO0QaofMKMJIC
    2.APP_NAME=tried with both pioappname/sheltered-sierra-30907
    3.DATABASE_URL=postgres://nuowytragbqaye:f6ecbe547e7fb6a10a04f36e3bbde788039062d11f259776b99acfc72d558fd8@ec2-54-225-119-223.compute-1.amazonaws.com:5432/d79qcc8vdrcg2m

    4.EVENT_SERVER_IP=sheltered-sierra-30907.herokuapp.com
    5.EVENT_SERVER_PORT=80
    6.HEROKU_POSTGRESQL_TEAL_URL=postgres://zjiuyswhrsoxma:71ffdc1681ab4efa408922e9b3c3d71d8f38d04bf4ae0138570296685e8fb32d@ec2-23-23-237-68.compute-1.amazonaws.com:5432/dbi6vfcjk8iht6

    7.JAVA_OPTS=-Xss4m

    I am unable to figure out the issue. Please help.

    Regards
    Pravin