Machine Learning on Heroku with PredictionIO

Last week at the TrailheaDX Salesforce Dev Conference we launched the DreamHouse sample application to showcase the Salesforce App Cloud and numerous possible integrations. I built an integration with the open source PredictionIO Machine Learning framework. The use case for ML in DreamHouse is a real estate recommendation engine that learns based on users with similar favorites. Check out a demo and get the source.

For the DreamHouse PredictionIO integration to work, I needed to get the PredictionIO service running on Heroku. Since it is a Scala app, everything worked great! Here are the steps to get PredictionIO up and running on Heroku.

First you will need a PredictionIO event server and an app defined in it:

  1. Deploy:
  2. Create an app:

    heroku run -a <APP NAME> console app new <A PIO APP NAME>
  3. List apps:

    heroku run -a <APP NAME> console app list

Check out the source and local dev instructions for the event server.

Now that you have an event server and app, load some event data:

export ACCESS_KEY=<YOUR ACCESS KEY>
export URL=http://<YOUR HEROKU APP NAME>.herokuapp.com

# Create 5 users (u1..u5)
for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"user\", \"entityId\" : \"u$i\" }"; done

# Create 50 items (i1..i50), each in categories c1 and c2
for i in {1..50}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"item\", \"entityId\" : \"i$i\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"; done

# Record 100 view events: 20 rounds in which each of the 5 users views a random item
for j in {1..20}; do for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u$i\",  \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i$(( ( RANDOM % 50 )  + 1 ))\" }"; done; done
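
The escaped quotes in those one-liners are easy to get wrong. As a rough sketch, the JSON body can be built by a small helper and posted separately. The function names view_event_json and post_event are made up for this illustration and are not part of PredictionIO; URL and ACCESS_KEY are the variables exported above.

```shell
# Hypothetical helper: print the JSON body for a "view" event.
#   $1 = user id (e.g. u1), $2 = item id (e.g. i7)
view_event_json() {
  printf '{ "event" : "view", "entityType" : "user", "entityId" : "%s", "targetEntityType" : "item", "targetEntityId" : "%s" }' "$1" "$2"
}

# Hypothetical helper: POST one JSON body to the event server.
# Relies on the URL and ACCESS_KEY variables exported earlier.
post_event() {
  curl -s -X POST "$URL/events.json?accessKey=$ACCESS_KEY" \
    -H "Content-Type: application/json" -d "$1"
}
```

For example, post_event "$(view_event_json u1 i7)" would record user u1 viewing item i7. Keeping the payload builder separate from the network call also makes it possible to inspect the JSON before sending anything.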

Check out the demo data:

http://<YOUR HEROKU APP NAME>.herokuapp.com/events.json?accessKey=<YOUR APP ACCESS KEY>&limit=-1
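
The same check can be done from the command line. This is only a sketch: count_events is a hypothetical helper, and it assumes every event object in the response carries an "eventId" field. If the event server was empty beforehand and every request above succeeded, the count should be 155 (5 users + 50 items + 100 views).

```shell
# Hypothetical helper: count the events in the JSON read from stdin.
# Assumption: each event object in the response contains an "eventId" field.
count_events() {
  grep -o '"eventId"' | wc -l | tr -d ' '
}

# Usage (not run here). The URL is quoted so the shell does not treat "&"
# as the background operator:
#   curl -s "$URL/events.json?accessKey=$ACCESS_KEY&limit=-1" | count_events
```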

Now you need an engine that will learn from a set of training data and then be able to make predictions. With PredictionIO you can use any algorithm you want, but SparkML is often a great choice. For this simple example I’m just using single-node Spark and Postgres, but the underlying data source and ML engine can be anything.

This example is based on PredictionIO’s Recommendation Template so it uses SparkML’s Alternating Least Squares (ALS) algorithm. To deploy it on Heroku follow these steps:

  1. Deploy:
  2. Remove the auto-added Heroku Postgres addon:

    heroku addons:destroy heroku-postgresql -a <YOUR HEROKU APP NAME>
  3. Attach your PredictionIO Event Server’s Postgres:

    heroku addons:attach <YOUR-ADDON-ID> -a <YOUR HEROKU APP NAME>

    Note: You can find out <YOUR-ADDON-ID> by running:

    heroku addons -a <YOUR EVENT SERVER HEROKU APP NAME>

  4. Train the app:

    heroku run -a <YOUR HEROKU APP NAME> train
  5. Restart the app to load the new training data:

    heroku restart -a <YOUR HEROKU APP NAME>
  6. Check the status of your engine:

    http://<YOUR HEROKU APP NAME>.herokuapp.com
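
Several readers in the comments hit an “Invalid app name” error during training because the engine was pointed at a different Postgres than the event server. One way to check for that, sketched here under the assumption that both apps expose their database as the DATABASE_URL config var, is to compare the two values. The app names and the same_database and check_shared_postgres helpers are hypothetical.

```shell
# Hypothetical Heroku app names; replace them with your own.
EVENT_APP="my-pio-eventserver"
ENGINE_APP="my-pio-engine"

# Succeeds only when both URLs are non-empty and identical.
same_database() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Fetch each app's DATABASE_URL with the Heroku CLI and compare the two.
# Assumption: the engine reads its database location from DATABASE_URL.
check_shared_postgres() {
  event_db=$(heroku config:get DATABASE_URL -a "$EVENT_APP")
  engine_db=$(heroku config:get DATABASE_URL -a "$ENGINE_APP")
  if same_database "$event_db" "$engine_db"; then
    echo "OK: both apps use the same Postgres"
  else
    echo "Mismatch: the engine is not using the event server's Postgres"
    return 1
  fi
}
```

Running check_shared_postgres before training gives a quick yes or no. If it reports a mismatch, repeating steps 2 and 3 is the likely fix.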

Now you can check out the recommendations for an item (must be an item that has events):

curl -H "Content-Type: application/json" -d '{ "items": ["i11"], "num": 4 }' -k http://<YOUR HEROKU APP NAME>.herokuapp.com/queries.json

Check out the source and local dev instructions for this example engine.

Let me know if you have any questions or problems. Happy ML’ing!

  • holografix

    Fantastic work James, I’m trying to get Prediction.io running locally as per their installation instructions and I’m struggling to get the Classification template working – I keep getting an unresolved dependency issue with io itself.
    Any hints on replicating what you’ve done here but instead for the Classification engine? My goal is to have it running on Heroku as well of course.

    • Sorry I missed this comment! It is a bit tricky to fit the out-of-the-box templates into the format I’m using here. I think the PIO team is working on making this easier over time. Until then you should be able to take the actual engine code and merge it with my engine project. If you use IntelliJ it should make that a bit easier. Let me know if I can help!

  • Mehdi

    Hello,
    I tried to follow the steps. In step 1, I am asked to fill in the required config variables, but I don’t see which values should be set.
    Could you explain how I can find these values?
    Thank you in advance,
    Mehdi

    • Hi Mehdi,

      Sorry for the hassles. There are a lot of pieces that need to fit together with this one and I’m sure my instructions could be better! Is this for the engine app? If so, here are the values you need:
      ACCESS_KEY & APP_NAME = These should have been created in a previous step. They come from the event server.

      EVENT_SERVER_IP = The server name of the event server, like: foobar.herokuapp.com

      EVENT_SERVER_PORT = Just use “80” for the port since I’m not sure it works with https.

      Let me know if that helps.

      -James

      • Mehdi

        Hi James,

        Thank you for your help, I was able to create an engine server. When I reached step 3 (Train the app), I got the error “[ERROR] [Common$] Invalid app name application-mehdi”.
        I tried changing the APP_NAME variable in the engine server but I got the same error:
        “[ERROR] [Common$] Invalid app name eventserver-mehdi”.

        Mehdi

        • Can you do:
          heroku run -a <YOUR EVENT SERVER HEROKU APP NAME> console app list

          That should give you the app name you need to use in the engine.

          • Mehdi

            Hi James,

            Thank you for your feedback. I tried the name from the CLI command you indicated but it still gives the error. I guess I have to redo the whole process one more time; I probably missed a step.

            Thank you a lot for your support, great job with PredictionIO

            Mehdi

          • Maybe the problem is actually due to one of the config vars being wrong. If you go to your engine app in https://dashboard.heroku.com you can go to the app settings and check that they are correct.

  • @jlward4th What Heroku dynos are you using? I am trying an experiment which runs fine on my local machine, but on Heroku the build fails with a memory exception. Are you running this on a free dyno or a performance dyno?

    • I think it worked fine on a regular dyno but I had to change the stack size: https://github.com/jamesward/pio-engine-heroku/blob/master/app.json#L23

      That should have been done automatically if you deployed via Heroku Button.

      • Thanks @jlward4th. I guess I am using one of the standard templates, and the engine code that you have differs from the one that comes OOB from PredictionIO. Also, the buildpack instructions say it requires performance dynos.

        • Ah. Yeah, if you are using the PIO Buildpack then you will need to do things differently.

  • pankaj mehra

    Hi James,

    How can I use a table from a Postgres database instead of the dummy data? I am using Heroku Connect with the same Postgres database and will send suggestions to Salesforce.

    Do I need to change the code as well?

  • Pravin Choubey

    Hi James,

    When I run the command below, I get the following error.
    Second section, step 3:
    Command: $ heroku run -a enginebabyengine train

    [NativeCodeLoader] Unable to load native-hadoop library for your platform…
    (a few more lines in between)
    then
    [ERROR] [Common$] Invalid app name piobabyapp

    When deploying the engine, I tried setting the APP_NAME variable first to the event server’s Heroku app name (first section, step 1) and then to the PIO app name (first section, step 2). In both cases I get the messages below when running the train command.

    [Common$] Invalid app name runbabyrun

    [ERROR] [Common$] Invalid app name piobabyapp

    Here runbabyrun is the deployed event server app name and piobabyapp is the app created in the 2nd step.
    And enginebabyengine is the name of the engine app deployed on my Heroku account.
    Platform: Git Bash on Windows.

    Please help me out in resolving it.

    • Can you run: heroku run -a YOUR_EVENT_SERVER_HEROKU_APP_NAME console app list
      And verify the app exists with the right name?

      • Pravin Choubey

        Hi James,

        When I ran the command you mentioned, it showed the app name.
        heroku run -a YOUR_EVENT_SERVER_HEROKU_APP_NAME console app list
        It gives me piobabyapp with its access key details.

        Here is a summary of the steps I went through.
        Event server section:
        1. Deployed the event server on Heroku with app name: runbabyrun (step 1)
        2. Created the PIO app with name: piobabyapp (step 2)

        Second section, engine deployment:
        1. Deployed the engine package on Heroku with app name: enginebabyengine

        During engine deployment I set the following config variables:

        1.ACCESS_KEY=FKtv0UoT1D1qvhPhjmVc1ErPrtrRBNIs3kkgTs-I-DN4RJCus7LLO0QaofMKMJIC
        2.APP_NAME=piobabyapp
        3.DATABASE_URL=postgres://nuowytragbqaye:f…:5432/d79qcc8vdrcg2m

        4.EVENT_SERVER_IP=.runbabyrun.hero…
        5.EVENT_SERVER_PORT=80
        6.HEROKU_POSTGRESQL_TEAL_URL=postgres://zjiuyswhrsoxma:7…:5432/dbi6vfcjk8iht6

        7.JAVA_OPTS=-Xss4m

        Unable to figure out the issue. Please help out.

        • Can you verify that EVENT_SERVER_IP=runbabyrun.herokuapp.com ?

          • Pravin Choubey

            Hi James,

            Yes the EVENT_SERVER_IP = runbabyrun.herokuapp.com
            I tried again after removing everything but I am still getting the same error.
            [Error: 1.Unable to load native-hadoop library for your platform… using builtin-java classes where applicable.
            2. Invalid app name runbabyrun]

            Below are config variables which we are setting on engine deployment.
            * ACCESS_KEY
            PredictionIO Event Server App Access Key =
            Wo9re5A0iLTX-t9LB98-L8ToYob1aWzhFKJKk8Fi6S809gAJ6HrmtbOoi_gDxSOG

            * APP_NAME
            PredictionIO Event Server App Name = runbabyrun(tried both piobabyapp /runbabyrun but error in both cases)

            * EVENT_SERVER_IP
            PredictionIO Event Server Name = runbabyrun.herokuapp.com

            * EVENT_SERVER_PORT
            PredictionIO Event Server Port = 80

          • Can you email me? j.ward@salesforce.com Then we can work through this more easily.

          • Pravin Choubey

            Thanks, I emailed you the details.

          • ken

            Hi, James
            I encountered the very same problem… So what did you do in Pravin Choubey’s case? It may help me.

          • I think the main issue was that there were two Postgres addons and the wrong one was being used. Feel free to email me if you need some help working through this.

          • ken

            Thank you for your extremely fast response. I resolved my problem. As you said, I was using the wrong Postgres (I had three of them), so I deleted them all and attached a new one.

          • Oh good! I think the problem is that Heroku automatically adds a DB. I need to revise my instructions to deal with this.

  • Pravin Choubey

    Hi James,

    I tried a second time.
    I get an “unable to load hadoop library” error when running the train command below.
    $ heroku run -a pure-dusk-49572 train

    Created the apps below:
    Event server deployed app name: sheltered-sierra-30907
    PIO app name: pioappname
    Engine deployed app name: pure-dusk-49572

    During engine deployment I set the following config variables:

    1.ACCESS_KEY=FKtv0UoT1D1qvhPhjmVc1ErPrtrRBNIs3kkgTs-I-DN4RJCus7LLO0QaofMKMJIC
    2.APP_NAME=tried with both pioappname/sheltered-sierra-30907
    3.DATABASE_URL=postgres://nuowytragbqaye:f6ecbe547e7fb6a10a04f36e3bbde788039062d11f259776b99acfc72d558fd8@ec2-54-225-119-223.compute-1.amazonaws.com:5432/d79qcc8vdrcg2m

    4.EVENT_SERVER_IP=sheltered-sierra-30907.herokuapp.com
    5.EVENT_SERVER_PORT=80
    6.HEROKU_POSTGRESQL_TEAL_URL=postgres://zjiuyswhrsoxma:71ffdc1681ab4efa408922e9b3c3d71d8f38d04bf4ae0138570296685e8fb32d@ec2-23-23-237-68.compute-1.amazonaws.com:5432/dbi6vfcjk8iht6

    7.JAVA_OPTS=-Xss4m

    Unable to figure out the issue. Please help out.

    Regards
    Pravin

  • Shruthi Mikkilineni

    Hi,
    I have followed all the steps as you specified, but I am getting the following error while training the app:

    Caused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

    Please help me to solve this issue.

    • Looks like your local Postgres server isn’t running.

      • Shruthi Mikkilineni

        I am running on Heroku, not on my local system.

        • Ah. Does the app have a Heroku Postgres Addon added to it?

          • Shruthi Mikkilineni

            I attached the PIO server’s Postgres addon to the PIO engine.
            For EVENT_SERVER_IP I specified the app link.
            For EVENT_SERVER_PORT I gave 80.
            What should I give for APP_NAME?

          • Which app is getting the postgres connection error?

            The APP_NAME is whatever you specified when you created a PIO app in step 2 of the PIO Event Server setup.

          • Shruthi Mikkilineni

            I am getting the error when training the engine app in step 3: Train the app.

            I also specified APP_NAME correctly.

          • Can you verify in the engine app that a DATABASE_URL env var is set? Also, if the engine app has its own Postgres addon, then you’ll need to remove that, detach the one from the event server, and then re-attach it.

          • I’ve just updated the engine app instructions to include removing the auto-added Postgres addon.

          • Shruthi Mikkilineni

            DATABASE_URL is available for the engine app. I detached the engine app’s Postgres addon and attached the one from the event server. Now I am getting this error while training: viewEvents in PreparedData cannot be empty. Please check if DataSource generates TrainingData and Preprator generates PreparedData correctly.

            And when I try to open the Postgres of my server I get this error:
            {"error":{"id":"unauthorized","message":"Invalid credentials provided."}}

          • Just to verify, the two apps are sharing the same postgres, right?

          • Shruthi Mikkilineni

            Yeah both apps are sharing same postgres.

          • Can you connect to the postgres instance with:
            heroku psql -a YOUR_PIO_HEROKU_APP

          • Shruthi Mikkilineni

            Yes i am able to Connect.

            $heroku psql -a dreamhouse-engine

            WARNING: Installation instructions are at https://cli.heroku.com
            --> Connecting to postgresql-shaped-55449
            psql (10.3)
            WARNING: Console code page (437) differs from Windows code page (1252)
            8-bit characters might not work correctly. See psql reference
            page "Notes for Windows users" for details.
            SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
            Type "help" for help.

            dreamhouse-engine::DATABASE=>

          • That’s good. Maybe the demo data didn’t get into the db. Can you check the event server to make sure it is there: http://.herokuapp.com/events.json?accessKey=&limit=-1

          • Shruthi Mikkilineni

            Hi,

            Is this the URL you mean?
            https://heroku-app.herokuapp.com/events.json?accessKey=abcdefghijklmnopaccesskey&limit=-1

            If so, this is what I got when I tried to hit that URL:
            {"message":"Not Found"}

          • Yeah, the comment system mucked with the URL. But your URL looks right. Just make sure it is the event server’s URL.

          • Shruthi Mikkilineni

            I am using the event server’s URL itself. I am still receiving the
            {"message":"Not Found"}
            error.

            Also,
            We found two processes in the Procfile :
            web: source bin/env.sh && target/universal/stage/bin/dreamhouse-pio -main ServerApp
            train: source bin/env.sh && target/universal/stage/bin/pio-engine-heroku -main TrainApp

            Can you please explain why we need train here? And is it okay if I don’t use it for now, because my Heroku account is on a free plan?

          • That Procfile is for the engine app, not the event server. Just defining a process in the Procfile doesn’t mean it actually runs. The train process only uses dyno hours when you run it manually.