Introduction - DASE architecture

Van-Duyet Le

Big data and machine learning

Building a machine learning application from scratch is hard.

Available Tools

Processing Framework

  • e.g. Apache Spark, Apache Hadoop

Algorithm Libraries

  • e.g. MLlib, Mahout

Data Storage

  • e.g. HBase, Cassandra

How do you integrate everything together nicely and move from prototyping to production?

  • collect data
  • train your algorithm
  • build a layer to serve the prediction results
  • manage the different algorithms
  • evaluate the results
  • deploy your application in production

Build and Deploy Machine Intelligence in a fraction of the time.


  • PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.
  • Built on Apache Spark, MLlib and HBase.
  1. Open-source Machine Learning server.
  2. Core - engine deployment platform built on top of Apache Spark.
  3. Event Server built on top of Apache HBase.
  4. The DASE architecture of an engine is the "MVC for Machine Learning".
  5. Swap and evaluate algorithms as you wish.
  6. Zero downtime training and deployment.
  7. Engine template.
$ pio template get
$ cd MyEngine
$ pio build; pio train; pio deploy

Start with an Engine Template

Data Source                - readTraining()
Data Preparator            - prepare()
Algorithm                  - train()
                           - predict()
Serving                    - serve()
Evaluator                  - evaluate()
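
The component-to-method mapping above can be sketched in a few lines of plain Python. This is an illustration only, not PredictionIO's actual (Scala) API: the class names, the snake_case method names, the hard-coded events, and the popularity-based algorithm are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter


class DataSource:
    """[D] Reads raw training events (hard-coded here instead of an event store)."""

    def read_training(self):
        return [
            {"user": "1", "item": "a"},
            {"user": "1", "item": "b"},
            {"user": "2", "item": "a"},
            {"user": "3", "item": "a"},
            {"user": "3", "item": "b"},
            {"user": "3", "item": "c"},
        ]


class Preparator:
    """[D] Turns raw events into the structure the algorithm expects."""

    def prepare(self, events):
        seen = {}
        for e in events:
            seen.setdefault(e["user"], set()).add(e["item"])
        return {"events": events, "seen": seen}


class PopularityAlgorithm:
    """[A] Toy algorithm: rank items by popularity, skipping items the user has seen."""

    def train(self, prepared):
        counts = Counter(e["item"] for e in prepared["events"])
        return {"counts": counts, "seen": prepared["seen"]}

    def predict(self, model, query):
        seen = model["seen"].get(query["user"], set())
        # Most popular first; ties broken alphabetically for determinism.
        ranked = sorted(model["counts"].items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked if item not in seen]


class Serving:
    """[S] Post-processes predictions before returning them to the caller."""

    def serve(self, query, predictions):
        return {"items": predictions[: query["num"]]}


def run_engine(query):
    """Wire the components together: read, prepare, train, predict, serve."""
    events = DataSource().read_training()
    prepared = Preparator().prepare(events)
    algorithm = PopularityAlgorithm()
    model = algorithm.train(prepared)
    return Serving().serve(query, algorithm.predict(model, query))


print(run_engine({"user": "2", "num": 4}))
```

Because each stage only depends on the output of the previous one, any single class can be replaced without touching the others, which is the point of the "MVC" comparison.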

Customize Code with Software Design Pattern

  • DASE - the "MVC" for Machine Learning
  • Built-in support for Spark MLlib, or create your own algorithm
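import predictionio

# Client for sending events to the Event Server (argument left blank in the source)
cli = predictionio.EventClient("")
# Predict top preferences by querying the deployed engine
eng = predictionio.EngineClient("")
rec = eng.send_query({"uid": "John", "n": 5})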

Connect to your App with SDKs


  • [D] Data Source and Data Preparator
  • [A] Algorithm
  • [S] Serving
  • [E] Evaluation Metrics

PredictionIO helps you modularize these components so you can build
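
One practical consequence of keeping the components separate is that algorithms can be swapped and compared under the same evaluation metric. The sketch below shows the idea in plain Python; the two toy algorithms (`GlobalMean`, `ItemMean`), the rating tuples, and the choice of mean absolute error are assumptions made for illustration and are not taken from PredictionIO itself.

```python
def mean(values):
    return sum(values) / len(values)


class GlobalMean:
    """Predicts the average of all training ratings for every query."""

    def train(self, ratings):
        return mean([r for _, _, r in ratings])

    def predict(self, model, user, item):
        return model


class ItemMean:
    """Predicts the item's average rating, falling back to the global average."""

    def train(self, ratings):
        by_item = {}
        for _, item, r in ratings:
            by_item.setdefault(item, []).append(r)
        return {
            "items": {i: mean(rs) for i, rs in by_item.items()},
            "fallback": mean([r for _, _, r in ratings]),
        }

    def predict(self, model, user, item):
        return model["items"].get(item, model["fallback"])


def evaluate(algorithm, train_set, test_set):
    """[E] Mean absolute error of the algorithm on held-out ratings."""
    model = algorithm.train(train_set)
    errors = [abs(algorithm.predict(model, u, i) - r) for u, i, r in test_set]
    return mean(errors)


# (user, item, rating) tuples; hypothetical data.
train_set = [("1", "a", 5), ("2", "a", 4), ("1", "b", 1), ("3", "b", 2)]
test_set = [("3", "a", 5), ("2", "b", 1)]

# Same data and same metric, different algorithm: lower error is better.
scores = {
    type(algo).__name__: evaluate(algo, train_set, test_set)
    for algo in (GlobalMean(), ItemMean())
}
print(scores)
```

Both algorithms expose the same `train`/`predict` pair, so the evaluation function does not need to know which one it is measuring.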

Training a Model - The DASE View

Respond to Prediction Query - The DASE View

Query (Input) :

$ curl -H "Content-Type: application/json" -d \
  '{ "user": "1", "num": 4 }'
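
The same query can also be issued from Python using only the standard library. The sketch below builds the request without sending it, so it runs without a deployed engine. The engine address is an assumption (a locally deployed engine on port 8000), since the slide does not show the URL, and `build_query_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: engine deployed locally on its default port; the slide elides the URL.
ENGINE_URL = "http://localhost:8000/queries.json"


def build_query_request(user, num, url=ENGINE_URL):
    """Build (but do not send) the POST request equivalent to the curl call."""
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("1", 4)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To send it against a running engine:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```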

Predicted Result (Output):



$ bash -c "$(curl -s"


Start the Event Server

$ pio eventserver
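
Once the Event Server is running, applications send it events as JSON over HTTP. The sketch below builds such a request without sending it. The address (a local server on port 7070), the placeholder access key, the "rate" event, and the helper name `build_event_request` are all assumptions for illustration; check them against your own installation.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: local Event Server on its default port and a placeholder access key.
EVENT_SERVER_URL = "http://localhost:7070/events.json"
ACCESS_KEY = "YOUR_ACCESS_KEY"


def build_event_request(user_id, item_id, rating):
    """Build (but do not send) a POST request recording that a user rated an item."""
    event = {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    url = EVENT_SERVER_URL + "?" + urllib.parse.urlencode({"accessKey": ACCESS_KEY})
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event_req = build_event_request("1", "a", 4)
print(event_req.full_url)
print(event_req.data.decode("utf-8"))
```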

Deploy an Engine

$ pio build; pio train; pio deploy 

Update Engine Model with New Data

$ pio train; pio deploy