Machine Learning, Hive

Speaker: Makoto Yui / Research Engineer, Treasure Data, Inc.
Venue: 4F – International Conference Hall
Talk title: Machine Learning, Hive


This talk will introduce new features of Hivemall, an open-source machine learning library for
Apache Hive. Hivemall provides machine learning functionality spanning
classification, regression, ensemble learning, and feature engineering through
Hive UDFs/UDAFs/UDTFs, and it is easy to use because every machine learning step is done within
HiveQL. Hivemall is primarily designed for Apache Hive, but it also works on Apache Pig and Apache
Spark through their UDF compatibility with Apache Hive. Since we introduced the initial
version of Hivemall at Hadoop Summit 2014, Hivemall has added many new
functionalities, such as Apache Spark/Pig support, Factorization Machines, Matrix Factorization,
RandomForest, and Gradient Boosting. As a consequence, the project has gotten a lot of
attention, as seen in its 380+ stars and 120+ forks. This talk
introduces those new functionalities and presents how our customers use Hivemall in their data
analytics projects. We believe this talk will be particularly interesting and relevant to
people already familiar with Hive and working on big data analytics.


Makoto Yui is a research engineer at Treasure Data, Inc., a Hadoop-as-a-Service startup. He
works on Hivemall, an open-source library for scalable machine learning on Apache Hive. He
holds a Ph.D. in computer science from NAIST.