Machine Learning, Hive
Speaker: Makoto Yui / Research Engineer, Treasure Data, Inc.
Time: 10:50~11:30
Venue: 4F – International Conference Hall
Title: Machine Learning, Hive
Abstract:
This talk will introduce new features of Hivemall, an open-source machine learning library for Apache Hive. Hivemall provides machine learning functionality spanning classification, regression, ensemble learning, and feature engineering through Hive UDFs/UDAFs/UDTFs, and it is easy to use because every machine learning step is done within HiveQL. Hivemall is primarily designed for Apache Hive, but it also works on Apache Pig and Apache Spark through their compatibility with Hive UDFs. Since we introduced the initial version of Hivemall at Hadoop Summit 2014, Hivemall has gained many new features, such as Apache Spark/Pig support, Factorization Machines, Matrix Factorization, RandomForest, and Gradient Boosting. As a consequence, the project has attracted considerable attention, as seen in its 380+ stars and 120+ forks on https://github.com/myui/hivemall. This talk introduces those new features and presents how our customers use Hivemall in their data analytics projects. We expect this talk to be particularly interesting and relevant to people who are already familiar with Hive and work on big data analytics.
Speaker bio:
Makoto Yui is a research engineer at Treasure Data, Inc., a Hadoop-as-a-Service startup. He works on Hivemall, an open-source library for scalable machine learning on Apache Hive. He holds a Ph.D. in computer science from NAIST. His profile is available at http://myui.github.io/.
- YARN Resource Management Using Machine Learning
- Apache Kylin Architecture and Use Cases