Scaling by Cheating: approximation, sampling, and fault-friendliness for scalable Big Learning
Data storage and analysis technology has traditionally focused on absolute guarantees of accuracy: transactional correctness, consistency, and error correction. Big Data storage paradigms like NoSQL give up some of these guarantees for scalability, since much large-scale data processing doesn't demand them. Machine Learning applications differ again: there are no known correct answers or outputs to begin with. So what about Big Learning?
It turns out that in these large-scale learning applications it's often fine to sample, approximate, estimate, randomize, guess, or even accept some data loss. In fact, doing so is often essential in order to scale up, and it can improve results or reduce costs even while, strangely, sacrificing correctness in the details.
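To give a concrete sense of the trade, here is a minimal Java sketch, not code from the talk or from Mahout, that estimates the mean of a large data set from a small random sample instead of a full pass; the class name, method name, and sample size are illustrative assumptions.

import java.util.Random;

/** Illustrative sketch: estimate the mean of a large data set from a random sample. */
public class SampledMean {

  /** Estimates the mean by averaging sampleSize elements chosen uniformly at random (with replacement). */
  static double sampledMean(double[] data, int sampleSize, long seed) {
    Random random = new Random(seed);
    double sum = 0.0;
    for (int i = 0; i < sampleSize; i++) {
      sum += data[random.nextInt(data.length)];
    }
    return sum / sampleSize;
  }

  public static void main(String[] args) {
    double[] data = new double[10_000_000];
    Random random = new Random(42);
    for (int i = 0; i < data.length; i++) {
      data[i] = random.nextGaussian() + 3.0;   // true mean is about 3.0
    }
    // A 1% sample typically lands very close to the exact mean, at a fraction of the cost.
    System.out.println("Sampled estimate: " + sampledMean(data, 100_000, 1L));
  }
}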
This talk will survey several representative parts of Apache Mahout where these techniques are deployed successfully, including random projection, sampling, and approximation. It will also draw an example of tolerating data loss from Myrrix, a recommender product built on Mahout. These examples will be generalized to suggest and inspire applications in other Big Learning projects.
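To make the random projection idea concrete, the following is an illustrative Java sketch of a Johnson-Lindenstrauss-style projection, not Mahout's implementation; the dimensions, seeds, and names are assumptions chosen for the example. Projecting through a random Gaussian matrix approximately preserves pairwise distances while drastically reducing dimensionality.

import java.util.Random;

/** Illustrative sketch of random projection: reduce d-dimensional vectors to k dimensions
 *  while approximately preserving pairwise distances. */
public class RandomProjection {

  /** Builds a k x d projection matrix with Gaussian entries scaled by 1/sqrt(k). */
  static double[][] randomMatrix(int k, int d, long seed) {
    Random random = new Random(seed);
    double scale = 1.0 / Math.sqrt(k);
    double[][] r = new double[k][d];
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < d; j++) {
        r[i][j] = scale * random.nextGaussian();
      }
    }
    return r;
  }

  /** Projects one d-dimensional vector down to k dimensions. */
  static double[] project(double[][] r, double[] x) {
    double[] y = new double[r.length];
    for (int i = 0; i < r.length; i++) {
      double dot = 0.0;
      for (int j = 0; j < x.length; j++) {
        dot += r[i][j] * x[j];
      }
      y[i] = dot;
    }
    return y;
  }

  static double distance(double[] u, double[] v) {
    double sum = 0.0;
    for (int i = 0; i < u.length; i++) {
      double diff = u[i] - v[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    int d = 10_000;   // original dimensionality
    int k = 200;      // reduced dimensionality
    double[][] r = randomMatrix(k, d, 1L);
    Random random = new Random(2L);
    double[] a = new double[d];
    double[] b = new double[d];
    for (int j = 0; j < d; j++) {
      a[j] = random.nextGaussian();
      b[j] = random.nextGaussian();
    }
    // The two distances should agree within a few percent, despite a 50x reduction in size.
    System.out.println("Original distance:  " + distance(a, b));
    System.out.println("Projected distance: " + distance(project(r, a), project(r, b)));
  }
}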
Some basic familiarity with machine learning is likely required to make the most of the discussion. No Mahout knowledge is necessary.