Beyond scaling: real-time event analysis with stream mining
High-volume event streams are an important class of big data applications. Processing millions of events per day is a major challenge, in particular for batch-oriented scalability approaches like MapReduce. In this talk, I will discuss an alternative approach based on stream mining algorithms, which were developed in the data mining community in the mid-2000s but have yet to make it into the mainstream. Instead of relying on scalability and parallelization alone, stream mining lets you trade accuracy for resource usage, resulting in robust algorithms with performance guarantees. I will focus on two classes of algorithms: counter-based algorithms for identifying so-called heavy hitters (the most frequent event types), and sketch-based algorithms for estimating the activity of individual event types. While these algorithms seem pretty basic at first, in the last part of the talk I'll discuss how they can be used for more advanced analytics, for example trending, probabilistic modelling and outlier detection, clustering, TF-IDF and related relevancy reweighting measures, and classification.
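The abstract does not name specific algorithms. As a minimal sketch, this example assumes Space-Saving as a representative counter-based heavy-hitter algorithm and the Count-Min sketch as a representative sketch-based frequency estimator. Both are well-known stream mining techniques, though not necessarily the ones covered in the talk. The class names, the event names in the toy stream, and the parameters (`k=5`, `width=64`, `depth=4`) are hypothetical choices for illustration. Each structure uses a fixed amount of memory no matter how long the stream gets, and gives up exact counts in exchange.

```python
import hashlib
import random


class SpaceSaving:
    """Counter-based heavy hitters: track at most k items (Space-Saving algorithm)."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # item -> estimated count (never an underestimate)

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict the item with the smallest counter; the newcomer inherits
            # that count + 1. (O(k) scan here; real implementations use a
            # "stream summary" structure to make this O(1).)
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def top(self, n):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


class CountMinSketch:
    """Sketch-based frequency estimates in a fixed depth x width counter table."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One salt per row gives each row its own (assumed independent) hash function.
        self.salts = [rng.getrandbits(64).to_bytes(8, "little") for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        h = hashlib.blake2b(str(item).encode(), digest_size=8, salt=self.salts[row])
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only add to a cell, so every row overestimates;
        # the minimum across rows is the tightest estimate.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))


# Hypothetical toy stream: 105 events, two of them frequent.
stream = ["login"] * 50 + ["click"] * 30 + ["error"] * 5 + [f"rare{i}" for i in range(20)]
random.Random(1).shuffle(stream)

ss = SpaceSaving(k=5)
cms = CountMinSketch(width=64, depth=4)
for event in stream:
    ss.add(event)
    cms.add(event)

# "login" and "click" each occur more than N/k = 21 times, so Space-Saving
# is guaranteed to keep them among its counters.
print(ss.top(2))
print(cms.estimate("login"))
```

The two structures give different guarantees. Space-Saving keeps only `k` counters and is guaranteed to track every event type that occurs more than N/k times in a stream of N events. Count-Min never underestimates. With suitable hash functions and width about e/ε and depth about ln(1/δ), it overestimates by at most εN with probability at least 1 − δ.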