Review of “Mining of Massive Datasets”

I have finally finished reviewing this excellent book about mining massive datasets. The sheer mass of data on the web is continuously growing, and many new methods, algorithms and tools are emerging to deal with it, although in some cases without providing a formal model to process the information. In this book, the authors present a compilation of the most widely used algorithms (with their formal definitions) to build recommendation systems based on data mining techniques.

I strongly recommend reading this book because it focuses on mining very large amounts of data that do not fit in main memory. This situation currently arises in the management of digital libraries, the analysis of social networks, bioinformatics, etc., where processing large datasets is necessary. The main topics are shown in the following figures, but according to the authors you will learn the following concepts:

Distributed file systems and map/reduce approaches as a tool for creating parallel algorithms
Methods to estimate and compute similarity for similarity search
Processing of data streams with specialized algorithms
Technology of existing search engines: PageRank, link-spam detection, etc.
Frequent-itemset mining
Algorithms for clustering very large and high-dimensional datasets
Two main applications of these techniques: advertising and recommendation systems

Nevertheless, I miss a section about real-time processing of large amounts of data, as opposed to streaming techniques.

Published

May 28, 2012

josem.alvarez in Research | May 28, 2012
