It seems like everyone is an expert in data mining these days. What is the hype around data mining?
The buzzword "data mining" is notoriously hard to define, as it means different things to different people. One of the best definitions I have come across is the following:
"Data mining is the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from very large databases."
"It is the art of charming data into a confession."
While the techniques employed are similar to those in the realms of statistical learning and multivariate regression, data mining problems distinguish themselves by:
Large datasets, leading to data-in-memory and run-time issues
Data cleansing and preparation: These often constitute a large fraction of the analysis
Fuzzy goals: One does not necessarily know what one is looking for
Feature Selection: One frequently has to choose relevant variables or combinations thereof ("features") among a large number of candidates
Non-numeric data: Data frequently consist of text, binary and descriptive non-numeric measurements
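To make the feature selection point concrete, here is a minimal sketch of one simple approach: rank each candidate variable by the absolute value of its Pearson correlation with the target and keep the strongest few. This is only an illustration under stated assumptions, not a description of any particular tool. The column names, the toy data and the helper `rank_features` are all hypothetical, and real projects typically use more robust criteria.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target, top_k):
    """Return the top_k column names ranked by |correlation| with the target.

    Hypothetical helper for illustration only.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical toy data: three candidate features and one target variable.
columns = {
    "visits": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]

selected = rank_features(columns, target, top_k=2)
print(selected)
```

Correlation only captures linear, one-variable-at-a-time relationships, so a ranking like this is best treated as a first screening pass rather than a final answer.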
We squeeze knowledge out of our clients' data by applying customized versions of the most appropriate data mining tools in the field.