Data Mining is a fundamental part of a much broader process called Knowledge Discovery in Databases (KDD). You may not know the term data mining, but you have probably heard of big data, right? The two are related, since Big Data is a “powered up” version of data mining. The main difference between them is the scale of the data each works on, a difference in sample size.
One definition of Data Mining is especially good and essential for understanding other fundamental concepts. Let’s go to this definition: “Data Mining is the search for valuable information in large databases. It is a cooperative effort between men and computers. Men design databases, describe problems, and define their goals. Computers check data and look for patterns that match the goals set by men.”
This definition is interesting because right off the bat it reads “data mining is the search for valuable information in large databases”. To understand the practical role of Data Mining, it is critical that you know the difference between data and information. They may look the same, but they are completely different things. To these two we also have to add a third concept: knowledge.
In the book Data Mining (GEN LTC / 2015) the three terms are perfectly defined, based on the classic example of the pyramidal hierarchy. The scale is as follows: data at the bottom, information at the center and knowledge at the top.
Following what is proposed in the book Data Mining, each of them is defined as follows:
• Data: can be interpreted as elementary items, captured and stored by Information Technology resources. They are strings of symbols and have no semantics (meaning). Their purpose is to express real-world facts in a way that can be handled in a computational context.
• Information: represents processed data, with well-defined meanings and contexts. Several information technology resources are used to process data and obtain information.
• Knowledge: At the top of the pyramid is knowledge, which corresponds to a pattern or set of patterns whose formulation can involve and relate data and information.
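To make the pyramid a little more concrete, here is a minimal Python sketch of the three levels. Everything in it is an assumption made up for illustration and does not come from the book: the sample purchase records, the helper names `to_information` and `to_knowledge`, and the 50% threshold. It only shows the idea of raw symbol strings gaining meaning and then yielding a pattern.

```python
from collections import Counter
from itertools import combinations

# Data: raw strings of symbols with no meaning attached (hypothetical records).
raw_rows = [
    "1001;bread;butter",
    "1002;bread;butter;milk",
    "1003;milk",
    "1004;bread;butter",
    "1005;bread;milk",
]


def to_information(rows):
    """Give each raw string a meaning: a transaction id and a set of items."""
    transactions = {}
    for row in rows:
        tx_id, *items = row.split(";")
        transactions[int(tx_id)] = set(items)
    return transactions


def to_knowledge(transactions, min_support=0.5):
    """Keep item pairs that appear together in at least min_support of transactions."""
    pair_counts = Counter()
    for items in transactions.values():
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(items), 2))
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


information = to_information(raw_rows)
knowledge = to_knowledge(information)
print(knowledge)  # {('bread', 'butter'): 0.6}
```

In this toy sample, the only pattern that survives the threshold is that bread and butter are bought together in 60% of the transactions. A real data mining project would work on far larger databases and more robust techniques, but the direction is the same: from data, to information, to knowledge.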
Whether through purely human analysis or through techniques that rely on technology, such as Data Mining, the goal is to turn data into knowledge. This knowledge is invaluable to companies, as it provides an in-depth look at what can drive more profit and which approaches the company can follow. According to PwC’s Global Data and Analytics Survey, 39% of companies identify themselves as highly data driven, with 36% of companies able to better predict next steps than other companies.
It is often said that data is the main asset of humanity in the modern age, but what really matters is the knowledge that can be extracted from that data, a kind of panning for gold. It is based on this knowledge that a company can, for example, predict whether or not you want to buy a particular product.
Several industries can benefit from Data Mining, one of which is marketing. By using data mining to turn data into knowledge, companies are able to run increasingly targeted campaigns and increase their return.
The professional qualified to carry out this data analysis is the Data Scientist, constantly cited as a profession of the future. Even the World Economic Forum recognizes this: based on a survey of more than 300 companies, it found that 85% of companies intend to expand their use of Big Data and Analytics by 2022.