Why data mining?
1  Pennsylvania State University

Abstract:

Data mining (DM) is not a well-understood subject. It is often perceived as a mystical, mathematically non-rigorous methodology that produces results without validation or verification. Part of that perception is correct: few mathematical theorems demonstrate the rigorous validity of data mining. We need to interest more mathematicians in this field, which was born from biology rather than mathematics. Why, then, use it if it is not fully validated mathematically? Simply because it is the best strategy we have when knowledge and data are sparse, incomplete, or imprecise. We do not use DM for well-defined, well-understood problems for which models have been proven and validated. Instead, we use it when mechanistic or deterministic models fail and knowledge is insufficient, but data is abundant. Data mining comprises several mathematical techniques that fulfill different needs. An Expert System is a reasoning-based technique that is useful when knowledge is available and rules describing a system’s behavior can be derived; additional information can be obtained by forming and concatenating knowledge trees. One part of an Expert System is a technique now known as Block-Chain (originally known as Black-Board), in which information is shared securely: a sharing partner can acquire someone else’s information as long as they add information to the blackboard. This technique was popularized by the Bitcoin industry and is now being rapidly accepted by banks. Artificial Neural Networks (ANN) can be used when only data, rather than knowledge, is available. The approach mimics the brain's pattern-recognition capabilities and is mainly used to extract knowledge from data. There are two main types of ANN: supervised and unsupervised learning.
Examples of supervised learning are the ANN tools used to discover relationships between the dependent (output) and independent (input) variables. Unsupervised ANN are frequently used as excellent tools for data compression and for estimating the most probable values in incomplete data. These main techniques are supported by Fuzzy Logic, which enables us to deal with uncertainties in the data or with incomplete knowledge, and by search techniques such as Genetic Algorithms (an exciting and fun alternative to traditional optimization techniques). Traditional optimization techniques and statistics are also commonly used as part of data mining. In this paper, I will address each technique's fundamentals, discuss the validity of the results, and guide the reader to the best choices for a given problem type. I will close the paper with examples of using the techniques for alloy design and stress corrosion cracking in nuclear reactors, time permitting.

Keywords: Data mining; artificial neural networks; fuzzy logic; alloy development; stress corrosion cracking.

 
 