Published

10.3390/CMDWC2021-09991

This submission belongs to the session S1. Mechanism and Predictive/Deterministic Aspects of Corrosion of the event 1st Corrosion and Materials Degradation Web Conference

Published date

08 May, 2021

Citation

Mirna Urquidi-Macdonald, Why data mining?, in Proceedings of 1st Corrosion and Materials Degradation Web Conference, 17 May–19 May 2021, MDPI: Basel, Switzerland, doi: 10.3390/CMDWC2021-09991

Why data mining?

Mirna Urquidi-Macdonald ¹

1. Pennsylvania State University, USA

Abstract

Data mining (DM) is not a well-understood subject. It is often perceived as a mystical, mathematically non-rigorous methodology that produces results without validation or verification. Part of that perception is correct: few mathematical theorems demonstrate the rigorous validity of data mining. We need to interest more mathematicians in this field, which was born from biology rather than mathematics. Why, then, use it if it is not fully validated mathematically? Simply put, it is the best strategy we have when knowledge and data are sparse, incomplete, or imprecise. We do not use it for well-defined, well-understood problems where models have been proven and validated. Instead, we use DM when mechanistic/deterministic models fail and knowledge is insufficient, but data is abundant. Data mining comprises several mathematical techniques that fulfill different needs. An Expert System is a reasoning-based technique that is useful when knowledge is available and rules on a system's behavior can be derived. One part of an Expert System is a technique now known as Block-Chain (originally known as Black-Board), in which information is shared securely: a sharing partner can acquire someone else's information as long as they add information to the blackboard. This technique was popularized by the Bitcoin industry and is now being rapidly accepted by banks. Additional information can be obtained by forming knowledge trees that are concatenated. Artificial Neural Networks (ANN) are techniques that can be used when only data, rather than knowledge, is available. The approach "mimics" the brain's pattern-recognition capabilities and is mainly used to extract knowledge from data. There are two main types of ANN: supervised and unsupervised learning.
Examples of supervised learning are the ANN tools used to discover relationships between the dependent (output) and independent (input) variables. Unsupervised ANN are frequently used as excellent tools for data compression and for estimating the most probable values in incomplete data. These main techniques are supported by Fuzzy Logic, which enables us to deal with uncertainties in the data or with incomplete knowledge. Search techniques such as genetic algorithms (an exciting and fun alternative to traditional optimization techniques) are also used, as are traditional optimization techniques and statistics. In this paper, I will address each technique's fundamentals, discuss the validity of the results, and guide the reader toward the best choices for a given problem type. Time permitting, I will close with examples of using the techniques for alloy design and stress corrosion cracking in nuclear reactors.
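
As an illustration of the search techniques mentioned in the abstract, the following is a minimal sketch of a genetic algorithm. It is not the author's implementation: the objective function `fitness`, the function name `genetic_search`, and all parameter values (population size, mutation width, search bounds) are hypothetical choices made only to show the select, recombine, and mutate loop.

```python
import random


def fitness(x):
    # Hypothetical objective with a single peak at x = 3 (assumption for illustration).
    return -(x - 3.0) ** 2


def genetic_search(generations=60, pop_size=30, low=-10.0, high=10.0, seed=0):
    """Evolve a population of candidate values toward higher fitness."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b)          # crossover: average two parents
            child += rng.gauss(0.0, 0.3)   # mutation: small random perturbation
            children.append(min(high, max(low, child)))  # stay inside the bounds
        population = parents + children
    return max(population, key=fitness)


best = genetic_search()
print(f"best x = {best:.3f}")
```

Because the fitter half of each generation is carried over unchanged, the best candidate never gets worse from one generation to the next. In a real application the objective would be replaced by a measure relevant to the problem, for example the error of a model fitted to experimental data.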

Keywords

Data mining

artificial neural networks

fuzzy logic

alloy development

stress corrosion cracking
