dc.description |
COANDA, Ilie. The Impact of Data Pre-Processing on the Assessment of the Similarity of Trend Functions. In: Competitiveness and Innovation in the Knowledge Economy [online]: 27th International Scientific Conference: Conference Proceeding, September 22-23, 2023. Chişinău: ASEM, 2023, pp. 426-430. ISBN 978-9975-167-39-0 (PDF). |
en_US |
dc.description.abstract |
This paper proposes an approach to applying cleaning, completion, and smoothing techniques to large volumes of data intended for analysis. Depending on the field and on how the data are collected and recorded, data can generally be classified into at least two categories: precise data (recorded by automated techniques, without any influence of the human factor) and approximate data (where a human participates, to some extent, at some stage of collection or recording). If relatively many people take part in the same activity, the precision of the resulting records will differ from that of records produced automatically. This work aims to highlight the impact that the quality of preliminary processing (smoothing, cleaning, etc.) of the primary data has on the analysis process. In the case studies, the object of research is a set of time series of data collected on the spread of an epidemic. Recording data on such a phenomenon fits the case studied well, because collection relies heavily on human participation, which is characterized by frequent deviations from the prescribed regulations. Consequently, some data may be recorded with a delay, and/or people affected by the disease may report to the doctor at different times. Such phenomena can create anomalies in the data structure. To highlight the impact of applying different methods of smoothing and completing the primary data, approximating functions were obtained for each time series after the series had been "corrected" in one of two ways: a) by averaging neighboring data; b) by excluding "suspicious" data. 
As a result, two sets of approximating functions are obtained (the approximating functions can be obtained by means of non-linear regression). By applying techniques for evaluating the similarity of functions, the distance (similarity level) between the functions within each set is calculated. Next, a hierarchical clustering can be obtained for each of the two sets of approximating functions. By comparing the two hierarchical clusterings, the impact of "correction" methods a) and b) can be evaluated. DOI: https://doi.org/10.53486/cike2023.44; UDC: 004.6:61; JEL: C63, I21, I23, I25, I29 |
en_US |
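
The two "correction" variants named in the abstract, a) averaging neighboring data and b) excluding "suspicious" data, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the 3-point window, the median-based rule and threshold `tol` for what counts as "suspicious", the sample series, and the use of a straight-line least-squares fit as a simple stand-in for the non-linear regression the abstract mentions. The sketch stops at the distance between two fitted trends and does not perform the hierarchical clustering step.

```python
import math
from statistics import median


def smooth_by_neighbors(series):
    """Correction a): replace each interior value by the mean of itself
    and its two neighbors (window size 3 is an assumption)."""
    out = list(series)
    for i in range(1, len(series) - 1):
        out[i] = (series[i - 1] + series[i] + series[i + 1]) / 3
    return out


def exclude_suspicious(series, tol):
    """Correction b): keep only (t, y) pairs whose value lies within `tol`
    of the local median. Both the rule and `tol` are hypothetical."""
    kept = []
    for i, y in enumerate(series):
        window = series[max(0, i - 1): i + 2]
        if abs(y - median(window)) <= tol:
            kept.append((i, y))
    return kept


def fit_line(points):
    """Least-squares line y = a + b*t through (t, y) pairs; returns (a, b).
    A linear stand-in for the paper's non-linear approximating functions."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def trend_distance(f, g, grid):
    """Root-mean-square gap between two fitted lines over a common grid."""
    (a1, b1), (a2, b2) = f, g
    squared = sum(((a1 + b1 * t) - (a2 + b2 * t)) ** 2 for t in grid)
    return math.sqrt(squared / len(grid))


# Made-up series: a linear trend with one delayed or mis-recorded spike.
series = [1, 2, 3, 40, 5, 6, 7]
trend_a = fit_line(list(enumerate(smooth_by_neighbors(series))))
trend_b = fit_line(exclude_suspicious(series, tol=10))
gap = trend_distance(trend_a, trend_b, range(len(series)))
```

In this toy series, averaging spreads the spike over its neighbors while exclusion removes it, so the two fitted trends differ and `gap` is non-zero. Computing such distances for every pair of series within each corrected set would give the two distance matrices that a hierarchical clustering step could then compare.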