The Impact of Preprocessing Approaches on Neural Network Performance: A Case Study on Evaporation in Adana, a Mediterranean Climate

Authors

  • Okan Mert Katipoglu Department of Civil Engineering, Erzincan Binali Yildirim University, Erzincan 24002, Turkiye
  • Muhammet Ali Pekin 12th Regional Directorate of Meteorology, Turkish State Meteorology Service, 25050, Turkiye
  • Sercan Akil Research Department, Turkish State Meteorology Service, 06120, Turkiye

DOI:

https://doi.org/10.52562/injoes.2023.821

Keywords:

Evaporation, Artificial intelligence, Multilayer Perceptron Neural Network, Preprocessing, Standard scaler, Minimum Maximum scaler

Abstract

The application of artificial intelligence (AI) technologies in water management is expanding quickly. Artificial neural networks have an advantage over traditional statistical approaches in that they do not require assumptions about the distribution of the data and variables. These methods can be used when a sufficiently large data set and variables relevant to the nature of the problem are available. Preprocessing the data before use improves the performance of an AI model. Evaporation matters in water management, agricultural processes, and soil science. Accurate estimation of evaporation losses is critical for effective water resource planning and management, particularly in drought-prone areas such as Adana. AI approaches can be applied successfully to evaporation estimation. In this research, we applied six preprocessing techniques to the input data of a multilayer perceptron neural network (MLPNN): standard scaler, power transformer, robust scaler, quantile transformer (uniform), quantile transformer (normal), and min-max scaler. We also trained the MLPNN on unprocessed data and compared it with the models trained on preprocessed data. Daily temperature, pressure, wind, sunshine hours, and humidity for the years 2018-2021 were presented as inputs to the MLPNN model. All preprocessing approaches produced better outcomes than the unscaled data. Although all models produced highly accurate predictions according to the statistical criteria, the MLPNN model built without transformation (test phase: r²: 0.93, NSE: 0.927, SMAPE: 10.77, RMSE: 1.2, MAE: 0.9) exhibited the lowest accuracy. The evaporation prediction model built with the MLPNN and the standard scaler exhibited the highest accuracy (test phase: r²: 0.978, NSE: 0.977, SMAPE: 5.93, RMSE: 0.68, MAE: 0.48). The power transformer (test phase: r²: 0.978, NSE: 0.977, SMAPE: 5.81, RMSE: 0.67, MAE: 0.49) gave the second-best results. These results indicate that estimating meteorological variables calls for scaling the data and presenting them in a uniform structure. More accurate evaporation estimates can in turn improve efficiency and productivity in water management and agricultural processes.
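The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn's preprocessing classes and `MLPRegressor`, uses synthetic skewed predictors in place of the Adana station data, and the network size, solver, 80/20 split, and helper names (`make_synthetic`, `evaluate`, `nse`, `smape`, `rmse`, `mae`, `r_squared`) are all hypothetical choices. The r² here is taken as the squared Pearson correlation, and SMAPE uses one common definition; the paper may define both differently.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (
    MinMaxScaler, PowerTransformer, QuantileTransformer,
    RobustScaler, StandardScaler,
)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1.0 means a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def smape(obs, sim):
    """Symmetric mean absolute percentage error, in percent (assumed definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    denom = np.abs(obs) + np.abs(sim)
    return float(100.0 * np.mean(2.0 * np.abs(sim - obs) / denom))


def rmse(obs, sim):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(sim, float) - np.asarray(obs, float)) ** 2)))


def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(sim, float) - np.asarray(obs, float))))


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)


def make_synthetic(n=400, seed=0):
    """Five skewed predictors on very different scales (stand-ins, not real data)."""
    rng = np.random.default_rng(seed)
    z = rng.lognormal(mean=0.0, sigma=0.5, size=(n, 5))
    X = z * np.array([1.0, 50.0, 0.1, 5.0, 20.0])
    y = (1.0 + 0.8 * z[:, 0] + 0.5 * z[:, 1] + 0.3 * z[:, 2] * z[:, 3]
         + 0.4 * np.sqrt(z[:, 4]) + rng.normal(0.0, 0.1, n))
    return X, y


# The six techniques named in the abstract, plus the unscaled baseline.
SCALERS = {
    "unscaled": None,
    "standard": StandardScaler(),
    "minmax": MinMaxScaler(),
    "robust": RobustScaler(),
    "power": PowerTransformer(),  # Yeo-Johnson by default
    "quantile_uniform": QuantileTransformer(
        n_quantiles=100, output_distribution="uniform", random_state=0),
    "quantile_normal": QuantileTransformer(
        n_quantiles=100, output_distribution="normal", random_state=0),
}


def evaluate(scaler, X_train, y_train, X_test, y_test):
    """Fit scaler + MLP on the training split only, then score the test split."""
    mlp = MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                       max_iter=500, random_state=0)
    model = mlp if scaler is None else make_pipeline(scaler, mlp)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {"r2": r_squared(y_test, pred), "NSE": nse(y_test, pred),
            "SMAPE": smape(y_test, pred), "RMSE": rmse(y_test, pred),
            "MAE": mae(y_test, pred)}


X, y = make_synthetic()
split = int(0.8 * len(X))  # simple 80/20 split (assumed, not from the paper)
results = {
    name: evaluate(scaler, X[:split], y[:split], X[split:], y[split:])
    for name, scaler in SCALERS.items()
}
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Wrapping the scaler and the network in one pipeline means the scaling parameters are learned from the training split alone, so no information from the test period leaks into the preprocessing step. The ranking of scalers on this synthetic data says nothing about the ranking reported for Adana.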

Published

2023-12-29

How to Cite

Katipoglu, O. M., Pekin, M. A., & Akil, S. (2023). The Impact of Preprocessing Approaches on Neural Network Performance: A Case Study on Evaporation in Adana, a Mediterranean Climate. Indonesian Journal of Earth Sciences, 3(2), A821. https://doi.org/10.52562/injoes.2023.821