The necessity of HPC
Cybele uses high performance computing (HPC) to bridge the ever-growing gap between the average user and their data-processing needs. With sensors and digital storage becoming less expensive over time, data acquisition and storage are now routine in most industrial sectors, yet in most cases this data is never exploited to the extent that could yield profit.
For example, many farmers and aquafarmers hold large volumes of data about their fields and aquacultures without having the knowledge and resources needed to process them and extract useful insights.
Cybele’s main purpose is to fill that gap by exploiting the potential of Artificial Intelligence, in particular by integrating machine/deep learning algorithms into end-to-end processes that answer concrete problems in each industrial sector. To this end, technical teams develop algorithms in frameworks such as Apache Spark and its Python API, PySpark, while also maintaining and developing AI workflows in common frameworks such as PyTorch and TensorFlow, which take advantage of HPC resources through containerization.
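Running a containerized workflow on HPC resources typically goes through the cluster's batch scheduler. The sketch below assumes a Slurm scheduler and a Singularity container; the image name, script path and resource values are hypothetical, not taken from the project:

```shell
#!/bin/bash
#SBATCH --job-name=cybele-train    # hypothetical job name
#SBATCH --nodes=1                  # single-node training sketch
#SBATCH --gres=gpu:1               # request one GPU for the DL frameworks
#SBATCH --time=02:00:00            # wall-clock limit (illustrative)

# Run the training step inside the container image;
# --nv exposes the host GPU drivers to the container.
singularity exec --nv workflow.sif python train.py
```

Packaging the PyTorch/TensorFlow stack into one image is what lets the same workflow move unchanged between a laptop and an HPC node.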
The UPRC team, in collaboration with BioSense, has implemented and deployed an end-to-end workflow for soybean yield prediction. The workflow consists of three parts: preprocessing, training and evaluation. Preprocessing exploits satellite images through image stacking, field-boundary alignment and yield-map interpolation from point measurements. With the preprocessed data in tabular form, the PySpark versions of ML algorithms such as Gradient Boosted Trees Regressor, Decision Trees Regressor, Random Forest and Linear Regression are trained for the yield prediction task. The last step of the pipeline is the evaluation of training, carried out through a variety of metrics such as R², RMSE and MAE to give both the BioSense and UPRC data science teams a more complete picture of the models' performance.
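To make the evaluation step concrete, the three metrics can be computed with plain Python. This is a minimal sketch: in the deployed workflow they would come from PySpark's evaluation utilities, and the toy yield values below are invented for illustration:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE and MAE for a yield-prediction model."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n          # mean absolute error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return {"R2": r2, "RMSE": rmse, "MAE": mae}

# Toy yields (e.g. t/ha), purely illustrative
metrics = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
print(metrics)
```

Reporting all three together is what gives the fuller picture mentioned above: MAE is robust to outliers, RMSE penalizes large errors more heavily, and R² relates the error to the variance of the observed yields.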