Regional crop monitoring and yield forecasting relies heavily on meteorological data, crop simulation modelling and statistics to forecast expected crop yield and production in Europe and elsewhere in the world. New data sources from satellite observations and meteorological modelling offer opportunities to improve the system's predictive capabilities and to apply it for new purposes, such as field-level yield estimates. However, the computational challenges of building such applications are large. The CYBELE project aims to bring the power of high-performance computing (HPC) and machine learning to agriculture, to improve existing applications and realize new ones.
The top three levels in the figure below show the current system setup. Meteorological observations and forecasts are gathered in the first level. In the second level they are combined with crop-specific data and used as input for the WOFOST crop simulation model. Finally, in the third level, a statistical module combines the results from the top two levels with historical data to make a yield forecast. Within the current setup we identified three areas where CYBELE will make a contribution. In addition, we added an entirely new level in which CYBELE will extend the system to the parcel level.
First, we will make the architecture of levels 1 and 2 suitable for use with Spark DataFrames. The distributed nature of Apache Spark allows us to divide computational tasks, such as WOFOST crop simulations, across the compute nodes of an HPC system. This should drastically reduce processing time, both for the simulations and for related work such as analysing the large-scale data in historical archives.
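The simulations lend themselves to this kind of distribution because each combination of grid cell, crop and year can be run independently of the others. The sketch below illustrates that pattern only. It uses a thread pool from the Python standard library as a local stand-in for Spark executors, so it is not Spark code and gives no real speed-up. The names GridCellTask, run_simulation and run_all are hypothetical, and run_simulation returns a placeholder record rather than calling WOFOST.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class GridCellTask:
    """One independent unit of work: one crop on one grid cell in one year."""
    grid_id: int
    crop: str
    year: int


def run_simulation(task: GridCellTask) -> dict:
    """Hypothetical stand-in for a single crop simulation run.

    A real implementation would load the weather, soil and crop parameters
    for the grid cell and run the crop model here.
    """
    return {
        "grid_id": task.grid_id,
        "crop": task.crop,
        "year": task.year,
        "status": "done",
    }


def run_all(tasks, max_workers=4):
    """Apply run_simulation to every task in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_simulation, tasks))


# Five grid cells times two years gives ten independent tasks.
tasks = [
    GridCellTask(grid_id=g, crop="winter_wheat", year=y)
    for g in range(1, 6)
    for y in (2018, 2019)
]
results = run_all(tasks)
```

Because no task depends on the result of another, the same per-task function could in principle be applied to the rows or partitions of a distributed DataFrame, with the framework deciding which node runs which task.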
Second, running the WOFOST model is itself computationally intensive. Together with CYBELE partners, we are training a machine learning model that can replace WOFOST with a highly optimized ML variant. Particularly when GPUs are employed, this can drastically reduce processing time.
Third, CYBELE will replace the statistical model (level 3), which is based on multiple linear regression, with a machine learning variant. This ML approach will use the meteo and crop products together with regional statistics from EUROSTAT to train a model that can be applied for prediction. The ML approach makes it possible to better account for non-linearity and extreme events.
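To illustrate why non-linearity matters, the sketch below fits a straight line and a one-split regression stump to the same data and compares their errors. Everything in it is an assumption made for illustration: the numbers are synthetic and are not project or EUROSTAT data, the single predictor (a count of heat-stress days) is hypothetical, and the stump is only a very simple stand-in for whichever ML model the project adopts.

```python
def fit_line(xs, ys):
    """Ordinary least squares with a single predictor; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        err = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best is None or err < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x < threshold else mean_r


def sse(model, xs, ys):
    """Sum of squared errors of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Synthetic data (assumption): yield is stable until heat stress passes
# a threshold, then drops sharply in the two most extreme years.
heat_days = [0, 1, 2, 3, 4, 5, 6, 7]
yields = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 4.0, 4.0]

line = fit_line(heat_days, yields)
stump = fit_stump(heat_days, yields)
line_error = sse(line, heat_days, yields)
stump_error = sse(stump, heat_days, yields)
```

On this synthetic series the straight line spreads the sudden drop over all years and so overestimates the yield in the most extreme year, whereas the stump reproduces the step. The comparison is in-sample only. A real evaluation would need held-out years.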
Finally, we will extend the system with a new level that brings the results (currently on 25 x 25 km grids) down to the parcel level by integrating parcel boundaries and field-specific observations from the Sentinel-2 satellites. We will use a 2DVAR data assimilation approach to optimize WOFOST predictions at the field level, and we will test the predictive capabilities and applicability of the field-level results.
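Variational data assimilation in general searches for the model state that minimizes a cost function balancing two terms: the departure from the model's own estimate (the background) and the departure from the observations, each weighted by its error variance. The sketch below shows that idea for a single scalar state. It is not the project's 2DVAR formulation. The function names, the choice of a single state variable and all numbers are assumptions made for illustration.

```python
def cost(x, background, observation, var_b, var_o):
    """Variational cost for a scalar state that is observed directly.

    var_b and var_o are the error variances of the background (model)
    estimate and of the observation, respectively.
    """
    return (0.5 * (x - background) ** 2 / var_b
            + 0.5 * (observation - x) ** 2 / var_o)


def analysis(background, observation, var_b, var_o):
    """Closed-form minimizer of the scalar cost function.

    Setting the derivative of the cost to zero gives an average of the
    background and the observation, weighted by their inverse variances.
    """
    weight_b = 1.0 / var_b
    weight_o = 1.0 / var_o
    return ((weight_b * background + weight_o * observation)
            / (weight_b + weight_o))


# Hypothetical values: the model estimate is less certain than the
# satellite-derived observation, so the analysis moves towards the latter.
x_a = analysis(background=3.0, observation=4.0, var_b=0.5, var_o=0.25)
```

The analysis value always lies between the background and the observation, closer to whichever has the smaller error variance. In a field-level setting the state would be a vector and the observation would generally be related to it through an observation operator, which this scalar sketch leaves out.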
We believe that the CYBELE H2020 project will allow us to take large steps towards making the European crop yield prediction system ready for the coming decades.