Author: Priti Upadhyay, University of Strathclyde
Satellite imagery is widely used in earth observation, especially in agriculture for tasks such as crop identification and crop yield prediction [1,2]. Applying machine learning and deep learning to AgriTech is a growing trend in the sector, providing reliable information for optimising farm management [3]. This trend has been boosted by the availability of free, open-access Sentinel satellite data, provided by the European Space Agency (ESA) at high spatial and temporal resolution. However, securing Analysis Ready Data (ARD) for developing machine learning models requires infrastructure and expertise in handling geospatial data, and processing satellite images to generate ARD is both computationally and memory intensive. Our satellite pre-processing tool generates ARD with global coverage, using High Performance Computing (HPC) clusters to handle Big Data efficiently.
Securing cloud-free satellite data remains a challenge. Optical satellite images are significantly affected by cloud cover: the global average total cloud cover fraction is 68% [4], and for the United Kingdom the monthly average ranges from 69% to 76% over the year [5]. Processing images obstructed by clouds is highly challenging, and many methods, such as crop yield prediction, can be applied accurately only to cloud-free data. The missing data and the resulting smaller dataset make both the training and inference stages difficult. To address this problem, our cloud removal approach, led by Priti Upadhyay at the University of Strathclyde, aims to recover cloud-free images from cloud-affected ones using machine learning and deep learning algorithms. The cloud removal algorithm is applied to the ARD for Sentinel-2 optical imagery. The image below shows a comparison of cloud-affected and cloud-recovered Sentinel-2 images for a region in the United Kingdom.
Machine learning models can be trained to generate cloud-free images (right) based on observed cloudy samples (left).
The cloud-recovered images generated by machine learning can then be used in crop yield prediction and crop identification models, and to improve the accuracy of these models.
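To make the cloud-recovery problem concrete, here is a minimal sketch of a simple non-learned baseline. It is not the machine learning approach described above. It assumes a small time series of co-registered single-band images with a boolean cloud mask per date, and fills each cloudy pixel with the median of that pixel's cloud-free observations. The function names `cloud_fraction` and `fill_clouds_temporal_median`, the array layout and the toy values are all hypothetical and chosen for illustration only.

```python
import warnings

import numpy as np


def cloud_fraction(cloud_mask: np.ndarray) -> float:
    """Return the fraction of pixels flagged as cloudy in a boolean mask."""
    return float(np.mean(cloud_mask))


def fill_clouds_temporal_median(stack: np.ndarray, cloud_masks: np.ndarray) -> np.ndarray:
    """Fill cloudy pixels with the per-pixel median of cloud-free dates.

    stack:       array of shape (time, height, width) with reflectance values
    cloud_masks: boolean array of the same shape, True where a pixel is cloudy
    """
    if stack.shape != cloud_masks.shape:
        raise ValueError("stack and cloud_masks must have the same shape")

    # Hide cloudy observations so they do not contribute to the median.
    clear_only = np.where(cloud_masks, np.nan, stack.astype(float))

    # Pixels that are cloudy on every date have no clear observation;
    # nanmedian returns NaN for them (and warns), which we keep as NaN.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        clear_median = np.nanmedian(clear_only, axis=0)

    # Keep clear pixels as observed; replace only the cloudy ones.
    return np.where(cloud_masks, clear_median[np.newaxis, :, :], stack.astype(float))


# Toy example: three dates of a 2x2 image, with one cloudy pixel on the second date.
stack = np.array([
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.9, 0.2], [0.3, 0.4]],  # 0.9 is a bright cloud contaminating pixel (0, 0)
    [[0.3, 0.2], [0.3, 0.4]],
])
cloud_masks = np.zeros_like(stack, dtype=bool)
cloud_masks[1, 0, 0] = True

filled = fill_clouds_temporal_median(stack, cloud_masks)
print("cloud fraction on date 1:", cloud_fraction(cloud_masks[1]))
print("recovered value at (0, 0):", filled[1, 0, 0])
```

A baseline like this cannot recover pixels that are cloudy on every date and ignores change between acquisitions, which is part of why learned models are attractive for this task.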
Priti Upadhyay, Research Assistant, University of Strathclyde
[1] Matton N, Canto GS, Waldner F, Valero S, Morin D, Inglada J, Arias M, Bontemps S, Koetz B, Defourny P. An Automated Method for Annual Cropland Mapping along the Season for Various Globally-Distributed Agrosystems Using High Spatial and Temporal Resolution Time Series. Remote Sensing. 2015; 7(10):13208-13232. https://doi.org/10.3390/rs71013208
[2] X.E. Pantazi, D. Moshou, T. Alexandridis, R.L. Whetton, and A.M. Mouazen. Wheat yield prediction using machine learning and advanced sensing techniques. Computers and Electronics in Agriculture, 121:57–65, 2016.
[3] Lefteris Benos, Aristotelis C. Tagarakis, Georgios Dolias, Remigio Berruto, Dimitrios Kateris, and Dionysis Bochtis. Machine learning in agriculture: A comprehensive updated review. Sensors, 21(11), 2021.
[4] C. J. Stubenrauch, W. B. Rossow, S. Kinne, S. Ackerman, G. Cesana, H. Chepfer, L. Di Girolamo, B. Getzewich, A. Guignard, A. Heidinger, B. C. Maddux, W. P. Menzel, P. Minnis, C. Pearl, S. Platnick, C. Poulsen, J. Riedi, S. Sun-Mack, A. Walther, D. Winker, S. Zeng, and G. Zhao. Assessment of global cloud datasets from satellites: Project and database initiated by the GEWEX Radiation Panel. Bulletin of the American Meteorological Society, 94(7):1031–1049, 2013.
[5] Global distribution of total cloud cover and cloud type amounts over the land / prepared for: United States Department of Energy Office of Energy Research, Office of Basic Energy Sciences, Carbon Dioxide Research Division, and National Center for Atmospheric Research; Stephen G. Warren [and others]. NCAR technical notes; NCAR/TN-273+STR. National Center for Atmospheric Research, Boulder, Colo., 1986.