Scalable Analysis-Ready-Data Framework for the Sentinel-1 and Sentinel-2 Satellite Platforms

By Chris Davison, University of Strathclyde

Access to Earth observation data has considerable benefits for agricultural applications at multiple points along the supply chain. For farmers and producers, it can provide insight over time into crop growth and fertiliser effectiveness, supplying the data needed to achieve higher yields and more efficient, sustainable farming. For governments and cooperatives, it can identify land usage and expected yield patterns over time and over a wide area; this can feed into policy and economic decision making by indicating which crops, and what yield per crop, are expected to be available in the future.

The Sentinel missions from the European Space Agency, through their Sentinel-1 and Sentinel-2 satellite platforms, can provide this information. However, due to the technical nature of the data, it can be difficult to access the specific data required in a consistent, analysis-ready format. A user first selects the region of interest, the time period they require, and the bands relevant to their analysis. The data must then be downloaded; several sources are available, but some archive data after a year, which requires a ‘wake-up’ period in which the user may have to wait 24 hours before the data is available. Then, both the Sentinel-1 and Sentinel-2 data must go through preprocessing and collocation steps to make the output comparable across dates and interoperable with each other.
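The three user choices described above (region, time period, bands) can be captured in a small validated structure. The following is a minimal sketch only: the class name `DataRequest`, its field names, and the bounding-box convention are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical container for a user's data selection."""

    # Assumed convention: (min_lon, min_lat, max_lon, max_lat) in degrees.
    bbox: Tuple[float, float, float, float]
    start: date
    end: date
    bands: Tuple[str, ...]

    def __post_init__(self) -> None:
        min_lon, min_lat, max_lon, max_lat = self.bbox
        if not (min_lon < max_lon and min_lat < max_lat):
            raise ValueError("bbox must be (min_lon, min_lat, max_lon, max_lat)")
        if self.start > self.end:
            raise ValueError("start must not be after end")
        if not self.bands:
            raise ValueError("at least one band is required")

    def days(self) -> int:
        """Number of calendar days covered, counting both end points."""
        return (self.end - self.start).days + 1


# Illustrative values only: one growing season and two Sentinel-2 bands.
request = DataRequest(
    bbox=(-4.30, 55.80, -4.20, 55.90),
    start=date(2021, 6, 1),
    end=date(2021, 8, 31),
    bands=("B04", "B08"),
)
```

Validating the request up front means an invalid region or date range is rejected before any download is attempted.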

The team at the University of Strathclyde has delivered, as part of the CYBELE project, an open framework for discovering, downloading, and preparing data from these satellite platforms (link). Taking the user’s required configuration as input, the framework finds the appropriate data products; downloads them, falling back to an alternative (freely available) source if a product is archived; performs the appropriate preprocessing on each product; and finally generates ‘patches’ of the user-requested size.
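The fall-back behaviour for archived products can be sketched as trying an ordered list of sources. This is an illustration under stated assumptions, not the framework's code: `ProductArchived`, `download_with_fallback`, and the two stand-in sources are hypothetical names, and real sources would perform network requests.

```python
from typing import Callable, List, Tuple


class ProductArchived(Exception):
    """Hypothetical error raised when a source holds a product offline."""


# A source is a (name, fetch function) pair; fetch returns the product bytes.
Source = Tuple[str, Callable[[str], bytes]]


def download_with_fallback(product_id: str, sources: List[Source]) -> Tuple[str, bytes]:
    """Try each source in order, moving on only when the product is archived."""
    for name, fetch in sources:
        try:
            return name, fetch(product_id)
        except ProductArchived:
            continue  # archived at this source, so try the next one
    raise ProductArchived(f"{product_id} is archived at every configured source")


# Stand-in sources for demonstration only.
def primary(product_id: str) -> bytes:
    raise ProductArchived(product_id)


def mirror(product_id: str) -> bytes:
    return b"data for " + product_id.encode()


source_name, payload = download_with_fallback(
    "S2_EXAMPLE", [("primary", primary), ("mirror", mirror)]
)
```

Only the archived case triggers the fall-back here; other failures, such as network errors, would propagate to the caller and would need their own handling.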

Sentinel-1 data requires radiometric calibration so that data captured on different days is comparable; speckle filtering is also typically performed, although it is not required, to remove noise introduced during SAR data acquisition. Terrain correction is then applied to account for terrain distortion during image capture. Sentinel-2 contains bands at 10 m, 20 m, and 60 m resolutions, and these are resampled to a common 10 m spatial resolution. At this stage, the Sentinel-1 and Sentinel-2 data are clipped to the region of interest and collocated to a common coordinate system. Finally, each output is cut into small patches that are suitable as input to a machine learning model.
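The resampling and patching steps can be illustrated on a tiny grid. This sketch makes two assumptions that the text does not confirm: resampling is done by nearest neighbour (a 20 m pixel becomes a 2 x 2 block of 10 m pixels), and incomplete patches at the image edge are dropped. The function names are hypothetical, and plain Python lists stand in for real raster arrays.

```python
from typing import Iterator, List, Tuple

Grid = List[List[float]]


def upsample_nearest(band: Grid, factor: int) -> Grid:
    """Repeat every pixel `factor` times along both axes (nearest neighbour)."""
    out: Grid = []
    for row in band:
        wide = [value for value in row for _ in range(factor)]
        # Emit independent copies of the widened row, one per output line.
        out.extend(list(wide) for _ in range(factor))
    return out


def cut_patches(image: Grid, size: int) -> Iterator[Tuple[int, int, Grid]]:
    """Yield (row, col, patch) for each full size x size tile of the image."""
    height, width = len(image), len(image[0])
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


# A 2 x 2 band at 20 m becomes a 4 x 4 band at 10 m (factor 20 / 10 = 2).
band_20m = [[1.0, 2.0], [3.0, 4.0]]
band_10m = upsample_nearest(band_20m, factor=2)
patches = list(cut_patches(band_10m, size=2))
```

With the same assumption, a 60 m band would use a factor of 6. Returning the row and column offset with each patch keeps it traceable to its position in the original image.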

The size and sheer number of satellite images make the retrieval and organisation of data not only complex but also computationally intensive. The Experimental Composition Environment (ECE) component developed during the CYBELE project mitigates both challenges, making it easy to use High Performance Computing (HPC) infrastructure to dispatch the download and preparation of satellite data to multiple, powerful machines. Whereas previously a user might process one day at a time on their own computing infrastructure, the ECE allows a user to spread a week, a month, or a year of data across multiple powerful machines, significantly reducing the time between requesting data and receiving the desired output. One example is a dataset for a single farm: one season (3 months) of data required 30 minutes on an individual PC, and completed in under 3 minutes when using the HPC, approximately 10x faster!
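The idea of spreading a date range across several workers can be sketched with the Python standard library. This is not the ECE interface, which the article does not describe: a local thread pool stands in for the HPC machines, and `split_days`, `process_chunk`, and `run_parallel` are hypothetical names. The placeholder worker only counts days where a real one would download and preprocess them.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import List, Tuple

DateRange = Tuple[date, date]


def split_days(start: date, end: date, chunk_days: int) -> List[DateRange]:
    """Split an inclusive date range into consecutive chunks of at most chunk_days."""
    chunks: List[DateRange] = []
    cursor = start
    while cursor <= end:
        stop = min(cursor + timedelta(days=chunk_days - 1), end)
        chunks.append((cursor, stop))
        cursor = stop + timedelta(days=1)
    return chunks


def process_chunk(chunk: DateRange) -> int:
    """Placeholder for downloading and preparing one chunk; returns days covered."""
    first, last = chunk
    return (last - first).days + 1


def run_parallel(start: date, end: date, chunk_days: int, workers: int) -> List[int]:
    """Dispatch every chunk to a pool of workers and collect results in order."""
    chunks = split_days(start, end, chunk_days)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))


# One season split into weekly chunks and shared between four workers.
season_chunks = split_days(date(2021, 6, 1), date(2021, 8, 31), chunk_days=7)
days_per_chunk = run_parallel(date(2021, 6, 1), date(2021, 8, 31), chunk_days=7, workers=4)
```

Because the chunks are independent of one another, they can be processed in any order. On the figures quoted above, going from 30 minutes to under 3 minutes corresponds to a speed-up of at least 30 / 3 = 10.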

More can be learned about the framework here, and about how to use it yourself here.