Extending Fast(er) Downloads

Fast(er) downloads from Google Earth Engine: Tiling

This series was inspired by this article by Noel Gorelick. I will try and apply his article for a more specific use case of downloading satellite imagery for a large region of interest. This article requires an understanding of the limitations of earth engine, particularly ee.Image.getDownloadURL(), because it is aimed at providing a work-around. Today we discuss the first step towards big downloads, tiling the region of interest.

Overview

This whole approach revolves around one function to facilitate parallel downloads, i.e. ee.FeatureCollection.aggregate_array('.geo').getInfo(). This is because, aggregate_array() returns a list, which can be passed on to map() or starmap() (in python) to be called in parallel. So any fast earth-engine downloader will have the following structure:

getTiles() : Returns the geometries of the tiles
getImage() / getImageCollection() : Returns the image/images to be tiled
getChip() : Downloads individual chips. To be called parallely against the list of geometries discussed bove.
stitch() : stitches the tiles to final image

Basic tiling

The aim of this process is to simply take a geometry and split it into smaller tiles. These tiles can later be used in chipping and downloading the satellite imagery.

First, we create a grid over the area of interest, by using ee.Geometry.coveringGrid(). This function takes two arguments, projection and scale. The projection is ofcourse the projection of the region of interest. However, the scale, is 10000, because it is the maximum tile dimension supported by ee.Image.getDownloadURL().

// roi is an instance of ee.Geometry()

var grid = roi.coveringGrid(roi.projection(), 10000)

Map.addLayer(grid)

This results in the following output: basic_tiling

For smaller regions (about 200-1000 sq.km), this solution should be sufficiently quick to download the tiles and re-stitch.

Improved tiling

For larger regions, we can go one step further, by clipping the tiles to the region of interest. This way we only download tiles containing useful information. This comes with a few caveats. Clipping is done using ee.Feature.intersection() which accepts three arguments, right, maxError and proj and returns any ee.Geometry(). However, for the sake of simplicity, we only want polygons. So we apply a filter to the final collection to remove any unwanted geometries.

// Create tiles by clipping the grid to the intersection
var tiles = grid.map(function(cell) {
  return ee.Feature(cell).intersection(ee.Feature(roi), ee.ErrorMargin(1), roi.projection())
})

// Filter tiles based on their geometry. We only want the polygons.
// First set a property 'geo' as the type of the geometry of the tile
tiles = tiles.map(function(tile) {
  return tile.set('geo', tile.geometry().type())
})

// Next, return those tiles with 'geo' = 'Polygon'
tiles_fil = tiles.filter(ee.Filter.eq('geo', 'Polygon'))

Map.addLayer(tiles_fil, {'color': 'blue'})
print('tiles', tiles)
print('filtered tiles', tiles_fil)

This gives the following output: improved_tiling

As you can see, we have clipped the tiles to the required region of interest. Further we have reduced the number of tiles slightly which should provide slight improvement to the downloads. This method delivers satisfactory results for slightly larger regions, i.e. 1,000-10,000 sq.km.

Advanced tiling

In general, the above method is the go-to for big downloads, given enough time. I have tested the tesselation method above for country sized (Ghana) mosaics and it gives me the final image in about ~10 minutes. However, you may be interested in single day pictures of the region of interest. Often times, the swaths from satellite imagery are not enough to cover the large region of interest. In these cases, you can improve the focus of the downloader by providing tiles where the image exists.

Therefore, we convert the raster image to a vector (geometry) and create tiles over them. I found this to be easiest using the ee.Image.reduceToVectors(). We iterate over the ee.FeatureCollection() returned and perform the above tiling algorithm on each ee.Feature().

Alternatively, we can also create a coveringGrid() over the initial area and filter tiles intersecting with the result of ee.Image.reduceToVectors().

Ofcourse, this comes with an overhead per image. The tiling is done dynamically and ee.Image.reduceToVectors() is susceptible to exceeding memory limits. However, for large areas, this significantly reduces the number of tiles created and therefore the download time.

// Preparing the collection
var collection = ee.ImageCollection('COPERNICUS/S1_GRD')
            .filterDate('2019-08-30', '2019-09-15')
            .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
            .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
            .filterBounds(roi)
var coll_list = collection.toList(collection.size())

// This code to mosaic the collection by dates
var unique_dates = coll_list.map(function(image) {
  return ee.Image(image).date().format('YYYY-MM-dd')
}).distinct()

var mosaic_by_date = unique_dates.map(function(date) {
  date = ee.Date(date)
  return collection.filterDate(date, date.advance(1, 'day')).mosaic()
})

// We get the final image
var img = ee.Image(mosaic_by_date.get(2))

// Create the vector

Map.addLayer(img)

var img_abs = img.select('VH').abs()
var img_set = img_abs.and(img_abs)

// This is important to reduce computation over the featureCollection() returned 
// by the reduceToVectors() below

// To see an example, use a sentinel-2 image instead :)

var features = img_set.reduceToVectors({
  scale: 10000,
  geometry: roi,
  bestEffort: true
})

// Tiling is not as straight-forward because of un-packing of ee.List()
var tiles = features.map(function(feature) {
  feature = ee.Feature(feature)
  var grid = feature.geometry().coveringGrid(roi.projection(), 10000)
  var cells = grid.map(function(tile) {
    return ee.Feature(tile.intersection(ee.Feature(roi), ee.ErrorMargin(1)))
  })
  return ee.FeatureCollection(cells.toList(cells.size()))
}).flatten()



// Alternatively, we can also filter grid tiles that intersect with the features

// var grid = roi.coveringGrid(roi.projection(), 10000)

// var tiles = features.map(function(feat) {
//   feat = ee.Feature(feat)
//   return grid.filterBounds(feat.geometry())
// }).flatten()

// You may adapt the above code to clip to the region of interest.

// Again, filtering tiles to remove non-polygons
tiles = tiles.map(function(tile) {
  return tile.set('geo_type', tile.geometry().type())
})
tiles = tiles.filter(ee.Filter.eq('geo_type', 'Polygon'))

Map.addLayer(tiles)

This gives the following ouput:

advanced_tiling

As you can see, the tiling is localized to the raster and results in significantly less tiles generated.

Final thoughts

Any downloader script should focus on the speed of the downloads as a whole. The computation times leading up to the final download (using ee.Image.getDownloadURL()) usually determine and bottleneck the speed of the downloader. With that in mind, consider the following:

Reduce getInfo() calls in the python API to absolute minimum. If they result in connection closed errors, try to simplify the computations. If still failing, encapsulate in retries.
Simplify the incoming region of interest. This has a cascading effect on all the operations in tiling and significantly increases download speed.
Consider downloading small dimension sized tiles (eg. 256x256 or 512x512) to reduce download size.

As an exercise, using fixed dimension tiles, try to determine the best scale of the coveringGrid() :) (and let me know :P)

Summary

In conclusion, we have learned how to create tiles and easily download large images. With very little external tooling we have greatly extended the knowledge imparted by Noel Gorelick in downloading parts of an image in parallel.

The next article in this series will discuss stitching said downloaded tiles using GDAL binaries, or using existing programs like QGIS.