Processors
To use stacchip with an existing imagery archive, an index needs to be created for each scene or STAC item.
Stacchip comes with processors that collect and index imagery from multiple data sources. The set of processors will be extended as the package grows.
Each processor is registered as a command line utility so that it can be scaled easily. These processors are designed to work well with AWS Batch, but they do not depend on it and can be run in other environments as well.
Sentinel-2
The stacchip-sentinel-2 processor CLI command processes Sentinel-2 data. It processes MGRS tiles from a list of tiles stored in a layer that can be opened by geopandas. Each run processes the MGRS tile at one row index of the source file.
For each tile, it processes the least cloudy image in each quarter of two randomly selected years between 2018 and 2023.
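The selection rule above (two random years, then the least cloudy image per quarter) can be sketched in plain Python. This is a minimal illustration, not the processor's code: the Scene class and the pick_years and least_cloudy_per_quarter helpers are hypothetical, and in practice the cloud cover would typically come from the eo:cloud_cover property of each STAC item.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Scene:
    """Hypothetical stand-in for a STAC item: ID, acquisition date, cloud cover in percent."""
    scene_id: str
    acquired: date
    cloud_cover: float


def quarter(d: date) -> int:
    """Return the calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1


def pick_years(rng: random.Random, first: int = 2018, last: int = 2023, count: int = 2) -> list[int]:
    """Draw `count` distinct years from the inclusive range [first, last]."""
    return sorted(rng.sample(range(first, last + 1), count))


def least_cloudy_per_quarter(scenes: list[Scene], years: list[int]) -> dict[tuple[int, int], Scene]:
    """Keep the scene with the lowest cloud cover for every (year, quarter) pair in `years`."""
    best: dict[tuple[int, int], Scene] = {}
    for scene in scenes:
        if scene.acquired.year not in years:
            continue  # only the randomly selected years are considered
        key = (scene.acquired.year, quarter(scene.acquired))
        if key not in best or scene.cloud_cover < best[key].cloud_cover:
            best[key] = scene
    return best
```

With two years and four quarters, this yields at most eight scenes per tile; quarters without any candidate scene are simply absent from the result.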
The script uses environment variables to determine all of its inputs:
- AWS_BATCH_JOB_ARRAY_INDEX: the row index of the MGRS tile to be processed from the source file
- STACCHIP_MGRS_SOURCE: the source file containing the MGRS tile sample
- STACCHIP_BUCKET: the target bucket for writing the assets, STAC items, and stacchip index
An example set of environment variables to run this script is:
export AWS_BATCH_JOB_ARRAY_INDEX=0
export STACCHIP_MGRS_SOURCE=https://clay-mgrs-samples.s3.amazonaws.com/mgrs_sample_v02.fgb
export STACCHIP_BUCKET=clay-v1-data
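A processor run driven by these variables might read them as in the following sketch. The ProcessorConfig class and config_from_env function are hypothetical names for illustration. The only behavior taken from AWS Batch itself is that each child job of an array job receives its own AWS_BATCH_JOB_ARRAY_INDEX value, which is what lets one job definition fan out over all rows of the source file.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorConfig:
    """Hypothetical container for the inputs of one processor run."""
    row_index: int
    source: str
    bucket: str


def config_from_env(
    env: Mapping[str, str] = os.environ,
    source_var: str = "STACCHIP_MGRS_SOURCE",
) -> ProcessorConfig:
    """Read the row index, source file, and target bucket from environment variables."""
    try:
        # AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX to 0, 1, 2, ... in the child jobs
        # of an array job, so each child picks a different row of the source file.
        row_index = int(env["AWS_BATCH_JOB_ARRAY_INDEX"])
        source = env[source_var]
        bucket = env["STACCHIP_BUCKET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
    return ProcessorConfig(row_index=row_index, source=source, bucket=bucket)
```

Assuming geopandas is used for loading, the selected row could then be read with geopandas.read_file(config.source).iloc[config.row_index].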
Landsat
The stacchip-landsat processor CLI command processes Landsat data. It processes a list of geometries from a layer that can be opened by geopandas. For each row, it uses the centroid of the geometry to search for Landsat scenes.
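The centroid used as the search location is the area-weighted center of the geometry. The sketch below computes it for a simple polygon with the shoelace formula; polygon_centroid is a hypothetical helper, and with geopandas the same value comes from the centroid attribute of a GeoSeries. Computing it directly on longitude/latitude coordinates is a planar approximation, which is why geopandas warns when centroids are taken in a geographic CRS.

```python
def polygon_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid of a simple polygon given as (x, y) vertices (shoelace formula)."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if the last vertex does not repeat the first
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)
```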
For each geometry, it processes the least cloudy image in each quarter of two randomly selected years between 2018 and 2023. For one year it collects L1 data, and for the other year L2 data. The platform is either Landsat-8 or Landsat-9, depending on availability and cloud cover.
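The pairing of years with processing levels and the platform choice can be sketched as follows. Candidate, assign_levels, and select_scenes are hypothetical names, and which year gets L1 versus L2 is assumed to be random. The platform is not chosen explicitly here: the least cloudy candidate from either Landsat-8 or Landsat-9 wins, which also covers availability, since Landsat-9 was only launched in late 2021 and earlier years have no Landsat-9 candidates.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical search result for one scene."""
    platform: str  # "landsat-8" or "landsat-9"
    level: str  # "L1" or "L2"
    year: int
    quarter: int  # 1-4
    cloud_cover: float


def assign_levels(rng: random.Random, first: int = 2018, last: int = 2023) -> dict[str, int]:
    """Pick two distinct random years: one for L1 data and the other for L2 data."""
    l1_year, l2_year = rng.sample(range(first, last + 1), 2)
    return {"L1": l1_year, "L2": l2_year}


def select_scenes(candidates: list[Candidate], levels: dict[str, int]) -> dict[tuple[str, int], Candidate]:
    """For each level's year and each quarter, keep the least cloudy candidate from either platform."""
    chosen: dict[tuple[str, int], Candidate] = {}
    for level, year in levels.items():
        for q in range(1, 5):
            pool = [c for c in candidates if c.level == level and c.year == year and c.quarter == q]
            if pool:  # a quarter may have no scenes at all
                chosen[(level, q)] = min(pool, key=lambda c: c.cloud_cover)
    return chosen
```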
The script uses environment variables to determine all of its inputs:
- AWS_BATCH_JOB_ARRAY_INDEX: the row index of the geometry to be processed from the source file
- STACCHIP_SAMPLE_SOURCE: the source file containing the sample geometries
- STACCHIP_BUCKET: the target bucket for writing the assets, STAC items, and stacchip index
An example set of environment variables to run this script is:
export AWS_BATCH_JOB_ARRAY_INDEX=0
export STACCHIP_SAMPLE_SOURCE=https://clay-mgrs-samples.s3.amazonaws.com/mgrs_sample_v02.fgb
export STACCHIP_BUCKET=clay-v1-data
NAIP
The stacchip-naip processor CLI command processes imagery from the National Agriculture Imagery Program (NAIP).
The sample locations were created using the Natural Earth database as a source. The sample includes all populated places, protected areas and parks, airports, and ports. In addition, we sampled one random point along each river and one random location within each lake registered in Natural Earth. Finally, we sampled 4000 random points. All locations were filtered to lie within the contiguous United States (CONUS).
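Sampling "one random point along each river" amounts to drawing a distance uniformly along the line and interpolating. The sketch below is a hypothetical helper, not the code used to build the sample; with shapely the equivalent is line.interpolate(rng.uniform(0, line.length)). Distances are computed in the coordinate units of the input, so lon/lat input gives a planar approximation.

```python
import math
import random


def random_point_on_line(coords: list[tuple[float, float]], rng: random.Random) -> tuple[float, float]:
    """Return a point drawn uniformly by length along a polyline of (x, y) vertices."""
    segments = list(zip(coords, coords[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    if total == 0:
        return coords[0]  # all vertices coincide
    target = rng.uniform(0, total)  # distance from the start of the line
    for (a, b), seg_len in zip(segments, lengths):
        if target <= seg_len and seg_len > 0:
            t = target / seg_len  # fraction along this segment
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg_len
    return coords[-1]  # guard against floating-point round-off at the very end
```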
As with the other processors, the inputs are provided through environment variables.
An example set of environment variables to run this script is:
export AWS_BATCH_JOB_ARRAY_INDEX=0
export STACCHIP_SAMPLE_SOURCE=https://clay-mgrs-samples.s3.amazonaws.com/clay_v1_naip_sample_natural_earth.fgb
export STACCHIP_BUCKET=clay-v1-data
LINZ
The stacchip-linz processor CLI processes the high-resolution open aerial imagery of New Zealand published by LINZ (Land Information New Zealand).
As a sample, we randomly select 50% of the scenes, with a minimum of 10 and a maximum of 2000 scenes, for each included catalog. We selected the latest imagery for each of the available regions of New Zealand. The list of catalogs is defined in the LINZ processor file.
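The per-catalog sample size rule can be written as a clamp. This is a minimal sketch with a hypothetical sample_catalog helper; two details are assumptions not stated above: half of an odd count is rounded down, and a catalog with fewer than 10 scenes contributes all of its scenes.

```python
import random


def sample_catalog(
    scene_ids: list[str],
    rng: random.Random,
    fraction: float = 0.5,
    minimum: int = 10,
    maximum: int = 2000,
) -> list[str]:
    """Randomly select a fraction of a catalog's scenes, clamped to [minimum, maximum]."""
    n = len(scene_ids)
    k = int(n * fraction)  # assumption: round down
    k = max(k, minimum)  # at least `minimum` scenes...
    k = min(k, maximum, n)  # ...but never more than `maximum` or than the catalog holds
    return rng.sample(scene_ids, k)
```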
We also resample all imagery to a resolution of 30 cm so that the data is consistent.
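Resampling to a fixed 30 cm grid changes the pixel dimensions of each scene by the ratio of source to target resolution. The hypothetical helper below only computes the output shape; the resampling method the processor uses is not stated here. With rasterio, a shape like this can be passed as out_shape to DatasetReader.read together with a method from rasterio.enums.Resampling.

```python
def resampled_shape(width: int, height: int, src_res_m: float, dst_res_m: float = 0.3) -> tuple[int, int]:
    """Pixel dimensions after resampling from src_res_m to dst_res_m (meters per pixel)."""
    scale = src_res_m / dst_res_m  # > 1 upsamples coarse imagery, < 1 downsamples fine imagery
    # round() absorbs floating-point noise such as 0.075 / 0.3 != 0.25 exactly
    return round(width * scale), round(height * scale)
```

For example, a 7.5 cm scene of 4000 by 3000 pixels shrinks by a factor of four to 1000 by 750 pixels.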
As with the other processors, the inputs are provided through environment variables.
An example set of environment variables to run this script is:
export AWS_BATCH_JOB_ARRAY_INDEX=0
export STACCHIP_BUCKET=clay-v1-data
MODIS
The stacchip-modis processor CLI processes data from the MODIS archive. The MODIS scenes are reprojected to the Web Mercator projection and stored in S3. The indexer then creates one index table per MODIS scene. We use 233 MODIS sinusoidal (SIN) grid tiles, with 4 random dates from 4 years of data for each SIN grid tile.
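The reprojection maps coordinates from the MODIS sinusoidal grid to Web Mercator (EPSG:3857). The sketch below shows the point-wise math for both steps, using the spherical formulas of each projection; the function names are hypothetical, and a raster reprojection (for example with rasterio.warp.reproject) applies this kind of mapping per pixel. Which tool the processor actually uses is not stated here.

```python
import math

MODIS_SPHERE_RADIUS_M = 6371007.181  # sphere radius of the MODIS sinusoidal grid
WEB_MERCATOR_RADIUS_M = 6378137.0  # WGS 84 semi-major axis used by EPSG:3857
MAX_LAT = 85.0511287798066  # latitude at which the Web Mercator square ends


def sinusoidal_to_lonlat(x: float, y: float) -> tuple[float, float]:
    """Invert the MODIS sinusoidal projection (meters) to lon/lat (degrees); undefined at the poles."""
    lat = y / MODIS_SPHERE_RADIUS_M
    lon = x / (MODIS_SPHERE_RADIUS_M * math.cos(lat))
    return math.degrees(lon), math.degrees(lat)


def to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project lon/lat (degrees) to Web Mercator x/y (meters)."""
    lat_deg = max(-MAX_LAT, min(MAX_LAT, lat_deg))  # the projection diverges at the poles
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon_deg)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y
```

Chaining the two functions, to_web_mercator(*sinusoidal_to_lonlat(x, y)), converts a sinusoidal coordinate to Web Mercator.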
As with the other processors, the inputs are provided through environment variables.
An example set of environment variables to run this script is:
export AWS_BATCH_JOB_ARRAY_INDEX=0
export STACCHIP_BUCKET=clay-v1-data
Batch processing
The following Dockerfile can be used as a base image for batch processing. Installing the package also installs the command line utilities for all processors.
FROM python:3.11
RUN pip install stacchip
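Once an image built from this Dockerfile is registered in an AWS Batch job definition, one array job can fan a processor out over all rows of a sample file. The sketch below only builds the keyword arguments for the submit_job call of boto3's Batch client; build_array_job_request and the queue and job definition names are hypothetical, and it assumes the CLI needs no arguments because its inputs come from environment variables.

```python
def build_array_job_request(
    processor: str,
    n_rows: int,
    job_queue: str,
    job_definition: str,
    extra_env: dict[str, str],
) -> dict:
    """Keyword arguments for boto3's Batch `submit_job`, with one child job per source row."""
    if not 2 <= n_rows <= 10_000:
        raise ValueError("AWS Batch array jobs need between 2 and 10,000 child jobs")
    return {
        "jobName": f"{processor}-array",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX (0 .. size - 1) in each child job.
        "arrayProperties": {"size": n_rows},
        "containerOverrides": {
            "command": [processor],  # the processors read their inputs from env vars
            "environment": [{"name": k, "value": v} for k, v in extra_env.items()],
        },
    }
```

The request would then be submitted with boto3.client("batch").submit_job(**request).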
Prechip
In cases where chips need to be computed in advance, the stacchip-prechip CLI script is a helper to create npz files from the chips.
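An npz file is a zip archive of named NumPy arrays, so a stack of chips and its metadata can be stored and loaded together. The sketch below is a hypothetical illustration of the format, not the layout that stacchip-prechip writes: the array names pixels and bounds and the (N, bands, height, width) shape are assumptions.

```python
import io

import numpy as np


def write_chips_npz(target, chips: np.ndarray, bounds: np.ndarray) -> None:
    """Store chips (N, bands, height, width) and their bounds (N, 4) in one compressed npz."""
    if chips.shape[0] != bounds.shape[0]:
        raise ValueError("need one bounding box per chip")
    np.savez_compressed(target, pixels=chips, bounds=bounds)


def read_chips_npz(source) -> tuple[np.ndarray, np.ndarray]:
    """Load the arrays written by write_chips_npz."""
    with np.load(source) as data:
        return data["pixels"], data["bounds"]


# Usage: round-trip two all-zero 4-band 256x256 chips through an in-memory buffer.
buffer = io.BytesIO()
write_chips_npz(buffer, np.zeros((2, 4, 256, 256), dtype=np.uint16), np.zeros((2, 4)))
buffer.seek(0)
pixels, bounds = read_chips_npz(buffer)
```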