About

About
Installing this module
Configuring a collection
- Harmony requests
Module Inputs
Module Outputs
Assumptions
Step Function
Local Development
- MacOS

About

bignbit is a Cumulus module that can be installed as a post-ingest workflow to generate browse imagery via Harmony and then transfer that imagery to GIBS.

In general, the high-level steps are:

For each configured variable within the granule being processed, generate browse imagery via Harmony and store it in S3.
Generate a browse image metadata file for GIBS for each image produced by Harmony.
Construct a CNM message for each image that includes the image, metadata, and optional world file.
Send the CNM messages to GIBS via SQS.
Wait for GIBS to process the CNM messages and send a success or failure response back to an SNS topic.
Record the result of GIBS processing in S3.
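
The CNM construction step groups each image with its metadata file and, when present, a world file. The sketch below illustrates that grouping only. The `ImageSet` class, the `build_image_sets` helper, the file extensions, and the convention of matching files by a shared base name are all assumptions made for this example; they are not bignbit's actual implementation and they are not the CNM schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Hypothetical extensions, chosen only for this illustration.
IMAGE_EXTENSIONS = {".png", ".tif", ".tiff"}
METADATA_EXTENSIONS = {".xml"}
WORLD_FILE_EXTENSIONS = {".pgw", ".wld"}


@dataclass
class ImageSet:
    """One image plus the files that would travel with it in a single message."""
    image: str
    metadata: str
    world_file: Optional[str] = None


def build_image_sets(file_names: List[str]) -> List[ImageSet]:
    """Group files that share a base name into image sets (assumed convention)."""
    by_stem: Dict[str, Dict[str, str]] = {}
    for name in file_names:
        path = PurePosixPath(name)
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTENSIONS:
            role = "image"
        elif suffix in METADATA_EXTENSIONS:
            role = "metadata"
        elif suffix in WORLD_FILE_EXTENSIONS:
            role = "world_file"
        else:
            continue  # not part of any image set
        by_stem.setdefault(path.stem, {})[role] = name

    image_sets = []
    for stem in sorted(by_stem):
        roles = by_stem[stem]
        # An image and its metadata are both needed; the world file is optional.
        if "image" in roles and "metadata" in roles:
            image_sets.append(
                ImageSet(roles["image"], roles["metadata"], roles.get("world_file"))
            )
    return image_sets
```

In this sketch an image without a matching metadata file is skipped, which reflects the ordering of the steps above: metadata is generated for every image before any message is built.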

Installing this module

Add a post-ingest workflow to the Cumulus ingest workflow. For example:

{
          "StartAt":"BIGChoice",
          "States":{
             "BIGChoice":{
                "Type":"Choice",
                "Choices":[
                   {
                      "And":[
                         {
                            "Variable":"$.meta.collection.meta.workflowChoice.browseimage",
                            "IsPresent":true
                         },
                         {
                            "Variable":"$.meta.collection.meta.workflowChoice.browseimage",
                            "BooleanEquals":true
                         }
                      ],
                      "Next":"QueueGranulesToBIG"
                   }
                ],
                "Default":"BIGSucceed"
             },
             "QueueGranulesToBIG":{
                "Parameters":{
                   "cma":{
                      "event.$":"$",
                      "task_config":{
                         "provider":"{$.meta.provider}",
                         "internalBucket":"{$.meta.buckets.internal.name}",
                         "stackName":"{$.meta.stack}",
                         "granuleIngestWorkflow":"BrowseImageWorkflow",
                         "queueUrl": "${aws_sqs_queue.big_background_job_queue.id}"
                      }
                   }
                },
                "Type":"Task",
                "Resource":"${module.cumulus.queue_granules_task.task_arn}",
                "Retry":[
                   {
                      "ErrorEquals":[
                         "States.ALL"
                      ],
                      "IntervalSeconds":5,
                      "MaxAttempts":3
                   }
                ],
                "Catch":[
                   {
                      "ErrorEquals":[
                         "States.ALL"
                      ],
                      "ResultPath":"$.exception",
                      "Next":"BIGFail"
                   }
                ],
                "Next": "BIGSucceed"
             },
             "BIGFail":{
                "Type":"Fail"
             },
             "BIGSucceed":{
                "Type":"Succeed"
             }
          }
}

Add a new Terraform script to the cumulus-deploy-tf scripts used to deploy Cumulus. This script should define the bignbit module and the bignbit step function state machine. See the example in browse_image_workflow.tf.
Configure one or more collections (see Configuring a collection).
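
The routing decision made by the BIGChoice state in the example workflow can be mirrored in a few lines of Python, which may help when checking whether a collection will be queued. This is a sketch for reasoning about the rule, not code from bignbit or Cumulus; the function name `should_queue_for_browse` is hypothetical, and treating non-boolean values as "no match" is an assumption of this sketch.

```python
from typing import Any, Dict

# Path checked by the Choice rule: $.meta.collection.meta.workflowChoice.browseimage
FLAG_PATH = ("meta", "collection", "meta", "workflowChoice", "browseimage")


def should_queue_for_browse(event: Dict[str, Any]) -> bool:
    """Return True when the event would be routed to QueueGranulesToBIG."""
    node: Any = event
    for key in FLAG_PATH:
        if not isinstance(node, dict) or key not in node:
            # "IsPresent": true fails, so the Default branch (BIGSucceed) is taken.
            return False
        node = node[key]
    # "BooleanEquals": true; anything other than the boolean True does not match here.
    return node is True


event = {"meta": {"collection": {"meta": {"workflowChoice": {"browseimage": True}}}}}
print(should_queue_for_browse(event))
```

In practice this means a collection is only sent to the browse image workflow when its `meta` object contains `"workflowChoice": {"browseimage": true}`; every other granule ends the post-ingest workflow at BIGSucceed without being queued.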

Configuring a collection

To configure a collection for use with bignbit, do the following:

Add a config file to the config_bucket. The file should be named "<collection shortname>.cfg" and its contents should be JSON.
Associate the UMM-C record with the appropriate Harmony service (HyBIG, net2cog, etc.).

The contents of the configuration file should be a valid JSON object with the following attributes:

Name	Type	Description
sendToHarmony	boolean	true/false if this collection should be processed using Harmony to generate browse images
operaHLSTreatment	boolean	true/false if this collection should have special OPERA_L3_DSWX-HLS processing applied to it (see apply_opera_hls_treatment)
imageFilenameRegex	string	Regular expression used to identify which file in a granule should be used as the image file. Uses first if multiple files match
imgVariables	list(object)	List of JSON objects with at least one attribute called `id` whose value is the name of a variable to generate an image for. `all` can be used in cases where the collection does not have variables or all variables in the collection should have images generated
height	int	Controls the height of the output image from Harmony (see https://github.com/nasa/harmony-browse-image-generator?tab=readme-ov-file#dimensions--scale-sizes)
width	int	Controls the width of the output image from Harmony (see https://github.com/nasa/harmony-browse-image-generator?tab=readme-ov-file#dimensions--scale-sizes)

A few example configurations can be found in the podaac/bignbit-config repository. NOTE: some of the example configurations specify other options (e.g. variables, latVar, lonVar, etc.) that are no longer supported by this module. The table above lists the attributes that are still in use.
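
As an illustration of the attributes in the table, the following sketch builds a configuration, checks attribute types, and shows how a filename regex could pick the first matching file. It is not part of bignbit: `validate_config` and `select_image_file` are hypothetical helpers, the example values are made up, and the use of `re.search` against plain file names is an assumption about how the pattern is applied.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Attribute name -> expected Python type, following the table above.
EXPECTED_TYPES = {
    "sendToHarmony": bool,
    "operaHLSTreatment": bool,
    "imageFilenameRegex": str,
    "imgVariables": list,
    "height": int,
    "width": int,
}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problems were found."""
    problems: List[str] = []
    for name, value in config.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            problems.append(f"{name}: not a supported attribute")
            continue
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    variables = config.get("imgVariables", [])
    if isinstance(variables, list):
        for index, variable in enumerate(variables):
            if not isinstance(variable, dict) or "id" not in variable:
                problems.append(f"imgVariables[{index}]: missing 'id'")
    return problems


def select_image_file(file_names: List[str], pattern: str) -> Optional[str]:
    """Return the first file name matching the pattern, or None."""
    regex = re.compile(pattern)
    for name in file_names:
        if regex.search(name):
            return name
    return None


# Made-up values for a collection whose images are generated by Harmony.
example_config = {
    "sendToHarmony": True,
    "operaHLSTreatment": False,
    "imgVariables": [{"id": "all"}],
    "width": 1024,
    "height": 512,
}

print(json.dumps(example_config, indent=2))  # contents of the .cfg file
print(validate_config(example_config))
```

Running a check like this before uploading a file to the config bucket catches leftover unsupported options such as latVar or lonVar early.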

Harmony requests

Important

bignbit uses the user-owned bucket parameter when making Harmony requests. If an existing bucket is configured for the bignbit_staging_bucket parameter, it must have a bucket policy that allows Harmony write permission and GIBS read permission. If bignbit_staging_bucket is left blank, bignbit will create a new S3 bucket (named svc-${var.app_name}-${var.prefix}-staging) and apply the correct permissions automatically. This bucket will also automatically expire objects older than 30 days.

bignbit uses the harmony-py library to construct the Harmony requests for generating images. Most of the parameters are extracted from the CMA message as a granule is being processed, but the width and height parameters can be set via configuration. Each variable configured for imaging results in a separate call to Harmony.

See bignbit.submit_harmony_job.generate_harmony_request for details on how the Harmony request is constructed.
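
The one-request-per-variable behavior and the staging bucket default can be sketched as follows. This is a simplified model, not the real request builder: it produces plain dictionaries rather than harmony-py objects, and the helper names, dictionary keys, concept IDs, and the way the destination URL is composed are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def staging_bucket_name(configured: str, app_name: str, prefix: str) -> str:
    """Use the configured bucket, or fall back to the module-managed bucket name."""
    if configured:
        return configured
    return f"svc-{app_name}-{prefix}-staging"


def plan_harmony_requests(
    config: Dict[str, Any],
    collection_concept_id: str,
    granule_concept_id: str,
    destination_url: str,
) -> List[Dict[str, Any]]:
    """Build one parameter set per configured variable (hypothetical keys)."""
    plans = []
    for variable in config.get("imgVariables", []):
        params: Dict[str, Any] = {
            "collection": collection_concept_id,
            "granule_id": granule_concept_id,
            "variable": variable["id"],
            "destination_url": destination_url,
        }
        # width and height are only included when set in the collection config.
        for dimension in ("width", "height"):
            if dimension in config:
                params[dimension] = config[dimension]
        plans.append(params)
    return plans


bucket = staging_bucket_name("", "bignbit", "myprefix")
plans = plan_harmony_requests(
    {"imgVariables": [{"id": "sst"}, {"id": "wind_speed"}], "width": 1024},
    "C0000000000-EXAMPLE",  # placeholder collection concept ID
    "G0000000000-EXAMPLE",  # placeholder granule concept ID
    f"s3://{bucket}/bignbit-harmony-output",
)
for plan in plans:
    print(plan)
```

With two configured variables the sketch yields two parameter sets that differ only in the variable, which is why the number of Harmony jobs per granule grows with the length of imgVariables.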

Module Inputs

This module uses the following input variables:

Name	Type	Description	Default Value
stage	string	Environment used for resource tagging (dev, int, ops, etc...)
prefix	string	Prefix used for resource naming (project name, env name, etc...)
data_buckets	list(string)	List of buckets where data is stored. Lambdas will be given read/write access to these buckets.	[]
config_bucket	string	Bucket where dataset configuration is stored
config_dir	string	Path relative to `config_bucket` where dataset configuration is stored	"big-config"
bignbit_audit_bucket	string	S3 bucket where messages exchanged with GITC will be saved. Typically the cumulus internal bucket
bignbit_audit_path	string	Path relative to `bignbit_audit_bucket` where messages exchanged with GITC will be saved.	"bignbit-cnm-output"
bignbit_staging_bucket	string	S3 bucket where generated images will be saved. Leave blank to use bucket managed by this module.	create new bucket named svc-${var.app_name}-${var.prefix}-staging
harmony_staging_path	string	Path relative to `bignbit_staging_bucket` where harmony results will be saved.	"bignbit-harmony-output"
gibs_region	string	Region where GIBS resources are deployed
gibs_queue_name	string	Name of the GIBS SQS queue where outgoing CNM messages will be sent
gibs_account_id	string	AWS account ID for GIBS
edl_user_ssm	string	Name of SSM parameter containing EDL username for querying CMR
edl_pass_ssm	string	Name of SSM parameter containing EDL password for querying CMR
permissions_boundary_arn	string	Permissions boundary ARN to apply to the roles created by this module. If not provided, no permissions boundary will be applied.
security_group_ids	list(string)
subnet_ids	list(string)
app_name	string		"bignbit"
default_tags	map(string)		{}
lambda_container_image_uri	string		""

Module Outputs

This module supplies the following outputs:

Name	Description	Value
config_bucket_name	Bucket containing dataset configs	var.config_bucket
config_path	Path relative to config bucket where configs reside	var.config_dir
pobit_handle_gitc_response_arn	ARN of the lambda function	aws_lambda_function.handle_gitc_response.arn
pobit_gibs_topic	ARN of SNS topic GIBS replies to	aws_sns_topic.gibs_response_topic.arn
pobit_gibs_queue	ARN of SQS queue GIBS replies are published to	aws_sqs_queue.gibs_response_queue.arn
bignbit_audit_bucket	Name of bucket where messages exchanged with GIBS are stored	var.bignbit_audit_bucket
bignbit_audit_path	Path relative to audit bucket where messages with GIBS are stored	var.bignbit_audit_path
get_dataset_configuration_arn	ARN of the lambda function	aws_lambda_function.get_dataset_configuration.arn
get_granule_umm_json_arn	ARN of the lambda function	aws_lambda_function.get_granule_umm_json.arn
get_collection_concept_id_arn	ARN of the lambda function	aws_lambda_function.get_collection_concept_id.arn
identify_image_file_arn	ARN of the lambda function	aws_lambda_function.identify_image_file.arn
generate_image_metadata_arn	ARN of the lambda function	aws_lambda_function.generate_image_metadata.arn
submit_harmony_job_arn	ARN of the lambda function	aws_lambda_function.submit_harmony_job.arn
submit_harmony_job_function_name	Name of the lambda function	aws_lambda_function.submit_harmony_job.function_name
get_harmony_job_status_arn	ARN of the lambda function	aws_lambda_function.get_harmony_job_status.arn
process_harmony_results_arn	ARN of the lambda function	aws_lambda_function.process_harmony_results.arn
apply_opera_hls_treatment_arn	ARN of the lambda function	aws_lambda_function.apply_opera_hls_treatment.arn
pobit_build_image_sets_arn	ARN of the lambda function	aws_lambda_function.build_image_sets.arn
pobit_send_to_gitc_arn	ARN of the lambda function	aws_lambda_function.send_to_gitc.arn
pobit_save_cnm_message_arn	ARN of the lambda function	aws_lambda_function.save_cnm_message.arn
workflow_definition	Rendered state machine definition	rendered version of state_machine_definition.tpl
bignbit_staging_bucket	Name of bignbit staging bucket	var.bignbit_staging_bucket
harmony_staging_path	Path to harmony requests relative to harmony staging bucket	var.harmony_staging_path
bignbit_lambda_role	Role created by the module applied to lambda functions	aws_iam_role.bignbit_lambda_role

Assumptions

The GITC input queue is assumed to use the ContentBasedDeduplication strategy.
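
With content-based deduplication enabled on an SQS FIFO queue, SQS derives the deduplication ID from a SHA-256 hash of the message body (message attributes are not included). The sketch below mimics that idea locally to show which messages would count as duplicates. The helper name and the sample message fields are placeholders, and the exact hashing SQS performs internally may differ in detail.

```python
import hashlib
import json


def content_based_dedup_id(message_body: str) -> str:
    """Approximate a content-based deduplication ID: SHA-256 of the body."""
    return hashlib.sha256(message_body.encode("utf-8")).hexdigest()


# Placeholder message content; a real CNM message has many more fields.
cnm = {"identifier": "example-granule", "collection": "EXAMPLE_COLLECTION"}

first_send = json.dumps(cnm, sort_keys=True)
resend = json.dumps(cnm, sort_keys=True)
changed = json.dumps({**cnm, "identifier": "example-granule-2"}, sort_keys=True)

# Byte-identical bodies hash to the same ID and are treated as duplicates.
print(content_based_dedup_id(first_send) == content_based_dedup_id(resend))
# Any change to the body produces a different ID.
print(content_based_dedup_id(first_send) == content_based_dedup_id(changed))
```

A practical consequence is that resending a byte-identical CNM message within the queue's five-minute deduplication interval is accepted by SQS but not delivered a second time.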

Step Function

Visual representation of the bignbit step function state machine (see stepfunctions_graph.png in the repository):

Local Development

MacOS

Install miniconda (or conda) and poetry.
Run `conda env create -f conda-environment.yaml` to install GDAL.
Activate the bignbit conda environment: `conda activate bignbit`
Install the Python package and its dependencies: `poetry install`
Verify that the tests pass: `poetry run pytest tests/`

Important

If developing on a darwin_arm64 based mac, running terraform locally may result in an error message during terraform init: "Provider registry.terraform.io/hashicorp/null v2.1.2 does not have a package available for your current platform, darwin_arm64." One workaround for this is to use https://github.com/kreuzwerker/m1-terraform-provider-helper to compile a local arm-based version of the hashicorp/null provider.

This workaround is only necessary as long as Cumulus core requires the ~>2.1 version of hashicorp/null; v3.x of the provider does support darwin_arm64 platforms.
