Data upload

The information on this page is primarily aimed at drone pilots wishing to add mission data to the SeaBee platform. For a more general overview of SeaBee data storage, see here.

1 Overview

There are two main use cases for pilots adding data to the SeaBee platform:

  1. Upload raw images and associated files (ground control points etc.) and then perform all subsequent processing - such as orthomosaicing - on the platform itself.

  2. Upload partially processed data, for example after performing some of the initial steps on a local desktop machine.

Option 1 should be preferred where possible as it ensures a consistent and traceable processing pipeline for the entire workflow, and in most cases it should be faster and easier from a pilot’s perspective too. The main reason for choosing Option 2 is that pilots may want to use commercial software (such as Pix4D) instead of OpenDroneMap (ODM) for orthorectification. In this case, pilots must have a separate Pix4D licence to create the orthophoto(s), and then upload the finished mosaics together with supporting metadata.

2 Data structure

2.1 Flight folder naming

The data from each flight should be gathered into a single folder following the subfolder structure described below. The name of the flight folder itself can be anything you choose, but it must be unique and, from a user perspective, it helps to include standard information such as location, date etc. The SeaBee platform does not enforce strict requirements for naming flight folders, because we recognise that different pilots and organisations have their own conventions. However, a recommended approach is to use grouping_area_yyyymmddHHMM or grouping_area_yyyymmddHHMM_organisation_spectrum-type_elevation. Note the use of underscores (_) to separate each of the main components, while hyphens (-) or CamelCase can be used to further subdivide each part, if necessary. For example: multipart-group_area-part-1_yyyymmddHHMM or MultipartGroup_AreaPart1_yyyymmddHHMM.

Important

The name of the flight folder is not used by any of the automated processing routines on the SeaBee platform. As long as the subfolders are arranged as described below and the details are correct in config.seabee.yaml, the mission will be processed regardless of the flight folder name. The name of the flight folder only matters for people browsing data manually (e.g. using MinIO). It is therefore important that names are human-readable.

Using the recommended structure, the components of the folder name have the following meaning:

  • grouping is any general identifier linking data from this flight with data from other related flights. For example: the name of a broad region where several flights have taken place (e.g. Runde); the name of a project (e.g. Kelpmap); or the name of a fieldwork team (e.g. Team1, or Team1Day1).

  • area is the name of the location (e.g. the name of an island, or a specific stretch of coastline).

  • yyyymmddHHMM is the flight start date and time. The date part (yyyymmdd) is mandatory, whereas the time (HHMM) is optional, but recommended. Including the time is often necessary to uniquely distinguish multiple flights taking place in the same group and area on the same day. Note that there are no separators or additional characters in the datetime (i.e. use yyyymmddHHMM, not yyyy-mm-dd-HH:MM or any other variant).

  • organisation is the organisation collecting the data.

  • spectrum-type is the type of spectral information recorded by the drone sensor e.g. RGB, multispectral or hyperspectral.

  • elevation is the flight elevation (in metres).
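
As an illustration of the recommended convention, the short Python sketch below assembles a folder name from its components. The function name flight_folder_name and its arguments are hypothetical and not part of the SeaBee platform. It assumes the three optional components are either all given or all omitted, matching the two recommended patterns above.

```python
from datetime import datetime


def flight_folder_name(grouping, area, start, organisation=None,
                       spectrum_type=None, elevation=None, include_time=True):
    """Build a folder name such as grouping_area_yyyymmddHHMM (hypothetical helper)."""
    # The datetime has no separators: yyyymmddHHMM, or yyyymmdd if the time is omitted.
    stamp = start.strftime("%Y%m%d%H%M" if include_time else "%Y%m%d")
    parts = [grouping, area, stamp]

    # Assumption: the optional components are supplied together or not at all.
    extras = (organisation, spectrum_type, elevation)
    if any(e is not None for e in extras):
        if any(e is None for e in extras):
            raise ValueError("Give organisation, spectrum_type and elevation together")
        parts.extend(str(e) for e in extras)

    # Underscores separate the main components, so they may not appear inside one.
    for part in parts:
        if not part or "_" in part:
            raise ValueError(f"Component must be non-empty and free of underscores: {part!r}")
    return "_".join(parts)


print(flight_folder_name("Runde", "area-part-1", datetime(2023, 5, 23, 8, 30)))
# Runde_area-part-1_202305230830
```

Hyphens or CamelCase inside a component pass through unchanged, so only the underscore rule is checked.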

Important

The flight folders themselves can be organised however you wish. For example, you may choose to group flight folders into parent folders based on organisation and year (e.g. /niva/2023/flight_folder1, /niva/2023/flight_folder2 etc.), or you may wish to group them by project, pilot, or any combination of these. The automated data processing routines on the SeaBee platform simply scan the file system for folders containing files named config.seabee.yaml. As long as the data within these folders is arranged correctly (see Section 2.2), the flights will be found and processed wherever they sit in the folder hierarchy.
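
The scanning idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the platform's actual code. The function name find_flight_folders is hypothetical.

```python
from pathlib import Path

CONFIG_NAME = "config.seabee.yaml"


def find_flight_folders(root):
    """Return every folder below root that directly contains a configuration file."""
    # rglob walks the whole tree, so the grouping of parent folders does not matter.
    return sorted(p.parent for p in Path(root).rglob(CONFIG_NAME) if p.is_file())
```

Calling find_flight_folders on a parent folder such as /niva/2023 would return flight_folder1 and flight_folder2 if each contains a config.seabee.yaml, regardless of how deeply they are nested.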

2.2 Subfolder structure

Within the “parent” flight folder, data should be organised into subfolders as follows:

grouping_area_yyyymmddHHMM_[organisation]_[spectrum-type]_[elevation]/
├─ dem/
├─ gcp/
│  ├─ gcp_list-ODM.txt
│  ├─ gcp_list-Pix4D.txt
├─ ground-truth/
├─ images/
├─ orthophoto/
├─ other/
├─ report/
│  ├─ report.pdf
│  ├─ stdout.txt
├─ texturing/
├─ config.seabee.yaml

Note

It is not necessary to include all the folders - just include what you need. As a minimum, the flight folder must contain a subfolder named images with the raw images and a file named config.seabee.yaml. All other components are optional. The most basic flight folder is therefore:

grouping_area_yyyymmddHHMM_[organisation]_[spectrum-type]_[elevation]/
├─ images/
├─ config.seabee.yaml
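
Before uploading, it can be useful to check locally that a flight folder meets these minimum requirements. The sketch below is one possible way to do so in Python. The function name missing_required_items is hypothetical, and the check covers only the two required items named above.

```python
from pathlib import Path


def missing_required_items(flight_dir):
    """List the required items that are absent from a flight folder."""
    flight_dir = Path(flight_dir)
    missing = []
    # The raw images must be in a subfolder named exactly 'images'.
    if not (flight_dir / "images").is_dir():
        missing.append("images/")
    # The configuration file sits directly inside the flight folder.
    if not (flight_dir / "config.seabee.yaml").is_file():
        missing.append("config.seabee.yaml")
    return missing
```

An empty list means that both required items are present.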

The purpose of each subfolder or file is as follows:

  • dem (optional). Contains elevation datasets generated during orthorectification (DSMs and DTMs etc.). This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D).

  • gcp (optional). Folder containing ground control points in a standard text format. The format used by ODM is different to that used by Pix4D, so please specify which format has been used in the filename, as shown above. Ground control points are usually not necessary if you are flying with differential GPS (RTK etc.), in which case this folder can be omitted.

  • ground-truth (optional). Ground truth data, if available.

  • orthophoto (optional). Georeferenced mosaic images. Ideally a single, multi-band GeoTiff. This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D). ODM generates orthophotos named odm_orthophoto.original.tif. If you are uploading an externally built mosaic from Pix4D, it should be placed in this folder and named pix4d_orthophoto.original.tif.

Important

If you are uploading a mosaic generated off the platform (e.g. by Pix4D), it is important that the GeoTiff file includes proper metadata, such as band descriptions, colour interpretations and NoData values. The SeaBee pipeline will reorganise datasets into a standard format before publishing them to the GeoNode. However, this step will fail (or produce incorrect results) if externally generated mosaics are added with incomplete or incorrect metadata.

  • other (optional). Anything not included in the other folders.

  • report (optional). The PDF report describing results from the orthorectification process, plus any associated logs (e.g. text files and JSON). This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D).

  • texturing (optional). Texture models generated during orthorectification. This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D).

  • images (required). Images from a single flight (i.e. images that can be orthorectified to produce a single mosaic). The images should not be grouped into subfolders.

  • config.seabee.yaml (required). A file containing flight metadata and additional settings to control the processing workflow. See Section 2.3 for details.

2.3 Configuration file

Each flight folder must contain a file named config.seabee.yaml, which contains flight metadata plus settings/commands to control the data processing. A minimal example with just the 9 mandatory attributes is shown below.

grouping: gronningen                  # General identifier linking this flight to related flights
area: agder                           # Name of specific location
datetime: '202305230830'              # 'yyyymmddHHMM' or 'yyyymmdd'. Note that the quotes are required
nfiles: 290                           # Total number of files in the 'images' folder
organisation: NINA                    # Responsible organisation
mosaic: true                          # Whether to mosaic the raw images using ODM (true or false)
classify: true                        # Whether to classify the orthomosaic using machine learning (true or false)
publish: true                         # Whether to publish the orthophoto to GeoNode (true or false)
theme: Seabirds                       # SeaBee "theme" ('Seabirds', 'Mammals', 'Habitat' or 'Water quality') 
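
A simple local check of the mandatory attributes could look like the Python sketch below. It assumes the file has already been parsed into a dictionary (for example with a YAML library, which is not shown here), and the function name config_problems is hypothetical. The exact validation rules applied by the platform are not documented on this page, so the checks are assumptions based on the comments in the example above.

```python
import re

MANDATORY = ("grouping", "area", "datetime", "nfiles", "organisation",
             "mosaic", "classify", "publish", "theme")


def config_problems(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = [f"missing attribute: {key}" for key in MANDATORY if key not in config]

    # Without quotes, a YAML parser reads the datetime as an integer, not a string.
    stamp = config.get("datetime")
    if stamp is not None:
        if not isinstance(stamp, str) or not re.fullmatch(r"\d{8}(\d{4})?", stamp):
            problems.append("datetime must be a quoted 'yyyymmddHHMM' or 'yyyymmdd' string")

    nfiles = config.get("nfiles")
    if nfiles is not None:
        if isinstance(nfiles, bool) or not isinstance(nfiles, int) or nfiles < 0:
            problems.append("nfiles must be a non-negative integer")

    for flag in ("mosaic", "classify", "publish"):
        if flag in config and not isinstance(config[flag], bool):
            problems.append(f"{flag} must be true or false")
    return problems
```

An empty list indicates that all 9 mandatory attributes are present and have plausible types.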

The nfiles attribute is important, as it is used to determine whether data upload has completed successfully before starting any further processing. For example, before attempting to mosaic any images using NodeODM, the processing script will first check that the number of files in the images subfolder matches the nfiles attribute in config.seabee.yaml. If it does, it is assumed that upload is complete and the flight is ready to be processed; if it does not, the flight is skipped and checked again later.
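
The completeness check described above can be sketched as follows. This is a simplified Python illustration rather than the platform's own script. The function name upload_complete is hypothetical, and it assumes that only regular files directly inside the images subfolder are counted.

```python
from pathlib import Path


def upload_complete(flight_dir, nfiles):
    """Return True when the images subfolder holds exactly nfiles files."""
    images = Path(flight_dir) / "images"
    if not images.is_dir():
        return False
    # Count regular files only; images should not be grouped into subfolders.
    count = sum(1 for p in images.iterdir() if p.is_file())
    return count == nfiles
```

A flight for which this returns False would simply be skipped and checked again on the next run.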

In addition to the mandatory attributes above, config.seabee.yaml can contain optional information to control subsequent processing and add extra metadata. A complete list of attributes currently supported is shown below. Note that optional attributes can simply be omitted from the file if they are not relevant.

For further explanation of odm_options, see the documentation here. If mosaic is true and odm_options is omitted, the default options defined here will be used instead. Similarly, if classify is true but ml_options is omitted, the machine learning step will default to attempting object detection using the most recent algorithm available.

grouping: gronningen                                             # General identifier linking this flight to related flights
area: agder                                                      # Name of specific location
datetime: '202305230830'                                         # 'yyyymmddHHMM' or 'yyyymmdd'. Note that quotes are required
nfiles: 290                                                      # Total number of files in the 'images' folder
organisation: NINA                                               # Responsible organisation
mosaic: true                                                     # Whether to mosaic the raw images using ODM (true or false)
classify: true                                                   # Whether to classify the orthomosaic using machine learning (true or false)
publish: true                                                    # Whether to publish the orthophoto to GeoNode (true or false)
theme: Seabirds                                                  # SeaBee "theme" ('Seabirds', 'Mammals', 'Habitat' or 'Water quality')
spectrum_type: RGB                                               # [Optional]. Sensor type ('RGB', 'MSI' or 'HSI')
elevation: 40                                                    # [Optional]. Flight elevation (integer > 0 in metres)
creator_name: Sindre Molværsmyr                                  # [Optional]. Data collector/pilot
project: Seabirds2023                                            # [Optional]. Name of project
vehicle: DJI-M350                                                # [Optional]. Name of vehicle/drone
sensor: DJI-P1                                                   # [Optional]. Name of camera/sensor
licence: CC-BY-4                                                 # [Optional]. Name of licence
licence_link: https://creativecommons.org/licenses/by/4.0/       # [Optional]. Link to licence details
odm_options:                                                     # [Optional]. Overrides default orthorectification settings
  dsm: true                                                      # [Optional]. true or false
  dtm: true                                                      # [Optional]. true or false
  cog: true                                                      # [Optional]. true or false
  orthophoto-compression: LZW                                    # [Optional]. JPEG, LZW, PACKBITS, DEFLATE, LZMA or NONE
  orthophoto-resolution: 0.1                                     # [Optional]. Float. cm/pixel
  dem-resolution: 0.1                                            # [Optional]. Float. cm/pixel
  max-concurrency: 16                                            # [Optional]. Int
  auto-boundary: true                                            # [Optional]. true or false
  use-3dmesh: true                                               # [Optional]. true or false
  fast-orthophoto: false                                         # [Optional]. true or false
  pc-rectify: false                                              # [Optional]. Bool
  split: 999999                                                  # [Optional]. Int
  split-overlap: 150                                             # [Optional]. Int
  crop: 3                                                        # [Optional]. Int or Float (>= 0)
  pc-quality: high                                               # [Optional]. ultra, high, medium, low, lowest
  feature-quality: high                                          # [Optional]. ultra, high, medium, low, lowest
ml_options:                                                      # [Optional]. Overrides default machine learning settings
  task:                                                          # [Optional]. detection, segmentation
  model:                                                         # [Optional]. Name/version of machine learning model to use

Important

Mosaic datasets created and published to GeoNode will be named by combining information from the configuration file in the following way:

grouping_area_datetime_[spectrum_type]_[elevation]m

where grouping, area and datetime are mandatory and spectrum_type and elevation will be included if provided. It is important to ensure that this name is unique to avoid conflicts with other datasets. For this reason, it is strongly recommended to include hours and minutes in the datetime attribute when filling in config.seabee.yaml.
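
To preview the name a dataset is likely to receive, the naming rule can be written as a small Python function. This is a sketch based only on the pattern given above, and the function name dataset_name is hypothetical. The platform may apply further normalisation that is not described here.

```python
def dataset_name(config):
    """Combine config attributes as grouping_area_datetime_[spectrum_type]_[elevation]m."""
    parts = [config["grouping"], config["area"], config["datetime"]]
    # Optional attributes are appended only when they are provided.
    if config.get("spectrum_type"):
        parts.append(config["spectrum_type"])
    if config.get("elevation") is not None:
        parts.append(f"{config['elevation']}m")
    return "_".join(str(p) for p in parts)


example = {"grouping": "gronningen", "area": "agder", "datetime": "202305230830",
           "spectrum_type": "RGB", "elevation": 40}
print(dataset_name(example))
# gronningen_agder_202305230830_RGB_40m
```

Two flights in the same group and area on the same day produce identical names if the datetime holds only the date, which is why including the hours and minutes matters.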

Note

By default, SeaBee datasets are published under a CC-BY-4.0 licence (details here). If publish is true and the licence and licence_link options are not provided, the default values will be set automatically during the publishing step.

3 Data upload

The first step before uploading anything is to organise the data from each flight into folders on your local system, as described in Section 2. Once you have done this, there are two options for getting the data onto the SeaBee platform: the MinIO web interface and Rclone, both of which are described on the Storage page.

The MinIO web interface is convenient if you only need to upload data for a single, small mission. For most other cases, Rclone or RcloneBrowser are recommended. The big advantage of these tools is that they track which files have been transferred and retry if the network connection is interrupted. For transferring large volumes of high resolution imagery, they are therefore much more reliable than “standard” data upload via a web interface: using Rclone or RcloneBrowser, it is possible to upload terabytes of data relatively smoothly.
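
For scripted uploads, Rclone can also be driven from Python. The sketch below builds an rclone copy command for a single flight folder. The function names are hypothetical. The remote name seabee-minio and the destination path my-bucket/niva/2023 used in the note that follows are placeholders, not real SeaBee values, so substitute the remote you configured by following the Storage page.

```python
import subprocess
from pathlib import Path


def rclone_copy_command(local_flight_dir, remote, remote_parent):
    """Build an 'rclone copy' command that uploads one flight folder."""
    local = Path(local_flight_dir)
    # Copy into a folder of the same name so the flight folder itself is preserved.
    destination = f"{remote}:{remote_parent.strip('/')}/{local.name}"
    return ["rclone", "copy", str(local), destination, "--progress"]


def upload_flight(local_flight_dir, remote, remote_parent):
    """Run the upload; raises CalledProcessError if rclone exits with an error."""
    subprocess.run(rclone_copy_command(local_flight_dir, remote, remote_parent), check=True)
```

For example, rclone_copy_command("Runde_Island_202305230830", "seabee-minio", "my-bucket/niva/2023") yields a command that copies the folder to seabee-minio:my-bucket/niva/2023/Runde_Island_202305230830. Files that already exist unchanged at the destination are skipped, so rerunning the command after an interruption does not start the upload from scratch.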

Example: Using Rclone and ODM for near-real-time generation of orthophotos

During spring 2023, Sindre Molværsmyr and colleagues from NINA flew several hundred drone-based seabird surveys within a period of just a few weeks. Raw images were downloaded from the drones each evening and arranged into flight folders with the structure shown below (described in Section 2):

grouping_area_yyyymmddHHMM/
├─ images/
├─ config.seabee.yaml

These folders were synchronised to the SeaBee platform overnight using Rclone, which typically involved batch-uploading data from between 20 and 40 flights per day. Around 5 TB of data were uploaded during the first two weeks.

A script running every hour on the SeaBee platform compared the number of files in each images folder to the value specified in config.seabee.yaml to determine when flights were ready for processing. Complete datasets were then submitted as jobs to NodeODM, which ran continuously processing data from four missions in parallel.

Completed orthomosaics were optimised for viewing online and then automatically published to the SeaBee GeoNode - usually within a few hours of the data upload completing. This made it possible for the survey team to check their data while still in the field, which is something we have never previously achieved at this scale.

3.1 Data upload using RcloneBrowser

First install Rclone and RcloneBrowser as described here, then follow the steps below to upload data to the SeaBee platform.

  1. Download data from the drone to a local folder on your PC.

  2. Organise the local data as described in Section 2. For a “standard” mission flown with RTK positioning, this often just means putting all the raw images in a subfolder called images and adding a configuration file (see Section 2.3).

  3. Open RcloneBrowser and connect to the SeaBee MinIO “remote”. Navigate to the folder where you want to upload the files, right-click and choose Upload (Figure 1).

Figure 1: Uploading to a folder on MinIO using RcloneBrowser.

  4. On the source row, click the Choose folder icon and select the tidied mission folder on your local PC. Double-check the destination looks correct and then click Run (Figure 2).

Figure 2: Upload options in RcloneBrowser.

  5. You can follow the progress of the data transfer on the Jobs tab.