Data upload
The information on this page is primarily aimed at drone pilots wishing to add mission data to the SeaBee platform. For a more general overview of SeaBee data storage, see here.
1 Overview
There are two main use cases for pilots adding data to the SeaBee platform:
1. Upload raw images and associated files (ground control points etc.) and then perform all subsequent processing, such as orthomosaicing, on the platform itself.
2. Upload partially processed data, for example after performing some of the initial processing steps on a local desktop machine.
Option 1 should be preferred where possible as it ensures a consistent and traceable processing pipeline for the entire workflow, and in most cases it should be faster and easier from a pilot’s perspective too. The main reason for choosing Option 2 is if pilots want to use commercial software (such as Pix4D) instead of OpenDroneMap (ODM) for orthorectification. In this case, pilots must have a separate Pix4D licence to create the orthophoto(s), then upload the finished mosaics together with supporting metadata.
2 Data structure
2.1 Flight folder naming
The data from each flight should be gathered into a single folder following the subfolder structure described below. The name of the flight folder itself can be anything you choose, but it must be unique and, from a user perspective, it helps to include standard information such as location, date etc. The SeaBee platform does not enforce strict requirements for naming flight folders, because we recognise that different pilots and organisations have their own conventions. However, a recommended approach is to use grouping_area_yyyymmddHHMM or grouping_area_yyyymmddHHMM_organisation_spectrum-type_elevation. Note the use of underscores (_) to separate each of the main components, while hyphens (-) or CamelCase can be used to further subdivide each part, if necessary. For example: multipart-group_area-part-1_yyyymmddHHMM or MultipartGroup_AreaPart1_yyyymmddHHMM.
The name of the flight folder is not used by any of the automated processing routines on the SeaBee platform. As long as the subfolders are arranged as described below and the details are correct in config.seabee.yaml, the mission will be processed regardless of the flight folder name. The name of the flight folder only matters for people browsing data manually (e.g. using MinIO). It is therefore important that names are human-readable.
Using the recommended structure, the components of the folder name have the following meaning:
- grouping is any general identifier linking data from this flight with data from other related flights. For example: the name of a broad region where several flights have taken place (e.g. Runde); the name of a project (e.g. Kelpmap); or the name of a fieldwork team (e.g. Team1, or Team1Day1).
- area is the name of the location (e.g. the name of an island, or a specific stretch of coastline).
- yyyymmddHHMM is the flight start date and time. The date part (yyyymmdd) is mandatory, whereas the time (HHMM) is optional, but recommended. Including the time is often necessary to uniquely distinguish multiple flights taking place in the same group and area on the same day. Note that there are no separators or additional characters in the datetime (i.e. use yyyymmddHHMM, not yyyy-mm-dd-HH:MM or any other variant).
- organisation is the organisation collecting the data.
- spectrum-type is the type of spectral information recorded by the drone sensor e.g. RGB, multispectral or hyperspectral.
- elevation is the flight elevation (in metres).
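The recommended naming pattern can be sketched in a few lines of Python. This is only an illustration: the function name flight_folder_name and its parameters are hypothetical, and the platform itself does not require or check folder names.

```python
from datetime import datetime


def flight_folder_name(grouping, area, start, include_time=True,
                       organisation=None, spectrum_type=None, elevation=None):
    """Build a flight folder name following the recommended pattern.

    Main components are joined with underscores, so an underscore inside
    a component is rejected (use hyphens or CamelCase instead).
    """
    # The datetime has no separators: yyyymmddHHMM, or yyyymmdd without the time.
    stamp = start.strftime("%Y%m%d%H%M" if include_time else "%Y%m%d")
    parts = [grouping, area, stamp]
    # Optional components are appended in the recommended order.
    for extra in (organisation, spectrum_type, elevation):
        if extra is not None:
            parts.append(str(extra))
    for part in parts:
        if "_" in part:
            raise ValueError(f"component {part!r} must not contain an underscore")
    return "_".join(parts)


print(flight_folder_name("Runde", "AreaPart1", datetime(2023, 5, 23, 8, 30),
                         organisation="NINA", spectrum_type="RGB", elevation=40))
```

Rejecting underscores inside a component keeps the name unambiguous for anyone who later splits it back into its parts.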
The flight folders themselves can be organised however you wish. For example, you may choose to group flight folders into parent folders based on organisation and year (e.g. /niva/2023/flight_folder1, /niva/2023/flight_folder2 etc.), or you may wish to group them by project, pilot, or any combination of these. The automated data processing routines on the SeaBee platform simply scan the file system for folders containing files named config.seabee.yaml. As long as the data within these folders is arranged correctly, everything should work OK (see Section 2.2).
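To illustrate the idea of discovering flights by scanning for configuration files, here is a minimal sketch using Python's standard library. It is not the platform's actual code; the function name find_flight_folders is an assumption made for this example.

```python
from pathlib import Path

CONFIG_NAME = "config.seabee.yaml"


def find_flight_folders(root):
    """Return every folder under `root` that contains a config.seabee.yaml file.

    The parent folder layout (organisation, year, project, ...) is irrelevant:
    only the presence of the configuration file marks a flight folder.
    """
    return sorted(p.parent for p in Path(root).rglob(CONFIG_NAME) if p.is_file())
```

Because the search is recursive, a call such as find_flight_folders("/niva") would find flight folders at any depth below that folder.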
2.2 Subfolder structure
Within the “parent” flight folder, data should be organised into subfolders as follows:
grouping_area_yyyymmddHHMM_[organisation]_[spectrum-type]_[elevation]/
├─ dem/
├─ gcp/
│ ├─ gcp_list-ODM.txt
│ ├─ gcp_list-Pix4D.txt
├─ ground-truth/
├─ images/
├─ orthophoto/
├─ other/
├─ report/
│ ├─ report.pdf
│ ├─ stdout.txt
├─ texturing/
config.seabee.yaml
It is not necessary to include all the folders - just include what you need. As a minimum, the flight folder must contain a subfolder named images with the raw images and a file named config.seabee.yaml. All other components are optional. The most basic flight folder is therefore:
grouping_area_yyyymmddHHMM_[organisation]_[spectrum-type]_[elevation]/
├─ images/
config.seabee.yaml
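Before uploading, the minimum structure can be checked locally. The sketch below is a hypothetical helper (check_flight_folder is not part of any SeaBee tool) that tests for the two required items and also flags subfolders inside images, since the raw images should sit directly in that folder.

```python
from pathlib import Path


def check_flight_folder(flight_dir):
    """Return a list of problems with the minimum flight folder structure."""
    flight_dir = Path(flight_dir)
    images = flight_dir / "images"
    problems = []
    if not images.is_dir():
        problems.append("missing 'images' subfolder")
    elif any(p.is_dir() for p in images.iterdir()):
        # Images must not be grouped into subfolders.
        problems.append("'images' must not contain subfolders")
    if not (flight_dir / "config.seabee.yaml").is_file():
        problems.append("missing 'config.seabee.yaml'")
    return problems
```

An empty list means the folder satisfies the minimum layout; it says nothing about whether the configuration file itself is valid.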
The purpose of each subfolder or file is as follows:
- dem (optional). Contains elevation datasets generated during orthorectification (DSMs and DTMs etc.). This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D).
- gcp (optional). Folder containing ground control points in a standard text format. The format used by ODM is different to that used by Pix4D, so please specify which format has been used in the filename, as shown above. Ground control points are usually not necessary if you are flying with differential GPS (RTK etc.), in which case this folder can be omitted.
- ground_truth (optional). Ground truth data, if available.
- orthophoto (optional). Georeferenced mosaic images. Ideally a single, multi-band GeoTiff. This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D). ODM generates orthophotos named odm_orthophoto.original.tif. If you are uploading an externally built mosaic from Pix4D, it should be placed in this folder and named pix4d_orthophoto.original.tif.
If you are uploading a mosaic generated off the platform (e.g. by Pix4D), it is important that the GeoTiff file includes proper metadata, such as band descriptions, colour interpretations and NoData values. The SeaBee pipeline reorganises datasets into a standard format before publishing them to the GeoNode, and this step will fail (or produce incorrect results) if externally generated mosaics have incomplete or incorrect metadata.
- other (optional). Anything not included in the other folders.
- report (optional). The PDF report describing results from the orthorectification process, plus any associated logs (e.g. text files and JSON). This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D).
- texturing (optional). Texture models generated during orthorectification. This folder and its contents will be generated automatically if orthorectification is performed on the platform using ODM, but it should be explicitly provided if orthorectification has been done elsewhere (e.g. using Pix4D).
- images (required). Images from a single flight (i.e. images that can be orthorectified to produce a single mosaic). The images should not be grouped into subfolders.
- config.seabee.yaml (required). A file containing flight metadata and additional settings to control the processing workflow. See Section 2.3 for details.
2.3 Configuration file
Each flight folder must contain a file named config.seabee.yaml, which contains flight metadata plus settings/commands to control the data processing. A minimal example with just the 9 mandatory attributes is shown below.
grouping: gronningen # General identifier linking this flight to related flights
area: agder # Name of specific location
datetime: '202305230830' # 'yyyymmddHHMM' or 'yyyymmdd'. Note that the quotes are required
nfiles: 290 # Total number of files in the 'images' folder
organisation: NINA # Responsible organisation
mosaic: true # Whether to mosaic the raw images using ODM (true or false)
classify: true # Whether to classify the orthomosaic using machine learning (true or false)
publish: true # Whether to publish the orthophoto to GeoNode (true or false)
theme: Seabirds # SeaBee "theme" ('Seabirds', 'Mammals', 'Habitat' or 'Water quality')
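The mandatory attributes above lend themselves to a simple pre-upload check. The following is a sketch only: validate_mandatory is a hypothetical helper, and it assumes the file has already been parsed into a dictionary (for example with a YAML library of your choice). It also shows why the quotes around datetime matter: without them, a YAML parser reads the value as an integer rather than a string.

```python
from datetime import datetime

MANDATORY = ("grouping", "area", "datetime", "nfiles", "organisation",
             "mosaic", "classify", "publish", "theme")


def validate_mandatory(config):
    """Return a list of problems found in an already-parsed config dictionary."""
    problems = [f"missing attribute: {key}" for key in MANDATORY if key not in config]

    stamp = config.get("datetime")
    if stamp is not None:
        if not isinstance(stamp, str):
            # Unquoted values such as 202305230830 are parsed as integers.
            problems.append("datetime must be a quoted string")
        else:
            fmt = {8: "%Y%m%d", 12: "%Y%m%d%H%M"}.get(len(stamp))
            try:
                if fmt is None or not stamp.isdigit():
                    raise ValueError(stamp)
                datetime.strptime(stamp, fmt)
            except ValueError:
                problems.append("datetime must be 'yyyymmdd' or 'yyyymmddHHMM'")

    nfiles = config.get("nfiles")
    if nfiles is not None and (isinstance(nfiles, bool)
                               or not isinstance(nfiles, int) or nfiles < 0):
        problems.append("nfiles must be a non-negative integer")

    for key in ("mosaic", "classify", "publish"):
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} must be true or false")
    return problems
```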
The nfiles attribute is important, as it is used to determine whether data upload has completed successfully before starting any further processing. For example, before attempting to mosaic any images using NodeODM, the processing script will first check that the number of files in the images subfolder matches the nfiles attribute in config.seabee.yaml. If it does, it is assumed that upload is complete and the flight is ready to be processed; if it does not, the flight is skipped and checked again later.
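The completeness check described above amounts to comparing two numbers. Here is a minimal sketch, assuming the nfiles value has already been read from the configuration file; the function name upload_complete is hypothetical and this is not the platform's own script.

```python
from pathlib import Path


def upload_complete(flight_dir, nfiles):
    """Return True when the images folder holds exactly `nfiles` files."""
    images = Path(flight_dir) / "images"
    if not images.is_dir():
        return False
    n_found = sum(1 for p in images.iterdir() if p.is_file())
    # A mismatch means the upload is still in progress (or nfiles is wrong),
    # so the flight should be skipped and checked again later.
    return n_found == nfiles
```

Running the same check locally before uploading is a quick way to confirm that nfiles in the configuration file matches the contents of the images folder.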
In addition to the mandatory attributes above, config.seabee.yaml can contain optional information to control subsequent processing and add extra metadata. A complete list of attributes currently supported is shown below. Note that optional attributes can simply be omitted from the file if they are not relevant.
For further explanation of odm_options, see the documentation here. If mosaic is true and odm_options is omitted, default options defined here will be used instead. Similarly, if classify is true but ml_options is omitted, the machine learning step will default to attempting object detection using the most recent algorithm available.
grouping: gronningen # General identifier linking this flight to related flights
area: agder # Name of specific location
datetime: '202305230830' # 'yyyymmddHHMM' or 'yyyymmdd'. Note that quotes are required
nfiles: 290 # Total number of files in the 'images' folder
organisation: NINA # Responsible organisation
mosaic: true # Whether to mosaic the raw images using ODM (true or false)
classify: true # Whether to classify the orthomosaic using machine learning (true or false)
publish: true # Whether to publish the orthophoto to GeoNode (true or false)
theme: Seabirds # SeaBee "theme" ('Seabirds', 'Mammals' or 'Habitat')
spectrum_type: RGB # [Optional]. Sensor type ('RGB', 'MSI' or 'HSI')
elevation: 40 # [Optional]. Flight elevation (integer > 0 in metres)
creator_name: Sindre Molværsmyr # [Optional]. Data collector/pilot
project: Seabirds2023 # [Optional]. Name of project
vehicle: DJI-M350 # [Optional]. Name of vehicle/drone
sensor: DJI-P1 # [Optional]. Name of camera/sensor
licence: CC-BY-4 # [Optional]. Name of licence
licence_link: https://creativecommons.org/licenses/by/4.0/ # [Optional]. Link to licence details
odm_options: # [Optional]. Overrides default orthorectification settings
  dsm: true # [Optional]. true or false
  dtm: true # [Optional]. true or false
  cog: true # [Optional]. true or false
  orthophoto-compression: LZW # [Optional]. JPEG, LZW, PACKBITS, DEFLATE, LZMA or NONE
  orthophoto-resolution: 0.1 # [Optional]. Float. cm/pixel
  dem-resolution: 0.1 # [Optional]. Float. cm/pixel
  max-concurrency: 16 # [Optional]. Int
  auto-boundary: true # [Optional]. true or false
  use-3dmesh: true # [Optional]. true or false
  fast-orthophoto: false # [Optional]. true or false
  pc-rectify: false # [Optional]. Bool
  split: 999999 # [Optional]. Int
  split-overlap: 150 # [Optional]. Int
  crop: 3 # [Optional]. Int or Float (>= 0)
  pc-quality: high # [Optional]. ultra, high, medium, low, lowest
  feature-quality: high # [Optional]. ultra, high, medium, low, lowest
ml_options: # [Optional]. Overrides default machine learning settings
  task: # [Optional]. detection, segmentation
  model: # [Optional]. Name/version of machine learning model to use
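The allowed values noted in the comments above can be turned into a small sanity check for odm_options. This is a sketch based only on the values listed in this example file; check_odm_options is a hypothetical helper, and it does not replace the ODM documentation as the reference for valid options.

```python
QUALITY_LEVELS = {"ultra", "high", "medium", "low", "lowest"}

# Options restricted to a fixed set of values, as listed in the comments above.
CHOICES = {
    "orthophoto-compression": {"JPEG", "LZW", "PACKBITS", "DEFLATE", "LZMA", "NONE"},
    "pc-quality": QUALITY_LEVELS,
    "feature-quality": QUALITY_LEVELS,
}

# Options that must be true or false.
BOOLEAN = {"dsm", "dtm", "cog", "auto-boundary", "use-3dmesh",
           "fast-orthophoto", "pc-rectify"}


def check_odm_options(options):
    """Return a list of problems found in an odm_options dictionary."""
    problems = []
    for key, value in options.items():
        if key in CHOICES and value not in CHOICES[key]:
            allowed = ", ".join(sorted(CHOICES[key]))
            problems.append(f"{key}: {value!r} is not one of {allowed}")
        elif key in BOOLEAN and not isinstance(value, bool):
            problems.append(f"{key}: expected true or false, got {value!r}")
    return problems
```

Options not covered by the two tables (such as split or crop) are passed through unchecked in this sketch.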
Mosaic datasets created and published to GeoNode will be named by combining information from the configuration file in the following way:
grouping_area_datetime_[spectrum_type]_[elevation]m
where grouping, area and datetime are mandatory and spectrum_type and elevation will be included if provided. It is important to ensure that this name is unique to avoid conflicts with other datasets. For this reason, it is strongly recommended to include hours and minutes in the datetime attribute when filling in config.seabee.yaml.
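The naming rule can be expressed as a short function. This is an illustrative sketch of the pattern described above, not the platform's publishing code; dataset_name is a hypothetical name and the config is assumed to be an already-parsed dictionary.

```python
def dataset_name(config):
    """Combine config attributes into grouping_area_datetime_[spectrum_type]_[elevation]m."""
    parts = [config["grouping"], config["area"], config["datetime"]]
    # Optional components are only included when present in the config.
    if config.get("spectrum_type"):
        parts.append(config["spectrum_type"])
    if config.get("elevation") is not None:
        parts.append(f"{config['elevation']}m")
    return "_".join(str(p) for p in parts)


example = {"grouping": "gronningen", "area": "agder", "datetime": "202305230830",
           "spectrum_type": "RGB", "elevation": 40}
print(dataset_name(example))  # gronningen_agder_202305230830_RGB_40m
```

Two flights in the same group and area on the same day would produce identical names if datetime held only the date, which is why including hours and minutes is recommended.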
By default, SeaBee datasets are published under a CC-BY-4.0 licence (details here). If publish is true and the licence and licence_link options are not provided, the default values will be set automatically during the publishing step.
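The default licence behaviour can be pictured as follows. This sketch rests on two assumptions that are not confirmed by this page: that the default link is the Creative Commons URL used in the example configuration above, and that each of the two options is filled in independently when missing. The function name with_licence_defaults is hypothetical.

```python
DEFAULT_LICENCE = "CC-BY-4.0"
# Assumption: same URL as in the example configuration file.
DEFAULT_LICENCE_LINK = "https://creativecommons.org/licenses/by/4.0/"


def with_licence_defaults(config):
    """Return a copy of config with licence defaults added when publishing."""
    result = dict(config)
    if result.get("publish"):
        result.setdefault("licence", DEFAULT_LICENCE)
        result.setdefault("licence_link", DEFAULT_LICENCE_LINK)
    return result
```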
3 Data upload
The first step before uploading anything is to organise the data from each flight into folders on your local system, as described in Section 2. Once you have done this, there are two options for getting the data onto the SeaBee platform: the MinIO web interface and Rclone, both of which are described on the Storage page.
The MinIO web interface is convenient if you only need to upload data for a single, small mission. For most other cases, Rclone or RcloneBrowser are recommended. The big advantage of these tools is that they track which files have been transferred and retry if the network connection is interrupted. For transferring large volumes of high resolution imagery, they are therefore much more reliable than “standard” data upload via a web interface: using Rclone or RcloneBrowser, it is possible to upload terabytes of data relatively smoothly.
During spring 2023, Sindre Molværsmyr and colleagues from NINA flew several hundred drone-based seabird surveys within a period of just a few weeks. Raw images were downloaded from the drones each evening and arranged into flight folders with the structure shown below (described in Section 2):
grouping_area_yyyymmddHHMM/
├─ images/
config.seabee.yaml
These folders were synchronised to the SeaBee platform overnight using Rclone, which typically involved batch-uploading data from between 20 and 40 flights per day. Around 5 TB of data were uploaded during the first two weeks.
A script running every hour on the SeaBee platform compared the number of files in each images folder to the value specified in config.seabee.yaml to determine when flights were ready for processing. Complete datasets were then submitted as jobs to NodeODM, which ran continuously processing data from four missions in parallel.
Completed orthomosaics were optimised for viewing online and then automatically published to the SeaBee GeoNode - usually within a few hours of the data upload completing. This made it possible for the survey team to check their data while still in the field, which is something we have never previously achieved at this scale.
3.1 Data upload using RcloneBrowser
First install Rclone and RcloneBrowser as described here, then follow the steps below to upload data to the SeaBee platform.
- Download data from the drone to a local folder on your PC.
- Organise the local data as described in Section 2. For a “standard” mission flown with RTK positioning, this often just means putting all the raw images in a subfolder called images and adding a configuration file (see Section 2.3).
- Open RcloneBrowser and connect to the SeaBee MinIO “remote”. Navigate to the folder where you want to upload the files, right-click and choose Upload (Figure 1).
- On the source row, click the Choose folder icon and select the tidied mission folder on your local PC. Double-check the destination looks correct and then click Run (Figure 2).
- You can follow the progress of the data transfer on the Jobs tab.
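For pilots who prefer the command line, the same upload can be done with Rclone's copy command. The sketch below only builds the command; the remote name seabee-minio and the destination path niva/2023 are placeholders, so substitute the remote you configured and the location you were given. Note that rclone copy transfers the contents of the source folder, so the flight folder name is appended to the destination to keep the folder intact.

```python
from pathlib import PurePath


def rclone_copy_command(flight_folder, remote, remote_dir):
    """Build an `rclone copy` command that uploads one flight folder."""
    folder_name = PurePath(flight_folder).name
    # Append the folder name so the flight folder is recreated on the remote.
    destination = f"{remote}:{remote_dir.strip('/')}/{folder_name}"
    return ["rclone", "copy", str(flight_folder), destination, "--progress"]


# Placeholder values for illustration only.
cmd = rclone_copy_command("/data/Runde_Area1_202305230830", "seabee-minio", "niva/2023")
print(" ".join(cmd))
# To run the upload: import subprocess; subprocess.run(cmd, check=True)
```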