JupyterHub

Quick start: Choose the resources you need from the links below and login using your Feide credentials.

Standard. 1 CPU, 1 GB memory. The default machine, suitable for basic data analysis such as processing ground truth data, annotation, simple statistics & visualisation etc. Also good for deploying orthorectification tasks to NodeODM.
High memory. 4 CPUs, 32 GB memory. Well suited to raster processing, such as manipulating image mosaics.
GPU-enabled. 4 CPUs, 16 GB memory, 1 Tesla V100-PCIE-16GB GPU. Suitable for machine learning.

Tip

Some platform components accessible from the Hub have their own resources in addition to those listed above. For example, NodeODM, which provides access to Open Drone Map, has its own CPU and memory allocations, so you can perform orthomosaicing of large volumes of raw images even from a Standard machine.
If in doubt, start small and work up. It is easy to switch between machines
If you receive a Spawn failed error when you login, it is likely that there are not enough resources available to start your chosen machine. Either try a smaller one or wait until the Hub is less busy. If you see this message often, please create an issue and we’ll try to increase the resources available

1 Overview

The SeaBee JupyterHub provides a central access point for data processing on the SeaBee platform. It can be used to access data stored on MinIO, to push new datasets to GeoServer & GeoNode, to train & apply machine learning algorithms, and to perform general data processing tasks. See the simplified architecture diagram for an indication of where the Hub fits in relation to other platform components.

1.1 Programming languages

The Hub provides access to three programming languages: Python, R and Julia. Most of the development work so far has been done using Python, but since R is popular with many SeaBee ecologists it has been included to facilitate better collaboration between developers and researchers. Julia is typically faster than Python or R for intensive number crunching, but it is not (yet) used in any SeaBee workflows.

The Python environment includes everything most users will need for typical geospatial data processing and machine learning tasks. See Section 4 for some example notebooks illustrating various SeaBee workflows.

1.2 Integrated Development Environments

The Hub provides access to three Integrated Development Environments (IDEs): JupyterLab, VSCode and R-Studio. Users may work with one or all of these (and switch between them), depending on their experience, preferences and the task being undertaken. JupyterLab provides a familiar interface for many Python-orientated data scientists, while R-Studio will be familiar to most users of R. VSCode offers a more traditional IDE, with a richer set of tools for developers. In addition, all users have access to an Ubuntu terminal, providing access to Bash and various other command-line tools.

Jupyter Notebooks and scripts can be created for any of the 3 languages described. Notebooks can be edited using either JupyterLab or VSCode, and they provide an excellent way to develop and test workflows and to share examples with colleagues. Similarly, R-Studio can be used to create R-Markdown documents, which combine working code and descriptive text is an analogous way to Jupyter Notebooks. See Section 4 for some examples.

1.3 Extensions

The Hub includes the following extensions and add-ins:

jupyter-archive provides context menu options for creating and extracting .zip archives (the zip and unzip terminal commands are also available).
jupyterlab-lsp provides language server protocol integration.
jupyterlab_code_formatter offers code auto-formatting (using black for Python).
jupyterlab-spellchecker provides spell-checking within Markdown and notebook documents.
jupyterlab-spreadsheet supports read-only exploration of Excel files, which is useful for quickly inspecting files while coding, without having to download open them in Excel.
jupyterlab-git provides a Git graphical user interface (GUI) in JupyterLab (see Section 3). Note that Git integration is also available in R-Studio and VSCode, as well as via the terminal.

1.4 Resources

All SeaBee components are deployed within a single Kubernetes namespace. The JupyterHub shares resources in this namespace with other components of the platform. Some memory and CPUs are permanently reserved for running e.g. GeoServer and Open Drone Map, but the rest is available to JupyterHub. Resources on the Hub can be increased if necessary. If you believe your workflows are being limited by lack of memory or processing power, please create an issue and we will try to help.

3 Configure Git

It is strongly recommended that all users store and version their code within GitHub repositories. GitHub provides backup in case anything goes wrong, and also makes it easier to work collaboratively and share results.

Git interfaces are available in JupyterLab, R-Studio and VSCode. You can also work with Git from the command line if you prefer. In order to pull and push from private GitHub repositories, you will need to store your GitHub credentials on the Hub so that Git can authenticate correctly. To do this, follow the steps below:

Login to GitHub and create a Personal Access Token (PAT) by following the steps described here.
Login to JupyterHub, open a terminal and attempt to clone a private repository. When prompted for your username enter your GitHub username, and when prompted for your password enter the PAT (not your GitHub password).
Once the repository has been cloned, run git config --global credential.helper store and then git pull

This should create a hidden file called .git-credentials within your $HOME directory (the file will be not be visible in JupyterLab, but it will show in VSCode, which displays hidden files by default). This file contains your username and your personal access token, which Git will use in future to authenticate on GitHub.

Note that you should only need to perform these steps once. After your credentials are stored, you can use any of the four available Git interfaces in future JupyterHub sessions and your credentials will be used automatically.

4 Useful resources

The seabeepy repository includes example notebooks illustrating different parts of the SeaBee workflow. The snippets repository also contains some useful tips. You can clone these repositories to your $HOME directory and use them as a starting point for your own work. If you create a new workflow (e.g. a notebook or script) that you think might be useful for others, please consider adding it to one of these repositories (seabeepy notebooks should be fairly well documented and explained; snippets can be brief). The aim is to build up a library of examples to help new users to get started.