JupyterHub
Quick start: Choose the resources you need from the links below and log in using your Feide credentials.
Standard. 1 CPU, 1 GB memory. The default machine, suitable for basic data analysis such as processing ground truth data, annotation, simple statistics & visualisation etc. Also good for deploying orthorectification tasks to NodeODM.
High memory. 4 CPUs, 32 GB memory. Well suited to raster processing, such as manipulating image mosaics.
GPU-enabled. 4 CPUs, 16 GB memory, 1 Tesla V100-PCIE-16GB GPU. Suitable for machine learning.
Some platform components accessible from the Hub have their own resources in addition to those listed above. For example, NodeODM, which provides access to Open Drone Map, has its own CPU and memory allocations, so you can perform orthomosaicing of large volumes of raw images even from a Standard machine.
If in doubt, start small and work up. It is easy to switch between machines.
If you receive a "Spawn failed" error when you log in, it is likely that there are not enough resources available to start your chosen machine. Either try a smaller one or wait until the Hub is less busy. If you see this message often, please create an issue and we’ll try to increase the resources available.
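The "start small and work up" advice can be sketched as a small helper that picks the smallest machine meeting a given requirement. This is only an illustration: the PROFILES table copies the figures listed above, while the names Profile and smallest_profile and their parameters are hypothetical and are not part of the Hub.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    name: str
    cpus: int
    memory_gb: int
    gpus: int


# Figures copied from the machine list above.
PROFILES = [
    Profile("Standard", cpus=1, memory_gb=1, gpus=0),
    Profile("High memory", cpus=4, memory_gb=32, gpus=0),
    Profile("GPU-enabled", cpus=4, memory_gb=16, gpus=1),
]


def smallest_profile(memory_gb: float, needs_gpu: bool = False) -> Optional[Profile]:
    """Return the smallest listed machine that satisfies the request, or None."""
    # Only consider GPU machines when a GPU is actually required,
    # so that the GPU is left free for machine learning work.
    candidates = [
        p for p in PROFILES
        if (p.gpus > 0) == needs_gpu and p.memory_gb >= memory_gb
    ]
    return min(candidates, key=lambda p: p.memory_gb, default=None)


print(smallest_profile(0.5).name)                # Standard
print(smallest_profile(8).name)                  # High memory
print(smallest_profile(8, needs_gpu=True).name)  # GPU-enabled
```

A result of None means that no listed machine meets the request, in which case creating an issue to ask for more resources is probably the better route.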
1 Overview
The SeaBee JupyterHub provides a central access point for data processing on the SeaBee platform. It can be used to access data stored on MinIO, to push new datasets to GeoServer & GeoNode, to train & apply machine learning algorithms, and to perform general data processing tasks. See the simplified architecture diagram for an indication of where the Hub fits in relation to other platform components.
1.1 Programming languages
The Hub provides access to three programming languages: Python, R and Julia. Most of the development work so far has been done using Python, but since R is popular with many SeaBee ecologists it has been included to facilitate better collaboration between developers and researchers. Julia is typically faster than Python or R for intensive number crunching, but it is not (yet) used in any SeaBee workflows.
The Python environment includes everything most users will need for typical geospatial data processing and machine learning tasks. See Section 4 for some example notebooks illustrating various SeaBee workflows.
1.2 Integrated Development Environments
The Hub provides access to three Integrated Development Environments (IDEs): JupyterLab, VSCode and R-Studio. Users may work with one or all of these (and switch between them), depending on their experience, preferences and the task being undertaken. JupyterLab provides a familiar interface for many Python-orientated data scientists, while R-Studio will be familiar to most users of R. VSCode offers a more traditional IDE, with a richer set of tools for developers. In addition, all users have access to an Ubuntu terminal, providing access to Bash and various other command-line tools.
Jupyter Notebooks and scripts can be created for any of the three languages described. Notebooks can be edited using either JupyterLab or VSCode, and they provide an excellent way to develop and test workflows and to share examples with colleagues. Similarly, R-Studio can be used to create R-Markdown documents, which combine working code and descriptive text in an analogous way to Jupyter Notebooks. See Section 4 for some examples.
1.3 Extensions
The Hub includes the following extensions and add-ins:
jupyter-archive provides context menu options for creating and extracting .zip archives (the zip and unzip terminal commands are also available).
jupyterlab-lsp provides language server protocol integration.
jupyterlab_code_formatter offers code auto-formatting (using black for Python).
jupyterlab-spellchecker provides spell-checking within Markdown and notebook documents.
jupyterlab-spreadsheet supports read-only exploration of Excel files, which is useful for quickly inspecting files while coding, without having to download and open them in Excel.
jupyterlab-git provides a Git graphical user interface (GUI) in JupyterLab (see Section 3). Note that Git integration is also available in R-Studio and VSCode, as well as via the terminal.
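When working inside a notebook, the archive operations offered by jupyter-archive can also be sketched with Python's standard library, as an alternative to the context menu or the zip and unzip commands. The folder and file names below are placeholders, created in a temporary directory purely for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example does not touch real data.
workdir = Path(tempfile.mkdtemp())
source = workdir / "results"  # placeholder folder name
source.mkdir()
(source / "summary.txt").write_text("example output\n")

# Create results.zip containing the contents of the folder.
archive_path = shutil.make_archive(str(workdir / "results"), "zip", root_dir=source)

# Extract the archive into a new folder.
extracted = workdir / "extracted"
shutil.unpack_archive(archive_path, extracted)

print(sorted(p.name for p in extracted.iterdir()))
```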
1.4 Resources
All SeaBee components are deployed within a single Kubernetes namespace. The JupyterHub shares resources in this namespace with other components of the platform. Some memory and CPUs are permanently reserved for running e.g. GeoServer and Open Drone Map, but the rest is available to JupyterHub. Resources on the Hub can be increased if necessary. If you believe your workflows are being limited by lack of memory or processing power, please create an issue and we will try to help.
2 Access and login
To use the JupyterHub, you first need to create a Feide account and ask for your Feide username to be added to the list of JupyterHub users. See the Login page for details.
3 Configure Git
It is strongly recommended that all users store and version their code within GitHub repositories. GitHub provides backup in case anything goes wrong, and also makes it easier to work collaboratively and share results.
Git interfaces are available in JupyterLab, R-Studio and VSCode. You can also work with Git from the command line if you prefer. In order to pull and push from private GitHub repositories, you will need to store your GitHub credentials on the Hub so that Git can authenticate correctly. To do this, follow the steps below:
Log in to GitHub and create a Personal Access Token (PAT) by following the steps described here.
Log in to JupyterHub, open a terminal and attempt to clone a private repository. When prompted for your username enter your GitHub username, and when prompted for your password enter the PAT (not your GitHub password).
Once the repository has been cloned, run
git config --global credential.helper store
and then
git pull
This should create a hidden file called .git-credentials within your $HOME directory (the file will not be visible in JupyterLab, but it will show in VSCode, which displays hidden files by default). This file contains your username and your personal access token, which Git will use in future to authenticate on GitHub.
Note that you should only need to perform these steps once. After your credentials are stored, you can use any of the four available Git interfaces in future JupyterHub sessions and your credentials will be used automatically.
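To confirm that the credentials were saved, the .git-credentials file can be inspected without displaying the token. The sketch below assumes the file follows the format used by Git's store helper, with one URL per line of the form https://USERNAME:TOKEN@github.com. The function name stored_credential_hosts is hypothetical. Bear in mind that the store helper keeps the token on disk in plain text, so avoid printing or sharing the file contents.

```python
from pathlib import Path
from typing import List, Tuple
from urllib.parse import urlsplit


def stored_credential_hosts(credentials_file: Path) -> List[Tuple[str, str]]:
    """Return (host, username) pairs from a Git credential store file.

    Assumes one URL per line, e.g. https://USERNAME:TOKEN@github.com.
    The token itself is deliberately never returned or printed.
    """
    if not credentials_file.exists():
        return []
    pairs = []
    for line in credentials_file.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        parts = urlsplit(line)
        pairs.append((parts.hostname or "", parts.username or ""))
    return pairs


# Report which hosts have stored credentials for the current user.
for host, user in stored_credential_hosts(Path.home() / ".git-credentials"):
    print(f"{host}: credentials stored for {user}")
```

If nothing is printed, the file is missing or empty, which suggests the credential helper was not configured before the pull.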
4 Useful resources
The workflow_examples repository includes example notebooks illustrating different parts of the SeaBee workflow. The snippets repository also contains some useful tips. You can clone these repositories to your $HOME directory and use them as a starting point for your own work. If you create a new workflow (e.g. a notebook or script) that you think might be useful for others, please consider adding it to one of these repositories (workflow_examples should be fairly well documented and explained; snippets can be brief). The aim is to build up a library of examples to help new users to get started.