• AIPressRoom
  • Posts
  • New solution template simplifies Jupyterhub on GKE setup

New solution template simplifies Jupyterhub on GKE setup

The recent growth in distributed, compute-intensive ML applications has prompted data scientists and ML practitioners to find easy ways to prototype and develop their ML models. Running your Jupyter notebooks and JupyterHub on Google Kubernetes Engine (GKE) can provide a way to run your solution with security and scalability built-in as core elements of the platform.

GKE is a managed container orchestration service that provides a scalable and flexible platform for deploying and managing containerized applications. GKE abstracts away the underlying infrastructure, making it easy to deploy and manage complex deployments.

Jupyterhub is a powerful, multi-tenant server-based web application that allows users to interact with and collaborate on Jupyter notebooks. Users can create custom computing environments with custom images and computational resources in which to run their notebooks. “Zero to Jupyterhub for Kubernetes” (z2jh) is a Helm chart that you can use to install Jupyterhub on Kubernetes that provides numerous configurations for complex user scenarios.

We are excited to announce a solution template that will help you get started with Jupyterhub on GKE. This greatly simplifies the use of z2jh with GKE templates, offering a quick and easy way to set up Jupyterhub by providing a pre-configured GKE cluster, Jupyterhub config, and custom features. Further, we added features such as authentication and persistent storage and cut down the complexity for model prototyping and experimentation. In this blog post, we discuss the solution template, the Jupyterhub on GKE experience, unique characteristics that come from running on GKE, and features such as a custom authentication and persistent storage.

The Jupyter on GKE experience

Running Zero to Jupyterhub on GKE provides a powerful platform for ML applications but the installation process is complicated. To ensure ML practitioners have minimal friction, our solution templates abstract away the infrastructure setup and solve common enterprise platform challenges including authentication and security, and persistent storage for notebooks.

Security and Auth

Granting the correct access to the notebooks can be especially difficult when working with sensitive data. By default, Jupyterhub exposes a public endpoint that anyone can access. This endpoint should be locked down to prevent unintended access. Our solution leverages Identity-Aware Proxy (IAP) to gate access to the public endpoint. IAP creates a central authorization layer for the Jupyterhub application access by HTTPS, utilizing the application-level access model and enabling IAM-based access control to the notebook to make users’ data more secure. Adding authentication to Jupyterhub ensures user validity and notebook security.

By default, the template reserves an IP address through Google Cloud IAP. Platform administrators can alternatively provide a domain to host their Jupyterhub endpoint, which will be guarded by IAP. Once IAP is configured, the platform administrator must update the service allowlist by granting users the role of “IAP-secure Web App User.” You can see how to allow access to the deployed Jupyterhub in the image below and as described here: