Why you should use Jupyter Notebook

fpfcorp 04/10/2021 1549

Jupyter Notebook is a handy tool that you can use in your day-to-day work. To explain its benefits, we will share how we use it to solve our everyday puzzles at Elucidata.

But before we dive into our specific usage, let's get some context on Jupyter Notebooks.

What is a Jupyter Notebook?

Jupyter Notebook is an open-source web application that allows users, such as scientific researchers, scholars or analysts, to create and share documents called notebooks. A notebook contains live code, documentation, graphs, plots and visualizations.

Jupyter Notebook supports 40+ programming languages, including the most frequently used ones such as Python, R and Julia. It also lets users download a notebook in various file formats, such as PDF, HTML, Python, Markdown or .ipynb.
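Jupyter's own tool for these exports is nbconvert (for example, `jupyter nbconvert --to script notebook.ipynb`). The minimal standard-library sketch below roughly illustrates what a "download as Python" export does. It assumes the nbformat v4 JSON layout, keeps code cells as code, and turns Markdown cells into comments. It is a simplification and not nbconvert itself. The function name `notebook_to_script` is hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Convert nbformat v4 JSON text into a plain Python script (simplified)."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        # In nbformat v4, "source" may be a single string or a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            # Keep documentation as comments so the script stays runnable.
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "x = 1 + 1\nprint(x)"},
    ],
})

print(notebook_to_script(example))
```

Real exports also handle other cell types, cell magics and language-specific comment syntax. This sketch covers only the Python case.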

What makes Jupyter Notebook the de facto standard for analysis?

Thanks to its multi-language support, rich feature set and rapidly growing community, Jupyter Notebook has become a standard tool for all sorts of analysis, visualization, rapid prototyping, machine learning and other coding practices.

Who uses Jupyter Notebooks?

Anyone who wants to do scientific computation, data processing or visualization can use Jupyter Notebook. This includes data scientists, data engineers, data analysts, machine learning scientists, research scholars, scientific researchers and general users alike.

Base Architecture Behind Jupyter Notebook

Fig: [1] Image depicting the base components of the Jupyter Notebook

How a Jupyter Notebook works: a user interacts with the notebook in the browser, and the code they run is sent through the notebook server to a kernel, which executes it and returns the output. The output is stored together with Markdown notes in an editable document called a notebook.
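Under the hood, the front end and the kernel communicate using the Jupyter messaging protocol. The sketch below builds only the dictionary for an `execute_request` message, using the field names from the Jupyter messaging specification. It does not sign the message or send it over ZeroMQ. The session ID and username are placeholders, and `make_execute_request` is a hypothetical helper name.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str, username: str = "user") -> dict:
    """Build an execute_request message dict (field names from the messaging spec)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},  # empty: this request is not a reply to another message
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,        # publish outputs back to the front end
            "store_history": True,  # record the cell in the kernel's history
            "user_expressions": {},
            "allow_stdin": True,
            "stop_on_error": True,  # abort queued executions if this one raises
        },
    }


session_id = uuid.uuid4().hex  # placeholder: one session per open notebook
msg = make_execute_request("print('hello')", session_id)
print(json.dumps(msg["content"], indent=2))
```

The kernel's replies (for example `execute_result` or `stream` messages) carry this request's header as their `parent_header`. That is how the front end matches outputs to the cell that produced them.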

When the user saves a notebook, it is sent from the browser to the notebook server, which saves it on disk as a JSON file with the .ipynb extension. The notebook server is responsible for saving and loading notebooks, so a user can open and edit a notebook even when the kernel for its language is not available; they just cannot run its code.
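To make the file format concrete, the sketch below writes a notebook in the nbformat v4 JSON layout by hand, using only the standard library. In practice you would use the `nbformat` package. The file name `example.ipynb` is a placeholder.

```python
import json
import tempfile
from pathlib import Path

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "## Notes\nMarkdown lives next to code."},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "1 + 1",
            # Outputs are stored in the file itself, so results survive a reload.
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "2"},
                }
            ],
        },
    ],
}

path = Path(tempfile.mkdtemp()) / "example.ipynb"
path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")

loaded = json.loads(path.read_text(encoding="utf-8"))
print(loaded["cells"][1]["outputs"][0]["data"]["text/plain"])
```

Because the file is plain JSON, code, Markdown notes and outputs all travel together in a single document.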

Jupyter Notebook @ Elucidata

At Elucidata, we are developing new features and services on top of the traditional Jupyter Notebook to give our end users the best possible experience.

We have built a Jupyter Notebook with a brand new, elegant UI and new custom functionality, and we are leaving no stone unturned to make it the best notebook experience a user can have.

Our goal is to make it the notebook that a data scientist, data engineer or data scholar would want on our platform.

Our Use Cases

We introduced Jupyter Notebooks into the ecosystem of our platform, Polly™, to support manipulating, visualizing and programming with the end results of its built-in workflows. Later, we combined the Jupyter Notebook with the JupyterHub architecture to extend its functionality for the following use cases:

Project: In Polly™, users can go to their project and open a saved Jupyter Notebook, create a new one or upload an existing one. Projects are the most widely used way to open, create, modify and delete Jupyter Notebooks in Polly™.

Templates: We have also built generic template notebooks to support all of these use cases.

Analysis: Analysis is key to finding valuable insights in the large genomics and metabolomics datasets that are uploaded to our platform and run through various in-house built workflows. By integrating the Jupyter Notebook, we offer our end users and in-house data scientists a convenient interface for interactively running code, exploring output and visualizing data, all from a single cloud-based development environment. We have also added a custom platform API that fetches datasets from our cloud-managed environment directly into the notebook.

Our in-house built workflows: As we expanded the platform, we introduced new capabilities called workflows. Through the hard work of our engineering team and its collaboration with the data science team, we built workflows: series of algorithms that run in a set order to achieve a specific goal.

Data Scientists can code their experiments using our hosted Jupyter Notebook.

Software Engineers can code various functionalities using the Jupyter Notebook.
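Conceptually, a workflow of the kind described above is an ordered series of steps in which each step consumes the previous step's output. The sketch below illustrates only that idea. The step functions (`normalize`, `filter_low`, `summarize`) and the toy intensity values are hypothetical and do not represent Polly's workflow engine.

```python
from typing import Callable, Dict, List

Data = Dict[str, float]
Step = Callable[[Data], Data]


def run_workflow(steps: List[Step], data: Data) -> Data:
    """Run each step in order, feeding each result into the next step."""
    for step in steps:
        data = step(data)
    return data


def normalize(d: Data) -> Data:
    """Hypothetical step: scale values so they sum to 1."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}


def filter_low(d: Data) -> Data:
    """Hypothetical step: drop entries below a 10% share."""
    return {k: v for k, v in d.items() if v >= 0.1}


def summarize(d: Data) -> Data:
    """Hypothetical step: round values for reporting."""
    return {k: round(v, 2) for k, v in d.items()}


intensities = {"glucose": 60.0, "lactate": 35.0, "citrate": 5.0}
result = run_workflow([normalize, filter_low, summarize], intensities)
print(result)
```

Because the steps are plain functions, the same pipeline can be re-run or inspected one step at a time inside a notebook.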

Our Jupyter Notebook System

Supporting such use cases requires a scalable, robust infrastructure. Let's walk through some of the components of our Jupyter Notebook system.

Docker: Docker plays a key role in the infrastructure behind our interactive notebooks. All our Jupyter Notebooks run inside containers that come with the pre-configured functionality and library packages an end user might need in their day-to-day tasks.

JupyterHub: JupyterHub is a multi-user server for Jupyter Notebooks. It handles user authentication and routing, spawns a notebook container for each user, keeps track of running notebooks and deletes them when they are no longer in use.
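JupyterHub is configured in Python through a `jupyterhub_config.py` file. The fragment below is a hedged sketch and not our production configuration. It shows how a setup that spawns one Docker container per user could look, assuming the separate `dockerspawner` package is installed. The image name and resource values are placeholders. On Kubernetes, `kubespawner.KubeSpawner` plays the same role.

```python
# jupyterhub_config.py (sketch). JupyterHub provides get_config() when it loads this file.
c = get_config()  # noqa: F821

# Spawn each user's notebook server in its own Docker container.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Every user starts from the same pre-built image (placeholder name and tag).
c.DockerSpawner.image = "example-org/notebook:1.0.0"

# Delete a user's container once their server is stopped.
c.DockerSpawner.remove = True

# Per-user resource limits (example values).
c.Spawner.cpu_limit = 1.0
c.Spawner.mem_limit = "2G"
```

Pinning a single image tag for everyone is what gives each user an identical starting environment.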

Why did we choose JupyterHub + Docker?

We didn't want our users to wrestle with installing the correct package versions and managing dependencies. We wanted every user, i.e. data scientists, data engineers and data analysts, to have an identical, reproducible environment with the same libraries and the same datasets: in fact, the same version of everything.

If we let users install packages on their own pods, their environments would drift apart depending on which installation workflow and package installer each user relied on.

A fully hosted environment makes sure that everybody has the same seamless starting point.
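One way to check that an environment actually matches a shared baseline is to compare installed package versions against a pinned list. The sketch below uses a hypothetical helper (`find_drift`) and example pins. It relies on `importlib.metadata` from the standard library (Python 3.8+). In a shared Docker image, the result should always be empty.

```python
from importlib import metadata
from typing import Callable, Dict, List


def find_drift(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return readable mismatches between pinned and installed package versions."""
    problems = []
    for package, wanted in sorted(pinned.items()):
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: missing (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: {installed} installed, {wanted} pinned")
    return problems


# A fake environment stands in for the real one so the example is deterministic.
fake_env = {"numpy": "1.26.4", "pandas": "2.1.0"}


def fake_version(name: str) -> str:
    if name not in fake_env:
        raise metadata.PackageNotFoundError(name)
    return fake_env[name]


pins = {"numpy": "1.26.4", "pandas": "2.2.2", "scipy": "1.13.0"}
print(find_drift(pins, fake_version))
```

Against a real environment, you would call `find_drift(pins)` and let it use `importlib.metadata.version` directly.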

User Interface: We have redesigned the UI of the Jupyter Notebook, using CSS and JavaScript with libraries like jQuery to give it a clean, intuitive UI with a minimalistic aesthetic. This required thoughtful UX design that makes the hard things easy to do. Below is a look at our notebook.

Polly iPython Jupyter

Compute: Each user's instance gets one CPU core, 2 GB of RAM and 100 GB of block storage, and its compute power can increase depending on availability and cluster usage. Our cluster has autoscaling enabled, which allows new user pods to be spawned on the fly when demand is high. We have deployed our whole notebook infrastructure on Google Cloud.
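As a back-of-the-envelope illustration of why autoscaling matters, the sketch below estimates how many nodes are needed to fit a given number of user pods, each requesting one core and 2 GB of RAM. The node size (4 cores, 16 GB) and the reserved system overhead are assumptions. The real Kubernetes cluster autoscaler simulates pod scheduling rather than applying a formula like this one.

```python
import math


def nodes_needed(pods: int, pod_cpu: float = 1.0, pod_mem_gb: float = 2.0,
                 node_cpu: float = 4.0, node_mem_gb: float = 16.0,
                 reserved_cpu: float = 0.5, reserved_mem_gb: float = 1.0) -> int:
    """Estimate nodes required for identical pods (assumed node size and overhead)."""
    usable_cpu = node_cpu - reserved_cpu          # leave room for system daemons
    usable_mem = node_mem_gb - reserved_mem_gb
    # The scarcer resource decides how many pods fit on one node.
    pods_per_node = min(math.floor(usable_cpu / pod_cpu),
                        math.floor(usable_mem / pod_mem_gb))
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on one node")
    return math.ceil(pods / pods_per_node)


print(nodes_needed(10))
print(nodes_needed(1000))
```

With these assumed numbers, CPU is the bottleneck (3 pods per node, versus 7 if only memory mattered). Ten users therefore need 4 nodes, which is the kind of capacity an autoscaler adds and removes as demand changes.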

Cluster Management Software: We use Kubernetes to manage our compute instances and cluster. Kubernetes automatically restarts containers that exit with an error, so user pods stay up and highly available. With Kubernetes, we are able to manage 1000+ user pods without losing any data.
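Kubernetes keeps pods available by restarting containers that fail. According to the Kubernetes documentation, the kubelet restarts exited containers with an exponential back-off delay (10s, 20s, 40s, ...) capped at five minutes, and resets the back-off once a container has run for ten minutes without problems. The sketch below only reproduces that delay schedule for illustration; it is not kubelet code.

```python
from typing import List


def restart_delays(failures: int, base: int = 10, cap: int = 300) -> List[int]:
    """Seconds waited before each of the first `failures` restarts (doubling, capped)."""
    return [min(base * 2 ** i, cap) for i in range(failures)]


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

The cap keeps a crash-looping notebook from being retried too aggressively while still bringing it back automatically once the underlying problem clears.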

Deployment: We use Helm, a package manager for Kubernetes, to automate our deployment process. Helm ensures the correct Docker image is deployed, and the image is kept on the nodes for future use so that it does not have to be pulled again, which reduces spawning time.
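Whether a node reuses an image it already has is governed by Kubernetes' `imagePullPolicy`. With `IfNotPresent`, a cached image is reused. If the field is omitted, Kubernetes defaults to `Always` for images tagged `:latest` or left untagged, and to `IfNotPresent` otherwise. This is one reason to pin explicit image tags in a Helm chart. The sketch below is a simplified model of that decision and not kubelet code. The image names are placeholders. Note that under `Always` the kubelet queries the registry, but it still reuses a cached image whose digest matches.

```python
from typing import Optional, Set


def default_pull_policy(image: str) -> str:
    """Kubernetes' default when imagePullPolicy is omitted (simplified)."""
    if "@" in image:                     # pinned by digest
        return "IfNotPresent"
    name = image.rsplit("/", 1)[-1]      # skip a registry host:port when looking for a tag
    tag = name.split(":", 1)[1] if ":" in name else None
    return "Always" if tag in (None, "latest") else "IfNotPresent"


def contacts_registry(image: str, cached: Set[str], policy: Optional[str] = None) -> bool:
    """Whether the kubelet goes to the registry before starting the container."""
    policy = policy or default_pull_policy(image)
    if policy == "Always":
        return True
    if policy == "IfNotPresent":
        return image not in cached
    return False  # "Never": only a locally present image can be used


cached_on_node = {"example-org/notebook:1.0.0"}
print(contacts_registry("example-org/notebook:1.0.0", cached_on_node))  # False
```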

Storage: We use Amazon S3 as the storage system for users' notebooks and the reusable scripts they share across notebooks. Each user project has a directory structure on S3 for storing, managing, creating and deleting its notebooks, and users can launch their interactive notebooks from within our platform. Following is a snapshot of the users' project storage on S3.

Elucidata Polly’s S3 Storage View
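S3 has no real directories. Objects live under flat keys, and a "directory structure" is a naming convention built on key prefixes separated by `/`. Listing with a `/` delimiter (the `Prefix` and `Delimiter` parameters of S3's ListObjectsV2) groups keys into folder-like common prefixes. The sketch below mimics that grouping in plain Python. The `projects/<id>/...` key layout and the helper name `list_prefix` are hypothetical and do not represent Polly's actual scheme.

```python
from typing import Iterable, List, Tuple


def list_prefix(keys: Iterable[str], prefix: str,
                delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Mimic ListObjectsV2 with Prefix and Delimiter: return (objects, common_prefixes)."""
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the next delimiter acts like a subfolder.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(prefixes)


keys = [
    "projects/1234/notebooks/qc.ipynb",
    "projects/1234/notebooks/pathways.ipynb",
    "projects/1234/scripts/utils.py",
    "projects/1234/README.md",
    "projects/5678/notebooks/draft.ipynb",
]
print(list_prefix(keys, "projects/1234/"))
```

Deleting a project's "folder" therefore means deleting every object whose key starts with that project's prefix.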

Here is a brief map of our tech stack:

Elucidata Jupyter Notebook tech stack

What did we learn?

Reduce human maintenance: We can scale to large workloads without much human intervention, which avoids bottlenecks. With Helm, we have also reduced the bottlenecks our engineers faced during deployment.

Great infrastructure: We now have a stable infrastructure that can handle large numbers of user requests and run many containerized environments at once, empowering users to get their tasks done.

Discover new possibilities: While developing and integrating Jupyter Notebook, we discovered several new possibilities that we are working on to deliver better features and a better user experience to our end users.

References

For further reading, refer to the following link, or schedule a demo with us to see this in action.

[1] Jupyter Notebook Components
