Author: Mia Hatton

Jupyter is a suite of products that support data science and scientific computing, including Jupyter Notebook, JupyterHub and JupyterLab. It is 100% open-source software, free for all to use and released under the liberal terms of the modified BSD license.

Mia Hatton

Budding data scientist with an entrepreneurial and science communication background.


Definition of Jupyter

From Project Jupyter:

Project Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages. Jupyter will always be 100% open-source software, free for all to use and released under the liberal terms of the modified BSD license.

Key technologies

Jupyter is a suite of products that support data science and scientific computing, including:

  • Jupyter Notebook, an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
  • JupyterHub, a multi-user version of the notebook designed for companies, classrooms and research labs.
  • JupyterLab, a web-based interactive development environment for Jupyter notebooks, code, and data.

Jupyter Notebooks in action

Jupyter Notebooks are the core of Jupyter. A notebook integrates code, its output (e.g. data visualisations), and additional text in Markdown format. This makes notebooks ideal for presenting data science projects. Your code, comments and output are processed by Jupyter on a server, which can be your own machine or a remote host, and presented in your browser as a web page in HTML format.

Jupyter notebooks are made up of cells. Each cell holds either code, written in the language of the notebook's kernel, or text in Markdown format.
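On disk, a notebook is a JSON file with the .ipynb extension, and its cells are entries in a list. The following is a minimal sketch of that structure using only Python's standard library. The cell contents are made up for illustration, and notebooks saved by Jupyter itself usually carry more metadata than shown here.

```python
import json

# A minimal notebook in version 4 of the notebook format.
# The cell contents are illustrative, not taken from a real project.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, Jupyter')"],
        },
    ],
}


def cell_types(nb):
    """Return the type of each cell, in document order."""
    return [cell["cell_type"] for cell in nb["cells"]]


# Serialise to the text that would be stored in an .ipynb file.
text = json.dumps(notebook, indent=1)

print(cell_types(notebook))  # ['markdown', 'code']
```

Saving `text` to a file ending in .ipynb should give you a notebook that Jupyter can open, with one Markdown cell followed by one code cell.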

When you open your first Jupyter notebook you will be presented with an empty cell:

Your first view of Jupyter Notebooks

When you type Python code into a cell and click ‘run’, the code is executed by a Python kernel, and its output appears below the cell.

Running code in Jupyter notebooks

Jupyter Notebooks offer two main types of cell: code cells and Markdown cells. When you change a cell to a Markdown cell, you can document what your code does using Markdown formatting.

Switching to markdown

Running a markdown cell will apply formatting to it.

Markdown format in Jupyter

When you use packages such as Matplotlib to plot your data in a Jupyter notebook, the plot will appear beneath the code cell.

Plotting your data in Jupyter
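As a minimal sketch of the kind of code you might run in a cell, the following plots a small made-up dataset with Matplotlib. The data and labels are hypothetical, and the example assumes Matplotlib is installed in the environment your kernel uses.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
x = [0, 1, 2, 3, 4]
y = [value ** 2 for value in x]

# Create a figure with a single set of axes and draw a line plot on it.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line plot")
```

In a notebook with the default inline plotting setup, the figure is rendered beneath the cell once the cell finishes running.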

What makes Jupyter Notebooks great for data scientists is that they support several languages. When you create a new Jupyter notebook you choose what type of kernel you want to use. Here is a notebook running an R kernel that does the same thing as the Python notebook above:

You can also use R with Jupyter notebook
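The kernel a notebook uses is recorded in the notebook file's metadata, under a `kernelspec` entry. As a rough sketch, the helper below (a hypothetical function, not part of Jupyter) reads that entry from a notebook loaded as a dictionary. The kernel names `python3` and `ir` are the ones commonly registered by the IPython kernel and by IRkernel, but the names on your system may differ.

```python
def kernel_info(notebook):
    """Return (kernel name, language) from a notebook dict, or None if no kernel is recorded."""
    spec = notebook.get("metadata", {}).get("kernelspec")
    if spec is None:
        return None
    return spec.get("name"), spec.get("language")


# Two stripped-down notebooks that differ only in their kernel metadata.
python_nb = {
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [],
}
r_nb = {
    "metadata": {
        "kernelspec": {"name": "ir", "display_name": "R", "language": "R"}
    },
    "cells": [],
}

print(kernel_info(python_nb))  # ('python3', 'python')
print(kernel_info(r_nb))       # ('ir', 'R')
```

To see which kernels are installed on a machine, you can run `jupyter kernelspec list` in a terminal.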

Does your organisation need Jupyter?

Jupyter’s collection of open-source software makes it easy for organisations to collaborate on shared data science projects and present their findings in a clear way. Notebooks can be hosted on your own hardware or in the cloud, for example using Microsoft’s Azure Notebooks or a Docker container. JupyterHub is a collaboration tool that lets teams of any size work on Jupyter notebooks, and it can be deployed on Kubernetes or on any virtual or physical machine.

Using Jupyter notebooks with Docker makes collaboration easy, because a Docker container hosting your notebook will also have all of its dependencies installed, sparing collaborators the need to install and update packages to match the notebook. Used together with Git for version control and Docker for a reproducible environment, Jupyter notebooks make it straightforward for a team to work on the same project.

As well as being portable and accessible, Jupyter notebooks are easy to share across organisations and the world. Your notebooks can be converted to static HTML for public viewing.
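One common way to do this conversion is the `jupyter nbconvert` command-line tool. The sketch below builds that command in Python and runs it only if the `jupyter` executable is found on your PATH. The helper names and the file name `analysis.ipynb` are hypothetical.

```python
import shutil
import subprocess


def html_export_command(notebook_path):
    """Build the nbconvert command that renders a notebook as static HTML."""
    return ["jupyter", "nbconvert", "--to", "html", notebook_path]


def export_to_html(notebook_path):
    """Run the conversion if Jupyter is on the PATH; return True on success."""
    if shutil.which("jupyter") is None:
        return False  # Jupyter is not installed or not on the PATH
    result = subprocess.run(html_export_command(notebook_path))
    return result.returncode == 0


# Show the command without running it.
print(html_export_command("analysis.ipynb"))
```

Calling `export_to_html("analysis.ipynb")` should write an HTML file next to the notebook, which you can then publish like any other static page.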

You may need Jupyter if:

  • you are undertaking a data science project and want to present your findings alongside your code
  • you are undertaking a collaborative project and need to share results, thoughts and code easily
  • you need multiple parties to collaborate on a project without having to install and update packages on several machines

Technical considerations

Prerequisites and Integrations

Although Jupyter is language-agnostic, it runs on Python, so you need to have Python installed to install Jupyter.

Getting started with Jupyter notebooks is as simple as hitting an install button. Jupyter Notebook is most easily installed using the Anaconda distribution, which also installs Python and a number of common data science packages such as Matplotlib, and can install R as an optional extra. You can install Anaconda from this page.
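After installing, you can check from Python which pieces are available in your environment. This is a small sketch using only the standard library. The `installed` helper is hypothetical, and the package names checked (`notebook`, `jupyterlab`, `matplotlib`) are the usual import names for those projects.

```python
import importlib.util
import sys


def installed(package):
    """Return True if a top-level package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


print("Python version:", sys.version.split()[0])
for package in ("notebook", "jupyterlab", "matplotlib"):
    status = "installed" if installed(package) else "not installed"
    print(f"{package}: {status}")
```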

Once you are using Jupyter Notebooks there are a variety of additional packages and configurations to choose from on the Project Jupyter website.

Pricing

Jupyter is open source and completely free to use. Once you implement Jupyter for cloud projects, your costs will come from renting virtual machines or sharing Docker images.