
How JupyterLab is shifting into second gear for Version 4
The next major release of JupyterLab will be significantly faster than previous versions. This was achieved through both systematic tracking of performance bugs and significant upgrades to the Jupyter communication protocol and to the rendering mechanism for documents.
1. Setting up rigorous performance measurements
The first step toward any measurable performance improvement is to measure performance systematically.
The JupyterLab project now includes a UI performance benchmarking tool, in the form of a GitHub action that can be triggered on any pull request to check how performance is impacted by the change. The implementation of this new GitHub action is available in this repository: https://github.com/jupyterlab/benchmarks.
This tool measures the time required to perform the following actions: opening a test notebook, switching from the test notebook to a copy of it opened in another tab, switching from the test notebook to a text editor and back, searching for a word in the test notebook, and closing the test notebook. The test suites include multiple example notebooks. Benchmark results are posted as comments on the pull request. You can see such a benchmark report here: #11494#issuecomment-976393815
The addition of this benchmarking tool immediately enabled an optimization in how notebooks are hidden when switching tabs. Hiding can be done either by adding a CSS class that enables a hiding rule, or by forcibly setting the element's display property to “none”. Depending on the browser, one approach or the other may trigger a reflow of the entire page, so we made the hiding strategy settable with an option in the JupyterLab config.
The benchmark GitHub action was developed by Frédéric Collonval.
2. Upgrading to CodeMirror 6
The rendering of the text editor used in notebooks can be very expensive, especially for large notebooks with many cells. Jupyter has historically relied on CodeMirror as its base text editor.
JupyterLab 4 includes an upgrade from CodeMirror 5 to CodeMirror 6, which is a complete rewrite of the text editor. This work can be found in pull requests #11638, #12877, and #12861 — modifying over 150 files of the JupyterLab codebase. Benchmarks indicate a rendering speedup factor between 2 and 3 on the large notebooks used in the benchmarking suite.
Note: CodeMirror 6 is also an important stepping stone towards making Jupyter notebooks accessible to people who need screen readers and other devices.
The migration of JupyterLab to CodeMirror 6 was performed by Johan Mabille.
3. Virtual rendering of notebooks
In JupyterLab 4, the notebook only renders the parts of the document that are visible in the viewport. This significantly improves the rendering speed of large notebooks. The main pull request implementing this feature is available here: #12554.
Significant preparation work was required for this pull request, especially for the search and table-of-contents components, which both relied on the notebook view rather than the document model. This was done in PRs #11689 and #12374 respectively.
The end result is a significant improvement in the rendering speed of large notebook files, with a speedup factor of 3 to 4, which comes on top of the gains from the CodeMirror 6 migration.
The virtual rendering of notebooks was developed by Frédéric Collonval.
4. Jupyter protocol alignment
The Jupyter server acts as a relay between front-ends (such as JupyterLab or the classic notebook) and kernels. The server ⇄ kernel communication is done over ZeroMQ sockets, following the well-specified Jupyter kernel protocol. The server ⇄ client communication is done over WebSockets.
Unfortunately, until recently the server ⇄ kernel (ZeroMQ) and server ⇄ client (WebSocket) protocols differed slightly, so the server had to parse each message and re-serialize it in both directions. This processing cost is negligible for short messages such as execution requests and replies, but it can become very costly when large payloads are sent to or retrieved from the front-end, such as large tables or rich MIME-type output. The misalignment of the ZeroMQ and WebSocket protocols can then become a real bottleneck.
In Jupyter Server 2, the WebSocket connection supports a new “aligned” protocol, in which messages can simply be copied to and from ZeroMQ messages; JupyterLab 4 supports it as well. (This work was done in PRs #657 (jupyter-server), #154 (jupyverse), and #11841 (JupyterLab).) The new aligned protocol is opt-in, so legacy Jupyter front-ends are still expected to function with Jupyter Server 2.
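The difference between the two relay modes can be illustrated with a toy model. The function names and message shape below are illustrative only, not the actual Jupyter wire format (real messages are multi-part ZeroMQ frames): the point is that the legacy path must decode and re-encode every message, while the aligned path is a plain byte copy.

```typescript
// Toy model of the server relay: a "message" here is a byte buffer
// holding JSON (illustrative; not the real Jupyter framing).
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Legacy path: decode the ZMQ bytes, reshape them into the slightly
// different WebSocket representation, and re-serialize. The cost grows
// with the size of the message body.
function relayLegacy(zmqBytes: Uint8Array): Uint8Array {
  const msg = JSON.parse(decoder.decode(zmqBytes));
  const wsMsg = { ...msg, channel: msg.channel ?? "shell" }; // reshape step
  return encoder.encode(JSON.stringify(wsMsg));
}

// Aligned path: the WebSocket payload is byte-for-byte the ZMQ payload,
// so the relay is a plain copy regardless of content or size.
function relayAligned(zmqBytes: Uint8Array): Uint8Array {
  return zmqBytes.slice();
}
```

For a multi-megabyte widget update, skipping the parse/re-serialize round trip in `relayLegacy` is where the order-of-magnitude speedup reported below comes from.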
Benchmarks indicate a large speedup factor (at least one order of magnitude, and more for larger messages) in the performance of the Jupyter server when displaying large data sets in Jupyter widgets. However, this is not captured by the JupyterLab benchmark tests which focus on the rendering performances.
The protocol alignment work was done by David Brochart.
5. Lumino 2
The JupyterLab frontend is built upon the Lumino framework, which provides utilities for building in-browser desktop-like applications. It provides the foundations for such applications, including a uniform component wrapper that handles lifecycle management and efficient propagation of front-end events to an entire application (e.g., resize events, drag-and-drop, layout calculation). Lumino also provides several high-performance components such as a drag-and-drop dock panel (used as the application shell for JupyterLab) and a best-in-class data grid component.
JupyterLab 4 includes a major upgrade of Lumino. The main changes in Lumino 2 include the migration to ES2018, which allowed for the removal of large parts of the codebase providing features that are now natively available in JavaScript (such as iterators), as well as polyfills for promises and special-case logic for idiosyncrasies of legacy browsers like IE. This upgrade also brings across-the-board performance improvements in the front-end, although not specifically for the rendering of large documents. Lumino 2 supports background processing of UI components when the application resides in a background browser tab (a feature that may be back-ported to Lumino 1.x as well).
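The iterator modernization can be illustrated with a sketch. The `each` helper below is reconstructed here purely for illustration, standing in for the kind of utility Lumino 1 layered over its custom iterator interface; it is not the actual Lumino API:

```typescript
// Lumino 1-era style: traversal went through helper functions layered
// over a custom iterator abstraction (reconstructed for illustration).
function each<T>(items: Iterable<T>, fn: (item: T, index: number) => void): void {
  let index = 0;
  for (const item of items) {
    fn(item, index++);
  }
}

// Lumino 2 style: containers expose the native [Symbol.iterator], so the
// same traversal is plain ES2018 with no helper layer or polyfill.
function sumWidths(widths: Iterable<number>): number {
  let total = 0;
  for (const w of widths) {
    total += w;
  }
  return total;
}
```

Relying on native iteration lets the JavaScript engine optimize the hot loops directly, instead of going through an extra layer of function calls.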
The Lumino 2 upgrade and integration in JupyterLab was done by Afshin Darian.
6. A faster Lumino data grid
Optimizations to the Lumino data grid widget were also implemented, speeding up the rendering in the case of merged cells (cf. PR #394). The Lumino datagrid is used in various parts of the JupyterLab UI, such as the table view for CSV files. It is also used extensively in third-party extensions such as the ipydatagrid Jupyter widget and the BeakerX table display.
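To give a feel for the bookkeeping involved, here is a hypothetical sketch (not the actual Lumino code): rendering with merged cells requires mapping each row/column pair to the merged region that covers it, and avoiding redundant lookups of this kind during repaints is where such an optimization pays off.

```typescript
// A merged region spans a rectangular block of cells (illustrative model,
// not Lumino's actual data structure).
interface CellRegion {
  row: number;    // first row of the region
  column: number; // first column of the region
  rowSpan: number;
  columnSpan: number;
}

// Return the merged region covering (row, column), or null if the cell
// is not merged. A linear scan for clarity; a real renderer would index
// regions so repainting a large grid avoids rescanning them per cell.
function regionAt(
  regions: CellRegion[],
  row: number,
  column: number
): CellRegion | null {
  for (const r of regions) {
    if (
      row >= r.row && row < r.row + r.rowSpan &&
      column >= r.column && column < r.column + r.columnSpan
    ) {
      return r;
    }
  }
  return null;
}
```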
The Lumino data grid optimization was done by Martin Renou.
Acknowledgement
The work by the QuantStack team on JupyterLab performance improvements was done in collaboration with Two Sigma. Several of these pull requests required major changes across the JupyterLab codebase. We are very grateful to Two Sigma for supporting the development of the Jupyter project at such a deep level.
We are grateful to Juliette Taka for the illustrations.
About the authors
Frédéric Collonval, who led the charge on JupyterLab performance improvements, is a technical director at QuantStack. He is a member of the JupyterLab core team and has authored several JupyterLab extensions.
Johan Mabille is a technical director at QuantStack, very active in the Jupyter ecosystem. He regularly contributes to JupyterLab, and developed the Xeus framework for creating Jupyter kernels.
David Brochart is a scientific software developer at QuantStack, very active in the Jupyter ecosystem. He is a maintainer of the Jupyter-server project, and the main author of Jupyverse. David also contributes to the geo-science open-source stack built atop Jupyter.
Afshin Darian is a technical director at QuantStack. He is the co-creator of the JupyterLab project and continues working on the project to this day.

Accelerating JupyterLab was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.