We’re just weeks away from our first Jupyter community conference, JupyterCon. It will take place from August 22nd to 25th (with Sprints on the 26th) in the beautiful city of New York, at the Hilton Midtown, a spectacular location just steps from Central Park, Times Square, and MoMA.
If you haven’t registered yet, there’s still time. There are a number of pass options to choose from, including 2-, 3-, or 4-day passes. Discounts are available for students, academic instructors, and government and non-profit employees. Don’t qualify for any of those? We have a special 20% discount for you: just use the code JUPCORE20 when you register.
What is JupyterCon?
The four-day event will feature two days of training and tutorials and two days of keynotes and sessions. Topics include the Jupyter platform’s core architecture, kernels, extensions & customizations, and the usage and application of Jupyter Notebooks, as well as sessions on Jupyter’s development and community from the core Jupyter team.
We’ve left plenty of time for networking, including attendee receptions, Speed Networking, Poster Sessions, community group meetups, as well as the chance to meet some of the speakers in small group settings. Check out the event page. The core Jupyter team will also be present and ready to answer your trickiest questions about the Jupyter platform and what’s in store for the future.
The preconference starts on Monday the 21st with a solar eclipse, visible in NYC from 1:23 pm to 4:00 pm.
A fantastic line-up of speakers
JupyterCon is chaired by Fernando Pérez, creator and BDFL of Jupyter, and Andrew Odewahn, CTO of O’Reilly.
Some of the speakers joining us at JupyterCon include:
Lorena Barba from the George Washington University
Brett Cannon from Microsoft / the Python Software Foundation
These are only a few of the speakers joining us. We are looking forward to hearing how Jupyter is being used in education, finance, machine learning, and what the future holds for the Jupyter ecosystem. See the full line-up here.
Sprints
Sprints are happening on Saturday the 26th. If you want to come hack on Jupyter and related projects, you can find more information in the GitHub JupyterCon repository. You do not need to be registered for the main conference, but you must register via Eventbrite even if you have already registered for the main conference.
Financial aid
We had a number of really strong applicants for financial aid, and the selection process was tough. If you were not selected this time, don’t be discouraged; we hope to see you at the next JupyterCon.
BoFs
We will have a special “Jupyter for Teaching & Learning” BoF organized by Lorena Barba and Robert Talbert on Thursday at 7pm. This BoF is for anyone interested in using Jupyter for teaching and learning. Topics for discussion include incorporating Jupyter in the classroom, using Jupyter tools like nbgrader and JupyterHub, connecting with other Jupyter educators, and more. For more information and to let us know you’re interested in participating, please see this flyer and fill out the Call for Participation form at http://bit.ly/jupyter-ed-bof.
Solar Eclipse
If you are coming from outside the United States, please remember to be in NYC on the 21st from 1:23 pm, as there is a partial solar eclipse in NYC, which ends at 4:00 pm. Do not forget your eclipse glasses!
Thanks
This is our first JupyterCon! We do welcome feedback and will be looking for help to organize another one next year. Please contact us if you are interested in helping organize the next conference.
We’ve partnered with O’Reilly Media to develop the JupyterCon conference. O’Reilly Media is a long-time supporter of the project and an active publisher in the Python/Data Science space. O’Reilly has extensive experience running conferences, and we’ve been working with them for the past year to bring you a great inaugural JupyterCon.
We are pleased to announce the release of Jupyter Notebook 5.1.0. This is a minor release that includes mostly bug fixes and improvements with the notable addition of i18n (internationalization and localization) support.
You can install the new version of the notebook now using pip:
pip install --upgrade notebook
Or conda (it may be a few days before packages are available):
conda upgrade notebook
We look forward to your feedback and contributions!
Four months after releasing IPython 6.1 and 5.4, and a couple of hours after the release of notebook 5.1, we are happy to announce the release of IPython 6.2 (Python 3 only) and its cousin IPython 5.5, which is still compatible with Python 2.7.
You can update now by using:
pip install --upgrade ipython
If you have a recent enough version of pip you will get the latest compatible version of IPython regardless of the version of Python you are running.
The conda packages are on their way; once available you will be able to update with:
conda install ipython
New Features
As IPython 6.2 and 5.5 are minor releases, you will only find a small number of new features. API additions made in IPython 6.2 were backported to 5.5 to simplify the maintenance of code compatible with both Python 2.7 and 3+. You can find the full list of new features in the changelog.
As a quick teaser, IPython 6.2 can now:
Show function signatures in the terminal while completing.
As stated on our roadmap, we’ll keep releasing 5.x versions for some time, though starting at the end of the year we will decrease our active involvement in fixing bugs that affect the 5.x branch. We will still accept PRs, and will backport fixes if you ask us nicely. Releases will happen occasionally when fixes are available, but we will be slowly sunsetting Python 2 support.
If you are interested in further maintenance of the 5.x branch, we would love help with that work. Feel free to contact us on GitHub.
What’s next?
We are going to start thinking about IPython 7, and start to embrace more of the Python 3 only features. Slowing down backports should allow us to be more confident that changes will not affect the automatic application of patches on old branches. Trimming down old legacy code may also help to regain some speed on interpreter startup, and should lead to plenty of opportunities for new contributors to join.
We will also try to simplify our documentation, and make often requested sections easier to find.
If you are looking for a project to contribute to (code, documentation, examples, design, or helping others), feel free to contact us so we can guide you through the process.
Enjoy this new release; we hope to see you around the mailing list and bug tracker!
We are pleased to announce the release of JupyterHub 0.8. This is a big release with many fixes and improvements, and some major changes.
To upgrade from jupyterhub 0.7:
Stop jupyterhub
pip3 install --upgrade jupyterhub==0.8
Backup your database: e.g. cp -v jupyterhub.sqlite jupyterhub-backup-$(date +%Y-%m-%d).sqlite
Upgrade your database: jupyterhub upgrade-db
See the changelog for more details. The “Notes on Upgrading” below also provides more specifics.
OAuth
Perhaps the biggest change is the switch to using OAuth 2.0 for the internal authentication mechanism in JupyterHub. This shouldn’t have much of a visible effect on your deployments, other than hopefully a reduction in redirect loops when users log in.
If you are writing a JupyterHub Service, then you can switch to OAuth by using the HubOAuthenticated or HubOAuth classes if you were previously using HubAuthenticated or HubAuth. All of these classes now support token-based access via the Authorization header, just like the Hub itself.
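For instance, a minimal Tornado-based service along the lines of JupyterHub’s “whoami” example might look like the sketch below (the handler paths and the cookie secret are illustrative):

import os
from tornado import web
from jupyterhub.services.auth import HubOAuthenticated, HubOAuthCallbackHandler

prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/')

class WhoAmIHandler(HubOAuthenticated, web.RequestHandler):
    @web.authenticated  # sends the browser through the Hub's OAuth flow if needed
    def get(self):
        # the user model provided by the Hub after authentication
        self.write(self.get_current_user())

app = web.Application(
    [
        (prefix, WhoAmIHandler),
        (prefix + 'oauth_callback', HubOAuthCallbackHandler),  # completes the OAuth handshake
    ],
    cookie_secret=os.urandom(32),
)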
Stability, Scalability, and Performance
As part of supporting larger deployments, we have done scalability stress testing of the Hub, finding and fixing several race conditions that could manifest as redirect loops, as well as other bugs and performance problems when the Hub is under load. This work has been led largely by the Berkeley Data Science Education Program team, which has over one thousand students using a JupyterHub instance this Fall. We now know that one JupyterHub 0.8 instance can support a few thousand users logging in and attempting to spawn their servers at the same time. Once servers are active, the Hub is minimally involved, and the proxy becomes the next bottleneck.
Custom Proxy Implementations
Another major new feature targeted at scalability is custom proxy implementations. JupyterHub has always used Configurable-HTTP-Proxy (CHP), a single-process Node.js HTTP proxy. This is a single process bottleneck and potential single point of failure for Hub deployments. CHP can handle a lot of concurrent active users (at least several thousand), but for larger scale applications, Hub deployments may want to use a more scalable and/or robust proxy implementation.
To achieve this, JupyterHub 0.8 introduces a Python API abstracting the proxy needs of JupyterHub, so that developers can provide their own proxy implementations as alternatives to CHP. The first such implementation uses a Kubernetes Traefik-based Ingress for the proxy, as part of KubeSpawner.
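To give a feel for the API, a custom proxy subclasses jupyterhub.proxy.Proxy and implements a handful of routing methods. The stub below is an illustrative sketch, not a working proxy:

from jupyterhub.proxy import Proxy

class MyProxy(Proxy):
    """Illustrative stub; a real implementation talks to an actual proxy server."""

    async def start(self):
        """Start (or connect to) the proxy process."""

    async def stop(self):
        """Stop the proxy process."""

    async def add_route(self, routespec, target, data):
        """Tell the proxy to forward requests matching routespec to target."""

    async def delete_route(self, routespec):
        """Remove a route."""

    async def get_all_routes(self):
        """Return a dict of all currently configured routes."""
        return {}

# selected in jupyterhub_config.py, e.g.:
# c.JupyterHub.proxy_class = 'mymodule.MyProxy'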
Persisting Authentication State
Another new feature available to Authenticators is the encrypted persistence of authentication state. This is aimed at preserving and passing authenticator-related state, such as client certificates or GitHub API tokens. Authenticators provided by oauthenticator 0.7 support persisting auth state from upstream authentication services.
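Enabling this is a one-line setting in jupyterhub_config.py, sketched below (JupyterHub reads the encryption key from the JUPYTERHUB_CRYPT_KEY environment variable):

# enable encrypted persistence of auth_state; requires a key, e.g.
#   export JUPYTERHUB_CRYPT_KEY=$(openssl rand -hex 32)
c.Authenticator.enable_auth_state = True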
Other Highlights
Preliminary (API-only, no GUI) support for multiple named servers per user
Token-based access to the single-user server API
A page for users to request new API tokens from the Hub
Notes on upgrading
As with every update to JupyterHub, make sure to back up your database and run upgrade-db. For example, if using JupyterHub with SQLite:
cp -v jupyterhub.sqlite jupyterhub-backup-$(date +%Y-%m-%d).sqlite
jupyterhub upgrade-db
Due to the change in internal authentication, it is important that single-user servers and the Hub itself are upgraded at the same time. If you are running everything in one environment (e.g. the default Spawner, SudoSpawner, or similar), there is nothing to do. However, if you are using a container-based setup to launch single-user servers (e.g. DockerSpawner or KubeSpawner), make sure that your user image and your Hub both get the upgrade at the same time. In most cases, this means adding the following to your Dockerfile:
RUN pip3 install jupyterhub==0.8.0
which ensures the correct version of JupyterHub is installed.
Thanks to everyone who has contributed to this release, especially users who have helped out with testing JupyterHub during the beta process, which helps make this a great release!
We are pleased to announce the release of Jupyter Notebook 5.2.0. This is a minor release that includes mostly bug fixes and improvements with the notable addition of RTL (right-to-left) support.
You can install the new version of the notebook now using pip:
pip install --upgrade notebook
Or conda (it may be a few days before packages are available):
conda upgrade notebook
Changelog
Allow setting token via jupyter_token env (#2921).
Fix some errors caused by raising 403 in get_current_user (#2919).
We look forward to your feedback and contributions!
Update (December 14, 20:45 UTC): all services should be restored and back up.
On December 13, at 22:10 UTC (4:10pm EST), a large number of Jupyter-provided services stopped responding. This included, but was not limited to, https://nbviewer.jupyter.org, https://try.jupyter.org (powered by tmpnb), and https://cdn.jupyter.org. We quickly narrowed this down to an issue with our hosting provider and have been working with them to resolve the issue as fast as possible.
When outages happen, the Jupyter Status page should show which services are affected and we publish updates there.
How are Jupyter services hosted?
To understand the cause of the outage, we need to understand how the Jupyter services are hosted and maintained. As Jupyter is an open organization which is mostly maintained by volunteers, we do not have a dev-ops team assigned to maintaining our infrastructure. Even with full-time developers hired through universities or companies, the time spent fixing infrastructure is taken on nights and weekends. These developers are often stretched thin and cannot be available 24/7.
Most of our cloud infrastructure is donated to us by companies like CloudFlare, Rackspace, Fastly, Google, and Microsoft. Donating resources can be challenging, both technically and legally. In this particular case, Rackspace graciously created a special account for Jupyter that handles invoices on our behalf, thereby making resources free to the project. Following a hiccup, this Jupyter account was suspended, and all services became unavailable as a result.
Temporary resolution
As nbviewer is one of the most used services provided by Jupyter, we’ve moved it to one of our personal accounts at another cloud provider. Fastly was set up to load-balance across the yet-to-come-back-up instances as well as this newly created instance, so all should be fine now.
The other services (tmpnb, mails@jupyter.org, cdn.jupyter.org, …) will still be unavailable or highly degraded until a permanent solution is found or the services are restarted. try.jupyter.org will likely redirect to a repo on https://mybinder.org in the meantime so people can still try out Jupyter.
Low bus factor
The outage of all these services lasted a significant time (more than 18 hours), which disrupted many of you who rely on these services. We understand that this is hardly acceptable, and we hope you’ll indulge us as these services are provided for free and without ads. One of the factors leading to the slow reestablishment of service was a relatively low bus factor, with only one and a half of our developers knowing how to deploy and maintain these services. Documentation and access to credentials were also limited.
This is one of the challenges in a distributed team like Jupyter where contributors self-organize. It is easy to forget that new code is not the only way to contribute and that infrastructure and maintenance are crucial.
We also overly rely on a single vendor (in this case Rackspace), and while we are happy with Rackspace and have no reason to move to another provider, we should have a plan to restore critical services even temporarily in case of failure.
A couple of months ago, this subject was brought to our attention, and we developed a plan to move many of our deployments to Kubernetes (which is provider agnostic). We underestimated the probability that we would need an emergency plan this early.
How can you help?
Jupyter is mainly governed by the community all around the world. Contributing is not limited to writing code! We need members with knowledge in multiple languages, in design, dev-ops, etc. Whether you are an expert, or still learning, we would like you to get involved.
Thanks everyone for your patience and for the kind words when you reached out to us after discovering the services were down.
JupyterHub makes it possible to serve Jupyter instances to multiple users. The JupyterHub Helm Chart makes it possible to run this setup on Kubernetes, making JupyterHub more scalable, stable, and flexible.
We, the JupyterHub team, are proud to announce the next version of the JupyterHub Helm Chart: version 0.5. This post describes a bit of what’s new in this release. We’ve nicknamed the releases of the JupyterHub Helm Chart after famous cricketers, in this case world-class bowler Hamid Hassan*.
tl;dr: The release bumps JupyterHub to 0.8, adds better HTTPS support, and improves scalability to ~4,000 simultaneous users. See the Helm Chart Changelog for more information.
New Features
The following major features have been added to v0.5:
JupyterHub 0.8
Version 0.8 of JupyterHub was released earlier this year. It is full of new features, many of which directly benefit the Kubernetes deployment of JupyterHub. Below is a list of relevant points along with the relevant sections of the Helm Chart configuration:
Lots of performance improvements. We now know we can handle up to 4k active users.
Limit the number of users who can try to launch their servers at once. This can be tuned to avoid crashes when hundreds of users try to launch at the same time. It gives them a friendly error message and asks them to try again later. See hub.concurrentSpawnLimit.
Limit the number of simultaneously active users. The active server limit can be used to cap the total number of active users on the hub at any given time. This allows admins to control the size of their clusters more effectively. See hub.activeServerLimit.
Memory limits & guarantees can now contain fractional units. So you can say 0.5G instead of having to use 512M. (These settings are sketched in the config example after this list.)
No more ‘too many redirects’ errors at scale. This fixes an annoying race condition causing users to get stuck in a redirect loop when starting their servers.
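As a rough sketch, the settings mentioned above go in your config.yaml; the values below are illustrative, not recommendations:

hub:
  concurrentSpawnLimit: 64     # max users spawning at the same time
  activeServerLimit: 3000      # cap on simultaneously active servers
singleuser:
  memory:
    guarantee: 0.5G            # fractional units now work
    limit: 1G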
Easier HTTPS
Version 0.5 of the helm chart makes it easier for admins to set up HTTPS for their users with Let’s Encrypt. Users often access a JupyterHub instance from a public URL, so using HTTPS is important to avoid nefarious behavior and increase security. You can now choose between Let’s Encrypt and your own HTTPS certificates & keys. You can find the new instructions here.
More authenticators
Authenticators allow you to control who has access to your JupyterHub. The following new authentication providers have been added in 0.5:
You can now also set up a whitelist of usernames that have access to the hub (in addition to other authenticators in use). Do so by adding to the list in auth.whitelist.users.
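For example, a hypothetical config.yaml fragment granting access to two users:

auth:
  whitelist:
    users:
      - alice
      - bob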
Hub Services support
Services let you connect your JupyterHub to other web services (for example, in mybinder.org). You can now add external JupyterHub Services by adding them to hub.services. Note that you are still responsible for actually running the service somewhere (perhaps as a deployment object in Kubernetes).
More customization with jupyterhub_config.py
Sometimes it is useful to be able to run arbitrary extra code when setting up your deployment. You can put extra snippets of jupyterhub_config.py configuration in hub.extraConfig. Now you can also add extra environment variables to the hub in hub.extraEnv and extra configmap items via hub.extraConfigMap. This makes it cleaner to customize the hub's configuration in ways that are not yet possible with config.yaml. You can find more information in the documentation.
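A hypothetical config.yaml fragment combining these hooks (names and values are purely illustrative):

hub:
  extraEnv:
    MY_SERVICE_URL: "http://my-service.example.com"
  extraConfigMap:
    myKey: myValue
  extraConfig: |
    # arbitrary jupyterhub_config.py code runs here
    c.JupyterHub.log_level = 'DEBUG'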
More customization options for user server environments
More options have been added under singleuser to help you customize the environment that the user session is spawned in (see the sketch after this list). You can…
Change the uid / gid of the user with singleuser.uid and singleuser.fsGid
Mount extra volumes with singleuser.storage.extraVolumes & singleuser.storage.extraVolumeMounts
Provide extra environment variables with singleuser.extraEnv.
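Put together, a sketch of these options in config.yaml (all values illustrative):

singleuser:
  uid: 1000
  fsGid: 100
  extraEnv:
    EDITOR: "vim"
  storage:
    extraVolumes:
      - name: shared-data
        hostPath:
          path: /srv/shared
    extraVolumeMounts:
      - name: shared-data
        mountPath: /home/jovyan/shared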
More information
For more information about the JupyterHub Helm Chart, and the JupyterHub ecosystem more broadly, see the following links:
*Hamid Hassan is a fast bowler who currently plays for the Afghanistan National Cricket Team. With nicknames ranging from “Afghanistan’s David Beckham” to “Rambo”, he is considered by many to be Afghanistan’s first cricket superhero. He is known for fast (145 km/h+) deliveries, cartwheeling celebrations, a war-painted face, and having fled Afghanistan as a child to escape war. He says he plays because “We are ambassadors for our country and we want to show the world that Afghanistan is not like people recognize it by terrorists and these things. We want them to know that we have a lot of talent as well.”
We are pleased to announce the release of Jupyter Notebook 5.3.0. This is a minor release that introduces some notable improvements, such as terminal support for Windows and support for the operating system trash (files deleted from the notebook dashboard are moved to the trash instead of being deleted permanently), as well as many bug fixes and enhancements.
You can install the new version of the notebook now using pip:
pip install --upgrade notebook
Or conda (it may be a few days before packages are available):
conda upgrade notebook
We look forward to your feedback and contributions!
Last August, Project Jupyter, the NumFOCUS Foundation, and O’Reilly Media came together to host our first JupyterCon. We attracted over 700 attendees and 23 scholarship recipients for 4 days of talks and tutorials. There were 5 parallel session tracks featuring 55 talks, 11 keynotes, 8 tutorials, and 2 training courses. In addition, the conference poster session featured 33 posters and fostered great discussions within the community. Our Community Day, held at the end of the conference, featured free registration and was open to the general public. Videos of the event have been made available on Safari Online and YouTube.
JupyterCon 2018, CFP Open
JupyterCon 2017 was a huge success and we’ve been working hard since then to make JupyterCon 2018 even better. It will be held in New York City in August from Tuesday the 21st to Friday the 24th. We’ll also host an open Community Day on August 25th, which will be open to everyone.
Today we are happy to launch the conference website and open the Call for Proposals, with submissions due by early March. A couple of changes have been made to the CFP since last year. In particular, if your talk is not accepted, you can ask us to automatically consider the proposal for the poster session.
We encourage you to submit a proposal, and reach out to us if you have any questions. We’ll do our best to help you and give you feedback on your proposal.
Like last year, we will have diversity and student scholarships available; further information will be provided on the website. We also encourage you to follow the JupyterCon Twitter account for announcements or corrections.
Community Day
The final day of JupyterCon 2017 was a blast, with a large number of people making their first contribution to the Jupyter codebase or documentation, editing the wiki, or deploying Jupyter in the cloud. During the conference days, a separate room was also reserved for user testing of different Jupyter software, which proved to be a fantastic source of feedback for User Experience (UX) work and for driving various Jupyter tools forward.
We are happy to offer this “Community Day” experience again. At JupyterCon 2017, the Saturday was branded “Sprints” with the connotation of a code-centric experience. While we’re happy to see users coming to “Sprint” on code, we want to let you know that the Community Day will be open to anyone. Whether you are a teacher, coder, researcher, or user of Jupyter, the Community Day will have something for you. The Community Day is not limited to attendees of the main JupyterCon event, and it’s intended to be a “grass-roots” celebration of Jupyter and its community. We hope to see you at JupyterCon 2018.
Jupyter Day Atlanta is a single-day conference to showcase different use cases of Project Jupyter software in the broader free, open-source software community, including data, science, journalism, and education. This event will facilitate connections between community members using Jupyter’s technology to reshape how people interact with code and data in both industry and academia.
Submit a talk
We are accepting talks until February 16, and speakers will be notified by February 19. Please use this form to submit a talk for Jupyter Day Atlanta 2018. We encourage talks from students and early-career researchers.
A new series of local Jupyter events, starting in Boston.
Jupyter Pop-Up, brought to you by NumFOCUS and O’Reilly Media, March 21, Boston, MA.
Many of you are looking forward to JupyterCon 2018 and have submitted a talk. Alongside these large, multi-day events, we are seeing demand for smaller, local events as well. Since 2015 we have co-organized several Jupyter Days events (Paris, Hawaii, Atlanta, Boston, Philadelphia, NYC) with local community organizers. In 2018, we are preparing Jupyter Day Atlanta (March 31st), and hoping to offer other community organized events as well.
In addition to Jupyter Days, we are pleased to announce the first Jupyter Pop-Up, which is brought to you by the NumFOCUS Foundation and O’Reilly Media.
If you are using JupyterHub with the GitLab OAuthenticator and its gitlab_group_whitelist support, there is a security issue where the authenticator will allow users outside your intended group whitelist to create accounts. A fix has been released as OAuthenticator 0.6.2 and 0.7.3. No other authentication mechanism, including GitLabOAuthenticator without using the group whitelist feature, is affected. If you are using GitLab authentication with group whitelist support, upgrade oauthenticator immediately:
python3 -m pip install --upgrade oauthenticator
Thanks to Joseph Weston for reporting the issue and providing the fix.
Timeline (all times UTC):
2018-02-16 09:51 Joseph Weston reports security issue to the Jupyter security list
2018-02-16 16:08 Fix is verified and applied to oauthenticator master
2018-02-16 21:52 oauthenticator 0.7.3 and 0.6.2 are released with the fix
JupyterLab is an interactive development environment for working with notebooks, code, and data.
The Evolution of the Jupyter Notebook
Project Jupyter exists to develop open-source software, open standards, and services for interactive and reproducible computing.
Since 2011, the Jupyter Notebook has been our flagship project for creating reproducible computational narratives. The Jupyter Notebook enables users to create and share documents that combine live code with narrative text, mathematical equations, visualizations, interactive controls, and other rich output. It also provides building blocks for interactive computing with data: a file browser, terminals, and a text editor.
The Jupyter Notebook has become ubiquitous with the rapid growth of data science and machine learning and the rising popularity of open-source software in industry and academia:
Today there are millions of users of the Jupyter Notebook in many domains, from data science and machine learning to music and education. Our international community comes from almost every country on earth.¹
The Jupyter Notebook now supports over 100 programming languages, most of which have been developed by the community.
There are over 1.7 million public Jupyter notebooks hosted on GitHub. Authors are publishing Jupyter notebooks in conjunction with scientific research, academic journals, data journalism, educational courses, and books.
At the same time, the community has faced challenges in using various software workflows with the notebook alone, such as running code from text files interactively. The classic Jupyter Notebook, built on web technologies from 2011, is also difficult to customize and extend.
JupyterLab: Ready for Users
JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.
JupyterLab enables you to arrange your work area with notebooks, text files, terminals, and notebook outputs.
JupyterLab provides a high level of integration between notebooks, documents, and activities:
Drag-and-drop to reorder notebook cells and copy them between notebooks.
Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.
JupyterLab has been over three years in the making, with over 11,000 commits and 2,000 releases of npm and Python packages. Over 100 contributors from the broader community have helped build JupyterLab in addition to our core JupyterLab developers.
JupyterLab is built on top of an extension system that enables you to customize and enhance JupyterLab by installing additional extensions. In fact, the built-in functionality of JupyterLab itself (notebooks, terminals, file browser, menu system, etc.) is provided by a set of core extensions.
JupyterLab extensions enable you to work with diverse data formats such as GeoJSON, JSON and CSV.²
Among other things, extensions can:
Provide new themes, file editors and viewers, or renderers for rich outputs in notebooks;
Add menu items, keyboard shortcuts, or advanced settings options;
Provide an API for other extensions to use.
Community-developed extensions on GitHub are tagged with the jupyterlab-extension topic, and currently include file viewers (GeoJSON, FASTA, etc.), Google Drive integration, GitHub browsing, and ipywidgets support.
Develop JupyterLab Extensions
While many JupyterLab users will install additional JupyterLab extensions, some of you will want to develop your own. The extension development API is evolving during the beta release series and will stabilize in JupyterLab 1.0. To start developing a JupyterLab extension, see the JupyterLab Extension Developer Guide and the TypeScript or JavaScript extension templates.
JupyterLab itself is co-developed on top of PhosphorJS, a new JavaScript library for building extensible, high-performance, desktop-style web applications. We use modern JavaScript technologies such as TypeScript, React, Lerna, Yarn, and webpack. Unit tests, documentation, consistent coding standards, and user experience research help us maintain a high-quality application.
JupyterLab 1.0 and Beyond
We plan to release JupyterLab 1.0 later in 2018. The beta releases leading up to 1.0 will focus on stabilizing the extension development API, user interface improvements, and additional core features. All releases in the beta series will be stable enough for daily usage.
JupyterLab 1.0 will eventually replace the classic Jupyter Notebook. Throughout this transition, the same notebook document format will be supported by both the classic Notebook and JupyterLab.
Get Involved
There are many ways you can participate in the JupyterLab effort. We welcome contributions from all members of the Jupyter community:
Use our extension development API to make your own JupyterLab extensions. Please add the jupyterlab-extension topic if your extension is hosted on GitHub. We appreciate feedback as we evolve toward a stable API for JupyterLab 1.0.
Connect with us on our GitHub Issues page or on our Gitter Channel. If you find a bug, have questions, or want to provide feedback, please join the conversation.
We are thrilled to see how you use and extend JupyterLab.
[1] Based on the 249 country codes listed under ISO 3166-1, recent Google Analytics data from 2018 indicates that jupyter.org has hosted visitors from 213 countries.
With the success of the notebook file format as a medium for communicating scientific results, Jupyter is turning into more than an interactive development environment: an interactive scientific authoring environment.
JupyterLab viewing LaTeX source code and a PDF document
The new JupyterLab interface is much more than a replacement for the classic notebook. It aims to bring together all the pieces required for a complete scientific workflow. The extension-based architecture of JupyterLab comes with a number of components already enabled:
and much more. However, some pieces are still missing to complete the picture for a scientific authoring environment. One would be a featureful LaTeX editor. The first LaTeX editor for JupyterLab is a step in the right direction and offers an easy way to live-compile tex documents. Another piece is — of course — a means to produce diagrams, flow charts and draw figures!
Drawing charts and diagrams
On the occasion of the Paris Jupyter Widgets workshop, I started working on a feature to fill that gap and built a JupyterLab extension for the Draw.io diagram editor.
Draw.io is a diagram editor that runs in the web browser and is Apache 2.0 licensed. It’s got a really mature code base, which has been around for many years. However, unlike the other components used by JupyterLab, Draw.io has not yet embraced the new JavaScript packaging tooling such as NPM, which complicated the integration with JupyterLab a little bit, but it all paid off eventually!
Now, I am really pleased to announce the first release of the draw.io extension: a fully-fledged JupyterLab integration of the fully-fledged diagram editor!
The Draw.io JupyterLab extension takes advantage of the JupyterLab architecture: it registers a new file type (.dio) with the file explorer so diagrams can be opened from it, and adds a launcher button and menu items. Besides that, multiple synchronized views of the same diagram can be displayed at the same time, allowing a user to visualize the same content at different zoom levels, or alongside a bare text editor.
Installation
You can install the jupyterlab-drawio extension with the following command:
jupyter labextension install jupyterlab-drawio
This should set up the extension inside your JupyterLab environment. I hope this will be a useful extension for the larger community. All the code is available on GitHub: https://github.com/QuantStack/jupyterlab-drawio. Don’t hesitate to open issues and come contribute to jupyterlab-drawio.
The future
There are other projects just waiting to be packaged for use inside of JupyterLab: one great webapp for JupyterLab would probably be the ShareLaTeX application, which is Open Source as well and provides a very nicely integrated editing experience for LaTeX documents, with autocomplete of LaTeX commands and reference search. Eventually, we might be able to integrate with the official ShareLaTeX server for a collaborative, hosted, editing experience for LaTeX documents from inside JupyterLab.
Maybe we as a community can come together and start building integrations for these amazing free tools into JupyterLab!
To conclude, thanks to all who’ve organized and participated in the workshop (especially Sylvain for the organization). I’ve used the opportunity to chat with the core developers and get their helpful input: Steven, Afshin, and Jason, thanks for helping me out in getting this off the ground and making JupyterLab! And honestly, the biggest shoutout has to go to the people who’ve worked on improving draw.io and thankfully open sourced this amazing code base: the entire draw.io team.
About the Author
Wolf Vollprecht is a scientific software developer at QuantStack, passionate about High-Performance Computing and Robotics. He is one of the core developers of xtensor.
Guest post authored by Olivier Borderies, Olivier Coudray, and Pierre Marion
Jupyter interactive widgets enhance the notebook experience by allowing users to create graphical user interfaces. They enable richer interaction with the data and computing resources.
While the base ipywidgets library comes with a number of controls such as sliders, buttons, and dropdowns, it is in fact much more than a collection of basic controls: it is the foundation of a framework upon which one can build arbitrarily complex interactions.
Examples of custom widget libraries built upon the foundational package are
bqplot, a d3-Jupyter bridge, and a 2-D plotting library following the constructs of the Grammar of Graphics,
ipyleaflet, a leaflet-Jupyter bridge enabling maps visualization in the Jupyter notebook,
pythreejs, a 3-D visualization library bringing the functionalities of Three.js into the Jupyter notebook,
ipyvolume, a 3-D plotting library also based on Three.js enabling volume rendering, quiver plots and much more.
Jupyter Widgets, a Bridge Between Two Continents
Jupyter widgets provide a means to bridge the kernel and the rich ecosystem of JavaScript visualization libraries for the web browser. It is an amazing opportunity for scientific developers to use all these resources in their language of choice.
This article is meant to serve as a guide for developers interested in authoring a custom widget library, and to bridge the gap between the basic examples of the official documentation and fully-fledged visualization libraries like the ones listed above.
A number of resources are provided alongside the article, including the complete source code of the examples, notebooks, and Binder links.
In this post, we focus on the Python back-end, even though dozens of Jupyter kernels exist. The Python kernel is the reference implementation of the Jupyter protocol and remains the most featureful. We should also mention QuantStack’s Xeus, a native C++ implementation of the Jupyter kernel protocol, which supports interactive widgets. Xeus is used as the foundation for the support of Jupyter widgets in the R and C++ kernels.
Xeus: C++ implementation of Jupyter kernel protocol
A Hands-On Guide
The Jupyter community is thriving, and we are starting to see growth in the number of custom widget libraries as well. However, the learning curve from the examples of the official Jupyter documentation to authoring state-of-the-art libraries like the ones listed above is steep and can be intimidating.
We started on this path a few months ago and benefited from guidance from core Jupyter developers along the way. Now, we would like to share the lessons learned. We hope this will help turn this mountain trail into a new silk road!
Let us start with the ‘Hello World’ example widget from the ipywidgets documentation, before moving on to more advanced use cases.
1 — Improving on the Hello-World Example from the Documentation
The idea behind Jupyter widgets is to enable a bi-directional communication channel between the kernel (back-end) and the JavaScript front-end. In Python, widgets are special objects which are automatically synchronized with a counterpart object in the JavaScript front-end. In the back-end, the change events are handled by the traitlets package, which implements the observer pattern, while on the JavaScript side, this is done with the Backbone.js library.
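On the Python side, this looks roughly like the sketch below, modeled on the Hello-World widget from the ipywidgets documentation (names are illustrative):

import ipywidgets as widgets
from traitlets import Unicode

class HelloWidget(widgets.DOMWidget):
    _view_name = Unicode('HelloView').tag(sync=True)   # JavaScript view to render
    _view_module = Unicode('hello').tag(sync=True)     # JavaScript module defining it
    value = Unicode('Hello World!').tag(sync=True)     # state synced with the front-end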
The front-end implementation follows the MVC (Model View Controller) pattern. This allows the rendering of the same widget in multiple cell outputs, where all views share the same model, analogous to printing out a string variable multiple times.
The official documentation includes a Hello-World example widget, which offers an example of synchronization from Python to JavaScript, but not the other way around. Our jupyter-widget-hello-world-binder example provides a slightly more advanced version of it which demonstrates the bi-directional communication between the Python kernel and the JavaScript front-end. You can experiment with this widget on Binder or simply check out the example notebook with nbviewer.
Let us briefly present the main aspects of the implementation.
The JavaScript front-end defines a custom view by extending the base DOMWidgetView class from the base package:
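// a minimal sketch, after the Hello-World example in the ipywidgets docs
// (assumes `widgets` is the @jupyter-widgets/base module)
var HelloView = widgets.DOMWidgetView.extend({
    render: function() {
        this.value_changed();
        // re-render whenever the model's value changes
        this.model.on('change:value', this.value_changed, this);
    },
    value_changed: function() {
        this.el.textContent = this.model.get('value');
    }
});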
In the Hello-World example from the documentation, there is no way to change the JavaScript value from the notebook front-end. We have added this feature in order to illustrate the bi-directional synchronization between JavaScript and Python. The idea is to trigger an event in the browser which will update the JavaScript model and then, automatically (that’s the magic of ipywidgets), the Python back-end. These additional lines trigger the model update:
// update the JavaScript model
this.model.set('value', formElement[0].value);
// sync with Python
this.touch();
Conversely, when changing the value from the Python kernel the corresponding JavaScript value gets updated, as explained above.
2 — First Example Involving Bi-Directional Communication
In the previous section, widgets were entirely defined in the notebook. We now show how to move the implementation outside of the notebook document and produce a proper installable package.
2.1 — First Widget
To help you create your own custom widget, the Jupyter team provides a cookiecutter template, producing a custom Jupyter widget library containing all the boilerplate for packaging. The cookiecutter is initialized with the hello-world widget from the documentation.
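Assuming cookiecutter is installed (e.g. via pip), bootstrapping a new widget project is as simple as:

$ pip install cookiecutter
$ cookiecutter https://github.com/jupyter-widgets/widget-cookiecutter.git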
We used the widget cookiecutter to put the hello-world widget presented in the previous section into a well-organized GitHub repository. It contains
the Python part in the first_widget folder
the JavaScript part in the js folder.
The README provides detailed instructions to install the package, together with information about how to enable the Jupyter extension, and tips for custom widget authors. It delves into the technical details a bit more than this overview blog post.
Now that the basics of two-way synchronization are covered, you can create arbitrarily complex widgets! The key is to identify the data you would like to synchronize between the front-end and the back-end. Then you can set up the events that will update this data when a change is detected using the building blocks above.
Now you can ‘widget-ify’ any JavaScript library!
2.2 — Barebones setup.py
The setup.py described in the official documentation tries to automate many of the build steps, at the cost of readability. Fortunately, packaging Jupyter widgets will work with the bare-bones setup.py we provide.
The gain is a clearer, lighter (~100 fewer lines) setup.py, giving you a better understanding of what is happening.
The drawback is that you need one extra step to install the widget from source.
More importantly, we thought it was clearer to keep the build steps of the Python and JavaScript packages separate. Both build processes are well documented, making the steps easier to follow. Finally, the inclusion of compiled JavaScript bundles in the Python package is more explicit.
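For illustration, a bare-bones setup.py in this spirit might look as follows (names and paths are illustrative, not the exact file from the repo):

from setuptools import setup, find_packages

setup(
    name='first-widget',
    version='0.1.0',
    packages=find_packages(),
    include_package_data=True,
    install_requires=['ipywidgets>=7.0.0'],
    data_files=[
        # JavaScript-facing install paths use "-", Python package paths use "_"
        # (see the naming conventions in the next section)
        ('share/jupyter/nbextensions/first-widget', [
            'first_widget/static/extension.js',
            'first_widget/static/index.js',
        ]),
    ],
)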
2.3 — Side Note: Naming Conventions
Naming conventions for Python packages are covered by PEP 8. They can be a bit tricky in the case of two-word names like “first-widget”: where should you use an underscore (_) and where a hyphen (-)? Typically “-” is used in GitHub repositories, URLs, and JavaScript, while any folder or file in a Python module can only contain “_”. In the case of a Jupyter widget there is an extra point of attention: in the setup.py file, the data_files argument of the setup function is a list of tuples. In each tuple, the first element (representing a path in the filesystem) contains “-” as it relates to JavaScript code, while the paths in the second contain “_” as they represent paths in the Python package. For a full example see the first-widget repo and the detailed README.
3 — Increasingly Complex Widgets
We made the following three widgets, gradually adding complexity. The first two examples are meant as educational examples:
NOTE: The PivotUI widget is a combo of a custom widget and core widgets. This modular approach is more flexible (and looks nicer), but the same features could be built into a single custom widget containing extra buttons and display fields. The alt branch of the repo contains this version.
3.4 — Including JavaScript Callbacks
The ipypivot widget is an example of a widget that transparently wraps a JavaScript library: all parameters of the JavaScript API are exposed, including functions, which are passed as strings on the Python side. Since JavaScript functions can only be represented as strings in Python, an eval() converts them back into actual JavaScript functions on the front-end.
The benefit is that all the functionalities of the JS libraries are exposed to the Python users with a very thin API. Developers can ‘widget-ify’ a large array of interesting libraries, thereby boosting the productivity of a Jupyter notebook user.
The downside is naturally the security concern of enabling arbitrary JavaScript code to be injected by notebook users. This is less of a concern in the context of notebooks being shared within a small team of coworkers.
We are currently exploring means to execute user-provided arbitrary JavaScript functions in a sandboxed fashion, for example using the iframe srcdoc field (cf. this repo for an example, though not in a Jupyter widget context). Messages can then be sent back and forth between the main page and an iframe with the Window.postMessage function (see this gist for a bare-bones example).
3.5 — Enabling Jupyter Widgets By Default
The jupyter nbextension enable command, arguably cumbersome, has become unnecessary, as Jupyter widgets are enabled by default starting with notebook version 5.3 (included). See this PR.
To be future-proof, all the widgets in this article include the file which triggers this ‘automatic enable’ and require notebook >= 5.3. Thus the pip installation of our widgets is a one-line command. However, in dev mode, you still need to install and enable the notebook extension (see section 3.1 for an example).
If you are working with an older version of notebook (run jupyter notebook --version to check), you will have to run the following command after pip-installing a widget (replace your_widget_package with the actual package name):
jupyter nbextension enable --py --sys-prefix your_widget_package
If you have followed the “best practices” so far, packaging shouldn’t be an issue. Indeed, the cookiecutter provides a template of an easily-packageable widget.
Once your package is ready, you can publish it on:
npmjs for the JavaScript extension, which is necessary to use the widget as a standalone application (outside of the notebook), render it with nbviewer, and also in the JupyterLab context.
PyPI and conda for the Python package. Conda deserves a special mention, as it has several advantages:
It is a general-purpose package manager, which allows for non-Python dependencies.
Unlike pip, it also has a real dependency solver, which prevents breaking your environments when updating a single package.
It allows creating virtual environments to isolate your projects.
It allows for one-line installation of Jupyter extensions, including the enabling of the extension as a “post-link” script (a temporary advantage: see the previous section).
Conda packages are available on different channels. The default channel is administered by Anaconda Inc. The usually recommended channel for uploading open-source projects is conda-forge, as announced in this article from Anaconda. To add this channel to your conda configuration, run the following command:
conda config --add channels conda-forge
Several steps are needed to publish a package on conda forge, using a so-called ‘recipe’:
writing the recipe, which describes how to build the package along with the dependencies required for building and running it
testing the recipe
publishing the package on the conda-forge GitHub by forking their staged-recipes repo
maintaining the package
The conda docs and the conda-forge docs are very clear and give much more detailed information about this subject. If you do not want to read the full docs and want to jump straight to the information necessary for publishing a new package, you may want to have a look at our first-widget repo. The README contains a section describing the publishing process for conda-forge.
4.3 — Automatic Push Script
The sequence of steps to update the version of a Jupyter widget and do all the pushing to the various repositories is quite long.
If you want to automate the process, we advise having a look at Maarten Breddels’ releash package (release with relish :-)), and how it is used in the context of ipyvolume and ipysheet.
4.4 — Binder and nbviewer
nbviewer and Binder are two fantastic tools to share both static and live notebooks.
nbviewer requires a URL to a valid notebook JSON file — typically hosted on GitHub/GitLab. To make widgets render in nbviewer, you need (1) to make the JavaScript package for your widget available on npm, (2) to run the notebook with the corresponding JavaScript extension (with the same version), and (3) to save the notebook widget state (in the ‘Widgets’ tab) before pushing it to GitHub / GitLab.
Binder requires a URL to a GitHub repository containing notebooks and a manifest of the dependencies required to run them, which it uses to produce a Docker image including all the resources needed to run the notebooks. Check out the mybinder.org documentation to learn exactly how to make use of it. Another resource is the BinderHub documentation, if you want to host your own deployment of Binder. Either way, we also highly recommend the article Binder 2.0, a Tech Guide.
5 — Conclusion
Hopefully you will have learned something reading this article. We believe in the potential of Jupyter widgets and hope that this intermediate-level article will help get more people involved in the development of the ecosystem.
Note: This article only covers the case of the classic Jupyter notebook. The integration with JupyterLab will be covered in a future article!
If you find bugs or are interested in improving the example widgets presented here, please do not hesitate to contact the authors or open a pull request!
This article is the first in a series of guest blog posts about open source projects in the Jupyter ecosystem and the problems they attempt to solve. If you would like to submit a guest post to highlight a specific tool or project, please get in touch with us.
Jupyter Notebooks go a long way towards making computations reproducible and sharable. Nevertheless, for many Jupyter users, it remains a challenge to manage datasets across machines, over time, and across collaborators — especially when those datasets are large or change often. Quilt Data is a company that supports Quilt, an open source project to version and package data. The Quilt team recently released an extension for JupyterLab.
We are excited to see how the community will extend Jupyter and JupyterLab to manage datasets. Thanks to the Quilt team for submitting this guest blog post.
— The Jupyter Team
The open-source community has developed strong foundations for reproducible source code. Git supports versioning, GitHub supports collaboration. PyPI and Conda deliver code in immutable packages. Docker executes code in uniform, scalable containers.
But what about reproducible data? Data poses unique challenges: it’s larger than code, and resides in a wide variety of formats. Each data format implies different tradeoffs in serialization performance, compression, and file size. As a result, managing data becomes intractable in line-based version control systems like git. This presents a problem for Jupyter users: source code gets shared, but data gets left behind.
One solution to this problem is to port successful abstractions from source code management over to data. Versioning, packaging, and execution are well understood and universally adopted in source code management. In this article we’ll explore a collection of services that version, package, and marshal data — Quilt.
Getting data into notebooks
Notebooks that depend on data from files are fragile. File formats change, file stores move, files are copied, and file copies diverge. As a result, notebooks break as we share them across collaborators, across machines, and over time.
Quilt hides network, files, and storage behind a data package abstraction so that anyone can create durable, reproducible data dependencies for notebooks.
To run the sample code in this article, launch your favorite Python environment and install quilt:
$ pip install quilt
Now we use quilt to pull data dependencies into a Jupyter notebook:
import quilt
# install a small subpackage
quilt.install("uciml/heart_disease/tables/processed/switzerland", force=True)
# import the data package
from quilt.data.uciml import heart_disease
In the above code, uciml/heart_disease is a data package. Packages live in repositories and have handles of the form USER/PACKAGE. The package repository includes a versioned history of the data, which we can access as follows:
In [1]: import quilt
In [2]: quilt.log("uciml/heart_disease")
We could install a specific version of the data by providing the hash= keyword argument to quilt.install.
Let’s access the data in our subpackage. Data are loaded into memory by adding parentheses to a package path, as follows:
heart_disease.tables.processed.switzerland()
Heart disease data in a data frame
Command line and Python interfaces
Virtually all Quilt commands are available on both the command line and in Python. For example $ quilt log uciml/heart_disease in Terminal is equivalent to quilt.log("uciml/heart_disease") in Python.
What about unstructured data, like images?
In the preceding example we saw that Quilt reads columnar data into data frames. Semi-structured and unstructured data — such as JSON, images, and text — are also supported. Unstructured data skip serialization and are simply copied into Quilt’s object store. For instance if a user calls pkg.unstructured_txt() they receive a path to the unstructured file on disk, not a data frame. Future versions of Quilt will provide a wider variety of native deserializers (e.g. JSON to dict for Python).
Understanding the object store
All package data are read from Quilt’s object store. The object store provides three performance enhancements:
deduplication — Each unique data fragment is named by its SHA-256 hash so that when Quilt users push and pull data, only the fragments that have changed are sent over the network
fast reads of large files — the package akarve/nyc_taxi contains a 1.7GB CSV table. Quilt produces a data frame from the table in 4.88 seconds, thanks to PyArrow’s efficient handling of Parquet. By comparison, pandas.read_csv() takes 47 seconds to produce the same data frame from its CSV source. And pandas.read_sql() takes more than 5 minutes to acquire the same data from a database.
columnar storage — Quilt converts tabular data into Apache Parquet columns. Parquet columns can be efficiently compressed, deserialized, and searched by tools like PrestoDB and HiveSQL.
Browsing data packages in JupyterLab
As Jupyter users we want to make it easy to find and consume new data packages. Quilt leverages JupyterLab’s extension architecture to create an extension that lets you search Quilt for data packages.
If desired, switch to an environment of your choice. Next, install the Quilt extension for JupyterLab as follows (see the quilt extension repo for further documentation):
If you type “uciml” in the search box you’ll see a list of packages from the UCI Machine Learning Repository. If you click on a package in the list, the extension will generate the code to install and import the package. Click on the link icon, far right, to visit the package repository page, where you’ll find further documentation on the data.
quilt.yml is like requirements.txt for data
We can express a notebook’s data dependencies in a YAML file, conventionally called quilt.yml:
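# a hypothetical quilt.yml; the package handles are illustrative,
# and specific versions, hashes, or tags can also be pinned
packages:
  - uciml/heart_disease
  - akarve/BTC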
Let’s create a data package that contains Bitcoin prices. You can start by downloading the source data BTC_prices.csv, placing it in a clean directory, and changing to the same directory:
$ mkdir quilt-btc
$ cd quilt-btc
$ curl https://s3.amazonaws.com/quilt-web-public/data/BTC_prices.csv -o BTC_prices.csv
$ ls
BTC_prices.csv
We can use quilt generate to create a build.yml file. build.yml specifies how data are transformed into an in-memory package tree.
quilt generate recursively descends a directory to include all descendants folders and files in a generated build.yml file. Quilt packages can contain thousands of files and hundreds of gigabytes of data.
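Assuming the current directory contains only the CSV, the step looks roughly like this:

$ quilt generate .
$ ls
BTC_prices.csv  build.yml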
Let’s modify our build.yml to specify how we want our data package to be structured. We’ll rename the node, and use kwargs to parse the Date column as a datetime, specify the quote character, and skip some comment rows.
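A hypothetical build.yml after those edits might look like this (the kwargs are passed through to the pandas CSV parser; the values shown are illustrative):

contents:
  prices:                      # renamed from the auto-generated node name
    file: BTC_prices.csv
    transform: csv
    kwargs:
      parse_dates: ['Date']    # parse the Date column as datetimes
      quotechar: "'"
      skiprows: 1              # skip comment rows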
Any notebooks that depend on this version of the akarve/BTC package can declare the dependency in a quilt.yml file or directly in code:
quilt.install("akarve/BTC", hash="e81757c")
Alternatively, we can associate a human-readable tag with any of the hashes from quilt log.
$ quilt tag add akarve/BTC crypto e81757c
Any notebooks that contain the following line of code point to the same immutable data package and are reproducible across machines:
quilt.install("akarve/BTC", tag="crypto")
Where does my data live?
By default, Quilt stores data in the registry at quiltdata.com. Alternatively, you can host your own registry by running the open source containers, then using quilt config to point clients to your private registry.
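For example (the registry URL is a placeholder):
$ quilt config
# when prompted, enter the URL of your registry, e.g. https://quilt.example.com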
Conclusion
We’ve explored data packages as versioned, immutable building blocks that encapsulate data dependencies. The data package lifecycle is driven by four commands: build, push, install, and import.
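For our Bitcoin example, the full lifecycle looks roughly like this (flags elided; see quilt --help for specifics):
$ quilt build akarve/BTC build.yml   # compile the package locally
$ quilt push akarve/BTC --public     # publish it to the registry
$ quilt install akarve/BTC           # install on another machine
Then, in Python:
from quilt.data.akarve import BTC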
In the near future we plan to make the JupyterLab extension bidirectional, so that users can not only pull data from Quilt, but also easily push cell data into Quilt. Support for R, Spark, and HDFS is also on the Quilt roadmap.
The Quilt compiler, registry, and catalog are open source. We welcome your contributions.
I’m grateful to join Fernando Pérez and Brian Granger as a program co-chair for JupyterCon 2018. Project Jupyter, NumFOCUS, and O’Reilly Media will present the second annual JupyterCon in New York City on August 21–25.
Timing for this event couldn’t be better. The human side of data science, machine learning/AI, and scientific computing is more important than ever. This is seen in the broad adoption of data-driven decision making in organizations of all kinds, the increasing importance of human-centered design in tools for working with data, the urgency for better data insights in the face of complex socioeconomic conditions worldwide, as well as dialogue about the social issues these technologies bring to the fore: collaboration, security, ethics, data privacy, transparency, propaganda, etc.
To paraphrase our co-chairs, Brian Granger:
“Jupyter is where humans and data science intersect”
and Fernando Pérez:
“The better the technology, the more important that human judgement becomes”
Consequently, we’ll explore three main themes at JupyterCon:
Interactive computing with data at scale: the technical best practices and organizational challenges of supporting interactive computing in companies, universities, research collaborations, etc. (JupyterHub)
Extensible user interfaces for data science, machine learning/AI, and scientific computing (JupyterLab)
Computational communication: taking the artifacts of interactive computing and communicating them to different audiences
A meta-theme which ties these together is extensible software architecture for interactive computing with data. Jupyter is built on a set of flexible, extensible, and reusable building blocks which can be combined and assembled to address a wide range of use cases. These building blocks are expressed through the various open protocols, APIs, and standards of Jupyter.
The Jupyter community has much to discuss and share this year. For example, success stories such as the data science program at UC Berkeley illustrate the power of JupyterHub deployments at scale in education, research, and industry alike. As universities and enterprise firms learn to handle the technical challenges of rolling out hands-on, interactive computing at scale, a cohort of organizational challenges comes to the fore: practices regarding collaboration, security, compliance, data privacy, ethics, etc. These points are especially poignant in verticals such as healthcare, finance, and education, where the handling of sensitive data is rightly constrained by ethical and legal requirements (HIPAA, FERPA, etc.). Overall, this dialogue is extremely relevant — it is happening at the intersection of contemporary political and social issues, industry concerns, new laws (GDPR), the evolution of computation, plus good storytelling and communication in general — as we’ll explore with practitioners throughout the conference.
The recent beta release of JupyterLab embodies the meta-theme of extensible software architecture for interactive computing with data. While many people think of Jupyter as a “notebook,” that’s merely one building block needed for interactive computing with data. Other building blocks include terminals, file browsers, LaTeX, markdown, rich outputs, text editors, and renderers/viewers for different data formats. JupyterLab is the next-generation user interface for Project Jupyter, providing these different building blocks in a flexible, configurable, customizable environment. This opens the door for Jupyter users to build custom workflows, and for organizations to extend JupyterLab with their own custom functionality.
Thousands of organizations require data infrastructure for reporting, sharing data insights, reproducing results of analytics, etc. Recent business studies estimate that more than half of all companies globally are precluded from adopting AI technologies due to a lack of digital infrastructure — often because their efforts toward data and reporting infrastructure are buried in technical debt. So much of that infrastructure was built from scratch, even when organizations needed essentially the same building blocks. JupyterLab’s primary goal is to make it routine to build highly customized, interactive computing platforms, while supporting more than 90 different popular programming environments.
Screenshot from the JupyterLab beta release. Image used with permission from Project Jupyter contributors.
A third major theme builds on top of the other two: computational communication. For data and code to be useful to the humans who need to make decisions, they have to be embedded into a narrative — a story — that can be communicated to others. Examples of this pattern include data journalism, reproducible research and open science, computational narratives, open data in society and government, citizen science, really any area of scientific research (physics, zoology, chemistry, astronomy, etc.), plus the range of economics, finance, and econometric forecasting.
Another growing segment of use cases involves Jupyter as a “last-mile” layer for leveraging AI resources in the cloud. This becomes especially important in light of new hardware emerging for AI needs, vying with competing demand from online gaming, virtual reality, cryptocurrency mining, etc.
Please take the following as personal opinion, observation, and perspective: we’ve reached a point where hardware appears to be evolving more rapidly than software, while software appears to be evolving more rapidly than effective process. At O’Reilly Media we work to map the emerging themes in industry, in a process nicknamed “radar”. This perspective about hardware is a theme I’ve been mapping, while comparing notes with industry experts. A few data points to consider: Jeff Dean’s talk at NIPS 2017, “Machine Learning for Systems and Systems for Machine Learning”, comparing CPUs/GPUs/TPUs and showing how AI is transforming the design of computer hardware; “The Case for Learned Index Structures”, also from Google, about the impact of “branch vs. multiply” costs on decades of database theory; the podcast interview “Scaling machine learning” with Reza Zadeh about the critical importance of hardware/software interfaces in AI apps; and the video interview that Wes McKinney and I recorded at JupyterCon 2017 about how Apache Arrow presents a much different take on how to leverage hardware and distributed resources.
The notion that “hardware > software > process” contradicts the past 15–20 years of software engineering practice. It’s an inversion of the general assumptions we make. In response, industry will need to rework its approaches to building software within the context of AI — which was articulated succinctly by Lenny Pruss from Amplify Partners in “Infrastructure 3.0: Building blocks for the AI revolution”. In this light, Jupyter provides an abstraction layer — a kind of buffer to help “future-proof” — for complex use cases in NLP, machine learning, and related work. We’re seeing this from most of the public cloud vendors who are also leaders in AI — Google, Amazon, Microsoft, IBM, etc. — and who will be represented at the conference in August.
Our program at JupyterCon will feature expert speakers across all of these themes. However, to me, that’s merely the tip of the iceberg. So much of the real value that I get from conferences happens in the proverbial “Hallway Track”, where you run into people who are riffing off news they’ve just learned in a session — perhaps in line with your thinking, perhaps in a completely different direction. Those conversations have space to flourish when people get immersed in the community, the issues, the possibilities.
It’ll be a busy week. We’ll have two days of training courses: intensive, hands-on coding, with lots of interaction with expert instructors. Training will overlap with one day of tutorials: led by experts, generally larger than the training courses though more detailed than session talks, and featuring lots of Q&A.
Then we’ll have two days of keynotes and session talks, expo hall, lunches and sponsored breaks, plus Project Jupyter sponsored events. Events include Jupyter User Testing, author signings, “Meet the Experts” office hours, demos in the vendor expo hall — plus related meetups in the evenings. Last year the Poster Session was one of the biggest surprises to me: it was difficult to move through the room, walkways were packed with people asking presenters questions about their projects.
This year we’ll introduce a Business Summit, similar to the popular summits at Strata Data Conference and The AI Conf. This will include high-level presentations on the most promising and important developments in Jupyter for executives and decision-makers. Brian Granger and I will be hosting the Business Summit, along with Joel Horwitz of IBM. One interesting data point: among the regional events, we’ve seen much more engagement this year from enterprise and government than we’d expected, more emphasis on business use cases and new product launches. The ecosystem is growing, and will be represented well at JupyterCon!
We will also feature an Education Track in the main conference, expanding on the well-attended Education Birds-of-a-Feather and related talks during JupyterCon 2017. Use of Jupyter in education has grown rapidly across many contexts: middle/high-school, universities, corporate training, and online courses. Lorena Barba and Robert Talbert will be organizing this track.
Following our schedule of conference talks, the week wraps up with a community sprint day on Saturday. You can work side-by-side with leaders and contributors in the Jupyter ecosystem to implement that feature you’ve always wanted, fix bugs, work on design, write documentation, test software, or dive deep into the internals of something in the Jupyter ecosystem. Be sure to bring your laptop.
Note that we believe true innovation depends on hearing from, and listening to, people with a variety of perspectives. Please read our Diversity Statement for more details. Also, we’re committed to creating a safe and productive environment for everyone at all of our events. Please read our Code of Conduct. Last year we were able to work with the community, plus matching donations, to provide several Diversity & Inclusion scholarships, as well as more than a dozen student scholarships. We look forward to building on that this year!
That’s a sample of what’s coming up for JupyterCon in NYC this August. Meanwhile, we’ll be helping present and sponsor regional community events to help build momentum for the conference:
plus talks at related meetups in some North American metro areas
We look forward to many opportunities to showcase new work and ideas, to meet each other, to learn about the architecture of the project itself, and to contribute to the future of Jupyter.
One of the 1 million+ notebooks we scraped from GitHub in July 2017. This notebook combines code, visualizations, and text to create an effective computational narrative.
This is a guest post on how members of the Jupyter community publish code, visualizations, and text using Jupyter Notebooks. We’re excited the Design Lab is sharing their research and data on the blog.
If you have a post relevant to the community you’d like to share on the Jupyter Blog, please contact us.
-The Jupyter Team
In July 2017, my team in the Design Lab at UC San Diego scraped and analyzed over 1 million Jupyter Notebooks from GitHub. Today I am excited to announce we are making these data publicly available for you to explore! While only a snapshot of one corner of the Jupyter universe, these data provide a unique perspective into how people use and share Jupyter Notebooks.
The collection includes over 1 million notebooks as well as metadata about the nearly 200,000 repositories where they lived. The full dataset is nearly 600GB, so we have created a smaller 5GB sampler dataset to get you started. It includes roughly 6,000 notebooks from 1,000 repositories.
We originally collected these data to explore how people use narrative text in Jupyter Notebooks. We found many notebooks, even those accompanying academic publications, had little in the way of descriptive text. This is likely because many analysts view their notebooks as personal and messy works-in-progress. On the other hand, many of the notebooks we collected were masterpieces of computational narrative, elegantly explaining complex analyses (one notebook even had more text than The Great Gatsby). We think this spread reflects a tension between data exploration, which tends to produce messy notebooks, and process explanation, in which analysts clean and organize their notebooks for a particular audience.
Over 25% of the 1 million+ notebooks we collected from GitHub had no descriptive text, yet some rivaled classic novels in length.
Beyond simply counting lines and words, we also looked at how authors organized their code and text. For example, most notebooks had markdown headers and nearly a third linked to other resources. Most notebooks had code comments and over a third defined new functions.
Analyzing this data helped us see how people organize notebook code and text.
We will be presenting the full results of our analysis in April at the 2018 ACM CHI Conference on Human Factors in Computing Systems and you can read more about our work in this preprint copy of our paper. In the meantime, our team has moved on to developing tools that take some of the effort out of cleaning and organizing Jupyter Notebooks.
There is so much left to explore in the data we collected. We are excited to see what you do with them! Thank you to the UC San Diego Library for graciously hosting the data. If you encounter a problem downloading them, please open an issue on this GitHub repo.
For the past six months, the Project Jupyter team, in collaboration with O’Reilly Media and NumFOCUS, has been planning JupyterCon 2018. In January, we opened the Call for Proposals, during which we received numerous high-quality proposals. The total number of submissions exceeded our expectations: more than 3 times the number of available slots! With the help of the Program Committee reviewers and co-chairs Fernando Pérez, Brian Granger, and Paco Nathan, we had the hard job of selecting among the fantastic submissions we received. Today we are happy to announce that most of the JupyterCon 2018 program is ready and registration is open! We are excited to bring you sessions about scaling JupyterHub, leveraging GPUs for Jupyter, running C++ in Jupyter, and many more.
Fernando Pérez and Andrew Odewahn during JupyterCon 2017 Opening Keynote
As Paco Nathan previously announced, this year will have a dedicated Education Track and a Business Summit to supplement the Main Tracks, Trainings, and Tutorials that we brought to you last year. Many of the highlights of last year, such as “Meet the Experts” office hours, a Poster Session for extended discussions with presenters, and the Vendor Expo Hall, will return this year.
As with last year, JupyterCon will be held at the New York Hilton Midtown, NYC, August 21–24, plus Saturday the 25th. You can register today for the main conference. Early Bird pricing ends on May 18th. You can also use the discount code PJ20. Thanks to our partners, we also have a limited amount of financial support available for JupyterCon attendees. Last year, we provided scholarships to 13 students from diverse backgrounds to attend JupyterCon 2017.
Community Sprint Day, August 25th.
Thanks to Bloomberg, Saturday, August 25th will be reserved for a separate Community Sprint Day, free of charge. All community members, whether or not you plan to attend the main conference, are invited. This day will be focused on community, contributing to Jupyter, and open source, in an “Open Studio” format. Whether you are new to Jupyter or a power user, we invite you to come mingle with the rest of the attendees and learn about any Jupyter-related project.
Several activities will be available. Whether you have coding, design, or writing skills, we encourage you to contribute, pitch your ideas, and get started on something brand new.
Are you new to open source, Git, and GitHub? We’ll offer an Open-Source 101 introduction and show you how to get development versions running on your machine.
Never tried Jupyter, or an advanced user with specific needs? We’ll be hosting a JupyterLab user-testing session where you will have the chance to try upcoming features and give us critical insight on how to improve usability.
Interested in contributing back to Jupyter or related projects? Many experts will be there to help you move your project forward. If you are coming to JupyterCon and would like to help with Community Sprint Day, or have a project you’d like attendees to work on, let us know!
Jupyter Pop-Up, DC, May 15th
Can’t wait for JupyterCon 2018? Attend the Jupyter Pop-Up in Washington, DC on May 15th for a day-long event and a taste of what is to come!
Thanks to the O’Reilly Media team; the conference chairs (Brian Granger, Paco Nathan, and Fernando Pérez); the Program Committee (Dan Allan, Ian Allison, Paige Bailey, Lorena Barba, Tom Caswell, Afshin Darian, John Detlefs, Chris Erdmann, Jessica Forde, Stuart Geiger, Tim George, Michelle Gill, Tim Head, Jennifer Klay, Ciera Martinez, Emily Jane McTavish, Omoju Miller, M Pacer, Peter Parente, Eszti Schoell, Steve Silvester, Robert Talbert, Dwight Townsend, Wolf Vollprecht, Jamie Whitacre, Kevin Zielnicki); and all the people making JupyterCon 2018 possible, and Jupyter a reality.
The recent release of the Jupyter kernel for C++, based on the Cling interpreter, enabled a number of new workflows for users of the C++ programming language.
Features of the xeus-cling C++ kernel for Project Jupyter include:
Showing quick-help pages for functions and classes of the STL and user-defined types, by prefixing them with a question mark. For example, typing ?std::vector results in a pager displaying the cppreference page on std::vector.
Quick-help page for classes and functions of the STL
Making use of the rich display features of the Jupyter stack, for user-defined types. This can be enabled simply by overloading mime_bundle_repr in the namespace of the class for which we wish to have a rich representation in the front-end. The overload is picked up by the display system through argument-dependent lookup (ADL).
Using Jupyter’s rich display mechanism in C++
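As a sketch, an image type with a rich PNG representation might be implemented as follows (adapted from the pattern in the xeus-cling documentation; details may differ across versions):
#include <fstream>
#include <sstream>
#include <string>

#include "nlohmann/json.hpp"
#include "xtl/xbase64.hpp"

namespace im
{
    struct image
    {
        explicit image(const std::string& filename)
        {
            std::ifstream fin(filename, std::ios::binary);
            m_buffer << fin.rdbuf();
        }

        std::stringstream m_buffer;
    };

    // Found by the display system through argument-dependent lookup
    nlohmann::json mime_bundle_repr(const image& i)
    {
        auto bundle = nlohmann::json::object();
        bundle["image/png"] = xtl::base64encode(i.m_buffer.str());
        return bundle;
    }
}

im::image marie("images/marie.png");
marie  // rendered as a PNG in the front-end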
Another aspect of the newly released C++ kernel is the implementation of the Jupyter widgets protocol, enabling bi-directional communication between the front-end and the kernel. The xwidgets package, built upon xeus, provides a complete implementation of the protocol, together with implementations of most of the controls available in the reference ipywidgets Python package.
Bidirectional communication with the front-end using Jupyter interactive widgets
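For instance, a slider whose value stays synchronized between the C++ kernel and the front-end might look like this (a minimal sketch, assuming xwidgets is installed):
#include "xwidgets/xslider.hpp"

auto slider = xw::slider<double>::initialize()
    .min(0.0)
    .max(100.0)
    .value(42.0)
    .finalize();

slider.display();     // render the widget in the notebook
slider.value = 63.0;  // the front-end control moves accordingly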
More than a limited set of base controls, Jupyter widgets are a framework upon which one can build arbitrarily complex interactions. A large number of interactive widget libraries have been built upon ipywidgets. Popular examples include pythreejs (a Jupyter-threejs bridge), bqplot (an interactive plotting library for Jupyter), and ipyleaflet (a Jupyter-leafletjs bridge allowing rich interactive maps in the Jupyter notebook).
A common trait of most of these packages is that most of the logic is implemented in the front-end, while the back-end only involves synchronization of data attributes.
A fully-specified communication protocol and a thin back-end architecture facilitate the job of kernel authors willing to bring the power of these visualization libraries to their language of choice.
Hence, we have taken on the endeavor of providing a C++ implementation of the most popular Jupyter interactive widget libraries. These packages can be used in the C++ kernel, as well as in compiled applications making use of the Jupyter kernel protocol.
Today, we are proud to announce the first release of xleaflet, the C++ counterpart to the popular ipyleaflet package, which makes use of the same front-end components.
You can get started by simply creating a map inline in the Jupyter notebook, specifying a center location and zoom level, and choosing the tile layers to display among the predefined base maps.
A simple map with a specified center and zoom level, displaying the default tiles
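In code, that might look like the following (coordinates are illustrative; names follow the xleaflet README):
#include "xleaflet/xmap.hpp"

xlf::map my_map;  // default base map
my_map.center = {52.204793, 360.121558};
my_map.zoom = 15;

my_map.display();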
A number of other attributes can be set in the map widget. To mimic named parameters, all widgets of xwidgets and xleaflet are provided with a generator class which can be used to initialize attributes using method-chaining syntax.
Making use of the generator class to specify any number of attributes of the map upon construction
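With the generator, the same map can be configured at construction time via method chaining (a sketch following the xleaflet README):
#include "xleaflet/xmap.hpp"

auto my_map = xlf::map::initialize()
    .center({52.204793, 360.121558})
    .zoom(15)
    .finalize();

my_map.display();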
In addition to the base map feature, a broad set of features of the leaflet JavaScript library is exposed to the C++ back-end directly. This includes markers, marker clusters, image overlays, and a variety of controls.
Using the marker widget
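For example, adding a draggable marker to the map above (a sketch; the widget names mirror ipyleaflet conventions):
#include "xleaflet/xmarker.hpp"

auto marker = xlf::marker::initialize()
    .location({52.204793, 360.121558})
    .draggable(true)
    .finalize();

my_map.add_layer(marker);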
Whenever an attribute of a widget is modified in the front-end or in the back-end, the other side will properly reflect the data change.
For example, setting marker.location to a new value in the previous example will actually move the marker on the map. Conversely, if the draggable attribute is set to true, whenever the marker position changes in the front-end, the value is reflected in the C++ model.
Observer on the marker position
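A sketch of such an observer, using the XOBSERVE macro from xproperty (the exact property access may differ):
#include <iostream>

// React whenever the marker's location changes on either side
XOBSERVE(marker, location, [](const auto& m) {
    auto loc = m.location();
    std::cout << "marker moved to ("
              << loc[0] << ", " << loc[1] << ")" << std::endl;
});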
Another example is support for the GeoJSON format, which allows one to load a JSON file locally and display its content on the map.
Support for the GeoJSON format
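For instance, loading a local GeoJSON file with nlohmann::json and adding it as a layer (a sketch; the header and class names mirror ipyleaflet’s geo_json and are assumptions):
#include <fstream>

#include "nlohmann/json.hpp"
#include "xleaflet/xgeo_json.hpp"

std::ifstream file("geo.json");
nlohmann::json geo_data;
file >> geo_data;  // parse the GeoJSON document

auto geo = xlf::geo_json::initialize()
    .data(geo_data)
    .finalize();

my_map.add_layer(geo);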
The bidirectional communication between the front-end and the C++ back-end makes it easier for the end user to create interactive web applications without having to write any JavaScript.
Using the rich features of xleaflet, one can start building fully-fledged GIS applications in C++.
If you are interested in trying xleaflet right now in your web browser, we have provided a Binder for you.
Simply click on the following binder link and start playing with interactive GIS in C++ in your web browser:
Check out the documentation for more detailed information about xleaflet.
Acknowledgements
The software presented in this post was built upon the work of a large number of people including the Jupyter team, the Cling developers, the developers of xeus and xwidgets, and the developers of leafletjs.
The development of xeus, xwidgets and related packages at QuantStack is sponsored by Bloomberg.
About the Author
Martin Renou is a Scientific Software developer at QuantStack. Prior to joining QuantStack, Martin studied at the French Institute of Aeronautics and Space. As an open source developer, he worked on a variety of projects, notably SciviJS, a JavaScript library for 3-D mesh visualization.