
Jupyter receives the ACM Software System Award


It is our pleasure to announce that Project Jupyter has been awarded the 2017 ACM Software System Award, a significant honor for the project. We are humbled to join an illustrious list of projects that contains major highlights of computing history, including Unix, TeX, S (R’s predecessor), the Web, Mosaic, Java, INGRES (modern databases) and more.

Officially, the recipients of the award are the fifteen members of the Jupyter steering council as of November 2016, the date of nomination (listed in chronological order of joining the project): Fernando Pérez, Brian Granger, Min Ragan-Kelley, Paul Ivanov, Thomas Kluyver, Jason Grout, Matthias Bussonnier, Damián Avila, Steven Silvester, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Carol Willing, Sylvain Corlay and Peter Parente.

A tiny subset of the Jupyter contributors and users that made Jupyter possible — Biannual development meeting, 2016, LBNL.

This is the largest team ever to receive this award, and we are delighted that the ACM was willing to recognize that modern collaborative projects are created by large teams, and should be rewarded as such. Still, we emphasize that Jupyter is made possible by many more people than these fifteen recipients. This award honors the large group of contributors and users that has made IPython and Jupyter what they are today. The recipients are stewards of this common good, and it is our responsibility to help this broader community continue to thrive.

Below, we’ll summarize the story of our journey, including the technical and human sides of this effort. You can learn more about Jupyter from our website, and you can meet the vibrant Jupyter community by attending JupyterCon, August 21–25, 2018, in New York City.

In the beginning

Project Jupyter was officially unveiled with its current name in 2014 at the SciPy scientific Python conference. However, Jupyter’s roots date back nearly 17 years to when Fernando Pérez announced his open source IPython project as a graduate student in 2001. IPython provided tools for interactive computing in the Python language (the ‘I’ is for ‘Interactive’), with an emphasis on the exploratory workflow of scientists: run some code, plot and examine some results, think about the next step based on these outcomes, and iterate. IPython itself was born out of merging an initial prototype with Nathan Gray’s LazyPython and Janko Hauser’s IPP, inspired by a 2001 O’Reilly Radar post — collaboration has been part of our DNA since day one.

From those humble beginnings, a community of like-minded scientists grew around IPython. Some contributors have moved on to other endeavors, while others are still at the heart of the project. For example, Brian Granger and Min Ragan-Kelley joined the effort around 2004 and today lead multiple areas of the project. Our team gradually grew, both with members who were able to dedicate significant amounts of effort to the project as well as a larger, but equally significant, “long tail” community of users and contributors.

In 2011, after development of our first interactive client-server tool (our Qt Console), multiple notebook prototypes, and a summer-long coding sprint by Brian Granger, we were able to release the first version of the IPython Notebook. This effort paved the path to our modern architecture and vision of Jupyter.

What is Jupyter?

Project Jupyter develops open source software, standardizes protocols for interactive computing across dozens of programming languages, and defines open formats for communicating results with others.

Interactive computation

On the technical front, Jupyter occupies an interesting area of today’s computing landscape. Our world is flooded with data that requires computers to process, analyze, and manipulate, yet the questions and insights are still the purview of humans. Our tools are explicitly designed for the task of computing interactively, that is, where a human executes code, looks at the results of this execution, and decides the next steps based on these outcomes. Jupyter has become an important part of the daily workflow in research, education, journalism, and industry.

Whether running a quick script at the IPython terminal, or doing a deep dive into a dataset in a Jupyter notebook, our tools aim to make this workflow as fluid, pleasant, and effective as possible. For example, we built powerful completion tools to help you discover the structure of your code and data, a flexible display protocol to show results enriched by the multimedia capabilities of your web browser, and an interactive widget system to let you easily create GUI controls like sliders to explore parameters of your computation. All these tools have evolved from their IPython origins into open, documented protocols that can be implemented in any programming language as a “Jupyter kernel”. There are over 100 Jupyter kernels today, created by many members of the community.
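As a small taste of that widget system, here is a minimal sketch (assuming the ipywidgets package is installed; the function itself is purely illustrative):

from ipywidgets import interact

def square(x):
    return x * x

# interact() builds a slider for the integer range and re-runs square() as you drag it
interact(square, x=(0, 10))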

Our experience building and using the Jupyter Notebook application for the last few years has now led to its next-generation successor, JupyterLab, which is now ready for users. JupyterLab is a web application that exposes all the elements above not only as an end-user application, but also as interoperable building blocks designed to enable entirely new workflows. JupyterLab has already been adopted by large scientific projects such as the Large Synoptic Survey Telescope project.

Communicating results

In today’s data-rich world, working with the computer is only half of the picture. Its complement is working with other humans, be it your partners, colleagues, students, clients, or even your future self months down the road. The open Jupyter notebook file format is designed to capture, display and share natural language, code, and results in a single computational narrative. These narratives exist in the tradition of literate programming that dates back to Knuth’s work, but here the focus is weaving computation and data specific to a given problem, in what we sometimes refer to as literate computing. While existing computational systems like Maple, Mathematica and SageMath all informed our experience, our focus in Jupyter has been on the creation of open standardized formats that can benefit the entire scientific community and support the long-term sharing and archiving of computational knowledge, regardless of programming language.
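For instance, the reference nbformat library can assemble such a narrative programmatically; a minimal sketch (the file name and cell contents are made up):

import nbformat

# a notebook is just a list of cells plus metadata, serialized as plain JSON
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# A tiny computational narrative"))
nb.cells.append(nbformat.v4.new_code_cell("print('hello, Jupyter')"))
nbformat.write(nb, "narrative.ipynb")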

We have also built tools to support Jupyter deployment in multi-user environments, whether a single server in your group or a large cloud deployment supporting thousands of students. JupyterHub and projects that build upon it, like Binder and BinderHub, now support industry deployments, large-scale education, reproducible research, and the seamless sharing of live computational environments.

Data Science class at UC Berkeley, taught using Jupyter.

We are delighted to see, for example, how the LIGO Collaboration, awarded the 2017 Nobel Prize in Physics for the observation of gravitational waves, offers their data and analysis code for the public in the form of Jupyter Notebooks hosted on Binder at their Open Science Center.

Measurement and prediction of gravitational waves formed by two black holes merging. Adapted from https://github.com/minrk/ligo-binder.

Open standards nourish an innovative ecosystem

In Project Jupyter, we have concentrated on standardizing protocols and formats evolved from community needs, independent of any specific implementation. The stability and interoperability of open standards provides a foundation for others to experiment, collaborate, and build tools inspired by their unique goals and perspectives.

For example, while we provide the nbviewer service that renders notebooks from any online source for convenient sharing, many people would rather see their notebooks directly on GitHub. This was not possible originally, but the existence of a well-documented notebook format enabled GitHub to develop their own rendering pipeline, which now shows HTML versions of notebooks rendered in a way that conforms to their security requirements.

Similarly, there exist multiple client applications in addition to the Jupyter Notebook and JupyterLab to create and execute notebooks, each with its own use case and focus: the open source nteract project develops a lightweight desktop application to run notebooks; CoCalc, a startup founded by William Stein, the creator of SageMath, offers a web-based client with real-time collaboration that includes Jupyter alongside SageMath, LaTeX, and tools focused on education; and Google now provides Colaboratory, another web notebook frontend that runs alongside the rest of the Google Documents suite, with execution in the Google Cloud.

These are only a few examples, but they illustrate the value of open protocols and standards: they serve open-source communities, startups, and large corporations equally well. We hope that as the project grows, interested parties will continue to engage with us so we can keep refining these ideas and developing new ones in support of a more interoperable and open ecosystem.

Growing a community

IPython and Jupyter have grown to be the product of thousands of contributors, and the ACM Software System Award should be seen as a recognition of this combined work. Over the years, we evolved from the typical pattern of an ad-hoc assembly of interested people loosely coordinating on a mailing list to a much more structured project. We formalized our governance model and instituted a Steering Council. We continue to evolve these ideas as the project grows, always seeking to ensure the project is welcoming, supports an increasingly diverse community, and helps solidify a foundation for it to be sustainable. This process isn’t unique to Jupyter, and we’ve learned from other larger projects such as Python itself.

Jupyter exists at the intersection of distributed open source development, university-centered research and education, and industry engagement. While the original team came mostly from the academic world, from the start we’ve recognized the value of engaging industry and other partners. This led, for example, to our BSD licensing choice, best articulated by the late John Hunter in 2004. Beyond licensing, we’ve actively sought to maintain a dialog with all these stakeholders:

  • We are part of the NumFOCUS Foundation, working as part of a rich tapestry of other scientifically-focused open source projects. Jupyter is a portal to many of these tools, and we need the entire ecosystem to remain healthy.
  • We have obtained significant funding from the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and the Helmsley Trust.
  • We engage directly with industry partners. Many of our developers hail from industry: we have ongoing active collaborations with companies such as Bloomberg and Quansight on the development of JupyterLab, and with O’Reilly Media on JupyterCon. We have received funding and direct support in the past from Bloomberg, Microsoft, Google, Anaconda, and others.

The problem of sustainably developing open source software systems of lasting intellectual and technical value, that serve users as diverse as high-school educators, large universities, Nobel prize-winning science teams, startups, and the largest technology companies in the world, is an ongoing challenge. We need to build healthy communities, find significant resources, provide organizational infrastructure, and value professional and personal time invested in open source. There is a rising awareness among volunteers, business leaders, academic promotion and tenure boards, professional organizations, government agencies, and others of the need to support and sustain critical open source projects. We invite you to engage with us as we continue to explore solutions to these needs and build these foundations for the future.

Acknowledgments

The award was given to the above fifteen members of the Steering Council. But this award truly belongs to the community, and we’d like to thank all that have made Jupyter possible, from newcomers to long-term contributors. The project exists to serve the community and wouldn’t be possible without you.

We are grateful for the generous support of our funders. Jupyter’s scale and complexity require dedicated effort, and this would be impossible without the financial resources provided (now and in the past) by the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, the Helmsley Trust, the Simons Foundation, Lawrence Berkeley National Laboratory, the European Union Horizon 2020 program, Anaconda Inc, Bloomberg, Enthought, Google, Microsoft, Rackspace, and O’Reilly Media. Finally, the recipients of the award have been supported by our employers, who often have put faith in the long-term value of this type of work well before the outcomes were evident: Anaconda, Berkeley Lab, Bloomberg, CalPoly, DeepMind, European XFEL, Google, JP Morgan, Netflix, QuantStack, Simula Research Lab, UC Berkeley and Valassis Digital.




Jupyter Notebook 5.5.0


We are pleased to announce the release of Jupyter Notebook 5.5.0. This is a minor release that introduces some new features such as:

  • Dynamic Download as… menu for nbconvert exporters
  • Download as Reveal.js slides
  • Quit button for stopping the notebook server from the dashboard
  • File size for files in the dashboard

This release also includes many bug fixes and enhancements as well as improvements to our documentation and testing infrastructure.

You can install the new version of the notebook now using pip:

pip install --upgrade notebook

Or conda:

conda upgrade notebook

Changelog

New features:

  • The files list now shows file sizes (PR #3539)
  • Add a quit button in the dashboard (PR #3004)
  • Display hostname in the terminal when running remotely (PR #3356, PR #3593)
  • Add slides exportation/download to the menu (PR #3287)
  • Add any extra installed nbconvert exporters to the “Download as” menu (PR #3323)
  • Editor: warning when overwriting a file that is modified on disk (PR #2783)
  • Display a warning message if cookies are not enabled (PR #3511)
  • Basic __version__ reporting for extensions (PR #3541)
  • Add NotebookApp.terminals_enabled config option (PR #3478); see the config sketch just after this list
  • Make buffer time between last modified on disk and last modified on last save configurable (PR #3273)
  • Allow binding custom shortcuts for ‘close and halt’ (PR #3314)
  • Add description for ‘Trusted’ notification (PR #3386)
  • Add settings['activity_sources'] (PR #3401)
  • Add an output_updated.OutputArea event (PR #3560)
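For example, the new terminals option mentioned above can be turned off in jupyter_notebook_config.py; a minimal sketch (the value shown is illustrative):

# jupyter_notebook_config.py (create one with: jupyter notebook --generate-config)
c = get_config()

# disable the integrated terminals using the option added in PR #3478
c.NotebookApp.terminals_enabled = False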

Bug fixes:

  • Fixes to improve web accessibility (PR #3507)
  • There is more to do on this! See #1801.
  • Fixed color contrast issue in tree.less (PR #3336)
  • Allow cancelling upload of large files (PR #3373)
  • Don’t clear login cookie on requests without cookie (PR #3380)
  • Don’t trash files on different device to home dir on Linux (PR #3304)
  • Clear waiting asterisks when restarting kernel (PR #3494)
  • Fix output prompt when execution_count missing (PR #3236)
  • Make the ‘changed on disk’ dialog work when displayed twice (PR #3589)
  • Fix going back to root directory with history in notebook list (PR #3411)
  • Allow defining keyboard shortcuts for missing actions (PR #3561)
  • Prevent default on pageup/pagedown when completer is active (PR #3500)
  • Prevent default event handling on new terminal (PR #3497)
  • ConfigManager should not write out default values found in the .d directory (PR #3485)
  • Fix leak of iopub object in activity monitoring (PR #3424)
  • Javascript lint in notebooklist.js (PR #3409)
  • Some Javascript syntax fixes (PR #3294)
  • Convert native for loop to Array.forEach() (PR #3477)
  • Disable cache when downloading nbconvert output (PR #3484)
  • Add missing digestmod arg to HMAC (PR #3399)
  • Log OSErrors failing to create less-critical files during startup (PR #3384)
  • Use powershell on Windows (PR #3379)
  • API spec improvements, API handler improvements (PR #3368)
  • Set notebook to dirty state after change to kernel metadata (PR #3350)
  • Use CSP header to treat served files as belonging to a separate origin (PR #3341)
  • Don’t install gettext into builtins (PR #3330)
  • Add missing import _ (PR #3316, PR #3326)
  • Write notebook.json file atomically (PR #3305)
  • Fix clicking with modifiers, page title updates (PR #3282)
  • Upgrade jQuery to version 2.2 (PR #3428)
  • Upgrade xterm.js to 3.1.0 (PR #3189)
  • Upgrade moment.js to 2.19.3 (PR #3562)
  • Upgrade CodeMirror to 5.35 (PR #3372)
  • “Require” pyzmq>=17 (PR #3586)

Documentation:

  • Documentation updates and organisation (PR #3584)
  • Add section in docs about privacy (PR #3571)
  • Add explanation on how to change the type of a cell to Markdown (PR #3377)
  • Update docs with confd implementation details (PR #3520)
  • Add more information for where jupyter_notebook_config.py is located (PR #3346)
  • Document options to enable nbextensions in specific sections (PR #3525)
  • jQuery attribute selector value MUST be surrounded by quotes (PR #3527)
  • Do not execute special notebooks with nbsphinx (PR #3360)
  • Other minor fixes in PR #3288, PR #3528, PR #3293, PR #3367

Testing:

  • Testing with Selenium & Sauce labs (PR #3321)
  • Selenium utils + markdown rendering tests (PR #3458)
  • Convert insert cell tests to Selenium (PR #3508)
  • Convert prompt numbers tests to Selenium (PR #3554)
  • Convert delete cells tests to Selenium (PR #3465)
  • Convert undelete cell tests to Selenium (PR #3475)
  • More selenium testing utilities (PR #3412)
  • Only check links when build is trigger by Travis Cron job (PR #3493)
  • Fix Appveyor build errors (PR #3430)
  • Undo patches in teardown before attempting to delete files (PR #3459)
  • Get tests running with tornado 5 (PR #3398)
  • Unpin ipykernel version on Travis (PR #3223)

Credits

This release has been a team effort and we would like to thank the following 36 people who contributed:

We look forward to your feedback and contributions!



I Python, You R, We Julia


When we decided to rename part of the IPython project to Jupyter in 2014, we had many good reasons. Our goal was to make (Data)Science and Education better, by providing Free and Open-Source tools that can be used by everyone. The name “Jupyter” is a strong reference to Galileo, who detailed his discovery of the Moons of Jupiter in his astronomical notebooks. The name is also a play on the languages Julia, Python, and R, which are pillars of the modern scientific world. While we ❤️🐍(Love Python), and use it for much of the architecture in Jupyter, we believe that all open-source languages have an important role in scientific and data analysis workflows. We have strived to make Jupyter a platform that treats all open-source languages as first-class citizens.

You may know that Jupyter has several dozen kernels in as many languages, and that you can choose any of them to power the code execution in a single notebook. However, the possibilities for cross-language integration go way beyond this, which I’ll attempt to demonstrate here.

What I’ll describe below has been possible for several years now — since even before the name Jupyter was first mentioned. It relies on the work of many open-source libraries, too many to cite all the authors. It is not the only solution — neither the first, nor the last. RStudio recently blogged about reticulate, which allows you to intertwine Python and R code. BeakerX is another solution that appears to support many languages.

We hope that showing how multiple languages can be used together will help make you more efficient in your work, and that it promotes cooperation across our communities to use the strengths of each language. This article only scratches the surface; you can read in more depth about what you can do and how this works in a notebook I wrote some time ago.

Follow along on Binder

We created Jupyter and Binder to make science more trustworthy and allow results to be replicated. If you doubt what I have written below, or just want to follow along, feel free to try it on your own using Binder — the Docker image is quite big, so it can take a while to launch. In the linked notebook we show a couple of extra languages.

The Tail of Fibonacci

A famous example of recursion in computer science is the Fibonacci series; its ubiquity lets the reader focus not on the sequence itself but on the environment around it. As a reminder, the Fibonacci sequence is defined with its first two terms equal to one, and each subsequent term as the sum of the two preceding terms; i.e. F(1) = 1, F(2) = 1, F(n) = F(n-1) + F(n-2).

We can calculate the first few terms: 1, 1, 2, 3, 5, 8… Note that F(5) = 5 is a fixed point, and take on trust that asymptotically the sequence grows exponentially in n.

Let’s see how one can use many languages to play with Fibonacci.

I, Python

For this exploration we’ll start with Python. It is my language of choice, the one I’m the most familiar with:

def fib(n):
    """
    A simple definition of fibonacci, manually unrolled
    """
    if n < 2:
        return 1
    x, y = 1, 1
    for i in range(n - 2):
        x, y = y, x + y
    return y

We can check that the fib function works correctly.

>>> [fib(i) for i in range(1,10)]
[1, 1, 2, 3, 5, 8, 13, 21, 34]

And plot it:

%matplotlib inline
import numpy as np
X = np.arange(1, 30)
Y = np.array([fib(x) for x in X])
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(X, Y)
ax.set(xlabel='n', ylabel='fib(n)',
       title='The Fibonacci sequence grows fast!')

As you can see, it grows quite quickly; in fact, it’s exponential. Now let’s see how we can check this exponential behavior using multi-language integration.

You R

With the fantastic RPy2 package, we can integrate code seamlessly between Python and R, allowing you to send data back and forth between the two languages. RPy2 will translate R data structures to Python and NumPy, and vice versa.

In addition, RPy2 has extra integration with IPython and provides “Magics” to write inline or multiline R code. Loading the RPy2 extension exposes the %R, %%R, %Rpush and %Rpull commands for writing R.

%load_ext rpy2.ipython

We can use %Rpush to send data to a stateful R process.

%Rpush Y X

and use %%R in order to instruct the R process to run an R cell.

%%R
my_summary = summary(lm(log(Y)~X))
val <- my_summary$coefficients

plot(X, log(Y))
abline(my_summary)

Here we make a linear regression model on log(Y) vs X. As Y is (hopefully) exponential, we should get a nice line. RPy2 provides rich display integration which will nicely display outputs and plots inline in a notebook:

We can of course ask for the linear regression summary:

%%R
my_summary

Which outputs:

Call:
lm(formula = log(Y) ~ X)

Residuals:
      Min        1Q    Median        3Q       Max
-0.183663 -0.013497 -0.004137  0.006046  0.296094

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.775851   0.026173  -29.64   <2e-16 ***
X            0.479757   0.001524  314.84   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06866 on 27 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 9.912e+04 on 1 and 27 DF, p-value: < 2.2e-16

We can also lift the results from R to Python using %Rget:

coefs = %Rget val
y0,k = coefs.T[0:2]
y0,k

Which yields

(-0.77585097534858738, 0.4797570904348315)

Here we saw that RPy2 allows us to pass data back and forth between Python and R; this is incredibly useful for leveraging the strengths of each language. This is a toy example, but you could imagine using various Python libraries to get data from servers, then moving to R for the statistical analysis.

However, sometimes moving data between languages may be too limiting. Let’s see how we can leverage the same mechanism to gain some performance by integrating with a lower-level language.

Let’s C

Python and R are not the most performant languages for pure numerical speed. When a performance improvement is necessary, developers tend to reach for compiled languages like C/C++/Fortran.

Unfortunately, compiled languages generally have a poor interactive experience, and where CPU cycles are gained, human developer time may be lost.

Using magics, we can, as we did for R, include snippets of C, Cython, Fortran, Rust … and many other languages.

import cffi_magic

%%cffi int cfib(int);

int cfib(int n)
{
    int res = 0;
    if (n <= 2){
        res = 1;
    } else {
        res = cfib(n-1) + cfib(n-2);
    }
    return res;
}

We can interactively redefine this function, and it will magically appear in the Python namespace. It works identically to the fib we defined earlier, but is much faster. Note that the Python fib and C cfib timings here are difficult to compare, as the C one is recursive (it behaves in O(exp(n))) while the Python one is hand-unrolled, so it behaves in O(n).
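A rough way to see the difference is with the %timeit magic (a sketch; the exact numbers depend on your machine and on the value of n):

%timeit fib(25)    # hand-unrolled Python loop: O(n) additions
%timeit cfib(25)   # naive recursive C version: O(exp(n)) calls, but each call is cheap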

More technical details can be found in a notebook I wrote earlier, but the same can be done with other languages that call one another, and lines like the following work perfectly:

assert py_fib(cython_fib(c_fib(fortran_fib(rust_fib(5))))) == 5

Julia to bind them all

The last example is a technical marvel that was first developed by Steven Johnson and Fernando Pérez. It relies on starting a Julia and a Python interpreter together, allowing them to share memory. This allows both languages not only to exchange data and functions, but also to manipulate live object references from the other interpreter. Extra integration with IPython via magics allows us to run inline Julia from Python (%julia, %%julia), while Julia macros (@pyimport) allow Python code to be run from within Julia.

Below we’ll show integration with graphing libraries (matplotlib), so let’s set up our environment.

%matplotlib inline
%load_ext julia.magic

We’ll start from within Julia, and import a few python packages:

%julia @pyimport matplotlib.pyplot as plt
%julia @pyimport numpy as np

We now have access – from within Julia – to matplotlib and numpy, and can seamlessly integrate Julia’s native numerical capabilities and functions with our Python kernel.

%%julia                                        
t = linspace(0, 2*pi,1000);
s = sin(3*t + 4*np.cos(2*t));
fig = plt.gcf()
plt.plot(t, s, color="red", linewidth=2.0, linestyle="--", label="sin(3t+4.cos(2t))")

Note that in the above block, t and pi are native Julia; s is computed via sin (Julia), t (Julia), and cos (numpy); fig is a Python object. As the Julia magic provides IPython display integration, the code above displays this nice graph.

We now want to annotate this graph from Python, as the API is more convenient:

import numpy as np
fig = %julia fig
fig.axes[0].plot(X[:6], np.log(Y[:6]), '--', label='fib')
fig.axes[0].set_title('A weird Julia function and Fib')
fig.axes[0].legend()
fig

After passing a reference to fig from Julia to Python, we can annotate it (and plot one of the fib functions we defined earlier in C, Fortran, Rust, etc…)

Here we can see that unlike BeakerX, reticulate, or RPy2, we are actually sharing live objects, and can manipulate them from both languages. But let’s push things a bit further.

The fib function can be defined recursively; let’s have some fun and define a pyfib function in Python that recurses via a jlfib function in Julia, while the jlfib function in Julia recurses using the Python function. We’ll print (J or (P when switching languages:

jlfib = %julia _fib(n, pyfib) = n <= 2 ? 1 : pyfib(n-1, _fib) + pyfib(n-2, _fib)

def pyfib(n, _fib):
    print('(P', end='')
    if n <= 2:
        r = 1
    else:
        print('(J', end='')
        # here we tell julia (_fib) to recurse using Python
        r = _fib(n-1, pyfib) + _fib(n-2, pyfib)
        print(')', end='')
    print(')', end='')
    return r

fibonacci = lambda x: pyfib(x, jlfib)
fibonacci(10)

We can now transparently call the function:

(P(J(P(J(P(J(P(J(P)(P)))(P(J))(P(J))(P)))(P(J(P(J))(P)(P)(P)))(P(J(P(J))(P)(P)(P)))(P(J(P)(P)))))(P(J(P(J(P(J))(P)(P)(P)))(P(J(P)(P)))(P(J(P)(P)))(P(J))))(P(J(P(J(P(J))(P)(P)(P)))(P(J(P)(P)))(P(J(P)(P)))(P(J))))(P(J(P(J(P)(P)))(P(J))(P(J))(P)))))
55

If you are interested in diving deeper into the details, see this post from a couple of years ago with all the actual code.

I hope that this post has convinced you that Jupyter – via the IPython kernel – has deep cross-language integration (and has had it for many years). I also hope it lifted the misconception that in Jupyter “1 kernel == 1 language” or even that “1 notebook == 1 language”. Each of the approaches shown here (as well as reticulate, BeakerX, etc.) has its pros and cons. Use the approach that fits your needs and makes your workflow efficient, regardless of the tool, language, or libraries you use.



JupyterHub 0.9


We are pleased to announce the latest release of JupyterHub. JupyterHub is the multi-user server for Jupyter notebooks, allowing students or researchers to have their own workspace. This release has lots of improvements, especially for stability and performance with large numbers of users.

The biggest change is probably the adoption of asyncio coroutines throughout, in place of tornado coroutines, with improved support for asyncio in general using the native async def syntax. This means that JupyterHub 0.9 requires Python ≥ 3.5 and tornado ≥ 5.0.
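In practice, this means custom authenticators and spawners can be written with native coroutines. A minimal sketch (the class name and the accept-any-password logic are purely illustrative):

from jupyterhub.auth import Authenticator

class DemoAuthenticator(Authenticator):
    async def authenticate(self, handler, data):
        # illustrative only: accept any user that supplies a non-empty password
        if data.get('password'):
            return data['username']
        return None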

There are lots of improvements to the REST API, including more detailed time information and token management.

There are also several improvements to the ability to customize JupyterHub’s HTML templates.

See the changelog for a more detailed list of changes.

Make sure to back up your JupyterHub database prior to upgrading, and then upgrade with pip:

python3 -m pip install --upgrade 'jupyterhub==0.9.*'

or conda:

conda install -c conda-forge jupyterhub=0.9

For Kubernetes users, we will have a 0.7 release of the Helm chart before too long.

Thanks to everyone who has contributed to JupyterHub!



Jupyter Community Workshops


Bloomberg is a long-time partner and supporter of Project Jupyter. Earlier this year Bloomberg announced funding of $120,000 to enable Project Jupyter to host a series of Jupyter Workshops in 2018. These workshops will bring together small groups (approximately 12 to 24 people) of Jupyter community members and core contributors for high-impact strategic work and community engagement on focused topics.

Much of Jupyter’s work is accomplished through remote, online collaboration; yet, over the years, we have found deep value in focused in-person work over a few days. These in-person events are particularly useful for tackling challenging development and design projects, growing the community of contributors, and strengthening collaborations.

We are now soliciting proposals for Jupyter Workshops for 2018. We are particularly interested in workshops that explore and address topics of strategic importance for the future of Jupyter. We expect the workshops to involve 1–2 dozen participants over a 2–3 day period, and to have a total Jupyter-funded budget of approximately $10,000 to $20,000, which may help cover expenses such as travel, lodging, meals, or event space. It is our intent for the workshops to include core Jupyter contributors as well as stakeholders, current contributors, and potential contributors from the larger Jupyter ecosystem. While not the primary focus, it would be highly beneficial to couple each workshop with broader community outreach events, such as sprints, talks, or tutorials, at local meetings or conferences.

An excellent example of a successful, sponsored workshop is the Jupyter Widgets Workshop organized by Sylvain Corlay in February 2018. This workshop brought together the core developers of Jupyter Widgets and JupyterLab, community members developing libraries on top of widgets and JupyterLab, and new collaborators wanting to learn how to contribute to development, design, and documentation. After the workshop, a participant gave a JupyterLab introduction at the PyParis Meetup.

Another workshop idea would be to bring together the core maintainers of the Jupyter Kernel Message Protocol, developers of different third-party Jupyter kernels, and other interested collaborators to chart the future of the protocol: support for more programming languages, debugging, and other enhancements. While the maintainers of different Jupyter subprojects are likely to propose workshops, we are hoping that others in the community will propose and organize workshops as well.

The proposal process for these Jupyter Workshops is being managed by the Jupyter Operations Manager, Ana Ruvalcaba (jupyterops@gmail.com), and the Steering Council. Applications are due by August 1, 2018 and our vision is that these events would occur anywhere from September to December of 2018. We encourage you to submit a proposal and complete the following Google Form.

This initiative is organized by Jason Grout, Paul Ivanov, Brian Granger, and Ana Ruvalcaba.



Security fix for Jupyter Notebook


We have just released Jupyter Notebook 5.6.0. This release fixes a vulnerability that could allow a maliciously crafted notebook to execute JavaScript when it is opened, bypassing the trusted-notebook mechanism.

We recommend updating the notebook immediately, via pip:

pip install 'notebook>=5.6.0'

or conda:

conda install 'notebook>=5.6.0'

Affected versions: all releases prior to 5.6.0

JupyterLab users are affected, independent of the version of JupyterLab itself. Upgrading the notebook package to 5.6.0 resolves the issue for users of both JupyterLab and the classic notebook.

A CVE has been requested for the vulnerability. Release notes for 5.6.0 and this post will be updated as the CVE is assigned. More details of the vulnerability will be released in 30 days, on August 16, 2018.

Security reports for Jupyter are greatly appreciated. You can report security issues to security@ipython.org.

Thanks to Jonathan Kamens for reporting this issue to the security list.



Deadline Extension: Jupyter Community Workshops


We recently announced a call for proposals for Jupyter Community Workshops. These workshops are intended to bring together small groups of Jupyter community members (1–2 dozen people) and core contributors for high-impact strategic work and community engagement on focused topics related to Jupyter’s core mission.

We are extending the deadline for the initial proposals to Monday, August 6, 8am Pacific time (3pm UTC). Please submit your proposals with this form.

For more information related to expectations of organizers and other logistics, please read our information document.

Proposal Process Highlights:

  1. Submit initial proposal using this form by Monday, August 6, 8am Pacific time (3pm UTC).
  2. Initial steering council review (up to a week). Proposal goes to Steering Council for initial review and feedback. Proposal is either approved or declined.
  3. Budget development (up to two weeks). Operations Manager and workshop organizer will work together to develop a detailed budget, event plan, and proposed list of participants.
  4. Final steering council review (up to a week). Proposal presented for final approval to steering council, including final budget, event details, and an estimate of the potential impact of the event.

Several possible workshop ideas are mentioned in the original announcement for proposals. Another idea that has come up in conversation is a workshop focused on using Jupyter in a broad discipline or setting which would have high-impact strategic importance for Jupyter.

Feel free to email jupyterops@gmail.com with questions about the proposal process or feedback on potential proposals.



Synopsis: JupyterCon 2018 Education Track

Barba keynoting at JupyterCon 2017. Credit: O’Reilly Media.

I’m on the train to NY, giddy in anticipation of the Jupyter community celebrating together for the second year. It was a privilege for me to be invited to keynote last year, at the inaugural JupyterCon. This year, I am part of the Program Committee, co-chairing the education track with Robert Talbert. Let me tell you about the impressive and inspiring conference line-up for Jupyter in education!

The main conference lists 11 talks on the education track, listed below. In addition to these, we have a panel (2:40–3:20pm Thursday), and an unconference-style session on Thursday at 5 PM.

The panel is titled “The Future of Jupyter in Education,” and features the following panelists:

Each panelist will make a 2–3 min statement to get people thinking, and then it’s Q&A with the audience. We’ve asked the panelists to jot down some ideas to share with us, based on the following questions:

What do you see in the future of Jupyter if it is to fulfill its potential for teaching and learning?
What do you want from Jupyter as its adoption in education grows?

If you are interested in the future of Jupyter in education, please come join the conversation at the panel — it will be energizing!

One thing we don’t want is for the discussion in this panel to drift towards the DevOps challenges of adopting the tools (like JupyterHub and nbgrader). The Project Jupyter team is very aware that DevOps is a pain point for the educational uses of Jupyter. And they are working on it!

Let’s put our minds together to build a vision for the ecosystem, imagine how institutions might collaborate (e.g., federated solution in Canada), what could be the role of the private sector (i.e., paid options, freemium models), or the nonprofit sector (e.g., NumFOCUS) — but refrain from going into the weeds of concrete technical details (on this occasion).

We want to steer the conversation towards the learning concerns more than the edtech concerns. For example, now that many educators are using Jupyter, can we look into what works, and are we using research-based pedagogical strategies? What are these (e.g., chunking, scaffolding), and can we distill some “best practices”?

For the unconference-style session, the slate is clean. We’ll bring sharpies and big post-it notes, and be ready to live tweet!

Thursday: 5 sessions

11:05am–11:45am
Flipped learning with Jupyter: Experiences, best practices, and supporting research, Beekman/Sutton North | Lorena Barba (George Washington University), Robert Talbert (Grand Valley State University)

11:55am–12:35pm
JupyterHub for domain-focused integrated learning modules, Beekman/Sutton North | Mariah Rogers (UC Berkeley Division of Data Sciences), Ronald Walker (UC Berkeley Division of Data Sciences), Julian Kudszus (Yelp)

1:50pm–2:30pm
Jupyter for every high schooler, Beekman/Sutton North | Rob Newton (Trinity School)

2:40pm–3:20pm
Real-time collaboration with Jupyter notebooks using CoCalc, Murray Hill | William Stein (SageMath, Inc. | University of Washington)

4:10pm–4:50pm
Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom, Beekman/Sutton North | Carol Willing (Cal Poly San Luis Obispo), Jessica Forde (Jupyter), Erik Sundell (IT-Gymnasiet Uppsala)

Friday: 6 sessions

11:05am–11:45am
Data science in US and Canadian higher education, Beekman/Sutton North | Laura Noren (NYU Center for Data Science)

11:55am–12:35pm
I don’t like notebooks, Nassau | Joel Grus (Allen Institute for Artificial Intelligence)

11:55am–12:35pm
The Jupyter Notebook as a transparent way to document machine learning model development: A case study from a US defense agency, Concourse A: Business Summit | Catherine Ordun (Booz Allen Hamilton)

1:50pm–2:30pm
Jupyter graduates, Beekman/Sutton North | Douglas Blank (Bryn Mawr College), Nicole Petrozzo (Bryn Mawr College)

2:40pm–3:20pm
Reproducible education: What teaching can learn from open science practices, Beekman/Sutton North | Elizabeth Wickes (School of Information Sciences, University of Illinois at Urbana-Champaign)

5:00pm–5:40pm
Current RISE candies and its evolution into the future, Beekman/Sutton North | Damián Avila (Anaconda, Inc.)

If I missed an education-track session in my synopsis, let me know and I’ll add it!




Introducing Jupyter Enterprise Gateway


by Luciano Resende, Kevin Bates, Alan Chin

Yesterday, the Jupyter Steering Council voted to make Jupyter Enterprise Gateway a top-level Jupyter Project. I want to thank everyone for their contributions so far — code from my teammates at IBM and the community in general; advice from the Jupyter development team and mentors; and questions, issues, and requirements from end users.

As we become an official Jupyter project, I would like to take the opportunity to give an update on the project’s progress during our incubation period.

What is Jupyter Enterprise Gateway?

Jupyter Enterprise Gateway enables Jupyter Notebook to launch remote kernels in a distributed cluster, including Apache Spark managed by YARN, IBM Spectrum Conductor or Kubernetes.

Although Enterprise Gateway is mostly kernel agnostic, it provides out of the box configuration examples for the following kernels:

  • Python using IPython kernel
  • R using IRkernel
  • Scala using Apache Toree kernel

Jupyter Enterprise Gateway does not manage multiple Jupyter Notebook deployments; for that, you should look at JupyterHub. Having said that, Enterprise Gateway can enable JupyterHub to launch remote kernels as individual Kubernetes pods, providing better resource allocation and better environment management, as each pod can be based on a different image (e.g. TensorFlow, Anaconda, etc.).
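To give a sense of the moving parts, a minimal standalone setup looks roughly like this (a sketch based on the project documentation; adjust the options to your environment):

pip install --upgrade jupyter_enterprise_gateway
jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0

Notebook servers are then configured to delegate kernel launching to the gateway rather than starting kernels locally.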

Supported Platforms

Jupyter Enterprise Gateway currently enables remote kernels in the following platforms:

Distributed Kernels in Apache Spark

Jupyter Enterprise Gateway leverages different resource managers to enable distributed kernels in Apache Spark clusters. One example shown below describes kernels being launched in YARN cluster mode across all nodes of a cluster.

Jupyter Enterprise Gateway leverages Apache Spark resource managers to distribute kernels

Note that Jupyter Enterprise Gateway also provides other value-added capabilities, such as enhanced security and multiuser support with user impersonation.

Jupyter Enterprise Gateway provides Enhanced Security and Multiuser support with user Impersonation

Distributed Kernels in Kubernetes

Jupyter Enterprise Gateway support for Kubernetes enables decoupling the Jupyter Notebook server and its kernels into multiple pods. This enables running Notebook server pods with only the minimally necessary resources, based on the workload being processed.

Jupyter Enterprise Gateway enables remote kernels on a Kubernetes cluster

Jupyter Enterprise Gateway and JupyterHub

JupyterHub is a multi-user server that manages and proxies multiple instances of the single-user Jupyter notebook server. Particularly in a Kubernetes environment, Jupyter Enterprise Gateway can enable JupyterHub to launch remote kernels as individual Kubernetes pods, providing better resource allocation and better environment management, as each pod can be based on a different image (e.g. TensorFlow, Anaconda, etc.). This has proven to be highly desirable, particularly when working on deep learning notebooks.

JupyterHub and Jupyter Enterprise Gateway together in a Kubernetes cluster

Some project metrics

The following stats have been collected from the Jupyter Enterprise Gateway GitHub repository during the incubation period:

- 10 releases
- 12 individual contributors
- 90 Stars
- 34 Forks

Source code, documentation, and other community resources

The Jupyter Enterprise Gateway community provides multiple resources that both users and contributors can use:

Source Code available at GitHub
https://github.com/jupyter/enterprise_gateway

Documentation available at ReadTheDocs
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Automated builds available at Travis.CI
https://travis-ci.org/jupyter/enterprise_gateway

Releases available at PyPi.org and Conda Forge
https://pypi.org/project/jupyter_enterprise_gateway/
https://github.com/conda-forge/jupyter_enterprise_gateway-feedstock

Related Docker Images available at Elyra organization at DockerHub
https://hub.docker.com/u/elyra/dashboard/

What’s next?

We are eager to build an even greater community around the project, and to tailor the project roadmap based on community advice.

Currently, we are busy working on advancing our Kubernetes support and integration with JupyterHub.

As always, we welcome questions, comments, and suggestions from users and the community in general.



Jupyter QtConsole 4.4


We’re pleased to announce the release of QtConsole 4.4, along with a follow-on bugfix update, 4.4.1. QtConsole, which melds the feel of a lightweight terminal and the functionality of an advanced GUI editor, is a rich interface to Jupyter kernels that can be used standalone or built in to an IDE like Spyder (with which QtConsole is a joint collaboration). From user-selectable syntax highlighting themes and expanded integration with external clients like your favorite editors and IDEs, to productivity boosts like block (un)indent, cell-specific Select-All, intelligent Ctrl-Backspace and Ctrl-Delete, and Ctrl-D to send an EOT byte, there’s plenty new for everyone to like.

Read on for a visual summary of the key changes, and check out the Changelog and GitHub Milestone for more details. The new version is available on PyPI and Conda, if you want to get right to trying it out for yourself. If running under conda, you can use Anaconda Navigator, or execute the following (from the Anaconda Prompt on Windows, or your terminal/command line otherwise, after first activate-ing the environment with the QtConsole package you want to update):

conda update qtconsole

Or, if on a pip-only install, after activate-ing the appropriate virtualenv/venv, run:

pip install --upgrade qtconsole

Perhaps the most exciting improvements are the enhancements to QtConsole’s ability to integrate with external editors and IDEs. Thanks to the Jupyter client-server architecture, you can integrate QtConsole with any editor you want, allowing you to assemble your very own Spyder/MATLAB-like IDE out of the components of your choosing. Marijn van Vliet, the author of the changes, demonstrates a few of the possibilities in a short video:

Thanks to the new changes, you can now print beautifully-formatted input and output from other clients, like your editor, with proper indentation, customizable syntax highlighting, configurable prompts with an incrementing prompt number, and more. If you’re interested in setting up your own custom environment, check out Marijn’s blog post on the subject for more details.

QtConsole‘s syntax-highlighted output with a user-selected theme, now available with external editors
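For example, you can attach a QtConsole to a kernel that your editor (or another client) has already started (a sketch; the connection-file name is illustrative):

jupyter qtconsole --existing                    # attach to the most recently started kernel
jupyter qtconsole --existing kernel-1234.json   # or name a specific connection file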

Complementary to the previous enhancements, QtConsole now allows you to change the syntax highlighting theme and its overall color scheme (light/dark), and includes a wide variety of options to choose from. From popular styles like Monokai, emacs, vim and VS to more esoteric choices, the list should cover a wide variety of tastes. You can easily switch between them under View→Syntax Style.

Demonstration of how to select a syntax style in QtConsole

When working with multi-line input, QtConsole now offers a block indent/unindent feature, allowing an entire selection to be indented or unindented at once — even across multiple levels of indents. Just highlight the lines you’d like to change, and press TAB to indent or Shift-TAB to unindent.

The Select-All feature is now cell-aware; pressing Ctrl-Shift-A once now only selects the text in the current In [ ]: block (without the prompt), making it easy to just grab the current commands or code you’ve already typed. Pressing it again will trigger the old behavior, selecting all the input and output text since the start of the session, and tapping it a third time will cycle back to just selecting the current text.

(Left) Old Select-All behavior, selecting all console text; (Right) New behavior, selecting only current input

Ctrl-Backspace and Ctrl-Delete are also more intelligent: no longer behaving too greedily across line boundaries, they first consume all the whitespace up to the next non-whitespace character before removing the characters themselves. This lets you easily fix indents, trim leading or trailing whitespace, remove unneeded line breaks and more in one or a few taps, without unintentionally deleting anything important. Similarly, tapping the right arrow at the end of a line now goes to the beginning of the next line’s text, rather than jumping all the way to the end of the cell.

You can now send an EOT (“End of Transmission”) character to a console awaiting input by pressing Ctrl-D when the current input is empty; otherwise, Ctrl-D works as an alternate shortcut for Delete and Close Tab as before. This allows you to signal programs running in the console to terminate, like ipdb in its interactive mode, which would otherwise require a restart.

Demonstration using the new EOT insertion to exit ipdb interactive mode in QtConsole

Finally, there are a number of smaller changes and bugfixes included, like allowing for copying input/output when the prompt is included, making completion not block the console, clarifying the documentation, and more. QtConsole 4.4.1 officially drops support for the ancient Python 3.3 and expands test coverage to Python 3.6, as well as fixing a couple more minor bugs.

We hope everyone loves what the new release of QtConsole has to offer! Thanks to all the contributors to this release and to the Spyder team, who help maintain the project. If you want to know more about using QtConsole, check out our documentation, and if you find any bugs or have specific feature requests, let us know on our GitHub site. Enjoy!



IPython 7.0, Async REPL


Today we are pleased to announce the release of IPython 7.0, the powerful Python interactive shell that goes above and beyond the default Python REPL with advanced tab completion, syntactic coloration, and more. It’s the Jupyter kernel for Python used by millions of users, hopefully including you. This is the second major release of IPython since we stopped supporting Python 2.

Not having to support Python 2 allowed us to make full use of new Python 3 features and bring never-before-seen capabilities to a Python console. We still encourage library authors and users to look at the Python 3 Statement to learn about the end of life of Python 2 and how to stop supporting Python 2 without breaking installation for Python 2 end users.

As developers and maintainers of IPython, being able to develop for a single version of Python saved us a great deal of time. Avoiding conditional imports, being able to rely on type annotations, and making use of newly available Python APIs are some of the things that made us more productive. Especially as most of the work on IPython is done by volunteers on nights and weekends, with only a couple of minutes here and there, this often made the difference between a patch reaching completion and the contributor moving on to other pastures.

One of the core features we focused on for this release is the ability to (ab)use the async and await syntax available in Python 3.5+. There are of course many other improvements in this release, which you can read about in the what’s new.

Demo of awaiting coroutine in IPython 7.0

TL;DR: You can now use async/await at the top level in the IPython terminal and in the notebook; it should — in most cases — “just work”. Update IPython to version 7+ and IPykernel to version 5+, and you’re off to the races.

See how to install IPython by reading the “what’s new”.

The recipes are currently building on conda-forge and should be available soon. For the time being you can install it via pip:

$ pip install ipython ipykernel --upgrade

A Primer on concurrency

You may have heard about async/await, threads, concurrency, preemptive scheduling and cooperative scheduling without really understanding what it is all about. If you are not familiar with all of the above terms, the hype may be confusing, so let’s talk about concurrency at a really high level.

Typically when your computer needs to execute many tasks, it will switch between them really fast, so from the human point of view it looks like everything is being processed at the same time. There are two main ways of doing so under the hood: Preemptive Scheduling, and Cooperative Scheduling.

With preemptive scheduling, task switches can happen at any time. For example, while writing this blog post, I could stop in the middle of a word to start writing an email, which will itself be interrupted to check Gitter/Slack, before coming back, writing 5 words and stopping to get dinner.

TL;DR: Concurrency (from Geek And Poke, 2009)

With cooperative scheduling, task switches can happen only at agreed spots. The term co-operative comes from the fact that tasks need to co-operate for the whole process to function. If a task decides never to take a break to let you do something else, the illusion of many tasks being completed at once disappears.

Each approach has its own advantages and drawbacks, and we will not focus on these. Let’s just say that with cooperative scheduling, async/await lets you mark the areas where interruption is allowed to occur.

Moreover, the async/await syntax allows cooperative scheduling in Python in a way that lets you write code that looks synchronous (without task switches), while actually being interruptible from the point of view of the computer. It also keeps programmers from having to worry about global state changing under their feet, as this can occur only in the vicinity of await keywords.

When going to a restaurant, social conventions (and common sense) tell us when and how these interactions can or cannot be interrupted, but programming languages need markers when using cooperative scheduling. These are the async and await keywords in Python. async marks a function that may be interrupted; await is required to call async functions (aka coroutines) and marks a point where the task can be switched.

If you want to learn more we strongly recommend reading the Trio Tutorial Primer on async programming.

Async in the Python world

In the current Python ecosystem, packages tend to standardize around AsyncIO, provided in the Python standard library. AsyncIO can sometimes be judged as complex even by well-known developers; this is in part due to the necessity of supporting other, older asynchronous projects like Twisted or Tornado, but it is also the source of much of its power: one event loop to rule them all.

Running a single async task requires you to learn about AsyncIO and to write a non-negligible amount of boilerplate code just to fetch a single result. This can be especially cumbersome when doing interactive exploration, and will likely keep users from experimenting with AsyncIO code.

How to run a single async task in the Python REPL without async integration.
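For reference, a minimal sketch of that boilerplate (using plain asyncio outside of any REPL integration; the get_data coroutine is only an illustration) might look like this:

import asyncio

async def get_data():
    # stand-in for some awaitable work, e.g. an HTTP request
    await asyncio.sleep(1)
    return 42

# Without REPL integration, you have to obtain an event loop and drive
# the coroutine to completion yourself, just to get one value back.
loop = asyncio.get_event_loop()
result = loop.run_until_complete(get_data())
print(result)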

As Raymond Hettinger would say (slamming hand on podium): “There must be a better way”.

IPython AsyncIO Integration

Thanks to a multi-month effort (this work actually started close to 2 years ago), and the work of many talented people, you can now directly await code in the REPL and IPython will do “the right thing”.

Awaiting AsyncIO code should now automagically work.

With the new integration, you don’t have to import or learn about asyncio, deal with the loop yourself, or wrap your task in its own function. You are now able to just focus on the business logic and move along.

The only thing you need to remember is: If it is an async function you need to await it.
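For example, here is a sketch of what you might type at an IPython 7+ prompt or in a notebook cell (the coroutine name is illustrative; note that the same lines would be a syntax error in a plain Python script):

import asyncio

async def get_data():
    await asyncio.sleep(1)
    return 42

# At the IPython 7+ prompt this line is valid as-is: no explicit event loop
# handling, no wrapper function -- IPython runs the coroutine for you.
result = await get_data()
result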

We hope that this will free users to experiment and play with asynchronous programming. Of course this will not magically make your code faster or run in parallel; it simply makes it easier to write and reason about.

Other Async Libraries (aka: curio and trio integration)

The addition of the async and await keywords to Python did more than simplify the use of asynchronous programming and drive standardization around asyncio; it also allowed experimentation with new paradigms for asynchronous libraries. David Beazley created Curio, and Nathaniel Smith created Trio; both explore new ways to write asynchronous programs and how async, await and coroutines could be used when starting from a blank slate. The introduction in the Trio documentation and the posts on the problems it attempts to solve [1, 2, 3, 4, 5, 6] are highly recommended reading, with varying levels of technicality.

Interactive use of libraries is key to gaining insight and intuition into how a system works, and intuition is critical to rapid prototyping, development and the creation of higher levels of abstraction. It was natural for us to build support for Curio, Trio (and potentially other new async libraries) into IPython.

You can set up IPython to run async code via Curio or Trio, and experiment with or write production code using these libraries. To do so, use the %autoawait magic and tell it which library to use.

Defining an asynchronous function and spawning multiple concurrent tasks in IPython using Trio.
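A rough sketch of what such a session might look like (illustrative only, typed at the IPython prompt and assuming the trio package is installed):

%autoawait trio

import trio

async def say_after(delay, message):
    await trio.sleep(delay)
    print(message)

async def parent():
    # spawn two concurrent tasks under a nursery; both finish in about a second
    async with trio.open_nursery() as nursery:
        nursery.start_soon(say_after, 1, "task one")
        nursery.start_soon(say_after, 1, "task two")

# top-level await is now driven by Trio instead of asyncio
await parent()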

As you can see, the code looks really natural, and it is easy to forget that the above snippet would usually be a syntax error in Python or in older versions of IPython. The astute reader and IPython expert will have suggested using the %%time cell magic instead of timing things manually, though a couple of magics still need updates to properly handle async code. We look forward to your contributions on this front, and are excited to see what you can come up with.

Async in Notebooks (and other Jupyter Clients)

If you are a Jupyter user, you most likely use a Notebook interface, and interact with IPython via the ipykernel package.

Using AsyncIO in nteract desktop works out of the box with newer IPython and IPykernel

We’ve been working hard on making async code work in a notebook when using ipykernel. While most of the heavy lifting was done in IPython, the work in IPykernel was non-negligible, and required accommodating a number of use cases, not all of which work yet. You need to update both IPython to version 7.0+ and ipykernel to version 5.0+ for async to be available. If you are using pip: $ pip install ipython ipykernel --upgrade. As for conda, the packages should be available on conda-forge soon. With these new releases, async will work with all the frontends that support the Jupyter protocol, including the classic Notebook, JupyterLab, Hydrogen, nteract desktop, and nteract web. By default, code will run in the existing asyncio/tornado loop that runs the kernel. Integration with Trio and Curio is still available, but their tasks will not be interleaved with the asyncio ones — at least not yet. We welcome work on this front.

Submitting background tasks still requires you to access the asyncio event loop, and we are still looking for contributions on this front as well, to make it even easier to run async code.
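In the meantime, a rough sketch of scheduling a background task from a notebook cell (assuming IPykernel 5+, where the kernel itself runs on an asyncio event loop; the job shown is hypothetical) could look like this:

import asyncio

async def long_running_job():
    await asyncio.sleep(30)
    print("background job finished")

# The kernel already runs inside an asyncio event loop, so we can schedule
# work on that loop without blocking the current cell.
loop = asyncio.get_event_loop()
task = loop.create_task(long_running_job())

# In a later cell you can inspect or cancel it: task.done(), task.cancel()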

There are still some questions about how to handle nested asyncio event loops. It is indeed usually impossible to run nested event loops; in the case of asyncio, trying to do so raises a RuntimeError. Since the kernel is already running in an asyncio event loop, calling loop.run_until_complete (directly or indirectly) and the like is not possible. There are discussions about using libraries like nest_asyncio, as pointed out in this comment, but until those are more battle-tested we do not want to commit to a default solution in the core of IPython, and would rather let the ecosystem develop.
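If you want to experiment with this yourself (nest_asyncio is a third-party package, not something IPython enables by default), the usage is roughly:

import asyncio
import nest_asyncio  # third-party package: pip install nest_asyncio

# Patch asyncio so that run_until_complete can be called even though the
# kernel's event loop is already running.
nest_asyncio.apply()

async def compute():
    await asyncio.sleep(1)
    return "done"

result = asyncio.get_event_loop().run_until_complete(compute())
print(result)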

Future improvements

As far as we know, this is the first async-aware Python REPL, and libraries like Trio and Curio are still young, so there are still a number of use cases we have not even thought about yet! We encourage you to come forward and talk about your use cases, what you tried, and what did not work. There are also a number of new features to implement (making magics work with async, tab completion, background tasks) for which we would welcome new contributors.


IPython 7.0, Async REPL was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

JupyterDay in the Triangle


We’re excited to announce the next JupyterDay event! Please join us for JupyterDay in the Triangle on November 13, 2018 at the Carolina Club on the University of North Carolina at Chapel Hill campus. The event is hosted by the UNC University Libraries with additional funding from our generous sponsors.

JupyterDay in the Triangle will be an engaging mix of talks, discussions, and birds of a feather sessions on topics spanning the full ecosystem of Jupyter. The theme of the event focuses on the work of Early Career Researchers & Data Scientists. We are hoping for contributions from this group, but are open to all proposals. We are committed to creating a welcoming environment that fosters diversity and inclusion in the broad spectrum of projects and ideas that will be discussed.

The Day’s Agenda

  • 8:00am — 9:00am: Check-in, network, and enjoy breakfast
  • 9:00am — 9:30am: Introductions & logistics
  • 9:30am — 10:30am: Keynote with Q&A
  • 10:30am — 10:45am: Relax, it’s break time
  • 10:45am — 12:00pm: Two brief 25 minute talks
  • 12:00pm — 1:00pm: Lunchtime
  • 1:00pm — 2:00pm: Two brief 25 minute talks
  • 2:00pm — 2:15pm: Relax, it’s time for some PM snacks and a break
  • 2:15pm — 3:30pm: Lightning talks from attendees, 5 minutes each
  • 3:30pm — 5:00pm: Birds of a Feather discussions on topics from the community

Submit a Proposal

We are accepting proposals for 25 minute talks, 5 minute talks, and birds of a feather (BoF) sessions until October 12, 2018. Any Jupyter-related topic you think would be of interest to attendees will be of interest to us. Selected speakers and BoF session leads will receive complimentary admission.

Click here to submit your proposal

Register to Attend

Registration to attend Jupyter Day in the Triangle is now open. You can purchase student ($10) or professional tickets ($25 early bird, $30 regular) on our EventBrite page.

Click here to register

Help Sponsor

If your organization is a fan of Project Jupyter and would like to give back to the community, we are looking for contributors and sponsors. Please contact us for further information.

For More Information

Please visit the event website for up-to-date information about the agenda, confirmed speakers, venue, transportation, lodging, etc.


JupyterDay in the Triangle was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to Deploy JupyterHub with Kubernetes on OpenStack


Deploying JupyterHub with Kubernetes on OpenStack

Jupyter is now widely used for teaching and research. The use of Kubernetes for deploying JupyterHub has enabled reliable setups that scale to thousands of users.
There are many cloud computing vendors (Google, Amazon, …) and the first attempts to use JupyterHub with Kubernetes were based on them. But relying on vendor clouds increases the risk of vendor lock-in.
In addition, there are many pre-existing academic clouds managed by people with a high level of expertise and a thorough knowledge of their infrastructure and associated tools. These are often more cost-effective for research and education. Could we build upon these academic clouds to provide scalable and high-quality infrastructure for education and research?
In this post, we will focus on how to deploy JupyterHub with Kubernetes on OpenStack. This is a first attempt at building academic cloud computing in France.

This post is split into two parts (see links below).

Why deploy JupyterHub on OpenStack is a high-level description of our problem and our steps to solve it. It explains why we want to deploy a JupyterHub on OpenStack, what difficulties we encountered, and what we want to do in the near future.

A technical guide to deploying JupyterHub on OpenStack is an in-depth guide that you may follow in order to deploy your own JupyterHub on Kubernetes on OpenStack. It’s designed for any person interested in how to replicate our deployment on their own infrastructure.

Our story, our difficulties and our plans

Why deploy JupyterHub on OpenStack?

JupyterHub, the multi-user Jupyter server, has been actively developed since 2014 and has seen a rapidly growing adoption in the past year.

You may know about Zero to JupyterHub, which provides step-by-step instructions for installing JupyterHub using a vendor-managed Kubernetes cluster. In the guide, you can also find how to set up a Kubernetes cluster on many vendor clouds such as AWS, Azure, and more recently on OpenShift. But what about cloud infrastructures based on open-source software, such as OpenStack? While cloud vendors often provide you with many tools that make your life easier, OpenStack requires more explicit configuration and setup.

Earlier this year, we set up a working group across several French universities to explore how to easily set up JupyterHub for teaching and research on our academic cloud infrastructures. It turns out that the technology used across these academic clouds is OpenStack. One of the objectives of this working group is to make deploying JupyterHub on OpenStack just as easy as following Zero to JupyterHub on vendor infrastructure.

Note: We are not the first to work on this problem, and we should also mention the work done in Canada through Syzygy.ca which is a project of PIMS, Compute Canada, and Cybera. They have developed their own deployment tools using terraform and ansible scripts. Our approach differs in that while we use the same technological stack, we prefer not to build a custom deployment tool that we would need to maintain over time.

Issues we encountered

We started a deployment using Kubespray in January of this year and have had a bumpy path since then. To begin, we looked at what already existed in the OpenStack world and came across Kubespray, which offers a lot of convenience and flexibility when you want to deploy a Kubernetes cluster. An interesting fact about Kubespray is that it is not dedicated to OpenStack infrastructures, so you should be able to follow the same procedure for other deployments, such as a bare-metal cluster.

Using Kubespray, we very quickly had a Kubernetes cluster on OpenStack. However, we ran into network problems and would lose network packets, which made the JupyterHub completely unusable. It took us a long time to realize that we had MTU issues and even longer to solve them. To make things harder, we were using a production platform, which made it very difficult to update. We finally solved the problem by using a test platform where we could have more freedom.

In Kubespray, there are various CNIs (Container Network Interfaces) and one of them (weave) lets you modify the MTU. We tried to configure it carefully on the production platform, but we kept facing the same problem. By trying a new version of OpenStack on the test platform, we were able to solve the problem, which means that something had also gone wrong with the LoadBalancer. For more explanation, see the MTU section in the technical description below.

We thought we could deploy JupyterHub with Kubernetes on OpenStack in a few weeks but, as you can see, that’s not what happened at all. That’s why it was important for us to share our experience, in the hope that it makes the process easier for others. In the last section, we’ll cover more of the technical details of our deployment.

What’s next?

For us, the installation of JupyterHub on OpenStack was just the first step of a long journey. We are now able to offer our researchers and students a JupyterHub, but we want more. Here’s a short wish-list for our deployments.

On-demand environments. Imagine offering researchers and teachers an even more flexible platform where they can create their work environments and distribute them without needing to go through central IT for the installation of their packages. As you may have guessed, this is why we are interested in what BinderHub has to offer.

The steps described above also work for the installation of BinderHub. We deployed a BinderHub on OpenStack alongside a DockerHub registry. Kubespray also offers the possibility to deploy a private registry and we would like to test it with BinderHub.

Cluster monitoring. It would also be great to have monitoring of the Kubernetes cluster. This would allow us to inspect the usage rates and resources available on the deployment. Kubespray’s roadmap includes adding Grafana and Prometheus installations.

Authentication for BinderHub. Currently BinderHub does not support authentication for users. However, note that a recent pull request on this subject was merged in BinderHub (see https://github.com/jupyterhub/binderhub/pull/666).

Persistent storage in BinderHub. It is also currently not possible to persist storage across BinderHub sessions. Once authentication is possible in BinderHub, we’d also like to connect user accounts to their storage so that they can keep their work over time. This will also require being able to mount the home directory of each user.

We will work on all these items in the coming months.

The Technical Details

This part details the setup of a Kubernetes cluster and JupyterHub using a bare OpenStack infrastructure. To make it as reproducible as possible, we will start by listing the versions we used.

  • OpenStack: Pike
  • Kubespray: commit 3632290
  • Kubernetes: 1.11.3
  • Helm: 2.9.1
  • JupyterHub: 0.7.0

Now that we’ve described the components and the versions used, let’s start deploying our JupyterHub on OpenStack!

The deployment steps are the following:

  • Connect to our OpenStack infrastructure
  • Download Kubespray
  • Create your infrastructure using terraform
  • Deploy your Kubernetes cluster using ansible
  • Deploy your JupyterHub using Helm chart
  • Enjoy!

Connect to OpenStack

Kubespray needs access to your OpenStack infrastructure in order to create all the instances needed for your Kubernetes cluster, using the OpenStack CLI (Command-Line Interface). When you log in to your OpenStack dashboard, you can download all the environment variables needed to use the CLI.

We chose to download the OpenStack RC File V3. You should obtain something like this:

Note that we’ve removed the lines that prompt for your password each time you use the CLI, and set the password directly so it never asks again. You also have to provide OS_CLOUD and OS_CACERT (even if you don’t have a certificate to access your OpenStack infrastructure, you must provide one, but you can keep it blank).

Now you can install the OpenStack CLI from the command line. We’ll show two ways to do this below:

  • with virtualenv
virtualenv ~/openstack
source ~/openstack/bin/activate
pip install python-openstackclient
  • with conda
conda create -n openstack python=3.6
source activate openstack
pip install python-openstackclient

Next, source your rc file to activate it

source rc_file

and test your connection

openstack project list

You should be able to see your projects listed.

Once you have access, you will need some information in order to use terraform with Kubespray. You should find the following things (we have highlighted them in the images below):

  1. The name of the image you want to deploy. To list it, run the following command:
openstack image list
List of OpenStack images

2. The ID of the flavor describing the type of machine you want to deploy (the flavor must be a UUID and not an integer ID). To find it, run this command:

openstack flavor list
List of OpenStack flavors

Once you’ve got this information, it’s time to install Kubespray.

Install Kubespray

Because Kubespray is simply a GitHub repository, we don’t “install” it in the traditional sense; we only clone the repository to our machine. Since Kubespray is a project that evolves quickly, we’ll list the commit we used for this post. Run the following commands to get Kubespray:

git clone https://github.com/kubernetes-incubator/kubespray.git
cd kubespray
git checkout 3632290

Next, prepare all the files describing your Kubernetes cluster. We’ll follow the documentation given by Kubespray and just change some flags. We encourage you to follow the procedure described below, as the documentation seems to have some errors.

Kubespray uses terraform and ansible to deploy your Kubernetes cluster. ansible needs an inventory file which describes your cluster in order to execute the playbook roles on it. Kubespray provides a skeleton dedicated to the OpenStack platform to provision your cluster using terraform and create the inventory file for ansible accordingly. To use the skeleton provided by Kubespray, the steps are the following:

cp -LRp contrib/terraform/openstack/sample-inventory inventory/jhub
cd inventory/jhub
ln -s ../../contrib/terraform/openstack/hosts

jhub is the directory name we chose for our inventory, but you can choose whatever you want.

If you look at the inventory/jhub directory you will see

  • cluster.tf: the terraform file describing your inventory.
  • group_vars: the directory where we set all the variables used by ansible scripts provided by Kubespray.

Let’s describe our inventory.

Initialize Terraform

In the cluster.tf file, you can specify different kinds of Kubernetes clusters, with a floating IP for each VM. A floating IP means that you ask OpenStack to give you a public IP address in order to connect to the VM from an external network. You can also have a bastion host that you have to log in to before reaching your Kubernetes cluster.

In the following, we choose to have only a master VM and two nodes for our Kubernetes cluster. Another important part is to specify a GlusterFS to provide some storage resources for JupyterHub (database and home directories). Our inventory file cluster.tf looks like this:

The flavors are the same for the master, nodes, and GlusterFS, but you can do what you want. We also added dns_nameservers to be sure that we have correct DNS on each node. We will check in future experiments whether this is really necessary.

The ID of the external network and the name of the floatingip_pool can be obtained with the following command:

openstack network list
List of OpenStack networks

You need ssh keys to be able to connect to the nodes. From the documentation of Kubespray:

Ensure your local ssh-agent is running and your ssh key has been added. This step is required by the terraform provisioner:
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa

Now, it’s time to initialize terraform. It’s important for the next steps to be run from the root directory of Kubespray.

terraform init contrib/terraform/openstack

Now, create your VMs!

terraform apply -state=inventory/jhub/terraform.tfstate -var-file=inventory/jhub/cluster.tf contrib/terraform/openstack

At the end of this process, you can see your instances in the dashboard of OpenStack. It’s important to keep the information given at the end of the output.

At this stage, you’ve just created several VMs with the images given in the cluster.tf file. You don’t have a Kubernetes cluster up and running yet. It’s the next step!

Note: If you want to destroy all that you’ve done, run this command:

terraform destroy -state=inventory/jhub/terraform.tfstate -var-file=inventory/jhub/cluster.tf contrib/terraform/openstack

Configure your Kubernetes cluster

Again, Kubespray lets you configure your Kubernetes cluster with a lot of possibilities. We will show you one setup, but once you understand the procedure, you should be able to make your own choices.

Let’s start by checking that we can ping our VMs. You have to add the following ssh configuration file, which we called ssh-nodes.conf, to your inventory/jhub directory.

You also have to modify the ssh_args variable in the ansible.cfg file in the root directory of Kubespray accordingly:

ssh_args = -F inventory/jhub/ssh-nodes.conf -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no

Just make sure that you give the right external address and that your internal network is 10.0.0.*.

To check that everything is configured correctly, this command:

ansible -i inventory/jhub/hosts -m ping all

should have an output like this

You can now install your Kubernetes cluster with the ansible scripts provided by Kubespray. To do that, you will edit the files found in the group_vars directory in inventory/jhub.

You need several things to have a JupyterHub up and running

A CNI (Container Network Interface). Kubespray offers different CNIs for your Kubernetes cluster: cilium, calico, contiv, weave or flannel. We will choose calico.

Storage for the data. We’ll deploy a GlusterFS and add storage on the Kubernetes cluster to have access to it.

A LoadBalancer to access the service from the external network. There are two kinds of LoadBalancer on OpenStack: Neutron or Octavia. You can use either with Kubespray. We will choose Neutron, but it will be preferable to use Octavia in the future.

So how do we configure all these items?

First, open the file inventory/jhub/group_vars/all/all.yml and modify the following entries

bootstrap_os: centos
upstream_dns_servers:
    - 8.8.8.8
    - 8.8.4.4
cloud_provider: openstack

Note that the DNS addresses are specific to our infrastructure.

Now, open the file inventory/jhub/group_vars/all/openstack.yml and configure the LoadBalancer

openstack_lbaas_enabled: True
openstack_lbaas_subnet_id: "48ec8433-..."
openstack_lbaas_floating_network_id: "6cd08271-..."

The two IDs are those given at the end of the terraform apply step.

Open the file inventory/jhub/group_vars/k8s-cluster/k8s-cluster.yml and set persistent_volumes_enabled to true and resolvconf_mode to host_resolvconf.

Our OpenStack cloud infrastructure is configured with a VXLAN tunnel whose header size is 50 bytes. We use Calico with Kubernetes, which also uses a VXLAN tunnel with a 50-byte header. So, for a default MTU of 1500 bytes, we already have 100 bytes taken by the headers. We therefore need to configure the MTU of calico carefully, to be sure that the packet size (headers included) doesn’t exceed 1500 bytes.

To configure the MTU of calico, we have to edit the file inventory/jhub/group_vars/k8s-cluster/k8s-net-calico.yml and set the calico_mtu flag to 1400.

It’s important to note that setting the MTU had no effect for OpenStack versions earlier than Pike: the LoadBalancer didn’t work correctly.

The last file to modify is inventory/jhub/group_vars/k8s-cluster/addons.yml. JupyterHub uses Helm charts to deploy all that you need on the Kubernetes cluster and Kubespray can install Helm for you. So just set the helm_enabled flag to true.

Now we can run the ansible playbook:

ansible-playbook --become -i inventory/jhub/hosts cluster.yml

You can take a coffee break, because it takes some time to install everything. At the end of this process, you will have a Kubernetes cluster up and running.

To be sure, log in to the master node (at the external address given by terraform) and enter the command:

kubectl -n kube-system get pods

You should be able to see all pods of the kube-system namespace running.

The last step is to install the persistent volume from our GlusterFS.

ansible-playbook --become -i inventory/jhub/hosts ./contrib/network-storage/glusterfs/glusterfs.yml

If you log in again to the master of your Kubernetes cluster and enter the following command

kubectl get pv

you will see your GlusterFS storage connected to your Kubernetes cluster.

Install JupyterHub

Now that you have a Kubernetes cluster running, the procedure to install JupyterHub is exactly the same as the one described in Zero to JupyterHub. The only difference is that you don’t have to install Helm, since Kubespray did it for you. We’ll post the commands below, and you can go to the Zero to JupyterHub website for more information.

The first step is to log in to the master node of your Kubernetes cluster. Then, initialize Helm.

helm init --service-account tiller
kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'

Next, follow the procedure described here

Setting up JupyterHub - Zero to JupyterHub with Kubernetes 0.7.0 documentation

If the LoadBalancer did its job, you should be able to see the external IP to connect to your JupyterHub

kubectl -n jhub get svc

Enter this address in your web browser. If everything worked, you will see the JupyterHub login page:

JupyterHub login page

Wrapping up and feedback

The steps above described our attempts at running a JupyterHub on Kubernetes using OpenStack. There are likely many other ways to accomplish the same thing, and we’d love to hear feedback on the best procedure to install JupyterHub or BinderHub on OpenStack infrastructure. If you have encountered any issues, please leave a comment or ping us on the gitter channel of JupyterHub or Binder.

Thanks to the Project Jupyter team for their review and helpful comments and especially to Sylvain Corlay and Chris Holdgraf.

About the Authors (alphabetical order)

  • David Delavennat, Research Engineer in Scientific Infrastructures at CMLS (Polytechnique/CNRS) and INSMI (CNRS)
  • Loïc Gouarin, Research Engineer in Scientific Computing at CMAP (Polytechnique/CNRS)
  • Guillaume Philippon, Research Engineer in Scientific Infrastructures at LAL (IN2P3/CNRS)

How to Deploy JupyterHub with Kubernetes on OpenStack was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

On-demand Notebooks with JupyterHub, Jupyter Enterprise Gateway and Kubernetes


by: Luciano Resende, Kevin Bates, Alan Chin

Jupyter Notebook has become the “de facto” platform used by data scientists to build interactive applications and to tackle big data and AI problems.

With the increased adoption of Machine Learning and AI by enterprises, we have seen more and more requirements to build analytics platforms that provide on-demand notebooks for data scientists and data engineers in general.

This article describes how to deploy multiple components from the Jupyter Notebook stack to provide an on-demand analytics platform powered by JupyterHub and Jupyter Enterprise Gateway on a Kubernetes cluster.

Image 1 — Deployment Architecture

On-Demand Notebooks Infrastructure

Below are the main components we are going to use to build our solution, with a high-level description of each:

JupyterHub enables the creation of a multi-user Hub which spawns, manages, and proxies multiple instances of the single-user Jupyter Notebook server providing the ‘as a service’ feeling we are looking for.

Jupyter Enterprise Gateway provides optimal resource allocation by enabling each kernel to be launched in its own pod, so that notebook pods can have minimal resources while kernel-specific resources are allocated and deallocated according to the kernel’s lifecycle. It also makes the base image of the kernel a matter of choice.

Kubernetes enables easy management of containerized applications and resources, with the benefits of elasticity and multiple other qualities of service.

JupyterHub Deployment

JupyterHub is the entry point for our solution; it will manage user authorization and the provisioning of an individual Notebook server for each user.

JupyterHub configuration is done via a config.yaml, and the following settings are required:

  • Enable custom notebook configuration (coming from the customized user image).
hub:
  extraConfig: |-
    config = '/etc/jupyter/jupyter_notebook_config.py'
  • Define the docker image to be used when instantiating the notebook server for each user
  • Define custom environment variables used to connect the Notebook server with Jupyter Enterprise Gateway to enable support for remote kernels
singleuser:
  image:
    name: elyra/nb2kg
    tag: dev
  storage:
    dynamic:
      storageClass: nfs-dynamic
  extraEnv:
    KG_URL: <FQDN of Gateway Endpoint>
    KG_HTTP_USER: jovyan
    KERNEL_USERNAME: jovyan
    KG_REQUEST_TIMEOUT: 60

The complete config.yaml would look like the one below:

hub:
  db:
    type: sqlite-memory
  extraConfig: |-
    config = '/etc/jupyter/jupyter_notebook_config.py'
    c.Spawner.cmd = ['jupyter-labhub']
proxy:
  secretToken: "xxx"
ingress:
  enabled: true
  hosts:
    - <FQDN Kubernetes Master>
singleuser:
  defaultUrl: "/lab"
  image:
    name: elyra/nb2kg-hub
    tag: dev
  storage:
    dynamic:
      storageClass: nfs-dynamic
  extraEnv:
    KG_URL: <FQDN of Gateway Endpoint>
    KG_HTTP_USER: jovyan
    KERNEL_USERNAME: jovyan
    KG_REQUEST_TIMEOUT: 60
rbac:
  enabled: true
debug:
  enabled: true

Detailed deployment instructions for JupyterHub can be found at Zero to JupyterHub for Kubernetes, but the command below would deploy it into a Kubernetes environment.

helm upgrade --install --force hub jupyterhub/jupyterhub --namespace hub --version 0.7.0 --values jupyterhub-config.yaml

Custom JupyterHub user image

By default, JupyterHub would deploy a vanilla Notebook Server image, which requires that all resources ever used by the image be allocated when the Kubernetes pod is instantiated.

Our custom image will enable kernels to be started in their own pods, promoting better resource allocation since resources can be allocated and freed up as needed. This also gives us the flexibility of supporting different frameworks for different notebooks (e.g. one notebook using Python and TensorFlow, while another uses Python and Caffe2).

FROM jupyterhub/k8s-singleuser-sample:0.7.0
# Do the pip installs as the unprivileged notebook user
USER $NB_USER
ADD jupyter_notebook_config.py /etc/jupyter/jupyter_notebook_config.py
# Install NB2KG
RUN pip install --upgrade nb2kg && \
jupyter serverextension enable --py nb2kg --sys-prefix
  • Jupyter Notebook custom configuration to override the Notebook handlers with the ones from NB2KG, which enables the notebook to connect with the Enterprise Gateway and use remote kernels.
from jupyter_core.paths import jupyter_data_dir
import subprocess
import os
import errno
import stat

c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.port = 8888
c.NotebookApp.open_browser = False
c.NotebookApp.session_manager_class = 'nb2kg.managers.SessionManager'
c.NotebookApp.kernel_manager_class = 'nb2kg.managers.RemoteKernelManager'
c.NotebookApp.kernel_spec_manager_class = 'nb2kg.managers.RemoteKernelSpecManager'

# https://github.com/jupyter/notebook/issues/3130
c.FileContentsManager.delete_to_trash = False

# Generate a self-signed certificate
if 'GEN_CERT' in os.environ:
    dir_name = jupyter_data_dir()
    pem_file = os.path.join(dir_name, 'notebook.pem')
    try:
        os.makedirs(dir_name)
    except OSError as exc:  # Python >2.5
        if exc.errno == errno.EEXIST and os.path.isdir(dir_name):
            pass
        else:
            raise
    # Generate a certificate if one doesn't exist on disk
    subprocess.check_call(['openssl', 'req', '-new',
                           '-newkey', 'rsa:2048',
                           '-days', '365',
                           '-nodes', '-x509',
                           '-subj', '/C=XX/ST=XX/L=XX/O=generated/CN=generated',
                           '-keyout', pem_file,
                           '-out', pem_file])
    # Restrict access to the file
    os.chmod(pem_file, stat.S_IRUSR | stat.S_IWUSR)
    c.NotebookApp.certfile = pem_file

Note that the file above was generated by jupyter notebook --generate-config and then updated with the required handler overrides:

c.NotebookApp.session_manager_class = 'nb2kg.managers.SessionManager'
c.NotebookApp.kernel_manager_class = 'nb2kg.managers.RemoteKernelManager'
c.NotebookApp.kernel_spec_manager_class = 'nb2kg.managers.RemoteKernelSpecManager'

Jupyter Enterprise Gateway deployment

Jupyter Enterprise Gateway enables Jupyter Notebook to launch and manage remote kernels in a distributed cluster, including Kubernetes clusters.

Enterprise Gateway provides a Kubernetes deployment descriptor that makes it simple to deploy in a Kubernetes environment with the command below:

kubectl apply -f https://raw.githubusercontent.com/jupyter-incubator/enterprise_gateway/master/etc/kubernetes/enterprise-gateway.yaml

We also recommend that the kernel images be downloaded on all nodes of the Kubernetes cluster to avoid delays/timeouts when launching kernels for the first time on these nodes.

docker pull elyra/enterprise-gateway:dev
docker pull elyra/kernel-py:dev
docker pull elyra/kernel-tf-py:dev
docker pull elyra/kernel-r:dev
docker pull elyra/kernel-scala:dev

Automated One-Click Deployment using Ansible

If you are eager to get started and try this on a few machines, we have published an ansible script that deploys the full set of components described above on vanilla RHEL machines/VMs.

ansible-playbook --verbose setup-kubernetes.yml -c paramiko -i hosts-fyre-kubernetes

Conclusion

Jupyter Enterprise Gateway provides remote kernel management to Jupyter Notebooks. In a JupyterHub/Kubernetes environment, it enables the hub to launch tiny Jupyter Notebook pods and to allocate large kernel resources only when kernels are created as independent pods. This approach also allows for easy sharing of expensive resources such as GPUs.

Special Thanks

Special thanks to Erik Sundell and Min RK from the JupyterHub team for the support and initial discussions around JupyterHub.


On-demand Notebooks with JupyterHub, Jupyter Enterprise Gateway and Kubernetes was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Contribute to your first open source project!


Contribute to your first open source project

This Friday, 10/26, NumFOCUS DISC is excited to host a first of its kind Open Source Inclusion Sprint — no prior experience necessary!

Register Now (Discount Code: DISC)

JupyterLab is the next-generation web-based user interface for Project Jupyter.

Join us at the Two Sigma offices in lovely SoHo NYC for a day of fun and learning. Core contributors from the JupyterLab project will be on-site to get you set up and help troubleshoot your code.

If you’ve never contributed to open source, this event is for you.

Breakfast and lunch provided.

What to bring:

  • Laptop
  • Charger
  • Curiosity

Schedule

  • 8:30 am — 9:00 am: Breakfast with GitHub and technical setup (optional)
  • 9:00 am — 9:30 am: Introduction to Project Jupyter and JupyterLab
  • 9:30 am — 1:00 pm: Coding
  • 1:00 pm — 2:00 pm: Lunch will be provided
  • 2:00 pm — 6:00 pm: Coding

Featured Hacks:

Not confident in your coding chops? Want to learn more about how to use Jupyter? Have something else in mind? No problem! Let us know what you’d like to learn and we’ll show you how Jupyter can make the experience easier.

Based on interest we’ll create workshops focused on the below topics. If you have something else in mind feel free to email Sam Brice.

Let’s Make an xkcd JupyterLab Extension (Beginner)

  • Create your own JupyterLab Extension and publish it to GitHub and NPM.
  • Prerequisites: Some programming experience. No prior Python or JupyterLab experience necessary.

Practical machine learning with Jupyter Notebook (Intermediate)

  • Walk through developing a machine learning pipeline, from prototyping to production, with the Jupyter platform.
  • Prerequisites: Python programming experience, basic familiarity with math (e.g. linear algebra), and data analysis.

Get Started with TensorFlow (Advanced)

  • Learn and use ML by going through some beginner-friendly notebook examples. We’ll go through training a neural network to classify images and text.
  • Prerequisites: Python programming experience, basic familiarity with math (e.g. linear algebra), and data analysis.

For More Information

Please visit the event GitHub for details or contact Sam Brice with any questions.


Contribute to your first open source project! was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.


MyBinder.org serves two million launches


by the Binder Team

Since the beginning of 2018, the Binder community has been hosting a BinderHub at https://mybinder.org as a free public service. Today, we are proud to announce that this hub has served over two million Binders. To mark this milestone we would like to say a huge Thank You! to the large community of people who use, build, and fund the project. Without you this important public infrastructure would not be the user-friendly, reliable, well-supported and documented resource that we enjoy today. mybinder.org has enabled people from almost every country in the world to learn, participate and share countless projects, ideas and stories. Here’s to two million more!

A huge thank you to all those who help with building, using, and operating https://mybinder.org.

What is mybinder.org?

mybinder.org lets you take a repository full of Jupyter notebooks or RMarkdown files and turn it into a collection of interactive notebooks. You can share your work with anyone by sending them a simple link (like this one). All they have to do is open the link in a web browser and they can run those notebooks from anywhere in the world without having to install anything.

Who is using mybinder.org?

Currently about 70–80,000 Binders are launched every week. A lot of those are people who are looking for a quick and easy way to launch a Python or RStudio environment. However in the last week a notebook showing off fifty ways to solve Fizz Buzz has been getting a lot of love. Beyond those heavy hitters and short-lived audience favorites there is a long tail of over 400 unique repositories that get launched every week. It would take the rest of the post to list them all!

One last statistic that we are particularly proud of: over the last 80 days we have had users from almost all around the globe! Binder was started as a way to make computational research easier to share and reuse. We have been amazed at how many people around the world have used Binder for teaching classes, reproducing results, sharing interactive analyses, and making their work more accessible to others. We are particularly proud that this includes people from around the entire world.

Countries from which https://mybinder.org has received visitors between 22 August 2018 and 10 November 2018.

Data set of all launches on mybinder.org

mybinder.org is operated as a public infrastructure that is transparent, open, and inclusive. We chat, discuss, and work in the open. This is why we are now publishing a new data set: a continuously updated log of every launch that happens on mybinder.org!

MyBinder.org Events Archive

We would love to see people explore this data set as a public resource that describes the kinds of repositories being shared and launched on the public mybinder.org deployment.

A new badge!

One more thing … we thought now is a good time to give the trusty “Launch Binder” badge an overhaul. To improve the badge we put together some suggestions, reached out to the community, and within a few days received a lot of feedback and new ideas. After combining all the inputs, our new badge went live earlier this week. We present to you our new badge:

The new “Launch Binder” badge. Binder blue instead of bright red!

We hope you like it as much as we do. If you are in love with the old design or not quite ready to switch yet, do not worry! The old badge is not going anywhere. If you are using the old badge in your README it will continue to look the same as it always has.

If you do want to change a previously generated link to the new badge, edit the name of the SVG in the link from the old:

[![Binder](http://mybinder.org/badge.svg)](http://mybinder.org/v2/gh/binder-examples/r/master)

to the new:

[![Binder](http://mybinder.org/badge_logo.svg)](http://mybinder.org/v2/gh/binder-examples/r/master)

Open infrastructure in the cloud

The Binder project is a community-driven experiment in radically-open infrastructure. BinderHub, the underlying technology that powers a Binder deployment, is an open project and can be deployed in many other cloud environments. For example, see the Pangeo Binder deployment for geospatial analytics, or the Gesis Binder deployment for social sciences. We are excited to see the project head in new directions as we continue to grow the technology and the community around Binder.

Finally, we could not have done any of this without a ton of support from the Binder community. First, many thanks to the Moore Foundation for funding initial development of Binder’s underlying tech, and for helping us finance running the deployment at mybinder.org. Second, many thanks to the Binder project core team for fostering excellent technology and a great community. Finally, thanks to everybody in the Binder community — whether you’ve launched repositories, shared your Binders, participated in discussions, built features, helped design this post’s banner image (❤), or gave us some critical feedback. Binder’s purpose is to serve the community, and you all have made it so worth it!

If you’d like to get involved in the Binder project, here are a few helpful links:


MyBinder.org serves two million launches was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Jupyter Community Workshops: Call for Proposals


The large majority of Jupyter’s work is accomplished through remote, online collaboration; yet, over the years, we have found deep value in focused in-person work over a few days. In-person events are particularly useful for tackling challenging development and design projects, growing the community of contributors, and for strengthening collaborations.

In early 2018 Bloomberg announced funding to enable contributors to Project Jupyter to host workshops and small events in their local communities. We are thrilled with the level of interest generated by our first round of funding and are happy to share that we were able to fund two recipients, Chalmer Lowe and Lorena Barba, for events scheduled for mid- and late November. Chalmer Lowe’s proposal included a series of community events to engage his local Python and data science community to support, use, and contribute to Jupyter. Lorena Barba’s proposal focused on hosting a writing sprint to create the first draft of a handbook for teaching with Jupyter. Look for future blog posts to highlight the results and impact of those events.

The second call for proposals and final round of funding is open through December 10, 2018.

The idea behind these workshops is to bring together small groups (12 to 24 people) of Jupyter community members and core contributors for high-impact strategic work and community engagement on focused topics. Our vision is that these events would occur no later than June of 2019.

We are particularly interested in workshops that explore and address topics of strategic importance for the future of Jupyter. We expect the workshops to involve 1–2 dozen participants over a 2–3 day period, and to have a total Jupyter-funded budget of approximately $10,000 to $20,000, which may help cover expenses such as travel, lodging, meals, or event space. It is our intent for the workshops to include both core Jupyter contributors as well as stakeholders, contributors, and potential contributors within the larger Jupyter ecosystem. While not the primary focus of the workshops, it would be highly beneficial to couple the workshop with broader community outreach events, such as sprints, talks, or tutorials, at local meetings or conferences.

Proposal Process Highlights:

  1. Submit initial proposal using this form by Monday, December 10, 8am Pacific time (1600 UTC).
  2. Initial steering council review (up to a week). Proposal goes to Steering Council for initial review and feedback. Proposal is either approved or declined.
  3. Budget and Logistics Development (up to four weeks). The Operations Manager will support the workshop organizer, who will develop a venue/date proposal, detailed budget, event plan, and proposed list of participants.
  4. Final steering council review (up to a week). Proposal presented for final approval to steering council, including final budget, event details, and an estimate of the potential impact of the event. Assuming the budget included in the initial proposal is fully developed and no major changes are proposed, this period may be waived.

The proposal process for these workshops is being managed by the Jupyter Operations Manager, Ana Ruvalcaba (jupyterops@gmail.com), and the Steering Council. Applications can be completed using the online form and are due by December 10, 2018, 8am Pacific time (1600 UTC). Events should be hosted no later than June of 2019.


This initiative is organized by Jason Grout, Paul Ivanov, Brian Granger, and Ana Ruvalcaba.


Jupyter Community Workshops: Call for Proposals was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Jupyter Notebook security fixes


Two security issues have been found and fixed this week, where untrusted javascript could be executed if malicious files were delivered to the user’s system and the user took specific actions with those malicious files.

The first allowed nbconvert endpoints (such as Print Preview) to render untrusted HTML and javascript with access to the notebook server. This is fixed in notebook 5.7.1. All notebook versions prior to 5.7.1 are affected. Thanks to Jonathan Kamens of Quantopian for reporting. This issue has been assigned CVE-2018-19351.

The second issue allowed maliciously crafted directory names to execute javascript when opened in the tree view. This is fixed in notebook 5.7.2. All versions of notebook from 5.3.0 to 5.7.1 are affected. Thanks to Marvin Solano Quesada for reporting. This issue has been assigned CVE-2018-19352.

You can check your version of the notebook package by issuing the following command:

jupyter notebook --version
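Alternatively, a quick way to check from within Python (assuming the notebook package is importable in your environment):

# Print the installed notebook package version from Python.
import notebook
print(notebook.__version__)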

Whether you are using the classic notebook, JupyterLab or any other notebook server extension, we recommend that you update the notebook package with:

pip install --upgrade notebook

or if you are using conda-forge

conda upgrade notebook

Thanks especially to Jonathan and Marvin for reporting these issues! If you find a security issue in a Jupyter project, please report it to security@ipython.org.


Jupyter Notebook security fixes was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Outreachy & Jupyter: Supporting diversity in open communities


Project Jupyter has accepted 2 interns through the Outreachy program, which supports open community members from under-represented backgrounds. Our Outreachy interns will work on important problems in the JupyterHub community with help from 2 mentors from December 2018 through March 2019. This is a short post describing why Jupyter is committing to this program, as well as what we’ll be working on with our Outreachy participants.

We are grateful to the Berkeley Institute for Data Science & NumFocus for jointly sponsoring our interns. This article is cross-posted with the Berkeley Institute for Data Science blog.

Why focus on diversity and inclusion?

The tech industry isn’t doing great on diversity, and the Open Source community is doing worse. This limits the community’s ability to tackle challenging, diverse projects. The amount of responsibility and power held by open source communities is increasing, so we have a responsibility to be more representative of the diversity present in our world. Many projects grow their community by encouraging people to start making small patches on their own. However, requiring uncompensated work is a big barrier to getting more diverse representation in our open source communities. Paid internships with dedicated mentors are a great way to help people break through this particular barrier and make the open source community more diverse.

What is Outreachy?

Outreachy is an internship program coordinated by the Software Freedom Conservancy. It is run twice a year with a goal to bring people from underrepresented backgrounds in tech into open source projects. 21 open source organizations are participating in this round, and will be working with a total of 46–47 interns. Importantly, these are paid internships, which make them more viable for a much broader slice of the population. Jupyter and BIDS will both contribute funding and mentorship for two Outreachy interns.

Our Interns and Mentors

We have two amazing interns for this Outreachy round, working on important projects in the JupyterHub ecosystem. Below we’ll describe the projects that they’ll work on.

A Highly Available Proxy for JupyterHub (link)

Georgiana Dolocan will be mentored by Min RK and Yuvi Panda in building a highly available & scalable proxy for JupyterHub using the Traefik project. This will help JupyterHub deployments scale more easily for thousands of active users with minimal service disruptions. Georgiana will be working from Bucharest, Romania where she likes to paint and explore the city and the villages nearby alongside her dog and camera.

Improvements to user management (link)

Leticia Portella will be mentored by Yuvi Panda and Min RK in building better native user management features into JupyterHub. Small to medium installations of JupyterHub that do not want to depend entirely on an external authentication provider will benefit greatly from this. Leticia will be working from Dublin, Ireland (although she used to live in Florianópolis, Brazil), where she likes to read a lot (especially The Chronicles of Ice and Fire), to swim, and to work on her podcast (the first Brazilian podcast specialized in Data Science topics), Pizza de Dados (Data Pizza, free translation).

About our mentors

Outreachy requires more than just funding, but also mentorship. Two members of the JupyterHub community have offered their time to help mentor our Outreachy contributors, some information about them is below!

Min RK is a research engineer at Simula Research Laboratory in Norway and has been working on the IPython and Jupyter open source projects since joining in 2006 as an undergraduate in Engineering Physics at Santa Clara University with Brian Granger, one of the founders of the Jupyter project. Min currently focuses on JupyterHub, and is excited to welcome new folks to the Jupyter team.

Yuvi Panda is an operations engineer at University of California, Berkeley. He works at the Berkeley Institute for Data Science, where he maintains the JupyterHub infrastructure for the Division of Data Sciences. He has been involved in various Open Source communities in the last ten years, spending time in the GNOME, Wikimedia & Jupyter communities. His life was changed drastically by participating as a student in Google Summer of Code 2010, and he’s excited to give back to the Open Source community.

We would like to thank everyone who applied to JupyterHub for this round of Outreachy. We received a number of robust proposals, out of which we were only able to accept two. JupyterHub mentors spent quite a lot of time mentoring candidates during the application period, in reviewing their pull requests, and giving them feedback on their proposals. We look forward to bringing these new open source contributors into our community, and hope that we can set a path that other open source projects may follow in the future. Open source works best when it is diverse and inclusive; we think this is a small step in that direction.

Thanks to Orianna DeMasi, Sara Stoudt, Chris Holdgraf & others for contributing heavily to this blog post.


Outreachy & Jupyter: Supporting diversity in open communities was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Jupyter Hackathon Series in Hawaii

Hacking on Jupyter Labs and more!

PyHawaii, a user group in Hawaii, hosted a three-part Jupyter event series as part of a workshop grant generously offered by Bloomberg. The series was composed of a git/GitHub hands-on workshop, a Jupyter hands-on tutorial, and a two-day Jupyter hackathon.

In this post, Project Jupyter announced the availability of Bloomberg funding to host small-scale community events. PyHawaii wrote up a grant application for three events and was approved. The original grant application was for an event with ~25 people. Since the venue was donated by the University of Hawaii, PyHawaii basically needed funds to cover food and incidentals. PyHawaii also included in the application a request for funding to have a Jupyter core developer attend the two-day hackathon in November. Ultimately, Project Jupyter was able to send two core developers, Dr. Brian Granger and Dr. Ian Rose, to the third event.

Based on the strong responses to the initial Meetup postings, PyHawaii sought additional funding via their relationships with Booz Allen Hamilton to provide food for more attendees. PyHawaii advertised via Twitter, Facebook, meetup.com and relied upon their relationships with professors, college students, technical professionals and members of the community to get the word out. PyHawaii was on Bytemarks Cafe, their local tech broadcast on Hawaii Public Radio.

How it all played out…

Each of the events, described below, played out in slightly different ways.

Git/GitHub Hands-on Workshop

PyHawaii had a number of mentors and organizers (Jeff, Joe, Jim, John and Chalmer) who walked 35 attendees through PyHawaii’s “How to contribute to open source” workshop. This workshop is used to prep folks for open source sprints and has been run at conferences around the world (twice at PyCon (2018, 2017), PyOhio (2016), DjangoCon (2018) and more).

The material is designed to maximize hands-on use of git and GitHub and enable attendees to practice the most frequently used commands and features by contributing to a repo full of poems (this allows them to focus on just the mechanics of git/GitHub and not have to simultaneously digest all the nuances of an unknown code base.)

Feedback from the first event

Jupyter Hands-on Tutorial

For PyHawaii’s Jupyter tutorial, we used a 4-hour lesson created and led by Chalmer Lowe, with support from PyHawaii mentors and organizers, to introduce 39 attendees to JupyterLab, notebooks, widgets and more. Like the git/GitHub tutorial, this PyHawaii lesson is also open source and freely available.

Feedback on the second event and kudos to the folks who make Jupyter great!

Jupyter Hackathon

Over the course of a weekend, 25 attendees came together to work on adding features, fixing bugs and improving documentation in JupyterLab, ipywidgets and other repositories. Jupyter core developers (as well as mentors from PyHawaii) spent both days working with attendees to:

  • understand the codebase for the various projects
  • troubleshoot the process of setting up their dev environments
  • identify suitable issues to work on
  • craft suitable pull requests

In addition to the hackathon, a meet-and-greet event was hosted for core developers, professors and students at the University of Hawaii.

Outcomes and Lessons Learned

These events achieved success on multiple levels. Dozens of attendees had a chance to experience the fun (and frustration), the excitement (and exasperation) of contributing to open source, and they got to do it in a friendly environment with mentors on-hand to help guide them past the initial hurdles. These new skills and experiences have the potential to shape career opportunities and strengthen bonds in the local developer community.

The attendees had multiple opportunities to network and grow their relationships. Hawaii has seen tremendous growth in its development community over the past few years, and events like this help to speed that growth. The attendees also had the opportunity to network on an international scale through discussions with core developers, in person and via the web, about possible fixes to issues. Based on feedback, a number of attendees initially had no idea of the breadth of capabilities in Jupyter. As attendees apply this experience to their work, adoption of Jupyter as a data analysis, data science and interactive, exploratory platform will grow.

So far, PyHawaii has at least 13 pull requests, 10 of which have been accepted!

PyHawaii’s experiences yielded the following lessons learned, should other organizations seek to facilitate similar events:

  • a ratio of one experienced project developer per 12–15 attendees allowed folks to get quick, hands-on mentoring without over-burdening the developers. Having other mentors (even if they aren’t experienced in the nuances of the project) is a must, so the project developers can stay focused on answering project-specific questions vs. more administrative/low-level questions about git, etc.
  • host a short event (i.e. the evening OR previous weekend) before the hackathon to help walk folks through existing issues and through setting up their dev environment (build tools, documentation build tools, source code, compiler, etc). This will enable attendees to hit the ground running when the hackathon actually starts
  • RSVPs are tricky when you host an event at no cost to the attendee. Depending on the event, we had between 28% and 53% no-shows, which greatly impacts food costs, since food often has to be purchased beforehand. Regular tracking of attendance at other events can help identify no-show trends and facilitate more effective budget estimates.

All in all, this was a great experience that promises to have a lasting impact on the data science and programming communities in Hawaii.


Jupyter Hackathon Series in Hawaii was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
