
Jupyter for Science User Facilities and High Performance Computing


Jupyter is the “Google Docs” of data science. It provides that same kind of easy-to-use ecosystem, but for interactive data exploration, modeling, and analysis. Just as people have come to expect to be able to use Google Docs everywhere, scientists assume that Jupyter is there for them whenever and wherever they open their laptops.

But what if the data you want to interact with through Jupyter doesn’t fit on your laptop, or is excruciating to move? What if the model you want to build and test requires more computing power and storage than you have right in front of you? As a scientist, you want the same interactive experience and all the benefits of Jupyter, but you also need to “reach out” and put something big into your science process: a supercomputer, a telescope data archive, a beamline at a synchrotron. Can Jupyter help you do that big science? What efforts are already in motion to make this a reality, what work still needs to be done, and who needs to do it?

Doing this right will take a community: new collaborations between core Jupyter developers, engineers from high-performance computing (HPC) centers, staff from large-scale experimental and observational data (EOD) facilities, users, and other stakeholders. Many facilities have figured out how to deploy, manage, and customize Jupyter, but have done so while focused on their own unique requirements and capabilities. Others are just taking their first steps and want to avoid reinventing the wheel. With some initial critical mass, we can start contributing what we’ve learned separately into a shared body of knowledge, patterns, tools, and best practices.

40+ participants from universities, national labs, industry, and science user facilities. Credit: Fernando Perez.

In June, a Jupyter Community Workshop held at the National Energy Research Scientific Computing Center (NERSC) and the Berkeley Institute for Data Science (BIDS) brought about 40 members of this community together to start that distillation. Over three days of talks and breakout sessions, we addressed pain points and best practices in Jupyter deployment, infrastructure, and user support; securing Jupyter in multi-tenant environments; sharing notebooks; HPC/EOD-focused Jupyter extensions; and strategies for communicating with stakeholders.

Here are just a few highlights from the meeting:

  • Michael Milligan from the Minnesota Supercomputing Institute perfectly set the tone for the workshop with his keynote, “Jupyter is a One-Stop Shop for Interactive HPC Services.” Michael is the creator of BatchSpawner and WrapSpawner, JupyterHub Spawners that let HPC users run notebooks on compute nodes under a variety of batch queue systems. Contributors to both packages met in an afternoon-long breakout to build consensus around some technical issues, start managing development and support in a collaborative way, and gel as a team.
  • Securing Jupyter is a huge topic. Thomas Mendoza from Lawrence Livermore National Laboratory talked about his work to enable end-to-end SSL in JupyterHub and best practices for securing Jupyter. Outcomes from two breakouts on security include a plan to more prominently document security best practices, and a future meeting (perhaps another Jupyter Community Workshop?) focused specifically on security in Jupyter.
  • Speakers from Lawrence Livermore and Oak Ridge National Laboratories and the European Space Agency showed off a variety of beautiful JupyterLab extensions, integrations, and plug-ins for climate science, complex physical simulations, astronomical images and catalogs, and atmospheric monitoring. People at a variety of facilities are finding ways to adapt Jupyter to meet the specific needs of their scientists.

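To give a concrete flavor of the spawner and SSL work discussed above, here is an illustrative JupyterHub configuration sketch. It assumes the batchspawner package is installed and a Slurm scheduler is available; the partition, runtime, and memory values are placeholders a site would adapt, not recommendations from the workshop.

```python
# jupyterhub_config.py -- illustrative sketch, not a site-ready config.
# Assumes the batchspawner package and a Slurm cluster; the req_* values
# below are placeholders to replace with site-specific settings.
import batchspawner  # registers the BatchSpawner/SlurmSpawner classes

# Launch each user's notebook server as a Slurm batch job on a compute node.
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'
c.SlurmSpawner.req_partition = 'interactive'  # placeholder partition name
c.SlurmSpawner.req_runtime = '02:00:00'       # wall-clock limit for the job
c.SlurmSpawner.req_memory = '4gb'             # memory request for the server

# Encrypt hub <-> proxy <-> single-user-server traffic (JupyterHub >= 1.0),
# the end-to-end internal SSL support discussed in the security talks.
c.JupyterHub.internal_ssl = True
```

The hub would then be started with `jupyterhub -f jupyterhub_config.py`; WrapSpawner’s ProfilesSpawner can be layered on top so users pick between batch profiles like this one and a local spawner at login.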
Really, there’s just too much to pack into a blog post, so we encourage you to look at the talk slides and notes on Discourse; all the breakout notes have been posted under this topic. We’re working on getting videos of the presentations up on the workshop website as well. Watch Discourse for announcements of future meetings and new documentation.

Finally, we want to thank Project Jupyter, NumFOCUS, and Bloomberg for their help making this meeting happen. We all came away with a better sense of who is doing what in our community, and how we can work together on this new area of growth for the Jupyter community. The organizers also want to thank their respective institutions’ administrative staff (Seleste Rodriguez at NERSC and Stacy Dorton at BIDS) for helping with workshop logistics.


Jupyter for Science User Facilities and High Performance Computing was originally published in Jupyter Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

