Remove Livy from WS, or provide an option to run WS notebooks without Livy

Based on feedback from our DSX users, we would love to stop using Livy for DSX Jupyter notebooks.

There are many disadvantages of using Livy:

- Code execution is synchronous: you can't see results while a cell is running.
  A direct Spark connection can show output while it's running (model build progress, Spark job progress, etc.); this is not possible when using Livy/sparkmagic.
- There are a ton of bugs and functional deficiencies in Livy and sparkmagic.
  Probably half, if not more, of our support cases revolve around these two components.
- We noticed it's extremely confusing to some users that their notebooks actually have two independent Python processes running
  (the Jupyter notebook itself is one, and for %%spark it's a remote Spark driver). When we remove Livy/sparkmagic and run the Spark driver locally, as we currently do in our other notebook solution, it's all very simple.
- Some Jupyter extensions don't work with sparkmagic.

There was one tiny benefit of using Livy: the Spark driver runs remotely in yarn-cluster mode.
In the case of DSX, however, we don't have issues with running Spark drivers locally, as those are in a K8S cluster,
so we're not limited by resources. For interactive Spark applications like Jupyter, intermediate
components like Livy/sparkmagic don't add any value, but introduce complexity and a lot of bugs.
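
To illustrate what we mean by running the driver locally, here is a rough sketch. It assumes PySpark is installed in the Jupyter container and that the Hadoop/YARN client configuration is available there. The helper names `local_driver_conf` and `start_local_session` are made up for this example and are not part of DSX/WSL.

```python
def local_driver_conf(app_name="notebook"):
    """Spark settings for a driver that runs inside the notebook's own process.

    Hypothetical helper for illustration; the values are assumptions.
    """
    return {
        "spark.app.name": app_name,
        "spark.master": "yarn",
        # "client" keeps the driver in the local Python process instead of
        # launching it remotely inside the YARN cluster.
        "spark.submit.deployMode": "client",
    }


def start_local_session(conf):
    """Create (or reuse) a SparkSession in this Python process."""
    # Imported lazily so that building the settings does not require PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook cell this would be used as `spark = start_local_session(local_driver_conf("my-notebook"))`, with no %%spark magic and no second Python process.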

Notice, for example, that “sparkmagic” is still considered an incubating Jupyter project, as it doesn’t have a
good community/userbase. Apache Livy also lacks a good release cadence, doesn’t have a strong community,
and is also considered an “incubating” Apache project.

Based on the feedback from our users, we would love to have an option to run Jupyter Spark notebooks
without Livy/sparkmagic, where the Spark driver runs locally in the Jupyter container. I think you were checking
whether DSX 2.1 already removed Livy when DSX migrated over to Enterprise Gateway; please confirm.

From what we can tell, Livy/sparkmagic is a major obstacle to the success of our DSX deployment
and to wider adoption of WSL in our organization.


Guest

Dec 19, 2019

Thank you for confirming, Snehal!

In the case of JEG, does the Spark driver run locally in the same pod as the Jupyter process,
or does it run in some other pod? In other words, is this yarn-client or yarn-cluster spark-submit mode? A link to documentation would be awesome to have.

Ruslan


Guest

Dec 19, 2019

Hello Ruslan, I confirmed with the development team: JEG doesn't involve Livy.


Guest

Dec 18, 2019
Hello Snehal -

Thank you for those details.

Based on this, it seems the only option that will work for us is:

WSL 2.1/CPD 2.5
- Spark running on Hadoop via JEG
Can you please confirm that "Hadoop via JEG" doesn't involve running Livy in any way? That is, we want to make sure JEG itself doesn't run or use Livy.

We want to completely exclude Livy from the equation.

Thanks!
Ruslan

Guest

Dec 18, 2019
We will continue to offer Livy in WS 2.1, but we have also introduced JEG.
For Spark execution, users have the following options:
WSL 1.2.3 / CPD 2.1
- Spark running in local mode within the Jupyter pod
- Spark running in cluster mode within the WSL/CPD cluster
- Spark running on Hadoop via Livy
WSL 2.1/CPD 2.5
- Spark running in local mode within the Jupyter pod
- Hummingbird spark
- Spark running on Hadoop via Livy
- Spark running on Hadoop via JEG
Please confirm whether this is helpful for your use case.
