DOC: Reinstate the JupyterLite-based live shell for the pandas website #60747

Open
agriyakhetarpal opened this issue Jan 21, 2025 · 2 comments · May be fixed by #60758

@agriyakhetarpal
Contributor

agriyakhetarpal commented Jan 21, 2025

Description

Hello, I've recently been looking into the JupyterLite shell on the pandas website's Getting Started page, which briefly served as an interactive endpoint for users browsing the website. It was discussed in #46682 and added in #47428, was later reported to be rather slow, and was removed as a result in #49807.

I'd like to propose reinstating this shell for the website (either on the same page, or elsewhere on the docs website's landing page via the jupyterlite-sphinx project, similar to matplotlib/matplotlib#22634), and I'd like to use this issue to gather the pandas maintainers' thoughts on whether doing so would be a good idea for newcomers.

Rationale and additional context

  • It is now early 2025 and a lot of time has passed. While Python in WebAssembly is still experimental, we have since made a number of improvements across the Pyodide and JupyterLite ecosystems over many releases, both to the stability of the shell (if not its speed) and to the ability to run pandas code within it.
  • As the one who helped add the WASM CI job for pandas last year via #57896, I see this as a related area in terms of pandas's usage within Pyodide, and I would be happy to maintain the shell if it's added and to establish some relevant automations for its upkeep.
  • We have recently been working on similar improvements to contemporary shells, such as those that exist and have been retained on the websites for NumPy and SymPy.

xref: Quansight-Labs/czi-scientific-python-mgmt#134

Thank you for your time! :)


P.S. Here's a short example, which takes ~7.5 seconds to load for me on a reasonably stable connection. For users on throttled connections, it should be easy to add a small admonition before it saying "This is an experimental playground", or to simply prefix the heading with the word "Experimental".

P.P.S. I noticed that a similar approach has been taken by the Ibis project; they have an admonition on this page: https://ibis-project.org/tutorials/browser/repl that states that it is experimental at the moment.

cc: @jtpio for visibility, as he was among those who collaborated on (and led) this effort previously through the issues and PRs linked.


The description and rationale have been copied over with minor changes from my recent message on 18/01/2025 in the pandas Slack workspace: https://pandas-dev-community.slack.com/archives/C03PH1SU1M1/p1737168137448029 as suggested by @rhshadrach, which should help this proposal receive greater visibility.

@rhshadrach
Member

I'm +1 on giving this another go.

> maintain the shell if it's added and establish some relevant automations towards its upkeep.

What is the expected maintenance / upkeep?

> I'd like to propose reinstating this shell for the website (either on the same page, or elsewhere

Getting Started seems to be a natural place to me.

@rhshadrach rhshadrach changed the title Reinstate the JupyterLite-based live shell for the pandas website DOC: Reinstate the JupyterLite-based live shell for the pandas website Jan 22, 2025
@agriyakhetarpal
Contributor Author

agriyakhetarpal commented Jan 22, 2025

> I'm +1 on giving this another go.

Thanks, @rhshadrach! I'll put together a PR to add it back, and I'm happy to take the discussion forward either there or here.

> maintain the shell if it's added and establish some relevant automations towards its upkeep.

> What is the expected maintenance / upkeep?

Here are my points on this:

  • If we want the shell to always host the latest stable version of pandas, we need to ensure that the emscripten job below is here to stay, and that pandas continues to build WASM wheels in this manner:

    ```yaml
    emscripten:
      # Note: the Python version, Emscripten toolchain version are determined
      # by the Pyodide version. The appropriate versions can be found in the
      # Pyodide repodata.json "info" field, or in the Makefile.envs file:
      # https://github.com/pyodide/pyodide/blob/stable/Makefile.envs#L2
      # The Node.js version can be determined via Pyodide:
      # https://pyodide.org/en/stable/usage/index.html#node-js
      name: Pyodide build
      runs-on: ubuntu-22.04
      concurrency:
        # https://github.community/t/concurrecy-not-work-for-push/183068/7
        group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-wasm
        cancel-in-progress: true
      steps:
        - name: Checkout pandas Repo
          uses: actions/checkout@v4
          with:
            fetch-depth: 0
        - name: Set up Python for Pyodide
          id: setup-python
          uses: actions/setup-python@v5
          with:
            python-version: '3.11.3'
        - name: Set up Emscripten toolchain
          uses: mymindstorm/setup-emsdk@v14
          with:
            version: '3.1.46'
            actions-cache-folder: emsdk-cache
        - name: Install pyodide-build
          run: pip install "pyodide-build==0.25.1"
        - name: Build pandas for Pyodide
          run: |
            pyodide build
        - name: Set up Node.js
          uses: actions/setup-node@v4
          with:
            node-version: '18'
        - name: Set up Pyodide virtual environment
          run: |
            pyodide venv .venv-pyodide
            source .venv-pyodide/bin/activate
            pip install dist/*.whl
        - name: Test pandas for Pyodide
          env:
            PANDAS_CI: 1
          run: |
            source .venv-pyodide/bin/activate
            pip install pytest hypothesis
            # do not import pandas from the checked out repo
            cd ..
            python -c 'import pandas as pd; pd.test(extra_args=["-m not clipboard and not single_cpu and not slow and not network and not db"])'
    ```

    That said, I've just noticed that this job builds WASM wheels against an older version of Pyodide for some reason, so I'll bump it in a separate PR right now.
  • Currently, the version of jupyterlite-pyodide-kernel determines the Pyodide version (0.27.1 right now), and the Pyodide version determines the versions of the packages built with it, including pandas. The list of currently available packages is here: https://pyodide.org/en/stable/usage/packages-in-pyodide.html. For Pyodide 0.28, we have been working on unvendoring the package recipes from the Pyodide runtime ("RFC: Plans for unvendoring package recipes", pyodide/pyodide#4918). Once that is done, this constraint will likely go away later this year.
  • Therefore, the maintenance required right now would be to keep the jupyterlite-pyodide-kernel version pinned and update it from time to time, and to ensure that the JupyterLite build (jupyter lite build) succeeds.
  • However, it is also possible to simply rely on an external deployment (https://github.com/jupyterlite/demo). This gives less control over the pandas version and less customizability, but more convenience. It also means that the build uploaded to GitHub Pages will be larger, as source maps are enabled in that deployment, leading to higher bandwidth usage. I've asked about this in "Disable source maps for the deployment from this repository?" (jupyterlite/demo#151).
  • I would consider it okay if the pandas version is a bit outdated, as this would only be a REPL offering basic pandas functionality to newcomers, who won't need the cutting-edge features available in the nightlies. However, we have also been working on finding an appropriate place for nightly WASM wheels (which pandas already uploads to Anaconda.org via "BLD, CI: Use cibuildwheel to build Emscripten/Pyodide wheels, push nightlies to Anaconda.org", #58647) for use in the docs, so a "dev" REPL could also be added (xref sympy/live#28, 'A "dev" version of the REPL', for a similar need in SymPy).
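For the custom-deployment route, the "keep the kernel pinned and make sure the build succeeds" check could be automated in CI roughly as follows. This is a minimal sketch, not the setup from the earlier PRs: the workflow name, the web/interactive_terminal/ directory, and its requirements.txt (assumed to pin jupyterlite-core and jupyterlite-pyodide-kernel) are all hypothetical.

```yaml
# Hypothetical workflow: builds the JupyterLite shell with a pinned Pyodide
# kernel so that a broken build is caught before it reaches the website.
name: JupyterLite shell

on:
  pull_request:
  push:
    branches: [main]

jobs:
  build-jupyterlite:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout pandas repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install JupyterLite and the pinned Pyodide kernel
        # Hypothetical requirements file pinning jupyterlite-core and
        # jupyterlite-pyodide-kernel. The kernel version selects the Pyodide
        # version, which in turn selects the pandas version in the shell.
        run: pip install -r web/interactive_terminal/requirements.txt

      - name: Build the JupyterLite site
        # Hypothetical directory holding the JupyterLite configuration
        working-directory: web/interactive_terminal
        run: jupyter lite build --output-dir ../build/lite

      - name: Upload the built site
        uses: actions/upload-artifact@v4
        with:
          name: jupyterlite-site
          path: web/build/lite
```

Running this on pull requests means that a failing jupyter lite build (for example, right after bumping the kernel pin) shows up as a red check rather than as a broken shell on the live website.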

Short answer: we can either build a custom deployment and update the Pyodide kernel from time to time (or leave it unpinned if we don't want to update it), or use an existing deployment elsewhere, with less control over the pandas version and other JupyterLite-specific configuration. The previous version of the shell in the older PRs used a custom deployment, so I'll keep that approach for now.
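To keep a pinned kernel from silently falling behind, one possible automation (sketched here as an assumption, not something agreed on in this issue) is a Dependabot entry that opens a pull request whenever a new JupyterLite release appears. The sketch assumes the pins live in a requirements file under a hypothetical web/interactive_terminal/ directory. If the repository already has a Dependabot configuration, only the entry under updates would need to be merged in.

```yaml
# Hypothetical .github/dependabot.yml: opens a PR when a newer version of a
# pinned JupyterLite package is released, so that the shell's Pyodide (and
# hence pandas) version is bumped through a reviewed, CI-checked change.
version: 2
updates:
  - package-ecosystem: "pip"
    # Hypothetical directory containing the pinned requirements file
    directory: "/web/interactive_terminal"
    schedule:
      interval: "monthly"
    # Only track the JupyterLite packages, not unrelated Python dependencies
    allow:
      - dependency-name: "jupyterlite-core"
      - dependency-name: "jupyterlite-pyodide-kernel"
```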
