Getting started 2: Added a Colab notebook, updated for local data (#572)

* Updated code to work with local data. Added a Colab notebook.

- Simplified the Python and Ray notebooks to work with 'Input-Test-Data', which is part of the repo
- Removed the 'Output-Test-Data' directory, since it is generated
- Added a Google Colab friendly notebook. It requires no local setup and is easy to run
- Updated the project README.md accordingly

* Updated file links

* Fixing pip install timeouts on Google Colab
sujee authored Sep 6, 2024
1 parent d0a80ba commit 51c8676
Showing 8 changed files with 3,869 additions and 313 deletions.
10 changes: 7 additions & 3 deletions README.md
@@ -131,10 +131,14 @@ If there are no errors, you are good to go!

Let's try a simple transform to extract content from PDF files. We have following notebooks that demonstrate how to run a data preparation transformation that extracts content from PDF files using the data-prep-kit.

-- Option 1: Pure python notebook : [examples/notebooks/Run_your_first_transform_python.ipynb](examples/notebooks/Run_your_first_transform_python.ipynb) - easiest to get started
-- Option 2: This one uses Ray framework for parallel execution while still allowing local processing : [examples/notebooks/Run_your_first_transform_ray.ipynb](examples/notebooks/Run_your_first_transform_ray.ipynb)
+**Notebook versions**
+
+You can try one or all 😄
+
+- Option 1: Google Colab friendly notebook (no setup necessary, easiest to get started): [examples/notebooks/Run_your_first_transform_colab.ipynb](examples/notebooks/Run_your_first_transform_colab.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/data-prep-kit/blob/dev/examples/notebooks/Run_your_first_transform_colab.ipynb)
+- Option 2: Pure python notebook (runs locally) : [examples/notebooks/Run_your_first_transform_python.ipynb](examples/notebooks/Run_your_first_transform_python.ipynb) - easiest to get started
+- Option 3: Ray version (runs locally): This one uses Ray framework for parallel execution while still allowing local processing - [examples/notebooks/Run_your_first_transform_ray.ipynb](examples/notebooks/Run_your_first_transform_ray.ipynb)

-You can try either one, or both 😄

To run the notebooks, launch jupyter from the same virtual environment you created using the command below.

3 changes: 1 addition & 2 deletions examples/notebooks/.gitignore
@@ -1,2 +1 @@
-Input-Test-Data
-Output-Test-Data
+Output-Test-Data*
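The `.gitignore` change replaces two exact entries with the single pattern `Output-Test-Data*`. As a result, `Input-Test-Data` is no longer ignored, which matches the commit's note that it is now part of the repo, while any name beginning with `Output-Test-Data` is ignored. The minimal sketch below approximates this with Python's `fnmatch.fnmatchcase`, assuming single path components that contain no `/`. It is not git's actual matcher (`git check-ignore -v <path>` is the authoritative check), and the candidate directory names are hypothetical.

```python
from fnmatch import fnmatchcase

# Approximation only: for a single path component (no "/"), gitignore's "*"
# behaves like fnmatch's "*". Real gitignore rules have more cases.
OLD_PATTERNS = ["Input-Test-Data", "Output-Test-Data"]  # before this commit
NEW_PATTERNS = ["Output-Test-Data*"]                    # after this commit


def is_ignored(name: str, patterns: list[str]) -> bool:
    """Return True if the single path component `name` matches any pattern."""
    return any(fnmatchcase(name, p) for p in patterns)


# Hypothetical directory names a notebook run might produce or ship with.
candidates = ["Input-Test-Data", "Output-Test-Data", "Output-Test-Data-run2"]

for name in candidates:
    old = is_ignored(name, OLD_PATTERNS)
    new = is_ignored(name, NEW_PATTERNS)
    print(f"{name:24} old={old!s:5} new={new}")
```

The trailing `*` also matches the empty string, so the plain `Output-Test-Data` directory stays ignored, and the pattern additionally covers suffixed variants that the old exact entry missed.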
Binary file not shown.
Binary file not shown.
62 changes: 0 additions & 62 deletions examples/notebooks/Output-Test-Data/metadata.json

This file was deleted.
