Commit

add Colab support to the notebooks; pack config files in `sam2_configs` package during installation (#176)
ronghanghu authored Aug 8, 2024
1 parent 6186d15 commit d421e0b
Showing 6 changed files with 286 additions and 114 deletions.
4 changes: 2 additions & 2 deletions INSTALL.md
@@ -26,7 +26,7 @@ If you see a message like `Skipping the post-processing step due to the error ab

If you would like to enable this post-processing step, you can reinstall SAM 2 on a GPU machine with environment variable `SAM2_BUILD_ALLOW_ERRORS=0` to force building the CUDA extension (and raise errors if it fails to build), as follows
```bash
-pip uninstall -y SAM-2; SAM2_BUILD_ALLOW_ERRORS=0 pip install -v -e ".[demo]"
+pip uninstall -y SAM-2; rm -f sam2/*.so; SAM2_BUILD_ALLOW_ERRORS=0 pip install -v -e ".[demo]"
```
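The updated command deletes `sam2/*.so` before reinstalling, presumably so that a previously compiled extension is not picked up again. As a rough illustration (not part of this commit), the following Python sketch lists such leftover files before they are removed; the helper name `find_stale_extensions` is hypothetical.

```python
from pathlib import Path


def find_stale_extensions(package_dir):
    """Return compiled extension files (*.so) directly inside package_dir, sorted by path."""
    return sorted(Path(package_dir).glob("*.so"))


# Example use from the repo root (assumes the package directory is named "sam2",
# as in the command above):
#     for path in find_stale_extensions("sam2"):
#         print("stale build artifact:", path)
```

An empty result would suggest there is nothing for `rm -f sam2/*.so` to clean up.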

Note that PyTorch needs to be installed first before building the SAM 2 CUDA extension. It's also necessary to install [CUDA toolkits](https://developer.nvidia.com/cuda-toolkit-archive) that match the CUDA version for your PyTorch installation. (This should typically be CUDA 12.1 if you follow the default installation command.) After installing the CUDA toolkits, you can check its version via `nvcc --version`.
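To make the version check above concrete, here is a minimal sketch that extracts the release number from `nvcc --version` output and compares it with the CUDA version PyTorch reports. The function names are hypothetical, and the regular expression assumes the usual `release X.Y` wording in the nvcc output.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' CUDA release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(nvcc_output, torch_cuda_version):
    """Return True if the toolkit release equals the CUDA version PyTorch was built with."""
    release = parse_nvcc_release(nvcc_output)
    return release is not None and release == torch_cuda_version


# Example use (requires nvcc on PATH and PyTorch installed):
#     import subprocess, torch
#     out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
#     print(cuda_versions_match(out, torch.version.cuda))
```

A mismatch here is one plausible reason for the CUDA extension failing to build.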
@@ -56,7 +56,7 @@ I got `MissingConfigException: Cannot find primary config 'sam2_hiera_l.yaml'`

This is usually because you haven't run the `pip install -e .` step above, so `sam2_configs` isn't in your Python's `sys.path`. Please run this installation step. In case it still fails after the installation step, you may try manually adding the root of this repo to `PYTHONPATH` via
```bash
-export SAM2_REPO_ROOT=/path/to/segment-anything  # path to this repo
+export SAM2_REPO_ROOT=/path/to/segment-anything-2  # path to this repo
export PYTHONPATH="${SAM2_REPO_ROOT}:${PYTHONPATH}"
```
to manually add `sam2_configs` into your Python's `sys.path`.
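Before editing `PYTHONPATH`, it can help to confirm whether Python can already find the package. The sketch below is an illustration only: `is_importable` and `add_repo_root` are hypothetical helpers, and `add_repo_root` is the in-process equivalent of the `export PYTHONPATH=...` line above.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if a top-level module or package can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


def add_repo_root(repo_root):
    """Prepend repo_root to sys.path for the current process only."""
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)


# Example use (the path is a placeholder, as in the shell snippet above):
#     if not is_importable("sam2_configs"):
#         add_repo_root("/path/to/segment-anything-2")
```

Unlike the shell export, the `sys.path` change does not persist beyond the running interpreter.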
6 changes: 3 additions & 3 deletions README.md
@@ -72,9 +72,9 @@ with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
masks, _, _ = predictor.predict(<input_prompts>)
```

-Please refer to the examples in [image_predictor_example.ipynb](./notebooks/image_predictor_example.ipynb) for static image use cases.
+Please refer to the examples in [image_predictor_example.ipynb](./notebooks/image_predictor_example.ipynb) (also in Colab [here](https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb)) for static image use cases.

-SAM 2 also supports automatic mask generation on images just like SAM. Please see [automatic_mask_generator_example.ipynb](./notebooks/automatic_mask_generator_example.ipynb) for automatic mask generation in images.
+SAM 2 also supports automatic mask generation on images just like SAM. Please see [automatic_mask_generator_example.ipynb](./notebooks/automatic_mask_generator_example.ipynb) (also in Colab [here](https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/automatic_mask_generator_example.ipynb)) for automatic mask generation in images.

### Video prediction

@@ -99,7 +99,7 @@ with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
...
```

-Please refer to the examples in [video_predictor_example.ipynb](./notebooks/video_predictor_example.ipynb) for details on how to add click or box prompts, make refinements, and track multiple objects in videos.
+Please refer to the examples in [video_predictor_example.ipynb](./notebooks/video_predictor_example.ipynb) (also in Colab [here](https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/video_predictor_example.ipynb)) for details on how to add click or box prompts, make refinements, and track multiple objects in videos.
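The three Colab links added in this file share one URL pattern: the Colab host, then `github/`, then the repository's org, name, branch, and notebook path. A small sketch of that pattern, with a hypothetical helper name `colab_url`:

```python
def colab_url(org, repo, branch, notebook_path):
    """Build an 'open in Colab' URL for a notebook hosted on GitHub."""
    return (
        "https://colab.research.google.com/github/"
        f"{org}/{repo}/blob/{branch}/{notebook_path}"
    )


# Reproduces the video notebook link used above.
print(colab_url(
    "facebookresearch",
    "segment-anything-2",
    "main",
    "notebooks/video_predictor_example.ipynb",
))
```

Because the links point at the `main` branch, they would follow whatever version of the notebooks is currently on that branch.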

## Load from 🤗 Hugging Face

92 changes: 69 additions & 23 deletions notebooks/automatic_mask_generator_example.ipynb

Large diffs are not rendered by default.

159 changes: 103 additions & 56 deletions notebooks/image_predictor_example.ipynb

Large diffs are not rendered by default.

137 changes: 107 additions & 30 deletions notebooks/video_predictor_example.ipynb
@@ -29,14 +29,83 @@
"- propagating clicks (or box) to get _masklets_ throughout the video\n",
"- segmenting and tracking multiple objects at the same time\n",
"\n",
"We use the terms _segment_ or _mask_ to refer to the model prediction for an object on a single frame, and _masklet_ to refer to the spatio-temporal masks across the entire video. \n",
"We use the terms _segment_ or _mask_ to refer to the model prediction for an object on a single frame, and _masklet_ to refer to the spatio-temporal masks across the entire video. "
]
},
{
"cell_type": "markdown",
"id": "a887b90f-6576-4ef8-964e-76d3a156ccb6",
"metadata": {},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/video_predictor_example.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]
},
{
"cell_type": "markdown",
"id": "26616201-06df-435b-98fd-ad17c373bb4a",
"metadata": {},
"source": [
"## Environment Set-up"
]
},
{
"cell_type": "markdown",
"id": "8491a127-4c01-48f5-9dc5-f148a9417fdf",
"metadata": {},
"source": [
"If running locally using jupyter, first install `segment-anything-2` in your environment using the [installation instructions](https://github.com/facebookresearch/segment-anything-2#installation) in the repository.\n",
"\n",
"If running locally using jupyter, first install `segment-anything-2` in your environment using the [installation instructions](https://github.com/facebookresearch/segment-anything-2#installation) in the repository."
"If running from Google Colab, set `using_colab=True` below and run the cell. In Colab, be sure to select 'GPU' under 'Edit'->'Notebook Settings'->'Hardware accelerator'. Note that it's recommended to use **A100 or L4 GPUs when running in Colab** (T4 GPUs might also work, but could be slow and might run out of memory in some cases)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f74c53be-aab1-46b9-8c0b-068b52ef5948",
"metadata": {},
"outputs": [],
"source": [
"using_colab = False"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d824a4b2-71f3-4da3-bfc7-3249625e6730",
"metadata": {},
"outputs": [],
"source": [
"if using_colab:\n",
" import torch\n",
" import torchvision\n",
" print(\"PyTorch version:\", torch.__version__)\n",
" print(\"Torchvision version:\", torchvision.__version__)\n",
" print(\"CUDA is available:\", torch.cuda.is_available())\n",
" import sys\n",
" !{sys.executable} -m pip install opencv-python matplotlib\n",
" !{sys.executable} -m pip install 'git+https://github.com/facebookresearch/segment-anything-2.git'\n",
"\n",
" !mkdir -p videos\n",
" !wget -P videos https://dl.fbaipublicfiles.com/segment_anything_2/assets/bedroom.zip\n",
" !unzip -d videos videos/bedroom.zip\n",
"\n",
" !mkdir -p ../checkpoints/\n",
" !wget -P ../checkpoints/ https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt"
]
},
{
"cell_type": "markdown",
"id": "22e6aa9d-487f-4207-b657-8cff0902343e",
"metadata": {},
"source": [
"## Set-up"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e5318a85-5bf7-4880-b2b3-15e4db24d796",
"metadata": {},
"outputs": [],
@@ -50,7 +119,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "08ba49d8-8c22-4eba-a2ab-46eee839287f",
"metadata": {},
"outputs": [],
@@ -74,7 +143,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"id": "f5f3245e-b4d6-418b-a42a-a67e0b3b5aec",
"metadata": {},
"outputs": [],
@@ -89,7 +158,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "1a5320fe-06d7-45b8-b888-ae00799d07fa",
"metadata": {},
"outputs": [],
@@ -143,17 +212,17 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "b94c87ca-fd1a-4011-9609-e8be1cbe3230",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7f884825eef0>"
"<matplotlib.image.AxesImage at 0x7fdeec360250>"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
@@ -206,15 +275,15 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"id": "8967aed3-eb82-4866-b8df-0f4743255c2c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"frame loading (JPEG): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:05<00:00, 33.78it/s]\n"
"frame loading (JPEG): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:05<00:00, 35.92it/s]\n"
]
}
],
@@ -242,7 +311,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"id": "d2646a1d-3401-438c-a653-55e0e56b7d9d",
"metadata": {},
"outputs": [],
@@ -272,7 +341,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 11,
"id": "3e749bab-0f36-4173-bf8d-0c20cd5214b3",
"metadata": {},
"outputs": [
@@ -333,7 +402,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 12,
"id": "e1ab3ec7-2537-4158-bf98-3d0977d8908d",
"metadata": {},
"outputs": [
@@ -399,15 +468,15 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 13,
"id": "ab45e932-b0d5-4983-9718-6ee77d1ac31b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 23.85it/s]\n"
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 23.76it/s]\n"
]
},
{
@@ -591,7 +660,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 14,
"id": "1a572ea9-5b7e-479c-b30c-93c38b121131",
"metadata": {},
"outputs": [
@@ -664,15 +733,15 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 15,
"id": "baa96690-4a38-4a24-aa17-fd2f4db0e232",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 23.94it/s]\n"
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 23.93it/s]\n"
]
},
{
@@ -862,7 +931,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 16,
"id": "6dbe9183-abbb-4283-b0cb-d24f3d7beb34",
"metadata": {},
"outputs": [],
@@ -882,7 +951,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 17,
"id": "1cbfb273-4e14-495b-bd89-87a8baf52ae7",
"metadata": {},
"outputs": [
@@ -932,7 +1001,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 18,
"id": "54906315-ab4c-4088-b866-4c22134d5b66",
"metadata": {},
"outputs": [
@@ -986,15 +1055,15 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 19,
"id": "9cd90557-a0dc-442e-b091-9c74c831bef8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 24.05it/s]\n"
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 23.71it/s]\n"
]
},
{
@@ -1158,6 +1227,14 @@
" show_mask(out_mask, plt.gca(), obj_id=out_obj_id)"
]
},
{
"cell_type": "markdown",
"id": "e023f91f-0cc5-4980-ae8e-a13c5749112b",
"metadata": {},
"source": [
"Note that in addition to clicks or boxes, SAM 2 also supports directly using a **mask prompt** as input via the `add_new_mask` method in the `SAM2VideoPredictor` class. This can be helpful in e.g. semi-supervised VOS evaluations (see [tools/vos_inference.py](https://github.com/facebookresearch/segment-anything-2/blob/main/tools/vos_inference.py) for an example)."
]
},
{
"cell_type": "markdown",
"id": "da018be8-a4ae-4943-b1ff-702c2b89cb68",
@@ -1176,7 +1253,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 20,
"id": "29b874c8-9f39-42d3-a667-54a0bd696410",
"metadata": {},
"outputs": [],
@@ -1204,7 +1281,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 21,
"id": "e22d896d-3cd5-4fa0-9230-f33e217035dc",
"metadata": {},
"outputs": [],
@@ -1224,7 +1301,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 22,
"id": "d13432fc-f467-44d8-adfe-3e0c488046b7",
"metadata": {},
"outputs": [
@@ -1276,7 +1353,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 23,
"id": "95ecf61d-662b-4f98-ae62-46557b219842",
"metadata": {},
"outputs": [
@@ -1334,7 +1411,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 24,
"id": "86ca1bde-62a4-40e6-98e4-15606441e52f",
"metadata": {},
"outputs": [
@@ -1407,15 +1484,15 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 25,
"id": "17737191-d62b-4611-b2c6-6d0418a9ab74",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:10<00:00, 19.93it/s]\n"
"propagate in video: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:10<00:00, 19.77it/s]\n"
]
},
{
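The notebook changes above rely on a manually set `using_colab` flag. As an alternative sketch (not what this commit does), the flag could be derived automatically by checking whether the `google.colab` module is loaded, a commonly used heuristic. The helper name `running_in_colab` and its `modules` parameter are hypothetical; the parameter exists only so the check can be exercised outside Colab.

```python
import sys


def running_in_colab(modules=None):
    """Heuristic: report True when the google.colab module is already loaded."""
    modules = sys.modules if modules is None else modules
    return "google.colab" in modules


# In the notebook this could replace the hard-coded flag:
#     using_colab = running_in_colab()
```

The explicit flag in the notebook is simpler and avoids depending on Colab's runtime behavior, so this is a convenience rather than a recommended change.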
2 changes: 2 additions & 0 deletions setup.py
@@ -118,6 +118,8 @@ def get_ext_filename(self, ext_name):
author_email=AUTHOR_EMAIL,
license=LICENSE,
packages=find_packages(exclude="notebooks"),
+    package_data={"": ["*.yaml"]},  # SAM 2 configuration files
+    include_package_data=True,
install_requires=REQUIRED_PACKAGES,
extras_require=EXTRA_PACKAGES,
python_requires=">=3.10.0",
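The `package_data={"": ["*.yaml"]}` entry asks setuptools to ship `*.yaml` files found inside any of the discovered packages. One way to check the result after installation is to list the YAML resources of a package with `importlib.resources`. This is a sketch under that assumption; `list_packaged_yaml` is a hypothetical helper, not part of the repository.

```python
from importlib import resources


def list_packaged_yaml(package_name):
    """Return sorted names of *.yaml files shipped directly inside an importable package."""
    root = resources.files(package_name)
    return sorted(
        entry.name for entry in root.iterdir() if entry.name.endswith(".yaml")
    )


# Example use after `pip install` (package name taken from the commit title):
#     print(list_packaged_yaml("sam2_configs"))
```

If the list comes back empty for `sam2_configs`, the config files were likely not packaged, which would be consistent with the `MissingConfigException` described in INSTALL.md.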
