
Add Model Evaluation panel callbacks for segmentation tasks #5332

Merged
merged 6 commits on Jan 24, 2025

Conversation

brimoor
Contributor

@brimoor brimoor commented Jan 1, 2025

Change log

  • Adds class/confusion matrix callbacks when evaluating segmentations

TODO

  • Benchmark performance of using label IDs to implement load_view()
  • Investigate alternatives that don't require calling load_evaluation_results()? cf this comment
  • Figure out a way to show TP/FP/FN data in the ME panel?
  • Figure out a way to support TP/FP/FN filtering in the App sidebar?

Example usage

import random

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.core.fields as fof
import fiftyone.utils.labels as foul

dataset = foz.load_zoo_dataset("quickstart", max_samples=10, shuffle=True)

model = foz.load_zoo_model("deeplabv3-resnet101-coco-torch")
dataset.apply_model(model, "resnet101")

model = foz.load_zoo_model("deeplabv3-resnet50-coco-torch")
dataset.apply_model(model, "resnet50")

CLASSES = (
    "background,aeroplane,bicycle,bird,boat,bottle,bus,car,cat,chair,cow," +
    "diningtable,dog,horse,motorbike,person,pottedplant,sheep,sofa,train," +
    "tvmonitor"
)
classes = CLASSES.split(",")
mask_targets = {idx: label for idx, label in enumerate(classes)}

rgb_mask_targets = {
    fof.int_to_hex(random.getrandbits(24)): label  # random 24-bit RGB color
    for label in classes
}

_mask_targets = {v: k for k, v in mask_targets.items()}
_rgb_mask_targets = {v: k for k, v in rgb_mask_targets.items()}
targets_map = {_mask_targets[c]: _rgb_mask_targets[c] for c in classes}

dataset.clone_sample_field("resnet101", "resnet101_rgb")
foul.transform_segmentations(
    dataset,
    "resnet101_rgb",
    targets_map,
)

dataset.clone_sample_field("resnet50", "resnet50_rgb")
foul.transform_segmentations(
    dataset,
    "resnet50_rgb",
    targets_map,
)

dataset.mask_targets["resnet101"] = mask_targets
dataset.mask_targets["resnet50"] = mask_targets
dataset.mask_targets["resnet101_rgb"] = rgb_mask_targets
dataset.mask_targets["resnet50_rgb"] = rgb_mask_targets
dataset.save()

# Evaluation with int mask targets
dataset.evaluate_segmentations(
    "resnet50",
    gt_field="resnet101",
    eval_key="eval",
)

# Evaluation with RGB mask targets
dataset.evaluate_segmentations(
    "resnet50_rgb",
    gt_field="resnet101_rgb",
    eval_key="eval_rgb",
    mask_targets=dataset.mask_targets["resnet50_rgb"],
)

session = fo.launch_app(dataset)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added color conversion utilities for hex and integer representations
    • Enhanced segmentation evaluation with custom metrics support
    • Introduced dataset caching mechanism
    • Improved model evaluation panel functionality
  • Improvements

    • Refined handling of segmentation results
    • Added more robust error handling for custom metrics
    • Streamlined segmentation evaluation processes
  • Technical Enhancements

    • Updated utility functions for more flexible data processing
    • Improved compatibility between different evaluation components

@brimoor brimoor requested review from imanjra and prernadh January 1, 2025 00:32
Contributor

coderabbitai bot commented Jan 1, 2025

Walkthrough

This pull request introduces enhancements across multiple files in the FiftyOne project, focusing on improving color data handling, segmentation evaluation, and dataset caching. The changes include new utility functions for color conversions in fields.py, expanded segmentation evaluation capabilities in segmentation.py, a new dataset caching method in server/utils.py, and modifications to evaluation-related components in both Python and TypeScript files. These updates aim to provide more flexible and robust data processing and evaluation mechanisms.

Changes

File Change Summary
fiftyone/core/fields.py Added 4 new functions: hex_to_int(), int_to_hex(), rgb_array_to_int(), int_array_to_rgb() for color data conversion
fiftyone/utils/eval/segmentation.py Updated evaluation functions to support custom metrics, modified SegmentationEvaluationConfig, SimpleEvaluationConfig, and SegmentationResults classes
fiftyone/server/utils.py Added cache_dataset() function to cache datasets independently
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx Modified formatCustomMetricRows to handle null custom metrics more robustly
plugins/panels/model_evaluation/__init__.py Added new methods for custom metrics retrieval and segmentation evaluation handling

Sequence Diagram

sequenceDiagram
    participant User
    participant EvaluationPanel
    participant SegmentationEvaluator
    participant Dataset

    User->>EvaluationPanel: Initiate Evaluation
    EvaluationPanel->>Dataset: Retrieve Mask Targets
    Dataset-->>EvaluationPanel: Return Mask Targets
    EvaluationPanel->>SegmentationEvaluator: Evaluate with Custom Metrics
    SegmentationEvaluator->>SegmentationEvaluator: Process Segmentation Results
    SegmentationEvaluator-->>EvaluationPanel: Return Evaluation Results

Possibly related PRs

Suggested Labels

enhancement

Suggested Reviewers

  • imanjra
  • lanzhenw
  • manivoxel51


@brimoor brimoor changed the title Add Model Evaluation panel callbacks for segmentation tasks (Option 1) Add Model Evaluation panel callbacks for segmentation tasks (Option 2) Jan 1, 2025
@brimoor brimoor force-pushed the segmentation-callbacks2 branch from 33a2dac to 33f06a8 Compare January 1, 2025 00:35
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
plugins/panels/model_evaluation.py (4)

17-17: Unused import
import fiftyone.core.fields as fof is never referenced. Consider removing it to satisfy linters and reduce clutter.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: fiftyone.core.fields imported but unused

Remove unused import: fiftyone.core.fields

(F401)


485-485: mask_targets2 assigned.
The variable is set here but appears to be overwritten later, leading to redundant assignments.


491-491: mask_targets2 is never effectively used.
Kindly remove or integrate it if necessary; currently it generates lint warnings.

🧰 Tools
🪛 Ruff (0.8.2)

491-491: Local variable mask_targets2 is assigned to but never used

Remove assignment to unused variable mask_targets2

(F841)


685-734: _init_segmentation_results: assembling ID dictionaries.
This function modifies the passed-in results object to map (i, j) pairs to lists of IDs. Be cautious about naming collisions if this is run multiple times; consider clearing any stale data.

fiftyone/core/fields.py (1)

1627-1636: Implementation of hex_to_int.
Simple bit-shift logic is correct. Provide error handling for malformed hex strings if user input is allowed.
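For instance, a defensively validated variant could look like this (a sketch only; the regex check and error message are assumptions, not the actual fields.py implementation):

```python
import re


def hex_to_int(hex_str):
    """Converts a hex color string like "#ff6d04" to a 24-bit integer."""
    # Reject anything that is not exactly "#RRGGBB" before bit-shifting
    if not isinstance(hex_str, str) or not re.fullmatch(
        r"#[0-9a-fA-F]{6}", hex_str
    ):
        raise ValueError(
            "Expected a color string in '#RRGGBB' format, got %r" % hex_str
        )

    r = int(hex_str[1:3], 16)
    g = int(hex_str[3:5], 16)
    b = int(hex_str[5:7], 16)
    return (r << 16) + (g << 8) + b


print(hex_to_int("#ff0000"))  # 16711680
```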

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fddb6a4 and 33a2dac.

📒 Files selected for processing (3)
  • fiftyone/core/fields.py (1 hunks)
  • fiftyone/utils/eval/segmentation.py (11 hunks)
  • plugins/panels/model_evaluation.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
plugins/panels/model_evaluation.py

17-17: fiftyone.core.fields imported but unused

Remove unused import: fiftyone.core.fields

(F401)


491-491: Local variable mask_targets2 is assigned to but never used

Remove assignment to unused variable mask_targets2

(F841)

🔇 Additional comments (25)
fiftyone/utils/eval/segmentation.py (13)

11-11: The added import is properly used for creating repeat iterators.
No issues identified; it is used in the _from_dict method when handling “no ID” scenarios.


337-337: Validate the hex string keys for consistent usage.
This dictionary comprehension properly converts hex string keys to integers. Consider verifying that all user-supplied hex strings follow the #RRGGBB format before conversion to avoid potential ValueError.


353-353: Initialization of the matches list.
No issues here; it neatly collects pixel-wise mapping data for subsequent analysis.


396-406: Appending match details for segmentation.
This loop accurately records ground truth/prediction associations and pixel counts. However, this can grow large for massive images or datasets. Be mindful of memory usage if used repeatedly in large-scale evaluations.


440-440: Passing the newly built matches to SegmentationResults.
Clean approach to provide the collected matches in the results constructor.


455-457: Docstring accurately reflects the new matches field.
The description matches the tuple structure from the evaluation loop.


469-469: New matches parameter defaults to None.
This is a good backward-compatible signature update.


Line range hint 475-492: Conditional handling of matches.
The fallback to parse pixel_confusion_matrix when matches is None ensures compatibility with legacy workflows. Watch for potential ValueError if ytrue, ypred, weights do not align in length.


510-529: Consistent _from_dict logic for matches.
Correctly handles both new and legacy (no IDs) formats, merging them into a uniform list of tuples.


534-534: Passing the reconstructed matches in _from_dict.
Good consistency with the constructor.


594-597: RGB to integer masking for dimensional consistency.
Properly uses fof.rgb_array_to_int to handle multi-channel arrays.


670-670: Use of new utility for RGB array conversion.
Reusing fof.rgb_array_to_int avoids code duplication.


677-677: Generating hex class labels from integer-based values.
Efficient approach for color-coded segmentation classes.

plugins/panels/model_evaluation.py (8)

13-13: Necessary import for ObjectId usage.
This is used in _to_object_ids.


350-358: Loading segmentation results and initializing them.
Assigning mask_targets and calling _init_segmentation_results is a clear approach to unify the data before proceeding with metrics. Make sure to handle any potential logging or warnings if results are partially missing.


596-611: Segmentations with legacy format.
Early returns handle older data where IDs don’t exist. Ensure end users receive a clear message if early-exiting leads to partial data in the UI.


612-664: Match expressions for segmentation subviews.
This logic effectively filters segmentation results by class/matrix cell. It might be beneficial to confirm performance on large datasets, as multiple .is_in() calls could be costly.


736-752: _get_segmentation_class_ids: retrieving matching IDs by class.
Check for key existence in results._classes_map[x] to avoid KeyError if x is not recognized.


755-760: _get_segmentation_conf_mat_ids: confusion matrix IDs.
Straightforward approach to isolate matches. This is well-structured.


762-780: _get_segmentation_tp_fp_fn_ids: basic classification logic for pixel-level segmentation.
The approach is consistent with typical definitions of TP, FP, and FN. If large sets are expected, consider memory usage.


782-783: _to_object_ids: converting string IDs to ObjectId.
Simple utility that is helpful for consistent typed usage. Ensure _id is always a valid string to avoid conversion errors.

fiftyone/core/fields.py (4)

1624-1625: hex_to_int function declaration and docstring.
Docstring is clear; confirm that input always starts with '#' and contains exactly 6 hex characters.


1639-1652: int_to_hex: Reverse conversion from int to hex.
Logic is standard. No issues observed.


1654-1668: rgb_array_to_int: Transforming 3D RGB arrays to 2D integer arrays.
The use of NumPy bit-shifts is efficient and readable. Ensure mask is always [..., 3] shape or raise warnings.


1670-1684: int_array_to_rgb: Restoring 3D RGB arrays from integer-based masks.
Works in parallel with rgb_array_to_int. Usage of np.stack is correct.
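The pair of conversions can be sanity-checked as a round trip; this sketch assumes the same 24-bit packing convention as hex_to_int and is not necessarily the exact fields.py code:

```python
import numpy as np


def rgb_array_to_int(mask):
    # Pack an (H, W, 3) uint8 RGB mask into a 2D array of 24-bit ints
    mask = mask.astype(np.int64)
    return (mask[..., 0] << 16) + (mask[..., 1] << 8) + mask[..., 2]


def int_array_to_rgb(mask):
    # Unpack a 2D array of 24-bit ints back into an (H, W, 3) uint8 mask
    return np.stack(
        [(mask >> 16) & 255, (mask >> 8) & 255, mask & 255], axis=-1
    ).astype(np.uint8)


rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
assert np.array_equal(int_array_to_rgb(rgb_array_to_int(rgb)), rgb)
```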

Comment on lines 636 to 637
expr = F(gt_id).is_in(ytrue_ids)
expr &= F(pred_id).is_in(ypred_ids)
Contributor

Mmmm, I see this expression should evaluate much faster than select_labels since you are specifying which labels to look for in which fields. Nice

Contributor Author

Exactly!

Comment on lines 761 to 776
if field == "tp":
# True positives
inds = results.ytrue == results.ypred
ytrue_ids = _to_object_ids(results.ytrue_ids[inds])
ypred_ids = _to_object_ids(results.ypred_ids[inds])
return ytrue_ids, ypred_ids
elif field == "fn":
# False negatives
inds = results.ypred == results.missing
ytrue_ids = _to_object_ids(results.ytrue_ids[inds])
return ytrue_ids, None
else:
# False positives
inds = results.ytrue == results.missing
ypred_ids = _to_object_ids(results.ypred_ids[inds])
return None, ypred_ids
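The masking logic above can be exercised in isolation with plain NumPy arrays; the SimpleNamespace stand-in for the results object below is purely illustrative:

```python
from types import SimpleNamespace

import numpy as np

MISSING = "(none)"
results = SimpleNamespace(
    ytrue=np.array(["cat", "dog", MISSING, "car"]),
    ypred=np.array(["cat", MISSING, "dog", "person"]),
    ytrue_ids=np.array(["g1", "g2", "g3", "g4"]),
    ypred_ids=np.array(["p1", "p2", "p3", "p4"]),
    missing=MISSING,
)

# True positives: ground truth region matched by a prediction
tp = results.ytrue == results.ypred
# False negatives: ground truth region with no predicted counterpart
fn = results.ypred == results.missing
# False positives: predicted region with no ground truth counterpart
fp = results.ytrue == results.missing

print(results.ytrue_ids[tp])  # ['g1']
print(results.ytrue_ids[fn])  # ['g2']
print(results.ypred_ids[fp])  # ['p3']
```

Note that under this logic the fourth region (car predicted as person) is a class confusion and falls into none of the three buckets.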
Contributor

Would we want to move this tp/fp/fn calculation to utils/eval/segmentation.py and make it a sample level field so we can filter on it - similar to detections?

Contributor Author

I'm not sure what we'd store as a sample-level field. The TP/FP/FN designation applies to each region in the segmentation mask, so there would usually be multiple per sample (as in object detection tasks), but the mask is stored in a single Segmentation field (unlike object detection).

Contributor

Hmm, I see what you are saying, but my confusion arises from the fact that if we are marking labels as TP/FP/FN, we should be able to filter by them at the sample level too.

Contributor Author

I agree that it would be nice to support more sample-level filtering, but I don't know what to do!

Contributor

Yep understood. Makes sense to leave as is for now then. Maybe something to discuss with the ML team

@prernadh prernadh self-requested a review January 2, 2025 17:41
Contributor

@prernadh prernadh left a comment

Just a question about high level TP/FP/FN design

Comment on lines 639 to 661
elif view_type == "field":
if field == "tp":
# All true positives
ytrue_ids, ypred_ids = _get_segmentation_tp_fp_fn_ids(
results, field
)
expr = F(gt_id).is_in(ytrue_ids)
expr &= F(pred_id).is_in(ypred_ids)
view = eval_view.match(expr)
elif field == "fn":
# All false negatives
ytrue_ids, _ = _get_segmentation_tp_fp_fn_ids(
results, field
)
expr = F(gt_id).is_in(ytrue_ids)
view = eval_view.match(expr)
else:
# All false positives
_, ypred_ids = _get_segmentation_tp_fp_fn_ids(
results, field
)
expr = F(pred_id).is_in(ypred_ids)
view = eval_view.match(expr)
Contributor

We currently don't display TP/FP/FN in the summary table. My understanding is that the get_tp_fp_fn function will have to be updated for segmentations if we ever want to reach this section of the code.

Contributor Author

@brimoor brimoor Jan 2, 2025

Yeah I went ahead and implemented the callback so it would be available if we found a reasonable way to show TP/FP/FN icons in the panel. Not quite sure what the best way to show this info would be though.

Contributor

Oh I understand now! Realized you had also left comment on the other PR explaining this. Apologies

@brimoor brimoor force-pushed the segmentation-callbacks2 branch from 33f06a8 to 5e45561 Compare January 2, 2025 23:44
@prernadh prernadh self-requested a review January 3, 2025 02:14
prernadh
prernadh previously approved these changes Jan 3, 2025
Contributor

@prernadh prernadh left a comment

LGTM! 😄

@brimoor brimoor changed the title Add Model Evaluation panel callbacks for segmentation tasks (Option 2) Add Model Evaluation panel callbacks for segmentation tasks Jan 10, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
fiftyone/server/utils.py (2)

52-65: Implementation looks good but needs additional validation.

The function is well-documented and serves a clear purpose in preventing garbage collection of dataset singletons between async calls.

Consider adding these validations for robustness:

 def cache_dataset(dataset):
     """Caches the given dataset.
 
     This method ensures that subsequent calls to
     :func:`fiftyone.core.dataset.load_dataset` in async calls will return this
     dataset singleton.
 
     See :meth:`load_and_cache_dataset` for additional details.
 
     Args:
         dataset: a :class:`fiftyone.core.dataset.Dataset`
+
+    Raises:
+        ValueError: if the dataset is None or not a Dataset instance
+        ValueError: if the dataset name is None
     """
+    if dataset is None or not isinstance(dataset, fod.Dataset):
+        raise ValueError("Expected a Dataset instance, but got %r" % dataset)
+
+    if dataset.name is None:
+        raise ValueError("Cannot cache dataset with None name")
+
     _cache[dataset.name] = dataset

52-65: Consider architectural improvements for the caching mechanism.

The current TTL cache configuration (maxsize=10, ttl=900s) might need adjustment based on usage patterns:

  1. Limited cache size could lead to premature evictions in high-concurrency scenarios
  2. No monitoring of cache effectiveness

Consider these improvements:

  1. Make cache size and TTL configurable via environment variables
  2. Add cache statistics/metrics for monitoring
  3. Implement a more sophisticated eviction strategy based on dataset size and access patterns
  4. Add logging for cache hits/misses to help tune the configuration

Would you like me to propose a detailed implementation for any of these improvements?

plugins/panels/model_evaluation.py (2)

17-17: Remove unused import.

The import fiftyone.core.fields is not used in the code.

-import fiftyone.core.fields as fof
🧰 Tools
🪛 Ruff (0.8.2)

17-17: fiftyone.core.fields imported but unused

Remove unused import: fiftyone.core.fields

(F401)


687-748: Consider adding error handling for invalid mask targets.

The initialization logic is robust, but it might benefit from additional error handling when dealing with mask targets.

Consider adding validation:

 def _init_segmentation_results(dataset, results, gt_field):
     if results.ytrue_ids is None or results.ypred_ids is None:
         # Legacy format segmentations
         return

     if getattr(results, "_classes_map", None):
         # Already initialized
         return

     fosu.cache_dataset(dataset)

     classes_map = {c: i for i, c in enumerate(results.classes)}

     mask_targets = _get_mask_targets(dataset, gt_field)
     if mask_targets is not None:
+        # Validate mask targets
+        if not isinstance(mask_targets, dict):
+            raise ValueError(f"Expected dict for mask_targets, got {type(mask_targets)}")
+
         mask_targets = {str(k): v for k, v in mask_targets.items()}
         classes = [mask_targets.get(c, c) for c in results.classes]
         classes_map.update({c: i for i, c in enumerate(classes)})
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e45561 and d5fc685.

📒 Files selected for processing (2)
  • fiftyone/server/utils.py (1 hunks)
  • plugins/panels/model_evaluation.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
plugins/panels/model_evaluation.py

17-17: fiftyone.core.fields imported but unused

Remove unused import: fiftyone.core.fields

(F401)

🔇 Additional comments (7)
plugins/panels/model_evaluation.py (7)

340-348: LGTM! Well-structured initialization of segmentation results.

The code properly initializes segmentation-specific data by getting mask targets and initializing results.


582-653: LGTM! Comprehensive segmentation view filtering.

The implementation handles all necessary cases for segmentation tasks:

  • Class-based filtering
  • Confusion matrix cell filtering
  • TP/FP/FN filtering

676-684: LGTM! Clear mask target retrieval logic.

The function follows a clear precedence order for retrieving mask targets.


750-767: LGTM! Efficient class ID retrieval.

The function efficiently retrieves ground truth and predicted IDs for a given class.


769-774: LGTM! Clear confusion matrix ID retrieval.

The function provides a straightforward way to get IDs for specific confusion matrix cells.


777-794: LGTM! Comprehensive TP/FP/FN handling.

The function handles all cases for true positives, false positives, and false negatives correctly.


796-797: LGTM! Simple ObjectId conversion.

The utility function correctly converts string IDs to BSON ObjectIds.

@brimoor brimoor force-pushed the segmentation-callbacks2 branch from d5fc685 to 7e45bd2 Compare January 12, 2025 03:50
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5fc685 and 7e45bd2.

📒 Files selected for processing (2)
  • fiftyone/core/fields.py (1 hunks)
  • fiftyone/server/utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fiftyone/server/utils.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: test / test-app
  • GitHub Check: build
🔇 Additional comments (1)
fiftyone/core/fields.py (1)

1624-1684: Add unit tests for the new color conversion functions.

The new color conversion functions need comprehensive unit tests to verify their behavior with:

  • Valid inputs
  • Edge cases (e.g., black, white, primary colors)
  • Invalid inputs (to verify error handling)

Would you like me to generate a comprehensive test suite for these functions?

@brimoor brimoor force-pushed the segmentation-callbacks2 branch from 7e45bd2 to 59d4044 Compare January 23, 2025 21:52
@brimoor brimoor changed the base branch from develop to release/v1.3.0 January 23, 2025 21:52
@brimoor brimoor dismissed prernadh’s stale review January 23, 2025 21:52

The base branch was changed.

@brimoor brimoor requested a review from prernadh January 23, 2025 21:54
Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 9

🧹 Nitpick comments (41)
app/packages/core/src/plugins/SchemaIO/index.tsx (1)

23-26: Add TypeScript type annotations for better type safety.

The value coercion logic is correct and prevents mutation through proper use of cloneDeep. However, the code would benefit from TypeScript type annotations.

Add type annotations to improve type safety:

-function coerceValue(value, schema) {
+interface Schema {
+  type: string;
+}
+
+function coerceValue(value: unknown, schema: Schema): unknown {
-const onIOChange = useCallback(
-  (path, value, schema, ancestors) => {
+type Ancestors = Record<string, Schema>;
+
+const onIOChange = useCallback(
+  (path: string, value: unknown, schema: Schema, ancestors: Ancestors) => {
app/packages/looker/src/processOverlays.ts (1)

41-44: Consider using spread operator for better readability.

While using push with reduce works, using the spread operator would be more idiomatic and concise:

-  let ordered = activePaths.reduce((acc, cur) => {
-    acc.push(...bins[cur]);
-    return acc;
-  }, []);
+  let ordered = activePaths.reduce((acc, cur) => [...acc, ...bins[cur]], []);
app/packages/looker/src/processOverlays.test.ts (1)

7-10: Consider a more descriptive constant name.

The constant EMPTY could be renamed to better describe its purpose, such as EMPTY_OVERLAY_CONFIG.

app/packages/looker/src/worker/mask-decoder.test.ts (2)

41-67: Consider using a more type-safe approach for Blob creation.

Instead of type assertion with unknown, consider creating a proper Blob interface for testing.

-  } as unknown as Blob;
+  interface TestBlob extends Blob {
+    arrayBuffer(): Promise<ArrayBuffer>;
+    slice(start?: number, end?: number): Blob;
+  }
+  } as TestBlob;

92-140: Consider adding error handling test cases.

The test suite covers the main functionality well, but could benefit from additional test cases for:

  • Invalid PNG headers
  • Corrupted image data
  • Network errors during blob handling
app/packages/looker/src/worker/disk-overlay-decoder.test.ts (2)

Line range hint 31-205: Consider adding tests for maskPathDecodingPromises handling.

While the test coverage is good, consider adding test cases to verify:

  • The order of promise execution in maskPathDecodingPromises
  • Concurrent decoding behavior
  • Promise rejection handling in the promises array

Line range hint 207-282: Add tests for additional edge cases.

Consider adding test cases for:

  • Large images to verify memory handling
  • Invalid stride values
  • Non-standard image dimensions (e.g., very wide, very tall)
app/packages/looker/src/worker/disk-overlay-decoder.ts (2)

Line range hint 20-36: Consider refactoring function parameters for better maintainability.

The function has 10 parameters which makes it difficult to maintain and use correctly. Consider grouping related parameters into objects.

-export const decodeOverlayOnDisk = async (
-  field: string,
-  label: Record<string, any>,
-  coloring: Coloring,
-  customizeColorSetting: CustomizeColor[],
-  colorscale: Colorscale,
-  sources: { [path: string]: string },
-  cls: string,
-  maskPathDecodingPromises: Promise<void>[] = [],
-  maskTargetsBuffers: ArrayBuffer[] = [],
-  overlayCollectionProcessingParams:
-    | { idx: number; cls: string }
-    | undefined = undefined
-) => {
+type DecodingConfig = {
+  field: string;
+  label: Record<string, any>;
+  coloring: Coloring;
+  customizeColorSetting: CustomizeColor[];
+  colorscale: Colorscale;
+  sources: { [path: string]: string };
+  cls: string;
+  maskPathDecodingPromises?: Promise<void>[];
+  maskTargetsBuffers?: ArrayBuffer[];
+  overlayCollectionProcessingParams?: {
+    idx: number;
+    cls: string;
+  };
+};
+
+export const decodeOverlayOnDisk = async ({
+  field,
+  label,
+  coloring,
+  customizeColorSetting,
+  colorscale,
+  sources,
+  cls,
+  maskPathDecodingPromises = [],
+  maskTargetsBuffers = [],
+  overlayCollectionProcessingParams,
+}: DecodingConfig) => {

117-122: Enhance error message with more context.

While the error handling is good, the error message could be more descriptive by including context about what failed (e.g., image URL, class type).

 if (!overlayMask) {
-      console.error("Overlay mask decoding failed");
+      console.error(`Overlay mask decoding failed for class ${cls}, URL: ${overlayImageUrl}`);
       return;
 }
app/packages/operators/src/components/OperatorExecutionTrigger/index.tsx (2)

45-45: Consider specifying a specific interface for 'executionParams' instead of using 'object'.

Using a specific interface or type for executionParams enhances type safety and code readability. It allows TypeScript to provide better type-checking and autocompletion support.

Apply this diff to define a custom type for executionParams:

-  executionParams?: object;
+  executionParams?: ExecutionParamsType;

Where ExecutionParamsType is defined as:

type ExecutionParamsType = {
  // Define the shape of execution parameters here
};

83-83: Memoize 'executionParams' and 'executorOptions' to prevent unnecessary re-creations of 'onExecute' callback.

Since executionParams and executorOptions are included in the dependency array of useCallback, if they are not memoized, the onExecute callback will be recreated on every render. Note that wrapping a prop in useMemo with the prop itself as the only dependency preserves referential stability only if the parent already passes a stable reference; for inline object literals, the memoization ultimately needs to happen in the parent component.

You can memoize these props using useMemo:

+const memoizedExecutionParams = useMemo(() => executionParams, [executionParams]);
+const memoizedExecutorOptions = useMemo(() => executorOptions, [executorOptions]);

const onExecute = useCallback(
  (options?: OperatorExecutorOptions) => {
    const resolvedOptions = {
-     ...executorOptions,
+     ...memoizedExecutorOptions,
      ...options,
    };

-   return operator.execute(executionParams ?? {}, resolvedOptions);
+   return operator.execute(memoizedExecutionParams ?? {}, resolvedOptions);
  },
- [executorOptions, operator, executionParams]
+ [memoizedExecutorOptions, operator, memoizedExecutionParams]
);
fiftyone/utils/eval/regression.py (1)

587-587: Simplify the 'get' method call by removing the redundant default value.

Since the default return value of dict.get() when the key is missing is None, specifying None explicitly is unnecessary.

Apply this diff to simplify the code:

-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
🧰 Tools
🪛 Ruff (0.8.2)

587-587: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)
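As a standalone sanity check (illustrative only, not code from this PR), the two forms behave identically:

```python
# dict.get returns None by default when the key is missing,
# so passing None explicitly is redundant
kwargs = {"eval_key": "eval"}

assert kwargs.get("custom_metrics") is None
assert kwargs.get("custom_metrics") == kwargs.get("custom_metrics", None)

# the default argument only matters when you want something other than None
assert kwargs.get("custom_metrics", []) == []
```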

fiftyone/utils/eval/detection.py (1)

641-641: Simplify kwargs.get by removing the unnecessary default value

The default value None in kwargs.get("custom_metrics", None) is unnecessary because None is the default return value of dict.get() when the key is missing.

Apply this diff to simplify the code:

-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
🧰 Tools
🪛 Ruff (0.8.2)

641-641: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/utils/eval/base.py (2)

85-85: Simplify dictionary iteration by removing .keys()

When iterating over a dictionary, you can iterate directly over its keys without explicitly calling the .keys() method. This is more Pythonic and concise.

Apply this diff to simplify the loops:

-        for metric in self._get_custom_metrics().keys():
+        for metric in self._get_custom_metrics():

This change applies to lines 85, 101, and 113.

Also applies to: 101-101, 113-113

🧰 Tools
🪛 Ruff (0.8.2)

85-85: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
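A standalone illustration (not code from this PR) of why `.keys()` is redundant:

```python
d = {"accuracy": 0.9, "recall": 0.8}

# iterating a dict yields its keys directly; .keys() adds nothing here
assert list(d) == list(d.keys())

# the same applies to membership tests (the related Ruff rule SIM118)
assert ("accuracy" in d) == ("accuracy" in d.keys())
```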


179-179: Rename unused loop variable uri to _uri

The loop variable uri is not used within the loop body. Renaming it to _uri indicates that it is intentionally unused.

Apply this diff to rename the variable:

-            for uri, metric in self.custom_metrics.items():
+            for _uri, metric in self.custom_metrics.items():
🧰 Tools
🪛 Ruff (0.8.2)

179-179: Loop control variable uri not used within loop body

Rename unused uri to _uri

(B007)

plugins/panels/model_evaluation/__init__.py (1)

17-17: Remove unused import fiftyone.core.fields

The imported module fiftyone.core.fields is not used in the code, so it can be safely removed to clean up the imports.

Apply this diff to remove the unused import:

-import fiftyone.core.fields as fof
🧰 Tools
🪛 Ruff (0.8.2)

17-17: fiftyone.core.fields imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

fiftyone/utils/eval/classification.py (1)

926-926: Simplify kwargs.get() usage

You can omit the default value None in kwargs.get("custom_metrics", None), as None is the default return value when the key is missing.

Apply this diff to simplify the code:

-    custom_metrics = kwargs.get("custom_metrics", None)
+    custom_metrics = kwargs.get("custom_metrics")
🧰 Tools
🪛 Ruff (0.8.2)

926-926: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

plugins/operators/__init__.py (4)

647-647: Simplify dictionary iteration by removing .keys()

When iterating over a dictionary, you can iterate directly over it instead of calling .keys().

Apply this diff to streamline the iteration:

-    for key in schema.keys():
+    for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

647-647: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


755-755: Simplify dictionary iteration by removing .keys()

You can iterate directly over the dictionary without using .keys().

Apply this diff:

-    for key in schema.keys():
+    for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

755-755: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


907-907: Simplify dictionary iteration by removing .keys()

When iterating over a dictionary, there's no need to call .keys(); you can iterate over the dictionary directly.

Apply this diff:

-    for key in schema.keys():
+    for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

907-907: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


982-982: Simplify dictionary iteration by removing .keys()

You can simplify the loop by iterating directly over the dictionary.

Apply this diff:

-    for key in schema.keys():
+    for key in schema:
🧰 Tools
🪛 Ruff (0.8.2)

982-982: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

fiftyone/core/collections.py (1)

10575-10590: Simplify boolean return in _is_full_collection()

The boolean expression can be simplified by returning the condition directly instead of using an if/else statement.

def _is_full_collection(self):
-   if isinstance(self, fod.Dataset) and self.media_type != fom.GROUP:
-       return True
-
-   # pylint:disable=no-member
-   if (
-       isinstance(self, fov.DatasetView)
-       and self._dataset.media_type == fom.GROUP
-       and len(self._stages) == 1
-       and isinstance(self._stages[0], fos.SelectGroupSlices)
-       and self._pipeline() == []
-   ):
-       return True
-
-   return False

+   return (
+       (isinstance(self, fod.Dataset) and self.media_type != fom.GROUP)
+       or (
+           isinstance(self, fov.DatasetView)
+           and self._dataset.media_type == fom.GROUP 
+           and len(self._stages) == 1
+           and isinstance(self._stages[0], fos.SelectGroupSlices)
+           and self._pipeline() == []
+       )
+   )
🧰 Tools
🪛 Ruff (0.8.2)

10580-10589: Return the condition directly

Inline condition

(SIM103)

app/packages/operators/src/usePanelEvent.ts (2)

Line range hint 8-13: Type currentPanelState more strictly than any.

Using any reduces type safety. Consider creating an interface for the panel state structure.

 type HandlerOptions = {
   params: { [name: string]: unknown };
   operator: string;
   prompt?: boolean;
   panelId: string;
   callback?: ExecutionCallback;
-  currentPanelState?: any; // most current panel state
+  currentPanelState?: PanelState; // Define PanelState interface
 };

Line range hint 39-45: Type the callback parameters.

The eventCallback function has untyped parameters result and opts. Consider adding proper type annotations.

-  const eventCallback = (result, opts) => {
+  const eventCallback = (result: OperationResult, opts: OperationOptions) => {
fiftyone/utils/open_clip.py (1)

139-139: Improved GPU handling with proper autocast context.

The changes enhance GPU performance and memory usage by:

  1. Simplifying preprocessing logic
  2. Adding proper autocast context for GPU operations using torch.amp.autocast

Consider adding a comment explaining the autocast's impact on memory and performance, as referenced in the PR comment.

Also applies to: 147-154

app/packages/state/src/recoil/dynamicGroups.ts (1)

292-292: Enhanced cache key with media field.

The store key now includes the media field, ensuring proper cache invalidation when the selected media field changes.

Consider using a more structured approach like JSON.stringify for complex keys to avoid potential string concatenation issues.

Also applies to: 294-294

fiftyone/utils/ultralytics.py (1)

23-23: Well-implemented device management with automatic CUDA detection!

The changes properly handle device management by:

  1. Automatically detecting CUDA availability
  2. Moving the model to the appropriate device

Consider caching the device check result to avoid repeated CUDA availability checks.

+_CUDA_AVAILABLE = None

+def _is_cuda_available():
+    global _CUDA_AVAILABLE
+    if _CUDA_AVAILABLE is None:
+        _CUDA_AVAILABLE = torch.cuda.is_available()
+    return _CUDA_AVAILABLE

 def __init__(self, d):
     self.model = self.parse_raw(d, "model", default=None)
     self.model_name = self.parse_raw(d, "model_name", default=None)
     self.model_path = self.parse_raw(d, "model_path", default=None)
     self.classes = self.parse_array(d, "classes", default=None)
-    self.device = self.parse_string(
-        d, "device", default="cuda" if torch.cuda.is_available() else "cpu"
-    )
+    self.device = self.parse_string(
+        d, "device", default="cuda" if _is_cuda_available() else "cpu"
+    )

Also applies to: 382-384, 397-398

fiftyone/utils/eval/activitynet.py (1)

43-44: Clean implementation of custom metrics support!

The changes properly integrate custom metrics support while maintaining backward compatibility. Consider adding an example in the docstring to demonstrate the format of the custom_metrics parameter.

 """ActivityNet-style evaluation config.

 Args:
     pred_field: the name of the field containing the predicted
         :class:`fiftyone.core.labels.TemporalDetection` instances
     gt_field: the name of the field containing the ground truth
         :class:`fiftyone.core.labels.TemporalDetection` instances
     iou (None): the IoU threshold to use to determine matches
     classwise (None): whether to only match segments with the same class
         label (True) or allow matches between classes (False)
     compute_mAP (False): whether to perform the necessary computations so
         that mAP and PR curves can be generated
     iou_threshs (None): a list of IoU thresholds to use when computing mAP
         and PR curves. Only applicable when ``compute_mAP`` is True
     custom_metrics (None): an optional list of custom metrics to compute
         or dict mapping metric names to kwargs dicts
+
+        Example::
+
+            custom_metrics={
+                "my_metric": {"threshold": 0.5},
+                "another_metric": {"min_iou": 0.75}
+            }
+
 """

Also applies to: 55-55, 59-64, 334-334, 351-351, 361-361

app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

1784-1811: Consider adding error handling for malformed custom metrics

The function handles the happy path well but could benefit from additional error handling.

 function formatCustomMetricRows(evaluationMetrics, comparisonMetrics) {
   const results = [] as SummaryRow[];
+  if (!evaluationMetrics) return results;
+
   const customMetrics = _.get(
     evaluationMetrics,
     "custom_metrics",
     {}
   ) as CustomMetrics;
   for (const [operatorUri, customMetric] of Object.entries(customMetrics)) {
+    try {
       const compareValue = _.get(
         comparisonMetrics,
         `custom_metrics.${operatorUri}.value`,
         null
       );
       const hasOneValue = customMetric.value !== null || compareValue !== null;

       results.push({
         id: operatorUri,
         property: customMetric.label,
         value: customMetric.value,
         compareValue,
         lesserIsBetter: customMetric.lower_is_better,
         filterable: false,
         active: false,
         hide: !hasOneValue,
       });
+    } catch (error) {
+      console.error(`Error processing custom metric ${operatorUri}:`, error);
+    }
   }
   return results;
 }
fiftyone/core/view.py (2)

442-451: Improve error handling by preserving the exception chain.

The error handling should preserve the exception chain to help with debugging.

-            raise ValueError("%s is empty" % self.__class__.__name__)
+            raise ValueError("%s is empty" % self.__class__.__name__) from None
🧰 Tools
🪛 Ruff (0.8.2)

451-451: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
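The effect of `from None` can be seen in a minimal standalone sketch; the `first()` helper below is hypothetical and stands in for the actual iterator logic in `view.py`:

```python
def first(iterable):
    """Return the first item, raising a clean ValueError when empty."""
    try:
        return next(iter(iterable))
    except StopIteration:
        # "from None" suppresses the implicit exception chain, so callers
        # see only the ValueError, not the internal StopIteration
        raise ValueError("collection is empty") from None

assert first([1, 2]) == 1

try:
    first([])
except ValueError as exc:
    assert exc.__suppress_context__  # chain explicitly suppressed
    assert exc.__cause__ is None
```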


489-553: Improve error handling by preserving the exception chain.

The implementation is thorough with good examples, but error handling should preserve the exception chain.

-            raise ValueError("No samples match the given expression")
+            raise ValueError("No samples match the given expression") from None
🧰 Tools
🪛 Ruff (0.8.2)

542-542: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

fiftyone/core/dataset.py (2)

1236-1243: Improve exception handling in the StopIteration case.

The error handling should preserve the exception chain by using raise ... from err syntax.

-            raise ValueError("%s is empty" % self.__class__.__name__)
+            raise ValueError("%s is empty" % self.__class__.__name__) from None
🧰 Tools
🪛 Ruff (0.8.2)

1241-1241: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


1251-1258: Improve exception handling in the StopIteration case.

The error handling should preserve the exception chain by using raise ... from err syntax.

-            raise ValueError("%s is empty" % self.__class__.__name__)
+            raise ValueError("%s is empty" % self.__class__.__name__) from None
🧰 Tools
🪛 Ruff (0.8.2)

1256-1256: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

fiftyone/utils/eval/segmentation.py (3)

Line range hint 389-442: Consider using a dataclass for matches.

The matches list contains tuples with 5 elements. Consider using a dataclass to make the code more maintainable and self-documenting.

Create a dataclass to represent matches:

+from dataclasses import dataclass
+
+@dataclass
+class SegmentationMatch:
+    gt_label: str
+    pred_label: str
+    pixel_count: int
+    gt_id: str
+    pred_id: str
+
-                matches.append(
-                    (
-                        classes[i],
-                        classes[j],
-                        int(image_conf_mat[i, j]),
-                        gt_seg.id,
-                        pred_seg.id,
-                    )
-                )
+                matches.append(SegmentationMatch(
+                    gt_label=classes[i],
+                    pred_label=classes[j],
+                    pixel_count=int(image_conf_mat[i, j]),
+                    gt_id=gt_seg.id,
+                    pred_id=pred_seg.id,
+                ))

513-522: Consider using early return for better readability.

The nested if-else structure can be simplified using early returns.

Simplify the logic:

-        if matches is None:
-            ytrue, ypred, weights = self._parse_confusion_matrix(
-                pixel_confusion_matrix, classes
-            )
-            ytrue_ids = None
-            ypred_ids = None
-        elif matches:
-            ytrue, ypred, weights, ytrue_ids, ypred_ids = zip(*matches)
-        else:
-            ytrue, ypred, weights, ytrue_ids, ypred_ids = [], [], [], [], []
+        if matches is None:
+            ytrue, ypred, weights = self._parse_confusion_matrix(
+                pixel_confusion_matrix, classes
+            )
+            return ytrue, ypred, weights, None, None
+            
+        if not matches:
+            return [], [], [], [], []
+            
+        return zip(*matches)

602-605: Simplify custom metrics handling.

The code can be simplified by removing the redundant condition.

Apply this change:

-    custom_metrics = kwargs.get("custom_metrics", None)
-    if etau.is_str(custom_metrics):
-        kwargs["custom_metrics"] = [custom_metrics]
+    if etau.is_str(kwargs.get("custom_metrics")):
+        kwargs["custom_metrics"] = [kwargs["custom_metrics"]]
🧰 Tools
🪛 Ruff (0.8.2)

602-602: Use kwargs.get("custom_metrics") instead of kwargs.get("custom_metrics", None)

Replace kwargs.get("custom_metrics", None) with kwargs.get("custom_metrics")

(SIM910)

fiftyone/operators/evaluation_metric.py (2)

17-17: Correct the default parameter annotation in the docstring.

The parameter label is incorrectly annotated as (name) instead of (None) in the docstring. It should reflect that the default value is None.

Apply this diff to correct the docstring:

-            label (name): a label for the evaluation metric
+            label (None): a label for the evaluation metric

36-36: Consider setting lower_is_better=False as the default.

Depending on the typical metrics being used, setting lower_is_better=False might be more appropriate if higher metric values indicate better performance.

Apply this diff if you decide to change the default value:

-            lower_is_better=True,
+            lower_is_better=False,
fiftyone/utils/eval/coco.py (1)

70-71: Consistent indentation in the docstring for custom_metrics.

The parameter custom_metrics in the docstring is misaligned. Adjust the indentation for consistency with the other parameters.

Apply this diff to fix the indentation:

             error_level (1): the error level to use when manipulating instance
                 masks or polylines. Valid values are:

                 -   0: raise geometric errors that are encountered
                 -   1: log warnings if geometric errors are encountered
                 -   2: ignore geometric errors

                 If ``error_level > 0``, any calculation that raises a geometric
                 error will default to an IoU of 0
+            custom_metrics (None): an optional list of custom metrics to compute
+                or dict mapping metric names to kwargs dicts
fiftyone/utils/eval/openimages.py (1)

72-73: Fix the indentation of the custom_metrics parameter in the docstring.

The custom_metrics parameter description is not aligned with the other parameters. Adjust its indentation for better readability.

Apply this diff to correct the indentation:

             expand_pred_hierarchy (False): whether to expand predicted objects and
                 labels according to the provided ``hierarchy``
+            custom_metrics (None): an optional list of custom metrics to compute
+                or dict mapping metric names to kwargs dicts
         """
fiftyone/utils/transformers.py (1)

457-458: Consider combining device placement with model loading.

While the current implementation is correct, you could make it more concise by moving the model to the device during loading rather than as a separate step.

-        self.device = torch.device(self.config.device)
-        self.model.to(self.device)
+        self.device = torch.device(self.config.device)
+        self.model = self.model.to(self.device)
🛑 Comments failed to post (9)
app/packages/core/src/plugins/SchemaIO/components/TooltipProvider.tsx (1)

9-9: 🛠️ Refactor suggestion

Consider keeping the explicit component="span" prop.

Removing the component="span" prop from Box changes its default rendering from an inline <span> to a block-level <div>. This could cause unintended layout shifts and affect accessibility, especially when the tooltip is used with inline elements.

Apply this diff to maintain consistent layout behavior:

-      <Box>{children}</Box>
+      <Box component="span">{children}</Box>
app/packages/looker/src/processOverlays.ts (1)

85-105: 💡 Codebase verification

🛠️ Refactor suggestion

TODO comment is valid - current filter approach needs revision

The verification confirms that DetectionOverlay can operate with bounding boxes even when masks are not present. The current filtering approach could incorrectly filter out valid detection overlays that have bounding boxes but no masks loaded.

Suggested improvements:

  • Implement overlay-specific filtering logic that respects each overlay type's requirements
  • Move the filtering logic into the respective overlay classes to maintain better separation of concerns
🔗 Analysis chain

Add JSDoc documentation and consider alternative approaches.

The filter function would benefit from:

  1. JSDoc documentation explaining its purpose, parameters, and return value
  2. A more robust solution for handling overlay-specific conditions, as noted in the TODO comment

Add documentation:

/**
 * Filters overlays based on their field existence and label properties.
 * @param overlay - The overlay to check
 * @param bins - Map of field names to overlay arrays
 * @returns true if the overlay should be filtered out, false otherwise
 */
export const filter = <State extends BaseState>...

The TODO comment suggests potential issues with detection overlays. Let's verify if there are any detection overlays in the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for detection overlay implementations
rg -l "class.*DetectionOverlay" app/packages/looker/src/

Length of output: 156


Script:

#!/bin/bash
# Examine detection overlay implementation
rg -A 10 "class.*DetectionOverlay" app/packages/looker/src/overlays/detection.ts

Length of output: 403


Script:

#!/bin/bash
# Search for DetectionLabel type and bounding box handling
rg -B 5 -A 10 "type DetectionLabel" app/packages/looker/src/overlays/detection.ts

# Also check full implementation for mask-related code
rg "mask" app/packages/looker/src/overlays/detection.ts

Length of output: 313

app/packages/looker/src/processOverlays.test.ts (1)

12-20: 🛠️ Refactor suggestion

Enhance test coverage for the filter function.

The current tests only cover basic cases. Consider adding tests for:

  • When the overlay field exists in bins
  • When the overlay field doesn't exist
  • Edge cases with different overlay types
  • Cases where map/mask properties are defined

Example additional tests:

it("returns false when field exists in bins", () => {
  const bins = { test: [] };
  const overlay = new HeatmapOverlay("test", { ...EMPTY, field: "test" });
  overlay.label.map = new Map();
  expect(filter(overlay, bins)).toBe(false);
});

it("returns true when field doesn't exist in bins", () => {
  const bins = { other: [] };
  const overlay = new HeatmapOverlay("test", { ...EMPTY, field: "test" });
  expect(filter(overlay, bins)).toBe(true);
});
app/packages/looker/src/worker/mask-decoder.ts (1)

23-28: ⚠️ Potential issue

Ensure error handling for insufficient blob size

The loop assumes that header has at least PNG_SIGNATURE.length bytes. If blob is smaller than expected, header[i] evaluates to undefined, so the signature check only fails implicitly. Adding an explicit check that header.length is at least PNG_SIGNATURE.length before the loop makes the intent clear and handles truncated input deterministically.

Apply this diff to add error handling:

const header = new Uint8Array(await blob.slice(0, 29).arrayBuffer());

+ if (header.length < PNG_SIGNATURE.length) {
+   // not a PNG
+   return undefined;
+ }

for (let i = 0; i < PNG_SIGNATURE.length; i++) {
  if (header[i] !== PNG_SIGNATURE[i]) {
    // not a PNG
    return undefined;
  }
}

Committable suggestion skipped: line range outside the PR's diff.

app/packages/looker/src/worker/custom-16-bit-png-decoder.ts (1)

4-25: 🛠️ Refactor suggestion

Add error handling for decoding failures

The function assumes that decode(arrayBuffer) will always succeed. To enhance robustness, consider adding a try-catch block to handle any potential errors during the decoding process.

Apply this diff to include error handling:

export const customDecode16BitPng = async (blob: Blob) => {
+ try {
  const arrayBuffer = await blob.arrayBuffer();
  const decodedPng = decode(arrayBuffer);

  if (decodedPng.palette?.length) {
    // spec doesn't allow palettes for 16-bit PNGs
    throw new Error("Indexed PNGs are not supported for 16-bit PNGs");
  }

  const width = decodedPng.width;
  const height = decodedPng.height;

  const numChannels =
    decodedPng.channels ?? decodedPng.data.length / (width * height);

  return {
    buffer: decodedPng.data.buffer,
    channels: numChannels,
    arrayType: decodedPng.data.constructor.name as OverlayMask["arrayType"],
    shape: [height, width],
  } as OverlayMask;
+ } catch (error) {
+   throw new Error(`Failed to decode 16-bit PNG: ${error.message}`);
+ }
};
fiftyone/operators/__init__.py (1)

15-15: 🛠️ Refactor suggestion

Add new imports to __all__ list.

The newly imported EvaluationMetricConfig and EvaluationMetric classes should be added to the __all__ list since they are part of the public API.

 from .evaluation_metric import EvaluationMetricConfig, EvaluationMetric
+__all__ = [
+    "EvaluationMetricConfig",
+    "EvaluationMetric",
+    # ... existing entries ...
+]

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

15-15: .evaluation_metric.EvaluationMetricConfig imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


15-15: .evaluation_metric.EvaluationMetric imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

app/packages/operators/src/built-in-operators.ts (1)

1048-1048: ⚠️ Potential issue

Breaking change: Removal of cancel callback functionality

The removal of the on_cancel parameter and its associated logic is a breaking change that could affect existing code that relies on the cancel callback functionality. Consider:

  1. Adding a deprecation notice before removal
  2. Documenting migration steps for affected code
requirements/extras.txt (1)

4-4: 💡 Codebase verification

Keep the upper version constraint to prevent future compatibility issues

The removal of the upper version constraint (<8) could lead to compatibility issues, since ipywidgets 8.x introduced several breaking changes. If 8.x support is intended, it is still advisable to cap the next major version to guard against future breaking changes:

ipywidgets>=7.5,<9
🔗 Analysis chain

Verify compatibility with newer ipywidgets versions.

Removing the upper version constraint (<8) could lead to compatibility issues with newer versions of ipywidgets.

Run the following script to check for breaking changes in newer versions:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for breaking changes in ipywidgets versions

# Get latest version
latest=$(curl -s https://pypi.org/pypi/ipywidgets/json | jq -r '.info.version')

# Compare with current constraint
echo "Latest version: $latest"
echo "Current constraint: >=7.5"

# Check release notes for breaking changes
curl -s "https://raw.githubusercontent.com/jupyter-widgets/ipywidgets/main/CHANGELOG.md"

Length of output: 304


Script:

#!/bin/bash
# Description: Check PyPI and GitHub releases for breaking changes

# Get PyPI release description for 8.0.0
echo "=== PyPI Release Info for 8.0.0 ==="
curl -s "https://pypi.org/pypi/ipywidgets/8.0.0/json" | jq -r '.info.description'

echo -e "\n=== GitHub Release Info ==="
# Get GitHub release info for 8.0.0
curl -s "https://api.github.com/repos/jupyter-widgets/ipywidgets/releases/tags/8.0.0" | jq -r '.body'

Length of output: 21978

fiftyone/utils/transformers.py (1)

589-589: 🛠️ Refactor suggestion

Standardize device handling pattern across model loading methods.

There are inconsistent patterns in how device placement is handled across different model types. Some methods chain the .to(device) call during model loading while others do it separately. Consider standardizing this pattern across all model loading methods for better maintainability.

Example of consistent pattern to apply everywhere:

def _load_model(self, config):
    if config.model is not None:
        return config.model.to(config.device)
    return transformers.AutoModel.from_pretrained(config.name_or_path).to(config.device)

Also applies to: 649-649, 701-701, 708-708, 756-757, 766-766, 777-779, 832-832, 886-886, 938-938, 1090-1090

prernadh
prernadh previously approved these changes Jan 23, 2025
Copy link
Contributor

@prernadh prernadh left a comment


LGTM!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

1786-1787: Consider using optional chaining and nullish coalescing
While _.get works, TypeScript’s optional chaining and nullish coalescing generally provide clearer, more concise code. For example:

-  const customMetrics = (_.get(evaluationMetrics, "custom_metrics", null) || {}) as CustomMetrics;
+  const customMetrics = (evaluationMetrics?.custom_metrics ?? {}) as CustomMetrics;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59d4044 and 3d11a02.

📒 Files selected for processing (1)
  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build

@brimoor brimoor merged commit 2ac97ad into release/v1.3.0 Jan 24, 2025
13 checks passed
@brimoor brimoor deleted the segmentation-callbacks2 branch January 24, 2025 05:47
This was referenced Jan 24, 2025