Content safety evals aggregate max from conversations #39083

MilesHolland · 2025-01-08T20:50:39Z

Adds a new enum for the aggregation type, as well as a utility class that converts each enum value to its associated function.

The base eval class then uses this enum to control how the per-turn results of a multi-turn conversation are aggregated into a single value. Also adds private functions for injecting custom aggregation functions directly, plus tests for all of this.

In the future, this will likely be used to control how evaluation results across multiple evals are aggregated in the evaluate() function.
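As a rough illustration of the enum-to-function mapping described above, here is a minimal, self-contained sketch. It does not import the SDK: `AggregationType`, `get_aggregator`, `ConversationEvaluatorSketch`, and `_set_aggregator` are hypothetical names chosen for this example, and only the MEAN default and the MAX behavior for harm evals are taken from this PR's discussion.

```python
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List


class AggregationType(Enum):
    """Hypothetical stand-in for the aggregation enum added in this PR."""
    MEAN = "mean"
    MAX = "max"


# Hypothetical lookup table: enum member -> function over per-turn scores.
_AGGREGATORS: Dict[AggregationType, Callable[[List[float]], float]] = {
    AggregationType.MEAN: mean,
    AggregationType.MAX: max,
}


def get_aggregator(agg_type: AggregationType) -> Callable[[List[float]], float]:
    """Return the aggregation function associated with an enum member."""
    return _AGGREGATORS[agg_type]


class ConversationEvaluatorSketch:
    """Toy evaluator that collapses per-turn scores into a single value."""

    def __init__(self, aggregation_type: AggregationType = AggregationType.MEAN):
        self._aggregator = get_aggregator(aggregation_type)

    def _set_aggregator(self, fn: Callable[[List[float]], float]) -> None:
        # Private hook for injecting a custom aggregation function (assumed shape).
        self._aggregator = fn

    def aggregate(self, per_turn_scores: List[float]) -> float:
        return self._aggregator(per_turn_scores)


per_turn = [0.0, 1.0, 5.0]
mean_score = ConversationEvaluatorSketch().aggregate(per_turn)
max_score = ConversationEvaluatorSketch(AggregationType.MAX).aggregate(per_turn)

custom = ConversationEvaluatorSketch()
custom._set_aggregator(min)
min_score = custom.aggregate(per_turn)
```

With a max aggregation, a single harmful turn determines the conversation-level score instead of being averaged away by benign turns, which is presumably why the content safety evaluators switch to it.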

azure-sdk · 2025-01-09T20:09:43Z

API change check

APIView has identified API-level changes in this PR and created the following API reviews.

azure-ai-evaluation

Copilot reviewed 5 out of 10 changed files in this pull request and generated no comments.

Files not reviewed (5)

sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluate_test_data_conversation.jsonl: Language not supported
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_violence.py: Evaluated as low risk
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py: Evaluated as low risk
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py: Evaluated as low risk
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py: Evaluated as low risk

Comments suppressed due to low confidence (3)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py:77

[nitpick] Update the docstring to reflect the correct class name if it is renamed to ConversationNumericAggregationType.

:type conversation_aggregation_type: ~azure.ai.evaluation._constants._ConversationNumericAggregationType

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:41

Missing period at the end of the docstring.

Default is ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:42

The conversation_aggregation_type parameter should be explicitly mentioned in the constructor's docstring.

:type conversation_aggregation_type: ~azure.ai.evaluation._constants.ConversationNumericAggregationType

sdk/evaluation/azure-ai-evaluation/CHANGELOG.md

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py

Added change, but GH is still annoyed about it.


…m:MilesHolland/azure-sdk-for-python into jan25/eval/improvement/cs-convo-takes-max

* add convo agg type, and have harm evals use max
* analysis
* correct enum name in docs
* refactor checked enum into function field
* cl and analysis
* change enum name and update CL
* change function names to private, allow agg type retrieval
* PR comments
* test serialization
* CL
* CI adjustment
* try again
* perf
* skip perf
* remove skip

MilesHolland added 2 commits January 8, 2025 15:44

add convo agg type, and have harm evals use max

cb3a6aa

analysis

3ee1a9f

MilesHolland requested a review from a team as a code owner January 8, 2025 20:50

MilesHolland changed the title ~~Jan25/eval/improvement/cs convo takes max~~ Content safety evals aggregate max from conversations Jan 8, 2025

github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Jan 8, 2025

correct enum name in docs

a0caaf0

nagkumar91 approved these changes Jan 8, 2025


nagkumar91 requested a review from Copilot January 9, 2025 21:18

Copilot AI reviewed Jan 9, 2025


MilesHolland added 4 commits January 13, 2025 16:37

refactor checked enum into function field

171b2c2

Merge branch 'main' into jan25/eval/improvement/cs-convo-takes-max

b559100

cl and analysis

9ab974c

Merge branch 'main' into jan25/eval/improvement/cs-convo-takes-max

fc3669f

diondrapeck previously requested changes Jan 14, 2025


change enum name and update CL

51592ce

change function names to private, allow agg type retrieval

22a02f2

slister1001 approved these changes Jan 15, 2025


nagkumar91 approved these changes Jan 15, 2025
