Content safety evals aggregate max from conversations #39083
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
Copilot reviewed 5 out of 10 changed files in this pull request and generated no comments.
Files not reviewed (5)
- sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluate_test_data_conversation.jsonl: Language not supported
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_violence.py: Evaluated as low risk
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py: Evaluated as low risk
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py: Evaluated as low risk
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py: Evaluated as low risk
Comments suppressed due to low confidence (3)
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py:77
- [nitpick] Update the docstring to reflect the correct class name if it is renamed to ConversationNumericAggregationType.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants._ConversationNumericAggregationType
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:41
- Missing period at the end of the docstring.
Default is ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:42
- The conversation_aggregation_type parameter should be explicitly mentioned in the constructor's docstring.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants.ConversationNumericAggregationType
Added change, but GH is still annoyed about it.
…m:MilesHolland/azure-sdk-for-python into jan25/eval/improvement/cs-convo-takes-max
* add convo agg type, and have harm evals use max * analysis * correct enum name in docs * refactor checked enum into function field * cl and analysis * change enum name and update CL * change function names to private, allow agg type retrieval * PR comments * test serialization * CL * CI adjustment * try again * perf * skip perf * remove skip
Adds a new enum, Aggregation type, as well as a utility class that converts that enum to its associated functions.
This enum is then used in the base eval class to control how the per-turn results of a multi-turn conversation are aggregated into a single value. The PR also adds private functions for injecting custom aggregation functions directly, plus tests for all of the above.
In the future, this will likely be used to control how evaluation results across multiple evals are aggregated in the evaluate() function.