Mediapipe graph process time metric #2942

Open: wants to merge 5 commits into base: main

@bstrzele (Collaborator) commented Dec 23, 2024

🛠 Summary

CVS-160188
Adds a Prometheus histogram metric, ovms_graph_processing_time_us, for tracking MediaPipe graph processing time.
It uses the same buckets as the single model execution metric.

The metric tracks the time the graph spends serving a single user request for unary calls, or the session time for streaming requests.
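
To illustrate what a histogram-type processing time metric records, here is a minimal sketch. It is not the OVMS implementation: `SketchHistogram`, `timeMicroseconds`, and any bucket bounds passed to it are hypothetical names and values chosen for illustration, and the actual bucket boundaries used by the server are not shown in this PR excerpt.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Prometheus-style histogram (not the OVMS class).
class SketchHistogram {
public:
    // upperBounds must be sorted ascending; an implicit +Inf bucket is appended.
    explicit SketchHistogram(std::vector<double> upperBounds)
        : bounds(std::move(upperBounds)), counts(bounds.size() + 1, 0) {}

    // Place the value in the first bucket whose upper bound is >= value.
    void observe(double value) {
        std::size_t i = 0;
        while (i < bounds.size() && value > bounds[i]) {
            ++i;
        }
        ++counts[i];
        sum += value;
        ++total;
    }

    // Cumulative count up to and including bucket i ("le" semantics).
    std::uint64_t cumulative(std::size_t i) const {
        std::uint64_t c = 0;
        for (std::size_t k = 0; k <= i && k < counts.size(); ++k) {
            c += counts[k];
        }
        return c;
    }

    std::uint64_t count() const { return total; }
    double totalSum() const { return sum; }

private:
    std::vector<double> bounds;
    std::vector<std::uint64_t> counts;
    double sum = 0.0;
    std::uint64_t total = 0;
};

// Run fn and return the elapsed wall time in microseconds.
template <typename Fn>
double timeMicroseconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

In this sketch a unary call would wrap the whole request in `timeMicroseconds` and pass the result to `observe`, while a streaming session would do the same around the lifetime of the connection.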

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

@bstrzele bstrzele requested review from dtrawins and dkalinowski and removed request for dtrawins December 23, 2024 10:33
@bstrzele bstrzele force-pushed the mp_process_time_metric branch from 08f22b8 to 8493488 Compare December 23, 2024 10:36
@bstrzele bstrzele marked this pull request as ready for review December 23, 2024 10:41
src/metric_config.cpp Outdated
@bstrzele bstrzele force-pushed the mp_process_time_metric branch from 9fd076f to 8174bf7 Compare December 23, 2024 14:30
@bstrzele bstrzele force-pushed the mp_process_time_metric branch 2 times, most recently from 2df7c56 to ff2fa6b Compare January 7, 2025 13:02
@@ -196,6 +203,9 @@ class MediapipeGraphExecutor {
INCREMENT_IF_ENABLED(this->mediapipeServableMetricReporter->getGraphErrorMetric(executionContext));
}
MP_RETURN_ON_FAIL(status, "graph wait until done", mediapipeAbslToOvmsStatus(status.code()));
timer.stop(PROCESS);
double processTime = timer.template elapsed<std::chrono::microseconds>(PROCESS);
OBSERVE_IF_ENABLED(this->mediapipeServableMetricReporter->getProcessingTimeMetric(executionContext), processTime);
Collaborator


It is not clear to me whether we want to register processing times of failed executions @dtrawins @bstrzele
Either way, I don't think this is correct because:
a) we do not register the time in all error return codepaths
b) we do register the time in one specific error return codepath (when if (outputPollers.size() != outputPollersWithReceivedPacket.size()) {)
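
One possible way to make the behaviour consistent across return paths, sketched under assumptions: a scope guard that records the elapsed time on destruction, with an explicit policy for failed executions. `ScopedProcessTimer`, `RecordPolicy`, and `runGraphSketch` are hypothetical names for illustration only; they are not part of this PR or of OVMS.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical policy: which outcomes contribute to the histogram.
enum class RecordPolicy { SuccessOnly, Always };

// Records elapsed microseconds when the scope ends, on every return path.
class ScopedProcessTimer {
public:
    using Sink = std::function<void(double)>;

    ScopedProcessTimer(Sink sinkFn, RecordPolicy recordPolicy)
        : sink(std::move(sinkFn)),
          policy(recordPolicy),
          start(std::chrono::steady_clock::now()) {}

    ScopedProcessTimer(const ScopedProcessTimer&) = delete;
    ScopedProcessTimer& operator=(const ScopedProcessTimer&) = delete;

    // Call just before the successful return.
    void markSuccess() { succeeded = true; }

    ~ScopedProcessTimer() {
        if (policy == RecordPolicy::SuccessOnly && !succeeded) {
            return;  // failed execution: skip the observation
        }
        const auto stop = std::chrono::steady_clock::now();
        sink(std::chrono::duration<double, std::micro>(stop - start).count());
    }

private:
    Sink sink;
    RecordPolicy policy;
    std::chrono::steady_clock::time_point start;
    bool succeeded = false;
};

// Hypothetical function with several early error returns.
bool runGraphSketch(bool failEarly, bool failLate, RecordPolicy policy,
                    std::vector<double>& observed) {
    ScopedProcessTimer timer(
        [&observed](double us) { observed.push_back(us); }, policy);
    if (failEarly) {
        return false;  // error path 1
    }
    if (failLate) {
        return false;  // error path 2
    }
    timer.markSuccess();
    return true;
}
```

With `RecordPolicy::SuccessOnly` no error path contributes an observation, and with `RecordPolicy::Always` every path does, so the documented meaning of the metric can match the code whichever choice is made.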

docs/metrics.md Outdated
@@ -241,6 +241,7 @@ For [MediaPipe Graphs](./mediapipe.md) execution there are 4 generic metrics whi
| counter | ovms_responses | Useful to track number of packets generated by MediaPipe graph. Keep in mind that single request may trigger production of multiple (or zero) packets, therefore tracking number of responses is complementary to tracking accepted requests. For example tracking streaming partial responses of LLM text generation graphs. |
| gauge | ovms_current_graphs | Number of graphs currently in-process. For unary communication it is equal to number of currently processing requests (each request initializes separate MediaPipe graph). For streaming communication it is equal to number of active client connections. Each connection is able to reuse the graph and decide when to delete it when the connection is closed. |
| counter | ovms_graph_error | Counts errors in MediaPipe graph execution phase. For example V3 LLM text generation fails in LLMCalculator due to missing prompt - calculator returns an error and graph cancels. |
| histogram | ovms_graph_processing_time_us | Time for which mediapipe graph was opened. |
Collaborator


Please explain more: is the time recorded for successful executions only, or for failed ones as well? If so, which failures count?

@bstrzele bstrzele force-pushed the mp_process_time_metric branch from b9fa2d5 to ae8dd5a Compare January 10, 2025 10:30
@bstrzele bstrzele force-pushed the mp_process_time_metric branch from ae8dd5a to 2455f22 Compare January 10, 2025 10:31
3 participants