Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1800] Introduce ApplicationTotalCount and ApplicationFallbackCount metric to record the total and fallback count of application #3026

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

SteNicholas
Copy link
Member

What changes were proposed in this pull request?

Introduce ApplicationTotalCount and ApplicationFallbackCount metric to record the total and fallback count of application.

Why are the changes needed?

There is no any metric to record the total count of application running with celeborn shuffle and engine bulit-in shuffle and the fallback count of application. Meanwhile, the fallback of Flink shuffle is based on job granularity rather than shuffle granularity.

Follw up #3012 (comment).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • DefaultMetaSystemSuiteJ#testShuffleAndApplicationCountWithFallback
  • RatisMasterStatusSystemSuiteJ#testShuffleAndApplicationCountWithFallback

@SteNicholas SteNicholas force-pushed the CELEBORN-1800 branch 3 times, most recently from f3126d4 to 95503c3 Compare December 24, 2024 08:52
Copy link

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.

Project coverage is 32.66%. Comparing base (2eb4c23) to head (95503c3).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...born/common/protocol/message/ControlMessages.scala 0.00% 8 Missing ⚠️
...org/apache/celeborn/common/util/PbSerDeUtils.scala 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3026      +/-   ##
==========================================
- Coverage   32.99%   32.66%   -0.33%     
==========================================
  Files         331      336       +5     
  Lines       19842    20042     +200     
  Branches     1787     1792       +5     
==========================================
- Hits         6545     6544       -1     
- Misses      12933    13134     +201     
  Partials      364      364              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SteNicholas SteNicholas marked this pull request as draft December 24, 2024 11:26
…ckCount metric to record the total and fallback count of application
@SteNicholas SteNicholas marked this pull request as ready for review December 24, 2024 12:49
@SteNicholas
Copy link
Member Author

Ping @turboFei, @codenohup, @reswqa, @FMX.

Copy link
Member

@turboFei turboFei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this patch.

Only some comments for unnecessary proto field order change.
Due 0.6.0 is not released, so I am using the main branch for our company deployment, so it is better that we can keep the proto field order here. Thanks.

…ckCount metric to record the total and fallback count of application
@SteNicholas SteNicholas requested a review from turboFei December 25, 2024 03:02
Copy link
Member

@turboFei turboFei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

…ckCount metric to record the total and fallback count of application
Copy link
Contributor

@codenohup codenohup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@SteNicholas
Copy link
Member Author

Ping @pan3793, @FMX, @RexXiong.

@@ -118,10 +123,24 @@ public <K, V, C> ShuffleHandle registerShuffle(
String appId = SparkUtils.appUniqueId(dependency.rdd().context());
initializeLifecycleManager(appId);

if (!registeredApps.contains(appId)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a SparkContext or ShuffleManager or lifecycleManager have multiple apps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RexXiong, from my knowledge, there is no case that SparkContext or ShuffleManager or lifecycleManager have multiple apps. Could you give some input or thoughts about this?

@SteNicholas SteNicholas requested a review from RexXiong January 10, 2025 03:47
Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants