Incorrect "slow" response message in Cache Configuration test #33190
@wxiaoguang It seems that the implementation in PR #31265 is incorrect. The test measures performance in microseconds (μs), which only reflects a local measurement. In real-world scenarios, the threshold should start at milliseconds (ms). Basically, any response time under 30ms, even one that drops into microseconds, can still be considered fast or low latency.
I can fix this, but the question remains how:
Any other suggestions or ideas? I'd add a Prometheus histogram there, but then there's no way to get that information by default without enabling metrics and scraping (which would be an issue if we switched to otel). And self-querying the endpoint feels weird.
I prefer solution 3.
@TheFox0x7 Another idea is to change the current implementation by adjusting the thresholds. For example, latency under 30ms can still be considered fast or low latency, while latency around 50ms to 100ms can be considered slow or high latency. This is because under 30ms is an average stable network latency.
Removing it should be fine because:
> Cache check during startup is different from cache testing repeatedly. When you press "test" again and again, the results will always vary, sometimes ranging from low to high latency. This test provides a real-time check of the latency, which can be useful for monitoring the system's performance.
But:
> No, you don't really need something like tracing and profiling in this case. In PR #31265, the code is measuring latency, not just reporting a single delay.

The code in question:

```go
const (
	testCacheKey       = "DefaultCache.TestKey"
	SlowCacheThreshold = 100 * time.Microsecond
)

func Test() (time.Duration, error) {
	if defaultCache == nil {
		return 0, fmt.Errorf("default cache not initialized")
	}
	testData := fmt.Sprintf("%x", make([]byte, 500))
	start := time.Now()
	if err := defaultCache.Delete(testCacheKey); err != nil {
		return 0, fmt.Errorf("expect cache to delete data based on key if exist but got: %w", err)
	}
	if err := defaultCache.Put(testCacheKey, testData, 10); err != nil {
		return 0, fmt.Errorf("expect cache to store data but got: %w", err)
	}
	testVal, hit := defaultCache.Get(testCacheKey)
	if !hit {
		return 0, fmt.Errorf("expect cache hit but got none")
	}
	if testVal != testData {
		return 0, fmt.Errorf("expect cache to return same value as stored but got other")
	}
	return time.Since(start), nil
}
```
Does it make sense? Say you run the test and it shows "the cache is fast". Some time later there is a cache performance problem, and users report that "the response is slow", so you go to the admin panel and click "test". By that time the cache has recovered, so you still see "the cache is fast". Does it resolve any real-world problem?
Of course, if the response times are accurate. However, the message says otherwise. A latency of 6ms is generally considered fast or low latency, and it's well within acceptable performance standards. Labeling a response time under 30ms as "slow" might not be appropriate in most cases, as it can create confusion.
So PR #31265 only needs to improve this part:

```go
elapsed, err := cache.Test()
if err != nil {
	ctx.Data["CacheError"] = err
} else if elapsed > cache.SlowCacheThreshold {
	ctx.Data["CacheSlow"] = fmt.Sprint(elapsed)
}
```

Change the constant:

```go
const SlowCacheThreshold = 30 * time.Millisecond
```

Then it might look like this:

```go
elapsed, err := cache.Test()
if err != nil {
	ctx.Data["CacheError"] = err
} else if elapsed > cache.SlowCacheThreshold {
	ctx.Data["CacheSlow"] = fmt.Sprintf("Cache response is slow: %v", elapsed)
} else {
	ctx.Data["CacheFast"] = fmt.Sprintf("Cache response is fast: %v", elapsed)
}
```
But they aren't really. You measure a set of three operations in a one-off time measurement, and that's prone to outliers. To make it accurate, you'd need to measure the operations repeatedly (say 10-100 times) and take an average, at which point you're just re-implementing metrics without all their details and benefits. @H0llyW00dzZ Maybe you can explain your use case better? The longer I look at the feature, the more I agree with @wxiaoguang that it's kind of pointless, so a real user scenario would show its usefulness better. In particular:
```go
func Test() (time.Duration, error) {
	if defaultCache == nil {
		return 0, fmt.Errorf("default cache not initialized")
	}
	testData := fmt.Sprintf("%x", make([]byte, 500))
	start := time.Now()
	if err := defaultCache.Delete(testCacheKey); err != nil {
		return 0, fmt.Errorf("expect cache to delete data based on key if exist but got: %w", err)
	}
	if err := defaultCache.Put(testCacheKey, testData, 10); err != nil {
		return 0, fmt.Errorf("expect cache to store data but got: %w", err)
	}
	testVal, hit := defaultCache.Get(testCacheKey)
	if !hit {
		return 0, fmt.Errorf("expect cache hit but got none")
	}
	if testVal != testData {
		return 0, fmt.Errorf("expect cache to return same value as stored but got other")
	}
	return time.Since(start), nil
}
```

> The function performs a series of cache operations (delete, put, and get) and measures the total time taken to complete them, providing an accurate measurement of the cache latency.
Change SlowCacheThreshold to 30 milliseconds so it doesn't trigger on non-memory caches. Closes: go-gitea#33190 Closes: go-gitea#32657
Backport #33220 by TheFox0x7. Change SlowCacheThreshold to 30 milliseconds so it doesn't trigger on non-memory caches. Closes: #33190 Closes: #32657 Co-authored-by: TheFox0x7 <[email protected]>
Description
When testing the Cache Configuration with Redis in the admin panel, the cache test keeps displaying the message:
Cache test successful, but response is slow: 6.725281ms.
However, a response time of around 6ms should not be considered slow. The message seems to be misleading and may cause confusion for administrators.
Expected behavior:
The cache test should provide an accurate assessment of the response time and not label a response time of around 6ms as "slow".
Gitea Version
v1.23.0
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
Git Version
2.47.1
Operating System
Linux
How are you running Gitea?
Running on Kubernetes
Database
SQLite