Incorrect "slow" response message in Cache Configuration test #33190
@wxiaoguang It seems that the implementation in PR #31265 is incorrect. The test measures performance in microseconds (μs), which only reflects a local measurement. In real-world scenarios, the threshold should start at milliseconds (ms). Basically, any response time under 30ms, even one that drops into microseconds, can still be considered fast or low latency.
I can fix this, but the question remains how:
Any other suggestions or ideas? I'd add a Prometheus histogram there, but then there's no way to get that information by default without enabling metrics and scraping (which would be an issue if we switched to otel). And self-querying the endpoint feels weird.
I prefer solution 3.
@TheFox0x7 Another idea is to change the current implementation by adjusting the thresholds. For example, latency under 30ms can still be considered fast or low latency, while latency around 50ms to 100ms can be considered slow or high latency. This is because under 30ms is an average stable network latency.
Removing it should be fine because:
> Cache check during startup is different from cache testing repeatedly. When you press "test" again and again, the results will always vary, sometimes ranging from low to high latency. This test provides a real-time check of the latency, which can be useful for monitoring the system's performance.
But:
> No, you don't really need something like tracing and profiling in this case. In PR #31265, the code is measuring latency, not just reporting a single delay.

The code in question:

```go
const (
	testCacheKey       = "DefaultCache.TestKey"
	SlowCacheThreshold = 100 * time.Microsecond
)

func Test() (time.Duration, error) {
	if defaultCache == nil {
		return 0, fmt.Errorf("default cache not initialized")
	}
	testData := fmt.Sprintf("%x", make([]byte, 500))
	start := time.Now()
	if err := defaultCache.Delete(testCacheKey); err != nil {
		return 0, fmt.Errorf("expect cache to delete data based on key if exist but got: %w", err)
	}
	if err := defaultCache.Put(testCacheKey, testData, 10); err != nil {
		return 0, fmt.Errorf("expect cache to store data but got: %w", err)
	}
	testVal, hit := defaultCache.Get(testCacheKey)
	if !hit {
		return 0, fmt.Errorf("expect cache hit but got none")
	}
	if testVal != testData {
		return 0, fmt.Errorf("expect cache to return same value as stored but got other")
	}
	return time.Since(start), nil
}
```
Does it make sense? Say you run the test and it shows "the cache is fast". Some time later there is a cache performance problem, and users report that "the response is slow", so you go to the admin panel and click "test". By that time the cache has recovered, so you still see "the cache is fast". Does it resolve any real-world problem?
Of course, if the response times are accurate. However, the message says otherwise. A latency of 6ms is generally considered fast or low latency, and it's well within acceptable performance standards. Labeling a response time under 30ms as "slow" might not be appropriate in most cases, as it can create confusion.
So PR #31265 only needs to improve this part:

```go
elapsed, err := cache.Test()
if err != nil {
	ctx.Data["CacheError"] = err
} else if elapsed > cache.SlowCacheThreshold {
	ctx.Data["CacheSlow"] = fmt.Sprint(elapsed)
}
```

Change the constant:

```go
const SlowCacheThreshold = 30 * time.Millisecond
```

Then it might look like this:

```go
elapsed, err := cache.Test()
if err != nil {
	ctx.Data["CacheError"] = err
} else if elapsed > cache.SlowCacheThreshold {
	ctx.Data["CacheSlow"] = fmt.Sprintf("Cache response is slow: %v", elapsed)
} else {
	ctx.Data["CacheFast"] = fmt.Sprintf("Cache response is fast: %v", elapsed)
}
```
But they aren't really. You measure a set of three operations in a one-off time measurement, and that's prone to outliers. To make it accurate, you'd need to measure the operations repeatedly (say 10-100 times) and take an average, at which point you're just re-implementing metrics without all their details and benefits. @H0llyW00dzZ Maybe you can explain your use case better? The longer I look at the feature, the more I agree with @wxiaoguang that it's kind of pointless, so a real user scenario would show its usefulness better. In particular:
```go
func Test() (time.Duration, error) {
	if defaultCache == nil {
		return 0, fmt.Errorf("default cache not initialized")
	}
	testData := fmt.Sprintf("%x", make([]byte, 500))
	start := time.Now()
	if err := defaultCache.Delete(testCacheKey); err != nil {
		return 0, fmt.Errorf("expect cache to delete data based on key if exist but got: %w", err)
	}
	if err := defaultCache.Put(testCacheKey, testData, 10); err != nil {
		return 0, fmt.Errorf("expect cache to store data but got: %w", err)
	}
	testVal, hit := defaultCache.Get(testCacheKey)
	if !hit {
		return 0, fmt.Errorf("expect cache hit but got none")
	}
	if testVal != testData {
		return 0, fmt.Errorf("expect cache to return same value as stored but got other")
	}
	return time.Since(start), nil
}
```

> The function performs a series of cache operations (delete, put, and get) and measures the total time taken to complete them, providing an accurate measurement of the cache latency.
Change SlowCacheThreshold to 30 milliseconds so it doesn't trigger on non-memory caches. Closes: go-gitea#33190 Closes: go-gitea#32657
Backport #33220 by TheFox0x7. Change SlowCacheThreshold to 30 milliseconds so it doesn't trigger on non-memory caches. Closes: #33190 Closes: #32657 Co-authored-by: TheFox0x7 <[email protected]>
Description
When testing the Cache Configuration with Redis in the admin panel, the cache test keeps displaying the message:
Cache test successful, but response is slow: 6.725281ms.
However, a response time of around 6ms should not be considered slow. The message seems to be misleading and may cause confusion for administrators.
Expected behavior:
The cache test should provide an accurate assessment of the response time and not label a response time of around 6ms as "slow".
Gitea Version
v1.23.0
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
Git Version
2.47.1
Operating System
Linux
How are you running Gitea?
Running on Kubernetes
Database
SQLite