The error message for rook-ceph-mon-d is:
debug 2024-09-29T05:56:05.330+0000 7f8b5543f700 1 mon.d@2(electing) e5 handle_auth_request failed to assign global_id
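For reference, the quorum state and the affected mon's logs can be checked with commands along these lines (the rook-ceph namespace and the rook-ceph-mon-d deployment name are assumed from the default Rook naming and may differ in other clusters):

# overall health and quorum state, run from the Rook toolbox pod
ceph health detail
ceph quorum_status --format json-pretty

# recent log output of the affected mon
kubectl -n rook-ceph logs deploy/rook-ceph-mon-d --tail=100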
Comments
@shuaigea are you trying to restore the mon quorum and having some issues? It's not clear from the issue description.
Hello, the problem occurred when I added a new hard drive during expansion. The new drive has a different transfer speed from the original (old) drive even though it is the same brand: the old drive reads at about 7000 MB/s, which is fast, while the new drive only reads at about 3500 MB/s, and I am now seeing a slow-read warning. After investigation, the only difference I found is that the new drive's read speed differs from the old drive's. Is there clear guidance that drives added during expansion should have the same transfer speed? Or do I need to use the same brand of drive with the same read/write speed as the old one to stay consistent? Within what range of read-speed difference can slow reads be avoided?
I'm not sure about this one. @travisn @BlaineEXE do you have any thoughts on the above?
I still don't understand what the problem is. My intuition from reading between the lines is that the disk in question is being used for a mon and not an OSD, but I can't be sure.
@BlaineEXE It is indeed the mon that is causing the slow ops now, but I am not sure whether the cause is that the newly added hard drive is a different brand and does not have the same read/write performance as the original disk. My current workaround is to remove the newly added mon and let it be restored, but I still have doubts about future expansion: should a newly added drive's performance match the performance of the original drives?
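The mon replacement I did was roughly along the lines of the sketch below (the rook-ceph-mon-d name, the app=rook-ceph-mon label, and the roughly 10-minute failover timeout are assumptions based on default Rook settings, not confirmed from this cluster):

# scale the suspect mon down; with default health-check settings the
# Rook operator fails over a mon that stays out of quorum past the timeout
kubectl -n rook-ceph scale deploy rook-ceph-mon-d --replicas=0

# watch the replacement mon come up and quorum re-form
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -w
ceph quorum_status --format json-pretty   # from the toolbox pod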
I still don't quite have a full enough understanding to help out here. I know you have added a new hard drive, but hard drives have multiple uses in Ceph, and I can't know exactly how the new drive is being used. If the new drive was used for an OSD, that shouldn't affect mons, so we would have to look into other causes and other cluster info. If the new drive backs the mon PVC (or otherwise holds mon data), then its performance could matter. The only guidance we have from the Ceph documentation on mon disks is this:
The Ceph project doesn't state specific throughput requirements, so I can't say for sure whether the disk's throughput is an issue or not. It is possible that the new disk is simply a faulty (or partly faulty) unit from the factory.
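One way to rule out a bad unit is to benchmark the raw device and check its SMART data; a minimal, read-only sketch (the /dev/nvme1n1 device name is a placeholder for the new drive):

# sequential read benchmark against the raw device (read-only, non-destructive)
fio --name=seqread --filename=/dev/nvme1n1 --rw=read --bs=4M --direct=1 --time_based --runtime=30 --readonly

# SMART health and error counters
smartctl -a /dev/nvme1n1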
[WRN] Health check update: 10 slow ops, oldest one blocked for 56 sec, mon.d has slow ops (SLOW_OPS)
9/29/24 4:31:51 PM [WRN] Health check update: 8 slow ops, oldest one blocked for 51 sec, mon.d has slow ops (SLOW_OPS)
9/29/24 4:31:51 PM [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 46 sec, mon.d has slow ops
9/29/24 4:31:51 PM [WRN] Health detail: HEALTH_WARN 2 slow ops, oldest one blocked for 46 sec, mon.d has slow ops
bash-4.4$ ceph -s
  cluster:
    id:     93cb51f5-56d6-4045-87d6-6e37d861a83e
    health: HEALTH_WARN
            1/3 mons down, quorum a,c

  services:
    mon: 3 daemons, quorum a,c,d (age 0.186447s)
    mgr: b(active, since 9w), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 2d), 6 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 49 pgs
    objects: 43.32k objects, 65 GiB
    usage:   204 GiB used, 11 TiB / 11 TiB avail
    pgs:     49 active+clean

  io:
    client: 136 KiB/s rd, 1005 KiB/s wr, 4 op/s rd, 64 op/s wr
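To see what is actually blocking on mon.d, the in-flight ops and the mon's own status can be dumped from its admin socket, along these lines (run inside the rook-ceph-mon-d pod; exact admin socket commands can vary slightly across Ceph releases):

# list the mon's currently tracked (including slow) operations
ceph daemon mon.d ops

# the mon's own view of quorum, rank, and election state
ceph daemon mon.d mon_status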