
[Bug]: [json-inverted] Datanode panic: etcdserver: request is too large when inserting json data with a large number of keys #39130

Open
ThreadDao opened this issue Jan 9, 2025 · 2 comments
@ThreadDao
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: JsDove-optimization_json-ac45a92-20250109
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. Collection has 3 fields: an int64 pk, a 128-dim vector, and a JSON field
  2. Create an index for the vector field
  3. Insert data; each JSON entity has more than 1000 keys
        _nb = len(ids)
        json_key_num = 1000
        max_length = 5
        num_key_20 = int(_nb * 0.2)
        num_key_40 = int(_nb * 0.4)
        num_key_60 = int(_nb * 0.6)
        num_key_80 = int(_nb * 0.8)

        values = [{f"key_{k}": i for k in range(json_key_num)} for i in ids]

        for i in range(num_key_80):
            values[i]["extra_80"] = str(ids[i]).zfill(max_length)
        for i in range(num_key_60):
            values[i]["extra_60"] = str(ids[i]).zfill(max_length)
        for i in range(num_key_40):
            values[i]["extra_40"] = str(ids[i]).zfill(max_length)
        for i in range(num_key_20):
            values[i]["extra_20"] = str(ids[i]).zfill(max_length)
  4. DataNode panics
    dn_7r4k7.log
[2025/01/09 13:30:13.172 +00:00] [INFO] [syncmgr/meta_writer.go:75] [SaveBinlogPath] [SegmentID=455193845997583737] [CollectionID=455193845995911140] [ParitionID=455193845995911141] [startPos="[]"] [checkPoints="[{\"segmentID\":455193845997583737,\"position\":{\"channel_name\":\"json-inverted-op-2-rootcoord-dml_0_455193845995911140v0\",\"msgID\":\"CCcQ5zYYACAAMAE=\",\"msgGroup\":\"datanode-53-json-inverted-op-2-rootcoord-dml_0_455193845995911140v0-true\",\"timestamp\":455194507661017091},\"num_of_rows\":100000}]"] [binlogNum=5] [statslogNum=1] [deltalogNum=0] [bm25logNum=0] [vChannelName=json-inverted-op-2-rootcoord-dml_0_455193845995911140v0]
[2025/01/09 13:30:13.175 +00:00] [INFO] [syncmgr/task.go:206] ["task done"] [collectionID=455193845995911140] [partitionID=455193845995911141] [segmentID=455193845997583737] [channel=json-inverted-op-2-rootcoord-dml_0_455193845995911140v0] [level=L1] [flushedSize=34966136] [timeTaken=58.340948ms]
[2025/01/09 13:30:13.349 +00:00] [WARN] [syncmgr/meta_writer.go:161] ["failed to DropChannel"] [channel=json-inverted-op-2-rootcoord-dml_2_455185638928886944v0] [error="etcdserver: request is too large"]
[2025/01/09 13:30:13.349 +00:00] [ERROR] [writebuffer/write_buffer.go:677] ["failed to drop channel"] [collectionID=455185638928886944] [channel=json-inverted-op-2-rootcoord-dml_2_455185638928886944v0] [error="etcdserver: request is too large"] [stack="github.com/milvus-io/milvus/internal/flushcommon/writebuffer.(*writeBufferBase).Close\n\t/go/src/github.com/milvus-io/milvus/internal/flushcommon/writebuffer/write_buffer.go:677\ngithub.com/milvus-io/milvus/internal/flushcommon/writebuffer.(*bufferManager).DropChannel\n\t/go/src/github.com/milvus-io/milvus/internal/flushcommon/writebuffer/manager.go:274\ngithub.com/milvus-io/milvus/internal/flushcommon/pipeline.(*writeNode).Operate\n\t/go/src/github.com/milvus-io/milvus/internal/flushcommon/pipeline/flow_graph_write_node.go:126\ngithub.com/milvus-io/milvus/internal/util/flowgraph.(*nodeCtxManager).workNodeStart\n\t/go/src/github.com/milvus-io/milvus/internal/util/flowgraph/node.go:131"]
panic: etcdserver: request is too large

goroutine 599 gp=0xc002b7d500 m=0 mp=0xa1478a0 [running]:
panic({0x6896820?, 0xc001915650?}) 
    /go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:779 +0x158 fp=0xc001575708 sp=0xc001575658 pc=0x2000b58
github.com/milvus-io/milvus/internal/flushcommon/writebuffer.(*writeBufferBase).Close(0xc000b3e7e0, {0x7437db0, 0xa1db1e0}, 0x1)
    /go/src/github.com/milvus-io/milvus/internal/flushcommon/writebuffer/write_buffer.go:679 +0x70c fp=0xc001575a10 sp=0xc001575708 pc=0x552ac6c
github.com/milvus-io/milvus/internal/flushcommon/writebuffer.(*l0WriteBuffer).Close(0xc000040e10?, {0x7437db0?, 0xa1db1e0?}, 0xd0?)
    <autogenerated>:1 +0x29 fp=0xc001575a40 sp=0xc001575a10 pc=0x552e929
github.com/milvus-io/milvus/internal/flushcommon/writebuffer.(*bufferManager).DropChannel(0xc000f46850?, {0xc0010af640, 0x37})
    /go/src/github.com/milvus-io/milvus/internal/flushcommon/writebuffer/manager.go:274 +0x12d fp=0xc001575ac0 sp=0xc001575a40 pc=0x552342d
github.com/milvus-io/milvus/internal/flushcommon/pipeline.(*writeNode).Operate(0xc002c223c0, {0xc002bdc6c0?, 0xc002be4390?, 0xc0011a09a0?})
    /go/src/github.com/milvus-io/milvus/internal/flushcommon/pipeline/flow_graph_write_node.go:126 +0xafa fp=0xc001575ee0 sp=0xc001575ac0 pc=0x55c011a
github.com/milvus-io/milvus/internal/util/flowgraph.(*nodeCtxManager).workNodeStart(0xc002c16e70)
    /go/src/github.com/milvus-io/milvus/internal/util/flowgraph/node.go:131 +0x278 fp=0xc001575fc8 sp=0xc001575ee0 pc=0x559f798
github.com/milvus-io/milvus/internal/util/flowgraph.(*nodeCtxManager).Start.gowrap1()
    /go/src/github.com/milvus-io/milvus/internal/util/flowgraph/node.go:95 +0x25 fp=0xc001575fe0 sp=0xc001575fc8 pc=0x559f4e5
runtime.goexit({})
    /go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc001575fe8 sp=0xc001575fe0 pc=0x203f9c1
created by github.com/milvus-io/milvus/internal/util/flowgraph.(*nodeCtxManager).Start in goroutine 437 
    /go/src/github.com/milvus-io/milvus/internal/util/flowgraph/node.go:95 +0x138

goroutine 1 gp=0xc0000061c0 m=nil [chan receive]:
runtime.gopark(0xfb8be079a7e1de?, 0xc00026afc0?, 0x63?, 0xf4?, 0x7f2e9ae2a1b8?)
    /go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:402 +0xce fp=0xc0040f5590 sp=0xc0040f5570 pc=0x20052ce
runtime.chanrecv(0xc000b228a0, 0x0, 0x1)

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

pods:

json-inverted-op-2-milvus-datanode-6d88545cd6-7r4k7               0/1     CrashLoopBackOff         7 (4m22s ago)   47m     10.104.14.236   4am-node18   <none>           <none>
json-inverted-op-2-milvus-indexnode-5bffdff8b6-2l427              1/1     Running                  0               60m     10.104.9.168    4am-node14   <none>           <none>
json-inverted-op-2-milvus-indexnode-5bffdff8b6-5czcg              1/1     Running                  0               57m     10.104.15.115   4am-node20   <none>           <none>
json-inverted-op-2-milvus-indexnode-5bffdff8b6-bzbx6              1/1     Running                  0               58m     10.104.26.36    4am-node32   <none>           <none>
json-inverted-op-2-milvus-indexnode-5bffdff8b6-cw92x              1/1     Running                  0               56m     10.104.30.68    4am-node38   <none>           <none>
json-inverted-op-2-milvus-indexnode-5bffdff8b6-mrdmv              1/1     Running                  0               59m     10.104.19.194   4am-node28   <none>           <none>
json-inverted-op-2-milvus-mixcoord-58f44fdb9d-tbz27               1/1     Running                  0               55m     10.104.26.38    4am-node32   <none>           <none>
json-inverted-op-2-milvus-proxy-86c6d449fd-pxczp                  1/1     Running                  0               46m     10.104.13.231   4am-node16   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-2dhnr            1/1     Running                  0               53m     10.104.16.187   4am-node21   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-2tt7r            1/1     Running                  0               50m     10.104.23.191   4am-node27   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-4hz8b            1/1     Running                  0               52m     10.104.26.39    4am-node32   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-4tptm            1/1     Running                  0               54m     10.104.30.69    4am-node38   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-cl859            1/1     Running                  0               51m     10.104.21.122   4am-node24   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-jdn7x            1/1     Running                  0               49m     10.104.27.176   4am-node31   <none>           <none>
json-inverted-op-2-milvus-querynode-0-588bd5b895-vrg8x            1/1     Running                  0               48m     10.104.15.125   4am-node20   <none>           <none>

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 9, 2025
@yanliang567
Contributor

/assign @JsDove
/unassign

@sre-ci-robot sre-ci-robot assigned JsDove and unassigned yanliang567 Jan 10, 2025
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 10, 2025
@yanliang567 yanliang567 added this to the 2.5.3 milestone Jan 10, 2025
@JsDove
Contributor

JsDove commented Jan 10, 2025

Tantivy generates multiple segments, and each segment contains files such as .idx and .pos. This inflates the size of SegmentInfo until it exceeds etcd's request size limit.
The current solution is to adjust Tantivy's memory size so that only one segment file is generated.
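For context, etcd rejects write requests larger than a configurable limit (1.5 MiB by default), which is the limit the oversized SegmentInfo hits. The limit can be raised on the etcd server via `--max-request-bytes`, though the fix described above avoids the problem at the source. A hedged sketch (the 10 MiB value is illustrative, not a recommendation):

```shell
# etcd's default request size limit is 1.5 MiB (1572864 bytes).
# It can be raised when starting the etcd server, e.g.:
etcd --max-request-bytes=10485760   # 10 MiB; illustrative value only
```

Raising the limit only buys headroom; metadata that grows with the number of index files can still exceed any fixed cap.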
