Read & Write race condition #75

Closed
captainjapeng opened this issue Jun 6, 2022 · 5 comments · Fixed by #76

Comments

@captainjapeng
Contributor

Following up on the investigation in jaydenseric/graphql-upload#318: there is a chance that fs.read returns zero bytes when processing a lot of files. I was able to replicate the issue with files of 100, 1024*8, 1024*16, 1024*16+1, and 1024*16*2 bytes.

I have pinpointed the issue to these lines: https://github.com/mike-marcacci/fs-capacitor/blob/main/src/index.ts#L61-L70. Essentially, the stream is ended whenever the write stream has already finished, even if the read stream has not yet fully read the file, as seen in the screenshot below.

[screenshot]
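To spell the pattern out (a simplified TypeScript sketch of the logic described above, not the actual fs-capacitor source; all names here are made up):

```ts
import fs from "fs";

// Simplified sketch: the read side decides it has reached the end of the data
// based only on whether the write stream has finished.
function readChunk(
  fd: number,
  pos: number,
  size: number,
  writeFinished: () => boolean,         // e.g. backed by a private _writableState flag
  push: (chunk: Buffer | null) => void, // push(null) signals end-of-stream
  retryLater: () => void,               // wait for the writer to make progress
  fail: (error: Error) => void
): void {
  const buf = Buffer.allocUnsafe(size);
  fs.read(fd, buf, 0, size, pos, (error, bytesRead) => {
    if (error) return fail(error);

    if (bytesRead > 0) {
      // Normal case: forward whatever was read.
      push(buf.subarray(0, bytesRead));
      return;
    }

    // The race: fs.read can report zero bytes even though more data is on its
    // way to this position. If the write stream has already flipped to
    // "finished" by the time this callback runs, the check below ends the
    // stream early and the remaining bytes are never delivered.
    if (writeFinished()) {
      push(null);
      return;
    }

    // Otherwise, try again once the writer has added more data.
    retryLater();
  });
}
```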

@mike-marcacci
Owner

This is a really good catch – I do wish that there was an official way to make this check (both the initial implementation and your fix rely on private/undocumented properties), but I know that at the time of original authorship at least, there wasn't one.

Just to clarify your reproduction of this, were payloads of the listed sizes able to consistently produce this error, or was it still nondeterministic? I'm just thinking in terms of adding tests for this case.

@captainjapeng
Contributor Author

Hi @mike-marcacci, the payload size is irrelevant. I've tried it with 100 bytes, 16 KiB, 16 KiB + 1 B, and, if I recall correctly, another higher value.

The issue is really non-deterministic, so I can't think of a way to create a test for it either. I still don't understand what can cause fs.read to return zero bytes without any error; my only hunch is that the read runs before write is called and the read callback only fires afterwards.
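As an aside on that hunch: fs.read reports bytesRead === 0 with no error whenever the requested position is at or past the file's current end, so a read that races ahead of the writer can legitimately come back empty. A minimal standalone demo (the temp-file path is made up):

```ts
import fs from "fs";

// Reading at a position where nothing has been written yet yields zero bytes
// and no error, which is exactly what the race above would produce.
const path = "/tmp/zero-read-demo"; // made-up path, only for this demo
const fd = fs.openSync(path, "w+"); // creates an empty file

const buf = Buffer.alloc(16);
fs.read(fd, buf, 0, buf.length, 0, (err, bytesRead) => {
  console.log(err, bytesRead); // => null 0
  fs.closeSync(fd);
  fs.unlinkSync(path);
});
```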

I was only able to replicate it reliably under high concurrency on my local machine, but as discussed in the graphql-upload issue, it also occurs on a low-traffic production deployment on GKE. It can probably also be triggered by a slow disk, although our current setup uses SSD persistent disks throughout.

@captainjapeng
Contributor Author

If you modify this repo to add a counter, https://github.com/captainjapeng/graphql-upload-zero-buffer, as in the screenshot below, you'll see that the error sometimes occurs at an index below 1000 but can also occur near the end of the load test. One thing to note is that the problem is less reproducible when the Node debugger is attached, probably because it adds overhead and slows everything down.

Another thing: sometimes the last printed data index is less than the index at which the zero-length buffer appears, so could it be related to the maximum number of open files?

data 38
data 39
data 40
buffer length 42 0

[screenshot]
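For reference, the counter amounts to roughly the following (a rough sketch only; the real change is in the repo linked above and may count things differently):

```ts
import { Readable } from "stream";

// Rough sketch of the instrumentation: tag each emitted chunk with an index
// so a zero-length buffer can be matched against the last logged "data" index.
function logZeroBuffers(stream: Readable): void {
  let index = 0;
  stream.on("data", (chunk: Buffer) => {
    const i = index++;
    if (chunk.length === 0) {
      // The symptom: an empty buffer emitted mid-stream.
      console.log("buffer length", i, chunk.length);
    } else {
      console.log("data", i);
    }
  });
}
```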

captainjapeng added a commit to captainjapeng/fs-capacitor that referenced this issue Jun 10, 2022
@mike-marcacci
Owner

Ah, this is really helpful – thanks for all the additional details. I doubt there's any test we can add to protect against regressions in future versions of Node, but I'm glad to have this documented here.

@haggholm

> I doubt there's any test we can add to protect against regressions in future versions of Node

This could potentially be a lead:

I encountered what appears to be exactly this bug. The fix is only in the latest versions of fs-capacitor, which are ESM only, and I'm not currently able to migrate the project I'm working on to ESM.

Since I wanted to prove to myself that upgrading to the latest version of fs-capacitor would actually fix the problem (by copying it into my project, since I can't import it), I had to find some way of provoking the issue. Based on your earlier discussion, I tried things like using an SMB-mounted NAS to get a slower filesystem, and so on. That worked, but not deterministically, and (worse yet) with a pretty low success rate ("success" being defined as exposing the bug).

What does seem to work for me is simply concurrency. The test below isn't directly reusable for you, but the gist should be pretty obvious; it references a MultiStream that uses fs-capacitor for internal buffering. It's definitely not perfect, since it's still probabilistic, but it fails a majority of the time with version 7 of fs-capacitor and never fails with version 8.

it("...", async () => {
    // This parameter is critical. There's a bug in older versions of fs-capacitor that causes
    // some bytes to be potentially missed; guard against any possible version (or upstream) regression.
    const N = 15;
    const streams = Array.from({ length: N }).map(() => getMultiStream());

    const results = await Promise.all(streams.map((s) => new LazyPipeline(s).toArray()));
    expect(results).toEqual(Array.from({ length: N }).map(() => objects.flat()));
  },
)

By the way, if the above is of any help (and even if it isn't), would you consider backporting the fix for this bug to a version of the library that's compatible with CommonJS? ESM may be the way of the future, but an awful lot of people work on projects stuck in CJS land. Right now, nearly 80% of your users are on a version of the library with a known data-corruption bug, which is not maximally user friendly.
