Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[source-mysql] Sync Stops / Infinite Hanging on UI - Out of Memory because of Zero Dates in Table Column #51014

Open
1 task
somativa-mauricio-macri opened this issue Jan 9, 2025 · 3 comments
Labels
area/connectors Connector related issues community connectors/source/mysql team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/bug Something isn't working

Comments

@somativa-mauricio-macri
Copy link

Connector Name

source-mysql

Connector Version

3.9.4

What step the error happened?

During the sync

Relevant information

Issue

Hi!

We are using airbyte to sync alot of different types of sources (MSSQL, MySQL, CSV Files, among others) and we are having an issue using the MySQL Source.

We noticed that some of our tables have "Zero Dates" ('0000-00-00' for example) in them, instead of null. We noticed that the source has the behaviour of changing these invalid dates to NULL when writing the DESTINATION / OUTPUT. <<< This is ideal! It's something that we want to happen.

The issue in question is that, when the SYNC is running, and a table has TOO many Zero Date rows, we noticed that the SYNC log gets bombed with INFO logs about each row and, after a while, the sync "freezes". Stops generating more logs.

We managed to read the log by accessing the "airbyte-abctl-control-plane" container, created by ABCTL, and via "kubectl logs -n airbyte-abctl replication-job-xxxx-attempt-x -f" command, we reached the end of the execution and its displayed that and OUT OF MEMORY has occured and the processed was killed.

After this OOM problem, the only way to "move" past is canceling the SYNC in the airbyte UI.

We got this problem so far from two distinct MySQL database, and in both cases, there were tables with millions or rows that have this kind of Zero Date problem...

We think that some variable / state / cache or something is accumullating when treating these database rows in general, and after a while, its running out of memory in the process that was created...

We didn't got this kind of behaviour when syncing other tables... We even made a test where a connection with 20 tables was not syncing, and after removing the only table with Zero Dates (2 columns ~ Aproximate 1.2 million rows), and running all the other 19, the sync was successfull.

Airbyte Version

1.3.1 - Latest install using ABCTL

Abctl Version

v.0.24.0

MySQL Source Version

3.9.4

Example of INFO generated before the Out of Memory error

2025-01-09 17:56:13,054 [pool-9-thread-4] INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):314 - INFO DefaultDispatcher-worker-11#cps_brasil.produtos-round-1-partition-1-read i.a.c.r.JdbcSelectQuerier$Result(next):123 Failed to serialize column: validade_ca, of type OffsetDateTimeFieldType, with error Zero date value prohibited
2025-01-09 17:56:13,054 [pool-9-thread-4] INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):314 - INFO DefaultDispatcher-worker-11#cps_brasil.produtos-round-1-partition-1-read i.a.c.r.JdbcSelectQuerier$Result(next):123 Failed to serialize column: data_retorno, of type OffsetDateTimeFieldType, with error Zero date value prohibited
2025-01-09 17:56:13,054 [pool-9-thread-4] INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):314 - INFO DefaultDispatcher-worker-11#cps_brasil.produtos-round-1-partition-1-read i.a.c.r.JdbcSelectQuerier$Result(next):123 Failed to serialize column: validade_ca, of type OffsetDateTimeFieldType, with error Zero date value prohibited
Terminating due to java.lang.OutOfMemoryError: Java heap space

Images

I'm sending together with this message, 3 images... 1 showing the pod as "completed", 1 showing the error inside the log, 1 showing the SYNC "hanging" in the airbyte UI after the Out of Memory Error.

1
Image

2
Image

3
Image

Relevant log output

Contribute

  • Yes, I want to contribute
@marcosmarxm
Copy link
Member

@airbytehq/jdbc-connectors can someone take a look at this issue?

@marcosmarxm
Copy link
Member

@somativa-mauricio-macri can you provide information about your server (cpu/disk/memory) is available to run the sync as the database size?

@somativa-mauricio-macri
Copy link
Author

@marcosmarxm

Atm the server which airbyte is installed have: 16 core, 32GB RAM, 200gb+ disk space available.

One of the MySQL databases in question, the one used in the images, have 20GB in size at the moment. The largest table, is 12GB, and is not in the sync. The largest one used in the sync has 700MB and is synced fine.

The table that i mentioned that we saw that has is contributing with the issue has the size of 155MB (1m+ rows - produtos table).

I got the DB Size information via DBEAVER, i got a image from the interface:

Image

We also have other MySQL databases that are larger than this one that we will need to sync, and we managed to sync other larger tables... but are seeing this Out Of Memory when we are reading tables with too many Zero Dates...

I don't know if it helps but we also managed to SYNC over 100M+ rows (80GB) of data when reading from MS SQL Server... Without an issue. I would'nt say the server specs are the issue based on this other cases that we got.

Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues community connectors/source/mysql team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants