Column Semantic Type Set to Null if First Chunk of Data is All Null #73

Open
ogiorgil opened this issue Mar 6, 2024 · 1 comment

ogiorgil (Contributor) commented Mar 6, 2024

In the implementation of checkSpatialTemporal (once #71 is merged), we determine a column's semantic type (whether it is spatial or temporal) from the first ProfilerConfig.NUM_RECORD_READ values. If all of those values are null, the column's semantic type is set to NONE, even though non-null values may appear later in the table.

We could either drop all null values before passing data into the PreAnalyzer, or modify the estimateSemanticType function to retry the determination of a column's semantic type whenever all the values it read were null.
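As a rough illustration of the two options, here is a minimal sketch in Python. It is not the project's code: the null tokens, the toy date-only classifier, and the helper names `first_chunk`, `first_non_null_chunk`, and `estimate_with_retry` are all assumptions made up for this example, and `NUM_RECORD_READ` is just a local stand-in for `ProfilerConfig.NUM_RECORD_READ`.

```python
from datetime import date
from itertools import islice

NUM_RECORD_READ = 3  # local stand-in for ProfilerConfig.NUM_RECORD_READ
NULL_TOKENS = {"", "null", "none", "nan"}  # assumption: what counts as null


def is_null(value):
    """Treat None and a few common placeholder strings as null."""
    return value is None or str(value).strip().lower() in NULL_TOKENS


def estimate_semantic_type(chunk):
    """Toy classifier (hypothetical): TEMPORAL if every non-null value
    parses as an ISO date, NONE otherwise or if the chunk is all null."""
    non_null = [v for v in chunk if not is_null(v)]
    if not non_null:
        return "NONE"
    try:
        for v in non_null:
            date.fromisoformat(str(v))
    except ValueError:
        return "NONE"
    return "TEMPORAL"


def first_chunk(values, n=NUM_RECORD_READ):
    """Behaviour described in the issue: take the first n values as-is."""
    return list(islice(values, n))


def first_non_null_chunk(values, n=NUM_RECORD_READ):
    """Option 1: skip nulls, then take the first n remaining values."""
    return list(islice((v for v in values if not is_null(v)), n))


def estimate_with_retry(values, n=NUM_RECORD_READ):
    """Option 2: keep reading chunks of n until one has a non-null value."""
    it = iter(values)
    while True:
        chunk = list(islice(it, n))
        if not chunk:  # column exhausted and every value was null
            return "NONE"
        if any(not is_null(v) for v in chunk):
            return estimate_semantic_type(chunk)


column = [None, "", "null", "2024-03-06", "2024-03-14"]
print(estimate_semantic_type(first_chunk(column)))           # all-null first chunk
print(estimate_semantic_type(first_non_null_chunk(column)))  # option 1
print(estimate_with_retry(column))                           # option 2
```

Option 1 always samples n informative values when they exist, at the cost of possibly scanning far into the column. Option 2 only reads further when the current chunk is entirely null.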

luthfibalaka (Contributor) commented Mar 14, 2024

Has this issue been solved already? I tried running it on a CSV file with NUM_RECORD_READ set to 1 (every value in the first row of the CSV file is null), but I saw no issue with labeling the column. Do you have a way to reproduce the issue?
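For anyone trying to reproduce this, a small fixture generator may help. The sketch below builds a CSV whose leading rows are entirely empty, then checks which columns are all null within the first n records. The helper names `build_csv` and `columns_null_in_first_n` are hypothetical, the sample values are made up, and encoding null as an empty cell is an assumption; the profiler may recognise null differently.

```python
import csv
import io


def build_csv(header, num_null_rows, rows):
    """Return CSV text with num_null_rows all-empty rows before the data."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for _ in range(num_null_rows):
        writer.writerow([""] * len(header))  # assumption: null == empty cell
    writer.writerows(rows)
    return buf.getvalue()


def columns_null_in_first_n(csv_text, n):
    """Map each column name to True if its first n records are all empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    all_null = {name: True for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if i >= n:
            break
        for name, value in row.items():
            if value.strip() != "":
                all_null[name] = False
    return all_null


text = build_csv(["date", "city"], 2, [["2024-03-06", "Jakarta"]])
print(text)
print(columns_null_in_first_n(text, 1))  # window covers only a null row
print(columns_null_in_first_n(text, 3))  # window reaches the real data
```

Feeding the generated file to the profiler with NUM_RECORD_READ no larger than the number of leading empty rows should exercise the scenario described in the issue, provided the profiler treats empty cells as null.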
