-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for reading legacy date in Hive for Parquet #24611
base: master
Are you sure you want to change the base?
Support for reading legacy date in Hive for Parquet #24611
Conversation
@@ -281,7 +281,7 @@ public static ReaderPageSource createPageSource( | |||
exception -> handleException(dataSourceId, exception), | |||
// We avoid using disjuncts of parquetPredicate for page pruning in ParquetReader as currently column indexes | |||
// are not present in the Parquet files which are read with disjunct predicates. | |||
parquetPredicates.size() == 1 ? Optional.of(parquetPredicates.get(0)) : Optional.empty(), | |||
parquetPredicates.size() == 1 ? Optional.of(parquetPredicates.getFirst()) : Optional.empty(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How about getOnly
@@ -265,7 +264,7 @@ public static ReaderPageSource createPageSource( | |||
List<HiveColumnHandle> baseColumns = readerProjections.map(projection -> | |||
projection.get().stream() | |||
.map(HiveColumnHandle.class::cast) | |||
.collect(toUnmodifiableList())) | |||
.toList()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toImmutableList
@@ -255,38 +255,27 @@ public Block getBlock(Page sourcePage, long startRowId) | |||
} | |||
} | |||
|
|||
private static class SourceColumn | |||
private record SourceColumn(int sourceChannel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general record
can be used for POJO but this implements ColumnAdaptation
which encapsulates function like details like getBlock
which is something beyond POJO so do we need this record conversion ?
private record CoercedColumn(SourceColumn sourceColumn, TypeCoercer<?, ?> typeCoercer) | ||
implements ColumnAdaptation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -125,7 +125,7 @@ public ColumnReader create(PrimitiveField field, AggregatedMemoryContext aggrega | |||
} | |||
if (DATE.equals(type) && primitiveType == INT32) { | |||
if (annotation == null || annotation instanceof DateLogicalTypeAnnotation) { | |||
return createColumnReader(field, valueDecoders::getIntDecoder, INT_ADAPTER, memoryContext); | |||
return createColumnReader(field, valueDecoders::getInt32Decoder, INT_ADAPTER, memoryContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for date
type right ?
@@ -36,6 +36,7 @@ public class ParquetReaderOptions | |||
private final boolean useBloomFilter; | |||
private final DataSize smallFileThreshold; | |||
private final boolean vectorizedDecodingEnabled; | |||
private final boolean hybridCalendarEnabled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: useHybridCalendar
return julianDays; | ||
} | ||
|
||
public static int convertDaysToHybridCalendar(int prolepticDays) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used only for testing ? In that case - can we move to test class ?
private static final int[] JULIAN_GREGORIAN_DIFFS = {2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, 0}; | ||
private static final int[] JULIAN_GREGORIAN_DIFF_SWITCH_DAY = { | ||
-719164, -682945, -646420, -609895, -536845, | ||
-500320, -463795, -390745, -354220, -317695, | ||
-244645, -208120, -171595, -141427 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we build a RangeMap and maintain it ? It would allow us to identify on which set of ranges which dates should we consider ?
else if (julianDays < LAST_SWITCH_JULIAN_DAY) { | ||
return rebaseDays(JULIAN_GREGORIAN_DIFF_SWITCH_DAY, JULIAN_GREGORIAN_DIFFS, julianDays); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a dedicated conversion for rebase - Can we use convertHybridDaysToProlepticDays
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we can, however rebaseDays method will be faster
if (shouldUseHybridCalendarForDate) { | ||
return createColumnReader(field, encoding -> getDateInt32Decoder(valueDecoders.getInt32Decoder(encoding)), INT_ADAPTER, memoryContext); | ||
} | ||
else { | ||
return createColumnReader(field, valueDecoders::getInt32Decoder, INT_ADAPTER, memoryContext); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement it as a ColumnAdaptation ? In this case the Date/Timestamp conversion can be unified and we could use them for ORC as well.
9c61f71
to
27ef0fc
Compare
Description
#23904
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: