Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50558][SQL] Introduce simpleString for ExpressionSet #49650

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

olaky
Copy link
Contributor

@olaky olaky commented Jan 24, 2025

What changes were proposed in this pull request?

  • Introduce a simpleString method equal to the one for Expression and add it to ExpressionSet
  • Use it for push down filter logging in DataSourceStrategy
  • Use if for after scan filter logging in FileSourceStrategy

Why are the changes needed?

Filter expressions can be arbitrarily large and should not be logged completely in these cases

Does this PR introduce any user-facing change?

No, logging is not user facing

How was this patch tested?

Added new tests

Was this patch authored or co-authored using generative AI tooling?

No

@olaky
Copy link
Contributor Author

olaky commented Jan 24, 2025

cc @MaxGekk

@github-actions github-actions bot added the SQL label Jan 24, 2025

/** Returns a length limited string that must be used for logging only. */
def simpleString(maxFields: Int): String =
s"Set(${truncatedString(originals.map(_.simpleString(maxFields)).toSeq, ", ", maxFields)})"
Copy link
Member

@MaxGekk MaxGekk Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you convert elements via simpleString even if it could be not needed. Let's imagine originals has 10000 elements, and maxFields is 5. In any case you convert all 10000 elements to strings.

How about to add override truncatedString w/ additional parameter: a function which converts T to String:

truncatedString(originals.toSeq, "Set(", ", ", ")", maxFields, _.simpleString(maxFields))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had the exact same in mind, will pick that up. Default value is toString

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-50558] Introduce simpleString for ExpressionSet [SPARK-50558][SQL] Introduce simpleString for ExpressionSet Jan 24, 2025
@olaky olaky requested a review from MaxGekk January 24, 2025 21:52
seq.take(numFields).map(customToString.get).mkString(start, sep, ending)
} else {
seq.take(numFields).mkString(start, sep, ending)
}
} else {
seq.mkString(start, sep, end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about applying the custom converter here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write a test for the case, please. Like top elements are less than maxFields, but some nested elem has more than maxFields.

def simpleString(maxFields: Int): String = {
val customToString = { e: Expression => e.simpleString(maxFields) }
s"Set(${SparkStringUtils.truncatedString(
seq = originals.toSeq, start = "", sep = ", ", end = "", maxFields, Some(customToString))})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just set start = "Set(", and end = ")"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants