Allow flexible column names #27
Branch updated from 9c5fd2b to e26c028.
Resolve conflicts (@psainics)
@@ -25,7 +25,7 @@
import com.google.cloud.bigquery.JobStatistics;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TimePartitioning;
-import com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema;
+//import com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema;
Remove this line.
Done!
@@ -344,9 +344,12 @@ public void validate(@Nullable Schema inputSchema, @Nullable Schema outputSchema
String name = field.getName();
// BigQuery column names only allow alphanumeric characters and _
Remove the old doc reference and this old comment.
This is not old. Per the docs, column names cannot contain special characters; special characters are part of flexible column names, which are still in preview.
I suggest keeping both doc references until the BigQuery docs merge the two concepts.
In that case, add one comment at the top explaining that these comments are kept because flexible column names are in preview, and that they will be removed after GA.
// If the field name is not in english characters, then we will use json format
// We do this as the avro load job in BQ does not support non-english characters in field names for now
String fieldName = field.getName();
if (!Pattern.matches("[\\w]+", fieldName)) {
Create a variable for this regex.
String wordRegex = "[\\w]+";
Create a static final variable with a name like COLUMN_NAME_REGEX.
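The reviewer's suggestion can be sketched as below. This is a minimal illustration assuming a standalone helper, not the plugin's actual code: `ColumnNameFormats`, `COLUMN_NAME_PATTERN` and `requiresJsonFormat` are hypothetical names; only `COLUMN_NAME_REGEX` and the `[\\w]+` pattern come from this review. Compiling the pattern once into a constant also avoids the per-call compilation that `Pattern.matches(regex, input)` performs.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not the plugin's actual class) showing the regex hoisted
// into static final constants, as suggested in the review.
final class ColumnNameFormats {

  // Regex from the PR: one or more word characters.
  static final String COLUMN_NAME_REGEX = "[\\w]+";

  // Compiled once; Pattern.matches(regex, input) would recompile on every call.
  static final Pattern COLUMN_NAME_PATTERN = Pattern.compile(COLUMN_NAME_REGEX);

  private ColumnNameFormats() {
  }

  // True when the name contains a character outside the word class, i.e. when
  // the load job would fall back from Avro to JSON under the rule in this PR.
  static boolean requiresJsonFormat(String fieldName) {
    return !COLUMN_NAME_PATTERN.matcher(fieldName).matches();
  }

  public static void main(String[] args) {
    // The last entry is U+540D U+524D, the Japanese word for "name".
    String[] names = {"order_id", "order-id", "\u540d\u524d"};
    for (String name : names) {
      System.out.println(name + " -> " + (requiresJsonFormat(name) ? "JSON" : "Avro"));
    }
  }
}
```

With this shape, `order_id` stays on the Avro path, while `order-id` and the Japanese name switch to JSON.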
src/main/java/io/cdap/plugin/gcp/bigquery/sink/lib/BigQueryOutputConfiguration.java (outdated, resolved)
src/main/java/io/cdap/plugin/gcp/bigquery/sink/lib/BigQueryOutputConfiguration.java (resolved)
Branch updated from 1b927fc to ba0e4ac.
// If the field name is not in english characters, then we will use json format
// We do this as the avro load job in BQ does not support non-english characters in field names for now
String fieldName = field.getName();
String wordRegex = "[\\w]+";
Does this cover all the characters that BigQuery previously supported, such as underscores and digits? If not, use the same regex that the bigqueryConnector library uses.
This does not cover everything; it only checks for [a-z A-Z 0-9].
We negate the match, so as long as the name has no character outside [a-z A-Z 0-9], we use the Avro format; otherwise we use the JSON format.
Then a name containing an underscore will also be written as JSON, which is not required.
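For reference when reading this thread: per the `java.util.regex.Pattern` Javadoc, `\w` in the default mode is `[a-zA-Z_0-9]`, so the `[\\w]+` check accepts underscores and digits, and a name such as `col_1` would stay on the Avro path under this PR's rule. The sketch below (class and field names are hypothetical) compares that default with the `Pattern.UNICODE_CHARACTER_CLASS` flag, under which `\w` also matches Unicode letters such as Japanese characters.

```java
import java.util.regex.Pattern;

// Hypothetical demo comparing what "[\\w]+" accepts with and without
// Unicode-aware character classes in java.util.regex.
final class WordClassDemo {

  // Default mode: \w is ASCII-only, equivalent to [a-zA-Z_0-9].
  static final Pattern ASCII_WORD = Pattern.compile("[\\w]+");

  // With UNICODE_CHARACTER_CLASS, \w also covers Unicode letters and digits.
  static final Pattern UNICODE_WORD =
      Pattern.compile("[\\w]+", Pattern.UNICODE_CHARACTER_CLASS);

  public static void main(String[] args) {
    // The last entry is U+540D U+524D, the Japanese word for "name".
    String[] names = {"col_1", "Col2", "col-3", "col 4", "\u540d\u524d"};
    for (String name : names) {
      boolean ascii = ASCII_WORD.matcher(name).matches();
      boolean unicode = UNICODE_WORD.matcher(name).matches();
      System.out.printf("%-6s ascii=%b unicode=%b%n", name, ascii, unicode);
    }
  }
}
```

Neither mode accepts a dash or a space, so names containing those characters take the JSON path either way.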
Branch updated from 3edc1da to 88bb9ca.
Branch updated from a5bdff1 to b24053f.
Branch updated from 514a93e to b6d212c.
Branch updated from b6d212c to cb76da5.
Allow flexible column names (Japanese Characters)
Jira: PLUGIN-1718
Let users enter non-English characters in column names.
A column name can contain the letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. For more flexible column name support, see flexible column names.
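The standard (non-flexible) rule quoted above can be expressed as a small validator. This is a sketch under stated assumptions, not code from the PR: `LegacyColumnNameRule`, `LEGACY_COLUMN_NAME` and `isValid` are hypothetical, and the 300-character cap (`MAX_COLUMN_NAME_LENGTH`) is an assumption inferred from the With300Length and With301Length tests listed below.

```java
import java.util.regex.Pattern;

// Hypothetical validator for the standard BigQuery column name rule:
// letters, digits or underscores, starting with a letter or underscore.
final class LegacyColumnNameRule {

  // Assumption: limit inferred from the With300Length / With301Length tests.
  static final int MAX_COLUMN_NAME_LENGTH = 300;

  // First character: letter or underscore; the rest: letters, digits, underscores.
  static final Pattern LEGACY_COLUMN_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  static boolean isValid(String name) {
    return name.length() <= MAX_COLUMN_NAME_LENGTH
        && LEGACY_COLUMN_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    String[] names = {"_total", "Total2", "2total", "total-2", "a".repeat(300), "a".repeat(301)};
    for (String name : names) {
      // Print long names by length only, to keep the output readable.
      String label = name.length() > 20 ? name.length() + " x 'a'" : name;
      System.out.println(label + " -> " + (isValid(name) ? "valid" : "invalid"));
    }
  }
}
```

Note that the `[\\w]+` check in this PR also accepts names that start with a digit, so it decides the load format (Avro or JSON) rather than validating names against this rule.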
Code change
Unit Test
WithSpecialCharacter
WithNumbers
WithCapitalLetters
WithDash
WithUnderscore
WithEmoji
WithSpace
WithJapaneseColumnName
WithInvalidColumnName
WithChineseColumnName
WithValidColumnName
WithChineseColumnName
WithInvalidColumnName
WithJapaneseColumnName
WithSpace
WithEmoji
WithUnderscore
WithDash
WithCapitalLetters
WithNumbers
WithSpecialCharacter
With300Length
With301Length