Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add json data type #4619

Merged
merged 30 commits into from
Sep 9, 2024
Merged

feat: add json data type #4619

merged 30 commits into from
Sep 9, 2024

Conversation

CookiePieWw
Copy link
Collaborator

@CookiePieWw CookiePieWw commented Aug 26, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#3686 #4515 #4230

What's changed and what's your intention?

As title. Currently the datatype can store json strings as jsonb format. Also a simple udf is added to query json through a path. Now the udf is deleted. We can focus on the type first and add some udfs in the other PRs.

e.g.
image

Since the underlying storage of json is the same as binary, we cannot distinguish them inside datafusion, thus a comment is attached to the metadata of Field of json column for frontend to identify and convert jsonb to strings.

TODO:
Unit tests and sqlness tests.
UDFs.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

  • New Features
    • Introduced support for JSON data types throughout the application, including new functions for JSON handling.
    • Added a JsonFunction for enhanced JSON operations.
    • Expanded the functionality of the FUNCTION_REGISTRY to include JSON operations.
  • Improvements
    • Enhanced data handling for JSON in SQL parsing and data encoding across various components.
    • Updated data equality checks to support JSON types.
  • Chores
    • Added new dependencies related to JSON handling in various configuration files.

Copy link
Contributor

coderabbitai bot commented Aug 26, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce a new JSONB dependency to the project, enhancing data handling capabilities. Multiple files have been updated to support a new Json data type, allowing for better integration of JSON functionalities across various components, including data types, functions, and serialization logic. Additionally, several existing functions and structures were modified to accommodate this new data type, resulting in a more cohesive handling of JSON data throughout the codebase.

Changes

Files Change Summary
Cargo.toml, src/common/function/Cargo.toml, src/datatypes/Cargo.toml, src/servers/Cargo.toml, src/sql/Cargo.toml Added jsonb dependency and marked it as part of the workspace.
src/api/src/helper.rs, src/common/function/src/function_registry.rs, src/common/function/src/scalars.rs, src/common/function/src/scalars/json.rs, src/common/function/src/scalars/json/get_by_path.rs, src/datatypes/src/data_type.rs, src/datatypes/src/types/json_type.rs, src/datatypes/src/value.rs, src/mito2/src/row_converter.rs, src/query/src/dist_plan/merge_scan.rs, src/servers/src/mysql/writer.rs, src/servers/src/postgres/handler.rs, src/servers/src/postgres/types.rs, src/sql/src/statements.rs Enhanced handling of Json data type and related functions across multiple components.
src/datatypes/src/schema/column_schema.rs, src/datatypes/src/types/string_type.rs, src/datatypes/src/schema.rs, src/datatypes/src/type_id.rs, src/common/function/src/scalars/json/get_by_path.rs Introduced or modified functions and structures to support JSON data types without changing signatures.
tests/cases/standalone/common/insert/merge_mode.result, tests/cases/standalone/common/insert/merge_mode.sql Updated error messages and added comments for clarity in test cases.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant Database
    participant FunctionRegistry

    User->>API: Request data with JSON
    API->>FunctionRegistry: Call JsonFunction
    FunctionRegistry->>Database: Query with JSON data
    Database-->>FunctionRegistry: Return results
    FunctionRegistry-->>API: Processed JSON response
    API-->>User: Send JSON data
Loading

🐇 "In fields of code where rabbits play,
New JSON types hop in today!
With functions bright and data fair,
We dance in joy, no worries to bear!
Through crates and structs, we leap and bound,
In every line, new magic found!" 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Aug 26, 2024
@WenyXu
Copy link
Member

WenyXu commented Aug 26, 2024

The CI is failed, try make fmt-toml

Cargo.toml Outdated Show resolved Hide resolved
@CookiePieWw CookiePieWw marked this pull request as ready for review August 28, 2024 09:39
@CookiePieWw CookiePieWw requested review from evenyag, v0y4g3r, waynexia and a team as code owners August 28, 2024 09:39
@WenyXu
Copy link
Member

WenyXu commented Aug 28, 2024

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 28, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (5)
src/datatypes/src/data_type.rs (1)

169-169: Add a test case for Json in test_is_stringifiable.

The is_stringifiable method appears to be correctly implemented, but the test suite does not currently verify the Json variant. Adding a test case for ConcreteDataType::Json(_) in the test_is_stringifiable function will ensure that this new addition is properly tested and behaves as expected.

  • Consider adding: assert!(ConcreteDataType::json_datatype().is_stringifiable()); in test_is_stringifiable.
Analysis chain

Approve the update to is_stringifiable but recommend verifying implementation.

The update to the is_stringifiable method to include Json is crucial for correctly identifying JSON data types as stringifiable. However, it's recommended to verify that the method implementation is correct and does not introduce any issues.

The code changes are approved.

Run the following script to verify the implementation of the is_stringifiable method:

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of the `is_stringifiable` method for JSON data type.

# Test: Search for the function usage. Expect: Correct implementation and consistent behavior.
rg --type rust -A 5 $'is_stringifiable'

Length of output: 3672

src/servers/src/postgres/types.rs (2)

Line range hint 65-92: Enhance JSON handling in encode_value function.

The addition of the datatype parameter to the encode_value function is a significant change that allows the function to handle JSON values appropriately. The use of jsonb::to_string(v) for ConcreteDataType::Json(_) ensures that JSON data is encoded correctly. However, the handling of Value::Binary and Value::Json in the same match arm could be problematic if the encoding requirements for these types diverge in the future.

Consider separating the handling of Value::Binary and Value::Json to maintain clear separation of concerns and improve maintainability. Additionally, ensure that all possible errors from jsonb::to_string(v) are handled correctly to prevent runtime panics.

Separate the handling of Value::Binary and Value::Json for clarity and future-proofing. Ensure robust error handling for JSON encoding.


Line range hint 593-863: Expand test coverage for new JSON data type.

The addition of comprehensive tests for the new JSON data type in the test module is a positive step towards ensuring robustness and correctness of the implementation. The tests cover a variety of data types and ensure that each type is handled correctly by the encoding functions.

However, ensure that the tests also cover edge cases and error handling scenarios, particularly for JSON data types. This will help catch any potential issues before they affect production systems.

Would you like assistance in adding more comprehensive tests, particularly for edge cases and error handling scenarios involving JSON data?

src/datatypes/src/value.rs (2)

351-351: Handling of Json in scalar value conversion.

The code correctly treats Json similarly to Binary by converting the bytes into a ScalarValue::Binary. This is appropriate given that JSON data is stored in a binary format. However, consider explicitly documenting why JSON is treated similarly to binary data to avoid confusion.

Consider adding comments to explain why Json is handled similarly to Binary in scalar value conversions.


1361-1361: Data size estimation for Json in ValueRef.

The method estimates the size of JSON data by treating it similarly to binary data. This is appropriate given the binary nature of JSON storage. Ensure that this estimation is accurate or consider refining the estimation method if necessary.

Consider refining the data size estimation method for JSON to ensure accuracy, especially for larger or more complex JSON objects.

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3297d5f and 092ee0c.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (28)
  • Cargo.toml (1 hunks)
  • src/api/src/helper.rs (7 hunks)
  • src/common/function/Cargo.toml (1 hunks)
  • src/common/function/src/function_registry.rs (2 hunks)
  • src/common/function/src/scalars.rs (1 hunks)
  • src/common/function/src/scalars/json.rs (1 hunks)
  • src/common/function/src/scalars/json/get_by_path.rs (1 hunks)
  • src/common/grpc/src/select.rs (1 hunks)
  • src/datatypes/Cargo.toml (1 hunks)
  • src/datatypes/src/data_type.rs (6 hunks)
  • src/datatypes/src/schema.rs (1 hunks)
  • src/datatypes/src/schema/column_schema.rs (1 hunks)
  • src/datatypes/src/type_id.rs (2 hunks)
  • src/datatypes/src/types.rs (2 hunks)
  • src/datatypes/src/types/json_type.rs (1 hunks)
  • src/datatypes/src/types/string_type.rs (1 hunks)
  • src/datatypes/src/value.rs (19 hunks)
  • src/datatypes/src/vectors/eq.rs (1 hunks)
  • src/mito2/src/row_converter.rs (4 hunks)
  • src/query/src/dist_plan/merge_scan.rs (2 hunks)
  • src/servers/Cargo.toml (1 hunks)
  • src/servers/src/mysql/writer.rs (4 hunks)
  • src/servers/src/postgres/handler.rs (1 hunks)
  • src/servers/src/postgres/types.rs (7 hunks)
  • src/sql/Cargo.toml (1 hunks)
  • src/sql/src/statements.rs (4 hunks)
  • tests/cases/standalone/common/insert/merge_mode.result (2 hunks)
  • tests/cases/standalone/common/insert/merge_mode.sql (1 hunks)
Files skipped from review due to trivial changes (3)
  • src/datatypes/Cargo.toml
  • src/servers/Cargo.toml
  • tests/cases/standalone/common/insert/merge_mode.sql
Additional comments not posted (47)
src/common/function/src/scalars.rs (1)

18-18: New JSON module added.

The addition of the json module is a positive step towards enhancing JSON handling capabilities in the project.

The code changes are approved.

Verify the integration of the json module with other parts of the system to ensure smooth functionality.

Verification successful

Integration of the json module verified.

The json module is well-integrated into the codebase, with references found in various parts of the system. Key areas of integration include:

  • src/common/function/src/function_registry.rs: Direct usage of crate::scalars::json::JsonFunction.
  • src/common/datasource/src/file_format/json.rs: Integration with file format handling.
  • src/operator/src/statement/copy_table_to.rs and copy_table_from.rs: Usage in data operations.

The integration appears smooth, with no conflicts detected.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify integration of the `json` module with other parts of the system.

# Test: Search for references to the `json` module in the project. Expect: Proper integration without conflicts.
rg --type rust $'use .*json.*'

Length of output: 3191

src/sql/Cargo.toml (1)

29-29: New JSONB dependency added.

The addition of the jsonb dependency is a positive step towards enhancing JSONB data handling capabilities in the project.

The code changes are approved.

Verify the integration of the jsonb dependency with other parts of the workspace to ensure smooth functionality.

src/common/function/Cargo.toml (1)

26-26: New JSONB dependency added.

The addition of the jsonb dependency is a positive step towards enhancing JSONB data handling capabilities in the project.

The code changes are approved.

Verify the integration of the jsonb dependency with other parts of the workspace to ensure smooth functionality.

Verification successful

Integration of jsonb dependency is verified.

The jsonb dependency is well-integrated across multiple modules in the project, with active usage in various source files. This confirms that the dependency is functioning as intended without any conflicts.

  • jsonb functions are used in src/sql/src/statements.rs, src/servers/src/mysql/writer.rs, src/datatypes/src/value.rs, and src/common/function/src/scalars/json/get_by_path.rs.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify integration of the `jsonb` dependency with other parts of the workspace.

# Test: Search for references to the `jsonb` dependency in the project. Expect: Proper integration without conflicts.
rg --type toml $'jsonb.workspace = true'

Length of output: 231


Script:

#!/bin/bash
# Description: Search for usage or import statements of `jsonb` in the source code files.

# Test: Look for `jsonb` references in source code files to verify its integration.
rg 'jsonb' --type rust

Length of output: 2780

src/common/function/src/scalars/json.rs (2)

15-17: Good use of modular architecture.

The separation of the get_by_path function into its own module enhances maintainability and readability.


24-27: Proper registration of JSON functions.

The implementation of JsonFunction and its method to register the GetByPathFunction is correctly done using Arc for thread safety.

src/datatypes/src/types.rs (2)

24-24: Module inclusion well integrated.

The inclusion of mod json_type; is consistent with the structure of the file and allows for proper modularization.


46-46: Export of JsonType is appropriate.

The public export of JsonType is crucial for its use across the application, facilitating the integration of the new JSON data type.

src/datatypes/src/types/json_type.rs (3)

27-33: Well-defined structure for JsonType.

The JsonType struct and its method to create a shared instance are well implemented, using Rust's memory safety features effectively.


36-51: Comprehensive implementation of data type interface.

The methods defining the JSON data type's behavior, such as name, logical_type_id, default_value, and as_arrow_type, are correctly implemented, ensuring that JSON data is handled as binary within the system.


53-62: Appropriate vector creation and casting logic.

The method for creating a mutable vector specific to binary data and the casting logic are correctly implemented, aligning with the system's requirements for handling JSON as binary.

tests/cases/standalone/common/insert/merge_mode.result (1)

Line range hint 95-107: Approved: Error message format change.

The change to use a placeholder in error messages enhances readability and user-friendliness. Ensure that this new format is consistently applied across all error messages in the application to maintain uniformity.

The change is approved.

Run the following script to verify the consistency of error message formats across the application:

src/common/function/src/function_registry.rs (1)

98-100: Approved: Registration of JsonFunction.

The addition of JsonFunction to the function registry is correctly implemented. Verify that it integrates seamlessly with existing functionalities and does not conflict with other registered functions.

The registration is approved.

Run the following script to verify the integration of JsonFunction in the system:

src/datatypes/src/types/string_type.rs (1)

84-84: Approved: New case in try_cast method.

The addition of a case to convert Value::Json to Value::String is correctly implemented. Verify that the conversion accurately preserves the data and does not introduce any errors.

The addition is approved.

Run the following script to verify the accuracy of the conversion:

Verification successful

Conversion from Value::Json to Value::String is verified.

The use of jsonb::to_string for converting Value::Json to Value::String is consistent with its application in other parts of the codebase, and no issues were found in the conversion process. The function appears to be reliable and accurate in preserving data.

  • Location: src/datatypes/src/types/string_type.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the accuracy of the conversion from `Value::Json` to `Value::String`.

# Test: Check for data preservation and errors in the conversion. Expect: Accurate data preservation and no errors.
rg --type rust -A 5 $'jsonb::to_string'

Length of output: 2657

src/datatypes/src/type_id.rs (2)

72-72: Addition of Json to LogicalTypeId is appropriate.

The new Json variant in the LogicalTypeId enum is a logical extension to accommodate the handling of JSON data types. This aligns with the PR's objective to enhance JSON data handling capabilities.


131-131: Proper handling of Json in data_type method.

The addition of LogicalTypeId::Json mapping to ConcreteDataType::json_datatype() in the data_type method is correctly implemented. Ensure that the json_datatype() method itself is correctly defined and behaves as expected.

Run the following script to verify the implementation of json_datatype:

Cargo.toml (1)

127-127: Addition of jsonb dependency is appropriate.

The new dependency jsonb version 0.4 is correctly added to enhance JSONB data handling capabilities. Ensure compatibility with existing dependencies and test for any potential conflicts.

Run the following script to check for dependency conflicts:

Verification successful

Addition of jsonb dependency is appropriate and well-integrated.

The jsonb dependency is actively used across various parts of the codebase, including SQL handling, JSON path operations, and data type conversions. There are no apparent conflicts or issues with its integration.

  • Usage in src/sql/src/statements.rs for parsing JSON values.
  • Usage in src/common/function/src/scalars/json/get_by_path.rs for JSON path operations.
  • Usage in src/servers/src/postgres/types.rs and src/servers/src/mysql/writer.rs for encoding JSON data.
  • Usage in src/datatypes/src/value.rs and src/datatypes/src/types/string_type.rs for handling JSON data types.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for dependency conflicts with `jsonb`.

# Test: Search for potential conflicts. Expect: No conflicts.
cargo check

Length of output: 86


Script:

#!/bin/bash
# Search for any usages of `jsonb` in the codebase to verify its integration.
rg 'jsonb' -A 5

Length of output: 10602

src/servers/src/postgres/handler.rs (1)

147-148: Enhanced data handling in encoding process.

The modifications in the loop to include both values and column schemas, and the updated encode_value function call to pass the data type, are crucial for handling different data types, especially the new JSON type. This change likely improves the accuracy and efficiency of data encoding.

The changes are approved as they enhance the functionality and maintainability of the encoding process.

src/common/grpc/src/select.rs (1)

73-73: Proactive handling of unsupported data types in gRPC.

The addition of the Json data type to the unreachable condition in the convert_arrow_array_to_grpc_vals! macro is a good practice. It prevents potential runtime errors by ensuring that unsupported data types are not processed in gRPC calls.

The change is approved as it enhances the robustness and error handling of the system.

src/datatypes/src/vectors/eq.rs (1)

83-83: Improved equality checks for JSON and Binary data types.

The modification to treat Json and Binary data types similarly in the equal function enhances the functionality by ensuring correct comparisons. This change is crucial for operations that rely on accurate data comparisons.

The change is approved as it improves the overall versatility and correctness of vector equality checks.

src/query/src/dist_plan/merge_scan.rs (1)

160-172: Refactor of arrow_schema handling in MergeScanExec::new simplifies the code.

The changes streamline the handling of arrow_schema by using it directly instead of stripping metadata, which should reduce complexity and potential bugs. However, ensure that all functionalities relying on arrow_schema still operate correctly with these changes, especially in terms of metadata handling.

The refactor is approved but verify the complete functionality:

src/datatypes/src/schema.rs (1)

259-267: Enhancement in collect_fields to handle JSON data types.

The addition of metadata to JSON fields is a significant improvement for distinguishing JSON data from binary data. This change aligns with the project's requirements to treat JSON distinctly. Ensure that this metadata addition is consistent and does not interfere with other functionalities.

The changes are approved but verify the consistency across the codebase:

Verification successful

JSON Metadata Handling is Consistent Across the Codebase

The addition of the "is_json" metadata is consistently used to determine the JSON data type in column_schema.rs, aligning with the project's requirements to treat JSON data distinctly. No interference with other functionalities has been found.

  • src/datatypes/src/schema/column_schema.rs: Uses "is_json" to set the data type to JSON.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that JSON metadata is consistently handled across the codebase.

# Test: Search for usages of JSON fields to ensure consistent metadata handling.
rg --type rust "is_json"

Length of output: 331


Script:

#!/bin/bash
# Investigate the usage of "is_json" in column_schema.rs to understand its role.
rg --type rust -A 5 'is_json' src/datatypes/src/schema/column_schema.rs

Length of output: 356

src/servers/src/mysql/writer.rs (2)

293-293: Addition of MYSQL_TYPE_JSON to handle JSON data types in create_mysql_column.

This change is crucial for correctly mapping JSON data types within the MySQL column creation logic, ensuring that JSON data is processed correctly.

The addition is approved and aligns with the project's requirements for handling JSON data.


Line range hint 171-220: Enhancements in MysqlResultWriter::write_recordbatch to handle JSON data types.

The handling of Value::Json by serializing it to a JSON string before writing is a crucial improvement for correct JSON data processing in the MySQL context. This change ensures that JSON data is handled distinctly and correctly, which is essential for data integrity and usability.

The changes are approved but verify the complete functionality:

src/mito2/src/row_converter.rs (4)

149-150: Approve the changes but recommend verifying serialization logic.

The integration of JSON data type in the serialize function is crucial for correct data serialization. However, it's recommended to verify that the serialization logic for JSON data is consistent and does not introduce any issues.

The code changes are approved.

Run the following script to verify the serialization logic:


245-245: Approve the changes but recommend verifying skipping deserialization logic.

The integration of JSON data type in the skip_deserialize function is crucial for correct handling when skipping deserialization. However, it's recommended to verify that the skipping deserialization logic for JSON data is consistent and does not introduce any issues.

The code changes are approved.

Run the following script to verify the skipping deserialization logic:


71-71: Approve the changes but recommend verifying size estimation logic.

The integration of JSON data type in the estimated_size function is crucial for accurate size estimations. However, it's recommended to verify that the size estimation logic for JSON data is accurate and consistent with the Binary data type.

The code changes are approved.

Run the following script to verify the size estimation logic:


196-199: Approve the changes but recommend verifying deserialization logic.

The integration of JSON data type in the deserialize function is crucial for correct data deserialization. However, it's recommended to verify that the deserialization logic for JSON data is consistent and does not introduce any issues.

The code changes are approved.

Run the following script to verify the deserialization logic:

src/datatypes/src/data_type.rs (4)

86-86: Approve the addition of JSON data type but recommend verifying integration.

The addition of the Json(JsonType) variant to the ConcreteDataType enum is a significant update for supporting JSON data types. However, it's recommended to verify that this new data type integrates seamlessly with the existing data types and does not introduce any inconsistencies.

The code changes are approved.

Run the following script to verify the integration of the JSON data type:


224-226: Approve the addition of is_json method but recommend verifying implementation.

The addition of the is_json method is crucial for easily identifying JSON data types. However, it's recommended to verify that the method is implemented correctly and does not introduce any issues.

The code changes are approved.

Run the following script to verify the implementation of the is_json method:

Verification successful

The is_json method is correctly implemented and used.

The is_json method is correctly implemented using the matches! macro to check for the Json variant of ConcreteDataType. Its usage in the codebase, particularly in setting and checking metadata related to JSON data types, confirms its correct integration and functionality.

  • Location of Implementation: src/datatypes/src/data_type.rs
  • Usage Locations:
    • src/datatypes/src/schema.rs
    • src/datatypes/src/schema/column_schema.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of the `is_json` method.

# Test: Search for the function usage. Expect: Correct implementation and consistent behavior.
rg --type rust -A 5 $'is_json'

Length of output: 1385


134-134: Approve the update to fmt::Display but recommend verifying formatting.

The update to the fmt::Display implementation for ConcreteDataType to include Json is crucial for correctly formatting JSON data types as strings. However, it's recommended to verify that the formatting implementation is correct and does not introduce any issues.

The code changes are approved.

Run the following script to verify the formatting implementation:


416-416: Approve the update to the macro but recommend verifying its functionality.

The update to the impl_new_concrete_type_functions! macro to include Json is a necessary change for integrating the new JSON data type. However, it's recommended to verify that the macro update does not introduce any issues and that the functions it generates are correct.

The code changes are approved.

Run the following script to verify the functionality of the updated macro:

src/servers/src/postgres/types.rs (2)

163-163: Correctly map ConcreteDataType::Json to PostgreSQL type.

The update to type_gt_to_pg function to handle ConcreteDataType::Json(_) by returning Type::JSON is appropriate and aligns with the PostgreSQL standards. This change ensures that JSON data types are recognized and handled correctly within the PostgreSQL framework.

The mapping of JSON data type to PostgreSQL type is correctly implemented.


547-549: Update parameter handling for JSON in parameters_to_scalar_values.

The modification to handle ConcreteDataType::Json(_) in the parameters_to_scalar_values function is crucial for ensuring that JSON data is treated correctly as binary data within the system. This change aligns with the overall strategy of treating JSON data as binary for internal processing while allowing for specific handling when necessary.

The handling of JSON data in parameter conversion is correctly implemented.

src/sql/src/statements.rs (3)

322-322: Ensure proper handling of unary operations on JSON values in sql_value_to_value.

The code correctly identifies that unary operations on JSON values should not be allowed, which is enforced by returning an error. This is a good practice as it prevents unintended operations on JSON data, which could lead to data corruption or unexpected behavior.

The implementation for handling unary operations on JSON values is correctly enforced.


583-583: Review the handling of SqlDataType::JSON in sql_data_type_to_concrete_data_type.

The function now correctly maps SqlDataType::JSON to ConcreteDataType::json_datatype(). This is a crucial update for supporting JSON data types in the system. Ensure that all parts of the system that interact with SQL data types are aware of this new mapping to prevent type mismatches.

The mapping of SqlDataType::JSON to ConcreteDataType::json_datatype() is correctly implemented.


620-620: Ensure consistency in data type conversions for JSON in concrete_data_type_to_sql_data_type.

The function now correctly maps ConcreteDataType::Json(_) to SqlDataType::JSON. This change is essential for maintaining consistency in the type system across different layers of the database. It's important to ensure that all conversions between concrete and SQL data types are covered and tested thoroughly to prevent any type mismatches during runtime.

The mapping of ConcreteDataType::Json(_) to SqlDataType::JSON is correctly implemented.

src/api/src/helper.rs (2)

239-239: Verify JSON to Binary Data Type Mapping

The mapping of ConcreteDataType::Json(_) to ColumnDataType::Binary is crucial for the system's handling of JSON data. Ensure that this mapping does not lead to unexpected behavior in other parts of the system.


938-940: Approve JSON Handling in to_proto_value

The conversion of Value::Json(v) to a binary format is aligned with the PR's objectives. However, verify that this conversion does not affect data integrity or lead to misinterpretations.

src/datatypes/src/value.rs (9)

81-81: Addition of JSON data type to Value enum.

The addition of the Json(Bytes) variant to the Value enum allows the system to handle JSON data types. This is a crucial update for supporting JSON functionalities in the database.

The code changes are approved.


161-161: Updated define_data_type_func! macro for Json.

Including the Json variant in the define_data_type_func! macro ensures that the data type function correctly identifies JSON data types. This is a necessary update for the system to recognize and process JSON data appropriately.

The code changes are approved.


212-212: Addition of Json variant to ValueRef.

The inclusion of the Json(&'a [u8]) variant in the ValueRef enum allows for referencing JSON data efficiently. This change is aligned with the addition of the Json type in the Value enum and is essential for operations that require data referencing.

The code changes are approved.


320-320: Logical type ID handling for Json.

The update to include Json in the logical_type_id method ensures that the system can correctly identify the logical type of JSON data. This is crucial for the correct processing and handling of JSON data within the system.

The code changes are approved.


476-476: Updated to_null_scalar_value function for Json.

The function now correctly handles Json data types by returning ScalarValue::Binary(None) when needed. This update is crucial for ensuring that null handling for JSON types is consistent with other binary data types.

The code changes are approved.


1025-1025: Addition of Json variant to ValueRef enum.

The inclusion of Json(&'a [u8]) in the ValueRef enum is consistent with the handling of other binary data types and is necessary for referencing JSON data efficiently.

The code changes are approved.


1057-1057: Handling of Json in as_binary method for ValueRef.

The method now correctly allows Json data to be accessed as a binary slice, which is crucial for operations that need to process or inspect the raw JSON data.

The code changes are approved.


768-768: Conversion from Value to serde_json::Value for Json.

The implementation uses jsonb::from_slice to convert JSON bytes into a serde_json::Value. This is an essential feature for integrating JSON data with systems that utilize serde_json. Ensure that error handling is robust in this conversion process.


124-124: Display implementation for Json variant.

The implementation uses jsonb::to_string to convert the JSON bytes into a string format. This is appropriate for ensuring that JSON data is readable when printed or logged. However, ensure that jsonb::to_string handles errors gracefully or consider handling potential errors in this context.

src/common/function/src/scalars/json/get_by_path.rs Outdated Show resolved Hide resolved
src/datatypes/src/schema/column_schema.rs Outdated Show resolved Hide resolved
src/sql/src/statements.rs Outdated Show resolved Hide resolved
src/api/src/helper.rs Show resolved Hide resolved
src/api/src/helper.rs Show resolved Hide resolved
src/api/src/helper.rs Outdated Show resolved Hide resolved
@killme2008
Copy link
Contributor

A great work!
I don't like the name get_by_path, it's too long. What about jq? just like the command line tool https://jqlang.github.io/jq/

And i found that databend is using this name too.

@discord9
Copy link
Contributor

discord9 commented Sep 5, 2024

Rest LGTM

Copy link
Member

@WenyXu WenyXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpicking

src/datatypes/src/vectors/binary.rs Outdated Show resolved Hide resolved
src/servers/src/grpc/flight/stream.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added docs-required This change requires docs update. docs-not-required This change does not impact docs. and removed docs-not-required This change does not impact docs. docs-required This change requires docs update. labels Sep 5, 2024
@WenyXu
Copy link
Member

WenyXu commented Sep 9, 2024

Hi @CookiePieWw, there are some conflicts

Cargo.toml Outdated Show resolved Hide resolved
Cargo.lock Outdated Show resolved Hide resolved
@WenyXu WenyXu enabled auto-merge September 9, 2024 08:17
@WenyXu WenyXu added this pull request to the merge queue Sep 9, 2024
Merged via the queue into GreptimeTeam:main with commit 04e7dd6 Sep 9, 2024
32 checks passed
@paomian
Copy link
Contributor

paomian commented Sep 10, 2024

Nice job!

CookiePieWw added a commit to CookiePieWw/greptimedb that referenced this pull request Sep 17, 2024
* feat: add json type and vector

* fix: allow to create and insert json data

* feat: udf to query json as string

* refactor: remove JsonbValue and JsonVector

* feat: show json value as strings

* chore: make ci happy

* test: adunit test and sqlness test

* refactor: use binary as grpc value of json

* fix: use non-preserve-order jsonb

* test: revert changed test

* refactor: change udf get_by_path to jq

* chore: make ci happy

* fix: distinguish binary and json in proto

* chore: delete udf for future pr

* refactor: remove Value(Json)

* chore: follow review comments

* test: some tests and checks

* test: fix unit tests

* chore: follow review comments

* chore: corresponding changes to proto

* fix: change grpc and pgsql server behavior alongside with sqlness/crud tests

* chore: follow review comments

* feat: udf of conversions between json and strings, used for grpc server

* refactor: rename to_string to json_to_string

* test: add more sqlness test for json

* chore: thanks for review :)

* Apply suggestions from code review

---------

Co-authored-by: Weny Xu <[email protected]>
@CookiePieWw CookiePieWw deleted the json branch September 26, 2024 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants