Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB specific length support in JSONs/Varchars/binary data #3107

Merged
merged 15 commits into from
Jun 16, 2023

Conversation

rPraml
Copy link
Contributor

@rPraml rPraml commented Jun 14, 2023

Rob Notes:

For myself, I understand this as an issue with DDL generation for varchar(x) where x exceeds a db platform maximum. We desire varchar(x) to actually work across multiple database platforms and it currently does not because it does not take into account the platform specific maximum limits on varchar.

This issue also happens to impact @dbjson(length=x) for any db that uses varchar to store json (e.g. SqlServer) but note that this is really a general issue with cross platform DDL generation for varchar(x) [ and similarly varbinary and DbJson ].

Robs Summary of the Fix:

This is effectively a DDL Generation change only

Types have a maximum length and fallback type. When generating DDL, based on the length the fallback type is used. Best example is MySql with varchar, text, mediumtext, longtext types that are chained via fallback type.


DDL changes - Important Notes

This changes the DDL generated:
E.g. you have @Size(length = 16000) String myText in your code.

  • This has failed on SqlServer/Hana/Oracle, and will work now
  • For db2/mysql/mariadb/NuoDb the max varchar limit is set to 4000 now and ebean uses clob instead of varchar(16000) although the platform will support such large varchars (as long as you do not hit the row-size limit)
  • If you really need a varchar(16000) here you have to explicitly override that with @column(columnDefinition = "mariadb,mysql;varchar(16000);")
  • The varbinary limit for db2/mysql/mariadb is conservatively set to 8000 bytes and for NuoDb to 4000 bytes
  • The limit of Oracle/Postgres/SqlServer/SqlAnywhere platform is set to the max supported limit mentioned in documentation
  • Other platforms (Sqlite etc) are not changed
  • SqlServer platform uses varbinary(max) instead the deprecated image type now
  • Fixes for Clickhouse platform, so the DbMigration test will at least start

Problem
For platforms, that have no json database, ebean uses the clob type for @DbJson without length.
If length is specified (@DbJson(length=123)), varchar is used.

The following table will show the resulting datatype for @DbJson(...) List<String> someProp

platform DbJson DbJson(length=200) DbJson(length=4096) DbJson(len=32768) Remarks
Clickhouse String String String String All OK
Cockroach json json json json All OK
DB2 clob varchar(200) varchar(4096)⚠ varchar(32768)❌ Fails (1)
H2 clob varchar(200) varchar(4096) varchar(32768) All OK
HANA clob varchar(200) varchar(4096) varchar(32768)❌ Fails (2)
HsqlDb clob varchar(200) varchar(4096) varchar(32768) Not verified
MySql json json json json All OK
NuoDb clob varchar(200) varchar(4096) varchar(32768) Not verified
Oracle clob varchar2(200) varchar2(4096) varchar2(32768)❌ Fails (3)
Postgres json json json json All OK (4)
SqlAnywhere long varchar varchar(200) varchar(4096) varchar(32768) Not verified
SqlServer nvarchar(max) nvarchar(200) nvarchar(4096)❌ nvarchar(32768)❌ Fails (5)
  1. DB2 max varchar length is 32739. Depending on the chosen page size (4,8,16,32K) and amount of columns, the limit may be even smaller
  2. According to documentation, HANA supports up to 5000 chars
  3. According to documentation, max varchar length is 5000
  4. Postgres supports up to varchar(10M)
  5. SqlServer supports up to 4000 chars in nvarchar (and 8000 in non-utf8-aware varchar)

The same issue happens also for varchars and also for varbinary (e.g. @Size(...) String someProp).
So, we take a look at the platforms that have json in the table above. (for platform without json support, varchar handling is similar to the table above)

platform no @Size @Size(max=200) @Size(max=4096) @Size(max=32768) Remarks
Cockroach varchar(255) varchar(200) varchar(4096) varchar(32768) All OK (1)
MySql/MariaDb varchar(255) varchar(200) varchar(4096) ⚠ varchar(32768) ❌ Fails (2)
Postgres varchar(255) varchar(200) varchar(4096) varchar(32768) All OK (3)
  1. Cockroach supports up to varchar(8M)
  2. in MySql/MariaDb a row can take up to 64K bytes. One char in a varchar takes up 3 bytes (utf8) or 4 bytes (utf8_mb4), so limit is varchar(21844) respectively 16383.
  3. Postgres supports up to varchar(10M)

Fix

When a varchar/varbinary definition is found which is not supported at all by the platform (e.g. sqlserver nvarchar(4096))
or is a bad choice (e.g. mysql/db2 varchar(4096)) because you 'waste' the row space, use the clob (respectively blob) type of that platform.

IMPORTANT NOTE: This might break/change things:
E.g. you have @Size(length = 16000) String myText in your code.

  • This has failed on SqlServer/Hana/Oracle, and will work now
  • For db2/mysql/mariadb/NuoDb the max varchar limit is set to 4000 now and ebean uses clob instead of varchar(16000) although the platform will support such large varchars (as long as you do not hit the row-size limit)
    If you really need a varchar(16000) here you have to explicitly override that with @Column(columnDefinition = "mariadb,mysql;varchar(16000);")
  • The varbinary limit for db2/mysql/mariadb is conservatively set to 8000 bytes and for NuoDb to 4000 bytes
  • The limit of Oracle/Postgres/SqlServer/SqlAnywhere platform is set to the max supported limit mentioned in documentation
  • Other platforms (Sqlite etc) are not changed

Other changes

  • SqlServer platform uses varbinary(max) instead the deprecated image type now
  • Fixes for Clickhouse platform, so the DbMigration test will at least start

Copy link
Contributor Author

@rPraml rPraml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rob-bygrave I tried to put all changes in separate commits. What do you think about these changes?

(This is related to #3103 and required to implement #3102 - so it would be great, if I can get feedback here soon, Thanks)

// see if there is a precision/scale to add (or not)
int len = deployLength != 0 ? deployLength : defaultLength;
if (len == Integer.MAX_VALUE) {
sb.append("(max)"); // TODO: this is sqlserver specific
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the sqlserver workaround now

@@ -15,6 +15,7 @@ class ClickHouseDbArray {
mapping.put("integer[]", "Array(UInt32)");
mapping.put("bigint[]", "Array(UInt64)");
mapping.put("float[]", "Array(Float32)");
mapping.put("float4[]", "Array(Float32)");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some issues for Clickhouse. They are in commit 81443b7

// this is a platform specific definition. So do not apply further conversions
return tmp[index];
}
type = tmp[index];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This detects, if there is a platform specific definition, e.g. @Column(columnDefinition = "mariadb,mysql;varchar(16000);") This definition is taken "as is" and not converted

@@ -45,24 +45,6 @@ create table migtest_fk_set_null (
one_id UInt64
) ENGINE = Log();

create table drop_main (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge #3108 first

@rob-bygrave
Copy link
Contributor

I don't know, I don't understand the problem.

By default (if no length specified), ebean uses the clob type to store the JSON in the database.

This isn't really correct. Most databases have a JSON database type and by default that is the type that is used. So I don't know why the first statement is: By default (if no length specified), ebean uses the clob type ?

So you loose me right at the start. You are going to have to restate the issue as you understand it.

@rPraml
Copy link
Contributor Author

rPraml commented Jun 16, 2023

I don't know, I don't understand the problem.

:)

@rob-bygrave I've updated the initial documentation, so that it is more clear, what this PR changes.

@rbygrave
Copy link
Member

Sorry, still not clear what the actual problem is. Can you state what the actual problem is?

@rPraml
Copy link
Contributor Author

rPraml commented Jun 16, 2023

Hello @rbygrave,

currently a @DbJson(length = 5000) annotation will work on MariaDb, as it supports json and on H2, because there is no varchar limit. It will not work on SqlServer, because varchars cannot be longer than 4000 chars.

Yes I can work around this by a custom @Column(columnDefinition = "sqlserver;varchar(max)")
As we want to develop platform independent (and I think this is a goal from Ebean) I do not want to care about platform specific things.

(Rob, I'm sure everything is clear up to here)

So I need a working @DbJson(length = xxx) support for all platforms and all (positive) int-values (currently values <= 4000 it will work on all platforms).

Why I need a working DbJson annotation: (I think I'm losing you here)

I plan to restrict the JSON that is written to the DB.

So if you have @DbJson(length = 8192), you should get an exception, if the serialized JSON is bigger than 8192 bytes.
You'll get this on some platforms (e.g. H2 uses varchar(8192) as storage) but not on others (Mysql uses json and SqlServer nvarchar(max))
For these platforms, the json is not limited and due a programming error you can write up to 1/2GB in that lob.

This happens on our production application. We do not have any size limits on the json objects.
Just a @DbJson List<String> myProperty
Due a misconfiguration some code wrote very long strings in that list, which can not be deserialized anymore, because of a limit in jackson 2.15. See #3103 for more background info.

It would be great, if the misconfiguration would be noticed by a failing save (because the JSON field is too largen) instead of a failing find (and you have invalid data in the database)

Why I did this also for Varchar: Because I find it useful (If that's the issue that bothers you, tell me)
(I know, I can work around this with @Lob)

What are my next steps

  • I need a working @DbJson(length...) support (this PR)- so the generated DDL should work
  • I'll implement a validation in ebean, where I check the Json length BEFORE it stored in the DB (separate PR, a draft is here WIP: Bind validation #3102, but I will rewrite this)
  • I'll restrict the json-lengths in our application, so I detect programming errors and misconfigurations earlier.

If still is something unclear, please ask

Roland

@rbygrave
Copy link
Member

rbygrave commented Jun 16, 2023

Problem

@DbJson(length = 5000)currently does not work in SQLServer.

The reason is because SQLServer has a maximum length of 4000 for varchar and nvarchar columns.

Proposed Fix

For SQLServer DDL generation, instead of generating nvarchar(5000) generate nvarchar(max).

Is that the correct interpretation?

Edit: Ok, I think I understand it. For myself I think I understand as a varchar(x) issue where x exceeds a db platform maximum. We desire varchar(x) to actually work across multiple database platforms and it currently does not because it does not take into account the platform specific maximum limits on varcher.

This issue also happens to impact @DbJson(length=x) for any db that uses varchar to store json (e.g. SqlServer) but note that this is really a general issue with cross platform DDL generation for varchar(x).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants