[12.x] prefer "datetime" types over "timestamp" types #54256

Open
wants to merge 1 commit into
base: master

browner12
Contributor

@browner12 browner12 commented Jan 18, 2025

I know this has been brought up before, but I'm going to make the case again for why datetimes are the superior data type compared to timestamps, and why Laravel should make them its recommended and default types for v12 and beyond.

Premises

  • All dates should be stored as UTC values (although we will not hinder people who choose to do otherwise)
  • Any specifics given in this PR are for MySQL

What this PR does NOT do

  • Force Laravel to automatically convert values to a given timezone. The Laravel team has made its opinion on automatic conversion known. It is on the user to submit values to the database in their desired timezone.

Parity between datetime and timestamp

Storage requirements

timestamp requires 4 bytes for storage. datetime requires 5 bytes for storage. Both use up to 3 additional bytes for fractional-second precision.

One of the proposals for solving the 2038 problem for timestamp is to widen it to a 64-bit integer, which would increase its storage requirement to 8 bytes.

https://dev.mysql.com/doc/refman/8.4/en/storage-requirements.html#data-types-storage-reqs-date-time
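As a rough illustration of the byte counts above, here is a minimal sketch in plain PHP. The function names `fractionalBytes()` and `columnBytes()` are made up for this example, and the mapping from precision to extra bytes (0, 1-2, 3-4, 5-6 digits taking 0, 1, 2, 3 bytes) is taken from the MySQL storage requirements page linked above.

```php
<?php

declare(strict_types=1);

/**
 * Extra bytes used for fractional seconds at a given precision (0-6).
 * Hypothetical helper, following the table in the MySQL manual.
 */
function fractionalBytes(int $precision): int
{
    if ($precision < 0 || $precision > 6) {
        throw new InvalidArgumentException('Precision must be between 0 and 6.');
    }

    // 0 => 0 bytes, 1-2 => 1 byte, 3-4 => 2 bytes, 5-6 => 3 bytes
    return intdiv($precision + 1, 2);
}

/**
 * Total bytes per value for a "timestamp" or "datetime" column.
 */
function columnBytes(string $type, int $precision = 0): int
{
    $base = ['timestamp' => 4, 'datetime' => 5];

    if (!isset($base[$type])) {
        throw new InvalidArgumentException("Unsupported type: {$type}");
    }

    return $base[$type] + fractionalBytes($precision);
}

echo columnBytes('timestamp'), "\n";    // 4
echo columnBytes('datetime'), "\n";     // 5
echo columnBytes('datetime', 6), "\n";  // 8
```

At the default precision of 0 the difference is the single extra byte mentioned above.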

Performance

There has been some confusion in other related PRs, Issues, and Discussions, with claims that datetime performs worse than timestamp because it stores the date as a string, and string comparison is slower than integer comparison.

datetime is actually stored internally in a fixed-length binary format, which allows comparisons to be just as efficient as integer comparison.
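To make the "fixed-length binary" point concrete, here is a sketch in plain PHP that packs a date into a 40-bit integer. The bit layout (1 sign bit, 17 bits for year * 13 + month, 5 bits day, 5 bits hour, 6 bits minute, 6 bits second) is my understanding of the format MySQL has used since 5.6.4 and should be treated as an assumption; `packDatetime()` is a hypothetical helper, not anything MySQL or Laravel exposes.

```php
<?php

declare(strict_types=1);

/**
 * Pack a 'Y-m-d H:i:s' string into a 40-bit integer (5 bytes).
 * Illustrative only; the layout is an assumption, see the lead-in.
 */
function packDatetime(string $value): int
{
    $dt = DateTimeImmutable::createFromFormat('!Y-m-d H:i:s', $value, new DateTimeZone('UTC'));

    if ($dt === false) {
        throw new InvalidArgumentException("Invalid datetime: {$value}");
    }

    $yearMonth = (int) $dt->format('Y') * 13 + (int) $dt->format('n');

    return (1 << 39)                          // sign bit, set for non-negative values
        | ($yearMonth << 22)                  // 17 bits: year * 13 + month
        | ((int) $dt->format('j') << 17)      // 5 bits: day
        | ((int) $dt->format('G') << 12)      // 5 bits: hour
        | ((int) $dt->format('i') << 6)       // 6 bits: minute
        | (int) $dt->format('s');             // 6 bits: second
}

// Chronological order is preserved by plain integer comparison.
var_dump(packDatetime('2024-12-31 23:59:59') < packDatetime('2025-01-01 00:00:00')); // bool(true)
```

Because the most significant fields come first, comparing two packed values is a single integer comparison, with no string parsing involved.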

For testing, I created a table with the following migration:

Schema::create('tests', function (Blueprint $table) {
    $table->id();
    $table->timestamp('timestamp');
    $table->dateTime('datetime');
    $table->timestamps();
});

I filled the table with 100,000 rows with a random date stored in both the "timestamp" and "datetime" fields. I ran the following queries and had results consistently within 1ms of each other.

SELECT * FROM `tests` WHERE `timestamp` < '2025-01-01';
SELECT * FROM `tests` WHERE `datetime` < '2025-01-01';

Allow using "CURRENT_TIMESTAMP"

Both data types allow using "CURRENT_TIMESTAMP" as both an initial value and an "on update" value.

datetime benefits

Solves the 2038 issue

timestamp fields store their value internally as a signed 32-bit integer, which means any dates after 2038-01-19 03:14:07 UTC are not valid for timestamps. This is not as big of an issue right now, since most stored dates are in the past, but it could become a huge problem when we reach that date. It also affects current use whenever you store a future date, like an expiration.

datetime fields have a minimum value of 1000-01-01 and a maximum value of 9999-12-31, giving us a much wider valid date range and eliminating the 2038 problem.
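The cutoff can be checked with a few lines of plain PHP. This is only a sketch: it assumes a 64-bit PHP build, and `fitsInTimestamp()` is a hypothetical helper written for this example.

```php
<?php

declare(strict_types=1);

$utc = new DateTimeZone('UTC');

// Largest value a signed 32-bit integer can hold.
$maxInt32 = 2 ** 31 - 1; // 2147483647

$lastTimestamp = (new DateTimeImmutable('@' . $maxInt32))->setTimezone($utc);
echo $lastTimestamp->format('Y-m-d H:i:s'), "\n"; // 2038-01-19 03:14:07

/**
 * Whether a date fits in a MySQL "timestamp" column.
 * The lower bound of 1 reflects the documented minimum of 1970-01-01 00:00:01 UTC.
 */
function fitsInTimestamp(DateTimeInterface $date): bool
{
    $seconds = $date->getTimestamp();

    return $seconds >= 1 && $seconds <= 2 ** 31 - 1;
}

// An expiration date a little over a decade out is already out of range.
var_dump(fitsInTimestamp(new DateTimeImmutable('2030-01-01 00:00:00', $utc))); // bool(true)
var_dump(fitsInTimestamp(new DateTimeImmutable('2040-01-01 00:00:00', $utc))); // bool(false)
```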

Ignorant of Server/SQL timezone

Lastly, and perhaps most importantly, datetime is completely ignorant of the timezone set on either the server or SQL, while timestamp is not.

When a date is written to a timestamp field, SQL first converts it to UTC for internal storage. How it does so depends on a couple of factors. SQL could have its own explicitly set timezone. More likely, it will be set to "SYSTEM", which means it defers to the timezone set on the OS. Either way, issues arise when SQL deems its timezone to be something other than UTC. Let's say, for example, SQL's timezone is set to CST (UTC-6). When it receives a value for a timestamp field, it will interpret that value as a CST value and convert it to UTC for internal storage, and then convert it back to CST when the value is retrieved. Now, whether you actually intended to give it a CST value is irrelevant, because all you really care about is that the value you gave it is EXACTLY what you got back.

As long as that SQL timezone value stays the same, you're actually kind of ok, even if things don't technically match up. However, things can go very poorly if the SQL timezone changes.

Imagine again we have our server with the timezone set to CST. We insert a row with a CST value, and SQL converts the timestamp field to UTC internally. Now someone comes along and sees that the server is set to CST, but should probably be UTC because that's pretty standard for servers. Unfortunately that simple change would mess up all of our data. Now when that row is retrieved from the database, SQL sees the server is in UTC, so it just gives the internal value it stored back to us, even though that's not correct and should have been converted.

This means the value we put into the database is NOT the value we got out! Some might argue that's intentional, but I would say for the large majority of people any timezone other than UTC on the server is pure happenstance or oversight, and not actually what they intended.

If we switch to datetime fields, SQL ignores any server or SQL timezone settings and simply stores the value you give it, and returns exactly the same value when you request it. By making ourselves ignorant of any server settings, we actually protect ourselves from any unintentional errors like mentioned above.

For some real numbers, assume we started with a server in CST. The table below shows how timestamp and datetime differ.

| Data Type | Submitted Value | Internal Value | Returned Value with Server CST | Returned Value with Server UTC |
|---|---|---|---|---|
| timestamp | 2020-02-12 12:00:00 | 2020-02-12 18:00:00 | 2020-02-12 12:00:00 | 2020-02-12 18:00:00 |
| datetime | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 |
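The timezone behaviour in the table can be simulated in plain PHP without a database. This is a sketch, not MySQL itself: `timestampRoundTrip()` and `datetimeRoundTrip()` are hypothetical helpers, and `America/Chicago` is assumed as a stand-in for "CST".

```php
<?php

declare(strict_types=1);

/**
 * Simulates a "timestamp" column: the value is read in the session timezone,
 * stored as UTC, and converted to the (possibly different) session timezone on read.
 */
function timestampRoundTrip(string $value, string $writeZone, string $readZone): string
{
    $stored = (new DateTimeImmutable($value, new DateTimeZone($writeZone)))
        ->setTimezone(new DateTimeZone('UTC'));

    return $stored->setTimezone(new DateTimeZone($readZone))->format('Y-m-d H:i:s');
}

/**
 * Simulates a "datetime" column: no timezone is consulted, so the value comes back as written.
 */
function datetimeRoundTrip(string $value): string
{
    return (new DateTimeImmutable($value, new DateTimeZone('UTC')))->format('Y-m-d H:i:s');
}

$submitted = '2020-02-12 12:00:00';

echo timestampRoundTrip($submitted, 'America/Chicago', 'America/Chicago'), "\n"; // 2020-02-12 12:00:00
echo timestampRoundTrip($submitted, 'America/Chicago', 'UTC'), "\n";             // 2020-02-12 18:00:00
echo datetimeRoundTrip($submitted), "\n";                                        // 2020-02-12 12:00:00
```

Only the timestamp simulation changes its answer when the read timezone differs from the write timezone, which is the failure mode described above.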

Questionable Changes

One thing I did not change was the softDeletes() method. I think ideally it would change to using datetimes internally, and then a new softDeletesTimestamp() method would be created for that specific use. However, I'm not sure how that would affect existing usage of softDeletes() that were executed when it used timestamps.

"datetimes" are the better default choice for date related columns, and should be the recommended way from Laravel going forward

- address 2038 issue
- only 1 extra byte
- internal binary storage for equal performance
- ignorant of server/SQL timezone
@browner12 browner12 marked this pull request as draft January 18, 2025 23:45
@browner12 browner12 marked this pull request as ready for review January 19, 2025 00:35
@Rizky92

Rizky92 commented Jan 19, 2025

Here's my two cents. While timestamp has the 2038 problem, one alternative is to store it as an UNSIGNED BIGINT. PHP already supports 64-bit timestamps. Internally, Laravel already uses UNSIGNED INT for some of its tables, which can be changed to BIGINT without a breaking change.


I'm not sure about the performance implications, although I believe they should be minimal on both sides.

The only drawback is that storing as UNSIGNED BIGINT may confuse users because the raw value carries less meaning, and using the value as a timestamp in other languages that may not have 64-bit support yet would make it unreadable.
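For reference, a minimal sketch of what the integer alternative looks like from plain PHP, assuming a 64-bit build. The variable names are illustrative only, and no database is involved.

```php
<?php

declare(strict_types=1);

$utc = new DateTimeZone('UTC');
$expiresAt = new DateTimeImmutable('2040-01-01 00:00:00', $utc);

// What would be written to an UNSIGNED BIGINT column: seconds since the Unix epoch.
$stored = $expiresAt->getTimestamp();
echo $stored, "\n"; // 2208988800, past the signed 32-bit limit of 2147483647

// Reading it back needs an explicit conversion before it is human-readable.
$restored = (new DateTimeImmutable('@' . $stored))->setTimezone($utc);
echo $restored->format('Y-m-d H:i:s'), "\n"; // 2040-01-01 00:00:00
```

The round trip is lossless, but the raw column value is just a number, which is the readability concern raised here.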

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

@ziming
Contributor

ziming commented Jan 19, 2025

I personally feel it is better to wait until closer to 2038 and see what the consensus is on this topic for timestamps. Maybe by then there will be a better solution, or it will be a non-issue.

@browner12
Contributor Author

@Rizky92 storing as a BIGINT is a poor solution because then we lose readability in DB GUIs, and it will increase storage costs to 8 bytes.

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

I don't understand this point. can you elaborate?

@ziming we have a data type literally called "datetime" that was built to handle dates and times. If we start enforcing this good standard now, the 2038 problem literally goes away. Regardless of the 2038 aspect, using datetime also takes a footgun away from people with regard to timezones. Now is the time for this better solution.

@Rizky92

Rizky92 commented Jan 19, 2025

@Rizky92 storing as a BIGINT is a poor solution because then we lose readability in DB GUIs, and it will increase storage costs to 8 bytes.

I understand that. That's why it's one alternative among many that has the potential to avoid a breaking change. :)

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

I don't understand this point. can you elaborate?

My bad. I shouldn't have added that. I was wondering whether it's relevant to the scope of this PR. Basically, using datetime loses the timezone information that timestamp has, because timestamp internally uses the database server timezone to offset the datetime information. Changing the columns to datetime means new apps must explicitly define which timezone they live in.

@taylorotwell
Member

Hey @browner12 - thanks for this PR. Are there any breaking changes for existing applications?

@antonkomarev
Contributor

antonkomarev commented Jan 19, 2025

In our MySQL application we stopped using the TIMESTAMP data type because its behavior may differ depending on server settings. This may lead to big date issues.

@browner12
Contributor Author

@taylorotwell shoot, I knew there was something in my original post I forgot that I wanted to add.

As far as I can tell, there would be no breaking changes because this PR only affects stubs, which would only affect migrations going forward.

I've also set up a small application with 2 models, with one using $table->timestamps() and one using $table->datetimes(). They seem to work just fine alongside each other, as Laravel's handling of the casts does the heavy lifting.

As stated, the one questionable change we could make is to have $table->softDeletes() switch to using a datetime, and then create a dedicated $table->softDeletesTimestamp() for users who still want to use the old way. My concern about not doing this is that people will still just use $table->softDeletes() because they won't be any the wiser about the underlying behavior. I don't think making the change would have an effect on existing applications because the migrations wouldn't re-execute. However, you could have scenarios where someone's local environment is different from production if they often run artisan migrate:fresh --seed or something similar. Locally, they would have datetime deleted_at fields, and on production they would have timestamp deleted_at fields. If the production database is only using UTC anyway, this doesn't really make a difference.

Would love some others' thoughts on this.


I've also done some testing locally about updating the column definition of a timestamp field to a datetime, and it's actually pretty straightforward. For example:

ALTER TABLE `test` CHANGE `updated_at` `updated_at` datetime NULL;

Basically what seems to happen is the value you see remains unchanged. It just loses its awareness of the server/SQL timezone setting. Again, if you were doing everything in UTC anyway, there is no impact.

@browner12 browner12 changed the title prefer "datetime" types over "timestamp" types [12.x] prefer "datetime" types over "timestamp" types Jan 19, 2025
@donnysim
Contributor

There's also a "timezone" option you can specify on the connection in config (at least for MySQL; it is not present by default) to switch the timezone used when timestamps are retrieved, independent of the server. Overall I'd say this change is necessary not only because of the timestamp issues, but also to make more developers aware that the majority of datepickers return values based on the local user's timezone if not set to ISO format (and even then, JS ISO is not really compatible with the PHP ISO validation), not the server's or the project's. This should be handled even if the project targets a single country, since you do not have to live in a country to use its site. There's also not a lot of content around this to make more developers aware of it, and I often encounter dangerous changes to dates made by juniors who do not know why the code was written the way it was.

@kminek
Contributor

kminek commented Jan 21, 2025

There's also a "timezone" option you can specify on the connection in config (at least for MySQL; it is not present by default) to switch the timezone used when timestamps are retrieved, independent of the server. Overall I'd say this change is necessary not only because of the timestamp issues, but also to make more developers aware that the majority of datepickers return values based on the local user's timezone if not set to ISO format (and even then, JS ISO is not really compatible with the PHP ISO validation), not the server's or the project's. This should be handled even if the project targets a single country, since you do not have to live in a country to use its site. There's also not a lot of content around this to make more developers aware of it, and I often encounter dangerous changes to dates made by juniors who do not know why the code was written the way it was.

maybe this timezone option on mysql connection should be present by default in app skeleton and set to application timezone from config.app - just my two cents

@donnysim
Contributor

donnysim commented Jan 21, 2025

@kminek it's not as simple as just adding it. To set a time zone, the mysql.time_zone_name table must contain it or it will throw an exception, and on Windows that table is empty by default, so you have to go download and import the data. It also adds an additional SQL call on init. And in all cases you must ensure that the table contains your specified time zone, or your site will be unavailable because that one added SQL call will fail.
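For anyone following along, this is roughly what the connection option under discussion looks like. It is a fragment of config/database.php with the other connection keys omitted, not a recommendation from this PR. Using a numeric offset instead of a named zone is an assumption on my part to sidestep the time zone table requirement described above.

```php
<?php

// config/database.php (fragment; host, database, credentials, etc. omitted)
return [
    'connections' => [
        'mysql' => [
            'driver' => 'mysql',

            // Assumption: a numeric UTC offset rather than a named zone such as
            // 'America/Chicago', so the connection does not depend on the
            // mysql.time_zone_name table being populated.
            'timezone' => '+00:00',
        ],
    ],
];
```

The extra SQL call on connect still applies with this setting.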

@browner12
Contributor Author

yah, while related, the database config timezone option is out of scope of this PR. and actually, switching to datetime types makes it all moot anyway.

7 participants