I'm getting an error when initializing the "fmow" dataset. I got the following error for the conversion of the timestamp to datetime with Pandas:
ValueError: time data "2011-02-07T02:48:56.643Z" doesn't match format "%Y-%m-%dT%H:%M:%S%z", at position 92. You might want to try:
- passing format if your strings have a consistent format;
- passing format='ISO8601' if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing format='mixed', and the format will be inferred for each element individually. You might want to use dayfirst alongside this.
I noticed I was using Pandas 2.0.0 (presumably the most recent version) and when I reverted to Pandas 1.5.3, the issue seemed to go away. I'm guessing the datetime formatting was changed in version 2 and it might be good to update WILDS to still work with the new version. Thanks!
After investigating the issue, I believe I've identified the cause and how it can be avoided.
It turns out that most timestamps in the FMoW dataset, which contains over 500,000 elements, follow the format 2013-10-05T02:27:17Z. However, fewer than 2,700 timestamps include higher precision, such as 2011-02-07T02:48:56.643Z (note the three additional digits after the decimal point).
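For anyone who wants to check this on their own copy of the metadata, here is a minimal sketch of how the higher-precision timestamps could be located. The `timestamps` Series below is a hypothetical three-element sample, not the real FMoW data, and the regular expression is just one way to detect a fractional-seconds part.

```python
import pandas as pd

# Hypothetical sample; the real values would come from the FMoW metadata.
timestamps = pd.Series([
    "2013-10-05T02:27:17Z",
    "2011-02-07T02:48:56.643Z",
    "2015-06-01T11:00:00Z",
])

# True where the seconds field is followed by a fractional part such as ".643".
has_fraction = timestamps.str.contains(r"T\d{2}:\d{2}:\d{2}\.\d+", regex=True)

# Number of higher-precision timestamps, and the entries themselves.
print(int(has_fraction.sum()))
print(timestamps[has_fraction].tolist())
```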
In versions of pandas prior to 2.0.0, this discrepancy wasn't a problem because pandas.to_datetime inferred the format for each element individually. From 2.0.0 onward, however, pandas infers the format once from the first non-missing element and expects all subsequent entries to match it. To resolve this, we can explicitly pass format="ISO8601", which accepts ISO 8601 strings that are not all in exactly the same format.
Here's an example:
import pandas as pd
# Two dates, one of them with additional precision.
dates = ["2013-10-05T02:27:17Z", "2011-02-07T02:48:56.643Z"]
# Individually, each of the two dates can be loaded.
print(pd.to_datetime(dates[0])) # Prints: "2013-10-05 02:27:17+00:00"
print(pd.to_datetime(dates[1])) # Prints: "2011-02-07 02:48:56.643000+00:00"
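# Loading both elements at once causes the problem specified above (pandas >= 2.0.0).
# The error is caught here so that the rest of the example still runs.
try:
    print(pd.to_datetime(dates))
except ValueError as err:
    print(err) # Message starts with: time data "2011-02-07T02:48:56.643Z" doesn't match format "%Y-%m-%dT%H:%M:%S%z", at position 1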
# If we specify the format "ISO8601" as proposed by the error message, pandas is able to handle the deviation in precision.
print(pd.to_datetime(dates, format="ISO8601")) # Prints: DatetimeIndex(['2013-10-05 02:27:17+00:00', '2011-02-07 02:48:56.643000+00:00'], dtype='datetime64[ns, UTC]', freq=None)
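If the same code needs to run on both pandas 1.5.3 and pandas 2.x, one option is to pass format="ISO8601" only when it is supported. The following is a rough sketch under that assumption; `parse_iso_timestamps` is a hypothetical helper name and not part of WILDS or pandas, and the version check assumes the usual "major.minor.patch" version string.

```python
import pandas as pd


def parse_iso_timestamps(values):
    """Parse ISO 8601 strings that may or may not include fractional seconds.

    Hypothetical helper for illustration only.
    """
    major = int(pd.__version__.split(".")[0])
    if major >= 2:
        # pandas >= 2.0.0 accepts the special value "ISO8601".
        return pd.to_datetime(values, format="ISO8601")
    # Older pandas infers the format per element, so no format is needed.
    return pd.to_datetime(values)


dates = ["2013-10-05T02:27:17Z", "2011-02-07T02:48:56.643Z"]
parsed = parse_iso_timestamps(dates)
print(parsed)
```

The version check is used instead of catching the ValueError because pandas 1.x would treat "ISO8601" as a literal strftime pattern, which fails for a different reason than the one described in this issue.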