Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeepDiff Fails to Detect Timezone Changes in Arrays with ignore_order=True #516

Open
3schwartz opened this issue Jan 9, 2025 · 0 comments

Comments

@3schwartz
Copy link

3schwartz commented Jan 9, 2025

Describe the bug

The deepdiff library is unable to detect timezone changes in datetimes within arrays.

When ignore_order=True and a datetime is present within an array, deepdiff uses an execution path that makes a DeepHash. Due to the implementation of datetime_normalize, all datetimes have their timezone set to UTC:

obj = obj.replace(tzinfo=datetime.timezone.utc)

This behavior causes issues in our use case, as we rely on deepdiff to detect changes in datetime objects, including changes to their timezones.

To Reproduce

Here’s a minimal example demonstrating the issue. The data structure includes datetimes within an array. When the timezone of the datetimes is changed, deepdiff` does not recognize this change:

class Simple:
    def __init__(self, date: datetime):
        self.dates: List[datetime] = [date]

    @staticmethod
    def construct(add_timezone: bool = False) -> "Simple":
        if add_timezone:
            date = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)
        else:
            date = datetime(2020, 8, 31, 13, 14, 1)
        return Simple(date=date)

def test_simple():
    old = Simple.construct()
    new = Simple.construct(add_timezone=True)

    diff = DeepDiff(old, new, ignore_order=True)

    assert bool(diff) == True # <—— This assertion fails

def test_simple_array():
    old = [datetime(2020, 8, 31, 13, 14, 1)]
    new = [datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)]

    diff = DeepDiff(old, new, ignore_order=True)

    assert bool(diff) == True # <—— This assertion fails

Execution path

  • diff.py (725): _diff_iterable -> _diff_iterable_with_deephash
  • diff.py (1269): _diff_iterable_with_deephash -> _create_hashtable
  • diff.py (1085): _create_hashtable -> DeepHash
  • diff.py (1085): _create_hashtable -> DeepHash
  • deephash.py (216): DeepHash -> (DeepHash)self._hash
  • deephash.py (535): (DeepHash)self._hash -> (DeepHash)self._prep_datetime
  • deephash.py (478): (DeepHash)self._prep_datetime -> datetime_normalize
  • helper.py (627): datetime_normalize

Expected behavior

I expect deepdiff to detect changes to datetime objects, including differences in their timezones, and report the data structures as different.

Is there a specific reason why datetime_normalize replaces the timezone with UTC? Would it be reasonable to modify or remove this behavior?

Alternatively, could the logic be adjusted to conditionally apply the timezone replacement only when truncate_datetime is explicitly set? For example, the following indentation could be introduced:

def datetime_normalize(truncate_datetime, obj):
    if truncate_datetime:
        if truncate_datetime == 'second':
            obj = obj.replace(microsecond=0)
        elif truncate_datetime == 'minute':
            obj = obj.replace(second=0, microsecond=0)
        elif truncate_datetime == 'hour':
            obj = obj.replace(minute=0, second=0, microsecond=0)
        elif truncate_datetime == 'day':
            obj = obj.replace(hour=0, minute=0, second=0, microsecond=0)

        if isinstance(obj, datetime.datetime):
            obj = obj.replace(tzinfo=datetime.timezone.utc)
        elif isinstance(obj, datetime.time):
            obj = time_to_seconds(obj)
    return obj

This approach ensures that timezone replacement occurs only when truncate_datetime is explicitly set.

A solution with the proposed approach mentioned above has been submitted in PR #517.

OS, DeepDiff version and Python version (please complete the following information):

  • Python Version 3.10
  • DeepDiff Version 8.0.1 (Note: logic is present on main branch)
@3schwartz 3schwartz changed the title DeepDiff Fails to Detect Timezone Changes with ignore_order=True DeepDiff Fails to Detect Timezone Changes in Arrays with ignore_order=True Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant