-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timedelta, timedelta64 and datetime64 plus respective conversions #509
base: main
Are you sure you want to change the base?
Conversation
@cjdoris what do you think about this PR? |
@cjdoris still hoping to get this integrated ... |
Just chiming in with some comments: 1: Could you add some tests for this? This adds a lot of new features so I think should have testing to cover everything. Ideally the test coverage of the diff should be 100%. It should also cover your intended usecase with
as this seems subtle. 2: Is the 3-arg version of the |
src/Convert/numpy.jl
Outdated
year::Int=_year, month::Int=_month, day::Int=_day, hour::Int=_hour, minute::Int=_minute, second::Int=_second, | ||
millisecond::Int=_millisecond, microsecond::Int=_microsecond, nanosecond::Int=_nanosecond | ||
) | ||
pyimport("numpy").datetime64("$(DateTime(year, month, day, hour, minute, second))") + pytimedelta64(;millisecond, microsecond, nanosecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pyimport("numpy")
the correct API call, or is that just to be used in user packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw similar calls at different places in the package, so I took this approach. But I also wouldn't know how to code a timedelta64 without calling pyimport.
Please let me know if there's a better solution.
src/Convert/numpy.jl
Outdated
T = types[findfirst(==(unit), units)] | ||
pyconvert_return(CompoundPeriod(T(value * count)) |> canonicalize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proper way to do this would be to use Base.Cartesian.@nif
. That way you could write this code to avoid dynamic dispatch on types
(which will be very slow).
src/Convert/numpy.jl
Outdated
units = ("Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns") | ||
types = (Year, Month, Week, Day, Hour, Minute, Second, Millisecond, Microsecond, Nanosecond) | ||
T = types[findfirst(==(unit), units)] | ||
pyconvert_return(DateTime(_base_datetime) + T(value * count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to other comment – you should write this using Base.Cartesian.@nif
over the types
tuple to avoid dynamic dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, tested with a julia function calls and found that the julia part is around 25ns, whereas the python call is around 1.5microsecond
src/Convert/numpy.jl
Outdated
_year::Int=0, _month::Int=0, _day::Int=0, _hour::Int=0, _minute::Int=0, _second::Int=0, _millisecond::Int=0, _microsecond::Int=0, _nanosecond::Int=0, _week::Int=0; | ||
year::Int=_year, month::Int=_month, day::Int=_day, hour::Int=_hour, minute::Int=_minute, second::Int=_second, microsecond::Int=_microsecond, millisecond::Int=_millisecond, nanosecond::Int=_nanosecond, week::Int=_week) | ||
pytimedelta64(sum(( | ||
Year(year), Month(month), # you cannot mix year or month with any of the below units in python, the error will be thrown by `pytimedelta64(::CompoundPeriod)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment
you cannot mix year or month with any of the below units in python, the error will be thrown by
pytimedelta64(::CompoundPeriod)
Should be presented to the user as a descriptive error message rather than a comment in the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the comment isn't clear enough.
Python throws a well understandable descriptive error in case of wrong usage, so no need for us to do so. Agree?
|
||
function pyconvert_rule_timedelta64(::Type{CompoundPeriod}, x::Py) | ||
unit, count = pyconvert(Tuple, pyimport("numpy").datetime_data(x)) | ||
value = reinterpret(Int64, pyconvert(Vector, x))[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is reinterpret
safe here? Is there a better alternative to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought, pyconvert creates a new Julia Vector which is not mapped onto Python data. If that would be the case, we'd need to wrap the vector by a copy()
.
src/Convert/numpy.jl
Outdated
for T in (CompoundPeriod, Year, Month, Day, Hour, Minute, Second, Millisecond, Microsecond, Nanosecond, Week) | ||
pyconvert_add_rule("numpy:timedelta64", T, pyconvert_rule_timedelta64, priority) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Julia is unlikely to unroll this loop, you should use Base.Cartesian.@nexprs
here to avoid dynamic dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried my best, but I'm not sure how to test whether this will speed up things
src/Convert/numpy.jl
Outdated
args = T .== (Day, Second, Millisecond, Microsecond, Minute, Hour, Week) | ||
pydatetime64(x.value .* args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to rewrite this with Base.Cartesian.@nif
rather than doing a masked sum, since you know there will be only 1 element in the sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sum is dammed fast (16ns), and I couldn't beat it with a different version.
Thanks for all the comments, I already learned some new things about Julia, which is always nice. I will go through them the next days and once we've agreed on the solutions write the test functions. Just want to re-raise my question from the previous PR. Do you think, we should stick with CompoundPeriod as default conversion type? |
Co-authored-by: Miles Cranmer <[email protected]>
@cjdoris @MilesCranmer I'd really love to see these changes merged. Please let me know if there are any remaining things to be done from your end? |
Could you please add some tests and docs? |
Can you rebase on #583 to fix the tests? |
I did rebase, the errors come from the new julia version. |
I see, thanks. In the meantime can you add tests and docs for the code changes? Ideally we should have 100% test coverage. |
I did add tests for pytimedelta. using CondaPk
CondaPkg.add("pandas") which I think is a reasonable approach. |
Were the test files not committed? I don’t see any added tests |
When writing the tests for pytimedelta and pytimedelta64 I realised that my choice for keyword argument naming might be confusing for users. pytimedelta(second = 1) but pyimport("datetime").timedelta(seconds = 1) My choice was inspired from the Dates API (e.g. |
append 's' to keywords in pytimedelta and pytimedelta64
…a64 in DataFrames
Changed the keywords. Tests should pass as soon as the other two PRs are merged. |
…me}), remove pydatetime64(::CompoundPeriod)
…nicalize, add global setting CANONICALIZE_TIMEDELTA64 for rule conversion
Added functionality that the unit of pytimedelta64() is defined by the least unit specified, also for the case 0. julia> pytimedelta64(Year(0))
Python: np.timedelta64(0,'Y')
julia> pytimedelta64(Week(0))
Python: np.timedelta64(0,'W')
julia> pytimedelta64(Millisecond(0))
Python: np.timedelta64(0,'ms') and julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60)))
60 seconds
julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60), canonicalize = true))
1 minute and julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60)))
60 seconds
julia> PythonCall.Convert.CANONICALIZE_TIMEDELTA64[] = true
true
julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60)))
1 minute |
I hope you find these changes consistent. |
There's an error for CondaPkg on Windows, but in principle, the tests pass. |
I cherry-picked #589 in order to proceed with testing.
The docs are probably the only remaining topic. |
This PR replaces #334 and takes into account the major refactoring of PythonCall.
Particularly, it fixes #293.
What's new?
Python Constructors
Conversion to Julian types
DataFrame handling
I've set the priority of datetime, timedelta, datetime64 and timedelta64 to ARRAY, which allows for automatic Table conversion - I hope that's the intended way to do it.
Default Conversion
I chose to use
Dates.CompoundPeriod
as result type of default conversion from timedelta64 as both types support year, month and minor period units. This is debatable, we could also change it toPeriod
, hence the resulting type would depend on the input.Both Python and Julia do not convert between Year/Month and the other period types, so there is no danger with this choice to arrive at ill-determined intervals.
The difference is that Julia allows addition/subtraction of mixed types while Python/Numpy throws an error.
The difference to the previous PR is that all conversions rely on either builtin or numpy functions and do not use pandas.
Ordering of arguments for
pytimedelta()
was chosen to be identical to the python version, while ordering forpytimedelta64()
is strictly descending, exceptweek
which comes last.EDIT: add comments in what's new code, added conversions
EDIT2: removed comment about datetime_data, I had misunderstood the meaning and have updated the code