
Handling of quantities #133

Closed
toddrjen opened this issue Feb 4, 2014 · 14 comments

@toddrjen
Contributor

toddrjen commented Feb 4, 2014

I split this off from issue #123

The current unit-handling library, quantities, has some major issues. The most pressing is that it is unmaintained, and it is unknown how well it will handle future Python 3.x releases. It also has bugs that are affecting us and will probably never be fixed.

However, it has other issues. For one thing, since it subclasses numpy's ndarray, it requires multiple copy operations for many of the tasks we need. Further, it prevents us from using numpy-like classes that support lazy loading (such as h5py datasets). It also means we can't attach units to anything other than numpy arrays; if we want a single number with units, we have to use a numpy scalar.

There are also other problems. For example, telling whether two quantities have the same type of unit (such as time) is difficult and slow. Similarly, copying the units from one variable to another is difficult.

I have found another project, pint, that seems to solve most or all of these problems. It also seems to have a smaller code base, and defining new units is easier.

https://pint.readthedocs.org/

@rproepp
Member

rproepp commented Feb 4, 2014

Andrew mentioned Pint in the upcoming Neo paper and I've already had a quick look at it. It seems very promising, and we should definitely consider moving from quantities to Pint. As I've written earlier, I think we should do such a switch in the same release as our other API-breaking changes.

If we decide to use Pint, we should contact the authors and talk about sustainability. It has 12 contributors but seems to be largely the effort of a single author.

@toddrjen
Contributor Author

toddrjen commented Feb 4, 2014

Yes, there does seem to be one person driving it. However, that was also the case with quantities, and pint seems to have more outside contributions than quantities did (for example, pint has more accepted and total pull requests despite being younger). It also seems to do a better job of dealing with issues.

@rproepp
Member

rproepp commented Feb 4, 2014

Yes, I completely agree. I only want to make sure that we don't switch and then run into a similar situation as with quantities one or two years down the line.

@samuelgarcia
Copy link
Contributor

Trevor did a good comparison of unit handling in Python here:
http://conference.scipy.org/scipy2013/presentation_detail.php?id=174

@toddrjen
Contributor Author

I know that several of the projects he presented have advanced a lot since then. I managed to install and run his comparison script, but I haven't been able to get his tables working yet. I will report back when I have an updated version of the analysis complete.

@physicalist

Has anyone made any progress on this issue? In the aforementioned talk by Trevor, someone in the audience claimed pint was based on quantities, but that's not (or no longer?) the case. The benchmarks on operations weren't really in pint's favor, but if it has been rewritten without quantities as a dependency, that might have changed drastically. It's worth re-evaluating in any case, unless someone has already done that?

@toddrjen
Contributor Author

I have re-run the data analysis from the presentation, and pint still has performance issues, although its coverage of Python and numpy mathematical operations is improving.

If we have a concrete interest in switching, I could approach the developers about it. But I didn't want to do that unless we were really willing to switch if things improved, and there didn't seem to be a consensus yet that switching was the right move.

@toddrjen
Contributor Author

Anybody have any thoughts on this?

@rproepp
Member

rproepp commented Apr 11, 2014

I think I would now prefer to stay with quantities, which might require adopting the project or finding a new maintainer to get the bugs fixed (some already have pull requests). Performance is important for us, and it seems that the issues with Pint are inherent in its architecture; see the comments at the end of this page: https://pint.readthedocs.org/en/latest/numpy.html. On the same page the author mentions that Pint might be based on numpy inheritance in the future, which would remove the performance issues but reintroduce some of the problems we have with quantities.

The h5py lazy-loading object is nice, but it might not be what we need. First, it supports only a very limited subset of numpy array functionality: mostly just indexing. Second, lazy loading from HDF5 arrays generally depends on the chunking defined when the array was created. A common case is an n×m array with n signals, each m samples long. If we're interested in only one of the channels and try to load it, we will often touch the whole dataset, because the default chunking is rectangular. If we instead use e.g. one chunk per channel, loading temporally defined subsets becomes pointless. Chunks that are too small hurt general loading performance, and so on. Lazy loading can still save memory, but with a potentially huge performance hit.
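The chunking trade-off described above can be sketched as follows. This is an illustration, not code from the thread; it assumes h5py is installed, and the dataset names are made up. An in-memory HDF5 file is used so nothing is written to disk.

```python
import numpy as np
import h5py

n_channels, n_samples = 8, 100_000
data = np.random.rand(n_channels, n_samples).astype("float32")

with h5py.File("signals.h5", "w", driver="core", backing_store=False) as f:
    # chunks=True lets h5py pick a chunk shape, which tends to be
    # rectangular: reading one channel then touches chunks that also
    # hold data from other channels.
    f.create_dataset("auto", data=data, chunks=True)

    # One chunk per channel: a whole-channel read touches exactly one
    # chunk, but a short time slice across all channels touches every chunk.
    f.create_dataset("per_channel", data=data, chunks=(1, n_samples))

    # Lazy access: only the chunk(s) containing this channel are read.
    channel = f["per_channel"][3, :]
    assert channel.shape == (n_samples,)
```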

@toddrjen
Contributor Author

I've done some benchmarking on pint, and the issues raised in that link are not the primary performance bottleneck. Most of the time seems to be taken up combining compound units, which is done in a recursive manner.

@rproepp
Member

rproepp commented Apr 11, 2014

OK, so the performance issues could be solvable? Maybe we could contact the author about that, or try ourselves? I'm not sure switching would be worth it even if performance got closer to quantities (but was still often only 50% as fast, if I understand the implications of the issues from the link correctly). Further opinions?

@toddrjen
Contributor Author

I don't know if they are solvable; we would have to talk to the developers. I also don't know how much pint can be optimized, since I don't yet understand how the operations work internally; I was just able to determine which methods took most of the time.

The performance could end up better than quantities'. The primary performance bottleneck of quantities is also not the numpy operations, and quantities may very well have the same issue with multiple numpy operations. I really need to understand how things are handled in practice to know.

pint, however, has the advantage that in performance-critical situations you can bypass the issue entirely by working with the numpy array directly. This is not possible with quantities, at least not in a documented manner.

@apdavison apdavison modified the milestones: 0.6, 0.4 Jul 4, 2016
@apdavison apdavison added bug and removed defect labels Jan 25, 2017
@samuelgarcia
Contributor

Closing this and leaving #278 open.
