Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove SplObjectStorage #276

Open
gnat42 opened this issue Nov 11, 2015 · 4 comments
Open

Remove SplObjectStorage #276

gnat42 opened this issue Nov 11, 2015 · 4 comments

Comments

@gnat42
Copy link
Contributor

gnat42 commented Nov 11, 2015

SplObjectStorage is currently used to store what amount to an array of exceptions objects. I'm suggesting it be changed to an actual array for the following reasons.

  • You cannot inspect the storage contents via xdebug
  • Iterating the SplObjectStorage is non-intuitive in comparison to a basic array
  • Performance between the two is nearly identical until the number of records reaches a number high enough that in reality an OOM would occur with a data-importer of any complexity
  • Its much better suited to be used as a map/set but in this case isn't really useable that way as it stores exceptions so no one will be doing $exceptions->contains($exceptionObj)
  • It doesn't offer anything better than a standard array and creates a more complex hoop to jump through in post processing for new developers inspecting its contents.

I don't mind submitting a PR for this. I'm adding it as an issue because I've submitted 13 pull requests recently, all of which have received no response and my work uses the SplObjectStorage and adds another one. Anyway, if this issue was agreed upon, I'd submit a PR that could be merged before or after my other ones depending on review etc..

@Baachi
Copy link
Collaborator

Baachi commented Nov 12, 2015

Hey Nathanael,

first of all, a big sorry 😢 for that no one takes care of your work. The thing is, we want to move this repo to portphp and nobody haven't the time to give it the love it which is needed.

About your issue:
I implemented the SplObjectStorage for a one reason, it saves memory in complex cases. If you pass a SplObjectStorage to an function/method, PHP will get them a reference of this object.
If this is an array, PHP will copy it and pass the copy to the function. And we talk about an import/export library which have to deal with a huge amount of data. So my idea was to help PHP to save memory and
so our users can use it without have to care about it.

To your first argument against SplObjectStorage, i have good news, xdebug will now support it (see http://bugs.xdebug.org/bug_view_page.php?bug_id=00000686).

Iterating the SplObjectStorage is non-intuitive in comparison to a basic array

Why it isn't intuitive to iteratate over an SplObjectStorage? Its exactly the same syntax so i don't see the point, maybe you can provide a script which shows the issue?

Its much better suited to be used as a map/set but in this case isn't really useable that way as it stores exceptions so no one will be doing $exceptions->contains($exceptionObj)

That's true and i hate this one limitation, but on the other hand what should be the key? And we can extend them and provide a better api.

But here we go, we can discuss about it. Maybe my thoughts are wrong.

@gnat42
Copy link
Contributor Author

gnat42 commented Nov 12, 2015

Hey Marcus,

No need to apologize, I wasn't trying to blame anyone. Just gave a reason it was an issue instead of a PR. What do you mean you want it moved to portphp? Even if you were, I would appreciate some input on the PRs I submitted if you can find the time.

I was pretty sure that PHP passed objects by reference by default. So passing SplObjectStorage itself isn't particularly unique just that you want to pass an object for the same effect I presume?

As for iterating over a SplObjectStorage object... Perhaps the issue is that I store extra information so my iterating code looks like this mess.

  $exceptions->rewind();
  while ($exceptions->valid()) {
    $object = $exceptions->current(); // similar to current($s)
    $row = $exceptions->offsetGet($object);
    $exceptions->next();
  }

This is necessary if you want to match the source row you were importing to the exception. Personally that information is hugely important in our use case.

So if all you want is the ability to pass an object as a reference, I would implement a simple array wrapper with an iterator. It can be passed around, store the row and exception and be inspectable by xdebug. I would love to combine the exceptions store into a single 'reports' store. I have a PR #241 that captures messages from low level converters/filters. I implemented it separately but really think there should be a Reports object that stores all the messages (exceptions or otherwise). But the SplObjectStore gives very little benefit if you ask me over any other object. Since most of its benefit comes from being able to quickly query for an object's existence in the store which is not likely the use case here.

As an aside...

We've built a nice complex system that allows users to build dynamic maps from uploaded files to standard data sets. They can set a preprocessor, value converters and all sorts of stuff. If it doesn't work we can send the user feedback on what rows were excluded and why as well as other things like that. You can check out all of my pr's as a merged set in the 'gnat' branch of my fork of data-import (https://github.com/gnat42/data-import). Some of the PRs are non-architectural bug fixes / improvements including test cases.

@Baachi
Copy link
Collaborator

Baachi commented Nov 12, 2015

Hopefully i can spent some time in the next coming days/weeks, to review the PRs.

What do you mean you want it moved to portphp?

We want to split this repo in its essential parts and moved the code to the portphp organization, because this have some benefits for us.
If you now scream "Hell, oh no this library goes dead", no reason for that, bugfixes will be merged but all features should be to the new code.

Ah now i get your problem with that. Maybe its better to rely on SplStack or SplDoublyLinkedList, this will fix it or not?

Oh wow looks really great and i'm happy the library helps you to build such amazing things ❤️
If you have the possibility to benchmark the SplObjectStorage vs array in your application, it would be awesome. Then we have real world results.

@gnat42
Copy link
Contributor Author

gnat42 commented Nov 12, 2015

I'll take a look at SplStack and LinkedList.

Honestly I doubt the SplObjectStorage even contributes single percent of the computing time. I did notice recently that the slowest part is the initial reading of a large ExcelReader file. xls files take a long time to load - I haven't investigate why. Small xlsx files exhaust system memory.

I would doubt much processing is spent with the object store. When you think about it, you shouldn't really have many exceptions. you can't have more than one per row. Whereas you can have multiple processors, validators, converters filters etc per row.

I'll get back to you about the other potential objects and thoughts on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants