-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation for aggregation #122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for taking so long for this. Here is my first pass!
docs/source/aggregation.rst
Outdated
.. note:: | ||
|
||
In case the number of jobs in the project is odd, there will be one aggregate containing only a single | ||
job and hence users should be careful while defining the parameters for an *aggregate operation*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could show an example on how to do that, e.g.:
def op4(job1, job2=None):
pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since in the above example the parity of total jobs is not defined, I think it'll be more appropriate to include this in the above example itself
docs/source/aggregation.rst
Outdated
.. note:: | ||
|
||
In case the number of jobs in the project is odd, there will be one aggregate containing only a single | ||
job and hence users should be careful while defining the parameters for an *aggregate operation*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to be a bit more precise here. What does "careful" mean? This may or may not be a valid use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/source/aggregation.rst
Outdated
============ | ||
|
||
Similar to the concept of a job id, an aggregate id is a unique hash identifying an aggregate of jobs. | ||
The aggregate id is sensitive to the order of the jobs in the aggregate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what is the default order then? Should be documented here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/source/aggregation.rst
Outdated
.. note:: | ||
|
||
Currently, **signac-flow** only allows single :class:`~flow.aggregator` per group, i.e. all the operations present | ||
in a :py:class:`FlowGroup` will be using a same :class:`~flow.aggregator` object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a :py:class:`FlowGroup` will be using a same :class:`~flow.aggregator` object. | |
in a :py:class:`FlowGroup` will be using the same :class:`~flow.aggregator` object. |
docs/source/aggregation.rst
Outdated
class Project(FlowProject): | ||
pass | ||
|
||
group = Project.make_group('agg-group', aggregator_obj=aggregator()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the example would be easier to follow if you provided an implementation for such an aggregator class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csadorf, I am not sure I completely follow what you say. Can you please clarify?
@kidrahahjo can you still work on this? If not (which is fine) we could assign someone else so that we can prioritize getting this updated on the documentation. |
I spoke with @DomFijan and he expressed interest in helping as well. |
I'll try to update this PR by addressing @csadorf 's reviews within a few hours. I don't have any problems with anyone else working on this. Please feel free to contribute. |
When I was trying out aggregation yesterday and it wasn't immediately obvious to me that functions passed to |
Thank you @kidrahahjo so much for the documentation! As someone who's not super familiar with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments below. Overall this is great and covers the right topics! Just need to clean it up a bit. I think we should show fewer examples of def my_op(job1, job2)
and replace them with def my_op(*jobs)
. I think that is a better practice to demonstrate for most users' needs.
Also, I have seen several times that newer Python programmers are confused by how *args
and **kwargs
become a tuple / dictionary inside the function body. I would suggest that some of the function bodies containing pass
should be replaced by something acting on the jobs, like print("Number of jobs in aggregate:", len(jobs))
, so that users can understand how the *jobs
argument gets translated into a tuple.
docs/source/aggregation.rst
Outdated
if __name__ == '__main__': | ||
Project().main() | ||
|
||
If :class:`~flow.aggregator` is used with the default arguments, an aggregate of all the jobs present in the project will be created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If :class:`~flow.aggregator` is used with the default arguments, an aggregate of all the jobs present in the project will be created. | |
If :class:`~flow.aggregator` is used with the default arguments, it will create a single aggregate containing all the jobs present in the project. |
docs/source/aggregation.rst
Outdated
|
||
@aggregator.groupsof(2, sort_by='temperature', sort_ascending=False) | ||
@Project.operation | ||
def op5(job1, job2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use *jobs
or job2=None
to support a final aggregate with one job?
(I also worry that showing examples with job1, job2
will be more confusing than *jobs
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we've made our point of using non default arguments carefully, I think we should go with *jobs
here.
In case the number of jobs in the project in this example is odd, there will be one aggregate containing only a single job. | ||
In general, the last aggregate from :class:`~flow.aggregator.groupsof` will contain the remaining jobs if the aggregate size does not evenly divide the number of jobs in the project. | ||
If a remainder is expected and valid, users should make sure that the operation function can be called with the reduced number of arguments (e.g. by using ``*jobs`` or providing default arguments as shown above). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add more examples for some or all of the following:
- Group by state point keys: The aggregates are grouped by multiple state point keys.
- Group by arbitrary key function: The aggregates are grouped by keys determined by a key-function that expects an instance of :class:
~.signac.contrib.job.Job
and return the grouping key. - Using a completely custom aggregator function when even greater flexibility is needed.
- Using sorting/selection in conjunction with other aggregator parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a new issue for this. #146.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is mostly complete, so I am approving it. All of my remaining suggestions could be addressed in a separate PR if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referring to Python unpacking is very helpful!
…gates of one job. Co-authored-by: Carl Simon Adorf <[email protected]>
I created #146, #147 to address my remaining comments on this PR. With two approvals, I think it's ready to merge. @csadorf I think we've addressed your comments sufficiently, but feel free to follow up if you wanted to make another round of changes. Thanks for your work on this, @kidrahahjo. 👍 edit: I did some final touches to add this to the table of contents and fix the intersphinx references. |
Thx a lot @kidrahahjo !! |
Description
Add documentation for the newly introduced aggregation feature.
Motivation and Context
This should be merged after glotzerlab/signac-flow#464 gets merged into master.
Checklist: