Add documentation for aggregation #122

kidrahahjo · 2021-02-24T18:24:49Z

Description

Add documentation for the newly introduced aggregation feature.

Motivation and Context

This should be merged after glotzerlab/signac-flow#464 gets merged into master.

Checklist:

I am familiar with the Contributing Guidelines.
I agree with the terms of the Contributor Agreement.
My code follows the code style guideline of this project.

csadorf

Sorry for taking so long for this. Here is my first pass!

csadorf · 2021-03-25T08:56:40Z

docs/source/aggregation.rst

+.. note::
+
+    In case the number of jobs in the project is odd, there will be one aggregate containing only a single
+    job and hence users should be careful while defining the parameters for an *aggregate operation*.


We could show an example on how to do that, e.g.:

def op4(job1, job2=None): pass

kidrahahjo · 2021-05-21

Since the parity of the total number of jobs is not defined in the above example, I think it would be more appropriate to include this in the above example itself.
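The remainder case discussed in this thread can be illustrated without signac-flow at all. The following is a minimal sketch in plain Python, assuming a hypothetical helper `groups_of` that mimics the chunking described for `aggregator.groupsof`. It is not the library's implementation, and the job names are placeholder strings.

```python
def groups_of(items, size):
    """Split items into consecutive chunks of at most `size` (hypothetical helper)."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]


def op4(job1, job2=None):
    """Accept either a full pair or a leftover single job."""
    if job2 is None:
        return f"{job1} (alone)"
    return f"{job1} + {job2}"


# Placeholder job names; an odd count leaves one aggregate with a single job.
jobs = ["job_a", "job_b", "job_c"]
aggregates = groups_of(jobs, 2)
results = [op4(*aggregate) for aggregate in aggregates]
print(results)
```

Without the default value for `job2`, the call for the last aggregate would raise a `TypeError`, which is the situation the note warns about.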

csadorf · 2021-03-25T08:57:14Z

docs/source/aggregation.rst

+.. note::
+
+    In case the number of jobs in the project is odd, there will be one aggregate containing only a single
+    job and hence users should be careful while defining the parameters for an *aggregate operation*.


I would suggest to be a bit more precise here. What does "careful" mean? This may or may not be a valid use case.

kidrahahjo · 2021-05-21

I tried to address this here: https://github.com/glotzerlab/signac-docs/pull/122/files#diff-d053cd842dd6fe746188c4339762f7e336b763b94d513473d65900b9836a09f0R83

csadorf · 2021-03-25T09:03:27Z

docs/source/aggregation.rst

+============
+
+Similar to the concept of a job id, an aggregate id is a unique hash identifying an aggregate of jobs.
+The aggregate id is sensitive to the order of the jobs in the aggregate


So what is the default order then? Should be documented here.

kidrahahjo · 2021-05-21

I tried to address this here: https://github.com/glotzerlab/signac-docs/pull/122/files#diff-d053cd842dd6fe746188c4339762f7e336b763b94d513473d65900b9836a09f0R89
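To make the order sensitivity concrete, here is a small sketch of an order-dependent id. The function `toy_aggregate_id`, the `agg-` prefix, the use of MD5, and the job ids are all assumptions chosen for illustration; the thread does not state how signac-flow actually computes aggregate ids.

```python
import hashlib


def toy_aggregate_id(job_ids):
    """Hash the job ids in the given order (illustrative, not signac-flow's scheme)."""
    joined = ",".join(job_ids)
    return "agg-" + hashlib.md5(joined.encode("utf-8")).hexdigest()


# The same two placeholder job ids in different orders give different ids.
id_forward = toy_aggregate_id(["abc123", "def456"])
id_reversed = toy_aggregate_id(["def456", "abc123"])
print(id_forward != id_reversed)
```

Because the id depends on the order, the documentation needs to state which order the jobs are in by default, as requested above.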

csadorf · 2021-03-25T09:04:04Z

docs/source/aggregation.rst

+.. note::
+
+    Currently, **signac-flow** only allows single :class:`~flow.aggregator` per group, i.e. all the operations present
+    in a :py:class:`FlowGroup` will be using a same :class:`~flow.aggregator` object.


Suggested change

-    in a :py:class:`FlowGroup` will be using a same :class:`~flow.aggregator` object.
+    in a :py:class:`FlowGroup` will be using the same :class:`~flow.aggregator` object.

csadorf · 2021-03-25T09:04:58Z

docs/source/aggregation.rst

+    class Project(FlowProject):
+        pass
+
+    group = Project.make_group('agg-group', aggregator_obj=aggregator())


I think the example would be easier to follow if you provided an implementation for such an aggregator class.

kidrahahjo · 2021-05-21

@csadorf, I am not sure I completely follow what you mean. Can you please clarify?

b-butler · 2021-05-21T17:25:30Z

@kidrahahjo can you still work on this? If not (which is fine) we could assign someone else so that we can prioritize getting this updated on the documentation.

bdice · 2021-05-21T17:35:29Z

I spoke with @DomFijan and he expressed interest in helping as well.

kidrahahjo · 2021-05-21T17:39:56Z

I'll try to update this PR by addressing @csadorf's review comments within a few hours. I don't have any problem with anyone else working on this. Please feel free to contribute.

DomFijan · 2021-05-21T20:28:03Z

When I was trying out aggregation yesterday, it wasn't immediately obvious to me that functions passed to @Project.post/pre now need a different input argument (job vs *jobs). If one wants to use aggregation to "embarrassingly parallelise" a certain set of jobs (which I foresee will be a common application of this feature, as bundling is often so problematic), the whole "workflow" of the operation one wishes to aggregate needs to be changed. This involves writing filters to exclude jobs which might have already finished, rewriting the pre/post condition functions for the aggregated operation as outlined above, and finally splitting the MPI communicator for each job that one wishes to aggregate, and possibly more.
My question is: should this be mentioned somewhere in the documentation? Perhaps not in the form of an example in this place, but in such a way that the user is aware that some modifications are needed, and that simply adding an aggregate decorator will not just work automagically in such cases. A follow-up example somewhere else in the docs might be useful to demonstrate this capability and provide a more well-rounded illustration of the usefulness of this feature.
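The change from `job` to `*jobs` in condition functions can be sketched in plain Python. In this hedged example, dictionaries stand in for job documents, and the names `is_finished`, `all_finished`, and the `finished` flag are hypothetical; none of this uses signac-flow itself.

```python
def is_finished(job):
    """Single-job condition: read a flag from a dict standing in for a job document."""
    return job.get("finished", False)


def all_finished(*jobs):
    """Aggregate condition: every job of the aggregate arrives as a positional argument."""
    return all(is_finished(job) for job in jobs)


# Plain dicts stand in for jobs here; real jobs would be signac Job objects.
aggregate = [{"finished": True}, {"finished": False}]

# A simple filter that keeps only the jobs that still need to run.
pending = [job for job in aggregate if not is_finished(job)]
print(all_finished(*aggregate), len(pending))
```

The single-job condition cannot be reused unchanged for an aggregate, because it would receive several positional arguments instead of one.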

Charlottez112 · 2021-05-21T21:49:46Z

Thank you so much for the documentation, @kidrahahjo! As someone who's not super familiar with aggregation, I found the examples very helpful. I just made a few comments on things that were not immediately clear to me. Could you also resolve the comments/suggestions you've applied to the doc?

bdice

Comments below. Overall this is great and covers the right topics! Just need to clean it up a bit. I think we should show fewer examples of def my_op(job1, job2) and replace them with def my_op(*jobs). I think that is a better practice to demonstrate for most users' needs.

Also, I have seen several times that newer Python programmers are confused by how *args and **kwargs become a tuple / dictionary inside the function body. I would suggest that some of the function bodies containing pass should be replaced by something acting on the jobs, like print("Number of jobs in aggregate:", len(jobs)), so that users can understand how the *jobs argument gets translated into a tuple.
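The kind of function body suggested here could look like the following sketch. It uses plain Python only; the job names are placeholder strings, and the `return` is added purely so that the received value can be inspected.

```python
def my_op(*jobs):
    # Inside the function body, `jobs` is a tuple of all positional arguments.
    print("Number of jobs in aggregate:", len(jobs))
    return jobs


aggregate = ["job_a", "job_b", "job_c"]  # placeholder job names
received = my_op(*aggregate)  # unpacks the list into three positional arguments
```

The same function also works for an aggregate containing a single job, which is why `*jobs` is a safer signature than `job1, job2`.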

bdice · 2021-06-21T17:02:32Z

docs/source/aggregation.rst

+    if __name__ == '__main__':
+        Project().main()
+
+If :class:`~flow.aggregator` is used with the default arguments, an aggregate of all the jobs present in the project will be created.


Suggested change

-If :class:`~flow.aggregator` is used with the default arguments, an aggregate of all the jobs present in the project will be created.
+If :class:`~flow.aggregator` is used with the default arguments, it will create a single aggregate containing all the jobs present in the project.

bdice · 2021-06-21T20:20:59Z

docs/source/aggregation.rst

+
+    @aggregator.groupsof(2, sort_by='temperature', sort_ascending=False)
+    @Project.operation
+    def op5(job1, job2):


Do we need to use *jobs or job2=None to support a final aggregate with one job?
(I also worry that showing examples with job1, job2 will be more confusing than *jobs.)

kidrahahjo · 2021-06-24

Since we've already made the point about using non-default arguments carefully, I think we should go with *jobs here.

bdice · 2021-06-25T15:41:02Z

docs/source/aggregation.rst

+    In case the number of jobs in the project in this example is odd, there will be one aggregate containing only a single job.
+    In general, the last aggregate from :class:`~flow.aggregator.groupsof` will contain the remaining jobs if the aggregate size does not evenly divide the number of jobs in the project.
+    If a remainder is expected and valid, users should make sure that the operation function can be called with the reduced number of arguments (e.g. by using ``*jobs`` or providing default arguments as shown above).
+


We could add more examples for some or all of the following:

Group by state point keys: The aggregates are grouped by multiple state point keys.

Group by arbitrary key function: The aggregates are grouped by keys determined by a key function that expects an instance of :class:`~.signac.contrib.job.Job` and returns the grouping key.

Using a completely custom aggregator function when even greater flexibility is needed.

Using sorting/selection in conjunction with other aggregator parameters.

bdice · 2021-06-26

I created a new issue for this: #146.
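As a rough picture of what grouping by state point keys or by a key function means, here is a sketch built on `itertools.groupby`. The helper `group_by`, the dictionaries standing in for job state points, and the keys `temperature` and `pressure` are all made up for this example; it does not call signac-flow.

```python
from itertools import groupby


def group_by(jobs, key):
    """Group jobs whose key function returns the same value (illustrative helper)."""
    ordered = sorted(jobs, key=key)  # groupby only merges adjacent items
    return [tuple(group) for _, group in groupby(ordered, key=key)]


# Dicts stand in for job state points; the keys are made up for this example.
jobs = [
    {"temperature": 300, "pressure": 1},
    {"temperature": 300, "pressure": 2},
    {"temperature": 350, "pressure": 1},
]

# One key gives coarser aggregates; a tuple of keys gives finer ones.
by_temperature = group_by(jobs, key=lambda job: job["temperature"])
by_both = group_by(jobs, key=lambda job: (job["temperature"], job["pressure"]))
print(len(by_temperature), len(by_both))
```

Each resulting tuple plays the role of one aggregate that would be passed to an operation as `*jobs`.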

bdice

I think this PR is mostly complete, so I am approving it. All of my remaining suggestions could be addressed in a separate PR if desired.

Charlottez112

Referring to Python unpacking is very helpful!

bdice · 2021-06-26T18:49:57Z

I created #146, #147 to address my remaining comments on this PR. With two approvals, I think it's ready to merge. @csadorf I think we've addressed your comments sufficiently, but feel free to follow up if you wanted to make another round of changes. Thanks for your work on this, @kidrahahjo. 👍

edit: I did some final touches to add this to the table of contents and fix the intersphinx references.

csadorf · 2021-06-26T21:27:30Z

Thx a lot @kidrahahjo !!

kidrahahjo added 2 commits February 24, 2021 23:38

Add aggregation docs without FlowGroups

197a1d4

Add documentation for aggregation with FlowGroups

61f9c08

kidrahahjo requested review from a team as code owners February 24, 2021 18:24

kidrahahjo self-assigned this Feb 24, 2021

kidrahahjo requested review from csadorf and Charlottez112 and removed request for a team February 24, 2021 18:24

csadorf requested changes Mar 25, 2021

bdice requested a review from DomFijan May 13, 2021 17:15

kidrahahjo added 3 commits May 22, 2021 00:10

Merge remote-tracking branch 'origin/master' into feature/aggregation-docs

cf56bd7

Address reviews and update docs with current API

b65f9f3

Improve wordings

cc7de69

DomFijan reviewed May 21, 2021

Charlottez112 reviewed May 21, 2021

Charlottez112 reviewed May 21, 2021

Charlottez112 reviewed May 21, 2021

Charlottez112 reviewed May 21, 2021

Address code review

5d4b5c6

kidrahahjo requested review from DomFijan, csadorf and Charlottez112 May 22, 2021 19:28

bdice requested changes Jun 21, 2021

csadorf removed their request for review June 22, 2021 07:13

Update docs/source/aggregation.rst

7ffd383

bdice reviewed Jun 25, 2021

Update docs/source/aggregation.rst

35cbf3b

bdice reviewed Jun 25, 2021

Update docs/source/aggregation.rst

c00bb86

bdice reviewed Jun 25, 2021

Update docs/source/aggregation.rst

49c31f2

bdice reviewed Jun 25, 2021

Update docs/source/aggregation.rst

375b3cd

bdice reviewed Jun 25, 2021

bdice reviewed Jun 25, 2021

Update docs/source/aggregation.rst

7d5b3d5

bdice reviewed Jun 25, 2021

Update docs/source/aggregation.rst

573eb10

bdice approved these changes Jun 25, 2021

Charlottez112 approved these changes Jun 26, 2021

Explain that operations are like aggregate operations acting on aggregates of one job. Co-authored-by: Carl Simon Adorf <[email protected]>

16b1c97

This was referenced Jun 26, 2021

Add aggregation examples #146

Open

Make capitalization of section headings consistent #147

Open

bdice added 6 commits June 26, 2021 13:50

Add aggregation to table of contents.

8047341

Rename section to match FlowGroup.

d16f43a

Unitalicize.

9cd9995

Fix links to pre/post.

23576c6

Fix intersphinx references.

abd7544

Use :py: role prefix for consistency with other docs.

aca6752

bdice merged commit 81bfeb3 into master Jun 26, 2021

bdice deleted the feature/aggregation-docs branch June 26, 2021 19:09
