Multiplex output to files? #59

Open
tsibley opened this issue May 6, 2015 · 4 comments

@tsibley
Collaborator

tsibley commented May 6, 2015

Occasionally I reach for recs multiplex when I want to split a record stream into multiple files. For example, recs multiplex -k foo -- recs-tocsv works great, except all of the CSV output goes to stdout. When I'm using an output format without a distinct marker to split on, I usually work around this limitation using some combination of recs piped to parallel running the recs to... command. recs chain and recs generate seem like they would almost allow me to multiplex to separate files, but either generate needs to support outputting non-records or chain needs to support some sort of interpolation like generate (ick).
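To make the desired behaviour concrete, here is a minimal Python sketch of the semantics being asked for. It is an illustration only, not recs code: the function name and the `<key>-<value>.csv` naming are hypothetical, and it assumes the input is one JSON record per line.

```python
import csv
import io
import json
from collections import defaultdict


def split_records_to_csv(lines, key):
    """Group JSON-line records by `key` and render one CSV document per group.

    Returns a dict mapping a (hypothetical) filename to that group's CSV text.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        groups[str(record.get(key, ""))].append(record)

    outputs = {}
    for value, records in groups.items():
        # Column order: keys in order of first appearance within the group.
        fields = list(dict.fromkeys(k for r in records for k in r))
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        # Assumption: one file per group, named after the key and its value.
        outputs[f"{key}-{value}.csv"] = buf.getvalue()
    return outputs
```

Each group gets its own header row, which is the part that cannot be recovered once every group's CSV has been concatenated on stdout.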

In terms of supporting this feature, I see two options:

  1. Build support into multiplex itself. Something like --output-filename-key=<keyspec> or --output-filename=<snippet> would name the file that each group's output is written to. The filename key or evaluated snippet would be added to the set of keys that records are grouped on.
  2. Add a new operation which enables use of the existing multiplex to do this, for example: recs multiplex -k foo -- recs-tofiles -k filename -- recs-tocsv

I think option one is cleaner than option two, both in terms of implementation and command line syntax. Option two, however, can be implemented outside of core recs.

Is this feature worth having in core recs? General thoughts?

@benbernard
Owner

Hmmm I'm cool with having it in core recs, just not certain what the interface should be... I think it seems reasonable to me to have multiplex be able to do it...

Another option would be to add a -o flag to all recs commands, like --filename-key, that lets you output to a named file (which seems reasonable), and then let multiplex interpolate command names based on a clumping record...

Would be cool to have the latter, but the former is much more usable. I'll also ping @amling to see what he thinks

@amling
Collaborator

amling commented May 6, 2015

I'm also not sure what the right combination of primitives is to pull
this off, but here are some ideas:

Something we've thought about previously was having line output commands
take a --records flag that makes them output single-key records ("LINE"
or the like) instead of raw lines. This means inside multiplex you'd get
your bucket stamped back on that, so recs-multiplex -k foo -- recs-tocsv --records
would output records with a "foo" field and a "LINE" field.

That's sort of the minimum needed for multiplex+tocsv not to destroy the
data. After that, the best primitive for sorting into files is not very
clear, especially because you want to sort into files by the "foo" field
but then also eval down to the "LINE" field. That alone doesn't seem like
a great primitive, but maybe it would be OK? recs-tofiles --file <snippet>
on its own would write records (not what you want here, but we'd allow
it); adding --line <snippet> would write the evaluation of the snippet.

End-to-end this makes it:

recs-multiplex -k foo -- recs-tocsv --records | recs-tofiles --file '{{foo}}' --line '{{LINE}}'

We could also split --file into --file-key (-f) and --file-eval (-F) and
likewise --line into --line-key (-l) and --line-eval (-L):

recs-multiplex -k foo -- recs-tocsv --records | recs-tofiles -f foo -l LINE
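For readers following along, the two-stage idea above can be sketched in Python. This is only a model of the data flow, under stated assumptions: neither `--records` nor `recs-tofiles` exists as written here, both function names are hypothetical, and stage 1 renders one line per record without modelling per-group CSV headers.

```python
import os


def to_line_records(records, key, render):
    """Stage 1: render each record to text and keep the bucket key beside it.

    Models the proposed --records output: a record with the grouping field
    plus a "LINE" field holding the rendered text.
    """
    for record in records:
        yield {key: record[key], "LINE": render(record)}


def write_lines_to_files(line_records, file_key, line_key, directory):
    """Stage 2: write each record's line to the file named by its file key.

    Note: a real tool would need to sanitise the key value before using it
    as a filename; this sketch uses it as-is.
    """
    handles = {}
    try:
        for rec in line_records:
            name = str(rec[file_key])
            if name not in handles:
                path = os.path.join(directory, name)
                handles[name] = open(path, "w", encoding="utf-8")
            handles[name].write(rec[line_key] + "\n")
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

The sketch shows why the second stage has to group by the file key all over again, which is the duplication of multiplex's clumping that comes up later in the thread.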

Keith

On Wed, May 06, 2015 at 10:31:11AM -0700, Ben Bernard wrote:

Hmmm I'm cool with having it in core recs, just not certain what the interface should be... I think it seems reasonable to me to have multiplex be able to do it...

Another option would be to add a -o flag to all recs commands, like --filename-key, that lets you output to a named file (which seems reasonable), and then let multiplex interpolate command names based on a clumping record...

Would be cool to have the latter, but the former is much more usable. I'll also ping @amling to see what he thinks


@benbernard
Owner

Keith and I talked about this for a long while today... we think probably the best thing to do is to build it into multiplex...

recs-multiplex -k foo -o foo -- recs-tocsv

would write the tocsv output for each clump to a file named foo-FOO_VALUE.

Similarly, you could use -O to provide evalable Perl that generates the filename:

recs-multiplex -k foo -O '"myawesomefile-{{foo}}.recs"'
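As a rough illustration of the two naming schemes, here is a Python sketch. It is an assumption-laden model, not the recs implementation: both function names are hypothetical, and the template version does only plain `{{key}}` substitution, whereas the proposed -O would evaluate arbitrary Perl.

```python
import re


def filename_from_key(key, record):
    """Model of the proposed -o behaviour: '<key>-<value>' for each clump."""
    return f"{key}-{record[key]}"


def filename_from_template(template, record):
    """Model of the proposed -O behaviour, limited to {{key}} placeholders.

    Each {{name}} is replaced with the record's value for that key; a
    missing key raises KeyError rather than silently producing a bad name.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(record[m.group(1)]),
        template,
    )
```

For a clump where foo is "bar", the first scheme gives a name of the form foo-bar and the second expands the template in the command above to myawesomefile-bar.recs.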

We thought about tofiles for a long time, but in the end it just seemed to be duplicating multiplex clumping without much value....

Thoughts?

@tsibley
Collaborator Author

tsibley commented May 7, 2015

Sounds good! I agree about duplicating the multiplex clumping without much value, and that's why I also had settled on option one instead of option two when thinking this through.

Unless you or Keith have a burning desire to implement this, I'll probably take a swing at it in the next few weeks.
