DownloadDistmapResult: merge SAM header (feature request) #511

Closed
robmaz opened this issue Aug 23, 2018 · 3 comments
Labels
Priority: Low · Status: Abandoned (Not part of the project backlog) · Type: Enhancement (New features or improved behavior)

Comments

@robmaz

robmaz commented Aug 23, 2018

Following up on the idea of a bam-based pipeline, it would be super useful if you could merge a saved SAM header (in particular the RG info) back into the generated bam.

@magicDGS
Owner

Thanks for the proposal: this one is a bit more complicated. I need to know some specific requirements and possible difficulties in the implementation:

  1. How are the read groups assigned to the reads? distmap retains no information about that (only the barcode, which means redoing the barcode matching).
  2. Should the header be merged with the one coming from the mapping, or should it override it completely?
  3. If it overrides, what happens when there are conflicts with previous header lines? For example, when remapping a previously mapped file with distmap.

@magicDGS magicDGS added Priority: Low Type: Enhancement New features or improved behavior Status: Pending In discussion to include in the project backlog labels Aug 23, 2018
@robmaz
Author

robmaz commented Aug 24, 2018

What I originally thought was basically to put the previously stored RG info from the header back in, assigning all reads to the same group; I realize that as a generic problem, this is not so straightforward. The old PG info could be restored, assuming that all reads were processed in the same way. I guess remapping should replace the SQ lines with the new ones, but retain a PG line for the previous mapping? I would say that it cannot be readtools' duty to figure out ambiguities; one has to assume that the person who wants the header merged knows what they are doing. If there are two RGs, for example, just bail out with an error message.

So the assumption should be that this is a header that can be added unambiguously and is a single PG step (the mapping) away from the current header. And just fail with a corresponding error message if that does not seem to be true.

That means:

  • Ignore sorting info as it may no longer be true (and will be changed anyway now).
  • Ignore old SQ lines.
  • Insert single RG line or fail.
  • Prepend (or append? not sure what the normal sequence is) old PG lines
  • Add all reads to RG and PGs.
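
To make the rules above concrete, here is a minimal sketch in plain Python that works on raw SAM header lines. It is not readtools code: the function names (`merge_saved_header`, `tag_read`) are hypothetical, old PG lines are placed before the new ones only as one possible answer to the open ordering question, and failing on duplicate PG IDs or on RG lines already present in the mapped header is an extra assumption beyond what is listed.

```python
def _lines(header, tag):
    """Return the header lines whose record type equals tag (e.g. '@RG')."""
    return [line for line in header if line.split("\t", 1)[0] == tag]


def _id(line):
    """Extract the value of the ID field from an @RG or @PG line."""
    for field in line.split("\t")[1:]:
        if field.startswith("ID:"):
            return field[3:]
    raise ValueError(f"header line has no ID field: {line!r}")


def merge_saved_header(saved, mapped):
    """Merge a saved header into the header produced by the mapping step.

    Both arguments are lists of header lines without trailing newlines.
    Returns (merged_lines, rg_id) or raises ValueError on any ambiguity.
    """
    saved_rg = _lines(saved, "@RG")
    if len(saved_rg) != 1:
        # "Insert single RG line or fail."
        raise ValueError(f"expected exactly one saved @RG line, found {len(saved_rg)}")
    if _lines(mapped, "@RG"):
        # Assumption: an RG already present after mapping counts as a conflict.
        raise ValueError("mapped header already contains @RG lines")

    # Assumption: old PG lines go first, then the PG lines from the mapping.
    pg = _lines(saved, "@PG") + _lines(mapped, "@PG")
    ids = [_id(line) for line in pg]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate @PG IDs between saved and mapped header")

    # Saved @HD (sorting info) and saved @SQ lines are ignored on purpose.
    merged = (_lines(mapped, "@HD") + _lines(mapped, "@SQ")
              + saved_rg + pg + _lines(mapped, "@CO"))
    return merged, _id(saved_rg[0])


def tag_read(record, rg_id):
    """Assign one SAM alignment line to the read group rg_id."""
    fields = record.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; drop any stale RG tag after them.
    optional = [f for f in fields[11:] if not f.startswith("RG:Z:")]
    return "\t".join(fields[:11] + optional + [f"RG:Z:{rg_id}"])
```

Raising an error instead of guessing keeps the ambiguity with the user, which matches the "bail out with an error message" suggestion. Only the RG tag is written to the reads here; how reads should be linked to the PG lines is left open.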

@magicDGS
Owner

I linked your comment in the new issue to preserve your idea. I am closing this one in favor of #518 to keep the conversation in one place.

@magicDGS magicDGS added Status: Abandoned Not part of the project backlog and removed Status: Pending In discussion to include in the project backlog labels Aug 24, 2018