Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments, continued, strings, card as enum, etc #16

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

chrsoo
Copy link
Contributor

@chrsoo chrsoo commented Jan 20, 2025

This is a big change that may not be aligned with the vision for FITSRS. If so, no worries if you reject it.

Note that some sample files likely have invalid cards as per FITSv4, these cards are stored as Card::Undefined.

All unit tests pass, once the external sample file is downloaded and installed.

Changes

  • Added support for continued strings (long-string).
  • Added comments to Value enum.
  • Refactored Card as enum with variants for the different card variants in FITSv4
  • Changed cards in Header from HashMap to Vec to represent all cards in a FITS header.
  • Added values HashMap in Header to hold Cards representing values
    (i.e. Cards with with = in bytes 9 and 10 of the buffer)
  • Added history() and comments() for retrieving all HISTORY and COMMENT cards in the FITS header.
  • Added HISTORY and COMMENT as keywords in values (this is redundant with the two previous methods
  • Added support for value unit convention in comments
  • ...

Todo

  • support for complex numbers (integer and float)
  • support for HIEARCH cards
  • additional tests and documentation
  • some code cleanup
  • ...

Open Potins

  • Having the HISTORY and COMMENT supported both by methods returning an iterator and as concatenated values in the keyword/value map is redundant and potentially confusing, there should probably only be one way to get the header HISTORY and COMMENT. Also COMMENT might not make sense as it is usually contextual and related to the adjoint value cards.
  • There is potentially additional refactoring to be done for the extensions, e.g. switching to strings instead of eight byte buffers.

src/card.rs Outdated
comment: Option<String>,
},
// FITS extension string value, cf FITSv4, section 4.2.1.1
Extension([u8; 8]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should put an enum here Extension that could have only 3 possibilities, Image, Table and BinaryTable ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, especially since there is already an enum defined in fitsrs::hdu::header::extension:

pub enum XtensionType {
    Image,
    BinTable,
    AsciiTable,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// [cards](Card) using the `HISTORY` keyword.
///
pub fn history(&self) -> impl Iterator<Item = &String> + use<'_, X> {
self.filter_string_values_by_kw("HISTORY")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong but it seems the filter_string_values_by_kw method only check for cards being the Value variant and do not consider the History variant (as well as COMMENT ...). So here the history cards might not be returned ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I understand the comments and history are copied 2 times in the Card vec, one time as a history (resp comment) variant and one time as a value variant. What was the reason for this redundancy ? I think that if possible, it would be nice indeed to store them only one time.

Maybe for history simply as a history variant. And for comment maybe the same. I am not very familiar whether a COMMENT card can refer to a value card or is more general ? I would say if someone wants to comment a specific card then he writes it in after the value after the / character ? (which is a case you handle too, associating a string comment as a value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong but it seems the filter_string_values_by_kw method only check for cards being the Value variant and do not consider the History variant (as well as COMMENT ...). So here the history cards might not be returned ?

You are very much right, not sure what I was thinking. Shows the value of unit tests and the peril of leaving them out. In this case the unit tests were on the TODO list, and I actually found it yesterday when I started making them. Also found a few other inconsistencies, like the fact that the SIMLPE and NAXIS headers (etc) are not present in the values map.

Sorry I understand the comments and history are copied 2 times in the Card vec, one time as a history (resp comment) variant and one time as a value variant. What was the reason for this redundancy ? I think that if possible, it would be nice indeed to store them only one time.

My reasoning is that there should be a memory representation of the header as found in the FITS file which is why all header lines are (should be) in the cards vec. This would be useful later on for writing the header to disk, as it means it is possible to closely emulate the original content. Here is where there are two design decisions to make: a) is it a goal to preserve the header when writing, and b) is it preferable to do in-place editing of the file instead of writing back the whole header.

I do realise I might be thinking ahead a bit too much as FITS file writing is still in the future.

Given the above I see the values map as a convenience to the API user as the FITS long-strings values are already processed. We could of course prioritise memory over cpu cycles and process the long-strings on-demand, but I was thinking that we want to catch issues when parsing and not at retrieval of the values by API clients. The memory vs cpu I think is probably a non-issue.

Regarding the fact that HISTORY and COMMENT are stored in both cards and values, this is what I mentioned in the Open Points above. I did that just for the sake of the discussion, but when I started on the header unit tests I am thinking that adding them as concatenated strings (separated by \n) is actually a mistake. I think the on-demand filtering (now fixed and soon pushed) is better.

Retrieving HISTORY as a vector of strings make a lot of sense, since it is a processing log that may be displayed in a UI, but COMMENT I am less sure of as I understand this to be header comments in the file that may be interspersed between the values. They make sense when seen in the header, side-by-side with the values, but aggregated in a vec or as a concatenated string... the context is lost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Removed HISTORY and COMMENT from the values
  • Added missing unit tests
  • Fixed some bugs
  • Improved parsing
  • Added support for comments on Xtension.

Copy link
Contributor Author

@chrsoo chrsoo Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if you prefer not to have an in-memory representation of the header as seen on disk (admittedly, e.g. five Cards representing space might not be that useful), I suggest that we as an alternative...

  1. Remove comments() on the header and consider header-level comments not part of the API
  2. Replace the cards field with history and parse all history cards upfront.

This means that anybody interested in the header-level comments would have to use another tool (e.g. a text editor).

If you think that it should be possible to see the header level comments using the API, I propose we leave the PR more or less as is, with the only change that we remove the comments() method as I believe it makes little sense to put all comments together. Better to display them inline with the rest of the unprocessed cards.

Note that I don't think it makes sense treating the HISTORY as Value unless we include a new Value type representing an array. The user of the API should not have to split the value to access the individual processing lines, that are (presumably) independent operations. Similar argument for COMMENTS but less strong, as my guess is that multiple comment cards often must be read together.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chrsoo, thank you very much for all the precision. Give me some time understanding/testing the code again, it it a big and very valuable PR :) . I think that is good you removed the HISTORY and COMMENT from values to remove the redundancy.

When will come the time to writing I think that is a good to indeed dump all the cards with little to no modification from the original read header. Maybe one modification I see would be to dump the mandatory cards in the good order given by the standard for each extension type.

I like that you give a special enum variant for HISTORY and COMMENT. I think I share your ideas, retrieving all the HISTORY cards might be useful but the COMMENTS, as they might relate to different things is strange.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants