Terminology: most rapidly varying dimension mislabelling of storage order #583

pvanlaake · 2025-01-08T17:46:24Z

Mis-labelling of storage order in definition of "most rapidly varying dimension"

In #530 a new definition for "most rapidly varying dimension" was developed, and it has since been included in 1.12. Unfortunately, the labels for row-major and column-major ordering are reversed in that text: C-style is row-major and Fortran-style is column-major.
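To make the two labels concrete, here is a minimal sketch (not part of the proposed text) that lists the index tuples of a small 2 x 3 array in the order they are laid out in memory under each convention. The helper name `storage_order` is hypothetical and introduced only for illustration.

```python
from itertools import product

def storage_order(shape, order):
    """Return index tuples in the order the elements are laid out in memory.

    order="C": row-major, the last index varies most rapidly (C, CDL).
    order="F": column-major, the first index varies most rapidly (Fortran).
    """
    ranges = [range(n) for n in shape]
    if order == "C":
        return list(product(*ranges))
    if order == "F":
        # Iterate over the reversed dimension list, then flip each tuple back.
        return [idx[::-1] for idx in product(*reversed(ranges))]
    raise ValueError("order must be 'C' or 'F'")

c_order = storage_order((2, 3), "C")
f_order = storage_order((2, 3), "F")
print(c_order)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(f_order)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

In the row-major listing the second (last) index changes between neighbouring elements; in the column-major listing the first index does.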

Moderator

Not assigned.

Requirement Summary

Update the text in the terminology section to correctly reflect the storage order.

Additionally, some minor textual changes are proposed to make the text more accurate and inclusive:

Change "When netCDF is represented in CDL..." to "When a netCDF dataset is represented in CDL...".
Change "C and Python NumPy use the same order as C, also called "column-major order", but Fortran uses the opposite convention, also called "row-major order", so that when netCDF variables are accessed in Fortran the most rapidly varying dimension is the first one." to "C and Python NumPy use the same order as CDL, called "row-major order", while R and Fortran use the alternative arrangement, called "column-major order", so that when netCDF variables are accessed in R or Fortran the most rapidly varying dimension is the first one."

Associated pull request

PR will be made after any comments and suggestions have been processed.

ChrisBarker-NOAA · 2025-01-08T17:57:13Z

I think we should use:

"C and Python NumPy use the same order as CDL...."

Since it begins with the definition for CDL.

(and I suspect that was the original intent, as saying C uses the same order as C wouldn't have been intentional :-)

Agree with the addition of R as an example.

What does confuse me (and maybe the docs aren't the place to clarify this) is the following. If a netCDF file has:

variable(x, y, z)

in it, and you open it in Fortran or R, do you access it as:

variable[z, y, x] ?

(and the same for writing?)

(I don't use Fortran or R, so ....)
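As a rough illustration of the question, the sketch below checks that, for the same bytes on file, the element addressed as `(i, j, k)` in a row-major view of `variable(x, y, z)` is the element addressed as `(k, j, i)` in a column-major view with the dimension list reversed. It assumes the reading interface simply reverses the dimension order, which is how the netCDF Fortran interface is documented to behave; whether a given R package does the same is an assumption to check against that package's documentation. The shape and function names are hypothetical.

```python
def row_major_offset(index, shape):
    """Linear offset with the last index varying most rapidly (C, CDL)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

def column_major_offset(index, shape):
    """Linear offset with the first index varying most rapidly (Fortran)."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset

shape_cdl = (2, 3, 4)             # hypothetical variable(x, y, z) as listed in CDL
shape_reversed = shape_cdl[::-1]  # the same variable seen as (z, y, x)

# Every element has the same offset under both views.
same_element = all(
    row_major_offset((i, j, k), shape_cdl)
    == column_major_offset((k, j, i), shape_reversed)
    for i in range(2) for j in range(3) for k in range(4)
)
print(same_element)  # True
```

Under that assumption nothing is rearranged on file: only the order in which the dimensions are listed changes, for reading and for writing alike.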

pvanlaake · 2025-01-08T20:05:07Z

Proposed text updated as suggested.

On confusion: this is common even among the best of us, but in the end it really doesn't matter. Dimensions can be stored in any order, so a reader has to examine the relevant attributes to determine how to orient the data. I am not sure about the details of the netCDF library, of which there are versions in C and Fortran: whether each writes in its native storage mode or whether there is a default arrangement that both library versions use. In R I use package RNetCDF, which is written and maintained by UCAR staff, for low-level access to the library, and that produces data in row-major order. That leads to fun stuff like flipped maps etc. You can find any number of nonplussed users on StackOverflow or similar platforms.

Where it does matter is in processing of the data. Getting a time profile for a specific location from a COARDS-compliant three-dimensional data set is painfully slow compared to getting an area of data for a specific time, because the area is contiguous on file and the I/O is therefore more efficient. That, however, holds for both storage orders; it just applies to different dimensions.
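The contiguity argument can be sketched with plain arithmetic. The example assumes a single variable stored contiguously in CDL (row-major) order with hypothetical dimensions (time, lat, lon); it counts runs of consecutive element offsets as a crude proxy for the number of separate reads, and it ignores caching, record interleaving and chunking.

```python
def contiguous_runs(offsets):
    """Count runs of consecutive offsets (a rough proxy for separate reads)."""
    offsets = sorted(offsets)
    runs = 1 if offsets else 0
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1:
            runs += 1
    return runs

nt, ny, nx = 365, 180, 360  # hypothetical (time, lat, lon) sizes

def offset(t, y, x):
    """Row-major element offset: lon varies most rapidly, time most slowly."""
    return (t * ny + y) * nx + x

# All times at one grid point versus the whole grid at one time.
time_profile = [offset(t, 90, 180) for t in range(nt)]
single_map = [offset(100, y, x) for y in range(ny) for x in range(nx)]

print(contiguous_runs(time_profile))  # 365
print(contiguous_runs(single_map))    # 1
```

The time profile touches one element per time step, each `ny * nx` elements apart, while the map for one time step is a single contiguous run.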

ChrisBarker-NOAA · 2025-01-08T20:34:43Z

"Where it does matter is in processing of the data. Getting a time profile for a specific location from a COARDS compliant 3-dimensional data set is painfully slow compared to getting an area of data for a specific time, due to the contiguity of the data on file and thus the more efficient I/O."

Well, yes, which is why CF recommends an order but does not require it, and why it uses "most rapidly varying" rather than first [last] dimension.

Though with modern file formats (netCDF4, zarr, ???) this ends up being more an issue of how the data are chunked, rather than the dimension order.
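A minimal sketch of the chunking point, with hypothetical array and chunk shapes: it counts how many chunks a hyperslab request intersects, which is roughly what governs I/O cost in a chunked format. The function name `chunks_touched` is illustrative and does not correspond to any library API.

```python
def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersected by a hyperslab (per-dimension start, count)."""
    total = 1
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch           # index of the first chunk touched
        last = (s + c - 1) // ch  # index of the last chunk touched
        total *= last - first + 1
    return total

# Hypothetical (time, lat, lon) variable of shape (365, 180, 360).
profile = ((0, 90, 180), (365, 1, 1))   # all times at one grid point
one_map = ((100, 0, 0), (1, 180, 360))  # whole grid at one time

map_chunks = (1, 180, 360)   # one chunk per time step
cube_chunks = (73, 36, 36)   # a compromise chunk shape

print(chunks_touched(*profile, map_chunks))   # 365
print(chunks_touched(*one_map, map_chunks))   # 1
print(chunks_touched(*profile, cube_chunks))  # 5
print(chunks_touched(*one_map, cube_chunks))  # 50
```

With per-time-step chunks the time profile is still the expensive request; a more cubic chunk shape trades some map-reading cost for much cheaper profiles, independently of the order in which the dimensions are declared.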

JonathanGregory · 2025-01-09T19:01:14Z

Thanks for opening the issue, @pvanlaake, and for spotting the mistake. I agree with Chris's suggested change, which you've made, and also with his explanation of why CF relaxed the COARDS requirement for ordering of dimensions.

Patrick should be added to the list of contributors to the convention once this issue has been concluded. I've added the new contributor label to remind us.

pvanlaake added the "defect" label (Conventions text meaning not as intended, misleading, unclear, has typos, format or language errors) on Jan 8, 2025

pvanlaake mentioned this issue Jan 8, 2025

Use of "most rapidly varying" #530

Closed

JonathanGregory added the "new contributor" label (This issue was worked on by new contributors to the CF conventions) on Jan 9, 2025
