ML Model Extension in FAIRiCUBE #21

Open
cozzolinoac11 opened this issue Dec 10, 2024 · 1 comment

Comments

@cozzolinoac11

In the FAIRiCUBE project, we use the ML Model Extension to describe the metadata of analysis and processing (a/p) resources, in particular those concerning machine learning and deep learning.
Through STAC properties, we have added fields that describe the resource in more detail. We want to share our work with the community to get feedback and to support interoperability (in case someone has similar documentation needs).

In particular:

  • Platform - platform hosting the resource. It is possible to use a combination of values (e.g., EOX and AWS).
  • Framework - Generally understood as a collection of reusable code written by others. It covers both frameworks (program scaffolds that supply the blueprint of a product) and libraries (collections of pre-defined methods and classes). Note that the same processing can be done using multiple libraries.
  • Algorithm - Name of the algorithm
  • Model configuration - Configuration/initialisation data, i.e., how the model has been parameterized.
  • Performance - Result description and explanation, including a detailed description of the hyperparameters used, the run times, the metrics used for evaluation, and the respective scores and performance.
  • UseConstraints - Possible constraints related to the use of the resource (e.g., the resource works only for certain input data, or the resource needs a specific way of providing computational power).
  • Validation - Link to a validation report
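
As a rough illustration of how the fields above could sit in a STAC Item's `properties`, here is a minimal Python sketch. The `fairicube:` prefix, the key names, the example values, and the choice of which fields count as required are all hypothetical placeholders, not the keys actually used in the FAIRiCUBE Catalog.

```python
import json

# Hypothetical property keys: the "fairicube:" prefix and the key names are
# assumptions made for illustration, not the catalog's actual schema.
properties = {
    "datetime": "2024-12-10T00:00:00Z",
    "fairicube:platform": ["EOX", "AWS"],       # a combination of values is allowed
    "fairicube:framework": ["scikit-learn"],    # frameworks and/or libraries
    "fairicube:algorithm": "Random Forest",
    "fairicube:model_configuration": {"n_estimators": 100, "max_depth": 10},
    "fairicube:performance": "Free-text description of hyperparameters, run times, metrics, and scores.",
    "fairicube:use_constraints": "Works only for certain input data.",
    "fairicube:validation": "https://example.org/validation-report",
}

# Assumed set of mandatory fields; the issue does not say which are required.
REQUIRED = ["fairicube:platform", "fairicube:framework", "fairicube:algorithm"]


def missing_fields(props, required):
    """Return the required keys that are absent from props or have empty values."""
    return [key for key in required if not props.get(key)]


print(json.dumps(properties, indent=2))
print("missing:", missing_fields(properties, REQUIRED))
```

A check like `missing_fields` is only a convenience for catching incomplete records before publishing; a JSON Schema would be the more usual way to enforce this in a STAC extension.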

The following fields are implemented as assets and asset properties:

  • Input data used - Link to data (or related metadata) to which the a/p resource has been applied. This information is required for a better understanding of the context and domain of the a/p resource.
  • Characteristics of input data - Textual description of the main characteristics of each input dataset used by the resource.
  • Biases and ethical aspects - This field may contain observations on the data and/or any biases found (e.g., class imbalances).
  • Output data obtained - Link to output data (or related metadata) produced by the execution of the a/p resource. This information is required for a better understanding of the a/p resource.
  • Characteristics of output data - Textual description of the output data from the resource.
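
The asset-based fields could be sketched in the same spirit. In the Python snippet below, the asset keys, the `input`/`output` role names, the `biases_and_ethical_aspects` property name, and the URLs are hypothetical; only `href`, `title`, `description`, and `roles` are standard STAC asset fields.

```python
# Hypothetical asset keys, roles, and extra property names, for illustration only.
assets = {
    "input_data": {
        "href": "https://example.org/input-dataset",   # link to data or its metadata
        "title": "Input data used",
        "description": "Main characteristics of this input dataset.",
        "roles": ["input"],
        "biases_and_ethical_aspects": "Class imbalance observed in the training labels.",
    },
    "output_data": {
        "href": "https://example.org/output-dataset",  # link to data or its metadata
        "title": "Output data obtained",
        "description": "Characteristics of the output produced by the a/p resource.",
        "roles": ["output"],
    },
}


def assets_with_role(assets, role):
    """Return the sorted keys of all assets that declare the given role."""
    return sorted(key for key, asset in assets.items() if role in asset.get("roles", []))


print(assets_with_role(assets, "input"))
print(assets_with_role(assets, "output"))
```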

For a detailed metadata example, see the FAIRiCUBE Catalog, e.g.: https://catalog.eoxhub.fairicube.eu/collections/ML%20collection/items/8BLIAOAZJS

@fmigneault
Collaborator

@cozzolinoac11

I recommend that the FAIRiCUBE community consider using https://github.com/stac-extensions/mlm instead.

It is intended as the updated and extended definition of ml-model, with many more properties to describe the model inputs/outputs, the framework/platform/runtime constraints, model hyperparameters, and related data sources. Most of what I see in your suggestions seems to be covered by mlm, which grew out of many users raising similar concerns that ml-model lacked those details. A separate mlm extension was deemed necessary (rather than updating ml-model) to incorporate the refactors needed to address more recent ML/AI concerns.
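
One way to check coverage is to write down a tentative field-by-field mapping. The Python sketch below is an unverified guess at how the fields proposed in this issue might line up with mlm properties. The `mlm:*` names are taken from the mlm README, but the pairing itself is an assumption, and a field left unmapped here only means no counterpart was picked in this sketch, not that mlm lacks one.

```python
# Tentative, unverified mapping from the fields proposed in this issue to
# mlm property names. An empty list means "not mapped in this sketch".
FIELD_TO_MLM = {
    "Framework": ["mlm:framework", "mlm:framework_version"],
    "Algorithm": ["mlm:architecture"],
    "Model configuration": ["mlm:hyperparameters"],
    "Input data used": ["mlm:input"],
    "Output data obtained": ["mlm:output"],
    "Platform": [],
    "Performance": [],
    "UseConstraints": [],
    "Validation": [],
}


def unmapped(mapping):
    """Return the sorted field names that have no mlm property assigned."""
    return sorted(field for field, targets in mapping.items() if not targets)


print("to review against the mlm README:", unmapped(FIELD_TO_MLM))
```

The unmapped entries are the ones worth raising on the mlm repository if no suitable property turns up.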

Warning

The ml-model extension is not expected to receive further updates, whereas mlm is in active development with many participants. Let us know if mlm works for your use case; if anything still seems to be missing, we can expedite its definition.

Note

Full disclosure:
I am a maintainer of both STAC extensions (and others). I joined the ml-model maintainers because the original maintainers were unresponsive, with the aim of reviving the project. Instead, discussions and efforts with the community led to its replacement by mlm. For more detail: https://github.com/orgs/stac-utils/discussions/4
