All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Add better descriptions about required and recommended MLM Asset Roles and their implications (fixes #54).
- Add explicit check of `value_scaling` sub-fields `minimum`, `maximum`, `mean`, `stddev`, etc. for corresponding `type` values `min-max` and `z-score` that depend on it.
- Allow different `value_scaling` operations per band/channel/dimension as needed by the model.
- Allow a `processing:expression` for a band/channel/dimension-specific `value_scaling` operation, granting more flexibility in the definition of input preparation in contrast to having it applied for the entire input (but still possible).
- Add optional `mlm:compile_method` field at the Asset level with options `aot` for Ahead of Time Compilation, `jit` for Just-In Time Compilation.
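
As an illustration of the `value_scaling` entries above, the following is a minimal sketch of how a consumer might apply one scaling object per band. The formulas for `min-max` and `z-score` are the conventional ones and are assumed here rather than taken from this changelog; the helper names `scale_value` and `scale_pixel` are hypothetical.

```python
from typing import List, Mapping, Sequence


def scale_value(x: float, scaling: Mapping[str, object]) -> float:
    """Apply a single value_scaling object to one value.

    Assumption: "min-max" maps [minimum, maximum] onto [0, 1] and
    "z-score" subtracts the mean and divides by stddev.
    A missing sub-field raises KeyError, mirroring the explicit check
    that each type requires its own sub-fields.
    """
    kind = scaling["type"]
    if kind == "min-max":
        lo = float(scaling["minimum"])
        hi = float(scaling["maximum"])
        return (x - lo) / (hi - lo)
    if kind == "z-score":
        return (x - float(scaling["mean"])) / float(scaling["stddev"])
    raise ValueError(f"unsupported value_scaling type in this sketch: {kind!r}")


def scale_pixel(
    pixel: Sequence[float], scalings: Sequence[Mapping[str, object]]
) -> List[float]:
    """Scale one pixel, using a different operation for each band."""
    if len(pixel) != len(scalings):
        raise ValueError("expected one value_scaling object per band")
    return [scale_value(v, s) for v, s in zip(pixel, scalings)]


if __name__ == "__main__":
    per_band = [
        {"type": "min-max", "minimum": 0, "maximum": 10},
        {"type": "z-score", "mean": 1.0, "stddev": 2.0},
    ]
    print(scale_pixel([5.0, 3.0], per_band))
```

Other `type` values, including a `processing:expression`, are deliberately left out of this sketch.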
- Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level. These fields describe the model as a whole and should therefore be defined in Item properties.
- Moved `norm_type` to `value_scaling` object to better reflect the expected operation, which could be another operation than what is typically known as "normalization" or "standardization" techniques in machine learning.
- Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional properties dependencies.
- Moved `mlm:artifact_type` field value descriptions that are framework specific to best-practices section.
- Expanded suggested `mlm:artifact_type` values to include Tensorflow/Keras.
- n/a
- Removed `norm_type` enum values that were ambiguous regarding their expected result. Instead, a `processing:expression` should be employed to explicitly define the calculation they represent.
- Removed `norm_clip` property. It is now represented under `value_scaling` objects with a corresponding `type` definition.
- Removed `norm_by_channel` from `mlm:input` objects. If rescaling (previously normalization in the documentation) is a single value, broadcasting to the relevant bands should be performed implicitly. Otherwise, the number of `value_scaling` objects should match the number of bands or channels involved in the input.
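
The broadcasting rule that replaces `norm_by_channel` can be sketched as follows. This is only an illustration of the rule as worded above, not the extension's reference implementation; the function name `expand_value_scaling` is hypothetical.

```python
from typing import List, Mapping, Sequence


def expand_value_scaling(
    scalings: Sequence[Mapping[str, object]], band_count: int
) -> List[Mapping[str, object]]:
    """Return one value_scaling object per band.

    A single object is broadcast to every band; otherwise the number of
    objects must match the number of bands.
    """
    if len(scalings) == 1:
        return [scalings[0]] * band_count
    if len(scalings) != band_count:
        raise ValueError(
            f"got {len(scalings)} value_scaling objects for {band_count} bands"
        )
    return list(scalings)


if __name__ == "__main__":
    single = [{"type": "min-max", "minimum": 0, "maximum": 255}]
    print(len(expand_value_scaling(single, 3)))
```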
- Fix missing `mlm:artifact_type` property check for a Model Asset definition (fixes #42). The `mlm:artifact_type` is now mutually and exclusively required by the corresponding Asset with `mlm:model` role.
- Fix check of disallowed unknown/undefined `mlm:`-prefixed fields (fixes #41).
- Add `raster:bands` required property `name` for describing `mlm:input` bands (see README - Bands and Statistics for details).
- Add README warnings about new extension `eo` and `raster` versions.
- Split `ModelBands` and `AnyBandsRef` definitions in the JSON schema to allow them to be referenced individually.
- Move `AnyBandsRef` definition explicitly to STAC Item JSON schema, rather than implicitly inferred via `mlm:input`.
- Modified the JSON schema to use an `if` check of the `type` (STAC Item or Collection) prior to validating further properties. This allows some validators (e.g. `pystac`) to better report the real error that causes the schema to fail, rather than reporting the first mismatching `type` case with a poor error description to debug the issue.
- n/a
- Removed `$comment` entries from the JSON schema that are considered as invalid by some parsers.
- When `mlm:input` objects do NOT define band references (i.e.: `bands: []` is used), the JSON schema will not fail if an Asset with the `mlm:model` role contains a band definition. This is to allow MLM model definitions to simultaneously use some inputs with `bands` reference names while others do not.
- Band checks against `eo`, `raster` or STAC Core 1.1 `bands` when a `mlm:input` references names in `bands` are now properly validated.
- Fix the examples using `raster:bands` incorrectly defined in STAC Item properties. The correct use is for them to be defined under the STAC Asset using the `mlm:model` role.
- Fix the EuroSAT ResNet pydantic example that incorrectly referenced some `bands` in its `mlm:input` definition without providing any definition of those bands. The `eo:bands` properties have been added to the corresponding `model` Asset using the `pystac.extensions.eo` utilities.
- Fix various STAC Asset definitions erroneously employing `mlm:model` role instead of the intended `mlm:source_code`.
- Add the missing JSON schema `item_assets` definition under a Collection to ensure compatibility with the Item Assets extension, as mentioned in this specification.
- Add `ModelBand` representation using `name`, `format` and `expression` properties to allow derived band references (fixes crim-ca/mlm-extension#7).
- Adds a job to `.github/workflows/publish.yaml` to publish the `stac-model` package to PyPI.
- n/a
- Field `mlm:name` requirement to be unique. There is no way to guarantee this from a single Item's definition and their JSON schema validation. For uniqueness requirement, users should instead rely on the `id` property of the Item, which is ensured to be unique under the corresponding Collection, since it would not be retrievable otherwise (i.e.: `collections/{collectionID}/items/{itemID}`).
- Fix the validation strategy of the `mlm:model` role required by at least one Asset under a STAC Item. Although the role requirement was validated, the definition did not allow for other Assets without it to exist.
- Correct `stac-model` version in code and publish matching release on PyPI.
- Add pattern for `mlm:framework`, needing at least one alphanumeric character, without leading or trailing non-alphanumeric characters.
- Add `examples/item_eo_and_raster_bands.json` demonstrating the original use case represented by the previous `examples/item_eo_bands.json` contents.
- Add a `description` field for `mlm:input` and `mlm:output` definitions.
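
The `mlm:framework` constraint described above can be approximated with a regular expression. The pattern below is a sketch written from the wording of the entry (at least one alphanumeric character, alphanumeric at both ends); it is an assumption and may differ from the pattern actually used in the JSON schema. The helper name `is_valid_framework` is hypothetical.

```python
import re

# Assumed pattern: starts and ends with an ASCII alphanumeric character,
# with any characters (except newlines) allowed in between.
FRAMEWORK_PATTERN = re.compile(r"[A-Za-z0-9](?:.*[A-Za-z0-9])?")


def is_valid_framework(name: str) -> bool:
    """Check a framework name against the assumed pattern."""
    return FRAMEWORK_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["PyTorch", "scikit-learn", "Hugging Face", "-torch", "torch ", ""]:
        print(repr(candidate), is_valid_framework(candidate))
```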
- Adjust `scikit-learn` and `Hugging Face` framework names to match the format employed by the official documentation.
- n/a
- Removed combination of `mlm:input` with `bands: null` that could never occur due to pre-requirement of `type: array`.
- Fix `AnyBands` definition and use in the JSON schema to better consider possible use cases with `eo` extension.
- Fix `examples/item_eo_bands.json` that was incorrectly also using `raster` extension. This is not fundamentally wrong, but it did not allow to validate the `eo` extension use case properly, since the `raster:bands` reference caused a bypass for the `mlm:input[*].bands` to succeed validation.
- more Task Enum tasks
- Model Output Object
- batch_size and hardware summary
- `mlm:accelerator`, `mlm:accelerator_constrained`, `mlm:accelerator_summary` to specify hardware requirements for the model
- Use common metadata Asset Object to refer to model asset and source code.
- use `classification:classes` in Model Output
- add `scene-classification` to the Enum Tasks to allow disambiguation between pixel-wise and patch-based classification
- `disk_size` replaced by `file:size` (see Best Practices - File Extension)
- `memory_size` under `dlm:architecture` moved directly under Item properties as `mlm:memory_size`
- replaced all hardware/accelerator/runtime definitions into distinct `mlm` fields directly under the STAC Item properties (top-level, not nested) to allow better search support by STAC API.
- reorganized `dlm:architecture` nested fields to exist at the top level of properties as `mlm:name`, `mlm:summary` and so on to provide STAC API search capabilities.
- replaced `normalization:mean`, etc. with statistics from STAC 1.1 common metadata
- added `pydantic` models for internal schema objects in `stac_model` package and published to PYPI
- specified rel_type to be `derived_from` and specify how model item or collection json should be named
- replaced all Enum Tasks names to use hyphens instead of spaces
- replaced `dlm:task` by `mlm:tasks` using an array of values instead of a single one, allowing models to represent multiple tasks they support simultaneously or interchangeably depending on context
- replace `pre_processing_function` and `post_processing_function` to use similar definitions to the Processing Extension - Expression Object such that more extended definitions of custom processors can be defined.
- updated JSON schema to reflect changes of MLM fields
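
For readers migrating older Items, the task-related changes above (hyphenated names, `dlm:task` replaced by the `mlm:tasks` array) could be handled along these lines. This is a hypothetical helper, not part of `stac_model`; it only covers the two task changes and leaves every other property untouched.

```python
from typing import Any, Dict


def migrate_task(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of Item properties with `dlm:task` converted to `mlm:tasks`.

    Assumption: the old value is a single string that may contain spaces,
    and the new value is a one-element array using hyphens instead.
    """
    migrated = dict(properties)
    old_task = migrated.pop("dlm:task", None)
    if old_task is not None:
        hyphenated = "-".join(str(old_task).split())
        migrated["mlm:tasks"] = [hyphenated]
    return migrated


if __name__ == "__main__":
    print(migrate_task({"dlm:task": "semantic segmentation"}))
```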
- any `dlm`-prefixed field or property
- Data Object, replaced with Model Input Object that uses the `name` field from the common metadata band object which also records `data_type` and `nodata` type
- n/a
- Added example model architecture summary text.
- Modified `$id` of the extension schema to refer to the expected location when eventually released (https://schemas.stacspec.org/v1.0.0-beta.3/extensions/dl-model/json-schema/schema.json).
- Replaced `dtype` field by `data_type` to better align with the corresponding field of `raster:bands`.
- Replaced `nodata_value` field by `nodata` to better align with the corresponding field of `raster:bands`.
- Refactored schema to use distinct definitions and references instead of embedding all objects within `dl-model` properties.
- Allow schema to contain other `dlm:`-prefixed elements using `patternProperties` and explicitly deny other `additionalProperties`.
- Allow `class_name_mapping` to be directly provided as a mapping of index-based properties and class-name values.
- Specifying `class_name_mapping` by array is deprecated. Direct mapping as an object of index to class name should be used. For backward compatibility, mapping as array and using nested objects with `index` and `class_name` properties is still permitted, although overly verbose compared to the direct mapping.
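
The two `class_name_mapping` forms described above can be reconciled with a small conversion. The sketch below assumes the deprecated form is a list of objects with `index` and `class_name` keys and that the direct form uses string keys, as JSON object keys always are; the function name `to_direct_mapping` is hypothetical.

```python
from typing import Any, Dict, List, Union

ArrayForm = List[Dict[str, Any]]
DirectForm = Dict[str, str]


def to_direct_mapping(mapping: Union[ArrayForm, Dict[Any, str]]) -> DirectForm:
    """Normalize either class_name_mapping form to a direct index-to-name object."""
    if isinstance(mapping, dict):
        # Already direct: only make sure the keys are strings.
        return {str(index): name for index, name in mapping.items()}
    # Deprecated array form: [{"index": 0, "class_name": "water"}, ...]
    return {str(entry["index"]): entry["class_name"] for entry in mapping}


if __name__ == "__main__":
    deprecated = [
        {"index": 0, "class_name": "water"},
        {"index": 1, "class_name": "forest"},
    ]
    print(to_direct_mapping(deprecated))
```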
- Field `nodata_value`.
- Field `dtype`.
- Fixed references to other STAC extensions to use the official schema links on https://stac-extensions.github.io/.
- Fixed examples to refer to local files.
- Fixed formatting of tables and descriptions in README.
- Initial release of the extension description and schema.
- n/a
- n/a
- n/a
- n/a