This repository has been archived by the owner on Nov 16, 2023. It is now read-only.
When an ONNX model for a classifier is created with NimbusML and then applied either with OnnxRunner (aka OnnxTransformer from ML.NET) or directly through the ONNX Runtime (aka ORT) Python API, we get unexpected values in the Label column (i.e. the column that was used as Label for the classifier).
The behavior differs somewhat depending on whether the input DataFrame's Label column is category, object (string) or float (as I show in my repro below, but I guess similar problems arise for other types). There are two main issues:
Issue 1. When running with ORT, the output Label column from the ONNX model contains 'keys' and not 'values', i.e. we get integers starting from 0 instead of whatever original values there were in Label. This happens regardless of the input Label column type.
Issue 2. When running with OnnxRunner, the Label column has weird values. If the input Label column was object (string), then the value in that column is "4294967295" for all rows. If the input was category or float, then the value is "0".
Repro
NOTE: the data_frame_tool module used is the one currently in the aml branch (link)
import os
import tempfile

from data_frame_tool import DataFrameTool as DFT
from nimbusml.datasets import get_dataset
from nimbusml.linear_model import FastLinearClassifier
from nimbusml.preprocessing import OnnxRunner
from nimbusml.preprocessing import FromKey, ToKey
from nimbusml import Pipeline


def get_tmp_file(suffix=None):
    fd, file_name = tempfile.mkstemp(suffix=suffix)
    fl = os.fdopen(fd, 'w')
    fl.close()
    return file_name


# Change the label column to see different behaviors:
LABEL_COLUMN_NAME = "Species"  # Type: object (string)
# LABEL_COLUMN_NAME = "Setosa"  # Type: float
# LABEL_COLUMN_NAME = "Label"  # Type: category

iris_df = get_dataset("iris").as_df()
print("\n\nORIGINAL DATASET - using", LABEL_COLUMN_NAME, " as Label column")
print(iris_df)
print(iris_df.dtypes)

predictor = FastLinearClassifier(
    feature=["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"],
    label=LABEL_COLUMN_NAME)
predictor.fit(iris_df)

print("\n\nML.NET RESULT")
# Notice this outputs only "PredictedLabel", so the user can't get the Label
# column after applying the predictor.
# QUESTION: Is there a way for the user to get that column after the predictor?
original_result = predictor.predict(iris_df)
print(predictor.model_)
print(original_result)
print(original_result.dtypes)

# onnxpath = get_tmp_file()
onnxpath = get_tmp_file()
print()
print("Onnx model path:", onnxpath)
predictor.export_to_onnx(onnxpath, 'com.microsoft.ml')

print("\n\nORT RESULT")
df_tool = DFT(onnxpath)
result_ort = df_tool.execute(iris_df, [])
print(result_ort)
print("\nColumn:", LABEL_COLUMN_NAME, " - ORT RESULT")
# Issue 1: It prints the "keys" instead of the values for the Label column
print(result_ort[LABEL_COLUMN_NAME + ".output"])

print("\n\nONNX RUNNER RESULT")
onnxrunner = OnnxRunner(model_file=onnxpath)
result_onnx = onnxrunner.fit_transform(iris_df)
print(result_onnx)
print(result_onnx.dtypes)
print("\nColumn:", LABEL_COLUMN_NAME, " - ONNX RUNNER RESULT")
# Issue 2: For every row it prints "4294967295" when the label column is
# "Species" (string), and "0" when it is "Label" (category) or "Setosa" (float)
print(result_onnx[LABEL_COLUMN_NAME])
So far it seems to me that both issues are related to the fact that NimbusML adds a "Transforms.OptionalColumnCreator" to the input Label column, and also to how NimbusML works with KeyDataViewTypes.
For Issue 1
When input Label column is not category
When the Label column is not of type category, NimbusML adds a "Transforms.LabelColumnKeyBooleanConverter" to the beginning of the pipeline, which in turn adds a ValueToKeyTransformer that maps values to keys. NimbusML never adds a KeyToValueTransformer, and that's why we only get keys. In dotnet/machinelearning#4841 only 2 cases (described there) were considered where ML.NET would add the KeyToValueTransformers. This case isn't one of them, so perhaps consider adding support for it as well.
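To make the missing step concrete, here is a minimal pure-Python sketch of a value-to-key mapping and its inverse. It does not call NimbusML or ML.NET: `value_to_key` and `key_to_value` are hypothetical helpers, the label strings are illustrative, and the 0-based keys simply mirror what the ORT output shows rather than ML.NET's internal key representation.

```python
def value_to_key(values):
    """Map each distinct value to an integer key, in order of first appearance."""
    vocabulary = {}
    keys = []
    for value in values:
        if value not in vocabulary:
            vocabulary[value] = len(vocabulary)
        keys.append(vocabulary[value])
    return keys, vocabulary


def key_to_value(keys, vocabulary):
    """Invert the mapping so that keys become the original values again."""
    inverse = {key: value for value, key in vocabulary.items()}
    return [inverse[key] for key in keys]


# Illustrative labels (assumption: not copied from the actual dataset).
labels = ["setosa", "versicolor", "setosa", "virginica"]
keys, vocabulary = value_to_key(labels)
restored = key_to_value(keys, vocabulary)

print(keys)      # what Issue 1 looks like: only the keys come out
print(restored)  # what an inverse step at the end would give back
```

Without the second helper at the end of the pipeline, only `keys` is available to the caller, which matches the symptom described for Issue 1.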
When input Label column is category
When the input Label column is of type category, NimbusML converts it automatically to KeyDataViewType, without actually adding a ValueToKeyTransformer to the pipeline.
In dotnet/machinelearning#4841 2 cases were considered, one of which is "pass through" categorical columns, and that case addresses this issue. Here, a Label column of type category that is untouched by the inference pipeline is supposed to be a "pass through" column. The problem is that, since the OptionalColumn transform is added to the pipeline, the Label column stops being pass-through, and the KeyToValueTransformer isn't added to the pipeline. So changes to this also need to be taken into account.
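The effect on the "pass through" check can be pictured with a toy model. This is only a sketch under a loud assumption: that a column counts as pass-through when no transform in the inference pipeline lists it. The function `pass_through_columns` and the list-of-tuples pipeline representation are hypothetical and are not how ML.NET actually represents or inspects a pipeline.

```python
def pass_through_columns(input_columns, pipeline):
    """Return the input columns that no transform in the pipeline touches.

    `pipeline` is a list of (transform_name, touched_columns) pairs; this
    representation is an assumption made for illustration only.
    """
    touched = set()
    for _name, columns in pipeline:
        touched.update(columns)
    return [column for column in input_columns if column not in touched]


features = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"]
inputs = features + ["Label"]

# Inference pipeline where nothing touches the Label column.
without_optional = [("FastLinearClassifier", features)]

# Same pipeline, but with an optional-column step applied to Label first.
with_optional = [("OptionalColumnCreator", ["Label"])] + without_optional

print(pass_through_columns(inputs, without_optional))
print(pass_through_columns(inputs, with_optional))
```

In this toy model the first call still reports Label as pass-through, while the second does not, which is the situation described above.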
For Issue 2
The OptionalColumnTransform is saved to ONNX as an initializer, which supplies a default value in place of the real input. This might explain why the same value is produced for every row: 0 for float and categorical (which, in this case, was int behind the scenes), and "4294967295" for strings (because the initializer holds some string that the labelEncoder does not find, so it outputs int64 -1, which is then cast to uint32 as 4294967295).
Why this doesn't work in OnnxRunner but works in ORT is still unclear to me. It may or may not be an issue in OnnxTransformer, which is perhaps not handling initializer nodes correctly.
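The arithmetic behind "4294967295" can be checked with a few lines of standard-library Python. This is a sketch of the suspected mechanism, not of the actual ONNX graph: `encode_label` is a hypothetical stand-in for a label encoder that returns -1 for unknown strings, the vocabulary is illustrative, and the empty string stands in for whatever default string the initializer provides.

```python
import ctypes

DEFAULT_INT64 = -1  # value returned when the string is not in the vocabulary


def encode_label(value, vocabulary):
    """Hypothetical stand-in for a label encoder with a -1 default."""
    return vocabulary.get(value, DEFAULT_INT64)


def cast_int64_to_uint32(value):
    """Reinterpret an integer as an unsigned 32-bit value (wraps around)."""
    return ctypes.c_uint32(value).value


# Illustrative vocabulary (assumption: not read from the exported model).
vocabulary = {"setosa": 0, "versicolor": 1, "virginica": 2}

missing = encode_label("", vocabulary)  # the default string is not a known label
as_uint32 = cast_int64_to_uint32(missing)

print(missing, as_uint32)  # -1 4294967295
```

Since every row would receive the same default string under this hypothesis, every row would end up with the same wrapped value.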