Binary Parser Plugin

The binary data format parser parses binary protocols into metrics using user-specified configurations.

Configuration

[[inputs.file]]
  files = ["example.bin"]

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "binary"

  ## Do not error-out if none of the filter expressions below matches.
  # allow_no_match = false

  ## Specify the endianness of the data.
  ## Available values are "be" (big-endian), "le" (little-endian) and "host",
  ## where "host" means the same endianness as the machine running Telegraf.
  # endianness = "host"

  ## Interpret input as string containing hex-encoded data.
  # hex_encoding = false

  ## Multiple parsing sections are allowed
  [[inputs.file.binary]]
    ## Optional: Metric (measurement) name to use if not extracted from the data.
    # metric_name = "my_name"

    ## Definition of the message format and the extracted data.
    ## Please note that you need to define all elements of the data in the
    ## correct order with the correct length as the data is parsed in the order
    ## given.
    ## An entry can have the following properties:
    ##  name        --  Name of the element (e.g. field or tag). Can be omitted
    ##                  for special assignments (i.e. time & measurement) or if
    ##                  entry is omitted.
    ##  type        --  Data-type of the entry. Can be "int8/16/32/64", "uint8/16/32/64",
    ##                  "float32/64", "bool" and "string".
    ##                  In case of time, this can be any of "unix" (default), "unix_ms", "unix_us",
    ##                  "unix_ns" or a valid Golang time format.
    ##  bits        --  Length in bits for this entry. If omitted, the length derived from
    ##                  the "type" property will be used. For "time" 64-bit will be used
    ##                  as default.
    ##  assignment  --  Assignment of the gathered data. Can be "measurement", "time",
    ##                  "field" or "tag". If omitted "field" is assumed.
    ##  omit        --  Omit the given data. If true, the data is skipped and not added
    ##                  to the metric. Omitted entries only need a length definition
    ##                  via "bits" or "type".
    ##  terminator  --  Terminator for dynamic-length strings. Only used for "string" type.
    ##                  Valid values are "fixed" (fixed length string given by "bits"),
    ##                  "null" (null-terminated string) or a character sequence specified
    ##                  as HEX values (e.g. "0x0D0A"). Defaults to "fixed" for strings.
    ##  timezone    --  Timezone of "time" entries. Only applies to "time" assignments.
    ##                  Can be "utc", "local" or any valid Golang timezone (e.g. "Europe/Berlin")
    entries = [
      { type = "string", assignment = "measurement", terminator = "null" },
      { name = "address", type = "uint16", assignment = "tag" },
      { name = "value",   type = "float64" },
      { type = "unix", assignment = "time" },
    ]

    ## Optional: Filter evaluated before applying the configuration.
    ## This option can be used to manage multiple configurations, each specific
    ## to a certain message type. If no filter is given, the configuration is applied.
    # [inputs.file.binary.filter]
    #   ## Filter message by the exact length in bytes (default: N/A).
    #   # length = 0
    #   ## Filter the message by a minimum length in bytes.
    #   ## Messages longer than or equal to this length will pass.
    #   # length_min = 0
    #   ## List of data parts to match.
    #   ## Only if all selected parts match, the configuration will be
    #   ## applied. The "offset" is the start of the data to match in bits,
    #   ## "bits" is the length in bits and "match" is the value to match
    #   ## against. Non-byte boundaries are supported, data is always right-aligned.
    #   selection = [
    #     { offset = 0, bits = 8, match = "0x1F" },
    #   ]
    #
    #

In this configuration mode, you explicitly specify the fields and tags you want to scrape out of your data.

A configuration can contain multiple binary subsections, e.g. for the file plugin, to process the same binary data multiple times. Together with filters, this can be useful to handle different message types.

Please note: The filter section must be placed after the entries definition because of TOML constraints; otherwise, the entries will be assigned to the filter section.

General options and remarks

allow_no_match (optional)

By specifying allow_no_match you allow the parser to silently ignore data that does not match any given configuration filter. This can be useful if you only want to collect a subset of the available messages.

endianness (optional)

This specifies the endianness of the data. If not specified, the parser will fall back to the "host" endianness, assuming that the message and the machine running Telegraf share the same endianness. Alternatively, you can explicitly specify big-endian format ("be") or little-endian format ("le").

hex_encoding (optional)

If true, the input data is interpreted as a string containing hex-encoded data like C0 C7 21 A9. The string is case-insensitive and may contain spaces; however, prefixes like 0x or x are not allowed.
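A minimal Go sketch of the hex_encoding behavior described above, assuming spaces are simply stripped before decoding. The helper decodeHexInput is hypothetical, not Telegraf's code. The standard hex.DecodeString accepts both upper- and lower-case digits and rejects any other character, which is why a 0x prefix fails.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeHexInput turns a hex string such as "C0 C7 21 A9" into raw bytes.
// Spaces are removed first; hex.DecodeString then rejects any non-hex
// character, including the "x" of a "0x" prefix. Hypothetical helper.
func decodeHexInput(s string) ([]byte, error) {
	return hex.DecodeString(strings.ReplaceAll(s, " ", ""))
}

func main() {
	raw, err := decodeHexInput("C0 c7 21 a9")
	fmt.Println(raw, err) // [192 199 33 169] <nil>

	_, err = decodeHexInput("0xC0C721A9")
	fmt.Println(err != nil) // true: the "0x" prefix is rejected
}
```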

Non-byte aligned value extraction

In both filter and entries definitions, values can be extracted at non-byte boundaries. For example, you can extract 3 bits starting at bit offset 8. In such cases, the result is masked and shifted so that the resulting byte value is right-aligned. If your 3 bits are 101, the resulting byte value is 0x05.

This is especially important when specifying the match value in the filter section.
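The masking and shifting can be sketched in Go as follows. This assumes that bit offsets count from the most significant bit of the first byte, which is consistent with the examples in this document (offset 16 with 8 bits selects the 3rd byte) but remains an assumption about the exact bit order; extractBits is a hypothetical helper.

```go
package main

import "fmt"

// extractBits reads `bits` bits starting at bit `offset` and returns them
// right-aligned. Assumption: bit 0 is the most significant bit of the
// first byte. Requires bits <= 64 and offset+bits <= 8*len(data).
func extractBits(data []byte, offset, bits int) uint64 {
	var v uint64
	for i := 0; i < bits; i++ {
		pos := offset + i
		bit := (data[pos/8] >> (7 - uint(pos%8))) & 1 // isolate one bit
		v = v<<1 | uint64(bit)                      // append it on the right
	}
	return v
}

func main() {
	// The second byte is 1010 0000, so the 3 bits at offset 8 are 101.
	data := []byte{0x00, 0xA0}
	fmt.Printf("0x%02X\n", extractBits(data, 8, 3)) // 0x05
}
```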

Entries definitions

The entries array specifies how to dissect the message into the measurement name, the timestamp, tags and fields.

measurement specification

When setting the assignment to "measurement", the extracted value will be used as the metric name, overriding other specifications. The type setting is assumed to be "string" and can be omitted, as can the name option. See string type handling for details and further options.

time specification

When setting the assignment to "time", the extracted value will be used as the timestamp of the metric. By default the current time will be used for all created metrics.

Here, the type setting holds the time format and can be set to unix, unix_ms, unix_us, unix_ns, or an accepted Go "reference time". Consult the Go time package for details and additional examples on how to set the time format. If type is omitted, the unix format is assumed.

For the unix format and its derivatives, the underlying value is assumed to be a 64-bit integer. The bits setting can be used to specify other lengths. All other time formats assume a fixed-length string value to be extracted. The length of the string is automatically determined from the format given in type.

The timezone setting allows converting the extracted time to the given timezone. By default, the time is interpreted as utc. Other valid values are local, i.e. the local timezone configured on the machine, or a valid timezone specification such as Europe/Berlin.
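The following Go sketch combines a reference-time layout with a timezone name using time.ParseInLocation. The helper parseTime and the example time string are hypothetical; the blank import of time/tzdata embeds the timezone database so that Europe/Berlin resolves even on machines without one. Deriving the entry length from the number of layout characters is an assumption based on the statement about string length above.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed the IANA timezone database
)

// parseTime interprets value using a Go reference-time layout in the
// timezone named by tz ("utc", "local" or an IANA name such as
// "Europe/Berlin"). Hypothetical helper for illustration.
func parseTime(layout, value, tz string) (time.Time, error) {
	var loc *time.Location
	switch tz {
	case "", "utc":
		loc = time.UTC
	case "local":
		loc = time.Local
	default:
		l, err := time.LoadLocation(tz)
		if err != nil {
			return time.Time{}, err
		}
		loc = l
	}
	return time.ParseInLocation(layout, value, loc)
}

func main() {
	layout := "2006-01-02 15:04:05"
	// Assumption: a string time entry occupies one byte per layout character.
	fmt.Println(len(layout)*8, "bits") // 152 bits

	ts, err := parseTime(layout, "2022-07-26 13:46:24", "Europe/Berlin")
	if err != nil {
		panic(err)
	}
	fmt.Println(ts.Unix()) // 1658835984 (13:46:24 CEST is 11:46:24 UTC)
}
```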

tag specification

When setting the assignment to "tag", the extracted value will be used as a tag. The name setting will be the name of the tag and the type will default to string. When specifying other types, the extracted value will first be interpreted as the given type and then converted to string.

The bits setting can be used to specify the length of the data to extract and is required for fixed-length string types.

field specification

When setting the assignment to "field" or omitting the assignment setting, the extracted value will be used as a field. The name setting is used as the name of the field and the type as type of the field value.

The bits setting can be used to specify the length of the data to extract. By default the length corresponding to type is used. Please see the string and bool specific sections when using those types.
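To show how the type determines both the default length and the interpretation of the bytes, here is a Go sketch decoding the count (int16, 2 bytes) and value (float64, 8 bytes) entries of Message A from the Examples section in little-endian order. The helpers int16Field and float64Field are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// int16Field reinterprets two little-endian bytes as a signed integer.
func int16Field(b []byte) int16 {
	return int16(binary.LittleEndian.Uint16(b))
}

// float64Field reinterprets eight little-endian bytes as an IEEE 754 double.
func float64Field(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func main() {
	count := int16Field([]byte{0x2A, 0x00})
	value := float64Field([]byte{0x6F, 0x12, 0x83, 0xC0, 0xCA, 0x21, 0x09, 0x40})
	fmt.Println(count, value) // 42 3.1415
}
```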

string type handling

Strings are assumed to be fixed-length strings by default. In this case, the bits setting is mandatory to specify the length of the string in bits.

To handle dynamic strings, the terminator setting can be used to specify the characters that terminate the string. The two named options, fixed and null, specify fixed-length and null-terminated strings, respectively. Any other value is interpreted as a hexadecimal sequence of bytes marking the end of the string. The termination sequence is removed from the result.
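A possible Go sketch of the three terminator modes follows. The helper readString is hypothetical; it returns the string without its terminator together with the number of bytes consumed, which a parser would need in order to continue with the next entry.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// readString extracts a string from the start of data. terminator is
// "fixed" (length given by bits), "null", or a hex sequence like "0x0D0A".
// The terminator is consumed but not included in the result.
func readString(data []byte, terminator string, bits int) (string, int, error) {
	switch terminator {
	case "", "fixed":
		n := bits / 8
		if n > len(data) {
			return "", 0, errors.New("not enough data")
		}
		return string(data[:n]), n, nil
	case "null":
		terminator = "0x00" // a single zero byte
	}
	seq, err := hex.DecodeString(strings.TrimPrefix(terminator, "0x"))
	if err != nil {
		return "", 0, err
	}
	i := bytes.Index(data, seq)
	if i < 0 {
		return "", 0, errors.New("terminator not found")
	}
	return string(data[:i]), i + len(seq), nil
}

func main() {
	s, n, _ := readString([]byte("temp\x00rest"), "null", 0)
	fmt.Println(s, n) // temp 5

	s, n, _ = readString([]byte("hello\r\nworld"), "0x0D0A", 0)
	fmt.Println(s, n) // hello 7
}
```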

bool type handling

By default bool types are assumed to be one bit in length. You can specify any other length by using the bits setting. When interpreting values as booleans, any zero value will be false, while any non-zero value will result in true.
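The boolean rule amounts to checking whether any bit of the extracted value is set. In this Go sketch, the hypothetical toBool helper works on already-extracted bytes, and strconv.FormatBool produces the string form used when the entry is assigned to a tag, as with the failure tag of Message A in the Examples section.

```go
package main

import (
	"fmt"
	"strconv"
)

// toBool returns false for an all-zero value and true otherwise.
func toBool(b []byte) bool {
	for _, x := range b {
		if x != 0 {
			return true
		}
	}
	return false
}

func main() {
	failure := toBool([]byte{0x00, 0x00, 0x00, 0x00}) // 32-bit entry
	fmt.Println(strconv.FormatBool(failure))          // false
	fmt.Println(toBool([]byte{0x00, 0x01}))           // true
}
```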

omitting data

Parts of the data can be omitted by setting omit = true. In this case, you only need to specify the length of the chunk to omit by either using the type or bits setting. All other options can be skipped.
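Because entries are processed in order, an omitted entry still advances the read position. The following Go sketch, using a hypothetical reduced entry type, computes the bit offset of every kept entry for the first entries of the messageA configuration in the Examples section.

```go
package main

import "fmt"

// entry is a hypothetical, reduced view of one element of the entries list.
type entry struct {
	name string
	bits int
	omit bool
}

// layout walks the entries in order and records the bit offset of each
// kept entry; omitted entries are skipped but still move the cursor.
func layout(entries []entry) map[string]int {
	offsets := make(map[string]int)
	pos := 0
	for _, e := range entries {
		if !e.omit {
			offsets[e.name] = pos
		}
		pos += e.bits
	}
	return offsets
}

func main() {
	offs := layout([]entry{
		{bits: 32, omit: true},      // 4-byte header
		{name: "address", bits: 16}, // uint16
		{name: "count", bits: 16},   // int16
	})
	fmt.Println(offs["address"], offs["count"]) // 32 48
}
```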

Filter definitions

Filters can be used to match the length or the content of the data against a specified reference. See the examples section for details. You can also check multiple parts of the message by specifying multiple selection entries for a filter. Each selection is then matched separately, and all of them have to match for the configuration to be applied.

length and length_min options

Using the length option, the filter will check if the data to parse has exactly the given number of bytes. Otherwise, the configuration will not be applied. Similarly, for length_min the data has to have at least the given number of bytes to generate a match.
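A minimal Go sketch of the two length checks. Treating a zero value as an unset option is an assumption of this sketch, and lengthMatches is a hypothetical helper.

```go
package main

import "fmt"

// lengthMatches applies the length (exact) and length_min (minimum)
// checks; zero means "not set" in this sketch.
func lengthMatches(data []byte, length, lengthMin int) bool {
	if length > 0 && len(data) != length {
		return false
	}
	if lengthMin > 0 && len(data) < lengthMin {
		return false
	}
	return true
}

func main() {
	msgA := make([]byte, 28) // 4-byte header + 24-byte body of Message A
	fmt.Println(lengthMatches(msgA, 28, 0)) // true
	fmt.Println(lengthMatches(msgA, 8, 0))  // false
	fmt.Println(lengthMatches(msgA, 0, 8))  // true
}
```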

selection list

Selections can be used with or without length constraints to match the content of the data. Here, the offset and bits properties specify the start and length of the data to check. Both values are given in bits, allowing for non-byte aligned value extraction. The extracted data is then checked against the given match value specified in HEX.

If multiple selection entries are specified, all of the selections must match for the configuration to be applied.
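The selection logic can be sketched in Go as below. It performs the right-aligned bit extraction described under non-byte aligned value extraction (assuming bits are counted from the most significant bit of the first byte) and compares the result with the match value parsed as a hexadecimal integer. All names are hypothetical. Applied to the header of Message B from the Examples section, only the 0x0B selection matches.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// selection mirrors one element of a filter's selection list.
type selection struct {
	offset, bits int
	match        string // hex value such as "0x0A"
}

// extractBits returns `bits` bits starting at bit `offset`, right-aligned.
func extractBits(data []byte, offset, bits int) uint64 {
	var v uint64
	for i := 0; i < bits; i++ {
		pos := offset + i
		v = v<<1 | uint64((data[pos/8]>>(7-uint(pos%8)))&1)
	}
	return v
}

// selectionsMatch reports whether every selection matches the data.
func selectionsMatch(data []byte, sels []selection) (bool, error) {
	for _, s := range sels {
		want, err := strconv.ParseUint(strings.TrimPrefix(s.match, "0x"), 16, 64)
		if err != nil {
			return false, err
		}
		if s.offset+s.bits > 8*len(data) || extractBits(data, s.offset, s.bits) != want {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	header := []byte{0x02, 0x01, 0x0B, 0x04} // header of Message B
	for _, typ := range []string{"0x0A", "0x0B", "0x0C"} {
		ok, _ := selectionsMatch(header, []selection{{offset: 16, bits: 8, match: typ}})
		fmt.Println(typ, ok) // only 0x0B prints true
	}
}
```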

Examples

In the following example, we use a binary protocol with three different messages in little-endian format:

Message A definition

+--------+------+------+--------+--------+------------+--------------------+--------------------+
| ID     | type | len  | addr   | count  | failure    | value              | timestamp          |
+--------+------+------+--------+--------+------------+--------------------+--------------------+
| 0x0201 | 0x0A | 0x18 | 0x7F01 | 0x2A00 | 0x00000000 | 0x6F1283C0CA210940 | 0x10D4DF6200000000 |
+--------+------+------+--------+--------+------------+--------------------+--------------------+

Message B definition

+--------+------+------+------------+
| ID     | type | len  | value      |
+--------+------+------+------------+
| 0x0201 | 0x0B | 0x04 | 0xDEADC0DE |
+--------+------+------+------------+

Message C definition

+--------+------+------+------------+------------+--------------------+
| ID     | type | len  | value x    | value y    | timestamp          |
+--------+------+------+------------+------------+--------------------+
| 0x0201 | 0x0C | 0x10 | 0x4DF82D40 | 0x5F305C08 | 0x10D4DF6200000000 |
+--------+------+------+------------+------------+--------------------+

All messages consist of a 4-byte header containing the message type in the 3rd byte, followed by a message-specific body. To parse these messages, you can use the following configuration:

[[inputs.file]]
  files = ["messageA.bin", "messageB.bin", "messageC.bin"]
  data_format = "binary"
  endianness = "le"

  [[inputs.file.binary]]
    metric_name = "messageA"

    entries = [
      { bits = 32, omit = true },
      { name = "address", type = "uint16", assignment = "tag" },
      { name = "count",   type = "int16" },
      { name = "failure", type = "bool", bits = 32, assignment = "tag" },
      { name = "value",   type = "float64" },
      { type = "unix",    assignment = "time" },
    ]

    [inputs.file.binary.filter]
      selection = [{ offset = 16, bits = 8, match = "0x0A" }]

  [[inputs.file.binary]]
    metric_name = "messageB"

    entries = [
      { bits = 32, omit = true },
      { name = "value",   type = "uint32" },
    ]

    [inputs.file.binary.filter]
      selection = [{ offset = 16, bits = 8, match = "0x0B" }]

  [[inputs.file.binary]]
    metric_name = "messageC"

    entries = [
      { bits = 32, omit = true },
      { name = "x",   type = "float32" },
      { name = "y",   type = "float32" },
      { type = "unix",    assignment = "time" },
    ]

    [inputs.file.binary.filter]
      selection = [{ offset = 16, bits = 8, match = "0x0C" }]

The above configuration has one [[inputs.file.binary]] section per message type and uses a filter in each of those sections to apply the correct configuration by comparing the 3rd byte, which contains the message type. This leads to the following output:

messageA,address=383,failure=false count=42i,value=3.1415 1658835984000000000
messageB value=3737169374i 1658847037000000000
messageC x=2.718280076980591,y=0.0000000000000000000000000000000006626070178575745 1658835984000000000

Here, messageB uses the parsing time as its timestamp because the data contains no time information. The other two metrics use the timestamp derived from the data.
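To trace how the messageA configuration produces its output line, the following Go sketch decodes Message A by hand, entry by entry, in little-endian order, and prints a line in InfluxDB line protocol using the metric_name from the configuration. The function decodeMessageA is hypothetical and hard-codes the layout; it only illustrates the arithmetic, not the parser itself.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// decodeMessageA follows the entries of the messageA configuration.
func decodeMessageA(b []byte) (address uint16, count int16, failure bool, value float64, ts int64) {
	body := b[4:] // { bits = 32, omit = true } skips the header
	address = binary.LittleEndian.Uint16(body[0:2])
	count = int16(binary.LittleEndian.Uint16(body[2:4]))
	failure = binary.LittleEndian.Uint32(body[4:8]) != 0
	value = math.Float64frombits(binary.LittleEndian.Uint64(body[8:16]))
	ts = int64(binary.LittleEndian.Uint64(body[16:24]))
	return
}

func main() {
	msg := []byte{
		0x02, 0x01, 0x0A, 0x18, // ID, type, len
		0x7F, 0x01, // addr
		0x2A, 0x00, // count
		0x00, 0x00, 0x00, 0x00, // failure
		0x6F, 0x12, 0x83, 0xC0, 0xCA, 0x21, 0x09, 0x40, // value
		0x10, 0xD4, 0xDF, 0x62, 0x00, 0x00, 0x00, 0x00, // timestamp
	}
	a, c, f, v, ts := decodeMessageA(msg)
	fmt.Printf("messageA,address=%d,failure=%t count=%di,value=%v %d\n", a, f, c, v, ts*1_000_000_000)
	// messageA,address=383,failure=false count=42i,value=3.1415 1658835984000000000
}
```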