Telemetry

Telemetry defines how telemetry (metrics, logs and traces) is generated for workloads within a mesh.

Telemetry configuration is resolved hierarchically. More specific configuration takes precedence over less specific configuration, in the following order:

  1. Workload-specific configuration
  2. Namespace-specific configuration
  3. Root namespace configuration

For mesh-level configuration, place a resource without a workload selector in the root configuration namespace of your Istio installation.

Each namespace, including the root configuration namespace, may contain at most one Telemetry resource without a workload selector.

Among resources with a workload selector, at most one resource may select any given workload.

Gateways and waypoints are targeted for telemetry configuration using the targetRefs field.
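As an illustration, the following is a minimal sketch of attaching access logging to a single Kubernetes Gateway API Gateway through targetRefs instead of a selector. The resource name, the gateway-ns namespace, and the Gateway name my-gateway are hypothetical, and the envoy provider is assumed to be available, as in the access logging examples below.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: gateway-access-logs        # hypothetical name
  namespace: gateway-ns            # hypothetical; must be the Gateway's namespace
spec:
  # targetRefs is used instead of selector; the two cannot be combined
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: my-gateway               # hypothetical Gateway name
  accessLogging:
  - providers:
    - name: envoy
```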

Examples:

Enable random sampling for 10% of traffic:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  # no selector specified, applies to all workloads
  tracing:
  - randomSamplingPercentage: 10.00

Disable trace reporting for the foo workload (note: tracing context will still be propagated):

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: foo-tracing
  namespace: bar
spec:
  selector:
    matchLabels:
      service.istio.io/canonical-name: foo
  tracing:
  - disableSpanReporting: true

Select a named tracing provider for trace reporting:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: foo-tracing-alternate
  namespace: baz
spec:
  selector:
    matchLabels:
      service.istio.io/canonical-name: foo
  tracing:
  - providers:
    - name: "zipkin-alternate"
    randomSamplingPercentage: 10.00

Apply the “zipkin” tracing provider to client-side (outbound) traffic only:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  # no selector specified, applies to all workloads
  tracing:
  # match and providers belong to the same tracing rule
  - match:
      mode: CLIENT
    providers:
    - name: "zipkin"

Add a custom tag from a literal value:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  # no selector specified, applies to all workloads
  tracing:
  - randomSamplingPercentage: 10.00
    customTags:
      my_new_foo_tag:
        literal:
          value: "foo"

Disable server-side metrics for Prometheus for an entire mesh:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  # no selector specified, applies to all workloads
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: ALL_METRICS
        mode: SERVER
      disabled: true

Add dimensions to all Prometheus metrics for the foo namespace:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: namespace-metrics
  namespace: foo
spec:
  # no selector specified, applies to all workloads in the namespace
  metrics:
  - providers:
    - name: prometheus
    overrides:
    # omitting the match clause matches all Istio metrics, client and server
    - tagOverrides:
        request_method:
          value: "request.method"
        request_host:
          value: "request.host"

Remove the response_code dimension from some Prometheus metrics for the bar workload in the foo namespace:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: remove-response-code
  namespace: foo
spec:
  selector:
    matchLabels:
      service.istio.io/canonical-name: bar
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: REQUEST_COUNT
      tagOverrides:
        response_code:
          operation: REMOVE
    - match:
        metric: REQUEST_DURATION
      tagOverrides:
        response_code:
          operation: REMOVE
    - match:
        metric: REQUEST_SIZE
      tagOverrides:
        response_code:
          operation: REMOVE
    - match:
        metric: RESPONSE_SIZE
      tagOverrides:
        response_code:
          operation: REMOVE

Enable access logging for the entire mesh:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  # no selector specified, applies to all workloads
  accessLogging:
  - providers:
    - name: envoy
    # By default, this turns on access logging (no need to set `disabled: false`).
    # An unspecified `disabled` is treated as `disabled: false`, except when a
    # parent configuration has set `disabled: true`. In that case,
    # `disabled: false` must be set explicitly to override it.

Disable access logging for the foo namespace:

apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: namespace-no-log
  namespace: foo
spec:
  # no selector specified, applies to all workloads in the namespace
  accessLogging:
  - disabled: true

Telemetry

Fields:

selector (WorkloadSelector)

The selector decides where to apply the policy. If not set, the policy will be applied to all workloads in the same namespace as the policy.

At most one of selector or targetRefs can be set for a given policy.

targetRefs (PolicyTargetReference[])

The targetRefs field specifies a list of resources the policy should be applied to. The targeted resources determine which workloads the policy applies to.

Currently, the following resource attachment types are supported:

  • kind: Gateway with group: gateway.networking.k8s.io in the same namespace.
  • kind: Service with group: "" or group: "core" in the same namespace. This type is only supported for waypoints.

If not set, the policy is applied as defined by the selector. At most one of the selector and targetRefs can be set.

NOTE: If you are using the targetRefs field in a multi-revision environment with Istio versions prior to 1.22, it is highly recommended that you pin the policy to a revision running 1.22+ via the istio.io/rev label. This is to prevent proxies connected to older control planes (that don’t know about the targetRefs field) from misinterpreting the policy as namespace-wide during the upgrade process.

NOTE: Waypoint proxies are required to use this field for policies to apply; selector policies will be ignored.

tracing (Tracing[])

Tracing configures the tracing behavior for all selected workloads.

metrics (Metrics[])

Metrics configures the metrics behavior for all selected workloads.

accessLogging (AccessLogging[])

Access logging configures the access logging behavior for all selected workloads.
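For a waypoint, targetRefs points at a Service rather than a Gateway, since Service targets are only supported for waypoints and waypoints ignore selector-based policies. A minimal sketch, assuming a hypothetical Service named reviews in a hypothetical bookinfo namespace whose traffic is handled by a waypoint:

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: reviews-waypoint-tracing   # hypothetical name
  namespace: bookinfo              # hypothetical; same namespace as the Service
spec:
  targetRefs:
  - group: ""                      # core API group
    kind: Service
    name: reviews                  # hypothetical Service served by a waypoint
  tracing:
  - randomSamplingPercentage: 25.00
```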

Tracing

Tracing configures tracing behavior for workloads within a mesh. It can be used to enable/disable tracing, as well as to set sampling rates and custom tag extraction.

Tracing configuration supports overriding the fields providers, random_sampling_percentage, disable_span_reporting, and custom_tags at each level of the configuration hierarchy; missing values are filled in from parent resources. However, when specified, custom_tags fully replaces any values provided by parent configuration.
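A sketch of this layering, assuming a mesh-level Telemetry resource in istio-system already selects a tracing provider. The namespace-level resource below, in a hypothetical checkout namespace, overrides only the sampling rate; the fields it leaves unset are filled in from the mesh-level resource.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: namespace-tracing          # hypothetical name
  namespace: checkout              # hypothetical namespace
spec:
  # no selector specified, applies to all workloads in the namespace
  tracing:
  # Only the sampling rate is overridden; providers, disableSpanReporting,
  # and customTags are inherited from the mesh-level resource.
  - randomSamplingPercentage: 50.00
```

If this rule also set customTags, those tags would replace, not merge with, any customTags defined at the mesh level.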

Fields:

match (TracingSelector)

Allows tailoring of tracing behavior to specific conditions.

providers (ProviderRef[])

Names of the provider(s) to use for span reporting. If no provider is specified, the default tracing provider is used. NOTE: At the moment, only a single provider can be specified in a given Tracing rule.

randomSamplingPercentage (DoubleValue)

Controls the rate at which traffic is selected for tracing if no prior sampling decision has been made. If a prior sampling decision has been made, that decision is respected. However, if no sampling decision has been made (for example, no x-b3-sampled tracing header was present in the request), traffic is selected for telemetry generation at the specified percentage.

Defaults to 0%. Valid values: 0.00 to 100.00. Can be specified in 0.01% increments.

disableSpanReporting (BoolValue)

Controls span reporting. If set to true, no spans will be reported for impacted workloads. This does NOT impact context propagation or trace sampling behavior.

customTags (map<string, CustomTag>)

Adds custom tags to the generated trace spans.

enableIstioTags (BoolValue)

Determines whether trace spans generated by Envoy include Istio-specific tags. By default, Istio-specific tags are included in the trace spans.

TracingSelector

TracingSelector provides a coarse-grained ability to configure tracing behavior based on certain traffic metadata (such as traffic direction).

Fields:

mode (WorkloadMode)

Determines whether to apply the tracing configuration based on the direction of traffic relative to the proxied workload.
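A minimal sketch of restricting a tracing rule to one traffic direction: span reporting is turned off only when the foo workload is the destination of the traffic (SERVER mode). The resource name is hypothetical; the workload label follows the earlier foo examples.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: foo-server-spans           # hypothetical name
  namespace: bar
spec:
  selector:
    matchLabels:
      service.istio.io/canonical-name: foo
  tracing:
  - match:
      mode: SERVER                 # applies only to inbound traffic to foo
    disableSpanReporting: true
```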

CustomTag

CustomTag defines a tag to be added to a trace span that is based on an operator-supplied value. This value can either be a hard-coded value, a value taken from an environment variable known to the sidecar proxy, or from a request header.

NOTE: when specified, custom_tags will fully replace any values provided by parent configuration.

Fields (only one may be set):

literal (Literal)

Literal adds the same, hard-coded value to each span.

environment (Environment)

Environment adds the value of an environment variable to each span.

header (RequestHeader)

RequestHeader adds the value of a header from the request to each span.

Literal

Fields:

value (string, required)

The tag value to use.

Environment

Fields:

name (string, required)

Name of the environment variable from which to extract the tag value.

defaultValue (string)

If the environment variable is not found, this value will be used instead.

RequestHeader

Fields:

name (string, required)

Name of the header from which to extract the tag value.

defaultValue (string)

If the header is not found, this value will be used instead.
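A sketch that combines the three CustomTag sources in one rule. The tag names, the CLUSTER_NAME environment variable, and the x-tenant-id header are hypothetical; defaultValue is used when the variable or header is missing. Because customTags replaces any tags from parent configuration, every tag that should be kept must be listed here.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - customTags:
      team:                        # hypothetical tag name
        literal:
          value: "payments"
      cluster:                     # hypothetical tag name
        environment:
          name: CLUSTER_NAME       # hypothetical env var visible to the proxy
          defaultValue: "unknown"
      tenant:                      # hypothetical tag name
        header:
          name: x-tenant-id        # hypothetical request header
          defaultValue: "none"
```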

ProviderRef

Used to bind Telemetry configuration to specific providers for targeted customization.

Fields:

name (string, required)

Name of a Telemetry provider defined in MeshConfig.

Metrics

Metrics defines the workload-level overrides for metrics generation behavior within a mesh. It can be used to enable/disable metrics generation, as well as to customize the dimensions of the generated metrics.

Fields:

providers (ProviderRef[])

Names of the providers to which this configuration applies. If no provider is specified, the default metrics provider is used.

overrides (MetricsOverrides[])

Ordered list of overrides to metrics generation behavior.

Specified overrides will be applied in order. They will be applied on top of inherited overrides from other resources in the hierarchy in the following order:

  1. Mesh-scoped overrides
  2. Namespace-scoped overrides
  3. Workload-scoped overrides

Because overrides are applied in order, users are advised to order their overrides from least specific to most specific matches. That is, it is a best practice to list any universal overrides first, with tailored overrides following them.

reportingInterval (Duration)

Reporting interval configures the time between successive metrics reports. It currently applies only to TCP metrics, but it may be used for long-duration HTTP streams in the future. The default duration is 5s.
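A sketch of the recommended ordering, with the least specific override first and a narrower one after it, plus a non-default reportingInterval. The resource name is hypothetical. Following the ordering rule above, the second override is expected to take effect for the client-side request counter.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: namespace-metrics-order    # hypothetical name
  namespace: foo
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    # 1. least specific: turn off all client-side standard metrics
    - match:
        metric: ALL_METRICS
        mode: CLIENT
      disabled: true
    # 2. more specific: keep the client-side request counter
    - match:
        metric: REQUEST_COUNT
        mode: CLIENT
      disabled: false
    # report TCP metrics every 15s instead of the 5s default
    reportingInterval: 15s
```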

MetricSelector

Provides a mechanism for matching metrics for the application of override behaviors.

Fields:

metric (IstioMetric, oneof)

One of the well-known Istio Standard Metrics.

customMetric (string, oneof)

Allows free-form specification of a metric. No validation of custom metrics is provided.

mode (WorkloadMode)

Controls which mode of metrics generation is selected: CLIENT, SERVER, or CLIENT_AND_SERVER.

IstioMetric

Curated list of known metric types that are supported by Istio metric providers. See also: https://istio.io/latest/docs/reference/config/metrics/#metrics

Values:
ALL_METRICS

Use of this enum indicates that the override should apply to all Istio default metrics.

REQUEST_COUNT

Counter of requests to/from an application, generated for HTTP, HTTP/2, and gRPC traffic.

The Prometheus provider exports this metric as: istio_requests_total.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/request_count (SERVER mode)
  • istio.io/service/client/request_count (CLIENT mode)
REQUEST_DURATION

Histogram of request durations, generated for HTTP, HTTP/2, and gRPC traffic.

The Prometheus provider exports this metric as: istio_request_duration_milliseconds.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/response_latencies (SERVER mode)
  • istio.io/service/client/roundtrip_latencies (CLIENT mode)
REQUEST_SIZE

Histogram of request body sizes, generated for HTTP, HTTP/2, and gRPC traffic.

The Prometheus provider exports this metric as: istio_request_bytes.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/request_bytes (SERVER mode)
  • istio.io/service/client/request_bytes (CLIENT mode)
RESPONSE_SIZE

Histogram of response body sizes, generated for HTTP, HTTP/2, and gRPC traffic.

The Prometheus provider exports this metric as: istio_response_bytes.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/response_bytes (SERVER mode)
  • istio.io/service/client/response_bytes (CLIENT mode)
TCP_OPENED_CONNECTIONS

Counter of TCP connections opened over the lifetime of the workload.

The Prometheus provider exports this metric as: istio_tcp_connections_opened_total.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/connection_open_count (SERVER mode)
  • istio.io/service/client/connection_open_count (CLIENT mode)
TCP_CLOSED_CONNECTIONS

Counter of TCP connections closed over the lifetime of the workload.

The Prometheus provider exports this metric as: istio_tcp_connections_closed_total.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/connection_close_count (SERVER mode)
  • istio.io/service/client/connection_close_count (CLIENT mode)
TCP_SENT_BYTES

Counter of bytes sent during a response over a TCP connection.

The Prometheus provider exports this metric as: istio_tcp_sent_bytes_total.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/sent_bytes_count (SERVER mode)
  • istio.io/service/client/sent_bytes_count (CLIENT mode)
TCP_RECEIVED_BYTES

Counter of bytes received during a request over a TCP connection.

The Prometheus provider exports this metric as: istio_tcp_received_bytes_total.

The Stackdriver provider exports this metric as:

  • istio.io/service/server/received_bytes_count (SERVER mode)
  • istio.io/service/client/received_bytes_count (CLIENT mode)
GRPC_REQUEST_MESSAGES

Counter incremented for every gRPC message sent from a client.

The Prometheus provider exports this metric as: istio_request_messages_total.

GRPC_RESPONSE_MESSAGES

Counter incremented for every gRPC message sent from a server.

The Prometheus provider exports this metric as: istio_response_messages_total.

MetricsOverrides

MetricsOverrides defines custom metric generation behavior for an individual metric or the set of all standard metrics.

Fields:

match (MetricSelector)

Match allows providing the scope of the override. It can be used to select individual metrics, as well as the workload modes (server, client, or both) in which the metrics will be generated.

If match is not specified, the overrides will apply to all metrics for both modes of operation (client and server).

disabled (BoolValue)

Must explicitly be set to true to turn off metrics reporting for the listed metrics. If disabled has been set to true in a parent configuration, it must explicitly be set to false to turn metrics reporting on in the workloads selected by the Telemetry resource.

tagOverrides (map<string, TagOverride>)

Collection of tag names and tag expressions to override in the selected metric(s). The key in the map is the name of the tag. The value in the map is the operation to perform on the tag. WARNING: some providers may not support adding/removing tags. See also: https://istio.io/latest/docs/reference/config/metrics/#labels
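Because disabled is inherited, a namespace that should keep server-side metrics after the mesh-wide example that disables them needs an explicit disabled: false. A minimal sketch for a hypothetical payments namespace:

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: namespace-metrics-on       # hypothetical name
  namespace: payments              # hypothetical namespace
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: ALL_METRICS
        mode: SERVER
      disabled: false              # explicit; leaving it unset would inherit true
```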

TagOverride

TagOverride specifies an operation to perform on a metric dimension (also known as a label). Tags may be added, removed, or have their default values overridden.

Fields:

operation (Operation)

Operation controls whether to update/add a tag, or to remove it.

value (string)

Value is only considered if the operation is UPSERT. Values are CEL expressions over attributes. Examples include: string(destination.port) and request.host. Istio exposes all standard Envoy attributes. Additionally, Istio exposes node metadata as attributes. More information is provided in the customization docs.
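A sketch of an UPSERT that uses one of the CEL expressions mentioned above to add a dimension to the request counter. The tag name destination_port is hypothetical, and, as the tagOverrides warning notes, adding tags depends on provider support.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: port-dimension             # hypothetical name
  namespace: foo
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: REQUEST_COUNT
      tagOverrides:
        destination_port:          # hypothetical new tag
          operation: UPSERT        # value must be set when using UPSERT
          value: "string(destination.port)"
```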

Operation

Values:

UPSERT

Insert or update the tag with the provided value expression. The value field MUST be specified if UPSERT is used as the operation.

REMOVE

Specifies that the tag should not be included in the metric when generated.

AccessLogging

Access logging defines the workload-level overrides for access log generation. It can be used to select a provider or to enable/disable access log generation for a workload.

Fields:

match (LogSelector)

Allows tailoring of logging behavior to specific conditions.

providers (ProviderRef[])

Names of the providers to which this configuration applies. If no provider is specified, the default logging provider is used.

disabled (BoolValue)

Controls logging. If set to true, no access logs will be generated for impacted workloads (for the specified providers). NOTE: currently, default behavior is controlled by the provider(s) selected above. Customization controls will be added to this API in future releases.

filter (Filter)

If specified, this filter will be used to select specific requests/connections for logging.

LogSelector

LogSelector provides a coarse-grained ability to configure logging behavior based on certain traffic metadata (such as traffic direction). LogSelector applies to traffic metadata which is not represented in the attribute set currently supported by filters. It allows control planes to limit the configuration sent to individual workloads. Finer-grained logging behavior can be further configured via filter.

Fields:

mode (WorkloadMode)

Determines whether to apply the access logging configuration based on the direction of traffic relative to the proxied workload.
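A minimal sketch that enables access logging only for inbound traffic, that is, when workloads in the foo namespace are the destination (SERVER mode). The resource name is hypothetical.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: inbound-access-logs        # hypothetical name
  namespace: foo
spec:
  accessLogging:
  - match:
      mode: SERVER                 # log only traffic where the workload is the destination
    providers:
    - name: envoy
```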

Filter

Allows specification of an access log filter.

Fields:

expression (string)

CEL expression for selecting when requests/connections should be logged.

Examples:

  • response.code >= 400
  • connection.mtls && request.url_path.contains('v1beta3')
  • !has(request.useragent) || !(request.useragent.startsWith("Amazon-Route53-Health-Check-Service"))
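As a sketch, the first expression above can be plugged into an access logging rule so that only error responses are logged for workloads in the foo namespace. The resource name is hypothetical.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: error-access-logs          # hypothetical name
  namespace: foo
spec:
  accessLogging:
  - providers:
    - name: envoy
    filter:
      # log only requests that received a 4xx or 5xx response
      expression: "response.code >= 400"
```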

WorkloadMode

WorkloadMode allows selection of the role of the underlying workload in network traffic. A workload is considered to be acting as a SERVER if it is the destination of the traffic (that is, from the perspective of the workload, the traffic is inbound). If the workload is the source of the network traffic, it is considered to be in CLIENT mode (the traffic is outbound from the workload).

Values:
CLIENT_AND_SERVER

Selects for scenarios when the workload is either the source or destination of the network traffic.

CLIENT

Selects for scenarios when the workload is the source of the network traffic.

SERVER

Selects for scenarios when the workload is the destination of the network traffic.
