Knative Serving Metrics¶

Administrators can monitor the Serving control plane using the metrics exposed by each Serving component.

Note

These metrics may change as we flesh out our migration from OpenCensus to OpenTelemetry.

Workload Metrics¶

Each workload pod has a sidecar that enforces container concurrency and provides metrics to the autoscaler. The following OTel metrics give you insight into queued requests and user-container behavior.

The following attributes are included with workload metrics

Name	Type	Description
`container.name`	string	Name of the container emitting metrics. This is hardcoded to `queue-proxy`.
`k8s.namespace.name`	string	Namespace of the workload
`k8s.pod.name`	string	Name of the workload pod
`service.version`	string	Version of the sidecar emitting metrics
`service.name`	string	Name of the Knative Service, Configuration, or Revision.
`service.instance.id`	string	Identifier of the instance which is the same as the `k8s.pod.name`
`kn.service.name`	string	Knative Service name associated with this Revision
`kn.configuration.name`	string	Knative Configuration name associated with this Revision
`kn.revision.name`	string	The name of the Revision

`kn.serving.queue.depth`¶

Instrument Type: Int64Gauge

Unit (UCUM): {request}

Description: Number of current items in the queue proxy queue

`kn.serving.invocation.duration`¶

Instrument Type: Float64Histogram

Unit (UCUM): s

Description: The duration of the task execution

The following attributes are included with the metric

Name	Type	Description
`http.response.status_code`	int	HTTP status code of the response
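
Because `kn.serving.invocation.duration` is a histogram, latency percentiles have to be estimated from bucket counts. The following is a minimal sketch of that estimate, assuming the data point is available as explicit bucket boundaries plus per-bucket counts (one more count than boundaries, the last one being the overflow bucket). The function name `estimate_quantile` and the input shape are hypothetical and are not part of Knative or any OpenTelemetry SDK.

```python
from typing import Sequence


def estimate_quantile(q: float, bounds: Sequence[float], counts: Sequence[int]) -> float:
    """Estimate the q-quantile (0 < q <= 1) of an explicit-bucket histogram.

    bounds: upper bucket boundaries in seconds, ascending (assumed shape).
    counts: per-bucket counts; counts[-1] is the overflow bucket.
    """
    if not bounds or len(counts) != len(bounds) + 1:
        raise ValueError("expected non-empty bounds and one more count than bounds")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")

    rank = q * total
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= rank:
            if i == len(bounds):
                # The overflow bucket has no upper bound, so report the
                # largest finite boundary instead of guessing.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            # Linear interpolation inside the bucket that contains the rank.
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
    return bounds[-1]
```

The estimate is only as fine-grained as the bucket boundaries, so a percentile that lands in a wide bucket should be read as approximate. Filtering the data points by `http.response.status_code` before calling the function gives separate estimates for successful and failed invocations.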

HTTP metrics¶

Since the sidecar receives and forwards requests to the user container it has both HTTP server and client metrics.

HTTP Server Metrics¶

Knative implements the semantic conventions for HTTP Servers using the OpenTelemetry otel-go/otelhttp package.

Please refer to the OpenTelemetry docs for details about the HTTP Server metrics it exports.

HTTP Client Metrics¶

Knative implements the semantic conventions for HTTP Clients using the OpenTelemetry otel-go/otelhttp package.

Please refer to the OpenTelemetry docs for details about the HTTP Client metrics it exports.

Activator¶

The following metrics can help you to understand how an application responds when traffic passes through the activator. For example, when scaling from zero, high request latency might mean that requests are taking too much time to be fulfilled.

`kn.revision.request.concurrency`¶

Instrument Type: Float64Gauge

Unit (UCUM): {request}

Description: Concurrent requests that are routed to the Activator

The following attributes are included with the metric

Name	Type	Description
`k8s.namespace.name`	string	Namespace of the resource
`kn.service.name`	string	Knative Service name associated with this Revision
`kn.configuration.name`	string	Knative Configuration name associated with this Revision
`kn.revision.name`	string	The name of the Revision

`kn.activator.stats.conn.reachable`¶

Instrument Type: Int64Gauge

Unit (UCUM): {reachable}

Description: Whether a peer is reachable from the activator (1 = reachable, 0 = not reachable)

The following attributes are included with the metric

Name	Type	Description
`peer`	string	The peer service the activator is connecting to (e.g., `autoscaler`)

This metric helps operators identify connectivity issues between the activator and its peer components. The metric is recorded:

When a connection is established (value = 1)
When a connection is lost (value = 0)

`kn.activator.stats.conn.errors`¶

Instrument Type: Int64Counter

Unit (UCUM): {error}

Description: Number of connection errors from the activator

The following attributes are included with the metric

Name	Type	Description
`peer`	string	The peer service the activator is connecting to (e.g., `autoscaler`)

This counter increments each time the activator fails to communicate with a peer. It complements the `kn.activator.stats.conn.reachable` gauge by providing a cumulative count of errors, which is useful for:

Detecting flaky connections that might be missed by point-in-time gauge sampling
Creating rate-based alerts (e.g., alert if error rate exceeds threshold over 5 minutes)
Tracking connection stability trends over time
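
The rate-based alert mentioned above can be sketched as follows. This is an illustration only: it assumes the counter is exported with cumulative temporality and that samples have already been collected as `(timestamp, value)` pairs for a single `peer`. The names `error_rate` and `should_alert`, and the threshold value, are hypothetical.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative value of kn.activator.stats.conn.errors)
Sample = Tuple[float, int]


def error_rate(samples: List[Sample], window_s: float = 300.0) -> float:
    """Return errors per second over the trailing window, tolerating resets."""
    if len(samples) < 2:
        return 0.0
    latest_ts = samples[-1][0]
    recent = [s for s in samples if s[0] >= latest_ts - window_s]
    if len(recent) < 2:
        return 0.0

    increase = 0
    for (_, prev), (_, cur) in zip(recent, recent[1:]):
        # A drop means the process restarted and the counter began again
        # from zero, so the new value is the whole increase.
        increase += cur - prev if cur >= prev else cur

    elapsed = recent[-1][0] - recent[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def should_alert(samples: List[Sample], threshold_per_s: float = 0.05,
                 window_s: float = 300.0) -> bool:
    """Fire when the windowed error rate exceeds the (assumed) threshold."""
    return error_rate(samples, window_s) > threshold_per_s
```

Unlike a check on the reachability gauge, this rate still registers a connection that failed and recovered between two scrapes, because the counter keeps the errors that occurred in between.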

HTTP metrics¶

Since the activator receives and forwards requests to the user workload it has both HTTP server and client metrics.

HTTP Server Metrics¶

Knative implements the semantic conventions for HTTP Servers using the OpenTelemetry otel-go/otelhttp package.

Please refer to the OpenTelemetry docs for details about the HTTP Server metrics it exports.

The following attributes are included in the server metrics

Name	Type	Description
`kn.service.name`	string	Knative Service name associated with this Revision
`kn.configuration.name`	string	Knative Configuration name associated with this Revision
`kn.revision.name`	string	The name of the Revision
`k8s.namespace.name`	string	Namespace of the resource

HTTP Client Metrics¶

Knative implements the semantic conventions for HTTP Clients using the OpenTelemetry otel-go/otelhttp package.

Please refer to the OpenTelemetry docs for details about the HTTP Client metrics it exports.

Autoscaler¶

The Autoscaler component exposes a number of metrics related to its per-revision decisions. For example, at any given time, you can monitor the number of pods the Autoscaler wants to allocate for a Service, the average number of requests per second during the stable window, or whether the Autoscaler is in panic mode (KPA).

The following attributes are included with the autoscaling metrics below

Name	Type	Description
`k8s.namespace.name`	string	Namespace of the Revision
`kn.service.name`	string	Knative Service name associated with this Revision
`kn.configuration.name`	string	Knative Configuration name associated with this Revision
`kn.revision.name`	string	The name of the Revision

`kn.autoscaler.scrape.duration`¶

Instrument Type: Float64Histogram

Unit (UCUM): s

Description: The duration of scraping the revision

`kn.revision.pods.desired`¶

Instrument Type: Int64Gauge

Unit (UCUM): {pod}

Description: Number of pods the autoscaler wants to allocate

`kn.revision.capacity.excess`¶

Instrument Type: Float64Gauge

Unit (UCUM): {concurrency}

Description: Excess burst capacity observed over the stable window

`kn.revision.concurrency.stable`¶

Instrument Type: Float64Gauge

Unit (UCUM): {concurrency}

Description: Average of request count per observed pod over the stable window

`kn.revision.concurrency.panic`¶

Instrument Type: Float64Gauge

Unit (UCUM): {concurrency}

Description: Average of request count per observed pod over the panic window

`kn.revision.concurrency.target`¶

Instrument Type: Float64Gauge

Unit (UCUM): {concurrency}

Description: The desired concurrent requests for each pod

`kn.revision.rps.stable`¶

Instrument Type: Float64Gauge

Unit (UCUM): {request}/s

Description: Average of requests-per-second per observed pod over the stable window

`kn.revision.rps.panic`¶

Instrument Type: Float64Gauge

Unit (UCUM): {request}/s

Description: Average of requests-per-second per observed pod over the panic window

`kn.revision.pods.requested`¶

Instrument Type: Int64Gauge

Unit (UCUM): {pod}

Description: Number of pods the autoscaler requested from Kubernetes

`kn.revision.pods.count`¶

Instrument Type: Int64Gauge

Unit (UCUM): {pod}

Description: Number of pods that are allocated currently

`kn.revision.pods.not_ready.count`¶

Instrument Type: Int64Gauge

Unit (UCUM): {pod}

Description: Number of pods that are not ready currently

`kn.revision.pods.pending.count`¶

Instrument Type: Int64Gauge

Unit (UCUM): {pod}

Description: Number of pods that are pending currently

`kn.revision.pods.terminating.count`¶

Instrument Type: Int64Gauge

Unit (UCUM): {pod}

Description: Number of pods that are terminating currently
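
The autoscaler gauges are often easier to read side by side than one at a time. The sketch below combines four of them for a single revision into two derived numbers. It assumes the latest gauge values have already been fetched into a plain dictionary keyed by metric name. The function `summarize_revision` and the derived field names are hypothetical, and how to interpret the results (for example, what gap is acceptable) is left to the operator.

```python
from typing import Dict


def summarize_revision(snapshot: Dict[str, float]) -> Dict[str, float]:
    """Derive simple indicators from the latest gauge values of one revision."""
    desired = snapshot["kn.revision.pods.desired"]
    count = snapshot["kn.revision.pods.count"]
    stable = snapshot["kn.revision.concurrency.stable"]
    target = snapshot["kn.revision.concurrency.target"]
    return {
        # Positive: the autoscaler wants more pods than are allocated.
        # Negative: more pods are allocated than it currently wants.
        "pod_gap": desired - count,
        # Observed stable-window concurrency relative to the per-pod target.
        "concurrency_ratio": stable / target if target > 0 else 0.0,
    }
```

A `pod_gap` that stays positive while `kn.revision.pods.pending.count` or `kn.revision.pods.not_ready.count` is non-zero may be a hint that pods are being requested but are slow to become available.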

Webhook Metrics¶

Webhook metrics report useful info about operations. For example, if a large number of operations fail, this could indicate an issue with a user-created resource.

`http.server.request.duration`¶

Knative implements the semantic conventions for HTTP Servers using the OpenTelemetry otel-go/otelhttp package.

Please refer to the OpenTelemetry docs for details about the HTTP Server metrics it exports.

The following attributes are included with the metric

Name	Type	Description	Examples
`kn.webhook.type`	string	Specifies the type of webhook invoked	`admission`, `defaulting`, `validation`, `conversion`
`kn.webhook.resource.group`	string	Specifies the Kubernetes API group of the resource
`kn.webhook.resource.version`	string	Specifies the Kubernetes API version of the resource
`kn.webhook.resource.kind`	string	Specifies the Kubernetes kind of the resource
`kn.webhook.subresource`	string	Specifies the subresource	"" (empty), `status`, `scale`
`kn.webhook.operation.type`	string	Specifies the operation that invoked the webhook	`CREATE`, `UPDATE`, `DELETE`
`kn.webhook.operation.status`	string	Specifies whether the operation was successful	`success`, `failed`

`kn.webhook.handler.duration`¶

Instrument Type: Histogram

Unit (UCUM): s

Description: The duration of task execution.

The following attributes are included with the metric

Name	Type	Description	Examples
`kn.webhook.type`	string	Specifies the type of webhook invoked	`admission`, `defaulting`, `validation`, `conversion`
`kn.webhook.resource.group`	string	Specifies the Kubernetes API group of the resource
`kn.webhook.resource.version`	string	Specifies the Kubernetes API version of the resource
`kn.webhook.resource.kind`	string	Specifies the Kubernetes kind of the resource
`kn.webhook.subresource`	string	Specifies the subresource	"" (empty), `status`, `scale`
`kn.webhook.operation.type`	string	Specifies the operation that invoked the webhook	`CREATE`, `UPDATE`, `DELETE`
`kn.webhook.operation.status`	string	Specifies whether the operation was successful	`success`, `failed`
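
To spot the case described at the start of this section, where many operations fail for a particular resource, the webhook attributes can be used to group data points and compute a failure ratio. The following is a rough sketch, assuming each data point has been flattened into a dictionary with an `attributes` mapping and a `count` of observed operations. That shape and the function name `failure_ratios` are hypothetical and do not correspond to any exporter format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_ratios(points: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Return the failed/total ratio per (resource kind, operation type)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    failures: Dict[Tuple[str, str], int] = defaultdict(int)
    for point in points:
        attrs = point["attributes"]
        key = (attrs["kn.webhook.resource.kind"], attrs["kn.webhook.operation.type"])
        totals[key] += point["count"]
        if attrs["kn.webhook.operation.status"] == "failed":
            failures[key] += point["count"]
    # Skip keys with no observations to avoid dividing by zero.
    return {key: failures[key] / total for key, total in totals.items() if total > 0}
```

Grouping by kind and operation keeps a burst of failed `CREATE` calls on one resource type from being averaged away by healthy traffic on the others.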

Workqueue Metrics¶

Knative controllers expose client-go workqueue metrics.

The following attributes are included with the metrics below

Name	Type	Description
`name`	string	Name of the work queue

`kn.workqueue.depth`¶

Instrument Type: Int64UpDownCounter

Unit (UCUM): {item}

Description: Number of current items in the queue

`kn.workqueue.adds`¶

Instrument Type: Int64Counter

Unit (UCUM): {item}

Description: Number of items added to the queue

`kn.workqueue.queue.duration`¶

Instrument Type:

Unit (UCUM): s

Description: How long an item stays in the workqueue

`kn.workqueue.process.duration`¶

Instrument Type: Float64Histogram

Unit (UCUM): s

Description: How long processing an item from the workqueue takes

`kn.workqueue.unfinished_work`¶

Instrument Type: Float64Gauge

Unit (UCUM): s

Description: How many seconds of work the reconciler has done that is in progress and hasn't been observed by duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
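
The deduction in the description rests on simple arithmetic: each worker that is stuck on an item adds roughly one second of unfinished work per second of wall-clock time, so the slope of the gauge approximates the number of stuck workers. A minimal sketch of that calculation follows, assuming the gauge has been sampled as `(timestamp, value)` pairs. The function name `estimate_stuck_workers` is hypothetical.

```python
from typing import List, Tuple

# (unix timestamp in seconds, value of kn.workqueue.unfinished_work in seconds)
Sample = Tuple[float, float]


def estimate_stuck_workers(samples: List[Sample]) -> int:
    """Approximate the number of stuck workers from the gauge's growth rate."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, w0), (t1, w1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # Seconds of unfinished work accumulated per second of wall-clock time.
    slope = (w1 - w0) / (t1 - t0)
    return max(0, round(slope))
```

The estimate is only meaningful over an interval in which the gauge grows steadily. Work that completes during the interval lowers the gauge and therefore the estimate.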

`kn.workqueue.longest_running_processor`¶

Instrument Type: Float64Gauge

Unit (UCUM): s

Description: How long the longest worker thread has been running

`kn.workqueue.retries`¶

Instrument Type: Int64Counter

Unit (UCUM): {item}

Description: Number of items re-added to the queue

Go Runtime¶

Knative implements the semantic conventions for Go runtime metrics using the OpenTelemetry otel-go/instrumentation/runtime package.

Please refer to the OpenTelemetry docs for details about the Go runtime metrics it exports.
