Traffic Management Problems
Requests are rejected by Envoy
Requests may be rejected for various reasons. The best way to understand why requests are being rejected is by inspecting Envoy’s access logs. By default, access logs are output to the standard output of the container. Run the following command to see the log:
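For example, assuming a sidecar-injected pod named $POD in namespace $NAMESPACE (istio-proxy is the default name of the sidecar container):

```sh
# Print the standard output of the Envoy sidecar container.
kubectl logs $POD -c istio-proxy -n $NAMESPACE
```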
In the default access log format, Envoy response flags and Mixer policy status are located after the response code. If you are using a custom log format, make sure to include %RESPONSE_FLAGS% and %DYNAMIC_METADATA(istio.mixer:status)%.
Refer to the Envoy response flags documentation for details about response flags.
Common response flags are:

- NR: No route configured, check your DestinationRule or VirtualService.
- UO: Upstream overflow with circuit breaking, check your circuit breaker configuration in DestinationRule.
- UF: Failed to connect to upstream; if you are using Istio authentication, check for a mutual TLS configuration conflict.

A request is rejected by Mixer if the response flag is UAEX and the Mixer policy status is not -.

Common Mixer policy statuses are:

- UNAVAILABLE: Envoy cannot connect to Mixer and the policy is configured to fail close.
- UNAUTHENTICATED: The request is rejected by Mixer authentication.
- PERMISSION_DENIED: The request is rejected by Mixer authorization.
- RESOURCE_EXHAUSTED: The request is rejected by Mixer quota.
- INTERNAL: The request is rejected due to a Mixer internal error.
Route rules don’t seem to affect traffic flow
With the current Envoy sidecar implementation, up to 100 requests may be required for weighted version distribution to be observed.
If route rules are working perfectly for the Bookinfo sample, but similar version routing rules have no effect on your own application, it may be that your Kubernetes services need to be changed slightly. Kubernetes services must adhere to certain restrictions in order to take advantage of Istio’s L7 routing features. Refer to the Requirements for Pods and Services for details.
Another potential issue is that the route rules may simply be slow to take effect. The Istio implementation on Kubernetes utilizes an eventually consistent algorithm to ensure all Envoy sidecars have the correct configuration including all route rules. A configuration change will take some time to propagate to all the sidecars. With large deployments the propagation will take longer and there may be a lag time on the order of seconds.
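One way to check whether the sidecars have caught up with the latest configuration, assuming istioctl is installed and your Istio version includes the command, is:

```sh
# Lists every proxy and whether its configuration is SYNCED or STALE
# relative to the control plane.
istioctl proxy-status
```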
503 errors after setting destination rule
If requests to a service immediately start generating HTTP 503 errors after you apply a DestinationRule, and the errors continue until you remove or revert the DestinationRule, then the DestinationRule is probably causing a TLS conflict for the service.
For example, if you configure mutual TLS in the cluster globally, the DestinationRule must include the following trafficPolicy:
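A minimal sketch of such a DestinationRule, where the rule name and the host are placeholders and the trafficPolicy is the important part:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: example            # placeholder name
spec:
  host: example.default.svc.cluster.local   # placeholder host
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL   # use Istio-provisioned certificates for mutual TLS
```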
Otherwise, the mode defaults to DISABLE, causing client proxy sidecars to make plain HTTP requests instead of TLS encrypted requests. Thus, the requests conflict with the server proxy because the server proxy expects encrypted requests.
Whenever you apply a DestinationRule, ensure the trafficPolicy TLS mode matches the global server configuration.
Route rules have no effect on ingress gateway requests
Let’s assume you are using an ingress Gateway and corresponding VirtualService to access an internal service. For example, your VirtualService looks something like this:
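For instance, a sketch along these lines, where myapp-gateway is assumed to be the name of your ingress Gateway and the /hello path is an illustrative placeholder:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com"
  gateways:
  - myapp-gateway          # assumed name of the ingress Gateway (defined elsewhere)
  http:
  - match:
    - uri:
        prefix: /hello     # placeholder path that fronts the helloworld service
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
```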
You also have a VirtualService which routes traffic for the helloworld service to a particular subset:
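For example, something like the following, where subset v1 is assumed to be defined in a DestinationRule for the service:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1         # subset assumed to be defined in a DestinationRule
```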
In this situation you will notice that requests to the helloworld service via the ingress gateway will not be directed to subset v1 but instead will continue to use default round-robin routing.
The ingress requests are using the gateway host (e.g., myapp.com), which will activate the rules in the myapp VirtualService that route to any endpoint of the helloworld service. Only internal requests with the host helloworld.default.svc.cluster.local will use the helloworld VirtualService, which directs traffic exclusively to subset v1.
To control the traffic from the gateway, you need to also include the subset rule in the myapp VirtualService:
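Continuing the sketch above, that means adding the subset to the destination of the gateway-bound route:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com"
  gateways:
  - myapp-gateway          # assumed ingress Gateway name, as above
  http:
  - match:
    - uri:
        prefix: /hello
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1         # gateway traffic is now also pinned to subset v1
```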
Alternatively, you can combine both VirtualServices into one unit if possible:
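A sketch of a combined VirtualService, which lists both hosts and binds to both the ingress gateway and the reserved mesh gateway so that one resource covers external and internal traffic:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - "myapp.com"
  - helloworld.default.svc.cluster.local
  gateways:
  - mesh                   # reserved name: applies to sidecars inside the mesh
  - myapp-gateway          # assumed ingress Gateway name, as above
  http:
  - match:
    - uri:
        prefix: /hello
      gateways:
      - myapp-gateway
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
  - match:
    - gateways:
      - mesh
    route:
    - destination:
        host: helloworld.default.svc.cluster.local
        subset: v1
```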
Envoy is crashing under load
Check your ulimit -a. Many systems have a 1024 open file descriptor limit by default, which will cause Envoy to fail an assertion and crash. Make sure to raise your ulimit, for example: ulimit -n 16384.
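For example:

```sh
# Inspect the current limits; "open files" (-n) is the relevant one.
ulimit -a

# Raise the open file descriptor limit for the current shell session.
ulimit -n 16384
```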
Envoy won’t connect to my HTTP/1.0 service
Envoy requires HTTP/1.1 or HTTP/2 traffic for upstream services. For example, when using NGINX for serving traffic behind Envoy, you will need to set the proxy_http_version directive in your NGINX configuration to be “1.1”, since the NGINX default is 1.0.
Example configuration:
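A minimal sketch of an NGINX configuration with the directive set; the upstream address and location are placeholders:

```nginx
upstream http_backend {
    server 127.0.0.1:8080;               # placeholder upstream address
    keepalive 16;
}

server {
    listen 80;

    location / {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;          # Envoy requires HTTP/1.1 or HTTP/2 upstream
        proxy_set_header Connection "";  # allow keepalive connections to the upstream
    }
}
```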
404 errors occur when multiple gateways configured with same TLS certificate
Configuring more than one gateway using the same TLS certificate will cause browsers that leverage HTTP/2 connection reuse (i.e., most browsers) to produce 404 errors when accessing a second host after a connection to another host has already been established.
For example, let’s say you have 2 hosts that share the same TLS certificate like this:
- Wildcard certificate *.test.com installed in istio-ingressgateway
- Gateway configuration gw1 with host service1.test.com, selector istio: ingressgateway, and TLS using gateway’s mounted (wildcard) certificate
- Gateway configuration gw2 with host service2.test.com, selector istio: ingressgateway, and TLS using gateway’s mounted (wildcard) certificate
- VirtualService configuration vs1 with host service1.test.com and gateway gw1
- VirtualService configuration vs2 with host service2.test.com and gateway gw2
Since both gateways are served by the same workload (i.e., selector istio: ingressgateway), requests to both services (service1.test.com and service2.test.com) will resolve to the same IP. If service1.test.com is accessed first, it will return the wildcard certificate (*.test.com) indicating that connections to service2.test.com can use the same certificate. Browsers like Chrome and Firefox will consequently reuse the existing connection for requests to service2.test.com. Since the gateway (gw1) has no route for service2.test.com, it will then return a 404 (Not Found) response.
You can avoid this problem by configuring a single wildcard Gateway, instead of two (gw1 and gw2). Then, simply bind both VirtualServices to it like this:
- Gateway configuration gw with host *.test.com, selector istio: ingressgateway, and TLS using gateway’s mounted (wildcard) certificate
- VirtualService configuration vs1 with host service1.test.com and gateway gw
- VirtualService configuration vs2 with host service2.test.com and gateway gw
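A hedged sketch of that configuration, assuming the wildcard certificate is mounted at the default istio-ingressgateway-certs location and the backing service hosts are placeholders:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gw
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt   # assumed mount path
      privateKey: /etc/istio/ingressgateway-certs/tls.key          # assumed mount path
    hosts:
    - "*.test.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs1
spec:
  hosts:
  - service1.test.com
  gateways:
  - gw
  http:
  - route:
    - destination:
        host: service1.default.svc.cluster.local   # placeholder backing service
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs2
spec:
  hosts:
  - service2.test.com
  gateways:
  - gw
  http:
  - route:
    - destination:
        host: service2.default.svc.cluster.local   # placeholder backing service
```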
Port conflict when configuring multiple TLS hosts in a gateway
If you apply a Gateway configuration that has the same selector labels as another existing Gateway, and both expose the same HTTPS port, you must ensure that the ports have unique names. Otherwise, the configuration will be applied without an immediate error indication, but it will be ignored in the runtime gateway configuration. For example:
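A sketch of the conflicting configuration; the gateway names and the first host are placeholders, while myhost2.com is the second host referenced below:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway1           # placeholder name
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https          # same port name as in the second gateway
      protocol: HTTPS
    hosts:
    - myhost1.com          # placeholder first host
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt   # assumed mount path
      privateKey: /etc/istio/ingressgateway-certs/tls.key          # assumed mount path
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway2           # placeholder name
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https          # conflicts with the port name in gateway1
      protocol: HTTPS
    hosts:
    - myhost2.com
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
```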
With this configuration, requests to the second host, myhost2.com, will fail because both gateway ports have name: https.
A curl request, for example, will produce an error message something like this:
You can confirm that this has happened by checking Pilot’s logs for a message similar to the following:
To avoid this problem, ensure that multiple uses of the same protocol: HTTPS port are uniquely named. For example, change the second one to https2:
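Continuing the sketch above, only the port name of the second gateway changes:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway2           # placeholder name, as in the sketch above
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https2         # unique port name resolves the conflict
      protocol: HTTPS
    hosts:
    - myhost2.com
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt   # assumed mount path
      privateKey: /etc/istio/ingressgateway-certs/tls.key          # assumed mount path
```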