XDS configured client not setting :authority for TLS handshake #11750

Closed
dvilaverde opened this issue Dec 13, 2024 · 8 comments

@dvilaverde

What version of gRPC-Java are you using?

<grpc.version>1.69.0</grpc.version>

What is your environment?

macOS 15.1.1

java version "21.0.5" 2024-10-15 LTS
Java(TM) SE Runtime Environment (build 21.0.5+9-LTS-239)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.5+9-LTS-239, mixed mode, sharing)

What did you expect to see?

When using the grpc-java client with xDS configuration, I expected the client request to succeed, but it failed instead. The client application only works when I manually set the authority via channelBuilder.overrideAuthority(). That doesn't seem right, because the gRPC client then needs to know something that I feel should have been provided in the xDS response along with the endpoints.

Note that I've enabled:

  • the trusted_xds_server server feature in the bootstrap.json
  • set the feature flag GRPC_EXPERIMENTAL_XDS_AUTHORITY_REWRITE to true
  • and returned auto_host_rewrite: true in the RouteMatch for that virtual host.

This gRPC server is behind an Envoy proxy that handles multiple TLS certificates, so the SNI information needs to be sent in the TLS handshake, and I suspect the missing SNI is what causes the failure here. Is there something I need to do, either in the xDS response or during the ManagedChannel setup, to send :authority as the SNI value?

What did you see instead?

Exception in thread "main" io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:268)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:249)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:167)
	at com.example.service.v1.LookupGrpc$LookupBlockingStub.resolve(LookupGrpc.java:160)
	at com.example.adc.App.main(App.java:38)
Caused by: java.net.SocketException: Connection reset
	at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
	at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)
	at io.grpc.netty.shaded.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:255)
	at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
	at io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:356)
	at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
	at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:842)

Steps to reproduce the bug

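import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.util.JsonFormat;
import io.grpc.ChannelCredentials;
import io.grpc.Grpc;
import io.grpc.ManagedChannelBuilder;
import io.grpc.TlsChannelCredentials;
// LookupGrpc, Request, Response, IP and IPAddressType are generated from the service proto.

public class App {

    public static void main(String[] args) {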
        String target = "xds:///cluster1:443";

        ChannelCredentials credentials = TlsChannelCredentials.create();
        ManagedChannelBuilder<?> channelBuilder = Grpc.newChannelBuilder(target, credentials);

        // NOTE: The client call fails unless the following line is uncommented.
        // channelBuilder.overrideAuthority("cluster1.us-east-1.example.com");

        final LookupGrpc.LookupBlockingStub blockingStub = LookupGrpc.newBlockingStub(channelBuilder.build());

        Request req = Request.newBuilder().addIps(IP.newBuilder()
            .setType(IPAddressType.IPV4)
            .setOriginAddress("8.8.8.8")).build();
        final Response result = blockingStub.resolve(req);
        if (result != null) {
            try {
                // Print the response that was just received (the original referenced an undefined variable).
                System.out.println(JsonFormat.printer().print(result));
            } catch (InvalidProtocolBufferException e) {
                throw new RuntimeException(e);
            }
        }
    }
}

My bootstrap.json looks like this:

{
  "client_default_listener_resource_name_template": "%s",
  "xds_servers": [
    {
      "server_uri": "localhost:8080",
      "channel_creds": [
        {
          "type": "insecure"
        }
      ],
      "server_features": [
        "xds_v3",
        "trusted_xds_server"
      ]
    }
  ],
  "cluster": "example-client-cluster",
  "node": {
    "id": "app1",
    "locality": { "region": "us-west-1" },
    "metadata": {}
  }
}

This is the ADS response from the XDS server:

Listener

- address:
    socket_address:
      address: 0.0.0.0
      port_value: 9999
  api_listener:
    api_listener:
      '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
      http_filters:
      - name: envoy.filters.http.router
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          suppress_envoy_headers: true
      rds:
        config_source:
          ads: {}
          resource_api_version: V3
        route_config_name: cluster1_route_config
      stat_prefix: cluster1
  name: cluster1:443

RouteConfiguration

- name: cluster1_route_config
  virtual_hosts:
  - domains:
    - cluster1.us-east-1.example.com
    name: default
    routes:
    - match:
        prefix: ""
      name: default_route
      route:
        auto_host_rewrite: true
        cluster: cluster1

Clusters

- eds_cluster_config:
    eds_config:
      ads: {}
      resource_api_version: V3
  name: cluster1
  type: EDS

Endpoints

- cluster_name: cluster1
  endpoints:
  - lb_endpoints:
    - endpoint:
        address:
          socket_address:
            address: 12.34.45.56
            port_value: 443
        hostname: cluster1.us-east-1.example.com
    - endpoint:
        address:
          socket_address:
            address: 12.34.45.57
            port_value: 443
        hostname: cluster1.us-east-1.example.com
    load_balancing_weight: 1
    locality:
      region: us-west-1
@ejona86
Member

ejona86 commented Dec 13, 2024

I responded to your email on the grpc-io list. In short, you need to use XdsChannelCredentials. Since you mention SNI here, it seems that may not be implemented; A29 says it is ignored.

> set the feature flag GRPC_EXPERIMENTAL_XDS_AUTHORITY_REWRITE to true

gRFC A81 isn't fully implemented yet. None of it was in 1.69. In 1.70 it may be limping enough to look like it works, but it is known to be incomplete and it hasn't been fully tested. That's why it can only be enabled with an undocumented flag (yes, it is in the gRFC, but hasn't been documented for Java).

@dvilaverde
Author

dvilaverde commented Dec 14, 2024

Hi @ejona86,

I tried setting the XdsChannelCredentials like so:

ChannelCredentials credentials = TlsChannelCredentials.create();
credentials = XdsChannelCredentials.create(credentials);
ManagedChannelBuilder<?> channelBuilder = Grpc.newChannelBuilder(target, credentials);

and, as gRFC A29 describes, I set the transport_socket as follows:

- eds_cluster_config:
    eds_config:
      ads: {}
      resource_api_version: V3
  name: enrichment-geo
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        validation_context:
          ca_certificate_provider_instance:
            instance_name: ca-certificates

Of course when I added that transport_socket to my cluster I also added this to my bootstrap.json:

"certificate_providers": {
    "ca-certificates": {
      "plugin_name": "file_watcher",
      "config": {
        "certificate_file": "/tmp/cert.pem",
        "private_key_file": "/tmp/key.pem",
        "ca_certificate_file": "/tmp/ca-certificates/2024-11-26/cacerts.pem",
        "refresh_interval": "600s"
      }
    }
  }

And I get the same connection reset error:

Exception in thread "main" io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:268)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:249)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:167)
	at com.example.service.v1.LookupGrpc$LookupBlockingStub.resolve(GeoLookupGrpc.java:160)
	at com.example.adc.App.main(App.java:52)
Caused by: java.net.SocketException: Connection reset
	at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
	at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)
	at io.grpc.netty.shaded.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:255)
	at io.grpc.netty.shaded.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
	at io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:356)
	at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
	at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:842)

I also tried this with a 1.70.0-SNAPSHOT that I built locally, and the same problem exists. Note that I've enabled auto_host_rewrite from gRFC A81, but that doesn't solve the problem yet, although manually setting overrideAuthority() does let the request succeed.

If I understand you correctly this is probably just not implemented yet, is that correct? If so, do you have a timeframe for when this might be implemented?

Thanks

@dvilaverde
Author

dvilaverde commented Dec 14, 2024

Just to add some more context: I was able to patch the grpc-java code to make this work. I'm pretty sure it's not the correct fix, but I wanted to share it anyway. Basically, in the ClusterImplLoadBalancer, when auto_host_rewrite is enabled, I set the io.grpc.EquivalentAddressGroup.ATTR_AUTHORITY_OVERRIDE attribute on each EquivalentAddressGroup to the Endpoint.hostname from the EDS response.

Here is my code:

if (GrpcUtil.getFlag("GRPC_EXPERIMENTAL_XDS_AUTHORITY_REWRITE", false)) {
  String hostname = args.getAddresses().get(0).getAttributes()
      .get(InternalXdsAttributes.ATTR_ADDRESS_NAME);
  if (hostname != null) {
    attrsBuilder.set(InternalXdsAttributes.ATTR_ADDRESS_NAME, hostname);

    // Set the :authority for TLS negotiation when xDS authority rewrite is enabled.
    for (EquivalentAddressGroup eag : addresses) {
      Attributes.Builder eagAttrsBuilder = eag.getAttributes().toBuilder();
      eagAttrsBuilder.set(EquivalentAddressGroup.ATTR_AUTHORITY_OVERRIDE, hostname);
      addresses.set(addresses.indexOf(eag),
          new EquivalentAddressGroup(eag.getAddresses(), eagAttrsBuilder.build()));
    }
  }
}

Ideally I think the SNI value should be sourced from the UpstreamTlsContext. Is there a gRFC coming to support the SNI value from the transport_socket on the cluster?
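
For readers who want to experiment with the same idea, here is a minimal, JDK-only sketch of the replacement loop from the patch above. It is not grpc-java code: AddressGroup and the AUTHORITY_OVERRIDE key are hypothetical stand-ins for io.grpc.EquivalentAddressGroup and its ATTR_AUTHORITY_OVERRIDE attribute, used only to show the shape of the loop. It uses ListIterator.set() to swap each element in place, which avoids the per-element indexOf() scan and its reliance on equals().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;

public final class AuthorityOverrideSketch {
  // Hypothetical key standing in for EquivalentAddressGroup.ATTR_AUTHORITY_OVERRIDE.
  static final String AUTHORITY_OVERRIDE = "authority-override";

  // Hypothetical, immutable stand-in for io.grpc.EquivalentAddressGroup.
  static final class AddressGroup {
    final List<String> addresses;
    final Map<String, String> attributes;

    AddressGroup(List<String> addresses, Map<String, String> attributes) {
      this.addresses = List.copyOf(addresses);
      this.attributes = Map.copyOf(attributes);
    }
  }

  // Replaces every group with a copy that also carries the authority override.
  static void applyAuthorityOverride(List<AddressGroup> groups, String hostname) {
    if (hostname == null) {
      return; // No EDS hostname available, so leave the groups untouched.
    }
    ListIterator<AddressGroup> it = groups.listIterator();
    while (it.hasNext()) {
      AddressGroup group = it.next();
      Map<String, String> attrs = new HashMap<>(group.attributes);
      attrs.put(AUTHORITY_OVERRIDE, hostname);
      it.set(new AddressGroup(group.addresses, attrs)); // Swap in place, no index lookup.
    }
  }

  public static void main(String[] args) {
    List<AddressGroup> groups = new ArrayList<>();
    groups.add(new AddressGroup(List.of("12.34.45.56:443"), Map.of()));
    groups.add(new AddressGroup(List.of("12.34.45.57:443"), Map.of()));
    applyAuthorityOverride(groups, "cluster1.us-east-1.example.com");
    for (AddressGroup group : groups) {
      System.out.println(group.addresses + " -> " + group.attributes.get(AUTHORITY_OVERRIDE));
    }
  }
}
```

The same pattern would apply to a mutable List<EquivalentAddressGroup>, with Attributes.Builder taking the place of the HashMap copy.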

@ejona86
Member

ejona86 commented Dec 16, 2024

> I've set the io.grpc.EquivalentAddressGroup.ATTR_AUTHORITY_OVERRIDE

:-/ Looks like there's a bug. When using XdsChannelCredentials, SNI should be disabled. But the way it is delegating to the normal implementation means the implementation knows the regular authority and passes it to the SSLEngine. That's what powers SNI.
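
To illustrate the mechanism in general terms, here is a minimal sketch using only the JDK's javax.net.ssl API, not grpc-java's Netty transport. It shows how a host name handed to an SSLEngine through SSLParameters becomes the SNI value, and how an empty server-name list turns SNI off. The class and method names (SniEngineSketch, clientEngine, configuredSni) are made up for this example; the host name is the one from this issue.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public final class SniEngineSketch {
  // Builds a client-mode engine configured to send sniHost as SNI, or no SNI when sniHost is null.
  static SSLEngine clientEngine(String sniHost) {
    SSLEngine engine;
    try {
      engine = SSLContext.getDefault().createSSLEngine();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("No default SSLContext available", e);
    }
    engine.setUseClientMode(true);

    SSLParameters params = engine.getSSLParameters();
    List<SNIServerName> names = new ArrayList<>();
    if (sniHost != null) {
      names.add(new SNIHostName(sniHost));
    }
    // An empty list means that no server name indication is configured.
    params.setServerNames(names);
    engine.setSSLParameters(params);
    return engine;
  }

  // Reads back the SNI host names currently configured on the engine.
  static List<String> configuredSni(SSLEngine engine) {
    List<String> hosts = new ArrayList<>();
    List<SNIServerName> names = engine.getSSLParameters().getServerNames();
    if (names != null) {
      for (SNIServerName name : names) {
        if (name instanceof SNIHostName) {
          hosts.add(((SNIHostName) name).getAsciiName());
        }
      }
    }
    return hosts;
  }

  public static void main(String[] args) {
    System.out.println(configuredSni(clientEngine("cluster1.us-east-1.example.com")));
    System.out.println(configuredSni(clientEngine(null)));
  }
}
```

No handshake is performed here; the sketch only inspects the engine's configuration, so it does not show what a particular proxy does when SNI is absent.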

@dvilaverde
Author

> SNI should be disabled

@ejona86 I'm not sure why you say SNI should be disabled when using XdsChannelCredentials. Is it because gRFC A29 says the sni field is ignored in a CDS update?

Since the sni field is present in the UpstreamTlsContext of a CDS update, I'd believe that SNI should be enabled when using XdsChannelCredentials.

What is the process for having a gRFC created to add SNI support when enabling xds-tls-security as described in gRFC A29?

@ejona86
Member

ejona86 commented Dec 16, 2024

Since sni field is ignored, we shouldn't be using SNI now. That allows us to start observing the sni field in the future and know we aren't breaking existing users.

Making the gRFC is only part of it; it would also require implementation work. A gRFC is not "please implement this for me" but "here is what I will implement."

Because of this issue and me noticing SNI was not supported, I've been asking around to see if others actually need SNI for other reasons soon.

@dvilaverde
Author

@ejona86 thanks for looking into the SNI issue. It's pretty important on our end, and without it we can't get past the connection reset issue.

@ejona86
Member

ejona86 commented Dec 26, 2024

Closing in favor of #11784. This issue has noise like GRPC_EXPERIMENTAL_XDS_AUTHORITY_REWRITE, which isn't yet supported, and ATTR_AUTHORITY_OVERRIDE, which we wouldn't want to work. It's very easy to believe that someone needs SNI when connecting to a shared proxy.

@ejona86 closed this as not planned on Dec 26, 2024