
Use PEM certificates loaded from secrets for Kafka #11447

Merged
scholzj merged 2 commits into strimzi:main from tinaselenge:use-pem-kafka
Oct 9, 2025
Conversation

@tinaselenge
Contributor

@tinaselenge tinaselenge commented May 19, 2025

Type of change

  • Refactoring

Description

  • Use KubernetesSecretConfigProvider to access secrets directly when configuring the Kafka truststore and keystore used by nodes to authenticate each other and their clients.
  • OAuth and authorization server configurations still use PKCS12 certificates generated by the script, because they are deprecated and will be removed in the CRD v1 release. Once they are removed, the script for preparing TLS certificates can be removed completely.
  • Remove volume mounts and environment variables for configuring the truststore and keystore, as they are no longer needed now that secrets are accessed directly.
  • Refactored KafkaAgent to directly access the cluster CA and node certificates and use them to configure the HTTP server, instead of using PKCS12 certificates generated by the script. Added a util class for creating JKS keystores from secrets.

Resolves part of #11294
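To illustrate the first bullet, a hedged sketch of what a PEM-based listener configuration resolved through KubernetesSecretConfigProvider could look like. This is not the operator's actual generated config; the namespace, secret, and field names (`myproject`, `my-cluster-kafka-brokers`, `my-cluster-cluster-ca-cert`) are illustrative assumptions:

```properties
# Register the Strimzi Kubernetes secret config provider so broker
# configuration values can be resolved from Secret fields at start-up,
# without first converting the PEM material to PKCS12 on disk.
config.providers=strimzisecrets
config.providers.strimzisecrets.class=io.strimzi.kafka.KubernetesSecretConfigProvider

# Keystore and truststore configured as PEM, pulled straight from secrets
# (all names below are placeholders, not the operator's real output).
listener.name.replication-9091.ssl.keystore.type=PEM
listener.name.replication-9091.ssl.keystore.certificate.chain=${strimzisecrets:myproject/my-cluster-kafka-brokers:my-cluster-kafka-0.crt}
listener.name.replication-9091.ssl.keystore.key=${strimzisecrets:myproject/my-cluster-kafka-brokers:my-cluster-kafka-0.key}
listener.name.replication-9091.ssl.truststore.type=PEM
listener.name.replication-9091.ssl.truststore.certificates=${strimzisecrets:myproject/my-cluster-cluster-ca-cert:ca.crt}
```

With this shape, the third bullet follows naturally: because the broker reads certificate material through the config provider, the volume mounts and environment variables that previously fed the keystore-generation script are no longer needed.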

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@tinaselenge tinaselenge marked this pull request as ready for review May 28, 2025 14:12
@tinaselenge tinaselenge requested review from katheris and ppatierno and removed request for ppatierno May 28, 2025 14:12
@ppatierno ppatierno added this to the 0.47.0 milestone Jun 4, 2025
@tinaselenge
Contributor Author

@ppatierno @katheris can you please review this PR when you get a chance? Thank you :)

Comment thread kafka-agent/src/main/java/io/strimzi/kafka/agent/KafkaAgent.java Outdated
@tinaselenge tinaselenge force-pushed the use-pem-kafka branch 2 times, most recently from 3d7de64 to 3363f8a (June 18, 2025 09:15)
Comment thread kafka-agent/src/main/java/io/strimzi/kafka/agent/KafkaAgent.java Fixed
@tinaselenge
Contributor Author

Thank you so much, @ppatierno, for reviewing the PR. I have now addressed your comments.

Could you also please kick off the regression tests?

@im-konge
Member

/azp run regression

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@im-konge
Member

/azp run regression

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@katheris katheris left a comment


The changes look pretty good to me; I just had a couple of questions and suggestions, which I added.

@ppatierno
Member

@tinaselenge I restarted failed regression tests, not sure if they were related to the PR but there were quite a few. Let's see the next run.

@im-konge
Member

@tinaselenge I restarted failed regression tests, not sure if they were related to the PR but there were quite a few. Let's see the next run.

They failed even for the previous runs, so I guess they are related to the PR.

@tinaselenge
Contributor Author

Yes, they are definitely related, as they failed locally for me as well. I fixed the OAuth-related failures but am still trying to fix some failures in ListenersST, which tests listeners with custom certificates. I will update the PR once I have it passing locally.

@katheris
Member

/azp run regression

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj scholzj modified the milestones: 0.47.0, 0.48.0 Jul 10, 2025
@tinaselenge tinaselenge force-pushed the use-pem-kafka branch 2 times, most recently from bfede7c to 8ae072a (July 23, 2025 14:28)
Comment thread CHANGELOG.md Outdated
*/
@SuppressWarnings("deprecation") // OAuth authentication is deprecated
private void configureAuthentication(String listenerName, List<String> securityProtocol, boolean tls, KafkaListenerAuthentication auth) {
private void configureAuthentication(String listenerName, List<String> securityProtocol, boolean tls, KafkaListenerAuthentication auth, String clusterName) {
Member


Did you consider storing the cluster name at the object level, given that it seems to be needed all over the place now? What happens if the Secret is deleted, or if the fields inside it are renamed, and the broker Pod restarts (not through the operator but for some other reason)?

Contributor Author


Did you consider storing the cluster name at the object level, given that it seems to be needed all over the place now?

I did, but the cluster name seems to be passed into most of the with* methods, so making it an object-level field would mean refactoring all of those methods. Maybe that should be done in a separate PR?

What happens if the Secret is deleted, or if the fields inside it are renamed, and the broker Pod restarts (not through the operator but for some other reason)?

Pods will restart, but brokers would fail to authenticate clients, I guess? Don't we have a similar risk today though? We generate p12 files based on the volume-mounted secrets with the specific fields. If the broker pod restarts but the secret does not exist, the pod would not start; or if the *.crt field does not exist, it would not find the volume-mounted file to generate the p12 files.

Member


Pods will restart, but brokers would fail to authenticate clients, I guess? Don't we have a similar risk today though? We generate p12 files based on the volume-mounted secrets with the specific fields. If the broker pod restarts but the secret does not exist, the pod would not start; or if the *.crt field does not exist, it would not find the volume-mounted file to generate the p12 files.

Does it fail the clients? Or does it make the brokers crashloop because the initialization fails? I think those are two different outcomes.

You are right that today the broker would end up pending, I guess. But that does not mean we cannot improve on it.

Member


Also, I think I added the second part to the wrong comment; it should probably have been added to the one about copying the custom server certificates.

Contributor Author


I think the broker will go into a crashloop because it fails to initialise, since the Kubernetes config provider will run and fail to fetch the custom cert secret, or its field, if they are missing. So do we think copying the custom cert secrets into our internal secret would help us in case the secret is deleted or its field has changed?

If we copy them into the existing internal per-broker secret, I wonder how we should reconcile it. We would append the key and cert with their original field names, as some listeners might still use the internal per-broker cert. If the field has changed, do we keep appending the new one and then remove the old field at some point?

I do agree that we should improve on it, but as the PR is already quite big, I wonder if we should tackle it in a separate PR with more discussion, unless people think this is a blocker for this PR.

Member


You would need to use a separate field for it, but I'm not sure the CA reconciliation wouldn't wipe it out. Maybe the easiest thing would be to keep it as is and open a new issue for this? We can probably get back to it later and think about how best to fix it. And it would not block this PR any further.

Contributor Author


Opened an issue for this here #12000.

Member

@scholzj scholzj left a comment


LGTM. Thanks.

@scholzj
Member

scholzj commented Oct 6, 2025

/gha run pipeline=upgrade,regression

@github-actions

github-actions Bot commented Oct 6, 2025

⏳ System test verification started: link

The following 10 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)
  • upgrade-azp_kraft_upgrade-amd64 (oracle-vm-4cpu-16gb-x86-64)
  • upgrade-azp_kafka_upgrade-amd64 (oracle-vm-4cpu-16gb-x86-64)
  • upgrade-azp_kraft_upgrade-arm64 (oracle-vm-4cpu-16gb-arm64)
  • upgrade-azp_kafka_upgrade-arm64 (oracle-vm-4cpu-16gb-arm64)

Tests will start after successful build completion.

@scholzj
Member

scholzj commented Oct 6, 2025

@strimzi/system-test-contributors Any chance you can run STs for this on some FIPS cluster?

@github-actions

github-actions Bot commented Oct 6, 2025

❌ System test verification failed: link

@tinaselenge
Contributor Author

Looks like the system tests failed due to a flaky test that is unrelated to this PR; the flaky test was fixed by #11986. Should we kick off the tests again?

@scholzj
Member

scholzj commented Oct 7, 2025

Looks like the system tests failed due to a flaky test that is unrelated to this PR; the flaky test was fixed by #11986. Should we kick off the tests again?

I'm not sure we need to re-run them as we know the failure is unrelated. Let's see if @ppatierno has any more comments. Maybe we can re-run them afterwards.

Member

@ppatierno ppatierno left a comment


LGTM.

@ppatierno
Member

/gha run pipeline=regression

@github-actions

github-actions Bot commented Oct 8, 2025

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

@ppatierno
Member

@tinaselenge I re-ran the regression pipeline but just noticed there is a conflict to resolve in the CHANGELOG. Of course, it won't have an impact on the test results.

Comment thread CHANGELOG.md
@github-actions

github-actions Bot commented Oct 8, 2025

❌ System test verification failed: link

Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>
Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>
@tinaselenge
Contributor Author

Not sure why an ST failed, but when running it locally, it passes. Can we please kick off the tests again? I only rebased and updated the CHANGELOG.md since the last successful ST run.

@scholzj
Member

scholzj commented Oct 8, 2025

I think it failed because the PR was not rebased when Paolo started it 🙄.

@ppatierno
Member

/gha run pipeline=regression

@github-actions

github-actions Bot commented Oct 8, 2025

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

@github-actions

github-actions Bot commented Oct 8, 2025

❌ System test verification failed: link

@github-actions

github-actions Bot commented Oct 9, 2025

🎉 System test verification passed: link

@scholzj scholzj merged commit 6c17f27 into strimzi:main Oct 9, 2025
29 checks passed
@tinaselenge tinaselenge deleted the use-pem-kafka branch October 9, 2025 07:55
@scholzj scholzj added this to Roadmap Oct 12, 2025
@scholzj scholzj moved this to 0.49.0 (Work in Progress) in Roadmap Oct 12, 2025
Comment thread CHANGELOG.md
If you want to deploy and run the Heartbeat connector, you can use separate `KafkaConnect` and `KafkaConnector` custom resources.
* The `.spec.build.output.additionalKanikoOptions` field in the `KafkaConnect` custom resource is deprecated and will be removed in the future.
* Use `.spec.build.output.additionalBuildOptions` field instead.
* Kafka nodes are now configured with PEM certificates instead of P12/JKS for keystore and truststore.
Contributor


Just a nitpicking bit of feedback. We recently upgraded to 0.49.1, and one of the brokers got stuck crashlooping with errors about corrupted PEM files. It turned out we were still using PKCS#1 keys, while Kafka's native PEM support only accepts PKCS#8.

I'm not sure whether it's worth a note in the release notes or upgrade tips, but I wanted to share my experience and findings here in case it matters, or in case anyone else gets trapped by PKCS#1.
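For anyone hitting the same trap: a PKCS#1 ("traditional" OpenSSL) key announces itself with a `BEGIN RSA PRIVATE KEY` header, while the PKCS#8 encoding Kafka's PEM support expects starts with `BEGIN PRIVATE KEY`. A minimal shell sketch of detecting and converting such a key; the file names are placeholders for the real listener key material, not anything Strimzi generates:

```shell
# Generate a demo RSA key, then ensure it is PKCS#8-encoded.
# (OpenSSL 1.1.1 emits PKCS#1 by default; OpenSSL 3 already emits PKCS#8.)
cd "$(mktemp -d)"
openssl genrsa -out key.pem 2048 2>/dev/null

if grep -q "BEGIN RSA PRIVATE KEY" key.pem; then
  # PKCS#1 key: re-encode as unencrypted PKCS#8
  openssl pkcs8 -topk8 -nocrypt -in key.pem -out key-pkcs8.pem
else
  cp key.pem key-pkcs8.pem   # already PKCS#8, keep as-is
fi

head -1 key-pkcs8.pem   # a PKCS#8 key starts with "-----BEGIN PRIVATE KEY-----"
```

Checking the header before feeding a custom key to a PEM-configured listener is a cheap way to avoid the crashloop described above.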
