Use PEM certificates loaded from secrets for Kafka #11447
scholzj merged 2 commits into strimzi:main
Conversation
Force-pushed from e5eaea1 to eb7a6d6
Force-pushed from eb7a6d6 to 8ffd37f
@ppatierno @katheris can you please review this PR when you get a chance? Thank you :)
Force-pushed from 8ffd37f to 91eaf20
Force-pushed from 3d7de64 to 3363f8a
Force-pushed from f67eec1 to 595b61b
Thank you so much, @ppatierno, for reviewing the PR. I have now addressed your comments. Could you also please kick off the regression tests?

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).
katheris left a comment:
The changes look pretty good to me; I just had a couple of questions and suggestions that I added.
@tinaselenge I restarted the failed regression tests. I'm not sure whether they were related to the PR, but there were quite a few. Let's see the next run.

They failed even for the previous runs, so I guess they are related to the PR.

Yes, they are definitely related, as they failed locally for me as well. I fixed the OAuth-related failures but am still trying to fix some failures in ListenersST, which tests listeners with custom certificates. I will update the PR once I have it passing locally.

/azp run regression

Azure Pipelines successfully started running 1 pipeline(s).
Force-pushed from bfede7c to 8ae072a
     */
    @SuppressWarnings("deprecation") // OAuth authentication is deprecated
-   private void configureAuthentication(String listenerName, List<String> securityProtocol, boolean tls, KafkaListenerAuthentication auth) {
+   private void configureAuthentication(String listenerName, List<String> securityProtocol, boolean tls, KafkaListenerAuthentication auth, String clusterName) {
Did you consider storing the cluster name at the object level, given it seems to be needed all around the place now? What happens if the Secret is deleted, or if the fields inside it are renamed, and the broker Pod restarts (not through the operator but for some other reason)?
> Did you consider storing the cluster name at the object level, given it seems to be needed all around the place now?

I did, but the cluster name seems to be passed into most of the with* methods, so if we made it an object-level field, I would have to refactor all of those methods. Maybe that should be done in a separate PR?

> What happens if the Secret is deleted, or if the fields inside it are renamed, and the broker Pod restarts (not through the operator but for some other reason)?

Pods will restart, but brokers would fail to authenticate clients, I guess? Don't we have a similar risk today, though? We generate the P12 files from the volume-mounted Secrets with specific fields. If a broker Pod restarts but the Secret does not exist, the Pod would not start, or if the *.crt field does not exist, it would not find the volume-mounted file to generate the P12 files.
> Pods will restart, but brokers would fail to authenticate clients, I guess? Don't we have a similar risk today, though? We generate the P12 files from the volume-mounted Secrets with specific fields. If a broker Pod restarts but the Secret does not exist, the Pod would not start, or if the *.crt field does not exist, it would not find the volume-mounted file to generate the P12 files.

Does it fail the clients, or does it make the brokers crash-loop because the initialization fails? I think those are two different outcomes.
You are right that today the broker would end up Pending, I guess. But that does not mean we cannot improve on it.
Also, I think I added the second part to the wrong comment; it should probably have been added to the one about copying the custom server certificates.
I think the broker will go into a crash loop because it fails to initialise, since the Kubernetes config provider will run and fail to fetch the custom cert Secret or its field if they are missing. So do we think copying the custom cert Secrets into our internal Secret would help us in case the Secret is deleted or its field has changed?

If we copy them into the existing internal per-broker Secret, I wonder how we should reconcile it. We would append the key and cert with their original field names, as some listeners might still use the internal per-broker cert. If a field has changed, do we keep appending the new one and then remove the old field at some point?

I do agree that we should improve on this, but as the PR is already quite big, I wonder if we should tackle it in a separate PR with more discussion, unless people think this is a blocker for this PR.
You would need to use a separate field for it. But I'm not sure the CA reconciliation would not wipe it out. Maybe the easiest thing would be to keep it as is and open a new issue for this? We can probably get back to it later and think about how best to fix it. And it would not block this PR any further.
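For readers following this thread: the failure mode being debated comes from the broker resolving listener PEM certificates directly from Secrets at startup via a config provider. Below is a minimal sketch of what such broker configuration can look like with Strimzi's Kubernetes config provider; the namespace `myproject`, Secret name `my-cluster-custom-cert`, and listener name `external-9094` are hypothetical, chosen only for illustration and not taken from this PR:

```properties
# Resolve configuration values from Kubernetes Secrets at broker startup
config.providers=secrets
config.providers.secrets.class=io.strimzi.kafka.KubernetesSecretConfigProvider

# PEM keystore for the external listener, fetched from a (hypothetical) Secret
# named my-cluster-custom-cert in namespace myproject. If the Secret or the
# referenced field is missing, resolution fails during initialisation.
listener.name.external-9094.ssl.keystore.type=PEM
listener.name.external-9094.ssl.keystore.key=${secrets:myproject/my-cluster-custom-cert:tls.key}
listener.name.external-9094.ssl.keystore.certificate.chain=${secrets:myproject/my-cluster-custom-cert:tls.crt}
```

With a setup like this, a deleted Secret or renamed field makes config resolution throw before the listener is ever bound, so the broker crash-loops rather than merely failing client TLS handshakes, which matches the crash-loop outcome described in the discussion.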
Force-pushed from bfeb368 to 8985df4
/gha run pipeline=upgrade,regression

⏳ System test verification started: link. The following 10 job(s) will be executed. Tests will start after successful build completion.
@strimzi/system-test-contributors Any chance you can run STs for this on some FIPS cluster?

❌ System test verification failed: link

Looks like the system test failed due to a flaky test that is unrelated to this PR. The flaky test was fixed by #11986. Should we kick off the tests again?

I'm not sure we need to re-run them, as we know the failure is unrelated. Let's see if @ppatierno has any more comments. Maybe we can re-run them afterwards.
/gha run pipeline=regression

⏳ System test verification started: link. The following 6 job(s) will be executed. Tests will start after successful build completion.

@tinaselenge I re-ran the regression pipeline but just noticed there is a conflict to resolve on the CHANGELOG. Of course, it won't have an impact on the test results.

❌ System test verification failed: link
Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>
Force-pushed from 7cf2aba to da62b8e
Not sure why an ST failed, but it passes when I run it locally. Can we please kick off the tests again? I have only rebased and updated the CHANGELOG.md since the last successful STs.

I think it failed because the PR was not rebased when Paolo started it 🙄.

/gha run pipeline=regression

⏳ System test verification started: link. The following 6 job(s) will be executed. Tests will start after successful build completion.

❌ System test verification failed: link

🎉 System test verification passed: link
If you want to deploy and run the Heartbeat connector, you can use separate `KafkaConnect` and `KafkaConnector` custom resources.
* The `.spec.build.output.additionalKanikoOptions` field in the `KafkaConnect` custom resource is deprecated and will be removed in the future.
  * Use the `.spec.build.output.additionalBuildOptions` field instead.
* Kafka nodes are now configured with PEM certificates instead of P12/JKS for the keystore and truststore.
Just a nitpicky piece of feedback: we recently upgraded to 0.49.1, and one of the brokers got stuck crash-looping with errors about corrupted PEM files. It turned out that we were still using PKCS#1, while Kafka's native PEM support only accepts PKCS#8.

I'm not sure if it's worth a note in the release notes or upgrade tips. However, I wanted to share my experience and findings here in case it matters, or in case anyone else gets caught out by PKCS#1.
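To make the PKCS#1 pitfall above concrete, here is a small sketch (not part of this PR) that checks which format a PEM private key is in before it goes into a Secret. It only inspects the standard PEM header markers; the `openssl` command in the trailing comment is the usual way to rewrap a PKCS#1 key as PKCS#8:

```python
def pem_key_format(pem_text: str) -> str:
    """Classify a PEM private key as 'pkcs1', 'pkcs8', 'pkcs8-encrypted', or 'unknown'."""
    if "-----BEGIN RSA PRIVATE KEY-----" in pem_text:
        return "pkcs1"            # traditional OpenSSL format; rejected by Kafka's PEM support
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return "pkcs8-encrypted"  # PKCS#8, but password-protected
    if "-----BEGIN PRIVATE KEY-----" in pem_text:
        return "pkcs8"            # unencrypted PKCS#8, what Kafka's PEM keystore expects
    return "unknown"

# A PKCS#1 key can be rewrapped to unencrypted PKCS#8 with OpenSSL before
# storing it in the Secret, e.g.:
#   openssl pkcs8 -topk8 -nocrypt -in tls.key -out tls-pkcs8.key
```

A quick check like this in a pre-upgrade script would have surfaced the mismatch before the broker ever restarted with the new PEM-based configuration.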
Type of change
Description
Resolves part of #11294
Checklist
Please go through this checklist and make sure all applicable tasks have been done