Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: merge apache kafka trunk (#1030)
* KAFKA-16305: Avoid optimisation in handshakeUnwrap (#15434) Performs additional unwrap during handshake after data from client is processed to support openssl, which needs the extra unwrap to complete handshake. Reviewers: Ismael Juma <[email protected]>, Rajini Sivaram <[email protected]> * KAFKA-16116: Rebalance Metrics for AsyncKafkaConsumer (#15339) Adding the following rebalance metrics to the consumer: rebalance-latency-avg rebalance-latency-max rebalance-latency-total rebalance-rate-per-hour rebalance-total failed-rebalance-rate-per-hour failed-rebalance-total Due to the difference in protocol, we need to redefine when rebalance starts and ends. Start of Rebalance: Current: Right before sending out JoinGroup ConsumerGroup: When the client receives assignments from the HB End of Rebalance - Successful Case: Current: Receiving SyncGroup request after transitioning to "COMPLETING_REBALANCE" ConsumerGroup: After completing reconciliation and right before sending out "Ack" heartbeat End of Rebalance - Failed Case: Current: Any failure in the JoinGroup/SyncGroup response ConsumerGroup: Failure in the heartbeat Note: Afterall, we try to be consistent with the current protocol. Rebalances start and end with sending and receiving network requests. Failures in network requests signify the user failures in rebalance. And it is entirely possible to have multiple failures before having a successful one. Reviewers: Lucas Brutschy <[email protected]> * MINOR: Optimize EventAccumulator (#15430) `poll(long timeout, TimeUnit unit)` is either used with `Long.MAX_VALUE` or `0`. This patch replaces it with `poll` and `take`. It removes the `awaitNanos` usage. Reviewers: Jeff Kim <[email protected]>, Justine Olshan <[email protected]> * MINOR: Remove the space between two words (#15439) Remove the space between two words Reviewers: Luke Chen <[email protected]> * KAFKA-16154: Broker returns offset for LATEST_TIERED_TIMESTAMP (#15213) This is the first part of the implementation of KIP-1005 The purpose of this pull request is for the broker to start returning the correct offset when it receives a -5 as a timestamp in a ListOffsets API request Reviewers: Luke Chen <[email protected]>, Kamal Chandraprakash <[email protected]>, Satish Duggana <[email protected]> * KAFKA-15462: Add Group Type Filter for List Group to the Admin Client (#15150) In KIP-848, we introduce the notion of Group Types based on the protocol type that the members in the consumer group use. As of now we support two types of groups: * Classic : Members use the classic consumer group protocol ( existing one ) * Consumer : Members use the consumer group protocol introduced in KIP-848. Currently List Groups allows users to list all the consumer groups available. KIP-518 introduced filtering the consumer groups by the state that they are in. We now want to allow users to filter consumer groups by type. This patch includes the changes to the admin client and related files. It also includes changes to parameterize the tests to include permutations of the old GC and the new GC with the different protocol types. Reviewers: David Jacot <[email protected]> * KAFKA-16191: Clean up of consumer client internal events (#15438) There are a few minor issues with the event sub-classes in the org.apache.kafka.clients.consumer.internals.events package that should be cleaned up: - Update the names of subclasses to remove "Application" or "Background" - Make toString() final in the base classes and clean up the implementations of toStringBase() - Fix minor whitespace inconsistencies - Make variable/method names consistent Reviewer: Bruno Cadonna <[email protected]> * MINOR: Fix UpdatedImage and HighWatermarkUpdated events' logs (#15432) I have noticed the following log when a __consumer_offsets partition immigrate from a broker. It appends because the event is queued up after the event that unloads the state machine. This patch fixes it and fixes another similar one. ``` [2024-02-06 17:14:51,359] ERROR [GroupCoordinator id=1] Execution of UpdateImage(tp=__consumer_offsets-28, offset=13251) failed due to This is not the correct coordinator.. (org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime) org.apache.kafka.common.errors.NotCoordinatorException: This is not the correct coordinator. ``` Reviewers: Justine Olshan <[email protected]> * KAFKA-16167: Disable wakeups during autocommit on close (#15445) When the consumer is closed, we perform a sychronous autocommit. We don't want to be woken up here, because we are already executing a close operation under a deadline. This is in line with the behavior of the old consumer. This fixes PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup which is flaky on trunk - because we return immediately from the synchronous commit with a WakeupException, which causes us to not wait for the commit to finish and thereby sometimes miss the committed offset when a new consumer is created. Reviewers: Matthias J. Sax <[email protected]>, Bruno Cadonna <[email protected]> * KAFKA-16261: updateSubscription fails if already empty subscription (#15440) The internal SubscriptionState object keeps track of whether the assignment is user-assigned, or auto-assigned. If there are no assigned partitions, the assignment resets to NONE. If you call SubscriptionState.assignFromSubscribed in this state it fails. This change makes sure to check SubscriptionState.hasAutoAssignedPartitions() so that assignFromSubscribed is going to be permitted. Also, a minor refactoring to make clearing the subscription a bit easier to follow in MembershipManagerImpl. Testing via new unit test. Reviewers: Bruno Cadonna <[email protected]>, Andrew Schofield <[email protected]> * KAFKA-15878: KIP-768 - Extend support for opaque (i.e. non-JWT) tokens in SASL/OAUTHBEARER (#14818) # Overview * This change pertains to [SASL/OAUTHBEARER ](https://kafka.apache.org/documentation/#security_sasl_oauthbearer) mechanism of Kafka authentication. * Kafka clients can use [SASL/OAUTHBEARER ](https://kafka.apache.org/documentation/#security_sasl_oauthbearer) mechanism by overriding the [custom call back handlers](https://kafka.apache.org/documentation/#security_sasl_oauthbearer_prod) . * [KIP-768](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575) available from v3.1 further extends the mechanism with a production grade implementation. * Kafka's [SASL/OAUTHBEARER ](https://kafka.apache.org/documentation/#security_sasl_oauthbearer) mechanism currently **rejects the non-JWT (i.e. opaque) tokens**. This is because of a more restrictive set of characters than what [RFC-6750](https://datatracker.ietf.org/doc/html/rfc6750#section-2.1) recommends. * This JIRA can be considered an extension of [KIP-768](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575) to support the opaque tokens as well apart from the JWT tokens. # Solution * Have updated the regex in the the offending class to be compliant with the [RFC-6750](https://datatracker.ietf.org/doc/html/rfc6750#section-2.1) * Have provided a supporting test case that includes the possible character set defined in [RFC-6750](https://datatracker.ietf.org/doc/html/rfc6750#section-2.1) --------- Co-authored-by: Anuj Sharma <[email protected]> Co-authored-by: Jamie Holmes <[email protected]> Co-authored-by: Christopher Webb <[email protected]> Reviewers: Manikumar Reddy <[email protected]>, Kirk True <[email protected]> * MINOR: Upgrade jqwik to version 1.8.3 (#14365) This minor pull request consist of upgrading version of jqwik library to version 1.8.0 that brings some bug fixing and some enhancements, upgrading the version now will make future upgrades easier For breaking changes: We are not using ArbitraryConfiguratorBase, so there is no overriding of configure method We are not using TypeUsage.canBeAssignedTo(TypeUsage) No breaking is related to @Provide and @ForAll usage no Exception CannotFindArbitraryException is thrown during tests running No usage of StringArbitrary.repeatChars(0.0) We are not affected by the removal of method TypeArbitrary.use(Executable) We are not affected by the removal or methods ActionChainArbitrary.addAction(action) and ActionChainArbitrary.addAction(weight, action) For more details check the release notes: https://jqwik.net/release-notes.html#180 Reviewers: Chia-Ping Tsai <[email protected]>, Yash Mayya <[email protected]> * MINOR: fix link for ListTransactionsOptions#filterOnDuration (#15459) Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: fix SessionStore java doc (#15412) Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: Remove unnecessary easymock/powermock dependencies (#15460) These projects don't actually use easymock/powermock. Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-15625: Do not flush global state store at each commit (#15361) Global state stores are currently flushed at each commit, which may impact performance, especially for EOS (commit each 200ms). The goal of this improvement is to flush global state stores only when the delta between the current offset and the last checkpointed offset exceeds a threshold. This is the same logic we apply on local state store, with a threshold of 10000 records. The implementation only flushes if the time interval elapsed and the threshold of 10000 records is exceeded. Reviewers: Jeff Kim <[email protected]>, Bruno Cadonna <[email protected]> * MINOR: Updating comments to match the code (#15388) This comment was added by #12862 The method with the comment was originally named updateLastSend, but its name was later changed to onSendAttempt. This method doesn't increment numAttempts. It seems that the numAttempts is only modified after a Request succeeds or fails. Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16285: Make group metadata available when a new assignment is set (#15426) Currently, in the async Kafka consumer updates to the group metadata that are received by the heartbeat are propagated to the application thread in form of an event. Group metadata is updated when a new assignment is received. The new assignment is directly set in the subscription without sending an update event from the background thread to the application thread. That means that there might be a delay between the application thread being aware of the update to the assignment and the application thread being aware of the update to the group metadata. This delay can cause stale group metadata returned by the application thread that then causes issues when data of the new assignment is committed. A concrete example is producer.sendOffsetsToTransaction(offsetsToCommit, groupMetadata) The offsets to commit might already stem from the new assignment but the group metadata might relate to the previous assignment. Reviewers: Kirk True <[email protected]>, Andrew Schofield <[email protected]>, Lucas Brutschy <[email protected]> * MINOR: simplify ensure topic exists condition (#15458) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-14747: record discarded FK join subscription responses (#15395) A foreign-key-join might drop a "subscription response" message, if the value-hash changed. This PR adds support to record such event via the existing "dropped records" sensor. Reviewers: Matthias J. Sax <[email protected]> * KAFKA-16288, KAFKA-16289: Fix Values convertToDecimal exception and parseString corruption (#15399) * KAFKA-16288: Prevent ClassCastExceptions for strings in Values.convertToDecimal * KAFKA-16289: Values inferred schemas for map and arrays should ignore element order Signed-off-by: Greg Harris <[email protected]> Reviewers: Chris Egerton <[email protected]> * KAFKA-16169: FencedException in commitAsync not propagated without callback (#15437) The javadocs for commitAsync() (w/o callback) say: @throws org.apache.kafka.common.errors.FencedInstanceIdException if this consumer instance gets fenced by broker. If no callback is passed into commitAsync(), no offset commit callback invocation is submitted. However, we only check for a FencedInstanceIdException when we execute a callback. When the consumer gets fenced by another consumer with the same group.instance.id, and we do not use a callback, we miss the exception. This change modifies the behavior to propagate the FencedInstanceIdException even if no callback is used. The code is kept very similar to the original consumer. We also change the order - first try to throw the fenced exception, then execute callbacks. That is the order in the original consumer so it's safer to keep it this way. For testing, we add a unit test that verifies that the FencedInstanceIdException is thrown in that case. Reviewers: Philip Nee <[email protected]>, Matthias J. Sax <[email protected]> * KAFKA-14588 Log cleaner configuration move to CleanerConfig (#15387) In order to move ConfigCommand to tools we must move all it's dependencies which includes KafkaConfig and other core classes to java. This PR moves log cleaner configuration to CleanerConfig class of storage module. Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: parameterize group-id in GroupMetadataManagerTestContext (#15467) This pr parameterize some group ids in GroupMetadataManagerTestContext that are now constant strings. Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: remove test constructor for PartitionAssignment (#15435) Remove the test constructor for PartitionAssignment and remove the TODO. Also add KRaftClusterTest.testCreatePartitions to get more coverage for createPartitions. Reviewers: David Arthur <[email protected]>, Chia-Ping Tsai <[email protected]> * MINOR: Remove controlPlaneRequestProcessor in BrokerServer (#15245) It seems likely that BrokerServer was built upon the KafkaServer codebase.(#10113) KafkaServer, using Zookeeper, separates controlPlane and dataPlane to implement KIP-291. In KRaft, the roles of DataPlane and ControlPlane in KafkaServer seem to be divided into BrokerServer and ControllerServer. It appears that the initial implementation of BrokerServer initialized and used the controlPlaneRequestProcessor, but it seems to have been removed, except for the code used in the shutdown method, through subsequent modifications.(#10931) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16209 : fetchSnapshot might return null if topic is created before v2.8 (#15444) Change the function with a better way to deal with the NULL pointer exception. Reviewers: Luke Chen <[email protected]> * KAFKA-15417: flip joinSpuriousLookBackTimeMs and emit non-joined items (#14426) Kafka Streams support asymmetric join windows. Depending on the window configuration we need to compute window close time etc differently. This PR flips `joinSpuriousLookBackTimeMs`, because they were not correct, and introduced the `windowsAfterIntervalMs`-field that is used to find if emitting records can be skipped. Reviewers: Hao Li <[email protected]>, Guozhang Wang <[email protected]>, Matthias J. Sax <[email protected]> * KAFKA-16347: Upgrade zookeeper 3.8.3 -> 3.8.4 (#15480) Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-14589 [3/4] Tests of ConsoleGroupCommand rewritten in java (#15365) Is contains some of ConsoleGroupCommand tests rewritten in java. Intention of separate PR is to reduce changes and simplify review. Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16252: Fix the documentation and adjust the format (#15473) Currently, there are few document files generated automatically like the task genConnectMetricsDocs However, the unwanted log information also added into it. And the format is not aligned with other which has Mbean located of the third column. I modified the code logic so the format could follow other section in ops.html Also close the log since we take everything from the std as a documentation Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16322 upgrade jline from 3.22.0 to 3.25.1 (#15464) An issue in the component "GroovyEngine.execute" of jline-groovy versions through 3.24.1 allows attackers to cause an OOM (OutofMemory) error. Please refer to https://devhub.checkmarx.com/cve-details/CVE-2023-50572 for more details Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-15797: Fix flaky EOS_v2 upgrade test (#15449) Originally, we set commit-interval to MAX_VALUE for this test, to ensure we only commit expliclity. However, we needed to decrease it later on when adding the tx-timeout verification. We did see failing test for which commit-interval hit, resulting in failing test runs. This PR increase the commit-interval close to test-timeout to avoid commit-interval from triggering. Reviewers: Bruno Cadonna <[email protected]> * KAFKA-14683: Migrate WorkerSinkTaskTest to Mockito (3/3) (#15316) Reviewers: Greg Harris <[email protected]> * MINOR: Add 3.7 to Kafka Streams system tests (#15443) Reviewers: Bruno Cadonna <[email protected]> * KAFKA-14589 [2/4] Tests of ConsoleGroupCommand rewritten in java (#15363) This PR is part of #14471 It contains some of ConsoleGroupCommand tests rewritten in java. Intention of separate PR is to reduce changes and simplify review. Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16246: Cleanups in ConsoleConsumer (#15457) Reviewers: Mickael Maison <[email protected]>, Omnia Ibrahim <[email protected]> * KAFKA-14133: Move consumer mock in TaskManagerTest to Mockito - part 2 (#15261) The previous pull request in this series was #15112. This pull request continues the migration of the consumer mock in TaskManagerTest test by test for easier reviews. I envision there will be at least 1 more pull request to clean things up. For example, all calls to taskManager.setMainConsumer should be removed. Reviewer: Bruno Cadonna <[email protected]> * KAFKA-16100: Add timeout to all the CompletableApplicationEvents (#15455) This is part of the larger task of enforcing the timeouts for application events, per KAFKA-15974. This takes a first step by adding a Timer to all of the CompletableApplicationEvent subclasses. For the few classes that already included a timeout, this refactors them to use the Timer mechanism instead. Reviewers: Andrew Schofield <[email protected]>, Bruno Cadonna <[email protected]> * MINOR: Add 3.7.0 to core and client's upgrade compatibility tests (#15452) Reviewers: Mickael Maison <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16319: Divide DeleteTopics requests by leader node (#15479) Reviewers: Reviewers: Mickael Maison <[email protected]>, Kirk True <[email protected]>, Daniel Gospodinow <[email protected]> * MINOR: Add read/write all operation (#15462) There are a few cases in the group coordinator service where we want to read from or write to each of the known coordinators (each of __consumer_offsets partitions). The current implementation needs to get the list of the known coordinators then schedules the operation and finally aggregate the results. This patch is an attempt to streamline this by adding multi read/write to the runtime. Reviewers: Omnia Ibrahim <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-15964: fix flaky StreamsAssignmentScaleTest (#15485) This PR bumps some timeouts due to slow Jenkins builds. Reviewers: Bruno Cadonna <[email protected]> * MINOR: Use INFO logging for tools tests (#15487) Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16202 Extra dot in error message in producer (#15296) The author of KAFKA-16202 noticed that there is an extra dot in the error message for KafkaStorageException message. Looking into org.apache.kafka.clients.producer.internals.Sender, it turns out that the string for the message to be sent in completeBatch() added an extra dot. I think that the formatted component (error.exception(response.errorMessage).toString())) of the error message already has a dot in the end of its string. Thus the dot after the "{}" sign caused the extra dot. Reviewers: "Gyeongwon, Do" <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16325 Add missing producer metrics to documentatio (#15466) Add `buffer-exhausted-rate`, `buffer-exhausted-total`, `bufferpool-wait-ratio` and `metadata-wait-time-ns-total` Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: Reduce memory allocation in ClientTelemetryReporter (#15402) Reviewers: Divij Vaidya <[email protected]> * KAFKA-10892: Shared Readonly State Stores ( revisited ) (#12742) Implements KIP-813. Reviewers: Matthias J. Sax <[email protected]>, Walker Carlson <[email protected]> * KAFKA-14589 [4/4] Tests of ConsoleGroupCommand rewritten in java (#15465) Reviewers: Chia-Ping Tsai <[email protected]> * TRIVIAL: fix typo * HOTFIX: fix html markup * MINOR: Fix incorrect syntax for config (#15500) Fix incorrect syntax for config. Reviewers: Matthias J. Sax <[email protected]> * MINOR: remove the copy constructor of LogSegment (#15488) In the LogSegment, the copy constructor is only used in LogLoaderTest Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: Cleanup log.dirs in ReplicaManagerTest on JVM exit (#15289) - Scala TestUtils now delegates to the function in JTestUtils - The function is modified such that we delete the rootDir on JVM exit if it didn't exist prior to the function being invoked. Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]> * MINOR: change "inter.broker.protocol version" to inter.broker.protocol.version (#15504) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16146: Checkpoint log-start-offset for remote log enabled topics (#15201) The log-start-offset was not getting flushed to the checkpoint file due to the check where we compare the log-start-offset with the localLog first segment base offset only. This change makes sure that tiered storage enabled topics will always try to add their entries in the log-start-offset checkpoint file. Reviewers: Jun Rao <[email protected]>, Satish Duggana <[email protected]> * KAFKA-14133: Move consumer mock in TaskManagerTest to Mockito - part 3 (#15497) The previous pull request in this series was #15261. This pull request continues the migration of the consumer mock in TaskManagerTest test by test for easier reviews. The next pull request in the series will be #15254 which ought to complete the Mockito migration for the TaskManagerTest class Reviewer: Bruno Cadonna <[email protected]> * KAFKA-16227: Avoid IllegalStateException during fetch initialization (#15491) The AsyncKafkaConsumer might throw an IllegalStateException during the initialization of a new fetch. The exception is caused by the partition being unassigned by the background thread before the subscription state is accessed during initialisation. This commit avoids the IllegalStateException by verifying that the partition was not unassigned each time the subscription state is accessed. Reviewer: Lucas Brutschy <[email protected]> * MINOR: Tweak streams config doc (#15518) Reviewers: Matthias J. Sax <[email protected]> * MINOR: Resolve SSLContextFactory.getNeedClientAuth deprecation (#15468) Reviewers: Mickael Maison <[email protected]> * MINOR; Make string from array (#15526) If toString is called on an array it returns the string representing the object reference. Use mkString instead to print the content of the array. Reviewers: Luke Chen <[email protected]>, Justine Olshan <[email protected]>, Lingnan Liu <[email protected]> * MINOR: simplify consumer logic (#15519) For static member, the `group.instance.id` cannot change. Reviewers: Chia-Ping Tsai <[email protected]>, Lianet Magrans <[email protected]>, David Jacot <[email protected]> * MINOR: Kafka Streams docs fixes (#15517) - add missing section to TOC - add default value for client.id Reviewers: Lucas Brutschy <[email protected]>, Bruno Cadonna <[email protected]> * KAFKA-16249; Improve reconciliation state machine (#15364) This patch re-work the reconciliation state machine on the server side with the goal to fix a few issues that we have recently discovered. * When a member acknowledges the revocation of partitions (by not reporting them in the heartbeat), the current implementation may miss it. The issue is that the current implementation re-compute the assignment of a member whenever there is a new target assignment installed. When it happens, it does not consider the reported owned partitions at all. As the member is supposed to only report its own partitions when they change, the member is stuck. * Similarly, as the current assignment is re-computed whenever there is a new target assignment, the rebalance timeout, as it is currently implemented, becomes useless. The issue is that the rebalance timeout is reset whenever the member enters the revocation state. In other words, in the current implementation, the timer is reset when there are no target available even if the previous revocation is not completed yet. The patch fixes these two issues by not automatically recomputing the assignment of a member when a new target assignment is available. When the member must revoke partitions, the coordinator waits. Otherwise, it recomputes the next assignment. In other words, revoking is really blocking now. The patch also proposes to include an explicit state in the record. It makes the implementation cleaner and it also makes it more extensible in the future. The patch also changes the record format. This is a non-backward compatible change. I think that we should do this change to cleanup the record. As KIP-848 is only in early access in 3.7 and that we clearly state that we don't plane to support upgrade from it, this is acceptable in my opinion. Reviewers: Jeff Kim <[email protected]>, Justine Olshan <[email protected]> * KAFKA-13922: Adjustments for jacoco, coverage reporting (#11982) Jacoco and scoverage reporting hasn't been working for a while. This commit fixes report generation. After this PR only subproject level reports are generated as Jenkins and Sonar only cares about that. This PR doesn't change Kafka's Jenkinsfile. Reviewers: Viktor Somogyi-Vass <[email protected]> * MINOR: AddPartitionsToTxnManager performance optimizations (#15454) Reviewers: Mickael Maison <[email protected]>, Justine Olshan <[email protected]> * KAFKA-14683 Cleanup WorkerSinkTaskTest (#15506) 1) Rename WorkerSinkTaskMockitoTest back to WorkerSinkTaskTest 2) Tidy up the code a bit 3) rewrite "fail" by "assertThrow" Reviewers: Omnia Ibrahim <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16342 fix getOffsetByMaxTimestamp for compressed records (#15474) Fix getOffsetByMaxTimestamp for compressed records. This PR adds: 1) For inPlaceAssignment case, compute the correct offset for maxTimestamp when traversing the batch records, and set to ValidationResult in the end, instead of setting to last offset always. 2) For not inPlaceAssignment, set the offsetOfMaxTimestamp for the log create time, like non-compressed, and inPlaceAssignment cases, instead of setting to last offset always. 3) Add tests to verify the fix. Reviewers: Jun Rao <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-15206: Fix the flaky RemoteIndexCacheTest.testClose test (#15523) It is possible that due to resource constraint, ShutdownableThread#run might be called later than the ShutdownableThread#close method. Reviewers: Luke Chen <[email protected]>, Divij Vaidya <[email protected]> * MINOR: Update javadocs and exception string in "deprecated" ProcessorRecordContext#hashcode (#15508) This PR updates the javadocs for the "deprecated" hashCode() method of ProcessorRecordContext, as well as the UnsupportedOperationException thrown in its implementation, to actually explain why the class is mutable and therefore unsafe for use in hash collections. They now point out the mutable field in the class (namely the Headers) Reviewers: Matthias Sax <[email protected]>, Bruno Cadonna <[email protected]> * KAFKA-16358: Sort transformations by name in documentation; add missing transformations to documentation; add hyperlinks (#15499) Reviewers: Yash Mayya <[email protected]> * MINOR: Only enable replay methods to modify timeline data structure (#15528) The patch prevents the main method (the method generating records) from modifying the timeline data structure `groups` by calling `getOrMaybeCreateConsumerGroup` in kip-848 new group coordinator. Only replay methods are able to add the newly created group to `groups`. Reviewers: David Jacot <[email protected]> * KAFKA-16231: Update consumer_test.py to support KIP-848’s group protocol config (#15330) Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]> * MINOR: Cleanup BoundedList to Make Constructors More Safe (#15507) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16267: Update consumer_group_command_test.py to support KIP-848’s group protocol config (#15537) * KAFKA-16267: Update consumer_group_command_test.py to support KIP-848’s group protocol config Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢 Note: this requires #15330. * Update consumer_group_command_test.py Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16268: Update fetch_from_follower_test.py to support KIP-848’s group protocol config (#15539) Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16269: Update reassign_partitions_test.py to support KIP-848’s group protocol config (#15540) Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16270: Update snapshot_test.py to support KIP-848’s group protocol config (#15538) Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16190: Member should send full heartbeat when rejoining (#15401) When the consumer rejoins, heartbeat request builder make sure that all fields are sent in the heartbeat request. Reviewers: Lucas Brutschy <[email protected]> * MINOR: fix flaky EosIntegrationTest (#15494) Bumping some timeout due to slow Jenkins build. Reviewers: Bruno Cadonna <[email protected]> * MINOR: Remove unused client side assignor fields/classes (#15545) In https://github.com/apache/kafka/pull/15364, we introduced, thoughtfully, a non-backward compatible record change for the new consumer group protocol. So it is a good opportunity for cleaning unused fields, mainly related to the client side assignor logic which is not implemented yet. It is better to introduce them when we need them and more importantly when we implement it. Note that starting from 3.8, we won't make such changes anymore. Non-backward compatible changes are still acceptable now because we clearly said that upgrade won't be supported from the KIP-848 EA. Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16369: Broker may not shut down when SocketServer fails to bind as Address already in use (#15530) * KAFKA-16369: wait on enableRequestProcessingFuture Add a Wait in in KafkaServer (ZK mode) for all the SocketServer ports to be open, and the Acceptors to be started The BrokerServer (KRaft mode) had such a wait, which was missing from the KafkaServer (ZK mode). Add unit test. * KAFKA-16312, KAFKA-16185: Local epochs in reconciliation (#15511) The goal of this commit is to change the following internals of the reconciliation: - Introduce a "local epoch" to the local target assignment. When a new target is received by the server, we compare it with the current value. If it is the same, no change. Otherwise, we bump the local epoch and store the new target assignment. Then, on the reconciliation, we also store the epoch in the reconciled assignment and keep using target != current to trigger the reconciliation. - When we are not in a group (we have not received an assignment), we use null to represent the local target assignment instead of an empty list, to avoid confusions with an empty assignment received by the server. Similarly, we use null to represent the current assignment, when we haven't reconciled the assignment yet. We also carry the new epoch into the request builder to ensure that we report the owned partitions for the last local epoch. - To address KAFKA-16312 (call onPartitionsAssigned on empty assignments after joining), we apply the initial assignment returned by the group coordinator (whether empty or not) as a normal reconciliation. This avoids introducing another code path to trigger rebalance listeners - reconciliation is the only way to transition to STABLE. The unneeded parts of reconciliation (autocommit, revocation) will be skipped in the existing. Since a lot of unit tests assumed that not reconciliation behavior is invoked when joining the group with an empty assignment, this required a lot of the changes in the unit tests. Reviewers: Lianet Magrans <[email protected]>, David Jacot <[email protected]> * MINOR; Log reason for deleting a kraft snapshot (#15478) There are three reasons why KRaft would delete a snapshot. One, it is older than the retention time. Two, the total number of bytes between the log and the snapshot excess the configuration. Three, the latest snapshot is newer than the log. This change allows KRaft to log the exact reason why a snapshot is getting deleted. Reviewers: David Arthur <[email protected]>, Hailey Ni <[email protected]> * KAFKA-16352: Txn may get get stuck in PrepareCommit or PrepareAbort state (#15524) Now the removal of entries from the transactionsWithPendingMarkers map checks the value and all pending marker operations keep the value along with the operation state. This way, the pending marker operation can only delete the state it created and wouldn't accidentally delete the state from a different epoch (which could lead to "stuck" transactions). Reviewers: Justine Olshan <[email protected]> * KAFKA-16341 fix the LogValidator for non-compressed type (#15476) - Fix the verifying logic. If it's LOG_APPEND_TIME, we choose the offset of the first record. Else, we choose the record with the maxTimeStamp. - rename the shallowOffsetOfMaxTimestamp to offsetOfMaxTimestamp Reviewers: Jun Rao <[email protected]>, Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16367; Full ConsumerGroupHeartbeat response must be sent when full request is received (#15533) This patch fixes a bug in the logic which decides when a full ConsumerGroupHeartbeat response must be returned to the client. Prior to it, the logic only relies on the `ownedTopicPartitions` field to check whether the response was a full response. This is not enough because `ownedTopicPartitions` is also set in different situations. This patch changes the logic to check `ownedTopicPartitions`, `subscribedTopicNames` and `rebalanceTimeoutMs` as they are the only three non optional fields. Reviewers: Lianet Magrans <[email protected]>, Jeff Kim <[email protected]>, Justine Olshan <[email protected]> * KAFKA-12187 replace assertTrue(obj instanceof X) with assertInstanceOf (#15512) Reviewers: Chia-Ping Tsai <[email protected]> * MINOR: Update upgrade docs to refer 3.6.2 version * KAFKA-16222: desanitize entity name when migrate client quotas (#15481) The entity name is sanitized when it's in Zk mode. We didn't desanitize it when we migrate client quotas. Add Sanitizer.desanitize to fix it. Reviewers: Luke Chen <[email protected]> * KAFKA-14589 ConsumerGroupCommand rewritten in java (#14471) This PR contains changes to rewrite ConsumerGroupCommand in java and transfer it to tools module Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16313: Offline group protocol migration (#15546) This patch enables an empty classic group to be automatically converted to a new consumer group and vice versa. Reviewers: David Jacot <[email protected]> * KAFKA-16392: Stop emitting warning log message when parsing source connector offsets with null partitions (#15562) Reviewers: Yash Mayya <[email protected]> * MINOR : Removed the depreciated information about Zk to Kraft migration. (#15552) Reviewers: Luke Chen <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16318 : add javafoc for kafka metric (#15483) Add the javadoc for KafkaMetric Reviewers: Mickael Maison <[email protected]>, Chia-Ping Tsai <[email protected]> * KAFKA-16206: Fix unnecessary topic config deletion during ZK migration (#14206) Reviewers: Mickael Maison <[email protected]>, Ron Dagostino <[email protected]> * KAFKA-16273: Update consumer_bench_test.py to use consumer group protocol (#15548) Adding this as part of the greater effort to modify the system tests to incorporate the use of consumer group protocol from KIP-848. Following is the test results and the tests using protocol = consumer are expected to fail: ================================================================================ SESSION REPORT (ALL TESTS) ducktape version: 0.11.4 session_id: 2024-03-16--002 run time: 76 minutes 36.150 seconds tests run: 28 passed: 25 flaky: 0 failed: 3 ignored: 0 ================================================================================ Reviewers: Lucas Brutschy <[email protected]>, Kirk True <[email protected]> * MINOR: KRaft upgrade tests should only use latest stable mv (#15566) This should help us avoid testing MVs before they are usable (stable). We revert back from testing 3.8 in this case since 3.7 is the current stable version. Reviewers: Proven Provenzano <[email protected]>, Justine Olshan <[email protected]> * KAFKA-14133: Move stateDirectory mock in TaskManagerTest to Mockito (#15254) This pull requests migrates the StateDirectory mock in TaskManagerTest from EasyMock to Mockito. The change is restricted to a single mock to minimize the scope and make it easier for review. Reviewers: Ismael Juma <[email protected]>, Bruno Cadonna <[email protected]> * KAFKA-16271: Upgrade consumer_rolling_upgrade_test.py (#15578) Upgrading the test to use the consumer group protocol. The two tests are failing due to Mismatch Assignment Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16274: Update replica_scale_test.py to support KIP-848’s group protocol config (#15577) Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16276: Update transactions_test.py to support KIP-848’s group protocol config (#15567) Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢 Reviewers: Lucas Brutschy <[email protected]> * KAFKA-16314: Introducing the AbortableTransactionException (#15486) As a part of KIP-890, we are introducing a new class of Exceptions which when encountered shall lead to Aborting the ongoing Transaction. The following PR introduces the same with client side handling and server side changes. On client Side, the code attempts to handle the exception as an Abortable error and ensure that it doesn't take the producer to a fatal state. For each of the Transactional APIs, we have added the appropriate handling. For the produce request, we have verified that the exception transitions the state to Aborted. On the server side, we have bumped the ProduceRequest, ProduceResponse, TxnOffestCommitRequest and TxnOffsetCommitResponse Version. The appropriate handling on the server side has been added to ensure that the new error case is sent back only for the new clients. The older clients will continue to get the old Invalid_txn_state exception to maintain backward compatibility. Reviewers: Calvin Liu <[email protected]>, Justine Olshan <[email protected]> * KAFKA-16381 use volatile to guarantee KafkaMetric#config visibility across threads (#15550) Reviewers: vamossagar12 <[email protected]>, Chia-Ping Tsai <[email protected]> * MINOR: Tuple2 replaced with Map.Entry (#15560) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16388 add production-ready test of 3.3 - 3.6 release to MetadataVersionTest.testFromVersionString (#15563) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16408 kafka-get-offsets / GetOffsetShell doesn't handle --version or --help (#15583) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16410 kafka-leader-election / LeaderElectionCommand doesn't set exit code on error (#15591) Reviewers: Chia-Ping Tsai <[email protected]> * KAFKA-16374; High watermark updates should have a higher priority (#15534) When the group coordinator is under heavy load, the current mechanism to release pending events based on updated high watermark, which consist in pushing an event at the end of the queue, is bad because pending events pay the cost of the queue twice. A first time for the handling of the first event and a second time for the handling of the hwm update. This patch changes this logic to push the hwm update event to the front of the queue in order to release pending events as soon as as possible. Reviewers: Jeff Kim <[email protected]>, Justine Olshan <[email protected]> * KAFKA-15882: Add nightly docker image scan job (#15013) Reviewers: Mickael Maison <[email protected]> * KAFKA-16375: Fix for rejoin while reconciling (#15579) This PR includes a fix to properly identify a reconciliation that should be interrupted and not applied because the member has rejoined. It does so simply based on a flag (not epochs, server or local). If the member has rejoined while reconciling, the reconciliation will be interrupted. This also ensures that the check to abort the reconciliation is performed on all the 3 stages of the reconciliation that could be delayed: commit, onPartitionsRevoked, onPartitionsAssigned. Reviewers: David Jacot <[email protected]>, Lucas Brutschy <[email protected]> * KAFKA-16406: Splitting consumer integration test (#15535) Splitting consumer integration tests to allow for parallelization and reduce build times. This PR is only extracting tests from PlainTextConsumerTest into separate files, no changes in logic. Grouping tests by the feature they relate to so that they can be easily found Reviewers: Andrew Schofield <[email protected]>, Lucas Brutschy <[email protected]> * KAFKA-15950: Serialize heartbeat requests (#14903) In between HeartbeatRequest being sent and the response being handled, i.e. while a HeartbeatRequest is in flight, an extra request may be immediately scheduled if propagateDirectoryFailure, setReadyToUnfence, or beginControlledShutdown is called. To prevent the extra request, we can avoid the extra requests by checking whether a request is in flight, and delay the scheduling if necessary. Some of the tests in BrokerLifecycleManagerTest are also improved to remove race conditions and reduce flakiness. Reviewers: Colin McCabe <[email protected]>, Ron Dagostino <[email protected]>, Jun Rao <[email protected]> * KAFKA-16224: Do not retry committing if topic or partition deleted (#15581) Current logic for auto-committing offsets when partitions are revoked will retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the member not completing the revocation in time. This commit considers error UNKNOWN_TOPIC_OR_PARTITION to be fatal in the context of an auto-commit of offsets before a revocation, even though the error is defined as retriable. This ensures that the revocation can finish in time. Reviewers: Andrew Schofield <[email protected]>, Lucas Brutschy <[email protected]>, Lianet Magrans <[email protected]> * KAFKA-16386: Convert NETWORK_EXCEPTIONs from KIP-890 transaction verification (#15559) KIP-890 Part 1 introduced verification of transactions with the transaction coordinator on the `Produce` and `TxnOffsetCommit` paths. This introduced the possibility of new errors when responding to those requests. For backwards compatibility with older clients, a choice was made to convert some of the new retriable errors to existing errors that are expected and retried correctly by older clients. `NETWORK_EXCEPTION` was forgotten about and not converted, but can occur if, for example, the transaction coordinator is temporarily refusing connections. Now, we convert it to: * `NOT_ENOUGH_REPLICAS` on the `Produce` path, just like the other retriable errors that can arise from transaction verification. * `COORDINATOR_LOAD_IN_PROGRESS` on the `TxnOffsetCommit` path. This error does not force coordinator lookup on clients, unlike `COORDINATOR_NOT_AVAILABLE`. Note that this deviates from KIP-890, which says that retriable errors should be converted to `COORDINATOR_NOT_AVAILABLE`. Reviewers: Artem Livshits <[email protected]>, David Jacot <[email protected]>, Justine Olshan <[email protected]> * KAFKA-16409: DeleteRecordsCommand should use standard exception handling (#15586) DeleteRecordsCommand should use standard exception handling Reviewers: Luke Chen <[email protected]> * KAFKA-16415 Fix handling of "--version" option in ConsumerGroupCommand (#15592) Reviewers: Chia-Ping Tsai <[email protected]> * fix(test): fix ElasticUnifiedLog test Signed-off-by: Robin Han <[email protected]> --------- Signed-off-by: Greg Harris <[email protected]> Signed-off-by: Robin Han <[email protected]> Co-authored-by: Gaurav Narula <[email protected]> Co-authored-by: Philip Nee <[email protected]> Co-authored-by: David Jacot <[email protected]> Co-authored-by: John Yu <[email protected]> Co-authored-by: Christo Lolov <[email protected]> Co-authored-by: Ritika Reddy <[email protected]> Co-authored-by: Kirk True <[email protected]> Co-authored-by: Lucas Brutschy <[email protected]> Co-authored-by: Jamie <[email protected]> Co-authored-by: Anuj Sharma <[email protected]> Co-authored-by: Jamie Holmes <[email protected]> Co-authored-by: Christopher Webb <[email protected]> Co-authored-by: Said Boudjelda <[email protected]> Co-authored-by: PoAn Yang <[email protected]> Co-authored-by: Ayoub Omari <[email protected]> Co-authored-by: Ismael Juma <[email protected]> Co-authored-by: Gyeongwon, Do <[email protected]> Co-authored-by: Bruno Cadonna <[email protected]> Co-authored-by: Greg Harris <[email protected]> Co-authored-by: Nikolay <[email protected]> Co-authored-by: Dongnuo Lyu <[email protected]> Co-authored-by: Colin Patrick McCabe <[email protected]> Co-authored-by: Victor van den Hoven <[email protected]> Co-authored-by: Cheng-Kai, Zhang <[email protected]> Co-authored-by: Johnny Hsu <[email protected]> Co-authored-by: Matthias J. Sax <[email protected]> Co-authored-by: Hector Geraldino <[email protected]> Co-authored-by: Dmitry Werner <[email protected]> Co-authored-by: Stanislav Kozlovski <[email protected]> Co-authored-by: Andrew Schofield <[email protected]> Co-authored-by: PoAn Yang <[email protected]> Co-authored-by: Dung Ha <[email protected]> Co-authored-by: Owen Leung <[email protected]> Co-authored-by: testn <[email protected]> Co-authored-by: Daan Gerits <[email protected]> Co-authored-by: Joel Hamill <[email protected]> Co-authored-by: Kamal Chandraprakash <[email protected]> Co-authored-by: Cheryl Simmons <[email protected]> Co-authored-by: José Armando García Sancio <[email protected]> Co-authored-by: Andras Katona <[email protected]> Co-authored-by: David Mao <[email protected]> Co-authored-by: Luke Chen <[email protected]> Co-authored-by: A. Sophie Blee-Goldman <[email protected]> Co-authored-by: Chris Holland <[email protected]> Co-authored-by: TapDang <[email protected]> Co-authored-by: Edoardo Comar <[email protected]> Co-authored-by: Artem Livshits <[email protected]> Co-authored-by: Kuan-Po (Cooper) Tseng <[email protected]> Co-authored-by: Manikumar Reddy <[email protected]> Co-authored-by: Chris Egerton <[email protected]> Co-authored-by: Alyssa Huang <[email protected]> Co-authored-by: Sanskar Jhajharia <[email protected]> Co-authored-by: Vedarth Sharma <[email protected]> Co-authored-by: Lianet Magrans <[email protected]> Co-authored-by: Igor Soarez <[email protected]> Co-authored-by: Sean Quah <[email protected]>
- Loading branch information