Tasks assignment strategy - commons integration - [KCON-63] #384

muralibasani · 2025-01-06T16:46:31Z

[KCON-63]

Integrate the task assignment strategies from the commons module into the S3 release feature branch
Remove the hard-coded file pattern from the S3 iterator class
Update existing tests
Add new integration tests to verify the other strategy use cases

aindriu-aiven · 2025-01-07T08:43:59Z

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceConfigFragment.java

 import org.apache.kafka.common.config.AbstractConfig;
 import org.apache.kafka.common.config.ConfigDef;

 import io.aiven.kafka.connect.common.config.enums.ErrorsTolerance;
+import io.aiven.kafka.connect.common.config.enums.ObjectDistributionStrategies;

 import org.codehaus.plexus.util.StringUtils;


Hi @muralibasani, I know you didn't add it here, but can you update this import to use
'import org.apache.commons.lang3.StringUtils;' instead of plexus, which we are pulling in from the Kafka library (and which later versions don't include)?

aindriu-aiven · 2025-01-07T08:45:46Z

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceConfigFragment.java

+                "Based on tasks.max config and this strategy, objects are processed in distributed"
+                        + " way by Kafka connect workers, supported values : " + OBJECT_HASH + ", "
+                        + PARTITION_IN_FILENAME + ", " + PARTITION_IN_FILEPATH,
+                GROUP_OTHER, sourcePollingConfigCounter++, ConfigDef.Width.NONE, OBJECT_DISTRIBUTION_STRATEGY); // NOPMD


Adding sourcePollingConfigCounter++ here means we should remove the // NOPMD from line 66. Can you also add the comment // Unused Assignment here as an explanation, removing it from line 67?

It was an incorrect usage of counter. Updating it to use offsetStorageGroupCounter.

aindriu-aiven · 2025-01-07T08:49:00Z

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceConfigFragment.java

@@ -92,6 +105,10 @@ public String getErrorsTolerance() {
        return cfg.getString(ERRORS_TOLERANCE);
    }

+    public String getObjectDistributionStrategy() {


Suggested change

-    public String getObjectDistributionStrategy() {
+    public ObjectDistributionStrategies getObjectDistributionStrategy() {
+        return ObjectDistributionStrategies.forName(sourceConfigFragment.getObjectDistributionStrategy());

The same is being done for ErrorsTolerance in another PR so we should keep this consistent.

Also, I am not sure whether it should be called ObjectDistributionStrategies or ObjectDistributionStrategy, as the enum will return one strategy?

Thought so, but ObjectDistributionStrategy already exists as an interface in commons.

Change the interface to be "DistributionStrategy", as each of the strategies is an implementation of a distribution strategy. Then you can use ObjectDistributionStrategy as the enum name.

Can you update this to return ObjectDistributionStrategy instead of String?

@muralibasani this one is still outstanding: the return value just needs to be updated to be the object.

Missed it, updated.
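For reference, a minimal sketch of the enum-with-forName shape requested in this thread. This is not the code merged in the PR: the constant names come from the config documentation quoted above, while the lower-case config values and the value() accessor are assumptions made for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the enum discussed above. The constant names come from the
 * config documentation in this PR; the lower-case config values are assumptions.
 */
enum ObjectDistributionStrategy {
    OBJECT_HASH("object_hash"),
    PARTITION_IN_FILENAME("partition_in_filename"),
    PARTITION_IN_FILEPATH("partition_in_filepath");

    private final String configValue;

    ObjectDistributionStrategy(final String configValue) {
        this.configValue = configValue;
    }

    String value() {
        return configValue;
    }

    /** Resolves a configured name case-insensitively, failing loudly on unknown values. */
    static ObjectDistributionStrategy forName(final String name) {
        if (name == null) {
            throw new IllegalArgumentException("Distribution strategy name must not be null");
        }
        final String trimmed = name.trim();
        for (final ObjectDistributionStrategy strategy : values()) {
            if (strategy.configValue.equalsIgnoreCase(trimmed)) {
                return strategy;
            }
        }
        throw new IllegalArgumentException(
                "Unknown distribution strategy '" + name + "', supported values: " + Arrays.toString(values()));
    }
}
```

Returning the enum from the config getter means callers can switch on the strategy directly, and a misconfigured value fails at configuration time rather than deep inside the task.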

aindriu-aiven · 2025-01-07T08:53:55Z

.../src/main/java/io/aiven/kafka/connect/common/source/task/HashObjectDistributionStrategy.java

+
+    @Override
+    public Pattern getFilePattern() {
+        return filePattern;


As the HashObjectDistributionStrategy does not use the filePattern, I think getFilePattern here should throw a NotImplementedException(), as it is unexpected that it ever gets called here.

Even for the object hash strategy, the pattern is used in the iterator to extract topic and partition.

The extraction of topic and partition should not depend on the distribution strategy going forward.

They are separate concerns and should be implemented as such.

Good point. Pattern configuring is moved to source task.

aindriu-aiven · 2025-01-07T08:54:22Z

.../src/main/java/io/aiven/kafka/connect/common/source/task/HashObjectDistributionStrategy.java

+
+    private void configureDistributionStrategy(final int maxTasks, final String expectedSourceNameFormat) {
+        this.maxTasks = maxTasks;
+        this.filePattern = configurePattern(expectedSourceNameFormat);


This filePattern should not need to be set as it is unused here as well.

This pattern is required in the hash object strategy too; please check the source iterator class.

aindriu-aiven · 2025-01-07T09:15:35Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/ObjectDistributionStrategy.java

+     * Based on the format of the file name or prefix, Pattern is created for each of the strategies.
+     */
+    default Pattern configurePattern(final String expectedSourceNameFormat) {
+        if (expectedSourceNameFormat == null || !expectedSourceNameFormat.contains(PARTITION_PATTERN)) {


The same goes for configurePattern: it is really a strategy implementation detail and shouldn't be added here.

A pattern is a must for all the strategies on source connectors. Based on it,

topic and partition are extracted in the source iterator

task assignments are done

If the pattern is not created here, we would have to duplicate the whole piece of code to create it again.

Since some task distributions require topic and partition, topic and partition extraction should be done first and should be made available to the task distribution. This way, if future S3 implementations use a different strategy for topic and/or partition identification these strategies will work fine.

Agree, moved.

aindriu-aiven · 2025-01-07T09:20:11Z

...main/java/io/aiven/kafka/connect/common/source/task/PartitionInPathDistributionStrategy.java

    private final static Logger LOG = LoggerFactory.getLogger(PartitionInPathDistributionStrategy.class);
-
-    private String prefix;
+    private String s3Prefix;


This is commons code so we shouldn't have any reference specifically to S3 here.

...main/java/io/aiven/kafka/connect/common/source/task/PartitionInPathDistributionStrategy.java

aindriu-aiven · 2025-01-07T09:28:57Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

@@ -128,19 +151,25 @@ public void addFailedObjectKeys(final String objectKey) {
        this.failedObjectKeys.add(objectKey);
    }

-    public void setFilterPredicate(final Predicate<S3Object> predicate) {
-        filterPredicate = predicate;


We should just pass the strategy in as a predicate here. Then, in the future, as we add features, predicates can be chained, making it really easy to extend this quickly.

If DistributionStrategy has a getTaskFor(String) method the predicate can be constructed as
s3Object -> task == distributionStrategy.getTaskFor(s3Object.key())

I think we might want a new method in AWSV2SourceClient called addPredicate(Predicate<S3Object> newPredicate) that will do

this.filterPredicate = this.filterPredicate.and(newPredicate)

I think this still needs to be updated as well, so we can link the predicates here.

Removed failed object keys from the predicate list, I do not see any valid processing for it, as data exceptions are handled.
Code updated.

There should still be a way to link the predicates; this allows better control and configuration of the records returned.
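The predicate chaining proposed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: ObjectSummary is a stand-in for the SDK's S3Object so the example compiles without the AWS SDK, and FilteringClient and filterKeys are hypothetical names. Only Predicate.and is real API here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Stand-in for the SDK's S3Object so the sketch compiles without the AWS SDK (assumption). */
final class ObjectSummary {
    private final String key;

    ObjectSummary(final String key) {
        this.key = key;
    }

    String key() {
        return key;
    }
}

/** Hypothetical holder for the filter; only the predicate handling is sketched. */
final class FilteringClient {
    // Start from an accept-all predicate so and(...) always has something to chain onto.
    private Predicate<ObjectSummary> filterPredicate = object -> true;

    /** Narrows the current filter: an object must now also satisfy newPredicate. */
    void addPredicate(final Predicate<ObjectSummary> newPredicate) {
        this.filterPredicate = this.filterPredicate.and(newPredicate);
    }

    /** Returns the keys accepted by every predicate added so far, in input order. */
    List<String> filterKeys(final List<String> keys) {
        final List<String> accepted = new ArrayList<>();
        for (final String key : keys) {
            if (filterPredicate.test(new ObjectSummary(key))) {
                accepted.add(key);
            }
        }
        return accepted;
    }
}
```

With this shape, the task would build the assignment predicate itself, for example object -> taskId == distributionStrategy.getTaskFor(object.key()), and hand it to the client, which stays unaware of how tasks are assigned.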

aindriu-aiven · 2025-01-07T09:31:14Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/S3SourceTask.java

@@ -97,13 +97,14 @@ public void start(final Map<String, String> props) {
        this.transformer = TransformerFactory.getTransformer(s3SourceConfig);
        offsetManager = new OffsetManager(context, s3SourceConfig);
        awsv2SourceClient = new AWSV2SourceClient(s3SourceConfig, failedObjectKeys);
+        awsv2SourceClient.initializeObjectDistributionStrategy();


We should initialize the ObjectDistributionStrategy here in S3SourceTask and add it as a predicate.

Initially started with it, but got stuck on an issue and couldn't get back to this. Thanks.

Claudenw

Overall looks like a good start. Getting it settled into the streaming strategy will take a bit of work.

Claudenw · 2025-01-07T10:51:44Z

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceConfigFragment.java

@@ -92,6 +105,10 @@ public String getErrorsTolerance() {
        return cfg.getString(ERRORS_TOLERANCE);
    }

+    public String getObjectDistributionStrategy() {


Change the interface to be "DistributionStrategy", as each of the strategies is an implementation of a distribution strategy. Then you can use ObjectDistributionStrategy as the enum name.

Claudenw · 2025-01-07T10:53:22Z

.../src/main/java/io/aiven/kafka/connect/common/source/task/HashObjectDistributionStrategy.java

+
+    @Override
+    public Pattern getFilePattern() {
+        return filePattern;


The extraction of topic and partition should not depend on the distribution strategy going forward.

They are separate concerns and should be implemented as such.

Claudenw · 2025-01-07T10:55:40Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/ObjectDistributionStrategy.java

+    String TOPIC_NAMED_GROUP_REGEX_PATTERN = "(?<" + PATTERN_TOPIC_KEY + ">[a-zA-Z0-9\\-_.]+)";
+    String START_OFFSET_PATTERN = "{{start_offset}}";
+    String TIMESTAMP_PATTERN = "{{timestamp}}";
+    String DEFAULT_PREFIX_FILE_PATH_PATTERN = "topics/{{topic}}/partition={{partition}}/";


The file-based distribution strategies will need to use these, as will any file-based topic and partition extraction. So how about an interface or static class that contains the file-based patterns? Call it something like FileExtractionPatterns.
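As an illustration of what such a holder class might contain, here is a sketch that turns a template like the default prefix above into a regex with named groups. The class name follows the suggestion in this comment; the value of PATTERN_TOPIC_KEY, the partition constants, and the toPattern method are assumptions, not the PR's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the file-based patterns; constant values are assumptions. */
final class FileExtractionPatterns {
    static final String PATTERN_TOPIC_KEY = "topic"; // assumed value
    static final String PATTERN_PARTITION_KEY = "partition"; // hypothetical constant
    static final String PARTITION_PATTERN = "{{partition}}";
    static final String TOPIC_NAMED_GROUP_REGEX_PATTERN = "(?<" + PATTERN_TOPIC_KEY + ">[a-zA-Z0-9\\-_.]+)";
    static final String PARTITION_NAMED_GROUP_REGEX_PATTERN = "(?<" + PATTERN_PARTITION_KEY + ">\\d+)";

    // Finds the {{topic}} and {{partition}} placeholders inside a template.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(topic|partition)\\}\\}");

    private FileExtractionPatterns() {
    }

    /** Quotes the literal parts of the template and swaps placeholders for named groups. */
    static Pattern toPattern(final String template) {
        if (template == null || !template.contains(PARTITION_PATTERN)) {
            throw new IllegalArgumentException("Template must contain " + PARTITION_PATTERN);
        }
        final StringBuilder regex = new StringBuilder();
        final Matcher placeholders = PLACEHOLDER.matcher(template);
        int position = 0;
        while (placeholders.find()) {
            regex.append(Pattern.quote(template.substring(position, placeholders.start())));
            regex.append("topic".equals(placeholders.group(1))
                    ? TOPIC_NAMED_GROUP_REGEX_PATTERN
                    : PARTITION_NAMED_GROUP_REGEX_PATTERN);
            position = placeholders.end();
        }
        regex.append(Pattern.quote(template.substring(position)));
        return Pattern.compile(regex.toString());
    }
}
```

The sketch does not handle a template that repeats a placeholder, since Java rejects duplicate group names; a real implementation would need to validate that case.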

Claudenw · 2025-01-07T10:58:21Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/ObjectDistributionStrategy.java

+     * Based on the format of the file name or prefix, Pattern is created for each of the strategies.
+     */
+    default Pattern configurePattern(final String expectedSourceNameFormat) {
+        if (expectedSourceNameFormat == null || !expectedSourceNameFormat.contains(PARTITION_PATTERN)) {


Since some task distributions require topic and partition, topic and partition extraction should be done first and should be made available to the task distribution. This way, if future S3 implementations use a different strategy for topic and/or partition identification these strategies will work fine.

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceConfigFragment.java

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/S3SourceTask.java

aindriu-aiven · 2025-01-08T11:39:14Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

    /**
     * @param s3SourceConfig
     *            configuration for Source connector
     * @param failedObjectKeys
     *            all objectKeys which have already been tried but have been unable to process.
     */
-    public AWSV2SourceClient(final S3SourceConfig s3SourceConfig, final Set<String> failedObjectKeys) {
+    public AWSV2SourceClient(final S3SourceConfig s3SourceConfig, final Set<String> failedObjectKeys,
+            final DistributionStrategy distributionStrategy, final int taskId, final Pattern filePattern) {


The AWS client should be agnostic to how the task assignment works; its only real focus should be on talking to AWS. So we should not set the filePattern/taskId/distributionStrategy here.

We should call setFilterPredicate either from the S3SourceTask or the SourceRecordIterator.

aindriu-aiven · 2025-01-08T11:40:44Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

-        final int taskAssignment = Math.floorMod(objectKey.hashCode(), maxTasks);
-        return taskAssignment == taskId;
+    public void setFilterPredicate(final Predicate<S3Object> basePredicate) {
+        this.filterPredicate = basePredicate


I think we do need this update to look something like the below; this way we just 'and' additional predicates as we need them.

Suggested change

-        this.filterPredicate = basePredicate
+    public void setFilterPredicate(final Predicate<S3Object> predicate) {
+        this.filterPredicate = this.filterPredicate.and(predicate);

This doesn't account for 'or' predicates, but we don't have a use case for that yet, so I think it should be OK for the moment.

aindriu-aiven · 2025-01-08T12:49:10Z

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceCommonConfig.java

+    public ObjectDistributionStrategy getObjectDistributionStrategy() {
+        return ObjectDistributionStrategy.forName(sourceConfigFragment.getObjectDistributionStrategy());
+    }
+


We should also add getTaskId and getMaxTasksId to CommonConfig.java.

Actually, it might be good in here: as the existing sink connectors don't currently require it, it can be moved up a level later if required.

aindriu-aiven · 2025-01-08T14:53:48Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/S3SourceTask.java

+        this.taskId = Integer.parseInt(s3SourceConfig.originals().get("task.id").toString()) % maxTasks;
+        DistributionStrategy distributionStrategy;
+
+        switch (objectDistributionStrategy) {


I think we actually need two separate regexes here:
one for the SourceRecordIterator and one for the distribution strategy.

The reason is that the fileNameFragment set for partition-in-filepath, for example, has a match-any-filename pattern at the end, but users will still want to be able to set the filename template separately to include only certain files. For example, they may want to specify *.png and send only images, or only files matching a certain pattern.

The other thing I can't 100% tell here is the difference between
s3SourceConfig.getS3FileNameFragment().getFilenameTemplate().originalTemplate()
and
s3SourceConfig.getS3FileNameFragment().getFilenameTemplate().toString()

Is one of them the default pattern we had previously? Ignore this if not, but if it is, we should always look for the custom one that has been configured and fall back on the default if none is configured (and also throw an error if it is missing the partition etc. when we configure PARTITION_IN_FILENAME).

There is no difference between originalTemplate() and toString(); just checked. Added integration tests with default and non-default values.

For prefix, introduced a new config for pattern.

This should allow users to provide any patterns on file names or prefixes as long as topic, partition are available.

Thanks for checking that for me

aindriu-aiven · 2025-01-09T13:49:19Z

commons/src/main/java/io/aiven/kafka/connect/common/config/FileNameFragment.java

@@ -112,6 +115,16 @@ public void ensureValid(final String name, final Object value) {
                // UnusedAssignment


NIT: to remove those now we have the one below.

aindriu-aiven · 2025-01-09T14:17:43Z

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

            if (currentObjectKey != null) {
+                if (validateTaskDistributionStrategy(currentObjectKey)) {


I think with the new changes this might not be needed anymore, as the predicate is checked inside the AWSV2Client as it builds the stream?

I think your earlier point was correct: keep task distribution in the source task/iterator, as it is not relevant in the AWS client. Keeping as is.
We should also remove the failed objects from the AWS client at some point, keeping only object listing in the AWS client.

Yes, agreed. As long as the predicate is created outside of the client and then passed in with the set predicate, we should be good.

commons/src/main/java/io/aiven/kafka/connect/common/source/input/utils/FilePatternUtils.java

Claudenw · 2025-01-07T15:57:54Z

...s/src/main/java/io/aiven/kafka/connect/common/source/input/utils/FileExtractionPatterns.java

+ */
+
+package io.aiven.kafka.connect.common.source.input.utils;
+public class FileExtractionPatterns {


This should be a final class

Deleted this class

Claudenw · 2025-01-09T14:48:22Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/DistributionStrategy.java

I think that DistributionStrategy should be an abstract class that has one instance variable, maxTasks.
It should have three methods:

getMaxTasks()

getTaskFor(String)

configure(int maxTasks)

If we don't put the current task into the class, then we can share the class across all instances of task.
We can construct the DistributionStrategy instance in the AivenKafkaConnectS3SourceConnector.

Classes like PartitionInFilenameDistribution should have the pattern specified in the constructor.

In the end, by not binding the instance to a specific task, we will end up with a more flexible implementation that is not tightly coupled with a single task.

It might make sense to pass a CommonConfig in the DistributionStrategy constructor so that any implementation can have access to the known set of configuration properties.

It may also make sense for configure to accept the CommonConfig rather than the int maxTasks.

I will refactor these in the next PR.

I think this needs to be done correctly now. This is not some minor change to an internal method; it is the decision about how DistributionStrategy should be architected.

Your current design has opened the door for DistributionStrategies that require multiple parameters to require that those parameters be known up and down the stack.
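To make the proposal in this thread concrete, here is a rough sketch of the abstract-class shape with getMaxTasks(), getTaskFor(String) and configure(int). It is one possible reading of the comment, not the merged design: returning -1 for a key that does not match, and requiring a named group called "partition", are assumptions of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the proposed base class: it knows maxTasks but not the current task. */
abstract class DistributionStrategy {
    private int maxTasks;

    protected DistributionStrategy(final int maxTasks) {
        configure(maxTasks);
    }

    /** Re-configures the strategy, for example after the connector's task count changes. */
    public final void configure(final int maxTasks) {
        if (maxTasks < 1) {
            throw new IllegalArgumentException("maxTasks must be at least 1");
        }
        this.maxTasks = maxTasks;
    }

    public final int getMaxTasks() {
        return maxTasks;
    }

    /** Returns the task that should process the key, or -1 if no task should (assumption). */
    public abstract int getTaskFor(String objectKey);
}

final class HashDistributionStrategy extends DistributionStrategy {
    HashDistributionStrategy(final int maxTasks) {
        super(maxTasks);
    }

    @Override
    public int getTaskFor(final String objectKey) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(objectKey.hashCode(), getMaxTasks());
    }
}

final class PartitionInFilenameDistributionStrategy extends DistributionStrategy {
    private final Pattern filePattern;

    /** The pattern is fixed at construction and must define a named group "partition". */
    PartitionInFilenameDistributionStrategy(final int maxTasks, final Pattern filePattern) {
        super(maxTasks);
        this.filePattern = filePattern;
    }

    @Override
    public int getTaskFor(final String objectKey) {
        final Matcher matcher = filePattern.matcher(objectKey);
        if (!matcher.find()) {
            return -1;
        }
        return Integer.parseInt(matcher.group("partition")) % getMaxTasks();
    }
}
```

Because no task id is stored in the strategy, a single instance can be shared by every task, and each task only compares its own id with the value returned by getTaskFor.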

Claudenw · 2025-01-09T14:51:15Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

@@ -128,19 +151,25 @@ public void addFailedObjectKeys(final String objectKey) {
        this.failedObjectKeys.add(objectKey);
    }

-    public void setFilterPredicate(final Predicate<S3Object> predicate) {
-        filterPredicate = predicate;


If DistributionStrategy has a getTaskFor(String) method the predicate can be constructed as
s3Object -> task == distributionStrategy.getTaskFor(s3Object.key())

Claudenw · 2025-01-09T14:53:26Z

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

@@ -128,19 +151,25 @@ public void addFailedObjectKeys(final String objectKey) {
        this.failedObjectKeys.add(objectKey);
    }

-    public void setFilterPredicate(final Predicate<S3Object> predicate) {
-        filterPredicate = predicate;


I think we might want a new method in AWSV2SourceClient called addPredicate(Predicate<S3Object> newPredicate) that will do

this.filterPredicate = this.filterPredicate.and(newPredicate)

Claudenw · 2025-01-10T10:05:51Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/HashDistributionStrategy.java

+    public void reconfigureDistributionStrategy(final int maxTasks) {
+        this.maxTasks = maxTasks;
    }

    public void setMaxTasks(final int maxTasks) {
        this.maxTasks = maxTasks;
    }
+
+    private void configureDistributionStrategy(final int maxTasks) {
+        this.maxTasks = maxTasks;
+    }


These two methods do exactly the same thing. This is an indication that there should be one method to configure the distribution strategy after construction, and that it should take a maxTasks argument. Rename reconfigureDistributionStrategy to configureDistributionStrategy in the base class and just call that.

Claudenw · 2025-01-10T10:06:08Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/DistributionStrategy.java

I think this needs to be done correctly now. This is not some minor change to an internal method; it is the decision about how DistributionStrategy should be architected.

Your current design has opened the door for DistributionStrategies that require multiple parameters to require that those parameters be known up and down the stack.

.../java/io/aiven/kafka/connect/common/source/task/PartitionInFilenameDistributionStrategy.java

Claudenw · 2025-01-10T10:18:38Z

commons/src/main/java/io/aiven/kafka/connect/common/source/input/utils/FilePatternUtils.java

See notes elsewhere on partition strategy for background.

This class should have a method to extract all the patterns from a string and make them available so that we can call it once to get all the values set early in the process.

It should probably have methods to return the values (e.g. getTopic()) that return Optional types, returning Optional.empty() when the type name is not found in the pattern.

Updated FilePatternUtils to have those methods reusable.

This solution is close, but the FileNameUtils:

should have a constructor that takes the pattern and basically does what the current configurePattern() does.

should have a method that takes a fileName and produces an object (call it a context): Optional<Context> process(String fileName). This method will return a context if the fileName matches, or an empty Optional if it does not.

Context should be a separate class/interface and should have methods getTopic(), getPartition() and getTimestamp(). These should be populated during the FileNameUtils.process() call noted above. Context should be a simple bean.

@muralibasani
I don't see these changes noted in KCON-98. Is there another ticket for that or is it part of KCON-98?
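The constructor-plus-process() design requested in this thread might be sketched as follows. This is a hedged illustration rather than the implementation tracked in KCON-98: the constructor here takes an already compiled Pattern instead of a template, getTimestamp() is omitted for brevity, and the group names "topic" and "partition" are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simple bean carrying what was extracted from one file name (hypothetical shape). */
final class Context {
    private final String fileName;
    private final String topic; // null when the pattern has no topic group
    private final Integer partition; // null when the pattern has no partition group

    Context(final String fileName, final String topic, final Integer partition) {
        this.fileName = fileName;
        this.topic = topic;
        this.partition = partition;
    }

    String getFileName() {
        return fileName;
    }

    Optional<String> getTopic() {
        return Optional.ofNullable(topic);
    }

    Optional<Integer> getPartition() {
        return Optional.ofNullable(partition);
    }
}

/** Sketch of an instance-based FilePatternUtils; not the static helper from this PR. */
final class FilePatternUtils {
    private final Pattern pattern;

    FilePatternUtils(final Pattern pattern) {
        this.pattern = pattern;
    }

    /** Returns a populated Context when the whole file name matches, otherwise empty. */
    Optional<Context> process(final String fileName) {
        final Matcher matcher = pattern.matcher(fileName);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        final String topic = group(matcher, "topic").orElse(null);
        final Integer partition = group(matcher, "partition").map(Integer::valueOf).orElse(null);
        return Optional.of(new Context(fileName, topic, partition));
    }

    private static Optional<String> group(final Matcher matcher, final String name) {
        try {
            return Optional.ofNullable(matcher.group(name));
        } catch (final IllegalArgumentException e) {
            // Matcher.group(String) throws when the pattern defines no group with this name.
            return Optional.empty();
        }
    }
}
```

Calling process() once per object key makes topic and partition available early, so both the iterator and any distribution strategy can read them from the same Context.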

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

Claudenw · 2025-01-13T11:25:24Z

commons/src/main/java/io/aiven/kafka/connect/common/source/input/utils/FilePatternUtils.java

This solution is close, but the FileNameUtils:

should have a constructor that takes the pattern and basically does what the current configurePattern() does.

should have a method that takes a fileName and produces an object (call it a context): Optional<Context> process(String fileName). This method will return a context if the fileName matches, or an empty Optional if it does not.

Context should be a separate class/interface and should have methods getTopic(), getPartition() and getTimestamp(). These should be populated during the FileNameUtils.process() call noted above. Context should be a simple bean.

Claudenw · 2025-01-13T11:28:27Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/DistributionStrategy.java

@@ -37,18 +38,16 @@ public interface ObjectDistributionStrategy {
     *            The value to be evaluated to determine if it should be processed by the task.
     * @return true if the task should process the object, false if it should not.
     */
-    boolean isPartOfTask(int taskId, String valueToBeEvaluated);
+    boolean isPartOfTask(int taskId, String valueToBeEvaluated, Pattern filePattern);


This method should just be int getTaskFor(Context) where Context is the class defined above for FileNameProcessing. This means that Distribution strategy is adding the requirement for Context to contain the file name.

In fact the interface should be an abstract class and it should handle tracking the number of tasks. Implementations should take the Context and calculate a long value as an implementation of an abstract method. Then getTaskFor(Context) can call that method and return that value % numberOfTasks.

All the classes become smaller and easier to maintain.

@Claudenw I see this as an improvement and we don't have to make all the refactorings for the mvp version.
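A sketch of the Context-based variant described in this comment, where the base class owns the modulo step and implementations only compute a long value. Names follow the comment, but the Context shown is a reduced stand-in with just a file name and a partition, which is an assumption of this sketch. It uses Math.floorMod rather than a plain % so that a negative hash code still yields a valid task index, in line with the floorMod call in the code this PR removes from AWSV2SourceClient.

```java
/** Reduced stand-in for the Context bean; only the fields used below (assumption). */
final class Context {
    private final String fileName;
    private final int partition;

    Context(final String fileName, final int partition) {
        this.fileName = fileName;
        this.partition = partition;
    }

    String getFileName() {
        return fileName;
    }

    int getPartition() {
        return partition;
    }
}

/** The base class tracks the number of tasks and owns the modulo step. */
abstract class DistributionStrategy {
    private int maxTasks;

    protected DistributionStrategy(final int maxTasks) {
        configure(maxTasks);
    }

    public final void configure(final int maxTasks) {
        if (maxTasks < 1) {
            throw new IllegalArgumentException("maxTasks must be at least 1");
        }
        this.maxTasks = maxTasks;
    }

    /** Implementations only decide which number represents the object. */
    protected abstract long generateValue(Context context);

    public final int getTaskFor(final Context context) {
        // floorMod maps negative values into the range [0, maxTasks).
        return (int) Math.floorMod(generateValue(context), (long) maxTasks);
    }
}

final class HashDistributionStrategy extends DistributionStrategy {
    HashDistributionStrategy(final int maxTasks) {
        super(maxTasks);
    }

    @Override
    protected long generateValue(final Context context) {
        return context.getFileName().hashCode();
    }
}

final class PartitionDistributionStrategy extends DistributionStrategy {
    PartitionDistributionStrategy(final int maxTasks) {
        super(maxTasks);
    }

    @Override
    protected long generateValue(final Context context) {
        return context.getPartition();
    }
}
```

Each concrete strategy shrinks to a single method, which is the "smaller and easier to maintain" outcome the comment argues for.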

Claudenw · 2025-01-13T11:29:11Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/DistributionStrategy.java

@@ -78,14 +77,12 @@ default boolean taskMatchesPartition(final int taskId, final int partitionId) {
     * @return true if the task supplied should handle the supplied partition
     */
    default boolean taskMatchesModOfPartitionAndMaxTask(final int taskId, final int maxTasks, final int partitionId) {
-
        return taskMatchesPartition(taskId, partitionId % maxTasks);
    }

    default boolean toBeProcessedByThisTask(final int taskId, final int maxTasks, final int partitionId) {


this method should be deleted.

Claudenw · 2025-01-13T11:29:35Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/DistributionStrategy.java

     */
-    void reconfigureDistributionStrategy(int maxTasks, String expectedFormat);
+    void reconfigureDistributionStrategy(int maxTasks);


This method should just be called configure

Claudenw · 2025-01-13T11:43:15Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/HashDistributionStrategy.java

If DistributionStrategy is an abstract class, then this becomes an implementation that is simply

long generateValue(Context context) { return context.getFileName().hashCode(); }

The class can focus on what it does.

I think we discussed not making it abstract. Even if we need to, let's move it to after the MVP version.

https://aiven.atlassian.net/browse/KCON-98 should do this.

Please add items to KCON-98 that identify all the changes that were requested in this PR so that we don't lose them.

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/AWSV2SourceClient.java

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

Claudenw · 2025-01-13T12:14:28Z

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

+    private boolean isFileMatchingPattern(final S3Object s3Object) {
+        final Optional<String> optionalTopic = FilePatternUtils.getTopic(filePattern, s3Object.key());
+        final Optional<Integer> optionalPartitionId = FilePatternUtils.getPartitionId(filePattern, s3Object.key());
+
+        if (optionalTopic.isPresent() && optionalPartitionId.isPresent()) {
+            topic = optionalTopic.get();
+            partitionId = optionalPartitionId.get();
+            return true;
+        }
+        return false;
+    }


If FilePatternUtils is implemented as noted above then there should be an instance already created and this method becomes something like

    private boolean isFileMatchingPattern(final S3Object s3Object) {
        final Optional<Context> ctxt = filePatternUtils.process(s3Object.key());
        if (ctxt.isPresent()) {
            this.context = ctxt.get();
            this.context.setS3Object(s3Object);
            return true;
        }
        return false;
    }

https://aiven.atlassian.net/browse/KCON-98 should do this.

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

Claudenw · 2025-01-13T12:19:02Z

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

+    private boolean isFileAssignedToTask(final S3Object s3Object) {
+        return distributionStrategy.isPartOfTask(taskId, s3Object.key(), filePattern);
+    }


With changes noted above this becomes

    private boolean isFileAssignedToTask(final S3Object s3Object) {
        return task == distributionStrategy.getTaskFor(this.context);
    }

https://aiven.atlassian.net/browse/KCON-98 should do this.

muralibasani · 2025-01-13T13:38:22Z

@Claudenw for further distribution strategy refactorings, I have created a ticket https://aiven.atlassian.net/browse/KCON-98.

For our current MVP version, I do not see an immediate need for it.

...rc/main/java/io/aiven/kafka/connect/common/source/task/enums/ObjectDistributionStrategy.java

...connector/src/integration-test/java/io/aiven/kafka/connect/s3/source/AwsIntegrationTest.java

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/RecordProcessor.java

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java

aindriu-aiven

LGTM Thank you

Claudenw

Please add items to KCON-98 that identify all the changes that were requested in this PR so that we don't lose them.

Claudenw · 2025-01-14T09:32:57Z

commons/src/main/java/io/aiven/kafka/connect/common/source/task/HashDistributionStrategy.java

Please add items to KCON-98 that identify all the changes that were requested in this PR so that we don't lose them.

muralibasani · 2025-01-14T10:10:49Z

Please add items to KCON-98 that identify all the changes that were requested in this PR so that we don't lose them.

Ticket is updated.

Claudenw · 2025-01-14T10:32:40Z

commons/src/main/java/io/aiven/kafka/connect/common/source/input/utils/FilePatternUtils.java

@muralibasani
I don't see these changes noted in KCON-98. Is there another ticket for that or is it part of KCON-98?

muralibasani · 2025-01-14T10:34:33Z

@Claudenw the description of https://aiven.atlassian.net/browse/KCON-98 has been updated with links, so we don't miss anything. If you think it's not clear, maybe you can update it with what we need.

muralibasani changed the title ~~Kcon63 tasks strategy~~ Kcon63 tasks strategy - [KCON-63] Jan 7, 2025

aindriu-aiven mentioned this pull request Jan 7, 2025

Polling efficiency #378

Merged

muralibasani changed the title ~~Kcon63 tasks strategy - [KCON-63]~~ Tasks assignment strategy - commons integration - [KCON-63] Jan 7, 2025

muralibasani marked this pull request as ready for review January 7, 2025 08:35

muralibasani requested review from a team as code owners January 7, 2025 08:35

aindriu-aiven requested changes Jan 7, 2025

Claudenw requested changes Jan 7, 2025

muralibasani force-pushed the kcon63-tasks-strategy branch from da924e3 to f342fb9 Compare January 7, 2025 14:39

muralibasani requested review from aindriu-aiven and Claudenw January 7, 2025 15:45

aindriu-aiven reviewed Jan 8, 2025

commons/src/main/java/io/aiven/kafka/connect/common/config/SourceConfigFragment.java

aindriu-aiven reviewed Jan 8, 2025

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/S3SourceTask.java

muralibasani force-pushed the kcon63-tasks-strategy branch from 6e9e77b to 6157773 Compare January 8, 2025 11:31

muralibasani requested a review from aindriu-aiven January 8, 2025 11:36

aindriu-aiven reviewed Jan 8, 2025

muralibasani force-pushed the kcon63-tasks-strategy branch from e23306c to bb8c4bd Compare January 9, 2025 12:01

muralibasani requested a review from aindriu-aiven January 9, 2025 12:04

aindriu-aiven reviewed Jan 9, 2025

Claudenw requested changes Jan 9, 2025

muralibasani force-pushed the kcon63-tasks-strategy branch from 3c96864 to 6c437ae Compare January 10, 2025 07:01

muralibasani requested review from aindriu-aiven and Claudenw January 10, 2025 07:09

Claudenw requested changes Jan 10, 2025

aindriu-aiven mentioned this pull request Jan 10, 2025

Place offset manager in commons #373

Open

muralibasani requested a review from Claudenw January 13, 2025 10:07

Claudenw requested changes Jan 13, 2025

Integrate tasks distribution strategy

d7ca010

muralibasani force-pushed the kcon63-tasks-strategy branch from bfc6349 to d7ca010 Compare January 13, 2025 13:49

aindriu-aiven reviewed Jan 13, 2025

...rc/main/java/io/aiven/kafka/connect/common/source/task/enums/ObjectDistributionStrategy.java

aindriu-aiven reviewed Jan 13, 2025

...connector/src/integration-test/java/io/aiven/kafka/connect/s3/source/AwsIntegrationTest.java

Integrate tasks distribution strategy

0c81ac5

aindriu-aiven reviewed Jan 13, 2025

s3-source-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/RecordProcessor.java

Integrate tasks distribution strategy

96a89e8

muralibasani requested review from aindriu-aiven and Claudenw January 13, 2025 16:34

aindriu-aiven reviewed Jan 14, 2025

...rce-connector/src/main/java/io/aiven/kafka/connect/s3/source/utils/SourceRecordIterator.java (outdated)

Integrate dist strategies

dbc77b3

aindriu-aiven approved these changes Jan 14, 2025

Claudenw requested changes Jan 14, 2025

Claudenw approved these changes Jan 14, 2025

Claudenw merged commit 6b967d3 into s3-source-release Jan 14, 2025
8 checks passed

muralibasani deleted the kcon63-tasks-strategy branch January 16, 2025 09:19
