-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improve][Connector-V2] Improve Paimon source split enumerator #6766
base: dev
Are you sure you want to change the base?
[Improve][Connector-V2] Improve Paimon source split enumerator #6766
Conversation
cc @hailin0 |
Please add test case. |
Thanks for your review. I think this is an optimization of the existing PaimonSourceSplitEnumerator. Currently, pamion's test case can already cover it. |
} | ||
|
||
public PaimonSourceSplitEnumerator( | ||
Context<PaimonSourceSplit> context, Table table, PaimonSourceState sourceState) { | ||
this.context = context; | ||
this.table = table; | ||
this.assignedSplit = sourceState.getAssignedSplits(); | ||
this.pendingSplit = new HashMap<>(); | ||
this.shouldEnumerate = sourceState == null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please optimize the code like this
this.shouldEnumerate = (sourceState == null || sourceState.isShouldEnumerate());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can refer the org.apache.seatunnel.connectors.seatunnel.file.source.split.FileSourceSplitEnumerator in which we can keep the pendingSplit and assignedSplit at the sametime. It's more logical that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can refer the org.apache.seatunnel.connectors.seatunnel.file.source.split.FileSourceSplitEnumerator in which we can keep the pendingSplit and assignedSplit at the sametime. It's more logical that way.
OK, I will optimize the code based on your suggestions
} | ||
|
||
@Override | ||
public void open() { | ||
this.pendingSplit = new HashSet<>(); | ||
this.pendingSplit = new HashedMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will overwrite the init in the construct method.
public Set<PaimonSourceSplit> getAssignedSplits() { | ||
return assignedSplits; | ||
} | ||
private final Map<Integer, Set<PaimonSourceSplit>> assignedSplits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change Set to Map here? Any considerations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change Set to Map here? Any considerations?
I think there is a small optimization,avoid the traversal operation when removing the allocated PaimonSourceSplit from pendingSplit
. use set
like FileSourceSplitEnumerator, a nested loop will be formed in the FileSourceSplitEnumerator#run()
method.
Could you implement stream read for paimon source ? I don't think this is necessary in batch read scenarios. |
Ok, I'd be happy to do it. |
Purpose of this pull request
Improve Paimon source split enumerator
Does this PR introduce any user-facing change?
no
How was this patch tested?
exists test
Check list
New License Guide
release-note
.