Is your feature request related to a problem or challenge?
Introduction
This ticket is my weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things that I may have missed or you think should get wider attention by the community.
Community Highlights
Releases!
Performance
DataFusion's core value proposition is great performance without having to re-implement it yourself
Quality
Testing
Bug Fixes
DataFusion is in the "we are finding all the corner case bugs now" phase of its life and people are now bashing them down
Docs
Build time
Cleanups 🧹
Features
Features under way
Better Out of Core Support
In general, DataFusion is getting better at handling datasets that are larger than can fit in memory.
We can have nice things! (Explain plans)
> explain select * from t1 inner join t2 on t1.i=t2.i;
+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| logical_plan  | Inner Join: t1.i = t2.i                                    |
|               |   TableScan: t1 projection=[i]                             |
|               |   TableScan: t2 projection=[i]                             |
| physical_plan | ┌───────────────────────────┐                              |
|               | │    CoalesceBatchesExec    │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │        HashJoinExec       ├──────────────┐               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      ││       DataSourceExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │    partition_sizes: [0]   ││       partitions: 1       │ |
|               | │       partitions: 1       ││    partition_sizes: [0]   │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
2 row(s) fetched.
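For anyone curious about the general idea behind a tree-style rendering, here is a toy sketch in plain Rust (standard library only). It is not DataFusion's renderer: `PlanNode`, `render`, `render_children`, `leaf` and `example_plan` are hypothetical names made up for this illustration, and the output uses simple branch connectors rather than the boxes shown above.

```rust
/// Hypothetical plan node for illustration only; this is NOT a DataFusion type.
struct PlanNode {
    name: &'static str,
    children: Vec<PlanNode>,
}

/// Append one line per descendant of `node`, drawing branch connectors.
/// `prefix` carries the vertical bars inherited from ancestors.
fn render_children(node: &PlanNode, prefix: &str, out: &mut String) {
    let count = node.children.len();
    for (i, child) in node.children.iter().enumerate() {
        let is_last = i + 1 == count;
        // The last child closes the branch; earlier children keep it open.
        let (branch, extend) = if is_last {
            ("└─ ", "   ")
        } else {
            ("├─ ", "│  ")
        };
        out.push_str(prefix);
        out.push_str(branch);
        out.push_str(child.name);
        out.push('\n');
        let child_prefix = format!("{prefix}{extend}");
        render_children(child, &child_prefix, out);
    }
}

/// Render the whole plan: the root on its own line, then its descendants.
fn render(root: &PlanNode) -> String {
    let mut out = String::new();
    out.push_str(root.name);
    out.push('\n');
    render_children(root, "", &mut out);
    out
}

/// Convenience constructor for a node without inputs.
fn leaf(name: &'static str) -> PlanNode {
    PlanNode { name, children: Vec::new() }
}

/// The same operator shape as the physical plan in the teaser.
fn example_plan() -> PlanNode {
    PlanNode {
        name: "CoalesceBatchesExec",
        children: vec![PlanNode {
            name: "HashJoinExec",
            children: vec![leaf("DataSourceExec"), leaf("DataSourceExec")],
        }],
    }
}

fn main() {
    print!("{}", render(&example_plan()));
}
```

The point of the sketch is only that a recursive walk with an inherited prefix is enough to make the join's two inputs visually obvious, which is what makes this style of explain output easier to read than a flat indented list.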
Better Error Messages
@eliaperantoni is working with various contributors to make the error messages better. This work is tracked in
Misc
Looking to get more involved? Please help review code! 📣
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. The TL;DR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try @-mentioning one of the committers.
Help wanted
- I would love to see the community offer additional help with performance testing and triaging bugs, helping to make DataFusion a more stable foundation for building systems.
Please feel free to leave your own comments on this ticket if you are looking for help.
Community
Upcoming meetups:
Community Highlights
Releases!
Performance
DataFusion's core value proposition is great performance without having to re-implement it yourself
- to_hex 2x faster: Speedup to_hex (~2x faster) #14686
- to_hex 4x faster: Speed up uuid UDF (40x faster) #14675 (no string copies for the win!)
- date_trunc to be 2x faster: Speedup date_trunc (~20% time reduction) #14593
- substr faster: Always use StringViewArray as output of substr #14498
Quality
Testing
Bug Fixes
DataFusion is in the "we are finding all the corner case bugs now" phase of its life and people are now bashing them down
- LogicalPlanBuilder or when building logical plans from Substrait #14860
- return_type_from_args instead of return_type #14852 @rluvaton
Docs
Build time
Cleanups 🧹
- invoke_args etc
Features
Features under way
Better Out of Core Support
In general, DataFusion is getting better at handling datasets that are larger than can fit in memory.
- StringView due to shared buffers #14823
We can have nice things! (Explain plans)
- tree / pretty explain mode #14677. I'll give you a teaser below. Come help with the follow-on work on [EPIC] Complete SQL EXPLAIN Tree Rendering #14914
Better Error Messages
@eliaperantoni is working with various contributors to make the error messages better. This work is tracked in
- Diagnostic to more errors #14429
- DataFusionError::Collection to return multiple DataFusionErrors #14439
Misc
- range table function #14830
- UNION ALL BY NAME: feat: Implement UNION ALL BY NAME #14538