[SPARK-51330][PYTHON] Enable spark.sql.execution.pythonUDTF.arrow.enabled by default #50096

Open
wants to merge 1 commit into
base: master

HyukjinKwon
Member

What changes were proposed in this pull request?

This PR enables spark.sql.execution.pythonUDTF.arrow.enabled by default.

Why are the changes needed?

We enabled Arrow optimization in #49482 and #50036. We should enable it for UDTFs too.

Does this PR introduce any user-facing change?

It will fall back to the non-optimized code path, so the impact will be minimal. Users will leverage Arrow optimization by default.
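For reference, a minimal sketch of how a user could set this behavior explicitly instead of relying on the default. The `SquareNumbers` handler and the `run_with_arrow` helper are hypothetical names used only for illustration; the config key is the one this PR changes, and `useArrow` is the existing per-UDTF argument of `pyspark.sql.functions.udtf`, which, as far as I know, takes precedence over the session config when it is set.

```python
class SquareNumbers:
    """Hypothetical UDTF handler: yields one (num, squared) row per integer in [start, end]."""

    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)


def run_with_arrow(spark, use_arrow=True):
    """Run the handler as a Python UDTF with Arrow optimization switched on or off.

    `spark` is an existing SparkSession. This helper is illustrative only.
    """
    # Imported lazily so the handler class above can be exercised without PySpark installed.
    from pyspark.sql.functions import lit, udtf

    # Session-level switch: the config whose default this PR proposes to change.
    spark.conf.set(
        "spark.sql.execution.pythonUDTF.arrow.enabled", str(use_arrow).lower()
    )

    # Per-UDTF switch: an explicit useArrow is assumed to override the session config.
    square_numbers = udtf(
        SquareNumbers, returnType="num: int, squared: int", useArrow=use_arrow
    )
    return square_numbers(lit(1), lit(3)).collect()
```

Passing `use_arrow=False` is the opt-out a user could reach for if the Arrow path turns out slower for their workload.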

How was this patch tested?

Existing tests in the CI.

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Member Author

cc @allisonwang-db

Contributor

@allisonwang-db allisonwang-db left a comment

For consistency with Python UDFs, yes, we should enable this. But the Arrow code path does not necessarily improve performance (it can even lead to perf regressions). It only helps when the output table size is large:
https://spark.apache.org/docs/latest/api/python/user_guide/sql/python_udtf.html#arrow-optimization
I think https://issues.apache.org/jira/browse/SPARK-44856 needs to be worked on first to make the Arrow code path more performant for small output sizes.
cc @ueshin @wengh

@HyukjinKwon
Member Author

Made a draft PR for SPARK-44856: #50099

3 participants