New recovery system, enhanced rescaling and other updates in Bytewax v0.17.0
We are excited to announce the release of Bytewax v0.17, which brings performance boosts and major changes to the recovery system and rescaling mechanism. With this update, you can stop the dataflow execution, adjust the number of workers, and safely resume. To help you navigate the new features and breaking changes, we have prepared a migration guide (with code examples!) that covers the updates made to our API for upgrading to v0.17. Please find it here. Jump on and see the release on our GitHub!
What's Changed
Features:
SQLite recovery to support rescaling: Recovery has been updated to support rescaling the number of workers in a dataflow. In v0.17, the number of workers in a cluster can now be changed by stopping the dataflow execution and specifying a different number of workers on resume. Creating recovery stores has been moved to a separate step from running your dataflow. For more information, see the Bytewax migration guide.
AsyncBatcher
: We've included a newbytewax.inputs.batcher_async
to help you use async Python libraries in Bytewax input sources. This lets you play nicely with Bytewax's cooperative multitasking but still use async APIs. To see an example of it's use, check out our wikistream example.next_awake
: Adds a new method in input sources:next_awake
. Input sources now have an optionalnext_awake
method which you can use to schedule when thenext_batch
call should occur. You can use this to "sleep" the input operator for a fixed amount of time while you are waiting for more input. The default behavior uses a simple heuristic to prevent a spin loop when there is no input. Always usenext_awake
rather than using atime.sleep
in an input source. See periodic_input.py in the examples directory for an implementation that uses this functionality.Support for additional platforms: Bytewax is now available for linux/aarch64 and linux/armv7! 🎉
Code Improvements:
Restructures input and output to support batching: Changes the API of input sources from
next
tonext_batch
which should return a list of elements rather than a single item. Changing input sources to operate over a batch has which has improved the throughput of Dataflows. Our built-in input classes have been updated to accept abatch_size
parameter, which configures the maximum number of messages to request from an input source at a time.Builds exception messages only on error: Adds {re,}raise_with (to parallel {re,}raise) and has it take a closure that produces the error message. This causes the error handling code to only attempt to format the exception message string when an error has occurred.
Run clippy on pre-commit: Runs the Clippy linting tool on pre-commit on all the files. Additional pre-commit improvements can be found here.
Bug Fixes:
[Corrects the docs](https://github.com/bytewax/bytewax/pull/ 255): External contribution! Rob de Wit made a change in our docs 💛 Every Contribution Counts! In the world of OSS, there's no such thing as a "small" contribution. Each PR, no matter how minor, is a step towards improving the project. Thank you, Rob!
Fixes a typo: Another external contribution! Thanks, Ryan Abernathey!💛
Fix kafka input error message: Bugfix for the issue #285: AttributeError: error getting next input item from partition source.
Summing up, Bytewax v0.17 brings important modifications to the recovery process to accommodate rescaling and many other improvements.
Given the nontrivial nature of the changes, we will post more detailed tech dives about our recovery system and performance. Please stay tuned!
Excited about Bytewax? Consider joining our Slack and giving our GitHub repository a star. As a community-driven open-source endeavor, we deeply value your participation in shaping our future 💛
Stay updated with our newsletter
Subscribe and never miss another blog post, announcement, or community event.