New recovery system, enhanced rescaling and other updates in Bytewax v0.17.0

By Oli Makhasoeva
Release v0.17.0 is here!

We are excited to announce the release of Bytewax v0.17, which brings performance boosts and major changes to the recovery system and rescaling mechanism. With this update, you can stop the dataflow execution, adjust the number of workers, and safely resume. To help you navigate the new features and breaking changes, we have prepared a migration guide (with code examples!) that covers the updates made to our API for upgrading to v0.17. Please find it here. Jump on and see the release on our GitHub!

What's Changed

Features:

  1. SQLite recovery to support rescaling: Recovery has been updated to support rescaling the number of workers in a dataflow. In v0.17, the number of workers in a cluster can now be changed by stopping the dataflow execution and specifying a different number of workers on resume. Creating recovery stores has been moved to a separate step from running your dataflow. For more information, see the Bytewax migration guide.

  2. AsyncBatcher: We've included a new bytewax.inputs.batcher_async to help you use async Python libraries in Bytewax input sources. This lets you play nicely with Bytewax's cooperative multitasking but still use async APIs. To see an example of it's use, check out our wikistream example.

  3. next_awake: Adds a new method in input sources: next_awake. Input sources now have an optional next_awake method which you can use to schedule when the next_batch call should occur. You can use this to "sleep" the input operator for a fixed amount of time while you are waiting for more input. The default behavior uses a simple heuristic to prevent a spin loop when there is no input. Always use next_awake rather than using a time.sleep in an input source. See periodic_input.py in the examples directory for an implementation that uses this functionality.

  4. Support for additional platforms: Bytewax is now available for linux/aarch64 and linux/armv7! 🎉

Code Improvements:

  1. Restructures input and output to support batching: Changes the API of input sources from next to next_batch which should return a list of elements rather than a single item. Changing input sources to operate over a batch has which has improved the throughput of Dataflows. Our built-in input classes have been updated to accept a batch_size parameter, which configures the maximum number of messages to request from an input source at a time.

  2. Builds exception messages only on error: Adds {re,}raise_with (to parallel {re,}raise) and has it take a closure that produces the error message. This causes the error handling code to only attempt to format the exception message string when an error has occurred.

  3. Run clippy on pre-commit: Runs the Clippy linting tool on pre-commit on all the files. Additional pre-commit improvements can be found here.

Bug Fixes:

  1. [Corrects the docs](https://github.com/bytewax/bytewax/pull/ 255): External contribution! Rob de Wit made a change in our docs 💛 Every Contribution Counts! In the world of OSS, there's no such thing as a "small" contribution. Each PR, no matter how minor, is a step towards improving the project. Thank you, Rob!

  2. Fixes a typo: Another external contribution! Thanks, Ryan Abernathey!💛

  3. Fix kafka input error message: Bugfix for the issue #285: AttributeError: error getting next input item from partition source.

Summing up, Bytewax v0.17 brings important modifications to the recovery process to accommodate rescaling and many other improvements.

Given the nontrivial nature of the changes, we will post more detailed tech dives about our recovery system and performance. Please stay tuned!

Excited about Bytewax? Consider joining our Slack and giving our GitHub repository a star. As a community-driven open-source endeavor, we deeply value your participation in shaping our future 💛

Stay updated with our newsletter

Subscribe and never miss another blog post, announcement, or community event.

Previous post
Oli Makhasoeva

Oli Makhasoeva

Director of Developer Relations and Operations
Oli is a passionate technologist with a background in engineering, consulting, and community building. On a break from creating content, she loves to network online & in person at meetups, conferences, and forums.
Next post