Leveling up Bytewax to v0.20!

By Zander Matheson

📣 We just released v0.20 with some awesome updates! There is a Python interface for building custom windows, a dataflow visualizer, a caching enrichment operator, and more.

Please review the official migration guide and release notes to help you smoothly transition your code from Bytewax v0.19 to v0.20.

Here’s an overview of what’s new:

Dataflow Structure Visualizer

Personally, the visualizer is one of my favorite additions in this release. You can now visualize your dataflow as a mermaid diagram by running:

python -m bytewax.visualize

This helps during development and debugging to understand how your dataflow works.

You could, for example, dump the mermaid diagram into something like Excalidraw to jumpstart some visualizations.

Changes

There are a couple of breaking changes in this version of Bytewax. Depending on how you are currently using the library, these may affect you.

Recovery Serialization Format

Breaking Change

The internal library used for recovery serialization has changed from JsonPickle to Python's built-in pickle module. Recovery stores written in the old format will be unusable after upgrading and should be recreated. We know this is a big change for those running production dataflows, but it means fewer future headaches around serialization and Python versions.
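Since recovery now goes through pickle, it is worth checking before you upgrade that the state your stateful steps hold can survive a pickle round trip. Here is a minimal sketch using only the standard library. The `survives_pickle` helper and the example state dicts are hypothetical and are not part of Bytewax.

```python
import pickle
import threading


def survives_pickle(state):
    """Return True if `state` round-trips through pickle unchanged."""
    try:
        restored = pickle.loads(pickle.dumps(state))
    except (pickle.PicklingError, AttributeError, TypeError):
        # Objects such as locks, open sockets, or lambdas cannot be pickled.
        return False
    return restored == state


# Hypothetical per-key state: plain data pickles fine.
plain_state = {"user_id": "u1", "event_count": 3, "pages": ["/home", "/pricing"]}

# Hypothetical state holding a live resource: this will not pickle.
state_with_lock = {"user_id": "u1", "lock": threading.Lock()}

print(survives_pickle(plain_state))      # True
print(survives_pickle(state_with_lock))  # False
```

If a check like this fails for your state, one option is to keep only plain data in the state and rebuild live resources when the step resumes.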

Renaming of Core Operators

Breaking Change

  • The unary operator and UnaryLogic have been renamed to stateful and StatefulLogic.
  • A new stateful_batch operator gives you lower-level batch control while managing state.

Windowing Operators and Configuration

Breaking Change

  • Windowing operators have moved from bytewax.operators.window to bytewax.operators.windowing.
  • ClockConfig classes are now simplified to just Clock. For instance, SystemClockConfig is now SystemClock.
  • WindowConfig classes are now Windowers. For instance, SessionWindow is now SessionWindower.
  • Windowing operators now return a set of streams encapsulated in a WindowOut dataclass, with WindowMetadata and late arriving data output into their own streams.

Fold Window Merges

Breaking Change

fold_window now requires a merge argument, a function that combines two window accumulators, so that it can handle session window merges.
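To make the windowing changes concrete, here is a rough sketch of a session-window fold with the three callables it needs: a builder, a folder, and the new merge function. The event shape, the function names, and the step names are made up for illustration. The wiring in `build_flow` passes arguments positionally and reflects my reading of the v0.20 API, so check the API docs for the exact parameter names before relying on it.

```python
from datetime import timedelta
from typing import Dict, List

# Hypothetical event shape: {"user": str, "page": str, "at": timezone-aware datetime}
Event = Dict[str, object]


def build_pages() -> List[str]:
    """Builder: the empty accumulator for a fresh window."""
    return []


def add_page(pages: List[str], event: Event) -> List[str]:
    """Folder: add one event to a window's accumulator."""
    pages.append(str(event["page"]))
    return pages


def merge_pages(left: List[str], right: List[str]) -> List[str]:
    """Merge function: combine accumulators when two session windows merge."""
    return left + right


def build_flow(events: List[Event]):
    # Imported here so the helpers above stay usable without Bytewax installed.
    import bytewax.operators as op
    import bytewax.operators.windowing as win
    from bytewax.dataflow import Dataflow
    from bytewax.operators.windowing import EventClock, SessionWindower
    from bytewax.testing import TestingSource

    flow = Dataflow("session_pages")
    inp = op.input("inp", flow, TestingSource(events))
    keyed = op.key_on("key_on_user", inp, lambda e: str(e["user"]))

    # Event time comes from the "at" field; do not wait for stragglers.
    clock = EventClock(lambda e: e["at"], timedelta(seconds=0))
    # A session closes after 5 minutes without activity for that key.
    windower = SessionWindower(timedelta(minutes=5))

    win_out = win.fold_window(
        "pages_per_session", keyed, clock, windower,
        build_pages, add_page, merge_pages,
    )
    op.inspect("sessions", win_out.down)  # folded results per window
    op.inspect("late", win_out.late)      # late-arriving items get their own stream
    return flow
```

Keeping the builder, folder, and merge function as plain module-level functions makes them easy to unit test without running a dataflow.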

Join Operators Update

Breaking Change

The join_named and join_window_named operators have been removed to improve compatibility with typed dataflows.

New Additions

Optional Overrides in StatefulLogic

StatefulLogic.on_notify, StatefulLogic.on_eof, and StatefulLogic.notify_at are now optional overrides, retaining state and emitting nothing by default.

Custom Clocks and Windowers

Python interfaces are now available for custom clocks and windowers. Subclass Clock and ClockLogic or Windower and WindowerLogic to define custom time and window definitions.

New Operators

  • filter_map_value operator to map and filter the values of a KeyedStream in one step
  • enrich_cached operator for easier external data source joining
  • key_rm operator to remove keys from a KeyedStream
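Here is a small sketch of how the three new operators could fit together. The order records, the `_PLANS` lookup table, and all function and step names are hypothetical stand-ins. The wiring in `build_flow` is my reading of the v0.20 operator signatures, so treat it as a starting point and not as the canonical usage.

```python
from typing import Dict, Optional

# Hypothetical lookup table standing in for a slow external service.
_PLANS = {"u1": "pro", "u2": "free"}


def parse_order(order: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Mapper for filter_map_value: return None to drop unparseable orders."""
    try:
        return {**order, "amount": float(order["amount"])}
    except (KeyError, ValueError, TypeError):
        return None


def lookup_plan(user_id: str) -> str:
    """Getter for enrich_cached: fetches the value for a key from the source."""
    return _PLANS.get(user_id, "unknown")


def with_plan(cache, order: Dict[str, object]) -> Dict[str, object]:
    """Mapper for enrich_cached: read through the cache and attach the result."""
    return {**order, "plan": cache.get(order["user"])}


def build_flow(orders):
    # Imported here so the helpers above stay usable without Bytewax installed.
    import bytewax.operators as op
    from bytewax.dataflow import Dataflow
    from bytewax.testing import TestingSource

    flow = Dataflow("new_operators")
    inp = op.input("inp", flow, TestingSource(orders))
    keyed = op.key_on("key_on_user", inp, lambda o: str(o["user"]))
    # Parse each value, dropping the ones that come back as None.
    parsed = op.filter_map_value("parse", keyed, parse_order)
    # Drop the key again to get a plain stream of order dicts.
    unkeyed = op.key_rm("drop_key", parsed)
    # Join each order against the (cached) external lookup.
    enriched = op.enrich_cached("with_plan", unkeyed, lookup_plan, with_plan)
    op.inspect("out", enriched)
    return flow
```

Because the getter and mapper are ordinary functions, you can exercise them in tests with any object that has a `get` method in place of the real cache.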

Performance and Functionality Enhancements

  • Session windows now correctly handle out-of-order data and joins.
  • Windowing operators process items in timestamp order, improving output consistency.
  • Simplified operator interfaces for better performance and usability.

Documentation and Guides

  • Documentation cleanups
  • New async connector guide
  • Performance guide addition
  • Updated deployment documentation
  • Fixed links and typos for better navigation

Community Contributions

A special thanks to Csaba Hoch for his invaluable contributions in keeping our documentation accurate and up to date.

Conclusion 🐝

The v0.20 release of Bytewax represents a significant upgrade, introducing powerful new features and improvements. From the dataflow visualizer to enriched operators and critical performance enhancements, this version offers robust tools for your streaming data processing needs.

As always, the Bytewax community looks forward to your feedback and contributions to further refine and expand this versatile data processing framework.

We encourage you to open an issue with any feedback or bugs you might have in the GitHub repo.

Cheers to seamless data workflows!

Zander Matheson

CEO, Founder
Zander is a seasoned data engineer who founded and currently leads Bytewax. He has worked in the data space since 2014 at Heroku, GitHub, and an NLP startup. Before that, he attended business school at UT Austin and HEC Paris.