Unpacking the Bytewax v0.19.0 Release!

By Zander Matheson

📣 We just released v0.19.0!

Included in the release are some wicked performance updates!

Please review this guide designed to assist you in seamlessly updating your code for the changes from Bytewax v0.18 to v0.19.

Migration Guide!

Here's a comprehensive look at what's new and what's changed.

Enhanced Performance and Efficiency

A major highlight of this release is the optimization of multiple operators to minimize interaction with Python's Global Interpreter Lock (GIL). By reducing the need to acquire and release the GIL during operations, performance of windowing operators, stateful operators, and branching operators is significantly improved. This change is poised to deliver faster processing times and enhance efficiency, particularly in complex dataflows.

A big thanks to Damion Werner for their pivotal role in identifying the opportunity for this enhancement.

In tests, we have seen performance boosts of 🍾 12x!

❯ hyperfine \
                             --parameter-list branch_name v0.18.2,main \
                             --setup "git checkout {branch_name}; maturin develop --release" \
                             "python -m bytewax.run examples.benchmark_windowing -i0 -a\"localhost:9999\""
Benchmark 1: python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = v0.18.2)
  Time (mean ± σ):     13.196 s ±  1.778 s    [User: 3.960 s, System: 13.032 s]
  Range (min … max):   10.214 s … 15.092 s    10 runs

Benchmark 2: python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = main)
  Time (mean ± σ):      1.058 s ±  0.007 s    [User: 0.979 s, System: 0.080 s]
  Range (min … max):    1.046 s …  1.069 s    10 runs

  python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = main) ran
   12.48 ± 1.68 times faster than python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = v0.18.2)

Changes to Source and Sink Building

The update introduces a breaking change affecting the build methods of sources and sinks, which now require an additional step_id argument.

This adjustment is designed to enhance the labeling and tracking of custom Python metrics, allowing for more granular monitoring and optimization.
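As a rough illustration of the shape of this change, here is a minimal sketch of a sink whose build method accepts the step ID and uses it to label a count. CountingSink, CountingPartition, and ITEMS_WRITTEN are hypothetical names, the classes are plain-Python stand-ins rather than subclasses of the real Bytewax base classes, and placing step_id as the first argument is an assumption you should check against the migration guide.

```python
from collections import Counter
from typing import Any, List

# Stand-in for a labeled metric (hypothetical). Real code might use a
# prometheus-client metric with a step_id label instead.
ITEMS_WRITTEN: Counter = Counter()


class CountingPartition:
    """Hypothetical sink partition that remembers which step it belongs to."""

    def __init__(self, step_id: str, worker_index: int) -> None:
        self.step_id = step_id
        self.worker_index = worker_index

    def write_batch(self, items: List[Any]) -> None:
        # Key the count by step ID so each step is tracked separately.
        ITEMS_WRITTEN[self.step_id] += len(items)


class CountingSink:
    """Hypothetical stand-in for a custom sink class."""

    # Before the change (assumed): def build(self, worker_index, worker_count)
    # After the change (assumed argument order): step_id comes first.
    def build(
        self, step_id: str, worker_index: int, worker_count: int
    ) -> CountingPartition:
        return CountingPartition(step_id, worker_index)


part = CountingSink().build("out", worker_index=0, worker_count=1)
part.write_batch([1, 2, 3])
print(ITEMS_WRITTEN["out"])
```

The point is only that the step ID is now available at build time, so anything the partition records can be tagged with it.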

Prometheus Client Integration

With v0.19.0, we expanded the monitoring capabilities of the library by enabling the collection of custom Python metrics via the prometheus-client library. This integration facilitates real-time tracking of dataflow performance for a better operating experience.

Schema Registry Interface Update

The release removes the direct schema registry interface integration. While this may seem like a step back, it opens the door to greater flexibility by allowing users to instantiate (de)serializers manually. This approach is exemplified in the confluent_serde and redpanda_serde examples, guiding users on adapting to the new interface.

If you're interested in learning more about our integration with Redpanda, feel free to check out our blog.

Bug Fixes and Operator Additions

In v0.19.0 we addressed a bug that led to items being incorrectly marked as late in sliding and tumbling windows, particularly when timestamps diverged significantly from the align_to parameter. Additionally, the update introduces the stateful_flat_map operator, further expanding the toolkit available for data manipulation.

Streamlined Operator Interfaces

To simplify the development experience and boost performance, there were several breaking changes to operator interfaces. Notably, the builder argument in stateful_map has been removed, standardizing the initial state value as None and allowing users to manually call their previous builder within the mapper.

You can see an example implementation in an anomaly detection example here.
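For a quick feel of the migration, here is a minimal sketch of a mapper that calls its old builder itself when the state is None. build_total, running_total, and simulate are hypothetical names, and simulate is just a plain-Python loop standing in for what the operator does for a single key.

```python
from typing import List, Optional, Tuple


def build_total() -> float:
    """The function that used to be passed as the builder (hypothetical)."""
    return 0.0


def running_total(
    state: Optional[float], value: float
) -> Tuple[Optional[float], float]:
    # The initial state is now always None, so build it here on first use.
    if state is None:
        state = build_total()
    state += value
    # Return the updated state and the value to emit downstream.
    return state, state


def simulate(values: List[float]) -> List[float]:
    """Plain-Python stand-in for how the mapper is driven for one key."""
    state: Optional[float] = None
    out = []
    for v in values:
        state, emitted = running_total(state, v)
        out.append(emitted)
    return out


print(simulate([1.0, 2.0, 3.5]))  # prints [1.0, 3.0, 6.5]
```

In a real dataflow the mapper would be handed to stateful_map together with a step ID and a keyed stream; check the API docs for the exact call.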

Performance has also been enhanced by eliminating the now: datetime and sched: datetime arguments from several source, sink, and logic operators. If you need the current time or the next scheduled awake time, the advice is to implement these functionalities directly within your dataflow code, leveraging Python's datetime module.
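Here is a minimal sketch of what that can look like, assuming a partition-like object that used to receive the current time and the scheduled awake time as arguments. TickPartition is a hypothetical, plain-Python stand-in, and the next_batch and next_awake method names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


class TickPartition:
    """Hypothetical source partition that emits the current time on a schedule."""

    def __init__(self, interval: timedelta) -> None:
        self._interval = interval
        # Track the schedule ourselves instead of being handed a sched value.
        self._next_awake: Optional[datetime] = datetime.now(timezone.utc)

    def next_batch(self) -> List[datetime]:
        # Previously this might have arrived as a now argument.
        now = datetime.now(timezone.utc)
        self._next_awake = now + self._interval
        return [now]

    def next_awake(self) -> Optional[datetime]:
        return self._next_awake


part = TickPartition(timedelta(seconds=5))
batch = part.next_batch()
print(part.next_awake() - batch[0])  # prints 0:00:05
```

Using timezone-aware UTC datetimes keeps the values comparable with other aware timestamps in the dataflow.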

Conclusion 🐝

We believe the v0.19.0 release reflects our commitment to continuous improvement and user-centric development. By enhancing performance, refining interfaces, and expanding functionality, we aim to give everyone using Bytewax a powerful, efficient, and flexible library for their streaming data processing needs.

As always, the Bytewax community looks forward to feedback and contributions, aiming to further refine and expand this versatile data processing framework.

We encourage you to open an issue in the repo with any feedback or bug reports.

Now on AWS marketplace: Bytewax! Simplify EKS environments and unlock Startup Program advantages. More info -> here.

Cheers to seamless data workflows!

Real-Time Financial Exchange Order Book

We're refreshing all our guides❗️ But you can already dive right in and see how v0.19.0 operates firsthand!

P.S. Got a question or need help? Join us on Slack and tap into the collective wisdom of the Bytewax community. We're all here to help each other grow. 💛

Zander Matheson

CEO, Co-founder
Zander is a seasoned data engineer who founded and currently leads Bytewax. Zander has worked in the data space since 2014 at Heroku, GitHub, and an NLP startup. Before that, he attended business school at UT Austin and HEC Paris in Europe.