📣 We just released v0.19.0!
Included in the release are some wicked performance updates!
Please review this guide designed to assist you in seamlessly updating your code for the changes from Bytewax v0.18 to v0.19.
Here's a comprehensive look at what's new and what's changed.
Enhanced Performance and Efficiency
A major highlight of this release is the optimization of multiple operators to minimize interaction with Python's Global Interpreter Lock (GIL). By reducing the need to acquire and release the GIL during operations, performance of windowing operators, stateful operators, and branching operators is significantly improved. This change is poised to deliver faster processing times and enhance efficiency, particularly in complex dataflows.
A big thanks to Damion Werner for their pivotal role in identifying the opportunity for this enhancement.
In our windowing benchmark, we measured a speedup of roughly 🍾 12x over v0.18.2!
❯ hyperfine \
--parameter-list branch_name v0.18.2,main \
--setup "git checkout {branch_name}; maturin develop --release" \
"python -m bytewax.run examples.benchmark_windowing -i0 -a\"localhost:9999\""
Benchmark 1: python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = v0.18.2)
Time (mean ± σ): 13.196 s ± 1.778 s [User: 3.960 s, System: 13.032 s]
Range (min … max): 10.214 s … 15.092 s 10 runs
Benchmark 2: python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = main)
Time (mean ± σ): 1.058 s ± 0.007 s [User: 0.979 s, System: 0.080 s]
Range (min … max): 1.046 s … 1.069 s 10 runs
Summary
python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = main) ran
12.48 ± 1.68 times faster than python -m bytewax.run examples.benchmark_windowing -i0 -a"localhost:9999" (branch_name = v0.18.2)
Changes to Source and Sink Building
The update introduces a breaking change affecting the FixedPartitionedSource.build_part, DynamicSource.build, FixedPartitionedSink.build_part, and DynamicSink.build methods, which now require an additional step_id argument.
This adjustment is designed to enhance the labeling and tracking of custom Python metrics, allowing for more granular monitoring and optimization.
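As a rough illustration of the new shape, here is a minimal sketch of a sink whose build method accepts step_id and hands it to the partition it creates. The classes below are plain-Python stand-ins so the snippet runs on its own; in a real dataflow they would subclass Bytewax's DynamicSink and its partition base class. The class names, the message format, and the assumption that step_id comes first in the argument list are ours, so check the API docs for the exact signature.

```python
from typing import Any, List


class LabeledSinkPartition:
    """Hypothetical stand-in for a stateless sink partition."""

    def __init__(self, step_id: str) -> None:
        # Keep the step ID so metrics or log lines can be labeled with it.
        self.step_id = step_id
        self.written: List[str] = []

    def write_batch(self, items: List[Any]) -> None:
        # Tag every item with the step that produced it.
        for item in items:
            self.written.append(f"{self.step_id}: {item}")


class LabeledSink:
    """Hypothetical stand-in for a DynamicSink subclass."""

    # Assumption: step_id is passed ahead of the worker arguments.
    def build(
        self, step_id: str, worker_index: int, worker_count: int
    ) -> LabeledSinkPartition:
        return LabeledSinkPartition(step_id)


part = LabeledSink().build("out", worker_index=0, worker_count=1)
part.write_batch([1, 2])
print(part.written)
```

Existing custom sources and sinks mainly need the extra parameter added to their build or build_part signature; whether to use it for labeling is up to you.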
Prometheus Client Integration
With v0.19.0, we expanded the monitoring capabilities of the library by enabling the collection of custom Python metrics via the prometheus-client library. This integration facilitates real-time tracking of dataflow performance for a better operating experience.
Schema Registry Interface Update
The release removes the direct schema registry interface integration. While this may seem like a step back, it opens the door to greater flexibility by allowing users to instantiate (de)serializers manually. This approach is exemplified in the confluent_serde and redpanda_serde examples, guiding users on adapting to the new interface.
If you're interested in learning more about our integration with Redpanda, feel free to check out our blog.
Bug Fixes and Operator Additions
In v0.19.0 we addressed a bug that led to items being incorrectly marked as late in sliding and tumbling windows, particularly when timestamps diverged significantly from the align_to parameter. Additionally, the update introduces the stateful_flat_map operator, further expanding the toolkit available for data manipulation.
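To give a feel for stateful_flat_map, here is a minimal sketch of a mapper that emits a value only when it differs from the previous one for the same key. It assumes the mapper receives the current state (None the first time a key is seen) and the incoming value, and returns the updated state together with zero or more items to emit; that shape and the emit_on_change name are assumptions for illustration. The loop at the bottom drives the mapper by hand for a single key instead of running a dataflow.

```python
from typing import List, Optional, Tuple


def emit_on_change(
    last: Optional[int], value: int
) -> Tuple[Optional[int], List[int]]:
    """Emit the value only if it differs from the last one seen."""
    if last is not None and last == value:
        # Same as before: keep the state, emit nothing.
        return last, []
    # New or changed value: remember it and emit it once.
    return value, [value]


# Drive the mapper by hand for one key.
state: Optional[int] = None
emitted: List[int] = []
for v in [1, 1, 2, 2, 2, 3]:
    state, out = emit_on_change(state, v)
    emitted.extend(out)
print(emitted)  # [1, 2, 3]
```

Unlike a one-in, one-out stateful mapping, each input here can yield an empty list, which is what makes dropping repeats possible.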
Streamlined Operator Interfaces
To simplify the development experience and boost performance, there were several breaking changes to operator interfaces. Notably, the builder argument in stateful_map has been removed, standardizing the initial state value as None and allowing users to manually call their previous builder within the mapper.
You can see an example implementation in an anomaly detection example here.
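The migration pattern described above can be sketched as follows: the mapper checks for None and calls the old builder itself. The build_state and running_mean names are hypothetical, and the loop simulates what the operator would do for a single key so the snippet runs without a dataflow.

```python
from typing import Optional, Tuple


def build_state() -> dict:
    """Hypothetical builder that used to be passed as the builder argument."""
    return {"count": 0, "total": 0.0}


def running_mean(state: Optional[dict], value: float) -> Tuple[dict, float]:
    # The initial state is now always None, so build it on first use.
    if state is None:
        state = build_state()
    state["count"] += 1
    state["total"] += value
    # Return the updated state and the value to emit downstream.
    return state, state["total"] / state["count"]


# Simulate three values arriving for one key.
state: Optional[dict] = None
means = []
for v in [2.0, 4.0, 6.0]:
    state, mean = running_mean(state, v)
    means.append(mean)
print(means)  # [2.0, 3.0, 4.0]
```

In a dataflow, running_mean would be passed as the mapper to stateful_map on a keyed stream, with no separate builder.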
Performance has also been enhanced by eliminating the now: datetime and sched: datetime arguments from several source, sink, and logic operators. If you need the current time or the next scheduled awake time, we recommend computing and tracking them directly within your dataflow code using Python's datetime module.
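As one way to do that, here is a minimal sketch of a source partition that reads the clock itself and keeps track of its own schedule. PeriodicPartition is a plain-Python stand-in so the snippet runs on its own; in a real dataflow it would subclass Bytewax's stateful source partition class. The no-argument next_batch shape is an assumption based on the removal described above.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


class PeriodicPartition:
    """Hypothetical stand-in for a source partition that ticks on an interval."""

    def __init__(self, interval: timedelta) -> None:
        self._interval = interval
        # Track the next scheduled awake time ourselves.
        self._next_awake = datetime.now(timezone.utc)

    def next_batch(self) -> List[datetime]:
        # Formerly supplied as an argument; now read the clock directly.
        now = datetime.now(timezone.utc)
        # Advance from the schedule, not from now, to avoid drift.
        self._next_awake += self._interval
        return [now]

    def next_awake(self) -> Optional[datetime]:
        return self._next_awake


part = PeriodicPartition(timedelta(seconds=5))
scheduled = part.next_awake()
batch = part.next_batch()
print(part.next_awake() - scheduled)  # 0:00:05
```

Using timezone-aware UTC datetimes throughout keeps the comparison between the current time and the scheduled time unambiguous.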
Conclusion 🐝
We believe the v0.19.0 release reflects our commitment to continuous improvement and user-centric development. By enhancing performance, refining interfaces, and expanding functionality, we aim to give everyone using Bytewax a powerful, efficient, and flexible library for every streaming data processing need.
As always, the Bytewax community looks forward to feedback and contributions, aiming to further refine and expand this versatile data processing framework.
We encourage you to open an issue with any feedback or bugs you might have in the repo.
Cheers to seamless data workflows!
We're refreshing all our guides❗️ But you can already dive right in and see how v0.19.0 operates firsthand!
P.S. Got a question or need help? Join us on Slack and tap into the collective wisdom of the Bytewax community. We're all here to help each other grow. 💛