Stream processing
as easy as Py
Build real-time streaming pipelines 5x faster with 80% lower TCO and deliver cutting-edge AI use cases in any environment, from edge to cloud.
Start with > pip install bytewax
and scale with our platform!
What is Bytewax?
Bytewax is a complete data processing solution. It combines our core open source library, modules and connectors that extend it, and a platform for orchestrating and governing your data processing.
Build your pipelines in Python
Write more powerful data streaming pipelines in fewer lines of code! Bytewax is a Python-native stateful stream processor that makes it easy to set up and deploy your dataflows in Python. You can tap into Python's vast library ecosystem to perform advanced data transformations far beyond the reach of SQL, while Bytewax abstracts away much of the underlying complexity and handles the difficult parts for you.
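from datetime import timedelta
import numpy as np
from bytewax.operators import window as window_op
from bytewax.operators.window import SystemClockConfig, TumblingWindow

# Assign items to windows by the time they arrive (wall-clock time).
cc = SystemClockConfig()
# Fixed, non-overlapping windows, each one second long.
wc = TumblingWindow(length=timedelta(seconds=1))

def build_array():
    # Builder: the empty accumulator each new window starts from.
    return np.empty(0)

def insert_value(np_array, value):
    # Folder: prepend the incoming value to the window's accumulator.
    return np.insert(np_array, 0, value)

# `stream` is assumed to be a keyed stream defined earlier in the dataflow.
windowed_stream = window_op.fold_window(
    "window", stream, cc, wc, build_array, insert_value
)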
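To make the builder/folder pattern in the snippet above concrete, here is a minimal standard-library sketch of what a tumbling-window fold computes. It is not Bytewax code and does not use the Bytewax API: `fold_tumbling`, `build_list`, and the sample `events` are hypothetical names introduced for illustration, a plain list stands in for the NumPy array, and windows are assigned from explicit timestamps rather than the system clock.

```python
from datetime import datetime, timedelta, timezone


def build_list():
    # Builder: the empty accumulator each new window starts from.
    return []


def insert_value(acc, value):
    # Folder: return a new accumulator with the value at the front,
    # mirroring np.insert(np_array, 0, value) in the snippet above.
    return [value] + acc


def fold_tumbling(events, length, align_to, builder, folder):
    """Fold (timestamp, value) pairs into fixed, non-overlapping windows.

    Hypothetical helper for illustration only; not part of Bytewax.
    """
    windows = {}
    for ts, value in events:
        # Index of the window this timestamp falls into.
        window_id = (ts - align_to) // length
        acc = windows[window_id] if window_id in windows else builder()
        windows[window_id] = folder(acc, value)
    return windows


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
events = [
    (start + timedelta(milliseconds=100), 1.0),
    (start + timedelta(milliseconds=900), 2.0),
    (start + timedelta(milliseconds=1500), 3.0),
]
result = fold_tumbling(
    events, timedelta(seconds=1), start, build_list, insert_value
)
print(result)  # {0: [2.0, 1.0], 1: [3.0]}
```

The first two values land in the same one-second window and are folded together, newest first; the third opens a new window with a fresh accumulator from the builder.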
Extend Bytewax with
connectors, operators, and E2E dataflows
Our Module Hub extends the open source dataflow framework with pre-built connectors to a wide range of sources and sinks, advanced operators, and end-to-end (E2E) dataflows.
Deploy with
a single command $ waxctl dataflow deploy my_dataflow.py
Ease of deployment is a critical aspect of enabling agile development within a CI/CD framework. By using our command-line interface, waxctl, you can seamlessly deploy the same code you wrote and tested locally across a cluster of machines with a single command.
Deploy dataflows anywhere, from edge to cloud:
Secure, scale, and manage
your dataflows with the Bytewax Platform
Enhance your data streaming operations with robust observability, advanced management APIs, disaster recovery, cloud backup, and autoscaling capabilities. Streamline your data processes and keep your dataflows highly available and resilient.
Ships with powerful orchestration & governance features:
Developer-friendly stream processing
for Python - 100% JVM free
Bytewax features a modern architecture that combines the performance of a Rust engine for distributed, parallel streaming with the ease of use of Python. The outcome is a stateful stream processor that rivals the functionality and performance of traditional Java-based tools like Flink, without any of the drawbacks. Enable all your Python teams to work with streaming!
Compared to Apache Flink®:*
*Source: Data Stream Processing Ease of Use and TCO, McKnight Consulting Group (2024)
Loved by developers
working on:
Loved by the data community 💛
We went from 5 days of training to 5 minutes DIY. Anyone with a limited Python background can just get going immediately. A defensible 10x reduction in infrastructure cost.
I have a lot of fun integrating Bytewax into my projects. It brings a lot of value in removing the resistance to streaming technologies in Python’s ecosystem. Before tools like Bytewax, using a streaming engine was a real headache. It’s totally exciting to be part of this movement 🔥
We use Flink a lot internally, but after picking up Bytewax we are looking for more and more real-time ML workloads to use Bytewax with because we find it to be more accessible and faster to set up than Flink.
We have been using Bytewax for well over a year and are incredibly happy with the performance and support. I was able to ship the real-time analysis feature we needed at Hark in under a week and it’s been delightful to work with the Bytewax team.
Bytewax is simple enough that we can quickly prove ahead of time that we can solve a problem and then use the same tool to scale it and move it to production.
Libraries like Bytewax 🐝 expose a pure Python API on top of a highly-efficient language like Rust. So you get the best of both worlds: Rust's speed and performance, plus Python's rich ecosystem of libraries.
I was a fan of batch things but after I discovered how easy it is to implement streaming pipelines with Bytewax, I changed my mind 😅
The key difference between Apache Spark and Bytewax for me teaching my class on ML systems is that it takes me around six lectures to bring students up to the level where they can begin utilizing Spark. However, I only need one lecture to do the same with Bytewax.
Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with Java-based tools like Apache Spark or Apache Flink. However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝 that expose a pure Python API on top of a highly-efficient language like Rust.
Setting up Bytewax was incredibly straightforward, allowing us to go from pip to a fully operational dataflow in just minutes, without the hassle of complex build files and classpath issues found in JVM-based solutions. Remarkably, our production deployments have been rock-solid, seamlessly indexing multiple blockchains with unwavering reliability and no drama.