Reduce your TCO by 80% compared to Apache Flink®

Stream processing
as easy as Py

Build real-time streaming pipelines 5x faster with 80% lower TCO and deliver cutting-edge AI use cases in any environment, from edge to cloud.

Start with > pip install bytewax
and scale with our platform!

Trusted by innovative organizations
STAT Trading
Simply Business
HARK Technologies
PingThings
Range Energy
Rated Network
Overview

What is Bytewax?

Bytewax is a complete data processing solution. It combines our core open source library, modules and connectors that extend that library, and a platform for orchestrating and governing your data processing.

Python-Native Stream Processing

Build your pipelines in Python

Write more powerful data streaming pipelines in fewer lines of code! Bytewax is a Python-native stateful stream processor that makes it easy to set up and deploy your dataflows in Python. You can tap into Python's vast library ecosystem to perform advanced data transformations far beyond the reach of SQL, while Bytewax abstracts much of the underlying complexity and handles the difficult parts for you.

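from datetime import timedelta

import numpy as np
from bytewax.operators import window as wop
from bytewax.operators.window import SystemClockConfig, TumblingWindow

# Assign items to 1-second tumbling windows using the system clock.
# Depending on the Bytewax version, TumblingWindow may also need an align_to datetime.
cc = SystemClockConfig()
wc = TumblingWindow(length=timedelta(seconds=1))

def build_array():
    # Start each window with an empty array.
    return np.empty(0)

def insert_value(np_array, value):
    # Prepend each incoming value to the window's array.
    return np.insert(np_array, 0, value)

# `stream` is assumed to be a keyed stream defined earlier in the dataflow.
windowed_stream = wop.fold_window("window", stream, cc, wc, build_array, insert_value)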
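
To make the idea behind the snippet above concrete, here is a minimal pure-Python sketch of what folding over tumbling windows means. It is an illustration only, not how Bytewax implements it: the helper name `tumbling_fold`, the `align_to` argument, and the use of explicit event timestamps (instead of the system clock) are assumptions made for the example, and a plain list stands in for the NumPy array.

```python
from datetime import datetime, timedelta, timezone

def tumbling_fold(events, length, align_to, builder, folder):
    """Fold (timestamp, value) pairs into fixed-size, non-overlapping windows.

    Hypothetical helper for illustration; not part of the Bytewax API.
    """
    windows = {}
    for ts, value in events:
        # Integer index of the window that contains this timestamp.
        idx = (ts - align_to) // length
        # Create the accumulator the first time a window is seen.
        acc = windows[idx] if idx in windows else builder()
        windows[idx] = folder(acc, value)
    return dict(sorted(windows.items()))

def build_list():
    # Start each window with an empty accumulator.
    return []

def insert_value(acc, value):
    # Prepend the new value, mirroring np.insert(arr, 0, value).
    return [value] + acc

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
events = [
    (start + timedelta(milliseconds=100), 1.0),
    (start + timedelta(milliseconds=900), 2.0),
    (start + timedelta(milliseconds=1500), 3.0),
]

result = tumbling_fold(events, timedelta(seconds=1), start, build_list, insert_value)
print(result)  # {0: [2.0, 1.0], 1: [3.0]}
```

The first two events fall in the same one-second window and are folded into one accumulator; the third starts a new window.
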
Modules

Extend Bytewax with
connectors, operators, and E2E dataflows

Our Module Hub extends the open source dataflow framework with pre-built connectors for a wide range of sources and sinks, advanced operators, and end-to-end (E2E) dataflows.

Deployment

Deploy with
a single command $ waxctl dataflow deploy my_dataflow.py

Ease of deployment is a critical aspect of enabling agile development within a CI/CD framework. By using our command-line interface, waxctl, you can seamlessly deploy the same code you wrote and tested locally across a cluster of machines with a single command.

Deploy dataflows anywhere, from edge to cloud.

Bytewax vs. Flink

Developer-friendly stream processing
for Python - 100% JVM free

Bytewax features a modern architecture that combines the performance of a Rust engine for distributed, parallel streaming with the ease of use of Python. The outcome is a stateful stream processor that rivals the functionality and performance of traditional Java-based tools like Flink, without any of the drawbacks. Enable all your Python teams to work with streaming!

5x

Bytewax reduces your Total Cost of Ownership (TCO) by 5x on average.

8x

For table-stakes stream processing use cases in Python, Bytewax can accelerate time to production by up to 8x.

25x

Bytewax is radically more memory efficient than Flink, using 7x–25x less memory for common streaming workloads.

*Source: Data Stream Processing Ease of Use and TCO, McKnight Consulting Group (2024)
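
The "5x" multiplier and the "80% lower TCO" figure used elsewhere on this page are two ways of writing the same saving. The sketch below shows the conversion, assuming "Nx" means the cost or memory footprint is divided by N. That reading is our assumption for illustration, not a definition taken from the cited report, and `reduction_percent` is a hypothetical helper.

```python
def reduction_percent(factor: float) -> int:
    """Convert an "N times lower" factor into a whole-number percent reduction."""
    # Dividing a cost by `factor` leaves 1/factor of it; the rest is the saving.
    return round((1 - 1 / factor) * 100)

tco = reduction_percent(5)           # 5x lower TCO
memory_low = reduction_percent(7)    # 7x less memory
memory_high = reduction_percent(25)  # 25x less memory
print(tco, memory_low, memory_high)  # 80 86 96
```
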

Developer voices

Loved by the data community 💛

We went from 5 days of training to 5 minutes DIY. Anyone with a limited Python background can just get going immediately. A defensible 10x reduction in infrastructure cost.

I have a lot of fun integrating Bytewax into my projects. It brings a lot of value in removing the resistance to streaming technologies in Python’s ecosystem. Before tools like Bytewax, using a streaming engine was a real headache. It’s totally exciting to be part of this movement 🔥

We use Flink a lot internally, but after picking up Bytewax we are looking for more and more real-time ML workloads to use Bytewax with, because we find it to be more accessible and faster to set up than Flink.

We have been using Bytewax for well over a year and are incredibly happy with the performance and support. I was able to ship the real-time analysis feature we needed at Hark in under a week, and it's been delightful to work with the Bytewax team.

Bytewax is simple enough that we can quickly prove ahead of time that we can solve a problem and then use the same tool to scale it and move it to production.

Libraries like Bytewax 🐝 expose a pure Python API on top of a highly efficient language like Rust. So you get the best of both worlds: Rust's speed and performance, plus Python's rich ecosystem of libraries.

I was a fan of batch things, but after I discovered how easy it is to implement streaming pipelines with Bytewax, I changed my mind 😅

The key difference between Apache Spark and Bytewax for me teaching my class on ML systems is that it takes me around six lectures to bring students up to the level where they can begin utilizing Spark. However, I only need one lecture to do the same with Bytewax.

Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with Java-based tools like Apache Spark or Apache Flink. However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝 that expose a pure Python API on top of a highly efficient language like Rust.

Setting up Bytewax was incredibly straightforward, allowing us to go from pip to a fully operational dataflow in just minutes, without the hassle of complex build files and classpath issues found in JVM-based solutions. Remarkably, our production deployments have been rock-solid, seamlessly indexing multiple blockchains with unwavering reliability and no drama.

We have used Bytewax to develop a recommender system for a video streaming platform. I like to think in MapReduce terms when I do data processing, so I was super happy to find that Bytewax does precisely what I need and is easy to deploy and support.