Bytewax v0.20 is out now!

Stream processing
as easy as Py

Build real-time streaming pipelines 5x faster and deliver cutting-edge AI use cases in any environment, from edge to cloud.

How it works

Build streaming data applications easily. In Python.

Step 1: Easy install
> pip install bytewax
Step 2: Connect to data sources
from bytewax import operators as op
from bytewax.connectors.kafka import operators as kop
from bytewax.dataflow import Dataflow

BROKERS = ["localhost:19092"]
IN_TOPICS = ["in_topic"]
OUT_TOPIC = "out_topic"

flow = Dataflow("kafka_in_out")
kinp = kop.input("inp", flow, brokers=BROKERS, topics=IN_TOPICS)
op.inspect("inspect-errors", kinp.errs)
op.inspect("inspect-oks", kinp.oks)
kop.output("out1", kinp.oks, brokers=BROKERS, topic=OUT_TOPIC)
Step 3: Stateful operations like windowing and aggregations
from datetime import timedelta
import numpy as np
from bytewax.operators import window as window_op
from bytewax.operators.window import TumblingWindow, SystemClockConfig

cc = SystemClockConfig()
wc = TumblingWindow(length=timedelta(seconds=1))

def build_array():
    return np.empty(0)

def insert_value(np_array, value):
    return np.insert(np_array, 0, value)

windowed_stream = window_op.fold_window("window", stream, cc, wc, build_array, insert_value)
Step 4: Use the Python tools you are familiar with
import numpy as np
from bytewax import operators as op

avg_stream = op.map("average", windowed_stream, lambda x: np.mean(x[1]))
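Taken together, the builder, folder, and averaging lambda from Steps 3 and 4 can be exercised on their own with plain NumPy, no dataflow required. This is only a sketch of what fold_window computes for one window; inside Bytewax the builder and folder run per key and per window:

```python
import numpy as np

# The same builder/folder pair from Step 3.
def build_array():
    return np.empty(0)

def insert_value(np_array, value):
    return np.insert(np_array, 0, value)

# Fold a few sample values the way fold_window would within one window.
acc = build_array()
for v in [1.0, 2.0, 3.0]:
    acc = insert_value(acc, v)

# Step 4's lambda receives (key, window_array) pairs; here we apply the
# same np.mean directly to the accumulated array.
print(np.mean(acc))  # 2.0
```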
Step 5: Run locally
> python -m bytewax.run my_dataflow:flow
Step 6: Deploy anywhere
> waxctl df deploy
Scrapy, PyTorch, Hugging Face, Pandas, NumPy, TensorFlow, Streamlit, Polars, spaCy, Requests, scikit-learn, Matplotlib, SQLAlchemy
Easy integrations

Leverage the Python Ecosystem

Bytewax works out of the box with any Python library, letting you connect to hundreds of data sources and tap the entire ecosystem of data processing libraries.
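For instance, an ordinary Python function built on NumPy can be handed straight to a map step; the step name, upstream stream, and the enrich function below are hypothetical, shown only to illustrate the pattern:

```python
import numpy as np

# A plain Python function like this can be passed to a Bytewax map step,
# e.g. op.map("enrich", stream, enrich) -- names here are hypothetical.
def enrich(item):
    key, values = item
    arr = np.asarray(values, dtype=float)
    # Attach summary statistics computed with NumPy.
    return key, {"mean": float(arr.mean()), "max": float(arr.max())}

print(enrich(("sensor-1", [1.0, 4.0, 7.0])))
# -> ('sensor-1', {'mean': 4.0, 'max': 7.0})
```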


What can you build with Bytewax?

Developer voices

Loved by the data community 💛

We went from 5 days of training to 5 minutes DIY. Anyone with a limited Python background can just get going immediately. A defensible 10x reduction in infrastructure cost.

I have a lot of fun integrating Bytewax into my projects. It brings a lot of value in removing the resistance to streaming technologies in Python's ecosystem. Before tools like Bytewax, using a streaming engine was a real headache. It's totally exciting to be part of this movement 🔥

We use Flink a lot internally, but after picking up Bytewax we are looking for more and more real-time ML workloads to use Bytewax with, because we find it more accessible and faster to set up than Flink.

We have been using Bytewax for well over a year and are incredibly happy with the performance and support. I was able to ship the real-time analysis feature we needed at Hark in under a week, and it's been delightful to work with the Bytewax team.

Bytewax is simple enough that we can quickly prove ahead of time that we can solve a problem and then use the same tool to scale it and move it to production.

Libraries like Bytewax 🐍 expose a pure Python API on top of a highly efficient language like Rust. So you get the best of both worlds: Rust's speed and performance, plus Python's rich ecosystem of libraries.

I was a fan of batch things, but after I discovered how easy it is to implement streaming pipelines with Bytewax, I changed my mind 😅

The key difference between Apache Spark and Bytewax for me teaching my class on ML systems is that it takes me around six lectures to bring students up to the level where they can begin utilizing Spark. However, I only need one lecture to do the same with Bytewax.

Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with Java-based tools like Apache Spark or Apache Flink. However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐍 that expose a pure Python API on top of a highly efficient language like Rust.

Setting up Bytewax was incredibly straightforward, allowing us to go from pip to a fully operational dataflow in just minutes, without the hassle of complex build files and classpath issues found in JVM-based solutions. Remarkably, our production deployments have been rock-solid, seamlessly indexing multiple blockchains with unwavering reliability and no drama.