Stream processing
as easy as Py
Build real-time streaming pipelines 5x faster and deliver cutting-edge AI use cases in any environment, from edge to cloud.
Build streaming data applications easily. In Python.
> pip install bytewax
from bytewax import operators as op
from bytewax.connectors.kafka import operators as kop
from bytewax.dataflow import Dataflow
BROKERS = ["localhost:19092"]
IN_TOPICS = ["in_topic"]
OUT_TOPIC = "out_topic"
flow = Dataflow("kafka_in_out")
kinp = kop.input("inp", flow, brokers=BROKERS, topics=IN_TOPICS)
op.inspect("inspect-errors", kinp.errs)
op.inspect("inspect-oks", kinp.oks)
kop.output("out1", kinp.oks, brokers=BROKERS, topic=OUT_TOPIC)
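from datetime import timedelta
import numpy as np
from bytewax.operators import window as wop
from bytewax.operators.window import TumblingWindow, SystemClockConfig
# Window on processing (wall-clock) time, in 1-second tumbling windows.
cc = SystemClockConfig()
wc = TumblingWindow(length=timedelta(seconds=1))
def build_array():
    # Start each window with an empty accumulator.
    return np.empty(0)
def insert_value(np_array, value):
    # Prepend the new value; np.insert returns a new array.
    return np.insert(np_array, 0, value)
# `stream` is the stream produced by an upstream step (not shown here).
windowed_stream = wop.fold_window("window", stream, cc, wc, build_array, insert_value)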
avg_stream = op.map("average", windowed_stream, lambda x: np.mean(x[1]))
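To make the windowing snippet above concrete, here is a minimal sketch in plain Python of what folding values into 1-second tumbling windows and then averaging each window amounts to. It is not Bytewax's implementation: the helper names `window_index` and `fold_windows` are hypothetical, and it assigns windows from explicit event timestamps, whereas `SystemClockConfig` above uses the wall-clock time at which each item is processed.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

WINDOW_LENGTH = timedelta(seconds=1)  # same length as the TumblingWindow above


def window_index(ts, align_to, length=WINDOW_LENGTH):
    """Return the index of the tumbling window that contains `ts`."""
    # timedelta // timedelta yields an int: the number of whole windows elapsed.
    return (ts - align_to) // length


def fold_windows(events, align_to, builder, folder):
    """Fold (timestamp, value) pairs into one accumulator per window."""
    accumulators = {}
    for ts, value in events:
        idx = window_index(ts, align_to)
        if idx not in accumulators:
            accumulators[idx] = builder()  # new window: fresh accumulator
        accumulators[idx] = folder(accumulators[idx], value)
    return accumulators


# Illustrative data only: three readings spread over two windows.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
events = [
    (start + timedelta(milliseconds=100), 1.0),
    (start + timedelta(milliseconds=900), 3.0),
    (start + timedelta(milliseconds=1500), 10.0),
]

folded = fold_windows(events, start, list, lambda acc, v: acc + [v])
averages = {idx: mean(values) for idx, values in folded.items()}
print(averages)  # {0: 2.0, 1: 10.0}
```

The `builder` and `folder` arguments play the same roles as `build_array` and `insert_value` in the snippet above: one creates an empty accumulator per window, the other adds a value to it.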
> python -m bytewax.run my_dataflow:flow
> waxctl df deploy my_dataflow.py
Leverage the Python Ecosystem
Bytewax works out of the box with any Python library, so you can connect to hundreds of data sources and use the entire ecosystem of data processing libraries.
What makes Bytewax so special?
Python Native
Stateful
Recoverable
Native Connectors
Scalable
Performant
What can you build with Bytewax?
Recoverable Streaming Shopping Cart Application
Enriching Streaming Data from Redpanda
Real-Time Financial Exchange Order Book
Building Sessions from Search Logs
Handling Missing Values in Data Streams
Loved by the data community
We went from 5 days of training to 5 minutes DIY. Anyone with a limited Python background can just get going immediately. A defensible 10x reduction in infrastructure cost.
![](https://images.production.bytewax.io/nathan_26078ee8e6.jpg)
I have a lot of fun integrating Bytewax into my projects. It brings a lot of value in removing the resistance to streaming technologies in Python's ecosystem. Before tools like Bytewax, using a streaming engine was a real headache. It's totally exciting to be part of this movement.
![Paul Iusztin](https://images.production.bytewax.io/paul_iusztin_3d4077bada.jpeg)
We use Flink a lot internally, but after picking up Bytewax we are looking for more and more real-time ML workloads to use Bytewax with because we find it to be more accessible and faster to set up than Flink.
We have been using Bytewax for well over a year and are incredibly happy with the performance and support. I was able to ship the real-time analysis feature we needed at Hark in under a week and it's been delightful to work with the Bytewax team.
![Hark Technologies](https://images.production.bytewax.io/hark_technologies_logo_63947fc087.jpeg)
Bytewax is simple enough that we can quickly prove ahead of time that we can solve a problem and then use the same tool to scale it and move it to production.
Libraries like Bytewax expose a pure Python API on top of a highly-efficient language like Rust. So you get the best of both worlds: Rust's speed and performance, plus Python's rich ecosystem of libraries.
![Pau Labarta Bajo](https://images.production.bytewax.io/pau_labarto_bajo_dddbe08d8d.jpeg)
I was a fan of batch things, but after I discovered how easy it is to implement streaming pipelines with Bytewax, I changed my mind.
![](https://images.production.bytewax.io/vesa_1f9c238af6.jpeg)
The key difference between Apache Spark and Bytewax for me teaching my class on ML systems is that it takes me around six lectures to bring students up to the level where they can begin utilizing Spark. However, I only need one lecture to do the same with Bytewax.
![RJ Nowling](https://images.production.bytewax.io/rj_nowling_3e2eefe62c.jpeg)
Python alone is not a language designed for speed, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with Java-based tools like Apache Spark or Apache Flink. However, things are changing fast with the emergence of Rust and libraries like Bytewax that expose a pure Python API on top of a highly-efficient language like Rust.
![](https://images.production.bytewax.io/soufiene_yakoubi_d9a14d1b6c.jpeg)
Setting up Bytewax was incredibly straightforward, allowing us to go from pip to a fully operational dataflow in just minutes, without the hassle of complex build files and classpath issues found in JVM-based solutions. Remarkably, our production deployments have been rock-solid, seamlessly indexing multiple blockchains with unwavering reliability and no drama.
![](https://images.production.bytewax.io/aris_b98bacf00c.jpg)