Bytewax is known as a complete data processing solution, combining our core open-source library with a robust platform for orchestrating and governing your data flows. Today, we’re excited to introduce the Module Hub, a powerful expansion of the Bytewax ecosystem!
The Module Hub brings pre-built connectors and advanced operators to our open-source dataflow framework, designed to save your team time and let data engineers focus directly on high-impact projects. As highlighted in the State of Data Management report, up to 44% of data engineering time is often spent building in-house connectors. With our modules, you can reclaim that time!
Bytewax’s traction, with 500,000 downloads and a rapidly growing user base across industries worldwide, speaks to its value, but we want to give our users even more. Now you can focus on what matters most: building real-time streaming pipelines 5x faster with 80% lower TCO, connecting to all your sources and sinks effortlessly, and enabling cutting-edge AI use cases from edge to cloud.
If you are already convinced, please go ahead and see the Module Hub. If you need a little more explanation and details, you'll find them in this blog.
Introduction to Bytewax Modules
To understand what a module is, we must understand the steps in data flows and stream processing. In stream processing, you often come across the terms source and sink. The data flow typically involves these steps:
- Source: Where the data originates
- Transformation: The data is processed, filtered, or enriched
- Sink: Where the transformed data is stored
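The three steps above can be sketched in plain Python, without Bytewax itself. This is only an illustration: the function names (read_source, transform, write_sink) and the sample readings are hypothetical, and a real Bytewax dataflow would use connectors and operators rather than generators.

```python
from typing import Iterable, Iterator


def read_source() -> Iterator[dict]:
    """Source: where the data originates (hypothetical in-memory readings)."""
    yield {"sensor": "a", "temp_c": 21.5}
    yield {"sensor": "b", "temp_c": -999.0}  # an obviously bad reading
    yield {"sensor": "c", "temp_c": 19.0}


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Transformation: drop bad readings and enrich the rest with Fahrenheit."""
    for event in events:
        if event["temp_c"] < -100:
            continue
        yield {**event, "temp_f": event["temp_c"] * 9 / 5 + 32}


def write_sink(events: Iterable[dict], store: list) -> None:
    """Sink: where the transformed data is stored (here, a plain list)."""
    store.extend(events)


store: list = []
write_sink(transform(read_source()), store)
```

Because each stage only consumes the previous stage's output, any of the three can be swapped out independently, which is the same property that makes pre-built source and sink modules useful.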
Connectors
Each data source or sink in Bytewax is called a connector. A source is any API, file, database, or data warehouse from which you want to ingest data, while a sink defines where you want to send your processed data, be it a data lake, database, data warehouse, or analytics tool. Each connector module falls into one or both categories. For example, the DeltaLake module is a sink connector, the AWS IoT Gateway module is a source connector, and the Apache Kafka module provides both.
Operators
Operators are the transformation building blocks of Bytewax. Each operator provides a specific “shape” for data transformation, while you give them logic functions to customize them to your specific task. Together, an operator and its custom logic function form a dataflow step. By chaining these steps in a dataflow, you can address your high-level data processing challenges.
If you’ve used Python's built-in functions like map, filter, or functools.reduce (or similar functions in other languages), you’re already familiar with this concept. If not, no worries: our documentation includes examples for each operator in bytewax.operators to help you get started.
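As a refresher, the Python built-ins mentioned above look like this. The sample data and lambdas are made up for illustration. Each built-in fixes the shape of the transformation while the function you pass in supplies the logic, which mirrors the split between an operator and its logic function.

```python
from functools import reduce

readings = [3, 8, 15, 4]

# map: apply a logic function to every item.
doubled = list(map(lambda x: x * 2, readings))

# filter: keep only the items for which the logic function returns True.
large = list(filter(lambda x: x > 10, doubled))

# reduce: fold the items into a single value with a two-argument function.
total = reduce(lambda acc, x: acc + x, large, 0)
```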
Open-source Bytewax comes equipped with a range of operators fundamental to building flexible dataflows. Plus, you can add custom operators to handle specific, complex semantics or tackle particularly tricky data transformations. While working with the community, we noticed some advanced operators in high demand and we are happy to present them today as modules!
End-to-end dataflows
This one’s in the “coming soon” category. Imagine setting up a real-time vector embedding pipeline that captures changes in your S3 document store and streams embeddings directly into Pinecone, or any vector database, using Bytewax and the embedding model of your choice. How useful would it be to have a tried-and-tested dataflow module that you can install with ease? As open-source adoption grows, we’re seeing patterns emerge where dedicated end-to-end dataflow modules could greatly accelerate development work. Stay tuned!
We have changed our offerings to adapt to user requests as they grow with Bytewax. I am excited to show our take on how we have added pre-built extensions to the open source framework in Bytewax modules. Modules are standalone Python packages that contain connectors, operators, or complete dataflow code to speed up development and increase capabilities. We endeavor for the software we build to align with the principle of making one developer go faster and further and we think modules do just that! Modules are commercially licensed and source available so you can give them a spin locally before you push to production with a license.
Zander Matheson, CEO, Founder at Bytewax
List of modules
The modules below are either available under the Apache 2 open source license ("open source") or can be purchased as Premium connectors in our store (also available as part of our platform). Every connector is well tested and in production today. To make sure you get the best guidance from the Bytewax team, we invite you to join our Slack.
ID | Name | License | Module Type |
---|---|---|---|
1 | Apache Kafka | open source | Sink, Source |
2 | Google BigQuery | premium | Sink |
3 | Hopsworks FS | premium | Sink |
4 | AWS Kinesis Streams | premium | Sink, Source |
5 | Clickhouse | premium | Sink |
6 | MongoDB | premium | Sink |
7 | MQTT | premium | Sink, Source |
8 | DeltaLake | premium | Sink |
9 | Amazon S3 | premium | Sink |
10 | Azure EventHub | premium | Sink, Source |
11 | RabbitMQ | premium | Sink, Source |
12 | AWS IoT Gateway | premium | Source |
13 | Azure IoT Data Hub | premium | Sink |
14 | Redpanda | open source | Sink, Source |
15 | Amazon MSK | open source | Sink, Source |
16 | Confluent | open source | Sink, Source |
17 | Redis | premium | Sink, Source |
18 | Websockets | premium | Sink, Source |
19 | Snowflake | premium | Sink, Source |
20 | Qdrant | premium | Sink |
21 | Milvus | premium | Sink |
22 | Pinecone | premium | Sink |
23 | MySQL | premium | Sink, Source |
24 | Google Vertex AI | premium | Source |
25 | Amazon SageMaker | premium | Sink |
26 | Feast | premium | Sink |
27 | Weaviate | premium | Sink |
34 | InfluxDB | premium | Sink, Source |
35 | Azure AI Search | premium | Sink |
36 | SingleStore* | open source | Sink |
37 | Interval Join | premium | Operator |
38 | Ordering | premium | Operator |
39 | Stateful timeout | premium | Operator |
40 | Timers | premium | Operator |
41 | Select Timerange | premium | Operator |
*This is a community-contributed connector by Tom Kühl; Bytewax was not directly involved in its creation. The connector is open-source—please refer to the project’s repository for licensing details and credits.
Getting Started: A Quick Example
To illustrate how easy it is to get started, let's walk through an example using the InfluxDB module.
Prerequisites
Make sure your InfluxDB instance is up and running.
Installation
pip install bytewax-influxdb
Configuration
We begin by setting up our InfluxDB credentials and details:
import os
# Read connection details from the environment, with local-testing fallbacks.
TOKEN = os.getenv(
    "INFLUXDB_TOKEN",
    "my-token",
)
DATABASE = os.getenv("INFLUXDB_DATABASE", "testing")
ORG = os.getenv("INFLUXDB_ORG", "dev")
Dataflow
Next, define a dataflow:
from bytewax.dataflow import Dataflow
flow = Dataflow("a_simple_example")
Set up the InfluxDB source:
from datetime import datetime, timedelta, timezone
import bytewax.operators as op
from bytewax.influxdb import InfluxDBSource
inp = op.input(
    "inp",
    flow,
    InfluxDBSource(
        timedelta(minutes=30),  # read interval
        "https://us-east-1-1.aws.cloud2.influxdata.com",  # InfluxDB instance URL
        DATABASE,
        TOKEN,
        "home",
        ORG,
        datetime.fromtimestamp(1724258000, tz=timezone.utc),  # start timestamp
    ),
)
The InfluxDBSource connector pulls data from an InfluxDB instance. In this example, the source reads data from the home measurement at 30-minute intervals, starting from the specified timestamp. The data is streamed from https://us-east-1-1.aws.cloud2.influxdata.com, which you can replace with the URL of your own InfluxDB instance.
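To make the interval and start timestamp concrete, here is a small sketch in plain Python that lists the first few 30-minute windows beginning at the timestamp used above. It only illustrates the arithmetic. The windows helper is hypothetical, and how InfluxDBSource schedules its queries internally is an assumption this sketch does not verify.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def windows(
    start: datetime, interval: timedelta, count: int
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield `count` consecutive (begin, end) windows of length `interval`."""
    begin = start
    for _ in range(count):
        end = begin + interval
        yield begin, end
        begin = end


# Same start timestamp and interval as in the source configuration above.
start = datetime.fromtimestamp(1724258000, tz=timezone.utc)
first_three = list(windows(start, timedelta(minutes=30), 3))

for begin, end in first_three:
    print(begin.isoformat(), "->", end.isoformat())
```

Passing a timezone-aware start, as the example does with timezone.utc, keeps the window boundaries unambiguous regardless of where the dataflow runs.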
That's it! Now you can access your data, transform it, and pass it downstream.
Running
To run the dataflow, simply execute the following command:
python -m bytewax.run path.to.this.file:flow
For more details on the sink part, please refer to the module. Enjoy your dataflow with InfluxDB, and don't forget to purchase the license for production use!
To stay updated on more examples and use cases for other Bytewax modules, be sure to follow our blog and subscribe to our newsletter. We regularly share insights, tutorials, and advanced use cases to help you get the most out of Bytewax!
Get in touch
Interested in seeing a new Bytewax module? We'd love to hear from you! At Bytewax, it's our privilege to create tools that fuel innovation, and partnering with other companies and projects to bring new ideas to life is an honor. If there's a specific source or sink connector you'd love to see, reach out and let's make it happen together! Bytewax is an open-source project, and its module catalog is continuously expanding with contributions from the community and the Bytewax team. We encourage you to contribute enhancements, bug fixes, or entirely new modules for inclusion in the catalog. Learn more about how you can contribute to Bytewax modules here.