Going Head-to-Head Against Flink

Two points in modern Data & Analytics are relatively undisputed:

Stream processing is the foundation of modern, event-driven data architectures.
Python is the preferred programming language and most powerful ecosystem for Data & Analytics, and even more so when it comes to Machine Learning & AI.

There is also little debate that Apache Flink is the leading stateful stream processor on the market. However, not all is well in the world of modern stream processing: Flink's Python implementation suffers from a neglected SDK. It is inefficient, slow to receive upgrades, and falls well short on user experience.

A quick online search or a conversation with AI/ML engineers or Python developers working with streaming data typically reveals sentiments like this:

LinkedIn comment by Daniel Palma about PyFlink

The Need for Powerful Python-native Streaming

The idea of Python-native streaming is not new: Robinhood's Faust (discontinued) proved that there is ample demand for Python-native stream processing.

Python is often criticized for its lack of speed, and as a result many libraries drop down into other languages, such as C via Cython or, more recently, Rust via PyO3, to provide GIL-free, high-performance computation. Pure Python, however, is not ideally suited to high-performance applications such as distributed, parallel stream processing. Faust, as a pure Python library, was ultimately never able to reach performance or scalability levels close to those of Java-based streaming tools like Flink.

Bytewax was founded to make Python-native stream processing as intuitive and accessible as possible, but—importantly—without sacrificing performance, functionality, or scalability.

If this seems like a Goldilocks set of functionalities, it is! And trade-offs are necessary. What we achieved with Bytewax has only been possible through pioneering a new approach for Python-based streaming: leveraging the performance of the Rust library Timely Dataflow as a lower-level framework for distributed stream processing and exposing its functionality in Python, so that from a developer's perspective it provides all the benefits of a pure Python library. Of course, we still have some of the overhead of regular Python (serializing Python objects, contending with the GIL, and so on), so it will never be as performant as pure Rust, but it excels on many other dimensions.

The illustration below offers a better understanding of the Bytewax architecture:

And Bytewax is in good company with this approach: A growing number of performance-focused Python libraries, such as Polars, Daft, Pydantic, and others, are making use of the performance benefits of Rust.

Bytewax Has Come a Long Way

Since the founding of Bytewax, our open source library has seen a long list of improvements, performance enhancements, and feature additions. From state management and operators to deployment tooling like waxctl, and the Bytewax Platform for dataflow orchestration and governance, Bytewax has the features and capabilities of a mature, production-ready stream processor.

Today, Bytewax supports the table-stakes operators of Flink, has a competitive set of I/O connectors, and allows users to quickly define custom operators and connectors.

We see an increasing number of organizations—from cutting-edge technology startups to some of the world’s biggest social networks—replacing Flink workloads with Bytewax. We believe this is an indication of the maturity of Bytewax.

An additional data point that underscores the increasing maturity of Bytewax is our user feedback:

Integrating Bytewax into my projects has been incredibly enjoyable. It significantly reduces the complexity traditionally associated with Python's streaming technologies. Before Bytewax, navigating these technologies felt overwhelmingly cumbersome.

Paul Iusztin, Senior Machine Learning Engineer

Transitioning from a five-day training requirement to a 'do-it-yourself in five minutes' setup represents not just a user-friendly improvement but a substantial, tenfold decrease in infrastructure costs. Bytewax makes it feasible for individuals with even basic Python knowledge to start streaming immediately.

Nathan Verril, Senior Principal Machine Learning Engineer

These testimonials confirm that Bytewax has achieved a critical goal: making stream processing simpler and more accessible without compromising power or functionality.

Let the Code Speak for Itself

There is no better way to illustrate ease of use than to compare the code required for the same operation in both frameworks. Side by side, it becomes evident why Bytewax is so much easier to use than PyFlink.

Looking first at the PyFlink code for a join/merge:

from pyflink.common import WatermarkStrategy, Row
from pyflink.common.serialization import Encoder
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FileSink, OutputFileConfig, NumberSequenceSource
from pyflink.datastream.execution_mode import RuntimeExecutionMode
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext, MapFunction, CoMapFunction, CoFlatMapFunction
from pyflink.datastream.state import MapStateDescriptor, ValueStateDescriptor
from pyflink.common import JsonRowDeserializationSchema, JsonRowSerializationSchema

...

class JoinStream(CoFlatMapFunction):

  def open(self, runtime_context: RuntimeContext):
    state_desc = MapStateDescriptor('map', Types.INT(), Types.STRING())
    self.state = runtime_context.get_map_state(state_desc)
        
  def flat_map1(self, value):
    self.state.put(value[0], value[1])

  def flat_map2(self, value):
    self.state.put(value[0], value[1])

env = StreamExecutionEnvironment.get_execution_environment()    
env.set_parallelism(1)

ds = env.from_collection(...,
  type_info=Types...)

ds2 = env.from_collection(...,
  type_info=Types...)

connect_ds = ds.connect(ds2)
connect_ds.key_by(lambda a: a[0], lambda a: a[0]).flat_map(JoinStream(), Types...)

env.execute()

...

And now, a look at the Bytewax code for the same join:

from bytewax.dataflow import Dataflow
import bytewax.operators as op

...

flow = Dataflow("join")

inp1 = op.input("inp1", flow, ...)
inp2 = op.input("inp2", flow, ...)

merged_stream = op.merge("merge", inp1, inp2)

...

Using Bytewax greatly reduces the number of lines of code required to set up the dataflow and perform the join operation. There are four key factors evident in this code comparison:

Setup and Imports: The PyFlink snippet above has nine import statements, covering state management, serialization, and environment setup, while the Bytewax snippet has two.
Data Stream Creation: Bytewax abstracts much of the underlying complexity, while PyFlink requires explicitly creating and connecting data streams, plus several method calls for keying and processing.
State Management: Bytewax simplifies state management; users don't need to handle state descriptors or runtime contexts manually.
Execution: Bytewax does not require environment setup or an explicit execution call in the dataflow code.
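To make the compared operation concrete, here is a library-free sketch of the difference between merging two streams and joining them on a key. It is only an illustration in plain Python, not how Bytewax or PyFlink implement these operators; the function names `merge_streams` and `keyed_join` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def merge_streams(*streams):
    """Combine several streams into one; items pass through unchanged."""
    return chain(*streams)


def keyed_join(left, right):
    """Pair values that share a key, holding unmatched values as per-key state."""
    state = defaultdict(dict)  # key -> {"left": value, "right": value}
    for side, stream in (("left", left), ("right", right)):
        for key, value in stream:
            state[key][side] = value
            # Emit once both sides have been seen for this key.
            if len(state[key]) == 2:
                yield key, (state[key]["left"], state[key]["right"])


# Hypothetical sample records of the form (key, value).
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (3, "pen")]

merged = list(merge_streams(users, orders))
joined = list(keyed_join(users, orders))
```

The per-key dictionary plays the role that the map state plays in the PyFlink class above: it is the bookkeeping a stream processor has to keep somewhere, whether the user declares it explicitly or the framework manages it.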

What this comparison doesn't show is the user experience beyond the code itself: How do you install dependencies? How do you run the software locally and remotely? How do you write and run tests? How do you debug? Have you ever looked at a JVM/Python stack trace? 😳 These are all critical quality-of-life dimensions that should not be ignored, and they are areas where the team has focused much of its effort over the past years.

Putting Bytewax to the Test

We are confident that Bytewax compares well to Flink based on our user feedback, code comparisons, and our own performance benchmarks. However, we wanted an unbiased evaluation, so we reached out to McKnight Consulting Group, known for their comprehensive benchmarks in the data space, focusing on Total Cost of Ownership (TCO), Time to Value, Performance, and Ease of Use.

McKnight Consulting Group found our proposal interesting and agreed to benchmark Bytewax against PyFlink. We expected to see clear advantages for Bytewax on some of the dimensions, but even we were surprised by how big the differences in TCO and infrastructure were for certain use cases and workloads.

Here are some of their key findings:

Bytewax reduces the Total Cost of Ownership (TCO) for stream processing by an average factor of 4.6 compared to Flink. This reduction is primarily due to significantly lower development and maintenance efforts, with infrastructure costs also being considerably lower, though they contribute less to the overall TCO.
Bytewax decreases the development effort required for typical stream processing scenarios, from real-time analytics to real-time IoT processing, by a factor of 1.5 to 8 compared to Flink.
When comparing different stream processing workloads at 200k transactions per second, Bytewax has a 7x to 25x lower memory footprint. While memory is not the most expensive resource, it significantly impacts cloud infrastructure costs. More importantly, being memory efficient allows for the deployment of more pipelines or operations in environments with limited resources, such as edge deployments.