Real-Time Data Transformation for the Electrification of Trucking

By Jonas Best

Highlights:

  • Real-World Testing with Route Segments: By mapping trailer GPS data to predefined route segments, Range Energy can validate telematics data through consistent, real-world experiments, accelerating vehicle development verification.
  • Fast Time to Production with a Small Team: Bytewax enabled Range Energy to quickly build, test, and deploy complex real-time data pipelines - all with just a small engineering team with no dedicated data engineers or data scientists.
  • Custom Transformations through Python Ecosystem: Bytewax’s seamless integration with the Python ecosystem allows Range Energy to leverage a wide range of Python packages for advanced, complex transformations, such as using Shapely for real-time segment matching.
  • Scalable Real-Time Dataflows for Future Growth: With Bytewax and the Bytewax Platform as the foundation of their real-time operations, Range Energy can efficiently develop, manage, and scale complex real-time dataflows as their operations grow.

Range Energy: Smart Trailers for More Sustainable Trucking

About Range Energy

In the pursuit of sustainable transportation solutions, Range Energy is revolutionizing the heavy-duty trucking industry by electrifying big rig trailers. The San Francisco Bay Area-based startup retrofits standard trailers with electric capabilities, equipping them with batteries and powered axles that provide propulsive assistance. This clever technology reduces fuel consumption by up to 40%, offering significant cost savings and environmental benefits. With trailers that can easily hook up to any tow vehicle, Range Energy accelerates the transition to electrification, making trucking more sustainable without necessitating a complete overhaul of existing fleets. Currently, they offer two types of trailers: a dry van and a refrigerated trailer.

The Role of Telematics and Real-Time Data

To optimize operations and maximize efficiency, Range Energy relies on real-time telematics. Each trailer continuously streams data from multiple electronic control units (ECUs) over a centralized CAN bus to a Telematics Control Unit (TCU). The data captured by the TCU includes key metrics such as location, speed, battery levels, and energy consumption, and is either stored on the device or transmitted wirelessly to the cloud in real time. Each trailer transmits millions of rows of real-time telematics data every day.

[Figure: Illustration of Range Trailer with TCU, CAN, and ECUs]

Range Energy’s Use Cases for Real-Time Data Processing

By leveraging stream processing to handle large volumes of raw real-time data, Range Energy pursues several key real-time use cases:

  • Telematics data processing: Their primary focus is on analyzing telematics data to monitor trailer performance, driving product development and improvement with the help of data science and machine learning.
  • Mapping and geospatial processing: This includes fleet tracking and optimizing routes through real-time geospatial data analysis.
  • Real-time data dashboards: Providing customers and partners with live data and analytics through dashboards for fleet tracking, logistics optimization, or predictive maintenance.

The Challenge: Finding a Stream Processor that Works at Startup Development Speeds

Requirements for Streaming Stack

As a startup with a small engineering team and fast iteration cycles, Range Energy needed a robust data stack that operates at startup speed, allowing them to quickly develop, test, and deploy real-time data pipelines. Large corporations have entire teams of data engineers at their disposal who can spend months and significant resources building and maintaining data pipelines. Range’s Head of IT, Daniel Meyer, was instead looking for a solution that would allow their existing engineering team to build and deploy real-time pipelines to production within weeks. A key requirement was a scalable stream processing solution that doesn’t rely on a Java Virtual Machine (JVM), a dependency that introduces complexity, slower iteration, and higher overhead.

[Figure: The first message from Daniel to the Bytewax team]

After several discussions and gaining a deeper understanding of their goals, it became clear that Range Energy had the following requirements for their stream processor:

  • Cost-effective to run, with minimal maintenance required in production.
  • Support for stateful processing, handling both batch and streaming data.
  • Broad connectivity across various data formats, with flexible integration for diverse data sources and sinks.
  • Scalable, robust streaming pipelines capable of handling growing workloads as the company expands.
  • Seamless deployment across different environments.
  • Fast time to value, empowering a small team to achieve more in less time.
  • No prior stream processing knowledge required for their Python-centric engineering team to build advanced data pipelines with powerful transformations.

Why Legacy Stream Processing Solutions Were Not an Option

The team at Range Energy realized that legacy stream processors like Flink or Spark, while very capable, presented several obstacles. Their complexity and lengthy setup times weren’t feasible for their fast-paced startup. The reliance on the JVM would add unnecessary complications for their Python-centric team, while the need for expertise in Java or Scala introduced potential hiring or retraining challenges. These limitations highlighted the need for a more agile, Python-native solution that is aligned with Range Energy’s requirements.

The Solution: How Bytewax is at the Heart of Range Energy’s Data Streams

From Open Source to Platform

Range Energy began testing Bytewax’s Open Source Stream Processing framework in February 2024. Recognizing it as a strong fit for their needs, the team continued development, and by mid-March they were running their first production workloads.

As Range Energy expanded its use of real-time processing and kept adding real-time data pipelines, they looked for a scalable way to manage and deploy multiple pipelines. The Bytewax Platform made this easier by providing a single pane of glass to manage, orchestrate, and scale dataflows, without having to increase the team size or hire for special skills.

Solution Architecture

A key piece of Range Energy’s solution is the real-time collection of data from their trailers. Each trailer is equipped with various sensors and Electronic Control Units (ECUs) that monitor functions like braking, battery, and speed, all connected via a CAN bus to a central Telematics Control Unit (TCU). The TCU wirelessly transmits this data to Sibros, a cloud-based Connected Vehicle Platform.

The trailers are not the only source of real-time data. Range Energy’s charging infrastructure also captures valuable real-time data, which is exposed through an API endpoint that Bytewax polls and ingests into a streaming pipeline.

Data from the Sibros Connected Vehicle Platform is stored in a Data Lake, while select real-time telematics data is serialized using an Avro producer and ingested into Redpanda Cloud as Kafka topics.

Within the streaming pipeline, raw data from different Kafka topics is enriched and processed using Bytewax:

  • Stream Processing with Bytewax: Bytewax serves as the core processing engine, handling real-time data transformations, computations, and analytics.
  • Bytewax Platform: Provides observability and monitoring, allowing users to track data flow, detect bottlenecks, and optimize processing performance.
  • Deployment on Amazon EKS: A scalable, managed Kubernetes environment used to run Bytewax applications, ensuring smooth and efficient operation.

[Figure: High-Level Streaming Solution Architecture]

Deep Dive: Matching Route Segments with Bytewax to Run Real-world Experiments

The Role of Route Segments

As Range Energy develops its trailers, the ability to quickly test and validate new improvements is crucial. Whether it’s a firmware update to the battery management system or a physical change to the trailer, the effectiveness of these updates - initially forecasted through simulations - must be confirmed with real-world data. The closer the correlation between simulated and actual results, the faster Range Energy can refine its designs to improve trailer efficiency and performance.

Measuring vehicle improvements with real-world data is challenging due to hard-to-control variables like traffic, road conditions, gradients, and weather. To achieve high predictive accuracy and keep environmental variables as constant as possible, Range Energy uses frequently driven route segments as real-world experiments that allow for meaningful comparisons. Segments are essentially polylines composed of GPS coordinates, stored in a global constant, against which real-world GPS data is matched.
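As a rough illustration of what such a constant might look like, the sketch below stores one segment template as an ordered list of (latitude, longitude) points and computes its length. The segment name and the coordinates are made up, and the layout of Range Energy’s actual constant is not described in this post.

```python
import math

# Hypothetical segment templates: name -> ordered (latitude, longitude) points.
# The name and coordinates are invented for illustration.
ROUTE_SEGMENTS = {
    "mountain_view_example": [
        (37.3861, -122.0839),
        (37.3894, -122.0819),
        (37.3925, -122.0790),
    ],
}

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def segment_length_m(points):
    """Sum of the distances between consecutive polyline vertices."""
    return sum(haversine_m(p, q) for p, q in zip(points, points[1:]))


length_m = segment_length_m(ROUTE_SEGMENTS["mountain_view_example"])
```

Knowing the template length is useful later, for example to normalize energy consumption per kilometer across segment completions.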

What sounds conceptually straightforward is a rather complex engineering problem, especially when one of the requirements is to do the processing in real time. The Bytewax engineering team collaborated closely with Range’s team to come up with a real-time solution for segment matching.

[Figure: Bytewax Dataflow (illustrative)]

Processing the Raw Data from Kafka

Real-time telematics event data comes in as a Kafka stream via Redpanda. The events are then keyed by device ID, and location data is separated from the other telematics data.
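The keying and splitting step can be sketched with plain Python functions. The event fields (`device_id`, `signal`, `value`) and the signal names are assumptions made for this example; the real message schema is not shown in this post. In a dataflow, functions like these would typically be handed to keying and filtering operators rather than applied to a list.

```python
# Assumed signal names for GPS events; the real schema may differ.
LOCATION_SIGNALS = {"gps_latitude", "gps_longitude"}


def key_by_device(event):
    """Return a (key, value) pair so downstream state is kept per trailer."""
    return event["device_id"], event


def is_location(keyed_event):
    """True for GPS signals, False for every other telematics signal."""
    _device_id, event = keyed_event
    return event["signal"] in LOCATION_SIGNALS


def split_streams(events):
    """Key events by device, then route them into location and other lists."""
    location, other = [], []
    for keyed in map(key_by_device, events):
        (location if is_location(keyed) else other).append(keyed)
    return location, other


# Usage with dummy events.
events = [
    {"device_id": "trailer-1", "signal": "gps_latitude", "value": 37.3861},
    {"device_id": "trailer-1", "signal": "battery_soc", "value": 81.5},
    {"device_id": "trailer-2", "signal": "gps_longitude", "value": -122.0839},
]
location_stream, other_stream = split_streams(events)
```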

The first challenge that Range Energy faced was that the latitude and longitude of their GPS data were sent as separate events in the same stream. The interval join operator, one of the premium operators shipping with our platform, was instrumental in combining latitude and longitude pairs into unified location observations for each trailer.
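To make the pairing problem concrete, here is a simplified stand-in written in plain Python. It is not the interval join operator and does not reproduce its semantics. It only shows the underlying idea of holding one half of a coordinate pair in per-trailer state until its partner arrives within a time tolerance. The event fields and the one-second tolerance are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_GAP_S = 1.0  # assumed tolerance between a latitude and its longitude


@dataclass
class PendingFix:
    """Unpaired coordinate halves for one trailer, each as (timestamp, value)."""
    lat: Optional[Tuple[float, float]] = None
    lon: Optional[Tuple[float, float]] = None


def pair_coordinates(state, event):
    """Fold one GPS event into the state; return (new_state, observation or None)."""
    ts, signal, value = event["ts"], event["signal"], event["value"]
    if signal == "gps_latitude":
        state.lat = (ts, value)
    elif signal == "gps_longitude":
        state.lon = (ts, value)
    if state.lat is not None and state.lon is not None:
        if abs(state.lat[0] - state.lon[0]) <= MAX_GAP_S:
            obs = {
                "ts": max(state.lat[0], state.lon[0]),
                "lat": state.lat[1],
                "lon": state.lon[1],
            }
            return PendingFix(), obs  # both halves consumed, start fresh
        # Halves are too far apart: drop the older one and keep waiting.
        if state.lat[0] < state.lon[0]:
            state.lat = None
        else:
            state.lon = None
    return state, None


# Usage: a clean pair, then a stale latitude that gets replaced.
stream = [
    {"ts": 0.0, "signal": "gps_latitude", "value": 37.3861},
    {"ts": 0.2, "signal": "gps_longitude", "value": -122.0839},
    {"ts": 5.0, "signal": "gps_latitude", "value": 37.3870},
    {"ts": 9.0, "signal": "gps_longitude", "value": -122.0830},
    {"ts": 9.1, "signal": "gps_latitude", "value": 37.3872},
]
state, observations = PendingFix(), []
for ev in stream:
    state, obs = pair_coordinates(state, ev)
    if obs is not None:
        observations.append(obs)
```

Because the function takes the current state and one event and returns the new state plus an optional output, it has the shape that stateful per-key operators generally expect.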

Segment Matching

Segment matching starts from the stream of location observations. The dataflow maintains candidate paths for each trailer and applies a stateful matching algorithm to check whether the trailer’s path aligns with predefined segments. This is where Bytewax’s ability to seamlessly integrate Python libraries comes into play: Shapely is used in combination with a Bytewax flat map operator to align GPS data with route segments by calculating distances between the trailer’s path and stored segments in real time. When a trailer’s path deviates beyond a defined threshold, the candidate is discarded; when it stays within close range of the template segment, it is matched to that route segment and the start and end times are recorded. Bytewax even allows the inspection of matched and discarded segments by generating GeoJSON files that can be visualized for later review.
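The matching logic described above can be approximated in a few functions. This sketch is not Range Energy’s algorithm: it replaces Shapely with a hand-written point-to-polyline distance on an equirectangular projection so that it runs with the standard library only, and the 30-meter threshold, the candidate structure, and the rule of opening a candidate at the segment start are all assumptions.

```python
import math

THRESHOLD_M = 30.0     # assumed maximum deviation from the template
M_PER_DEG = 111_195.0  # approximate meters per degree of latitude


def to_xy(point, ref_lat):
    """Project (lat, lon) to planar meters (equirectangular approximation)."""
    lat, lon = point
    return lon * M_PER_DEG * math.cos(math.radians(ref_lat)), lat * M_PER_DEG


def point_distance_m(p, q):
    """Approximate distance in meters between two nearby (lat, lon) points."""
    (px, py), (qx, qy) = to_xy(p, q[0]), to_xy(q, q[0])
    return math.hypot(px - qx, py - qy)


def distance_to_polyline_m(point, polyline):
    """Shortest distance in meters from a point to any edge of the polyline."""
    ref_lat = polyline[0][0]
    px, py = to_xy(point, ref_lat)
    best = math.inf
    for a, b in zip(polyline, polyline[1:]):
        ax, ay = to_xy(a, ref_lat)
        bx, by = to_xy(b, ref_lat)
        dx, dy = bx - ax, by - ay
        norm = dx * dx + dy * dy
        # Clamp the projection so the closest point stays on the edge.
        t = 0.0 if norm == 0 else ((px - ax) * dx + (py - ay) * dy) / norm
        t = max(0.0, min(1.0, t))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def match_step(candidate, obs, polyline):
    """Advance one trailer's candidate by one observation.

    candidate is None or {"start_ts": ...}; returns (new_candidate, matches).
    """
    point = (obs["lat"], obs["lon"])
    if candidate is None:
        # Open a candidate only when the trailer passes the segment start.
        near_start = point_distance_m(point, polyline[0]) <= THRESHOLD_M
        return ({"start_ts": obs["ts"]} if near_start else None), []
    if distance_to_polyline_m(point, polyline) > THRESHOLD_M:
        return None, []  # deviated beyond the threshold: discard
    if point_distance_m(point, polyline[-1]) <= THRESHOLD_M:
        return None, [{"start_ts": candidate["start_ts"], "end_ts": obs["ts"]}]
    return candidate, []


# Usage: a trailer driving along a straight, made-up template.
template = [(37.0, -122.0), (37.0, -121.99), (37.0, -121.98)]
path = [
    {"ts": 0, "lat": 37.0, "lon": -122.0},
    {"ts": 10, "lat": 37.0001, "lon": -121.995},
    {"ts": 20, "lat": 37.0, "lon": -121.98},
]
candidate, matches = None, []
for obs in path:
    candidate, found = match_step(candidate, obs, template)
    matches.extend(found)
```

`match_step` returns a list of matches, usually empty, so that it fits a flat-map style of operator where one input can produce zero or more outputs.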

[Figure: Segment Template in Mountain View on OpenStreetMap]

Storing and Analyzing Segment Metrics

By collecting and storing all telematics data within each matched segment’s timeframe, every segment completion becomes a real-world experiment that can be compared to others. This approach enables Range Energy to analyze performance across configurations, calculate meaningful metrics, and identify trends over time. Use cases range from validating the impact of firmware updates and mechanical changes to assessing other enhancements in real-world conditions, enabling Range Energy to make high-conviction, data-driven development decisions and iterate quickly on their trailers.
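A minimal sketch of this step, assuming each telematics reading carries a timestamp and an energy increment, could look as follows. The field names, the chosen metrics, and the dummy values are illustrative and not taken from Range Energy’s pipeline.

```python
def readings_in_window(readings, start_ts, end_ts):
    """Select telematics readings recorded during one segment completion."""
    return [r for r in readings if start_ts <= r["ts"] <= end_ts]


def segment_metrics(readings, match, length_km):
    """Summarize one segment completion; the field names are assumptions."""
    window = readings_in_window(readings, match["start_ts"], match["end_ts"])
    energy_kwh = sum(r["energy_kwh"] for r in window)
    duration_s = match["end_ts"] - match["start_ts"]
    return {
        "duration_s": duration_s,
        "energy_kwh": energy_kwh,
        "kwh_per_km": energy_kwh / length_km,
        "avg_speed_kmh": length_km / (duration_s / 3600),
    }


# Usage with dummy readings and one matched completion.
readings = [
    {"ts": 0, "energy_kwh": 0.1},
    {"ts": 10, "energy_kwh": 0.1},
    {"ts": 20, "energy_kwh": 0.1},
    {"ts": 30, "energy_kwh": 0.1},
]
metrics = segment_metrics(readings, {"start_ts": 5, "end_ts": 25}, length_km=0.5)
```

Because every completion of the same segment covers the same stretch of road, per-kilometer figures like these can be compared across firmware versions or hardware configurations.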

[Figure: Illustrative Segment Telematics Data (dummy data)]

Impact: Bytewax Empowers Range to Rapidly Build, Test, and Deploy Powerful Real-Time Dataflows

“Bytewax allows us to quickly and seamlessly build, test, and deploy complex dataflows - all with our existing engineering team without dedicated data engineers or data scientists. Its Python-native design and real-time processing capabilities fit our needs and fast pace of iterations perfectly and the Bytewax platform makes it easy to scale as the number of our real-time dataflows grow.”

Daniel Meyer, Head of IT, Range Energy

Streaming as an Enabler, Not a Bottleneck

Paradoxically, for many companies the most noticeable effect of adopting real-time stream processing is the complexity it adds to development workflows and the agility it costs the team, which often makes real-time data handling slow and costly. Stream processing comes to be seen as a bottleneck. For Range, Bytewax’s greatest strength is that it provides high-performance stream processing without slowing down development, because it integrates real-time processing into existing workflows. This allows Range Energy to efficiently build, test, and deploy complex dataflows without the need for dedicated data science or data engineering teams.

The Power of Python

Bytewax’s Python-native design integrates seamlessly into Range Energy’s stack. It removes the need for Java expertise and lets the team use specialized Python libraries for complex transformations, for instance Shapely for route segment matching. Bytewax also provides the flexibility to quickly incorporate additional real-time AI/ML-centric workflows, such as predictive maintenance and route optimization, without adding operational complexity.

Rapid Deployment and Cost-Effective Scaling

As Range Energy expands its real-time dataflows, the Bytewax platform has become central to managing and scaling operations. A core strength of Bytewax is its local development capability: dataflows can be created, tested, and refined locally before they are deployed to production. This helps the Range team iterate rapidly, catch issues early, and optimize performance. As a result, Range brought real-time pipelines to production within a few weeks, and its small team can operate at scale without extensive overhead.

Robust Orchestration, Governance, and Recovery Options

The Bytewax platform serves as Range Energy’s orchestration and governance layer, simplifying the deployment and monitoring of real-time dataflows across Kubernetes environments. Through the comprehensive management dashboard, Range’s team gains full visibility into each dataflow, enabling quick troubleshooting and optimization. The platform provides an intuitive status view and visual insights, such as directed acyclic graph (DAG) representations and real-time pod logs, for efficient dataflow management. Customizable recovery options—like snapshot intervals and partitioning—enhance resilience, while flexible deployment settings allow the team to adjust configurations as their operations scale.

Ready for Future Growth

With Bytewax as the backbone of their real-time data operations, Range Energy has a stream processor that allows them to efficiently develop and scale complex dataflows. The platform’s intuitive management tools and Python-native design have streamlined their workflow, allowing them to focus on their core mission of electrifying the trucking industry. Bytewax not only accelerates their time to production but also ensures they can handle future growth with confidence.

“Range Energy is in many ways an ideal customer for us at Bytewax; they have started as open-source enthusiasts and became power users with the Bytewax platform. Range is a good example of the unique possibilities when stream processing meets the Python ecosystem. The ability of their small, talented engineering team to solve challenging real-time problems quickly exemplifies the power of stream processing with Bytewax. Most of all, their team has been an absolute pleasure to collaborate with.”

Zander Matheson, CEO, Founder at Bytewax

Are you facing a complex real-time data problem that you’d like to solve? We’re here to help in our 🐝 Slack Community, or feel free to reach out to us directly.

Jonas Best

Chief of Staff
Jonas brings extensive experience from Accenture and Monitor Deloitte, where he managed projects at the intersection of technology and business. Before joining Bytewax, he attended business school at the University of St. Gallen and HEC Paris. He is crucial in coordinating Bytewax's strategic efforts and ensuring seamless operations.