Join us on March 19th for a collaborative 🌎 virtual workshop with StarTree, Streamlit, and support from AICamp, where we'll guide you through creating a real-time pizza 🍕 analytics dashboard in just two hours.
This 🆓 event not only offers learning opportunities but also the chance to win some amazing swag 🎁 for active participants.
❗️ Please keep in mind, we'll be handling all communications and answering your questions through our Slack channel - #workshop-room, so it's the perfect time to join if you haven't already.
Let's take a closer look at what the workshop has in store for you.
Takeaway
You will be cooking up a real-time analytics dashboard for the operators of All About That Dough (AATD), an online pizza delivery service that specializes in pizzas with Indian toppings. They will use the dashboard to get a live view of the number of orders and the revenue of their business, and to keep an eye on the most popular products.
Workshop prerequisites
- The workshop can be completed on Windows, macOS, or Linux. The host will be using Python 3.11 on macOS.
- Join our Slack workspace, where we will have a dedicated channel for the workshop: #workshop-room. Please make sure you have joined before the event. The channel will remain available after the workshop, too.
- To run the app, you'll need Docker.
- All code is here!
Learning Objectives
By participating in this workshop, you'll learn how to:
- Build a streaming pipeline to join data from multiple sources using 🐝 Bytewax.
- Analyze and aggregate the data to return live metrics using 🍷 Apache Pinot.
- Build a real-time dashboard to monitor the metrics with 🎈 Streamlit.
Architecture diagram
Below is a diagram showing how all the components work together ->
We will be focusing on the parts within the dashed line rectangle.
The agenda of the workshop
1. Introduction (10 min)
We start by briefly introducing the instructors and the technologies we will use.
Bytewax ➡️ is an open-source Python framework that simplifies building apps for streaming data. It's developed for real-time processing and supports aggregation, windowing and splitting/joining streams, making it easier to handle big projects.
Pinot ➡️ is a real-time distributed OLAP datastore, designed to answer OLAP queries with low latency. With user-facing applications querying Pinot directly, it can serve hundreds of thousands of concurrent queries per second.
Streamlit ➡️ is a Python library that transforms data scripts into interactive web apps quickly and with minimal coding. It's ideal for creating data visualizations and dashboards, streamlining the development process for data scientists and developers alike.
1.1 Orders
Creating the orders and products is out of the scope of this workshop. Here is what we need to know:
- The orders service generates and publishes orders to a Kafka topic.
- Products are published to a separate Apache Kafka topic.
- The stream of orders is unbounded; each order is made up of the given products and is placed by one of the users.
2. How we read and enrich data with Bytewax (30 min)
Order items are initially contained inside orders, and we use Bytewax to extract them and join them with product details before publishing them to the enriched-order-items Kafka topic. Expect to learn about:
- What is stream processing? Why is stream processing important in real-time analytics?
- Intro to the best Python stream processor aka Bytewax.
- Setting up the Kafka and Bytewax environment.
- Setting up Bytewax to consume from and produce to Kafka.
- Writing a dataflow to transform and join data together.
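To make the extract-and-join step more concrete, here is a minimal sketch in plain Python. It deliberately does not use the Bytewax API or Kafka; it only illustrates the shape of the transformation the dataflow performs. All field names (`id`, `createdAt`, `items`, `productId`, `quantity`) and the sample products are assumptions made for illustration and may differ from the workshop's actual data.

```python
from typing import Iterable, Iterator

# Hypothetical product catalog keyed by product id (sample data, not from the workshop).
PRODUCTS = {
    "p1": {"name": "Paneer Tikka Pizza", "category": "veg pizzas", "price": 9.5},
    "p2": {"name": "Butter Chicken Pizza", "category": "non veg pizzas", "price": 11.0},
}


def extract_items(order: dict) -> Iterator[dict]:
    """Flatten one order into one record per order item."""
    for item in order["items"]:
        yield {
            "orderId": order["id"],
            "createdAt": order["createdAt"],
            "productId": item["productId"],
            "quantity": item["quantity"],
        }


def enrich(item: dict, products: dict) -> dict:
    """Join an order item with its product details via a lookup on productId."""
    # Unknown products get an empty dict instead of raising an error.
    return {**item, "product": products.get(item["productId"], {})}


def enriched_order_items(orders: Iterable[dict], products: dict) -> Iterator[dict]:
    """Yield one enriched record per order item, in arrival order."""
    for order in orders:
        for item in extract_items(order):
            yield enrich(item, products)


# Sample order with two items (hypothetical values).
orders = [
    {
        "id": "o1",
        "createdAt": "2024-03-19T17:00:05",
        "items": [
            {"productId": "p1", "quantity": 2},
            {"productId": "p2", "quantity": 1},
        ],
    }
]
result = list(enriched_order_items(orders, PRODUCTS))
```

In the workshop, the equivalent steps run continuously inside a Bytewax dataflow that reads from Kafka and writes each enriched record to the enriched-order-items topic.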
3. How we analyze the data with Apache Pinot (30 min)
We store the data from the orders and enriched-order-items topics in Apache Pinot. Each topic has its own table and associated schema. We count the number of orders per minute and the revenue per minute, as well as find the most popular items and categories.
Compelling metrics to calculate:
1. Orders
- Count of orders per minute
- Revenue per minute
2. Enriched-Order-Items
- Most popular items
- Most popular categories
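The metrics above are simple aggregations. As a rough sketch of what they compute, the plain-Python snippet below buckets orders by minute and ranks items by quantity sold. In the workshop these aggregations are expressed as queries against Apache Pinot, not in application code; the field names (`createdAt`, `price`, `name`, `category`, `quantity`) and the sample rows are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from datetime import datetime


def per_minute(orders):
    """Return {minute: (order_count, revenue)}, bucketed by the minute of createdAt."""
    counts = defaultdict(int)
    revenue = defaultdict(float)
    for order in orders:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(order["createdAt"]).replace(second=0, microsecond=0)
        counts[minute] += 1
        revenue[minute] += order["price"]
    return {minute: (counts[minute], revenue[minute]) for minute in sorted(counts)}


def most_popular(items, key, n=3):
    """Return the top-n values of `key` ranked by total quantity ordered."""
    totals = Counter()
    for item in items:
        totals[item[key]] += item["quantity"]
    return totals.most_common(n)


# Hypothetical sample data.
orders = [
    {"createdAt": "2024-03-19T17:00:05", "price": 20.0},
    {"createdAt": "2024-03-19T17:00:40", "price": 11.0},
    {"createdAt": "2024-03-19T17:01:10", "price": 9.5},
]
items = [
    {"name": "Paneer Tikka Pizza", "category": "veg pizzas", "quantity": 2},
    {"name": "Butter Chicken Pizza", "category": "non veg pizzas", "quantity": 1},
    {"name": "Paneer Tikka Pizza", "category": "veg pizzas", "quantity": 1},
]

metrics = per_minute(orders)
top_items = most_popular(items, "name", n=1)
top_categories = most_popular(items, "category", n=1)
```

Passing `"name"` or `"category"` as the key covers both popularity metrics with the same function.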
A brief outline of this section:
- Why do you need user-facing analytics?
- How Pinot is different from other OLAP databases
- Apache Pinot® concepts and architecture
- Ingest data from Kafka
- Query data in Query console
- Running Pinot queries from Python applications
4. How we visualize data with Streamlit (30 min)
The code for the dashboard is written using Streamlit, a Python-based framework for building interactive web applications. We query Pinot using its Python client and render the results using pandas DataFrames and Plotly charts. Dive into the Streamlit world:
What is Streamlit? 🎈
- The basics
- Dynamic data apps in just a few lines of code
- Share data insights across teams and with the world
- Seamlessly composable – compatible with your fave Python library or GenAI stack
- How can you get involved with Streamlit?
Walk through the Streamlit app code for this demo
Talk through the main features that this app is using
Streamlit’s execution model and how to modify it
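As a small illustration of the kind of logic a dashboard script runs before rendering, the sketch below takes per-minute rows (as they might come back from a query) and derives the latest value plus its change against the previous minute. The function name `latest_with_delta` and the sample rows are hypothetical and not taken from the workshop code.

```python
def latest_with_delta(rows):
    """Given (minute, value) rows sorted oldest first, return (latest value, change vs. previous row)."""
    if not rows:
        return 0, 0
    latest = rows[-1][1]
    # With only one row there is nothing to compare against, so the delta is 0.
    previous = rows[-2][1] if len(rows) > 1 else latest
    return latest, latest - previous


# Hypothetical per-minute order counts.
rows = [("17:00", 12), ("17:01", 15)]
value, delta = latest_with_delta(rows)

# In a Streamlit script, these two numbers could be displayed with
# st.metric("Orders (last minute)", value, delta).
```

Because Streamlit reruns the script from top to bottom, a helper like this would be re-evaluated with fresh rows on every rerun.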
Bonus! What's next?
- Editing order info and writing it back with st.data_editor
5. Demo (10 min)
We demo the app and recap the main points step by step.
6. Assessment and Q&A
If time allows, we will expand on how to use the techniques from this workshop to visualize your data and build your own pipelines that can process your streaming data in real-time in a performant way.
Instructors info
Zander Matheson, Founder & CEO at Bytewax. Zander is a seasoned data engineer who founded and currently leads Bytewax. He has worked in the data space since 2014, at Heroku, GitHub, and an NLP startup. Before that, he attended business school at UT Austin and HEC Paris.
Viktor Gamov is the Head of Developer Advocacy at StarTree, a pioneering company in real-time analytics with Apache Pinot. Viktor is known for his insightful presentations at top industry events like JavaOne, Devoxx, Kafka Summit, and QCon. His expertise spans distributed systems, real-time data streaming, JVM, and DevOps.
Caroline Frasca works on developer relations and partnerships for Streamlit's open-source Python library. Previously, she led customer success for Streamlit pre-acquisition and worked as a Solution Architect at Klaviyo. Caroline is also a UNC Chapel Hill Tar Heel, has two rambunctious cats, and loves to crochet.
Your input is valuable to us! ⭐️
Have suggestions for making this document better? We'd love to chat with you in the #questions-answered channel on Slack!
The full video can be found here ➡️