Let’s talk terms: What is an interactive livestreaming platform?
Before we dive into the 9 elements you should consider when choosing an interactive livestreaming platform, let’s first define the key terms.
When we talk about an interactive livestream, we are referring to “real-time broadcasts in which the structured actions of the participants affect the content of the live stream.”
Essentially, it’s a traditional livestream video broadcast that allows the audience to interact, creating a two-way communication loop between you and the end viewers (instead of a traditional one-way broadcast, where the viewer on the other end simply consumes the content). The streamer can see and share those interactions and results in real time, ultimately giving the audience the power to influence the content.
The interaction engine is the tool responsible for powering the interactive layer on top of the standard video stream. It’s the key to the magic.
An interactive livestreaming platform is the digital place you go to set up and host the interactive livestream.
An interactive livestreaming platform combines the video streaming tools you’d see in a traditional livestreaming platform (like YouTube, for example), with interactive tools (i.e., the interaction engine) that you can customize to play simultaneously on top of the video.
An on-screen interactive layer opens the door for customers to engage in new ways: turning static viewers into active and influential participants. This structured engagement has been shown to increase retention, reach, and the average revenue per viewer.
Video Livestream + Interaction Engine = Interactive Livestreaming
Interactive livestreaming is best hosted on an interactive livestreaming platform because it offers you and your end viewers a seamless and easy-to-use experience.
Interactive livestreaming platforms vs. DIY Solutions
As we touched on, Livery customers who combined live video with interactive elements saw a user engagement boost of 80–95% over traditional live broadcasting. While these results are certainly convincing, there is a lack of end-to-end (i.e., easy-to-use) interactive streaming solutions. The complexity of DIY solutions limits the adoption rate for organizations big and small that would otherwise give it a try.
At Livery, we have seen firsthand how organizations are struggling with workarounds: they either use existing communication tools like Zoom, Google Meet, and Teams (platforms designed for small-scale collaboration), or they combine live streaming solutions like Vimeo and YouTube with third-party collaboration tools. The problem is, none of these cobbled-together systems were designed with interactive live video in mind. Even if they function, these DIY systems typically require significant sacrifices in video quality or lag time.
Taobao: Paving the way for interactive livestreaming
In this video you can see how popular online video-streaming tool Taobao Live is being used by merchants to reach and engage with their customers.
Note: This video from Taobao is a great example of an interactive live stream and will be used as a reference throughout this article.
The above video shows how Taobao Live broadcasts use an interactive layer on top of an ultra-low latency (i.e., negligible video lag time) livestream during e-commerce events. The interactive layer allows viewers to actively participate via chat, send live emoji reactions, and shop the on-screen product offers.
While this concept was pioneered in Asia, European e-commerce platforms have taken notice. A growing number of Livery customers are using our interactive streaming solution to achieve similar results, eager to capitalize on the increasingly popular shoppable video trend.
Intime Mall’s staff used Taobao Live to sell products remotely during the coronavirus outbreak (https://www.alizila.com/).
In the Taobao example, the hosts use multiple interaction types–asking questions via chat, or initiating structured interactions (like polls) to collect real-time feedback from viewers, allowing the host to quickly adjust accordingly. Where traditional livestreams would leave the host stuck with relatively dead air, continuing with their content blind to the audience’s feedback, Taobao has instead created a full feedback loop.
Since they already have their customers loosened up–clicking and interacting with the video–shopping becomes a more natural action for their audience to take. As the stream continues, Taobao embeds product offers at the top of the screen that appear the moment they are mentioned by the host. These offers use direct deeplinks (which allow viewers to buy the items without leaving the stream). Thanks to their setup, Taobao was able to hit $7.5 billion in sales during the first 30 minutes of last year’s Singles’ Day presale event.
If you want to create a successful full feedback loop like this one, you will need to understand the method behind the magic. Let’s dive into the 9 technical building blocks and challenges you can expect to face when designing an interactive live stream experience.
Let’s talk timing: A full interactive livestreaming feedback loop
To set the scene for breaking down the elements of an interactive livestream, let’s go through an example together. Imagine you’re hosting an interactive livestream for a new product launch. You want to ask your audience what their favorite product color is, presenting them with a few options to choose from. Here’s how the interaction could look:
(Note: The format for the below timestamps is MINUTES:SECONDS:MILLISECONDS)
00:00:000 Your livestream host (standing in front of a camera or mobile phone) is showcasing a given product. The host asks the viewers (tuning in on either mobile or desktop) which color they prefer.
00:03:000 The video is broadcast with a delay of 1–3 seconds (delivered via a scalable video CDN) to 100,000 viewers. As the question is presented to viewers on their end, a structured poll interaction simultaneously pops up on screen.
00:03:010 The poll data generated by the viewers is delivered in (almost) real time back to the interaction server. The server visually presents the data to your host.
00:23:010 The response window for the poll stays open for 20 seconds, allowing your host ample time to discuss the incoming data or answer questions while the poll is still open.
00:23:020 The response window is closed and all user data is processed, allowing your host to reveal the results without missing a beat.
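To make the timing concrete, here is a minimal Python sketch that rebuilds the timeline above from its four ingredients. The function and parameter names are illustrative assumptions rather than part of any platform API; the numbers passed in at the bottom are the ones used in the example (a 3-second video delay, a 10 ms return path, a 20-second response window, and 10 ms of processing).

```python
def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as MINUTES:SECONDS:MILLISECONDS."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes:02d}:{seconds:02d}:{millis:03d}"


def poll_timeline(video_latency_ms: int, return_path_ms: int,
                  window_ms: int, processing_ms: int) -> list[tuple[str, str]]:
    """Return (timestamp, event) pairs for one poll, starting when the host asks."""
    asked = 0
    shown = asked + video_latency_ms       # viewers see the question and the poll
    first_votes = shown + return_path_ms   # first answers reach the host
    closed = first_votes + window_ms       # response window ends
    results = closed + processing_ms       # all answers processed
    events = [
        (asked, "host asks the question"),
        (shown, "question and poll appear for viewers"),
        (first_votes, "first poll data reaches the host"),
        (closed, "response window ends"),
        (results, "results are ready to reveal"),
    ]
    return [(format_timestamp(ms), label) for ms, label in events]


if __name__ == "__main__":
    # Values taken from the example timeline in this article.
    for stamp, label in poll_timeline(3_000, 10, 20_000, 10):
        print(stamp, label)
```

Changing `video_latency_ms` shows how quickly a slower video path eats into the time your host has to react.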
This seemingly simple example of an interactive livestream has a few specific parameters that you’ll need to address to make it work. Based on this flow, let’s lay out those requirements. At the highest level, they can be split into two sections, video and interaction requirements.
Your video solution would need:
- Glass-to-glass latency (the video delay from your streaming studio to your customer’s desktop or mobile device) under 5 seconds.
- To support 100,000+ viewers.
- The option to control the video latency.
- Support for iOS, Android, and web.
- A cloud encoder able to receive an SRT or RTMP video stream.
Your interaction engine would need:
- To support 100,000+ requests per second (the majority of your participants will be answering the on-screen question simultaneously).
- An option to sync the latency of the interaction with the video latency (so the video and poll are timed correctly).
- Support for 2-way communication (so the host can see the results and respond).
Main Takeaway: The expected scale and total number of concurrent participants are key limiting factors when determining the possible interactive livestreaming platform options.
9 things you need to review when choosing an interactive livestreaming platform
1. Structured vs. unstructured interactions
When dealing with small groups of 2 to 20 participants, unstructured real-time interactions like chat, direct voice communication, and online whiteboards function well. These groups are small enough for a facilitator to structure the incoming information and allow direct and collaborative communication amongst the group.
Above 20 participants, unstructured real-time interactions get messy and overwhelming. If you plan to hold an interactive livestream for a bigger group, structured interactions become crucial. A structured interaction is just like it sounds–an interaction with structured responses. Examples include polls with predefined options, quiz questions, and live emoji reactions.
The main difference between structured and unstructured interactions is the predefined answers and the length of the response window (for example, a poll interaction where the participants have 15 seconds to select one of four possible answers).
The communication phase during structured interactions is reduced, which makes them more tolerant of delay. If you had a two-second delay between everything said in a Zoom call, for example, your user experience would be greatly diminished, because you expect that communication to happen in real time. If it takes two seconds to receive poll results, on the other hand, your experience isn’t affected. In a structured interaction, a higher video lag (or glass-to-glass latency) is acceptable because you aren’t dependent on any single response. In general, a glass-to-glass latency of 0 to 5 seconds would not affect the fluidity of the broadcast.
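As a rough illustration, a structured interaction can be modeled as a small data structure with a fixed set of answers and a fixed response window. The sketch below is hypothetical: the `Poll` class, its field names, and the one-vote-per-viewer rule are assumptions made for the example, not a description of any specific interaction engine.

```python
from dataclasses import dataclass, field


@dataclass
class Poll:
    question: str
    options: tuple[str, ...]   # the predefined answers
    window_seconds: float      # length of the response window
    opened_at: float           # time the poll opened, in seconds
    votes: dict[str, str] = field(default_factory=dict)  # viewer_id -> option

    def submit(self, viewer_id: str, option: str, now: float) -> bool:
        """Accept a vote only if it is a predefined option inside the window."""
        if option not in self.options:
            return False
        if not (self.opened_at <= now <= self.opened_at + self.window_seconds):
            return False
        if viewer_id in self.votes:  # assumption: one vote per viewer
            return False
        self.votes[viewer_id] = option
        return True

    def tally(self) -> dict[str, int]:
        """Count votes per option, including options nobody picked."""
        counts = {option: 0 for option in self.options}
        for option in self.votes.values():
            counts[option] += 1
        return counts


if __name__ == "__main__":
    poll = Poll("Which color do you prefer?", ("red", "blue", "green", "black"),
                window_seconds=15.0, opened_at=100.0)
    poll.submit("viewer-1", "blue", now=101.0)
    poll.submit("viewer-2", "purple", now=102.0)  # rejected: not a predefined option
    poll.submit("viewer-3", "red", now=116.0)     # rejected: window has closed
    print(poll.tally())
```

Because every answer falls into a known bucket, the results stay readable whether 20 or 100,000 viewers respond.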
2. Metadata vs. server interactions
The most cost-effective way to make a stream interactive is to embed the interaction data in the metadata (ID3 tags). This removes the need for an interaction server by integrating the interaction data right into the video feed. While the major advantage of this approach is cost reduction, it comes with an important limitation: metadata can only facilitate one-way communication. When using metadata, you cannot create a full feedback loop.
Note: Amazon’s IVS uses the metadata approach; you can learn more about it in this post.
When interactions, whether structured or unstructured, require a two-way channel, you need a server to process the data. The specifics (such as server capacity requirements and related costs) will depend on the type of interaction. For example, trivia is more complicated than a poll because of the associated scoring and leaderboard functions. This added complexity and two-way communication mean that servers are needed to make real-time score calculations. The more users you have, the more servers you’ll need. This increases costs, but decreases the risk of failure. But these aren’t the only two options. We’ll expand on the servers and their functionality a bit more in the next section.
3. Serverless vs. dedicated environments
Historically, setting up and maintaining dedicated servers was cost-prohibitive. Thanks to serverless setups, the cost and time-to-market barriers have been drastically reduced.
If you’re looking to build your own interaction engine, several major players like AWS, Google, and Microsoft have a serverless solution available. These setups are a great way to get started, but they start to break down once you reach 10,000+ concurrent viewers.
Let’s look at an example of a typical traffic pattern during a Livery interactive live stream:
A common traffic pattern during a Livery interactive live stream
The challenge with serverless setups relates to the infrastructure’s cold start: the system starts from idle and cannot scale as quickly as is needed for a smooth streaming experience.
Let’s explain with an example. When an interactive event (like a poll question) is triggered on top of the live stream, all participants will see the same CTA at the same time. In our experience, 60–85% of participants will interact within the same 1-second window. This results in a high volume of requests per second. A cold start system can’t smoothly handle the sudden volume spike and may start to break down under the pressure. This is why understanding your livestreaming platform’s server infrastructure is critical to your long-term interactive livestreaming success.
This challenge can be addressed by pre-warming the server infrastructure so it can handle the large influx of data. This warm-up requires additional components like an up/down scaling system, load balancer, gatekeepers, and more. Finding a way to better handle these influxes is one of the reasons why we moved away from a serverless setup and built a fully optimized custom platform: Livery gives us full control.
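A back-of-the-envelope sketch shows why pre-warming matters. It uses the 60–85% burst figure quoted above; the per-instance capacity of 2,000 requests per second and the 25% headroom are purely hypothetical placeholders, and the function names are assumptions for this example. Real values would come from your own load tests.

```python
import math


def peak_requests_per_second(viewers: int, burst_percent: int) -> int:
    """Requests arriving in the busiest second after a call-to-action appears."""
    return math.ceil(viewers * burst_percent / 100)


def instances_to_prewarm(peak_rps: int, rps_per_instance: int,
                         headroom_percent: int = 25) -> int:
    """Instances that must already be running when the interaction is triggered."""
    required = peak_rps * (100 + headroom_percent) / (100 * rps_per_instance)
    return math.ceil(required)


if __name__ == "__main__":
    viewers = 100_000
    low = peak_requests_per_second(viewers, 60)
    high = peak_requests_per_second(viewers, 85)
    print(f"Expected burst: {low}-{high} requests in one second")
    # rps_per_instance=2_000 is a made-up capacity, not a measured value.
    print("Instances to pre-warm:", instances_to_prewarm(high, rps_per_instance=2_000))
```

For 100,000 viewers, the burst lands at 60,000 to 85,000 requests in a single second, which is far more than a system scaling up from idle can absorb on demand.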
4. Multi vs. single tenant
A multi-tenant setup is generally more cost-effective than a single-tenant setup. In a multi-tenant setup, a server is used by multiple customers, or “tenants.” Naturally, not all customers will broadcast at the same time. This allows them to share the available interactive livestreaming platform resources.
If a single broadcast takes up a significant amount of the available requests within the cluster, it is best to isolate the customer (i.e., build their own setup). This prevents a single tenant from influencing another tenant’s broadcast reliability when using the same infrastructure.
When choosing an interactive livestreaming platform, it’s important to understand the back-end tenant setup. The Livery team can set up an optimized configuration regardless of your broadcast size. If a Livery customer expects more than 100,000 concurrent participants, we evaluate whether a multi- or a single-tenant cluster is best.
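The isolation rule can be sketched as a simple capacity check. This is an illustrative assumption about how such a decision might be expressed, not Livery’s actual evaluation: the `plan_tenancy` function, the 50% threshold, and the tenant names and numbers are all hypothetical.

```python
def plan_tenancy(expected_peak_rps: dict[str, int],
                 cluster_capacity_rps: int,
                 max_share: float = 0.5) -> dict[str, str]:
    """Label each tenant 'shared' or 'dedicated' by its share of cluster capacity."""
    limit = cluster_capacity_rps * max_share
    return {
        tenant: "dedicated" if peak > limit else "shared"
        for tenant, peak in expected_peak_rps.items()
    }


if __name__ == "__main__":
    # Hypothetical tenants and peak loads, for illustration only.
    tenants = {"small_webinar": 500, "product_launch": 85_000}
    print(plan_tenancy(tenants, cluster_capacity_rps=100_000))
```

A tenant whose peak would consume most of the shared capacity is moved to its own cluster, so its spike cannot degrade anyone else’s broadcast.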
5. Load testing
Every part of Livery’s interaction engine (as well as every new release) is load tested for different scenarios. This way, we can be sure we meet customer expectations for our interactive livestreaming platform and its reliability.
When not properly load tested, an interactive livestreaming platform may create problems that hinder your success. When creating or choosing an interactive livestreaming platform, we strongly recommend you check for or perform load testing early in the development cycle.
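If you want a feel for what a burst test measures, the following is a minimal, self-contained sketch using Python’s asyncio. It does not reproduce any vendor’s load test setup: the handler is a stand-in that simulates roughly 1 ms of work with limited concurrency, and all names and numbers are assumptions. A real test would send requests to your actual interaction endpoint.

```python
import asyncio
import time


async def fake_vote_handler(capacity: asyncio.Semaphore) -> None:
    """Stand-in for the interaction server: limited concurrency, ~1 ms of work."""
    async with capacity:
        await asyncio.sleep(0.001)


async def timed_vote(capacity: asyncio.Semaphore) -> float:
    """Send one simulated vote and return how long it took, in seconds."""
    start = time.perf_counter()
    await fake_vote_handler(capacity)
    return time.perf_counter() - start


async def run_burst(viewers: int, server_concurrency: int) -> dict[str, float]:
    """Fire all votes at once, as happens when a poll appears for every viewer."""
    capacity = asyncio.Semaphore(server_concurrency)
    latencies = sorted(
        await asyncio.gather(*(timed_vote(capacity) for _ in range(viewers)))
    )
    return {
        "requests": len(latencies),
        "p50_seconds": latencies[len(latencies) // 2],
        "p95_seconds": latencies[int(len(latencies) * 0.95)],
        "max_seconds": latencies[-1],
    }


if __name__ == "__main__":
    print(asyncio.run(run_burst(viewers=2_000, server_concurrency=100)))
```

Lowering `server_concurrency` while keeping the burst size fixed shows how queueing pushes up the tail latencies, which is the failure mode to look for before going live.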
Livery uses its own load test setup, which is provided by Ex Machina and trusted by major broadcasters worldwide.
6. Live video streaming parameters
Beneath the interactive layer lies the heartbeat of the broadcast–a live video stream. The next consideration for choosing your interactive livestreaming platform is the live video streaming parameters. As we saw in the Taobao example, the host stands in front of a camera and uses the input from his participants to enrich the video and make it more personal. This process only works because the interactive elements are in sync with the video and the delay from the in-studio broadcast to the user screen on the receiving end is minimal.
Ex Machina has years of experience building interactive experiences for both TV and live streamed events. Based on their research, they pinpointed the ideal latency for structured interactions to be between 0 and 5 seconds. A latency higher than 5 seconds starts to affect the feedback loop. But, even more important than achieving the lowest possible latency is synchronicity: the call-to-action and related participation windows cannot be too far apart for different users. An offset longer than 1 second will affect the end-user experience and overall fairness/game integrity.
If the latency is close to zero, there is no need to sync users, but even with the fastest video protocols, drifting can occur, which hurts the interactive video experience. When the stream is synced using an NTP time source, all participants can watch the same moment of the stream at the same time. With the proper optimization, it is possible to reach an almost frame-accurate synchronization across browsers and device types. If the interaction data is embedded in the metadata of the video, on the other hand, it is possible to achieve a frame-accurate sync with the video, but it cannot guarantee that all participants see the same moment and related interactions at the same time. This asynchronicity creates spoilers and breaks the feedback loop.
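The idea behind time-based syncing can be sketched in a few lines. The example assumes each client can exchange timestamps with a shared time server and that every interaction carries a "show at" time on that reference clock. The offset formula is the standard NTP one; the function names and the message layout are illustrative assumptions.

```python
def clock_offset_ms(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate (reference clock - local clock) from one NTP-style exchange.

    t0: request sent (local clock)      t1: request received (reference clock)
    t2: reply sent (reference clock)    t3: reply received (local clock)
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def local_show_time_ms(show_at_reference_ms: float, offset_ms: float) -> float:
    """Convert a 'show at' time on the reference clock to this device's clock."""
    return show_at_reference_ms - offset_ms


def max_skew_ms(shown_at_reference_ms: list[float]) -> float:
    """Largest gap between the moments at which viewers saw the same event."""
    return max(shown_at_reference_ms) - min(shown_at_reference_ms)


if __name__ == "__main__":
    # A device whose clock runs 250 ms behind the reference, 40 ms each way.
    offset = clock_offset_ms(t0=1_000, t1=1_290, t2=1_291, t3=1_081)
    print("Estimated offset (ms):", offset)
    print("Show poll at local time (ms):", local_show_time_ms(10_000, offset))
```

Each device schedules the poll against its own corrected clock, so viewers on different networks see it at the same moment, and `max_skew_ms` can be compared against the 1-second limit mentioned above.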
Based on the latency requirement of 0 to 5 seconds, protocols like HLS or DASH — including the tuned (small segments) versions — do not fit business requirements.
Live streaming protocols and their latency
The fastest kids on the block are WebRTC-based (UDP) protocols, followed by WebSocket-based (TCP) technologies which can achieve a sub-second latency. In the 1 to 3 second latency range, you can find HTTP-based technologies like ULL-CMAF and LL-HLS. There are more niche protocols available, but global adoption (and therefore, verified data) is still limited.
WebRTC-based and WebSocket-based technologies are mainly used as a base for tools focused on small audiences (like the meeting tools mentioned earlier), whereas HTTP-based technologies are created with large audiences in mind.
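The trade-off described above can be captured in a small lookup table. This is a simplified sketch: the latency ceilings come from the ranges quoted in this section (sub-second is rounded up to 1 second), and the class, field, and function names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolFamily:
    name: str
    transport: str
    typical_max_latency_s: float
    built_for_large_audiences: bool


PROTOCOL_FAMILIES = (
    ProtocolFamily("WebRTC-based", "UDP", 1.0, False),
    ProtocolFamily("WebSocket-based", "TCP", 1.0, False),
    ProtocolFamily("ULL-CMAF", "HTTP", 3.0, True),
    ProtocolFamily("LL-HLS", "HTTP", 3.0, True),
)


def shortlist(max_latency_s: float, large_audience: bool) -> list[str]:
    """Return the protocol families that meet a latency and audience requirement."""
    return [
        family.name
        for family in PROTOCOL_FAMILIES
        if family.typical_max_latency_s <= max_latency_s
        and (family.built_for_large_audiences or not large_audience)
    ]


if __name__ == "__main__":
    # The product launch example: under 5 seconds, 100,000 viewers.
    print(shortlist(max_latency_s=5.0, large_audience=True))
```

For the product launch example used earlier (under 5 seconds for 100,000 viewers), the shortlist is the HTTP-based options, ULL-CMAF and LL-HLS.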
If you would like to learn more about the technical details behind WebRTC and ULL-CMAF you should check this post.
7. Video quality
Before we get into the technical side, let’s define a few key terms. First, the video bitrate. The video bitrate of a stream is the rate at which the video data is transferred to the end viewer’s device. An adaptive bitrate, on the other hand, allows that rate to automatically flex up or down depending on variables like the internet connection.
Both WebRTC and HTTP streams can support a type of adaptive bitrate. In WebRTC-based solutions, this is done on the end-user’s side with simulcasting. For HTTP-based streams, it is done via ABR, which is server-side (i.e., your side). When it is done on the end-user’s side, you depend on the streamer’s hardware, while server-side means the platform is in control.
The buffer available for HTTP (and socket-based) streams allows the players to hide possible hiccups or perform retries on chunks that are not properly received.
More technical details about how to measure video quality can be found here.
One of the major advantages of HTTP-based technologies is that they can scale more affordably.
WebRTC and WebSocket-based technologies, on the other hand, require a dedicated delivery infrastructure when they’ve reached the peer limits of the browser. This makes WebRTC for 500+ viewers 2–8X more expensive on average than HTTP-based technologies.
The capacity and availability of HTTP-based technologies depend on the CDN used for the video delivery. Major organizations like Akamai have built a global infrastructure able to deliver video to all corners of the world, including third-world countries. The structure is used by known VOD solutions and is prepared for 4K and 8K video resolutions into the future.
WebRTC and WebSocket-based technologies, on the other hand, require you to rent dedicated machines. Any required tools will need to be deployed to fit onto these machines.
Note: It is expected that major CDNs will extend their WebRTC support in the future, and the rise of dedicated WebRTC CDNs with global reach will tackle one of its inherent scaling and availability challenges.
The information above provides important insights to help you decide how to choose an interactive livestreaming platform.
We created Livery Interactive next to Livery Video because many of our customers didn’t have the technical knowledge or resources required to build an interactive streaming platform themselves. They wanted a single trusted point of contact to be responsible for delivering both the video and the interactive experience so that they could focus on making engaging livestreams. To understand how the Livery system works, check out the component breakdown below.
A high-level overview of the Livery platform.
Our development teams are constantly working to extend both the video and interaction platforms. All of our features are built in-house, which allows us to provide you with a high-quality solution at a reasonable price compared to our competitors. To make things easy, we’ve even created an online calculator to determine what the cost of your custom solution would be. If you would like to use an off-the-shelf solution, please reach out to us and get started today.