September 19th, 2017
Introducing Cloud Recording, straight from the Dark Dimension
Today we have something extra special for you: Cloud Recording. All Live sessions are now automatically recorded on Lookback servers. Participants will no longer need to upload, and sessions are ready to re-watch much faster. No upload screens, no asking participants to keep the app running — just pure, unadulterated research directly in your dashboard.
It is clear that the one feature that all of you—and all of us at Lookback—are the most excited about is the ability to perform live streamed remote research with anyone in the world. Ever since we introduced Lookback Live a year ago, we have worked tirelessly to improve it.
Cloud Recording was first released for Desktop Live back in February, and after working through several iterations, we are now able to provide it on all platforms: Desktop, iOS and Android.
In addition to Cloud Recording, this latest iteration also gives you higher reliability, much better performance on iOS, and a much wider range of device compatibility for Android devices when live streaming with Lookback.
In short, this is the best version of live remote research we’ve ever built, and we warmly invite you to use it! Learn more about Lookback Live
If you have any feedback, as always, please let us know at firstname.lastname@example.org!
For the technical audience, we’d also like to tell you the story of how we were able to build the technology for Cloud Recording across all of our platforms. I give you…
Being Consumed by The One — A technical journey
Building Live Streaming continues to be quite a journey. We use WebRTC, a free and open source protocol and stack for live audio and video headed up by Google’s Chrome team. Live video is hard, and most solutions are proprietary, so we are truly standing on the shoulders of giants when we’re able to use Google’s tech stack in our product.
WebRTC is designed to be very flexible in what it is used for. It does this by being signalling agnostic: rather than defining how clients are to discover and connect to each other, it leaves that part for you to implement. WebRTC gives you hooks to communicate two pieces of information: descriptions of media streams (OFFERs and ANSWERs) and connection routing (ICE). Most everybody designs some sort of JSON-based protocol over WebSockets to a centralized server to coordinate this information, and so did we.
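To make the idea of a JSON-based signalling protocol concrete, here is a minimal sketch of what such messages could look like. The message shape (the "kind", "from", "to" and "payload" fields) and the helper names are assumptions made up for illustration; they are not Lookback's actual protocol, and plain lists stand in for the WebSocket connections.

```python
import json

# Hypothetical message kinds: media descriptions (offer/answer) and routing (ice).
KINDS = {"offer", "answer", "ice"}


def encode_signal(kind, sender, recipient, payload):
    """Serialize one signalling message to a JSON text frame."""
    if kind not in KINDS:
        raise ValueError(f"unknown signalling message kind: {kind}")
    return json.dumps(
        {"kind": kind, "from": sender, "to": recipient, "payload": payload}
    )


def decode_signal(frame):
    """Parse a JSON text frame and check that it is a known message kind."""
    message = json.loads(frame)
    if message.get("kind") not in KINDS:
        raise ValueError("not a signalling message")
    return message


def relay(frame, sockets):
    """Server side: forward a frame to the peer named in its "to" field.

    `sockets` maps a peer id to a list that stands in for that peer's
    WebSocket; a real server would write to the socket instead.
    """
    message = decode_signal(frame)
    sockets[message["to"]].append(frame)
    return message
```

The central server never needs to understand the media description or the routing candidates; it only has to deliver each frame to the right peer, which is why a thin JSON envelope is usually enough.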
Whispers in the dark
Our first approach was to use WebRTC in a peer-to-peer mode, where our servers act as a mediator to let the moderator and the participant connect directly to each other. This gives them the lowest possible latency, and saves us from having to pay any video bandwidth costs for the streamed media.
Immediately, people requested the ability to have more people in their organization observe the live sessions. We had anticipated this, and our signalling protocol was already designed to allow more peer-to-peer connections, connecting everybody to both the participant and moderator.
This was a terrible idea. One mobile phone was already capturing and encoding screen, camera, microphone and metadata in realtime to files on disk; and then doing a second encode and upload of all those media streams to live-stream to the moderator’s computer. Now we were asking this bandwidth-starved device to upload the stream to additional computers on the internet. Even though the CPU could manage on the most high-end devices, bandwidth would immediately run out anyway. Even a desktop computer was only able to handle one or two additional watchers.
The false pages of Cagliostro
We realized we would have to route these additional streams through our backend servers, through some sort of rebroadcaster. We found an off-the-shelf product called Janus. It’s open source with a nice license, and is a very flexible piece of software, with a plugin system and built for a wide range of applications.
Still, it’s written in C so we didn’t feel super comfortable making large changes, and it wasn’t a perfect fit for what we wanted to do. We decided to keep our peer-to-peer protocol, and just smack Janus onto the side. Janus has its own signalling protocol, and it also doesn’t support multiple video streams, so we ended up having three websocket connections and three WebRTC peer connections (one peer-to-peer for the moderator, one Janus for camera video for watchers, and one Janus for screen video for watchers).
Moderators would connect over peer-to-peer, and as soon as an additional user wanted to watch the ongoing stream, we kicked off two Janus “video rooms”. Our participants now had to deal with twice the bandwidth and CPU usage, but at least it wouldn’t grow beyond that.
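The difference between the two topologies comes down to simple arithmetic on how many copies of the media the participant has to upload. The sketch below is a rough model only: the function names are hypothetical, and the 1500 kbps figure in the usage note is an assumed bitrate, not a number from our measurements.

```python
def mesh_upload_copies(watchers):
    """Pure peer-to-peer: one copy to the moderator, plus one copy
    to every additional watcher."""
    return 1 + watchers


def rebroadcast_upload_copies(watchers):
    """Peer-to-peer to the moderator, plus one extra copy to the
    rebroadcaster as soon as anyone else watches. The server fans that
    copy out, so the count never grows beyond two."""
    return 1 if watchers == 0 else 2


def upload_kbps(copies, bitrate_kbps):
    """Total upload bandwidth for a given number of media copies.
    `bitrate_kbps` is an assumed per-copy bitrate."""
    return copies * bitrate_kbps
```

With an assumed 1500 kbps per copy and three watchers, the mesh needs `upload_kbps(mesh_upload_copies(3), 1500)` = 6000 kbps of upload from the participant, while the rebroadcaster setup stays at 3000 kbps whether there are three watchers or twenty.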
This was a bit fragile, but it worked! We shipped support for 20 watchers for Chrome and iOS, but were hesitant to add more complexity to Android, the platform which was proving the hardest to build Live for.
The tome that broke the wizard’s back
Our next invention was Cloud Recording. After live streaming a session, all the media data we record has already touched a server; so why not record it immediately when it does, instead of uploading boring media files for hours after the live stream has finished? The only downsides were lower media quality, and potentially dropped frames; but recording exactly what the moderator saw during the session made sense to us, and having complex code to get even higher quality recordings after the fact didn’t seem useful enough. So we started hacking.
Janus supports recording incoming media streams to disk. All we had to do was to make the client always connect to Janus even if there are no additional watchers; transfer all the information from the client to Janus that it needs to pretend to be a client uploading media files; and make Janus trigger an uploading script to put the recorded files in our standard media storage.
We dug into the C code, wrote our own plugins, and debugged for weeks. Eventually we got something that worked alright for desktop recordings, but mobile support was not good enough. For some reason a lot of frames were missing, and sometimes they cut out entirely after just a minute of recording.
This turns out to be because Janus is a “Selective Forwarding Unit”, which forwards the stuff that other peers request, but it doesn’t behave like a full-fledged client; so if the participant starts dropping frames, and there is no moderator/watcher peer available to ask for retransmissions, those frames are just gone forever. This is just one of many issues we encountered during the months we worked on this approach.
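A toy model shows why the missing retransmission requests matter for a recording. This is a deliberately simplified sketch: packets are plain integers, sequence number wrap-around is ignored, and the function names are hypothetical. In real WebRTC the receiver asks for lost packets with RTCP NACK feedback messages.

```python
def find_gaps(received_seqs):
    """Return the sequence numbers missing between the first and last
    packet that arrived."""
    if not received_seqs:
        return []
    seen = set(received_seqs)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def record_forward_only(arrived):
    """Recorder that only stores what arrives, as a pure forwarder would.
    Lost packets stay lost."""
    return sorted(arrived)


def record_with_retransmission(arrived, sender_buffer):
    """Recorder that behaves like a full receiver: it notices gaps and
    asks the sender to resend any missing packet it still has buffered."""
    recovered = [s for s in find_gaps(arrived) if s in sender_buffer]
    return sorted(set(arrived) | set(recovered))
```

If packets 1, 2, 4 and 7 arrive and the sender still holds 1 through 7, the forward-only recorder keeps four packets, while the recorder that requests retransmissions ends up with all seven.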
Eventually we were able to ship Cloud Recording to Desktop, but we were wary of making further attempts to get this approach to work on mobile. Issues included:
- The peer-to-peer signalling protocol being designed for something completely different from what we were using it for. The product design of Live changed significantly over the year that we worked on it, and that moved our technical implementation further from ideal the more we built.
- The six different socket connections made error handling difficult and fragile.
- It was difficult to add new features because neither our backend nor the clients were in full control of the connection.
The summoning of Dormammu
In short, we were mired in the good kind of technical debt. We had not written bad code, we had just outgrown the design we originally made for our live offering, and there was a large mismatch between the design of the code and the reality of our product.
It’s the good kind, because we had learned so much, and had a very good perspective on the whole technical field of live video streaming with recording on mobile and desktop. In addition, we had just hired an amazing media engineer, Andreas Smas, who has a penchant for building things from scratch, in C, with excellent performance and resilience.
We were weighing three options:
1) Continue with the three different connections, and keep hacking it to work.
2) Consolidate all the work into Janus, and write a plugin that takes over the responsibilities of the peer-to-peer signalling protocol (e.g. the semantics around “moderator” and “participant”, being able to “call” and “answer” a call, changing moderator mid-call, etc).
3) Consolidate all the work into a new piece of backend software written from scratch.
1) was out of the question. 3) seemed like way too much work. We had almost settled on 2) when Andreas told us that he had implemented a proof of concept of 3) over the weekend, implementing the entire WebRTC stack from scratch in C. That sort of determination moves mountains, and we were convinced.
The most difficult and important part of any programming task is naming. This one had to be good. The purpose of the project was to unify everything live related we have into a single component and protocol. A “consume the world” sort of affair. Everything becoming one. Having just watched Marvel’s new Doctor Strange, it became clear: this project would be named after the master of the Dark Dimension, the entity who wants to merge all universes into one singular being: Dormammu, the Lord of Darkness.
Lookback’s co-founder Nevyn, who had been there from the very first live prototype unicorns.io, sat down with Andreas and drafted a complete protocol specification, with sequence diagrams and all, before a line of code had been written. This was only possible with the combined knowledge of the full context and history of the product (with all the limitations imposed by the rest of the tech stack, and the intricate details of the product design); together with a deep understanding of WebRTC’s internals.
This coincided perfectly with a company retreat, and the whole team (Andreas on backend, Nevyn on iOS, Frida and Martin on Android, Martin on web, and Brian on devops) was able to build a functional first version on iOS in a week. Over the next two months, we added error and edge case handling, Android support, Desktop support, and beautiful Docker-based deployment, and then load tested, instrumented, and QA tested it.
Mister Doctor… How Strange.
Our new Live streaming infrastructure, codenamed Dormammu, is a success. Our reliability metrics are through the roof, we were able to throw away thousands of lines of complex code, and improve recording performance across platforms. It’s a great example of the correct timing of a rewrite.
We had already spent a year in “ship fast” mode, exploring the space of live streamed UX research. We changed our minds about what the “right way” of doing it was several times, and we did not stop to rearchitect every time, because then we would not have been able to iterate at a high pace, and we would have been dead before we found a compelling answer.
Once we had a great answer, we switched to “ship slow” mode, taking all the knowledge and distilling it into a beautiful answer, cutting as much scope as possible to keep the solution to exactly what we needed (but with the flexibility to expand it in the future). We were then able to build it in record time.
This wouldn’t have been possible without our fantastic team. It’s been an amazing journey, which we hope to repeat with every facet of our tech.