Re-architecting for *real* scale


On the surface, Panda is a pretty simple piece of software – upload a video, encode it into various formats, add a watermark or change frame rate, and deliver it to a data store.

Once you spend some time with it, it begins to show how complex each component can be – and how important it is to continuously improve each one.


When Panda was first built, it worked beautifully, and it was quick! But as time went on and the volume of videos encoded per day increased, it became obvious that to keep pace with customers' increasing speed requirements and to maintain growth, core parts of the platform would need to be rethought.

We started looking at each component piece by piece to find bottlenecks, optimize throughput, and keep operating expenses fair so we could retain our price leadership. Panda might be a software platform, but reading 'The Goal' by Eli Goldratt – a novel about a manufacturing plant – really reminded us of the process. (It's a great read, btw.)

In July we updated to the most current versions of Ruby and Go – and added a memory cache to tasks that were maxing out our instances. Then we tackled the big scale bottleneck – the job manager.

Our biggest bottleneck: the Job Manager

The Job Manager ensures that our customers' video queues get processed as close to real-time as possible, distributing transcoding jobs to the encoder clusters. Whether it's 2,000 encoders with 8 CPU cores each or 1 encoder with 1 CPU core, it's important that work is allocated correctly.

It monitors all encoding servers running within an environment, receives new jobs, and assigns them to instance pools.

The Panda Job Manager was a single-threaded Ruby process, which worked well for quite some time. But we noticed it would start struggling during peaks, and we had to do something about it. We started looking at where we could optimize it, identifying each bottleneck one by one.

It was obvious that event processing was too slow in general, but before we even fired up a profiler, we found a huge bottleneck just by looking at logs and comparing timestamps.

Redis Queue Architecture

A short digression: we use Redis queues for internal communication, and there was one such queue to which all messages for the manager were sent. The manager was constantly polling this queue, and most of its work was driven by the messages it received. Each encoding server had a queue in Redis too, and all these queues were used for communication between the manager and the encoders.


Because a single Redis queue was used for new jobs as well as manager/encoders communication, huge numbers of the former were causing delays in the latter. And a slow down in internal communication meant that some servers were waiting unnecessarily long for jobs to be assigned.
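A toy sketch of that head-of-line blocking (the message counts are made up, and plain slices stand in for Redis queues): with a single shared FIFO, a burst of job submissions lands ahead of an internal control message, so encoders sit idle behind the burst; with a separate queue, the control message is served immediately.

```go
package main

import "fmt"

// position returns how many messages a consumer must drain before
// reaching the first message of the given kind.
func position(queue []string, kind string) int {
	for i, m := range queue {
		if m == kind {
			return i
		}
	}
	return -1
}

func main() {
	// One shared FIFO queue: 1000 new-job messages arrive just before
	// an internal control message.
	shared := make([]string, 0, 1001)
	for i := 0; i < 1000; i++ {
		shared = append(shared, "new-job")
	}
	shared = append(shared, "control")
	fmt.Println(position(shared, "control")) // control waits behind 1000 jobs

	// A dedicated queue: control messages never wait behind job submissions.
	control := []string{"control"}
	fmt.Println(position(control, "control")) // served immediately
}
```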

Are Ruby and Redis the Answer?

The obvious solution was to split the communication into two separate queues: one for new jobs and another one for internal messaging. Unfortunately, Redis doesn’t allow blocking reads from more than one queue on a single connection.

We were forced either to implement a Redis client that would use non-blocking IO to handle more than one connection in a single thread, or to resort to multiple threads or processes. Writing our own client seemed like a lot of work, and Ruby isn't especially friendly if you'd like to write multithreaded code (well, unless you use Rubinius).

Before trying to solve that, we launched the manager under a profiler to get a clearer picture. It turned out that roughly 30% of the time was spent querying the database (jobs were saved, updated, and deleted from the DB), and the remaining 70% was just running Ruby code. Because we were a few orders of magnitude slower than we wished, optimizing just the database or just the Ruby code wouldn't be enough (and we still had to solve the queues issue). We needed something more thorough than a simple fix.

Go baby, GO!

We started by rewriting the manager in Go. We didn't want to waste time on premature optimization, so it was roughly a 1:1 rewrite – just a few things were coded differently to be more idiomatic Go – but the mechanics stayed the same.

The result? The 70% previously spent on Ruby code dropped to about 1%! That was great – an almost 70% speed-up – but we were still nowhere near where we wanted to be.

Multithreading

Then we fixed the queues issue. With Go's concurrency model it was so simple that it's almost not worth mentioning – we even got message pre-fetching for free from a buffered Go channel (another goroutine polls Redis and pushes messages into the channel). And this was a huge boost – now we could handle more than 16,000,000 jobs per day per job manager.
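A rough sketch of that pattern, with simulated sources standing in for real Redis connections (queue names and messages are illustrative, not Panda's actual internals): one goroutine per queue blocks on its own source and feeds a shared buffered channel, and the buffer is exactly what gives you the free pre-fetching.

```go
package main

import "fmt"

// pollQueue simulates a blocking read loop on a single Redis queue,
// forwarding every message into a shared channel.
func pollQueue(name string, msgs []string, out chan<- string) {
	for _, m := range msgs {
		out <- name + ":" + m
	}
}

func main() {
	// The buffer doubles as a prefetch window: up to 100 messages are
	// fetched ahead while the manager processes the current one.
	messages := make(chan string, 100)

	go pollQueue("new-jobs", []string{"encode-1", "encode-2"}, messages)
	go pollQueue("internal", []string{"server-heartbeat"}, messages)

	// The manager's main loop drains a single channel, no matter how
	// many queues feed it.
	for i := 0; i < 3; i++ {
		fmt.Println(<-messages)
	}
}
```

In the real manager each `pollQueue` goroutine would hold its own blocking Redis connection, which sidesteps the single-connection limitation entirely.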

We could have pushed it harder, but we still hadn't even started profiling our new Go code at this point. Go has great tools for profiling, so we were able to work through the remaining bottlenecks rather quickly (it was the database almost every time). When we decided that was enough, we started testing… and we just couldn't get enough EC2 instances to reach the manager's limit. We ended at a bit less than 80,000,000 jobs per day, without the manager showing so much as a bead of sweat.
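For reference, the CPU-profiling workflow in Go looks roughly like this (the file name and the workload are illustrative); the resulting profile can then be explored with `go tool pprof`:

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// busyWork is a stand-in for the code you actually want to profile.
func busyWork() int {
	sum := 0
	for i := 0; i < 1e7; i++ {
		sum += i
	}
	return sum
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Sample the CPU between Start and Stop, writing samples to cpu.prof.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	busyWork()
	pprof.StopCPUProfile()

	fmt.Println("wrote cpu.prof")
}
```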

The graph below shows the number of videos per day, projected from the number of videos processed within the last 30 minutes. We started at a bit more than 1,000,000, then switched to the Go manager and got to the 80,000,000 limit – at which point there were simply no more jobs (we hit our EC2 spot limits while performing the benchmark!), so we might have processed even more. Either way, it should be a safe number for some time.


The end result of this phase is a technical architecture that clears queues much faster and, for the same encoder price, delivers better throughput, greatly enhanced encoder bursting (especially good during the holiday season, when we often have customers that ratchet up activity by 100x!), and more automation. We're not done yet – we have some fantastic features coming in 2015 that the new back-end enables us to deliver.

PS. Kudos should also go to Redis – it’s a fantastic, very stable and battle-tested piece of software. Big thanks, Antirez!

Do you have a suggestion, or some knowledge you'd like to share with us? We'd love to hear from you – get in touch at support@pandastream.com anytime (we're 24×7).


Apple’s iPhone 6 and 6 Plus boast support for H.265

Apple released its flagship devices, the iPhone 6 and iPhone 6 Plus, a few weeks ago, and according to Tim Cook, it's their biggest iPhone month ever.

Most analysts, fanboys, and tech reviewers are keen on the larger screen size, new processor, and how thin it is.

Here at Panda, on the other hand, we're delightfully surprised that the specs pages for both the iPhone 6 and 6 Plus say they utilize H.265 for encoding and decoding FaceTime.

As we’ve said in our previous blog post, H.265 or High Efficiency Video Coding (HEVC) is said to match the quality of H.264, but at half the bit rate. This would be a massive help for cellular networks, by reducing bandwidth by up to 50%.

Interestingly, at today's Apple event they announced the new iPad Air 2, but that device does not support H.265.

H.265 has yet to see wide adoption in the consumer device market, so perhaps the iPhone can blaze another trail, as it has done so well so far.

Send us a note to support@copper.io if you want to get started with H.265 video encoding.

Panda introduces support for H.265

What is HEVC

H.265, or High Efficiency Video Coding (HEVC), is the next generation of H.264, the codec commonly used in Blu-ray encodings. Its goal is to improve compression – not just add more of it – by up to 50% over its predecessor, while attaining the same level of picture quality. It can also support resolutions up to 8192×4320 (8K).

HEVC Background

There are two key groups helping move this industry forward: the Moving Picture Experts Group (MPEG), and the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T). Side note: could you please find an easier name? Someone here is being a troll.

Their goal is to reduce the average bit-rate by 50% for a fixed video quality – or deliver higher quality at the same bit-rate – while remaining interoperable and network-friendly.

Since the majority of internet bandwidth is video (I'm looking at you, Netflix), one can imagine that reducing the bit-rate of video while keeping quality high could significantly reduce the strain on current networks.

HEVC Frame Types

Similar to H.264 and MPEG-2, there are three types of frames: I, P, and B. These frame types are core to video compression, and in newer codecs such as H.265, the algorithms behind them are becoming more sophisticated.

  • I Frame (Intra-coded picture): Like a static image, these frames are often used as references for decoding other frames. They are usually the biggest, with the most data, but are used as references for other frames to be smaller.

  • P Frame (Predicted picture): This frame uses data from the previous frame that is unchanged, and only updates the areas that have changed. This frame can use image data and/or motion vector displacements to create the frame.

  • B Frame (Bi-predictive picture): A more advanced version of P, as it looks at the frame before and after to create a frame. These frames are the most efficient for final file size, but significantly slow the encoding process.
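A toy illustration of the inter-prediction idea behind P frames – nothing like real codec internals, just the core intuition that only the blocks that changed since the reference frame need to be stored:

```go
package main

import "fmt"

// changedBlocks compares two frames block by block and returns the
// indices of blocks that differ – the only data a P-frame-style delta
// would need to carry.
func changedBlocks(ref, cur []byte, blockSize int) []int {
	var changed []int
	for i := 0; i < len(ref); i += blockSize {
		end := i + blockSize
		if end > len(ref) {
			end = len(ref)
		}
		if string(ref[i:end]) != string(cur[i:end]) {
			changed = append(changed, i/blockSize)
		}
	}
	return changed
}

func main() {
	ref := []byte("aaaaaaaaaaaaaaaa")
	cur := []byte("aaaaaaaaXXaaaaaa") // only one 4-byte block changed
	fmt.Println(changedBlocks(ref, cur, 4)) // → [2]
}
```

Real codecs go much further (motion vectors, sub-pixel prediction, residual transforms), but the payoff is the same: unchanged regions cost almost nothing.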

How does H.265 work and how is it different

HEVC breaks each frame down into coding units (CUs) – blocks ranging from 4×4 pixels all the way up to 64×64 pixels (the old maximum block size was 16×16). These blocks are then used to determine which areas of the frame change, and which areas can simply be referenced from I-frames.
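Some back-of-the-envelope arithmetic of our own (not from the spec) on what the larger blocks mean for a 1080p frame – far fewer blocks means far less per-block signaling overhead in flat areas:

```go
package main

import "fmt"

// blocksPerFrame returns how many size×size blocks tile a frame,
// rounding partial blocks up at the edges.
func blocksPerFrame(width, height, size int) int {
	cols := (width + size - 1) / size
	rows := (height + size - 1) / size
	return cols * rows
}

func main() {
	// H.264-style 16×16 macroblocks vs H.265's largest 64×64 coding units.
	fmt.Println(blocksPerFrame(1920, 1080, 16)) // 8160 blocks
	fmt.Println(blocksPerFrame(1920, 1080, 64)) // 510 blocks
}
```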

There is also an increased number of intra-prediction modes: from 9 in H.264 to 35 in H.265. While that is much more processor-intensive, the larger blocks are more efficient.


Image credit: elementaltechnologies.com

All these improvements sound great, but they demand a great deal of computational power – in some cases up to ten times as much. This is one of the reasons we have introduced multi-core encoders into our infrastructure: they can handle these calculations, and the increased resolutions.

Encoding Tools

Each encoder can vary depending on its implementation and its use of the available tools. These include, but are not limited to:

  • Intra prediction
  • Motion compensation
  • Motion vector prediction
  • Filters
  • Parallel processing

Head-to-head vs. VP9

The Fraunhofer Heinrich Hertz Institute has run preliminary tests comparing the performance of H.264/MPEG-AVC, H.265/MPEG-HEVC, and VP9.

In similar encoding configurations, H.265 saw bit-rate savings up to 43% over VP9 and 39% over H.264.

Encoding times were a totally different story, where VP9 outperformed H.265 by 7.3% and H.264 by 130%.

We've done our own tests, and in the example shown below we were able to get H.265 output almost 50% smaller than H.264 at the same quality. Since there is no browser support for H.265 yet, you can download a Chrome plugin, or play it with VLC.

You can view the H.264 video below, which is 1.7MB, and download the H.265 video, which is 964KB.

Where the chips fall

Our initial tests show that VP9 and H.265 produce similar file sizes; VP9, in conjunction with WebM, seems to be more reliable for streaming. However, H.265 seems to have better image quality.

While this isn't turning into a Blu-ray vs. HD-DVD competition, VP9 does have a leg up in being royalty-free. Most companies have announced support for both formats, but YouTube has yet to support H.265, and is encoding most high-res videos with VP9.

H.265 and Panda

The complexity and increased processing power needed for HEVC are well matched to the infrastructure and software that Panda provides. We've recently added multi-core encoders for exactly this reason.

A few customers have had early H.265 access and we’re now opening up broader access. If your business is interested in being on the leading edge, email us to be a part of the private beta at mark@copper.io

References

http://en.wikipedia.org/wiki/Video_compression_picture_types
http://en.wikipedia.org/wiki/High_Efficiency_Video_Coding
http://www.elementaltechnologies.com/
http://iphome.hhi.de/marpe/download/Performance_HEVC_VP9_X264_PCS_2013_preprint.pdf


Priority on Clouds and Select/Deselect all profiles

Two exciting new features to tell you about this week! Some of you may have noticed while using Panda's GUI that we are on a roll with new features and improvements. This week we are bringing per-cloud priority, and select/deselect all for profiles.

Priority on Clouds

Panda has been built as FIFO – First In, First Out – and we encode most of your videos this way. But sometimes you have more important videos that need to jump the queue.

Now, with priority on clouds, the videos you need encoded ASAP are processed outside of the normal queue: any video uploaded to that cloud will be encoded before any other video uploaded previously.

We've made this a little flexible, with three settings: Low, Normal, and High. You can build a hierarchy to get important videos delivered quickly, and others only when there's availability.

Remember, you can upload files up to 20GB in size, so maybe make those larger size videos low priority, and the small ones high priority.


Select and Deselect all Profiles

This one is pretty straightforward – if you have a number of profiles in your cloud, you can now select or deselect all of them with one click. Sometimes it's just easier to pick the ones you don't want than to select the rest manually, one by one. Just a small tweak to make life easier.


Panda activity widget

Keep track of all your encoding jobs

We've recently launched a great new way to keep track of your encoding queue without logging into the app. The Panda activity web widget gives you an easy glance at which videos are being encoded, and which are pending.

Data includes the filename, the profile, encoding process, and number of jobs left in the queue.

Configure it!

It’s really easy to get the widget on your website:

  1. Log into your account at pandastream.com
  2. Click on a cloud
  3. Under Cloud Settings, change “Enable Web Widget” to Yes.
  4. Copy and Paste the code to your website.
  5. Click Save Changes at the bottom of the Cloud Settings page.


Encode Faster with Panda Multi-Core Encoders

We've been experimenting with ways to increase encoding speed, and we've developed a feature that allows you to turn the knob, slam the gas pedal, or push the thrust lever: Multi-Core encoders.

Panda's value proposition is simple: buy an encoder and use it as much as you'd like. Our customers love that. If you have a higher volume of encoding jobs, you add encoders. But adding encoders doesn't increase the speed of individual jobs – it simply lets you encode more at once.

Video Encoding Kryptonite

Enter Panda Multi-Core. We think we've found the kryptonite of slow video encoding by enabling multi-core encodes. With Panda Multi-Core, you now have the ability to turn on 2–32 cores for your video encoding processes.

At this point we are testing how much faster you can encode your videos using our most popular codecs, like WebM. We are still measuring the gains, but it is theoretically possible to increase encoding speed by 2x, 10x, or even more.

Scenarios explained

Let's say you're currently on our Crane plan, which includes four encoders. That gives you the ability to encode four videos at the same time. With Panda Multi-Core you can turn up the speed of each of those encoders by adding cores.


Those four encoders can tap into our multi-core system, and you can choose how many cores each of them gets. For example, you can still encode four videos at once – but with up to 32 cores each!

It's kind of like being in a water slide race against your best buds, but you just put on a jet pack.

Apply for Beta

Currently we are testing out multi-core encoders with a few of our customers and seeing great results. Do you want to reduce your encoding times by 2, 10 or even 32 times? Give us a shout!


Timestamps in Panda


A few weeks back we showed you how you could implement timestamps on your encoded videos, and we had a lot of great feedback – thank you!

A few of you also asked if this was available in the GUI as well. So we did it! Log in to your Panda account and select – or create – the profile you want to use for timestamps. Check the timestamps box, and that turns the feature on.

Currently, the options are limited, simply turning it on and off. If you are looking for more options in the GUI for timestamps, please let us know!

The video encoder adds the timestamp with FFmpeg, and works best with frame rates of 24, 25, 30, 50, and 60 FPS.



The Program Praises Speedy Panda Integrations

We had the chance to sit down with Coe Lottis and Eli Anderson from The Program to discuss how they used Panda for their newest project. Coe is a Partner and Director of Brand Strategy; he crafts concepts for their brand partners, then brings those concepts to market. Eli is a Senior Application Engineer.

About The Program

The Program is a digital agency located in Portland, Oregon with a growing client base in Los Angeles. The agency has a number of big brand clients, such as Nike, Twitter, Obey Clothing and more. Most of their clients lean on them for technology, and new and innovative ways of interacting with consumers across digital touchpoints.

The Program’s newly launched project allows users to upload their own content. This year-long project uses Panda. “It has been a fun one for us,” Coe says. Eli adds: “The app is live in production. Everything is working great.”


Ease Of Use Made Panda The Obvious Choice

For this project, The Program was looking for a cloud video encoding solution. They relied on their application engineers to pitch solutions they thought would be a good fit for the project’s needs and requirements.

The Program's tech lead, Eli, had worked with Panda previously, so he suggested it for this project. “This was the third large project integration that I used Panda for. I evaluated it along with a couple other solutions. One of the main reasons I chose it for this project is the ease of integration. I mainly develop on Ruby, and did all the integration from scratch. This project uses Ruby on Rails and Panda has great integration there.”

“What it came down to was ease-of-use more than anything,” adds Coe Lottis. “As we looked at what some of the options were, it became apparent that Panda was the right fit. Having a tech lead that was familiar and comfortable working with Panda made it our obvious choice.”

User Integration Makes Panda A Turnkey Product That The Program Recommends

Panda can be integrated easily using the web interface, the JavaScript API, or another integration. The Program sees this as one of its key benefits. Eli used Panda's Ruby gem to integrate. He goes on to say, “I also really like the debug console that Panda has for uploads, and being able to send messages back to the application. It's super handy. The tool is set up well, and is easy to use.”

We asked Coe if he would recommend Panda for similar solutions. “I would suggest Panda for its user integration. When we have a large project that takes upwards of a year, we look for solutions that don’t require heavy lifting from an integration perspective. Panda has a turnkey solution that provides all the features we needed for the requirements of this project. We were able to roll it out with minimal effort, based on the service that Panda offers.”

Panda Doesn’t Slow The Train

Sometimes, integrating third-party tools can slow down a project path. Not so with Panda, Coe states: “When you look at having to integrate third-party systems, it can bog things down and really slow the train. It’s key for us to work with a third-party vendor, like Panda, where we can get the support and information that we need in a timely manner. You can hit hurdles when you work through certain levels of integration that sometimes aren’t necessary. It may boil down to a third-party service not remaining focused on what their piece is. Panda focuses on what it does and does it really well. It’s allowed us to plug in and keep moving along.”

Panda’s Pricing Sweetens The Deal

In addition to ease-of-use and ease-of-integration, Coe Lottis goes further to say: “The costs are great. Panda’s pricing model allowed us to hang on to our budget.”


Panda’s Video Encoders Get 50% More Memory

We're always making improvements to Panda, and have now ensured that your video encodings run faster and more stably than ever before, as we've allocated them 50% more memory. Specifically, we've found that the WebM profile benefits the most, as it is one of the more popular codecs for delivering high-quality video over the web.

What is WebM?

WebM is a 100% free, open-source audio-video container format designed to provide royalty-free video compression for use with HTML5 video. The project releases its software under a BSD-style license, and users are granted a worldwide, non-exclusive, no-charge, royalty-free patent license. WebM files consist of video streams compressed with the VP8 video codec and audio streams compressed with the Vorbis audio codec.

Why use WebM?


The WebM format was built for the future of the web. It can deliver high-quality video while adapting to bandwidth changes and to the processing power and memory available on each device. Across hundreds of thousands of tests, VP8 has delivered efficient bandwidth usage, which lowers the cost of storing and serving video over time. Effectively, the longer encode time is a small hindrance compared to the long-term benefit.

Why do videos encoded with WebM take so long?

VP8 at its highest quality settings takes the longest to encode. But the folks behind the project – specifically Google – have made speeding up encoding times their next priority for the open-source project.

We looked at our systems and found that WebM encodings use a lot of memory. To provide you with the best service possible, we are now allocating 50% more memory, helping increase the speed and stability of encodings.

You’re In Good Company

Native WebM playback is supported by:

  • Google Chrome
  • Mozilla Firefox
  • Opera

Other browser support:

  • Internet Explorer 9 requires third-party WebM software
  • Safari for Windows and Mac OS X relies on QuickTime to play web media, and requires a third-party plug-in

Google’s WebM Project Team will release plugins for Internet Explorer and Safari to allow playback of WebM files through the standard HTML5 <video> tag.

Software and hardware support is expanding quickly, and includes:

  • Android
  • Nvidia
  • Intel
  • YouTube
  • Skype
  • Adobe’s Flash Player (will be updated to support WebM)

We’re here for you!

Panda’s support team is always here to help our users. If you have any further questions about WebM, or any other codec, ping us at support@copper.io.


Timestamps in Panda videos

The following post explains how to use timestamps in Panda via our API. Interested in having this feature added to the web interface? Request beta access.

In Panda, it's easy to set up an encoding pipeline with presets developed by our team – with just a few clicks, you can configure profiles for the most popular audio and video formats on the web. That comes with some caveats, such as limits on how much you can configure.

FFmpeg and corepack-3

You might have already stumbled across a chapter in our documentation titled “FFmpeg Neckbeard”. It describes how you can create encoding profiles by specifying the whole FFmpeg command yourself, which means that everything our FFmpeg build can do is available. We've recently added the FreeType library to corepack-3 (our newest stack), which makes a few new things possible. One of them is adding timestamps to videos, via FFmpeg's “drawtext” filter.

Play time

Okay, let's create a new profile that, besides doing some typical transcoding, adds a small timestamp in the top-left corner. The filter can be configured by passing the argument below to FFmpeg:

-vf "drawtext=fontfile=/usr/fonts/FreeSans.ttf:timecode='00\:00\:00\:00':r=25:x=5:y=5:fontcolor=black"

The “drawtext” filter takes a few arguments that tell FFmpeg how the timestamp should be rendered; the full list is available in FFmpeg's documentation. As you can see, it also needs a font. The example above uses “FreeSans.ttf” – one of the fonts from the GNU FreeFont collection, all of which live in the /usr/fonts/ directory – so the following values of “fontfile” will work on Panda:

/usr/fonts/FreeMono.ttf
/usr/fonts/FreeMonoBold.ttf
/usr/fonts/FreeMonoBoldOblique.ttf
/usr/fonts/FreeMonoOblique.ttf
/usr/fonts/FreeSans.ttf
/usr/fonts/FreeSansBold.ttf
/usr/fonts/FreeSansBoldOblique.ttf
/usr/fonts/FreeSansOblique.ttf
/usr/fonts/FreeSerif.ttf
/usr/fonts/FreeSerifBold.ttf
/usr/fonts/FreeSerifBoldItalic.ttf
/usr/fonts/FreeSerifItalic.ttf

Now we can apply this knowledge to a profile in Panda. There are plenty of examples of creating new profiles in “FFmpeg Neckbeard”; basically, adding a timestamp is as simple as adding the filter argument to the command. The important thing is to use corepack-3 – “drawtext” will not work on older stacks. Here's how this looks in Ruby:

require 'panda'

Panda.configure do
  access_key "your_access_key_123"
  secret_key "your_secret_key_42"
  cloud_id "id_of_the_target_panda_cloud"
  api_host "api.pandastream.com"
end

drawtext_args_map = {
  :fontfile => "/usr/fonts/FreeSans.ttf",
  :timecode => "'00\\:00\\:00\\:00'", # timestamp offset; double backslash so FFmpeg sees the escaped colons
  :r => "25", # FPS of the timestamp, for 1:1 ratio it should be equal to the FPS of input videos
  :x => "5", # x and y specify position of the timestamp
  :y => "5",
  :fontcolor => "black",
}

drawtext_args = drawtext_args_map.to_a.map { |k, v| "#{k}=#{v}" }.join(":")

Panda::Profile.create!({
  :stack => "corepack-3",
  :name => "timestamped_videos_v2",
  :width => 480,
  :height => 320,
  :video_bitrate => 500,
  :audio_bitrate => 128,
  :extname => ".mp4",
  :command => "ffmpeg -i $input_file$ -threads 0 -c:a libfaac" \
              " -c:v libx264 -preset medium $audio_sample_rate$" \
              " $video_bitrate$ $audio_bitrate$ $fps$ $filters$" \
              " -vf \"drawtext=#{drawtext_args}\" -y $output_file$"
})

The result (the original video is on pandastream.com):

That's nice, but we can do better. The timestamp could be more visible and better positioned – with different “drawtext” switches we can add a background to the timestamp and place it near the bottom, centered. Using drawtext's built-in variables, we can even do this independently of the video's dimensions. The following does exactly that:

drawtext_args_map = {
  :fontfile => "/usr/fonts/FreeSans.ttf",
  :timecode => "'00\\:00\\:00\\:00'",
  :r => "25",
  :x => "(w-tw)/2", # w - width, tw - text width
  :y => "h-(2*lh)", # h - height, lh - line height
  :fontcolor => "white",
  :fontsize => "18",
  :box => "1",
  :boxcolor => "black@1", # 1 means opaque
  :borderw => "5",
  :bordercolor => "black@1"
}

And the final result:

If you have any questions on this subject, send a note to support@copper.io. We’re happy to help.
