Serverless Media Streaming Architecture

In the future we might want VTT thumbnail previews of the content. We can implement this as follows:

1. Use MediaConvert (in the same job as above) to output a .jpg file every N seconds of the video, written to the S3 upload bucket.
2. Use EventBridge to listen for MediaConvert job completion events, which then trigger a media-thumbnailer Lambda function.
3. The Lambda function fetches the .jpg files from S3, post-processes them into a mosaic, and builds a VTT thumbnail file. The post-processing will likely require ImageMagick; fortunately, there is a Lambda Layer for ImageMagick.
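
The event filtering and VTT generation parts of these steps could look roughly like the sketch below. It is a minimal illustration, not the actual implementation: the function names, tile sizes and mosaic file name are all assumptions, and the mosaic itself (the ImageMagick part) is taken as already built. The `#xywh=` suffix is a convention many web players use to pick one tile out of a sprite image; check that your player supports it.

```python
# Sketch only: function names, tile sizes and URLs are illustrative assumptions.

# EventBridge rule pattern that matches completed MediaConvert jobs.
MEDIACONVERT_COMPLETE_PATTERN = {
    "source": ["aws.mediaconvert"],
    "detail-type": ["MediaConvert Job State Change"],
    "detail": {"status": ["COMPLETE"]},
}


def is_completed_job(event: dict) -> bool:
    """Defensive check inside the Lambda, mirroring the rule pattern above."""
    return (
        event.get("source") == "aws.mediaconvert"
        and event.get("detail", {}).get("status") == "COMPLETE"
    )


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_thumbnail_vtt(frame_count, interval, columns,
                        tile_width, tile_height, mosaic_url):
    """Build VTT text with one cue per frame, each pointing at a mosaic tile.

    Frames are assumed to be laid out left to right, top to bottom,
    with `columns` tiles per row and one frame every `interval` seconds.
    """
    lines = ["WEBVTT", ""]
    for i in range(frame_count):
        start = format_timestamp(i * interval)
        end = format_timestamp((i + 1) * interval)
        x = (i % columns) * tile_width
        y = (i // columns) * tile_height
        lines.append(f"{start} --> {end}")
        lines.append(f"{mosaic_url}#xywh={x},{y},{tile_width},{tile_height}")
        lines.append("")
    return "\n".join(lines)
```

For example, `build_thumbnail_vtt(3, 10, 2, 160, 90, "mosaic.jpg")` produces three cues of ten seconds each, with the third cue pointing at the first tile of the second row.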

Note that some other cloud providers already do step 3 for us in their media services, e.g. Azure Media Services API v2 and GCP Transcoder.

AWS S3 Events are limited to one “listener” per suffix/prefix combination. To get around this, we can implement an mp4 fanout SNS topic, since we require multiple processing functions per mp4 file. This also makes the pipeline easier to extend: simply add another subscriber to the SNS topic.
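
With the fanout in place, each subscribed Lambda receives the S3 event wrapped inside an SNS message rather than directly. A minimal sketch of unwrapping it is shown below; `extract_s3_objects` and `handler` are hypothetical names, and the sketch assumes the default (non-raw) SNS delivery format, where the S3 event arrives as a JSON string.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sns_event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an SNS-wrapped S3 event notification."""
    objects = []
    for sns_record in sns_event.get("Records", []):
        # The original S3 event is a JSON string in the SNS message body.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # S3 test events carry no "Records" key, hence the .get().
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded in S3 events (spaces become "+").
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point for one of the mp4 subscribers."""
    for bucket, key in extract_s3_objects(event):
        print(f"processing s3://{bucket}/{key}")
```

Every subscriber can share this unwrapping step and differ only in what it does with each bucket and key.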
AWS Transcribe requires job names to be unique across the account. This is a problem if we use the video name as the job identifier and later need to retranscribe the same video. Hence we decided to use DynamoDB as a simple key-value store mapping a unique job id to the video name, so we know where to store the subtitles later on. This table can use the PAY_PER_REQUEST billing mode, and is really low traffic so should stay within the free tier, woohoo!
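
The job id mapping could be sketched as below. This is an illustration under assumptions, not the code in use: the attribute names `job_id` and `video_name` and all function names are hypothetical, and `table` is expected to behave like a boto3 DynamoDB Table resource (only `put_item` and `get_item` are used).

```python
import re
import uuid


def new_job_name(video_name: str) -> str:
    """Build a job name that stays unique even for repeated transcriptions.

    Characters outside letters, digits, '.', '_' and '-' are replaced,
    and the name is truncated to leave room for the random suffix.
    """
    safe = re.sub(r"[^0-9a-zA-Z._-]", "_", video_name)[:150]
    return f"{safe}-{uuid.uuid4().hex}"


def remember_job(table, job_name: str, video_name: str) -> None:
    """Store the job id -> video name mapping (attribute names are assumed)."""
    table.put_item(Item={"job_id": job_name, "video_name": video_name})


def lookup_video(table, job_name: str):
    """Return the video name for a finished job, or None if it is unknown."""
    response = table.get_item(Key={"job_id": job_name})
    item = response.get("Item")
    return item["video_name"] if item else None


# In a Lambda, `table` would come from something like:
#   table = boto3.resource("dynamodb").Table("transcribe-jobs")  # assumed name
```

The generated name is what gets passed to Transcribe as the job name, and the function that handles the finished job calls `lookup_video` to decide where the subtitles belong.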