FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
New Cloud Services for Dynamic Advertising
Vasco Manuel Pérola Filipe
Mestrado Integrado em Engenharia Informática e Computação
Supervisor: Maria Teresa Magalhães da Silva Pinto de Andrade
Co-Supervisor: Alexandre Ulisses F. Almeida e Silva
July 28, 2017
New Cloud Services for Dynamic Advertising
Vasco Manuel Pérola Filipe
Mestrado Integrado em Engenharia Informática e Computação
Approved in oral examination by the committee:
Chair: Luis F. Teixeira
External Examiner: José Manuel Torres
Supervisor: Maria Teresa Andrade
July 28, 2017
Abstract
With the constant technological evolution that has occurred in recent years, and with information becoming increasingly globalized and with a higher level of consumption, advertising has played an increasing role in the revenue of television stations and other suppliers of audiovisual material. However, this globalization and ease of access to information makes targeted advertising difficult, which has diminished its effectiveness. In addition, there are also television contracts that oblige stations to limit the broadcast of content to a particular region. This need to present a personalized product almost instantaneously results in a greater emphasis on automation, namely in the detection and replacement of advertisements in real time. In this dissertation, a distributed application capable of identifying and replacing advertising segments in an audiovisual stream is then proposed. The prototype will be able to receive audio and video in real time and detect advertising segments, replacing them with other content, which may come from a stream or a file. The end result is a stream, with the least possible delay, where the initial advertising segments are replaced by more relevant ones. Taking into account the modularity of this project, it will be possible to add or remove components depending on the use case, enabling solutions not only in advertising but also in the transmission and editing of video and audio in the cloud.
Resumo
Com a constante evolução tecnológica que se tem verificado nos últimos anos, e com a informação a ser cada vez mais globalizada e com um nível de consumo mais elevado, a publicidade tem tido um papel crescente na receita das estações televisivas e outros fornecedores de material audiovisual. No entanto, esta globalização e facilidade de acesso a informação torna difícil a publicidade direcionada, o que diminuiu a eficácia da mesma. Para além disso, existem ainda contratos televisivos que obrigam as estações a limitar a emissão de conteúdo a uma determinada região. Esta necessidade de apresentar um produto personalizado e de forma quase instantânea resulta numa maior aposta na automatização, nomeadamente na deteção e substituição de anúncios publicitários em tempo real. Nesta dissertação é então proposta uma aplicação distribuída capaz de identificar e substituir segmentos de publicidade numa stream audiovisual. O protótipo será capaz de receber áudio e vídeo em tempo real e detetar segmentos publicitários, substituindo-os por outro conteúdo, podendo este ser proveniente de uma stream ou de um ficheiro. O resultado final é uma stream, com o menor atraso possível, onde os segmentos publicitários iniciais são substituídos por outros mais relevantes. Tendo em conta a modularidade deste projeto, será possível adicionar ou remover componentes consoante o caso de uso, possibilitando soluções não só na área da publicidade, mas também na transmissão e edição de vídeo e áudio na cloud.
Acknowledgements
First, I would like to thank my family for providing me with all the conditions to evolve as a professional and as a person. To Maria Teresa Andrade for accepting this challenge and guiding me through it. To MOG Technologies, for providing me with the conditions necessary to develop this work, especially to Alexandre Ulisses, for the challenge proposed, and to Pedro Santos, Miguel Poeira and Vasco Gonçalves for all the help provided during the semester.
Vasco Filipe
“Whether you think that you can, or that you can’t, you are usually right.”
Henry Ford
Contents
1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Goals
  1.4 Document structure
2 Cloud Computing
  2.1 Cloud Models
  2.2 Private Vs Public Vs Hybrid Clouds
  2.3 Virtualization
    2.3.1 Hypervisors
    2.3.2 Containers
    2.3.3 Docker
  2.4 Advantages of Cloud Computing
3 Digital TV Production over IP
  3.1 SDI
  3.2 Internet Protocol
  3.3 Transport Layer
  3.4 MPEG-TS
  3.5 Real-Time Transport Protocol
    3.5.1 RTP Payload Format for Uncompressed Video
  3.6 SMPTE 2022-6
  3.7 JT-NM Architecture
  3.8 NMOS
4 Automation of Advertising Replacement
  4.1 Context
  4.2 Market Solutions
    4.2.1 Anvato
    4.2.2 Audible Magic
    4.2.3 ACRCLOUD
    4.2.4 Ivitec
    4.2.5 Adobe Primetime
    4.2.6 Comparative Analysis
  4.3 Requirements
  4.4 Proposal
    4.4.1 Architecture
    4.4.2 Modularity and Scalability
  4.5 Prototype
    4.5.1 Prototype limitations
    4.5.2 MOG MPL
    4.5.3 Implementation
5 Results and analysis
  5.1 Test Methodologies
  5.2 Results
    5.2.1 Frame Rate
    5.2.2 Bandwidth
    5.2.3 RAM Usage
    5.2.4 CPU Usage
    5.2.5 Cascade Scenario
    5.2.6 Results
6 Conclusions and Future Work
  6.1 Fulfillment of Goals
  6.2 Future Work
References
List of Figures
2.1 Cloud Model Layer System
2.2 Comparison between traditional and cloud models
2.3 Comparison between hypervisors and container engines
3.1 PAT and PMT [Cia09]
3.2 JT-NM simplified Architecture
3.3 Node proposed by NMOS [Ass16a]
4.1 Live dynamic server side ad insertion [Anv]
4.2 VOD dynamic server side ad insertion [Anv]
4.3 Technology of Audible Magic [Mag]
4.4 Fingerprint density versus type of search [Tec]
4.5 Cloud Application Architecture
4.6 Input Distributor media flows
4.7 VOD Replacement media flows
4.8 Video Switcher media flows
4.9 Business Logic media flows
4.10 Advertising Detector media flows
4.11 Output media flows
4.12 VoD Storage use case
4.13 Additional Input Feeds Architecture
4.14 Prototype Developed
5.1 Application deploy scenario and data flow
5.2 Cascade Test Scenario
List of Tables
4.1 Comparative analysis by features
5.1 Technical Specification of the test machines
5.2 Video Specifications
5.3 Frame Rate in each module
5.4 Average Inbound and Outbound traffic in each module
5.5 Average Module RAM Usage
5.6 Average Module CPU Usage
Abbreviations
API  Application Programming Interface
CPU  Central Processing Unit
EDL  Edit Decision List
FPS  Frames Per Second
HTTP  HyperText Transfer Protocol
HTTPS  HyperText Transfer Protocol Secure
IGMP  Internet Group Management Protocol
IAB  Interactive Advertising Bureau
IP  Internet Protocol
JT-NM  Joint Task Force on Networked Media
MPEG-TS  MPEG Transport Stream
NMOS  Networked Media Open Specifications
OS  Operating System
RFC  Request for Comments
RTP  Real-time Transport Protocol
SLA  Service-Level Agreement
SMPTE  Society of Motion Picture and Television Engineers
TCP  Transmission Control Protocol
UDP  User Datagram Protocol
VAST  Video Ad Serving Template
VM  Virtual Machine
VOD  Video on Demand
XML  eXtensible Markup Language
Chapter 1
Introduction
In 2016, more than 500 billion dollars were spent on advertising, a 7.1% increase over the previous year. Of this total, 36% was spent on the television market and 29.9% on online advertising [Mar].
Although it is an expanding market, advertising has been experiencing some problems, namely with targeting. This problem is especially relevant in television, where a TV segment can be broadcast to multiple countries. This reduces the effectiveness of the advertising, as its content needs to be relevant to a large group of viewers.
The personalization of advertising has therefore become increasingly important for end-users, broadcasters and entities that have contracted advertising space. This assumes special importance in the case of television content.
1.1 Context
Currently, the delimitation of advertising segments in television broadcasts, i.e., the identification of the beginning and end moments of a contiguous set of advertising content, is typically done by a human operator. As a result, the process is expensive and potentially prone to errors [CA08]. It also introduces delays in the availability of the content on other types of media, since the operator has to wait for the end of the broadcast to edit the resulting video.
There is also a trend, and a set of standards and architectures, that the industry is following in order to achieve the virtualization of television production. SMPTE 2022-6 [S.M12] is one such standard, and one such architecture is being developed by the Joint Task Force on Networked Media (JT-NM), a group formed by the European Broadcasting Union (EBU), the Society of Motion Picture and Television Engineers (SMPTE) and the Video Services Forum. The purpose of this group and these standards is the transition from physical equipment-based transmissions to IP networks.
This dissertation was developed in a corporate environment at MOG Technologies. MOG has been involved in the broadcasting market since 2007 and provides solutions for post-production environments.
This solution is part of a national project called MOG Cloud Setup, where MOG Technologies
partners with INESC TEC with the goal of developing audiovisual content preparation platforms
for ingest in the cloud.
1.2 Motivation
By automating the process of identifying and replacing advertising segments in television content, it is possible to reduce the delay in content availability on other media while also reducing costs and the possibility of errors introduced by human operators.
A television broadcast capable of delivering personalized advertising segments provides an upgraded product to advertising space buyers, as more specialized segments can have significantly more impact on the end-user.
The transition from video processing hardware to a cloud-based application can result in lower
costs in hardware (processing power, storage) while providing more flexibility, allowing the appli-
cation to scale as needed, without major changes to the architecture.
1.3 Goals
The main goal of this dissertation is to develop a cloud-based application prototype capable of receiving and re-sending an audiovisual stream, replacing predetermined segments of the stream with content originating from other sources.

This application should aim to reduce the delay introduced by such operations while maintaining the quality of the content, and it must be fully automated.
1.4 Document structure
Besides Chapter 1, this document contains five other chapters.
Chapter 2 analyzes cloud computing characteristics and models, the role of virtualization in those models, hypervisors and containers, and the advantages of a cloud-based application.
Chapter 3 studies the IP protocol, as well as its uses in television production. It also analyzes
multiple protocols and standards used for IP broadcasts.
Chapter 4 analyzes the automation of advertising substitution, describing and comparing solutions already developed by television companies. It then proposes an architecture for a cloud-based application for real-time advertising substitution and presents the developed prototype.
Chapter 5 presents the set of validation and acceptance tests performed to validate the developed prototype, and analyzes their results.
The dissertation ends with Chapter 6, which presents a general assessment of the proposed objectives and outlines future work.
Chapter 2
Cloud Computing
With processing, storage and networking technologies progressing rapidly, computing resources have risen in power while decreasing in price. To take advantage of this phenomenon, research was conducted on sharing physical resources such as CPU and storage among multiple applications, giving rise to the Cloud Computing model.
Cloud Computing is a model that provides on-demand network access to computing resources.
It allows for a quick provisioning and low managing effort, adapting to the user’s needs and system
changes. This model has five main characteristics[MG11]:
• On-demand Self-service
A cloud model needs to be able to automatically provide computing capabilities to the consumer, such as network storage or memory.
• Broad Network Access
It should be available over the network and accessible from any platform, be it computers or mobile devices.
• Resource Pooling
This model can provide services to multiple consumers with different needs, assigning resources according to their demands.
• Rapid Elasticity
The model should allow for rapid scaling, increasing or decreasing its capabilities as demanded, giving the consumer a sense of unlimited resources.
• Measured Service
Lastly, the model should monitor resource usage, providing the user with a report containing all the metrics associated with the system.
Figure 2.1: Cloud Model Layer System
The architecture of the cloud computing model is structured in layers based on their level of abstraction: a cloud layer is classified as being of a higher level if its services can be composed from services of the layers below [YBDS08], as represented in Figure 2.1.
• The first layer, and the least abstracted, is the Server layer, composed of the actual hardware.
• The Infrastructure layer provides tools to manage virtual machines.
• The Platform layer allows configuration of the available resources, such as CPU, storage and bandwidth, using the services provided by the Infrastructure layer.
• The Application layer allows the implementation and deployment of cloud applications. It uses services from the Platform layer to improve scalability, increasing or decreasing the available resources as needed.
• The last layer, and the most abstracted, is the Client layer. It provides the user access to the application, usually through a mobile app or a website.
2.1 Cloud Models
There are multiple models of cloud services, differing from each other in their level of abstraction [RCL09]:

• Infrastructure-as-a-Service (IaaS)
In this model, the abstraction is located at the infrastructure level. Users can rent cloud resources depending on their needs, paying only for the resources actually used.
Figure 2.2: Comparison between traditional and cloud models
• Platform-as-a-Service (PaaS)
The abstraction is located at the platform level, with the service provider being responsible for the servers and the user focused only on the application.
• Software-as-a-Service (SaaS)
In this model, a full application is deployed and updated remotely, removing the user's need to install software on their machine.
As seen in Figure 2.2, the user's control over the lower layers decreases as the level of abstraction increases (components managed by the user are represented in blue).
2.2 Private Vs Public Vs Hybrid Clouds
A private cloud infrastructure may be deployed on or off premises and is usually only available to
a single organization. The cloud can either be managed by local human resources, placing control
and responsibility on the organization or, on the other hand, cloud infrastructure management can
be completely outsourced or shared between the organization and a third party. Private clouds
provide more infrastructure control and data security but result in higher technical and economic costs.
A public cloud infrastructure is open to the public and managed by a third party institution
providing a service for the users. In this model, control, cost and infrastructure are all responsi-
bilities of the third party provider. On the other hand, the provider can have access to the user’s
data.
A hybrid cloud infrastructure leverages a private cloud and public cloud services by imple-
menting a common communication interface able to create interoperability between them. The
integration is valuable for companies with sensitive data that should not be exported to third par-
ties, but that at the same time intend to integrate business processes with external services.
2.3 Virtualization
The introduction of multicore processors and the integration of virtualization technology led to single machines being capable of executing multiple parallel tasks. Using such powerful hardware to run a single application is not sustainable, as the resources would not be utilized while the application was not running. On the other hand, using the same operating system to run multiple applications can generate security problems, with no isolation available between applications. There can also be situations where multiple applications try to access the same storage locations, ports or sockets.
Virtualization provides a hybrid approach between centralized and decentralized applications, while at the same time promoting horizontal scalability and resource elasticity. Virtualization abstracts the hardware and allows multiple guest operating systems to run on a single host, where each guest is completely isolated and runs a full OS. This approach brings the following benefits:
• The idle time of a system decreases, since multiple virtual machines run simultaneously.
• Resources can be managed and allocated to Virtual Machines individually.
• The system can be managed in a decentralized fashion.
• Heterogeneous systems can run within a single host.
• Failover times can be decreased by promoting simplified migration strategies or even live
migrations.
• Isolation prevents security issues in a compromised or out-of-date guest from affecting the host or the other guests.
• Green computing is fostered by reducing the number of necessary physical machines.
2.3.1 Hypervisors
The evolution of virtualization revolves around one piece of software, the hypervisor, also known as the VMM (Virtual Machine Monitor). It allows physical devices to share their resources with virtual machines running as guests [Sri09]. In this sense, a physical computer can be used to run multiple virtualized instances, each with its own OS and virtual hardware (CPU, memory, network, I/O), all provided by the hypervisor.
The hypervisor also provides the possibility of running guest applications without modifying them or their OS. This way, the guests are unaware that the environment is virtualized, since the hypervisor provides the same communication interface as a physical system.
There are two main hypervisor categories, Type 1 and Type 2. A Type 1 hypervisor runs on bare metal, directly over the hardware, while a Type 2 hypervisor is installed on top of a host OS, like a regular piece of software. A Type 1 hypervisor is superior to a Type 2 in terms of performance, since the latter has to go through an additional layer, the host OS.
Figure 2.3: Comparison between hypervisors and container engines
There are four main competitors in the hypervisor market, responsible for 93% of its share [PBSL13]. Xen [Bar03] and KVM [Kiv07] are two open-source hypervisors based on Linux, while VMware's ESXi [Cha08] and Microsoft's Hyper-V [VV09] are closed-source solutions.
2.3.2 Containers
Application containers are an alternative to hypervisors, when the overhead introduced by the latter
is undesirable. Simply put, containers are a lightweight virtualization technology that enables an
application to be packaged along with its virtual environment, configurations and dependencies,
isolating it from the deployment environment. As represented in Figure 2.3, the main feature that
differentiates a container from a hypervisor is the fact that multiple containers can share the same
OS.
2.3.3 Docker
Docker is an open-source technology that allows applications to run in containers [Doc]. Using a Dockerfile, a configuration document, Docker can execute all sorts of instructions, ranging from installing dependencies to configuring environment variables. Compared to a virtual machine, Docker containers provide a similar level of isolation while requiring less storage space, resulting in a more efficient solution.
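As a sketch of what such a configuration document might look like for a media-processing service (the base image, package, file paths and port below are illustrative assumptions, not taken from the prototype):

```dockerfile
# Hypothetical Dockerfile for a small media-processing service.
FROM ubuntu:16.04

# Install dependencies (here, FFmpeg for audio/video handling).
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

# Configure an environment variable read by the application.
ENV OUTPUT_PORT=5004

# Copy the application into the image and define its entry point.
COPY ./app /opt/app
CMD ["/opt/app/run.sh"]
```

Building this file with `docker build` produces an image that packages the application together with its dependencies, which can then be started with `docker run` on any host with a Docker engine.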
2.4 Advantages of Cloud Computing
On-demand resources enable users to provision or decommission resources such as virtual processors, memory or storage capacity, either manually or automatically. This highly adaptable feature provides the resource elasticity necessary to fulfill a given SLA (Service-Level Agreement) while at the same time promoting cost savings by eliminating unnecessary resources. More is also achieved with less hardware, as cloud computing resources serve multiple clients in a multi-tenant model where resources are assigned, released and reassigned on demand.

Cloud computing also reduces the need for on-premises hardware and for the expert human resources to manage it, making adoption easier. Users without technical knowledge are able to run complex applications and grow their businesses.
Chapter 3
Digital TV Production over IP
The virtualization of mobile production, transitioning from SDI to IP, is an object of discussion and study in the broadcasting industry. By using a single location to broadcast all signals and substituting software for hardware, television companies can significantly reduce broadcasting costs.
3.1 SDI
Serial Digital Interface (SDI) is a family of digital video, audio and metadata interfaces standardized by SMPTE (Society of Motion Picture and Television Engineers). Since the nineties, this interface has been used as the standard for television production [Kov13].
SDI is only available in professional equipment and allows the transmission of uncompressed
video and audio.
Each time a new format emerges, SDI needs to be adapted to accommodate it. As a result, every technological breakthrough in television transmission (4K, 8K, 3D, etc.) requires all SDI infrastructures to be adapted [Gol].
3.2 Internet Protocol
TCP/IP is a set of communication protocols used on the Internet and similar networks. This model is composed of five layers, each one responsible for providing services to the layer above. These layers are, from top to bottom: Application, Transport, Network, Data Link and Physical.

Taking into account the scope of this dissertation, it is worth analyzing the two upper layers, Application and Transport. The Transport layer provides means for communication between different machines, while the Application layer allows different processes to communicate with each other using the Transport layer services.
3.3 Transport Layer
UDP (User Datagram Protocol) is a transport-layer protocol for network applications based on IP (Internet Protocol). It is a simple, connectionless protocol, where each packet is sent only once, regardless of whether it is corrupted or lost. It is mainly used in real-time applications such as video games and video conferencing, prioritizing efficiency over integrity.
On the other hand, TCP (Transmission Control Protocol) is a connection-based protocol. Before the data transfer starts, a connection is established between the emitter and the receiver. Each time a packet is received, the receiver sends a message to the emitter confirming the packet's reception. This protocol ensures that the data is fully received, in the right order.
Although TCP is the more complete protocol, UDP can be used in cases where the overhead introduced by the connection protocol endangers the viability of the application. Moreover, in controlled environments such as private networks, the downsides of UDP are significantly reduced.
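The connectionless, fire-and-forget behaviour described above can be illustrated with a minimal sketch in Python (the loopback address and payload are arbitrary; port 0 asks the OS for a free port):

```python
import socket

# Connectionless UDP exchange over loopback: the sender transmits once,
# with no handshake and no acknowledgement from the receiver.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
receiver.settimeout(5.0)             # avoid blocking forever if the datagram is lost
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-0001", addr)   # fire-and-forget: no delivery guarantee

data, _ = receiver.recvfrom(2048)
print(data)                          # b'frame-0001'
sender.close()
receiver.close()
```

A TCP equivalent would first require `connect()`/`accept()` calls to establish the connection before any data could flow, which is exactly the overhead UDP avoids.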
There are three methods of communicating via IP. These methods differ in the number of
receivers for each transmission:
• Unicast
In a unicast transmission, for each message sent, there can only be one receiver. It is a
one-to-one method.
• Broadcast
In a broadcast transmission, the message sent can be received by every node of the network,
without exceptions.
• Multicast
Multicast is, similarly to the broadcast, a one-to-many transmission. With this method, the
message is only received by interested nodes. To declare interest, a node can send messages
asking to join or leave a multicast group. The emitter, instead of sending a message for each
receiver, sends one single message to the multicast address. This allows the emitter to send
messages to multiple nodes without prior knowledge about their interest.
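As a sketch, the join operation described above can be performed on an ordinary UDP socket; setting `IP_ADD_MEMBERSHIP` is what triggers the IGMP join message (the group address and port below are arbitrary examples, not values used in the prototype):

```python
import socket
import struct

def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Create a UDP socket subscribed to a multicast group.

    Setting IP_ADD_MEMBERSHIP makes the host send an IGMP join for the
    group, so the emitter needs no prior knowledge of this receiver.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # accept datagrams addressed to the group on this port
    # The option value packs the group address and the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

# Example: subscribe to an (arbitrary) group and wait for media packets.
# sock = open_multicast_receiver("239.1.1.1", 5004)
# data, sender = sock.recvfrom(2048)
```

Leaving the group (`IP_DROP_MEMBERSHIP`) is symmetric, and the kernel also sends an IGMP leave automatically when the socket is closed.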
3.4 MPEG-TS
MPEG-TS [Mpe00] is a media container for content storage and transmission. MPEG-TS streams are composed of one or more programs, described in a Program Association Table (PAT). If the stream contains only one program, it is designated a Single Program Transport Stream (SPTS); if it is composed of multiple programs, it is defined as a Multiple Program Transport Stream (MPTS) [ETS07].
Figure 3.1: PAT and PMT [Cia09]
A program is a combination of one or more streams of PES (Packetized Elementary Stream) packets. For example, a program can contain an audio PES, a video PES and a subtitle PES. A PMT (Program Map Table) stores the information about each program present in an MPEG-TS stream, as seen in Figure 3.1.
An MPEG-TS stream is a sequence of TS packets, each 188 bytes in size, 4 of which are used for the header. This header contains a Packet ID (PID) that allows the identification of the packet's content.

Each PES packet contains a PTS (Presentation Timestamp) and a DTS (Decode Timestamp), allowing the synchronization of multiple elementary streams.

An MPEG-TS stream must be encapsulated in RTP packets to be transported over IP. Each IP packet then contains an IP header, a UDP header, an RTP header and a number of TS packets.
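The 188-byte packet layout described above can be illustrated with a short parser; the field positions follow the MPEG-TS header format, and the sample packet is fabricated for the example:

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte header of a single 188-byte MPEG-TS packet."""
    if len(packet) != 188 or packet[0] != 0x47:   # 0x47 is the TS sync byte
        raise ValueError("not a valid TS packet")
    return {
        # Flag set on the packet that starts a new PES packet or table.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit Packet ID identifying the packet's content.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 4-bit counter used to detect lost or duplicated packets.
        "continuity_counter": packet[3] & 0x0F,
    }

# Fabricated packet: header carrying PID 0x0100, followed by 184 payload bytes.
pkt = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(pkt))
# {'payload_unit_start': True, 'pid': 256, 'continuity_counter': 7}
```

A demultiplexer applies exactly this PID lookup to route each TS packet to the audio, video or table handler it belongs to.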
3.5 Real-Time Transport Protocol
The Real-time Transport Protocol (RTP) is a protocol used to send audio and video over IP networks. Each RTP packet contains a header with information such as timestamps and sequence numbers, while leaving a customizable field that allows the protocol to be extended.

By marking each packet with a sequence number, it is possible to re-order packets regardless of the order in which they were received.

By using RTP together with UDP, it is possible to maximize the speed of the data transfer while keeping the packets ordered.
The Real-Time Control Protocol (RTCP) is a protocol used simultaneously with RTP to monitor and provide statistics about the transmission.
3.5.1 RTP Payload Format for Uncompressed Video
RFC 4175 [GP05] specifies the format for the transport of uncompressed video using RTP. While an RTP packet header allots only 16 bits to the packet sequence number, RFC 4175 adds a 16-bit extension called the extended sequence number. In a 1 Gbps video stream sent as 1000-byte packets using plain RTP, all possible sequence numbers would roll over in about 0.5 seconds, which could create problems when identifying lost and out-of-order packets. With the combined 32-bit sequence number, it would take approximately 9.5 hours for the number to roll over.
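The figures above can be checked with a quick calculation, assuming the same 1 Gbps stream and 1000-byte packets used in the text:

```python
# Sequence-number rollover time = number of sequence values / packet rate.
bitrate = 1e9                                  # 1 Gbps video stream
packet_bits = 1000 * 8                         # 1000-byte packets
packets_per_second = bitrate / packet_bits     # 125,000 packets/s

rollover_16 = 2**16 / packets_per_second           # plain RTP, 16-bit field
rollover_32 = 2**32 / packets_per_second / 3600    # with the RFC 4175 extension

print(f"16-bit: {rollover_16:.2f} s")   # 16-bit: 0.52 s
print(f"32-bit: {rollover_32:.1f} h")   # 32-bit: 9.5 h
```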
3.6 SMPTE 2022-6
SMPTE ST 2022-6 [S.M12] is a standard published by SMPTE, belonging to the SMPTE ST
2022 family of standards that enables the use of IP technology in the broadcasting industry. ST
2022-6 defines the transport of SDI over IP using the RTP protocol, and is therefore dubbed "SDI over IP".
By encapsulating SDI payloads into IP packets, ST 2022-6 allows the SDI signal to be split
into multiple 1376-byte packets, transmitted over an Ethernet network, received and rebuilt into
an SDI signal. Although this allows interoperability with other devices that use SDI, it also means
that there is no separation between the various streams inside SDI [Laa12]. Imagine, for example,
that you want to modify the audio that is part of an SDI stream transported with the corresponding
video. In that case, you have to deal with the video and all the overhead it brings to the system,
even though you only want to modify the audio. That is, since there is no separation between the
various contents present in SDI, there is no flexibility to transport only part of the content.
3.7 JT-NM Architecture
The Joint Task Force on Networked Media (JT-NM) is a consortium formed by the European
Broadcasting Union (EBU), SMPTE (Society of Motion Picture and Television Engineers) and the
Video Services Forum (VSF). This task force designed an architecture called the JT-NM Reference
Architecture v1.0 [oNM15] with the purpose of bringing together good practices, recommendations
and frameworks, so that there can be interoperability between devices from different manufacturers
in the transition from SDI to IP.
The JT-NM defines a conceptual model that allows the mapping of the workflows in order to
ensure the desired interoperability. Figure 3.2 represents a simplified architecture, focusing on the
scope of this dissertation.
The Network is the heart of the media operations, usually an Ethernet network.
The Nodes are connected to the Network and provide infrastructure such as storage, processing
power and interfaces.
Figure 3.2: JT-NM simplified Architecture
Devices can be software-only services or physical Devices, such as cameras, and are deployed
onto the Nodes to provide the Capabilities necessary to complete Tasks.
In the case of cameras or other equipment such as microphones, a Device can be a Source
of Essences. These Essences are moved as Grain payloads that are transported over the Network,
divided into network packets.
A Grain is composed of its media content (video, audio or metadata) and a timestamp. This
timestamp represents the instant at which the Grain was created and is generated using the Clock
present on the Node. All Nodes should be synchronized, using PTP (Precision Time Protocol) to
achieve nanosecond precision [Com08].
Essences are sent over the Network by Senders to Receivers. This communication is made
in the form of Flows. A Flow is the output of a Source, and there can be multiple Flows for each
Source. For example, a Source can generate a Flow for uncompressed video and another for
H.264 compressed video.
The Registry enables connections between Receivers and Senders by providing the means for
Nodes, Devices and Flows to register themselves and discover others, allowing connections to be
created between Devices (Receivers and Senders).
In order for this information to be properly articulated and to be used to define workflows,
three fundamental blocks are defined on which this data model is based: Timing, Identity and
Discovery & Registration.
• Timing
Each Grain contains a timestamp, allowing the consistency of the performed operations and,
consequently, ensuring that the Flows are correctly aligned.
• Identity
Each element present in the infrastructure must be easily and uniquely identifiable, so that it
can be referenced and used. All relationships between resources must make use of Identity.
• Registration and Discovery
Each Node in the Network must register itself and the Devices, Sources, Flows, Senders
and Receivers that it makes available, so that other nodes can discover them and obtain the
appropriate information about each one.
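The three building blocks can be illustrated with a small data-model sketch. The class and attribute names below are my own simplification, not taken from the JT-NM specification: every resource carries a unique Identity, Grains are timestamped, and a Registry lets resources register and be discovered.

```python
# Illustrative sketch of Timing, Identity and Discovery & Registration.
import uuid
from dataclasses import dataclass, field

@dataclass
class Flow:
    source_id: str
    encoding: str   # e.g. "raw" or "H.264"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # Identity

@dataclass
class Grain:
    flow_id: str
    timestamp_ns: int   # Timing: PTP-derived creation instant
    payload: bytes

class Registry:
    """Discovery & Registration: resources register and can be looked up."""
    def __init__(self):
        self._flows: dict[str, Flow] = {}
    def register(self, flow: Flow) -> None:
        self._flows[flow.id] = flow
    def discover(self, encoding: str) -> list:
        return [f for f in self._flows.values() if f.encoding == encoding]

# One Source producing two Flows, as in the example above.
reg = Registry()
cam_source = str(uuid.uuid4())
reg.register(Flow(cam_source, "raw"))
reg.register(Flow(cam_source, "H.264"))
print(len(reg.discover("H.264")))  # 1
```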
3.8 NMOS
NMOS (Networked Media Open Specifications) is a series of specifications that intends to create
frameworks enabling the interoperability sought by the JT-NM architecture. It was created by the
Advanced Media Workflow Association (AMWA), a group composed of several broadcasting
corporations and other companies working in the TV market.
The NMOS is based on the conceptual data model proposed by the JT-NM, in order to add
identity and relationships between content and equipment. Regardless of the specific task that
Figure 3.3: Node proposed by NMOS [Ass16a]
each Node performs, the logical view of a Node according to NMOS (Figure 3.3) creates a level
of abstraction sufficient to ensure the expected modularity and expandability, so that it can be
adapted to different needs.
NMOS does not limit how each module should work, it only specifies which interfaces they
should expose. Each Node must expose a REST API over HTTP. These transactions are described
in the "AMWA NMOS Discovery and Registration Specification" (IS-04) proposed by the AMWA
[Ass16b].
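As a hedged illustration of what such a REST interface looks like, the sketch below builds IS-04 Node API URLs. The `/x-nmos/node/...` path structure follows the IS-04 specification; the host, port and API version shown are placeholders, not values mandated by NMOS.

```python
# Build IS-04 Node API URLs for querying a Node's exposed resources.

def node_api_url(node_base: str, resource: str, version: str = "v1.2") -> str:
    """IS-04 Node API endpoint, e.g. for "senders" or "receivers"."""
    return f"{node_base}/x-nmos/node/{version}/{resource}"

# A client could then GET this URL to list the Senders a (hypothetical)
# Node exposes:
print(node_api_url("http://node.example.local:3000", "senders"))
# http://node.example.local:3000/x-nmos/node/v1.2/senders
```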
Chapter 4
Automation of Advertising Replacement
4.1 Context
Most TV networks have several contractual obligations with advertisement buyers. These
obligations range from regional distribution to time slots and even broadcast services. For example,
some ad segments can be played in the television broadcast but not on the web player, while others
are only relevant to a certain location, losing their effectiveness when broadcast to a larger audience.
Also, in order to accommodate Video-on-Demand services, the broadcast needs to be divided into
segments, separating advertising blocks from program blocks. In order to fulfill these requirements,
broadcasters need to allocate physical resources and manpower.
4.2 Market Solutions
In order to address the problem described in Section 4.1, several companies have started
research into the detection of advertising content.
4.2.1 Anvato
Anvato Media Content Platform (MCP) is a solution from Anvato [Anv] which offers live
streaming, video encoding, cloud editing, syndication and dynamic ad insertion, among other services.
They claim that their stitching technology replaces broadcast ad units with dynamically placed
digital ads that are frame-accurate in-stream and, using IAB (Interactive Advertising Bureau)
standardized VAST (Video Ad Serving Template) tags, helps prepare and deliver ads that are
nationally, regionally and locally relevant to clients' viewers.
It is designed to monetize both users' live streams (Figure 4.1) and video-on-demand (Figure 4.2)
on any screen and any device. It allows ads to be inserted dynamically on the server side
on all platforms: desktop, iOS and Android apps, AppleTV, Chromecast, Roku, Amazon Fire TV
and others.
This solution delivers the ads in a contextualized way, but it does not detect ads when
no ad triggers are present.
Figure 4.1: Live dynamic server-side ad insertion [Anv]
Figure 4.2: VOD dynamic server-side ad insertion [Anv]
Figure 4.3: Technology of Audible Magic [Mag]
4.2.2 Audible Magic
Audible Magic’s solutions [Mag] use what is called audio fingerprints to match unknown media
content against known, registered media content. Analogous to the idea that human fingerprints
can be measured to compactly and uniquely identify every person, there are processes that allow
very small clips of audio to be measured for distinctive characteristics. These compact audio
fingerprint measurements can be uniquely distinguished when compared to measurements taken
from any other audio clip.
Audible Magic uses this kind of technology, called automatic content recognition (ACR), to
identify unknown media content when fingerprints of that content are matched against known
fingerprints registered in an Audible Magic database.
One of the use cases of this technology provided by Audible Magic is their service for TV ad
detection and marking, shown in Figure 4.3. It detects ads in unmarked broadcast streams and, in
real time, provides frame-accurate timing to trigger the injection of ad markers. With those ad
markers injected, dynamic ad insertion (DAI) technology can be fully utilized.
Although Audible Magic maintains content identification databases for multiple content types,
including live TV programming and advertising running on national channels in the USA, these
databases do not cover international regions, and it is not possible to identify ads outside the
database.
4.2.3 ACRCLOUD
ACRCLOUD provides a set of cloud based solutions [Ser] for automatic content recognition,
using audio fingerprinting technologies, which are more directed to second screen applications.
For example, the live channel detection service allows the collection of live streams from TV or
radio stations in real time, enabling the channel to be detected at the exact point of broadcast
from any user's mobile device. With audio fingerprints generated from the live content, supplementary
information about the content or interactive campaigns can be triggered to appear on the viewer's
second-screen devices.
The usage of this live channel detection service can be summarized as follows:
• Pre-designed contents and interactions are organized by marketing editors on the server side.
• Users’ apps identify the TV channel and specific times with audio recognition while they
are watching TV.
• Detailed contents are triggered and retrieved from the server to the users’ app.
It is important to point out that this solution does not allow content replacement or elimination.
4.2.4 Ivitec
Ivitec offers a set of solutions to analyze video clips based on adaptive video fingerprinting
technology [Tec]. This technology is able to adjust the density and granularity of fingerprints
to match specific use cases (Figure 4.4), instead of using a single algorithm in the hope that an
"average" solution will fit the majority of use cases (as conventional video fingerprinting
approaches do).
Adaptive video fingerprinting is fully implemented in the MediaSeeker Core Platform, which
is the base of all Ivitec products, and it identifies content by comparing it against a database
of known video information inside the platform. Their solutions are capable of recognizing video
clips as they are broadcast, uploaded or downloaded, or within preexisting repositories of video
content.
AdMon is an automated software solution for advertisement workflows which promises to
streamline and simplify the process of recognition and tracking.
This system analyzes a set of TV channels, automatically recognizes advertisements and
provides valuable detection information, such as channel name and detection time, within minutes
of airing. It can be integrated with third-party capture sources. However, it does not allow
content replacement or elimination.
4.2.5 Adobe Primetime
Adobe Primetime is a multiscreen TV platform for live, linear and VOD programming [Pri].
Its modular distribution and monetization capabilities include TVSDK for multiscreen playback,
DRM, authentication, dynamic ad insertion (DAI) and audience-centric ad decisioning.
The Adobe Primetime ad insertion solution is available in both client-side and server-side
configurations, and the commercial breaks are identified using traditional broadcast ad break cues,
real-time markers, or ad timelines from the publisher's CMS. The user can even skip or replace
burned-in advertisements.

Figure 4.4: Fingerprint density versus type of search [Tec]

Adobe Primetime features turnkey integration with Adobe's video ad-decisioning solution to
deliver true DAI into live, linear and video-on-demand content across desktops, mobile devices,
gaming consoles and IP-enabled set-top boxes. Adobe Primetime ad insertion can also be
integrated with third-party ad servers and sell-side platforms.
4.2.6 Comparative Analysis
Table 4.1 summarizes each solution, comparing them by available features. These features include
the capacity to output a cloud-ready format. Another important functionality is the ability to
ingest live programs or offline media, in which case it must be possible to interact with several
file systems, memory card readers, temporary storage systems, or virtual directories. In terms of
advertising, the different solutions are compared by their capacity to detect advertising segments,
be it on or off the cloud, and also by the capability of replacing said segments.
As seen in Table 4.1, no single solution contains all the features analyzed. The solutions
closest to that goal are the Anvato and Audible Magic products. In Anvato's case, there is no
solution for cloud-based advertising detection. In the case of the Audible Magic product, the
ingest of offline content is not possible.
Table 4.1: Comparative analysis by features
Solution          Cloud-ready       Ingest                Advertising Detection   Content
                  output formats    Live    File based    Cloud    Off cloud      replacement
Anvato                  X            X          X                      X               X
Audible Magic           X            X                      X          X               X
ACRCLOUD                                                    X
Ivitec                                                                 X
Adobe Primetime         X                                                              X
4.3 Requirements
To ensure a solution to the automation of advertising substitution, some requirements need to be
met:
• The application must be able to receive one or more input feeds from outside the cloud. The
provided bandwidth must also be large enough to ensure the transmission of uncompressed
video between modules.
• Internet Group Management Protocol (IGMP) must be active, to allow multicast communication
in the network.
• The application is to be used in real-time scenarios. To ensure that there are no delays during
video processing, the cloud components must be able to scale vertically, so that the time to
process a frame is shorter than the interval between frames.
• All components must be deployed in the same private cloud, in order to reduce the delay in
the communication between each component and ensure network reliability.
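The real-time requirement above amounts to a simple per-frame budget, sketched here for the 50 fps video used later in the prototype (the function names are my own):

```python
# At 50 fps each module has a budget of 1/50 s = 20 ms per frame; any
# measured per-frame processing time must stay below that for the
# application to keep up with the stream.

def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps

def keeps_up(processing_ms: float, fps: float) -> bool:
    return processing_ms < frame_budget_ms(fps)

print(frame_budget_ms(50))   # 20.0
print(keeps_up(15.0, 50))    # True: 15 ms leaves headroom
print(keeps_up(25.0, 50))    # False: frames would queue up
```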
4.4 Proposal
This dissertation proposes a cloud-based application capable of analyzing a live stream,
identifying the ad segments it contains in real time and replacing them with more relevant content,
using IP for video transportation.
4.4.1 Architecture
This application is composed of several modules, each with a unique function and designed
in a way that allows changes in the architecture to accommodate different use cases. Some of
these modules may also be instantiated multiple times, depending on the number of consumers and
detection algorithms used. These modules interact with each other as seen in Figure 4.5. Since
this is a time-critical application, it is recommended that all these modules be deployed in the same
private cloud network.

Figure 4.5: Cloud Application Architecture
4.4.1.1 Input Distributor
The Input Distributor is the module responsible for the reception and processing of a live stream
originated from outside the cloud, making it available to the other modules of the application.
It receives an audiovisual stream, decodes it and sends it via multicast. This will be the primary
feed, and the one that is played out by default by the application. This feed will be then used by
both the Video Switcher and Advertisement Detector modules.
Figure 4.6 shows the media flows of this module.
4.4.1.2 VOD Replacement
Similarly to the Input Distributor, the VOD Replacement module processes and sends video and
audio via multicast but, instead of receiving content from a live stream, it reads from a file.
Figure 4.6: Input Distributor media flows
Figure 4.7: VOD Replacement media flows
This node also contains a REST server, allowing it to receive requests to restart the video being
played. This will be the alternate feed, only played out when an advertisement segment is detected
in the primary one. The output of this module will be accessed by the Video Switcher module.
Figure 4.7 shows the media flows of this module.
4.4.1.3 Video Switcher
Video Switcher is the module responsible for selecting the feed that will be played out. It sub-
scribes to two multicast addresses, the first one where the Input Distributor resulting stream is
available and the second one from the VOD Replacement module, and selects one of them, redi-
recting it to another multicast address, which will be accessed by the Output module.
The selection is made according to information received from the Business Logic. In order to
receive this information, a REST server is present in this module, capable of receiving requests
containing information about switching instants.
To ensure that the VOD contents are played from the very start when an advertising segment is
detected, the Video Switcher can use a REST API to send a replay-from-start request to the VOD
Replacement. Moments before commuting to the alternate feed, the module sends the request to
the VOD Replacement and starts scanning the contents of the stream in search of its frame 0,
storing its data in memory. This way, when the switch actually happens, the first frame sent by
this module equals the first frame of the original video. This ensures no data is lost during the
feed change.
Figure 4.8 shows the media flows of this module.
Figure 4.8: Video Switcher media flows
Figure 4.9: Business Logic media flows
4.4.1.4 Business Logic
This module receives EDL (Edit Decision List) files from the Advertisement Detectors. These
files list the frame numbers where each advertisement segment starts and finishes.
This information is sent to the Video Switcher with the help of a REST API.
Figure 4.9 shows the media flows of this module.
4.4.1.5 Advertisement Detector
This module subscribes to the Input Distributor multicast address and scans its contents,
detecting advertisement segments and saving the frames corresponding to their start and finish. It
then sends the result to the Business Logic in the form of an EDL.
There can be multiple instances of this module, one for each detection algorithm used. For
example, one module can analyze the video stream, while the other focuses on the audio.
Figure 4.10 shows the media flows of this module.
4.4.1.6 Output
This module subscribes to the multicast address containing the output from the Video Switcher.
It then encodes the stream and sends it to the end-user using RTP.
The output of this module is sent by unicast, so a different instantiation of the module is
needed for each end-user. One module is also needed for each encoding format: if the stream needs
to be sent to two different users with two different encoding formats each, four instantiations
of this module are needed.
Figure 4.11 shows the media flows of this module.
Figure 4.10: Advertising Detector media flows
Figure 4.11: Output media flows
4.4.2 Modularity and Scalability
Each module is independent from all the others. With this level of modularity, it is possible to
develop and integrate other modules into this application, providing a larger range of use cases. For
example, a module could be developed that stores the resulting stream of the application, or even
an ad-free stream, as a video file, as seen in Figure 4.12. It is also possible to replace the VOD
Replacement with another Input Distributor, or vice-versa.
The application is also capable of scaling up, allowing multiple instances of each module
according to the user's needs. In this case, the prototype developed is optimized to receive two
video sources, the main one coming from a live stream and the alternative coming from a video
file. However, it is possible to increase this number by deploying several Video Switcher
modules, as seen in Figure 4.13, to accommodate the use case. Each additional feed
requires an additional Input Distributor and Video Switcher.
The end result of the application is an MPEG-TS stream equivalent to the one received by the
Input Distributor. This allows multiple instances of the application to run in cascade, where the
output feed of the first is used as the input of the following one. The consumer can then apply
video transformations, such as overlays, if needed.
4.5 Prototype
A prototype was implemented in order to validate the architecture described in Section 4.4.1
and to analyze the value of future products based on it.
The prototype is able to receive two sources of audiovisual content, one from a live stream
and the other from a video file, decode and send the video between modules using RTP. It is also
capable of parsing an Edit Decision List to find instances of advertising segments. In the end, the
video is encoded and sent to the end-user.
The end result is an MPEG-TS stream in which the content abides by the switching instants
defined in the EDL.
In order to implement the prototype, the modules Input Distributor, VOD Replacement, Video
Switcher, Output and Business Logic were developed. To simulate the Ad Detector module, an
EDL was created describing the instants at which the feeds should be commuted (representing
advertising segments), as seen in Listing 4.1.
Figure 4.12: VoD Storage use case
Figure 4.13: Additional Input Feeds Architecture
<PubPlugin>
  <Settings>
    <Type video="true" audio="false" />
    <Algorithms>
      <Video name="visualrhythm" />
    </Algorithms>
  </Settings>
  <Report>
    <Global nProcessedFrames="2000" nEvents="15" nPubFrames="708" />
    <Events>
      <PubSequence initialFrame="481" finalFrame="578" startPts="481481/30000" endPts="289289/15000" />
      <PubSequence initialFrame="687" finalFrame="797" startPts="229229/10000" endPts="797797/30000" />
      <PubSequence initialFrame="886" finalFrame="1042" startPts="443443/15000" endPts="521521/15000" />
      <PubSequence initialFrame="1061" finalFrame="1064" startPts="1062061/30000" endPts="133133/3750" />
      <PubSequence initialFrame="1090" finalFrame="1105" startPts="109109/3000" endPts="221221/6000" />
      <PubSequence initialFrame="1107" finalFrame="1115" startPts="369369/10000" endPts="223223/6000" />
      <PubSequence initialFrame="1120" finalFrame="1122" startPts="14014/375" endPts="187187/5000" />
      <PubSequence initialFrame="1124" finalFrame="1128" startPts="281281/7500" endPts="47047/1250" />
      <PubSequence initialFrame="1130" finalFrame="1134" startPts="113113/3000" endPts="189189/5000" />
      <PubSequence initialFrame="1142" finalFrame="1176" startPts="571571/15000" endPts="49049/1250" />
      <PubSequence initialFrame="1180" finalFrame="1249" startPts="59059/1500" endPts="1250249/30000" />
      <PubSequence initialFrame="1256" finalFrame="1288" startPts="157157/3750" endPts="161161/3750" />
      <PubSequence initialFrame="1290" finalFrame="1320" startPts="43043/1000" endPts="11011/250" />
      <PubSequence initialFrame="1328" finalFrame="1355" startPts="83083/1875" endPts="271271/6000" />
      <PubSequence initialFrame="1365" finalFrame="1467" startPts="91091/2000" endPts="489489/10000" />
    </Events>
  </Report>
</PubPlugin>
Listing 4.1: Ad Detector Report
4.5.1 Prototype limitations
Since the prototype is just a proof of concept, some restrictions were defined for the development:
• Input: The prototype is only capable of receiving one MPEG-TS stream and one video file
as input.
• Output: The output feed must be an MPEG-TS stream, similar to the input stream.
• Environment: The prototype only works in a Windows environment.
• Audio: The audio component of the input feeds is discarded after the demux operation. The
output feed has no audio component.
4.5.2 MOG MPL
The MOG Technologies Media Processing Library (MPL) is proprietary software of MOG
Technologies, used in the development of its products. It is mainly used for broadcast solutions
and provides optimized functions and methods for video processing and transmission, allowing the
development of real-time applications with minimal delay. Using this library also ensures
compatibility of this project with other MOG products.
4.5.3 Implementation
From the architecture defined in Section 4.4.1, all the modules described were implemented, with
the exception of the Ad Detector module (Figure 4.14). Instead, a file was generated simulating
Figure 4.14: Prototype Developed
a possible output of such a component, with information regarding the instants where an
advertisement segment starts and ends, as seen in Listing 4.1. This file can be accessed by the
Business Logic.
By developing this prototype it is possible to:
• Test sending and receiving high definition uncompressed video in a cloud environment.
• Test the capabilities of demux, decode, mux and encode operations in a cloud environment.
• Analyze the switching capabilities of the application, in terms of frame accuracy and delay.
• Analyze metrics such as bandwidth, RAM usage and processing capacity.
In order to keep the modularity of the application, each component of the prototype is
developed as a Docker container based on a Windows Server Core image.
4.5.3.1 Input Distributor
In the Input Distributor module, a stream compliant with the RFC 2250 norm is received. Then,
each RTP packet is demuxed, separating the video from the audio. The video packets are then
decoded and stored in frames. It is possible to define a number of frames to be stored in memory
as a buffer, in order to ensure the quality of the stream, although this results in a large amount of
RAM usage, as the video is stored uncompressed.
Each frame is then assigned a decoding timestamp, starting at 0 and incrementing by
1/framerate (in this case, 1/50). To send the video to the other modules, the frames must be
packetized in RTP packets. Each packet header contains the new decoding timestamp created by
the Input Distributor.
Each packet is then sent via multicast, at a constant frame rate equal to the original feed.
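The timestamping rule above can be sketched as follows. This is an illustration of the arithmetic only, not the prototype's code; timestamps are kept as exact rationals, matching the fractional PTS values seen in Listing 4.1:

```python
# Frame i receives the decoding timestamp i/framerate, kept exact as a
# rational number rather than a float.
from fractions import Fraction

def decoding_timestamp(frame_index: int, framerate: int = 50) -> Fraction:
    return Fraction(frame_index, framerate)

print(decoding_timestamp(0))    # 0
print(decoding_timestamp(25))   # 1/2  (half a second at 50 fps)
print(decoding_timestamp(50))   # 1    (one second of video)
```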
4.5.3.2 VOD Replacement
The VoD Replacement module is fed by a video file. For each frame of the video, the module
demuxes it, separating audio from video, decodes the video and stores it in new frames. These
frames can also be stored in local memory to reduce jitter in the broadcast.
Then, it repacketizes the uncompressed video in RTP packets and sends them to a multicast
address, maintaining the original video frame rate. The multicast address is the same as the Input
Distributor's, but with a different port.
This component also contains a REST server. With this server, the Video Switcher module can
request the VoD Replacement module to restart the video streaming from the initial frame, usually
before the beginning of an advertisement segment present in the main input feed, originated from
the Input Distributor.
4.5.3.3 Business Logic
First, this module reads and parses an Edit Decision List (EDL) file provided by the user
(Listing 4.1). Then it sends the initial and final frame of each ad segment to the Video Switcher
using a REST service.
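The parsing step can be sketched as below. The element and attribute names match the report format of Listing 4.1; the REST call to the Video Switcher is omitted, since its endpoint is deployment-specific:

```python
# Parse an Ad Detector report (Listing 4.1 format) and extract the
# start/end frame of each advertisement segment.
import xml.etree.ElementTree as ET

def parse_edl(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    return [
        (int(seq.get("initialFrame")), int(seq.get("finalFrame")))
        for seq in root.iter("PubSequence")
    ]

sample = """
<PubPlugin><Report><Events>
  <PubSequence initialFrame="481" finalFrame="578" />
  <PubSequence initialFrame="687" finalFrame="797" />
</Events></Report></PubPlugin>
"""
print(parse_edl(sample))  # [(481, 578), (687, 797)]
```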
4.5.3.4 Video Switcher
Video Switcher contains a REST server. This server is used to receive the frame numbers that
correspond to an ad segment start or ad segment end from the Business Logic.
This module also subscribes to the multicast addresses of both the Input Distributor and VoD
Replacement feeds. By default, only the primary feed, originated from the Input Distributor, is
received. The module then depacketizes the packets, grouping them into frames and keeping the
original timestamps.
Using the information gathered from the Business Logic, this component analyzes the current
frame timestamp and acts accordingly:
1. If the difference between the next advertisement segment's start timestamp and the current
frame timestamp is 2 seconds, the Video Switcher starts preparing the commutation by sending
a request to the VoD Replacement to restart the video streaming. Then, in the following
frames, in addition to receiving and handling the packets received from the Input Distributor,
it also analyzes the ones from the VoD Replacement, searching for a frame with a timestamp of
0.
When this frame is found, its contents are stored in local memory, so that when the feed
switching occurs, the first frame sent is the first frame of the video.
2. If the timestamp of the frame received is the same as the next advertising segment's start
frame, the Video Switcher stops processing the Input Distributor packets and handles the
VoD Replacement ones instead.
The packets received are stored in the memory buffer mentioned above as a frame, and
the module then packetizes and sends the first frame from the buffer to another multicast
address. This address may be subscribed to by the Output module.
3. If the timestamp of the frame received is the same as the next advertising segment's end frame,
the module stops processing the secondary feed (the VoD Replacement one) and processes
the main feed instead.
It also clears the buffer containing the video frames, in order to prevent a memory leak.
4. When none of the above conditions are met, the Video Switcher continues processing the
active feed, repacketizing the received frames and sending them via multicast.
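The four cases above form a small state machine, sketched here in condensed form. This is a simplification of the real module (it works on frame numbers at 50 fps rather than RTP packets and timestamps, and the REST pre-roll request is reduced to a flag); segment boundaries come from the Business Logic as (start, end) pairs:

```python
PREROLL_FRAMES = 2 * 50  # case 1: prepare the VoD feed 2 s in advance

class VideoSwitcher:
    def __init__(self, segments):
        self.segments = sorted(segments)
        self.on_alternate = False
        self.preroll_requested = False

    def tick(self, frame: int) -> str:
        """Return which feed the current frame is taken from."""
        for start, end in self.segments:
            if frame == start - PREROLL_FRAMES and not self.preroll_requested:
                self.preroll_requested = True   # would POST "restart" to VoD
            if frame == start:
                self.on_alternate = True        # case 2: switch to VoD feed
            if frame == end:
                self.on_alternate = False       # case 3: back to main feed
                self.preroll_requested = False  # buffer cleared
        return "vod" if self.on_alternate else "main"

sw = VideoSwitcher([(481, 578)])
print(sw.tick(480))  # main
print(sw.tick(481))  # vod
print(sw.tick(578))  # main
```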
4.5.3.5 Output
The Output module performs the inverse operations of the Input Distributor. It subscribes to the
multicast address where the Video Switcher packets were sent, grouping them into frames. It then
encodes them into H.264 format, sending the result as an MPEG-TS stream via RTP.
Module Similarities
All modules containing video processing operations (Input Distributor, VoD Replacement and
Output) share multiple functions, such as receiving and sending RTP packets.
These modules can also contain a customizable buffer. With this buffer, a module can store
a number of frames in local memory before beginning its sending functions. This operation can
minimize the problems introduced by network failures.
The RFC 4175 norm is used to send uncompressed video between modules.
Chapter 5
Results and analysis
5.1 Test Methodologies
During the testing phase, a scenario was established composed of a client PC and a Host Cloud.
The PC is used to generate an MPEG-TS stream and send it to the Cloud using RTP. This is
accomplished with the help of the command-line interface of FFMPEG [FFM]. The Host Cloud is
composed of two identical physical servers.
By using Docker Engine on the Host, it is possible to approximate the scenario to a virtualized
environment, where each container has no information about the physical location of any other.
The characteristics of the test machines are described in Table 5.1.
Table 5.1: Technical Specification of the test machines
                   Client                   Host
                                            Server 1              Server 2
CPU                Intel Core i7-4770       Intel Core i7-4790S   Intel Core i7-4790S
                   @ 3.4 GHz                @ 3.2 GHz             @ 3.2 GHz
RAM                16 GB                    16 GB                 16 GB
Network            Intel Ethernet I217-LM   Intel X550 10 Gbit    Intel X550 10 Gbit
Operating System   Windows 10 Enterprise    Windows Server        Windows Server
In terms of network specifications, IGMP and multicast are active and the Maximum Trans-
mission Unit is 1500 bytes.
The characteristics of the videos used during the tests are described in Table 5.2. Video #1
was used as the Input Distributor feed, while Video #2 was streamed by VOD Replacement. Both
videos were played in a loop.
To balance the load across the two available servers, the containers were distributed according
to Figure 5.1, which also shows the flow of data between each component of the application,
including the client.
Table 5.2: Video Specifications

                Video #1     Video #2
  File Size     550 MB       2.19 GB
  File Format   MPEG-TS      MXF
  Duration      39s          5m 56s
  Color Space   4:2:2        4:2:2
  Color Depth   8 bits       8 bits
  Scan method   Progressive  Progressive
  Resolution    1280x720     1280x720
  Frame rate    50 fps       50 fps
5.2 Results
In order to analyze the performance of the developed prototype, the following metrics were
monitored and analyzed:
• Frame Rate
• Bandwidth
• RAM usage
• CPU usage
To test the flexibility of the application and the quality and format of the resulting video, a
final test was made in which the input stream of one application instance originated from the
output stream of another instance.
5.2.1 Frame Rate
Table 5.3 shows the average frame rate in each module. Since the original video has a frame rate
of 50 frames per second, the application should maintain this rate in all its video processing
modules (Input Distributor, VOD Replacement, Video Switcher and Output). A lower-than-expected
frame rate can indicate a lack of processing power, meaning that the inbound traffic is higher
than the outbound, leading to video errors and memory leaks.
Table 5.3: Frame Rate in each module

                     Frame Rate
  Input Distributor  50 FPS
  VOD Replacement    50 FPS
  Video Switcher     50 FPS
  Output             50 FPS
Figure 5.1: Application deploy scenario and data flow
5.2.2 Bandwidth
In Table 5.4 is detailed the bandwidth used by each module, divided by inbound and outbound
traffic. Since the video feed sent from the Client to the Input Distributor is in the form of com-
pressed video, the inbound traffic in the module is not constant, varying based on the detail of each
frame and consequent rate of compression.
Table 5.4: Average Inbound and Outbound traffic in each module

  Module             Inbound   Outbound
  Input Distributor  14 Mbps   771 Mbps
  VOD Replacement    -         771 Mbps
  Video Switcher     1.5 Gbps  771 Mbps
  Output             771 Mbps  14 Mbps
  Business Logic     -         -
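The 771 Mbps figure for the uncompressed legs is consistent with the raw data rate of 720p 4:2:2 8-bit video at 50 fps plus per-packet header overhead. A back-of-the-envelope check follows; the assumed 66 bytes of RTP/UDP/IP and RFC 4175 header overhead per 1500-byte packet is an estimate, not a measured value:

```python
def uncompressed_bitrate_mbps(width=1280, height=720, bytes_per_pixel=2, fps=50):
    """Raw payload rate for 4:2:2 8-bit video (2 bytes per pixel)."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6

raw = uncompressed_bitrate_mbps()         # pixel data only
with_overhead = raw * 1500 / 1434         # assumed ~66 bytes of header overhead
                                          # per 1500-byte packet on the wire
print(round(raw), round(with_overhead))   # ≈ 737 and ≈ 771 Mbps
```

The on-the-wire estimate lands almost exactly on the measured 771 Mbps, supporting the claim that no information is lost between modules.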
5.2.3 RAM Usage
The RAM consumed by each module can be observed in Table 5.5. For this metric, an additional
test was performed. The scenario of this second test is similar to the first, the only difference
being that each module has a frame buffer of 50 frames (1 second). This means that, before
sending data, the module stores a full second's worth of video.
It is important to mention that, in situations where the available processing power is insufficient
to fulfill the video processing requirements, the outbound traffic will be lower than the inbound,
causing the memory buffer to grow and possibly leading to a memory leak.
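The RAM increase per module between Test #1 and Test #2 (roughly 83 to 88 MB) is of the same order as a simple estimate of what a 50-frame buffer of raw 720p 4:2:2 8-bit video occupies:

```python
def buffer_memory_mb(frames=50, width=1280, height=720, bytes_per_pixel=2):
    """Approximate memory held by a frame buffer of raw 4:2:2 8-bit video."""
    return frames * width * height * bytes_per_pixel / 1e6

print(round(buffer_memory_mb(), 1))   # ≈ 92 MB per buffered second at 50 fps
```

The small gap between this estimate and the measured deltas is plausibly due to frame metadata and allocator behavior, but the agreement suggests the buffers dominate the Test #2 memory footprint.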
Table 5.5: Average Module RAM Usage

  Module             Test #1   Test #2
  Input Distributor  40.3 MB   127.2 MB
  VOD Replacement    37.7 MB   125.8 MB
  Video Switcher     73.6 MB   161.1 MB
  Output             249.1 MB  332.3 MB
  Business Logic     2.3 MB    2.3 MB
5.2.4 CPU Usage
Table 5.6 shows the CPU usage of each module. As expected, the most demanding containers are
the Input Distributor, VOD Replacement and Output, followed by the Video Switcher, since video
processing operations require a large amount of computing power.
Table 5.6: Average Module CPU Usage

  Module             Test #1
  Input Distributor  7.1%
  VOD Replacement    4.6%
  Video Switcher     6.2%
  Output             18.2%
  Business Logic     0.2%
5.2.5 Cascade Scenario
The goal of this test is to prove that the output of the application (MPEG-TS stream generated by
the Output module) can be used as an input to another instance of the same application (to be read
by the Input Distributor module).
Figure 5.2 shows the scenario instantiated for this final test. As the main goal of the test was to
verify the input and output streams of the application, only an Input Distributor and an Output
module were deployed for each application instance.
The result was a success, with the second instance of the application being capable of reading
and outputting the original stream coming from the first instance.
The network throughput was analyzed in order to find differences between inbound and out-
bound traffic, but the results were similar to those in Table 5.4, as expected.
5.2.6 Results
By analyzing the frame rate and bandwidth, it is possible to conclude that no information is lost
in any of the modules. All modules are capable of running at 50 frames per second, the rate of
the original video. In the modules where no change to the video format was made, the bandwidth
was constant and matched the value expected for uncompressed video.
Figure 5.2: Cascade Test Scenario
In terms of memory, all modules were capable of maintaining a constant, low RAM usage. The
Output module experienced a higher usage than the other modules, which suggests that it can
be optimized.
When comparing the processing power needed, the results were very similar to those of the RAM
usage test. The Output module performed the worst, consuming almost 20% of the available CPU.
All other modules had a low CPU usage.
The final result of the application (MPEG-TS stream generated by the Output module) was
compatible with the Input Distributor, proving that multiple instances of the same application can
be deployed in sequence, with the output stream of the first serving as an input stream for the
second.
Chapter 6
Conclusions and Future Work
By automating the process of content substitution, it is possible to provide the end-user with
real-time, personalized content without the need to reduce the range of the broadcast.
With the use of cloud services, it is possible to reduce costs in broadcast operations by using
software for video processing tasks such as transcoding and video switching.
This dissertation proposed an architecture and developed a prototype for a distributed appli-
cation capable of identifying and replacing advertising segments in a livestream, providing the
end-user with personalized, more effective advertising without changing the actual content of
the broadcast. Due to the application's high modularity, other use cases in which it can be
employed were also suggested.
6.1 Fulfillment of Goals
The main goals proposed for this dissertation were achieved. The proposed architecture allows
the development of other applications for TV production by adding new modules with new func-
tions.
The developed prototype was successful in testing the proposed architecture, proving that it is
possible to use the cloud for television operations such as advertisement substitution, maintaining
the quality of the original broadcast while delivering a better advertising experience in real time.
6.2 Future Work
The proposed architecture provides a starting point to future experimentation based on IP TV,
while being flexible enough to allow other applications to be developed using an extended ver-
sion of the architecture. This can lead to the transition of television production operations from
hardware to software.
The Output module can be optimized to reduce the computing load of its encoding operation to
values similar to those of the other modules.
The time spent in frame processing can also be reduced, providing the means necessary to
process higher-resolution videos.
A new module can be developed to allow each node to register to the system and discover other
nodes, creating an environment where all modules can communicate with each other without prior
information regarding their locations.
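A minimal sketch of such a registration-and-discovery service, with a hypothetical interface loosely inspired by the NMOS IS-04 approach referenced in [Ass16b], could be:

```python
class NodeRegistry:
    """Hypothetical in-memory registry: modules register their name and
    network location, and can look each other up without prior configuration."""

    def __init__(self):
        self.nodes = {}

    def register(self, name, address, port):
        # A module announces itself, e.g. ("output", "10.0.0.5", 5004).
        self.nodes[name] = (address, port)

    def discover(self, name):
        # Returns (address, port), or None if the node is unknown.
        return self.nodes.get(name)

    def deregister(self, name):
        self.nodes.pop(name, None)
```

A production version would additionally need heartbeats and expiry so that crashed containers disappear from the registry, which is exactly the kind of concern IS-04 addresses.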
Since all modules are independent, it is possible to develop new applications with various
use cases, using the developed modules as a starting point. It is also possible to expand the
prototype, creating new modules as the complexity of the system grows.
References
[Anv] Anvato. http://www.anvato.com.
[Ass16a] Advanced Media Workflow Association. Networked media open specification, 2016. https://www.github.com/AMWA-TV/nmos.
[Ass16b] Advanced Media Workflow Association. IS-04: NMOS discovery and registration, 2016. http://www.amwa.tv/projects/IS-04.shtml.
[Bar03] Paul Barham. Xen and the art of virtualization. ACM SIGOPS Operating Systems Review, 37(5), 2003.
[CA08] David Conejero and Xavier Anguera. TV advertisements detection and clustering based on acoustic information. In Masoud Mohammadian, editor, CIMCA/IAWTIC/ISE, pages 452–457. IEEE Computer Society, 2008.
[Cha08] Charu Chaubal. The architecture of VMware ESXi. VMware White Paper, 1(7), 2008.
[Cia09] P. Cianci. Technology and workflows for multiple channel content distribution: Infrastructure implementation strategies for converged production, 2009.
[Com08] T. Committee. IEEE Std 1588-2008, IEEE standard for a precision clock synchronization protocol for networked measurement and control systems, 2008.
[Doc] Docker. https://www.docker.com/.
[ETS07] ETSI TS. Transport of MPEG-2 TS based DVB services over IP based networks, 2007.
[FFM] FFMPEG. https://ffmpeg.org/.
[Gol] Michael Goldman. What's the best IP video path forward? https://www.smpte.org/publications/past-issues/January-2015.
[GP05] Ladan Gharai and Colin Perkins. RTP Payload Format for Uncompressed Video. RFC 4175, September 2005.
[Kiv07] Avi Kivity. KVM: the Linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1, 2007.
[Kov13] A. Kovalick. The fundamentals of the all-IT media facility. In SMPTE 2013 Annual Technical Conference & Exhibition, pages 1–14. SMPTE, 2013.
[Laa12] M. Laabs. SDI over IP: seamless signal switching in SMPTE 2022-6 and a novel multicast routing concept, 2012.
[Mag] Audible Magic. http://www.audiblemagic.com.
[Mar] IHS Markit. Global advertising trends in 2016: A snapshot. https://technology.ihs.com/586624/global-advertising-trends-in-2016-a-snapshot.
[MG11] P. M. Mell and T. Grance. The NIST definition of cloud computing. National Institute of Standards and Technology, 2011.
[Mpe00] MPEG-2. Generic coding of moving pictures and associated audio information – Part 1: Systems. ISO/IEC 13818-1, 2000.
[oNM15] Joint Task Force on Networked Media. European Broadcasting Union, Society of Motion Picture and Television Engineers, and Video Services Forum, "Joint Task Force on Networked Media - Reference Architecture v1.0", 2015.
[PBSL13] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing. ACM, 2013.
[Pri] Adobe Primetime. http://www.adobe.com/marketing-cloud/primetime.html.
[RCL09] Bhaskar Prasad Rimal, Eunmi Choi, and Ian Lumb. A taxonomy and survey of cloud computing systems. In Proceedings of the 2009 Fifth International Joint Conference on INC, IMS and IDC, NCM '09, pages 44–51, Washington, DC, USA, 2009. IEEE Computer Society.
[Ser] Automatic Content Recognition Cloud Services. http://www.acrcloud.com.
[S.M12] SMPTE. ST 2022-6: Transport of high bit rate media signals over IP networks (HBRMT), 2012.
[Sri09] T. Sridhar. Cloud computing: a primer, part 1: Models and technologies. The Internet Protocol Journal, 12(3):2–19, 2009.
[Tec] Intelligent Video Technologies. http://www.ivitec.com.
[VV09] Anthony Velte and Toby Velte. Microsoft Virtualization with Hyper-V. McGraw-Hill, Inc., 2009.
[YBDS08] L. Youseff, M. Butrico, and D. Da Silva. Toward a unified ontology of cloud computing. In Grid Computing Environments Workshop, 2008. GCE '08, pages 1–10, November 2008.