FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
New Cloud Services for Dynamic Advertising
Vasco Manuel Pérola Filipe
Mestrado Integrado em Engenharia Informática e Computação
Supervisor: Maria Teresa Magalhães da Silva Pinto de Andrade
Co-Supervisor: Alexandre Ulisses F. Almeida e Silva
July 28, 2017
New Cloud Services for Dynamic Advertising
Vasco Manuel Pérola Filipe
Mestrado Integrado em Engenharia Informática e Computação
Approved in oral examination by the committee:
Chair: Luis F. Teixeira
External Examiner: José Manuel Torres
Supervisor: Maria Teresa Andrade
July 28, 2017
Abstract
With the constant technological evolution that has occurred in recent years, and with information becoming increasingly globalized and with a higher level of consumption, advertising has played an increasing role in the revenue of television stations and other suppliers of audiovisual material. However, this globalization and ease of access to information makes targeted advertising difficult, which has diminished its effectiveness. In addition, there are also television contracts that oblige stations to limit the broadcast of content to a particular region. This need to present a personalized product almost instantaneously results in a greater emphasis on automation, namely in the detection and replacement of advertisements in real time. In this dissertation, a distributed application capable of identifying and replacing advertising segments in an audiovisual stream is then proposed. The prototype will be able to receive audio and video in real time and detect advertising segments, replacing them with other content, which may come from a stream or a file. The end result is a stream, with the least possible delay, where the initial advertising segments are replaced by more relevant ones. Taking into account the modularity of this project, it will be possible to add or remove components depending on the use case, enabling solutions not only in advertising but also in the transmission and editing of video and audio in the cloud.
Resumo
Com a constante evolução tecnológica que se tem verificado nos últimos anos, e com a informação a ser cada vez mais globalizada e com um nível de consumo mais elevado, a publicidade tem tido um papel crescente na receita das estações televisivas e outros fornecedores de material audiovisual. No entanto, esta globalização e facilidade de acesso a informação torna difícil a publicidade direcionada, o que diminuiu a eficácia da mesma. Para além disso, existem ainda contratos televisivos que obrigam as estações a limitar a emissão de conteúdo a uma determinada região. Esta necessidade de apresentar um produto personalizado e de forma quase instantânea resulta numa maior aposta na automatização, nomeadamente na deteção e substituição de anúncios publicitários em tempo real. Nesta dissertação é então proposta uma aplicação distribuída capaz de identificar e substituir segmentos de publicidade numa stream audiovisual. O protótipo será capaz de receber áudio e vídeo em tempo real e detetar segmentos publicitários, substituindo-os por outro conteúdo, podendo este ser proveniente de uma stream ou de um ficheiro. O resultado final é uma stream, com o menor atraso possível, onde os segmentos publicitários iniciais são substituídos por outros mais relevantes. Tendo em conta a modularidade deste projeto, será possível adicionar ou remover componentes consoante o caso de uso, possibilitando soluções não só na área da publicidade, mas também na transmissão e edição de vídeo e áudio na cloud.
Acknowledgements
First, I would like to thank my family for providing me with all the conditions to evolve as a professional and as a person. To Maria Teresa Andrade for accepting this challenge and guiding me through it. To MOG Technologies, for providing me with the conditions necessary to develop this work, especially to Alexandre Ulisses, for the challenge proposed, and to Pedro Santos, Miguel Poeira and Vasco Gonçalves for all the help provided during the semester.
Vasco Filipe
“Whether you think that you can, or that you can’t, you are usually right.”
Henry Ford
Contents
1 Introduction
  1.1 Context
  1.2 Motivation
  1.3 Goals
  1.4 Document structure
2 Cloud Computing
  2.1 Cloud Models
  2.2 Private Vs Public Vs Hybrid Clouds
  2.3 Virtualization
    2.3.1 Hypervisors
    2.3.2 Containers
    2.3.3 Docker
  2.4 Advantages of Cloud Computing
3 Digital TV Production over IP
  3.1 SDI
  3.2 Internet Protocol
  3.3 Transport Layer
  3.4 MPEG-TS
  3.5 Real-Time Transport Protocol
    3.5.1 RTP Payload Format for Uncompressed Video
  3.6 SMPTE 2022-6
  3.7 JT-NM Architecture
  3.8 NMOS
4 Automation of Advertising Replacement
  4.1 Context
  4.2 Market Solutions
    4.2.1 Anvato
    4.2.2 Audible Magic
    4.2.3 ACRCLOUD
    4.2.4 Ivitec
    4.2.5 Adobe Primetime
    4.2.6 Comparative Analysis
  4.3 Requirements
  4.4 Proposal
    4.4.1 Architecture
    4.4.2 Modularity and Scalability
  4.5 Prototype
    4.5.1 Prototype limitations
    4.5.2 MOG MPL
    4.5.3 Implementation
5 Results and analysis
  5.1 Test Methodologies
  5.2 Results
    5.2.1 Frame Rate
    5.2.2 Bandwidth
    5.2.3 RAM Usage
    5.2.4 CPU Usage
    5.2.5 Cascade Scenario
    5.2.6 Results
6 Conclusions and Future Work
  6.1 Fulfillment of Goals
  6.2 Future Work
References
List of Figures
2.1 Cloud Model Layer System
2.2 Comparison between traditional and cloud models
2.3 Comparison between hypervisors and container engines
3.1 PAT and PMT [Cia09]
3.2 JT-NM simplified Architecture
3.3 Node proposed by NMOS [Ass16a]
4.1 Live dynamic server side ad insertion [Anv]
4.2 VOD dynamic server side ad insertion [Anv]
4.3 Technology of Audible Magic [Mag]
4.4 Fingerprint density versus type of search [Tec]
4.5 Cloud Application Architecture
4.6 Input Distributor media flows
4.7 VOD Replacement media flows
4.8 Video Switcher media flows
4.9 Business Logic media flows
4.10 Advertising Detector media flows
4.11 Output media flows
4.12 VoD Storage use case
4.13 Additional Input Feeds Architecture
4.14 Prototype Developed
5.1 Application deploy scenario and data flow
5.2 Cascade Test Scenario
List of Tables
4.1 Comparative analysis by features
5.1 Technical Specification of the test machines
5.2 Video Specifications
5.3 Frame Rate in each module
5.4 Average Inbound and Outbound traffic in each module
5.5 Average Module RAM Usage
5.6 Average Module CPU Usage
Abbreviations
API  Application Programming Interface
CPU  Central Processing Unit
EDL  Edit Decision List
FPS  Frames Per Second
HTTP  HyperText Transfer Protocol
HTTPS  HyperText Transfer Protocol Secure
IGMP  Internet Group Management Protocol
IAB  Interactive Advertising Bureau
IP  Internet Protocol
JT-NM  Joint Task Force on Networked Media
MPEG-TS  MPEG Transport Stream
NMOS  Networked Media Open Specifications
OS  Operating System
RFC  Request for Comments
RTP  Real-time Transport Protocol
SLA  Service-Level Agreement
SMPTE  Society of Motion Picture and Television Engineers
TCP  Transmission Control Protocol
UDP  User Datagram Protocol
VAST  Video Ad Serving Template
VM  Virtual Machine
VOD  Video on Demand
XML  eXtensible Markup Language
Chapter 1
Introduction
In 2016, more than 500 billion dollars were spent on advertising, a 7.1% increase over the previous year. Of this total, 36% was spent on the television market and 29.9% on online advertising [Mar].
Although it is an expanding market, advertising has been experiencing some problems, namely with targeting. This problem is especially relevant in television, where a TV segment can be broadcast to multiple countries. This reduces the effectiveness of the advertising, as its content needs to be relevant to a large group of viewers.
The personalization of advertising has therefore become increasingly important for end-users, broadcasters and entities that have contracted advertising space. This assumes special importance in the case of television content.
1.1 Context
Currently, the delimitation of advertising segments in television broadcasts, i.e., the identification of the beginning and end moments of a contiguous set of advertising content, is typically done by a human operator. As a result, the process is expensive and potentially prone to errors [CA08]. It also introduces delays in the availability of the content on other types of media, since the operator has to wait for the end of the broadcast to edit the resulting video.
There is also a trend, and a set of standards and architectures, that the industry is following in order to achieve the virtualization of television production. SMPTE 2022-6 [S.M12] is one such standard, and one such architecture is being developed by the Joint Task Force on Networked Media (JT-NM), a group formed by the European Broadcasting Union (EBU), the Society of Motion Picture and Television Engineers (SMPTE) and the Video Services Forum. The purpose of this group and these standards is the transition from physical equipment-based transmissions to IP networks.
This dissertation was developed in a corporate environment at MOG Technologies. MOG has been involved in the broadcasting market since 2007 and provides solutions for post-production environments.
This solution is part of a national project called MOG Cloud Setup, where MOG Technologies
partners with INESC TEC with the goal of developing audiovisual content preparation platforms
for ingest in the cloud.
1.2 Motivation
By automating the process of identifying and replacing advertising segments in television content, it is possible to reduce the delay in content availability on other media while also reducing costs and the possibility of errors introduced by human operators.
A television broadcast capable of delivering personalized advertising segments provides an upgraded product to advertising space buyers, as more specialized segments can have significantly more impact on the end-user.
The transition from video processing hardware to a cloud-based application can result in lower
costs in hardware (processing power, storage) while providing more flexibility, allowing the appli-
cation to scale as needed, without major changes to the architecture.
1.3 Goals
The main goal of this dissertation is to develop a cloud-based application prototype capable of receiving and re-sending an audiovisual stream, replacing predetermined segments of the stream with content originating from other sources.

This application should aim to reduce the delay introduced by such operations while maintaining the quality of the content, and it must be fully automated.
1.4 Document structure
Besides Chapter 1, this document contains five other chapters.
Chapter 2 analyzes cloud computing characteristics and models, the role of virtualization in those models, hypervisors and containers, and the advantages of a cloud-based application.
Chapter 3 studies the IP protocol, as well as its uses in television production. It also analyzes
multiple protocols and standards used for IP broadcasts.
Chapter 4 analyzes the automation of advertising substitution, describing and comparing solutions already developed by television companies. It then proposes an architecture for a cloud-based application for real-time advertising substitution and presents the developed prototype.
Chapter 5 presents the set of validation and acceptance tests performed to validate the developed prototype, and analyzes their results.
The dissertation ends with Chapter 6, which presents a general assessment of the proposed objectives and outlines future work.
Chapter 2
Cloud Computing
With processing, storage and networking technologies progressing rapidly, computing resources have risen in power while decreasing in price. To take advantage of this phenomenon, research was conducted on sharing physical resources such as CPU and storage among multiple applications, giving rise to the Cloud Computing model.
Cloud Computing is a model that provides on-demand network access to computing resources.
It allows for a quick provisioning and low managing effort, adapting to the user’s needs and system
changes. This model has five main characteristics[MG11]:
• On-demand Self-service
A cloud model needs to be able to automatically provide computing capabilities to the consumer, such as network storage or memory.
• Broad Network Access
It should be available over the network and accessible from any platform, be it computers or mobile devices.
• Resource Pooling
This model can provide services to multiple consumers with different needs, assigning resources according to their demands.
• Rapid Elasticity
The model should allow for rapid scaling, increasing or decreasing its capabilities as demanded, giving the consumer a sense of unlimited resources.
• Measured Service
Lastly, the model should monitor resource usage, providing the user with a report containing all the metrics associated with the system.
Figure 2.1: Cloud Model Layer System
The architecture of the cloud computing model is structured in layers based on their level of abstraction: a cloud layer is classified as being of a higher level if its services can be composed from services of the layers below [YBDS08], as represented in Figure 2.1.
• The first layer, and the least abstracted, is the Server layer, composed of the actual hardware.
• The Infrastructure layer provides tools to manage virtual machines.
• The Platform layer allows configuration of the available resources, such as CPU, storage and bandwidth, using the services provided by the Infrastructure layer.
• The Application layer allows the implementation and deployment of cloud applications. It uses services from the Platform layer to improve scalability, increasing or decreasing the available resources as needed.
• The last layer, and the most abstracted, is the Client layer. It provides the user access to the application, usually through a mobile app or a website.
2.1 Cloud Models
There are multiple models of cloud services, differing from each other in their level of abstraction [RCL09]:

• Infrastructure-as-a-Service (IaaS)
In this model, the abstraction is located at the infrastructure level. Users can rent cloud resources depending on their needs, paying only for the resources actually used.
Figure 2.2: Comparison between traditional and cloud models
• Platform-as-a-Service (PaaS)
The abstraction is located at the platform level, with the service provider being responsible for the servers and the user focused only on the application.
• Software-as-a-Service (SaaS)
In this model, a full application is deployed and updated remotely, removing the user's need to install software on their machine.
As seen in Figure 2.2, the user's control over the lower layers decreases as the level of abstraction increases (components managed by the user are represented in blue).
2.2 Private Vs Public Vs Hybrid Clouds
A private cloud infrastructure may be deployed on or off premises and is usually only available to
a single organization. The cloud can either be managed by local human resources, placing control
and responsibility on the organization or, on the other hand, cloud infrastructure management can
be completely outsourced or shared between the organization and a third party. Private clouds
provide more infrastructure control and data security but result in higher technical and economic costs.
A public cloud infrastructure is open to the public and managed by a third party institution
providing a service for the users. In this model, control, cost and infrastructure are all responsi-
bilities of the third party provider. On the other hand, the provider can have access to the user’s
data.
A hybrid cloud infrastructure leverages a private cloud and public cloud services by imple-
menting a common communication interface able to create interoperability between them. The
integration is valuable for companies with sensitive data that should not be exported to third par-
ties, but that at the same time intend to integrate business processes with external services.
2.3 Virtualization
The introduction of multicore processors and the integration of virtualization technology led to single machines being capable of executing multiple parallel tasks. Using such powerful hardware to run a single application is not sustainable, as the resources would not be utilized while the application was not running. On the other hand, using the same operating system to run multiple applications can generate security problems, with no isolation available between applications. There can also be situations where multiple applications try to access the same storage locations, ports or sockets.
Virtualization provides a hybrid approach between centralized and decentralized applications, while at the same time promoting horizontal scalability and resource elasticity. Virtualization abstracts the hardware and allows multiple guest operating systems to run on a single host, where each guest is completely isolated and runs a full OS. This approach brings the following benefits:
• The idle time of a system decreases, since multiple virtual machines run simultaneously.
• Resources can be managed and allocated to Virtual Machines individually.
• The system can be managed in a decentralized fashion.
• Heterogeneous systems can run within a single host.
• Failover times can be decreased by promoting simplified migration strategies or even live
migrations.
• Isolation prevents security issues in a compromised or out-of-date guest from affecting the host or the other guests.
• Green computing is fostered by reducing the number of necessary physical machines.
2.3.1 Hypervisors
The evolution of virtualization revolves around one piece of software, the hypervisor, also known as the VMM (Virtual Machine Monitor). It allows physical devices to share their resources with virtual machines running as guests [Sri09]. In this sense, a physical computer can be used to run multiple virtualized instances, each with its own OS and virtual hardware (CPU, memory, network, I/O), all provided by the hypervisor.
The hypervisor also provides the possibility of running guest applications without modifying them or their OS. This way, the guests are unaware that the environment is virtualized, since the hypervisor provides the same communication interface as a physical system.
There are two main hypervisor categories, Type 1 and Type 2. A Type 1 hypervisor runs on bare metal, directly over the hardware, while a Type 2 hypervisor is installed on top of a host OS, like a regular piece of software. A Type 1 hypervisor is superior to a Type 2 in terms of performance, since the latter has to go through an additional layer, the host OS.
Figure 2.3: Comparison between hypervisors and container engines
There are four main competitors in the hypervisor market, responsible for 93% of its share [PBSL13]. Xen [Bar03] and KVM [Kiv07] are two open-source hypervisors based on Linux, while VMware's ESXi [Cha08] and Microsoft's Hyper-V [VV09] are closed-source solutions.
2.3.2 Containers
Application containers are an alternative to hypervisors, when the overhead introduced by the latter
is undesirable. Simply put, containers are a lightweight virtualization technology that enables an
application to be packaged along with its virtual environment, configurations and dependencies,
isolating it from the deployment environment. As represented in Figure 2.3, the main feature that
differentiates a container from a hypervisor is the fact that multiple containers can share the same
OS.
2.3.3 Docker
Docker is an open-source technology that allows applications to run in containers [Doc]. Using a Dockerfile, a configuration document, Docker can execute all sorts of instructions, ranging from installing dependencies to configuring environment variables. Compared to a virtual machine, Docker containers provide a similar level of isolation while requiring less storage space, resulting in a more efficient solution.
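As a sketch of what such a configuration document might look like for a media-processing service (the base image, package, file paths and port below are illustrative assumptions, not taken from the prototype):

```dockerfile
# Hypothetical Dockerfile for a small media-processing service.
FROM ubuntu:16.04

# Install dependencies (here, FFmpeg for audio/video handling).
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

# Configure an environment variable read by the application.
ENV OUTPUT_PORT=5004

# Copy the application into the image and define its entry point.
COPY ./app /opt/app
CMD ["/opt/app/run.sh"]
```

Building this file with `docker build` produces an image that packages the application together with its dependencies, which can then be started with `docker run` on any host with a Docker engine.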
2.4 Advantages of Cloud Computing
On-demand resources enable users to provision or decommission resources such as virtual processors, memory or storage capacity, either manually or automatically. This highly adaptable feature provides the resource elasticity necessary to fulfill a given SLA (Service-Level Agreement) while at the same time promoting cost savings by eliminating unnecessary resources. More is also achieved with less hardware, as cloud computing resources serve multiple clients in a multi-tenant model where resources are assigned, released and reassigned on demand.

Cloud computing also reduces the need for on-premises hardware and for the expert human resources to manage it, making adoption easier. Users without technical knowledge are able to run complex applications and grow their businesses.
Chapter 3
Digital TV Production over IP
The virtualization of mobile production, transitioning from SDI to IP, is an object of discussion and study in the broadcasting industry. By using a single location to broadcast all signals and substituting software for hardware, television companies can significantly reduce broadcasting costs.
3.1 SDI
Serial Digital Interface (SDI) is a family of digital video, audio and metadata interfaces standardized by SMPTE (Society of Motion Picture and Television Engineers). Since the nineties, this interface has been used as the standard for television production [Kov13].
SDI is only available in professional equipment and allows the transmission of uncompressed
video and audio.
Each time a new format emerges, SDI needs to be adapted to accommodate it. As a result, every technological breakthrough in television transmission (4K, 8K, 3D, etc.) requires all SDI infrastructures to be adapted [Gol].
3.2 Internet Protocol
TCP/IP is a set of communication protocols used on the Internet and similar networks. This model is composed of five layers, each one responsible for providing services to the layer above. These layers are, from top to bottom: Application, Transport, Network, Data Link and Physical.

Taking into account the scope of this dissertation, it is worth analyzing the two upper layers, Application and Transport. The Transport layer provides means for communication between different machines, while the Application layer allows different processes to communicate with each other using the Transport layer services.
3.3 Transport Layer
UDP (User Datagram Protocol) is a transport-layer protocol for network applications based on IP (Internet Protocol). It is a simple, connectionless protocol, where each packet is sent only once, regardless of whether it is corrupted or lost. It is mainly used in real-time applications such as video games and video conferencing, prioritizing efficiency over integrity.
On the other hand, TCP (Transmission Control Protocol) is a connection-based protocol. Before the data transfer starts, a connection is established between the emitter and the receiver. Each time a packet is received, the receiver sends a message to the emitter confirming the packet's reception. This protocol ensures that the data is fully received, in the right order.
Although TCP is the more complete protocol, UDP can be used in cases where the overhead introduced by the connection protocol endangers the viability of the application. Moreover, in controlled environments such as private networks, the downsides of UDP are significantly reduced.
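The connectionless, fire-and-forget behaviour described above can be illustrated with a minimal sketch in Python (the loopback address and payload are arbitrary; port 0 asks the OS for a free port):

```python
import socket

# Connectionless UDP exchange over loopback: the sender transmits once,
# with no handshake and no acknowledgement from the receiver.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
receiver.settimeout(5.0)             # avoid blocking forever if the datagram is lost
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-0001", addr)   # fire-and-forget: no delivery guarantee

data, _ = receiver.recvfrom(2048)
print(data)                          # b'frame-0001'
sender.close()
receiver.close()
```

A TCP equivalent would first require `connect()`/`accept()` calls to establish the connection before any data could flow, which is exactly the overhead UDP avoids.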
There are three methods of communicating via IP. These methods differ in the number of
receivers for each transmission:
• Unicast
In a unicast transmission, for each message sent, there can only be one receiver. It is a
one-to-one method.
• Broadcast
In a broadcast transmission, the message sent can be received by every node of the network,
without exceptions.
• Multicast
Multicast is, similarly to the broadcast, a one-to-many transmission. With this method, the
message is only received by interested nodes. To declare interest, a node can send messages
asking to join or leave a multicast group. The emitter, instead of sending a message for each
receiver, sends one single message to the multicast address. This allows the emitter to send
messages to multiple nodes without prior knowledge about their interest.
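As a sketch, the join operation described above can be performed on an ordinary UDP socket; setting `IP_ADD_MEMBERSHIP` is what triggers the IGMP join message (the group address and port below are arbitrary examples, not values used in the prototype):

```python
import socket
import struct

def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Create a UDP socket subscribed to a multicast group.

    Setting IP_ADD_MEMBERSHIP makes the host send an IGMP join for the
    group, so the emitter needs no prior knowledge of this receiver.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # accept datagrams addressed to the group on this port
    # The option value packs the group address and the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

# Example: subscribe to an (arbitrary) group and wait for media packets.
# sock = open_multicast_receiver("239.1.1.1", 5004)
# data, sender = sock.recvfrom(2048)
```

Leaving the group (`IP_DROP_MEMBERSHIP`) is symmetric, and the kernel also sends an IGMP leave automatically when the socket is closed.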
3.4 MPEG-TS
MPEG-TS [Mpe00] is a media container for content storage and transmission. MPEG-TS streams are composed of one or more programs, described in a Program Association Table (PAT). If the stream contains only one program, it is designated a Single Program Transport Stream (SPTS); if it is composed of multiple programs, it is defined as a Multiple Program Transport Stream (MPTS) [ETS07].
Figure 3.1: PAT and PMT [Cia09]
A program is a combination of one or more streams of PES (Packetized Elementary Stream) packets. For example, a program can contain an audio PES, a video PES and a subtitle PES. A PMT (Program Map Table) stores the information about each program present in an MPEG-TS stream, as seen in Figure 3.1.
An MPEG-TS stream is a sequence of TS packets, each 188 bytes in size, 4 of which are used for the header. This header contains a Packet ID (PID) that allows the identification of the packet's content.

Each PES packet contains a PTS (Presentation Timestamp) and a DTS (Decode Timestamp), allowing the synchronization of multiple elementary streams.

An MPEG-TS stream must be encapsulated in RTP packets to be transported over IP. Each IP packet then contains an IP header, a UDP header, an RTP header and a number of TS packets.
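The 188-byte packet layout described above can be illustrated with a short parser; the field positions follow the MPEG-TS header format, and the sample packet is fabricated for the example:

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte header of a single 188-byte MPEG-TS packet."""
    if len(packet) != 188 or packet[0] != 0x47:   # 0x47 is the TS sync byte
        raise ValueError("not a valid TS packet")
    return {
        # Flag set on the packet that starts a new PES packet or table.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit Packet ID identifying the packet's content.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 4-bit counter used to detect lost or duplicated packets.
        "continuity_counter": packet[3] & 0x0F,
    }

# Fabricated packet: header carrying PID 0x0100, followed by 184 payload bytes.
pkt = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(pkt))
# {'payload_unit_start': True, 'pid': 256, 'continuity_counter': 7}
```

A demultiplexer applies exactly this PID lookup to route each TS packet to the audio, video or table handler it belongs to.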
3.5 Real-Time Transport Protocol
The Real-time Transport Protocol (RTP) is a protocol used to send audio and video over IP networks. Each RTP packet contains a header with information such as timestamps and sequence numbers, while leaving a customizable field that allows the protocol to be extended.

By marking each packet with a sequence number, it is possible to re-order packets regardless of the order in which they were received.

By using RTP together with UDP, it is possible to maximize the speed of the data transfer while keeping the packets ordered.
The Real-Time Control Protocol (RTCP) is a protocol used simultaneously with RTP to monitor and provide statistics about the transmission.
3.5.1 RTP Payload Format for Uncompressed Video
RFC 4175 [GP05] specifies the format for the transport of uncompressed video using RTP. While an RTP packet header allots only 16 bits to the packet sequence number, RFC 4175 adds a 16-bit extension called the extended sequence number. In a 1 Gbps video stream sent as 1000-byte packets using plain RTP, all possible sequence numbers would roll over in about 0.5 seconds, which could create problems when identifying lost and out-of-order packets. With the combined 32-bit sequence number, it would take approximately 9.5 hours for the number to roll over.
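The figures above can be checked with a quick calculation, assuming the same 1 Gbps stream and 1000-byte packets used in the text:

```python
# Sequence-number rollover time = number of sequence values / packet rate.
bitrate = 1e9                                  # 1 Gbps video stream
packet_bits = 1000 * 8                         # 1000-byte packets
packets_per_second = bitrate / packet_bits     # 125,000 packets/s

rollover_16 = 2**16 / packets_per_second           # plain RTP, 16-bit field
rollover_32 = 2**32 / packets_per_second / 3600    # with the RFC 4175 extension

print(f"16-bit: {rollover_16:.2f} s")   # 16-bit: 0.52 s
print(f"32-bit: {rollover_32:.1f} h")   # 32-bit: 9.5 h
```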
3.6 SMPTE 2022-6
SMPTE ST 2022-6 [S.M12] is a standard published by SMPTE, belonging to the SMPTE ST
2022 family of standards that enables the use of IP technology in the broadcasting industry. ST
2022-6 defines the transport of SDI over IP using the RTP protocol, and is therefore dubbed "SDI over IP".
By encapsulating SDI payloads into IP packets, ST 2022-6 allows the SDI signal to be split
into multiple 1376-byte packets, transmitted over an Ethernet network, received and rebuilt into
an SDI signal. Although this allows interoperability with other devices that use SDI, it also means
that there is no separation between the various streams inside SDI [Laa12]. Imagine, for example,
that you want to modify the audio that is part of an SDI stream transported with the corresponding
video. In that case, you have to deal with the video and all the overhead it brings to the system,
even though you only want to modify the audio. That is, since there is no separation between the
various contents present in SDI, there is no flexibility to transport only part of the content.
3.7 JT-NM Architecture
The Joint Task Force on Networked Media (JT-NM) is a consortium formed by the European
Broadcasting Union (EBU), SMPTE (Society of Motion Picture and Television Engineers) and the
Video Services Forum (VSF). This task force designed an architecture called the JT-NM Reference
Architecture v1.0 [oNM15] with the purpose of bringing together good practices, recommendations
and frameworks, so that there can be interoperability between devices from different manufacturers
in the transition from SDI to IP.
The JT-NM defines a conceptual model that allows the mapping of the workflows in order to
ensure the desired interoperability. Figure 3.2 represents a simplified architecture, focusing on the
scope of this dissertation.
The Network is the heart of the media operations, usually an Ethernet network.
The Nodes are connected to the Network and provide infrastructure such as storage, processing
power and interfaces.
Figure 3.2: JT-NM simplified Architecture
Devices can be software-only services or physical Devices, such as cameras, and are deployed
onto the Nodes to provide the Capabilities necessary to complete Tasks.
In the case of cameras or other equipment such as microphones, a Device can be a Source
of Essences. These Essences are moved as Grain payloads that are transported over the Network,
divided into network packets.
A Grain is composed of its media content (video, audio or metadata) and a timestamp. This
timestamp represents the instant at which the Grain was created and is generated using the Clock
present on the Node. All Nodes should be synchronized, using PTP (Precision Time Protocol) to
achieve nanosecond precision [Com08].
Essences are sent over the Network by Senders to Receivers. This communication is made
in the form of Flows. A Flow is the output of a Source, and there can be multiple Flows for each
Source. For example, a Source can generate a Flow for uncompressed video and another for
H.264 compressed video.
The Registry enables connections between Receivers and Senders by providing the means for
Nodes, Devices and Flows to register themselves and discover others, allowing connections to be
created between Devices (Receivers and Senders).
In order for this information to be properly articulated and to be used to define workflows,
three fundamental blocks are defined on which this data model is based: Timing, Identity and
Discovery & Registration.
• Timing
Each Grain contains a timestamp, allowing the consistency of the performed operations and,
consequently, ensuring that the Flows are correctly aligned.
• Identity
Each element present in the infrastructure must be easily and uniquely identifiable, so that it
can be referenced and used. All relationships between resources must make use of Identity.
• Registration and Discovery
Each Node in the Network must register itself and the Devices, Sources, Flows, Senders
and Receivers that it makes available, so that other nodes can discover them and obtain the
appropriate information about each one.
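The three building blocks can be illustrated with a small data-model sketch. The class and attribute names below are my own simplification, not taken from the JT-NM specification: every resource carries a unique Identity, Grains are timestamped, and a Registry lets resources register and be discovered.

```python
# Illustrative sketch of Timing, Identity and Discovery & Registration.
import uuid
from dataclasses import dataclass, field

@dataclass
class Flow:
    source_id: str
    encoding: str   # e.g. "raw" or "H.264"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # Identity

@dataclass
class Grain:
    flow_id: str
    timestamp_ns: int   # Timing: PTP-derived creation instant
    payload: bytes

class Registry:
    """Discovery & Registration: resources register and can be looked up."""
    def __init__(self):
        self._flows: dict[str, Flow] = {}
    def register(self, flow: Flow) -> None:
        self._flows[flow.id] = flow
    def discover(self, encoding: str) -> list:
        return [f for f in self._flows.values() if f.encoding == encoding]

# One Source producing two Flows, as in the example above.
reg = Registry()
cam_source = str(uuid.uuid4())
reg.register(Flow(cam_source, "raw"))
reg.register(Flow(cam_source, "H.264"))
print(len(reg.discover("H.264")))  # 1
```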
3.8 NMOS
NMOS (Networked Media Open Specifications) is a series of specifications that intends to create
frameworks enabling the interoperability sought by the JT-NM architecture. It was created by the
Advanced Media Workflow Association (AMWA), a group composed of several broadcasting
corporations and other companies working in the TV market.
The NMOS is based on the conceptual data model proposed by the JT-NM, in order to add
identity and relationships between content and equipment. Regardless of the specific task that
Figure 3.3: Node proposed by NMOS [Ass16a]
each Node performs, the logical view of a Node according to NMOS (Figure 3.3) creates a level
of abstraction sufficient to ensure the expected modularity and expandability, so that it can be
adapted to different needs.
NMOS does not limit how each module should work, it only specifies which interfaces they
should expose. Each Node must expose a REST API over HTTP. These transactions are described
in the "AMWA NMOS Discovery and Registration Specification" (IS-04) proposed by the AMWA
[Ass16b].
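As a hedged illustration of what such a REST interface looks like, the sketch below builds IS-04 Node API URLs. The `/x-nmos/node/...` path structure follows the IS-04 specification; the host, port and API version shown are placeholders, not values mandated by NMOS.

```python
# Build IS-04 Node API URLs for querying a Node's exposed resources.

def node_api_url(node_base: str, resource: str, version: str = "v1.2") -> str:
    """IS-04 Node API endpoint, e.g. for "senders" or "receivers"."""
    return f"{node_base}/x-nmos/node/{version}/{resource}"

# A client could then GET this URL to list the Senders a (hypothetical)
# Node exposes:
print(node_api_url("http://node.example.local:3000", "senders"))
# http://node.example.local:3000/x-nmos/node/v1.2/senders
```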
Chapter 4
Automation of Advertising Replacement
4.1 Context
Most TV networks have several contractual obligations with advertisement buyers. These
obligations range from regional distribution to time slots and even broadcast services. For example,
some ad segments can be played in the television broadcast but not on the web player, while others
are only relevant to a certain location, losing their effectiveness when broadcast to a larger audience.
Also, in order to accommodate Video-on-Demand services, the broadcast needs to be divided into
segments, separating advertising blocks from program blocks. In order to fulfill these requirements,
broadcasters need to allocate physical resources and manpower.
4.2 Market Solutions
In order to address the problem described in Section 4.1, several companies have started
research into the detection of advertising content.
4.2.1 Anvato
Anvato Media Content Platform (MCP) is a solution from Anvato [Anv] which offers live
streaming, video encoding, cloud editing, syndication and dynamic ad insertion, among other services.
They claim that their stitching technology replaces broadcast ad units with dynamically placed
digital ads that are frame-accurate in-stream and, using IAB (Interactive Advertising Bureau)
standardized VAST (Video Ad Serving Template) tags, helps prepare and deliver ads that are
nationally, regionally and locally relevant to clients' viewers.
It is designed to monetize both users' live streams (Figure 4.1) and video-on-demand (Figure 4.2)
on any screen and any device. It allows ads to be inserted dynamically on the server side
on all platforms: desktop, iOS and Android apps, AppleTV, Chromecast, Roku, Amazon Fire TV
and others.
This solution delivers the ads in a contextualized way, but it does not detect ads when
no ad triggers are present.
Figure 4.1: Live dynamic server-side ad insertion [Anv]
Figure 4.2: VOD dynamic server-side ad insertion [Anv]
Figure 4.3: Technology of Audible Magic [Mag]
4.2.2 Audible Magic
Audible Magic’s solutions [Mag] use what is called audio fingerprints to match unknown media
content against known, registered media content. Analogous to the idea that human fingerprints
can be measured to compactly and uniquely identify every person, there are processes that allow
very small clips of audio to be measured for distinctive characteristics. These compact audio
fingerprint measurements can be uniquely distinguished when compared to measurements taken
from any other audio clip.
Audible Magic uses this kind of technology, called automatic content recognition (ACR), to
identify unknown media content when fingerprints of that content are matched against known
fingerprints registered in an Audible Magic database.
One of the use cases of this technology provided by Audible Magic is their service for TV ad
detection and marking, shown in Figure 4.3. It detects ads in unmarked broadcast streams and, in
real time, provides frame-accurate timing to trigger the injection of ad markers. With those ad
markers injected, dynamic ad insertion (DAI) technology can be fully utilized.
Although Audible Magic maintains content identification databases for multiple content types,
including live TV programming and advertising running on national channels in the USA, these
databases do not cover international regions, and it is not possible to identify ads outside the
database.
4.2.3 ACRCLOUD
ACRCLOUD provides a set of cloud based solutions [Ser] for automatic content recognition,
using audio fingerprinting technologies, which are more directed to second screen applications.
For example, the live channel detection service allows the collection of live streams from TV or
radio stations in real time, enabling the channel to be detected at the exact point of broadcast
from any user's mobile device. With audio fingerprints generated from the live content, supplementary
information about the content or interactive campaigns can be triggered to appear on the viewer's
second-screen devices.
The usage of this live channel detection service can be summarized as follows:
• Pre-designed contents and interactions are organized by marketing editors on the server side.
• Users’ apps identify the TV channel and specific times with audio recognition while they
are watching TV.
• Detailed contents are triggered and retrieved from the server to the users’ app.
It is important to point out that this solution does not allow content replacement or elimination.
4.2.4 Ivitec
Ivitec offers a set of solutions to analyze video clips based on adaptive video fingerprinting
technology [Tec]. This technology is able to adjust the density and granularity of fingerprints
to match specific use cases (Figure 4.4), instead of using a single algorithm in the hope that an
"average" solution will fit the majority of use cases (as conventional video fingerprinting
approaches do).
Adaptive video fingerprinting is fully implemented in the MediaSeeker Core Platform, which
is the base of all Ivitec products, and it identifies content by comparing it against a database
of known video information inside the platform. Their solutions are capable of recognizing video
clips as they are broadcast, uploaded or downloaded, or within preexisting repositories of video
content.
AdMon is an automated software solution for advertisement workflows which promises to
streamline and simplify the process of recognition and tracking.
This system analyzes a set of TV channels, automatically recognizes advertisements and
provides valuable detection information, such as channel name and detection time, within minutes
of airing. It can be integrated with third-party capture sources. However, it does not allow
content replacement or elimination.
4.2.5 Adobe Primetime
Adobe Primetime is a multiscreen TV platform for live, linear and VOD programming [Pri].
Its modular distribution and monetization capabilities include TVSDK for multiscreen playback,
DRM, authentication, dynamic ad insertion (DAI) and audience-centric ad decisioning.
The Adobe Primetime ad insertion solution is available in both client-side and server-side
configurations, and the commercial breaks are identified using traditional broadcast ad break cues,
real-time markers, or ad timelines from the publisher's CMS. The user can even skip or replace
burned-in advertisements.

Figure 4.4: Fingerprint density versus type of search [Tec]

Adobe Primetime features turnkey integration with Adobe's video ad-decisioning solution to
deliver true DAI into live, linear and video-on-demand content across desktops, mobile devices,
gaming consoles and IP-enabled set-top boxes. Adobe Primetime ad insertion can also be
integrated with third-party ad servers and sell-side platforms.
4.2.6 Comparative Analysis
Table 4.1 summarizes each solution, comparing them by available features. These features include
the capacity to output a cloud-ready format. Another important functionality is the ability to
ingest live programs or offline media, in which case it must be possible to interact with several
file systems, memory card readers, temporary storage systems, or virtual directories. In terms of
advertising, the different solutions are compared by their capacity to detect advertising segments,
be it on or off the cloud, and also by the capability of replacing said segments.
As seen in Table 4.1, no single solution contains all the features analyzed. The solutions
closest to that goal are the Anvato and Audible Magic products. In Anvato's case, there is no
solution for cloud-based advertising detection. In the case of the Audible Magic product, the
ingest of offline content is not possible.
Table 4.1: Comparative analysis by features
Solution          Cloud-ready       Ingest                Advertising Detection   Content
                  output formats    Live    File based    Cloud    Off cloud      replacement
Anvato                  X            X          X                      X               X
Audible Magic           X            X                      X          X               X
ACRCLOUD                                                    X
Ivitec                                                                 X
Adobe Primetime         X                                                              X
4.3 Requirements
To ensure a solution to the automation of advertising substitution, some requirements need to be
met:
• The application must be able to receive one or more input feeds from outside the cloud. The
provided bandwidth must also be large enough to ensure the transmission of uncompressed
video between modules.
• Internet Group Management Protocol (IGMP) must be active, to allow multicast communication
in the network.
• The application is to be used in real-time scenarios. To ensure that there are no delays during
video processing, the cloud components must be able to scale vertically, so that the time to
process a frame is shorter than the interval between frames.
• All components must be deployed in the same private cloud, in order to reduce the delay in
the communication between each component and ensure network reliability.
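The real-time requirement above amounts to a simple per-frame budget, sketched here for the 50 fps video used later in the prototype (the function names are my own):

```python
# At 50 fps each module has a budget of 1/50 s = 20 ms per frame; any
# measured per-frame processing time must stay below that for the
# application to keep up with the stream.

def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps

def keeps_up(processing_ms: float, fps: float) -> bool:
    return processing_ms < frame_budget_ms(fps)

print(frame_budget_ms(50))   # 20.0
print(keeps_up(15.0, 50))    # True: 15 ms leaves headroom
print(keeps_up(25.0, 50))    # False: frames would queue up
```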
4.4 Proposal
This dissertation proposes a cloud-based application capable of analyzing a live stream,
identifying the ad segments it contains in real time and replacing them with more relevant content,
using IP for video transportation.
4.4.1 Architecture
This application is composed of several modules, each with a unique function and designed
in a way that allows changes in the architecture to accommodate different use cases. Some of
these modules may also be instantiated multiple times, depending on the number of consumers and
detection algorithms used. These modules interact with each other as seen in Figure 4.5. Since
this is a time-critical application, it is recommended that all these modules be deployed in the same
private cloud network.

Figure 4.5: Cloud Application Architecture
4.4.1.1 Input Distributor
The Input Distributor is the module responsible for the reception and processing of a live stream
originated from outside the cloud, making it available to the other modules of the application.
It receives an audiovisual stream, decodes it and sends it via multicast. This will be the primary
feed, and the one that is played out by default by the application. This feed will be then used by
both the Video Switcher and Advertisement Detector modules.
Figure 4.6 shows the media flows of this module.
4.4.1.2 VOD Replacement
Similarly to the Input Distributor, the VOD Replacement module processes and sends video and
audio via multicast but, instead of receiving content from a live stream, it reads from a file.
Figure 4.6: Input Distributor media flows
Figure 4.7: VOD Replacement media flows
This node also contains a REST server, allowing it to receive requests to restart the video being
played. This will be the alternate feed, only played out when an advertisement segment is detected
in the primary one. The output of this module will be accessed by the Video Switcher module.
Figure 4.7 shows the media flows of this module.
4.4.1.3 Video Switcher
Video Switcher is the module responsible for selecting the feed that will be played out. It sub-
scribes to two multicast addresses, the first one where the Input Distributor resulting stream is
available and the second one from the VOD Replacement module, and selects one of them, redi-
recting it to another multicast address, which will be accessed by the Output module.
The selection is made according to information received from the Business Logic. In order to
receive this information, a REST server is present in this module, capable of receiving requests
containing information about switching instants.
To ensure that the VOD contents are played from the very start when an advertising segment is
detected, the Video Switcher can use a REST API to send a replay-from-start request to the VOD
Replacement. Moments before commuting to the alternate feed, the module sends the request to
the VOD Replacement and starts scanning the contents of the stream in search of its frame 0,
storing its data in memory. This way, when the switch actually happens, the first frame sent by
this module equals the first frame of the original video. This ensures no data is lost during the
feed change.
Figure 4.8 shows the media flows of this module.
Figure 4.8: Video Switcher media flows
Figure 4.9: Business Logic media flows
4.4.1.4 Business Logic
This module receives EDL (Edit Decision List) files from the Advertisement Detectors. These
files list the frame numbers where each advertisement segment starts and finishes.
This information is sent to the Video Switcher with the help of a REST API.
Figure 4.9 shows the media flows of this module.
4.4.1.5 Advertisement Detector
This module subscribes to the Input Distributor multicast address and scans its contents,
detecting advertisement segments and saving the frames corresponding to their start and finish. It
then sends the result to the Business Logic in the form of an EDL.
There can be multiple instances of this module, one for each detection algorithm used. For
example, one module can analyze the video stream, while the other focuses on the audio.
Figure 4.10 shows the media flows of this module.
4.4.1.6 Output
This module subscribes to the multicast address containing the output from the Video Switcher.
It then encodes the stream and sends it to the end-user using RTP.
The output of this module is sent by unicast, so a different instantiation of the module is
needed for each end-user. One module is also needed for each encoding format: if the stream needs
to be sent to two different users with two different encoding formats each, four instantiations
of this module are needed.
Figure 4.11 shows the media flows of this module.
Figure 4.10: Advertising Detector media flows
Figure 4.11: Output media flows
4.4.2 Modularity and Scalability
Each module is independent from all the others. With this level of modularity, it is possible to
develop and integrate other modules into this application, providing a larger range of use cases. For
example, a module could be developed that stores the resulting stream of the application, or even
an ad-free stream, as a video file, as seen in Figure 4.12. It is also possible to replace the VOD
Replacement with another Input Distributor, or vice-versa.
The application is also capable of scaling up, allowing multiple instances of each module
according to the user's needs. In this case, the prototype developed is optimized to receive two
video sources, the main one coming from a live stream and the alternative coming from a video
file. However, it is possible to increase this number by deploying several Video Switcher
modules, as seen in Figure 4.13, to accommodate the use case. Each additional feed
requires an additional Input Distributor and Video Switcher.
The end result of the application is an MPEG-TS stream equivalent to the one received by the
Input Distributor. This allows multiple instances of the application to run in cascade, where the
output feed of the first is used as the input of the following one. The consumer can then apply
video transformations, such as overlays, if needed.
4.5 Prototype
A prototype was implemented in order to validate the architecture described in Section 4.4.1
and to analyze the value of future products based on it.
The prototype is able to receive two sources of audiovisual content, one from a live stream
and the other from a video file, decode and send the video between modules using RTP. It is also
capable of parsing an Edit Decision List to find instances of advertising segments. In the end, the
video is encoded and sent to the end-user.
The end result is an MPEG-TS stream in which the content abides by the switching instants
defined in the EDL.
In order to implement the prototype, the modules Input Distributor, VOD Replacement, Video
Switcher, Output and Business Logic were developed. To simulate the Ad Detector module, an
EDL was created describing the instants at which the feeds should be commuted (representing
advertising segments), as seen in Listing 4.1.
Figure 4.12: VoD Storage use case
Figure 4.13: Additional Input Feeds Architecture
<PubPlugin>
  <Settings>
    <Type video="true" audio="false" />
    <Algorithms>
      <Video name="visualrhythm" />
    </Algorithms>
  </Settings>
  <Report>
    <Global nProcessedFrames="2000" nEvents="15" nPubFrames="708" />
    <Events>
      <PubSequence initialFrame="481" finalFrame="578" startPts="481481/30000" endPts="289289/15000" />
      <PubSequence initialFrame="687" finalFrame="797" startPts="229229/10000" endPts="797797/30000" />
      <PubSequence initialFrame="886" finalFrame="1042" startPts="443443/15000" endPts="521521/15000" />
      <PubSequence initialFrame="1061" finalFrame="1064" startPts="1062061/30000" endPts="133133/3750" />
      <PubSequence initialFrame="1090" finalFrame="1105" startPts="109109/3000" endPts="221221/6000" />
      <PubSequence initialFrame="1107" finalFrame="1115" startPts="369369/10000" endPts="223223/6000" />
      <PubSequence initialFrame="1120" finalFrame="1122" startPts="14014/375" endPts="187187/5000" />
      <PubSequence initialFrame="1124" finalFrame="1128" startPts="281281/7500" endPts="47047/1250" />
      <PubSequence initialFrame="1130" finalFrame="1134" startPts="113113/3000" endPts="189189/5000" />
      <PubSequence initialFrame="1142" finalFrame="1176" startPts="571571/15000" endPts="49049/1250" />
      <PubSequence initialFrame="1180" finalFrame="1249" startPts="59059/1500" endPts="1250249/30000" />
      <PubSequence initialFrame="1256" finalFrame="1288" startPts="157157/3750" endPts="161161/3750" />
      <PubSequence initialFrame="1290" finalFrame="1320" startPts="43043/1000" endPts="11011/250" />
      <PubSequence initialFrame="1328" finalFrame="1355" startPts="83083/1875" endPts="271271/6000" />
      <PubSequence initialFrame="1365" finalFrame="1467" startPts="91091/2000" endPts="489489/10000" />
    </Events>
  </Report>
</PubPlugin>
Listing 4.1: Ad Detector Report
4.5.1 Prototype limitations
Since the prototype is just a proof of concept, some restrictions were defined for the development:
• Input: The prototype is only capable of receiving one MPEG-TS stream and one video file
as input.
• Output: The output feed must be an MPEG-TS stream, similar to the input stream.
• Environment: The prototype only works in a Windows environment.
• Audio: The audio component of the input feeds is discarded after the demux operation. The
output feed has no audio component.
4.5.2 MOG MPL
The MOG Technologies Media Processing Library (MPL) is proprietary software of MOG
Technologies, used in the development of its products. It is mainly used for broadcast solutions
and provides optimized functions and methods for video processing and transmission, allowing the
development of real-time applications with minimal delay. Using this library also ensures
compatibility of this project with other MOG products.
4.5.3 Implementation
From the architecture defined in Section 4.4.1, all the modules described were implemented, with
the exception of the Ad Detector module (Figure 4.14). Instead, a file was generated simulating
Figure 4.14: Prototype Developed
a possible output of such a component, with information regarding the instants where an
advertisement segment starts and ends, as seen in Listing 4.1. This file can be accessed by the
Business Logic.
By developing this prototype it is possible to:
• Test sending and receiving high definition uncompressed video in a cloud environment.
• Test the capabilities of demux, decode, mux and encode operations in a cloud environment.
• Analyze the switching capabilities of the application, in terms of frame accuracy and delay.
• Analyze metrics such as bandwidth, RAM usage and processing capacity.
In order to keep the modularity of the application, each component of the prototype is
developed as a Docker container based on a Windows Server Core image.
4.5.3.1 Input Distributor
In the Input Distributor module, a stream compliant with the RFC 2250 norm is received. Then,
each RTP packet is demuxed, separating the video from the audio. The video packets are then
decoded and stored in frames. It is possible to define a number of frames to be stored in memory
as a buffer, in order to ensure the quality of the stream, although this results in a large amount of
RAM usage, as the video is stored uncompressed.
Each frame is then assigned a decoding timestamp, starting at 0 and incrementing by
1/framerate (in this case, 1/50). To send the video to the other modules, the frames must be
packetized in RTP packets. Each packet header contains the new decoding timestamp created by
the Input Distributor.
Each packet is then sent via multicast, at a constant frame rate equal to the original feed.
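The timestamping rule above can be sketched as follows. This is an illustration of the arithmetic only, not the prototype's code; timestamps are kept as exact rationals, matching the fractional PTS values seen in Listing 4.1:

```python
# Frame i receives the decoding timestamp i/framerate, kept exact as a
# rational number rather than a float.
from fractions import Fraction

def decoding_timestamp(frame_index: int, framerate: int = 50) -> Fraction:
    return Fraction(frame_index, framerate)

print(decoding_timestamp(0))    # 0
print(decoding_timestamp(25))   # 1/2  (half a second at 50 fps)
print(decoding_timestamp(50))   # 1    (one second of video)
```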
4.5.3.2 VOD Replacement
The VoD Replacement module is fed by a video file. For each frame of the video, the module
demuxes it, separating audio from video, decodes the video and stores it in new frames. These
frames can also be stored in local memory to reduce jitter in the broadcast.
Then, it repacketizes the uncompressed video in RTP packets and sends them to a multicast
address, maintaining the original video frame rate. The multicast address is the same as the Input
Distributor's, but with a different port.
This component also contains a REST server. With this server, the Video Switcher module can
request the VoD Replacement module to restart the video streaming from the initial frame, usually
before the beginning of an advertisement segment present in the main input feed, originated from
the Input Distributor.
4.5.3.3 Business Logic
First, this module reads and parses an Edit Decision List (EDL) file provided by the user
(Listing 4.1). Then it sends the initial and final frame of each ad segment to the Video Switcher
using a REST service.
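The parsing step can be sketched as below. The element and attribute names match the report format of Listing 4.1; the REST call to the Video Switcher is omitted, since its endpoint is deployment-specific:

```python
# Parse an Ad Detector report (Listing 4.1 format) and extract the
# start/end frame of each advertisement segment.
import xml.etree.ElementTree as ET

def parse_edl(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    return [
        (int(seq.get("initialFrame")), int(seq.get("finalFrame")))
        for seq in root.iter("PubSequence")
    ]

sample = """
<PubPlugin><Report><Events>
  <PubSequence initialFrame="481" finalFrame="578" />
  <PubSequence initialFrame="687" finalFrame="797" />
</Events></Report></PubPlugin>
"""
print(parse_edl(sample))  # [(481, 578), (687, 797)]
```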
4.5.3.4 Video Switcher
Video Switcher contains a REST server. This server is used to receive the frame numbers that
correspond to an ad segment start or ad segment end from the Business Logic.
This module also subscribes to the multicast addresses of both the Input Distributor and VoD
Replacement feeds. By default, only the primary feed, originated from the Input Distributor, is
received. The module then depacketizes the packets, grouping them into frames and keeping the
original timestamps.
Using the information gathered from the Business Logic, this component analyzes the current
frame timestamp and acts accordingly:
1. If the difference between the next advertisement segment's start timestamp and the current
frame timestamp is 2 seconds, the Video Switcher starts preparing the commutation by sending
a request to the VoD Replacement to restart the video streaming. Then, in the following
frames, in addition to receiving and handling the packets received from the Input Distributor,
it also analyzes the ones from the VoD Replacement, searching for a frame with a timestamp of
0.
When this frame is found, its contents are stored in local memory, so that when the feed
switching occurs, the first frame sent is the first frame of the video.
2. If the timestamp of the frame received is the same as the next advertising segment's start
frame, the Video Switcher stops processing the Input Distributor packets and handles the
VoD Replacement ones instead.
The packets received are stored in the memory buffer mentioned above as a frame, and
the module then packetizes and sends the first frame from the buffer to another multicast
address. This address may be subscribed to by the Output module.
3. If the timestamp of the frame received is the same as the next advertising segment's end frame,
the module stops processing the secondary feed (the VoD Replacement one) and processes
the main feed instead.
It also clears the buffer containing the video frames, in order to prevent a memory leak.
4. When none of the above conditions are met, the Video Switcher continues processing the
active feed, repacketizing the received frames and sending them via multicast.
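The four cases above form a small state machine, sketched here in condensed form. This is a simplification of the real module (it works on frame numbers at 50 fps rather than RTP packets and timestamps, and the REST pre-roll request is reduced to a flag); segment boundaries come from the Business Logic as (start, end) pairs:

```python
PREROLL_FRAMES = 2 * 50  # case 1: prepare the VoD feed 2 s in advance

class VideoSwitcher:
    def __init__(self, segments):
        self.segments = sorted(segments)
        self.on_alternate = False
        self.preroll_requested = False

    def tick(self, frame: int) -> str:
        """Return which feed the current frame is taken from."""
        for start, end in self.segments:
            if frame == start - PREROLL_FRAMES and not self.preroll_requested:
                self.preroll_requested = True   # would POST "restart" to VoD
            if frame == start:
                self.on_alternate = True        # case 2: switch to VoD feed
            if frame == end:
                self.on_alternate = False       # case 3: back to main feed
                self.preroll_requested = False  # buffer cleared
        return "vod" if self.on_alternate else "main"

sw = VideoSwitcher([(481, 578)])
print(sw.tick(480))  # main
print(sw.tick(481))  # vod
print(sw.tick(578))  # main
```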
4.5.3.5 Output
The Output module performs the inverse operations of the Input Distributor. It subscribes to the
multicast address where the Video Switcher packets were sent, grouping them into frames. It then
encodes them into H.264 format, sending the result as an MPEG-TS stream via RTP.
Module Similarities
All modules containing video processing operations (Input Distributor, VoD Replacement and
Output) share multiple functions, such as receiving and sending RTP packets.
These modules can also contain a customizable buffer. With this buffer, a module can store
a number of frames in local memory before beginning its sending functions. This operation can
minimize the problems introduced by network failures.
The RFC 4175 norm is used to send uncompressed video between modules.
Chapter 5
Results and analysis
5.1 Test Methodologies
During the testing phase, a scenario was established composed of a client PC and a Host Cloud.
The PC is used to generate an MPEG-TS stream and send it to the Cloud using RTP. This is
accomplished with the help of the command-line interface of FFMPEG [FFM]. The Host Cloud is
composed of two identical physical servers.
By using Docker Engine on the Host, it is possible to approximate the scenario to a virtualized
environment, where each container has no information about the physical location of any other.
The characteristics of the test machines are described in Table 5.1.
Table 5.1: Technical Specification of the test machines
                   Client                   Host
                                            Server 1              Server 2
CPU                Intel Core i7-4770       Intel Core i7-4790S   Intel Core i7-4790S
                   @ 3.4 GHz                @ 3.2 GHz             @ 3.2 GHz
RAM                16 GB                    16 GB                 16 GB
Network            Intel Ethernet I217-LM   Intel X550 10 Gbit    Intel X550 10 Gbit
Operating System   Windows 10 Enterprise    Windows Server        Windows Server
In terms of network specifications, IGMP and multicast are active and the Maximum Trans-
mission Unit is 1500 bytes.
The characteristics of the videos used during the tests are described in Table 5.2. Video #1
was used as the Input Distributor feed, while Video #2 was streamed by VOD Replacement. Both
videos were played in a loop.
To balance the load across the two available servers, the containers were distributed according
to Figure 5.1, which also shows the flow of data between each component of the application,
including the client.
Table 5.2: Video Specifications

                Video #1     Video #2
  File Size     550 MB       2.19 GB
  File Format   MPEG-TS      MXF
  Duration      39s          5m 56s
  Color Space   4:2:2        4:2:2
  Color Depth   8 bits       8 bits
  Scan method   Progressive  Progressive
  Resolution    1280x720     1280x720
  Frame rate    50 fps       50 fps
5.2 Results
In order to analyze the performance of the developed prototype, the following metrics were
monitored and analyzed:
• Frame Rate
• Bandwidth
• RAM usage
• CPU usage
To test the flexibility of the application and the quality and format of the resulting video, a
final test was made in which the input stream of one application instance originated from the
output stream of another instance.
5.2.1 Frame Rate
Table 5.3 shows the average frame rate in each module. Since the original video has a frame rate
of 50 frames per second, the application should maintain this rate in all its video processing
modules (Input Distributor, VOD Replacement, Video Switcher and Output). A lower-than-expected
frame rate can indicate a lack of processing power, meaning that the inbound traffic is higher
than the outbound, leading to video errors and memory leaks.
Table 5.3: Frame Rate in each module

                     Frame Rate
  Input Distributor  50 FPS
  VOD Replacement    50 FPS
  Video Switcher     50 FPS
  Output             50 FPS
Figure 5.1: Application deploy scenario and data flow
5.2.2 Bandwidth
In Table 5.4 is detailed the bandwidth used by each module, divided by inbound and outbound
traffic. Since the video feed sent from the Client to the Input Distributor is in the form of com-
pressed video, the inbound traffic in the module is not constant, varying based on the detail of each
frame and consequent rate of compression.
Table 5.4: Average Inbound and Outbound traffic in each module

  Module             Inbound   Outbound
  Input Distributor  14 Mbps   771 Mbps
  VOD Replacement    -         771 Mbps
  Video Switcher     1.5 Gbps  771 Mbps
  Output             771 Mbps  14 Mbps
  Business Logic     -         -
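The 771 Mbps figure for the uncompressed legs is consistent with the raw data rate of 720p 4:2:2 8-bit video at 50 fps plus per-packet header overhead. A back-of-the-envelope check follows; the assumed 66 bytes of RTP/UDP/IP and RFC 4175 header overhead per 1500-byte packet is an estimate, not a measured value:

```python
def uncompressed_bitrate_mbps(width=1280, height=720, bytes_per_pixel=2, fps=50):
    """Raw payload rate for 4:2:2 8-bit video (2 bytes per pixel)."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6

raw = uncompressed_bitrate_mbps()         # pixel data only
with_overhead = raw * 1500 / 1434         # assumed ~66 bytes of header overhead
                                          # per 1500-byte packet on the wire
print(round(raw), round(with_overhead))   # ≈ 737 and ≈ 771 Mbps
```

The on-the-wire estimate lands almost exactly on the measured 771 Mbps, supporting the claim that no information is lost between modules.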
5.2.3 RAM Usage
The RAM consumed by each module can be observed in Table 5.5. For this metric, an additional
test was performed. The scenario of this second test is similar to the first, the only difference
being that each module has a frame buffer of 50 frames (1 second). This means that, before
sending data, the module stores a full second's worth of video.
It is important to mention that, in situations where the available processing power is insufficient
to fulfill the video processing requirements, the outbound traffic will be lower than the inbound,
causing the memory buffer to grow and possibly leading to a memory leak.
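The RAM increase per module between Test #1 and Test #2 (roughly 83 to 88 MB) is of the same order as a simple estimate of what a 50-frame buffer of raw 720p 4:2:2 8-bit video occupies:

```python
def buffer_memory_mb(frames=50, width=1280, height=720, bytes_per_pixel=2):
    """Approximate memory held by a frame buffer of raw 4:2:2 8-bit video."""
    return frames * width * height * bytes_per_pixel / 1e6

print(round(buffer_memory_mb(), 1))   # ≈ 92 MB per buffered second at 50 fps
```

The small gap between this estimate and the measured deltas is plausibly due to frame metadata and allocator behavior, but the agreement suggests the buffers dominate the Test #2 memory footprint.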
Table 5.5: Average Module RAM Usage

  Module             Test #1   Test #2
  Input Distributor  40.3 MB   127.2 MB
  VOD Replacement    37.7 MB   125.8 MB
  Video Switcher     73.6 MB   161.1 MB
  Output             249.1 MB  332.3 MB
  Business Logic     2.3 MB    2.3 MB
5.2.4 CPU Usage
Table 5.6 shows the CPU usage of each module. As expected, the most demanding containers are
the Input Distributor, VOD Replacement and Output, followed by the Video Switcher, since video
processing operations require a large amount of computing power.
Table 5.6: Average Module CPU Usage

  Module             Test #1
  Input Distributor  7.1%
  VOD Replacement    4.6%
  Video Switcher     6.2%
  Output             18.2%
  Business Logic     0.2%
5.2.5 Cascade Scenario
The goal of this test is to prove that the output of the application (MPEG-TS stream generated by
the Output module) can be used as an input to another instance of the same application (to be read
by the Input Distributor module).
Figure 5.2 shows the scenario instantiated for this final test. As the main goal of the test was to
verify the input and output streams of the application, only an Input Distributor and an Output
module were deployed for each application instance.
The result was a success, with the second instance of the application being capable of reading
and outputting the original stream coming from the first instance.
The network throughput was analyzed in order to find differences between inbound and out-
bound traffic, but the results were similar to those in Table 5.4, as expected.
5.2.6 Results
By analyzing the frame rate and bandwidth, it is possible to conclude that no information is lost
in any of the modules. All modules are capable of running at 50 frames per second, the rate of
the original video. In the modules where no change to the video format was made, the bandwidth
was constant and matched the value expected for uncompressed video.
Figure 5.2: Cascade Test Scenario
In terms of memory, all modules were capable of maintaining a constant, low RAM usage. The
Output module experienced a higher usage than the other modules, which suggests that it can
be optimized.
When comparing the processing power needed, the results were very similar to those of the RAM
usage test. The Output module performed the worst, consuming almost 20% of the available CPU.
All other modules had a low CPU usage.
The final result of the application (MPEG-TS stream generated by the Output module) was
compatible with the Input Distributor, proving that multiple instances of the same application can
be deployed in sequence, with the output stream of the first serving as an input stream for the
second.
Chapter 6
Conclusions and Future Work
By automating the process of content substitution, it is possible to provide the end-user with
real-time, personalized content without the need to reduce the range of the broadcast.
With the use of cloud services, it is possible to reduce costs in broadcast operations by using
software for video processing tasks such as transcoding and video switching.
This dissertation proposed an architecture and developed a prototype for a distributed appli-
cation capable of identifying and replacing advertising segments in a livestream, providing the
end-user with personalized, more effective advertising without changing the actual content of
the broadcast. Due to the application's high modularity, other use cases in which it can be
employed were also suggested.
6.1 Fulfillment of Goals
The main goals proposed for this dissertation were achieved. The proposed architecture allows
the development of other applications for TV production by adding new modules with new func-
tions.
The developed prototype was successful in testing the proposed architecture, proving that it is
possible to use the cloud for television operations such as advertisement substitution, maintaining
the quality of the original broadcast while delivering a better advertising experience in real time.
6.2 Future Work
The proposed architecture provides a starting point to future experimentation based on IP TV,
while being flexible enough to allow other applications to be developed using an extended ver-
sion of the architecture. This can lead to the transition of television production operations from
hardware to software.
The Output module can be optimized to reduce the computing load of its encoding operation to
values similar to those of the other modules.
The time spent in frame processing can also be reduced, providing the means necessary to
process higher-resolution videos.
A new module can be developed to allow each node to register to the system and discover other
nodes, creating an environment where all modules can communicate with each other without prior
information regarding their locations.
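A minimal sketch of such a registration-and-discovery service, with a hypothetical interface loosely inspired by the NMOS IS-04 approach referenced in [Ass16b], could be:

```python
class NodeRegistry:
    """Hypothetical in-memory registry: modules register their name and
    network location, and can look each other up without prior configuration."""

    def __init__(self):
        self.nodes = {}

    def register(self, name, address, port):
        # A module announces itself, e.g. ("output", "10.0.0.5", 5004).
        self.nodes[name] = (address, port)

    def discover(self, name):
        # Returns (address, port), or None if the node is unknown.
        return self.nodes.get(name)

    def deregister(self, name):
        self.nodes.pop(name, None)
```

A production version would additionally need heartbeats and expiry so that crashed containers disappear from the registry, which is exactly the kind of concern IS-04 addresses.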
Since all modules are independent, it is possible to develop new applications with various
use cases, using the developed modules as a starting point. It is also possible to expand the
prototype, creating new modules as the complexity of the system grows.
References
[Anv] Anvato. http://www.anvato.com.
[Ass16a] Advanced Media Workflow Association. Networked media open specification, 2016. https://www.github.com/AMWA-TV/nmos.
[Ass16b] Advanced Media Workflow Association. IS-04: NMOS discovery and registration, 2016. http://www.amwa.tv/projects/IS-04.shtml.
[Bar03] Paul Barham. Xen and the art of virtualization. ACM SIGOPS Operating Systems Review, 37(5), 2003.
[CA08] David Conejero and Xavier Anguera. TV advertisements detection and clustering based on acoustic information. In Masoud Mohammadian, editor, CIMCA/IAWTIC/ISE, pages 452–457. IEEE Computer Society, 2008.
[Cha08] Charu Chaubal. The architecture of VMware ESXi. VMware White Paper, 1(7), 2008.
[Cia09] P. Cianci. Technology and workflows for multiple channel content distribution: Infrastructure implementation strategies for converged production, 2009.
[Com08] T. Committee. IEEE Std 1588-2008, IEEE standard for a precision clock synchronization protocol for networked measurement and control systems, 2008.
[Doc] Docker. https://www.docker.com/.
[ETS07] ETSI TS. Transport of MPEG-2 TS based DVB services over IP based networks, 2007.
[FFM] FFMPEG. https://ffmpeg.org/.
[Gol] Michael Goldman. What's the best IP video path forward? https://www.smpte.org/publications/past-issues/January-2015.
[GP05] Ladan Gharai and Colin Perkins. RTP Payload Format for Uncompressed Video. RFC 4175, September 2005.
[Kiv07] Avi Kivity. KVM: the Linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1, 2007.
[Kov13] A. Kovalick. The fundamentals of the all-IT media facility. In SMPTE 2013 Annual Technical Conference & Exhibition, pages 1–14. SMPTE, 2013.
[Laa12] M. Laabs. SDI over IP: seamless signal switching in SMPTE 2022-6 and a novel multicast routing concept, 2012.
[Mag] Audible Magic. http://www.audiblemagic.com.
[Mar] IHS Markit. Global advertising trends in 2016: A snapshot. https://technology.ihs.com/586624/global-advertising-trends-in-2016-a-snapshot.
[MG11] P. M. Mell and T. Grance. The NIST definition of cloud computing. National Institute of Standards and Technology, 2011.
[Mpe00] MPEG-2. Generic coding of moving pictures and associated audio information – Part 1: Systems. ISO/IEC 13818-1, 2000.
[oNM15] Joint Task Force on Networked Media. European Broadcasting Union, Society of Motion Picture and Television Engineers, and Video Services Forum, "Joint Task Force on Networked Media - Reference Architecture v1.0", 2015.
[PBSL13] Diego Perez-Botero, Jakub Szefer, and Ruby B. Lee. Characterizing hypervisor vulnerabilities in cloud computing servers. In Proceedings of the 2013 International Workshop on Security in Cloud Computing. ACM, 2013.
[Pri] Adobe Primetime. http://www.adobe.com/marketing-cloud/primetime.html.
[RCL09] Bhaskar Prasad Rimal, Eunmi Choi, and Ian Lumb. A taxonomy and survey of cloud computing systems. In Proceedings of the 2009 Fifth International Joint Conference on INC, IMS and IDC, NCM '09, pages 44–51, Washington, DC, USA, 2009. IEEE Computer Society.
[Ser] Automatic Content Recognition Cloud Services. http://www.acrcloud.com.
[S.M12] SMPTE. ST 2022-6: Transport of high bit rate media signals over IP networks (HBRMT), 2012.
[Sri09] T. Sridhar. Cloud computing: a primer, part 1: Models and technologies. The Internet Protocol Journal, 12(3):2–19, 2009.
[Tec] Intelligent Video Technologies. http://www.ivitec.com.
[VV09] Anthony Velte and Toby Velte. Microsoft Virtualization with Hyper-V. McGraw-Hill, Inc., 2009.
[YBDS08] L. Youseff, M. Butrico, and D. Da Silva. Toward a unified ontology of cloud computing. In Grid Computing Environments Workshop, 2008. GCE '08, pages 1–10, November 2008.