Singularity @ Globo.com HackDay 2014-12-02

Singularity

Interactive Computing Environment for Big Data, based on Spark and IPython.

Ciro Cavani

Personalization

Globo.com

HackDay 02/12/2014

Motivation

The technology needed to change how Globo.com does business is already in production.

Hadoop2, Kafka and Spark.

The idea is to steer Globo toward making data-driven decisions.

Proposal

● access all of the company's data
● run machine learning algorithms
● identify relevant information
● formulate hypotheses and explore the data
● design experiments, A/B tests
● an interactive system
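As an illustration of the "A/B tests" item, here is a minimal sketch of how the result of one experiment could be evaluated in a notebook. It uses a two-proportion z-test, which is one common choice and is not stated in the talk; the function name and the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    Returns the z statistic and the two-sided p-value under the
    normal approximation (assumes reasonably large samples).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only: conversions and visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; the threshold to use is a decision for whoever designs the experiment.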

Hadoop

Hadoop2 is two systems:

● HDFS, a distributed file system;
● YARN, a distributed execution system.

HBase, Pig, Mahout, Solr

image: http://hortonworks.com/hadoop/yarn/

Kafka

Message distribution cluster (billions of messages per day) created by LinkedIn.

Performance - high throughput

Scalability - many consumers

Small, unstructured / opaque messages (bytes)

image: http://hortonworks.com/hadoop/kafka/
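Because the messages are opaque bytes, the serialization format is an agreement between producers and consumers rather than something the cluster enforces. The sketch below shows one possible convention, compact UTF-8 JSON, using only the Python standard library; the helper names and the event fields are hypothetical, and no Kafka client is involved.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event into a compact UTF-8 JSON payload."""
    return json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")


def decode_event(payload: bytes) -> dict:
    """Rebuild the event on the consumer side from the raw bytes."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical page-view event; the field names are illustrative only.
event = {"user": "u123", "action": "pageview", "ts": 1417478400}

payload = encode_event(event)     # what a producer would hand to the cluster
restored = decode_event(payload)  # what a consumer would reconstruct

print(len(payload), "bytes")
print(restored == event)
```

Any other encoding would work the same way, as long as every consumer of the topic knows how to decode it.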

Spark

A fast and general-purpose cluster computing system.

● High-level APIs in Python
● Spark SQL for SQL and structured data processing
● MLlib for machine learning
● GraphX for graph processing
● Spark Streaming for stream processing

http://spark.apache.org/

IPython Notebook

A web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.

Wolfram Language (inspiration)

http://youtu.be/_P9HqHVPeik

Stephen Wolfram introduces the Wolfram Language in this video that shows how the symbolic programming language enables powerful functional programming, querying of large databases, flexible interactivity, easy deployment, and much, much more.

Databricks Cloud (inspiration)

http://youtu.be/dJQ5lV5Tldw

The Databricks Cloud provides the full power of Spark to you, in the cloud, plus a powerful set of features for exploring and visualizing your data, as well as writing and deploying production data products.

* Visualize data right as you explore it
* Collaborate in real-time
* Export your analysis to production dashboards in seconds

Jupyter and Julia (future)

http://youtu.be/jhlVHoeB05A

This talk will begin with an introduction to the Julia language, both explaining why it is able to attain C-like performance in many cases. (...) we will explain how connecting to the IPython "Jupyter" front-end from an IJulia back-end allows Julia to benefit from IPython's rich multimedia notebook interface, and how Julia can even use IPython 2's interactive-widget infrastructure to provide truly interactive computations.

https://github.com/stevengj/Julia-EuroSciPy14

Globo.com

Liked it?

Want to work at Globo.com?

We're hiring

https://github.com/globocom/IWantToWorkAtGloboCom

ciro.cavani@corp.globo.com

https://www.linkedin.com/in/cirocavani