October 2016
First 90
SLA vs. Agile: Microservices and cloud monitoring
Why this talk?
This is our vision: building the foundation to build a 3B company by FY20
Agenda
1. “Old World”: MercadoLivre’s original architecture
2. “Ground Zero”: shifting to microservices on the cloud
3. Monitoring the cloud
4. Alarms: when things go south
5. “Fury”: streamlining DevOps at MercadoLivre
In numbers
+400 deploys/day on +650 apps
+1,000 developers in 8 development centers
+10 programming languages
+25,000,000 requests per minute
+22,000 VMs in 7 data centers
+700 DBs on 4 different engines
Old World
Old world architecture
Diagram: User -> ml.jar -> huge DB
Old world properties
● Monolithic
● Highly coupled code
● Unified SVN repository
● Single DB
● Simple infrastructure with little overhead
● Single QA team
● Closed system
Deployments as ML grew
● Anyone, at any time
● Some people, at any time
● Some people, once a week
● Only all the experts together, at 3 AM, on Thursdays not covered by any “freeze”
Ground Zero
Shifting to microservices
Diagram: Frontend, CRM, mobile apps and 3rd party devs, each consuming APIs
Ground zero properties
● Multiple technologies and frameworks (dev’s choice)
● Completely decoupled code in multiple GitHub repositories
● One DB for each app, multiple engines
● Complex infrastructure with potentially high overhead
● QA, testing and continuous integration are done by each team
● Independent deployments, environments and policies
● Open platform
“With great power comes great responsibility”.
Stan Lee
Developer responsibilities
● Developers get ownership of the entire dev cycle
● Massive empowerment of dev teams -> OWNERSHIP
● Manage resources
  - VMs: create & scale computing pools for dev/test/prod
  - DBs and services: choose the support systems required and create them
  - Networking: create rules and load balancers to route traffic to the application
● Develop
  - Code: choose your technology and keep your GitHub repository
  - Test: create tests, regressions or CI as needed
  - Deploy: write all routines for automatically deploying your app on any VM
● Ensure quality
  - Define uptime: define what “up” means for your own app (health.sh)
  - Measure: create metrics to analyze performance and downtime
● React
  - React to critical events that affect your app
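The deck says each team defines what “up” means in its own health.sh, but does not show one. As a minimal sketch of the idea, here it is in Python rather than shell; the `/ping` endpoint on port 8080, the 500 ms latency budget and the exit-code convention (0 = up, 1 = down) are all hypothetical choices, not MercadoLivre's.

```python
import time
import urllib.error
import urllib.request

# Hypothetical endpoint and latency budget: each team picks its own.
HEALTH_URL = "http://localhost:8080/ping"
MAX_LATENCY_MS = 500.0


def is_healthy(status: int, latency_ms: float,
               max_latency_ms: float = MAX_LATENCY_MS) -> bool:
    """This app's definition of "up": HTTP 200 within the latency budget."""
    return status == 200 and latency_ms <= max_latency_ms


def probe(url: str = HEALTH_URL, timeout_s: float = 2.0) -> tuple[int, float]:
    """Return (HTTP status, latency in ms); status 0 means no answer at all."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        status = err.code
    except OSError:  # connection refused, DNS failure, timeout, ...
        status = 0
    return status, (time.monotonic() - start) * 1000.0


def main() -> int:
    """Exit status for the health script: 0 = up, 1 = down."""
    status, latency_ms = probe()
    return 0 if is_healthy(status, latency_ms) else 1
```

Run as a script, it would end with `sys.exit(main())`. Keeping the decision in a pure `is_healthy` function makes the team's definition of “up” easy to read and to test separately from the network call.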
DevTools in ML
What the developer handles in each tool:
● Melicloud API
  - Create apps
  - Manage pools (test/prod)
  - Manage VMs & load balancers
  - Build & deploy
  - Create queues
  - Create DBaaS or KVSaaS
  - Create caches
● GitHub repo
  - Code the app
  - Write the test & deploy strategy
  - Write uptime definitions
● Nginx
  - Write rules to route traffic to your pools
● eventRouting & OpsGenie
  - Write rules to manage alarms
  - Define alarm escalation policies & schedules
  - Manage contact channels
Microservices in ML
Diagram: mobile apps are split into modules, each owned by a team with its own repo, test app and CI; the main app has automated build & store deployment
Monitoring mobile apps
Diagram: the main app and its modules, each module owned by a team, with crash reporting reaching the teams
Monitoring the cloud
New Relic
● Default monitoring in the VMs’ golden image
● No configuration necessary (initially)
HTTP errors
  - Unhandled errors: unexpected issues that appear in production
  - Unsupported params: see if other devs/clients misuse your entry params
  - Other services: detect down services affecting you
Stack traces
  - Fast debugging: see what’s going on in production
  - Unified pool data: all instances’ traces in the same place
Performance metrics
  - Transaction traces: see what’s taking so long
  - Recognize deviations: graphs showing whether traffic or response time varies with respect to another period
  - Apdex score
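The Apdex score condenses response times into one number between 0 and 1 against a threshold T: responses at or under T are “satisfied”, those up to 4T are “tolerating”, slower ones are “frustrated”, and Apdex_T = (satisfied + tolerating / 2) / total. A minimal sketch; the `(response_time, is_error)` sample format is an assumption, and treating errored requests as frustrated follows New Relic's usual accounting.

```python
from typing import Iterable, Tuple


def apdex(samples: Iterable[Tuple[float, bool]], t: float) -> float:
    """Apdex_T = (satisfied + tolerating / 2) / total.

    samples: (response_time_seconds, is_error) pairs; t: threshold T in seconds.
    """
    satisfied = tolerating = total = 0
    for response_time, is_error in samples:
        total += 1
        if is_error:
            continue  # errors count as frustrated: only the total grows
        if response_time <= t:
            satisfied += 1
        elif response_time <= 4 * t:
            tolerating += 1
        # slower than 4T: frustrated, only the total grows
    if total == 0:
        raise ValueError("Apdex is undefined without samples")
    return (satisfied + tolerating / 2) / total
```

For example, with T = 0.5 s and samples `[(0.2, False), (0.4, False), (1.0, False), (1.9, False), (3.0, False), (0.1, True)]`, there are 2 satisfied, 2 tolerating and 2 frustrated requests, so the score is (2 + 1) / 6 = 0.5.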
Datadog
● Easy to use from different frameworks
● Good for business-specific metrics
Custom metrics
  - Complex metrics: graphs filtered by different dimensions
Infra monitoring
  - Full info: more data than New Relic on disk, memory and network
  - Scalable: handles aggregating information from many different VMs well
Real-time analysis
  - Fast response: almost no latency
  - Dashboards: customizable dashboards showing what is most relevant for each app
  - Online filtering
Alarms
  - Flexible alarms based on custom metrics
  - You can send multiple parameters for events
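Custom business metrics usually reach Datadog through the DogStatsD agent listening on each VM, using a small text format: `name:value|type`, an optional `|@sample_rate`, and optional `|#key:value,...` tags that become the dimensions graphs and alarms filter on. A minimal sketch that builds and sends such datagrams by hand; the metric names and tags are hypothetical, and in practice a client library would normally do this.

```python
import socket
from typing import Mapping, Optional


def dogstatsd_line(name: str, value: float, metric_type: str = "c",
                   tags: Optional[Mapping[str, str]] = None,
                   sample_rate: float = 1.0) -> str:
    """Build one DogStatsD datagram: name:value|type|@rate|#key:val,..."""
    # metric_type: c = counter, g = gauge, ms = timer, h = histogram
    line = f"{name}:{value}|{metric_type}"
    if sample_rate != 1.0:
        line += f"|@{sample_rate}"
    if tags:  # tags are the dimensions graphs and alarms can filter on
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to the local agent (8125 is its default port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

`dogstatsd_line("checkout.payments", 1, tags={"site": "MLA", "method": "credit_card"})` yields `checkout.payments:1|c|#method:credit_card,site:MLA`. UDP keeps the app from ever blocking on monitoring: if the agent is down, the metric is simply lost.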
Log collection
● Logs are collected by an agent on every VM
● They are sent to Elasticsearch
● They are accessed through a Kibana frontend
● Developers can use a special syntax to create queryable dimensions for every logged event
● All instances’ logs in the same place
● Request tracing across multiple applications/APIs (request_id)
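Tracing one request across several apps works if every app logs the same `request_id` and forwards it on every downstream call. A minimal sketch, assuming JSON log lines (so the collecting agent and Elasticsearch can index each field) and a hypothetical `X-Request-Id` header; the deck does not say which header or log syntax MercadoLivre used.

```python
import contextvars
import json
import logging
import uuid
from typing import Dict, Mapping

# Hypothetical header name: the deck only says requests carry a request_id.
REQUEST_ID_HEADER = "X-Request-Id"
_request_id = contextvars.ContextVar("request_id", default="-")


def start_request(incoming_headers: Mapping[str, str]) -> str:
    """At the edge, mint a new id; downstream, reuse the caller's."""
    rid = incoming_headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    _request_id.set(rid)
    return rid


def outgoing_headers() -> Dict[str, str]:
    """Attach to every call to another app/API so the id follows the request."""
    return {REQUEST_ID_HEADER: _request_id.get()}


class JsonFormatter(logging.Formatter):
    """One JSON object per line, so the log pipeline can index each field."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": _request_id.get(),
        }
        # Extra queryable dimensions: logger.info(..., extra={"dims": {...}})
        event.update(getattr(record, "dims", {}))
        return json.dumps(event)
```

With a handler using `JsonFormatter`, `logger.info("payment approved", extra={"dims": {"site": "MLA"}})` produces a line that can be queried by `request_id` or `site` in Kibana. Real HTTP header lookups should be case-insensitive; a plain dict is used here only to keep the sketch short.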
Alarms
Unified handling of events
Diagram: health.sh and code-triggered alarms both feed into eventRouting
Event routing
● Rules added by each team
● Check alarm origin, type and importance
● Check “quiet hours”
● Assign escalation policy and forward to OpsGenie
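The routing steps above can be sketched as a small rule engine. Everything concrete here is an assumption for illustration: the 1 (critical) to 5 (informational) importance scale, the rule shape, first-match-wins ordering, and the choice that only critical events get through during quiet hours. Forwarding to OpsGenie is left out; the function just returns the escalation policy name.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Event:
    origin: str      # where it came from, e.g. "health.sh" or "datadog"
    kind: str        # alarm type, e.g. "app_down" or "high_latency"
    importance: int  # hypothetical scale: 1 = critical ... 5 = informational
    at: time         # local time at which the event fired


@dataclass
class Rule:
    origin: Optional[str]   # None matches any origin
    kind: Optional[str]     # None matches any type
    max_importance: int     # matches events this important or more
    escalation_policy: str  # policy name to forward to OpsGenie with


def in_quiet_hours(t: time, start: time, end: time) -> bool:
    """True if t is in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def route(event: Event, rules: List[Rule], quiet_start: time, quiet_end: time,
          critical: int = 1) -> Optional[str]:
    """Return the escalation policy for an event, or None to drop it."""
    # Assumption: during quiet hours only critical events get through.
    if event.importance > critical and in_quiet_hours(event.at, quiet_start, quiet_end):
        return None
    for rule in rules:  # the team's rules, first match wins
        if ((rule.origin is None or rule.origin == event.origin)
                and (rule.kind is None or rule.kind == event.kind)
                and event.importance <= rule.max_importance):
            return rule.escalation_policy
    return None
```

Ordering rules from most to least specific lets a team page on-call staff for `app_down` from health.sh while sending everything else to a business-hours policy.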
OpsGenie
● Manage teams to deal with escalation policies
● Set “on call” schedules (with substitutes and manager escalation)
● Everyone manages their own contact methods (SMS, mail, phone call, app)
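OpsGenie handles schedules and escalation itself; the sketch below only models the two ideas on the slide so they are concrete, and does not use OpsGenie's API. The weekly shift, the substitutes mapping, the 15-minute escalation step and all names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def on_call(rotation: List[str], rotation_start: datetime, at: datetime,
            shift: timedelta = timedelta(weeks=1),
            substitutes: Optional[Dict[str, str]] = None) -> str:
    """Who holds the pager at `at`, in a fixed rotation of equal-length shifts."""
    index = ((at - rotation_start) // shift) % len(rotation)
    person = rotation[index]
    # A substitute covers for someone who is away (vacation, sick leave, ...).
    return (substitutes or {}).get(person, person)


def escalation_target(chain: List[str], minutes_unacked: int,
                      step_minutes: int = 15) -> str:
    """Notify the next level each time `step_minutes` pass without an ack."""
    level = min(minutes_unacked // step_minutes, len(chain) - 1)
    return chain[level]
```

With a rotation of `["ana", "bruno", "carla"]` starting on a Monday, the person on call 8 days later is `bruno`, or his substitute if one is set. With the chain `["bruno", "carla", "manager"]`, an alert still unacknowledged after 20 minutes goes to `carla` and, from 30 minutes on, to the manager.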
Fury
Evolution
Old world -> Ground zero -> Fury
Fury: DevOps to NoOps
● Still microservices
● Fully service-oriented
● Easier dev cycle and learning curve
● Pre-assembled flavors for popular frameworks
● Fewer bash scripts, more UI-based configuration
● Auto-scaling & auto-healing
● Docker-based (smaller gap between dev and prod environments)
● Designed to run on AWS
● Continuous integration already included
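The deck does not say how Fury decides when to scale or heal. As an assumption-labeled sketch only: a proportional rule, similar in spirit to target tracking, that resizes a pool so average CPU lands near a target within min/max bounds, plus a healing pass that replaces any instance failing its health check. The CPU metric, target and bounds are hypothetical.

```python
import math
from typing import Dict, List


def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Size the pool so average CPU would land near the target, within bounds."""
    if current <= 0:
        return min_size
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, wanted))


def instances_to_replace(healthy: Dict[str, bool]) -> List[str]:
    """Auto-healing: any instance failing its health check gets replaced."""
    return sorted(instance for instance, ok in healthy.items() if not ok)
```

For instance, 4 instances at 90% average CPU with a 60% target give `desired_capacity(4, 90, 60, 2, 10) == 6`. Rounding up errs on the side of spare capacity, and the bounds stop a metric spike from scaling the pool without limit.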
Fury dashboard
Dev Cycle in Fury: create app
● Creates the repository
● Creates a Jenkins CI server
● Creates the network infrastructure
Dev Cycle in Fury: create scope
● Creates a load balancer (ELB)
● Creates an auto scaling group (ASG) for the scope’s instances
● Creates instances
● Initializes logs & metrics services
● Downloads containers to instances
● Starts traffic
Dev Cycle in Fury: deploy
● Creates an ASG for the new version
● Creates instances for the new ASG
● Initializes logs & metrics services
● Downloads containers to instances
● Progressive traffic switch
● If the candidate is OK, destroys the previous infrastructure
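The deploy steps above can be sketched as a loop that shifts traffic to the new ASG in stages, checks the candidate at each stage, and only destroys the old infrastructure once the candidate carries all traffic. The traffic percentages and the health criterion are hypothetical, and the callables stand in for whatever actually adjusts load-balancer weights and tears down ASGs; no real AWS API is called.

```python
from typing import Callable, Sequence


def progressive_switch(set_candidate_weight: Callable[[int], None],
                       candidate_ok: Callable[[], bool],
                       destroy_previous: Callable[[], None],
                       rollback: Callable[[], None],
                       steps: Sequence[int] = (10, 25, 50, 100)) -> bool:
    """Shift traffic to the new version in steps; keep the old one until the end.

    Returns True if the candidate took all traffic and the previous
    infrastructure was destroyed, False if the deploy was rolled back.
    """
    for percent in steps:
        set_candidate_weight(percent)  # share of requests sent to the new ASG
        if not candidate_ok():         # e.g. health checks / error rate at this step
            set_candidate_weight(0)    # send all traffic back to the old version
            rollback()                 # e.g. tear down the candidate ASG
            return False
    destroy_previous()                 # candidate is OK: drop the old infrastructure
    return True
```

Because the previous ASG keeps running until the last step passes, a bad candidate is undone by moving traffic back rather than by redeploying the old version.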
Questions?
Thank you!