Publication date: 2011-1 Publisher: Southeast University Press Authors: John Allspaw (US), Jesse Robbins (US) Pages: 315
Tags: none
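Summary
A web application involves many specialists, but it is the web operations staff who must ensure that every part of the application works properly throughout its lifecycle. This is the expertise you need when a start-up is hit by an unexpected spike in traffic, or when a new feature causes a mature application to fail. In this collection of essays and interviews, web operations veterans Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer their insights into this rapidly changing field. You will also learn the secrets of keeping a website thriving, firsthand from the builders of some of the largest sites.
·Learn the skills of web operations, and why they come from experience rather than schooling
·Understand why it is important to gather metrics from both the application and the infrastructure
·Consider common approaches to database architecture and to the pitfalls that come with growing scale
·Learn how to handle the human factors involved in outages and degradations
·Find ways to avoid disaster after a huge flood of traffic
·Work out what went wrong after a problem occurs, and prevent it from happening again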
About the Authors
Authors: John Allspaw (US), Jesse Robbins (US)
Table of Contents
foreword
preface
1 web operations: the career
theo schlossnagle
why does web operations have it tough?
from apprentice to master
conclusion
2 how picnik uses cloud computing: lessons learned
justin huff
where the cloud fits (and why!)
where the cloud doesn't fit (for picnik)
conclusion
3 infrastructure and application metrics
john allspaw, with matt massie
time resolution and retention concerns
locality of metrics collection and storage
layers of metrics
providing context for anomaly detection and alerts
log lines are metrics, too
correlation with change management and incident timelines
making metrics available to your alerting mechanisms
using metrics to guide load-feedback mechanisms
a metrics collection system, illustrated: ganglia
conclusion
4 continuous deployment
eric ries
small batches mean faster feedback
small batches mean problems are instantly localized
small batches reduce risk
small batches reduce overhead
the quality defenders' lament
getting started
continuous deployment is for mission-critical applications
conclusion
5 infrastructure as code
adam jacob
service-oriented architecture
conclusion
6 monitoring
patrick debois
story: "the start of a journey"
step 1: understand what you are monitoring
step 2: understand normal behavior
step 3: be prepared and learn
conclusion
7 how complex systems fail
john allspaw and richard cook
how complex systems fail
further reading
8 community management and web operations
heather champ and john allspaw
9 dealing with unexpected traffic spikes
brian moon
how it all started
alarms abound
putting out the fire
surviving the weekend
preparing for the future
cdn to the rescue
proxy servers
corralling the stampede
streamlining the codebase
how do we know it works?
the real test
lessons learned
improvements since then
10 dev and ops collaboration and cooperation
paul hammond
deployment
shared, open infrastructure
trust
on-call developers
avoiding blame
conclusion
11 how your visitors feel: user-facing metrics
alistair croll and sean power
why collect user-facing metrics?
what makes a site slow?
measuring delay
building an sla
visitor outcomes: analytics
other metrics marketing cares about
how user experience affects web ops
the future of web monitoring
conclusion
12 relational database strategy and tactics for the web
baron schwartz
requirements for web databases
how typical web databases grow
the yearning for a cluster
database strategy
database tactics
conclusion
13 how to make failure beautiful: the art and science of postmortems
jake loomis
the worst postmortem
what is a postmortem?
when to conduct a postmortem
who to invite to a postmortem
running a postmortem
postmortem follow-up
conclusion
14 storage
anoop nagwani
data asset inventory
data protection
capacity planning
storage sizing
operations
conclusion
15 nonrelational databases
eric florenzano
nosql database overview
some systems in detail
conclusion
16 agile infrastructure
andrew clay shafer
agile infrastructure
so, what's the problem?
communities of interest and practice
trading zones and apologies
conclusion
17 things that go bump in the night (and how to sleep through them)
mike christian
definitions
how many 9s?
impact duration versus incident duration
datacenter footprint
gradual failures
trust nobody
failover testing
monitoring and history of patterns
getting a good night's sleep
contributors
index
Chapter Excerpt
Copyright page: Illustration:
capacity planning needs, the daily resolution is fine. Adding higher resolution more than once per day wouldn't change any of the results and would only increase the amount of time it would take to run reports or make it a pain to move the data around. Gathering these metrics once a day can be as simple as a nightly cron job working on a replicated slave database kept solely for crunching these numbers. Because we store these metrics in a database, being able to manipulate or correlate data across different metrics is pretty straightforward, because the date is held constant across metrics.
For example, it might not be a surprise that during the holiday season, the average size of photo uploads increases significantly compared to the rest of the year, because of the new digital cameras being given as gifts during that time. Because we have those values, we can lay out others on the same dates. Then, it's not difficult to see how average upload size can increase disk space consumption (because the original sizes are larger), which can increase Flickr Pro subscriptions (because the limits are extended, compared to free accounts).
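The idea of storing one value per metric per day, and then lining metrics up on the shared date, can be sketched roughly as follows. This is a minimal illustration and not the system the excerpt describes: the `daily_metrics` table, the function names, the metric names, and the figures in the usage note are all hypothetical, and an in-memory SQLite database stands in for the replicated reporting database.

```python
import sqlite3


def make_store():
    """Create a hypothetical store holding one row per (day, metric name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE daily_metrics (
            day   TEXT NOT NULL,   -- ISO date, e.g. '2010-12-25'
            name  TEXT NOT NULL,   -- metric name (assumed naming scheme)
            value REAL NOT NULL,
            PRIMARY KEY (day, name)
        )
        """
    )
    return conn


def record_daily_metric(conn, day, name, value):
    """What a nightly job might do: store one value per metric per day."""
    conn.execute(
        "INSERT OR REPLACE INTO daily_metrics (day, name, value) VALUES (?, ?, ?)",
        (day, name, value),
    )


def correlate(conn, metric_a, metric_b):
    """Lay two metrics side by side on the dates where both have a value."""
    rows = conn.execute(
        """
        SELECT a.day, a.value, b.value
        FROM daily_metrics AS a
        JOIN daily_metrics AS b ON a.day = b.day
        WHERE a.name = ? AND b.name = ?
        ORDER BY a.day
        """,
        (metric_a, metric_b),
    )
    return rows.fetchall()
```

A nightly cron entry could call `record_daily_metric` once per metric. For instance, after recording made-up values for `avg_upload_bytes` and `disk_used_bytes` on the same dates, `correlate(conn, "avg_upload_bytes", "disk_used_bytes")` returns one `(day, upload, disk)` tuple per shared date, which is the "date held constant across metrics" property the excerpt relies on.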