SRE in Practice (SRE实战)

Book Highlights

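  SRE in Practice (reprint edition, in English) is a survival guide for software developers facing catastrophic website failures. As companies strive to maximize uptime, Site Reliability Engineering (SRE) is on the front line. When your site goes down and a fix is urgently needed, this book can serve as a hands-on, step-by-step guide.
  Nat Welch's extensive practical experience in reliability engineering comes from large internet companies that are highly sensitive to outages. His methods for monitoring modern web services, setting up alerts, and evaluating incident response have been tested in practice, and learning them will serve you well.
  The book does more than teach you how to respond to disasters. It also covers the tools and strategies needed to test and release software safely, plan for long-term growth, and anticipate future bottlenecks. With it, you will learn how to build your own robust plan of action, so that you can demonstrate your value during a company-wide website crisis.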

Table of Contents

Preface

Chapter 1: Introduction
  A brief history
  What is SRE?
  What is in the book?
  SRE as a framework for new projects
  Summary
  References

Chapter 2: Monitoring
  Why monitoring?
  Instrumenting an application
  What should we measure?
  A short introduction to SLIs, SLOs, and error budgets
  Service levels
  Error budgets
  Collecting and saving monitoring data
  Polling applications
  Nagios
  Prometheus
  Cacti
  Sensu
  Push applications
  StatsD
  Telegraf
  ELK
  Displaying monitoring information
  Arbitrary queries
  Graphs
  Dashboards
  Chatbots
  Managing and maintaining monitoring data
  Communicating about monitoring
  Do they even know there is monitoring?
  References and related reading
  Future reading
  Summary

Chapter 3: Incident Response
  What is an incident?
  What is incident response?
  Alerting
  When do you alert?
  How do you alert?
  Alerting services
  What is in an alert?
  Who do you alert?
  Being on call
  Communication
  Incident Command System (ICS)
  Where do you communicate?
  Recovering the system
  Calling all clear
  Summary

Chapter 4: Postmortems
  What is a postmortem?
  Why write a postmortem?
  When to write a postmortem document
  Carrying out incident analysis
  How to write a postmortem document
  Summary
  Impact
  Timeline
  Root cause
  Action items
  Postmortems without action items
  Appendix
  Blameless postmortems
  Holding a postmortem meeting
  Analyzing past postmortems
  MTFR and MTBF
  Alert fatigue
  Discussing past outages
  Summary
  References

Chapter 5: Testing and Releasing
  Testing
  What do you test?
  Testing code
  Testing infrastructure
  Testing processes
  Releasing
  When to release
  Releasing to production
  Validating your release
  Rollbacks
  Automation
  Continuous everything
  Summary

Chapter 6: Capacity Planning
  A quick introduction to business finance
  Why plan?
  Managing risk and managing expectations
  Defining a plan
  What is our current capacity?
  When are we going to run out of capacity?
  How should we change our capacity?
  State and concurrency
  Is your service limited by another service?
  Scaling for events
  Unpredictable growth - user-generated content
  Preplanned versus autoscaling
  Delivering
  Execute the plan
  Architecture - where performance changes come from
  Tech as a profit center and procurement
  Summary

Chapter 7: Building Tools
  Finding projects
  Defining projects
  RDD
  Example
  Design documents
  Planning projects
  Example
  Retrospectives and standups
  Allocation
  Building projects
  Advice for writing code
  Separation of concerns
  Long-term work
  Example OKRs
  Notebooks
  Documenting and maintaining projects
  Summary

Chapter 8: User Experience
  An introduction to design and UX
  Real-world interaction design
  User testing
  Picking an experience
  Designing the test
  Finding people to test
  Developer experience
  Experience of tools
  Performance budgets
  Security
  Authentication
  Authorization
  Risk profile
  Phishing
  ACM code of ethics
  Summary
  References

Chapter 9: Networking Foundations
  The internet
  Sending an HTTP request
  DNS
  dig
  Ethernet and TCP/IP
  Ethernet
  IP
  CIDR notation
  ICMP
  UDP
  TCP
  HTTP
  curl and wget
  Tools for watching the network
  netstat
  nc
  tcpdump
  Summary

Chapter 10: Linux and Cloud Foundations
  Linux fundamentals
  Everything is a file
  Files, directories, and inodes
  Sockets
  Devices
  /proc
  Filesystem layout
  What is a process?
  Zombies
  Orphans
  What is nice?
  syscalls
  How to trace
  Watching processes
  Build your own
  Cloud fundamentals
  VMs
  Containers
  Load balancing
  Autoscaling
  Storage
  Queues and Pub/Sub
  Units of scale
  Example architecture interview
  Summary
  References

Other Books You May Enjoy
Index

Title: SRE in Practice (SRE实战)

Author: Nat Welch

Pages: 10, 323

List price: ¥96.0

Publisher: Southeast University Press (东南大学出版社)

Publication date: 2019-03-01

ISBN: 9787564182939

PDF e-book size: 124 MB (complete high-resolution scan)

Baidu Cloud download: http://www.chendianrong.com/pdf
