
Facebook and the limits of DIY distributed systems

“Yesterday, as a result of a server configuration change, many people had trouble accessing our apps and services.”

Facebook’s explanation of its 14-hour outage last week sounds simple enough, but it very possibly belies an intricate series of failures across an incredibly complex infrastructure that spans data centers around the world. Fourteen hours is an awfully long time for a company whose systems are more or less designed to maximize uptime, and that employs some of the smartest software engineers on the planet.

But Facebook is hardly alone in suffering lengthy outages caused by seemingly inconsequential things. Just about every large website, web company and cloud provider has been through the same thing, including AWS, Google, Microsoft and Apple. At their scale and with the complexity of their architectures—physical and software—all the automation and engineers in the world sometimes aren’t enough. One thing goes wrong, and it cascades.

This is one of the reasons why some people have a difficult time understanding, or at least accepting, the rush toward microservices architectures and all things Kubernetes. As the saying goes, “Shit happens.” When it does, it’s probably easier to debug a relatively simple monolith than to track down the cause across a collection of interconnected microservices running on ever-changing infrastructure.

That being said, when a company’s software footprint, user count and ambitions reach a certain scale, as they do at nearly any large enterprise, microservices (done right) are almost certainly the right option for bringing order and agility to its IT organization. Depending on its application portfolio, Kubernetes might be, too. Companies like Facebook and Google don’t operate globally distributed systems and build the tools they build because they want to; they do it because they have to.

Of course, there are also business benefits to these types of architectures when they’re done well. Google’s just-announced streaming gaming service is perhaps an extreme example, but the software engineering culture and technologies the company has put in place do help it jump on new digital opportunities when it sees them.

However, the trick for most mainstream enterprises is taking advantage of the architectural lessons large web companies have taught the world (and the software they’ve developed) without taking on their do-it-yourself and/or not-invented-here attitudes. Finding the budget, the people and, frankly, the institutional DNA to tackle every part of enterprise IT is hard work (thus the upcoming PagerDuty IPO). For example, standing up a Kubernetes cluster might be easy enough; operating it and all the complementary components at any reasonable scale, security level, etc., can prove to be a different story.

That’s why there’s a raging debate over open source licensing happening right now: the gist of the argument is who has the right to serve enterprise customers with commercial versions of popular projects.

The great message of Amazon CTO Werner Vogels in the early days of cloud computing was that companies shouldn’t invest in “undifferentiated heavy lifting,” by which he meant managing data centers and provisioning servers. The message seems to have resonated (if the success of AWS and its peers is any indicator), but the heavy lifting has now shifted to operating complex data center software and application architectures. Technologies like Kubernetes (or Hadoop or OpenStack before that) might not cost anything to install, but that’s where the free lunch ends.

Perhaps the rash of recent outages at webscale services, including Facebook, will be a useful reminder for enterprises to not fall into that old trap.


How CEOs can drive digital innovation without learning to code

You can't take security too seriously

Norsk Hydro ransomware attack is ‘severe’ but all too common (Bloomberg): These stories should be terrifying, and hopefully inspiring, as well. As in, they should inspire companies to get proactive on automating security tasks where possible, and probably on training staff to be safe.

Marriott CEO shares post-mortem on last year’s hack (ZDNet): There’s a lot of blame to go around when talking about a years-long infiltration, but the major takeaway is to be aggressive about identifying breaches and closing avenues of attack.

Box.com’s good ambitions take a wrong turn (ITPro Today): No evidence of an actual breach, but this is still a good example of how simple things (like access settings and file names) can expose sensitive data.

Applications are the future of AI ...

HR departments turn to AI-enabled recruiting in race for talent (Wall Street Journal): This is still not a great idea for a lot of roles, but the desire for automation is understandable. And where things can be sorted by features, there will be machine learning.

Salesforce update brings AI and Quip to customer chat experience (TechCrunch): This speaks to the broader point about not doing it yourself if you don’t need to. If CRM is an important function, let Salesforce invest in building AI capabilities in that area.

Cisco taps AI to make conference calls more productive (Bloomberg): This particular idea might not set the world on fire, but it’s a step in the right direction because there definitely are ways to make conference calls more productive with AI.

... because building internal AI teams is hard

Artificial intelligence progress gets gummed up in silos and cultural issues (ZDNet): Let’s just call this the story of everything in technology for the past couple of decades. Cloud, AI, microservices, data: they’re all limited in effectiveness by poor culture and walled-off data.

The AI roles some companies forget to fill (Harvard Business Review): This is a good list, largely focused around managing data and data systems. “Big data” might not have panned out as planned, but its lessons around getting data in order should be heeded.

Cleveland Clinic launches the Center for Clinical Artificial Intelligence (Business Insider): The big word of caution on projects like this is jumping into “centers of excellence” without having the tech side sorted out. That can lead to good ideas dying on the vine.


Cloud complexity report (Dynatrace)

The 451 take on cloud-native: truly transformative for enterprise IT (451 Group) (no registration required)

Accelerate State of DevOps survey (DORA / Google)

8 reasons more CEOs will be fired over cybersecurity incidents (Gartner) 

Five questions to answer for your first AI project (Gartner)

Toolkit: Strategic industry maps of AI use cases (Gartner)

Scaling agile: Can the Spotify approach work for you? (Forrester)

Assess the pain-gain tradeoff of multicloud strategies (Forrester)




Pivotal, and the Pivotal logo are registered trademarks or trademarks of Pivotal Software, Inc. in the United States and other countries. All other trademarks used herein are the property of their respective owners.
© 2019 Pivotal Software, Inc. All rights reserved. 875 Howard Street, Fifth Floor, San Francisco, CA 94103. Published in the USA.