I hope this won’t be counted as some form of self-promotion, even though I am sharing a post from my own blog.
As a tech worker who works in a Cloud shop, I wanted to elaborate the many reasons why I find working with Clouds terrible, from multiple points of view.
I tried to organize my thoughts in a (relatively long) post, in which both technical aspects and political aspects (which are very related) are covered.
I am sure many people will have different perspectives, and this could be potentially also a nice prompt for a discussion.
How could I miss the opportunity to use this picture!
It definitely felt like that at times.
Haha, I couldn’t resist. Great post though.
I was really hoping this was going to be a rant about clouds in the sky.
As someone who burns with the faintest touch of stellar light, I love clouds in the sky.
Haha that’s great. Took me a second to notice.
So the whole thing is well worth a read IMO, and addresses a lot of the issues I have with cloud as the solution for everything.
My main point here is that individuals and organizations that require all the flexibility that cloud services offer are a (tiny) minority. This means that for the majority of us, all the complexity necessary to provide this flexibility ends up being purely a complication or worse, a liability.
There are absolutely companies who need the scaling. But it’s a fucking lot of overhead if you don’t.
Let’s repeat it one more time: complexity hides and creates security issues.
This is similar to all the LLM code stuff. If you don’t actually fully understand what your code does, bad stuff happens.
This premise has the consequence that Cloud systems are a big puzzle. The pieces of the puzzle are the Cloud products. Engineers working with Cloud systems essentially need to understand the abstraction but not necessarily the underlying, ultimate working mechanism of what those abstractions do. For example, a cloud expert might know everything about the difference between NACLs and Security Groups, all the details about how to configure them, their limitations etc., but the main idea is that such expert doesn’t need to know anything below that (e.g., how the traffic is filtered).
Ultimately my perspective, and I appreciate it’s a very personal one, is that building and working with the Cloud makes me feel like a glorified application administrator. My job becomes researching how the Cloud solved the problem that I need to solve, and compose the solution in the way the Cloud provider imagined it should be solved, rather than solving the problem
I was going to bring up basically this point:
because vendor-lock is not something that has only to do with infrastructure. It has also to do with the skills of the engineers involved. Cloud knowledge, for the most part, is not portable. You are a wizard of IAM policies in GCP? Good job, this is completely useless if you go to Azure. Oh, you are a guru of VPCs and private endpoints? Well done, this is completely useless if you move to a different cloud.
But you covered it pretty well. Abstractions are great. Proprietary abstractions that are more focused on how they can bill you than real, useful, functional categories? Not so much.
Despite the efforts means something which is ironic: many companies which run on Cloud, at some point, will have one or more teams whose main purpose is understanding how they are spending money in the Cloud and to reduce those costs. If this sounds conflicting with the idea of reducing personnel, well, it is. The digital infrastructure of my organization is not that huge. Give or take 2000 compute instances (some very small). Something that 200 servers could easily provide. Cloud bills are more than $15 millions/year. I checked a server builder for example, and an absolute beast (something like 2x Xeon platinum processor, 200TB of NVME disks, 1TB of RAM etc.) would still stay comfortably under $250k. 100 servers this powerful will probably be a multiple of our computing power, and cost almost a third if we consider a lifetime of 3 years, which is very low. A more realistic estimation of 5 years leads to a saving of ~$50 millions over 5 years. Completely insane! This is of course if you want to buy hardware. Powerful servers rented run you for $500-1000/month. Assuming a cost of $1000/month, my company could rent more than 1000 powerful servers, and still save money compared to Cloud costs, leaving plenty for additional services such as networking, storage, premium support (remote hands) or actual engineers salary
So there’s a level of rent seeking behind all the software moving to subscriptions, and them wanting to lock you in just like their service providers are doing to them. But I have to think the massive costs of cloud junk also pay a role in stuff like a calendar charging double digit annual fees for something that takes very little storage and very little computation (and you of course can’t just buy software any more).
I have no words for multi-cloud. Even like a Facebook or YouTube scale site, are you really going to double (or more for some reason?) your storage costs (plus whatever intercommunication between the two), just in case the provider goes down for a couple hours (which is extremely rare, and you won’t be the only site impacted, so people won’t really blame you for.) Plus that architecture sounds like the shitshow to end all shitshows.
Agreed on it all.
I think a big driver for cloud clients is bean counters - cloud is an expense, while having your own systems is capital investment.
They’d rather have the waste of leasing too much compute than have to pay taxes on systems plus the cost of staff to run it.
We won’t really see this get addressed until companies have to truly own the risks they take on (see all the hacks that happen on a daily basis because CIO won’t pay for the security that IT management is screaming to build). When fines for these breaches are meaningful, cloud will be less interesting.
Thanks!
But I have to think the massive costs of cloud junk also pay a role in stuff like a calendar charging double digit annual fees for something that takes very little storage and very little computation (and you of course can’t just buy software any more).
Absolutely agree. I did not even think about this aspect, but I think you are absolutely spot on. Building something with huge costs is something that ultimately gets passed to the users in addition to the rent-seeking aspect.
I have no words for multi-cloud.
You and me both. I have to work with it and the reality is, there is nobody who actually understands the whole thing. The level of complexity (and fragility, I might add) of it all is astonishing. And all of this to mitigate some (honestly) low risk of downtime from the cloud provider. I have lobbied a little bit against at work, but ultimately it has become a marketing tool to sell to customers, so goodbye any hope of rational evaluation…
It’s all shits and giggles until a network config takes down your cloud provider for 11 hours and you can’t even look at the logs. And multicloud is quite robust if done right, more so than a single cloud, if your setup is fragile someone is not doing their job right.
Complexity brings fragility. It’s not about doing the job right, is that “right” means having to deal with a level of complexity, a so high number of moving parts and configuration options, that the bar is set very high.
Also, I would argue that a large number of organizations don’t actually need the resilience that they pay a very high price for.
Complexity in this case should bring redundancy, not fragility. You are adding components in parallel, not in series, thus reducing fragility.
A raid 5 is more complex than a single drive, but it’s less fragile.
I wish it worked like that, but I donct think it does. Connecting clouds means introducing many complex problems. Data synchronization and avoiding split-brain scenarios, a network setup way more complex, stateful storage that needs to take into account all the quirks and peculiarities of all services across all clouds, service accounts and permissions that need to be granted and segregated for all of them, and way more. You may gain resilience in some areas, but you introduce a lot more things that can fail, be misconfigured or compromised.
Plus, a complex setup makes it harder by definition to identify SPOFs, especially considering it’s very likely nobody in the workforce is going to be an expert in all the clouds in use.
To keep using your simile of the disks, a single disk with a backup might be a better solution for many people, considering you otherwise might need a RAID controller that can fail and all the knowledge to handle and manage a RAID array properly, in addition to paying 4 or 5 times the storage. Obviously this is just to make a point, I don’t actually think that RAID 5 vs JBOD introduces comparable complexity compared to what multi-cloud architecture does to single-cloud.
Split brain are easily solved, there’s of the shelf solutions and if you have some custom code you can use plenty of well researched solutions, for instance raft. Putting bizantine fault in Google scholar yields thousands of papers,if you want something fancier.
Same for most problems you mentioned, they were an issue 10 years ago, nowadays you can federate, abstract or outsource most of it.
Making it harder to identify SPFOs doesn’t increase fragility. If you whole system a single instance it’s trivial to identify (the whole thing) but very brittle.
Of course the problem is solved, but that doesn’t mean that the solution is easy. Also, distributed protocols still need to work on top of a complicated network and with real-life constraints in terms of performances (to list a few). A bug, misconfiguration, oversight and you have a problem.
Just to make an example, I remember a Kafka cluster with 5 replicas completely shitting its pants for 6h to rebalance data during a planned maintenance where one node was brought offline. It caused one of the longest outages to date with the websites which relied on it offline. Was it our fault? Was it a misconfiguration? A bug? It doesn’t matter, it’s a complex system which was implemented and probably something was missed.
Technology is implemented by people, complexity increased the chances of mistakes, not sure this can be argued.
Making it harder to identify SPOF means you might miss your SPOF, and that means having liabilities, and having anyway scenarios where your system can crash, in addition for paying quite a lot to build a resilience that you don’t achieve.
A single instance with 2 failure scenarios (disk failure and network failure) - to make an example - is not more fragile than a distributed system with 20 failure scenarios. Failure scenarios and SPOF can have compensating controls and be mitigated successfully. A complex system where these can’t be fully identified can’t have compensating control and residual risk might be much harder. So yes, a single disk can fail more likely than 3 disks at once, but this doesn’t give the whole picture.
I’m sorry, but this started like a recipe article and I lost all interest. I don’t care about your life story, I clicked the link to read your opinions, and you spent the first several paragraphs avoiding them.
Nothing to be sorry for. I didn’t write for you nor for any particular individual, and it’s fair if you are not interested in it. I also added a table of content at the beginning, so you can jump directly to the relevant section (Technical Side) skipping the (in my opinion needed) introduction completely, if you wish. Cheers
Two brief paragraphs of light nonsense on a blog post, then a quick summary of what the article will cover?
Tell me you don’t read often without telling me you don’t read often:
The cloud is just someone else’s computer
With a lot of stuff on top!
I hate websites with low contrast text.
How do you get this? Anything that tries to force a light mode?
This is how the site is supposed to look like (there is no light/dark theme selection):
I was reading the site on Android, and it looked dark, but after seeing this comment, I tried disabling Android system wide dark mode, and sure enough it became white like in the screenshot! For the record, I tried with both Firefox and a Chromium-based browser.
Thanks! I went and tried on my phone and indeed setting Firefox to light mode indeed causes that horrendous and unreadable result. I will need to figure out way, eventually, and provide an alternative light scheme.
I get the same white background on Windows, Chromium and Firefox. Checking settings, I see FF is set to “Automatic” light/dark mode. When I manually select Dark mode, I see the dark background.
I will have a look if there is something that suggests how to “make” a light theme. Thanks for the info!
Thanks for the feedback, and same to @ilmagico@lemmy.world and @jg1i@lemmy.world. I fixed the configuration of the site and now the site should be readable even in light mode.
You’re welcome! And yes, I can confirm it works in light mode as well :)
That’s an interesting gotcha.
Yes, I hate cloud too. Now tell this to my company, which received about 100k dollar credits from Azure and Google Cloud :)
What do you mean by “promotion”? A discount? Credits to get started?
Yeah credits makes more sense 👍
Oh yeah, I know that that’s a thing. It’s a practice not too different from the stereotypical drug dealer who gets you hooked on free drugs. In this case the idea is that if you start there, you get vendor-locked and you will have to pay that amount many times over. I understand the appeal from the company perspective, though.
Yes absolutely true. For example, GKE looks very nice, but when we use one of their features, it creates the need to use other features too. That’s why I warned the boss a lot. Even though they have great features, we try to use generic applications to avoid hooks.
I hope they don’t take the credit back :)
How do u feel about cotton candy
Pink or Blue?
Pink of course
Anything that requires a fancy buzzword is usually stupid but a good way to make money for someone. The “cloud” has always existed as offsite hosting. Off-site shared servers, VPSs, whatever. It’s no different than running CPanel on an LAMP VPS in 2003.
But calling it “the cloud” gave all the business majors a hard on and then the accounts department realized they could manipulate share pricing by reducing the amount of assets a company holds. It’s the same stupid reason many companies don’t own their corporate headquarters or remote centers. They lease the, even if from themselves through another holding. It looks better on paper so the share price goes up. It’s all mind boggling stupid.
The cloud today significantly different than the 2003 cpanel LAMP server. It’s a whole new landscape. Complex, highly-available architectures that cannot be replicated in an on-prem environment are easily built from code in minutes on AWS.
Those capabilities come with a steep learning curve on how to operate them in a secure and effective manor, but that’s always going to be the case in this industry. The people that can grow and learn will.
I’m fully aware of the few buzzword and marketing pitches that cloud hosting uses. I’m forced to use both GCP and AWS for different contracts and I’m good at it.
The real truth is that most websites and internet services do not need scale. They do not need all this crap. A Pentium 3 could host all the data for most of these businesses and services. You don’t need serverless lambda functions to handle an api when an actual endpoint does the same thing to pull some info. The few companies that need such distributed computing and power, will need a big on-site or off-site implementation. It makes sense for that sometimes. But most times, it doesn’t even then. You’re just outsourcing your engineering and paying a premium.
I have seen so many startups spin up cloud accounts costing thousands of dollars a month when they’re in “private beta stealth”. Literally a $500 laptop could host all of their services just as quickly with no monthly fee. But as long as the VCs are paying, just flush that cash down.
The costs are definitely a huge consideration and need to be optimized. A few years back we ran a POC of Open Shift in AWS that seemed to idle at like $3k/mo with barely anything running at all. That was a bad experiment. I could compare that to our new VMWare bill, which more than doubled this year following the Broadcom acquisition.
The products in AWS simplify costs into an opex model unlike anything that exists on prem and eliminate costly and time consuming hardware replacements. We just put in new load balancers recently because our previous ones were going EoL. They were a special model that ran us a about a half-mil for a few HA pairs including the pro services for installation assistance. How long will it take us to hit that amount using ALBs in AWS? What is the cost of the months that it took us to select the hardware, order, wait 90 days for delivery, rack-power-connect, configure with pro services, load hundreds of certs, gather testers, and run cutover meetings? What about the time spent patching for vulnerabilities? In 5-7 years it’ll be the same thing all over again.
Now think about having to do all of the above for routers, switches, firewalls, VM infra, storage, HVAC, carrier circuits, power, fire suppression.
I’m immensely disappointed!
Not kidding: when I first saw the post title, I was fully convinced that I’ll read the post of a crazy person, rambling about (rain) clouds.
I am sorry! As an amateur landscape photographer I actually like very much those clouds. There are a few r-word posts about people hating those clouds though, but I checked and they are nowhere near as long as you would expect a proper rant to be
Yup, me too!
deleted by creator
Yep. My first move is to ask "could this just live in an ec2 box? Do we really need any of aws’ marketed custom options?
But then I would ask, what’s the point of paying 10-20x per computing unit at that point? If you just use ec2 instance, all AWS offers you is an API to manage them, is it worth the premium? Besides, you will still need to mess with a lot of other services (VPCs, SGs, etc.) anyways.
What’s the selling point in your opinion?
Well I would have more questions, like why AWS at all.
But for some, cognito auth management is important, to align with other product goals.
cognito auth
But then at that point you are already vendor-locked, right? At that point, running on bare ec2 instances and taking more control in your hands (vs using even more AWS-specific services) is going to help very little, when your whole user management is now tied to a specific provider.
The concerns of product auth and isolated ec2 driven work are two separate conversations.
If there is zero contact with AWS services (and ad you say, locks) then I would keep asking questions about why AWS is a good choice at all.
I used to love ‘the cloud’. Rather, a specific slice of it.
I worked almost exclusively on AppEngine, it was simple. You uploaded a zip of your code to appengine and it ran it at near infinite scale. They gave you a queue, a database, a volatile cache, and some other gizmos. It was so simple you’d struggle to fuck it up really.
It was easy, it was simple, and it worked for my clients who had 10 DAU, and my clients who had 5 million DAU. Costs scaled nearly linearly, and for my hobby projects that had 0 DAU, the costs were comparable.
Then something happened and it slowly became complicated. The rest of the GCP cloud crept in and after spending a term with a client who didn’t use “the cloud” I came back to it and had to relearn nearly everything.
Pretty much all of the companies I’ve worked for could be run on early AppEngine. Nobody has needed anything more than it, and I’m confident the only reason they had more was because tech is like water. You need to put it in a bucket or it goes everywhere.
Give me my AppEngine back. It allowed me to focus on my (or my clients) problems. Not the ones that come with the platform.