Schneider Electric Warns That Existing Datacenters Aren't Buff … – Slashdot

We need to continue to make it possible to do more processing with less power, whether that means continuing refinement on silicon or something else, but right now we don’t need to be adding more always-on load. How about we just get fast residential internet in all the cold places and do that distributed-computing space heater thing? You could use the waste heat from computing to drive ammonia absorption refrigeration as well.
Iceland already is popular for datacentres.
So is Virginia, your point?
Hellifikno why people build datacentres in VA. They do it in Iceland for the free cooling. And the geothermal can be handy too. Somewhat.
“Hellifikno why people build datacentres in VA.”
Because government networking needs have caused a shit-ton of networking infrastructure to be built here. Datacenters like being near all that bandwidth.
Simply because of all those juicy federal contracts. Proximity to your customer counts.
Processing information can’t have its power requirements optimized to zero; at some point we hit physical limits on moving things around to store, transmit, and process information. That quantity of energy might be moving an electron over an angstrom, or something else that is equally very, very small, but it’s not going to be zero.
Given the money that can be made on lowering the energy costs of processing data, there’s likely already considerable incentive to do better. But, again, there are limits. One example of this is with solar PV. I don’t recall the specifics, but someone came out with a new “highly efficient” PV cell that tested at something like 20% efficiency. The problem was that with the much higher cost of this 20%-efficient technology, everyone kept buying the 15%-efficient (or whatever) PV cells because they gave the best return on investment. Maybe it wasn’t 20% and 15%, but the point is that even if we find something that brings more processing with less power, it still needs to come at a cost that is worth the energy savings.
People are already moving data centers to Canada for the colder outdoor temperatures (which make for more efficient heat sinks) and the cheap and reliable electricity from nuclear and hydro. Oh, and Canada is apparently investing big in onshore wind like its neighbor to the south, since it provides cheap electricity, but without hydro and/or nuclear as a backup there’s still a reliability issue. I’ve heard nice things about geothermal power lately, and that might help with the power supply, especially in colder climates that give more of a temperature gradient to help with efficiency.
People are also building datacenters in the Phoenix metro area. So far cooling isn’t the problem. Modern datacenters can cool 20 kW in a rack without an issue; legacy ones have the issues Schneider is discussing: think DCs that have raised floors. Our DC uses regular mini-split-style units to cool the room or cold aisle. The magic is what they do with the heat, which is a cooling unit on top of each row that is more or less a radiator they push water through. They typically do 3 units for each row to ensure proper cooling redundancy.
So as long as you orient your gear right, all is good in the world of modern DCs. I wouldn’t think that would be a problem, but I run into racks all the time where the switches are oriented to make it easier to connect the servers, except that then means the switches are taking in air from the hot aisle and exhausting into the cold one. So frustrating to see, as recabling to fix it is no fun, especially because at that point it’s a live patient.
The biggest problem for me personally is that I now have to use hearing protection when I go into the datacenter even though my racks aren’t doing AI. Verbal communication while working in the DC is basically a non-starter unless you’re using bone-conduction mics. That AI gear is obviously very loud, since cooling 700W GPUs requires a lot of internal fan power.
While a lot of switches have fan part numbers for airflow in one direction or the other, some vent out the sides, and opening them up is a good way to void the warranty on devices that can exceed 100k in cost, which makes that proposal a non-starter. Frankly, I wouldn’t do that even on a low-end 10k switch. Maintaining enterprise support is critical in a DC where you often don’t have much physical access after it’s deployed.
A lot of enterprise firewalls are designed to only be oriented in one direction as well. The
Processing information can’t have its power requirements optimized to zero; at some point we hit physical limits on moving things around to store, transmit, and process information. That quantity of energy might be moving an electron over an angstrom, or something else that is equally very, very small, but it’s not going to be zero.
While currently in the realm of science fiction, it is not against the laws of physics to do close-to-zero-energy computation. Energy only needs to be used when erasing information.
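To put a rough number on that floor: the Landauer limit says erasing one bit costs at least k_B·T·ln(2). A quick back-of-the-envelope sketch in Python (the 300 K figure is just an assumed room temperature):

    import math

    K_B = 1.380649e-23   # Boltzmann constant, J/K
    T = 300.0            # assumed room temperature, K

    # Minimum energy to erase one bit of information (Landauer limit).
    energy_per_bit = K_B * T * math.log(2)
    print(f"{energy_per_bit:.2e} J per bit erased")   # ~2.87e-21 J

Not zero, but many orders of magnitude below what practical hardware dissipates per bit today.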
Did you miss the part where they said that the key difference with AI training is the importance of low-latency connections? And your solution is to spread training out to home internet connections in every corner of the world?
To give a sense of why:
First off, training is hugely memory intensive. Like, it’s hard to train a 3B model with a single batch on a top-end consumer GPU with 24GB VRAM.
But GPT-3 isn’t 3B parameters – it’s 175B parameters. And GPT-4 is 1.76T parameters.
Oh but wait, there’s more! Because again, there’s this thing called batching / microbatching. Basically, the goal of training is to find gradients where, if the model had been adjusted along them, it would have done better at computing a given training sample. But if you just take training samples one at a time, the gradients may be wildly different from each other. You’re in effect doing simulated annealing, but constantly, nonstop bouncing yourself out of the optimum. So we train in batches, where you calculate the gradients of many samples at once, average them, and use the averaged gradient instead, which can be thought of as reflecting broader truths than what you’d get from a single sample.
Well, GPT-4 is said to have had a batch size of 60 million.
The short of it is, you’re not solving many small problems where it’s just a task of throwing enough compute at it. You’re solving a single, massively-interconnected problem that’s far too large to fit onto a single GPU. So latency is really critical to training. To get a sense of how critical, watch the Tesla AI Day video where they unveiled the Dojo architecture. That will give a sense of how important reducing latency is.
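To make the batching idea concrete, here’s a minimal sketch (plain NumPy, a toy linear model rather than anything LLM-sized, all values illustrative): per-sample gradients are noisy, so you average them over a batch before taking a step.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)             # toy model parameters
    X = rng.normal(size=(64, 4))       # one batch of 64 training samples
    y = rng.normal(size=64)            # targets

    # Per-sample gradient of 0.5*(w.x - y)^2 with respect to w is (w.x - y) * x.
    per_sample_grads = (X @ w - y)[:, None] * X    # shape (64, 4)

    # Any single row bounces around; the batch average is a much smoother estimate.
    avg_grad = per_sample_grads.mean(axis=0)

    w -= 0.01 * avg_grad               # one (averaged) gradient step

Now scale that up to a 60-million-sample batch and trillions of parameters sharded across thousands of GPUs that must exchange and combine gradients every step, and the latency sensitivity becomes obvious.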

Did you miss the part where they said that the key difference with AI training is the importance of low-latency connections? And your solution is to spread training out to home internet connections in every corner of the world?
But GPT-3 isn’t 3B parameters – it’s 175B parameters. And GPT-4 is 1.76T parameters.
GPT-4 is really more like a dozen or so roughly 100B-parameter models. MoE is a game changer for scaling, by enabling specialization across smaller, more tractable models. Still not likely to be something you can distribute over the Internet any time soon.
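For the curious, the routing idea behind MoE looks roughly like this; an illustrative top-2 gating sketch in NumPy, since GPT-4’s actual design is unpublished and every name and number here is an assumption:

    import numpy as np

    def moe_forward(x, experts, gate_w, k=2):
        """Send token vector x to the top-k experts and mix their outputs."""
        scores = gate_w @ x                        # one gating score per expert
        top = np.argsort(scores)[-k:]              # the k highest-scoring experts
        weights = np.exp(scores[top] - scores[top].max())
        weights /= weights.sum()                   # softmax over the chosen experts only
        return sum(wgt * experts[i](x) for wgt, i in zip(weights, top))

    rng = np.random.default_rng(0)
    d, n_experts = 8, 4
    expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
    experts = [lambda x, M=M: M @ x for M in expert_mats]   # toy "experts": linear maps
    gate_w = rng.normal(size=(n_experts, d))

    out = moe_forward(rng.normal(size=d), experts, gate_w)

Each token only runs through k experts, so compute per token scales with expert size rather than total parameter count, but the gate and the experts still have to exchange activations at every layer, which is why it doesn’t make Internet-scale distribution practical.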

The short of it is, you’re not solving many small problems where it’s just a task of throwing enough compute at it. You’re solving a single, massively-interconnected problem that’s far too large to fit onto a single GPU. So latency is really critical to training.
There is another possibility that could enable Internet distribution: model merging, where many parties independently train up different aspects of the same base model and then merge the results. Think Neo’s stack of educational mini-discs from the Matrix.
Liquid as a heat transfer agent has been in use for many, many years. Nothing weird about that. Most significant data centers I’ve run have had coolant systems with a water tower involved. The big honking air handlers need somewhere to put that heat.
Perhaps servers will have to bring the coolant closer to the CPUs to handle the greater heat density. The issues with leakage and the complexity of already messy racks probably have prevented this up until this point. Also, the location of PDUs towards the bottom of racks may be an issue – who wants fluid leaking onto those? And oh, they do leak. Many floods in data centers over the years. Looking through a vent tile to see flowing water is not unusual and something you need to keep an eye out for.
Mostly an engineering problem. But they want to sell more Schneider gear, so unsurprising they’d be writing whitepapers about this.
Time to bring in the ammonia cooling systems.
Liquid cooling’s main problem is scaling. We ran several pilots in the past years testing different approaches, but the main issue is that they exponentially increase maintenance time. Leakage was present but minor.

Liquid cooling’s main problem is scaling. We ran several pilots in the past years testing different approaches, but the main issue is that they exponentially increase maintenance time. Leakage was present but minor.
This is exactly it; it’s still early days with this tech, so we’re still feeling our way towards a sensible solution. My DC runs chilled doors as standard on our 20kW racks, and they take a water feed to each rack, so the jump to DLC isn’t as far. The problem is this scale of compute is not sustainable: with 20kW racks you can only fit 2 DGX H100s in each, and that’s without a switch! Ultimately we need a low-maintenance solution, just like the jump we made to tool-free servers.
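A back-of-the-envelope version of that power-budget squeeze, assuming roughly 10 kW of draw per DGX H100 (the vendor’s quoted maximum is in that ballpark) and purely illustrative numbers otherwise:

    RACK_BUDGET_KW = 20.0
    DGX_KW = 10.0        # assumed draw per DGX H100

    def headroom_kw(n_dgx, budget_kw=RACK_BUDGET_KW, per_dgx_kw=DGX_KW):
        """kW left in the rack after n DGX systems; negative means over budget."""
        return budget_kw - n_dgx * per_dgx_kw

    print(headroom_kw(2))   #  0.0 -> two systems fill the rack, nothing left for a switch
    print(headroom_kw(3))   # -10.0 -> well over budget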
My response is “fittings break”. Usually in stress situations, like cold temperatures outside, for instance.
I remember a situation where I was made to hold half of a fitting together with the other half for the better part of an hour trying to staunch the flow while someone got some replacement parts and the tools necessary to fix it up, as pressurized water shot everywhere. No one wanted to shut the data center down…
We didn’t have anything that catastrophic, and we definitely wouldn’t have asked a person to hold a leaky pipe over live systems for one hour.
I think the main issue in your scenario is that you didn’t have an adequate SOP, and yes, sometimes you need to shut down hardware; that’s why you have redundancy and shifting procedures.
Would you believe this was IBM? It was.
I have no difficulty believing it was IBM; I’ve personally interviewed droves of SoftLayer/IBM Cloud employees willing to jump ship.
Company that sells power transmission equipment and accessories says you will need more power.
We’re using old technology to support new ideas. There’s money to be had for investors who can come up with a disruptive massively cool alternative to heat loss and radical approaches to propagation delays.
I wonder when someone is finally going to produce actual hardware neurons. Right now we’re just simulating neural nets with classic computers. No matter how much they distribute the load over many cores, it’s still just processors doing calculations. If we can have the silicon equivalent of actual neurons, that would be a major breakthrough.
so that I can type “Make me a photo of Lindsay Lohan naked covered in grey poupon” in DALL-E.
Impressive, eh?
The future must be a grand place. I’d like to see it someday.

Impressive, eh?
The future must be a grand place. I’d like to see it someday.
I am sometimes amazed at the technology available today, and the speed with which new technologies come to market. Others appear to have made the same observation and exclaimed, “we are living in the future, but it’s not evenly distributed.”
We can see the future, but we have to travel some to see it in bits and pieces. Travel can also take us into the past, if you want to see what things were like living in some historical period then we can take an airplane ride to see what life was like going back to anythi
so that I can type “Make me a photo of Lindsay Lohan naked covered in grey poupon” in DALL-E.
But, curiously, all I got back were images of Natalie Portman. And the Grey Poupon looked an awful lot like hot grits. I guess that’s what you get when you train your AI by scraping the internet.
For those not familiar with data center state-of-the art 10 years ago this is pretty good information. I haven’t designed any AI data centers, but plenty of facilities over 40-50kW/rack and a couple over 100kW/rack, so many of the warnings are pretty old-hat.
What is important for anybody tangentially involved with data centers is to understand the impact on the peak:average workload ratio in these facilities. It is a big deal, and many things are not adequately designed to accommodate it. We used to design for a 3:1 peak:average ratio, which would be consistent for electrical and HVAC; for well-designed systems, once you go over 2:1 you will have significant HVAC issues, and over 1.25:1 electrical systems will need major overhauls.
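As a trivial sanity-check helper for those rule-of-thumb thresholds (the example numbers are made up):

    def load_ratio_warnings(peak_kw, avg_kw):
        """Flag facilities whose peak:average ratio exceeds the rule-of-thumb limits above."""
        ratio = peak_kw / avg_kw
        warnings = []
        if ratio > 1.25:
            warnings.append("electrical systems likely need major overhauls")
        if ratio > 2.0:
            warnings.append("expect significant HVAC issues")
        return ratio, warnings

    print(load_ratio_warnings(peak_kw=3000, avg_kw=1200))   # 2.5:1 -> trips both limits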
For a given system, it’s all about how much energy can be converted into heat over a unit of time.
Watts in → BTUs out.
There are many ways to optimize the left-hand side of that equation, but handling the right-hand side in an efficient and stable manner remains a challenge. This is where scaling out is going to win.
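In concrete terms (1 W ≈ 3.412 BTU/hr), a quick sketch of the left-to-right conversion:

    BTU_PER_HR_PER_WATT = 3.412

    def heat_rejection_btu_per_hr(load_kw):
        """Heat the HVAC plant must reject for a given electrical load."""
        return load_kw * 1000 * BTU_PER_HR_PER_WATT

    print(heat_rejection_btu_per_hr(20))   # a 20 kW rack ≈ 68,240 BTU/hr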
If we are to get battery electric vehicles to replace internal combustion engines then we need to have a similar discussion.
The usual comment about BEVs is that it’s no big deal because people will drive their BEV home, plug it in, then have a delay timer on the charger to charge the car after supper has been cooked and cleaned up, everyone has finished their homework and nightly TV watching, and everyone has turned out the lights and gone to sleep. The grid can handle this, as can the power plants providing the power.
The solution for datacenters already exists. It’s a small reactor instead of the diesel generators. The primary problem right now is not IN the datacenter; it is transport and grid problems, primarily due to green energy investments and the shutting down of nuclear and other base-load plants, which have made the grid a lot more brittle.
We have to accept, fission and eventually fusion are practically infinite power sources. If you want to live in a future like Star Trek, you have to solve the primary problems Star
>> We have to accept, fission and eventually fusion are practically infinite power sources.
Yep. I use fission with my PV panels.
Sorry, correction:
  “I use fusion with my PV panels.”
So we make AI more efficient. Then we just have more of it and power consumption goes up or stays the same. Or, we make compute less power hungry. Then have more AI because the power’s cheaper and available. Either way, we use more power than we have and demand continues to exceed supply.
The hype will be over, nobody will care much anymore, and the remaining few applications will be optimized a lot more and run well on pretty normal hardware.
Why is it “news”? (It’s not, it’s Slashfiller.)
Data centers are tools. When they cease to serve, replace them.
There’s no shortage of money and the work will be welcomed.
 
Ah, the old AI adage:
Just throw more processors at it, it’s bound to become intelligent this time round!
People who design data centers rarely understand what they are doing. Once you hit 32kW per rack, which is well and truly achievable in a modern data center with enclosed cold-aisle and hot-aisle plenums, you usually need water cooling.
Problem is, once the pump goes out and the water stops, a data center like that has less than a minute before the temperature inside reaches the point where you can no longer escape. The research that identified the problem originally came from the US military, with respect to how long soldiers could remain inside a tank that was on fire before escape was no longer possible (somewhere around 72 degrees).
And you thought non-breathable gas extinguishers were bad…
GrpA
72C sounds a bit low since people voluntarily sit in saunas much hotter than that with very high humidity and they (usually) can leave by themselves.
A Sauna is a little simpler to escape than a Tank.
Just sayin.
datacenter alley is also pretty easy to escape compared to tank. Kinda similar to sauna.
They spent 3 pages explaining why you can’t draw more power than is available; what is this fluff?