John Shalf from LBNL on Computing Challenges Beyond Moore’s Law (via Qpute.com)

In this special guest feature from Scientific Computing World, Robert Roe interviews John Shalf from LBNL on the development of digital computing in the post-Moore’s Law era.

John Shalf is Department Head for Computer Science at Lawrence Berkeley National Lab.

As not only the HPC industry but the larger computing ecosystem tries to overcome the slowdown and eventual end of transistor scaling described by Moore’s Law, scientists and researchers are implementing programs that will define the technologies and new materials needed to supplement, or replace, traditional transistor technologies.

In his keynote speech at the ISC conference in Frankfurt, John Shalf, Department Head for Computer Science at Lawrence Berkeley National Laboratory, discussed the need to increase the pace of development of new technologies that can help deliver the next generation of computing performance improvements. Shalf described the decline of Moore’s Law as we approach the physical limits of transistor fabrication, which are estimated to be in the 3 to 5nm range. He also described the lab-wide project at Berkeley and the DOE’s efforts to overcome these challenges by accelerating the design of new computing technologies. Finally, he provided a view into what a system might look like in 2021 to 2023, and the challenges ahead, based on our most recent understanding of technology roadmaps.

The keynote highlighted the tapering of historical improvements in lithography, and how it affects the options available for continuing to scale the successors to the first exascale machine.

What are the options available to the HPC industry in a post-Moore’s Law environment?

There are really three paths forward. The first, and the one being pursued most immediately, is architecture specialization. This means creating architectures that are tailored to the problem you are solving. An example would be Google’s tensor processing unit (TPU). It’s a purpose-built architecture for their inferencing workload and it is much more efficient than using a general-purpose chip to accomplish the same goal.

This isn’t a new thing. GPUs were specialized for graphics, and we also have many specialized processors for video encoding and decoding. It’s a known quantity and it is known to work for a lot of target applications.

But the question is, what is this going to do for science? How do we create customizations that are effective for scientific computing?

The second direction that we can go is CMOS replacements. This would be the new transistor that could replace the silicon-based transistors that we have today. There is a lot of activity in that space, but we also know that it takes about 10 years to get from a laboratory demonstration to fabrication, and the lab demonstrations do not demonstrate that there is a clear alternative to CMOS yet. There are a lot of promising candidates but that demonstration of something new that could replace silicon is not there yet. The crystal ball is not very clear on which way to go.

For these carbon nanotubes, negative capacitance FETs, or some other CMOS replacement technology, we need to accelerate the pace of discovery. We need more capable simulation frameworks to find better materials with properties that can outperform silicon CMOS. There are new device concepts, such as tiny relays called nanoelectromechanical (NEM) relays, carbon nanotubes, or magnetoelectronics: there are a lot of opportunities there.

We don’t know what is going to win in that space but we definitely need to accelerate the pace of discovery. But that is ten years out probably.

The last direction is new modes of computation, which means neuro-inspired computing and quantum computing. These are all very interesting ways to go, but I should point out that they are not a direct replacement for digital logic as we know it. They expand computing into areas where digital logic is not very effective, such as solving combinatorial NP-hard problems with quantum; digital computers are not so good at that. Neuromorphic computing or AI image recognition is another area where traditional computers are not as efficient, and AI could expand computing into those areas. But we still need to pay attention to the pace of capability of digital computing, because it directly solves important mathematical equations, and it has a role that is very important for us.

The more you specialize, the more benefit you could get. Examples are the ANTON and ANTON 2 computers, which were extremely specialized for molecular dynamics computations. They had some flexibility, but they really don’t do anything other than molecular dynamics.

But there is a wide spectrum of specialization and there isn’t any one path. There are some codes, like the climate code, that are so broad in terms of the algorithms they contain that you have to go in the general-purpose direction, but you would still want to include some specialization. However, there are other examples like Density Functional Theory (DFT) codes, which are material science codes that have a handful of algorithms which, if you could accelerate them, would give you a lot of bang for your buck. It is possible that we will see both kinds of specialization: the GPU kind, which is very broad, and some narrower specializations that might be targeted to one application.
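One way to reason about the difference between broad and narrow codes, which is not named in the interview, is Amdahl's law: the overall speedup depends on how much of the runtime the accelerated kernels cover. The sketch below is only illustrative; the runtime fractions and the 10x kernel speedup are hypothetical numbers, not measurements of any climate or DFT code.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


# ASSUMPTION: hypothetical fractions chosen only to contrast the two cases.
# "Narrow" code: a handful of kernels dominate the runtime.
narrow_code = amdahl_speedup(accelerated_fraction=0.90, kernel_speedup=10.0)
# "Broad" code: the same accelerator covers only a small share of the runtime.
broad_code = amdahl_speedup(accelerated_fraction=0.30, kernel_speedup=10.0)

print(f"narrow code overall speedup: {narrow_code:.2f}x")
print(f"broad code overall speedup:  {broad_code:.2f}x")
```

Under these assumed fractions the narrow code gains far more from the same accelerator, which is the "bang for your buck" argument in numerical form.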

How does architecture specialization change the way HPC systems are designed and purchased?

It may be the case that in the future we have to decide how much of the capital acquisition budget will go into working together with a company to add specializations to the machine through non-recurring engineering expenses. It might change the model of acquisition, where you have to make a decision about how much of your budget you’re willing to put into R&D, as opposed to just strictly acquisition and site preparation costs.

If you look at the mega datacenter market, it is already happening, so the question is when HPC is going to catch up.

Microsoft Research have Project Catapult, which has FPGAs integrated throughout the interconnect on the machine to do processing in-network. Google has its TPU and it is already on its third generation. Amazon has its own specialized chip that it is designing using Arm IP – that’s one way to reduce the costs of specialization, to use technology from the embedded IP ecosystem.

So the mega datacenters are already doing this. It is a foregone conclusion that this approach is being adopted; the question is, how do you adopt it productively for scientific computing?

How much benefit do CMOS replacements offer?

The answer is that we don’t know what is physically possible, but we do know the fundamental limit in physics for digital computing; it is the Landauer limit. We are many orders of magnitude above the Landauer limit. There is a lot of room at the bottom but we don’t know the physical limits of the devices we can construct.
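For reference, the Landauer limit is the minimum energy needed to erase one bit of information, k_B * T * ln 2. The short sketch below evaluates it at room temperature and compares it with a per-bit switching energy. The 1 femtojoule figure is a hypothetical placeholder chosen for illustration, not a value from the interview or a measurement of any real device.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact in the 2019 SI)


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit at the given temperature: k_B * T * ln 2."""
    if temperature_kelvin <= 0.0:
        raise ValueError("temperature must be positive")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def orders_of_magnitude_above_limit(energy_per_bit_joules: float,
                                    temperature_kelvin: float) -> float:
    """log10 of the ratio between a device's per-bit energy and the Landauer limit."""
    return math.log10(energy_per_bit_joules / landauer_limit_joules(temperature_kelvin))


room_temp_limit = landauer_limit_joules(300.0)  # about 2.87e-21 J

# ASSUMPTION: placeholder energy per bit operation, for illustration only.
assumed_energy_per_bit = 1e-15  # 1 femtojoule
gap = orders_of_magnitude_above_limit(assumed_energy_per_bit, 300.0)

print(f"Landauer limit at 300 K: {room_temp_limit:.3e} J")
print(f"Assumed device is {gap:.1f} orders of magnitude above the limit")
```

Substituting a measured energy per operation for the placeholder gives the actual gap for a particular technology.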

Consider the materials we have to construct these devices, the pace at which we are able to discover those devices, and the expense of creating just one of those devices as a demonstration: the process is incredibly slow and very artisanal.

Because of the urgency of the issue, we have started a lab-wide initiative for Beyond Moore’s Law Microelectronics to industrialize the process using modeling and simulation of candidate materials, using something called the Materials Project. This aims to optimize the search for candidate materials. You say what characteristics you want to optimize, then the Materials Project framework can automate that search for better materials. Sifting through tens of thousands of materials, it can find the handful that have that optimized property.

You then conduct device-scale simulation. Researchers do full ab initio material science simulations using a code called LS3DF, which is able to do those kinds of device-scale simulations. It takes a whole supercomputer to be able to do it, but it is so much better to simulate the behavior of the device before you construct it, because it is so costly to fabricate them.
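The screening step that precedes device-scale simulation (state the property you want, then narrow a large pool down to a handful of candidates) can be sketched roughly as a filter followed by a ranking. This is a toy illustration, not the Materials Project API: the `Candidate` type, the `screen` function, the choice of band gap and mobility as properties, and every entry in the list are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate material (placeholder fields)."""
    name: str
    band_gap_ev: float
    carrier_mobility: float  # arbitrary units, for illustration only


def screen(candidates, min_gap_ev, max_gap_ev, top_n):
    """Keep candidates inside the band-gap window, then rank by mobility."""
    in_window = [c for c in candidates
                 if min_gap_ev <= c.band_gap_ev <= max_gap_ev]
    ranked = sorted(in_window, key=lambda c: c.carrier_mobility, reverse=True)
    return ranked[:top_n]


# ASSUMPTION: made-up placeholder entries, not real materials or real data.
candidates = [
    Candidate("placeholder_A", band_gap_ev=0.4, carrier_mobility=120.0),
    Candidate("placeholder_B", band_gap_ev=1.1, carrier_mobility=300.0),
    Candidate("placeholder_C", band_gap_ev=1.5, carrier_mobility=950.0),
    Candidate("placeholder_D", band_gap_ev=3.2, carrier_mobility=2000.0),
    Candidate("placeholder_E", band_gap_ev=1.3, carrier_mobility=640.0),
]

shortlist = screen(candidates, min_gap_ev=1.0, max_gap_ev=2.0, top_n=2)
for candidate in shortlist:
    print(candidate.name, candidate.band_gap_ev, candidate.carrier_mobility)
```

In a real workflow the pool would come from a materials database and the shortlist would then go on to the costly device-scale simulation.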

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
