Left: computation-to-communication ratio for the LU factorization with partial pivoting of a dense matrix on a model exascale machine (ISC 2014). Right: preconditioned Krylov solver as used in the map-making problem in astrophysics, with results obtained on a Cray XE6 machine (Astronomy and Astrophysics, 2014).
This talk will address one of the main challenges in high performance computing: the increasing cost of communication relative to computation, where communication refers to data transferred either between processors or between levels of the memory hierarchy, possibly including NVMs.
I will give an overview of novel communication-avoiding numerical methods and algorithms that reduce communication to a minimum for operations at the heart of many calculations, in particular numerical linear algebra algorithms.
Communication-avoiding LU uses tournament pivoting to minimize communication (SC08). Lightweight scheduling combines static and dynamic scheduling to provide a good trade-off between load balance, data locality and dequeue overhead (IPDPS 2012).
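Tournament pivoting can be illustrated with a simplified, sequential sketch (my own illustration, not the SC08 implementation): each block of rows elects b candidate pivot rows via Gaussian elimination with partial pivoting, and the winners are combined round by round until only b rows remain, so each round only exchanges O(b) rows instead of the whole panel.

```python
# Simplified sketch of tournament pivoting (illustrative only, not the
# authors' parallel implementation from SC08).

def select_pivot_rows(rows, b):
    """Return indices (into `rows`) of b pivot rows chosen by Gaussian
    elimination with partial pivoting on a working copy of the block."""
    m, n = len(rows), len(rows[0])
    a = [list(r) for r in rows]           # working copy
    idx = list(range(m))                  # original positions of each row
    chosen = []
    for col in range(min(b, n, m)):
        # pick the row with the largest entry in the current column
        p = max(range(col, m), key=lambda i: abs(a[i][col]))
        a[col], a[p] = a[p], a[col]
        idx[col], idx[p] = idx[p], idx[col]
        chosen.append(idx[col])
        if a[col][col] == 0:
            continue
        for i in range(col + 1, m):       # eliminate below the pivot
            f = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= f * a[col][j]
    return chosen

def tournament_pivot(panel, b, block_size):
    """Reduce the panel block by block: each round keeps only the b
    pivot-row winners of every block, mimicking a tournament tree."""
    candidates = list(range(len(panel)))
    while len(candidates) > b:
        next_round = []
        for s in range(0, len(candidates), block_size):
            block_ids = candidates[s:s + block_size]
            block = [panel[i] for i in block_ids]
            for w in select_pivot_rows(block, b):
                next_round.append(block_ids[w])
        if next_round == candidates:      # no further reduction possible
            break
        candidates = next_round
    return candidates[:b]
```

In the parallel algorithm each block lives on a different processor and the rounds form a reduction tree, which is what brings the communication cost down to the lower bound.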
These algorithms range from iterative methods used in numerical simulations to low-rank matrix approximations used in data analytics. I will also discuss the algorithm/architecture matching of these algorithms and their integration into several applications.
Speaker Background:
Dr. Laura Grigori
Dr. Laura Grigori obtained her Ph.D. in Computer Science in 2001 from University Henri Poincare in France. She was a postdoctoral researcher at UC Berkeley and Lawrence Berkeley National Laboratory before joining the French Institute for Research in Computer Science and Automation (INRIA) in 2004. She currently leads Alpines, a joint research group between INRIA, Pierre and Marie Curie University, and the National Center for Scientific Research (CNRS). Her fields of expertise are high performance scientific computing, numerical linear algebra, and combinatorial scientific computing. She co-authored the papers introducing communication-avoiding algorithms that provably minimize communication. She leads several projects on preconditioning, communication-avoiding algorithms, and associated numerical libraries for large-scale parallel machines, and is currently the Program Director of the SIAM Activity Group on Supercomputing.
A new breakthrough battery—one that stores significantly more energy, lasts longer, and is cheaper and safer—will likely be impossible without the discovery of a new material. And a new material discovery could take years, if not decades, since trial and error has been the best available approach. But Lawrence Berkeley National Laboratory (Berkeley Lab) scientist Kristin Persson says she can take some of the guesswork out of the discovery process with her Electrolyte Genome.
Think of it as a Google-like database of molecules. A battery scientist looking for a new electrolyte would specify the desired parameters and properties, and the Electrolyte Genome would return a short list of promising candidate molecules, dramatically speeding up the discovery timeline. Click here to watch the video.
“This is just one of several compelling videos that SC15 will be releasing over the coming weeks to help describe how high performance computing is helping to transform society and have a tremendous positive impact on everyday life,” said Jackie Kern, SC15 Conference Chair from University of Illinois at Urbana Champaign.
According to Kern, it is also part of a three-year “HPC Matters” campaign that will be a major focus of the SC15 conference in Austin this November. The campaign includes a free plenary session led by Diane Bryant, one of Intel’s top executives and recently named to Fortune’s list of the 51 most powerful women.
Faster, Smarter, Better
Besides being faster and more efficient in screening out bad candidates, the Electrolyte Genome offers two other significant advantages to battery scientists. The first is that it could generate novel ideas. “While there are some amazing organic chemists out there, this allows us to be agnostic in how we search for novel ideas instead of relying purely on chemical intuition,” Persson said. “We can be surprised by what we find by combining experience with new, non-traditional ideas.”
The second advantage of the Electrolyte Genome is that it can add to scientists’ fundamental understanding of chemical interactions.
“It adds explanations to why certain things work or don’t work,” Persson said. “Frequently we rely on trial and error. If something doesn’t work, we throw it away and go to the next thing, but we don’t understand why it didn’t work. Having an explanation becomes very useful—we can apply the principles we’ve learned to future guesses. So the process becomes knowledge-driven rather than trial and error.”
How it Works – Funnel Method
The Electrolyte Genome uses the infrastructure of the Materials Project, a database of calculated properties of thousands of known materials, co-founded by Persson and Gerbrand Ceder. The researchers apply a funnel approach: a first screening of materials using a series of first-principles calculations for properties that can be computed quickly and robustly. This winnows the candidate pool, which then undergoes a second screening for another property, and so on.
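The funnel idea can be sketched in a few lines (a hypothetical illustration; the property names and functions here are stand-ins, not the Materials Project API): cheap filters run first, so expensive calculations only ever touch the survivors.

```python
# Hypothetical sketch of funnel screening: apply property filters in
# order of increasing computational cost, shrinking the pool each time.

def funnel_screen(candidates, stages):
    """stages: list of (compute_property, accept) pairs, ordered from
    the cheapest calculation to the most expensive one."""
    pool = list(candidates)
    for compute_property, accept in stages:
        pool = [c for c in pool if accept(compute_property(c))]
        if not pool:            # everything filtered out: stop early
            break
    return pool

# Toy usage: "molecules" as dicts with made-up, precomputed properties.
molecules = [
    {"name": "A", "stability": 0.9, "window": 4.2},
    {"name": "B", "stability": 0.4, "window": 4.5},
    {"name": "C", "stability": 0.8, "window": 2.1},
]
survivors = funnel_screen(
    molecules,
    [
        (lambda m: m["stability"], lambda s: s > 0.5),  # cheap screen
        (lambda m: m["window"], lambda w: w > 4.0),     # costlier screen
    ],
)
# survivors now holds only molecule "A"
```

In the real pipeline each stage would be a first-principles calculation rather than a dictionary lookup, but the control flow is the same.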
The concept was described in a recent essay in The Journal of Physical Chemistry Letters co-authored by Persson and her collaborators at Berkeley Lab and Argonne National Laboratory. With a short list of candidate molecules, researchers can then perform more detailed computational evaluations, applying molecular dynamics simulations or other calculations as needed, for example to characterize the interactions of the different components.
The number of possible combinations is effectively infinite, since so many different salts can be combined with so many different solvents, and impurities also play a role. So Persson and her team work closely with experimentalists to guide their research. “Because the space is so vast, we typically don’t throw the whole kitchen sink at it because it would take forever,” she said. “We tend to take some base molecule or some idea, then we explore all the variations on that idea. That’s the way to attack it.”
The methodology has been validated with known electrolytes. Using the supercomputers at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab, the researchers can screen hundreds of molecules per day.
To date, more than 15,000 molecules for electrolytes—including 10,000 redox active molecules, hundreds of conductive network molecules, and salts, solvents, and more—have been calculated. Screening such quantities of molecules for suitable properties using traditional synthesis and testing techniques would take decades.

Early Success Stories
The Electrolyte Genome’s first major scientific finding—that magnesium electrolytes are very prone to forming ion pairs, which impacts several crucial aspects such as conductivity, charge transfer and stability of the electrolyte—was published in February in the Journal of the American Chemical Society.
They had another success screening molecules for redox capabilities for flow batteries for fellow Berkeley Lab scientist Brett Helms. “He basically gave us a chemical space of organogelator molecules and asked, ‘Can you tell me the best molecule if I want a voltage window that’s precisely here,’” Persson said. “We filtered down about a hundred candidates to one. It worked, and the molecule fit the intended purpose perfectly.”
The Electrolyte Genome is funded by the Joint Center for Energy Storage Research (JCESR), a Department of Energy multi-partner Energy Innovation Hub announced in 2012, led by Argonne National Laboratory and including Berkeley Lab. It is open source and will be made public by the end of JCESR’s five-year charter, at the latest, according to Persson.

This is the first in a series of SC15 videos that will be released leading up to the conference to help tell compelling and interesting stories of why HPC Matters.

About SC15
SC15, sponsored by ACM (Association for Computing Machinery) and IEEE Computer Society, offers a complete technical education program and exhibition showcasing the many ways high performance computing, networking, storage and analysis lead to advances in scientific discovery, research, education and commerce. This premier international conference includes a globally attended technical program, workshops, tutorials, a world-class exhibit area, demonstrations and opportunities for hands-on learning. For more information on SC15, please visit http://www.sc15.supercomputing.org/ or contact communications@info.supercomputing.org.

About ACM
ACM, the Association for Computing Machinery (www.acm.org), is the world’s largest educational and scientific computing society, uniting computing educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges. ACM strengthens the computing profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence. ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional networking.

Special thanks to Julie Chao from Lawrence Berkeley National Laboratory for her assistance with this article and video.
Lawrence Berkeley National Laboratory Deputy Director Horst Simon still displays the 1988 Gordon Bell Prize he shared. Behind him is the 2009 Gordon Bell Prize awarded to a team he was part of. Photo by Roy Kaltschmidt, LBNL.
In the foyer of the main building at Lawrence Berkeley National Laboratory, a panel displaying the 13 Nobel Prize-winning researchers and projects associated with the lab takes pride of place. Just down the hallway, Deputy Lab Director Horst Simon has two awards displayed prominently in his office, Gordon Bell Prize certificates from 1988 and 2009.
Though not as famous as the prizes created by Alfred Nobel, the prizes endowed by Gordon Bell, who rose to fame as a computer designer for Digital Equipment Corp., are highly valued by the scientists whose applications push the sustained performance of leading-edge supercomputers. The awarding of each year’s prizes is a highlight of the SC conference held every November.
An industry article from the time.
Simon, along with Phong Vu, Cleve Ashcraft, Roger Grimes, John Lewis and Barry Peyton, achieved the first 1 gigaflop/s performance of a science application, running a general sparse matrix vectorization on a Cray Y-MP computer. At the time, Simon, Grimes and Lewis worked for Boeing Computing Services, with Simon based at NASA Ames; Vu worked at Cray Research, Ashcraft was at Yale, and Peyton was at Oak Ridge National Laboratory. The 1988 prize was awarded at the IEEE CompCon meeting held in March 1989 in San Francisco.
While Lewis, Grimes and Ashcraft developed the code, Simon had been using the eight-processor Y-MP, with a peak performance of 1.6 gigaflop/s, at NASA Ames and saw its potential. After extensive fine-tuning, the team was able to extract the highest performance from what many considered a machine with the “old” vector processing technology.
“At the time, there was a lot of debate about how to go parallel,” Simon recalled. “One path was with the Cray that had a few very powerful processors, while others were looking to systems with hundreds of smaller processors.”
In fact, the other two Gordon Bell Prizes awarded in 1988, one for best price-performance and another for compiler parallelization, went to applications run on a 1,024-processor nCUBE. But none of the highly parallel systems of that time could claim a theoretical peak in excess of 1 gigaflop/s.
Winners of the 1988 Gordon Bell Prize are presented with the award.
“I was absolutely elated to be a member of the winning team – I was early in my career and thought that an award for parallel performance was a great idea,” Simon said. “I was fortunate to be in a great group at Boeing and be part of a great team at NASA. The award came at a time when parallel computing was emerging as a hot topic and it was a great career boost – it established my credentials in HPC.”
He has continued to add to those credentials, including serving as one of four editors of the twice-yearly TOP500 list, which rates the performance of the world’s fastest supercomputers.
Simon said that the original motivation for parallel computers was not how to solve problems that ran on one processor faster, but to solve problems that needed more processors. And the goal was to increase the overall speed as you scaled up to bigger and bigger machines.
The idea for the annual prize grew out of a SIAM meeting in 1985, when a group tossed around the idea of having a prize recognizing the speedup of real applications on real parallel machines. Alan Karp of IBM got things rolling by offering $100 out of his own pocket as a prize. Since 2011, the winners share $10,000.
The very first prize was given to Robert Benner, John Gustafson and Gary Montry, all of Sandia National Laboratories. Simon said their work was a conceptual breakthrough that helped define the future of parallel computing. In the late 1980s there was much discussion of how much speedup one could actually obtain on a parallel computer for a fixed-size problem, later called strong scaling. The Sandia group defined what came to be known as weak scaling: rather than solving a fixed problem faster, the point was to solve ever larger problems that needed more processors, increasing overall throughput as machines scaled up.
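The strong- versus weak-scaling distinction can be made concrete with the standard Amdahl and Gustafson formulas (my own illustration, not from the article): for a code with serial fraction s on N processors, Amdahl's law bounds the fixed-size speedup, while Gustafson's scaled speedup keeps growing with N.

```python
# Illustrative comparison of strong scaling (Amdahl's law) and weak
# scaling (Gustafson's law) for serial fraction s on n processors.

def amdahl_speedup(s, n):
    # fixed problem size: the serial fraction s caps the speedup at 1/s
    return 1.0 / (s + (1.0 - s) / n)

def gustafson_speedup(s, n):
    # problem size grows with n: scaled speedup stays close to n
    return n - s * (n - 1)

# With s = 5% serial work on 1,024 processors:
strong = amdahl_speedup(0.05, 1024)     # ~19.6x, nowhere near 1,024
weak = gustafson_speedup(0.05, 1024)    # ~973x, close to linear
```

This is exactly the Sandia insight: judged by strong scaling, massively parallel machines looked hopeless, but judged by weak scaling they were transformative.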
For the first few years, the prize was awarded at CompCon, a general computing conference. But in the early 1990s, Simon worked with the SC conference committee to bring the Gordon Bell Prize into the supercomputing conference and the submissions deadline was changed to coincide with that of the conference. But unlike today where the prize entries are a formal part of the tech program, they were initially relegated to a Birds-of-a-Feather session.
At SC06 in Tampa Bay, the Gordon Bell Prize submissions were incorporated into a single track of the Technical Program. Also in 2006, the ACM assumed sponsorship and the name was officially changed to the ACM Gordon Bell Prize.
Fast forward to 2009 and Simon was a member of a team led by IBM’s Dharmendra Modha that created the largest brain simulation to date on a supercomputer, with the number of neurons and synapses in the simulation exceeding those in a cat’s brain. The team, which also included Rajagopal Ananthanarayanan and Steven K. Esser, won a Gordon Bell Prize in a special category for “The Cat is Out of the Bag: Cortical Simulations with 10⁹ Neurons, 10¹³ Synapses.” For the project, the IBM team members came up with the idea and Simon provided the link to the supercomputing resources. Although the team did not claim to have actually simulated a cat brain, the award generated a flurry of controversy.
But the goal of simulating brain activity is nothing new. For years, Simon said, scientists have wondered whether supercomputers could be used to create superintelligence, something that, while a staple of many films, has yet to be achieved in silico.
“But it’s an interesting challenge and if we can understand how the brain computes, it could help us design more efficient computers,” Simon said. “After all, our brain only needs about 20 watts of power to easily outperform a supercomputer drawing 20 megawatts. If we could simulate a chip with brainlike characteristics, the results could help us build a better chip.”
The simulation of chips on HPC systems is interesting, but not mainstream, Simon said, which leads him to a parting anecdote.
“Steve Jobs wanted to design a one-piece plastic casing for the Apple II and used a Cray for the simulation of the injection mold flow process,” Simon said. “And when Seymour Cray was building the Cray-2, he used an Apple computer.”