July 27, 2018
ERC Starting Grant for Dan Alistarh
Project to make machine learning faster through distribution is supported by prestigious European Research Council funding award | Project leverages the robustness of machine learning algorithms to noise as a tool to distribute work efficiently
In this round of awards, two Starting Grants of the European Research Council (ERC) go to professors at the Institute of Science and Technology Austria (IST Austria). One of the awardees is Dan Alistarh, a computer scientist who joined IST Austria in 2017. In his project, which the ERC will support with about 1.5 million euros, he plans to use new approaches to dramatically decrease the time it takes to train large-scale machine learning models. Currently, it can be hard to distribute machine learning computation efficiently among many computation nodes. Dan Alistarh wants to change this by leveraging the robustness of machine learning algorithms to noise in order to distribute the work efficiently.
Machine learning and data science are areas that have made tremendous progress over the last decade. But training a machine learning model on state-of-the-art datasets can take significant time, which limits the number of ideas that researchers can test within a reasonable turnaround time. In such a case, computer scientists would normally use distributed systems, meaning that they let several computers or processing units work together simultaneously to complete the computation faster. But standard distribution methods are not easily applicable to algorithms in machine learning.
“What happens if you apply standard methods to machine learning is that it does not seem to work well. It may well happen that you cannot obtain highly accurate models, or that performance is significantly lower than expected,” explains Dan Alistarh.
If a task is distributed to several computation nodes (CPUs or GPUs, for instance), one would hope to cut down the training time in proportion to the number of nodes. This is what computer scientists call scalability. But once the work is distributed among more than a small number of nodes, many algorithms stop scaling. The reason is that the nodes have to pass on a lot of information, and as the number of nodes goes up, the system uses more and more of its computational power for communication. “You end up with a system that spends more time on communicating than on the actual useful computation that it is supposed to do,” Alistarh adds.
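The scaling limit described above can be illustrated with a toy cost model. The sketch below is not taken from the project; it simply assumes that the computation of one training step splits evenly across nodes while the communication cost grows linearly with the number of nodes. The function names and the parameters compute_time and comm_per_node are hypothetical values chosen for illustration.

```python
def step_time(nodes, compute_time=1.0, comm_per_node=0.01):
    """Time for one training step under an assumed toy cost model."""
    compute = compute_time / nodes               # work is split evenly
    communicate = comm_per_node * (nodes - 1)    # assumed to grow with node count
    return compute + communicate


def speedup(nodes, compute_time=1.0, comm_per_node=0.01):
    """How much faster a step runs on `nodes` nodes than on a single node."""
    single = step_time(1, compute_time, comm_per_node)
    return single / step_time(nodes, compute_time, comm_per_node)


def communication_share(nodes, compute_time=1.0, comm_per_node=0.01):
    """Fraction of a step spent communicating rather than computing."""
    communicate = comm_per_node * (nodes - 1)
    return communicate / step_time(nodes, compute_time, comm_per_node)


if __name__ == "__main__":
    for n in (1, 2, 4, 10, 50, 100):
        print(n, round(speedup(n), 2), round(communication_share(n), 2))
```

Under these assumed numbers, the speedup rises at first, peaks at around ten nodes, and then falls again as communication takes up most of each step.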
The solution might lie in the robustness of machine learning algorithms to noise: for example, an image recognition algorithm may be confronted with some mislabeled images, and the overall outcome will not be affected. Dan Alistarh will make use of similar notions of robustness to reduce the amount of communication and synchronization between nodes, an idea he calls “elastic coordination.”
Normally, the nodes would exchange complete and detailed information, such as the precise value of each parameter involved. But in machine learning this precision does not always seem to be necessary, which opens up the potential to dramatically cut the costs of communication and synchronization. Dan Alistarh and his research group will pursue this approach to reduce training time for machine learning. At the same time, he expects his work to examine fundamental questions about distributed computing.
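One common way to send less than the precise value of each parameter is to round the values to a few coarse levels before transmitting them. The following is a minimal sketch of such a scheme using stochastic rounding. It is an illustrative assumption, not a description of the methods the project will develop, and the names quantize, dequantize and levels are hypothetical.

```python
import random


def quantize(values, levels=4, rng=None):
    """Round each value to a small integer code plus one shared scale.

    Rounding is stochastic: a value is rounded up with probability equal
    to its fractional part, so the result is correct on average.
    """
    rng = rng or random.Random(0)
    scale = max(abs(v) for v in values)
    if scale == 0:
        return 0.0, [0] * len(values)
    codes = []
    for v in values:
        x = abs(v) / scale * levels          # position in [0, levels]
        low = int(x)                         # nearest level below
        q = low + (1 if rng.random() < x - low else 0)
        codes.append(q if v >= 0 else -q)
    return scale, codes


def dequantize(scale, codes, levels=4):
    """Recover approximate values from the codes and the shared scale."""
    return [scale * c / levels for c in codes]


if __name__ == "__main__":
    update = [0.3, -0.7, 0.1]
    scale, codes = quantize(update)
    print(codes, dequantize(scale, codes))
```

A node using this sketch would transmit one floating-point scale and a short integer code per parameter instead of a full-precision number for each, at the price of a rounding error of at most scale / levels per value.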
Dan Alistarh received his PhD from the École Polytechnique Fédérale de Lausanne (EPFL) and then took positions at MIT, Microsoft Research Cambridge, UK, and ETH Zurich. In 2017, he joined IST Austria where he leads a research group entitled “Distributed Algorithms and Systems”.