Using the matrix to help Meta gear up
For a user base of this size, large-scale automated-systems must be utilized to understand user experience in order to ensure accuracy and success. EPFL’s Machine Learning and Optimization Laboratory (MLO), led by Professor Martin Jaggi, is in active collaboration with Meta Platforms, Inc., Facebook’s parent company, to solve this unique challenge.
With funding from EPFL’s EcoCloud Research Center, MLO collaborates with Meta through internships at the company for MLO researchers and the use by Meta of a pioneering MLO invention: PowerSGD. MLO is helping Meta to analyze and better understand millions of users' experiences while at the same time respecting user privacy. This requires collaborative learning, that is, privacy-preserving analysis of information from many devices for the training of a neural network that gathers, and even predicts, patterns of behavior.
To do this, a key strategy is to divide the study of these patterns over "the edge", using both the user's device, and others that sit between it and the data center, as a form of distributed training. This requires a fast flow of information and efficient analysis of the data. PowerSGD is an algorithm which compresses model updates in matrix form, allowing a drastic reduction in the communication required for distributed training. When applied to standard deep learning benchmarks, such as image recognition or transformer models for text, the algorithm saves up to 99% of the communication while retaining good model accuracy.
PowerSGD was used to speed up training of the XLM-R model by up to 2x. XLM-R is a critical Natural Language Processing model powering most of the text understanding services at Meta. Facebook, Instagram, WhatsApp and Workplace all rely on XLM-R for their text understanding needs. Use cases include: 1) Content Integrity: detecting hate speech, violence, bullying and harassment; 2) Topic Classification: the classification of topics enabling feed ranking of products like Facebook; 3) Business Integrity: detecting any policy violation for Ads across all products; 4) Shops: providing better product understanding and recommendations for shops.
"There are three aspects to the process. The first is to develop gradient compression algorithms to speed up the training, reducing the time required to prepare this information for its transfer to a centralized hub. The second is efficient training of the neural network within a data center – it would normally take several weeks to process all the information, but we distribute the training, reducing computation from months to days,” said MLO doctoral researcher Tao Lin.
As a third aspect, privacy is a constant factor under consideration. "We have to distinguish between knowledge and data. We need to ensure users' privacy by making sure that our learning algorithms can extract knowledge without extracting their data and we can do this through federated learning,” continued Lin.
The PowerSGD algorithm has been gaining in reputation over the last few years. The developers of deep learning software PyTorch have included it as part of their software suite (PyTorch 1.10), which is used by Meta, OpenAI, Tesla and similar technology corporations that rely on artificial intelligence. The collaboration with Meta is due to run until 2023.