AICamp

Magnet Shuffle Service: Push-based Shuffle at LinkedIn

Sep 08, 12:00 PM PDT (07:00 PM GMT)

Free 207 Attendees

This event is hosted by San Francisco Big Analytics meetup group

Agenda (US Pacific Time):
12:00 - 12:05 pm: Introduction
12:05 - 12:40 pm: Tech Talk: Magnet Shuffle Service: Push-based Shuffle at LinkedIn
1:30 pm: Event closed

Abstract:
The number of daily Spark applications at LinkedIn has increased by more than 3X in the past year. The shuffle process alone now handles 10+ PB of data and billions of blocks daily in our clusters. With such a rapid increase in Spark workloads, we quickly realized that the shuffle process can become a severe bottleneck for both infrastructure scalability and workload efficiency. In our production clusters, we have observed both reliability issues due to shuffle fetch connection failures and efficiency issues due to random reads of small shuffle blocks on disk.

To tackle these challenges and optimize shuffle performance in Spark, we have developed the Magnet shuffle service, a push-based shuffle mechanism that works natively with Spark. Our paper describing this work was recently accepted at VLDB 2020. In this talk, we will introduce how push-based shuffle can drastically increase shuffle efficiency compared with the existing pull-based shuffle. In addition, we will show how combining push-based and pull-based shuffle helps harden shuffle infrastructure at LinkedIn scale, both by reducing shuffle-related failures and by removing scaling bottlenecks. Finally, we will cover a few highlights of the implementation behind the Magnet shuffle service, which works natively with Spark and does not require deploying any external infrastructure or specialized hardware.
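
To make the "small shuffle blocks" problem concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the talk or from Magnet's code. It assumes the usual pull-based layout, in which every reducer fetches one block from every mapper, and an idealized push-based layout, in which blocks bound for the same reducer are merged into one file. The function names and the job sizes (10,000 mappers, 5,000 reducers, 1 TB of shuffle data) are hypothetical and chosen only for illustration.

```python
def pull_based_fetches(num_mappers: int, num_reducers: int) -> int:
    """Pull-based shuffle: each reducer fetches one block per mapper."""
    return num_mappers * num_reducers


def hybrid_fetches(num_mappers: int, num_reducers: int, merged_mappers: int) -> int:
    """Simplified model of combining push-based and pull-based shuffle.

    Assumption: blocks from `merged_mappers` mappers were pushed and merged
    into one file per reducer; blocks from the remaining mappers are still
    fetched individually, as in pull-based shuffle.
    """
    if not 0 <= merged_mappers <= num_mappers:
        raise ValueError("merged_mappers must be between 0 and num_mappers")
    merged_reads = 1 if merged_mappers > 0 else 0
    unmerged_reads = num_mappers - merged_mappers
    return num_reducers * (merged_reads + unmerged_reads)


def mean_block_size(shuffle_bytes: int, num_mappers: int, num_reducers: int) -> int:
    """Average size in bytes of one mapper-to-reducer block."""
    return shuffle_bytes // (num_mappers * num_reducers)


if __name__ == "__main__":
    mappers, reducers = 10_000, 5_000   # hypothetical job shape
    shuffle_bytes = 10**12              # hypothetical 1 TB of shuffle data

    print("pull-based fetches:", pull_based_fetches(mappers, reducers))
    print("mean block size (bytes):", mean_block_size(shuffle_bytes, mappers, reducers))
    print("fully merged fetches:", hybrid_fetches(mappers, reducers, mappers))
    print("90% merged fetches:", hybrid_fetches(mappers, reducers, 9_000))
```

Under these assumed numbers, the pull-based job issues 50 million fetches of about 20 KB each, while a fully merged job issues 5,000 larger sequential reads. The partially merged case shows why falling back to pull-based fetches for unmerged blocks still reduces the total number of reads.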

Min Shen

Min Shen is a Staff Software Engineer at LinkedIn.

The event ended.
