Analyzing bottlenecks in large recommendation systems

Zhang, Jialiang

Abstract

Training and serving recommendation systems often requires analysis and computation over a large number of unstructured, user-specific data blobs. One state-of-the-art recommendation model is the Deep Learning Recommendation Model (DLRM) from Facebook. DLRM consumes a large amount of memory during training and inference because its embedding features can reach terabytes in size. Beyond the memory cost, the long training time of DLRM is a second issue. In this work, we investigate the potential bottlenecks of DLRM and discuss in detail two recent improvements proposed in the literature: PipeDLRM and TT-Rec. PipeDLRM proposes pipeline parallelism and splits the whole model across several GPUs to address compute time witho...
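To make the memory claim concrete, the following is a rough sketch of how embedding storage scales with table size. It assumes dense float32 tables with one shared embedding dimension. The function names, row counts, and dimension are hypothetical values chosen for illustration and are not taken from this work.

```python
def embedding_table_bytes(num_rows: int, embedding_dim: int,
                          bytes_per_value: int = 4) -> int:
    """Bytes needed for one dense embedding table (float32 by default)."""
    return num_rows * embedding_dim * bytes_per_value


def total_embedding_bytes(table_rows, embedding_dim: int,
                          bytes_per_value: int = 4) -> int:
    """Sum the storage of several tables that share one embedding dimension."""
    return sum(
        embedding_table_bytes(rows, embedding_dim, bytes_per_value)
        for rows in table_rows
    )


# Hypothetical categorical features: one very large ID space and two smaller ones.
table_rows = [10_000_000_000, 500_000_000, 1_000_000]
total = total_embedding_bytes(table_rows, embedding_dim=128)
print(f"{total / 1e12:.2f} TB")
```

Under these assumed sizes the largest table alone accounts for more than 5 TB, which is why the embedding tables, and not the dense layers, tend to dominate the memory footprint of this kind of model.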
