In Situ Neighborhood Sampling for Large-Scale GNN Training (DaMoN '24)


TL;DR #

We propose an in situ neighborhood sampling kernel on the Samsung SmartSSD that reduces redundant host PCIe transfers across epochs in GNN training. By moving sampling near storage and batching work at epoch granularity, we cut host I/O and achieve sampling speedups of up to 4.26× that translate into faster end-to-end training.

Motivation #

On large graphs, preparing training data (sampling N-hop neighborhoods) dominates overall training time. When a graph exceeds host memory, the same overlapping neighborhoods are transferred over PCIe again and again across epochs, inflating I/O. We address this by executing sampling close to where the data resides.

Approach #

  • Preprocess edges into large, FPGA-friendly chunks (aligned to 512 bytes) with per-chunk metadata recording byte offsets and the node range each chunk covers (see the preprocessing sketch after this list).
  • Execute sampling kernels on the SmartSSD FPGA, streaming chunks from flash and producing per-epoch sampled neighborhoods.
  • Overlap sampling for the next epoch with current-epoch training on the host (see the pipelining sketch after this list).
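
A minimal host-side sketch of the preprocessing step, under stated assumptions: the graph arrives as CSR arrays (`indptr`, `indices`), and the 1 MiB chunk budget plus the helper `pack_edges` are illustrative inventions. Only the 512-byte alignment and the per-chunk offset/node-range metadata come from the design above.

```python
import numpy as np
from dataclasses import dataclass

ALIGN = 512            # SmartSSD-friendly alignment (bytes)
CHUNK_BYTES = 1 << 20  # illustrative chunk budget: 1 MiB (assumption)

@dataclass
class ChunkMeta:
    byte_offset: int  # where this chunk starts in the packed file
    first_node: int   # first source node whose edges live in this chunk
    last_node: int    # last source node whose edges live in this chunk

def pack_edges(indptr: np.ndarray, indices: np.ndarray, path: str) -> list:
    """Pack a CSR edge list into 512-byte-aligned chunks plus metadata.

    Each chunk holds the adjacency lists of a contiguous node range and is
    zero-padded so the next chunk starts on an ALIGN boundary.
    """
    itemsize = indices.dtype.itemsize
    metas, offset, node = [], 0, 0
    with open(path, "wb") as f:
        while node < len(indptr) - 1:
            start_node, start_edge = node, indptr[node]
            # Greedily take whole adjacency lists until the budget is hit.
            while (node < len(indptr) - 1 and
                   (indptr[node + 1] - start_edge) * itemsize <= CHUNK_BYTES):
                node += 1
            if node == start_node:  # oversized list: give it its own chunk
                node += 1
            payload = indices[start_edge:indptr[node]].tobytes()
            pad = (-len(payload)) % ALIGN  # zero-pad up to the boundary
            f.write(payload + b"\x00" * pad)
            metas.append(ChunkMeta(offset, start_node, node - 1))
            offset += len(payload) + pad
    return metas
```

Padding each chunk to a 512-byte boundary keeps every chunk start sector-aligned, which is what lets a near-storage kernel read chunks directly without host-side realignment; the per-chunk metadata maps a sampled node back to the chunk holding its neighbors.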
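The epoch-level overlap amounts to double buffering on the host. In this hedged sketch, `sample_epoch_on_device` and `train_epoch` are hypothetical stand-ins for the SmartSSD sampling call and the host training loop; neither name comes from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def train_with_overlap(num_epochs, sample_epoch_on_device, train_epoch):
    """Overlap near-storage sampling for epoch e+1 with training on epoch e.

    sample_epoch_on_device(e) -> sampled neighborhoods for epoch e
    train_epoch(batches)      -> runs one training epoch on the host
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Sampling for epoch 0 has to happen up front.
        pending = pool.submit(sample_epoch_on_device, 0)
        for epoch in range(num_epochs):
            batches = pending.result()  # wait for this epoch's samples
            if epoch + 1 < num_epochs:
                # Kick off sampling for the next epoch before training starts,
                # so the SmartSSD works while the host trains.
                pending = pool.submit(sample_epoch_on_device, epoch + 1)
            train_epoch(batches)
```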

Results #

  • Reduced redundant host-device transfers by reusing near-storage sampling results across epochs.
  • Observed sampling-time speedups of up to 4.26× over a sequential-read baseline in our experiments.

Citation #

Please refer to the ACM page for the authoritative citation and BibTeX.