About Me
I'm seeking an internship where I can contribute, learn, and grow at the intersection of systems and AI.
Get in touch here
Education
Boston University
PhD Student in Computer Science
Sep 2024 – Present · Boston, MA
Boston University
Master of Science in Computer Science – GPA: 3.96/4.0
Sep 2022 – May 2024 · Boston, MA
South China University of Technology
Bachelor of Engineering in Software Engineering
Sep 2017 – Jun 2021 · Guangzhou, China
Work & Internship
Intel Asia-Pacific Research & Development Ltd – AI Framework Engineer
Aug 2021 – Aug 2022 · Shanghai, China
- Contributed to the oneDNN Graph Compiler with fusion, loop rescheduling, inlining, and SIMD intrinsics, delivering measurable speedups in deep learning workloads across AVX512/AVX512-BF16 platforms.
- Implemented a range of elementwise and graph operators, including normalization and VNNI reordering, and developed graph patterns such as MHA to identify and analyze performance bottlenecks.
Tencent – WeChat Group – Software Engineer Intern
Jun 2020 – Sep 2020 · Guangzhou, China
- Built an internal test case management system using Python Tornado + React.
- Developed Grafana dashboards for internal statistical data visualization.
Research & Publications
[Paper] RingSampler: GNN Sampling on Large-Scale Graphs with io_uring – HotStorage '25
Jul 2025 (Second Author) · ACM Link
- Introduced an io_uring-based out-of-core sampling method for graphs larger than memory.
- Fully utilized CPU resources while achieving near in-memory sampling performance.
[Poster] GNN Training with Graph Summarization – MLSys '25
May 2025
- Explored graph summarization as preprocessing to shrink graph size before training.
- Demonstrated lower sampling time vs. random-access baseline and significant PCIe bandwidth savings.
SmartSSD Code Repository
Dec 2024 · GitHub
- Published an evolved implementation of SmartSSD-based GNN sampling.
- Refactored host-side design for clearer results and usability.
[Paper] In Situ Neighborhood Sampling for Large-Scale GNN Training – DaMoN '24
Jun 2024 · ACM Link
- Proposed a SmartSSD-based streaming sampler for efficient GNN training.
- Demonstrated lower sampling time vs. random-access baseline and significant PCIe bandwidth savings.
[Master's Thesis] Accelerating Large-Scale GNN Training with Programmable SSDs
May 2024 · Boston University · Thesis Link
- Designed FPGA kernels with Vitis HLS to offload computation onto SSD and reduce PCIe transfers.
- Compared preprocessing costs, file layouts, and sampling speeds on Papers100M and Yahoo datasets.
Projects
Advanced Matrix Multiplication – Strassen Algorithm
Feb 2023 – May 2023 · Boston, MA
- Implemented and optimized the Strassen algorithm for fast matrix multiplication.
- Leveraged OpenMP parallelism and SIMD intrinsics to accelerate computation.
- Conducted detailed performance analysis and compared against baseline algorithms.
Fine-Tuning Pretrained Vision Transformers
Feb 2023 – May 2023 · Boston, MA
- Fine-tuned NAT, DiNAT, MaxViT, and DaViT models on the iNaturalist dataset.
- Achieved competitive accuracy compared to state-of-the-art models.
- Analyzed architectural differences and compared fine-tuning vs. training from scratch.
Online Signature Verification (CNN + GRU)
Jan 2021 – Jun 2021 · Guangzhou, China
- Extracted static features from signature images via a CNN-based autoencoder.
- Captured dynamic features from signature trajectories using a GRU-based autoencoder.
- Applied early/late fusion and a CNN-based Siamese network to verify authenticity.
Parallel Group-By Operation (MPI + OpenMP)
Mar 2020 – Jun 2020 · Guangzhou, China
- Simulated the database "Group-By" operation on large datasets.
- Used OpenMP for node-level parallelism and MPI for distributed cross-node processing.
- Achieved multi-fold efficiency improvements over serial implementations.
Short Video Copyright Detection
Sep 2019 – Nov 2019 · Guangzhou, China
- Designed an approximate detection system to find short clips within long videos.
- Extracted features with SIFT and indexed them with the FAISS similarity-search library.
- Applied hierarchical matching with a sliding window to locate clips within a 5-second tolerance.
UWP-Based Property Management System
Jun 2019 – Sep 2019 · Guangzhou, China
- Built a frontend with C++ and UWP and a backend with Python/Django.
- Designed and implemented RESTful APIs for client-server communication.
- Delivered a complete property management solution for internal use.
Honors & Awards
- 1st Scholarship, SCUT – Awarded for academic excellence (Oct 2020)
- 2nd Prize, CCF NOIP (Nov 2015) – Prestigious algorithm competition akin to ACM-ICPC
Contact
- Email: i [at] syh.one
- GitHub: github.com/Souukou
- LinkedIn: linkedin.com/in/yuhangsong