
Paper List for Machine Learning Systems

A paper list covering broad topics in machine learning systems.

NOTE: Survey papers are annotated with the [Survey 🔍] prefix.

Table of Contents

Data Processing

Data pipeline optimization

General

Preprocessing stalls

Fetch stalls (I/O)

Specific workloads (GNN, DLRM)

Caching and distributed storage for ML training

LLM data plane

Others

Data formats

  • [ECCV'22] L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training
  • [VLDB'21] Progressive compressed records: Taking a byte out of deep learning data

Data pipeline fairness and correctness

  • [CIDR'21] Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines

Data labeling automation

  • [VLDB'18] Snorkel: Rapid Training Data Creation with Weak Supervision

Training System

ML job analysis on GPU clusters

  • [ICSE'24] An Empirical Study on Low GPU Utilization of Deep Learning Jobs
  • [NSDI'24] Characterization of Large Language Model Development in the Datacenter
  • [NSDI'22] MLaaS in the wild: workload analysis and scheduling in large-scale heterogeneous GPU clusters (PAI)
  • [ATC'19] Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads (Philly)

Resource scheduling

Distributed training

AutoML

  • [OSDI'23] Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
  • [NSDI'23] ModelKeeper: Accelerating DNN Training via Automated Training Warmup
  • [OSDI'20] Retiarii: A Deep Learning Exploratory-Training Framework

GNN training system

For a comprehensive list of GNN systems papers, refer to https://github.com/chwan1016/awesome-gnn-systems.

Inference System

Attention Optimization

Mixture of Experts (MoE)

Communication Optimization & Network Infrastructure for Distributed ML

Fault tolerance & Straggler mitigation

GPU Memory Management & Optimization

GPU Sharing

Compiler

GPU Kernel Optimization

LLM Long Context

Model Compression

For a comprehensive list of quantization papers, refer to https://github.com/Efficient-ML/Awesome-Model-Quantization.

Federated Learning

Privacy-Preserving ML

ML APIs & Application-Side Optimization

ML for Systems

Energy Efficiency

Retrieval-Augmented Generation (RAG)

Simulation

Systems for Agentic AI

RL Post-Training

Multimodal

For a comprehensive list of multimodal papers, refer to https://github.com/friedrichor/Awesome-Multimodal-Papers.

Hybrid LLMs

Others

References

This repository is motivated by:
