Deep Learning Model Deployment
Deployment servers
Note: the order does not indicate popularity.
1. Ray
Ray is an open-source unified framework for scaling AI and Python applications such as machine learning workloads. It provides the compute layer for parallel processing so that you don't need to be a distributed-systems expert.
2. Nvidia Triton
NVIDIA Triton is open-source inference-serving software that standardizes AI model deployment and execution, delivering fast and scalable AI in production.
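Triton serves models from a model repository, where each model gets a directory with versioned weights and a config.pbtxt describing its interface. A minimal sketch for a hypothetical ONNX image classifier (the model name, tensor names, and shapes below are illustrative assumptions):

```
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

The weights would live alongside it as model_repository/resnet50/1/model.onnx, and the server is pointed at the repository with tritonserver --model-repository=/path/to/model_repository.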
3. Truss
Truss bills itself as the simplest way to serve AI/ML models in production, packaging model code, weights, and dependencies so a model behaves the same in development and in production.
Model conversion, compilers, and languages
1. TensorRT
NVIDIA TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for inference applications.
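A common entry point is the trtexec command-line tool that ships with TensorRT, which builds a serialized engine from an ONNX model (this is a sketch; model.onnx is a placeholder, and building an engine requires a local NVIDIA GPU):

```shell
# Build a TensorRT engine from an ONNX model, enabling FP16 kernels
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```

The resulting .plan engine is specific to the GPU and TensorRT version it was built with, so engines are typically built on (or for) the deployment hardware.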
2. AITemplate
AITemplate (AIT) is a Python framework that transforms deep neural networks into CUDA (NVIDIA GPU) / HIP (AMD GPU) C++ code for lightning-fast inference serving.
3. TorchScript
TorchScript is an intermediate representation of a PyTorch model (subclass of nn.Module) that can then be run in a high-performance environment such as C++.
See the official introductory tutorial on TorchScript and its documentation.
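A minimal example of the workflow: torch.jit.script compiles an nn.Module into TorchScript, which can be saved to a self-contained archive and later loaded from C++ via libtorch (the tiny model below is illustrative).

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
scripted = torch.jit.script(model)   # compile Python code to TorchScript IR
scripted.save("tiny_net.pt")         # portable artifact, loadable without Python

x = torch.randn(1, 4)
# the scripted module computes the same result as eager mode
assert torch.allclose(model(x), scripted(x))
```

Unlike torch.jit.trace, scripting preserves control flow (if/for) in the exported graph, which matters for models whose behavior depends on their inputs.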
4. Tensor Comprehensions
Tensor Comprehensions (TC) is a notation based on generalized Einstein notation for computing on multi-dimensional arrays. TC greatly simplifies ML framework implementations by providing a concise and powerful syntax which can be efficiently translated to high-performance computation kernels, automatically.
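The flavor of the notation is easiest to see on matrix multiplication, the canonical example from the TC documentation. The +=! operator zero-initializes the output and then sums over the index k, whose range is inferred from the operand shapes:

```
def matmul(float(M, K) A, float(K, N) B) -> (C) {
    C(m, n) +=! A(m, k) * B(k, n)
}
```

From this one declarative definition, TC's autotuner searches for an efficient GPU kernel, so the author never writes loop nests or CUDA code by hand.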
5. Apache TVM
Apache TVM is an End to End Machine Learning Compiler Framework for CPUs, GPUs and accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
6. OpenAI Triton
Triton is an open-source Python-like programming language that enables researchers with no CUDA experience to write highly efficient GPU code, most of the time on par with what an expert would produce.
7. ONNX
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
8. PyTorch Lightning
Lightning is a hyper-minimalistic framework used to build machine learning components that can plug into existing ML workflows.
9. torch.compile
torch.compile makes PyTorch code run faster by JIT-compiling it into optimized kernels, all while requiring minimal code changes.
Profiling tools
1. NVIDIA NSight Systems
NVIDIA Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application's algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to the smallest system on a chip (SoC).
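Typical usage is to wrap your workload with the nsys CLI and inspect the resulting report in the Nsight Systems GUI (a sketch; train.py stands in for whatever script you want to profile):

```shell
# Trace CUDA kernels and NVTX ranges system-wide; writes report.nsys-rep
nsys profile -o report --trace=cuda,nvtx python train.py
```

The timeline view then shows CPU threads, CUDA kernel launches, and memory transfers side by side, which makes gaps between GPU work (a common deployment bottleneck) easy to spot.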