Triton Inference Server: The Basics and a Quick Tutorial

The Triton Inference Server source code is available on GitHub.

Introduction

To serve models, start Triton and point it at a model repository by providing the repository path:

tritonserver --model-repository=<repository-path>

There can be multiple versions of each model, with each version stored in a numerically named subdirectory. The subdirectory's name must be the model's version number and must not be 0.

For example, an ONNX model directory structure looks like this:

<repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.onnx
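
Each model directory typically contains a config.pbtxt describing the model's inputs and outputs. A minimal sketch for an ONNX image-classification model might look like the following; the model name, tensor names, data types, and shapes are placeholders and must match your actual model (Triton can also derive much of this configuration automatically for ONNX models):

name: "<model-name>"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "INPUT0"        # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"       # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]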

How does a Triton client communicate with Triton? Through gRPC or HTTP requests that send inputs to the server and receive the resulting outputs. Client examples can be found in the Triton client repository on GitHub.
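
For illustration, here is a minimal sketch using the Python HTTP client (pip install tritonclient[http]). The model name my_model and the tensor names INPUT0/OUTPUT0, along with the shape and data type, are placeholder assumptions and must match the model's config.pbtxt:

import numpy as np
import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint (port 8000 by default)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the input tensor and attach data (placeholder name/shape/dtype)
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Request the output tensor and run inference
requested_output = httpclient.InferRequestedOutput("OUTPUT0")
response = client.infer(model_name="my_model",
                        inputs=[infer_input],
                        outputs=[requested_output])

print(response.as_numpy("OUTPUT0").shape)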

Install and Run Triton

Install the Triton Docker Image

docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
# <xx.yy> is the version of Triton you want to use

Create Your Model Repository

Lay out your models as described in the Introduction: one directory per model, each containing a config.pbtxt and numbered version subdirectories.

Run Triton

docker run --gpus=3 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
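
Once the container is running, port 8000 serves HTTP/REST requests, port 8001 serves gRPC, and port 8002 exposes Prometheus metrics. A quick way to confirm that the server and a model are ready is sketched below with the Python HTTP client; the model name my_model is a placeholder:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# True once the server has started and finished loading models
print(client.is_server_ready())

# True once the (placeholder) model has been loaded successfully
print(client.is_model_ready("my_model"))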