Triton Inference Server: The Basics and a Quick Tutorial
GitHub repository of Triton Inference Server.
Introduction
Tell Triton where your models live by providing the model repository path when starting the server:
tritonserver --model-repository=<repository-path>
There can be multiple versions of each model, with each version stored in a numerically named subdirectory. The subdirectory's name must be the model's version number and must not be 0.
For example, an ONNX model directory structure looks like this:
<repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.onnx
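Each model's config.pbtxt describes the backend, the maximum batch size, and the input and output tensors. A minimal sketch for an ONNX model might look like the following; the tensor names, data types, and dimensions are placeholders and must match your actual model:
name: "<model-name>"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input__0"      # placeholder input tensor name
    data_type: TYPE_FP32
    dims: [ 3 ]
  }
]
output [
  {
    name: "output__0"     # placeholder output tensor name
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]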
How does a Triton client communicate with Triton? Through gRPC or HTTP requests, which send inputs to Triton and receive outputs. Examples can be found here.
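As an illustration, once the server is running (see below), a minimal sketch of an HTTP inference request with curl against Triton's KServe-style REST API could look like this; the model name, input tensor name, shape, and datatype are placeholders that follow the config sketch above:
curl -X POST localhost:8000/v2/models/<model-name>/infer \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": [
          {
            "name": "input__0",
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3]
          }
        ]
      }'
The response is a JSON body containing the requested output tensors. The gRPC API, served on port 8001 by default, provides the same functionality with lower overhead.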
Install and Run Triton
Install Triton Docker Image
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
#<xx.yy> represents the version of Triton
Create Your Model Repository
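As an illustration (all paths and the model name below are placeholders), the layout from the introduction can be created with ordinary shell commands; this is the directory you will mount into the container in the next step, so adjust the -v path in the docker run command accordingly:
mkdir -p /full/path/to/model_repository/<model-name>/1
cp /path/to/your/model.onnx /full/path/to/model_repository/<model-name>/1/model.onnx
cp /path/to/your/config.pbtxt /full/path/to/model_repository/<model-name>/config.pbtxt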
Run Triton
docker run --gpus=3 --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /full/path/to/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models
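Port 8000 serves HTTP requests, 8001 serves gRPC, and 8002 exposes Prometheus metrics. Once the container is up, a quick way to verify that the server is ready is the HTTP health endpoint:
curl -v localhost:8000/v2/health/ready
A 200 response indicates the server is ready to receive inference requests.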