Positron

Available May ‘24

Transformer Inference Server

10x Better Performance/$ than DGX-H100

$175,000

Available April ‘24

Positron Developer Workstation

DGX-H100 performance on less than 1 KW

$50,000

Coming soon

Managed Transformer Inference

High performance, low latency model serving

Mixtral 8x7B tokens per second

(per user)

TPS / User @ Batch 1

Positron Atlas

4.4X PERFORMANCE FOR 50% OF THE COST

328

Nvidia DGX-H100

TPS / User @ Batch 32

Positron Atlas

4.9X PERFORMANCE FOR 50% OF THE COST

267

Nvidia DGX-H100

Every Transformer Runs on Positron

Supports all Transformer models seamlessly with zero time and zero effort

Model Deployment on Positron in 4 Easy Steps

Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use

Develop or procure a model using the HuggingFace Transformers Library.
Upload or link trained model file (.pt or .safetensors) to Positron Model Manager.
Update client applications to use Positron’s OpenAI API-compliant endpoint.
Issue API requests and receive the best performance.

GCS

Amazon S3

.pt

.safetensors

Drag & Drop to uploadorBROWSE FILES

“mistralai/Mixtral-8x7B-Instruct-v0.1”

Rest API { }

Model Manager

Model Loader

HF Model Fetcher

from openai import OpenAI
client = OpenAI(uri="api.positron.ai")

client.chat.completions

.create(

model="mixtral8x7b"

)

…

OpenAI-compatible

Python client

Positron Performance and Efficiency Advantages

Model

Inference Server

Performance(batch = 8 tokens/sec/user)

Price

Power

Performance per Watt Advantage

Performance per Dollar Advantage

LLama-2
(70B)

Positron Atlas

NVIDIA DGX-H100

151.9

46.8

$175K

$309K

1,800W

3,800W

6.9x

6.5x

Mixtral
(8x7B)

Positron Atlas

NVIDIA DGX-H100

319.4

73.4

$175K

$309K

1,800W

3,800W

10.3x

8.7x

Atlas

Redstone

Cloud

Mixtral 8x7B tokens per second

TPS / User @ Batch 1

TPS / User @ Batch 32

Every Transformer Runs on Positron

Supports all Transformer models seamlessly with zero time and zero effort

Model Deployment on Positron in 4 Easy Steps

Positron Performance and Efficiency Advantages