POSITRON

Atlas
Available May ‘24

Transformer Inference Server

  • 10x Better Performance/$ than DGX-H100

$175,000

Redstone
Available April ‘24

Positron Developer Workstation

  • DGX-H100 performance on less than 1 kW

$50,000

Cloud
Coming soon

Managed Transformer Inference

  • High-performance, low-latency model serving

Mixtral 8x7B tokens per second (per user)

TPS / User @ Batch 1 (4.4x performance for 50% of the cost)

  • Positron Atlas: 328
  • Nvidia DGX-H100: 74

TPS / User @ Batch 32 (4.9x performance for 50% of the cost)

  • Positron Atlas: 267
  • Nvidia DGX-H100: 55
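The multipliers in the charts above are straight throughput ratios; a quick arithmetic check (throughput figures from the charts, the 50%-of-the-cost factor from the comparison labels):

```python
# Mixtral 8x7B tokens/sec per user, read off the charts above
atlas = {"batch_1": 328, "batch_32": 267}
dgx_h100 = {"batch_1": 74, "batch_32": 55}

def speedup(batch: str) -> float:
    """Raw throughput ratio of Positron Atlas over Nvidia DGX-H100."""
    return atlas[batch] / dgx_h100[batch]

print(round(speedup("batch_1"), 1))   # 4.4, as labeled
print(round(speedup("batch_32"), 1))  # 4.9, as labeled
```

At half the price, those throughput ratios roughly double when expressed as performance per dollar.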

Every Transformer Runs on Positron

Supports all Transformer models out of the box, with no porting time or effort

Model Deployment on Positron in 4 Easy Steps

Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use

  1. Develop or procure a model using the HuggingFace Transformers Library.

  2. Upload or link the trained model file (.pt or .safetensors) to the Positron Model Manager.

  3. Update client applications to use Positron’s OpenAI API-compliant endpoint.

  4. Issue API requests and receive the best performance.
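The last two steps can be sketched end to end. Because the endpoint is OpenAI API-compliant, a plain HTTP request with the standard chat-completions body shape should work; the host `api.positron.ai` and model name `mixtral8x7b` appear on this page, while the exact URL path and the API key below are assumptions patterned on the OpenAI REST convention:

```python
import json
import urllib.request

# Assumed path, following the OpenAI REST convention; only the host
# (api.positron.ai) is taken from this page.
ENDPOINT = "https://api.positron.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "mixtral8x7b") -> bytes:
    """Step 4: a standard OpenAI-style chat-completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")

def ask(prompt: str, api_key: str) -> str:
    """POST the request and pull the assistant's reply out of the response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=build_request(prompt),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder credential
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Since the body and response shapes follow the OpenAI convention, existing clients built on the official `openai` package need only their base URL changed.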

[Model Manager upload widget: link a model from GCS or Amazon S3, upload local files (.pt, .safetensors), or reference a Hugging Face model ID such as “mistralai/Mixtral-8x7B-Instruct-v0.1”]

[Diagram: HF Model Fetcher pulls from Hugging Face into the Model Manager; the Model Loader deploys onto Positron hardware, served through a REST API]

from openai import OpenAI
client = OpenAI(base_url="https://api.positron.ai/v1")

client.chat.completions.create(
    model="mixtral8x7b",
    messages=[{"role": "user", "content": "Hello!"}],
)

OpenAI-compatible Python client

Positron Performance and Efficiency Advantages

Model            Inference Server   Performance (batch = 8,   Price   Power    Perf/Watt   Perf/$
                                    tokens/sec/user)                           Advantage   Advantage
Llama-2 (70B)    Positron Atlas     151.9                     $175K   1,800W   6.9x        6.5x
                 NVIDIA DGX-H100    46.8                      $309K   3,800W
Mixtral (8x7B)   Positron Atlas     319.4                     $175K   1,800W   10.3x       8.7x
                 NVIDIA DGX-H100    73.4                      $309K   3,800W
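The advantage columns combine raw throughput with power and price. A minimal sketch of that arithmetic using the table's figures; note the published multipliers may reflect additional measurement methodology, so straight ratios will not reproduce every cell exactly:

```python
# Figures from the table above (batch = 8, tokens/sec/user)
systems = {
    "atlas":    {"tps": {"llama2_70b": 151.9, "mixtral_8x7b": 319.4},
                 "price_kusd": 175, "watts": 1800},
    "dgx_h100": {"tps": {"llama2_70b": 46.8, "mixtral_8x7b": 73.4},
                 "price_kusd": 309, "watts": 3800},
}

def advantage(model: str, resource: str) -> float:
    """Atlas throughput per unit of resource, relative to DGX-H100."""
    a, d = systems["atlas"], systems["dgx_h100"]
    return (a["tps"][model] / a[resource]) / (d["tps"][model] / d[resource])

print(round(advantage("llama2_70b", "watts"), 1))  # 6.9, matching the table
```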