POSITRON
Atlas
Transformer Inference Server
10x Better Performance/$ than DGX-H100
Redstone
Positron Developer Workstation
DGX-H100 performance at under 1 kW
Cloud
Managed Transformer Inference
High-performance, low-latency model serving
[Chart: Mixtral 8x7B tokens per second (per user) — TPS/User at Batch 1 and at Batch 32, Positron Atlas vs. Nvidia DGX-H100]
Every Transformer Runs on Positron
Supports every Transformer model out of the box, with zero porting time and zero engineering effort
Model Deployment on Positron in 4 Easy Steps
Positron maps any trained model from the Hugging Face Transformers library directly onto hardware for maximum performance and ease of use.
1. Develop or procure a model using the Hugging Face Transformers library.
2. Upload or link the trained model file (.pt or .safetensors) to the Positron Model Manager (a model-export sketch follows this list).
3. Update client applications to use Positron's OpenAI API-compliant endpoint (a client sketch follows the diagram below).
4. Issue API requests and receive best-in-class performance.
[Diagram: model ingestion and serving pipeline — trained weights (.pt or .safetensors) are uploaded from GCS or Amazon S3, or fetched by Hugging Face model ID (e.g., "mistralai/Mixtral-8x7B-Instruct-v0.1") via the HF Model Fetcher, into the Model Manager and Model Loader; models are then served through an OpenAI-compatible REST API, consumed from the Python client via client = OpenAI(uri="api.positron.ai") and client.chat.completions]
Positron Performance and Efficiency Advantages
Model | System Price | Performance/$ vs. DGX-H100 | Efficiency vs. DGX-H100
(70B) | $175K | 6.9x | 6.5x
Mixtral (8x7B) | $175K | 10.3x | 8.7x