POSITRON
Atlas
Transformer Inference Server
4x Performance per Watt versus GPUs
2.5x Performance per Dollar vs H100
Testflight
Managed Transformer Inference
Remote Access for Evaluation
Performance versus H100 (Tokens per Second, Mixtral 8x7B)
Positron Release 1.1
Positron Atlas
Nvidia DGX-H100
Positron Release 2.0
Positron Atlas
Nvidia DGX-H100
Positron Performance and Efficiency Advantages in Software V1.x
Performance
per Watt
Advantage
per $
Advantage
Llama 3.1 70B
Every Transformer Runs on Positron
Supports all Transformer models seamlessly with zero time and zero effort
Model Deployment on Positron in 4 Easy Steps
Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use
Develop or procure a model using the HuggingFace Transformers Library.
Upload or link trained model file (.pt or .safetensors) to Positron Model Manager.
Update client applications to use Positron’s OpenAI API-compliant endpoint.
Issue API requests and receive the best performance.
GCS
Amazon S3
.pt
.safetensors
“mistralai/Mixtral-8x7B-Instruct-v0.1”
Rest API { }
Model Manager
Model Loader
HF Model Fetcher
client = OpenAI(uri="api.positron.ai")
client.chat.completions
OpenAI-compatible
Python client
Increased density for power-constrained racks
Based on V1.1 power and performance.
Upcoming events
AI Hardware and Edge AI Summit
At Kisaco's AI Hardware Summit, various systems providers share their latest pro
Go to event →NeurIPS 2024
The world’s premiere AI hardware providers share their latest progress. Members of the engineering team will be demoing both on-site and remote systems.
Go to event →