POSITRON

Positron
Now Available
AtlasAtlas
Atlas

Atlas

Transformer Inference Server

  • 4x Performance per Watt versus GPUs

  • 2.5x Performance per Dollar vs H100

Now Available
TestflightTestflight
Testflight

Testflight

Managed Transformer Inference

  • Remote Access for Evaluation

Performance versus H100 (Tokens per Second, Mixtral 8x7B)

01

Positron Release 1.1

Positron Atlas

3.8X PERFORMANCE FOR 1/3 THE COST
110

Nvidia DGX-H100

93
02

Positron Release 2.0

Positron Atlas

5.2X PERFORMANCE FOR 1/4 THE COST
165

Nvidia DGX-H100

93

Positron Performance and Efficiency Advantages in Software V1.x

September 2024
Software Release
Models Benchmarked
Relative
Performance
Performance
per Watt
Advantage
Performance
per $
Advantage
Confidence
V1.1
Mixtral 8x7B
Llama 3.1 70B
1.1*
3.9
2.6
In-dev, measured.
* Nvidia performance is based on vLLM 0.5.4 for both Mixtral 8x7B, Llama 3.1 8B, and Llama 3.1 70B.

Every Transformer Runs on Positron

Supports all Transformer models seamlessly with zero time and zero effort

Model Deployment on Positron in 4 Easy Steps

Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use

  • Develop or procure a model using the HuggingFace Transformers Library.

  • Upload or link trained model file (.pt or .safetensors) to Positron Model Manager.

  • Update client applications to use Positron’s OpenAI API-compliant endpoint.

  • Issue API requests and receive the best performance.

GCS

GCS

Amazon S3

Amazon S3

Files

.pt

.safetensors

Drag & Drop to uploadorBROWSE FILES

“mistralai/Mixtral-8x7B-Instruct-v0.1”

Hugging Face
Positron

Rest API { }

Model Manager

Model Loader

HF Model Fetcher

from openai import OpenAI
client = OpenAI(uri="api.positron.ai")

client.chat.completions
.create(
model="mixtral8x7b"
)

OpenAI-compatible

Python client

Increased density for power-constrained racks

Based on V1.1 power and performance.

Mixtral 8x7B Performance
DGX-H100
Atlas α
Aggregate Tokens per Second (TPS)
744
4400
Number of Users
8
40
DGX-H100
5,900 W
Atlas α
Atlas α
Atlas α
Atlas α
Atlas α

Upcoming events

AI Hardware and Edge AI Summit
September 10, 2024Signia By Hilton, San Jose, CA

AI Hardware and Edge AI Summit

At Kisaco's AI Hardware Summit, various systems providers share their latest pro

Go to event
NeurIPS 2024
December 9, 2024Vancouver Convention Centre, Vancouver, Canada

NeurIPS 2024

The world’s premiere AI hardware providers share their latest progress. Members of the engineering team will be demoing both on-site and remote systems.

Go to event
Go to events