An SDK to run transformer models anywhere
Exla aggressively quantizes AI models to minimize memory usage and maximize inference speed. Whether you're deploying LLMs, VLMs, VLAs, or custom models, Exla reduces memory footprint by up to 80% and accelerates inference by 3–20x, all with just a few lines of code. Schedule a call: https://cal.com/exla-ai/schedule
Groq delivers fast, low-cost AI inference using custom LPU (Language Processing Unit) chips and G...
Provides AI inference hardware and a cloud platform for enterprises, using RDUs and SambaCloud to r...

AI hardware company building wafer-scale processors for deep learning training and inference