Llama API overview

Llama API is a Meta-hosted API service that helps you integrate Llama models into your applications quickly and efficiently.

It provides access to Llama models through a simple, developer-friendly API interface, with inference provided by Meta, so you can focus on building AI-powered solutions instead of managing your own inference infrastructure.

Llama API exposes the capabilities of the latest Llama models through convenient API endpoints:

  • Chat completion: Generate text from a prompt, or build a chat-based AI assistant using multi-modal input (text, images) and text-based outputs.
  • Image understanding: Process and analyze visual data to extract insights, interpret charts, and more.
  • JSON structured output: Generate responses that follow pre-defined JSON schemas.
  • Tool calling: Define functions that the model can call while generating responses, integrating Llama with your existing tools (see the sketch after this list).
  • Moderation: Use sophisticated safety models to check user and model text for problematic content.
  • OpenAI compatibility: Use OpenAI clients with Llama API using the compatibility endpoint.
  • Fine-tuning and evaluation: Fine-tune a pre-trained Llama model on specialized datasets to improve performance for specific use cases.
  • Accelerated inference: Use accelerated inference from third-party inference providers for faster responses in latency-sensitive use cases.
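
As a concrete illustration of tool calling, the sketch below defines a single function that the model may choose to call while generating a response. It uses the OpenAI Python client against the compatibility endpoint described later on this page; the base URL, the model name, and the get_weather tool are illustrative assumptions, not confirmed values.

```python
# A minimal tool-calling sketch, assuming Llama API's OpenAI-compatible
# endpoint. The base URL, model name, and the get_weather tool below are
# assumptions for illustration; substitute values from your own setup.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llama.com/compat/v1/",  # assumed compatibility endpoint
    api_key="YOUR_LLAMA_API_KEY",
)

# Describe a tool the model may call while generating its response.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed model name
    messages=[{"role": "user", "content": "What's the weather in Menlo Park?"}],
    tools=tools,
)

# If the model chose to call the tool, the call arrives as structured
# arguments rather than free text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```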

Llama API's endpoints follow a REST-like interface, so you can make API calls directly from most programming languages.
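
For example, a chat completion request needs nothing more than an ordinary HTTP client. The sketch below uses Python's requests library; the endpoint path, payload shape, and model name are assumptions modeled on common chat-completion APIs, so check the API reference for the exact values.

```python
# A minimal sketch of a direct REST call to Llama API. The endpoint path,
# payload shape, and model name are assumptions, not confirmed values.
import requests

resp = requests.post(
    "https://api.llama.com/v1/chat/completions",  # assumed endpoint path
    headers={"Authorization": "Bearer YOUR_LLAMA_API_KEY"},
    json={
        "model": "Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed model name
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```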

Meta maintains SDKs for Llama API in multiple languages, including Python and TypeScript. See SDKs and libraries for more information on official libraries for Llama API.
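
A chat completion through the Python SDK might look like the sketch below; the package, client, and response attribute names are assumptions, so consult the SDK documentation for the exact interface.

```python
# A minimal sketch, assuming the llama-api-client Python package. The
# client name, model name, and response shape are assumptions; check the
# SDK documentation for the exact interface.
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient(api_key="YOUR_LLAMA_API_KEY")

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed model name
    messages=[{"role": "user", "content": "Summarize what Llama API does."}],
)
print(response.completion_message.content.text)  # assumed response shape
```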

Llama API is compatible with OpenAI-based libraries. See OpenAI compatibility for more information on OpenAI-based library support.
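
In practice, compatibility means an existing OpenAI client can be pointed at Llama API by changing only the base URL and API key. The sketch below, including a streamed response, assumes a compatibility endpoint at api.llama.com and an illustrative model name; verify both against the compatibility documentation.

```python
# A minimal sketch of reusing the OpenAI Python client with Llama API.
# The base URL and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llama.com/compat/v1/",  # assumed compatibility endpoint
    api_key="YOUR_LLAMA_API_KEY",
)

# Streaming works the same way it does against the OpenAI service.
stream = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed model name
    messages=[{"role": "user", "content": "Write a haiku about llamas."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
```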

Meta does not use your content, including API inputs (prompts) and API outputs (model responses), to train its models.

  • No training
  • Encryption at rest and in transit
  • Data not used for ads
  • Separation in storage
  • Strict access control
  • Compliance & vulnerability management

See data commitments for more information.

Llama API is a great way to use Llama models in your application, but it is just one of many ways to use Llama.

Meta partners with cloud providers to offer Llama models and cloud inference services at competitive prices. See Meta Llama in the Cloud for a detailed list of cloud providers that offer Llama models.

To host and run Llama models on your own infrastructure, take a look at the Llama Everywhere guide, which shows you how to run them on common desktop operating systems and Linux-based infrastructure.

Similar to Llama API, Llama Stack offers a REST-like interface to Llama models, with both server and client implementations, making it easy to host your own API layer serving Meta models or your own fine-tuned models.
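
As a rough sketch of the client side, the example below assumes the llama-stack-client Python package and a Llama Stack server already running locally; the port, method names, model identifier, and response shape are assumptions that vary by Llama Stack version.

```python
# A minimal sketch of calling a self-hosted Llama Stack server from its
# Python client. Package name, default port, method shape, and response
# attributes are assumptions; check the Llama Stack docs for your version.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
)
print(response.completion_message.content)  # assumed response shape
```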

Find answers to frequently asked questions, get support, and share your feedback with Meta in the Help Center, or report any problematic content generated by a Llama model.

If you encounter a technical issue with a Llama model, file an issue in the llama-models repository on GitHub.

Report security concerns at facebook.com/whitehat/info.

Report violations of the Acceptable Use Policy or unlicensed uses of Llama at [email protected].