
Chat completion with Llama API

Chat completion lets developers generate text from a prompt or a conversation history. A Llama model predicts the next sequence of tokens in the conversation, so your application can respond to user input in a natural and intuitive way.

Chat completion enables models to process a sequence of messages with different roles and generate appropriate responses.

Llama API supports three distinct roles in conversations:

  • System messages: Define overall behavior instructions for the model.
  • User messages: Represent inputs from your application’s users.
  • Assistant messages: Contain previous responses from the model.

Structuring a conversation into these roles gives the model context across multiple interactions, which helps it produce coherent, contextually relevant responses.
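
As a minimal sketch of how the three roles combine across turns, the following Python builds a message list and appends each exchange to it. The helper names new_conversation and add_turn are hypothetical and not part of Llama API, and the assistant text is a stand-in for a real model response. Only the "role" and "content" fields and the three role names come from this guide.

```python
import json


def new_conversation(system_prompt):
    """Start a message list with a single system message."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(messages, role, content):
    """Append one message to the conversation and return the list."""
    messages.append({"role": role, "content": content})
    return messages


messages = new_conversation(
    "You are a helpful assistant that provides concise answers."
)
add_turn(messages, "user", "What is the capital of France?")
# Illustrative text standing in for a previous response from the model.
add_turn(messages, "assistant", "The capital of France is Paris.")
# The follow-up question relies on the earlier turns for its meaning.
add_turn(messages, "user", "What is its population?")

print(json.dumps(messages, indent=2))
```

Sending the whole list with each request, rather than only the latest user message, is what lets the model resolve a reference such as "its" in the final question.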

Use chat completion to build sophisticated conversational experiences, such as:

  • Customer Support: Create intelligent assistants that can handle inquiries and resolve issues.
  • Content Creation: Generate ideas, outlines, and drafts through interactive dialogue.
  • Education: Build tutoring systems that adapt explanations based on student questions.
  • Information Retrieval: Develop question-answering systems that provide concise, relevant information.

You can create a basic conversation with Llama API using a standard HTTP request.

Here’s how to implement a simple chat using cURL:

curl "https://api.llama.com/v1/chat/completions" \
  -H "Authorization: Bearer $LLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-4-Maverick-17B-128E-Instruct-FP8",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that provides concise answers."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 256
  }'
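
The cURL request above can also be sketched in Python using only the standard library. This is one possible translation under stated assumptions, not an official client: build_request is a hypothetical helper, and the endpoint, headers, model name, and max_tokens value are copied from the cURL example. The response is printed as raw JSON because this guide does not describe its structure.

```python
import json
import os
import urllib.request

API_URL = "https://api.llama.com/v1/chat/completions"


def build_request(api_key, messages,
                  model="Llama-4-Maverick-17B-128E-Instruct-FP8",
                  max_tokens=256):
    """Build (but do not send) the same POST request as the cURL example."""
    body = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


messages = [
    {"role": "system",
     "content": "You are a helpful assistant that provides concise answers."},
    {"role": "user", "content": "What is the capital of France?"},
]

api_key = os.environ.get("LLAMA_API_KEY")
if api_key:
    # Only contact the API when a key is configured in the environment.
    request = build_request(api_key, messages)
    with urllib.request.urlopen(request, timeout=30) as response:
        print(json.dumps(json.load(response), indent=2))
else:
    print("Set LLAMA_API_KEY to send the request.")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.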

Explore the full capabilities of Llama API with these resources: