LLM Providers

MoFA supports multiple LLM providers with a unified interface. This guide covers configuration and usage.

Supported Providers

| Provider | Environment Variables | Features |
|----------|-----------------------|----------|
| OpenAI | `OPENAI_API_KEY`, `OPENAI_MODEL` | Streaming, Function Calling |
| Anthropic | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` | Streaming, Extended Context |
| Ollama | `OPENAI_BASE_URL` | Local Inference, Free |
| OpenRouter | `OPENAI_API_KEY`, `OPENAI_BASE_URL` | Multiple Models |
| vLLM | `OPENAI_BASE_URL` | High Performance |

OpenAI

Configuration

```
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o           # optional
OPENAI_BASE_URL=...           # optional, for proxies
```

Usage

```rust
use std::sync::Arc;

use futures::StreamExt;
use mofa_sdk::llm::{LLMClient, openai_from_env};

let provider = openai_from_env()?;
let client = LLMClient::new(Arc::new(provider));

// Simple query
let response = client.ask("What is Rust?").await?;

// With system prompt
let response = client
    .ask_with_system("You are a Rust expert.", "Explain ownership")
    .await?;

// Streaming
let mut stream = client
    .stream()
    .system("You are helpful.")
    .user("Tell a story")
    .start()
    .await?;
while let Some(chunk) = stream.next().await {
    print!("{}", chunk?);
}
```

Available Models

| Model | Description | Context Length |
|-------|-------------|----------------|
| `gpt-4o` | Latest flagship (default) | 128K |
| `gpt-4-turbo` | High performance | 128K |
| `gpt-3.5-turbo` | Fast, economical | 16K |

Anthropic

Configuration

```
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-5-latest  # optional
```

Usage

```rust
use std::sync::Arc;

use mofa_sdk::llm::{LLMClient, anthropic_from_env};

let provider = anthropic_from_env()?;
let client = LLMClient::new(Arc::new(provider));

let response = client
    .ask_with_system("You are Claude, a helpful AI.", "Hello!")
    .await?;
```

Available Models

| Model | Description | Context Length |
|-------|-------------|----------------|
| `claude-sonnet-4-5-latest` | Balanced (default) | 200K |
| `claude-opus-4-latest` | Most capable | 200K |
| `claude-haiku-3-5-latest` | Fastest | 200K |

Ollama (Local)

Setup

1. Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
2. Pull a model: `ollama pull llama3.2`
3. Start the server: `ollama serve`
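
With the server running, a quick way to confirm it is reachable is to query its OpenAI-compatible endpoint directly (assuming the default port 11434):

```shell
# List the models the local Ollama server can serve
curl http://localhost:11434/v1/models
```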

Configuration

```
OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_MODEL=llama3.2
```

Usage

Same as OpenAI (uses OpenAI-compatible API):

```rust
let provider = openai_from_env()?;
let client = LLMClient::new(Arc::new(provider));
```

Recommended Models

| Model | Size | Best For |
|-------|------|----------|
| `llama3.2` | 3B | General purpose |
| `llama3.1:8b` | 8B | Better quality |
| `mistral` | 7B | Fast responses |
| `codellama` | 7B | Code generation |

OpenRouter

Configuration

```
OPENAI_API_KEY=sk-or-...
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=google/gemini-2.0-flash-001
```

Usage

```rust
let provider = openai_from_env()?;  // Uses OPENAI_BASE_URL
let client = LLMClient::new(Arc::new(provider));
```

Popular Models

| Model | Provider | Notes |
|-------|----------|-------|
| `google/gemini-2.0-flash-001` | Google | Fast, capable |
| `meta-llama/llama-3.1-70b-instruct` | Meta | Open source |
| `mistralai/mistral-large` | Mistral | European AI |

vLLM

Setup

```
pip install vllm
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
```

Configuration

```
OPENAI_API_KEY=unused
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=meta-llama/Llama-2-7b-chat-hf
```
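
vLLM speaks the same OpenAI-compatible protocol, so usage in MoFA is identical to the OpenAI provider. Before wiring it in, the server can be sanity-checked with a raw chat-completions request (path and payload follow the OpenAI API convention; the model name must match the one passed to the server):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```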

Custom Provider

Implement the LLMProvider trait:

```rust
use async_trait::async_trait;
use futures::Stream;
use mofa_sdk::llm::{LLMError, LLMProvider, LLMResponse};

pub struct MyCustomProvider {
    api_key: String,
    endpoint: String,
}

#[async_trait]
impl LLMProvider for MyCustomProvider {
    async fn complete(&self, prompt: &str) -> Result<String, LLMError> {
        // Your implementation: call self.endpoint, authenticating with self.api_key
        todo!()
    }

    async fn complete_with_system(
        &self,
        system: &str,
        prompt: &str,
    ) -> Result<String, LLMError> {
        // Your implementation
        todo!()
    }

    async fn stream_complete(
        &self,
        system: &str,
        prompt: &str,
    ) -> Result<impl Stream<Item = Result<String, LLMError>>, LLMError> {
        // Optional streaming implementation
        todo!()
    }
}
```

Best Practices

API Key Security

```rust
// NEVER hardcode API keys.
// BAD:
// let key = "sk-...";

// GOOD: load them from environment variables
dotenvy::dotenv().ok();
let key = std::env::var("OPENAI_API_KEY")?;
```
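
When several variables are required, a small helper that fails with a descriptive message makes misconfiguration easier to diagnose. This is an illustrative sketch, not part of mofa_sdk:

```rust
use std::env;

/// Fetch a required environment variable with a descriptive error
/// instead of a bare panic. (Illustrative helper; not part of mofa_sdk.)
fn require_env(name: &str) -> Result<String, String> {
    env::var(name).map_err(|_| format!("missing required environment variable: {}", name))
}

fn main() {
    // PATH is set in virtually every environment
    assert!(require_env("PATH").is_ok());
    assert!(require_env("SURELY_UNSET_VAR_42").is_err());
}
```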

Error Handling

```rust
use std::time::Duration;

use mofa_sdk::llm::LLMError;

match client.ask(prompt).await {
    Ok(response) => println!("{}", response),
    Err(LLMError::RateLimited { retry_after }) => {
        tokio::time::sleep(Duration::from_secs(retry_after)).await;
        // Retry the request
    }
    Err(LLMError::InvalidApiKey) => {
        eprintln!("Check your API key configuration");
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}
```
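
A single sleep handles one rate-limit hit; for repeated failures, exponential backoff with a cap is a common strategy. A minimal sketch of the delay schedule (illustrative; not provided by mofa_sdk):

```rust
use std::time::Duration;

/// Exponential backoff with a cap: base * 2^attempt, never exceeding max.
/// (Illustrative helper; not part of mofa_sdk.)
fn backoff_delay(attempt: u32, base_secs: u64, max_secs: u64) -> Duration {
    let secs = base_secs.saturating_mul(1u64 << attempt.min(16));
    Duration::from_secs(secs.min(max_secs))
}

fn main() {
    assert_eq!(backoff_delay(0, 1, 60), Duration::from_secs(1));
    assert_eq!(backoff_delay(1, 1, 60), Duration::from_secs(2));
    assert_eq!(backoff_delay(3, 1, 60), Duration::from_secs(8));
    assert_eq!(backoff_delay(10, 1, 60), Duration::from_secs(60)); // capped
}
```

Pair the computed delay with `tokio::time::sleep` in the `RateLimited` arm, bounding the number of attempts.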

Token Management

```rust
// Use a sliding window to bound context size
let agent = LLMAgentBuilder::from_env()?
    .with_sliding_window(10)  // Keep only the last 10 messages
    .build_async()
    .await;

// Or count tokens manually
let tokens = client.count_tokens(&prompt).await?;
if tokens > 4000 {
    // Truncate or summarize
}
```
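
When `count_tokens` is unavailable or too slow for a hot path, a rough character-based estimate (around 4 characters per token for English text) can serve as a first-pass guard. The ratio is a heuristic assumption, not a mofa_sdk API:

```rust
/// Rough token estimate: ~4 characters per token for English text.
/// A crude heuristic; use an exact tokenizer count before hard limits matter.
fn approx_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn main() {
    assert_eq!(approx_tokens(""), 0);
    assert_eq!(approx_tokens("abcd"), 1);
    assert_eq!(approx_tokens("abcde"), 2);
    let long = "a".repeat(20_000);
    assert!(approx_tokens(&long) > 4000); // would exceed the 4000-token guard above
}
```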

See Also