Project Overview

Simple library and command-line tools for experimenting with LLMs.

Goals

Provide developers with a simple way to experiment with LLMs and LangChain:

  • Easy setup and configuration
  • Basic chat / CLI tools
  • Custom tool integration (both in Python and via composition of other tools)
  • Support for less-mainstream LLMs like AWS Bedrock

What This Project Is Not

  • Not an end-user tool: This project is geared toward developers and researchers with knowledge of Python, LLM capabilities, and programming fundamentals.
  • Not a complete automation system: It relies on human oversight and guidance for optimal performance.

Running

The library comes with command-line tools for running and testing LLM scripts: llm-workers-cli, llm-workers-chat, and llm-workers-evaluate.

To chat using an LLM script:

llm-workers-cli [--verbose] [--debug] <script_file>

To resume the previous chat session:

llm-workers-cli [--verbose] [--debug] --resume .last <script_file>

To run an LLM script with prompt(s) as command-line arguments:

llm-workers-cli [--verbose] [--debug] <script_file> [<prompt1> ... <promptN>]

To run an LLM script with prompt(s) read from stdin, with each line treated as a separate prompt:

llm-workers-cli [--verbose] [--debug] <script_file> -

Results of the LLM script execution are printed to stdout without any extra formatting.
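
For example, a minimal batch-processing sketch (prompts.txt, results.txt, and my-script.yaml are placeholder file names): each line of prompts.txt becomes a separate prompt, and all results end up in results.txt:

cat prompts.txt | llm-workers-cli my-script.yaml - > results.txt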

To chat with an LLM script:

llm-workers-chat [--verbose] [--debug] <script_file>

The tool provides a terminal chat interface where the user can interact with the LLM script.

To run evaluation suites against LLM scripts:

llm-workers-evaluate [--verbose] [--debug] [-n iterations] <script_file> <evaluation_suite>

The tool runs automated tests and reports scores. See Evaluation Framework for details.
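
For example, a hypothetical invocation with five iterations per test (the script and suite file names are placeholders):

llm-workers-evaluate -n 5 review-script.yaml review-suite.yaml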

Common flags:

  • --verbose - increases the verbosity of stderr logging; can be used multiple times (info / debug)
  • --debug - increases the amount of debug logging to file/stderr; can be used multiple times (debug only the main worker / debug the whole llm_workers package / debug all)
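
Both flags can be repeated. For example, a sketch with two --verbose flags (debug-level stderr logging) and one --debug flag (the script name is a placeholder):

llm-workers-chat --verbose --verbose --debug my-script.yaml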

Configuration

User-specific Configuration

User-specific configuration is stored in ~/.config/llm-workers/config.yaml.

models:
  - name: <model_name>
    provider: <provider_name>
    model: <model_id>
    # [additional parameters...]

# Display settings
display_settings:
  # see below

On first launch, the llm-workers CLI will guide you through initial setup. You can choose from:

  • OpenAI presets: Configure OpenAI GPT models for the standard model slots
  • Anthropic preset: Configure Claude models for the standard model slots
  • Google preset: Configure Google Gemini models for the standard model slots
  • Manual configuration: Set up custom model configurations

Models Section

Defines the LLMs to use. The configuration must define at least the following standard models:

  • fast: Optimized for speed and simple tasks
  • default: Balanced performance for most use cases
  • thinking: Advanced reasoning with internal thought processes
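
A minimal sketch of a models section that fills the three standard slots (the provider and model IDs here are illustrative assumptions, not project defaults):

models:
  - name: fast
    provider: openai
    model: gpt-4o-mini
  - name: default
    provider: openai
    model: gpt-4o
  - name: thinking
    provider: openai
    model: o1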

There are two types of model configurations:

Standard Model Configuration

  • name: Identifier for the model
  • provider: Service provider (e.g., bedrock, bedrock_converse, openai)
  • model: Model identifier
  • rate_limiter: Optional rate limiting configuration
  • pricing: Optional cost estimation configuration (see Cost Estimation below)
  • config: Optional model-specific parameters (overrides main section parameters if used)

models:
  - name: default
    provider: openai
    model: gpt-4o
    rate_limiter:
      requests_per_second: 1.0
      max_bucket_size: 10
    # model specific parameters defined inline
    temperature: 0.7
    max_tokens: 1500
    # optional pricing for cost estimation
    pricing:
      currency: USD
      input_tokens_per_million: 2.50
      output_tokens_per_million: 10.00
      cache_read_tokens_per_million: 1.25

Import Model Configuration

  • name: Identifier for the model
  • import_from: Fully-qualified Python class/function path for custom model implementation
  • rate_limiter: Optional rate limiting configuration
  • config: Optional parameters passed to the model constructor/factory (overrides main section parameters if used)

The imported symbol can be:

  • A BaseChatModel instance (used directly)
  • A class (instantiated with config parameters)
  • A function/method (called with config parameters to create the model)

models:
  - name: custom_model
    import_from: my_module.models.CustomChatModel
    rate_limiter:
      requests_per_second: 2.0
      max_bucket_size: 5
    config:
      base_url: "https://api.example.com"
      api_key: "your-api-key"
      model_type: "advanced"
      timeout: 30

Model-specific Configuration

Any parameters from the config section will be passed to the model.
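
As a sketch of how the override works (the values are illustrative): a parameter set in the config section takes precedence over the same parameter defined inline in the main section:

models:
  - name: default
    provider: openai
    model: gpt-4o
    # inline model parameter
    temperature: 0.2
    config:
      # overrides the inline value above
      temperature: 0.7
      max_tokens: 1000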

Display Settings

The display_settings section controls various user experience and display options for the chat interface:

display_settings:
  # Token usage display (default: true)
  show_token_usage: true

  # Reasoning tokens display (default: false)
  show_reasoning: false

  # Auto-open changed files (default: false)
  auto_open_changed_files: false

  # Markdown output formatting (default: false)
  markdown_output: true

  # Response streaming (default: true)
  stream: true

  # File monitoring patterns (defaults shown)
  file_monitor_include: [ '*.jpg', '*.jpeg', '*.png', '*.gif', '*.tiff', '*.svg', '*.webp' ]
  file_monitor_exclude: ['.*', '*.log']

Token Usage Display

When show_token_usage is enabled (true), the chat interface will:

  • Display current token usage after each AI response
  • Show detailed per-model token summary when exiting the chat session
  • Include input, output, reasoning tokens (when available), and cache usage

When disabled (false), no token usage information is displayed.

Cost Estimation

Cost estimation provides automatic calculation of API costs based on token usage. To enable cost estimation, add a pricing section to your model configuration:

models:
  - name: default
    provider: anthropic
    model: claude-sonnet-4-5
    pricing:
      currency: USD
      input_tokens_per_million: 3.00
      output_tokens_per_million: 15.00
      cache_read_tokens_per_million: 0.30
      cache_write_tokens_per_million: 3.75

Pricing fields:

  • currency: Currency code (e.g., “USD”, “EUR”, “GBP”) - default: “USD”
  • input_tokens_per_million: Cost per million input tokens (optional)
  • output_tokens_per_million: Cost per million output tokens (optional)
  • cache_read_tokens_per_million: Cost per million cache read tokens (optional)
  • cache_write_tokens_per_million: Cost per million cache write tokens (optional)
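
As a sketch of the arithmetic (assuming costs are computed per token at the configured per-million rates): with the pricing above, a request that uses 2,000 input tokens and 500 output tokens would cost 2,000 / 1,000,000 × $3.00 + 500 / 1,000,000 × $15.00 = $0.006 + $0.0075 = $0.0135 USD.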

Notes:

  • All pricing fields are optional - costs are only calculated for configured token types
  • Reasoning tokens are counted as output tokens (no separate pricing)
  • Cost display appears alongside token usage when using the /cost command or on exit
  • Models without pricing configuration will show token usage only

Example output with cost estimation:

Total Session Tokens: 1,234 total
  fast: 500 (200 in, 300 out) → $0.0018 USD
  default: 734 (334 in, 400 out) → $0.0036 USD
Total Session Cost: $0.0054 USD

Reasoning Display

When show_reasoning is enabled (true), the chat interface will display reasoning tokens from models that support them (like Claude with thinking). This setting can also be toggled during chat sessions using the /show_reasoning command.

Response Streaming

When stream is enabled (true, default), the chat interface will stream LLM responses token-by-token as they are generated, providing immediate feedback. When disabled (false), the complete response is received before being displayed. Streaming is generally recommended for better user experience with longer responses.

File Management

  • auto_open_changed_files: When enabled, automatically opens files that are created or modified during the session
  • file_monitor_include/file_monitor_exclude: Patterns controlling which files are monitored for changes

Output Formatting

  • markdown_output: When enabled, formats AI responses as markdown for better readability

LLM Scripts

LLM scripts are YAML configuration files that define how to interact with large language models (LLMs) and what tools LLMs can use. You should treat them like normal scripts. In particular, DO NOT run LLM scripts from unknown or untrusted sources. Scripts can easily download and run malicious code on your machine, or submit your secrets to some website.

See LLM Script file for reference.

Example scripts

The examples page contains sample LLM scripts demonstrating various features:

  • Metacritic-monkey.yaml - Custom tools with statement composition, web fetching tools, inline tool definitions, match statements with stubbed data, LLM tool integration, template variables, UI hints
  • explicit-approval-tools.yaml - Explicit approval workflow with token-based confirmation system, custom tool composition with inline imports, approval tools (request/validate/consume), safe execution of potentially dangerous operations
  • find-concurrency-bugs.yaml - CLI mode with statement composition, file reading tool, thinking model via model_ref, structured JSON output (by instruction)
  • navigation-planning.yaml - Web fetching tools with markdown conversion, nested custom tools, tool composition with return_direct flag, CLI mode with tool restrictions, chat mode configuration
  • reformat-Scala.yaml - CLI mode with complex file processing pipeline, match statements with conditional file operations, file I/O tools, LLM tool integration for code transformation

Copyright © 2025 Dmitry Mikhaylov.
