> Documentation for AI Functions

# AI Functions

AI Functions are built-in ClickHouse functions that call an LLM or embedding provider from SQL, so you can classify text, extract information, generate or translate text, and compute embeddings directly over your data.

<Note>
  AI functions are experimental. Set [`allow_experimental_ai_functions`](/reference/settings/session-settings#allow_experimental_ai_functions) to enable them.
</Note>

<Note>
  AI functions can return unpredictable output. Results depend heavily on the quality of the prompt and on the model used.
</Note>

All AI functions share a common infrastructure that provides:

* **Quota enforcement**: Per-query limits on tokens ([`ai_function_max_input_tokens_per_query`](/reference/settings/session-settings#ai_function_max_input_tokens_per_query), [`ai_function_max_output_tokens_per_query`](/reference/settings/session-settings#ai_function_max_output_tokens_per_query)) and API calls ([`ai_function_max_api_calls_per_query`](/reference/settings/session-settings#ai_function_max_api_calls_per_query)).
* **Retry with backoff**: Transient failures are retried ([`ai_function_max_retries`](/reference/settings/session-settings#ai_function_max_retries)) with exponential backoff ([`ai_function_retry_initial_delay_ms`](/reference/settings/session-settings#ai_function_retry_initial_delay_ms)).
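
As a minimal sketch of how these limits might be applied to a single query, the statement below sets the quota and retry settings named above in a `SETTINGS` clause. The numeric values are illustrative assumptions, not recommended defaults, and the `issues` table and `my_ai_credentials` collection are the ones used in examples elsewhere on this page.

```sql theme={null}
-- Cap API calls and token usage for this query only.
-- All numeric values below are illustrative assumptions.
SELECT body, aiClassify(body, ['bug', 'question', 'feature']) AS kind
FROM issues
LIMIT 100
SETTINGS
    allow_experimental_ai_functions = 1,
    ai_function_credentials = 'my_ai_credentials',
    ai_function_max_api_calls_per_query = 100,
    ai_function_max_input_tokens_per_query = 50000,
    ai_function_max_output_tokens_per_query = 5000,
    ai_function_max_retries = 3,
    ai_function_retry_initial_delay_ms = 500;
```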

<h2 id="configuration">
  Configuration
</h2>

AI functions read provider credentials and configuration from a [**named collection**](/concepts/features/configuration/server-config/named-collections). The [`ai_function_credentials`](/reference/settings/session-settings#ai_function_credentials) setting selects which named collection to use.

The following statement creates a named collection with provider credentials:

```sql theme={null}
CREATE NAMED COLLECTION my_ai_credentials AS
    provider = 'openai',
    endpoint = 'https://api.openai.com/v1/chat/completions',
    model = 'gpt-4o-mini',
    api_key = 'sk-...';
```

Select the collection with the `ai_function_credentials` setting, for the session or for a single query:

```sql theme={null}
-- For the session:
SET allow_experimental_ai_functions = 1;
SET ai_function_credentials = 'my_ai_credentials';
SELECT aiClassify('I love this product!', ['positive', 'negative', 'neutral']);

-- Or for a single query:
SELECT aiClassify('I love this product!', ['positive', 'negative', 'neutral'])
SETTINGS allow_experimental_ai_functions = 1, ai_function_credentials = 'my_ai_credentials';
```

When `ai_function_credentials` is empty (the default), calling an AI function raises an exception.

<h3 id="named-collection-parameters">
  Named collection parameters
</h3>

| Parameter     | Type   | Default | Description                                                                                                                                                                    |
| ------------- | ------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `provider`    | String | —       | Model provider. Supported: `'openai'`, `'anthropic'`. See note below.                                                                                                          |
| `endpoint`    | String | —       | API endpoint URL.                                                                                                                                                              |
| `model`       | String | —       | Model name (e.g. `'gpt-4o-mini'`, `'text-embedding-3-small'`).                                                                                                                 |
| `api_key`     | String | —       | Authentication key for the provider. Optional: when omitted, the auth header is not sent, which allows targeting OpenAI-compatible servers that do not require authentication. |
| `max_tokens`  | UInt64 | `1024`  | Maximum number of output tokens per API call.                                                                                                                                  |
| `api_version` | String | —       | API version string. Used by Anthropic (`'2023-06-01'`).                                                                                                                        |

<Note>
  Any OpenAI-compatible API (e.g. vLLM, Ollama, LiteLLM) can be used by setting `provider = 'openai'` and pointing the `endpoint` to your service.
</Note>
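
The parameters above could be combined as in the following sketch, which defines one collection for Anthropic and one for a local OpenAI-compatible server. The collection names, model names, and the local address are hypothetical choices for illustration; substitute the values your provider or server actually expects.

```sql theme={null}
-- Anthropic: `api_version` and `max_tokens` are set explicitly.
CREATE NAMED COLLECTION my_anthropic_credentials AS
    provider = 'anthropic',
    endpoint = 'https://api.anthropic.com/v1/messages',
    model = 'claude-3-5-haiku-latest',  -- hypothetical model choice
    api_key = 'sk-ant-...',
    api_version = '2023-06-01',
    max_tokens = 512;

-- Local OpenAI-compatible server: `api_key` is omitted, so no auth header is sent.
CREATE NAMED COLLECTION my_local_llm AS
    provider = 'openai',
    endpoint = 'http://localhost:11434/v1/chat/completions',  -- assumed local Ollama address
    model = 'llama3.2';                                        -- hypothetical model name
```

Switching providers then only requires pointing `ai_function_credentials` at a different collection; the queries themselves stay unchanged.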

<h3 id="query-level-settings">
  Query-level settings
</h3>

The [`ai_function_credentials`](/reference/settings/session-settings#ai_function_credentials) setting controls which named collection is used. Other AI-related settings are listed in [Settings](/reference/settings/session-settings) under the `ai_function_` prefix.

<h3 id="default-and-materialized-columns">
  Use in `DEFAULT` and `MATERIALIZED` columns
</h3>

The `ai_function_credentials` setting is read when the default expression is evaluated, not when the column is defined. The collection name is not stored in the column definition:

```sql theme={null}
CREATE TABLE t (id UInt32, doc String, vector Array(Float32) DEFAULT aiEmbed(doc)) ...;
-- The stored default is `aiEmbed(doc)`; no collection is captured.
```

Evaluating the expression requires three things: `allow_experimental_ai_functions` must be enabled, `ai_function_credentials` must be set, and the evaluating user must hold `GRANT NAMED COLLECTION` on the collection (resolving the credentials runs a `NAMED COLLECTION` access check). If any of them is missing, an exception is raised (`SUPPORT_IS_DISABLED`, an empty-credentials error, or `ACCESS_DENIED`, respectively).

A `DEFAULT` column is evaluated at `INSERT`, so both settings must be set in the inserting session or query:

```sql theme={null}
GRANT NAMED COLLECTION ON my_ai_credentials TO user;
SET allow_experimental_ai_functions = 1;
SET ai_function_credentials = 'my_ai_credentials';
INSERT INTO t (id, doc) VALUES (1, 'hello');
```

To make such tables insertable without setting these per session, set both in a [settings profile](/concepts/features/configuration/settings/settings-profiles):

```xml theme={null}
<profiles>
    <default>
        <allow_experimental_ai_functions>1</allow_experimental_ai_functions>
        <ai_function_credentials>my_ai_credentials</ai_function_credentials>
    </default>
</profiles>
```

A `MATERIALIZED` column is computed at `INSERT` like a `DEFAULT` column, and is also recomputed by mutations such as `ALTER TABLE ... MATERIALIZE COLUMN`. Mutations run outside a user session and do not inherit a query's `SETTINGS` clause, but they do inherit settings from a settings profile. Set both settings in a settings profile, and grant `NAMED COLLECTION` to the table owner, for mutation-driven recomputation to succeed.
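
A minimal sketch of the mutation-driven case, reusing the table `t` from the earlier example. The column name `kind` and the category labels are hypothetical. The sketch assumes that both settings come from a settings profile and that the required `NAMED COLLECTION` grant is in place, as described above; without them the mutation is expected to fail.

```sql theme={null}
-- Add a column whose value is computed by an AI function at INSERT time.
ALTER TABLE t
    ADD COLUMN kind String MATERIALIZED aiClassify(doc, ['bug', 'question', 'feature']);

-- Recompute the column for existing rows. This runs as a mutation,
-- so the settings must come from a settings profile, not the session.
ALTER TABLE t MATERIALIZE COLUMN kind;
```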

<h3 id="restricting-endpoint-hosts">
  Restricting endpoint hosts
</h3>

The `endpoint` URL in an AI named collection is an outbound destination that the server connects to under its own identity. If the collection specifies an `api_key`, the key is sent in the request headers. By default, ClickHouse permits any host. To restrict AI functions to a specific set of providers, configure [`remote_url_allow_hosts`](/reference/settings/server-settings/settings#remote_url_allow_hosts) in the server config, for example:

```xml theme={null}
<remote_url_allow_hosts>
    <host>api.openai.com</host>
    <host>api.anthropic.com</host>
</remote_url_allow_hosts>
```

This setting is server-wide and applies to every feature that makes outbound HTTP requests, not only to AI functions.

<h2 id="supported-providers">
  Supported providers
</h2>

| Provider  | `provider` value | Chat functions | Notes                         |
| --------- | ---------------- | -------------- | ----------------------------- |
| OpenAI    | `'openai'`       | Yes            | Default provider.             |
| Anthropic | `'anthropic'`    | Yes            | Uses `/v1/messages` endpoint. |

<h2 id="observability">
  Observability
</h2>

AI function activity is tracked through ClickHouse [ProfileEvents](/reference/system-tables/query_log):

| ProfileEvent      | Description                                                                              |
| ----------------- | ---------------------------------------------------------------------------------------- |
| `AIAPICalls`      | Number of HTTP requests made to the AI provider.                                         |
| `AIInputTokens`   | Total input tokens consumed.                                                             |
| `AIOutputTokens`  | Total output tokens consumed.                                                            |
| `AIRowsProcessed` | Number of rows that received a result.                                                   |
| `AIRowsSkipped`   | Number of rows skipped (quota exceeded, or error with `ai_function_throw_on_error = 0`). |

Query these events:

```sql theme={null}
SELECT
    ProfileEvents['AIAPICalls'] AS api_calls,
    ProfileEvents['AIInputTokens'] AS input_tokens,
    ProfileEvents['AIOutputTokens'] AS output_tokens
FROM system.query_log
WHERE query_id = 'query_id' -- replace with the ID of the query to inspect
AND type = 'QueryFinish'
ORDER BY event_time DESC;
```
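
To watch usage over time rather than for one query, the same events can be aggregated. The sketch below sums the ProfileEvents listed in the table per hour over the last day; the one-day window and the hourly bucket are arbitrary choices for illustration.

```sql theme={null}
-- Hourly AI usage over the last day, restricted to queries that made AI calls.
SELECT
    toStartOfHour(event_time) AS hour,
    sum(ProfileEvents['AIAPICalls']) AS api_calls,
    sum(ProfileEvents['AIInputTokens']) AS input_tokens,
    sum(ProfileEvents['AIOutputTokens']) AS output_tokens,
    sum(ProfileEvents['AIRowsProcessed']) AS rows_processed,
    sum(ProfileEvents['AIRowsSkipped']) AS rows_skipped
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 DAY
  AND ProfileEvents['AIAPICalls'] > 0
GROUP BY hour
ORDER BY hour;
```

A growing `rows_skipped` count relative to `rows_processed` suggests that quotas are being hit or that provider errors are being suppressed.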

{/*AUTOGENERATED_START*/}

<h2 id="aiClassify">
  aiClassify
</h2>

Introduced in: v26.4.0

Classifies the given text into one of the provided categories using an LLM provider.

The function sends the text together with a fixed classification prompt and a JSON-schema response format
that constrains the model to return exactly one of the supplied labels. When the response is a JSON
object of the form `{"category": "..."}`, the function unwraps it and returns the label as a plain string.

Provider credentials and configuration are taken from the named collection specified by the `ai_function_credentials` setting.

**Syntax**

```sql theme={null}
aiClassify(text, categories[, temperature])
```

**Aliases**: `AIClassify`

**Arguments**

* `text` — Text to classify. [`String`](/reference/data-types/string)
* `categories` — Constant list of candidate category labels. [`Array(String)`](/reference/data-types/array)
* `temperature` — Sampling temperature controlling randomness. Default: `0.0`. [`Float64`](/reference/data-types/float)

**Returned value**

One of the provided category labels, or the default value for the column type (empty string) if the request failed and `ai_function_throw_on_error` is disabled. [`String`](/reference/data-types/string)

**Examples**

**Classify sentiment**

```sql title=Query theme={null}
SELECT aiClassify('I love this product!', ['positive', 'negative', 'neutral']) SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
positive
```

**Classify a column**

```sql title=Query theme={null}
SELECT body, aiClassify(body, ['bug', 'question', 'feature']) AS kind FROM issues LIMIT 5
```

```response title=Response theme={null}
```
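
The optional `temperature` argument and the failure behavior described under "Returned value" can be combined as in this sketch. It assumes the hypothetical `issues` table from the example above and disables `ai_function_throw_on_error`, so failed rows come back as an empty string instead of aborting the query.

```sql theme={null}
-- Count issues per predicted label; an empty `kind` marks rows whose request failed.
SELECT kind, count() AS issues_count
FROM
(
    SELECT aiClassify(body, ['bug', 'question', 'feature'], 0.0) AS kind
    FROM issues
    LIMIT 100
)
GROUP BY kind
ORDER BY issues_count DESC
SETTINGS
    allow_experimental_ai_functions = 1,
    ai_function_credentials = 'my_ai_credentials',
    ai_function_throw_on_error = 0;
```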

<h2 id="aiEmbed">
  aiEmbed
</h2>

Introduced in: v26.6.0

Generates an embedding vector for the given text using the configured AI provider.

The function sends the text to the configured embedding endpoint and returns the resulting vector as `Array(Float32)`.
Within a single block of rows, inputs are grouped into batches of up to
[`ai_function_embedding_max_batch_size`](/reference/settings/session-settings#ai_function_embedding_max_batch_size)
entries per HTTP request to reduce per-call overhead.

Provider credentials and configuration are taken from the named collection specified by the `ai_function_credentials` setting.
The optional `dimensions` argument, when supported by the model (e.g. OpenAI's `text-embedding-3-*`),
requests a vector of the given size; otherwise the model's native size is returned.

**Syntax**

```sql theme={null}
aiEmbed(text[, dimensions])
```

**Arguments**

* `text` — Text to embed. [`String`](/reference/data-types/string)
* `dimensions` — Optional target dimensionality for the output vector. `0` or omitted means the model's native size. [`UInt64`](/reference/data-types/int-uint)

**Returned value**

The embedding vector. An empty array is returned if the input is NULL or empty, if the request failed and `ai_function_throw_on_error` is disabled, or if a quota was exceeded and `ai_function_throw_on_quota_exceeded` is disabled. [`Array(Float32)`](/reference/data-types/array)

**Examples**

**Embed a single string**

```sql title=Query theme={null}
SELECT aiEmbed('Hello world') SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
```

**With explicit dimensions**

```sql title=Query theme={null}
SELECT aiEmbed('Hello world', 256) SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
```

**Embed a column of texts**

```sql title=Query theme={null}
SELECT aiEmbed(title, 256) FROM articles LIMIT 10
```

```response title=Response theme={null}
```
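
One possible use of the returned vectors is similarity search. The sketch below assumes the table `t` from the `DEFAULT` column example, whose `vector` column already stores embeddings, and assumes that `my_ai_credentials` points at an embedding model. It embeds the search phrase once and ranks stored rows with the built-in `cosineDistance` function; the search phrase is a placeholder.

```sql theme={null}
-- Embed the search phrase once, then rank stored vectors by cosine distance.
WITH aiEmbed('greeting') AS query_vector
SELECT id, doc, cosineDistance(vector, query_vector) AS distance
FROM t
WHERE notEmpty(vector)  -- skip rows whose embedding request failed or was skipped
ORDER BY distance ASC
LIMIT 5
SETTINGS allow_experimental_ai_functions = 1, ai_function_credentials = 'my_ai_credentials';
```

Comparing against a stored column avoids one API call per row at query time; only the search phrase is sent to the provider.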

<h2 id="aiExtract">
  aiExtract
</h2>

Introduced in: v26.4.0

Extracts structured information from unstructured text using an LLM provider.

The second argument may be either a free-form natural-language instruction (e.g. `'the main complaint'`) or a
JSON-encoded schema of the form `'{"field_a": "description of field a", "field_b": "description of field b"}'`.

In instruction mode, the function returns the extracted value as a plain string, or an empty string if nothing was found.
In schema mode, the function returns a JSON object string whose keys match the requested schema; missing fields are `null`.

Provider credentials and configuration are taken from the named collection specified by the `ai_function_credentials` setting.

**Syntax**

```sql theme={null}
aiExtract(text, instruction_or_schema[, temperature])
```

**Aliases**: `AIExtract`

**Arguments**

* `text` — Text to extract information from. [`String`](/reference/data-types/string)
* `instruction_or_schema` — Free-form extraction instruction, or a constant JSON object describing the fields to extract. [`const String`](/reference/data-types/string)
* `temperature` — Sampling temperature controlling randomness. Default: `0.0`. [`const Float64`](/reference/data-types/float)

**Returned value**

A single extracted value (instruction mode) or a JSON object string (schema mode). Returns the default value for the column type (empty string) if the request failed and `ai_function_throw_on_error` is disabled. [`String`](/reference/data-types/string)

**Examples**

**Free-form instruction**

```sql title=Query theme={null}
SELECT aiExtract('The package arrived late and was damaged.', 'the main complaint') SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
late and damaged package
```

**Schema extraction**

```sql title=Query theme={null}
SELECT aiExtract(review, '{"sentiment": "positive, negative or neutral", "topic": "main topic of the review"}') FROM reviews LIMIT 5
```

```response title=Response theme={null}
```
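
Because schema mode returns a JSON object string, its fields can be split into columns with the standard JSON functions. This sketch assumes the hypothetical `reviews` table from the example above and uses `JSONExtractString`, which yields an empty string for fields that are missing or `null`.

```sql theme={null}
-- Run the extraction once per row, then unpack the JSON fields into columns.
SELECT
    review,
    JSONExtractString(extracted, 'sentiment') AS sentiment,
    JSONExtractString(extracted, 'topic') AS topic
FROM
(
    SELECT
        review,
        aiExtract(review, '{"sentiment": "positive, negative or neutral", "topic": "main topic of the review"}') AS extracted
    FROM reviews
    LIMIT 5
)
SETTINGS allow_experimental_ai_functions = 1, ai_function_credentials = 'my_ai_credentials';
```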

<h2 id="aiGenerate">
  aiGenerate
</h2>

Introduced in: v26.4.0

Generates free-form text content from a prompt using an LLM provider.

The function sends the prompt to the configured AI provider and returns the generated text.
An optional system prompt can be provided to guide the model's behavior (e.g. tone, format, role).
If no system prompt is given, the default system prompt is: `You are a helpful assistant. Provide a clear and concise response.`

Provider credentials and configuration are taken from the named collection specified by the `ai_function_credentials` setting.

**Syntax**

```sql theme={null}
aiGenerate(prompt[, system_prompt[, temperature]])
```

**Aliases**: `AIGenerate`

**Arguments**

* `prompt` — The user prompt or question to send to the model. [`String`](/reference/data-types/string)
* `system_prompt` — Optional constant system-level instruction that guides the model's behavior (e.g. persona, output format), sent along with each prompt. [`String`](/reference/data-types/string)
* `temperature` — Sampling temperature controlling randomness. Default: `0.7`. [`Float64`](/reference/data-types/float)

**Returned value**

The generated text response, or the default value for the column type (empty string) if the request failed and `ai_function_throw_on_error` is disabled. [`String`](/reference/data-types/string)

**Examples**

**Simple question**

```sql title=Query theme={null}
SELECT aiGenerate('What is 2 + 2? Reply with just the number.') SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
4
```

**With system prompt**

```sql title=Query theme={null}
SELECT aiGenerate('Explain ClickHouse', 'You are a database expert. Be concise.') SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
```

**Summarize column values**

```sql title=Query theme={null}
SELECT article_title, aiGenerate(concat('Summarize in one sentence: ', article_body)) AS summary FROM articles LIMIT 5
```

```response title=Response theme={null}
```
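
All three arguments can be passed together, as in the following sketch. It assumes the hypothetical `articles` table from the example above; the system prompt and the temperature of `0.2` are illustrative choices, with a lower temperature intended to make the output less varied than the default `0.7`.

```sql theme={null}
-- Per-row prompt, constant system prompt, and an explicit temperature.
SELECT
    article_title,
    aiGenerate(
        concat('Write a one-line teaser for: ', article_title),
        'You are an editor. Reply with a single sentence.',
        0.2
    ) AS teaser
FROM articles
LIMIT 5
SETTINGS allow_experimental_ai_functions = 1, ai_function_credentials = 'my_ai_credentials';
```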

<h2 id="aiTranslate">
  aiTranslate
</h2>

Introduced in: v26.4.0

Translates the given text into the specified target language using an LLM provider.

Additional style or dialect instructions may be passed as a third argument (e.g. `'keep technical terms untranslated'`).

Provider credentials and configuration are taken from the named collection specified by the `ai_function_credentials` setting.

**Syntax**

```sql theme={null}
aiTranslate(text, target_language[, instructions[, temperature]])
```

**Aliases**: `AITranslate`

**Arguments**

* `text` — Text to translate. [`String`](/reference/data-types/string)
* `target_language` — Target language name or BCP-47 code (e.g. `'French'`, `'es-MX'`). [`String`](/reference/data-types/string)
* `instructions` — Optional constant additional instructions for the translator. [`String`](/reference/data-types/string)
* `temperature` — Sampling temperature controlling randomness. Default: `0.3`. [`Float64`](/reference/data-types/float)

**Returned value**

The translated text, or the default value for the column type (empty string) if the request failed and `ai_function_throw_on_error` is disabled. [`String`](/reference/data-types/string)

**Examples**

**Translate to French**

```sql title=Query theme={null}
SELECT aiTranslate('Hello, world!', 'French') SETTINGS ai_function_credentials = 'my_ai_credentials'
```

```response title=Response theme={null}
Bonjour le monde!
```

**Translate to Japanese with style instructions**

```sql title=Query theme={null}
SELECT aiTranslate(body, 'Japanese', 'Use polite form (desu/masu)') FROM articles LIMIT 5
```

```response title=Response theme={null}
```
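
The following sketch uses all four arguments, with a BCP-47 code as the target language. It assumes the hypothetical `articles` table from the example above; the instruction text is taken from the description, and the temperature of `0.0` is an illustrative choice.

```sql theme={null}
-- BCP-47 target, extra instructions, and an explicit temperature.
SELECT
    body,
    aiTranslate(body, 'es-MX', 'keep technical terms untranslated', 0.0) AS body_es
FROM articles
LIMIT 5
SETTINGS allow_experimental_ai_functions = 1, ai_function_credentials = 'my_ai_credentials';
```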
