> ## Documentation Index
> Fetch the complete documentation index at: https://private-7c7dfe99-mintlify-8c05c8a2.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

> Documentation for Functions for Searching in Strings

# Functions for Searching in Strings

All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.

<Note>
  Case-insensitive search follows the lowercase-uppercase rules of the English language. E.g. Uppercased `i` in the English language is
  `I` whereas in the Turkish language it is `İ` - results for languages other than English may be unexpected.
</Note>

Functions in this section also assume that the searched string (referred to in this section as `haystack`) and the search string (referred to in this section as `needle`) are single-byte encoded text. If this assumption is
violated, no exception is thrown and results are undefined. Search with UTF-8 encoded strings is usually provided by separate function
variants. Likewise, if a UTF-8 function variant is used and the input strings are not UTF-8 encoded text, no exception is thrown and the
results are undefined. Note that no automatic Unicode normalization is performed, however you can use the
[normalizeUTF8\*()](/reference/functions/regular-functions/string-functions#normalizeUTF8NFC) functions for that.

[General strings functions](/reference/functions/regular-functions/string-functions) and [functions for replacing in strings](/reference/functions/regular-functions/string-replace-functions) are described separately.

<Note>
  The documentation below is generated from the `system.functions` system table.
</Note>

{/*AUTOGENERATED_START*/}

<h2 id="countMatches">
  countMatches
</h2>

Introduced in: v21.1.0

Returns number of matches of a regular expression in a string.

<Info>
  **Version dependent behavior**

  The behavior of this function depends on the ClickHouse version:

  * in versions \< v25.6, the function stops counting at the first empty match even if a pattern accepts.
  * in versions >= 25.6, the function continues execution when an empty match occurs. The legacy behavior can be restored using setting `count_matches_stop_at_empty_match = true`;
</Info>

**Syntax**

```sql theme={null}
countMatches(haystack, pattern)
```

**Arguments**

* `haystack` — The string to search in. [`String`](/reference/data-types/string)
* `pattern` — Regular expression pattern. [`String`](/reference/data-types/string)

**Returned value**

Returns the number of matches found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Count digit sequences**

```sql title=Query theme={null}
SELECT countMatches('hello 123 world 456 test', '[0-9]+')
```

```response title=Response theme={null}
┌─countMatches('hello 123 world 456 test', '[0-9]+')─┐
│                                                   2 │
└─────────────────────────────────────────────────────┘
```

<h2 id="countMatchesCaseInsensitive">
  countMatchesCaseInsensitive
</h2>

Introduced in: v21.1.0

Like [`countMatches`](#countMatches) but performs case-insensitive matching.

**Syntax**

```sql theme={null}
countMatchesCaseInsensitive(haystack, pattern)
```

**Arguments**

* `haystack` — The string to search in. [`String`](/reference/data-types/string)
* `pattern` — Regular expression pattern. [`const String`](/reference/data-types/string)

**Returned value**

Returns the number of matches found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Case insensitive count**

```sql title=Query theme={null}
SELECT countMatchesCaseInsensitive('Hello HELLO world', 'hello')
```

```response title=Response theme={null}
┌─countMatchesCaseInsensitive('Hello HELLO world', 'hello')─┐
│                                                         2 │
└───────────────────────────────────────────────────────────┘
```

<h2 id="countSubstrings">
  countSubstrings
</h2>

Introduced in: v21.1.0

Returns how often a substring `needle` occurs in a string `haystack`.

**Syntax**

```sql theme={null}
countSubstrings(haystack, needle[, start_pos])
```

**Arguments**

* `haystack` — String in which the search is performed. [String](/reference/data-types/string) or [Enum](/reference/data-types/enum). - `needle` — Substring to be searched. [String](/reference/data-types/string). - `start_pos` — Position (1-based) in `haystack` at which the search starts. [UInt](/reference/data-types/int-uint). Optional.

**Returned value**

The number of occurrences. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT countSubstrings('aaaa', 'aa');
```

```response title=Response theme={null}
┌─countSubstrings('aaaa', 'aa')─┐
│                             2 │
└───────────────────────────────┘
```

**With start\_pos argument**

```sql title=Query theme={null}
SELECT countSubstrings('abc___abc', 'abc', 4);
```

```response title=Response theme={null}
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│                                      1 │
└────────────────────────────────────────┘
```

<h2 id="countSubstringsCaseInsensitive">
  countSubstringsCaseInsensitive
</h2>

Introduced in: v21.1.0

Like [`countSubstrings`](#countSubstrings) but counts case-insensitively.

**Syntax**

```sql theme={null}
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `start_pos` — Optional. Position (1-based) in `haystack` at which the search starts. [`UInt*`](/reference/data-types/int-uint)

**Returned value**

Returns the number of occurrences of the neddle in the haystack. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT countSubstringsCaseInsensitive('AAAA', 'aa');
```

```response title=Response theme={null}
┌─countSubstri⋯AAA', 'aa')─┐
│                        2 │
└──────────────────────────┘
```

**With start\_pos argument**

```sql title=Query theme={null}
SELECT countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4);
```

```response title=Response theme={null}
┌─countSubstri⋯, 'abc', 4)─┐
│                        2 │
└──────────────────────────┘
```

<h2 id="countSubstringsCaseInsensitiveUTF8">
  countSubstringsCaseInsensitiveUTF8
</h2>

Introduced in: v21.1.0

Like [`countSubstrings`](#countSubstrings) but counts case-insensitively and assumes that haystack is a UTF-8 string.

**Syntax**

```sql theme={null}
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
```

**Arguments**

* `haystack` — UTF-8 string in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `start_pos` — Optional. Position (1-based) in `haystack` at which the search starts. [`UInt*`](/reference/data-types/int-uint)

**Returned value**

Returns the number of occurrences of the needle in the haystack. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА');
```

```response title=Response theme={null}
┌─countSubstri⋯шка', 'КА')─┐
│                        4 │
└──────────────────────────┘
```

**With start\_pos argument**

```sql title=Query theme={null}
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13);
```

```response title=Response theme={null}
┌─countSubstri⋯, 'КА', 13)─┐
│                        2 │
└──────────────────────────┘
```

<h2 id="extract">
  extract
</h2>

Introduced in: v1.1.0

Extracts the first match of a regular expression in a string.
If 'haystack' doesn't match 'pattern', an empty string is returned.

This function uses the RE2 regular expression library. Please refer to [re2](https://github.com/google/re2/wiki/Syntax) for supported syntax.

If the regular expression has capturing groups (sub-patterns), the function matches the input string against the first capturing group.

**Syntax**

```sql theme={null}
extract(haystack, pattern)
```

**Arguments**

* `haystack` — String from which to extract. [`String`](/reference/data-types/string)
* `pattern` — Regular expression, typically containing a capturing group. [`const String`](/reference/data-types/string)

**Returned value**

Returns extracted fragment as a string. [`String`](/reference/data-types/string)

**Examples**

**Extract domain from email**

```sql title=Query theme={null}
SELECT extract('test@clickhouse.com', '.*@(.*)$')
```

```response title=Response theme={null}
┌─extract('test@clickhouse.com', '.*@(.*)$')─┐
│ clickhouse.com                            │
└───────────────────────────────────────────┘
```

**No match returns empty string**

```sql title=Query theme={null}
SELECT extract('test@clickhouse.com', 'no_match')
```

```response title=Response theme={null}
┌─extract('test@clickhouse.com', 'no_match')─┐
│                                            │
└────────────────────────────────────────────┘
```

<h2 id="extractAll">
  extractAll
</h2>

Introduced in: v1.1.0

Like [`extract`](#extract), but returns an array of all matches of a regular expression in a string.
If 'haystack' doesn't match the 'pattern' regex, an empty array is returned.

If the regular expression has capturing groups (sub-patterns), the function matches the input string against the first capturing group.

**Syntax**

```sql theme={null}
extractAll(haystack, pattern)
```

**Arguments**

* `haystack` — String from which to extract fragments. [`String`](/reference/data-types/string)
* `pattern` — Regular expression, optionally containing capturing groups. [`const String`](/reference/data-types/string)

**Returned value**

Returns array of extracted fragments. [`Array(String)`](/reference/data-types/array)

**Examples**

**Extract all numbers**

```sql title=Query theme={null}
SELECT extractAll('hello 123 world 456', '[0-9]+')
```

```response title=Response theme={null}
┌─extractAll('hello 123 world 456', '[0-9]+')─┐
│ ['123','456']                               │
└─────────────────────────────────────────────┘
```

**Extract using capturing group**

```sql title=Query theme={null}
SELECT extractAll('test@example.com, user@domain.org', '([a-zA-Z0-9]+)@')
```

```response title=Response theme={null}
┌─extractAll('test@example.com, user@domain.org', '([a-zA-Z0-9]+)@')─┐
│ ['test','user']                                                    │
└────────────────────────────────────────────────────────────────────┘
```

<h2 id="extractAllGroupsHorizontal">
  extractAllGroupsHorizontal
</h2>

Introduced in: v20.5.0

Matches all groups of a string using the provided regular expression and returns an array of arrays, where each array contains all captures from the same capturing group, organized by group number.

**Syntax**

```sql theme={null}
extractAllGroupsHorizontal(s, regexp)
```

**Arguments**

* `s` — Input string to extract from. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `regexp` — Regular expression to match by. [`const String`](/reference/data-types/string) or [`const FixedString`](/reference/data-types/fixedstring)

**Returned value**

Returns an array of arrays, where each inner array contains all captures from one capturing group across all matches. The first inner array contains all captures from group 1, the second from group 2, etc. If no matches are found, returns an empty array. [`Array(Array(String))`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
WITH '< Server: nginx
< Date: Tue, 22 Jan 2019 00:26:14 GMT
< Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
' AS s
SELECT extractAllGroupsHorizontal(s, '< ([\\w\\-]+): ([^\\r\\n]+)');
```

```response title=Response theme={null}
[['Server','Date','Content-Type','Connection'],['nginx','Tue, 22 Jan 2019 00:26:14 GMT','text/html; charset=UTF-8','keep-alive']]
```

<h2 id="extractGroups">
  extractGroups
</h2>

Introduced in: v20.5.0

Extracts the capturing groups from the first substring matched by a regular expression. To extract groups from all matches, use [`extractAllGroupsHorizontal`](#extractAllGroupsHorizontal) or [`extractAllGroupsVertical`](/reference/functions/regular-functions/splitting-merging-functions#extractAllGroupsVertical).

**Syntax**

```sql theme={null}
extractGroups(s, regexp)
```

**Arguments**

* `s` — Input string to extract from. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `regexp` — Regular expression. Must contain at least one capturing group. Constant. [`const String`](/reference/data-types/string) or [`const FixedString`](/reference/data-types/fixedstring)

**Returned value**

If the regular expression matches, returns an array containing the captured groups (`1` to `N`, where `N` is the number of capturing groups in `regexp`) of the first match. If there is no match, returns an empty array. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
WITH '< Server: nginx
< Date: Tue, 22 Jan 2019 00:26:14 GMT
< Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
' AS s
SELECT extractGroups(s, '< ([\\w\\-]+): ([^\\r\\n]+)');
```

```response title=Response theme={null}
['Server','nginx']
```

<h2 id="hasAllTokens">
  hasAllTokens
</h2>

Introduced in: v25.10.0

Like [`hasAnyTokens`](#hasAnyTokens), but returns 1, if all tokens in the `needle` string or array match the `input` string, and 0 otherwise. If `input` is a column, returns all rows that satisfy this condition.

<Note>
  Column `input` should have a [text index](/reference/engines/table-engines/mergetree-family/textindexes) defined for optimal performance.
  If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.
</Note>

Prior to searching, the function tokenizes

* the `input` argument (always), and
* the `needle` argument (if given as a [String](/reference/data-types/string))
  using the tokenizer specified for the text index.
  If the column has no text index defined, the `splitByNonAlpha` tokenizer is used instead.
  If the `needle` argument is of type [Array(String)](/reference/data-types/array), each array element is treated as a token — no additional tokenization takes place.

Duplicate tokens are ignored.
For example, needles = \['ClickHouse', 'ClickHouse'] is treated the same as \['ClickHouse'].

<Note>
  When a text index defines a [preprocessor](/reference/engines/table-engines/mergetree-family/textindexes#creating-a-text-index) (for example `lowerUTF8`), `hasAllTokens` applies it to `input` and, when `needles` is a [String](/reference/data-types/string), to `needles` before tokenization. When `needles` is an [Array(String)](/reference/data-types/array), its elements are passed through as-is and the preprocessor is not applied to them.
  The preprocessor is only applied on the text index path, so results may differ between queries that use the text index and queries that do not (e.g. `SETTINGS use_skip_indexes = 0`).
  This inconsistency is tolerated to improve the usability of full-text search.
</Note>

**Syntax**

```sql theme={null}
hasAllTokens(input, needles)
```

**Aliases**: `hasAllToken`

**Arguments**

* `input` — The input column. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring) or [`Array(String)`](/reference/data-types/array) or [`Array(FixedString)`](/reference/data-types/array)
* `needles` — Tokens to be searched. [`String`](/reference/data-types/string) or [`Array(String)`](/reference/data-types/array)
* `tokenizer` — The tokenizer to use. Valid arguments are `splitByNonAlpha`, `splitByString`, `asciiCJK`, `ngrams`, `sparseGrams`, and `array`. Optional, if not set explicitly, defaults to `splitByNonAlpha`. [`const String`](/reference/data-types/string)

**Returned value**

Returns 1, if all needles match. 0, otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Basic usage with a string needle**

```sql title=Query theme={null}
CREATE TABLE table (
    id UInt32,
    msg String,
    INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');

SELECT count() FROM table WHERE hasAllTokens(msg, 'a\\d()');
```

```response title=Response theme={null}
┌─count()─┐
│       1 │
└─────────┘
```

**Specify needles to be searched for AS-IS (no tokenization) in an array**

```sql title=Query theme={null}
SELECT count() FROM table WHERE hasAllTokens(msg, ['a', 'd']);
```

```response title=Response theme={null}
┌─count()─┐
│       1 │
└─────────┘
```

**Generate needles using the `tokens` function**

```sql title=Query theme={null}
SELECT count() FROM table WHERE hasAllTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));
```

```response title=Response theme={null}
┌─count()─┐
│       1 │
└─────────┘
```

**Use a custom tokenizer via the 3rd argument**

```sql title=Query theme={null}
SELECT hasAllTokens('abcdef', 'abc', 'ngrams(3)');
```

```response title=Response theme={null}
┌─hasAllTokens('abcdef', 'abc', 'ngrams(3)')─┐
│                                            1 │
└──────────────────────────────────────────────┘
```

**Usage examples for array and map columns**

```sql title=Query theme={null}
CREATE TABLE log (
    id UInt32,
    tags Array(String),
    attributes Map(String, String),
    INDEX idx_tags (tags) TYPE text(tokenizer = splitByNonAlpha),
    INDEX idx_attributes_keys mapKeys(attributes) TYPE text(tokenizer = array),
    INDEX idx_attributes_vals mapValues(attributes) TYPE text(tokenizer = array)
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO log VALUES
    (1, ['clickhouse', 'clickhouse cloud'], {'address': '192.0.0.1', 'log_level': 'INFO'}),
    (2, ['chdb'], {'embedded': 'true', 'log_level': 'DEBUG'});
```

```response title=Response theme={null}
```

**Example with an array column**

```sql title=Query theme={null}
SELECT count() FROM log WHERE hasAllTokens(tags, 'clickhouse');
```

```response title=Response theme={null}
┌─count()─┐
│       1 │
└─────────┘
```

**Example with mapKeys**

```sql title=Query theme={null}
SELECT count() FROM log WHERE hasAllTokens(mapKeys(attributes), ['address', 'log_level']);
```

```response title=Response theme={null}
┌─count()─┐
│       1 │
└─────────┘
```

**Example with mapValues**

```sql title=Query theme={null}
SELECT count() FROM log WHERE hasAllTokens(mapValues(attributes), ['192.0.0.1', 'DEBUG']);
```

```response title=Response theme={null}
┌─count()─┐
│       0 │
└─────────┘
```

<h2 id="hasAnyTokens">
  hasAnyTokens
</h2>

Introduced in: v25.10.0

Returns 1, if at least one token in the `needle` string or array matches the `input` string, and 0 otherwise. If `input` is a column, returns all rows that satisfy this condition.

<Note>
  Column `input` should have a [text index](/reference/engines/table-engines/mergetree-family/textindexes) defined for optimal performance.
  If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.
</Note>

Prior to searching, the function tokenizes

* the `input` argument (always), and
* the `needle` argument (if given as a [String](/reference/data-types/string))
  using the tokenizer specified for the text index.
  If the column has no text index defined, the `splitByNonAlpha` tokenizer is used instead.
  If the `needle` argument is of type [Array(String)](/reference/data-types/array), each array element is treated as a token — no additional tokenization takes place.

Duplicate tokens are ignored.
For example, \['ClickHouse', 'ClickHouse'] is treated the same as \['ClickHouse'].

<Note>
  When a text index defines a [preprocessor](/reference/engines/table-engines/mergetree-family/textindexes#creating-a-text-index) (for example `lowerUTF8`), `hasAnyTokens` applies it to `input` and, when `needles` is a [String](/reference/data-types/string), to `needles` before tokenization. When `needles` is an [Array(String)](/reference/data-types/array), its elements are passed through as-is and the preprocessor is not applied to them.
  The preprocessor is only applied on the text index path, so results may differ between queries that use the text index and queries that do not (e.g. `SETTINGS use_skip_indexes = 0`).
  This inconsistency is tolerated to improve the usability of full-text search.
</Note>

**Syntax**

```sql theme={null}
hasAnyTokens(input, needles)
```

**Aliases**: `hasAnyToken`

**Arguments**

* `input` — The input column. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring) or [`Nullable(String)`](/reference/data-types/nullable) or [`Nullable(FixedString)`](/reference/data-types/nullable) or [`Array(String)`](/reference/data-types/array) or [`Array(FixedString)`](/reference/data-types/array) or [`Array(Nullable(String))`](/reference/data-types/array) or [`Array(Nullable(FixedString))`](/reference/data-types/array)
* `needles` — Tokens to be searched. [`String`](/reference/data-types/string) or [`Array(String)`](/reference/data-types/array)
* `tokenizer` — The tokenizer to use. Valid arguments are `splitByNonAlpha`, `splitByString`, `asciiCJK`, `ngrams`, `sparseGrams`, and `array`. Optional, if not set explicitly, defaults to `splitByNonAlpha`. [`const String`](/reference/data-types/string)

**Returned value**

Returns `1`, if there was at least one match. `0`, otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Basic usage with a string needle**

```sql title=Query theme={null}
CREATE TABLE table (
    id UInt32,
    msg String,
    INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');

SELECT count() FROM table WHERE hasAnyTokens(msg, 'a\\d()');
```

```response title=Response theme={null}
┌─count()─┐
│       3 │
└─────────┘
```

**Specify needles to be searched for AS-IS (no tokenization) in an array**

```sql title=Query theme={null}
SELECT count() FROM table WHERE hasAnyTokens(msg, ['a', 'd']);
```

```response title=Response theme={null}
┌─count()─┐
│       3 │
└─────────┘
```

**Generate needles using the `tokens` function**

```sql title=Query theme={null}
SELECT count() FROM table WHERE hasAnyTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));
```

```response title=Response theme={null}
┌─count()─┐
│       3 │
└─────────┘
```

**Usage examples for array and map columns**

```sql title=Query theme={null}
CREATE TABLE log (
    id UInt32,
    tags Array(String),
    attributes Map(String, String),
    INDEX idx_tags (tags) TYPE text(tokenizer = splitByNonAlpha),
    INDEX idx_attributes_keys mapKeys(attributes) TYPE text(tokenizer = array),
    INDEX idx_attributes_vals mapValues(attributes) TYPE text(tokenizer = array)
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO log VALUES
    (1, ['clickhouse', 'clickhouse cloud'], {'address': '192.0.0.1', 'log_level': 'INFO'}),
    (2, ['chdb'], {'embedded': 'true', 'log_level': 'DEBUG'});
```

```response title=Response theme={null}
```

**Example with an array column**

```sql title=Query theme={null}
SELECT count() FROM log WHERE hasAnyTokens(tags, 'clickhouse');
```

```response title=Response theme={null}
┌─count()─┐
│       1 │
└─────────┘
```

**Example with mapKeys**

```sql title=Query theme={null}
SELECT count() FROM log WHERE hasAnyTokens(mapKeys(attributes), ['address', 'log_level']);
```

```response title=Response theme={null}
┌─count()─┐
│       2 │
└─────────┘
```

**Example with mapValues**

```sql title=Query theme={null}
SELECT count() FROM log WHERE hasAnyTokens(mapValues(attributes), ['192.0.0.1', 'DEBUG']);
```

```response title=Response theme={null}
┌─count()─┐
│       2 │
└─────────┘
```

<h2 id="hasPhrase">
  hasPhrase
</h2>

Introduced in: v26.4.0

Checks if the `input` contains all tokens from the `phrase` in consecutive order.

<Note>
  Column `input` should have a [text index](/reference/engines/table-engines/mergetree-family/textindexes) defined for optimal performance.
  If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.
</Note>

Prior to searching, the function tokenizes both the `input` and the `phrase` arguments using the tokenizer specified for the text index.
If the column has no text index defined, the `splitByNonAlpha` tokenizer is used instead — unless a tokenizer is provided as the optional third argument.
The tokenizer argument must be one of `splitByNonAlpha`, `splitByString`, `ngrams`, or `asciiCJK`.

<Note>
  When a text index defines a [preprocessor](/reference/engines/table-engines/mergetree-family/textindexes#creating-a-text-index) (for example `lowerUTF8`), `hasPhrase` applies it to both `input` and `phrase` before tokenization.
  The preprocessor is only applied on the text index path, so results may differ between queries that use the text index and queries that do not (e.g. `SETTINGS use_skip_indexes = 0`).
  This inconsistency is tolerated to improve the usability of full-text search.
</Note>

Unlike [`hasToken`](#hasToken), [`hasAnyTokens`](#hasAnyTokens) and [`hasAllTokens`](#hasAllTokens), `hasPhrase` requires the tokens to appear in the same order
and without any intervening tokens. For example, `hasPhrase('the quick brown fox', 'quick fox')` returns 0
because "brown" appears between "quick" and "fox".

**Syntax**

```sql theme={null}
hasPhrase(input, phrase[, tokenizer])
```

**Aliases**: `matchPhrase`

**Arguments**

* `input` — The input column. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `phrase` — Phrase to search for. [`const String`](/reference/data-types/string)
* `tokenizer` — The tokenizer to use. Optional, defaults to `splitByNonAlpha`. [`const String`](/reference/data-types/string)

**Returned value**

Returns `1` if the phrase is found as a consecutive token sequence, `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Phrase match**

```sql title=Query theme={null}
SELECT hasPhrase('the quick brown fox jumps', 'quick brown')
```

```response title=Response theme={null}
┌─hasPhrase('the quick brown fox jumps', 'quick brown')─┐
│                                                      1 │
└────────────────────────────────────────────────────────┘
```

**Non-consecutive tokens**

```sql title=Query theme={null}
SELECT hasPhrase('the quick brown fox jumps', 'quick fox')
```

```response title=Response theme={null}
┌─hasPhrase('the quick brown fox jumps', 'quick fox')─┐
│                                                    0 │
└──────────────────────────────────────────────────────┘
```

<h2 id="hasSubsequence">
  hasSubsequence
</h2>

Introduced in: v23.7.0

Checks if a needle is a subsequence of a haystack.
A subsequence of a string is a sequence that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.

**Syntax**

```sql theme={null}
hasSubsequence(haystack, needle)
```

**Arguments**

* `haystack` — String in which to search for the subsequence. [`String`](/reference/data-types/string)
* `needle` — Subsequence to be searched. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if needle is a subsequence of haystack, `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Basic subsequence check**

```sql title=Query theme={null}
SELECT hasSubsequence('Hello World', 'HlWrd')
```

```response title=Response theme={null}
┌─hasSubsequence('Hello World', 'HlWrd')─┐
│                                      1 │
└────────────────────────────────────────┘
```

**No subsequence found**

```sql title=Query theme={null}
SELECT hasSubsequence('Hello World', 'xyz')
```

```response title=Response theme={null}
┌─hasSubsequence('Hello World', 'xyz')─┐
│                                    0 │
└──────────────────────────────────────┘
```

<h2 id="hasSubsequenceCaseInsensitive">
  hasSubsequenceCaseInsensitive
</h2>

Introduced in: v23.7.0

Like [`hasSubsequence`](#hasSubsequence) but searches case-insensitively.

**Syntax**

```sql theme={null}
hasSubsequenceCaseInsensitive(haystack, needle)
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Subsequence to be searched. [`String`](/reference/data-types/string)

**Returned value**

Returns 1, if needle is a subsequence of haystack, 0 otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT hasSubsequenceCaseInsensitive('garbage', 'ARG');
```

```response title=Response theme={null}
┌─hasSubsequenceCaseInsensitive('garbage', 'ARG')─┐
│                                               1 │
└─────────────────────────────────────────────────┘
```

<h2 id="hasSubsequenceCaseInsensitiveUTF8">
  hasSubsequenceCaseInsensitiveUTF8
</h2>

Introduced in: v23.7.0

Like [`hasSubsequenceUTF8`](#hasSubsequenceUTF8) but searches case-insensitively.

**Syntax**

```sql theme={null}
hasSubsequenceCaseInsensitiveUTF8(haystack, needle)
```

**Arguments**

* `haystack` — UTF8-encoded string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — UTF8-encoded subsequence string to be searched. [`String`](/reference/data-types/string)

**Returned value**

Returns 1, if needle is a subsequence of haystack, 0 otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА');
```

```response title=Response theme={null}
┌─hasSubsequen⋯ 'СИСТЕМА')─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="hasSubsequenceUTF8">
  hasSubsequenceUTF8
</h2>

Introduced in: v23.7.0

Like [`hasSubsequence`](/reference/functions/regular-functions/string-search-functions#hasSubsequence) but assumes haystack and needle are UTF-8 encoded strings.

**Syntax**

```sql theme={null}
hasSubsequenceUTF8(haystack, needle)
```

**Arguments**

* `haystack` — The string in which to search. [`String`](/reference/data-types/string)
* `needle` — The subsequence to search for. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if `needle` is a subsequence of `haystack`, otherwise `0`. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT hasSubsequenceUTF8('картошка', 'кошка');
```

```response title=Response theme={null}
┌─hasSubsequen⋯', 'кошка')─┐
│                        1 │
└──────────────────────────┘
```

**Non-matching subsequence**

```sql title=Query theme={null}
SELECT hasSubsequenceUTF8('картошка', 'апельсин');
```

```response title=Response theme={null}
┌─hasSubsequen⋯'апельсин')─┐
│                        0 │
└──────────────────────────┘
```

<h2 id="hasToken">
  hasToken
</h2>

Introduced in: v20.1.0

Checks if the given token is present in the haystack.

Uses [splitByNonAlpha](/reference/functions/regular-functions/splitting-merging-functions#splitByNonAlpha) as tokenizer, i.e. a token is defined as the longest possible sub-sequence of consecutive characters `[0-9A-Za-z_]` (numbers, ASCII characters and underscore).

**Syntax**

```sql theme={null}
hasToken(haystack, token)
```

**Arguments**

* `haystack` — String to be searched. [`String`](/reference/data-types/string)
* `token` — Token to search for. [`const String`](/reference/data-types/string)

**Returned value**

Returns `1` if the token is found, `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Token search**

```sql title=Query theme={null}
SELECT hasToken('clickhouse test', 'test')
```

```response title=Response theme={null}
┌─hasToken('clickhouse test', 'test')─┐
│                                   1 │
└─────────────────────────────────────┘
```

<h2 id="hasTokenCaseInsensitive">
  hasTokenCaseInsensitive
</h2>

Introduced in: v20.1.0

Performs case insensitive lookup of needle in haystack using tokenbf\_v1 index.

**Syntax**

```sql theme={null}
hasTokenCaseInsensitive(haystack, needle)
```

**Arguments**

* None.

**Returned value**

**Examples**

<h2 id="hasTokenCaseInsensitiveOrNull">
  hasTokenCaseInsensitiveOrNull
</h2>

Introduced in: v23.1.0

Performs case insensitive lookup of needle in haystack using tokenbf\_v1 index. Returns null if needle is ill-formed.

**Syntax**

```sql theme={null}
hasTokenCaseInsensitiveOrNull(haystack, needle)
```

**Arguments**

* None.

**Returned value**

**Examples**

<h2 id="hasTokenOrNull">
  hasTokenOrNull
</h2>

Introduced in: v20.1.0

Like [`hasToken`](#hasToken) but returns null if token is ill-formed.

**Syntax**

```sql theme={null}
hasTokenOrNull(haystack, token)
```

**Arguments**

* `haystack` — String to be searched. Must be constant. [`String`](/reference/data-types/string)
* `token` — Token to search for. [`const String`](/reference/data-types/string)

**Returned value**

Returns `1` if the token is found, `0` otherwise, null if token is ill-formed. [`Nullable(UInt8)`](/reference/data-types/nullable)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT hasTokenOrNull('apple banana cherry', 'ban ana');
```

```response title=Response theme={null}
┌─hasTokenOrNu⋯ 'ban ana')─┐
│                     ᴺᵁᴸᴸ │
└──────────────────────────┘
```

<h2 id="highlight">
  highlight
</h2>

Introduced in: v26.4.0

Highlights occurrences of search terms in a text string by wrapping them with HTML tags.

The function performs ASCII case-insensitive matching. If multiple search terms overlap or are adjacent in the text, the matched regions are merged into a single highlighted span.

**Syntax**

```sql theme={null}
highlight(haystack, needles[, open_tag, close_tag])
```

**Arguments**

* `haystack` — The text to search in. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `needles` — An array of search terms to highlight. [`const Array(String)`](/reference/data-types/array)
* `open_tag` — The opening tag to insert before each match. Default: `<em>`. [`const String`](/reference/data-types/string)
* `close_tag` — The closing tag to insert after each match. Default: `</em>`. [`const String`](/reference/data-types/string)

**Returned value**

Returns the input text with matched terms wrapped in the specified tags. [`String`](/reference/data-types/string)

**Examples**

**Basic highlight**

```sql title=Query theme={null}
SELECT highlight('The quick brown fox', ['quick', 'fox'])
```

```response title=Response theme={null}
┌─highlight('The quick brown fox', ['quick', 'fox'])─┐
│ The <em>quick</em> brown <em>fox</em>              │
└────────────────────────────────────────────────────┘
```

**Custom tags**

```sql title=Query theme={null}
SELECT highlight('Hello World', ['hello'], '<b>', '</b>')
```

```response title=Response theme={null}
┌─highlight('Hello World', ['hello'], '<b>', '</b>')─┐
│ <b>Hello</b> World                                 │
└────────────────────────────────────────────────────┘
```

<h2 id="ilike">
  ilike
</h2>

Introduced in: v20.6.0

Like [`like`](#like) but searches case-insensitively. Supports the optional `ESCAPE` clause (see `like`).

**Syntax**

```sql theme={null}
ilike(haystack, pattern[, escape_character])
-- haystack ILIKE pattern [ESCAPE 'escape_character']
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `pattern` — LIKE pattern to match against. [`String`](/reference/data-types/string)
* `escape_character` — Optional single-character string to use as the escape character instead of `\`. Default: `\`. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the string matches the LIKE pattern (case-insensitive), otherwise `0`. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT ilike('ClickHouse', '%house%');
```

```response title=Response theme={null}
┌─ilike('ClickHouse', '%house%')─┐
│                              1 │
└────────────────────────────────┘
```

<h2 id="like">
  like
</h2>

Introduced in: v1.1.0

Returns whether string `haystack` matches the `LIKE` expression `pattern`.

A `LIKE` expression can contain normal characters and the following metasymbols:

* `%` indicates an arbitrary number of arbitrary characters (including zero characters).
* `_` indicates a single arbitrary character.
* `\` is for escaping literals `%`, `_` and `\`.

Matching is based on UTF-8, e.g. `_` matches the Unicode code point `¥` which is represented in UTF-8 using two bytes.

If the haystack or the `LIKE` expression are not valid UTF-8, the behavior is undefined.

No automatic Unicode normalization is performed, you can use the `normalizeUTF8*` functions for that.

To match against literal `%`, `_` and `\` (which are `LIKE` metacharacters), prepend them with a backslash: `\%`, `\_` and `\\`.
The backslash loses its special meaning (i.e. is interpreted literally) if it prepends a character different than `%`, `_` or `\`.

<Note>
  ClickHouse requires backslashes in strings [to be quoted as well](/reference/syntax#string), so you would actually need to write `\\%`, `\\_` and `\\\\`.
</Note>

For `LIKE` expressions of the form `%needle%`, the function is as fast as the `position` function.
All other LIKE expressions are internally converted to a regular expression and executed with a performance similar to function `match`.

## ESCAPE clause

The optional `ESCAPE` clause specifies a custom escape character (must be a single ASCII character).
When provided, the custom escape character replaces the default backslash for escaping `%` and `_` metacharacters.
The escape character can escape three things: `%` (literal percent), `_` (literal underscore), and itself (literal escape character).
When a custom escape character is used, the backslash has no special meaning and is treated as a literal character.

**Syntax**

```sql theme={null}
like(haystack, pattern[, escape_character])
-- haystack LIKE pattern [ESCAPE 'escape_character']
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `pattern` — `LIKE` pattern to match against. Can contain `%` (matches any number of characters), `_` (matches single character), and `\` for escaping. [`String`](/reference/data-types/string)
* `escape_character` — Optional single-character string to use as the escape character instead of `\`. Default: `\`. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the string matches the `LIKE` pattern, otherwise `0`. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT like('ClickHouse', '%House');
```

```response title=Response theme={null}
┌─like('ClickHouse', '%House')─┐
│                            1 │
└──────────────────────────────┘
```

**Single character wildcard**

```sql title=Query theme={null}
SELECT like('ClickHouse', 'Click_ouse');
```

```response title=Response theme={null}
┌─like('ClickH⋯lick_ouse')─┐
│                        1 │
└──────────────────────────┘
```

**Non-matching pattern**

```sql title=Query theme={null}
SELECT like('ClickHouse', '%SQL%');
```

```response title=Response theme={null}
┌─like('ClickHouse', '%SQL%')─┐
│                           0 │
└─────────────────────────────┘
```

**ESCAPE clause**

```sql title=Query theme={null}
SELECT '50%off' LIKE '50#%off' ESCAPE '#';
```

```response title=Response theme={null}
┌─like('50%off', '50#%off', '#')─┐
│                              1 │
└────────────────────────────────┘
```

<h2 id="locate">
  locate
</h2>

Introduced in: v18.16.0

Like [`position`](#position) but with arguments `haystack` and `locate` switched.

<Info>
  **Version dependent behavior**

  The behavior of this function depends on the ClickHouse version:

  * in versions \< v24.3, `locate` was an alias of function `position` and accepted arguments `(haystack, needle[, start_pos])`.
  * in versions >= 24.3, `locate` is an individual function (for better compatibility with MySQL) and accepts arguments `(needle, haystack[, start_pos])`.
    The previous behavior can be restored using setting `function_locate_has_mysql_compatible_argument_order = false`.
</Info>

**Syntax**

```sql theme={null}
locate(needle, haystack[, start_pos])
```

**Arguments**

* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `start_pos` — Optional. Position (1-based) in `haystack` at which the search starts. [`UInt`](/reference/data-types/int-uint)

**Returned value**

Returns starting position in bytes and counting from 1, if the substring was found, `0`, if the substring was not found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Basic usage**

```sql title=Query theme={null}
SELECT locate('ca', 'abcabc')
```

```response title=Response theme={null}
┌─locate('ca', 'abcabc')─┐
│                      3 │
└────────────────────────┘
```

<h2 id="match">
  match
</h2>

Introduced in: v1.1.0

Checks if a provided string matches the provided regular expression pattern.

This function uses the RE2 regular expression library. Please refer to [re2](https://github.com/google/re2/wiki/Syntax) for supported syntax.

Matching works under UTF-8 assumptions, e.g. `¥` uses two bytes internally but matching treats it as a single codepoint.
The regular expression must not contain NULL bytes.
If the haystack or the pattern are not valid UTF-8, the behavior is undefined.

Unlike re2's default behavior, `.` matches line breaks. To disable this, prepend the pattern with `(?-s)`.

The pattern is not anchored. To match the entire string, anchor the pattern yourself using `^` and `$`.

If you just want to search for substrings, you can use functions [`like`](#like) or [`position`](#position) instead, which work much faster than this function.

Alternative operator syntax: `haystack REGEXP pattern`.

**Syntax**

```sql theme={null}
match(haystack, pattern)
```

**Aliases**: `REGEXP_MATCHES`

**Arguments**

* `haystack` — String in which the pattern is searched. [`String`](/reference/data-types/string)
* `pattern` — Regular expression pattern. Can be a constant or come from a column. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the pattern matches, `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Basic pattern matching**

```sql title=Query theme={null}
SELECT match('Hello World', 'Hello.*')
```

```response title=Response theme={null}
┌─match('Hello World', 'Hello.*')─┐
│                               1 │
└─────────────────────────────────┘
```

**Pattern not matching**

```sql title=Query theme={null}
SELECT match('Hello World', 'goodbye.*')
```

```response title=Response theme={null}
┌─match('Hello World', 'goodbye.*')─┐
│                                 0 │
└───────────────────────────────────┘
```

**Matching a substring**

```sql title=Query theme={null}
SELECT match('abcde', 'b.*d'), match('abcde', '^b.*d$')
```

```response title=Response theme={null}
┌─match('abcde', 'b.*d')─┬─match('abcde', '^b.*d$')─┐
│                       1 │                         0 │
└─────────────────────────┴───────────────────────────┘
```

<h2 id="multiFuzzyMatchAllIndices">
  multiFuzzyMatchAllIndices
</h2>

Introduced in: v20.1.0

Like [`multiFuzzyMatchAny`](#multiFuzzyMatchAny) but returns the array of all indices in any order that match the haystack within a constant [edit distance](https://en.wikipedia.org/wiki/Edit_distance).

**Syntax**

```sql theme={null}
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, ..., patternN])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `distance` — The maximum edit distance for fuzzy matching. [`UInt8`](/reference/data-types/int-uint)
* `pattern` — Array of patterns to match against. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns an array of all indices (starting from 1) that match the haystack within the specified edit distance in any order. Returns an empty array if no matches are found. [`Array(UInt64)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiFuzzyMatchAllIndices('ClickHouse', 2, ['ClickHouse', 'ClckHouse', 'ClickHose', 'House']);
```

```response title=Response theme={null}
┌─multiFuzzyMa⋯, 'House'])─┐
│ [3,1,4,2]                │
└──────────────────────────┘
```

<h2 id="multiFuzzyMatchAny">
  multiFuzzyMatchAny
</h2>

Introduced in: v20.1.0

Like [`multiMatchAny`](#multiMatchAny) but returns 1 if any pattern matches the haystack within a constant [edit distance](https://en.wikipedia.org/wiki/Edit_distance).
This function relies on the experimental feature of [hyperscan](https://intel.github.io/hyperscan/dev-reference/compilation.html#approximate-matching) library, and can be slow for some edge cases.
The performance depends on the edit distance value and patterns used, but it's always more expensive compared to non-fuzzy variants.

<Note>
  `multiFuzzyMatch*()` function family do not support UTF-8 regular expressions (it treats them as a sequence of bytes) due to restrictions of hyperscan.
</Note>

**Syntax**

```sql theme={null}
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternN])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `distance` — The maximum edit distance for fuzzy matching. [`UInt8`](/reference/data-types/int-uint)
* `pattern` — Optional. An array of patterns to match against. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns `1` if any pattern matches the haystack within the specified edit distance, otherwise `0`. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiFuzzyMatchAny('ClickHouse', 2, ['ClickHouse', 'ClckHouse', 'ClickHose']);
```

```response title=Response theme={null}
┌─multiFuzzyMa⋯lickHose'])─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="multiFuzzyMatchAnyIndex">
  multiFuzzyMatchAnyIndex
</h2>

Introduced in: v20.1.0

Like [`multiFuzzyMatchAny`](#multiFuzzyMatchAny) but returns any index that matches the haystack within a constant [edit distance](https://en.wikipedia.org/wiki/Edit_distance).

**Syntax**

```sql theme={null}
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternn])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `distance` — The maximum edit distance for fuzzy matching. [`UInt8`](/reference/data-types/int-uint)
* `pattern` — Array of patterns to match against. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the index (starting from 1) of any pattern that matches the haystack within the specified edit distance, otherwise `0`. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiFuzzyMatchAnyIndex('ClickHouse', 2, ['ClckHouse', 'ClickHose', 'ClickHouse']);
```

```response title=Response theme={null}
┌─multiFuzzyMa⋯ickHouse'])─┐
│                        2 │
└──────────────────────────┘
```

<h2 id="multiMatchAllIndices">
  multiMatchAllIndices
</h2>

Introduced in: v20.1.0

Like [`multiMatchAny`](#multiMatchAny) but returns the array of all indices that match the haystack in any order.

**Syntax**

```sql theme={null}
multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternn])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `pattern` — Regular expressions to match against. [`String`](/reference/data-types/string)

**Returned value**

Array of all indices (starting from 1) that match the haystack in any order. Returns an empty array if no matches are found. [`Array(UInt64)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiMatchAllIndices('ClickHouse', ['[0-9]', 'House', 'Click', 'ouse']);
```

```response title=Response theme={null}
┌─multiMatchAl⋯', 'ouse'])─┐
│ [3, 2, 4]                │
└──────────────────────────┘
```

<h2 id="multiMatchAny">
  multiMatchAny
</h2>

Introduced in: v20.1.0

Check if at least one of multiple regular expression patterns matches a haystack.

If you only want to search multiple substrings in a string, you can use function [`multiSearchAny`](#multiSearchAny) instead - it works much faster than this function.

**Syntax**

```sql theme={null}
multiMatchAny(haystack, pattern1[, pattern2, ...])
```

**Arguments**

* `haystack` — String in which patterns are searched. [`String`](/reference/data-types/string)
* `pattern1[, pattern2, ...]` — An array of one or more regular expression patterns. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns `1` if any pattern matches, `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Multiple pattern matching**

```sql title=Query theme={null}
SELECT multiMatchAny('Hello World', ['Hello.*', 'foo.*'])
```

```response title=Response theme={null}
┌─multiMatchAny('Hello World', ['Hello.*', 'foo.*'])─┐
│                                                  1 │
└────────────────────────────────────────────────────┘
```

**No patterns match**

```sql title=Query theme={null}
SELECT multiMatchAny('Hello World', ['goodbye.*', 'foo.*'])
```

```response title=Response theme={null}
┌─multiMatchAny('Hello World', ['goodbye.*', 'foo.*'])─┐
│                                                    0 │
└──────────────────────────────────────────────────────┘
```

<h2 id="multiMatchAnyIndex">
  multiMatchAnyIndex
</h2>

Introduced in: v20.1.0

Like [`multiMatchAny`](#multiMatchAny) but returns any index that matches the haystack.

**Syntax**

```sql theme={null}
multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternn])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `pattern` — Regular expressions to match against. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the index (starting from 1) of the first pattern that matches, or 0 if no match is found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiMatchAnyIndex('ClickHouse', ['[0-9]', 'House', 'Click']);
```

```response title=Response theme={null}
┌─multiMatchAn⋯, 'Click'])─┐
│                        3 │
└──────────────────────────┘
```

<h2 id="multiSearchAllPositions">
  multiSearchAllPositions
</h2>

Introduced in: v20.1.0

Like [`position`](#position) but returns an array of positions (in bytes, starting at 1) for multiple `needle` substrings in a `haystack` string.

All `multiSearch*()` functions only support up to 2^8 needles.

**Syntax**

```sql theme={null}
multiSearchAllPositions(haystack, needle1[, needle2, ...])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle1[, needle2, ...]` — An array of one or more substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns array of the starting position in bytes and counting from 1, if the substring was found, `0`, if the substring was not found. [`Array(UInt64)`](/reference/data-types/array)

**Examples**

**Multiple needle search**

```sql title=Query theme={null}
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])
```

```response title=Response theme={null}
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0]                                                          │
└───────────────────────────────────────────────────────────────────┘
```

<h2 id="multiSearchAllPositionsCaseInsensitive">
  multiSearchAllPositionsCaseInsensitive
</h2>

Introduced in: v20.1.0

Like [`multiSearchAllPositions`](#multiSearchAllPositions) but ignores case.

**Syntax**

```sql theme={null}
multiSearchAllPositionsCaseInsensitive(haystack, needle1[, needle2, ...])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle1[, needle2, ...]` — An array of one or more substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns array of the starting position in bytes and counting from 1 (if the substring was found), `0` if the substring was not found. [`Array(UInt64)`](/reference/data-types/array)

**Examples**

**Case insensitive multi-search**

```sql title=Query theme={null}
SELECT multiSearchAllPositionsCaseInsensitive('ClickHouse',['c','h'])
```

```response title=Response theme={null}
┌─multiSearchA⋯['c', 'h'])─┐
│ [1,6]                    │
└──────────────────────────┘
```

<h2 id="multiSearchAllPositionsCaseInsensitiveUTF8">
  multiSearchAllPositionsCaseInsensitiveUTF8
</h2>

Introduced in: v20.1.0

Like [`multiSearchAllPositionsUTF8`](#multiSearchAllPositionsUTF8) but ignores case.

**Syntax**

```sql theme={null}
multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — UTF-8 encoded string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — UTF-8 encoded substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Array of the starting position in bytes and counting from 1 (if the substring was found). Returns 0 if the substring was not found. [`Array`](/reference/data-types/array)

**Examples**

**Case-insensitive UTF-8 search**

```sql title=Query theme={null}
SELECT multiSearchAllPositionsCaseInsensitiveUTF8('Здравствуй, мир!', ['здравствуй', 'МИР']);
```

```response title=Response theme={null}
┌─multiSearchA⋯й', 'МИР'])─┐
│ [1, 13]                  │
└──────────────────────────┘
```

<h2 id="multiSearchAllPositionsUTF8">
  multiSearchAllPositionsUTF8
</h2>

Introduced in: v20.1.0

Like [`multiSearchAllPositions`](#multiSearchAllPositions) but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.

**Syntax**

```sql theme={null}
multiSearchAllPositionsUTF8(haystack, needle1[, needle2, ...])
```

**Arguments**

* `haystack` — UTF-8 encoded string in which the search is performed. [`String`](/reference/data-types/string)
* `needle1[, needle2, ...]` — An array of UTF-8 encoded substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns array of the starting position in bytes and counting from 1 (if the substring was found), `0` if the substring was not found. [`Array`](/reference/data-types/array)

**Examples**

**UTF-8 multi-search**

```sql title=Query theme={null}
SELECT multiSearchAllPositionsUTF8('ClickHouse',['C','H'])
```

```response title=Response theme={null}
┌─multiSearchAllPositionsUTF8('ClickHouse', ['C', 'H'])─┐
│ [1,6]                                                 │
└───────────────────────────────────────────────────────┘
```

<h2 id="multiSearchAny">
  multiSearchAny
</h2>

Introduced in: v20.1.0

Checks if at least one of a number of needle strings matches the haystack string.

Functions [`multiSearchAnyCaseInsensitive`](#multiSearchAnyCaseInsensitive), [`multiSearchAnyUTF8`](#multiSearchAnyUTF8) and [`multiSearchAnyCaseInsensitiveUTF8`](#multiSearchAnyCaseInsensitiveUTF8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql theme={null}
multiSearchAny(haystack, needle1[, needle2, ...])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle1[, needle2, ...]` — An array of substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns `1`, if there was at least one match, otherwise `0`, if there was not at least one match. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Any match search**

```sql title=Query theme={null}
SELECT multiSearchAny('ClickHouse',['C','H'])
```

```response title=Response theme={null}
┌─multiSearchAny('ClickHouse', ['C', 'H'])─┐
│                                        1 │
└──────────────────────────────────────────┘
```

<h2 id="multiSearchAnyCaseInsensitive">
  multiSearchAnyCaseInsensitive
</h2>

Introduced in: v20.1.0

Like [multiSearchAny](#multiSearchAny) but ignores case.

**Syntax**

```sql theme={null}
multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns `1`, if there was at least one case-insensitive match, otherwise `0`, if there was not at least one case-insensitive match. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Case insensitive search**

```sql title=Query theme={null}
SELECT multiSearchAnyCaseInsensitive('ClickHouse',['c','h'])
```

```response title=Response theme={null}
┌─multiSearchAnyCaseInsensitive('ClickHouse', ['c', 'h'])─┐
│                                                       1 │
└─────────────────────────────────────────────────────────┘
```

<h2 id="multiSearchAnyCaseInsensitiveUTF8">
  multiSearchAnyCaseInsensitiveUTF8
</h2>

Introduced in: v20.1.0

Like [multiSearchAnyUTF8](#multiSearchAnyUTF8) but ignores case.

**Syntax**

```sql theme={null}
multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — UTF-8 string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — UTF-8 substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns `1`, if there was at least one case-insensitive match, otherwise `0`, if there was not at least one case-insensitive match. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Given a UTF-8 string 'Здравствуйте', check if character 'з' (lowercase) is present**

```sql title=Query theme={null}
SELECT multiSearchAnyCaseInsensitiveUTF8('Здравствуйте',['з'])
```

```response title=Response theme={null}
┌─multiSearchA⋯те', ['з'])─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="multiSearchAnyUTF8">
  multiSearchAnyUTF8
</h2>

Introduced in: v20.1.0

Like [multiSearchAny](#multiSearchAny) but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.

**Syntax**

```sql theme={null}
multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — UTF-8 string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — UTF-8 substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns `1`, if there was at least one match, otherwise `0`, if there was not at least one match. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Given '你好，世界' ('Hello, world') as a UTF-8 string, check if there are any 你 or 界 characters in the string**

```sql title=Query theme={null}
SELECT multiSearchAnyUTF8('你好，世界', ['你', '界'])
```

```response title=Response theme={null}
┌─multiSearchA⋯你', '界'])─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="multiSearchFirstIndex">
  multiSearchFirstIndex
</h2>

Introduced in: v20.1.0

Searches for multiple needle strings in a haystack string (case-sensitive) and returns the 1-based index of the first needle found.

**Syntax**

```sql theme={null}
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — The string to search in. [`String`](/reference/data-types/string)
* `needles` — Array of strings to search for. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the 1-based index (position in the needles array) of the first needle found in the haystack. Returns 0 if no needles are found. The search is case-sensitive. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiSearchFirstIndex('ClickHouse Database', ['Click', 'Database', 'Server']);
```

```response title=Response theme={null}
┌─multiSearchF⋯ 'Server'])─┐
│                        1 │
└──────────────────────────┘
```

**Case-sensitive behavior**

```sql title=Query theme={null}
SELECT multiSearchFirstIndex('ClickHouse Database', ['CLICK', 'Database', 'Server']);
```

```response title=Response theme={null}
┌─multiSearchF⋯ 'Server'])─┐
│                        2 │
└──────────────────────────┘
```

**No match found**

```sql title=Query theme={null}
SELECT multiSearchFirstIndex('Hello World', ['goodbye', 'test']);
```

```response title=Response theme={null}
┌─multiSearchF⋯', 'test'])─┐
│                        0 │
└──────────────────────────┘
```

<h2 id="multiSearchFirstIndexCaseInsensitive">
  multiSearchFirstIndexCaseInsensitive
</h2>

Introduced in: v20.1.0

Returns the index `i` (starting from 1) of the leftmost found needle\_i in the string `haystack` and 0 otherwise.
Ignores case.

**Syntax**

```sql theme={null}
multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN]
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the index (starting from 1) of the leftmost found needle. Otherwise `0`, if there was no match. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiSearchFirstIndexCaseInsensitive('hElLo WoRlD', ['World', 'Hello']);
```

```response title=Response theme={null}
┌─multiSearchF⋯, 'Hello'])─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="multiSearchFirstIndexCaseInsensitiveUTF8">
  multiSearchFirstIndexCaseInsensitiveUTF8
</h2>

Introduced in: v20.1.0

Searches for multiple needle strings in a haystack string, case-insensitively with UTF-8 encoding support, and returns the 1-based index of the first needle found.

**Syntax**

```sql theme={null}
multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — The string to search in. [`String`](/reference/data-types/string)
* `needles` — Array of strings to search for. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the 1-based index (position in the needles array) of the first needle found in the haystack. Returns 0 if no needles are found. The search is case-insensitive and respects UTF-8 character encoding. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('ClickHouse Database', ['CLICK', 'data', 'server']);
```

```response title=Response theme={null}
┌─multiSearchF⋯ 'server'])─┐
│                        1 │
└──────────────────────────┘
```

**UTF-8 case handling**

```sql title=Query theme={null}
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('Привет Мир', ['мир', 'ПРИВЕТ']);
```

```response title=Response theme={null}
┌─multiSearchF⋯ 'ПРИВЕТ'])─┐
│                        1 │
└──────────────────────────┘
```

**No match found**

```sql title=Query theme={null}
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('Hello World', ['goodbye', 'test']);
```

```response title=Response theme={null}
┌─multiSearchF⋯', 'test'])─┐
│                        0 │
└──────────────────────────┘
```

<h2 id="multiSearchFirstIndexUTF8">
  multiSearchFirstIndexUTF8
</h2>

Introduced in: v20.1.0

Returns the index `i` (starting from 1) of the leftmost found needle\_i in the string `haystack` and 0 otherwise.
Assumes `haystack` and `needle` are UTF-8 encoded strings.

**Syntax**

```sql theme={null}
multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — UTF-8 string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Array of UTF-8 substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the index (starting from 1) of the leftmost found needle. Otherwise 0, if there was no match. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT multiSearchFirstIndexUTF8('Здравствуйте мир', ['мир', 'здравствуйте']);
```

```response title=Response theme={null}
┌─multiSearchF⋯вствуйте'])─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="multiSearchFirstPosition">
  multiSearchFirstPosition
</h2>

Introduced in: v20.1.0

Like [`position`](#position) but returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings.

Functions [`multiSearchFirstPositionCaseInsensitive`](#multiSearchFirstPositionCaseInsensitive), [`multiSearchFirstPositionUTF8`](#multiSearchFirstPositionUTF8) and [`multiSearchFirstPositionCaseInsensitiveUTF8`](#multiSearchFirstPositionCaseInsensitiveUTF8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql theme={null}
multiSearchFirstPosition(haystack, needle1[, needle2, ...])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle1[, needle2, ...]` — An array of one or more substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings, otherwise `0`, if there was no match. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**First position search**

```sql title=Query theme={null}
SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld'])
```

```response title=Response theme={null}
┌─multiSearchFirstPosition('Hello World', ['llo', 'Wor', 'ld'])─┐
│                                                             3 │
└───────────────────────────────────────────────────────────────┘
```

<h2 id="multiSearchFirstPositionCaseInsensitive">
  multiSearchFirstPositionCaseInsensitive
</h2>

Introduced in: v20.1.0

Like [multiSearchFirstPosition](#multiSearchFirstPosition) but ignores case.

**Syntax**

```sql theme={null}
multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Array of substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings. Returns `0`, if there was no match. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Case insensitive first position**

```sql title=Query theme={null}
SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello'])
```

```response title=Response theme={null}
┌─multiSearchFirstPositionCaseInsensitive('HELLO WORLD', ['wor', 'ld', 'ello'])─┐
│                                                                             2 │
└───────────────────────────────────────────────────────────────────────────────┘
```

<h2 id="multiSearchFirstPositionCaseInsensitiveUTF8">
  multiSearchFirstPositionCaseInsensitiveUTF8
</h2>

Introduced in: v20.1.0

Like [multiSearchFirstPosition](#multiSearchFirstPosition) but assumes `haystack` and `needle` to be UTF-8 strings and ignores case.

**Syntax**

```sql theme={null}
multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — UTF-8 string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Array of UTF-8 substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings, ignoring case. Returns `0`, if there was no match. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Find the leftmost offset in UTF-8 string 'Здравствуй, мир' ('Hello, world') which matches any of the given needles**

```sql title=Query theme={null}
SELECT multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['МИР', 'вст', 'Здра'])
```

```response title=Response theme={null}
┌─multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['МИР', 'вст', 'Здра'])─┐
│                                                                                      1 │
└────────────────────────────────────────────────────────────────────────────────────────┘
```

<h2 id="multiSearchFirstPositionUTF8">
  multiSearchFirstPositionUTF8
</h2>

Introduced in: v20.1.0

Like [multiSearchFirstPosition](#multiSearchFirstPosition) but assumes `haystack` and `needle` to be UTF-8 strings.

**Syntax**

```sql theme={null}
multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

* `haystack` — UTF-8 string in which the search is performed. [`String`](/reference/data-types/string)
* `needle` — Array of UTF-8 substrings to be searched. [`Array(String)`](/reference/data-types/array)

**Returned value**

Leftmost offset in a `haystack` string which matches any of multiple `needle` strings. Returns `0`, if there was no match. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Find the leftmost offset in UTF-8 string 'Здравствуй, мир' ('Hello, world') which matches any of the given needles**

```sql title=Query theme={null}
SELECT multiSearchFirstPositionUTF8('Здравствуй, мир',['мир', 'вст', 'авст'])
```

```response title=Response theme={null}
┌─multiSearchFirstPositionUTF8('Здравствуй, мир', ['мир', 'вст', 'авст'])─┐
│                                                                       4 │
└─────────────────────────────────────────────────────────────────────────┘
```

<h2 id="ngramDistance">
  ngramDistance
</h2>

Introduced in: v20.1.0

Calculates the 4-gram distance between two strings.
For this, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities.
The smaller the returned value, the more similar the strings are.

For case-insensitive search or/and in UTF8 format use functions [`ngramDistanceCaseInsensitive`](#ngramDistanceCaseInsensitive), [`ngramDistanceUTF8`](#ngramDistanceUTF8), [`ngramDistanceCaseInsensitiveUTF8`](#ngramDistanceCaseInsensitiveUTF8).

**Syntax**

```sql theme={null}
ngramDistance(haystack, needle)
```

**Arguments**

* `haystack` — String for comparison. [`String`](/reference/data-types/string)
* `needle` — String for comparison. [`String`](/reference/data-types/string)

**Returned value**

Returns a Float32 number between `0` and `1`. The smaller the returned value, the more similar the strings are. [`Float32`](/reference/data-types/float)

**Examples**

**Calculate 4-gram distance**

```sql title=Query theme={null}
SELECT ngramDistance('ClickHouse', 'ClickHouses')
```

```response title=Response theme={null}
┌─ngramDistance('ClickHouse', 'ClickHouses')─┐
│                                        0.1 │
└────────────────────────────────────────────┘
```

<h2 id="ngramDistanceCaseInsensitive">
  ngramDistanceCaseInsensitive
</h2>

Introduced in: v20.1.0

Provides a case-insensitive variant of [`ngramDistance`](#ngramDistance).
Calculates the 4-gram distance between two strings, ignoring case.
The smaller the returned value, the more similar the strings are.

**Syntax**

```sql theme={null}
ngramDistanceCaseInsensitive(haystack, needle)
```

**Arguments**

* `haystack` — First comparison string. [`String`](/reference/data-types/string)
* `needle` — Second comparison string. [`String`](/reference/data-types/string)

**Returned value**

Returns a Float32 number between `0` and `1`. [`Float32`](/reference/data-types/float)

**Examples**

**Case-insensitive 4-gram distance**

```sql title=Query theme={null}
SELECT ngramDistanceCaseInsensitive('ClickHouse','clickhouse')
```

```response title=Response theme={null}
┌─ngramDistanceCaseInsensitive('ClickHouse','clickhouse')─┐
│                                                       0 │
└─────────────────────────────────────────────────────────┘
```

<h2 id="ngramDistanceCaseInsensitiveUTF8">
  ngramDistanceCaseInsensitiveUTF8
</h2>

Introduced in: v20.1.0

Provides a case-insensitive UTF-8 variant of [`ngramDistance`](#ngramDistance).
Assumes that `needle` and `haystack` strings are UTF-8 encoded strings and ignores case.
Calculates the 3-gram distance between two UTF-8 strings, ignoring case.
The smaller the returned value, the more similar the strings are.

**Syntax**

```sql theme={null}
ngramDistanceCaseInsensitiveUTF8(haystack, needle)
```

**Arguments**

* `haystack` — First UTF-8 encoded comparison string. [`String`](/reference/data-types/string)
* `needle` — Second UTF-8 encoded comparison string. [`String`](/reference/data-types/string)

**Returned value**

Returns a Float32 number between `0` and `1`. [`Float32`](/reference/data-types/float)

**Examples**

**Case-insensitive UTF-8 3-gram distance**

```sql title=Query theme={null}
SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE')
```

```response title=Response theme={null}
┌─ngramDistanceCaseInsensitiveUTF8('abcde','CDE')─┐
│                                             0.5 │
└─────────────────────────────────────────────────┘
```

<h2 id="ngramDistanceUTF8">
  ngramDistanceUTF8
</h2>

Introduced in: v20.1.0

Provides a UTF-8 variant of [`ngramDistance`](#ngramDistance).
Assumes that `needle` and `haystack` strings are UTF-8 encoded strings.
Calculates the 3-gram distance between two UTF-8 strings.
The smaller the returned value, the more similar the strings are.

**Syntax**

```sql theme={null}
ngramDistanceUTF8(haystack, needle)
```

**Arguments**

* `haystack` — First UTF-8 encoded comparison string. [`String`](/reference/data-types/string)
* `needle` — Second UTF-8 encoded comparison string. [`String`](/reference/data-types/string)

**Returned value**

Returns a Float32 number between `0` and `1`. [`Float32`](/reference/data-types/float)

**Examples**

**UTF-8 3-gram distance**

```sql title=Query theme={null}
SELECT ngramDistanceUTF8('abcde','cde')
```

```response title=Response theme={null}
┌─ngramDistanceUTF8('abcde','cde')─┐
│                               0.5 │
└───────────────────────────────────┘
```

<h2 id="ngramSearch">
  ngramSearch
</h2>

Introduced in: v20.1.0

Checks if the 4-gram distance between two strings is less than or equal to a given threshold.

For case-insensitive search or/and in UTF8 format use functions `ngramSearchCaseInsensitive`, `ngramSearchUTF8`, `ngramSearchCaseInsensitiveUTF8`.

**Syntax**

```sql theme={null}
ngramSearch(haystack, needle)
```

**Arguments**

* `haystack` — String for comparison. [`String`](/reference/data-types/string)
* `needle` — String for comparison. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the 4-gram distance between the strings is less than or equal to a threshold (`1.0` by default), `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Search using 4-grams**

```sql title=Query theme={null}
SELECT ngramSearch('ClickHouse', 'Click')
```

```response title=Response theme={null}
┌─ngramSearch('ClickHouse', 'Click')─┐
│                                  1 │
└────────────────────────────────────┘
```

<h2 id="ngramSearchCaseInsensitive">
  ngramSearchCaseInsensitive
</h2>

Introduced in: v20.1.0

Provides a case-insensitive variant of [`ngramSearch`](#ngramSearch).
Calculates the non-symmetric difference between a needle string and a haystack string, i.e. the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams.
Checks if the 4-gram distance between two strings is less than or equal to a given threshold, ignoring case.

**Syntax**

```sql theme={null}
ngramSearchCaseInsensitive(haystack, needle)
```

**Arguments**

* `haystack` — String for comparison. [`String`](/reference/data-types/string)
* `needle` — String for comparison. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the 4-gram distance between the strings is less than or equal to a threshold (`1.0` by default), `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Case-insensitive search using 4-grams**

```sql title=Query theme={null}
SELECT ngramSearchCaseInsensitive('Hello World','hello')
```

```response title=Response theme={null}
┌─ngramSearchCaseInsensitive('Hello World','hello')─┐
│                                                  1 │
└────────────────────────────────────────────────────┘
```

<h2 id="ngramSearchCaseInsensitiveUTF8">
  ngramSearchCaseInsensitiveUTF8
</h2>

Introduced in: v20.1.0

Provides a case-insensitive UTF-8 variant of [`ngramSearch`](#ngramSearch).
Assumes `haystack` and `needle` to be UTF-8 strings and ignores case.
Checks if the 3-gram distance between two UTF-8 strings is less than or equal to a given threshold, ignoring case.

**Syntax**

```sql theme={null}
ngramSearchCaseInsensitiveUTF8(haystack, needle)
```

**Arguments**

* `haystack` — UTF-8 string for comparison. [`String`](/reference/data-types/string)
* `needle` — UTF-8 string for comparison. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the 3-gram distance between the strings is less than or equal to a threshold (`1.0` by default), `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Case-insensitive UTF-8 search using 3-grams**

```sql title=Query theme={null}
SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')
```

```response title=Response theme={null}
┌─ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')─┐
│                                                        1 │
└──────────────────────────────────────────────────────────┘
```

<h2 id="ngramSearchUTF8">
  ngramSearchUTF8
</h2>

Introduced in: v20.1.0

Provides a UTF-8 variant of `ngramSearch`.
Assumes `haystack` and `needle` to be UTF-8 strings.
Checks if the 3-gram distance between two UTF-8 strings is less than or equal to a given threshold.

**Syntax**

```sql theme={null}
ngramSearchUTF8(haystack, needle)
```

**Arguments**

* `haystack` — UTF-8 string for comparison. [`String`](/reference/data-types/string)
* `needle` — UTF-8 string for comparison. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the 3-gram distance between the strings is less than or equal to a threshold (`1.0` by default), `0` otherwise. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**UTF-8 search using 3-grams**

```sql title=Query theme={null}
SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд')
```

```response title=Response theme={null}
┌─ngramSearchUTF8('абвгдеёжз', 'гдеёзд')─┐
│                                      1 │
└────────────────────────────────────────┘
```

<h2 id="notILike">
  notILike
</h2>

Introduced in: v20.6.0

Checks whether a string does not match a pattern, case-insensitive. The pattern can contain special characters `%` and `_` for SQL LIKE matching. Supports the optional `ESCAPE` clause (see `like`).

**Syntax**

```sql theme={null}
notILike(haystack, pattern[, escape_character])
-- haystack NOT ILIKE pattern [ESCAPE 'escape_character']
```

**Arguments**

* `haystack` — The input string to search in. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `pattern` — The SQL LIKE pattern to match against. `%` matches any number of characters (including zero), `_` matches exactly one character. [`String`](/reference/data-types/string)
* `escape_character` — Optional single-character string to use as the escape character instead of `\`. Default: `\`. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the string does not match the pattern (case-insensitive), otherwise `0`. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT notILike('ClickHouse', '%house%');
```

```response title=Response theme={null}
┌─notILike('Cl⋯ '%house%')─┐
│                        0 │
└──────────────────────────┘
```

<h2 id="notLike">
  notLike
</h2>

Introduced in: v1.1.0

Similar to [`like`](#like) but negates the result. Supports the optional `ESCAPE` clause (see `like`).

**Syntax**

```sql theme={null}
notLike(haystack, pattern[, escape_character])
-- haystack NOT LIKE pattern [ESCAPE 'escape_character']
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `pattern` — LIKE pattern to match against. [`String`](/reference/data-types/string)
* `escape_character` — Optional single-character string to use as the escape character instead of `\`. Default: `\`. [`String`](/reference/data-types/string)

**Returned value**

Returns `1` if the string does not match the `LIKE` pattern, otherwise `0`. [`UInt8`](/reference/data-types/int-uint)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT notLike('ClickHouse', '%House%');
```

```response title=Response theme={null}
┌─notLike('Cli⋯ '%House%')─┐
│                        0 │
└──────────────────────────┘
```

**Non-matching pattern**

```sql title=Query theme={null}
SELECT notLike('ClickHouse', '%SQL%');
```

```response title=Response theme={null}
┌─notLike('Cli⋯', '%SQL%')─┐
│                        1 │
└──────────────────────────┘
```

<h2 id="position">
  position
</h2>

Introduced in: v1.1.0

Returns the position (in bytes, starting at 1) of a substring `needle` in a string `haystack`.

If substring `needle` is empty, these rules apply:

* if no `start_pos` was specified: return `1`
* if `start_pos = 0`: return `1`
* if `start_pos >= 1` and `start_pos <= length(haystack) + 1`: return `start_pos`
* otherwise: return `0`

The same rules also apply to functions [`locate`](#locate), [`positionCaseInsensitive`](#positionCaseInsensitive), [`positionUTF8`](#positionUTF8) and [`positionCaseInsensitiveUTF8`](#positionCaseInsensitiveUTF8).

**Syntax**

```sql theme={null}
position(haystack, needle[, start_pos])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `start_pos` — Position (1-based) in `haystack` at which the search starts. Optional. [`UInt`](/reference/data-types/int-uint)

**Returned value**

Returns starting position in bytes and counting from 1, if the substring was found, otherwise `0`, if the substring was not found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Basic usage**

```sql title=Query theme={null}
SELECT position('Hello, world!', '!')
```

```response title=Response theme={null}
┌─position('Hello, world!', '!')─┐
│                             13 │
└────────────────────────────────┘
```

**With start\_pos argument**

```sql title=Query theme={null}
SELECT position('Hello, world!', 'o', 1), position('Hello, world!', 'o', 7)
```

```response title=Response theme={null}
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│                                 5 │                                 9 │
└───────────────────────────────────┴───────────────────────────────────┘
```

**Needle IN haystack syntax**

```sql title=Query theme={null}
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s)
```

```response title=Response theme={null}
┌─equals(6, position(s, '/'))─┐
│                           1 │
└─────────────────────────────┘
```

**Empty needle substring**

```sql title=Query theme={null}
SELECT position('abc', ''), position('abc', '', 0), position('abc', '', 1), position('abc', '', 2), position('abc', '', 3), position('abc', '', 4), position('abc', '', 5)
```

```response title=Response theme={null}
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│                   1 │                      1 │                      1 │                      2 │                      3 │                      4 │                      0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
```

<h2 id="positionCaseInsensitive">
  positionCaseInsensitive
</h2>

Introduced in: v1.1.0

Like [`position`](#position) but case-insensitive.

**Syntax**

```sql theme={null}
positionCaseInsensitive(haystack, needle[, start_pos])
```

**Aliases**: `instr`

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `start_pos` — Optional. Position (1-based) in `haystack` at which the search starts. [`UInt*`](/reference/data-types/int-uint)

**Returned value**

Returns starting position in bytes and counting from 1, if the substring was found, otherwise `0`, if the substring was not found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Case insensitive search**

```sql title=Query theme={null}
SELECT positionCaseInsensitive('Hello, world!', 'hello')
```

```response title=Response theme={null}
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘
```

<h2 id="positionCaseInsensitiveUTF8">
  positionCaseInsensitiveUTF8
</h2>

Introduced in: v1.1.0

Like [`positionUTF8`](#positionUTF8) but searches case-insensitively.

**Syntax**

```sql theme={null}
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `start_pos` — Optional. Position (1-based) in `haystack` at which the search starts. [`UInt*`](/reference/data-types/int-uint)

**Returned value**

Returns starting position in bytes and counting from 1, if the substring was found, otherwise `0`, if the substring was not found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**Case insensitive UTF-8 search**

```sql title=Query theme={null}
SELECT positionCaseInsensitiveUTF8('Привет мир', 'МИР')
```

```response title=Response theme={null}
┌─positionCaseInsensitiveUTF8('Привет мир', 'МИР')─┐
│                                                8 │
└──────────────────────────────────────────────────┘
```

<h2 id="positionUTF8">
  positionUTF8
</h2>

Introduced in: v1.1.0

Like [`position`](#position) but assumes `haystack` and `needle` are UTF-8 encoded strings.

**Syntax**

```sql theme={null}
positionUTF8(haystack, needle[, start_pos])
```

**Arguments**

* `haystack` — String in which the search is performed. [`String`](/reference/data-types/string) or [`Enum`](/reference/data-types/enum)
* `needle` — Substring to be searched. [`String`](/reference/data-types/string)
* `start_pos` — Optional. Position (1-based) in `haystack` at which the search starts. [`UInt*`](/reference/data-types/int-uint)

**Returned value**

Returns starting position in bytes and counting from 1, if the substring was found, otherwise `0`, if the substring was not found. [`UInt64`](/reference/data-types/int-uint)

**Examples**

**UTF-8 character counting**

```sql title=Query theme={null}
SELECT positionUTF8('Motörhead', 'r')
```

```response title=Response theme={null}
┌─position('Motörhead', 'r')─┐
│                          5 │
└────────────────────────────┘
```