Full text search in MySQL
  • 05 Jul 2023
  • 2 minutes to read

Full text search in MySQL


Article Summary

Full text search (FTS) allows a sophisticated approach to text search in Totara, particularly within the catalogue. Details of how to set this up using MySQL can be found on this page.

Language support

MySQL uses case- and accent-insensitive collation for full text search (FTS) by default. However, you can specify a different collation, for example:

$CFG->dboptions['ftslanguage'] = 'utf8_unicode_ci';
 $CFG->dboptions['ftslanguage'] = 'utf8mb4_0900_as_ci';
 $CFG->dboptions['ftslanguage'] = 'utf8mb4_de_pb_0900_ai_ci';

MySQL does not support Japanese and other languages with very short words without spaces in between. Enable the following setting to get a basic experimental support of these languages:

$CFG->dboptions['fts3bworkaround'] = true;

 After any of these changes, repopulate FTS tables and rebuild indexes by running:

php admin/cli/fts_repopulate_tables.php
 php admin/cli/fts_rebuild_indexes.php

Minimum search term length

The MySQL default search character limit can be changed by editing the MySQL configuration file:

[mysqld]
 innodb_ft_min_token_size=2
 ft_min_word_len=2

Stop words 

Stop words are a set of words that will be excluded from index and search queries. They depend on language being used during indexing and search, as well as dictionary presence for database installation. For example, if a user searches for 'Hotels in Wellington' then the system excludes 'in' from the search.

More details on MySQL stop words can be found in the MySQL help documentation.

Ngram is a built-in MySQL full-text parser. It determines the beginning and end of words using white space and particular letter sequences. It is usually enabled by default, but this can depend on distribution. It can parse two or more words stems from compound words. For example, for the German word 'Fußballweltmeisterschaft', it will search by words like 'meister', 'schaft' and so on. A full-text index made without using this plugin will not able to split the word into smaller pieces, and the SQL that is trying to look for a keyword such as 'meister' will not be able to find the record.

For some ideographic languages (like the example above), the normal full-text index will be limited to what it can search for, since there are no delimiters for those words. This is where Ngram can be useful. It is a contiguous sequence of a number of characters from a sequence of text. The main function of the Ngram full-text parser is tokenising a sequence of text into a contiguous sequence of N characters.

In some cases, Ngram can return false positive results (although they are low rated and appear towards the end of the list), so it is advised that you confirm that search behaviour suits your needs after enabling Ngram support.

To enable Ngram, change your config.php setting as follows:

$CFG->dboptions['ftsngram'] = true;

Next, run:

php admin/cli/fts_rebuild_indexes.php

This setting has effect only in MySQL.

By default, MySQL supports diacritics, but it depends on collation and language defined in $CFG->dboptions['ftslanguage']. For example, when collation is utf8_general_ci, then the search query containing either word 'första' or 'forsta' will return a record containing word 'första'.

© Copyright 2024 Totara Learning Solutions. All rights reserved. Some content originally obtained via GPLv3 license and continues to be available under GPLv3. All other content is the sole copyright of Totara Learning Solutions. 


Was this article helpful?

Changing your password will log you out immediately. Use the new password to log back in.
First name must have atleast 2 characters. Numbers and special characters are not allowed.
Last name must have atleast 1 characters. Numbers and special characters are not allowed.
Enter a valid email
Enter a valid password
Your profile has been successfully updated.