A Couple of Substring Functions: substr_count() and substr_by_delim()

A problem that sometimes when writing queries or stored routines is the need to use strings to do basic manipulation of the data. While from a performance perspective it is generally faster to do these manipulations inside the application, for various reasons it may be desirably to keep things inside MySQL.

This post lists two stored functions that can be used for simple manipulation of strings.


This is a port of the PHP function of the same name. It counts the number of times a given substring is encountered in a text. The signature is:

   in_haystack mediumtext,
   in_needle varchar(255),
   in_offset int unsigned,
   in_length int unsigned
) RETURNS int unsigned

The function as it stands here, has the following limitations and behaviours:

  • As the MySQL convention is to use 1 as the offset, so is the MySQL port of substr_count().
  • As MySQL stored functions do not support optional arguments, all arguments must be specified. For in_offset and in_length use NULL or 0 to use the default values.
  • The maximum length supported for the needle is 255 characters.
  • I am using the LOCATE() function as that returns the first occurrence of a substring after a given offset. This will particularly benefit the performance in cases where the needle is only sparsely present in the search string.

The definition of substr_count() is:

 FUNCTION substr_count(in_haystack mediumtext, in_needle varchar(255), in_offset int unsigned, in_length int unsigned) RETURNS int unsigned
      DECLARE v_count, v_haystack_len, v_needle_len, v_offset, v_endpos int unsigned DEFAULT 0;

      SET v_haystack_len = CHAR_LENGTH(in_haystack),
          v_needle_len   = CHAR_LENGTH(in_needle),
          v_offset       = IF(in_offset IS NOT NULL AND in_offset > 0, in_offset, 1),
          v_endpos       = IF(in_length IS NOT NULL AND in_length > 0, v_offset + in_length, v_haystack_len);

      -- The last offset to use with LOCATE is at v_endpos - v_needle_len.
      -- That also means that if v_needlen > v_endpos, the count is trivially 0
      IF (v_endpos > v_needle_len) THEN
         SET v_endpos = v_endpos - v_needle_len;
         WHILE (v_offset < v_endpos) DO
            SET v_offset = LOCATE(in_needle, in_haystack, v_offset);
            IF (v_offset > 0) THEN
               -- v_offset is now the position of the first letter in the needle.
               -- Skip the length of the needle to avoid double counting.
               SET v_count  = v_count  + 1,
                  v_offset = v_offset + v_needle_len;
               -- The needle was not found. Set v_offset = v_endpos to exit the loop.
               SET v_offset = v_endpos;
            END IF;
         END WHILE;
      END IF;

      RETURN v_count;


An example of how to use substr_count() is:

mysql> SELECT substr_count('a/b/c/d/e', '/', 3, 5);
| substr_count('a/b/c/d/e', '/', 3, 5) |
|                                    2 |
1 row in set (0.00 sec)


The substr_by_delim() function can be used to pick out one element of a delimited string; an example would be to determine the second element in the comma delimited string ‘a,b,c,d,e':

   in_set mediumtext,
   in_delimiter varchar(255),
   in_pos int
) RETURNS mediumtext

The width of in_delimiter is set to match the in_needle in substr_count() as substr_count() is used to find the number of delimiters in the search text.

 FUNCTION substr_by_delim(in_set mediumtext, in_delimiter varchar(255), in_pos int) RETURNS mediumtext
  COMMENT 'Returns the Nth element from a delimited list.'
      DECLARE v_num_parts int unsigned DEFAULT 0;

      IF (in_pos < 0) THEN
         -- substr_count returns the number of delimiters, add 1 to get the number of parts
         SET v_num_parts = substr_count(in_set, in_delimiter, NULL, NULL) + 1;
         IF (v_num_parts >= ABS(in_pos)) THEN
            -- Add the requested position (which is negative, so is actually a subtraction)
            -- Add 1 as the position is 1 based.
            SET in_pos = v_num_parts + in_pos + 1;
            -- The requested position is out of range, so set in_pos to 0.
            SET in_pos = 0;
         END IF;
      END IF;
      IF (in_pos <= 0 OR in_pos IS NULL OR in_pos > substr_count(in_set, in_delimiter, 0, NULL)+1) THEN
         -- in_pos is not BETWEEN 1 AND #of elements.
         RETURN NULL;
         RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(in_set, in_delimiter, in_pos), in_delimiter, -1);
      END IF;

Like with the built-in SUBSTRING_INDEX() function, it is supported both to use positive and negative positions where a negative position counts from the end of the search string.

Examples of how to used substr_by_delim() are:

mysql> SELECT substr_by_delim('a,b,c,d,e', ',', 2);
| substr_by_delim('a,b,c,d,e', ',', 2) |
| b                                    |
1 row in set (0.01 sec)

mysql> SELECT substr_by_delim('a,b,c,d,e', ',', -2);
| substr_by_delim('a,b,c,d,e', ',', -2) |
| d                                     |
1 row in set (0.00 sec)

mysql> SELECT substr_by_delim('a||b||c||d||e', '||', 2);
| substr_by_delim('a||b||c||d||e', '||', 2) |
| b                                         |
1 row in set (0.00 sec)

Hope you will find the two functions useful.

Slides and Other Files From My Hands-On Labs at MySQL Connect 2012

First of all a big thank you to all of you who attended my two Hands-On Labs (HOL) session at this year's MySQL Connect. I ended up doing two sessions as there was a last minute cancellation, so in addition to the previously announced session about the Performance Schema, I also did an introduction to MySQL.

The slides and the workbook for the Performance Schema session will become available from the official Oracle OpenWorld/MySQL Connect catalog, but you can also get the files from my blog which for the Performance Schema session will also include the helper functions and procedures used and some sample queries used to create load on the server.

HOL10471 – Getting Started with MySQL

This session only had slides, however the slides includes the commands and queries executed during the session. Note that the part on using MySQL Workbench (by Alfredo Kojimais) is not included.

The slides are available at: HOL 10471 – Getting Started with MySQL

HOL10472 – Improving Performance with the MySQL Performance Schema

The Performance Schema lab used several files. The main one to use is the workbook which includes details on the queries and commands to run. The hol10472.tgz also includes some support files (for example stored routines) used.

The following files can be downloaded:

Have fun playing with MySQL!