A Couple of Substring Functions: substr_count() and substr_by_delim()

A problem that sometimes when writing queries or stored routines is the need to use strings to do basic manipulation of the data. While from a performance perspective it is generally faster to do these manipulations inside the application, for various reasons it may be desirably to keep things inside MySQL.

This post lists two stored functions that can be used for simple manipulation of strings.

substr_count()

This is a port of the PHP function of the same name. It counts the number of times a given substring is encountered in a text. The signature is:

The function as it stands here, has the following limitations and behaviours:

  • As the MySQL convention is to use 1 as the offset, so is the MySQL port of substr_count().
  • As MySQL stored functions do not support optional arguments, all arguments must be specified. For in_offset and in_length use NULL or 0 to use the default values.
  • The maximum length supported for the needle is 255 characters.
  • I am using the LOCATE() function as that returns the first occurrence of a substring after a given offset. This will particularly benefit the performance in cases where the needle is only sparsely present in the search string.

The definition of substr_count() is:

An example of how to use substr_count() is:

substr_by_delim()

The substr_by_delim() function can be used to pick out one element of a delimited string; an example would be to determine the second element in the comma delimited string ‘a,b,c,d,e’:

The width of in_delimiter is set to match the in_needle in substr_count() as substr_count() is used to find the number of delimiters in the search text.

Like with the built-in SUBSTRING_INDEX() function, it is supported both to use positive and negative positions where a negative position counts from the end of the search string.

Examples of how to used substr_by_delim() are:

Hope you will find the two functions useful.

4 thoughts on “A Couple of Substring Functions: substr_count() and substr_by_delim()

  1. strcmp

    Why can’t you implement substr_count() without a loop via (CHAR_LENGTH(haystack)-CHAR_LENGTH(REPLACE(haystack,needle,”)))/CHAR_LENGTH(needle)?

    1. Jesper Krogh Post author

      That will work as well (adding a haystack = SUBSTRING(…) if you want support for offset and length). Testing this with the Performance Schema suggest the performance of the two implementations are fairly similar.

Comments are closed.