Author Archives: Jesper Krogh

About Jesper Krogh

I am an Oracle Principal Technical Support Engineer who has been using MySQL as a developer and database administrator since 2006. The work I were involved in before joining Oracle included online business applications using complex database architectures and logic. Investigating and solving performance issues are among the tasks I find most interesting. I live in Sydney, Australia and enjoys visiting the national parks and traveling.

Speaking at MySQL Connect 2013

183037-mysql-tk-imspeaking-250x250-1951648It is hard to believe it is already closing in on a year since the last MySQL Connect, but it is true, it is time to start preparing again.

This year MySQL Connect will take place in the weekend of 21-23 September with the Monday being dedicated to tutorials. As last year MySQL Connect is part of Oracle OpenWorld and is hosted at Hilton, San Francisco Union Square.

I am fortunate enough this year to be taking part in three sessions:

  • Meet the MySQL Community and Support Teams [BOF2480]
    Join MySQL community and support team members in this BOF to ask all your questions as well as provide feedback, suggestions, and ideas related to the MySQL community, handling of bugs, and overall technical support at Oracle.

    This Birds-Of-a-Feather session will take place Saturday at 4:30 PM.

  • Making the Performance Schema Easier to Use [CON5282]
    The Performance Schema can seem daunting at first, with the vast amount of data available. This session focuses on tools, such as ps_helper views and stored programs, that make it easier to get started with the Performance Schema and perform common tasks. The presentation includes examples of how ps_helper can be used to simplify checking the current configuration, changing the configuration, and investigating the usage of the server.

    This talk will take place Sunday at 11:00 AM.

  • Improving Performance with MySQL Performance Schema [HOL9733]
    The Performance Schema feature of MySQL is MySQL’s gateway for looking into the engine room. It enables you not only to discover what is going on in the internals but also to get detailed information about the current connections and some historical data. MySQL 5.6, which is now GA, introduces significant enhancements to Performance Schema. This hands-on lab gives you the opportunity to use Performance Schema, going through the steps from initial configuration and high-level summaries to detailed wait events.

    This Hands-On-Labs session will take place Sunday at 2:00 PM.

However those three sessions is just a small part of the agenda for MySQL Connect. There are a total of more than 80 sessions from both Oracle developers, engineer, and staff as well as users, third party developers, and more. See also Bertrand Matthelié’s Top 10 Reasons to Attend MySQL Connect.

See you all there.

183037-mysql-tk-joinme-250x250-1951669

New Member of the Cluster API Family: The Native Node.js Connector

MySQL Cluster 7.3 went GA yesterday and with it came a new member of the MySQL Cluster API family: mysql-js – a native Node.js connector. mysql-js uses the NDB API to connect directly to the data nodes which improves performance compared to executing queries through the MySQL nodes.

For an introduction to mysql-js and installation instructions I will recommend taking a look at the official API documentation and Andrew Morgan’s blog; the latter also has an overview of the new features in MySQL Cluster 7.3 in general.

To get a feel for how the new API works, I went ahead and created a small test program that will take one or more files with delimited data (e.g. similar to what you get with SELECT … INTO OUTFILE and inserts the data into a table. I have tried to keep things simple. This means that no other external modules than mysql-js is used, not very much error handling has been included, the reading, parsing of the data files could be done much better, performance has not been considered, etc. – but I would rather focus on the usage of mysql-js.

The complete example can be found in the file nodejs_tabinsert.js. The following will go through the important bits.

Preparation

The first part of the script is not really specific to mysql-js, so I will go lightly over that. A few of the arguments deserve a couple of extra words:

  • –log-level: when set to debug or detail some output with information about what happens inside mysql-js is logged. This can be useful to learn more about the module or for debugging.
  • –basedir: this is the same as the basedir option for mysqld – it sets where MySQL has been installed. It is used for loading the mysql-js module. Default is /usr/local/mysql.
  • –database and –table: which table to insert the data into. The default database is test, but the table name must always be specified.
  • –connect-string: the script connects directly to the cluster nodes, so it needs the NDB connect-string similar to other NDB programs. The default is localhost:1186.
  • –delimiter: the delimiter used in the data files. The default is a tab (\t).

Setting Up mysql-js

With all the arguments parsed, it is not possible to load the mysql-js module:

The unified_debug class is part of mysql-js and allows to get debug information from inside mysql-js logged to the console.

The nosql.ConnectionProperties() method will return an object with the default settings for the chosen adapter – in this case ndb. After that we can change the settings where we do not want the defaults. It is also possible to use an object with the settings as the argument instead of ‘ndb’; that requires setting the name of the adapter using the “implementation” property. Currently the two supported adapters are ‘ndb’ (as in this example) and ‘mysql’ which connects to mysqld instead. ‘mysql’ required node-mysql version 2.0 and also support InnoDB.

As the ‘ndb’ adapter connects directly to the cluster nodes, no authentication is used. This is the same as for the NDB API.

Callbacks and Table Mapping Constructor

We will load each file inside a transaction. The trxCommit() callback will verify that the transaction was committed without error and then closes the session.

The onInsert callback checks whether each insert worked correctly. When all rows for the session (file) have been inserted, it commits the transaction.

The tableRow is the constructor later used for the table mapping. It is used to set up the object with the data to be inserted for that row. tableMeta is a TableMetaData object with information about the table we are inserting into.

The Session

This is were the bulk of the work is done. Each file will have it’s own session.

The onSession function is a callback that is used when creating (opening) the sessions.

The first step is to get the meta data for the table. As all data is inserted into the same table, in principle we could reuse the same meta data object for all sessions, but the getTableMetaData() method is a method of the session, so it cannot be fetched until this point.

Next a transaction is started. We get the transaction with the session.currentTransaction() method. This returns an idle transaction which can then be started using the begin() method. As such there is not need to store the transaction in a variable; as can be seen in the trxCommit() and onInsert() callbacks above, it is also possible to call session.currnetTransaction() repeatedly – it will keep returning the same transaction object.

The rest of the onSession function processes the actual data. The insert itself is performed with the session.persist() method.

Edit: using a session this way to insert the rows one by one is obviously not very efficient as it requires a round trip to the data nodes for each row. For bulk inserts the Batch class is a better choice, however I chose Session to demonstrate using multiple updates inside a transaction.

Creating the Sessions

First the table mapping is defined. Then a session is opened for each file. Opening a session means connecting to the cluster, so it can be a relatively expensive step.

Running the Script

To test the script, the table t1 in the test database will be used:

For the data files, I have been using:

t1a.txt:

t1b.txt:

Running the script:

One important observation is that even though the session for t1a.txt is created before the one for t1b, the t1b.txt file is ending up being inserted first. Actually if the inserts were using auto-increments, it would be possible to see that in fact, the actual assignment of auto-increment values will in general alternate between rows from t1b.txt and t1a.txt. The lesson: in node.js do not count on knowing the exact order of operations.

I hope this example will spark your interest in mysql-js. Feedback is most welcome – both bug reports and feature requests can be reports at bugs.mysql.com.

Easier Overview of Current Performance Schema Setting

While I prepared for my Hands-On Lab about the Performance Schema at MySQL Connect last year, one of the things that occurred to me was how difficult it was quickly getting an overview of which consumers, instruments, actors, etc. are actually enabled. For the consumers things are made more complicated as the effective setting also depends on parents in the hierarchy. So my thought was: “How difficult can it be to write a stored procedure that outputs a tree of the hierarchies.” Well, simple enough in principle, but trying to be general ended up making it into a lengthy project and as it was a hobby project, it often ended up being put aside for more urgent tasks.

However here around eight months later, it is starting to shape up. While there definitely still is work to be done, e.g. creating the full tree and outputting it in text mode (more on modes later) takes around one minute on my test system – granted I am using a standard laptop and MySQL is running in a VM, so it is nothing sophisticated.

The current routines can be found in ps_tools.sql.gz – it may later be merged into Mark Leith’s ps_helper to try to keep the Performance Schema tools collected in one place.

Note: This far the routines have only been tested in Linux on MySQL 5.6.11. Particularly line endings may give problems on Windows and Mac.

Description of the ps_tools Routines

The current status are two views, four stored procedure, and four functions – not including several functions and procedures that does all the hard work:

  • Views:
    • setup_consumers – Displays whether each consumer is enabled and whether the consumer actually will be collected based on the hierarchy rules described in Pre-Filtering by Consumer in the Reference Manual.
    • accounts_enabled – Displays whether each account defined in the mysql.user table has instrumentation enabled based on the rows in performance_schema.setup_actors.
  • Procedures:
    • setup_tree_consumers(format, color) – Create a tree based on setup_consumers displaying whether each consumer is effectively enabled. The arguments are:
      • format is the output format and can be either (see also below).:
        • Text: Left-Right
        • Text: Top-Bottom
        • Dot: Left-Right
        • Dot: Top-Bottom
      • color is whether to add bash color escapes sequences around the consumer names when used a Text format (ignored for Dot outputs).
    • setup_tree_instruments(format, color, only_enabled, regex_filter) – Create a tree based on setup_instruments displaying whether each instrument is enabled. The tree is creating by splitting the instrument names at each /. The arguments are:
      • format is the output format and can be either:
        • Text: Left-Right
        • Dot: Left-Right
        • Dot: Top-Bottom
      • color is whether to add bash color escapes sequences around the instrument names when used a Text format (ignored for Dot outputs).
      • type – whether to base the tree on the ENABLED or TIMED column of setup_instruments.
      • only_enabled – if TRUE only the enabled instruments are included.
      • regex_filter – if set to a non-empty string only instruments that match the regex will be included.
    • setup_tree_actors_by_host(format, color) – Create a tree of each account defined in mysql.user and whether they are enabled; grouped by host. The arguments are:
      • format is the output format and can be either:
        • Text: Left-Right
        • Dot: Left-Right
        • Dot: Top-Bottom
      • color is whether to add bash color escape sequences around the names when used a Text format (ignored for Dot outputs).
    • setup_tree_actors_by_user – Create a tree of each account defined in mysql.user and whether they are enabled; grouped by username. The arguments are:
      • format is the output format and can be either:
        • Text: Left-Right
        • Dot: Left-Right
        • Dot: Top-Bottom
      • color is whether to add bash color escape sequences around the names when used a Text format (ignored for Dot outputs).
  • Functions:
    • is_consumer_enabled(consumer_name) – Returns whether a given consumer is effectively enabled.
    • is_account_enabled(host, user) – Returns whether a given account (host, user) is enabled according to setup_actors.
    • substr_count(haystack, needle, offset, length) – The number of times a given substring occurs in a string. A port of the PHP function of the same name.
    • substr_by_delim(set, delimiter, pos) – Returns the Nth element from a delimiter string.

The two functions substr_count() and substr_by_delim() was also described in an earlier blog.

The formats for the four stored procedures consists of two parts: whether it is Text or Dot and the direction. Text is a tree that can be viewed directly either in the mysql command line client (coloured output not supported) or the shell (colours supported for bash). Dot will output a DOT graph file in the same way as dump_thread_stack() in ps_helper. The direction is as defined in the DOT language, so e.g. Left-Right will have the first level furthest to the left, then add each new level to the right of the parent level.

Examples

As the source code – including comments – is more than 1600 lines, I will not discuss it here, but rather go through some examples.

setup_tree_consumers

Using the coloured output:

setup_tree_consumers_tbor the same using a non-coloured output:
setup_tree_consumers_lr

setup_tree_instruments

setup_tree_instruments_lrHere a small part of the tree is selected using a regex.

setup_tree_actors_%

With only root@localhost and root@127.0.0.1 enabled, the outputs of setup_tree_actors_by_host and setup_tree_actors_by_user gives respectively:setup_tree_actors_by_host_lrsetup_tree_actors_by_user_lr

DOT Graph of setup_instruments

The full tree of setup_instruments can be created using the following sequence of steps (I am using graphviz to get support for dot files):

setup_tree_instruments_dot_lr_snipThe full output is rather large (6.7M). If you want to see if you can get to it at http://mysql.wisborg.dk/wp-content/uploads/2013/05/setup_tree_instruments_dot_lr.png.

Views

Conclusion

There is definitely more work to do on making the Performance Schema easier to access. ps_helper and ps_tools are a great start to what I am sure will be an extensive library of useful diagnostic queries and tools.

Changing the Size of the InnoDB Log Files In MySQL 5.6

In MySQL 5.5 and earlier, the steps to resize the InnoDB log files were a bit involved and for example included manually moving the log files out of the way as InnoDB would only create new files, if none existed.

In MySQL 5.6 a not so much talked about feature is the support to resize the log files in a way much more similar to changing other settings in MySQL. Now you simply update your MySQL configuration file and restart MySQL.

Let us look at an example. In MySQL 5.5 and earlier the total size of the InnoDB log files has to be less than 4G in total, so one way of staying within this limit is to have two files each 2047M large:

Now update the configuration file to take advantage of the fact that MySQL 5.6 allows much larger InnoDB log files; the actual limit is a total size of 512G, but here I will use two files each 4G large:

Restarting MySQL will then automatically resize the log files, and the error log will show something like:

One of the other requirements when changing the log file size in MySQL 5.5 and earlier was that innodb_fast_shutdown must be set to 0 or 1 (the default value is 1). What happens in MySQL 5.6 if you have innodb_fast_shutdown = 2 and try to change the log size? Well now InnoDB handles that as well – InnoDB will do its “crash recovery” and then resize the log files:

And a look into the error log for the restart (setting the size back to 2 times 2047M):

While it is not something this that makes an impact during normal operations, it just helps making the life of a DBA (or Support engineer) life a little easier.

Changes to Options and Variables in MySQL 5.6

With MySQL 5.6 just gone GA, I thought it would be good to take a look at the changes in options and variables that comes with the new release.

First of all, several of the existing options have get new default values. As James Day already have written a good post about that in his blog, I will refer to that instead of going through the changes. For a general overview of the new features and improvements, the recent blogs by Rob Young and Peter Saitsev are good starting points together with the What is New in MySQL 5.6 page in the Reference Manual are good places to start.

Instead I will focus a little on the new options that has been introduced. The first thing to note is that a in the current 5.5. release (5.5.30) there are 323 variables whereas 5.6 GA (5.6.10) returns 440 rows.

Note: this post is written using the Enterprise versions with the semi-synchronous replication plugins enabled in both versions plus the memcached and password validation plugins in 5.6.

Actually the number of new variables is not 117 but 129 as 12 variables have been removed in 5.6.

So what are all of these 129 new variables good for? Actually there is a good chance that you will never need to touch many of them as the default value is good enough, they simply have been added to provide the value of options already present in 5.5 but not exposed through SHOW GLOBAL VARIABLES, or that they are for features you are not using. If we try to group the new variables the distribution comes out as:

FeatureNew Variables
Global Transaction IDs5
Other Replication19
Memcached Plugin6
Validate Password Plugin6
Other Security Related5
InnoDB54
Optimizer Traces5
Performance Schema15
Exposing Previously Existing Variables2
Other12

New Variables in MySQL 5.6

The 54 new InnoDB variables span a number of different changes and additions such as:

  • New adaptive flushing algorithm
  • Buffer Pool dumps to disk and restore
  • Support for additional checksum algorithms
  • Improvements for compression
  • Full text indexes
  • New monitoring options (the information_schema.metrics table)
  • Configurable page size
  • Persistent statistics
  • Undo logs improvements
  • And more …

For reference I have added a list of the new variables with the release they were introduced and the default value (additionally innodb_print_all_deadlocks is also new, but that was also added to 5.5.30):

Note that while the default values are for an installation on Linux, most will also apply to other platforms. See also the Reference Manual.

For good measure here is a list of the variables that have been removed in 5.6:

  • engine_condition_pushdown – deprecated in 5.5.3, use optimizer_switch instead.
  • have_csv – use SHOW ENGINES or information_schema.ENGINES instead.
  • have_innodb – use SHOW ENGINES or information_schema.ENGINES instead.
  • have_ndbcluster – use SHOW ENGINES or information_schema.ENGINES instead.
  • have_partitioning – use SHOW ENGINES or information_schema.ENGINES instead.
  • log – deprecated in 5.1.29, use general_log instead.
  • log_slow_queries – deprecated in 5.1.29, use slow_query_log instead.
  • max_long_data_size – deprecated in 5.5.11, is now automatically controlled by max_allowed_packet.
  • rpl_recovery_rank – previously unused.
  • sql_big_tables – hasn’t really been needed since 3.23.2.
  • sql_low_priority_updates – Use low_priority_updates instead.
  • sql_max_join_size

Merry Christmas

Greeting
Merry Christmas and Happy New Year
See you in 2013

mysql is the MySQL Command Line Client – my favourite tool when using MySQL. It is available on all platforms where you can install MySQL and is part of a standard installation. As you can see there are many options, including the -H which causes the output to be HTML formatted. Ok, I cheated here and created the table manually as my blog editor escapes HTML code – the actual output looked like:

So a Happy Christmas and Happy New Year to all MySQL users.

A Couple of Substring Functions: substr_count() and substr_by_delim()

A problem that sometimes when writing queries or stored routines is the need to use strings to do basic manipulation of the data. While from a performance perspective it is generally faster to do these manipulations inside the application, for various reasons it may be desirably to keep things inside MySQL.

This post lists two stored functions that can be used for simple manipulation of strings.

substr_count()

This is a port of the PHP function of the same name. It counts the number of times a given substring is encountered in a text. The signature is:

The function as it stands here, has the following limitations and behaviours:

  • As the MySQL convention is to use 1 as the offset, so is the MySQL port of substr_count().
  • As MySQL stored functions do not support optional arguments, all arguments must be specified. For in_offset and in_length use NULL or 0 to use the default values.
  • The maximum length supported for the needle is 255 characters.
  • I am using the LOCATE() function as that returns the first occurrence of a substring after a given offset. This will particularly benefit the performance in cases where the needle is only sparsely present in the search string.

The definition of substr_count() is:

An example of how to use substr_count() is:

substr_by_delim()

The substr_by_delim() function can be used to pick out one element of a delimited string; an example would be to determine the second element in the comma delimited string ‘a,b,c,d,e’:

The width of in_delimiter is set to match the in_needle in substr_count() as substr_count() is used to find the number of delimiters in the search text.

Like with the built-in SUBSTRING_INDEX() function, it is supported both to use positive and negative positions where a negative position counts from the end of the search string.

Examples of how to used substr_by_delim() are:

Hope you will find the two functions useful.

Slides and Other Files From My Hands-On Labs at MySQL Connect 2012

First of all a big thank you to all of you who attended my two Hands-On Labs (HOL) session at this year’s MySQL Connect. I ended up doing two sessions as there was a last minute cancellation, so in addition to the previously announced session about the Performance Schema, I also did an introduction to MySQL.

The slides and the workbook for the Performance Schema session will become available from the official Oracle OpenWorld/MySQL Connect catalog, but you can also get the files from my blog which for the Performance Schema session will also include the helper functions and procedures used and some sample queries used to create load on the server.

HOL10471 – Getting Started with MySQL

This session only had slides, however the slides includes the commands and queries executed during the session. Note that the part on using MySQL Workbench (by Alfredo Kojimais) is not included.

The slides are available at: HOL 10471 – Getting Started with MySQL

HOL10472 – Improving Performance with the MySQL Performance Schema

The Performance Schema lab used several files. The main one to use is the workbook which includes details on the queries and commands to run. The hol10472.tgz also includes some support files (for example stored routines) used.

The following files can be downloaded:

Have fun playing with MySQL!

Why I am So Excited About the MySQL Performance Schema

The improved Performance Schema in MySQL 5.6 provides a new way of investigating issues in the database. Many issues that previously required tools such as strace, dtrace, etc. can now be investigated directly from inside MySQL in a platform independent way using standard SQL statements.

The Performance Schema is enabled by default starting from the latest milestone release, 5.6.6. You have instruments which are the things you can measure, and consumers which are those that use the measurements. Not all instruments and consumers are enabled out of the box, however once the plugin is enabled, individual instruments and consumers can be switched on and off dynamically.

As an example take the case mentioned in What’s the innodb main thread really doing? where the main InnoDB thread appears to be stuck in “doing background drop tables” even though no tables are being dropped. Now the underlying issue has been resolved in 5.6, but other issues could show up. So how would the Performance Schema help in cases like that?

First you need to ensure that the necessary instrumentations and consumers are enabled:

Note: the above settings are just the specifics to the investigation below. You will also need to ensure all the involved threads are instrumented, and most likely you also want to enable other instruments and/or consumers to be able to get more details, for example if an SQL query is involved, you can get more information about what the query is doing.

The mutex contention can now easily be investigated using the two tables events_waits_current and mutex_instances. The events_waits_current table will have a row for each combination of thread and event. Threads includes both background threads and foreground threads (those clients are using), so for example an interaction between the purge thread(s) in InnoDB and a query submitted by the application can be checked.

A very simple query to get the investigation started can look like:

Once you have the basic information about the threads involved, you can look for more information. One example is to use the events_statements_current, events_statements_history, and events_statements_history_long tables to get more knowledge of any foreground threads involved.

If you attend MySQL Connect 2012, you will be able to learn more about the MySQL Performance Schema for example by attending the hands-on lab “Improving Performance with the MySQL Performance Schema (HOL10472)” where you will be able to try out the Performance Schema yourself.

If you cannot make it to MySQL Connect or cannot wait to try the Performance Schema yourself, you can download MySQL Server 5.6 from https://dev.mysql.com/downloads/mysql/5.6.html. I can also recommend downloading Mark Leith’s ps_helper views and stored procedures.

What is the MySQL Performance Schema and Why is It Needed?

When you have a non-trivial database installation, you will inevitably sooner or later encounter performance related issues ranging from a query not executing as fast as desirable to complete meltdowns where the database does not respond at all.

Until MySQL 5.5 the tools available to investigate what is going on inside MySQL were somewhat limited. Some of the tools were:

  • The slow and general query logs
  • The status counters available through SHOW [SESSION|GLOBAL] STATUS
  • Storage engine status, e.g. SHOW ENGINE INNODB STATUS
  • The EXPLAIN command to investigate the query plan of a SELECT statement
  • SHOW PROFILE to profile one or more queries
  • The MySQL error log

All of these tools are very useful, but also have their limitations, for example the SHOW STATUS mainly consists of counters that does not provide much insight into what is happening specifically.

In MySQL 5.5 a new tool was introduced, the Performance Schema (often abbreviated P_S). The Performance Schema consist of instrumentation points directly in the source code which allow inspection of the internals of the server at runtime. Some of the advantages for the Performance Schema implementation are:

  • The Performance Schema data is available through the PERFORMANCE SCHEMA storage engine in the performance_schema database, so it is possible to query the data using standard SQL statements.
  • The Performance Schema is available irrespectively of the platform, so while the exact data collected will differ between platforms, the way it works from a DBAs perspective it is the same. This for example means it is possible to create tools that can work across all the MySQL instances. A great example of this is the ps_helper collection of views and stored procedures written by Mark Leith.
  • It is possible to configure the Performance Schema dynamically as long as the plugin has been enabled (this is the defaults as of MySQL 5.6.6).
  • It is easy to add new instrumentation points including adding instrumentation to third party plugins.
  • Enabling the Performance Schema is transparent to normal operations (although obviously there will be a small performance impact – MySQL 5.6 is much better in this respect than MySQL 5.5 though).

For the full list of implementation details, see the documentation of the Performance Schema in the MySQL Reference Manual.

In MySQL 5.5 the amount of information in the Performance Schema is relatively limited. If you take a look into the performance_schema database, it becomes immediately obvious that much has happened between MySQL 5.5 and 5.6.6: the former has 17 views, the latter 52 views. The information available in MySQL 5.5 is primarily I/O oriented whereas MySQL 5.6 also – among other – includes:

  • Statement information (similar to the slow query log)
  • Information about index usage, e.g. to identify unused indexes
  • The possibility to create a stack trace for a given thread
  • etc.

The Performance Schema is the subject of both a talk and a hands-on lab at the upcoming MySQL Connect 2012 which takes place on September 29th and 30th in San Francisco. These two sessions will go into greater depth about the above features. You can also read more about the sessions at: