To celebrate the publishing of my new book MySQL 8 Query Performance Tuning, the Apress team invited me (thanks Jonathan and Liz) to write a post for the Apress blog. I decided to write about my top six best practices:
I have over the last few years been fortunate to have two books published through Apress, Pro MySQL NDB Cluster which I wrote together with Mikiya Okuno and MySQL Connector/Python Revealed. With the release of MySQL 8 around a year ago, I started to think of how many changes there has been in the last few MySQL versions. Since MySQL 5.6 was released as GA in early 2013, some of the major features related to performance tuning includes the Performance Schema which was greatly changed in 5.6, histograms, EXPLAIN ANALYZE, hash joins, and visual explain. Some of these are even unique in MySQL 8.
So, I was thinking that it could be interesting to write a book that focuses on performance tuning in MySQL 8. In order to try to limit the scope somewhat (which as you can see from the page count that I was not too successful with), I decided to mainly look at the topics related to query performance. I proposed this to my acquisition editor Jonathan Gennick, and he was very interested. Oracle whom I worked for at the time was also interested (thanks Adam Dixon, Victoria Reznichenko, Edwin DeSouza, and Rich Mason for supporting me and approving the project). Also thanks to the Apress editors and staff who has been involved including but not limited to Jonathan Gennick, Jill Balzano, Laura Berendson, and Arockia Rajan Dhurai.
Now around a year later, the final result is ready: MySQL 8 Query Performance Tuning. If you are interested, you can read more about the content and/or buy it at Apress, Amazon, and others book shops:
The book is divided into six parts with a total of 27 chapters. I have attempted to keep each chapter relatively self-contained with the aim that you can use the book as a reference book. The drawback of this choice is that there is some duplication of information from time to time. An example is Chapter 18 which describes the more theoretical side of locks and how to monitor locks, and Chapter 22 which provides practical examples of investigating lock contention. Chapter 22 naturally draws on the information in Chapter 18, so some of the information is repeated. This was a deliberate choice, and I hope it helps you reduce the amount of page flipping to find the information you need.
The six parts progressively move you through the topics starting with some basic background and finishing with more solution-oriented tasks. The first part starts out discussing the methodology, benchmarks, and test data. The second part focuses on the sources of information such as the Performance Schema. The third part covers the tools such as MySQL Shell used in this book. The fourth part provides the theoretical background used in the last two parts. The fifth part focuses on analyzing queries, transactions, and locks. Finally, the sixth part discusses how to improve performance through the configuration, query optimization, replication, and caching. There are cases where some content is a little out of place, like all replication information is contained in a single chapter.
Part I: Getting Started
Part I introduces you to the concepts of MySQL query performance tuning. This includes some high-level considerations, of which some are not unique to MySQL (but are of course discussed in the context of MySQL). The four chapters are
Chapter 1 – MySQL Performance Tuning This introductory chapter covers some high-level concepts of MySQL performance tuning such as the importance of considering the whole stack and the lifecycle of a query.
Chapter 2 – Query Tuning Methodology It is important to work in an effective way to solve performance problems. This chapter introduces a methodology to work effectively and emphasizes the importance of working proactively rather than doing firefighting.
Chapter 3 – Benchmarking with Sysbench It is often necessary to use benchmarks to determine the effect of a change. This chapter introduces benchmarking in general and specifically discusses the Sysbench tool including how to create your own custom benchmarks.
Chapter 4 – Test Data The book mostly uses a few standard test databases which are introduced in this chapter.
Part II: Sources of Information
MySQL exposes information about the performance through a few sources. The Performance Schema, the sys schema, the Information Schema, and the SHOW statement are introduced in each their chapter. There are only relatively few examples of using these sources in this part; however, these four sources of information are used extensively in the remainder of the book. If you are not already familiar with them, you are strongly encouraged to read this part. Additionally, the slow query log is covered. The five chapters are
Chapter 5 – The Performance Schema The main source of performance related information in MySQL is – as the name suggests – the Performance Schema. This chapter introduces the terminology, the main concepts, the organization, and the configuration.
Chapter 6 – The sys Schema The sys schema provides reports through predefined views and utilities in stored functions and programs. This chapter provides an overview of what features are available.
Chapter 7 – The Information Schema If you need metadata about the MySQL and the databases, the Information Schema is the place to look. It also includes important information for performance tuning such as information about indexes, index statistics, and histograms. This chapter provides an overview of the views available in the Information Schema.
Chapter 8 – SHOW Statements The SHOW statements are the oldest way to obtain information ranging from which queries are executing to schema information. This chapter relates the SHOW statements to the Information Schema and Performance Schema and covers in somewhat more detail the SHOW statements without counterparts in the two schemas.
Chapter 9 – The Slow Query Log The traditional way to find slow queries is to log them to the slow query log. This chapter covers how to configure the slow query log, how to read the log events, and how to aggregate the events with the mysqldump utility.
Part III: Tools
MySQL provides several tools that are useful when performing the daily work as well as specialized tasks. This part covers three tools ranging from monitoring to simple query execution. This book uses Oracle’s dedicated MySQL monitoring solution (requires commercial subscription but is also available as a trial) as an example of monitoring. Even if you are using other monitoring solutions, you are encouraged to study the examples as there will be a large overlap. These three tools are also used extensively in the remainder of the book. The three chapters in this part are
Chapter 10 – MySQL Enterprise Monitor Monitoring is one of the most important aspects of maintaining a stable and well-performing database. This chapter introduces MySQL Enterprise Monitor (MEM) and shows how you can install the trial and helps you navigate and use the graphical user interface (GUI).
Chapter 11 – MySQL Workbench MySQL provides a graphical user interface through the MySQL Workbench product. This chapter shows how you can install and use it. In this book, MySQL Workbench is particularly important for its ability to create diagrams – known as Visual Explain – representing the query execution plans.
Part IV: Schema Considerations and the Query Optimizer
In Part IV, there is a change of pace, and the focus moves to the topics more directly related to performance tuning starting with topics related to the schema, the query optimizer, and locks. The six chapters are
Chapter 13 – Data Types In relational databases, each column has a data type. This data type defines which values can be stored, which rules apply when comparing two values, how the data is stored, and more. This chapter covers the data types available in MySQL and gives guidance on how to decide which data types to use.
Chapter 14 – Indexes An index is used to locate data, and a good indexing strategy can greatly improve the performance of your queries. This chapter covers the index concepts, considerations about indexes, index types, index features, and more. It also includes a discussion on how InnoDB uses indexes and how to come up with an indexing strategy.
Chapter 15 – Index Statistics When the optimizer needs to determine how useful an index is and how many rows match a condition on an indexed value, it needs information on the data in the index. This information is index statistics. This chapter covers how index statistics work in MySQL, how to configure them, monitoring, and updating the index statistics.
Chapter 16 – Histograms If you want the optimizer to know how frequent a value occurs for a given column, you need to create a histogram. This is a new feature in MySQL 8, and this chapter covers how histograms can be used, their internals, and how to query the histogram metadata and statistics.
Chapter 17 – The Query Optimizer When you execute a query, it is the query optimizer that determines how to execute it. This chapter covers the tasks performed by the optimizer, join algorithms, join optimizations, configuration of the optimizer, and resource groups.
Chapter 18 – Locking Theory and Monitoring One of the problems that can cause the most frustration is lock contention. The first part of this chapter explains why locks are needed, lock access levels, and lock types (granularities). The second part of the chapter goes into what happens when a lock cannot be obtained, how to reduce lock contention, and where to find information about locks.
Part V: Query Analysis
With the information from Part IV, you are now ready to start analyzing queries. This includes finding the queries for further analysis and then analyzing the query using EXPLAIN or the Performance Schema. You also need to consider how transactions work and investigate lock contention when you have two or more queries fighting for the same locks. The four chapters are
Chapter 19 – Finding Candidate Queries for Optimization Whether part of the daily maintenance or during an emergency, you need to find the queries that you need to analyze and potentially optimize. This chapter shows how you can use the Performance Schema, the sys schema, MySQL Workbench, your monitoring solution, and the slow query log to find the queries that are worth looking into.
Chapter 20 – Analyzing Queries Once you have a candidate query, you need to analyze why it is slow or impacts the system too much. The main tool is the EXPLAIN statement which provides information about the query plan chosen by the optimizer. How to generate and read – including examples – the query plans using EXPLAIN is the main focus of the chapter. You can also use the optimizer trace to get more information on how the optimizer arrived at the selected query plan. An alternative way to analyze queries is to use the Performance Schema and sys schema to break queries down into smaller parts.
Chapter 21 – Transactions InnoDB executes everything as a transaction, and transactions is an important concept. Proper use of transactions ensures atomicity, consistency, and isolation. However, transactions can also be the cause of severe performance and lock problems. This chapter discusses how transactions can become a problem and how to analyze them.
Chapter 22 – Diagnosing Lock Contention This chapter goes through four scenarios with lock contention (flush locks, metadata locks, record-level locks, and deadlocks) and discusses the symptoms, the cause, how to set up the scenario, the investigation, the solution, and how to prevent problems.
Part VI: Improving Queries
You have found your problem queries and analyzed them and their transaction to understand why they are underperforming. But how do you improve the queries? This chapter goes through the most important configuration options not covered elsewhere, how to change the query plan, schema changes and bulk loading, replication, and caching as means to improve the performance. The five chapters are
Chapter 23 – Configuration MySQL requires resources when executing a query. This chapter covers the best practices for configuring these resources and the most important configuration options that are not covered in other discussions. There is also an overview of the data lifecycle in InnoDB as background for the discussion of configuring InnoDB.
Chapter 24 – Change the Query Plan While the optimizer usually does a good job at finding the optimal query execution plan, you will from time to time have to help it on its way. It may be that you end up with full table scans because no indexes exist or the existing indexes cannot be used. You may also wish to improve the index usage, or you may need to rewrite complex conditions or entire queries. This chapter covers these scenarios as well as shows how you can use the SKIP LOCKED clause to implement a queue system.
Chapter 25 – DDL and Bulk Data Load When you perform schema changes or load large data sets into the system, you ask MySQL to perform a large amount of work. This chapter discusses how you can improve the performance of such tasks including using the parallel data load feature of MySQL Shell. There is also a section on general data load considerations which also applies to data modifications in general and shows the difference between sequential and random order inserts. That discussion is followed by considerations on what this means for the choice of primary key.
Chapter 26 – Replication The ability to replicate between instances is a popular feature in MySQL. From a performance point of view, replication has two sides: you need to ensure replication performs well, and you can use replication to improve performance. This chapter discusses both sides of the coin including covering the Performance Schema tables that can be used to monitor replication.
Chapter 27 – Caching One way to improve the performance of queries is to not execute them at all, or at least avoid executing part of the query. This chapter discusses how you can use caching tables to reduce the complexity of queries and how you can use Memcached, the MySQL InnoDB Memcached plugin, and ProxySQL to avoid executing the queries altogether.
Apress have been kind enough to invite me to write a blog in connection with my recently released book MySQL Connector/Python Revealed. I chose to write an introduction of MySQL Connector/Python including three examples illustrating the APIs and the difference between querying SQL tables and a JSON document store.
When you write programs that uses a database backend, it is necessary to use a connector/API to submit the queries and retrieve the result. If you are writing Python programs that used MySQL, you can use MySQL Connector/Python – the connector developered by Oracle Corporation.
Now there is a new book dedicated to the usage of the connector: MySQL Connector/Python Revealed, which is published by Apress. It is available in a softcover edition as well as an eBook (PDF, ePub, Mobi).
The book is divided into four parts spanning from the installation to error handling and troubleshooting. The four parts are:
Part I: Getting Ready This part consists of a single chapter that helps you to get up and running. The chapter includes an introduction to MySQL Connector/Python and getting the connector and MySQL Server installed.
Part II: The Legacy APIs The legacy APIs include the connector module that implements PEP249 (the Python Database API). The discussion of the mysql.connector module spans four chapters. In addition to query execution, the use of connection pools and the failover feature is covered. Finally, there is also a discussion about the C Extension.
Part III – The X DevAPI One of the big new features in MySQL 8 is the MySQL Document Store including the X DevAPI. It allows you to use MySQL through the NoSQL API as well as by executing SQL queries. The NoSQL API includes support both for working with MySQL as a document store where the data is stored in JSON documents and with SQL tables. Part III includes three chapters that are dedicated to the X DevAPI.
Part IV – Error Handling and Troubleshooting The final part of book goes through the two important topics of error handling and troubleshooting including several examples of how common errors and how to resolve them.
With the book comes 66 code examples that are available for download from Apress’ GitHub repository. See the book’s homepage for instructions.
MySQL Connector/Python is available from several sources including online bookshops. The following table shows some of the places, where you can buy the book. (The table if current as of 13 August 2018; changes to the available formats may happen in the future.)
I have had the opportunity to write a blog for Apress with a brief introduction to MySQL NDB Cluster. The blog gives a brief overview of the history and why you should consider it. The architecture is described before some key characteristics are discussed.
It is with great pleasure, I can announce that a new book dedicated to MySQL NDB Cluster has just been released. The book Pro MySQL NDB Cluster is written by my colleague Mikiya Okuno and myself and is a nearly 700 pages deep dive into the world of MySQL NDB Cluster. The book is published by Apress.
Tip: There are several ways to cluster MySQL. This book is about the product MySQL Cluster (often called MySQL NDB Cluster to clarify which cluster it is). There is also MySQL InnoDB Cluster, clustering using replication, and clustering through operating or hardware features. Pro MySQL NDB Cluster is only about the former.
We very much hope you will enjoy the book. Feedback and questions are most welcome, for example on Twitter (@nippondanji and @JWKrogh).
Note: At the time of writing, only the eBook is available for purchase. A softcover version will follow as soon as it has been possible to print it; this can also be pre-ordered now. – Update: The softcover version of the book is now also available.
The book is divided into five parts and 20 chapters.
Part I – The Basics
The first part provides some background information on the various parts in MySQL NDB Cluster and how it works. The chapters are:
Chapter 1: Architecture and Core Concepts
Chapter 2: The Data Nodes
Part II – Installation and Configuration
The second part focuses on the installation and configuration related topics, including replication between clusters. The chapter are:
Chapter 3: System Planning
Chapter 4: Configuration
Chapter 5: Installation
Chapter 6: Replication
Part III – Daily Tasks and Maintenance
In the third part, the topics include tasks that is part of the daily routine as a database administrator plus a tutorial where the tasks discussed in parts II and III are handled through MySQL Cluster Manager. The chapters are:
Chapter 7: The NDB Management Client and Other NDB Utilities
Chapter 8: Backups and Restores
Chapter 9: Table Maintenance
Chapter 10: Restarts
Chapter 11: Upgrades and Downgrades
Chapter 12: Security Considerations
Chapter 13: MySQL Cluster Manager
Chapter IV – Monitoring and Troubleshooting
The fourth part continues with two topics that are also part of the daily routine: monitoring and troubleshooting. The chapters are:
Chapter 14: Monitoring Solutions and the Operating System
Chapter 15: Sources for Monitoring Data
Chapter 16: Monitoring MySQL NDB Cluster
Chapter 17: Typical Troubles and Solutions
Chapter V – Development and Performance Tuning
The final part covers topics that are related to development and getting the tuning the cluster with respect to performance. The chapters are:
Chapter 18: Developing Applications Using SQL with MySQL NDB Cluster
Chapter 19: MySQL NDB Cluster as a NoSQL Database
Chapter 20: MySQL NDB Cluster and Application Performance Tuning