NoSQL/X DevAPI Tutorial with MySQL Connector/Python 8.0

The MySQL Document Store became general available (GA) with MySQL 8. One of the nice features of the MySQL Document Store is the X DevAPI that allows you to query the data from a multitude of programming languages using the same API (but while retaining the conventions of the language). The programming languages with support for the X DevAPI includes JavaScript (Node.js), PHP, Java, DotNet, and C++.

I will be using MySQL Connector/Python 8.0.12 for the example in this blog. The example is executed on Microsoft Windows with Python 3.6 installed, but it has also been tested on Oracle Linux 7 with Python 2.7. I do assume that MySQL Connector/Python has been installed. If that is not the case, you can read how to do it in the Installing Connector/Python from a Binary Distribution section in the manual or Chapter 1 of MySQL Connector/Python Revealed from Apress.

The example will go through the following steps:

Getting Ready:
1. Load the mysqlx module.
2. Create a database connection.
Setup:
1. Create a schema.
2. Create a collection.
CRUD – Create:
1. Insert some documents into the collection.
CRUD – Read:
1. Query the documents.
Cleanup:
1. Drop the schema.

You can download the complete example program here.Download

The program uses the pyuser@localhost user. The connection parameters can be changed as described in the “Getting Ready” section. A user that fulfills the requirement to the example program can be created using the following SQL statements:

mysql> CREATE USER pyuser@localhost IDENTIFIED BY 'Py@pp4Demo';
mysql> GRANT CREATE, INSERT, SELECT, DROP
             ON my_collections.* TO pyuser@localhost;

Warning

This program is not an example of using best practices. Do not store the password and preferably also the other connection options in the source code. There is also very limited handling of errors and warnings in order to keep the example simple. You should not skip those steps in a production program.

Getting Ready

The first thing is to get ready by importing MySQL Connector/Python’s mysqlx module and connect to MySQL. This is simple to do as shown in the below code snippet (the line numbers refer to the full example):

import mysqlx

connect_args = {
    'host': '127.0.0.1',
    'port': 33060,
    'user': 'pyuser',
    'password': 'Py@pp4Demo',
};

# OK is not used but would correspond to a value of 0.
# 1 is Info and 2 is warning. Errors cause an exception.
warning_levels = ("OK", "Info", "Warning")

# Create the database connection
db = mysqlx.get_session(**connect_args)

The mysqlx module in imported in line 38. This is where the MySQL Connector/Python implementation of the X DevAPI resides. The module includes support for CRUD statements both for documents and SQL tables, schema and collection manipulations, as well as executing SQL statements. In this example, only the CRUD implementation for documents and the schema and collection manipulation will be used.

The warning_levels variable is uses to convert numeric warning levels returns by the X DevAPI to names. There is an example of how to handle warnings after the first document has been added.

Finally, the connection is created in line 52 using the get_session() method in the mysqlx module. With a connection object in place, let’s move on to set up the schema and collection.

Setup

The X DevAPI has support for creating and dropping schemas and collections (but currently not SQL tables). This is used in the example to set up the my_collections schema with a single collection called my_docs:

# Create the my_collections schema
schema = db.create_schema("my_collections")

# Create the my_docs collection in the my_collections schema
my_docs = schema.create_collection("my_docs")

The create_schema() method on the database (session) object is used to create the schema. It will succeed even if the schema already exists. In that case the existing schema will be returned.

The collection is similarly created with the create_collection() method from the schema object. This will by default fail if the collection already exists. It can be overwritten with the reuse argument (the second argument).

That is it. A collection is always defined internally in the same way when it is created. You can add indexes – I hope to get back to that in a future blog – but there is no need to think of columns and data types at this point. That is the advantage of using a schemaless database (but also one of the dangers – now the whole responsibility of staying consistent is up to you as a developer).

Let’s continue to add some documents.

CRUD – Create

For this example, three documents will be added to the my_docs collection. The documents contain information about three persons including their name, birthday, and hobbies. The documents can be defined as Python dictionaries with JSON arrays represented as Python lists:

# Define three documents to insert
adam = {
    "First_name": "Adam",
    "Surname": "Smith",
    "Birthday": "1970-10-31",
    "Hobbies": [
        "Programming",
        "Databases",
        "Hiking"
    ]
}

kate = {
    "First_name": "Kate",
    "Surname": "Lee",
    "Birthday": "1982-08-09",
    "Hobbies": [
        "Programming",
        "Photography",
        "Running"
    ]
}

jane = {
    "First_name": "Jane",
    "Surname": "Walker",
    "Birthday": "1977-02-23",
    "Hobbies": [
        "Databases",
        "Hiking",
        "Photography"
    ]
}

This is the beauty of working with JSON documents in Python. They just work.

The birthdays are written in the ISO 8601 format (the same as MySQL’s date data type uses – but not datetime!). As the MySQL Document Store is schemaless, you are free to chose whatever format you feel like, however, it is strongly recommended to use a standard format. This YYYY-mm-dd format has the advantage that it will sort correctly, so alone for that reason, it is a strong candidate.

The documents will be inserted in two rounds. First Adam will be added, then Kate and Jane.

Adding a Single Document

There are a few ways to add documents (all working in the same basic way). This example will show two of them. First let’s look at how Adam is added:

# Insert the document for Adam
# This is an example of chaining the actions
db.start_transaction()
result = my_docs.add(adam).execute()

if (result.get_warnings_count() > 0):
    print("{0} warnings occurred!".format(result.get_warnings_count()))
    print("The warnings are:\n")
    for warning in result.get_warnings():
        level = warning_levels[warning["level"]]
        print("   * Level: {0} - Errno: {1} - Message: {2}".format(
            level, warning["code"], warning["msg"]))
    print("")
    print("Rolling the transaction back and existing.")
    db.rollback()
    exit()

# No errors or warnings, so the transaction can be committed
db.commit()

print("Adam: Number of documents added: {0}".format(
    result.get_affected_items_count()))
print("Document ID for Adam: {0}".format(result.get_generated_ids()))

The document is added inside a transaction. The X DevAPI connection inherits the value of autocommit from the server-side (defaults to ON), so to be sure the create action can be tested for warnings before committing, an explicit transaction is used. (Errors cause an exception, so since that is not handled here, it would cause an automatic rollback.)

The document is added using a chained statement. When you build an X DevAPI statement, you can choose between calling the method one by one or chaining them together as it is done in this case. Or you can choose a combination with some parts chained and some not. When the documents for Kate and Jane are added, it will be done without chaining.

The statement is submitted to the database using the execute() method. If you are used to executing Python statements in MySQL Shell, you may not be familiar with execute() as MySQL Shell allows you to skip it for interactive statements where the result is not assigned to a variable. The result is stored in the result variable which will be used to examine whether any warnings were triggered by the statement.

Tip

In MySQL Connector/Python, you must always call execute() to execute an X DevAPI statement.

It is best practice to verify whether queries cause any warnings. A warning will still allow the statement to execute, but it is in general a sign that not everything is as it should be. So, take warnings seriously. The earlier you include tests for warnings, the easier it is to handle them.

In line 99, the get_warnings_count() method of the result object is used to check if any warnings occurred. If so, the number of warnings is printed and each warning if retrieved using the get_warnings() method. A warning is a dictionary with three elements:

level: 1 for note and 2 for warning. This is what the warning_levels variable was created for at the start of the example.
code: The MySQL error number. The mysqlx.errorcode module contains string symbols for all the error numbers. This can be useful in order to check whether it is an expected error number that can be ignored.
msg: A string message describing the problem.

In this case, if any warnings occur, the transaction is rolled back, and the script exists.

Tip

Include handling of warnings from the beginning of coding your program. Handling warnings from the get go makes it much easier to handle them. They are usually a sign of something not working as expected and it is important that you know exactly why the warnings occur. All warnings include an error code that you can check against to verify whether it is an expected warning. If some warning is expected and you are confident, it is acceptable to ignore it.

If no error occurs, some information from the result is printed. An example output looks like (the ID will be different):

Adam: Number of documents added: 1
Document ID for Adam: ['00005b9634e3000000000000001c']

As expected one document has been added. The number of documents is printed using the get_affected_items_count() method. More interesting is the document ID. As the document did not include an element named _id, MySQL added one automatically and assigned a value to it. I will not go into how the ID is generated here, but just note that it includes three parts that together ensure global uniqueness even if you use multiple clients against multiple MySQL Server instances. At the same time, the IDs are still being generated in a way that is optimal for the InnoDB storage engine that is used for the underlying storage of the documents. The IDs are returned as a list; in this case there is only one element in the list, but if more than one document is inserted without an _id value, then there will be one generated ID per document.

The final step is to commit the transaction, so the document is persisted in the collection.

Adding Multiple Documents

When you want to add multiple documents using a single CRUD statement, you can essentially do it in two ways. You can add all of the documents in one go in the initial add() call similar to what was done for a single document with Adam. This can for example be done by having the documents in a tuple or list.

The other way, which will be used here, is to repeatably call add() to add the documents. Let’s see how that works:

db.start_transaction()
stmt_add = my_docs.add()
stmt_add.add(kate)
stmt_add.add(jane)
result = stmt_add.execute()
db.commit()
print("Kate and Jane: Number of documents added: {0}".format(
    result.get_affected_items_count()))
print("Document IDs: {0}".format(result.get_generated_ids()))

To keep the example from getting too long, the check for warnings have been removed, and the example will just focus on adding the documents.

After the transaction has been started, the statement object is created by calling add() on the collection object. In this case, no arguments are given, so at that point in time, the statement will not insert any documents.

Then the two documents are added one by one by calling add() on the statement object, first with the kate document, then with the jane document. An advantage of this approach is that if you for example generate the documents inside a loop, then you can add them as they are ready.

When both documents have been added, the execute() method is called to submit the documents to the database and the transaction is committed. Again, some information from the result is printed (the IDs will be different):

Kate and Jane: Number of documents added: 2
Document IDs: ['00005b9634e3000000000000001d', '00005b9634e3000000000000001e']

So, two documents are inserted (again as expected) and two IDs are generated.

The way that the add statement was used to insert the two documents is an example of the opposite of chaining. Here, one action at a time is performed and the result is stored in the stmt_add variable.

Now that there are some documents to work with, it is time to query them.

CRUD – Read

When you want to query documents in a collation, you use the find() method of the collection object. The resulting find statement support all of the usual refinements such as filtering, sorting, grouping, etc. In this example, three queries will be executed. The first will find the total number of documents in the collection. The second, will find the persons born on 9 August 1982. The third will find the persons who has hiking as a hobby.

Total Number of Documents

The X DevAPI makes it easy to determine the number of documents in the document – the count() method of the collection will return the value as an integer. In practice the count() method goes through the same steps as you will see in the two subsequent queries, but they are hidden inside the implementation. The code snippet is:

print("The total number of documents in the collection: {0}".format(
    my_docs.count()))

It cannot get much easier than that. The output is:

The total number of documents in the collection: 3

Let’s move on and see some of the steps that were hidden in the first query.

Finding Documents Based on Simple Comparison

The persons (in this case just one person) born on 9 August 1982 can be found by creating a find statement and adding a simple filter. The example code is:

# Find the person born on 9 August 1982
print("")
stmt_find = my_docs.find("Birthday = :birthday")
stmt_find.fields("First_name", "Surname")
stmt_find.bind("birthday","1982-08-09")
result = stmt_find.execute()
person = result.fetch_one()
print("Person born on 9 August 1982: {First_name} {Surname}".format(**person))

The filter clause is added in the call to find(). The syntax :birthday means that a parameter is used and the value will be added later. That has two advantages: it makes it easier to reuse the statement, and importantly it makes the statement safer as MySQL will ensure the value is escaped correctly – this is similar to the mysql_real_escape_string() function in the MySQL C API. The value of the parameter is given using the bind() method that has two arguments: the parameter name and value. If you use multiple parameters, call bind() once for each of them.

Otherwise the statement is simple to use. The filtering condition may seem too simple given it is a JSON document it applies to. However, Birthday in the condition is interpreted as $.Birthday (the $. part is optional) – that is the object named Birthday and is a child of the root of the document, which is just what is needed in this case. The next example includes a more complicated filter condition.

The fields to include are specified in a similar manner to the filter condition. You specify the path to the element you want to include. You can optionally rename the element using the AS keyword, for example: Surname AS Last_name. As for the condition, the $. part is optional.

The resulting row is retrieved using the fetch_one() method on the result object. This is fine here as we know there is only one resulting row. However, in a more general case you should use fetch_one() is a loop and continue until it returns None at which point all rows have been fetched.

The output is:

Person born on 9 August 1982: Kate Lee

Querying with Condition on Element in Array

A more complicated find statement is to look into the Hobbies array and see if any of the elements is Hiking. This query also matches two of the persons in the collection, so a loop is required to handle them. The code is:

stmt_find = my_docs.find("JSON_CONTAINS($.Hobbies, :hobby)")
stmt_find.fields("First_name", "Surname")
stmt_find.sort("Surname", "First_name")
stmt_find.bind("hobby", '"Hiking"')
result = stmt_find.execute()
people = result.fetch_all()
print("Number of people in the result: {0}".format(result.count))
print("The people with a hobby of hiking:")
for person in people:
    print("   {First_name} {Surname}".format(**person))

There are two main differences between this example and the previous: the filter condition and how the result documents are handled.

The filter uses the JSON_CONTAINS() function to check whether the $.Hobbies elements contains the value specified by the :hobby parameter. In the call to bind(), the parameter value is set to "Hiking". Note that Hiking must be quoted with double quotes as it is a JSON string. In this case, $. is included in the document path. However, it is still optional.

After executing the query, the resulting documents are fetched using the fetch_all() method. This will return all of the documents as a list. This makes it simpler to loop over the resulting rows, however be aware that for large result sets, it can cause a high memory usage on the application server.

Warning

Be careful with the fetch_all() method if the query can return a large result set. It will require the remaining part of the result to be stored in-memory on the application-side.

One advantage of the fetch_all() method is that it will allow you to get the total number of documents in the result using the count property of the result. The count property will show 0 until fetch_all() have completed. Once the documents have been fetched, it is possible to print the names of the persons who like to hike. The output is:

Number of people in the result: 2
The people with a hobby of hiking:
   Adam Smith
   Jane Walker

Other than a bit of cleanup, there is nothing more to do.

Cleanup

The final part of the example is to clean up. The my_collections schema is dropped so the database is left in the same state as at the start, and the connection is closed:

# Remove the schema again, so the the database is left in the same
# state as at the start. Comment out if you want to play with the
# data.
db.drop_schema("my_collections")

# Close the database connection.
db.close()

Dropping a schema is done in the same way as creating it, just that the drop_schema() method is used instead. The drop_schema() method will also work if the schema does not exist. In that case it is a null-operation.

It is important always to close the database connection. Have you ever seen the MySQL Server error log full of notes about aborted connections? If you do not explicitly close the database connection when you are done with it, one of those notes will be generated (provided the server is configured with error_log_verbosity = 3).

Additionally, not closing the connection will keep the connection alive until the program terminates. That is not a problem here, but in other cases, it may take a long time before the application shuts down. In the meantime, the connection count is higher than it needs to be, and if you happen to have an ongoing transaction (can very easily happen with autocommit = OFF), the connection may cause lock issues or slowness for the other connections.

Advice

Always close the database connection when you are done with it.

Want to Learn More?

I hope this has triggered your curiosity and you are ready to dive deeper into the world of MySQL Connector/Python, the X DevAPI, and the MySQL Document Store. If so, there are two recently released books that you may find useful.

Disclaimer

I am the author of one of these books.

One book is MySQL Connector/Python Revealed (Apress) written by me. It goes through MySQL Connector/Python both for the legacy PEP249 API (mainly the mysql.connector module) and the new X DevAPI (the mysqlx module). There are three chapters dedicated to the X DevAPI.

The other book is Introducing the MySQL 8 Document Store (Apress) written by Dr. Charles Bell (MySQL developer). This book goes through how JSON works in MySQL including information about the X DevAPI and its siblings the X Protocol and the X Plugin.

Both books are more than 500 pages and comes with code examples that will help bring you up to speed with MySQL Connector/Python and the MySQL Document Store.

2 Comments on “NoSQL/X DevAPI Tutorial with MySQL Connector/Python 8.0”

francis says:

20 January, 2021 at 03:11

Thanks for this.
Is it possible to store an attribute with the null value, and how to find it (xxx is null ?
Regards.

- Jesper Krogh says:
  
  23 January, 2021 at 17:29
  
  Excellent question. Since it is a little more complicated than for non-JSON documents, I decided to answer the question in a separate post: https://mysql.wisborg.dk/2021/01/23/the-mysql-x-devapi-working-with-null-values/