Partial table recovery from physical backup

In previous topic, we have covered “Transportable Tablespace” concept by copying and importing table’s tablespace to remote server. See -> Copying Tablespaces to Remote Server The idea is copying tablespace file to remote server, in remote server you must create identical database names and table names manually, then you should discard new table’s tablespace file and import new copied one. To achieve this you must have running MySQL version >= 5.6, innodb_file_per_table=1 and you must know “CREATE statement” of table. Let’s to change our test condition. Assume that, you have MySQL server and you have taken physical backup of your server (you can use Percona XtraBackup, cold backup for eg.). But one of the wonderful day somebody deleted all table data (say -> delete from table_name). In fact your table at this moment exists(.frm and .ibd), you can easily discard table’s tablespace and import tablespace from backup folder. But if table is dropped and you don’t know the create of table. Or even database is dropped. Our path will differ from previous one: *1. Create dropped database manually. 2. Create dropped table by extracting table’s create statement from .frm file which is in backed up directory. To extract table create statement from .frm file you can use **mysqlfrm tool from MySQL Utilities. 3. Discard table’s tablespace (ALTER TABLE t DISCARD TABLESPACE;) 4. Copy .ibd file from backup directory to MySQL’s datadir database directory 5. Import copied back tablespace file.(ALTER TABLE t IMPORT TABLESPACE;)*** You can also read about this concept from documentation -> tablespace-copying I have automatized this process adding table create statement extracting functionality to MySQL-AutoXtraBackup project as –partial recovery option. Here is a demo usage video:

If you tested and found issues, please report it to improve this opensource project.

mysqldiskusage – source code examination

As you know there is a great toolset named “MySQL Utilities”, which you can use for solving various administrative tasks. mysqldiskusage utility is for calculating MySQL Server’s disk usage and generating informative reports. Of course this project is open source and everybody could review the source code. A few words about how mysqldiskusage calculates database disk usage will be crucial for understanding algorithm. The source tree is: mysql-utilities-1.5.4/scripts/mysqldiskusage.py If you open this Python file you will see (line 169-175) :

# We do database disk usage by default. 
try: 
    diskusage.show_database_usage(servers[0], datadir, args, options) 
except UtilError: _, e, _ = sys.exc_info() 
    print("ERROR: %s" % e.errmsg) 
    sys.exit(1) 

By default it shows database disk usage and calling another function named show_database_usage from mysql-utilities-1.5.4/mysql/utilities/command/diskusage.py file. Now if we open up this diskusage.py file and search for show_database_usage function, you should see that in turn this function uses another function named _build_db_list. From _build_db_list it gets back all necessary information as in code stated clearly(line 550-562):

# Get list of databases with sizes and formatted when necessary columns, rows     
db_total = _build_db_list(server, res, dblist, datadir, fmt == "grid", have_read, verbosity, include_empty or do_all, is_remote) 
if not quiet: 
    print "# Database totals:" print_list(sys.stdout, fmt, columns, rows, no_headers) 
if not quiet: _print_size("nTotal database disk usage = ", db_total) 
    print 

Now we know that all calculations are happened in _build_db_list function. If you search and find this function(begins from line 360) you can see that, in fact mysqldiskusage is calculating database disk usage as follows:

It finds (data_length + index_length) from information_schema.tables per database manner

Then it sum ups (data_length + index_length) with misc_files variable data which is in fact returned by _get_db_dir_size function. But what is this misc_files? Logically misc_files must be .opt and .frm files. So misc_files must not be an “.MYD” , “.MYI”, “.IBD”, “general_log”, “slow_log”.

So in fact mysqldiskusage calculates database disk usage as -> (data_length + index_length)[size in bytes] + (.opt+.frm)[size in bytes]. First of all we must insist on not using information_schema for accurate disk usage calculation because of simple rule: “InnoDB preallocates pages(16Kib) for further table usage, but data_length column will not show these pages”

As proof of concept let’s create sample empty table:

mysql> create database test; 
Query OK, 
1 row affected (0,01 sec) 
mysql> use test; 
Database changed 
mysql> 
select data_length, index_length from information_schema.tables where table_schema='test' and table_name='t1'; 
+-------------+--------------+ 
| data_length | index_length | 
+-------------+--------------+ 
| 16384 | 0 | 
+-------------+--------------+ 
1 row in set (0,00 sec) 

If we consider that our exact table size is 16384 bytes, we are in wrong direction. In fact if we use OS commands we could see that the exact size of table is 98304 bytes:

[root@node1 ~]# ls -lt /var/lib/mysql/test/ 
total 208 
-rw-rw----. 1 mysql mysql 98304 Apr 18 11:44 t1.ibd 

So when we create an InnoDB table it is preallocates 6 pages(16Kib*6 = 98304) but only 1 page shown up from data_length column. Now let’s come back to our misc_files or exactly _get_db_dir_size() function. From source code we can see that there is no a check for “.IBD” files:

... 
for item in os.listdir(folder): 
    name, ext = os.path.splitext(item) 
    if ext.upper() not in (".MYD", ".MYI") and  name.upper() not in ('SLOW_LOG', 'GENERAL_LOG'): 
        itemfolder = os.path.join(folder, item) 
    ... 

Because of this calculation is wrong for databases as we see from output:

[root@node1 ~]# mysqldiskusage --server=root_pass -vvv 
# Source on localhost: ... connected. 
# Database totals: 
+---------------------+--------------+--------------+--------------+--------------+ | db_name | db_dir_size | data_size | misc_files | total | +---------------------+--------------+--------------+--------------+--------------+ | employees | 242.523.049 | 205.979.648 | 242.523.049 | 448.502.697 | 

As you see it sum ups data_size with misc_files and gets back total as 448.502.697 bytes. But in fact our employees database is exactly 242.523.049 bytes. And of course the: Total database disk usage = 450.940.391 bytes or 430,05 MB is wrong as well. For further exploration and how to patch source code see related BUG REPORT #76703.