MySQL Optimizer Tracer usage case with count(*)

What is Optimizer Trace? After reading topic about Optimizer Tracer by [Morgan Tocker][1] decided to test it. From [Optimizer Trace and EXPLAIN FORMAT=JSON in 5.7][2]: Optimizer trace is a new diagnostic tool introduced in MySQL 5.6 to show how the optimizer is working internally. It is similar to EXPLAIN, with a few notable differences: It doesn’t just show the intended execution plan, it shows the alternative choices. You enable the optimizer trace, then you run the actual query. It is far more verbose in its output. For understanding goal of article please read previous one about related verified optimizer BUG: [Playing with count() optimizer work][3] ** We have 2 queries: **select count() from sales; select count(*) from sales where sales_id > 0; Firstly let’s get explain plan for query with JSON format and as regular:

       -- JSON 
          mysql> explain format=json select count(*) from sales; |
          { "query_block": 
          { "select_id": 1, 
          "table": { "table_name": "sales", 
                     "access_type":  "index", 
                     "key": "sales_cust_idx", 
                     "used_key_parts": [ "CUSTOMER_ID" ] 
                    /*  used_key_parts */, 
                    "key_length": "4", 
                    "rows": 2489938, 
                    "filtered": 100,
                     "using_index": true }  
            /* table */ } 
           /* query_block */ } 

        mysql> explain select count(*) from salesG 
        *************************** 1. row *************************** 
        id: 1 
        select_type: SIMPLE 
        table: sales 
        type: index 
        possible_keys: NULL 
        key: sales_cust_idx 
        key_len: 4 
        ref: NULL 
        rows: 2489938 
        Extra: Using index 1 row in set (0.00 sec) 

Second query:

       -- JSON  
       mysql> explain format=json select count(*) from sales where sales_id > 0G         *************************** 1. row *************************** 
    EXPLAIN: 
    { "query_block": 
    { "select_id": 1, 
    "table": 
    { "table_name": "sales", 
      "access_type": "range",
      "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", 
      "used_key_parts": [ "SALES_ID" ], "key_length": "4", 
      "rows": 1244969, 
      "filtered": 100, 
      "using_index": true, 
     "attached_condition": "(`sales`.`sales`.`SALES_ID` > 0)" } } } 


     mysql> explain select count(*) from sales where sales_id > 0G 
     *************************** 1. row *************************** 
     id: 1 
     select_type: SIMPLE 
     table: sales 
     type: range 
     possible_keys: PRIMARY 
     key: PRIMARY 
     key_len: 4 
     ref: NULL 
     rows: 1244969 
     Extra: Using where; Using index 1 row in set (0.00 sec)  

From Explain plan it is obvious that, first query will use “Index Scan”, second will use “Range Scan + Index”. First query will use, sales_cust_idx in customer_id column, second query will use primary key in sales_id column. From first view, there now difference between queries, but optimizer estimates half of rows when attaching sales_id > 0 condition. See related BUG: [#68814][4] Now let’s examine problem with Optimizer Tracer. So before running query you should enable optimizer trace:

      SET OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on;
      SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000; 

After run first query:

      mysql> select count(*) from sales; 
      +----------+ 
      | count(*) |
      +----------+
      | 2500003 |
      +----------+ 
     1 row in set (0.58 sec) 

Query to OPTIMIZER_TRACE table from information_schema:

     mysql> select query, trace from INFORMATION_SCHEMA.OPTIMIZER_TRACE;
     select count(*) from sales | 
    { "steps": 
   [ { "join_preparation": 
    { "select#": 1, 
    "steps": [ { "expanded_query": "/* select#1 */ select count(0) AS `count(*)` from `sales`" } ] /* steps */ }
    /* join_preparation */ },
    { "join_optimization": 
       { "select#": 1, 
         "steps": [ { "table_dependencies": [ { "table": "`sales`", "row_may_be_null": false, "map_bit": 0, "depends_on_map_bits": [ ] 
    /* depends_on_map_bits */ } ] 
    /* table_dependencies */ }, 
      { "rows_estimation": 
           [ { "table": "`sales`", "table_scan": { "rows": 2489938, "cost": 10347 }
          /* table_scan */ } ] 
          /* rows_estimation */ }, 
           { "considered_execution_plans": [ { "plan_prefix": [ ] 
           /* plan_prefix */,
           "table": "`sales`", 
           "best_access_path": { "considered_access_paths": [ { "access_type": "scan", "rows": 2.49e6, "cost": 508335, "chosen": true } ] 
          /* considered_access_paths */ } 
          /* best_access_path */, 
         "cost_for_plan": 508335,
         "rows_for_plan": 2.49e6,
         "chosen": true } ]
         /* considered_execution_plans */ },
        { "attaching_conditions_to_tables": { "original_condition": null,  "attached_conditions_computation": [ ] 
        /* attached_conditions_computation */,
       "attached_conditions_summary": [ { "table": "`sales`", "attached": null } ] 
        /* attached_conditions_summary */ } 
        /* attaching_conditions_to_tables */ },
       { "refine_plan": [ { "table": "`sales`", "access_type": "index_scan" } ] 
        /* refine_plan */ } ] 
        /* steps */ } 
        /* join_optimization */ },
        { "join_execution": { "select#": 1, "steps": [ ] 
        /* steps */ } 
        /* join_execution */ } ] 
        /* steps */ 

Interesting part for query 1 is -> “cost_for_plan”: 508335, “rows_for_plan”: 2.49e6, “chosen”: true Cost is 508335, rows for plan is 2.49e6 = 2490000 rows, is roughly equal to explain plan estimation.
Now second query:

     mysql> select count(*) from sales where sales_id > 0;
     +----------+ 
     | count(*) |
     +----------+ 
     | 2500003 |
     +----------+ 1 row in set (1.18 sec) 

Query to OPTIMIZER_TRACE:

     mysql> select query, trace from INFORMATION_SCHEMA.OPTIMIZER_TRACE; 
     select count(*) from sales where sales_id > 0 | 
     { "steps": [ { "join_preparation": 
        { "select#": 1, 
          "steps": [ { "expanded_query": "/* select#1 */ select count(0) AS `count(*)` from `sales` where (`sales`.`SALES_ID` > 0)" } ] 
        /* steps */ } 
        /* join_preparation */ },
       { "join_optimization": 
          { "select#": 1, 
            "steps": [ { "condition_processing": { "condition": "WHERE", "original_condition": "(`sales`.`SALES_ID` > 0)", "steps": [ { "transformation": "equality_propagation", 
           "resulting_condition": "(`sales`.`SALES_ID` > 0)" }, 
          { "transformation": "constant_propagation", "resulting_condition": "(`sales`.`SALES_ID` > 0)" }, 
          { "transformation": "trivial_condition_removal", "resulting_condition": "(`sales`.`SALES_ID` > 0)" } ] 
         /* steps */ } 
         /* condition_processing */ }, 
        { "table_dependencies": [ { "table": "`sales`", "row_may_be_null": false, "map_bit": 0, "depends_on_map_bits": [ ] 
        /* depends_on_map_bits */ } ] 
        /* table_dependencies */ }, 
       { "ref_optimizer_key_uses": [ ] 
        /* ref_optimizer_key_uses */ },
       { "rows_estimation": [ { "table": "`sales`", "range_analysis": { "table_scan": {  "rows": 2489938, "cost": 508337 } 
       /* table_scan */, 
       "potential_range_indices": [ { "index": "PRIMARY", "usable": true, "key_parts": [ "SALES_ID" ] 
      /* key_parts */ }, 
      { "index": "sales_cust_idx", "usable": false, "cause": "not_applicable" } ] 
      /* potential_range_indices */,
     "best_covering_index_scan": { "index": "sales_cust_idx", "cost": 500418, "chosen": true } 
      /* best_covering_index_scan */,
      "setup_range_conditions": [ ] 
      /* setup_range_conditions */, 
      "group_index_range": { "chosen": false, "cause": "not_group_by_or_distinct" } 
      /* group_index_range */, 
      "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "PRIMARY", "ranges": [ "0 < SALES_ID" ] 
     /* ranges */, 
     "index_dives_for_eq_ranges": true, 
     "rowid_ordered": true, 
     "using_mrr": false, "index_only": true, "rows": 1244969, "cost": 251364, "chosen": true } ] 
    /* range_scan_alternatives */,
   "analyzing_roworder_intersect": { "usable": false, "cause": "too_few_roworder_scans" }  /* analyzing_roworder_intersect */ } 
    /* analyzing_range_alternatives */,
   "chosen_range_access_summary": { "range_access_plan": { "type": "range_scan", "index": "PRIMARY", "rows": 1244969, "ranges": [ "0 < SALES_ID" ] 
   /* ranges */ } 
   /* range_access_plan */, 
   "rows_for_plan": 1244969, "cost_for_plan": 251364, "chosen": true } 
   /* chosen_range_access_summary */ } 
   /* range_analysis */ } ] 
   /* rows_estimation */ },
    { "considered_execution_plans": [ { "plan_prefix": [ ] /* plan_prefix */, "table": "`sales`", "best_access_path": { "considered_access_paths": [ { "access_type": "range", "rows": 1.24e6, "cost": 500357, "chosen": true } ] 
    /* considered_access_paths */ } 
    /* best_access_path */, 
    "cost_for_plan": 500357, "rows_for_plan": 1.24e6, "chosen": true } ] 
    /* considered_execution_plans */ },
    { "attaching_conditions_to_tables": { "original_condition": "(`sales`.`SALES_ID` > 0)", "attached_conditions_computation": [ ] 
   /* attached_conditions_computation */,
   "attached_conditions_summary": [ { "table": "`sales`", "attached": "(`sales`.`SALES_ID` > 0)" } ] 
   /* attached_conditions_summary */ } 
   /* attaching_conditions_to_tables */ }, 
   { "refine_plan": [ { "table": "`sales`", "access_type": "range" } ] /* refine_plan */ } ] 
   /* steps */ } 
   /* join_optimization */ }, 
  { "join_execution": { "select#": 1, "steps": [ ] /* steps */ } /* join_execution */ } ] /* steps */ 

It is much more complicated with second query and due to lacking documentation for all output, i am looking for explanations from experts.
First thing is, it says in “potential_range_indices” that “index”: “sales_cust_index” is not usable:

      "potential_range_indices": 
      [ { "index": "PRIMARY", "usable": true, "key_parts": [ "SALES_ID" ] 
      /* key_parts */ },
      { "index": "sales_cust_idx", "usable": false, "cause": "not_applicable" } ] 

But in “best_covering_index_scan”, “index”: “sales_cust_idx” is marked as “chosen”:true

       "best_covering_index_scan": { "index": "sales_cust_idx", "cost": 500418, "chosen": true } 

Second thing is, in “range_scan_alternatives” and,

        "analyzing_range_alternatives": 
        { "range_scan_alternatives": [ { "index": "PRIMARY", "ranges": [ "0 < SALES_ID" ]  
        /* ranges */, 
       "index_dives_for_eq_ranges": true, "rowid_ordered": true, "using_mrr": false, "index_only": true, "rows": 1244969, "cost": 251364, "chosen": true } 

in “chosen_range_access_summary”, “rows_for_plan” is 1244969 and “cost_for_plan” is 251364

       "chosen_range_access_summary": 
       { "range_access_plan": 
        { "type": "range_scan", "index": "PRIMARY", "rows": 1244969, "ranges": [ "0 < SALES_ID" ] 
         /* ranges */ } 
         /* range_access_plan */, 
        "rows_for_plan": 1244969, "cost_for_plan": 251364, "chosen": true } 

But for final “best_access_path” “cost_for_plan” is increased to 500357 and “rows_for_plan” is 1.24e6 = 1240000:

       "best_access_path": { "considered_access_paths": [ { "access_type": "range", "rows": 1.24e6, "cost": 500357, "chosen": true } ] 
       /* considered_access_paths */ } 
       /* best_access_path */, 
       "cost_for_plan": 500357, "rows_for_plan": 1.24e6, "chosen": true }

Third thing is that, sales_id > 0 is rewritten to 0 < sales_id

      "ranges": [ "0 < SALES_ID" ] 

*** After explanations from community this article will be updated ***

http://www.tocker.ca/
http://www.tocker.ca/2015/05/25/optimizer-trace-and-explain-formatjson-in-5-7.html
https://mysql.az/playing-with-count-optimizer-work/
http://bugs.mysql.com/bug.php?id=68814