Your browser doesn't support the features required by impress.js, so you are presented with a simplified version of this presentation.

For the best experience please use the latest Chrome, Safari or Firefox browser.

MySQL Troubleshooting

Case and Methodology
orczhou@AskHelloDBA
一般过程
案例
  • 为什么我们的dump很慢?
  • 为什么应用程序抛错?
  • 为什么我们的dump很慢
    量化问题
     host         avg  max  min
    rd007002.idc 687 2064.1 164
    rd007003.idc 162 268.70 146
    rd007004.idc 353 948.52 143
    rd007005.idc 440 1314.9 146
    rd007006.idc 392 1137.5 143
    rd007007.idc 559 1728.9 145
    rd007008.idc 617 1858.9 151
    rd007009.idc 482 1479.8 143
    rd007010.idc 568 1782.1 147
    rd007012.idc 616 1962.9 157
    rd007013.idc 598 1937.5 149
    rd007014.idc 781 2203.1 230
    rd007015.idc 817 2602.3 144
    rd007016.idc 830 2613.7 148
    rd007017.idc 774 2484.2 144
    rd007018.idc 686 2191.8 154
     host         avg  max  min
    rd007002.idc 687 2064.1 164
    rd007003.idc 162 268.70 146
    rd007004.idc 353 948.52 143
    rd007005.idc 440 1314.9 146
    rd007006.idc 392 1137.5 143
    rd007007.idc 559 1728.9 145
    rd007008.idc 617 1858.9 151
    rd007009.idc 482 1479.8 143
    rd007010.idc 568 1782.1 147
    rd007012.idc 616 1962.9 157
    rd007013.idc 598 1937.5 149
    rd007014.idc 781 2203.1 230
    rd007015.idc 817 2602.3 144
    rd007016.idc 830 2613.7 148
    rd007017.idc 774 2484.2 144
    rd007018.idc 686 2191.8 154
    猜测原因
    部署监控 尝试重现
    过程分析
    过程分析tcpdump
    • ACK非常快
    • 发包间隔很长!!!
    重现问题
    • 低峰时间段模拟dump
    • 实用gdb/oprofile/pmp分析现场
    GDB(1)
    #0  0x00000037c340ab99 in pthread_cond_wait@@GLIBC_2.3.2
    #1  0x00002aaab307e9a5 in os_event_wait_low
    #2  0x00002aaab30cb075 in sync_array_wait_event
    #3  0x00002aaab30cbf12 in mutex_spin_wait
    #4  0x00002aaab30185c7 in buf_page_set_accessed_make_young
    #5  0x00002aaab3018679 in buf_page_optimistic_get
    #6  0x00002aaab300d3e2 in btr_pcur_restore_position_func
    #7  0x00002aaab30b4bc7 in sel_restore_position_for_mysql
    #8  0x00002aaab30b8432 in row_search_for_mysql
    #11 0x000000000061b756 in sub_select
    #12 0x000000000062e0ad in do_select
                    
    GDB(2)
    #0  0x00002aaab3020481 in buf_LRU_search_and_free_block
    #1  0x00002aaab302094e in buf_LRU_get_free_block
    #2  0x00002aaab301643e in buf_page_init_for_read
    #3  0x00002aaab30214d5 in buf_read_page_low
    #4  0x00002aaab3021ec1 in buf_read_ahead_linear
    #5  0x00002aaab3018d0e in buf_page_get_gen
    #6  0x00002aaab300c612 in btr_pcur_move_to_next_page
    #7  0x00002aaab30b8d6a in row_search_for_mysql
    #8  0x00002aaab305133c in ha_innodb::general_fetch
    #9  0x00000000006a7d97 in rr_sequential
    #10 0x000000000061b756 in sub_select
    #11 0x000000000062e0ad in do_select
    #12 0x000000000063b0d4 in JOIN::exec
                    
    oprofile
    CPU: CPU with timer interrupt, speed 0 MHz (estimated)
    Profiling through timer interrupt
    samples  %        image name               symbol name
    34043    72.3826  ha_innodb_plugin.so.0.0.0 buf_LRU_search_and_free_block
    4837     10.2845  ha_innodb_plugin.so.0.0.0 ut_delay
    2959      6.2915  ha_innodb_plugin.so.0.0.0 buf_LRU_free_block
    1283      2.7279  libpthread-2.5.so         pthread_mutex_lock
    851       1.8094  ha_innodb_plugin.so.0.0.0 buf_page_is_corrupted
    546       1.1609  ha_innodb_plugin.so.0.0.0 mutex_spin_wait
    315       0.6698  libc-2.5.so               memcpy
    239       0.5082  ha_innodb_plugin.so.0.0.0 buf_calc_page_new_checksum
    179       0.3806  ha_innodb_plugin.so.0.0.0 rec_get_offsets_func
    162       0.3444  libpthread-2.5.so         pthread_mutex_unlock
    107       0.2275  ha_innodb_plugin.so.0.0.0 row_search_for_mysql
    98        0.2084  ha_innodb_plugin.so.0.0.0 row_sel_store_mysql_rec
    原因小结
    修改策略
    有效性验证
    建议规范
    SET GLOBAL innodb_old_blocks_time = 1000;
    
    开始dump
    
    SET GLOBAL innodb_old_blocks_time = 0;
    结果
    host         avg  max  min
    rd007002.idc 687 2064.1 164
    rd007003.idc 162 268.70 146
    rd007004.idc 353 948.52 143
    rd007005.idc 440 1314.9 146
    rd007006.idc 392 1137.5 143
    rd007007.idc 559 1728.9 145
    rd007008.idc 617 1858.9 151
    rd007009.idc 482 1479.8 143
    rd007010.idc 568 1782.1 147
    rd007012.idc 616 1962.9 157
    rd007013.idc 598 1937.5 149
    rd007014.idc 781 2203.1 230
    rd007015.idc 817 2602.3 144
    rd007016.idc 830 2613.7 148
    rd007017.idc 774 2484.2 144
    rd007018.idc 686 2191.8 154
     ==> 
     host        avg max min
    rd007002.idc 206 260 158
    rd007003.idc 201 248 161
    rd007004.idc 206 262 170
    rd007005.idc 204 257 169
    rd007006.idc 204 257 166
    rd007007.idc 206 278 161
    rd007008.idc 206 250 162
    rd007009.idc 205 258 162
    rd007010.idc 208 271 163
    rd007012.idc 209 267 154
    rd007013.idc 206 280 154
    rd007014.idc 207 278 155
    rd007015.idc 203 287 154
    rd007016.idc 205 273 153
    rd007017.idc 203 276 152
    rd007018.idc 188 247 108
    工具说明
    oprofile cheatsheet
    sudo yum install binutils-devel
    ./configure --with-kernel-support && make
    make install
    sudo opcontrol --deinit
    sudo modprobe oprofile timer=1  
    $dmesg|grep oprofile|tail -n 1
      (oprofile: using timer interrupt.)
    sudo opcontrol --reset
    sudo opcontrol --separate=lib --no-vmlinux \
    --start --image=/u01/mysql/libexec/mysqld
    sudo opcontrol --dump
    sudo opcontrol --shutdown
    opreport -l /u01/mysql/libexec/mysqld
    gdb+pmp cheatsheet
    sudo sh -c 'gdb -ex "set pagination 0" –ex \
    "thread apply all bt" --batch \
    -p $(pidof mysqld) > bt.log'
    
    
    sudo sh -c 'gdb -ex "set pagination 0" -ex \
    "thread apply all bt full" --batch \
    -p $(pidof mysqld) > bt.log.full'
    
  • Poor man's profile
  • tcpdump cheatsheet
    nohup tcpdump -n -nn -tttt -i bond0 \
    -s 65535 'port 3306' -w tcpdump.ret -C 100 &
    
    流程
    案例2: 数据库是不是有波动?
    数据库是否有问题?
    确定这是一个数据库问题
  • Repsonse Time
  • $tcprstat -l $ip_addr -t $interval -n $count
    RT/us         Time
    736    2012-06-05 15:12:11
    62195  2012-06-05 15:12:19
    822385 2012-06-05 15:12:27
    758795 2012-06-05 15:12:35
    263320 2012-06-05 15:12:44
    732564 2012-06-05 15:12:52
    757521 2012-06-05 15:13:00
    677436 2012-06-05 15:13:08
    75488  2012-06-05 15:13:16
    3725   2012-06-05 15:13:24
    3042   2012-06-05 15:13:32
    2393   2012-06-05 15:13:41
    2590   2012-06-05 15:13:49
    
    Key Performance Indication
    慢日志
    pt-stalk cheatsheet
    发现的各种问题
    索引覆盖扫描
    索引覆盖扫描 vs 回表
    错误的执行计划
    当前场景,原因分析
    如何解决
    小结
    最后
    Q & A
    Powered by Alibaba/Taobao