Analyzing a MapReduce Job Status



Posted by lidasheng on May 17, 2016, in Hbase

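Below is the complete status output of a finished MapReduce job. The notes that follow explain what each counter means and what it says about the resources the job consumed.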

2016-01-07 15:37:06 INFO Job:1383 – Job job_1443106373325_140563 completed successfully

2016-01-07 15:37:06 INFO Job:1390 – Counters: 52

File System Counters

FILE: Number of bytes read=50476611819 Total bytes read from local disk. Both the map side and the reduce side sort their data, and sorting requires reading and writing local files.

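FILE: Number of bytes written=44851815671 Total bytes written to local disk. Besides the local files written while sorting on the map and reduce sides, the reduce-side shuffle pulls data from the map side and may also write that data to local disk.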

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

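HDFS: Number of bytes read=3285816319 During the job, only the map tasks read from HDFS. What they read is not limited to the input file contents; it also includes the split metadata of every map task.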

HDFS: Number of bytes written=350831425 The final output of the reduce tasks is written to HDFS, so this is the total size of the job's results.

HDFS: Number of read operations=1977

HDFS: Number of large read operations=0

HDFS: Number of write operations=600

Job Counters

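Killed map tasks=3 Number of map task attempts that were killed (for example, redundant speculative attempts). Killed is not the same as failed; failed attempts are reported by a separate counter.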

Launched map tasks=362 Number of map task attempts this job launched, including the killed ones.

Launched reduce tasks=300 Number of reduce task attempts this job launched.

Other local map tasks=1

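Data-local map tasks=320 Number of map tasks that were scheduled data-local, i.e., run on a node that holds a replica of the task's input data (a TaskTracker node in MRv1; on YARN, as in this job, a NodeManager node).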

Rack-local map tasks=41

Total time spent by all maps in occupied slots (ms)=30745916

Total time spent by all reduces in occupied slots (ms)=7818313

Total time spent by all map tasks (ms)=30745916

Total time spent by all reduce tasks (ms)=7818313

Total vcore-seconds taken by all map tasks=30745916

Total vcore-seconds taken by all reduce tasks=7818313

Total megabyte-seconds taken by all map tasks=31483817984

Total megabyte-seconds taken by all reduce tasks=8005952512
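The Job Counters can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers shown above: it checks that the three locality buckets add up to the launched map count, and divides the megabyte counters by task time. Note that the vcore-seconds values equal the millisecond totals exactly, so despite their labels these counters appear to be in milliseconds; the division below is done on that assumption.

```python
# Job Counters copied from the job status above
launched_maps = 362
data_local_maps = 320
rack_local_maps = 41
other_local_maps = 1

map_time_ms = 30745916
reduce_time_ms = 7818313
map_mb_seconds = 31483817984
reduce_mb_seconds = 8005952512

# Every launched map attempt falls into exactly one locality bucket
locality_total = data_local_maps + rack_local_maps + other_local_maps
data_local_ratio = data_local_maps / launched_maps

# Assumption: the "megabyte-seconds" counters actually hold MB x ms
# (the vcore-seconds counters equal the ms totals), so dividing by task ms
# gives the memory per task, assuming one fixed-size container per task
map_container_mb = map_mb_seconds / map_time_ms
reduce_container_mb = reduce_mb_seconds / reduce_time_ms

print(f"locality buckets sum to {locality_total} of {launched_maps} launched maps")
print(f"data-local ratio: {data_local_ratio:.1%}")
print(f"memory per map task: {map_container_mb:.0f} MB")
print(f"memory per reduce task: {reduce_container_mb:.0f} MB")
```

Both divisions come out to exactly 1024, which suggests that every map and reduce task ran in a 1024 MB container with 1 vcore.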

Map-Reduce Framework

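Map input records=123106620 Total number of records all map tasks read from their input; with line-oriented text input, this is the total number of lines read from HDFS.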

Map output records=2708345640 Number of records emitted directly by the map tasks, i.e., the number of context.write calls made in the map method. This is the raw output count before any Combiner runs.

Map output bytes=92998338307 Map output keys and values are serialized into an in-memory buffer, so this is the total size of the serialized output in bytes.

Map output materialized bytes=15102670672

Input split bytes=61748

Combine input records=0

Combine output records=0

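Reduce input groups=22675333 Total number of key groups read by the reducers, i.e., the number of distinct keys passed to reduce() (one reduce() call per group).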

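Reduce shuffle bytes=15102670672 Total bytes the reducers pulled from the map side during the shuffle. If the map's intermediate output is compressed, this is the compressed size, i.e., the size before decompression.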

Reduce input records=2708345640 If a Combiner is used, this equals the number of records left after the map-side Combiner; otherwise it equals Map output records.

Reduce output records=22675333 Total number of records output by all reduce tasks.
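The map-to-reduce data flow can be checked the same way. This sketch uses the Map-Reduce Framework counters above. Two points are inferences from the numbers, not statements made by the counters: with no Combiner running, the roughly 6x gap between Map output bytes and Map output materialized bytes hints that intermediate map output was compressed; and since Reduce output records equals Reduce input groups, each reduce() call seems to have written exactly one record.

```python
# Map-Reduce Framework counters copied from the job status above
map_output_records = 2708345640
map_output_bytes = 92998338307
map_output_materialized_bytes = 15102670672
combine_input_records = 0
reduce_input_groups = 22675333
reduce_shuffle_bytes = 15102670672
reduce_input_records = 2708345640
reduce_output_records = 22675333

# No Combiner ran, so every map output record should reach the reducers
no_combiner = combine_input_records == 0
records_match = reduce_input_records == map_output_records

# What the reducers fetched equals what the maps materialized on disk
shuffle_matches_materialized = reduce_shuffle_bytes == map_output_materialized_bytes

# Serialized map output size vs. bytes actually stored and transferred
size_ratio = map_output_bytes / map_output_materialized_bytes

# Average number of values handed to one reduce() call
values_per_group = reduce_input_records / reduce_input_groups

print(no_combiner, records_match, shuffle_matches_materialized)
print(f"serialized / materialized: {size_ratio:.2f}x")
print(f"values per reduce group: {values_per_group:.1f}")
```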

Spilled Records=8118445302 Spills happen on both the map side and the reduce side; this counts the total number of records spilled from memory to disk.

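Shuffled Maps =107700 Nearly every reducer has to fetch data from every map. Each time a copy thread successfully fetches one map's output, this counter goes up by one, so the total is roughly the number of reducers times the number of maps.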

Failed Shuffles=0

Merged Map outputs=107700
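The shuffle counters also reconcile with the task counts. A sketch, assuming the 3 killed map attempts were redundant attempts (for example, speculative duplicates) whose tasks still completed elsewhere, so there are 362 - 3 = 359 map outputs to fetch, and assuming all 300 reducers completed:

```python
# Counters copied from the job status above
launched_maps = 362
killed_maps = 3
launched_reduces = 300
shuffled_maps = 107700
merged_map_outputs = 107700
map_output_records = 2708345640
spilled_records = 8118445302

# Assumption: killed attempts produced no output to fetch, so only
# the successful map tasks are shuffled to each reducer
successful_maps = launched_maps - killed_maps
expected_shuffled = launched_reduces * successful_maps

# How many times, on average, each map output record was spilled to disk
spill_ratio = spilled_records / map_output_records

print(successful_maps, expected_shuffled, shuffled_maps)
print(f"spill ratio: {spill_ratio:.2f}")
```

300 x 359 = 107700 matches both Shuffled Maps and Merged Map outputs. The spill ratio is close to 3, suggesting each record was written to local disk about three times on average across both sides; the counters alone do not say how those writes split between map-side and reduce-side passes.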

GC time elapsed (ms)=865508

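CPU time spent (ms)=46447150 Each task reads the user CPU time and kernel CPU time of its process; their sum is the task's CPU time, and the job value is the total over all tasks.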

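Physical memory (bytes) snapshot=466476339200 A snapshot of each task process's current physical memory usage, summed over all tasks.

Virtual memory (bytes) snapshot=887784427520 A snapshot of each task process's current virtual memory usage, summed over all tasks.

Total committed heap usage (bytes)=528945250304 Each task's JVM calls Runtime.getRuntime().totalMemory() to get its current heap size; the job value is the sum over all tasks.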
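For resource usage, ratios say more than the raw totals. A sketch relating GC time and CPU time to the total task time from the Job Counters (map task milliseconds plus reduce task milliseconds):

```python
# Counters copied from the job status above
map_time_ms = 30745916
reduce_time_ms = 7818313
gc_time_ms = 865508
cpu_time_ms = 46447150

# Total time the job's tasks were running
task_time_ms = map_time_ms + reduce_time_ms

# Fraction of task time spent in garbage collection
gc_share = gc_time_ms / task_time_ms

# CPU time sums all threads of a task process, so this can exceed 1
cpu_per_task_time = cpu_time_ms / task_time_ms

print(f"GC share of task time: {gc_share:.1%}")
print(f"CPU time / task time: {cpu_per_task_time:.2f}")
```

GC took about 2.2% of task time, so garbage collection does not look like a bottleneck for this job. CPU time is about 1.2 times task time, meaning the tasks kept slightly more than one core busy on average, even though each appears to have been allocated 1 vcore.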

Shuffle Errors

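BAD_ID=0 Incremented when the map ID in the metadata fetched by a reducer's copy thread is not in the expected format.

CONNECTION=0 Indicates that a copy thread failed to establish a connection to the map side.

IO_ERROR=0 Incremented when a reducer's copy thread hits an IOException while fetching data from the map side.

WRONG_LENGTH=0 The map's intermediate output is formatted and compressed data, so it carries two lengths: the raw data size and the compressed size. If these lengths arrive invalid (for example, negative), this counter is incremented.

WRONG_MAP=0 Incremented when the fetched map output does not belong to the map the copy thread intended to fetch, i.e., the wrong data was pulled.

WRONG_REDUCE=0 Incremented when the fetched data indicates it was not meant for this reducer; again, the wrong data was pulled.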

File Input Format Counters

Bytes Read=3285754571 Total input data of all map tasks in bytes, roughly the sum of the bytes of all values passed to the map methods.

File Output Format Counters

Bytes Written=350831425
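The HDFS read counter can be reconciled with the input-format counters, which confirms that map tasks read split metadata from HDFS in addition to the input files. A sketch using the values above:

```python
# Counters copied from the job status above
hdfs_bytes_read = 3285816319
input_split_bytes = 61748
input_format_bytes_read = 3285754571
hdfs_bytes_written = 350831425
output_format_bytes_written = 350831425

# HDFS reads = input file content + serialized split metadata
unexplained_read = hdfs_bytes_read - input_format_bytes_read - input_split_bytes

# The only HDFS writes are the reducers' final output
writes_match = hdfs_bytes_written == output_format_bytes_written

print(unexplained_read, writes_match)
```

The remainder is exactly 0: HDFS bytes read (3285816319) equals Bytes Read (3285754571) plus Input split bytes (61748).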

2016-01-07 15:37:06 INFO BsnComHiveDaoImpl:989 – Finished running the MapReduce job
