reduce task启动后的第一阶段是shuffle(向map端fetch数据),每次fetch数据的时候都可能因为connect timeout,read timeout,checksum error等原因时报,因而reduce task为每个map设置了一个计数器,用以记录fetch该map输出时失败的次数,当失败次数达到一定阀值的时候。会通知MRAppMaster 从该map fetch数据时失败的次数太多了,并打印想要的log;
该阀值计算方式:
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.java
float failureRate = runningReduceTasks == 0 ? 1.0f :
(float) fetchFailures / runningReduceTasks;
// declare faulty if fetch-failures >= max-allowed-failures
boolean isMapFaulty =
(failureRate >= MAX_ALLOWED_FETCH_FAILURES_FRACTION);
if (fetchFailures >= MAX_FETCH_FAILURES_NOTIFICATIONS && isMapFaulty) {
LOG.info("Too many fetch-failures for output of task attempt: " +
mapId + " ... raising fetch failure to map");
job.eventHandler.handle(new TaskAttemptEvent(mapId,
TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE));
job.fetchFailuresMapping.remove(mapId);
}
默认的阀值是3,
//The maximum fraction of fetch failures allowed for a map
private static final double MAX_ALLOWED_FETCH_FAILURES_FRACTION = 0.5;
// Maximum no. of fetch-failure notifications after which map task is failed
private static final int MAX_FETCH_FAILURES_NOTIFICATIONS = 3;
最终的日志信息是在
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.TooManyFetchFailureTransition类中打印出来的
private static class TooManyFetchFailureTransition implements
SingleArcTransition<TaskAttemptImpl, TaskAttemptEvent> {
@SuppressWarnings("unchecked")
@Override
public void transition(TaskAttemptImpl taskAttempt, TaskAttemptEvent event) {
//add to diagnostic
taskAttempt.addDiagnosticInfo("Too Many fetch failures.Failing the attempt");
//set the finish time
taskAttempt.setFinishTime();
if (taskAttempt.getLaunchTime() != 0) {
taskAttempt.eventHandler
.handle(createJobCounterUpdateEventTAFailed(taskAttempt));
TaskAttemptUnsuccessfulCompletionEvent tauce =
createTaskAttemptUnsuccessfulCompletionEvent(taskAttempt,
TaskAttemptState.FAILED);
taskAttempt.eventHandler.handle(new JobHistoryEvent(
taskAttempt.attemptId.getTaskId().getJobId(), tauce));
}else {
LOG.debug("Not generating HistoryFinish event since start event not " +
"generated for taskAttempt: " + taskAttempt.getID());
}
taskAttempt.eventHandler.handle(new TaskTAttemptEvent(
taskAttempt.attemptId, TaskEventType.T_ATTEMPT_FAILED));
}
}
分享到:
相关推荐
hadoop-mapreduce-examples-2.7.1.jar
【SpringBoot】Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster报错明细问题解决后记 报错明细 IDEA SpringBoot集成hadoop运行环境,,本地启动项目,GET请求接口触发...
hadoop-mapreduce-examples-2.6.5.jar 官方案例源码
基于MapReduce的气候数据分析 是一个旨在通过大数据处理技术深入理解和分析气候数据的研究课题。该课题利用MapReduce编程模型,针对包括温度、湿度、风速等在内的多种气象参数进行处理,以应对传统方法在处理大规模...
大数据分析课程设计后端大数据分析MapReduce程序和sql脚本.zip 95分以上必过项目,下载即用无需修改。 大数据分析课程设计后端大数据分析MapReduce程序和sql脚本.zip 95分以上必过项目,下载即用无需修改。大数据...
第5章 MapReduce分布式计算框架 2 5.1. MapReduce简介 2 5.2. wordcount经典案例介绍 2 5.3. MapReduce进程介绍 3 5.4. MapReduce编程规范 3 5.5. wordcount经典案例的实现 5 5.5.1. 分析数据准备 5 5.5.2. 新建...
Ch6-MapReduce算法设计.ppt.ppt
华为MapReduce服务组件操作指南,供大家学习参考。
华为MapReduce服务应用开发指南,供大家学习参考。
使用Hadoop下的MapReduce实现词频统计