Wednesday, December 14, 2016

Namenode Handler Configuration Best Practice

HDFS has two configuration parameters, which are dfs.namenode.handler.count  and dfs.namenode.service.handler.count. The blog tries to explain what they are and what value we should set them. Setting them too low causes degraded performance at HDFS layer. Even worse, it can cause namenode or datanode in bad health status or namenode HA constantly has failover. In the namenode log, you might see message like below

INFO org.apache.hadoop.ipc.Server: IPC Server handler xx on 8022 caught an exception java.nio.channels.ClosedChannelException

Also you might see large RPC call queue length when you monitor your namenode through Cloudera Manager. RPC queue length should be 0. 

Namenode is a RPC server that requires a thread pool to handle incoming RPC calls. The number of thread in the pool is controlled by dfs.namenode.handler.count. If dfs.namenode.servicerpc-address is configured (which is also recommended), the namenode starts an extra RPC server to handle the non-client related RPC call, such as from datanodes daemon themselves. That extra RPC server's thread count is controlled by dfs.namenode.service.handler.count. In that case, threads controlled by dfs.namenode.handler.count only handle client RPC call, such as from your MapReduce jobs, HDFS cli commands, etc. 

The use of dfs.namenode.service.handler.count was not documented clearly in earlier version of HDFS releases, hence the JIRA

So what is the value we should set for dfs.namenode.handler.count or dfs.namenode.service.handler.count? 

Both dfs.namenode.service.handler.count and dfs.namenode.service.handler.count should be set to the same value, which is ln(num of datanodes) * 20. For example, if your cluster has 100 datanodes, these parameters should be set to 92. By default, they are both 10. But if you use Cloudera Manager to do the installation, Cloudera Manager should set this for you automatically based on the number of datanodes you have in your cluster. But if you add nodes to your cluster after the initial installation, this value should be increased accordingly, one thing that lots of Hadoop administrators miss after expanding their Hadoop cluster.