Getting Started With Hadoop DFS Operations

When you start out with Hadoop, it takes a while to get used to working with two filesystems: the local OS filesystem and the Hadoop filesystem.  On top of that, when testing Hadoop on a local development machine, it can be configured in "local" mode or "pseudo-distributed" mode, and each uses a different filesystem: local mode uses the OS filesystem, while pseudo-distributed mode uses HDFS.  In local mode, the following command runs successfully and lists the current OS directory.

hadoop dfs -ls .

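That same command, when run in pseudo-distributed mode, gives a file not found error.  This is because pseudo-distributed mode uses the HDFS filesystem, where a relative path such as "." is resolved against the user's HDFS home directory (/user/<username> by default) rather than the current OS directory, and that home directory does not exist until it is created.  Instead, give an absolute HDFS path such as "/" for the HDFS root, or "/myData/myDir" to operate on the /myData/myDir directory in HDFS.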

hadoop dfs -ls /
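
To make the difference between the two modes concrete, here is a minimal shell sketch of how a path is interpreted in each one.  It does not call Hadoop at all: resolve_dfs_path is a hypothetical helper written for illustration, the hdfs://localhost:9000 address and the user name alice are assumptions, and /user/<username> is assumed to be the HDFS home directory.

```shell
# Sketch: where does a path land for a given default filesystem URI?
# resolve_dfs_path is a hypothetical helper, not a Hadoop command.
# Usage: resolve_dfs_path <default-fs-uri> <path> [user]
resolve_dfs_path() {
  fs="$1"; path="$2"; user="${3:-$(id -un)}"
  case "$fs" in
    hdfs:*) base="/user/$user" ;;   # assumed default HDFS home directory
    *)      base="$PWD" ;;          # local mode: the OS working directory
  esac
  case "$path" in
    /*)  printf '%s\n' "$path" ;;                  # absolute: used as given
    .)   printf '%s\n' "$base" ;;                  # "." is the base directory
    ./*) printf '%s/%s\n' "$base" "${path#./}" ;;  # strip the leading "./"
    *)   printf '%s/%s\n' "$base" "$path" ;;       # relative to the base
  esac
}

resolve_dfs_path file:/// .                                  # prints the current OS directory
resolve_dfs_path hdfs://localhost:9000 . alice               # prints /user/alice
resolve_dfs_path hdfs://localhost:9000 /myData/myDir alice   # prints /myData/myDir
```

The second call shows why "." fails on a fresh pseudo-distributed install: it points at /user/alice, which nobody has created yet.  Once that directory exists in HDFS, "hadoop dfs -ls ." should list it instead of reporting an error.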


After a few days of playing around with Hadoop commands, this becomes second nature, but in the beginning it's a bit tricky to remember which style of path to use in which mode.