
What is HDFS API?

The Hadoop FileSystem API specification models the contents of a filesystem as a set of paths that are either directories, symbolic links, or files.


WebHDFS is a REST API that supports HTTP operations such as GET, POST, PUT, and DELETE. It allows client applications to access HDFS data and execute HDFS operations over HTTP or HTTPS.
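As a rough illustration of what such an HTTP call looks like, the sketch below builds WebHDFS request URLs without sending them. WebHDFS URLs take the form http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The helper name webhdfs_url, the host localhost, the port 50070, and the path /user/alice are assumptions made for this example only.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params) -> str:
    """Build a WebHDFS URL: http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host, port, and paths, chosen only for illustration.
list_url = webhdfs_url("localhost", 50070, "/user/alice", "LISTSTATUS")
mkdir_url = webhdfs_url("localhost", 50070, "/user/alice/new", "MKDIRS")

# Listing a directory is sent as a GET request; creating one is sent as a PUT.
print("GET", list_url)
print("PUT", mkdir_url)
```

Any HTTP client can then send these requests; the HTTP method depends on the operation.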

What is WebHDFS in Hadoop?

WebHDFS provides web services access to data stored in HDFS. At the same time, it retains the security that the native Hadoop protocol offers and uses parallelism for better throughput. To enable WebHDFS (REST API) in the name node and data nodes, you must set the value of dfs. webhdfs.

What are some WebHDFS REST API operations in HDFS?


  • Get Content Summary of a Directory.
  • Get File Checksum.
  • Get Home Directory.
  • Set Permission.
  • Set Owner.
  • Set Replication Factor.
  • Set Access or Modification Time.
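Each operation listed above corresponds to a WebHDFS op value sent with a particular HTTP method. The sketch below maps them out and prints the request line for each one. The base URL, the file path, and the parameter values (644, alice, staff, 3, and the timestamp) are placeholders assumed for this example.

```python
from urllib.parse import urlencode

# Listed operation -> (HTTP method, WebHDFS op name, example query parameters).
# All parameter values below are illustrative placeholders.
WEBHDFS_OPS = {
    "Get Content Summary of a Directory": ("GET", "GETCONTENTSUMMARY", {}),
    "Get File Checksum": ("GET", "GETFILECHECKSUM", {}),
    "Get Home Directory": ("GET", "GETHOMEDIRECTORY", {}),
    "Set Permission": ("PUT", "SETPERMISSION", {"permission": "644"}),
    "Set Owner": ("PUT", "SETOWNER", {"owner": "alice", "group": "staff"}),
    "Set Replication Factor": ("PUT", "SETREPLICATION", {"replication": 3}),
    "Set Access or Modification Time": (
        "PUT", "SETTIMES", {"modificationtime": 1700000000000},
    ),
}


def request_line(name: str, path: str,
                 base: str = "http://localhost:50070/webhdfs/v1") -> str:
    """Return 'METHOD url' for one of the operations named above."""
    method, op, params = WEBHDFS_OPS[name]
    return f"{method} {base}{path}?{urlencode({'op': op, **params})}"


for name in WEBHDFS_OPS:
    # Get Home Directory is normally requested against the root path.
    path = "/" if name == "Get Home Directory" else "/user/alice/data.txt"
    print(request_line(name, path))
```

The read-only operations use GET, while the operations that change file metadata use PUT.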

What is HDFS client?

The basic filesystem client, hdfs dfs, is used to connect to a Hadoop filesystem and perform basic file-related tasks. It uses the ClientProtocol to communicate with a NameNode daemon, and connects directly to DataNodes to read and write block data.

How does a client read a file from HDFS?

HDFS read operation

  1. The client interacts with the HDFS NameNode, because the NameNode stores the block’s metadata for the file “File.
  2. The client interacts with HDFS DataNode. After receiving the addresses of the DataNodes, the client directly interacts with the DataNodes.
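The two steps above can be sketched with a toy in-memory model. This is not Hadoop code: the NAMENODE and DATANODES dictionaries, the block IDs, the DataNode names, and the file path are all invented for illustration. The point is only that the NameNode hands out block locations, while the file bytes come from the DataNodes.

```python
# Toy in-memory stand-ins; these are NOT Hadoop classes or APIs.
NAMENODE = {  # file path -> ordered list of (block id, DataNode addresses)
    "/logs/app.log": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])],
}
DATANODES = {  # DataNode address -> {block id: block bytes}
    "dn1": {"blk_1": b"hello "},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_2": b"world"},
}


def get_block_locations(path):
    """Step 1: ask the NameNode for block metadata (no file data is returned)."""
    return NAMENODE[path]


def read_file(path):
    """Step 2: fetch each block directly from a DataNode that holds it."""
    data = b""
    for block_id, addresses in get_block_locations(path):
        # This sketch simply takes the first replica in the list.
        datanode = DATANODES[addresses[0]]
        data += datanode[block_id]
    return data


print(read_file("/logs/app.log"))  # b'hello world'
```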

What is HDFS HttpFS?

HttpFS is a server that provides a REST HTTP gateway supporting all HDFS file system operations (read and write). It is interoperable with the WebHDFS REST HTTP API.

What is Knox Gateway?

The Apache Knox Gateway (“Knox”) provides perimeter security so that an enterprise can confidently extend Hadoop access to more users while maintaining compliance with its security policies. Knox also simplifies Hadoop security for users who access the cluster data and execute jobs.

What is API data?

API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other. Each time you use an app like Facebook, send an instant message, or check the weather on your phone, you’re using an API.

What is HDFS and how does it work?

HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories.
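To make the block splitting concrete, here is a small arithmetic sketch. It assumes a block size of 128 MB, which is a common default but is configurable per cluster, and the 300 MB file is a made-up example. The function name split_into_blocks is hypothetical.

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB):
    """Return (offset, length) pairs, one per block, for a file of file_size bytes."""
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


# A hypothetical 300 MB file with the assumed 128 MB block size.
blocks = split_into_blocks(300 * MB)
for offset, length in blocks:
    print(f"offset={offset // MB} MB, length={length // MB} MB")
```

With these numbers the file occupies three blocks: two full 128 MB blocks and a final 44 MB block, since the last block only holds the remaining bytes.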

What is the difference between Hadoop and HDFS?

The main difference between Hadoop and HDFS is that Hadoop is an open source framework that helps to store, process, and analyze large volumes of data, while HDFS is the distributed file system of Hadoop that provides high-throughput access to application data. In brief, HDFS is a module in Hadoop.

How do I access my HDFS data?

You can access HDFS through its web UI. Open your browser and go to localhost:50070 (the default NameNode web UI port in Hadoop 2.x; Hadoop 3.x uses 9870). In the web UI, open the Utilities tab on the right side and click "Browse the file system" to see the list of files in your HDFS.