

Revision History

Overview

IBM Spectrum Scale HDFS transparency (also known as the HDFS protocol) offers a set of interfaces that allow applications to use the HDFS client to access IBM Spectrum Scale through RPC requests. All data transmission and metadata operations in HDFS go through RPC and are processed by the NameNode and DataNode services within HDFS. The IBM Spectrum Scale HDFS protocol implementation integrates both the NameNode and DataNode services and responds to requests as HDFS does.

Advantages of HDFS transparency are as follows:

- HDFS-compliant APIs and shell-interface commands
- Application client isolation from storage: the application client can access IBM Spectrum Scale without a GPFS client installed
- Improved security management through Kerberos authentication and RPC encryption
- Simplified file system monitoring through Hadoop Metrics2 integration

Figure 1 shows the framework of HDFS transparency over Spectrum Scale.

Figure 1: Spectrum Scale HDFS transparency framework

2. Supported Spectrum Scale storage mode

2.1. Local Storage mode

HDFS transparency allows big data applications to access IBM Spectrum Scale in local storage mode, that is, File Placement Optimizer (FPO) mode (since gpfs.hdfs-protocol ), and enables support for shared storage mode (such as SAN-based storage or ESS).

In FPO mode, data blocks are stored in chunks in IBM Spectrum Scale and are replicated to protect against disk or node failure. DFS clients run on the storage nodes, so they can use data locality to execute tasks quickly. In this storage mode, short-circuit read is recommended to improve access efficiency.

Figure 2 illustrates HDFS transparency over FPO.

Figure 2: HDFS transparency over Spectrum Scale FPO (three physical nodes, each running a Hadoop service on top of GPFS, joined in one GPFS cluster)

2.2. Shared Storage mode

Shared storage mode allows big data applications to access data stored in shared storage (such as SAN-based storage or ESS). In this mode, data is stored in SAN storage, which offers better storage efficiency than local storage. RAID and other technologies can be used to protect against hardware failure instead of replication.

The DFS client accesses data through HDFS protocol RPC. When a DFS client requests to write blocks to Spectrum Scale, the NameNode selects a DataNode at random for the request; in particular, when the DFS client runs on a DataNode, that node is selected. When a DFS client requests the block locations (getblocklocation) of an existing block, the NameNode selects three DataNodes at random for the request.

HDFS transparency allows Hadoop applications to access data stored in both the local Spectrum Scale file system and remote Spectrum Scale file systems from multiple clusters.
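The DataNode selection described for shared storage mode can be illustrated with a minimal sketch. This is not the product's implementation: the function names `choose_write_target` and `choose_block_locations`, the host names, and the use of Python's `random` module are all assumptions made for illustration. It only mirrors the stated behavior: a write goes to the client's own node when the client runs on a DataNode and to a random DataNode otherwise, while a block-location request returns three randomly chosen DataNodes.

```python
import random
from typing import List, Optional, Sequence


def choose_write_target(datanodes: Sequence[str],
                        client_host: Optional[str] = None,
                        rng: Optional[random.Random] = None) -> str:
    """Pick the DataNode that handles a block write (hypothetical helper)."""
    if not datanodes:
        raise ValueError("no DataNodes available")
    rng = rng or random.Random()
    # A client that runs on a DataNode is served by that same node.
    if client_host in datanodes:
        return client_host
    # Otherwise pick any DataNode at random (assumption: every DataNode
    # can reach the shared storage, so the choice does not affect placement).
    return rng.choice(list(datanodes))


def choose_block_locations(datanodes: Sequence[str],
                           count: int = 3,
                           rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to `count` distinct DataNodes to report for an existing block."""
    rng = rng or random.Random()
    # sample() returns distinct nodes; cap at the cluster size for small clusters.
    return rng.sample(list(datanodes), min(count, len(datanodes)))


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical host names
    print(choose_write_target(nodes, client_host="dn2"))
    print(choose_block_locations(nodes))
```

Because the reported locations are random rather than tied to physical replicas, a scheduler that reads them gets load spreading across DataNodes rather than true data locality, which fits a mode where all nodes reach the same shared storage.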
