Description: HDFS is the storage foundation of Hadoop distributed computing. HDFS is highly fault tolerant, can be deployed on commodity hardware, suits data-intensive applications, and provides high throughput for reading and writing data. HDFS provides scalable access to data: a large number of concurrent client accesses can be handled simply by adding nodes to the cluster. HDFS supports a traditional hierarchical file organization similar to existing file systems; for example, files can be created, deleted, and renamed.

FAULT TOLERANCE IN HDFS. PART I: TYPES OF FAULTS AND THEIR DETECTION

There are typically three kinds of faults.

FAULT I: NODE FAILURE. The first is node failure. ("Goodbye cruel world.")
FAULT II: COMMUNICATION FAILURE. The second is communication failure: nodes cannot send and receive data. ("Where is everybody?")
FAULT III: DATA CORRUPTION. The third is data corruption. Data can be corrupted while it is sent over the network, or while it is stored on hard disks.

DETECTION #1: NODE FAILURES

If the namenode is dead, the entire cluster is dead! The namenode is the SINGLE POINT OF FAILURE. Instead, let's focus on how datanode failures are detected.

DETECTING DATANODE FAILURE

Datanodes: We send a HEARTBEAT message every 3 seconds. This is our way of saying we are alive.
Namenode: If I don't get a message in 10 minutes, the datanode is dead to me.
Datanode: (I may be alive and there was only a network failure, but the namenode treats both as the same.)

DETECTION #2: NETWORK FAILURES

Whenever data is sent, an ACK is replied by the receiver. If the ACK is not received (after several retries), the sender assumes that the host is dead or the network has failed.

DETECTION #3: CORRUPTED DATA

A checksum is sent along with the transmitted data.
Datanode: Moreover, when I store data on hard disks, I also store the checksum.

DETECTING CORRUPTED HARD DRIVES

Periodically, all datanodes send a BLOCK REPORT (a list of all the blocks they have) to the namenode.
Datanode: Before sending the block report, I check if the checksums are OK. I don't send info for blocks that are corrupted.

RECAP: HEARTBEAT MESSAGES AND BLOCK REPORTS

HEARTBEAT: We send heartbeats every 3 seconds to say we are alive.
BLOCK REPORT: We send block reports, and we skip blocks that are corrupted (which is how the namenode will know which blocks are lost).
Datanode: I have four blocks.
Namenode: I thought he had five blocks, so one block is corrupted.
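
The detection rules in Part I (a heartbeat every 3 seconds, a 10-minute timeout, and block reports that skip corrupted blocks) can be sketched in a few lines of Python. This is only an illustrative toy, not HDFS code: the class and function names are hypothetical, and CRC32 from the standard library stands in for whatever checksum HDFS really uses.

```python
import zlib

HEARTBEAT_INTERVAL_S = 3   # from the text: a heartbeat every 3 seconds
DEAD_AFTER_S = 10 * 60     # from the text: no message in 10 minutes means dead


class Namenode:
    """Toy namenode that remembers when each datanode last sent a heartbeat."""

    def __init__(self):
        self.last_heartbeat = {}  # datanode id -> time of last heartbeat

    def receive_heartbeat(self, datanode_id, now):
        self.last_heartbeat[datanode_id] = now

    def dead_datanodes(self, now):
        # A crashed datanode and an unreachable one look the same from here.
        return sorted(dn for dn, seen in self.last_heartbeat.items()
                      if now - seen > DEAD_AFTER_S)


def checksum(data: bytes) -> int:
    # Assumption: CRC32 is used purely for illustration.
    return zlib.crc32(data)


def block_report(stored_blocks):
    """Return ids of blocks whose data still matches the saved checksum.

    stored_blocks maps block id -> (data, checksum saved at write time).
    Corrupted blocks are skipped, so they are missing from the report.
    """
    return sorted(bid for bid, (data, saved) in stored_blocks.items()
                  if checksum(data) == saved)


def lost_blocks(expected, reported):
    # The namenode compares the blocks it expected with the blocks reported.
    return sorted(set(expected) - set(reported))


if __name__ == "__main__":
    nn = Namenode()
    nn.receive_heartbeat("DN1", now=0)
    nn.receive_heartbeat("DN2", now=0)
    nn.receive_heartbeat("DN1", now=599)
    print(nn.dead_datanodes(now=601))  # only DN2 has been silent too long

    blocks = {i: (b"data%d" % i, checksum(b"data%d" % i)) for i in range(1, 6)}
    blocks[3] = (b"dAta3", blocks[3][1])  # simulate a flipped bit on disk
    print(lost_blocks(range(1, 6), block_report(blocks)))
```

The namenode in this sketch never inspects block contents itself. It only notices that a block it expected is absent from the report, which mirrors the "I thought he had five blocks" exchange in the recap.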

FAULT TOLERANCE IN HDFS. PART II: HANDLING READING AND WRITING FAILURES

HANDLING WRITE FAILURES

Client: One thing I should have said earlier: I write the block in smaller data units (usually 64KB) called "packets". Remember the replication pipeline? Moreover, each datanode replies back with an ACK for each packet to confirm that they got it. So, if I don't get ACKs from some datanode, I know it is dead, and I adjust the pipeline to skip him.

Here's the adjusted pipeline. Note that the block will be "under replicated", but the namenode will take care of that later on.

HANDLING READ FAILURES

Client: Remember when I asked for the location of a block, the namenode gave me the locations of all datanodes (DN1, DN2, DN3)? If one datanode is dead, I read from the others in the list.
(First attempt: "Got data? No." Next datanode in the list: "Got data.")

FAULT TOLERANCE IN HDFS. PART III: HANDLING DATANODE FAILURES

Namenode: First, I must tell you about the two tables I keep.

List of blocks:
  Block 1 - stored at DN1, DN2, DN3
  Block 2 - stored at DN1, DN4, DN5
List of datanodes:
  Datanode 1 - has blocks 1, 2
  Datanode 2 - has blocks 1, 5

Namenode: I continuously update these two tables. If I find that a block on a datanode is corrupted, I update the first table (by removing the bad DN from the block's list). And if I find that a datanode has died, I update both tables.

UNDER REPLICATED BLOCKS

Namenode: I scan the first list (the list of blocks) periodically and see if there are blocks that are not replicated properly. These are called "under replicated" blocks. For all under-replicated blocks, I ask other datanodes to copy them from datanodes that have the replica, like so:
Namenode to a datanode: Could you copy the block from that datanode?
Datanode to the datanode holding the replica: Hey, I need to copy a block from you.

Client: Umm, one more question: all of this works if there is at least one valid copy of the block somewhere, right?
Namenode: That's correct.
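
The two tables the namenode keeps, and its periodic scan for under-replicated blocks, can be sketched as follows. This is a minimal toy under stated assumptions, not how HDFS implements it: the class name `BlockTables`, its methods, and the target of three replicas per block are all hypothetical choices made for illustration.

```python
REPLICATION_TARGET = 3  # assumed target; the text does not state a number


class BlockTables:
    """Toy version of the namenode's two tables."""

    def __init__(self):
        self.block_to_datanodes = {}  # first table: block -> set of datanodes
        self.datanode_to_blocks = {}  # second table: datanode -> set of blocks

    def add_replica(self, block, datanode):
        self.block_to_datanodes.setdefault(block, set()).add(datanode)
        self.datanode_to_blocks.setdefault(datanode, set()).add(block)

    def mark_corrupted(self, block, datanode):
        # Per the text: a corrupted block only changes the first table.
        self.block_to_datanodes.get(block, set()).discard(datanode)

    def mark_dead(self, datanode):
        # Per the text: a dead datanode changes both tables.
        for block in self.datanode_to_blocks.pop(datanode, set()):
            self.block_to_datanodes[block].discard(datanode)

    def under_replicated(self):
        # The periodic scan over the first table.
        return sorted(block for block, dns in self.block_to_datanodes.items()
                      if len(dns) < REPLICATION_TARGET)

    def replication_orders(self):
        """Return (block, source, target) copy requests for the scan result."""
        orders = []
        live = sorted(self.datanode_to_blocks)
        for block in self.under_replicated():
            holders = self.block_to_datanodes[block]
            if not holders:
                continue  # no valid copy survives, so nothing can be copied
            source = min(holders)
            candidates = [dn for dn in live if dn not in holders]
            missing = REPLICATION_TARGET - len(holders)
            for target in candidates[:missing]:
                orders.append((block, source, target))
        return orders


if __name__ == "__main__":
    tables = BlockTables()
    for dn in ("DN1", "DN2", "DN3"):
        tables.add_replica(1, dn)
    for dn in ("DN1", "DN4", "DN5"):
        tables.add_replica(2, dn)
    tables.mark_dead("DN2")
    print(tables.under_replicated())
    print(tables.replication_orders())
```

The `continue` branch is the case raised in the last question above: when no holder of a block is left, the sketch has no source to copy from.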

HDFS cannot guarantee that at least one replica will always survive. But it tries its best by smartly selecting replica locations, as we will see next.

(The datanode holding the replica answers the copy request: "Here you go.")

REPLICA PLACEMENT STRATEGY

Namenode: Remember I promised to tell you how I select datanode locations for storing the replicas of a block? Hang tight, here it goes.

RACKS AND DATANODES

The cluster is divided into racks (Rack 1, Rack 2, Rack 3). Each rack has multiple datanodes.

SELECTING FIRST REPLICA LOCATION

The first replica location is simple. If the writer is a member of the cluster, it is selected as the first replica. Otherwise, some random datanode is selected.

NEXT TWO REPLICA LOCATIONS

Pick a different rack than the first replica's. Select two different datanodes on that rack.

SUBSEQUENT REPLICA LOCATIONS

Pick any random datanode if it satisfies these two conditions: only one replica per datanode, and at most two replicas per rack.

Please note the fine print: sometimes those two conditions cannot be satisfied, in which case they are, ahem, ignored (convenient, eh?). Also, HDFS allows you to use your own placement algorithm. So if you know a better algorithm, don't be shy now.

WHERE TO GO FROM HERE?

Namenode: I do a lot of other things as well. Read more about me at websites and books.
Datanodes: We do more than store data. We can run "Map-Reduce" jobs. Read about Map-Reduce in our next comics.
Or best of all, install and run HDFS and see for yourself!

THE END
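
As a closing illustration, the rules from REPLICA PLACEMENT STRATEGY can be written as a small Python function. This is a hedged sketch, not the HDFS placement policy itself: the function name `place_replicas`, its parameters, and the rack layout in the usage example are hypothetical. It also makes one assumption the comic does not spell out: when the rack limit cannot be met it is relaxed, but a datanode never receives a second replica, so placement stops early if the cluster runs out of datanodes.

```python
import random
from collections import Counter


def place_replicas(racks, n_replicas, writer=None, rng=random):
    """Choose datanodes for the replicas of one block (n_replicas >= 1).

    racks maps a rack name to the list of datanode names on that rack.
    """
    rack_of = {dn: rack for rack, dns in racks.items() for dn in dns}
    all_nodes = sorted(rack_of)

    # First replica: the writer if it is a cluster member, else a random node.
    first = writer if writer in rack_of else rng.choice(all_nodes)
    chosen = [first]

    # Next two replicas: two different datanodes on one other rack.
    remote = [r for r in sorted(racks)
              if r != rack_of[first] and len(racks[r]) >= 2]
    if remote and n_replicas > 1:
        rack = rng.choice(remote)
        chosen += rng.sample(racks[rack], min(2, n_replicas - 1))

    # Subsequent replicas: random datanodes, one replica per datanode and
    # at most two replicas per rack.
    while len(chosen) < n_replicas:
        per_rack = Counter(rack_of[dn] for dn in chosen)
        unused = [dn for dn in all_nodes if dn not in chosen]
        if not unused:
            break  # assumption: never put two replicas on one datanode
        allowed = [dn for dn in unused if per_rack[rack_of[dn]] < 2]
        # Fine print: if the rack limit cannot be met, it is ignored.
        chosen.append(rng.choice(allowed or unused))
    return chosen


if __name__ == "__main__":
    cluster = {
        "rack1": ["DN1", "DN2", "DN3"],
        "rack2": ["DN4", "DN5", "DN6"],
        "rack3": ["DN7", "DN8", "DN9"],
    }
    print(place_replicas(cluster, 3, writer="DN1"))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which is convenient when experimenting with alternative placement rules.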



