原文連接:http://blog.csdn.net/sprintfwater/article/details/8996214node
1.創建、關閉與HDFS鏈接:hdfsConnect()、hdfsConnectAsUser()、hdfsDisconnect()。hdfsConnect()其實是直接調用hdfsConnectAsUser。express
2.打開、關閉HDFS文件:hdfsOpenFile()、hdfsCloseFile()。當用hdfsOpenFile()建立文件時,能夠指定replication和blocksize參數。寫打開一個文件時,隱含O_TRUNC標誌,文件會被截斷,寫入是從文件頭開始的。apache
3.讀HDFS文件:hdfsRead()、hdfsPread()。兩個函數都有可能返回少於用戶要求的字節數,此時能夠再次調用這兩個函數讀入剩下的部分(相似APUE中的readn實現);只有在兩個函數返回零時,咱們才能判定到了文件末尾。app
4.寫HDFS文件:hdfsWrite()。HDFS不支持隨機寫,只能是從文件頭順序寫入。less
5.查詢HDFS文件信息:hdfsGetPathInfo()函數
6.查詢和設置HDFS文件讀寫偏移量:hdfsSeek()、hdfsTell()oop
7.查詢數據塊所在節點信息:hdfsGetHosts()。返回一個或多個數據塊所在數據節點的信息,一個數據塊可能存在多個數據節點上。性能
8.libhdfs中的函數是經過jni調用JAVA虛擬機,在虛擬機中構造對應的HDFS的JAVA類,而後反射調用該類的功能函數。總會發生JVM和程序之間內存拷貝的動做,性能方面值得注意。ui
9.HDFS不支持多個客戶端同時寫入的操做,無文件或是記錄鎖的概念。this
10.建議只有超大文件才應該考慮放在HDFS上,並且最好對文件的訪問是寫一次,讀屢次。小文件不該該考慮放在HDFS上,得不償失!
1 /** 2 * Licensed to the Apache Software Foundation (ASF) under one 3 * or more contributor license agreements. See the NOTICE file 4 * distributed with this work for additional information 5 * regarding copyright ownership. The ASF licenses this file 6 * to you under the Apache License, Version 2.0 (the 7 * "License"); you may not use this file except in compliance 8 * with the License. You may obtain a copy of the License at 9 * 10 * http://www.apache.org/licenses/LICENSE-2.0 11 * 12 * Unless required by applicable law or agreed to in writing, software 13 * distributed under the License is distributed on an "AS IS" BASIS, 14 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 * See the License for the specific language governing permissions and 16 * limitations under the License. 17 */ 18 19 #ifndef LIBHDFS_HDFS_H 20 #define LIBHDFS_HDFS_H 21 22 #include <sys/types.h> 23 #include <sys/stat.h> 24 25 #include <fcntl.h> 26 #include <stdio.h> 27 #include <stdint.h> 28 #include <string.h> 29 #include <stdlib.h> 30 #include <time.h> 31 #include <errno.h> 32 33 #include <jni.h> 34 35 #ifndef O_RDONLY 36 #define O_RDONLY 1 37 #endif 38 39 #ifndef O_WRONLY 40 #define O_WRONLY 2 41 #endif 42 43 #ifndef EINTERNAL 44 #define EINTERNAL 255 45 #endif 46 47 48 /** All APIs set errno to meaningful values */ 49 #ifdef __cplusplus 50 extern "C" { 51 #endif 52 53 /** 54 * Some utility decls used in libhdfs. 55 */ 56 57 typedef int32_t tSize; /// size of data for read/write io ops 58 typedef time_t tTime; /// time type 59 typedef int64_t tOffset;/// offset within the file 60 typedef uint16_t tPort; /// port 61 typedef enum tObjectKind { 62 kObjectKindFile = 'F', 63 kObjectKindDirectory = 'D', 64 } tObjectKind; 65 66 67 /** 68 * The C reflection of org.apache.org.hadoop.FileSystem . 69 */ 70 typedef void* hdfsFS; 71 72 73 /** 74 * The C equivalent of org.apache.org.hadoop.FSData(Input|Output)Stream . 75 */ 76 enum hdfsStreamType 77 { 78 UNINITIALIZED = 0, 79 INPUT = 1, 80 OUTPUT = 2, 81 }; 82 83 84 /** 85 * The 'file-handle' to a file in hdfs. 86 */ 87 struct hdfsFile_internal { 88 void* file; 89 enum hdfsStreamType type; 90 }; 91 typedef struct hdfsFile_internal* hdfsFile; 92 93 94 /** 95 * hdfsConnect - Connect to a hdfs file system. 96 * Connect to the hdfs. 97 * @param host A string containing either a host name, or an ip address 98 * of the namenode of a hdfs cluster. 'host' should be passed as NULL if 99 * you want to connect to local filesystem. 'host' should be passed as 100 * 'default' (and port as 0) to used the 'configured' filesystem 101 * (hadoop-site/hadoop-default.xml). 102 * @param port The port on which the server is listening. 103 * @return Returns a handle to the filesystem or NULL on error. 104 */ 105 hdfsFS hdfsConnect(const char* host, tPort port); 106 107 108 /** 109 * hdfsDisconnect - Disconnect from the hdfs file system. 110 * Disconnect from hdfs. 111 * @param fs The configured filesystem handle. 112 * @return Returns 0 on success, -1 on error. 113 */ 114 int hdfsDisconnect(hdfsFS fs); 115 116 117 /** 118 * hdfsOpenFile - Open a hdfs file in given mode. 119 * @param fs The configured filesystem handle. 120 * @param path The full path to the file. 121 * @param flags Either O_RDONLY or O_WRONLY, for read-only or write-only. 122 * @param bufferSize Size of buffer for read/write - pass 0 if you want 123 * to use the default configured values. 124 * @param replication Block replication - pass 0 if you want to use 125 * the default configured values. 126 * @param blocksize Size of block - pass 0 if you want to use the 127 * default configured values. 128 * @return Returns the handle to the open file or NULL on error. 129 */ 130 hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags, 131 int bufferSize, short replication, tSize blocksize); 132 133 134 /** 135 * hdfsCloseFile - Close an open file. 136 * @param fs The configured filesystem handle. 137 * @param file The file handle. 138 * @return Returns 0 on success, -1 on error. 139 */ 140 int hdfsCloseFile(hdfsFS fs, hdfsFile file); 141 142 143 /** 144 * hdfsExists - Checks if a given path exsits on the filesystem 145 * @param fs The configured filesystem handle. 146 * @param path The path to look for 147 * @return Returns 0 on success, -1 on error. 148 */ 149 int hdfsExists(hdfsFS fs, const char *path); 150 151 152 /** 153 * hdfsSeek - Seek to given offset in file. 154 * This works only for files opened in read-only mode. 155 * @param fs The configured filesystem handle. 156 * @param file The file handle. 157 * @param desiredPos Offset into the file to seek into. 158 * @return Returns 0 on success, -1 on error. 159 */ 160 int hdfsSeek(hdfsFS fs, hdfsFile file, tOffset desiredPos); 161 162 163 /** 164 * hdfsTell - Get the current offset in the file, in bytes. 165 * @param fs The configured filesystem handle. 166 * @param file The file handle. 167 * @return Current offset, -1 on error. 168 */ 169 tOffset hdfsTell(hdfsFS fs, hdfsFile file); 170 171 172 /** 173 * hdfsRead - Read data from an open file. 174 * @param fs The configured filesystem handle. 175 * @param file The file handle. 176 * @param buffer The buffer to copy read bytes into. 177 * @param length The length of the buffer. 178 * @return Returns the number of bytes actually read, possibly less 179 * than than length;-1 on error. 180 */ 181 tSize hdfsRead(hdfsFS fs, hdfsFile file, void* buffer, tSize length); 182 183 184 /** 185 * hdfsPread - Positional read of data from an open file. 186 * @param fs The configured filesystem handle. 187 * @param file The file handle. 188 * @param position Position from which to read 189 * @param buffer The buffer to copy read bytes into. 190 * @param length The length of the buffer. 191 * @return Returns the number of bytes actually read, possibly less than 192 * than length;-1 on error. 193 */ 194 tSize hdfsPread(hdfsFS fs, hdfsFile file, tOffset position, 195 void* buffer, tSize length); 196 197 198 /** 199 * hdfsWrite - Write data into an open file. 200 * @param fs The configured filesystem handle. 201 * @param file The file handle. 202 * @param buffer The data. 203 * @param length The no. of bytes to write. 204 * @return Returns the number of bytes written, -1 on error. 205 */ 206 tSize hdfsWrite(hdfsFS fs, hdfsFile file, const void* buffer, 207 tSize length); 208 209 210 /** 211 * hdfsWrite - Flush the data. 212 * @param fs The configured filesystem handle. 213 * @param file The file handle. 214 * @return Returns 0 on success, -1 on error. 215 */ 216 int hdfsFlush(hdfsFS fs, hdfsFile file); 217 218 219 /** 220 * hdfsAvailable - Number of bytes that can be read from this 221 * input stream without blocking. 222 * @param fs The configured filesystem handle. 223 * @param file The file handle. 224 * @return Returns available bytes; -1 on error. 225 */ 226 int hdfsAvailable(hdfsFS fs, hdfsFile file); 227 228 229 /** 230 * hdfsCopy - Copy file from one filesystem to another. 231 * @param srcFS The handle to source filesystem. 232 * @param src The path of source file. 233 * @param dstFS The handle to destination filesystem. 234 * @param dst The path of destination file. 235 * @return Returns 0 on success, -1 on error. 236 */ 237 int hdfsCopy(hdfsFS srcFS, const char* src, hdfsFS dstFS, const char* dst); 238 239 240 /** 241 * hdfsMove - Move file from one filesystem to another. 242 * @param srcFS The handle to source filesystem. 243 * @param src The path of source file. 244 * @param dstFS The handle to destination filesystem. 245 * @param dst The path of destination file. 246 * @return Returns 0 on success, -1 on error. 247 */ 248 int hdfsMove(hdfsFS srcFS, const char* src, hdfsFS dstFS, const char* dst); 249 250 251 /** 252 * hdfsDelete - Delete file. 253 * @param fs The configured filesystem handle. 254 * @param path The path of the file. 255 * @return Returns 0 on success, -1 on error. 256 */ 257 int hdfsDelete(hdfsFS fs, const char* path); 258 259 260 /** 261 * hdfsRename - Rename file. 262 * @param fs The configured filesystem handle. 263 * @param oldPath The path of the source file. 264 * @param newPath The path of the destination file. 265 * @return Returns 0 on success, -1 on error. 266 */ 267 int hdfsRename(hdfsFS fs, const char* oldPath, const char* newPath); 268 269 270 /** 271 * hdfsGetWorkingDirectory - Get the current working directory for 272 * the given filesystem. 273 * @param fs The configured filesystem handle. 274 * @param buffer The user-buffer to copy path of cwd into. 275 * @param bufferSize The length of user-buffer. 276 * @return Returns buffer, NULL on error. 277 */ 278 char* hdfsGetWorkingDirectory(hdfsFS fs, char *buffer, size_t bufferSize); 279 280 281 /** 282 * hdfsSetWorkingDirectory - Set the working directory. All relative 283 * paths will be resolved relative to it. 284 * @param fs The configured filesystem handle. 285 * @param path The path of the new 'cwd'. 286 * @return Returns 0 on success, -1 on error. 287 */ 288 int hdfsSetWorkingDirectory(hdfsFS fs, const char* path); 289 290 291 /** 292 * hdfsCreateDirectory - Make the given file and all non-existent 293 * parents into directories. 294 * @param fs The configured filesystem handle. 295 * @param path The path of the directory. 296 * @return Returns 0 on success, -1 on error. 297 */ 298 int hdfsCreateDirectory(hdfsFS fs, const char* path); 299 300 301 /** 302 * hdfsSetReplication - Set the replication of the specified 303 * file to the supplied value 304 * @param fs The configured filesystem handle. 305 * @param path The path of the file. 306 * @return Returns 0 on success, -1 on error. 307 */ 308 int hdfsSetReplication(hdfsFS fs, const char* path, int16_t replication); 309 310 311 /** 312 * hdfsFileInfo - Information about a file/directory. 313 */ 314 typedef struct { 315 tObjectKind mKind; /* file or directory */ 316 char *mName; /* the name of the file */ 317 tTime mLastMod; /* the last modification time for the file*/ 318 tOffset mSize; /* the size of the file in bytes */ 319 short mReplication; /* the count of replicas */ 320 tOffset mBlockSize; /* the block size for the file */ 321 } hdfsFileInfo; 322 323 324 /** 325 * hdfsListDirectory - Get list of files/directories for a given 326 * directory-path. hdfsFreeFileInfo should be called to deallocate memory. 327 * @param fs The configured filesystem handle. 328 * @param path The path of the directory. 329 * @param numEntries Set to the number of files/directories in path. 330 * @return Returns a dynamically-allocated array of hdfsFileInfo 331 * objects; NULL on error. 332 */ 333 hdfsFileInfo *hdfsListDirectory(hdfsFS fs, const char* path, 334 int *numEntries); 335 336 337 /** 338 * hdfsGetPathInfo - Get information about a path as a (dynamically 339 * allocated) single hdfsFileInfo struct. hdfsFreeFileInfo should be 340 * called when the pointer is no longer needed. 341 * @param fs The configured filesystem handle. 342 * @param path The path of the file. 343 * @return Returns a dynamically-allocated hdfsFileInfo object; 344 * NULL on error. 345 */ 346 hdfsFileInfo *hdfsGetPathInfo(hdfsFS fs, const char* path); 347 348 349 /** 350 * hdfsFreeFileInfo - Free up the hdfsFileInfo array (including fields) 351 * @param hdfsFileInfo The array of dynamically-allocated hdfsFileInfo 352 * objects. 353 * @param numEntries The size of the array. 354 */ 355 void hdfsFreeFileInfo(hdfsFileInfo *hdfsFileInfo, int numEntries); 356 357 358 /** 359 * hdfsGetHosts - Get hostnames where a particular block (determined by 360 * pos & blocksize) of a file is stored. The last element in the array 361 * is NULL. Due to replication, a single block could be present on 362 * multiple hosts. 363 * @param fs The configured filesystem handle. 364 * @param path The path of the file. 365 * @param start The start of the block. 366 * @param length The length of the block. 367 * @return Returns a dynamically-allocated 2-d array of blocks-hosts; 368 * NULL on error. 369 */ 370 char*** hdfsGetHosts(hdfsFS fs, const char* path, 371 tOffset start, tOffset length); 372 373 374 /** 375 * hdfsFreeHosts - Free up the structure returned by hdfsGetHosts 376 * @param hdfsFileInfo The array of dynamically-allocated hdfsFileInfo 377 * objects. 378 * @param numEntries The size of the array. 379 */ 380 void hdfsFreeHosts(char ***blockHosts); 381 382 383 /** 384 * hdfsGetDefaultBlockSize - Get the optimum blocksize. 385 * @param fs The configured filesystem handle. 386 * @return Returns the blocksize; -1 on error. 387 */ 388 tOffset hdfsGetDefaultBlockSize(hdfsFS fs); 389 390 391 /** 392 * hdfsGetCapacity - Return the raw capacity of the filesystem. 393 * @param fs The configured filesystem handle. 394 * @return Returns the raw-capacity; -1 on error. 395 */ 396 tOffset hdfsGetCapacity(hdfsFS fs); 397 398 399 /** 400 * hdfsGetUsed - Return the total raw size of all files in the filesystem. 401 * @param fs The configured filesystem handle. 402 * @return Returns the total-size; -1 on error. 403 */ 404 tOffset hdfsGetUsed(hdfsFS fs); 405 406 #ifdef __cplusplus 407 } 408 #endif 409 410 #endif /*LIBHDFS_HDFS_H*/