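Loading all the data into memory at once can sometimes take too much space. Instead, we can convert the data into the standard RecordIO file format and read it back to feed our network. Below I record how to save data as RecordIO and how to read it.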
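To show why a record file can be read without loading everything into memory, here is a minimal plain-Python sketch. It uses a simplified length-prefixed layout chosen only for illustration (a 4-byte length followed by the payload). This is an assumption and is not Paddle's actual RecordIO format, and the helpers `write_records` and `read_records` are hypothetical.

```python
import io
import struct

# Simplified record layout (an illustrative assumption, NOT Paddle's RecordIO):
# each record is a 4-byte little-endian length followed by that many bytes.

def write_records(stream, payloads):
    """Append each bytes payload to the stream as one length-prefixed record."""
    for payload in payloads:
        stream.write(struct.pack("<I", len(payload)))
        stream.write(payload)

def read_records(stream):
    """Yield records one by one, so only one record is in memory at a time."""
    while True:
        header = stream.read(4)
        if len(header) < 4:  # end of file (or a truncated header)
            return
        (length,) = struct.unpack("<I", header)
        yield stream.read(length)

buf = io.BytesIO()
write_records(buf, [b"sample-0", b"sample-1", b"sample-2"])
buf.seek(0)
for record in read_records(buf):
    print(record)
```

Because `read_records` is a generator, the reader pulls one record at a time instead of materializing the whole dataset.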
The official documentation gives this example:
```python
import numpy
import paddle
import paddle.fluid as fluid

def reader_creator():
    def __impl__():
        for i in range(1000):
            # numpy.random.random() takes no dtype argument, so cast with astype()
            yield [
                numpy.random.random(size=[3, 224, 224]).astype("float32"),
                numpy.random.random(size=[1]).astype("int64")
            ]
    return __impl__

img = fluid.layers.data(name="image", shape=[3, 224, 224])
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
feeder = fluid.DataFeeder(feed_list=[img, label], place=fluid.CPUPlace())

BATCH_SIZE = 32
reader = paddle.batch(reader_creator(), batch_size=BATCH_SIZE)
fluid.recordio_writer.convert_reader_to_recordio_file(
    "train.recordio", feeder=feeder, reader_creator=reader)
```
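In the example above, `reader_creator()` returns a reader (a function that yields one sample at a time), and `paddle.batch` wraps it so that each item becomes a list of `BATCH_SIZE` samples. The sketch below models that grouping in plain Python. The `batch` function is a hypothetical stand-in, not Paddle's source; it assumes the last incomplete batch is kept unless `drop_last` is set, and `reader_creator(n)` here is a variant that takes the sample count and yields placeholder values.

```python
def reader_creator(n):
    """Return a reader: a zero-argument function yielding one sample at a time."""
    def __impl__():
        for i in range(n):
            # Placeholder sample [image, label]; real code would yield arrays.
            yield [float(i), i % 10]
    return __impl__

def batch(reader, batch_size, drop_last=False):
    """Hypothetical sketch of batching: group consecutive samples into lists."""
    def batch_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf and not drop_last:
            yield buf  # final, smaller batch
    return batch_reader

batches = list(batch(reader_creator(1000), batch_size=32)())
print(len(batches), len(batches[-1]))  # 32 8  (1000 = 31 * 32 + 8)
```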
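At first glance this looks reasonable and easy to follow, but after saving data this way I ran into a problem when reading it back. Normally we feed [3,244,244] images into a convolution layer, but this raised an error saying the input must have at least 4 dimensions. As in TensorFlow, the first dimension is the batch dimension. So what should we do? My approach: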
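The error comes from a shape rule: a convolution layer works on 4-D `[N, C, H, W]` tensors, while a single image is only `[C, H, W]`, so a leading batch dimension is required, with -1 standing for a batch size not known in advance. Below is a plain-Python model of that bookkeeping using hypothetical helpers `add_batch_dim` and `check_conv2d_input`; it is an illustration, not Paddle's own check.

```python
def add_batch_dim(sample_shape, batch_size=-1):
    """Prepend a batch dimension; -1 marks a batch size not known in advance."""
    return [batch_size] + list(sample_shape)

def check_conv2d_input(shape):
    """Hypothetical check: a 2-D convolution needs a 4-D [N, C, H, W] input."""
    if len(shape) != 4:
        raise ValueError("conv2d expects a 4-D [N, C, H, W] input, got %r" % (shape,))
    return shape

sample = [3, 244, 244]            # one image: channels, height, width
declared = add_batch_dim(sample)  # [-1, 3, 244, 244]
check_conv2d_input(declared)      # accepted

try:
    check_conv2d_input(sample)    # a 3-D shape is rejected
except ValueError as err:
    print(err)
```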
```python
import paddle
import paddle.fluid as fluid

# src_im_test, test_desmap and test_num are the test data arrays, loaded elsewhere
def reader_creator():
    def __impl__():
        for i in range(len(src_im_test)):
            yield [
                src_im_test[i],  # shape=[3,244,244]
                test_desmap[i],  # shape=[1,244,244], matching the "label" layer below
                test_num[i]      # shape=[1]
            ]
    return __impl__

img = fluid.layers.data(name="image", shape=[-1, 3, 244, 244])  # note the -1 here
label = fluid.layers.data(name="label", shape=[-1, 1, 244, 244])
num = fluid.layers.data(name="num", shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[img, label, num], place=fluid.CPUPlace())

reader = paddle.batch(reader_creator(), batch_size=1)
fluid.recordio_writer.convert_reader_to_recordio_file(
    "train.recordio", feeder=feeder, reader_creator=reader)
```
Here batch_size is set to 1, so any batch size can be chosen later when reading.
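Saving with `batch_size=1` means every record holds one sample with a leading dimension of 1, so regrouping at read time amounts to concatenating records along that first dimension. A plain-Python sketch with nested lists, using hypothetical helpers `concat_first_dim` and `rebatch`; it assumes a final partial batch is kept, which may differ from what Paddle does.

```python
def concat_first_dim(records):
    """Concatenate nested-list records along their first dimension."""
    out = []
    for rec in records:
        out.extend(rec)  # rec has shape [1, ...]; its rows are appended to out
    return out

def rebatch(records, batch_size):
    """Group one-sample records into batches of batch_size (final partial batch kept)."""
    for start in range(0, len(records), batch_size):
        yield concat_first_dim(records[start:start + batch_size])

# 20 records, each shaped [1, 1] like one saved "num" value with batch_size=1
records = [[[i]] for i in range(20)]
batches = list(rebatch(records, 9))
print([len(b) for b in batches])  # [9, 9, 2]
```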
The official reading code is:
```python
import paddle.fluid as fluid

file_obj = fluid.layers.open_files(
    filenames=["train.recordio"],
    shapes=[[3, 224, 224], [1]],  # the parameter is "shapes", not "shape"
    lod_levels=[0, 0],
    dtypes=["float32", "int64"],
    pass_num=100
)
image, label = fluid.layers.read_file(file_obj)
```
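This corresponds to the official writing example above, but as mentioned, it causes a problem (in my case the data could not be fed into a convolution layer). Below is my reading code, which matches my writing code above: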
```python
import paddle.fluid as fluid

file_obj = fluid.layers.open_files(
    filenames=["train.recordio"],
    shapes=[[-1, 3, 244, 244], [-1, 1, 244, 244], [-1, 1]],
    dtypes=['float32', 'float32', 'int64'],
    lod_levels=[0, 0, 0],
)
file_obj = fluid.layers.batch(file_obj, batch_size=9)
img, des_im, total_num = fluid.layers.read_file(file_obj)  # these can go straight into the network
conv1 = fluid.layers.conv2d(img, 1, 1)  # without the -1 when saving, this line raises an error
loss = fluid.layers.reduce_mean(
    fluid.layers.square_error_cost(input=conv1, label=des_im))

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
loss_v, = exe.run(fetch_list=[loss])
print("loss is {}".format(loss_v))
```
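In the `shapes` passed to `open_files` above, -1 marks the batch dimension, whose size is only known once data is read. The hypothetical helper `shape_compatible` below illustrates that convention by treating -1 as a wildcard when comparing a declared shape with an actual one; it is an illustration, not how Paddle validates shapes internally. The batch of 9 corresponds to `batch_size=9` above.

```python
def shape_compatible(declared, actual):
    """True if ranks are equal and dims agree everywhere except -1 wildcards."""
    if len(declared) != len(actual):
        return False
    return all(d == -1 or d == a for d, a in zip(declared, actual))

declared_shapes = [[-1, 3, 244, 244], [-1, 1, 244, 244], [-1, 1]]
# Shapes of one batch of 9 as read back from the file
batch_shapes = [[9, 3, 244, 244], [9, 1, 244, 244], [9, 1]]
print(all(shape_compatible(d, a) for d, a in zip(declared_shapes, batch_shapes)))  # True
```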
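Note that the data read from the file is already in a format the network accepts. If you are curious, print these variables together with the ones created by fluid.layers.data and you will see they are the same type.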
Result:
loss is [190564.94]
Note that the result is a numpy array. This is controlled by the return_numpy argument of exe.run: passing return_numpy=False returns LoDTensor objects instead.
Let's also fetch the count:
```python
num, = exe.run(fetch_list=[total_num])
print("num is {}".format(num))
```
Result:
num is [[17] [13] [61] [9] [8] [9] [17] [29] [9]]
Again a numpy array is returned, and it shows how the batch was assembled: nine samples of shape [1] stacked into a [9, 1] array.
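The printed `num` can be read as a regular nested array of nine rows with one value each. The hypothetical helper `nested_shape` below computes the shape of such a nested list, applied to the values copied from the output above:

```python
def nested_shape(x):
    """Shape of a regular nested list, e.g. [[1], [2]] -> [2, 1]."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None  # descend into the first element
    return shape

num = [[17], [13], [61], [9], [8], [9], [17], [29], [9]]  # copied from the output above
print(nested_shape(num))  # [9, 1]: a batch of 9, each sample of shape [1]
```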