图像大小与参数个数:

前面几章都是针对小图像块处理的,这一章则是针对大图像进行处理的。两者在这的区别还是很明显的,小图像(如8*8,MINIST的28*28)可以采用全连接的方式(即输入层和隐含层直接相连)。但是大图像,这个将会变得很耗时:比如96*96的图像,若采用全连接方式,需要96*96个输入单元,然后如果要训练100个特征,只这一层就需要96*96*100个参数(W,b),训练时间将是前面的几百或者上万倍。所以这里用到了部分联通网络。对于图像来说,每个隐含单元仅仅连接输入图像的一小片相邻区域。

这样就引出了一个卷积的方法:

convolution:

自然图像有其固有特性,也就是说,图像的一部分的统计特性与其他部分是一样的。这也意味着我们在这一部分学习的特征也能用在另一部分上,所以对于这个图像上的所有位置,我们都能使用同样的学习特征。

对于图像,当从一个大尺寸图像中随机选取一小块,比如说8x8作为样本,并且从这个小块样本中学习到了一些特征,这时我们可以把从这个8x8样本中学习到的特征作为探测器,应用到这个图像的任意地方中去。特别是,我们可以用从8x8样本中所学习到的特征跟原本的大尺寸图像作卷积,从而对这个大尺寸图像上的任一位置获得一个不同特征的激活值。

讲义中举得具体例子,还是看例子容易理解:

假设你已经从一个96x96的图像中学习到了它的一个8x8的样本所具有的特征,假设这是由有100个隐含单元的自编码完成的。为了得到卷积特征,需要对96x96的图像的每个8x8的小块图像区域都进行卷积运算。也就是说,抽取8x8的小块区域,并且从起始坐标开始依次标记为(1,1),(1,2),...,一直到(89,89),然后对抽取的区域逐个运行训练过的稀疏自编码来得到特征的激活值。在这个例子里,显然可以得到100个集合,每个集合含有89x89个卷积特征。讲义中那个gif图更形象,这里不知道怎么添加进来...

最后,总结下convolution的处理过程:

假设给定了r * c的大尺寸图像,将其定义为xlarge。首先通过从大尺寸图像中抽取的a * b的小尺寸图像样本xsmall训练稀疏自编码,得到了k个特征(k为隐含层神经元数量),然后对于xlarge中的每个a*b大小的块,求激活值fs,然后对这些fs进行卷积。这样得到(r-a+1)*(c-b+1)*k个卷积后的特征矩阵。

pooling:

在通过卷积获得了特征(features)之后,下一步我们希望利用这些特征去做分类。理论上讲,人们可以把所有解析出来的特征关联到一个分类器,例如softmax分类器,但计算量非常大。例如:对于一个96X96像素的图像,假设我们已经通过8X8个输入学习得到了400个特征。而每一个卷积都会得到一个(96 − 8 + 1) * (96 − 8 + 1) = 7921的结果集,由于已经得到了400个特征,所以对于每个样例(example)结果集的大小就将达到892 * 400 = 3,168,400 个特征。这样学习一个拥有超过3百万特征的输入的分类器是相当不明智的,并且极易出现过度拟合(over-fitting).

所以就有了pooling这个方法,翻译作“池化”?感觉pooling这个英语单词还是挺形象的,翻译“作池”化就没那么形象了。其实也就是把特征图像区域的一部分求个均值或者最大值,用来代表这部分区域。如果是求均值就是mean pooling,求最大值就是max pooling。讲义中那个gif图也很形象,只是不知道这里怎么放gif图....

至于pooling为什么可以这样做,是因为:我们之所以决定使用卷积后的特征是因为图像具有一种“静态性”的属性,这也就意味着在一个图像区域有用的特征极有可能在另一个区域同样适用。因此,为了描述大的图像,一个很自然的想法就是对不同位置的特征进行聚合统计。这个均值或者最大值就是一种聚合统计的方法。

另外,如果人们选择图像中的连续范围作为池化区域,并且只是池化相同(重复)的隐藏单元产生的特征,那么,这些池化单元就具有平移不变性(translation invariant)。这就意味着即使图像经历了一个小的平移之后,依然会产生相同的(池化的)特征(这里有个小小的疑问,既然这样,是不是只能保证在池化大小的这块区域内具有平移不变性?)。在很多任务中(例如物体检测、声音识别),我们都更希望得到具有平移不变性的特征,因为即使图像经过了平移,样例(图像)的标记仍然保持不变。例如,如果你处理一个MNIST数据集的数字,把它向左侧或右侧平移,那么不论最终的位置在哪里,你都会期望你的分类器仍然能够精确地将其分类为相同的数字。

练习:

下面是讲义中的练习。用到了上一章的练习的结构(即在convolution过程中的第一步,用稀疏自编码对xsmall求k个特征)。

以下是主要程序:

主程序cnnExercise.m

%% CS294A/CS294W Convolutional Neural Networks Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  convolutional neural networks exercise. In this exercise, you will only
%  need to modify cnnConvolve.m and cnnPool.m. You will not need to modify
%  this file.

%%======================================================================
%% STEP : Initialization
%  Here we initialize some parameters used for the exercise.

imageDim = ;         % image dimension
imageChannels = ;     % number of channels (rgb, so )

patchDim = ;          % patch dimension
numPatches = ;    % number of patches

visibleSize = patchDim * patchDim * imageChannels;  % number of input units
outputSize = visibleSize;   % number of output units
hiddenSize = ;           % number of hidden units 

epsilon = 0.1;           % epsilon for ZCA whitening

poolDim = ;          % dimension of pooling region

%%======================================================================
%% STEP : Train a sparse autoencoder (with a linear decoder) to learn
%  features from color patches. If you have completed the linear decoder
%  execise, use the features that you have obtained from that exercise,
%  loading them into optTheta. Recall that we have to keep around the
%  parameters used in whitening (i.e., the ZCA whitening matrix and the
%  meanPatch)

% --------------------------- YOUR CODE HERE --------------------------
% Train the sparse autoencoder and fill the following variables with
% the optimal parameters:

%optTheta =  zeros(*hiddenSize*visibleSize+hiddenSize+visibleSize, );
%ZCAWhite =  zeros(visibleSize, visibleSize);
%meanPatch = zeros(visibleSize, );
load STL10Features.mat;

% --------------------------------------------------------------------

% Display and check to see that the features look good
W = reshape(optTheta(:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(*hiddenSize*visibleSize+:*hiddenSize*visibleSize+hiddenSize);

displayColorNetwork( (W*ZCAWhite)');

%%======================================================================
%% STEP : Implement and test convolution and pooling
%  In this step, you will implement convolution and pooling, and test them
%  on a small part of the data set to ensure that you have implemented
%  these two functions correctly. In the next step, you will actually
%  convolve and pool the features with the STL10 images.

%% STEP 2a: Implement convolution
%  Implement convolution in the function cnnConvolve in cnnConvolve.m

% Note that we have to preprocess the images in the exact same way
% we preprocessed the patches before we can obtain the feature activations.

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels

%% Use only the first  images for testing
convImages = trainImages(:, :, :, :); 

% NOTE: Implement cnnConvolve in cnnConvolve.m first!
convolvedFeatures = cnnConvolve(patchDim, hiddenSize, convImages, W, b, ZCAWhite, meanPatch);

%% STEP 2b: Checking your convolution
%  To ensure that you have convolved the features correctly, we have
%  provided some code to compare the results of your convolution with
%  activations from the sparse autoencoder

% For  random points
:
    featureNum = randi([, hiddenSize]);
    imageNum = randi([, ]);
    imageRow = randi([, imageDim - patchDim + ]);
    imageCol = randi([, imageDim - patchDim + ]);    

    patch = convImages(imageRow:imageRow + patchDim - , imageCol:imageCol + patchDim - , :, imageNum);
    patch = patch(:);
    patch = patch - meanPatch;
    patch = ZCAWhite * patch;

    features = feedForwardAutoencoder(optTheta, hiddenSize, visibleSize, patch); 

    ) - convolvedFeatures(featureNum, imageNum, imageRow, imageCol)) > 1e-
        fprintf('Convolved feature does not match activation from autoencoder\n');
        fprintf('Feature Number    : %d\n', featureNum);
        fprintf('Image Number      : %d\n', imageNum);
        fprintf('Image Row         : %d\n', imageRow);
        fprintf('Image Column      : %d\n', imageCol);
        fprintf('Convolved feature : %0.5f\n', convolvedFeatures(featureNum, imageNum, imageRow, imageCol));
        fprintf());
        error('Convolved feature does not match activation from autoencoder');
    end
end

disp('Congratulations! Your convolution code passed the test.');

%% STEP 2c: Implement pooling
%  Implement pooling in the function cnnPool in cnnPool.m

% NOTE: Implement cnnPool in cnnPool.m first!
pooledFeatures = cnnPool(poolDim, convolvedFeatures);

%% STEP 2d: Checking your pooling
%  To ensure that you have implemented pooling, we will use your pooling
%  function to pool over a test matrix and check the results.

testMatrix = reshape(:, , );
expectedMatrix = [mean(mean(testMatrix(:, :))) mean(mean(testMatrix(:, :))); ...
                  mean(mean(testMatrix(:, :))) mean(mean(testMatrix(:, :))); ];

testMatrix = reshape(testMatrix, , , , );

pooledFeatures = squeeze(cnnPool(, testMatrix));

if ~isequal(pooledFeatures, expectedMatrix)
    disp('Pooling incorrect');
    disp('Expected');
    disp(expectedMatrix);
    disp('Got');
    disp(pooledFeatures);
else
    disp('Congratulations! Your pooling code passed the test.');
end

%%======================================================================
%% STEP : Convolve and pool with the dataset
%  In this step, you will convolve each of the features you learned with
%  the full large images to obtain the convolved features. You will then
%  pool the convolved features to obtain the pooled features for
%  classification.
%
%  Because the convolved features matrix is very large, we will do the
%  convolution and pooling  features at a time to avoid running out of
%  memory. Reduce this number if necessary

stepSize = ;
assert(mod(hiddenSize, stepSize) == , 'stepSize should divide hiddenSize');

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels
load stlTestSubset.mat  % loads numTestImages,  testImages,  testLabels

pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, ...
    floor((imageDim - patchDim + ) / poolDim), ...
    floor((imageDim - patchDim + ) / poolDim) );
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
    floor((imageDim - patchDim + ) / poolDim), ...
    floor((imageDim - patchDim + ) / poolDim) );

tic();

:(hiddenSize / stepSize)

    featureStart = (convPart - ) * stepSize + ;
    featureEnd = convPart * stepSize;

    fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);
    Wt = W(featureStart:featureEnd, :);
    bt = b(featureStart:featureEnd);    

    fprintf('Convolving and pooling train images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        trainImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();
    clear convolvedFeaturesThis pooledFeaturesThis;

    fprintf('Convolving and pooling test images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        testImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;
    toc();

    clear convolvedFeaturesThis pooledFeaturesThis;

end

% You might want to save the pooled features since convolution and pooling takes a long time
save('cnnPooledFeatures.mat', 'pooledFeaturesTrain', 'pooledFeaturesTest');
toc();

%%======================================================================
%% STEP : Use pooled features for classification
%  Now, you will use your pooled features to train a softmax classifier,
%  using softmaxTrain from the softmax exercise.
%  Training the softmax classifer  iterations should take less than
%   minutes.

% Add the path to your softmax solution, if necessary
% addpath /path/to/solution/

% Setup parameters for softmax
softmaxLambda = 1e-;
numClasses = ;
% Reshape the pooledFeatures to form an input vector for softmax
softmaxX = permute(pooledFeaturesTrain, [   ]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTrain) / numTrainImages,...
    numTrainImages);
softmaxY = trainLabels;

options = struct;
options.maxIter = ;
softmaxModel = softmaxTrain(numel(pooledFeaturesTrain) / numTrainImages,...
    numClasses, softmaxLambda, softmaxX, softmaxY, options);

%%======================================================================
%% STEP : Test classifer
%  Now you will test your trained classifer against the test images

softmaxX = permute(pooledFeaturesTest, [   ]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTest) / numTestImages, numTestImages);
softmaxY = testLabels;

[pred] = softmaxPredict(softmaxModel, softmaxX);
acc = (pred(:) == softmaxY(:));
acc = sum(acc) / size(acc, );
fprintf();

% You should expect to % on the test images.

cnnConvolve.m

function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
patchSize = patchDim*patchDim;
numImages = size(images, );
imageDim = size(images, );
imageChannels = size(images, );

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + , imageDim - patchDim + );

% Instructions:
%   Convolve every feature with every large image here to produce the
%   numFeatures x numImages x (imageDim - patchDim + ) x (imageDim - patchDim + )
%   matrix convolvedFeatures, such that
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - , imageCol + patchDim - )
%
% Expected running times:
%   Convolving with  images should take less than  minutes
%   Convolving with  images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps
WT = W*ZCAWhite;
bT = b-WT*meanPatch;
% --------------------------------------------------------

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + , imageDim - patchDim + );
:numImages
  :numFeatures

    % convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + , imageDim - patchDim + );
    :

      % Obtain the feature (patchDim x patchDim) needed during the convolution
      % ---- YOUR CODE HERE ----
      %feature = zeros(,); % You should replace this
      offset = (channel-)*patchSize;
      feature = reshape(WT(featureNum,(offset+):(offset+patchSize)),patchDim,patchDim);

      % ------------------------

      % Flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----
       convolveThisChannel = conv2(im,feature,'valid');
       convolvedImage = convolvedImage + convolveThisChannel;            %三个通道加起来,应该是指三个通道同时用来做判断标准。

      % ------------------------

    end

    % Subtract the bias unit (correcting for the mean subtraction as well)
    % Then, apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    convolvedImage = sigmoid(convolvedImage + bT(featureNum));

    % ------------------------

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

function sigm = sigmoid(x)

    sigm =  ./ ( + exp(-x));
end

end

cnnPool.m

function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)
%     

numImages = size(convolvedFeatures, );
numFeatures = size(convolvedFeatures, );
convolvedDim = size(convolvedFeatures, );

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim)
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region
%   (see http://ufldl/wiki/index.php/Pooling )
%
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------
numBlocks = floor(convolvedDim/poolDim);             %每个维度总共分成多少块(/),这里对于不同维数的数据,poolDim要选择能刚好除尽的?
:numFeatures
    :numImages
        :numBlocks
            :numBlocks
                features = convolvedFeatures(featureNum,imageNum,(poolRow-)*poolDim+:poolRow*poolDim,(poolCol-)*poolDim+:poolCol*poolDim);
                pooledFeatures(featureNum,imageNum,poolRow,poolCol) = mean(features(:));
            end
        end
    end
end
end

结果:

Accuracy: 78.938%

与讲义提到的80%左右差不多。

ps:讲义地址:

http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

http://deeplearning.stanford.edu/wiki/index.php/Pooling

http://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_Pooling

 

Deep Learning 学习随记(七)Convolution and Pooling --卷积和池化的更多相关文章

  1. Deep Learning 学习随记(八)CNN(Convolutional neural network)理解

    前面Andrew Ng的讲义基本看完了.Andrew讲的真是通俗易懂,只是不过瘾啊,讲的太少了.趁着看完那章convolution and pooling, 自己又去翻了翻CNN的相关东西. 当时看讲 ...

  2. Deep Learning 学习随记(六)Linear Decoder 线性解码

    线性解码器(Linear Decoder) 前面第一章提到稀疏自编码器(http://www.cnblogs.com/bzjia-blog/p/SparseAutoencoder.html)的三层网络 ...

  3. Deep Learning学习随记(一)稀疏自编码器

    最近开始看Deep Learning,随手记点,方便以后查看. 主要参考资料是Stanford 教授 Andrew Ng 的 Deep Learning 教程讲义:http://deeplearnin ...

  4. Deep Learning 学习随记(五)深度网络--续

    前面记到了深度网络这一章.当时觉得练习应该挺简单的,用不了多少时间,结果训练时间真够长的...途中debug的时候还手贱的clear了一下,又得从头开始运行.不过最终还是调试成功了,sigh~ 前一篇 ...

  5. Deep Learning 学习随记(五)Deep network 深度网络

    这一个多周忙别的事去了,忙完了,接着看讲义~ 这章讲的是深度网络(Deep Network).前面讲了自学习网络,通过稀疏自编码和一个logistic回归或者softmax回归连接,显然是3层的.而这 ...

  6. Deep Learning 学习随记(四)自学习和非监督特征学习

    接着看讲义,接下来这章应该是Self-Taught Learning and Unsupervised Feature Learning. 含义: 从字面上不难理解其意思.这里的self-taught ...

  7. Deep Learning学习随记(二)Vectorized、PCA和Whitening

    接着上次的记,前面看了稀疏自编码.按照讲义,接下来是Vectorized, 翻译成向量化?暂且这么认为吧. Vectorized: 这节是老师教我们编程技巧了,这个向量化的意思说白了就是利用已经被优化 ...

  8. Deep Learning 学习随记(三)Softmax regression

    讲义中的第四章,讲的是Softmax 回归.softmax回归是logistic回归的泛化版,先来回顾下logistic回归. logistic回归: 训练集为{(x(1),y(1)),...,(x( ...

  9. Deep Learning 学习随记(三)续 Softmax regression练习

    上一篇讲的Softmax regression,当时时间不够,没把练习做完.这几天学车有点累,又特别想动动手自己写写matlab代码 所以等到了现在,这篇文章就当做上一篇的续吧. 回顾: 上一篇最后给 ...

随机推荐

  1. ios设备中openGL所支持的最大纹理尺寸

    这几天碰到一个在iphone4上显示图片未黑色矩形的bug,在其他机器上都正常 最后发现是图片打包尺寸的关系,iphone4无法读取2048以上大小的单个图片,所以其中的图片都显示成了黑色,希望对碰到 ...

  2. 字符编码笔记:ASCII,Unicode和UTF-8

    很久很久以前,有一群人,他们决定用8个可以开合的晶体管来组合成不同的状态,以表示世界上的万物.他们看到8个开关状态是好的,于是他们把这称为"字节". 再后来,他们又做了一些可以处理 ...

  3. javascript,从库到框架再到平台

    对于库,框架,平台,从事过后端开发的人并不陌生,一直基于.net平台做开发,本人懒惰,面对庞大的体系,基本只掌握一点开发上用得着的技术,到是在程序结构,业务过程等方面花了点精力. 随着VS开发工具的成 ...

  4. android 中targetSdkVersion和与target属性的区别

    AndroidMenifest.xml中targetSdkVersion和project.properties中的target属性的区别      在AndroidMenifest.xml中,常常会有 ...

  5. 关于NSLocalizedString(@"Foo %@",nil)

    NSLocalizedString(@"Foo %@",nil) 这句话实际上是在多语言文件中寻找一个key为“Foo %@”的文字,千万不要把这个和[NSString strin ...

  6. Multi-Die系统介绍

    一个典型的存储系统一般是有几片NAND存储器组成的.一般会使用8-bit的总线,用来将不同的存储器与控制器进行连接,如图2.32所示.一个系统中多片NAND的存储系统可以提高存储容量,同时还可以提高读 ...

  7. 关于浮动float属性和position:absolute属性的区别

    最近返回头看了很多书籍,一直在纠结float属性和absolute绝对定位的区别和使用的情况,给大家分享一下自己的心得和体会吧. 1,float属性 float属性意义是让元素拜托独占一行的霸道总裁, ...

  8. C++ ABI之名字改变,编译器生成符号研究(以Qt为例)

    在C++中,由于重载等技术的存在,编译器要将函数.结构体.类等等的信息传递给链接器,就不能像C语言那样简单地通过函数名来完成,它需要提供额外的参数信息,而还要和C语言共用链接器,这就需要用到名字改编( ...

  9. NSURLRequest POST方式请求服务器示例

    http://lizhuang.iteye.com/blog/1833297 1.  准备阶段 NSString *urlString = [NSString stringWithFormat:@&q ...

  10. Django之--POST方法处理表单请求

    上一篇:Django之--MVC的Model 演示了如何使用GET方法处理表单请求,本文讲述直接在当前页面返回结果,并使用更常用的POST方法处理. 一.首先我们修改下page.html <!D ...