U-net:Convolutional Networks for Biomedical Image Segmentation

U-Net：Convolutional Networks for Biomedical Image Segmentation

Abstract ：

The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization

Introduction ：

The convolutional network，their success was limited due to the size of the available training sets and the size of the considered networks.
In many visual tasks，especially in biomedical image processing，the desired output should include localization，i.e.，a class label is supposed to be assigned to each pixel .
Ciresan et al：trained a network in a sliding-window setup to predict the class label of each pixel by providing a local region (patch) around that pixel as input .
- advantages：
  - this network can localize
  - the training data in terms of patches is much larger than the number of training images
- disadvantages：
  - it is quite slow because the network must be run separately for each patch , and there is a lot of redundancy due to overlapping patches.
  - there is a trade-off between localization accuracy and the use of context.
More recent approaches proposed a classifier output that takes into account the features from multiple layers.

多尺度融合的深度网络，把某一个像素为中心的不同大小的patch作为多个通道输入到深度网络中学习
In this paper , “fully convolutional network” , the main idea is to supplement a usual contracting network by successive layers , where pooling operators are replaced by upsampling operators .
In the upsampling part we have alse a large number of feature channels , which allow the network to propagate context information to higher resolution layers . The segmentation map only contains the pixels , for which the full context is available in the input image . To predict the pixels in the border region of the image , the missing context is extrapolated by mirroring the input image .

This strategy allows the seamless segmentation of arbitrarily large images by an overlap-tile strategy.

u_net_overlap

There is very little training data available , we use excessive data augmentation by applying elastic deformations to the available training images .
Another challenge is the separation of touching objects of the same class . We propose the use of a weighted loss , where the separating background labels between touching cells obtain a larger weight in the loss function

Network Architecture

u-net

Training

SGD
unpadded
larger input tiles over a large batch size
softmax
The separation border computed using morphological operations

中间项链的部分像素，人为的提高权重，让network可以重点学习这些特征，以便于分割
draw the initial weights from a Gaussian distribution

Data Augmentation

Data Augmentation is essential to teach the network the desired invariance and robustness properties , when only few training samples are available .

Need shift and rotation invariance as well as robustness to deformations and gray value variations
Generate smooth deformations using random displacement vectors on a coarse 3 by 3 grid .

Experiments

30 images(512*512 pixels)
ground truth
membranes
The u-net (averaged over 7 rotated version of the input data)

Conclusion

The u-net architecture achieves very good performance on very different biomedical segmentation applications . Thanks to data augmentation with elastic deformations , it only needs very few annotated images and has a very reasonable training time .