Retinal optical coherence tomography image enhancement via deep learning
Research Article, Biomedical Optics Express, Vol. 9, No. 12, 1 Dec 2018
I_R → I_FA

To achieve this, we train a generator network as a feed-forward convolutional neural network (CNN) G_θG, parametrised by θ_G. G_θG maps I_R into a certain distribution I_E. We train the parameters θ_G by minimising an enhancement-specific loss function L, with the aim of making I_E as close as possible to I_FA.
2.1. Network architecture
The overall view of the proposed network G_θG is shown in Fig. 1(a). We employ a residual network architecture [23], which is modular in style, with multiple connecting blocks of identical structure (shown in Fig. 1(b)). The residual blocks employ pre-activation with batch normalisation (BN) and rectified linear units (ReLU), which improves training and generalisation of the network [23]. Pre-activation precedes two convolutional layers with small 3 × 3 kernels and 64 feature maps. The blocks include a skip link, which combines the signals pre- and post-processing through addition. This helps information to flow through the network, and improves gradient flow during back-propagation. Another skip connection combines the signal prior to the residual blocks with that after processing, through addition. This is followed by another pre-activation and convolutional layer, and a final convolutional layer with a feature depth of one. The output of this final layer is I_E.
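To make the block structure concrete, the following is a minimal sketch of the pre-activation residual block and the generator skeleton in tf.keras. The two BN-ReLU-conv stages, the 64 feature maps, 3 × 3 kernels, the two kinds of skip addition, and the single-channel output follow the description above; the number of blocks B (16 here), the layer arrangement at the head, and the padding choices are our assumptions, as the excerpt does not fix them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x):
    """Pre-activation residual block: (BN -> ReLU -> 3x3 conv, 64 maps) twice,
    plus a skip link combining pre- and post-processing signals by addition."""
    skip = x
    for _ in range(2):
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Conv2D(64, 3, padding="same")(x)
    return layers.Add()([x, skip])

def build_generator(num_blocks=16):  # B is not given in the excerpt; 16 is illustrative
    inp = tf.keras.Input(shape=(None, None, 1))    # raw B-scan I_R, any size
    x = layers.Conv2D(64, 3, padding="same")(inp)  # lift the input to 64 feature maps
    long_skip = x
    for _ in range(num_blocks):
        x = residual_block(x)
    x = layers.Add()([x, long_skip])               # skip over all residual blocks
    x = layers.BatchNormalization()(x)             # another pre-activation...
    x = layers.ReLU()(x)
    x = layers.Conv2D(64, 3, padding="same")(x)    # ...and convolutional layer
    out = layers.Conv2D(1, 3, padding="same")(x)   # final layer, feature depth of one: I_E
    return tf.keras.Model(inp, out)
```

Because the model is fully convolutional, the same weights apply to the 176 × 176 training patches and to whole B-scans at test time, as noted in Section 3.1.1.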
[Fig. 1 panels: (a) Generator Network with B× residual blocks; (b) Residual Block (RB) composed of BN, ReLU, and 3×3 Conv (n filters) layers with skip addition F(x); (c) Discriminator Network with convolutional and dense layers.]
Fig. 1. Illustration of the proposed networks. The input (I_R) to the generator network (a) is a raw B-scan from the OCT scanner, which undergoes processing by B residual blocks (b) to produce an enhanced image (I_E). The numbers of filters for the convolutional layers, and the numbers of units for the dense layers, are indicated by numbers on the blocks. The discriminator network ((c), described in Section 2.3) aims to estimate the Wasserstein metric between the real (I_FA) and generated (I_E) data distributions.
We train two versions of the generator network G_θG. The first version is trained via a mean-squared error loss, and is referred to as CNN-MSE. The MSE loss favours a high peak signal to noise ratio (PSNR), which is a widely-used metric for quantitatively evaluating the quality of OCT images. However, convolutional neural networks trained solely with an MSE loss often display overly smooth textures, and lack high-frequency features. Considering the importance of clear layer boundaries in OCT B-scan analysis, this approach alone may not be optimal. Therefore, we trained a second version, referred to as CNN-WGAN, with the aim of creating images which are perceptually indistinguishable from I_FA, by formulating the problem in terms of a Wasserstein
generative adversarial network (WGAN) [24], and utilising a perceptual loss in addition to the MSE loss. In the next section, we describe the implementation of these methods.
2.2. Perceptual loss
The perceptual loss is based on high-level features extracted from a pre-trained network [25]. This ensures that the network is trained to replicate image similarities more robustly compared to using per-pixel losses. The perceptual loss is defined as the Euclidean distance between the feature representations of the enhanced image (G_θG(I_R)) and the frame-averaged reference image (I_FA), given by a pre-trained VGG19 network [26]:
$$ L_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I_{FA})_{x,y} - \phi_{i,j}(G_{\theta_G}(I_R))_{x,y} \right)^2 \qquad (2) $$
where φ_{i,j} indicates the feature map obtained by the j-th convolution, after ReLU activation, prior to the i-th pooling layer, and W_{i,j} and H_{i,j} describe the dimensions of the respective feature maps within the VGG network.
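As an illustration, the perceptual loss of Eq. (2) can be sketched with the pre-trained VGG19 shipped with tf.keras. The choice of feature layer (block5_conv4 here, i.e. one particular φ_{i,j}), the tiling of the single OCT channel to VGG's three input channels, and the omission of VGG input preprocessing are our assumptions; reduce_mean performs the 1/(W_{i,j} H_{i,j}) normalisation.

```python
import tensorflow as tf

# Pre-trained VGG19 truncated at one feature map phi_{i,j}; the exact layer
# used by the authors is not given in this excerpt, so block5_conv4 is an
# illustrative choice.
_vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
_features = tf.keras.Model(_vgg.input, _vgg.get_layer("block5_conv4").output)

def vgg_loss(i_fa, i_e):
    """Eq. (2): squared Euclidean distance between VGG feature maps of the
    frame-averaged reference I_FA and the enhanced image G(I_R), averaged
    over the feature map dimensions W_{i,j} x H_{i,j}."""
    f_fa = _features(tf.tile(i_fa, [1, 1, 1, 3]))  # VGG expects 3 channels
    f_e = _features(tf.tile(i_e, [1, 1, 1, 3]))
    return tf.reduce_mean(tf.square(f_fa - f_e))
```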
2.3. Adversarial loss
Along with the generator network, a generative adversarial network (GAN) involves a discriminator network, D_θD, parametrised by θ_D (shown in Fig. 1). The generator network is trained to produce realistic images, while the discriminator network is trained to identify which images are real versus those that are generated. Here, we implement a WGAN, an improved version of the original GAN, which uses the Earth Mover's distance [27] to compare two data distributions (those of I_FA and I_E). We optimise both networks in an alternating manner (fixing one and updating the other) to solve the following min-max problem:
$$ \min_{\theta_G} \max_{\theta_D} L_{WGAN}(D, G) = -\mathbb{E}_{I_{FA}}[D(I_{FA})] + \mathbb{E}_{I_R}[D(G(I_R))] + \lambda\, \mathbb{E}_{\hat{I}}\big[ (\lVert \nabla_{\hat{I}} D(\hat{I}) \rVert_2 - 1)^2 \big], \qquad (3) $$
where the first two terms represent the estimation of the Wasserstein distance, and the final term is a gradient penalty enforcing the Lipschitz constraint, with penalty coefficient λ; Î is uniformly sampled along pairs of I_E and I_FA samples. This results in improved stability during training. Additionally, imposing the gradient penalty [28] has been shown to improve convergence of the WGAN compared to gradient clipping. With this approach, our generator can learn to create solutions that are highly similar to real images and thus difficult to classify by D.
Thus, the overall loss of the CNN-WGAN architecture is given by

$$ \min_{\theta_G} \max_{\theta_D} \lambda_1 L_{WGAN}(D, G) + \lambda_2 L_{VGG}(G) + L_{MSE}(G), \qquad (4) $$

where λ₁ and λ₂ are weighting parameters that control the trade-off between the three components of the loss.
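A sketch of how Eqs. (3) and (4) might translate into training code is given below, written in the common WGAN-GP convention (the critic's sign convention may differ from the equation above without changing the optimum). The values λ = 10, λ₁ = 1e−3 and λ₂ = 2e−6 are those reported in Section 3.1.1; vgg_loss is the perceptual loss sketched in Section 2.2; function names are our own.

```python
import tensorflow as tf

def gradient_penalty(discriminator, i_fa, i_e, lam=10.0):
    """Final term of Eq. (3): penalise the critic's gradient norm deviating
    from 1 at samples drawn uniformly along I_E--I_FA pairs."""
    eps = tf.random.uniform([tf.shape(i_fa)[0], 1, 1, 1], 0.0, 1.0)
    i_hat = eps * i_fa + (1.0 - eps) * i_e      # uniformly interpolated sample
    with tf.GradientTape() as tape:
        tape.watch(i_hat)
        d_hat = discriminator(i_hat)
    grads = tape.gradient(d_hat, i_hat)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return lam * tf.reduce_mean(tf.square(norms - 1.0))

def discriminator_loss(discriminator, i_fa, i_e):
    """Wasserstein distance estimate (first two terms of Eq. (3)) plus the
    gradient penalty; minimised with respect to the critic's parameters."""
    return (tf.reduce_mean(discriminator(i_e))
            - tf.reduce_mean(discriminator(i_fa))
            + gradient_penalty(discriminator, i_fa, i_e))

def generator_loss(discriminator, i_fa, i_e, lam1=1e-3, lam2=2e-6):
    """Eq. (4): weighted adversarial + perceptual + MSE losses."""
    adversarial = -tf.reduce_mean(discriminator(i_e))
    mse = tf.reduce_mean(tf.square(i_fa - i_e))
    return lam1 * adversarial + lam2 * vgg_loss(i_fa, i_e) + mse
```

The two losses are minimised in alternation, fixing one network while updating the other, as described above.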
3. Experiments
3.1. Data acquisition and pre-processing
Six OCT volumes were acquired from both eyes of 38 healthy patients on a Cirrus HD-OCT scanner (Zeiss, Dublin, CA) at a single visit. The scans were centred on the optic nerve head (ONH) and were 200 × 200 × 1024 voxels per cube, acquired from a region of 6 mm × 6 mm × 2 mm. These scans were then registered and averaged to create the "ground truth" denoised image. The scan with the highest signal strength (as provided by the scanner software) was chosen as the
reference image, and the remaining five volumes were registered to it in two steps. First, a 2-D Elastix registration [29] of the en-face images addressed rotation and translation differences between the consecutive scans (x-y axis registration). Next, individual A-scans from the reference and moving scans were cross-correlated to find the translation that best aligned the scans (z-axis registration). The resulting volumes were averaged to remove noise. This process produced 76 "denoised" OCT volumes; however, 7 volumes were not used for further training or analysis, due to inaccurate registration recognised upon visual inspection, resulting in a cohort of 69 volumes from healthy eyes.
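The en-face x-y registration relies on the external Elastix toolbox, but the z-axis step can be illustrated directly. The sketch below, a simplified NumPy version under our own assumptions, finds the integer depth shift that maximises the cross-correlation between a reference and a moving A-scan.

```python
import numpy as np

def z_shift(ref_ascan, mov_ascan):
    """z-axis registration step: return the integer translation (in pixels)
    that best aligns a moving A-scan to the reference A-scan, found by
    maximising their cross-correlation."""
    ref = ref_ascan - ref_ascan.mean()   # remove the DC offset before correlating
    mov = mov_ascan - mov_ascan.mean()
    xcorr = np.correlate(ref, mov, mode="full")
    # In 'full' mode, zero lag sits at index len(mov) - 1.
    return int(np.argmax(xcorr)) - (len(mov) - 1)

# After shifting each A-scan of the five moving volumes accordingly, the six
# aligned volumes are averaged to produce the denoised ground truth.
```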
A second set of scans was also acquired from a single eye of 6 glaucoma patients using the same protocol, i.e., six ONH-centred scans were acquired at the same visit. The denoised ground truth was generated as described above. Note, however, that this set was not used in the training procedure and was only used to evaluate the method.
3.1.1. Training details
The 69 OCT volumes were split into training, validation and testing subsets, containing 51, 7 and 11 volumes, respectively. In this way, B-scans from volumes in the hold-out test set were never seen during training. B-scans within 10 voxels of the edge of the volume were not considered during training, since they often contained artefacts from the registration process. Therefore, each denoised volume had 6 corresponding raw volumes, each with 180 usable B-scans, resulting in 55080 B-scan pairs. Patches of size 176 × 176 were extracted from these scans for training, and augmentation was performed by randomising the position of the patch (ensuring that each patch showed a portion of retina), and randomly flipping each patch horizontally. Pairs of patches underwent normalisation prior to training, as such:

$$ \bar{I} = \frac{I - \mu_T}{\sigma_T}, \qquad (5) $$

where I and Ī are the original and the normalised B-scans respectively, and μ_T and σ_T are the mean and standard deviation of the training set. This transformation was also applied to
validation and hold-out set images. In our experiments, both networks were optimised using the Adam algorithm [30], the hyper-parameters of which were empirically set as α = 1e−5, β₁ = 0.5, β₂ = 0.9. The mini-batch size was 4. The penalty coefficient was set as λ = 10, as suggested in [28], and the loss weighting parameters were empirically set as λ₁ = 1e−3 and λ₂ = 2e−6. Training was performed in TensorFlow (https://www.tensorflow.org/) on a NeXtScale nx360 M4 server with 64 GB of RAM, an 800 GB SSD, and an NVIDIA Tesla K40 Graphics Processing Unit (GPU), with NVIDIA CUDA (v9.0) and cuDNN (v7.0.5) libraries (http://www.nvidia.com). Training was
monitored, and stopped when no further improvement was observed. Training of CNN-MSE and CNN-WGAN took approximately 5 hours and 22 hours, respectively. Testing was performed on 1980 B-scans (11 volumes × 180 slices). While the network was trained on image patches, the fully-convolutional architecture of the network means that images of arbitrary size may be used during testing; therefore, entire B-scans were enhanced. Prior to further analytics, the enhanced images were scaled back to their original distribution by:

$$ I = (\bar{I} \times \sigma_T) + \mu_T. \qquad (6) $$
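For concreteness, the patch augmentation and the normalisation/rescaling of Eqs. (5) and (6) can be sketched as follows; the retina-coverage check mentioned above is omitted, and the function names are our own.

```python
import numpy as np

def normalise(i, mu_t, sigma_t):
    """Eq. (5): standardise with the training-set mean and standard deviation
    (also applied to validation and hold-out images)."""
    return (i - mu_t) / sigma_t

def denormalise(i_bar, mu_t, sigma_t):
    """Eq. (6): scale an enhanced image back to the original distribution."""
    return i_bar * sigma_t + mu_t

def random_patch_pair(raw, ref, size=176, rng=None):
    """Augmentation: co-located 176x176 patches at a random position, flipped
    horizontally with probability 0.5 (the retina-content check is omitted)."""
    rng = rng or np.random.default_rng()
    y = int(rng.integers(0, raw.shape[0] - size))
    x = int(rng.integers(0, raw.shape[1] - size))
    p_raw = raw[y:y + size, x:x + size]
    p_ref = ref[y:y + size, x:x + size]
    if rng.random() < 0.5:
        p_raw, p_ref = p_raw[:, ::-1], p_ref[:, ::-1]
    return p_raw, p_ref
```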
3.1.2. Quantitative metrics
We compare the performance of CNN-MSE and CNN-WGAN to two well-known image enhancement techniques: the block-matching 3D (BM3D) algorithm [31], implemented using the pyBM3D wrapper in Python, and the double-density dual-tree complex wavelet transform (DD-CDWT) [32] (which has shown more recent, promising results on OCT data [33]), implemented using MATLAB (http://eeweb.poly.edu/iselesni/waveletsoftware/dt2d.html). To fairly compare
the speed of the different methods, all were implemented on a MacBook Pro with a 2.9 GHz Intel Core i5 and 8 GB of 1867 MHz DDR3 RAM. Performance was gauged using standard error metrics: PSNR, the structural similarity index (SSIM), the multi-scale structural similarity index (MS-SSIM), and MSE. SSIM was calculated for pixels considered to be part of the retina. These metrics were computed on a hold-out set of data from the healthy cohort and the entire glaucoma cohort. Statistical significance was determined using paired two-sided Wilcoxon signed-rank tests [34] at p < 0.05.
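A sketch of how such metrics might be computed per B-scan pair with scikit-image and SciPy follows; MS-SSIM and the retina-only mask for SSIM need extra machinery and are omitted here, and the data_range handling is our assumption.

```python
import numpy as np
from scipy.stats import wilcoxon
from skimage.metrics import (mean_squared_error, peak_signal_noise_ratio,
                             structural_similarity)

def evaluate_pair(enhanced, reference):
    """PSNR, SSIM and MSE between one enhanced B-scan and its frame-averaged
    reference (MS-SSIM and the retina-only SSIM mask are omitted)."""
    data_range = float(reference.max() - reference.min())
    return {
        "psnr": peak_signal_noise_ratio(reference, enhanced, data_range=data_range),
        "ssim": structural_similarity(reference, enhanced, data_range=data_range),
        "mse": mean_squared_error(reference, enhanced),
    }

def methods_differ(scores_a, scores_b):
    """Paired two-sided Wilcoxon signed-rank test over per-B-scan scores of
    two methods; significant at p < 0.05."""
    stat, p = wilcoxon(scores_a, scores_b)
    return p < 0.05
```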
Additionally, the impact of the proposed enhancement techniques on the visibility of structures of interest was gauged by having multiple independent experts manually annotate five surfaces: the internal limiting membrane (ILM), the posterior border of the retinal nerve fibre layer (RNFL), the ganglion cell and inner plexiform layer (GCIPL), the inner nuclear layer (INL), and Bruch's membrane (BM), shown in Fig. 4(a). Manual annotations were obtained on 22 slices from the original volumes (2 from each volume of the test set), plus the same images enhanced by CNN-MSE and CNN-WGAN. Half of the slices were chosen from a central section of the volume (containing the cross-section of the ONH), while the other half were chosen at random from a peripheral section (lacking the ONH). The optic nerve head itself was omitted from the annotations. The slices were also repeated in the set, and shuffled before being presented to the annotators.
With the assumption that retinal surfaces would be easier to identify, with a higher degree of repeatability, in clearer images, the unsigned variability in the location of the surface, between observers (inter-observer) and for the same observer between multiple repeated scans (intra-observer), titled the Average Annotation Difference (AAD), was calculated as follows:

$$ \mathrm{AAD} = \frac{1}{N \times A} \sum_{n=1}^{N} \sum_{a=1}^{A} \left| l_1^{n,a} - l_2^{n,a} \right|, \qquad (7) $$
where l^{n,a} represents the surface location on the a-th column (or A-scan) of the n-th B-scan. The subscript on l indicates the first versus second observer (for inter-observer), or the first versus second repeat of the annotation (for intra-observer). A three-way ANOVA was used to compare the effects of the annotator (or annotator pair, for inter-observer AAD), the enhancement technique, and the annotated surface on intra- and inter-observer variability. Statistical significance was then determined using a Tukey-Kramer post-hoc analysis.
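Equation (7) reduces to a mean absolute difference over all annotated columns; a small sketch (with our own variable names) is:

```python
import numpy as np

def average_annotation_difference(l1, l2):
    """Eq. (7): AAD between two annotation sets, each an (N, A) array giving
    the surface location for column a of B-scan n. Pass two observers'
    annotations for inter-observer AAD, or one observer's two repeats for
    intra-observer AAD."""
    l1 = np.asarray(l1, dtype=float)
    l2 = np.asarray(l2, dtype=float)
    return np.abs(l1 - l2).sum() / l1.size   # divide by N x A
```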
3.1.3. Qualitative metrics
We performed a mean opinion score test to quantify the ability of the different approaches to reconstruct perceptually convincing images. Given six different representations of a B-scan: unprocessed (Raw), enhanced using CNN-WGAN, CNN-MSE, BM3D and DD-CDWT, and frame-averaged, three experts were asked to rank the images from best to worst on three metrics:
1. Ability to discriminate structures (e.g. retinal layers, blood vessels) (1 = most clear, 6 = least clear).
2. Accuracy with which the image represents what you might see at a cellular/microscopic level (1 = most accurate, 6 = least accurate).
3. Personal preference for use in clinical practice (1 = most preferred, 6 = least preferred).
The metrics were specifically chosen to understand the differences in perceived clarity (metric 1), accuracy (metric 2), and personal preference (metric 3) between various versions of the same scan. Each observer was simultaneously shown all 6 versions of each of 55 B-scans (5 from each of the 11 test set volumes), in a randomised order, and asked to rank each of the three
metrics in turn. Inter-observer agreement was calculated using Cohen's kappa score [35], and statistical significance between observers in their preference for CNN-MSE versus CNN-WGAN was calculated using a binomial test (with Bonferroni correction for multiple comparisons).
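As an illustration of the statistics named here, the inter-observer agreement and the Bonferroni-corrected binomial test could be computed as below; the function and variable names are our own, not the authors'.

```python
from scipy.stats import binomtest
from sklearn.metrics import cohen_kappa_score

def interobserver_kappa(ranks_expert_a, ranks_expert_b):
    """Cohen's kappa between two experts' rankings of the same image set."""
    return cohen_kappa_score(ranks_expert_a, ranks_expert_b)

def significant_preference(n_wgan_over_mse, n_images, n_comparisons=3):
    """Two-sided binomial test of whether CNN-WGAN is ranked above CNN-MSE
    more often than chance, Bonferroni-corrected for n_comparisons tests."""
    p = binomtest(n_wgan_over_mse, n_images, p=0.5).pvalue
    return p < 0.05 / n_comparisons
```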
4. Results
To show the denoising effect of the proposed networks, a representative B-scan from the optic nerve head of a healthy person within the test set is shown in Fig. 2, as it appears in the raw image (a), the ground truth achieved through averaging (b), and after post-processing of the raw scan with CNN-MSE (c), CNN-WGAN (d), BM3D (e), and DD-CDWT (f). The B-scans consisted of 200 A-scans, each with 1024 pixels. While the scans were kept at this aspect ratio during processing, for display purposes they were resized for better visualisation.
Fig. 2. OCT image from a healthy volume captured by the Cirrus HD-OCT scanner (a), and the corresponding 6-frame averaged image (b). The result of post-processing (a) with CNN-MSE (c), CNN-WGAN (d), BM3D (e), and DD-CDWT (f). Three zoomed-in, colour-coded sections are shown below each B-scan (best viewed in colour).
As can be observed, the BM3D method (e) does not over-smooth the image, but it still shows some residual noise and has poor contrast. In comparison, DD-CDWT (f) introduces over-smoothing, resulting in a loss of layer boundary distinction (particularly evident in the red and blue insets). The images produced by CNN-MSE (c) also show signs of smoothing, in that homogeneity is attained within each layer, but layer separation is not lost (as is the case for DD-CDWT), and layers can still be clearly distinguished. CNN-WGAN (d) produces the images perceptually most similar to the frame-averaged scans (b), in that it maintains layer texture and boundary clarity without introducing artefacts. Fig. 3 shows a representative scan from a glaucomatous volume. Even though no glaucomatous scans were used during network training, both CNN enhancement methods produce enhanced images with a high degree of accuracy for structural detail. In particular, the border tissue/Bruch's membrane configuration (green box), which is an important indicator of the disc margin in glaucomatous volumes [36], is best preserved in the CNN-WGAN enhancement. Additionally, the two CNN methods do not blur the edges between layers (blue box), as DD-CDWT does. BM3D performs well in the horizontally flat region of the retina indicated by the blue box, but is outperformed by the CNN methods in the other two highlighted regions.

Fig. 3. OCT image from a glaucomatous volume captured by the Cirrus HD-OCT scanner (a), and the corresponding 6-frame averaged image (b). The result of post-processing (a) with CNN-MSE (c), CNN-WGAN (d), BM3D (e), and DD-CDWT (f). Three zoomed-in, colour-coded sections are shown below each B-scan (best viewed in colour).
The average times required to process a B-scan by each method are shown in Table 1. The times reported are averaged over 200 B-scans, with the proposed methods (CNN-MSE and CNN-WGAN) processing all 200 simultaneously.
Table 1. Time required to process a single B-scan, averaged over 200 B-scans.

Method      Time (seconds)
BM3D        7.84
DD-CDWT     0.72
CNN-WGAN    0.68
CNN-MSE     0.68
The results show that the proposed solutions outperform the existing methods in terms of speed. Additionally, the networks are highly parallelisable, and therefore lend themselves to further speed increases on more powerful systems, particularly those with access to a GPU.
4.1. Quantitative results
For quantitative results, we calculated the PSNR, SSIM, MS-SSIM and MSE between five different image types (Raw, BM3D, DD-CDWT, CNN-WGAN and CNN-MSE) and their corresponding frame-averaged scans. The summary data are shown in Table 2; the best performing technique for each metric is shown in bold.
Table 2. Mean ± standard deviation of the peak signal to noise ratio (PSNR), structural similarity index (SSIM), multi-scale structural similarity index (MS-SSIM) and mean squared error (MSE) for 1980 healthy B-scans and 1080 B-scans from patients with glaucoma, using the BM3D, DD-CDWT, and the proposed CNN-WGAN and CNN-MSE networks. The best results are shown in bold. All pairwise comparisons (excluding SSIM on glaucoma images processed by BM3D and DD-CDWT) were statistically significant (p < 0.0001).

Healthy
Method      PSNR           SSIM          MS-SSIM       MSE
Raw         26.82 ± 0.67   0.62 ± 0.04   0.79 ± 0.02   136.67 ± 22.43
BM3D        31.17 ± 1.27   0.74 ± 0.04   0.88 ± 0.02   51.53 ± 17.29
DD-CDWT     31.21 ± 1.19   0.75 ± 0.04   0.89 ± 0.02   51.22 ± 15.92
CNN-WGAN    31.83 ± 1.21   0.77 ± 0.03   0.92 ± 0.01   44.52 ± 14.34
CNN-MSE     32.28 ± 1.27   0.78 ± 0.03   0.92 ± 0.01   40.28 ± 13.44

Glaucoma
Method      PSNR           SSIM          MS-SSIM       MSE
Raw         25.46 ± 0.67   0.58 ± 0.04   0.76 ± 0.02   188.32 ± 52.53
BM3D        29.12 ± 1.12   0.74 ± 0.03   0.88 ± 0.03   83.34 ± 44.76
DD-CDWT     29.44 ± 1.18   0.74 ± 0.03   0.89 ± 0.03   77.84 ± 46.15
CNN-WGAN    29.92 ± 1.27   0.76 ± 0.03   0.91 ± 0.02   72.85 ± 46.07
CNN-MSE     30.21 ± 1.12   0.78 ± 0.03   0.92 ± 0.03   70.04 ± 44.91
We observe that the CNN-MSE network delivers the best results on all metrics, with all pairwise comparisons showing statistical significance (p < 0.0001), excluding the comparison of SSIM on glaucoma images processed by BM3D and DD-CDWT, for which p > 0.05. The CNN-MSE network is also able to greatly improve the quality of glaucomatous volumes, despite having been trained using only healthy volumes. Since visual inspection and quantitative comparison of the enhancement techniques showed that the CNN-based methods were superior to both BM3D and DD-CDWT, the latter two were excluded from further analyses.
While these results show that CNN-MSE is objectively better in terms of image metrics, it has previously been posited that such a network might overly blur images, which in this application might result in reduced layer boundary visibility. Therefore, we define a quantitative assessment based on the effect upon the end user. We asked three experts to annotate five surfaces (ILM, RNFL, GCIPL, INL and BM) in raw images, and in those having undergone enhancement with CNN-MSE and CNN-WGAN. Fig. 4(a) shows a section of a B-scan, processed by CNN-WGAN, with layer annotations indicated. Each observer annotated each image twice, and the average difference in the location of annotations (AAD, defined in Equation 7), both between observers (inter-observer) and for one observer between repeated scans (intra-observer), was calculated. The intra- and inter-observer AAD for two of the surfaces are shown in Figs. 4(b) and (c).
[Fig. 4 panels: (a) B-scan section with the five annotated layer boundaries (ILM, RNFL, GCIPL, INL, BM); (b), (c) bar plots of AAD per image type (Raw, CNN-MSE, CNN-WGAN) for each grader (b) and each grader pair 1v2, 1v3, 2v3 (c).]
Fig. 4. (a) Portion of a B-scan with surface annotations for five layers. Intra-observer (b) and inter-observer (c) annotation location difference (AAD), averaged over the columns of all annotated B-scans, for the ILM and GCIPL surfaces (mean ± standard error). Statistical significance at p < 0.001 is indicated by *. Results for the remaining layers are shown in Fig. 6.
The results of the intra-observer analysis show that, compared to raw images, both networks significantly improved the repeatability of layer annotations for all three annotators (p < 0.001; this and all further comparisons in this section were determined via a three-way ANOVA with Tukey-Kramer post-hoc analysis). Additionally, this also occurred for all surfaces when controlling for grader (all p < 0.001). Differences in the AAD metric were more pronounced for