Retinal optical coherence tomography image enhancement via deep learning
Research Article, Biomedical Optics Express, Vol. 9, No. 12, 1 Dec 2018
I_R → I_FA

To achieve this, we train a generator network as a feed-forward convolutional neural network (CNN) G_θG, parametrised by θ_G. G_θG maps I_R into a certain distribution I_E. We train the parameters θ_G by minimising an enhancement-specific loss function L, with the aim of making I_E as close as possible to I_FA.
2.1. Network architecture
The overall view of the proposed network G_θG is shown in Fig. 1(a). We employ a residual network architecture [23], which is modular in style, with multiple connecting blocks of identical structure (shown in Fig. 1(b)). The residual blocks employ pre-activation with batch normalisation (BN) and rectified linear units (ReLU), which improves training and generalisation of the network [23]. Pre-activation precedes two convolutional layers with small 3 × 3 kernels and 64 feature maps. The blocks include a skip link, which combines the signals pre- and post-processing through addition. This helps information to flow through the network, and improves gradient flow during back-propagation. Another skip connection combines the signal prior to the residual blocks with that after processing, through addition. This is followed by another pre-activation and convolutional layer, and a final convolutional layer with a feature depth of one. The output of this final layer is I_E.
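To make the block structure concrete, the following is a minimal sketch of the pre-activation residual block and the generator skeleton in tf.keras. The two BN-ReLU-conv stages, the 64 feature maps, 3 × 3 kernels, the two kinds of skip addition, and the single-channel output follow the description above; the number of blocks B (16 here), the layer arrangement at the head, and the padding choices are our assumptions, as the excerpt does not fix them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x):
    """Pre-activation residual block: (BN -> ReLU -> 3x3 conv, 64 maps) twice,
    plus a skip link combining pre- and post-processing signals by addition."""
    skip = x
    for _ in range(2):
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Conv2D(64, 3, padding="same")(x)
    return layers.Add()([x, skip])

def build_generator(num_blocks=16):  # B is not given in the excerpt; 16 is illustrative
    inp = tf.keras.Input(shape=(None, None, 1))    # raw B-scan I_R, any size
    x = layers.Conv2D(64, 3, padding="same")(inp)  # lift the input to 64 feature maps
    long_skip = x
    for _ in range(num_blocks):
        x = residual_block(x)
    x = layers.Add()([x, long_skip])               # skip over all residual blocks
    x = layers.BatchNormalization()(x)             # another pre-activation...
    x = layers.ReLU()(x)
    x = layers.Conv2D(64, 3, padding="same")(x)    # ...and convolutional layer
    out = layers.Conv2D(1, 3, padding="same")(x)   # final layer, feature depth of one: I_E
    return tf.keras.Model(inp, out)
```

Because the model is fully convolutional, the same weights apply to the 176 × 176 training patches and to whole B-scans at test time, as noted in Section 3.1.1.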
[Fig. 1 panels: (a) Generator Network with B× residual blocks; (b) Residual Block (RB) composed of BN, ReLU, and 3×3 Conv (n filters) layers with skip addition F(x); (c) Discriminator Network with convolutional and dense layers.]
Fig. 1. Illustration of the proposed networks. The input (I_R) to the generator network (a) is a raw B-scan from the OCT scanner, which undergoes processing by B residual blocks (b) to produce an enhanced image (I_E). The numbers of filters for the convolutional layers, and the numbers of units for the dense layers, are indicated by numbers on the blocks. The discriminator network ((c), described in Section 2.3) aims to estimate the Wasserstein metric between the real (I_FA) and generated (I_E) data distributions.
We train two versions of the generator network G_θG. The first version is trained via a mean-squared error loss, and is referred to as CNN-MSE. The MSE loss favours a high peak signal to noise ratio (PSNR), which is a widely-used metric for quantitatively evaluating the quality of OCT images. However, convolutional neural networks trained solely with an MSE loss often display overly smooth textures, and lack high-frequency features. Considering the importance of clear layer boundaries in OCT B-scan analysis, this approach alone may not be optimal. Therefore, we trained a second version, referred to as CNN-WGAN, with the aim of creating images which are perceptually indistinguishable from I_FA, by formulating the problem in terms of a Wasserstein
generative adversarial network (WGAN) [24], and utilising a perceptual loss in addition to the MSE loss. In the next section, we describe the implementation of these methods.
2.2. Perceptual loss
The perceptual loss is based on high-level features extracted from a pre-trained network [25]. This ensures that the network is trained to replicate image similarities more robustly compared to using per-pixel losses. The perceptual loss is defined as the Euclidean distance between the feature representations of the enhanced image (G_θG(I_R)) and the frame-averaged reference image (I_FA), given by a pre-trained VGG19 network [26]:
$$ L_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I_{FA})_{x,y} - \phi_{i,j}(G_{\theta_G}(I_R))_{x,y} \right)^2 \qquad (2) $$
where φ_{i,j} indicates the feature map obtained by the j-th convolution, after ReLU activation, prior to the i-th pooling layer, and W_{i,j} and H_{i,j} describe the dimensions of the respective feature maps within the VGG network.
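As an illustration, the perceptual loss of Eq. (2) can be sketched with the pre-trained VGG19 shipped with tf.keras. The choice of feature layer (block5_conv4 here, i.e. one particular φ_{i,j}), the tiling of the single OCT channel to VGG's three input channels, and the omission of VGG input preprocessing are our assumptions; reduce_mean performs the 1/(W_{i,j} H_{i,j}) normalisation.

```python
import tensorflow as tf

# Pre-trained VGG19 truncated at one feature map phi_{i,j}; the exact layer
# used by the authors is not given in this excerpt, so block5_conv4 is an
# illustrative choice.
_vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
_features = tf.keras.Model(_vgg.input, _vgg.get_layer("block5_conv4").output)

def vgg_loss(i_fa, i_e):
    """Eq. (2): squared Euclidean distance between VGG feature maps of the
    frame-averaged reference I_FA and the enhanced image G(I_R), averaged
    over the feature map dimensions W_{i,j} x H_{i,j}."""
    f_fa = _features(tf.tile(i_fa, [1, 1, 1, 3]))  # VGG expects 3 channels
    f_e = _features(tf.tile(i_e, [1, 1, 1, 3]))
    return tf.reduce_mean(tf.square(f_fa - f_e))
```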
2.3. Adversarial loss
Along with the generator network, a generative adversarial network (GAN) involves a discriminator network, D_θD, parametrised by θ_D (shown in Fig. 1). The generator network is trained to produce realistic images, while the discriminator network is trained to identify which images are real versus those that are generated. Here, we implement a WGAN, an improved version of the original GAN, which uses the Earth Mover's distance [27] to compare two data distributions (those of I_FA and I_E). We optimise both networks in an alternating manner (fixing one and updating the other) to solve the following min-max problem:
$$ \min_{\theta_G} \max_{\theta_D} L_{WGAN}(D, G) = -\mathbb{E}_{I_{FA}}[D(I_{FA})] + \mathbb{E}_{I_R}[D(G(I_R))] + \lambda\, \mathbb{E}_{\hat{I}}\big[ (\lVert \nabla_{\hat{I}} D(\hat{I}) \rVert_2 - 1)^2 \big], \qquad (3) $$
where the first two terms represent the estimation of the Wasserstein distance, and the final term is a gradient penalty enforcing the Lipschitz constraint, with penalty coefficient λ; Î is uniformly sampled along pairs of I_E and I_FA samples. This results in improved stability during training. Additionally, imposing the gradient penalty [28] has been shown to improve convergence of the WGAN compared to gradient clipping. With this approach, our generator can learn to create solutions that are highly similar to real images and thus difficult to classify by D.
Thus, the overall loss of the CNN-WGAN architecture is given by

$$ \min_{\theta_G} \max_{\theta_D} \lambda_1 L_{WGAN}(D, G) + \lambda_2 L_{VGG}(G) + L_{MSE}(G), \qquad (4) $$

where λ₁ and λ₂ are weighting parameters that control the trade-off between the three components of the loss.
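A sketch of how Eqs. (3) and (4) might translate into training code is given below, written in the common WGAN-GP convention (the critic's sign convention may differ from the equation above without changing the optimum). The values λ = 10, λ₁ = 1e−3 and λ₂ = 2e−6 are those reported in Section 3.1.1; vgg_loss is the perceptual loss sketched in Section 2.2; function names are our own.

```python
import tensorflow as tf

def gradient_penalty(discriminator, i_fa, i_e, lam=10.0):
    """Final term of Eq. (3): penalise the critic's gradient norm deviating
    from 1 at samples drawn uniformly along I_E--I_FA pairs."""
    eps = tf.random.uniform([tf.shape(i_fa)[0], 1, 1, 1], 0.0, 1.0)
    i_hat = eps * i_fa + (1.0 - eps) * i_e      # uniformly interpolated sample
    with tf.GradientTape() as tape:
        tape.watch(i_hat)
        d_hat = discriminator(i_hat)
    grads = tape.gradient(d_hat, i_hat)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return lam * tf.reduce_mean(tf.square(norms - 1.0))

def discriminator_loss(discriminator, i_fa, i_e):
    """Wasserstein distance estimate (first two terms of Eq. (3)) plus the
    gradient penalty; minimised with respect to the critic's parameters."""
    return (tf.reduce_mean(discriminator(i_e))
            - tf.reduce_mean(discriminator(i_fa))
            + gradient_penalty(discriminator, i_fa, i_e))

def generator_loss(discriminator, i_fa, i_e, lam1=1e-3, lam2=2e-6):
    """Eq. (4): weighted adversarial + perceptual + MSE losses."""
    adversarial = -tf.reduce_mean(discriminator(i_e))
    mse = tf.reduce_mean(tf.square(i_fa - i_e))
    return lam1 * adversarial + lam2 * vgg_loss(i_fa, i_e) + mse
```

The two losses are minimised in alternation, fixing one network while updating the other, as described above.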
3. Experiments
3.1. Data acquisition and pre-processing
Six OCT volumes were acquired from both eyes of 38 healthy patients on a Cirrus HD-OCT scanner (Zeiss, Dublin, CA) at a single visit. The scans were centred on the optic nerve head (ONH) and were 200 × 200 × 1024 voxels per cube, acquired from a region of 6 mm × 6 mm × 2 mm. These scans were then registered and averaged to create the "ground truth" denoised image. The scan with the highest signal strength (as provided by the scanner software) was chosen as the
reference image, and the remaining five volumes were registered to it in two steps. First, a 2-D Elastix registration [29] of the en-face images addressed rotation and translation differences between the consecutive scans (x-y axis registration). Next, individual A-scans from the reference and moving scans were cross-correlated to find the translation that best aligned the scans (z-axis registration). The resulting volumes were averaged to remove noise. This process produced 76 "denoised" OCT volumes; however, 7 volumes were not used for further training or analysis, due to inaccurate registration recognised upon visual inspection, resulting in a cohort of 69 volumes from healthy eyes.
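The en-face x-y registration relies on the external Elastix toolbox, but the z-axis step can be illustrated directly. The sketch below, a simplified NumPy version under our own assumptions, finds the integer depth shift that maximises the cross-correlation between a reference and a moving A-scan.

```python
import numpy as np

def z_shift(ref_ascan, mov_ascan):
    """z-axis registration step: return the integer translation (in pixels)
    that best aligns a moving A-scan to the reference A-scan, found by
    maximising their cross-correlation."""
    ref = ref_ascan - ref_ascan.mean()   # remove the DC offset before correlating
    mov = mov_ascan - mov_ascan.mean()
    xcorr = np.correlate(ref, mov, mode="full")
    # In 'full' mode, zero lag sits at index len(mov) - 1.
    return int(np.argmax(xcorr)) - (len(mov) - 1)

# After shifting each A-scan of the five moving volumes accordingly, the six
# aligned volumes are averaged to produce the denoised ground truth.
```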
A second set of scans was also acquired from a single eye of 6 glaucoma patients using the same protocol, i.e., six ONH-centred scans were acquired at the same visit. The denoised ground truth was generated as described above. Note, however, that this set was not used in the training procedure and was only used to evaluate the method.
3.1.1. Training details
The 69 OCT volumes were split into training, validation and testing subsets, containing 51, 7 and 11 volumes, respectively. In this way, B-scans from volumes in the hold-out test set were never seen during training. B-scans within 10 voxels of the edge of the volume were not considered during training, since they often contained artefacts from the registration process. Therefore, each denoised volume had 6 corresponding raw volumes, each with 180 usable B-scans, resulting in 55080 B-scan pairs. Patches of size 176 × 176 were extracted from these scans for training, and augmentation was performed by randomising the position of the patch (ensuring that each patch showed a portion of retina), and randomly flipping each patch horizontally. Pairs of patches underwent normalisation prior to training, as such:

$$ \bar{I} = \frac{I - \mu_T}{\sigma_T}, \qquad (5) $$

where I and Ī are the original and the normalised B-scans respectively, and μ_T and σ_T are the mean and standard deviation of the training set. This transformation was also applied to
validation and hold-out set images. In our experiments, both networks were optimised using the Adam algorithm [30], the hyper-parameters of which were empirically set as α = 1e−5, β₁ = 0.5, β₂ = 0.9. The mini-batch size was 4. The penalty coefficient was set as λ = 10, as suggested in [28], and the loss weighting parameters were empirically set as λ₁ = 1e−3 and λ₂ = 2e−6. Training was performed in TensorFlow (https://www.tensorflow.org/) on a NeXtScale nx360 M4 server with 64 GB of RAM, an 800 GB SSD, and an NVIDIA Tesla K40 Graphics Processing Unit (GPU), with NVIDIA CUDA (v9.0) and cuDNN (v7.0.5) libraries (http://www.nvidia.com). Training was
monitored, and stopped when no further improvement was observed. Training of CNN-MSE and CNN-WGAN took approximately 5 hours and 22 hours, respectively. Testing was performed on 1980 B-scans (11 volumes × 180 slices). While the network was trained on image patches, the fully-convolutional architecture of the network means that images of arbitrary size may be used during testing; therefore, entire B-scans were enhanced. Prior to further analytics, the enhanced images were scaled back to their original distribution by:

$$ I = (\bar{I} \times \sigma_T) + \mu_T. \qquad (6) $$
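For concreteness, the patch augmentation and the normalisation/rescaling of Eqs. (5) and (6) can be sketched as follows; the retina-coverage check mentioned above is omitted, and the function names are our own.

```python
import numpy as np

def normalise(i, mu_t, sigma_t):
    """Eq. (5): standardise with the training-set mean and standard deviation
    (also applied to validation and hold-out images)."""
    return (i - mu_t) / sigma_t

def denormalise(i_bar, mu_t, sigma_t):
    """Eq. (6): scale an enhanced image back to the original distribution."""
    return i_bar * sigma_t + mu_t

def random_patch_pair(raw, ref, size=176, rng=None):
    """Augmentation: co-located 176x176 patches at a random position, flipped
    horizontally with probability 0.5 (the retina-content check is omitted)."""
    rng = rng or np.random.default_rng()
    y = int(rng.integers(0, raw.shape[0] - size))
    x = int(rng.integers(0, raw.shape[1] - size))
    p_raw = raw[y:y + size, x:x + size]
    p_ref = ref[y:y + size, x:x + size]
    if rng.random() < 0.5:
        p_raw, p_ref = p_raw[:, ::-1], p_ref[:, ::-1]
    return p_raw, p_ref
```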
3.1.2. Quantitative metrics
We compare the performance of CNN-MSE and CNN-WGAN to two well-known image enhancement techniques: the block-matching 3D (BM3D) algorithm [31], implemented using the pyBM3D wrapper in Python, and the double-density dual-tree complex wavelet transform (DD-CDWT) [32] (which has shown more recent, promising results on OCT data [33]), implemented using MATLAB (http://eeweb.poly.edu/iselesni/waveletsoftware/dt2d.html). To fairly compare
the speed of the different methods, all were implemented on a MacBook Pro with a 2.9 GHz Intel Core i5 and 8 GB of 1867 MHz DDR3 RAM. Performance was gauged using standard error metrics: PSNR, the structural similarity index (SSIM), the multi-scale structural similarity index (MS-SSIM), and MSE. SSIM was calculated for pixels considered to be part of the retina. These metrics were computed on a hold-out set of data from the healthy cohort and the entire glaucoma cohort. Statistical significance was determined using paired two-sided Wilcoxon signed-rank tests [34] at p < 0.05.
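A sketch of how such metrics might be computed per B-scan pair with scikit-image and SciPy follows; MS-SSIM and the retina-only mask for SSIM need extra machinery and are omitted here, and the data_range handling is our assumption.

```python
import numpy as np
from scipy.stats import wilcoxon
from skimage.metrics import (mean_squared_error, peak_signal_noise_ratio,
                             structural_similarity)

def evaluate_pair(enhanced, reference):
    """PSNR, SSIM and MSE between one enhanced B-scan and its frame-averaged
    reference (MS-SSIM and the retina-only SSIM mask are omitted)."""
    data_range = float(reference.max() - reference.min())
    return {
        "psnr": peak_signal_noise_ratio(reference, enhanced, data_range=data_range),
        "ssim": structural_similarity(reference, enhanced, data_range=data_range),
        "mse": mean_squared_error(reference, enhanced),
    }

def methods_differ(scores_a, scores_b):
    """Paired two-sided Wilcoxon signed-rank test over per-B-scan scores of
    two methods; significant at p < 0.05."""
    stat, p = wilcoxon(scores_a, scores_b)
    return p < 0.05
```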
Additionally, the impact of the proposed enhancement techniques on the visibility of structures of interest was gauged by having multiple independent experts manually annotate five surfaces: the internal limiting membrane (ILM), the posterior border of the retinal nerve fibre layer (RNFL), the ganglion cell and inner plexiform layer (GCIPL), the inner nuclear layer (INL), and Bruch's membrane (BM), shown in Fig. 4(a). Manual annotations were obtained on 22 slices from the original volumes (2 from each volume of the test set), plus the same images enhanced by CNN-MSE and CNN-WGAN. Half of the slices were chosen from a central section of the volume (containing the cross-section of the ONH), while the other half were chosen at random from a peripheral section (lacking the ONH). The optic nerve head itself was omitted from the annotations. The slices were also repeated in the set, and shuffled before being presented to the annotators.
With the assumption that retinal surfaces would be easier to identify, with a higher degree of repeatability, in clearer images, the unsigned variability in the location of the surface, between observers (inter-observer) and for the same observer between multiple repeated scans (intra-observer), titled the Average Annotation Difference (AAD), was calculated as follows:

$$ \mathrm{AAD} = \frac{1}{N \times A} \sum_{n=1}^{N} \sum_{a=1}^{A} \left| l_1^{n,a} - l_2^{n,a} \right|, \qquad (7) $$
where l^{n,a} represents the surface location on the a-th column (or A-scan) of the n-th B-scan. The subscript on l indicates the first versus second observer (for inter-observer), or the first versus second repeat of the annotation (for intra-observer). A three-way ANOVA was used to compare the effects of the annotator (or annotator pair, for inter-observer AAD), the enhancement technique, and the annotated surface on intra- and inter-observer variability. Statistical significance was then determined using a Tukey-Kramer post-hoc analysis.
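Equation (7) reduces to a mean absolute difference over all annotated columns; a small sketch (with our own variable names) is:

```python
import numpy as np

def average_annotation_difference(l1, l2):
    """Eq. (7): AAD between two annotation sets, each an (N, A) array giving
    the surface location for column a of B-scan n. Pass two observers'
    annotations for inter-observer AAD, or one observer's two repeats for
    intra-observer AAD."""
    l1 = np.asarray(l1, dtype=float)
    l2 = np.asarray(l2, dtype=float)
    return np.abs(l1 - l2).sum() / l1.size   # divide by N x A
```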
3.1.3. Qualitative metrics
We performed a mean opinion score test to quantify the ability of the different approaches to reconstruct perceptually convincing images. Given six different representations of a B-scan: unprocessed (Raw), enhanced using CNN-WGAN, CNN-MSE, BM3D and DD-CDWT, and frame-averaged, three experts were asked to rank the images from best to worst on three metrics:
1. Ability to discriminate structures (e.g. retinal layers, blood vessels) (1 = most clear, 6 = least clear).
2. Accuracy with which the image represents what you might see at a cellular/microscopic level (1 = most accurate, 6 = least accurate).
3. Personal preference for use in clinical practice (1 = most preferred, 6 = least preferred).
The metrics were specifically chosen to understand the differences in perceived clarity (metric 1), accuracy (metric 2), and personal preference (metric 3) between various versions of the same scan. Each observer was simultaneously shown all 6 versions of each of 55 B-scans (5 from each of the 11 test set volumes), in a randomised order, and asked to rank each of the three
metrics in turn. Inter-observer agreement was calculated using Cohen's kappa score [35], and statistical significance between observers in their preference for CNN-MSE versus CNN-WGAN was calculated using a binomial test (with Bonferroni correction for multiple comparisons).
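As an illustration of the statistics named here, the inter-observer agreement and the Bonferroni-corrected binomial test could be computed as below; the function and variable names are our own, not the authors'.

```python
from scipy.stats import binomtest
from sklearn.metrics import cohen_kappa_score

def interobserver_kappa(ranks_expert_a, ranks_expert_b):
    """Cohen's kappa between two experts' rankings of the same image set."""
    return cohen_kappa_score(ranks_expert_a, ranks_expert_b)

def significant_preference(n_wgan_over_mse, n_images, n_comparisons=3):
    """Two-sided binomial test of whether CNN-WGAN is ranked above CNN-MSE
    more often than chance, Bonferroni-corrected for n_comparisons tests."""
    p = binomtest(n_wgan_over_mse, n_images, p=0.5).pvalue
    return p < 0.05 / n_comparisons
```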
4. Results
To show the denoising effect of the proposed networks, a representative B-scan from the optic nerve head of a healthy person within the test set is shown in Fig. 2, as it appears in the raw image (a), the ground truth achieved through averaging (b), and after post-processing of the raw scan with CNN-MSE (c), CNN-WGAN (d), BM3D (e), and DD-CDWT (f). The B-scans consisted of 200 A-scans, each with 1024 pixels. While the scans were kept at this aspect ratio during processing, for display purposes they were resized for better visualisation.
Fig. 2. OCT image from a healthy volume captured by the Cirrus HD-OCT scanner (a), and the corresponding 6-frame averaged image (b). The result of post-processing (a) with CNN-MSE (c), CNN-WGAN (d), BM3D (e), and DD-CDWT (f). Three zoomed-in, colour-coded sections are shown below each B-scan (best viewed in colour).
As can be observed, the BM3D method (e) does not over-smooth the image, but it still shows some residual noise and has poor contrast. In comparison, DD-CDWT (f) introduces over-smoothing, resulting in a loss of layer boundary distinction (particularly evident in the red and blue insets). The images produced by CNN-MSE (c) also show signs of smoothing, in that homogeneity is attained within each layer, but layer separation is not lost (as is the case for DD-CDWT), and layers can still be clearly distinguished. CNN-WGAN (d) produces the images perceptually most similar to the frame-averaged scans (b), in that it maintains layer texture and boundary clarity without introducing artefacts. Fig. 3 shows a representative scan from a glaucomatous volume. Even though no glaucomatous scans were used during network training, both CNN enhancement methods produce enhanced images with a high degree of accuracy for structural detail. In particular, the border tissue/Bruch's membrane configuration (green box), which is an important indicator of the disc margin in glaucomatous volumes [36], is best preserved in the CNN-WGAN enhancement. Additionally, the two CNN methods do not blur the edges between layers (blue box), as DD-CDWT does. BM3D performs well in the horizontally flat region of the retina indicated by the blue box, but is outperformed by the CNN methods in the other two highlighted regions.

Fig. 3. OCT image from a glaucomatous volume captured by the Cirrus HD-OCT scanner (a), and the corresponding 6-frame averaged image (b). The result of post-processing (a) with CNN-MSE (c), CNN-WGAN (d), BM3D (e), and DD-CDWT (f). Three zoomed-in, colour-coded sections are shown below each B-scan (best viewed in colour).
The average times required to process a B-scan by each method are shown in Table 1. The times reported are averaged over 200 B-scans, with the proposed methods (CNN-MSE and CNN-WGAN) processing all 200 simultaneously.
Table 1. Time required to process a single B-scan, averaged over 200 B-scans.

Method      Time (seconds)
BM3D        7.84
DD-CDWT     0.72
CNN-WGAN    0.68
CNN-MSE     0.68
The results show that the proposed solutions outperform the existing methods in terms of speed. Additionally, the networks are highly parallelisable, and therefore lend themselves to further speed increases on more powerful systems, particularly those with access to a GPU.
4.1. Quantitative results
For quantitative results, we calculated the PSNR, SSIM, MS-SSIM and MSE between five different image types (Raw, BM3D, DD-CDWT, CNN-WGAN and CNN-MSE) and their corresponding frame-averaged scans. The summary data are shown in Table 2; the best performing technique for each metric is shown in bold.
Table 2. Mean ± standard deviation of the peak signal to noise ratio (PSNR), structural similarity index (SSIM), multi-scale structural similarity index (MS-SSIM) and mean squared error (MSE) for 1980 healthy B-scans and 1080 B-scans from patients with glaucoma, using the BM3D, DD-CDWT, and the proposed CNN-WGAN and CNN-MSE networks. The best results are shown in bold. All pairwise comparisons (excluding SSIM on glaucoma images processed by BM3D and DD-CDWT) were statistically significant (p < 0.0001).

Healthy
Method      PSNR           SSIM          MS-SSIM       MSE
Raw         26.82 ± 0.67   0.62 ± 0.04   0.79 ± 0.02   136.67 ± 22.43
BM3D        31.17 ± 1.27   0.74 ± 0.04   0.88 ± 0.02   51.53 ± 17.29
DD-CDWT     31.21 ± 1.19   0.75 ± 0.04   0.89 ± 0.02   51.22 ± 15.92
CNN-WGAN    31.83 ± 1.21   0.77 ± 0.03   0.92 ± 0.01   44.52 ± 14.34
CNN-MSE     32.28 ± 1.27   0.78 ± 0.03   0.92 ± 0.01   40.28 ± 13.44

Glaucoma
Method      PSNR           SSIM          MS-SSIM       MSE
Raw         25.46 ± 0.67   0.58 ± 0.04   0.76 ± 0.02   188.32 ± 52.53
BM3D        29.12 ± 1.12   0.74 ± 0.03   0.88 ± 0.03   83.34 ± 44.76
DD-CDWT     29.44 ± 1.18   0.74 ± 0.03   0.89 ± 0.03   77.84 ± 46.15
CNN-WGAN    29.92 ± 1.27   0.76 ± 0.03   0.91 ± 0.02   72.85 ± 46.07
CNN-MSE     30.21 ± 1.12   0.78 ± 0.03   0.92 ± 0.03   70.04 ± 44.91
We observe that the CNN-MSE network delivers the best results on all metrics, with all pairwise comparisons showing statistical significance (p < 0.0001), excluding the comparison of SSIM on glaucoma images processed by BM3D and DD-CDWT, for which p > 0.05. The CNN-MSE network is also able to greatly improve the quality of glaucomatous volumes, despite having been trained using only healthy volumes. Since visual inspection and quantitative comparison of the enhancement techniques showed that the CNN-based methods were superior to both BM3D and DD-CDWT, the latter two were excluded from further analyses.
While these results show that CNN-MSE is objectively better in terms of image metrics, it has previously been posited that such a network might overly blur images, which in this application might result in reduced layer boundary visibility. Therefore, we define a quantitative assessment based on the effect upon the end user. We asked three experts to annotate five surfaces (ILM, RNFL, GCIPL, INL and BM) in raw images, and in those having undergone enhancement with CNN-MSE and CNN-WGAN. Fig. 4(a) shows a section of a B-scan, processed by CNN-WGAN, with layer annotations indicated. Each observer annotated each image twice, and the average difference in the location of annotations (AAD, defined in Equation 7), both between observers (inter-observer) and for one observer between repeated scans (intra-observer), was calculated. The intra- and inter-observer AAD for two of the surfaces are shown in Figs. 4(b) and (c).
[Fig. 4 panels: (a) B-scan section with the five annotated layer boundaries (ILM, RNFL, GCIPL, INL, BM); (b), (c) bar plots of AAD per image type (Raw, CNN-MSE, CNN-WGAN) for each grader (b) and each grader pair 1v2, 1v3, 2v3 (c).]
Fig. 4. (a) Portion of a B-scan with surface annotations for five layers. Intra-observer (b) and inter-observer (c) annotation location difference (AAD), averaged over the columns of all annotated B-scans, for the ILM and GCIPL surfaces (mean ± standard error). Statistical significance at p < 0.001 is indicated by *. Results for the remaining layers are shown in Fig. 6.
The results of the intra-observer analysis show that, compared to raw images, both networks significantly improved the repeatability of layer annotations for all three annotators (p < 0.001; this and all further comparisons in this section were determined via a three-way ANOVA with Tukey-Kramer post-hoc analysis). Additionally, this also occurred for all surfaces when controlling for grader (all p < 0.001). Differences in the AAD metric were more pronounced for