dSPIC: a deep SPECT image classification network for automated multi-disease, multi-lesion diagnosis


In this section, the dataset of whole-body SPECT bone scans used in this work is outlined, followed by a description of the data preprocessing and the developed deep classification network.

Dataset

The whole-body SPECT images used in this work were collected from the Department of Nuclear Medicine, Gansu Provincial Hospital in 2018. During the examination, patients were intravenously injected with the radionuclide 99mTc-MDP (740 MBq), and whole-body images were acquired approximately three hours later using a Siemens ECAM SPECT scanner.

Patients with bone metastasis, arthritis, or both are considered in this study, comprising 181 female patients and 203 male patients. Figure 1 provides the distribution of patients with respect to gender and age.

Fig. 1 Distribution of patients included in the dataset of whole-body SPECT images. a Gender; b Age

Generally, the SPECT imaging process outputs two images (i.e., the anterior- and posterior-view images) for every examination, and each image is stored in a DICOM (Digital Imaging and Communications in Medicine) file. The 768 whole-body SPECT images collected from 384 patients fall into four classes of concern: normal (n = 334, ≈43.5%), metastatic (n = 174, ≈22.7%), arthritic (n = 252, ≈32.8%), and metastatic & arthritic (n = 8, ≈1.0%). The lesion distribution shown in Fig. 2 reveals that the vertebrae, ribs, and femurs are the top three skeletal areas where bone metastasis frequently occurs, while arthritis most often presents in the knee joint.

Fig. 2 An illustration of lesion distribution in the selected whole-body SPECT images. a Bone metastasis, with 907 lesions in 182 (174 + 8) images; b Arthritis, with 599 lesions in 260 (252 + 8) images

We can see from the lesion distribution in Fig. 2 that, in general, an image contains more than one lesion. Moreover, the eight whole-body SPECT images in the metastatic & arthritic class include metastatic and arthritic lesions simultaneously. The objective of this work is to develop a CNN-based classification method for multi-disease, multi-lesion diagnosis with whole-body SPECT images.

Overview

Figure 3 shows the overall process of automated diagnosis of diseases in whole-body SPECT images using the CNN-based classification network, which consists of two main stages.

Stage 1: Data preprocessing, including intensity normalization and data augmentation, is applied to first keep the acquired varying intensity of radiopharmaceuticals within a fixed interval and then generate more samples of SPECT images. This helps the CNN-based model extract richer features from a larger pool of samples.

Stage 2: A self-defined end-to-end classification network, dSPIC, extracts hierarchical features from the augmented SPECT images and classifies the high-level features into one of the four classes, i.e., normal (N), metastatic (M), arthritic (A), and metastatic & arthritic (M&A).

Fig. 3 Overview of the proposed CNN-based SPECT image classification method, consisting of data preprocessing, feature extraction, and feature classification

In the subsequent sections, the data processing methods and the self-defined classification network, dSPIC, will be elaborated.

Data preprocessing

SPECT imaging with 99mTc-MDP often demonstrates intensive radiopharmaceutical uptake in bone with a large mineralizing surface area (e.g., the spine) compared to the shafts of long bones [29]. As illustrated in Fig. 4, the large variability in the intensity of radiopharmaceuticals makes SPECT images significantly different from natural images, in which pixel values range from 0 to 255. To mitigate the effect of this varying intensity on feature extraction and representation, each DICOM file is self-adaptively normalized in this work so that every element lies within a fixed interval determined by the image's maximum and minimum intensity.

Fig. 4 An illustration of the maximum uptake intensity over all SPECT images in the original dataset

For $x_{i}$ (1 ≤ i ≤ m × n) denoting the intensity of the i-th element in a SPECT image of size m × n, let $x_{\max}$ ($x_{\min}$) be the maximum (minimum) intensity in this image. A normalized value $x_{N}$ can then be calculated using the min–max normalization method as follows.

$$x_{N} = \frac{x_{i} - x_{\min }}{x_{\max } - x_{\min }}.$$ (1)

For the whole-body SPECT images used in this work, m = 256 and n = 1024. The normalized SPECT images are organized into dataset D_1, on which the subsequent data augmentation is conducted.
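For illustration, a minimal Python sketch of this normalization, assuming pydicom for reading the DICOM file (the paper does not name its tooling); the function name and path handling are hypothetical:

```python
# A sketch of the min-max normalization in Eq. (1); pydicom is an assumption.
import numpy as np
import pydicom

def normalize_spect(dicom_path: str) -> np.ndarray:
    """Read one 256 x 1024 whole-body SPECT image and rescale it to [0, 1]."""
    img = pydicom.dcmread(dicom_path).pixel_array.astype(np.float32)
    x_min, x_max = img.min(), img.max()
    # Guard against a constant image, where x_max == x_min.
    if x_max == x_min:
        return np.zeros_like(img)
    return (img - x_min) / (x_max - x_min)   # Eq. (1), applied element-wise
```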

It is widely accepted that the classification performance of CNN-based models depends on the size of the dataset, with larger datasets generally yielding higher classification accuracy. For that reason, we harvest more image samples by augmenting dataset D_1 with parameter variation and sample generation techniques. A concomitant effect of data augmentation is improved robustness of the CNN-based model against patient-related artifacts introduced during imaging.

Data augmentation using parameter variation

For a point (x_i, y_i) in the given image X, its corresponding point (x_o, y_o) in the mirror counterpart X_M can be calculated according to Eq. 2.

$$\begin{bmatrix} x_{o} \\ y_{o} \\ 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & w \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_{i} \\ y_{i} \\ 1 \end{bmatrix},$$ (2)

where w denotes the width of the image.

The points in images X_T and X_R, obtained by translating X by ± t pixels in the horizontal or vertical direction and rotating X by ± r degrees to the left or right, are calculated according to Eqs. 3 and 4, respectively.

$$\begin{bmatrix} x_{o} \\ y_{o} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_{i} \\ y_{i} \end{bmatrix} \pm \begin{bmatrix} t_{x} \\ t_{y} \end{bmatrix},$$ (3)

$$\begin{bmatrix} x_{o} \\ y_{o} \end{bmatrix} = \begin{bmatrix} \cos r & -\sin r \\ \sin r & \cos r \end{bmatrix}\begin{bmatrix} x_{i} \\ y_{i} \end{bmatrix}.$$ (4)

The values of t and r mentioned above are determined experimentally in this work. This yields a new augmented dataset D_2, which is outlined in Table 1. We can see from Table 1 that all diseased classes have been augmented while the normal class remains unchanged.
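A minimal Python sketch of this parameter-variation augmentation, assuming SciPy for the geometric transforms; the concrete values of t and r are placeholders, since the paper determines them experimentally:

```python
# A sketch of the mirror/translation/rotation augmentation of Eqs. (2)-(4);
# the default t and r values are hypothetical placeholders.
import numpy as np
from scipy.ndimage import rotate, shift

def augment(img: np.ndarray, t: int = 10, r: float = 5.0) -> list[np.ndarray]:
    """Return mirrored, translated, and rotated variants of one SPECT image."""
    variants = [np.fliplr(img)]                                   # Eq. (2): mirror
    for dx, dy in [(t, 0), (-t, 0), (0, t), (0, -t)]:
        variants.append(shift(img, (dy, dx), order=0, cval=0.0))  # Eq. (3): translate
    for angle in (r, -r):
        variants.append(rotate(img, angle, reshape=False, cval=0.0))  # Eq. (4): rotate
    return variants
```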

Table 1 The augmented dataset D_2

Data augmentation using sample generation

A generative adversarial network (GAN) [30], one of the most prominent emerging deep learning techniques, can be used to generate new samples from the given images. The generated samples follow the distribution learned from the original images while differing from every individual original sample. The deep convolutional generative adversarial network (DCGAN) [31] is a convolutional variant of the GAN. We apply DCGAN to generate samples from the images in dataset D_1 and organize these generated samples into dataset D_3 (see Table 2).
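As an illustration of how a DCGAN generator produces samples, here is a minimal PyTorch sketch; the latent dimension, channel widths, and the toy 32 × 32 output size are assumptions chosen for brevity, not the configuration used in the paper:

```python
# A minimal DCGAN-style generator sketch; all layer sizes are assumptions.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            # Project a z_dim noise vector up to an image through strided
            # transposed convolutions, following the DCGAN recipe of [31].
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),  # single-channel output
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Reshape the (batch, z_dim) noise to (batch, z_dim, 1, 1) and decode.
        return self.net(z.view(z.size(0), -1, 1, 1))

fake = Generator()(torch.randn(4, 100))   # four generated 1 x 32 x 32 samples
```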

Table 2 The augmented dataset D_3

Supervised classification network dSPIC

In this work, we define a deep SPECT Image Classification network (dSPIC) for automated diagnosis of the diseases of concern. Table 3 outlines the architecture of dSPIC, consisting of seven weight layers (i.e., convolutional and fully connected layers), one added layer, and one Softmax layer.

Table 3 The architecture of the self-defined dSPIC network

Convolutional layer

This layer uses linear filters to produce feature maps. A total of five convolutional layers are included in dSPIC. The kernel size and stride decrease from layer to layer while the number of channels increases, except in the last convolutional layer. Every convolutional layer has a group of filters with a given kernel size. In the first convolutional layer, the input 256 × 1024 SPECT image is convolved with each 11 × 11 filter to produce a feature map of neurons, which is followed by a pooling layer. Owing to the local connectivity of the convolutional operation, dSPIC can learn filters that maximally respond to a local region (e.g., lesions) of the input, thus exploiting the spatial local correlation of the input [32]. Similarly, each subsequent convolutional layer takes the feature maps of the immediately preceding layer as input to convolve with its filters.

Pooling layer

This layer performs the downsampling operation that is typically applied after a convolutional layer. Max and average pooling are special kinds of pooling in which the maximum and average values are taken, respectively. Similar information from the neighborhood of a receptive field covered by a filter is captured by outputting a dominant response within this local region, making the input approximately invariant to geometric distortions. Specifically, max pooling is used in dSPIC to partition an input image or feature map into a set of 3 × 3 sub-regions and output the maximum value for each such sub-region.
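A PyTorch sketch of the first convolution and pooling block described above; only the 11 × 11 kernel, the 256 × 1024 input, and the 3 × 3 pooling window come from the text, while the channel count and strides are hypothetical:

```python
# A sketch of conv1 + max pooling; channel count and strides are assumptions.
import torch
import torch.nn as nn

first_block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=64, kernel_size=11, stride=4),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),  # dominant response per 3 x 3 region
)

x = torch.randn(1, 1, 256, 1024)    # one anterior- or posterior-view image
print(first_block(x).shape)         # torch.Size([1, 64, 30, 126]) for these strides
```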

The added layer

We introduce an attention mechanism and a residual module [33] into the network to help dSPIC focus on the more important areas (i.e., lesions) of the feature maps and to reduce the training parameters and training time. As depicted in Fig. 5, the attention module consists of two sequential sub-modules, i.e., a channel attention module and a spatial attention module.

Fig. 5 The attention module used in dSPIC, consisting of a channel attention sub-module and a spatial attention sub-module

The channel attention module in Fig. 5 produces a channel weight vector $f_{C}(\mathbf{F})$ for an input feature map $\mathbf{F}$, which yields the channel-refined map $\mathbf{F}' = f_{C}(\mathbf{F}) \otimes \mathbf{F}$. $\mathbf{F}'$ is then fed into the spatial attention module to obtain the refined feature map $\mathbf{M}$. Formally, $\mathbf{M}$ is calculated according to Eq. 6.

$$\mathbf{M} = f_{S}\left( \mathbf{F}' \right) \otimes \mathbf{F}', \quad \mathbf{F}' = f_{C}(\mathbf{F}) \otimes \mathbf{F},$$ (6)

where $\otimes$ is element-wise multiplication, and $f_{C}$ and $f_{S}$ are the channel and spatial attention functions, respectively.

In detail, the channel attention function $f_{C}$ and the spatial attention function $f_{S}$ are calculated according to Eqs. 7 and 8.

$$f_{C} ({\mathbf{F}}) = \sigma \left( {MLP\left( {AvgPool({\mathbf{F}})} \right) + MLP\left( {MaxPool({\mathbf{F}})} \right)} \right),$$ (7)

$$f_{S} ({\mathbf{F}}^{\prime } ) = \sigma \left( {f^{7 \times 7} \left( {\left[ {AvgPool({\mathbf{F}}^{\prime } );MaxPool({\mathbf{F}}^{\prime } )} \right]} \right)} \right),$$ (8)

where σ denotes the sigmoid function, MLP is a multi-layer perceptron, AvgPool (MaxPool) represents average (max) pooling, and $f^{7 \times 7}$ is a convolutional operation with a kernel size of 7 × 7.
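A minimal PyTorch sketch of Eqs. 7 and 8, in the spirit of the module in Fig. 5; the reduction ratio of the shared MLP is an assumed value, as the paper does not report it:

```python
# A sketch of the channel + spatial attention of Eqs. (7)-(8);
# the reduction ratio is an assumption.
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(               # shared MLP of Eq. (7)
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f^{7x7} of Eq. (8)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        # Channel attention f_C(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).
        avg = self.mlp(f.mean(dim=(2, 3)))
        mx = self.mlp(f.amax(dim=(2, 3)))
        f1 = torch.sigmoid(avg + mx).view(b, c, 1, 1) * f       # F' = f_C(F) (x) F
        # Spatial attention f_S(F') = sigmoid(conv7x7([AvgPool(F'); MaxPool(F')])).
        pooled = torch.cat([f1.mean(1, keepdim=True), f1.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled)) * f1            # M = f_S(F') (x) F'
```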

The introduced residual module is shown in Fig. 6, where two 3 × 3 convolutional layers are added, with a ReLU layer after the first convolutional layer.

Fig. 6 The residual module used in dSPIC

Given an input 2D feature map F, the residual module outputs a 2D feature map R, which is mathematically represented as follows.

$${\mathbf{R}} = \delta \left( {f^{3 \times 3} \left( {\delta \left( {f^{3 \times 3} ({\mathbf{F}})} \right)} \right) + {\mathbf{F}}} \right),$$ (9)

where δ is the ReLU function, and f 3 × 3 is a convolutional operation with the kernel size of 3 × 3.

The skip connection, indicated by the identity mapping path in Fig. 6, carries the input forward unchanged, so the convolutional branch only needs to learn the residual; when that branch outputs zero, the module reduces to the identity, i.e., R = F.
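A minimal PyTorch sketch of the residual module of Eq. 9; the channel count is an assumption, while the two 3 × 3 convolutions, the intermediate ReLU, and the identity skip follow Fig. 6:

```python
# A sketch of the residual module of Eq. (9); channel count is an assumption.
import torch
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # R = ReLU(conv(ReLU(conv(F))) + F): learn the residual, add the input back.
        return self.relu(self.conv2(self.relu(self.conv1(f))) + f)
```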

Fully connected layer

A fully connected layer has full connections to all activations in its previous layer. The activations can be computed as a matrix multiplication followed by a bias offset. dSPIC contains two fully connected layers, which form non-linear combinations of the selected features at the end of the network.

Softmax layer

The network output nodes use the Softmax function over the unordered classes. For an image that contains metastatic and arthritic lesions simultaneously, the classes with the top-1 and top-2 output probabilities together indicate the result. The Softmax function is defined in Eq. 10.

$$f(x_{j} ) = \frac{{e^{{x_{j} }} }}{{\sum\nolimits_{i = 1}^{n} {e^{{x_{i} }} } }},$$ (10)

where f(x_j) is the score of the j-th output node, x_j is the network input to the j-th output node, and n is the number of output nodes. Each output value f(x_j) is a probability between 0 and 1, and the outputs sum to 1.
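A worked sketch of Eq. 10 and the top-1/top-2 rule in Python; the logit values are invented purely for illustration:

```python
# A sketch of the Softmax of Eq. (10) and the top-1/top-2 decision rule.
import numpy as np

classes = ["N", "M", "A", "M&A"]
logits = np.array([0.2, 2.1, 1.8, 0.5])        # hypothetical network outputs x_j

probs = np.exp(logits) / np.exp(logits).sum()  # Eq. (10); probs sum to 1
top2 = np.argsort(probs)[::-1][:2]
print([classes[i] for i in top2])              # ['M', 'A']: metastatic and arthritic
```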

Furthermore, the nonlinear activation function used in the dSPIC network is the ReLU (rectified linear unit) function, which enables dSPIC to approximate arbitrarily complex functions. The input of a non-linear processing layer is the output of its immediately preceding convolutional layer. For a given input x, ReLU is mathematically defined as follows.

$${\text{ReLU}}(x) = \max (0,x)$$ (11)

The optimizer used in dSPIC is Adam (adaptive moment estimation) [34], which is well suited to problems with large data and parameter sizes, such as whole-body SPECT images. Adam adapts the learning rate of each parameter individually, typically performing smaller updates for frequently updated parameters and larger updates for infrequently updated ones. Letting θ_t denote the parameter vector at timestep t, Eq. 12 gives Adam's update rule [34].

$$\theta_{t+1} = \theta_{t} - \alpha \frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \varepsilon}, \quad \hat{m}_{t} = \frac{m_{t}}{1 - \beta_{1}^{t}}, \quad \hat{v}_{t} = \frac{v_{t}}{1 - \beta_{2}^{t}},$$ (12)

where α is the stepsize and ε is a small constant; $m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) g_{t}$ denotes the biased first moment estimate and $v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) g_{t}^{2}$ the biased second raw moment estimate; $g_{t} = \nabla_{\theta} f_{t}(\theta_{t-1})$ denotes the gradient of the stochastic objective at timestep t; and β_1, β_2 ∈ [0, 1) are the exponential decay rates for the moment estimates.
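A minimal NumPy sketch of one update under Eq. 12; the hyperparameter defaults are those suggested in [34], not values reported for dSPIC:

```python
# One bias-corrected Adam step following Eq. (12); defaults from [34].
import numpy as np

def adam_step(theta, m, v, g, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the parameter vector theta at timestep t >= 1."""
    m = beta1 * m + (1 - beta1) * g           # biased first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2      # biased second raw moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections of Eq. (12)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```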

To examine the effects of the attention module and the residual module on the classification performance of dSPIC, two variants, dSPIC-AM (attention module) and dSPIC-RM (residual module), will be evaluated separately in the experimental validation section below.


