IMAGE PROCESSING, PATTERN RECOGNITION

Skin lesion segmentation method for dermoscopic images with convolutional neural networks and semantic segmentation

Melanoma is one of the most dangerous forms of skin cancer because it grows fast and causes most skin cancer deaths. Hence, early detection is essential for treating melanoma. In this article, we propose a skin lesion segmentation method for dermoscopic images based on the U-Net architecture with a VGG-16 encoder and semantic segmentation. Based on the segmented skin lesion, diagnostic imaging systems can evaluate skin lesion features to classify them. The proposed method requires fewer resources for training and is suitable for computing systems without powerful GPUs, yet the training accuracy remains high (above 95 %). In the experiments, we train the model on the ISIC dataset, a common dermoscopic image dataset. To assess the performance of the proposed method, we evaluate the Sorensen-Dice and Jaccard scores and compare them with those of other deep learning-based skin lesion segmentation methods. Experimental results show that the segmentation quality of the proposed method is better than that of the compared methods.


Introduction
Melanoma is one of the most dangerous forms of skin cancer. It grows fast and causes most skin cancer deaths. For cancer in general and skin cancer in particular, early detection is very important because it allows doctors to intervene before metastasis, one of the most common causes of cancer death. One important method for diagnosing melanoma is the ABCD rule [1,2]. To improve the quality of diagnosis by the ABCD rule, it is necessary to segment skin lesions from dermoscopic images. Features of the skin lesion are then extracted from the segmented region to evaluate the lesion.
The skin lesion segmentation problem plays an important role in medical image processing. Several methods have been studied, including learning-based approaches [3,4] and non-learning-based approaches such as thresholding and level set methods [2,5]. In this paper, we focus mainly on learning-based methods, which have become an active research trend.
In recent years, deep learning has become an efficient approach to image processing problems, including image segmentation. In particular, artificial neural networks (ANNs) [6] and convolutional neural networks (CNNs) [3] have become the most powerful tools in image processing, pattern recognition, computer vision, and other fields of science, engineering, and technology [3]. CNNs are applied to many medical image segmentation problems, such as segmentation of tumors, human organs, the brain, and bone.
For skin lesion segmentation, there are several CNN-based methods, such as the method with fully convolutional-deconvolutional networks [3], the method using deep fully convolutional networks with Jaccard distance [4], and the method based on multistage fully convolutional networks [7]. All these methods are built on fully convolutional networks (FCNs). Several works have noted that training FCNs is complicated and that FCNs are not sensitive enough to segment small details and low-intensity regions, as is needed for skin lesions [8]. Moreover, FCNs typically require a large amount of training data.
Some other CNN-based models, such as the high-resolution CNN [9] and the combination of deep convolutional networks with unsupervised learning [10], have also been proposed for segmenting skin lesions. However, the accuracy of these methods, especially for low-density regions of skin lesions, is not high. Skin lesion segmentation methods based on dense deconvolution networks [11,12] have also been proposed. Although these methods are good enough for skin lesion segmentation in general, they cannot reliably segment low-intensity regions. Several other CNN-based skin lesion segmentation methods have been proposed [13,14,15]. However, these methods cannot work on color images directly: the dermoscopic images must be converted to grayscale, processed on separate channels, or color-normalized. Some methods can only be applied after the skin lesion images have been preprocessed, for example by removing hair, extracting regions of interest (ROI), or removing shadow and shading effects.

One of the most effective CNN-based architectures for medical image segmentation is U-Net [16,8]. It segments the image pixel by pixel instead of treating it as a whole, which improves accuracy for medical image segmentation. Moreover, U-Net-based methods usually require less memory and can be trained with a small training set. Rashika et al. proposed a skin lesion segmentation method based on U-Net [16]. However, that method is not effective at segmenting small details and narrow bands. In other words, it is not effective for skin lesions with low-intensity regions.
To improve the accuracy of segmenting low-intensity regions of skin lesions, we combine U-Net with the VGG-16 architecture. VGG-16 is one of the most effective CNN architectures for semantic segmentation, and it works well on low-intensity pixels [17]. In the proposed CNN architecture, VGG-16 replaces the encoder, so the proposed skin lesion segmentation method works more effectively, even on low-intensity regions.

In this paper, we propose a method to segment skin lesions with a CNN architecture that uses a VGG-16 encoder [18,19] for U-Net together with the semantic segmentation method. The proposed method does not require a large training set because it inherits the advantages of U-Net. Therefore, it is suitable for computing systems without powerful GPUs. Moreover, the method works directly on color images and does not require any preprocessing. It can segment low-intensity regions thanks to the VGG-16 encoder. Further, the proposed method benefits from the advantages of the semantic segmentation method [20]. Semantic segmentation plays a vital role in computer vision: it is one of the high-level tasks that pave the way towards complete scene understanding.
The rest of the article is organized as follows. Section II presents the proposed skin lesion segmentation method for dermoscopic images with convolutional neural networks and semantic segmentation. Section III presents experimental results and the comparison. Finally, Section IV concludes the article.

Convolutional neural network
In the field of deep learning, convolutional neural networks are powerful and effective tools for many problems in science, engineering, technology, business, management, and medicine. CNNs are a subclass of deep neural networks that are widely used for analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to process such data. They are inspired by the biological connectivity patterns of neurons in humans and animals.
For the skin lesion segmentation problem, several deep learning approaches have been used: methods using convolutional-deconvolutional networks, methods using fully connected convolutional networks, methods based on deep residual networks, etc. In the proposed method, we build on the ImageNet network [19] to design the CNN architecture for the skin lesion segmentation problem.
Before proposing the CNN architecture for the skin lesion segmentation problem, we explain some terms related to it.
Image input layer [19]: inputs images to a network and applies data normalization.
Convolution 2D layer [21]: applies sliding convolutional filters to the input image. This layer convolves the input image by moving the filters along it vertically and horizontally and computing the dot product of the weights and the input image. Finally, a bias term is added. The convolution layer has various components, including filters and stride, dilated convolutions, feature maps, zero padding, output size, number of neurons, learnable parameters, and number of layers. The learnable parameters are updated during network training.
Batch normalization layer [22]: normalizes each input channel $x_i$ of the input image $x$ across a mini-batch. The layer first normalizes the activations of each channel by subtracting the mini-batch mean $\mu_B$ and dividing by the mini-batch standard deviation:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},$$

where $\sigma_B^2$ is the mini-batch variance and $\epsilon$ is a small constant that keeps the division numerically stable. Then, the layer shifts the input by a learnable offset $\beta$ and scales it by a learnable scale factor $\gamma$:

$$y_i = \gamma \hat{x}_i + \beta.$$

The learnable parameters $\beta$ and $\gamma$ are updated during network training.
ReLU layer [23]: performs a threshold operation on each element of the input image $x$, where any value less than zero is set to zero:

$$f(x) = \begin{cases} x, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

The ReLU layer does not change the input image size.
Max pooling 2D layer [24]: performs down-sampling by dividing the input image into rectangular pooling regions and computing the maximum of each region.
Softmax layer [25]: applies the softmax function to the input image $x$. The softmax function has the following form:

$$y_r(x) = \frac{\exp(a_r(x))}{\sum_{j=1}^{k} \exp(a_j(x))},$$

where $0 \le y_r \le 1$ and $a = (a_1, \ldots, a_k)$ is a k-dimensional vector of arbitrary real values.
Classification output layer [25]: computes the cross-entropy loss for multi-class classification problems with mutually exclusive classes. The classification output layer usually follows the softmax layer.
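To make the layer definitions above concrete, the following minimal sketch implements the element-wise operations in plain Python. It is only an illustration under stated assumptions, not the MATLAB layers used in the experiments: the function names, the default `eps` value, and the fixed 2 × 2 pooling window are chosen here for the example.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel across a mini-batch, then scale and shift.
    eps is an assumed small constant for numerical stability."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def relu(values):
    """Set every negative element to zero; the size is unchanged."""
    return [max(0.0, v) for v in values]

def max_pool_2x2(image):
    """Down-sample a 2D list by taking the maximum of each 2x2 region."""
    rows, cols = len(image), len(image[0])
    return [
        [max(image[r][c], image[r][c + 1],
             image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

def softmax(scores):
    """Map k real-valued scores to probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample with a single correct class."""
    return -math.log(probs[true_class])
```

For a pixel with two class scores (lesion and background), `softmax` gives the class probabilities and `cross_entropy` gives the loss that training would minimize for that pixel.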

Skin lesion dataset
For the dataset of dermoscopic skin lesion images, we use the International Skin Imaging Collaboration (ISIC) 2017 dataset: https://www.isic-archive.com. The total size of the dataset is 5.4 GB, and it includes about 2000 dermoscopic images, each with a ground truth segmented by dermoscopy experts and a super-pixel mask. Image IDs have the form ISIC_00xxxxx. The segmentation ground truth carries the ID of the corresponding image with a suffix: ISIC_00xxxxx_segmentation. The dataset also provides a validation set with 150 images and a test set with 600 images. All dermoscopic images of the dataset are stored in RGB color in the JPEG format. Ground truth and super-pixel masks are stored in the PNG format.

The proposed CNN architecture for skin lesion segmentation includes five max pooling layers (and five max unpooling layers). The general CNN architecture is presented in Figure 1.
In Fig. 1, the blocks P1-P5 use max pooling layers and the blocks U1-U5 use max unpooling layers.
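As a rough illustration of what five pooling and five unpooling stages imply, the sketch below traces the spatial size of the feature maps for a 256 × 256 input (the image size used in the experiments). It assumes 2 × 2 pooling with stride 2, as in the standard VGG-16, and a decoder that mirrors the encoder; the ordering of U1-U5 from the bottleneck outwards and the helper name `trace_sizes` are assumptions of this example. The channel widths listed are those of the standard VGG-16 convolutional blocks.

```python
# Output channels of the five convolutional blocks in standard VGG-16.
VGG16_ENCODER_CHANNELS = [64, 128, 256, 512, 512]

def trace_sizes(input_size=256, depth=5):
    """Return (stage name, spatial size) after each pooling/unpooling stage."""
    stages = []
    size = input_size
    # Encoder: each max pooling stage (P1..P5) halves height and width.
    for i in range(1, depth + 1):
        size //= 2
        stages.append((f"P{i}", size))
    # Decoder (assumed mirror image): each unpooling stage doubles them.
    for i in range(1, depth + 1):
        size *= 2
        stages.append((f"U{i}", size))
    return stages

for name, size in trace_sizes():
    print(f"{name}: {size}x{size}")
```

Under these assumptions the feature maps shrink to 8 × 8 at the bottleneck and return to the full 256 × 256 resolution at the output, so every input pixel receives a label.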
As mentioned above, the proposed CNN architecture uses a VGG-16 encoder with the U-Net architecture. The skin lesion segmentation method for dermoscopic images with semantic segmentation and CNN is presented in Algorithm 1.
Algorithm 1. The skin lesion segmentation method for dermoscopic images with semantic segmentation and convolutional neural network.

Input: The input dermoscopic image of a skin lesion v.
Output: The segmented skin lesion image u.

Function u = SemanticSegCNN(v)
Step 1: Pretrain the proposed model on a training set.
Step 2: Implement the semantic segmentation method.
Step 3: Improve segmentation quality:
  - Apply a Gaussian filter.
  - Fill the holes and filter out small segments.
  - Compute the scores to assess the segmentation quality of the proposed method.
End.
To suit weak computing systems, in the first step we only need to run the training process on a small number of samples from the dataset.
We implement the steps outlined in Algorithm 1 to improve the deep learning model for accurate delineation of skin lesions in dermoscopic images. After pretraining the model, we implement the semantic segmentation method proposed by Brostow et al. [20]. The details of the semantic segmentation method used in the second step can be found in [20].
To implement the semantic segmentation method in MATLAB, we use the semanticseg function. To apply the Gaussian filter after segmentation, we use the imgaussfilt function, and to fill the holes and filter out small segments, we use the imfill function with default settings.
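The hole-filling and small-segment filtering of Step 3 can be sketched in plain Python as follows. This is not the MATLAB code used in the experiments and does not reproduce the behavior of imfill; it is a minimal illustration that assumes a binary mask stored as a list of lists, 4-connectivity, and a hypothetical `min_size` threshold. The Gaussian smoothing is omitted for brevity.

```python
from collections import deque

def _components(mask, target):
    """Yield lists of (row, col) for each 4-connected region equal to target."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != target or seen[r0][c0]:
                continue
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and mask[nr][nc] == target):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            yield region

def fill_holes(mask):
    """Set to 1 every background region that does not touch the image border."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for region in _components(mask, 0):
        touches_border = any(r in (0, rows - 1) or c in (0, cols - 1)
                             for r, c in region)
        if not touches_border:
            for r, c in region:
                out[r][c] = 1
    return out

def remove_small_segments(mask, min_size):
    """Set to 0 every foreground region with fewer than min_size pixels.
    min_size is a hypothetical threshold chosen by the user."""
    out = [row[:] for row in mask]
    for region in _components(mask, 1):
        if len(region) < min_size:
            for r, c in region:
                out[r][c] = 0
    return out
```

A typical use would be `remove_small_segments(fill_holes(mask), min_size)`, which closes gaps inside the lesion first and then discards isolated specks such as those caused by hairs.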

Experimental results and discussions
We implement the training process and the proposed skin lesion segmentation method in MATLAB 2018b. The computing system runs Windows 10 Pro with an Intel Core i5 at 1.6 GHz and 4 GB of 2295 MHz DDR3 RAM, without a GPU. If the training process is run on a computing system with a powerful GPU, performance will be better. The proposed method is suitable for training on both CPU and GPU.

Image segmentation quality assessment metrics
For the segmentation task, we use the Sorensen-Dice and the Jaccard metrics to assess quality [26,27].
Let X be the segmented region whose quality we need to assess and Y the corresponding ground truth.
Sorensen-Dice similarity [28] is computed as follows:

$$\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|},$$

where |·| denotes the set cardinality (the number of elements of a set). The value of the Sorensen-Dice similarity metric is between 0 and 1 (or 0% to 100%). The higher the Sorensen-Dice value, the better the segmentation result.
Jaccard similarity [28,27] is related to the Sorensen-Dice similarity:

$$\mathrm{Jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{\mathrm{Dice}(X, Y)}{2 - \mathrm{Dice}(X, Y)}.$$

The range of the Jaccard value is [0, 1]. The higher the Jaccard value, the better the segmentation result.
Accuracy (%) [26] measures how well a binary segmentation method correctly identifies or excludes a condition:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\,\%,$$

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative pixels, respectively.
To evaluate the Dice and Jaccard scores, we need the ground truth. All ground truths are given in the ISIC dataset. These ground truths were segmented by experienced dermatologists.
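The metrics above can be computed directly from pixel counts. The sketch below is a minimal illustration in plain Python, assuming that the segmentation and the ground truth are binary masks of equal shape stored as lists of lists; the function names are chosen for this example. Sensitivity and specificity, which are reported in the tables, are included using their standard definitions, TP / (TP + FN) and TN / (TN + FP).

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def segmentation_scores(pred, truth):
    """Return the scores as fractions in [0, 1].
    Assumes both masks contain lesion and background pixels,
    so that no denominator is zero."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        # |X| + |Y| = (TP + FP) + (TP + FN)
        "dice": 2 * tp / (2 * tp + fp + fn),
        # |X union Y| = TP + FP + FN
        "jaccard": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

pred = [[1, 1, 0],
        [0, 0, 0]]
truth = [[1, 0, 0],
         [1, 0, 0]]
scores = segmentation_scores(pred, truth)
print(scores["dice"], scores["jaccard"])
```

In this toy example one pixel is shared by both masks, one is a false positive and one a false negative, so the Dice score is 0.5 and the Jaccard score is 1/3, which agrees with the relation Jaccard = Dice / (2 - Dice).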

Image Datasets
We use the dermoscopic images of skin lesions from the ISIC dataset for the 2017 challenge. All images are of high resolution. To suit processing on our computing system, we resize all images to a standard size of 256 × 256 pixels and store them in PNG format.
We select 20 images to present the visual results of the proposed skin lesion segmentation method. Moreover, the results acquired on a test set are used for comparison. Fig. 2 shows all 20 images selected for testing. All images used for the test are color images. The proposed method works directly on color images without converting them to grayscale or extracting separate channels.
All selected images used for the segmentation task are original images without any preprocessing. We only resize the dermoscopic images to the standard size of 256 × 256 pixels. As can be seen in Fig. 2, these images include shading effects and hairs, and the color intensity of the skin lesions varies from low to high.
For the training process, we use the proposed CNN architecture with stochastic gradient descent with momentum [29]. In addition, we use the following configuration: the initial learning rate is 0.001, the maximum number of epochs is 200, and the mini-batch size is 32. The number of iterations is the same as the maximum number of epochs.
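For readers unfamiliar with the optimizer, the following sketch shows one common form of the stochastic gradient descent with momentum update on a toy problem. The learning rate of 0.001 and the 200 iterations follow the configuration above; the momentum coefficient of 0.9 is an assumption, since its value is not stated here, and the quadratic objective only stands in for the network loss.

```python
def sgdm_step(weights, grads, velocity, learning_rate=0.001, momentum=0.9):
    """One SGD-with-momentum update; returns new weights and velocity.
    momentum=0.9 is an assumed value, not taken from the paper."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy problem: minimize f(w) = sum(w_i^2), whose gradient is 2 * w_i.
weights, velocity = [1.0, -2.0], [0.0, 0.0]
initial_loss = sum(w * w for w in weights)
for _ in range(200):  # 200 iterations, matching the configuration above
    grads = [2.0 * w for w in weights]
    weights, velocity = sgdm_step(weights, grads, velocity)
final_loss = sum(w * w for w in weights)
print(initial_loss, final_loss)
```

The velocity term accumulates past gradients, which lets the weights keep moving in a consistent direction even when the learning rate is small.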
The training accuracy is presented in Figure 3. As can be seen, during the first 90 iterations the accuracy increases very fast. After that, it increases more slowly. With the above settings, the training accuracy is 95.79 % after 200 iterations. This is a very good result, especially given the small size of the training data.
The segmentation results of the proposed method are presented in Figs. 4 and 5. Fig. 4 shows the black-and-white segmentation, where the white region denotes the skin lesion. Fig. 5 presents the segmentation results overlaid on the dermoscopic images of skin lesions. Note that the red border marks the region segmented by the proposed method, and the green border marks the ground truth (segmented by experienced dermatologists). As can be seen, the difference is very small.
The proposed skin lesion segmentation method has several advantages: although we only train on a small dataset, the method still works effectively; it works directly on color images, with no need to convert them to grayscale or to process separate channels; there is no need to remove hairs; and there is no need to extract regions of interest (ROI). In addition, the proposed method segments low-intensity regions of skin lesions well. The other methods considered usually apply image enhancement algorithms before processing.

[Figure panels: ISIC_0000000, ISIC_0000001, ISIC_0000002, ISIC_0000003, ISIC_0000006, ISIC_0000007, ISIC_0000008, ISIC_0000009, ISIC_0000010, ISIC_0000011, ISIC_0000013, ISIC_0000014, ISIC_0000015, ISIC_0000016, ISIC_0000017, ISIC_0000018, ISIC_0000019, ISIC_0000020, ISIC_0009942]

Tab. 1 shows the Accuracy, Sorensen-Dice, Jaccard, Sensitivity, and Specificity scores for the set of 20 selected images from the ISIC dataset segmented by the proposed skin lesion segmentation method. Almost all Sorensen-Dice scores are higher than 0.9, and most Jaccard scores are higher than 0.8. This is an impressive result. The average Dice score is 0.92 and the average Jaccard score is 0.86.
Tab. 2 presents a comparison of the Accuracy, Dice, Jaccard, Sensitivity, and Specificity scores of the proposed method with those of other skin lesion segmentation methods based on deep learning. As can be seen, the Accuracy, Dice, and Jaccard scores of the results segmented by the proposed method are the highest. Hence, the proposed method can compete with other state-of-the-art methods for skin lesion segmentation.

Regarding execution speed, it takes around 8 hours to complete the training process. This is comparable to other training methods: the training task is always heavy and takes a lot of time. With the pretrained model, the proposed method takes less than 1 second to complete the segmentation task.

Conclusions
In this paper, we proposed a CNN architecture for skin lesion segmentation in dermoscopic images and a skin lesion segmentation method based on that architecture and semantic segmentation. The proposed method works effectively even with a small amount of training data. It gives very good results without requiring any preprocessing, such as hair removal, ROI extraction, or image enhancement. The proposed method is competitive with other state-of-the-art methods for skin lesion segmentation.
In future work, we can apply some preprocessing tasks, such as image inpainting algorithms [31,32], to remove hair before applying the segmentation task. This could further increase the accuracy of both the training task and the segmentation task.