Satellite/Aerial Images Classification and Segmentation

Aerial and satellite imagery gives us the unique ability to look down and see the Earth from above. It is used to measure deforestation, map damaged areas after natural disasters, and spot looted archaeological sites, and it has many more current and untapped use cases. The enormous and ever-growing amount of imagery presents a significant challenge.

Figure 1: Satellite/Aerial Images
There are not enough people to look at all of the images all of the time. That’s why we are building tools and techniques that allow technology to see what we cannot. This project encapsulates a workflow for using deep learning to understand and analyze geospatial imagery. It is released under an Apache 2.0 license and developed in the open on GitHub, so anyone can use and contribute to the project. It can also provide a starting point for others getting up to speed in this area.
This project performs semantic segmentation on aerial imagery from the UC Merced Land Use dataset. In this project, we’ll discuss our approach to analyzing this dataset. We’ll describe the main model architecture we used, how we implemented it in Keras and TensorFlow, and the various experiments we ran using the UC Merced land-use data. We then discuss the other open-source tools we used to build this project and visualize our results.
Semantic Segmentation and the UC Merced Land Use Dataset
Figure 2: A ResNet FCN’s semantic segmentation as it becomes more accurate during training.
The goal of semantic segmentation is to automatically label each pixel in an image with its semantic category. We use the UC Merced Land Use dataset to create a semantic segmentation of high-resolution aerial imagery. Part of the dataset has been labeled by hand with the 21-class UC Merced land-use labels (RGB): (a) agricultural, (b) airplane, (c) baseball diamond, (d) beach, (e) buildings, (f) chaparral, (g) dense residential, (h) forest, (i) freeway, (j) golf course, (k) harbor, (l) intersection, (m) medium residential, (n) mobile home park, (o) overpass, (p) parking lot, (q) river, (r) runway, (s) sparse residential, (t) storage tanks, and (u) tennis court. The dataset contains 2,100 images and is divided into a development set, where the labels are provided and used for training models, and a test set, where the labels are hidden and used by the client to test the performance of trained models.
Figure 3: UC Merced Land Dataset Classes
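A development/test split like the one described above can be sketched as follows. This is a hypothetical example: the 80/20 ratio, the file naming scheme, and the random seed are our assumptions, not details from the project itself.

```python
import random

# Hypothetical sketch of splitting the 2,100 UC Merced images
# (21 classes x 100 images each) into a labeled development set
# and a held-out test set. File names and the 80/20 ratio are
# illustrative assumptions, not the project's actual layout.
NUM_CLASSES = 21
IMAGES_PER_CLASS = 100

all_images = [f"class{c:02d}_{i:02d}.tif"
              for c in range(NUM_CLASSES)
              for i in range(IMAGES_PER_CLASS)]

rng = random.Random(42)          # fixed seed so the split is reproducible
rng.shuffle(all_images)

split = int(0.8 * len(all_images))
dev_set, test_set = all_images[:split], all_images[split:]
```

Keeping the split deterministic (via a fixed seed) makes it possible to compare models trained at different times on exactly the same data.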
Fully Convolutional Networks
There has been a lot of research on using convolutional neural networks for image recognition, the task of predicting a single label for an entire image. Most recognition models consist of a series of convolutional and pooling layers followed by a fully-connected layer that maps from a 3D array to a 1D array of probabilities.
Figure 4: Neural network architecture for recognition
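The recognition architecture just described can be sketched in Keras as below. This is a minimal illustration, not the project's actual model; the input size, filter counts, and layer depths are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal recognition-style network (a sketch, not the project's model):
# stacked convolutional and pooling layers, followed by a fully-connected
# layer that maps the final 3D feature volume to a 1D vector of 21 class
# probabilities. All sizes here are illustrative.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),                       # 3D feature map -> 1D vector
    layers.Dense(21, activation="softmax")  # one probability per class
])
```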
The Fully Convolutional Network (FCN) approach to semantic segmentation works by adapting and repurposing recognition models so that they are suitable for segmentation. By removing the final fully-connected layer, we can obtain a “fully convolutional” model that has 3D output. However, the final convolutional layer will still have too many channels (typically > 512) and too low a spatial resolution (typically 8×8). To get the desired output shape, we can use a 1×1 convolutional layer that squashes the number of channels down to the number of labels, and then use bilinear interpolation to upsample back to the spatial resolution of the input image.
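This output head, a 1×1 convolution followed by bilinear upsampling, can be sketched in Keras as follows. The shapes (an 8×8×512 backbone output upsampled to 256×256) are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of an FCN output head (shapes are illustrative assumptions):
# a 1x1 convolution squashes the channel dimension down to the number
# of labels (21), then bilinear upsampling restores the input image's
# spatial resolution.
features = layers.Input(shape=(8, 8, 512))           # coarse backbone output
logits = layers.Conv2D(21, kernel_size=1)(features)  # 8x8x512 -> 8x8x21
upsampled = layers.UpSampling2D(
    size=32, interpolation="bilinear")(logits)       # 8x8x21 -> 256x256x21
head = tf.keras.Model(features, upsampled)
```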
Despite having the correct resolution, the output will be spatially coarse, since it is the result of upsampling, and the model will have trouble segmenting small objects such as cars. To solve this problem, we can incorporate information from earlier, finer-grained layers into the output of the model. We can do this by performing convolution and upsampling on the final 32×32, 16×16, and 8×8 layers of the recognition model, and then summing these together.
Figure 5: Fully convolutional neural network architecture for semantic segmentation
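The skip-connection scheme described above can be sketched as below: each of the 32×32, 16×16, and 8×8 feature maps is scored with a 1×1 convolution, all three are brought to a common resolution, and the results are summed. The channel counts and final output size are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of FCN skip connections (layer sizes/channels are assumptions):
# score each feature map with a 1x1 convolution, upsample the coarser
# maps to 32x32, sum all three, then upsample to full resolution.
f32 = layers.Input(shape=(32, 32, 128))
f16 = layers.Input(shape=(16, 16, 256))
f8 = layers.Input(shape=(8, 8, 512))

s32 = layers.Conv2D(21, 1)(f32)                                   # 32x32x21
s16 = layers.UpSampling2D(2, interpolation="bilinear")(
    layers.Conv2D(21, 1)(f16))                                    # 32x32x21
s8 = layers.UpSampling2D(4, interpolation="bilinear")(
    layers.Conv2D(21, 1)(f8))                                     # 32x32x21

fused = layers.Add()([s32, s16, s8])                              # sum skips
out = layers.UpSampling2D(8, interpolation="bilinear")(fused)     # 256x256x21
fcn_head = tf.keras.Model([f32, f16, f8], out)
```

Summing the finer-grained scores with the coarse ones is what lets the model recover small objects that the 8×8 map alone would blur away.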
The FCN was originally proposed as an adaptation of the VGG recognition model, but it can also be used to adapt newer recognition models such as ResNets, which we used in our experiments. One advantage of the FCN over other architectures is that it is easy to initialize the bulk of the model using weights that were obtained from training on a large object recognition dataset such as ImageNet. This is often helpful when the size of the training set is small relative to the complexity of the model.
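Obtaining an ImageNet-pretrained ResNet backbone is a one-liner in Keras. A sketch is below; note we pass `weights=None` here so the example runs without a download, whereas pretrained initialization as described in the text would use `weights="imagenet"`. The 256×256 input size is an assumption.

```python
import tensorflow as tf

# Sketch of building an FCN backbone from a ResNet recognition model.
# include_top=False drops the fully-connected recognition head, leaving
# the fully convolutional trunk. The text describes initializing from
# ImageNet weights, i.e. weights="imagenet"; we use None here only to
# keep the example self-contained (no weight download).
backbone = tf.keras.applications.ResNet50(
    include_top=False,
    weights=None,
    input_shape=(256, 256, 3),
)
```

The backbone's final feature map (8×8 for a 256×256 input, since ResNet50 downsamples by 32×) is exactly the coarse layer the FCN head then scores and upsamples.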
Experiments and Results
We ran many experiments, and the following are some of the most interesting. Each experiment was specified by a JSON file stored in version control, which kept us organized and makes it easier to replicate our results.
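A version-controlled experiment spec of this kind might look like the following. The field names and values here are purely illustrative assumptions, not the project's actual schema.

```python
import json

# Hypothetical example of a JSON experiment spec kept in version control.
# Every field name and value below is illustrative, not the project's
# actual configuration schema.
experiment = {
    "model": "resnet_fcn",
    "dataset": "uc_merced",
    "batch_size": 16,
    "learning_rate": 1e-4,
    "epochs": 50,
}

# Serialize deterministically so diffs in version control stay readable,
# then parse it back as a training script would.
spec = json.dumps(experiment, indent=2, sort_keys=True)
loaded = json.loads(spec)
```

Because the full configuration lives in one committed file, re-running any past experiment is just a matter of checking out that file and pointing the training script at it.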

Table 1: Per-class results (%) on the validation and test sets

Class                 Validation   Test
Overall                   89.3     92.7
agricultural              88.5     89.7
airplane                  87.1     90.7
baseball diamond          89.9     91.8
beach                     84.1     86.8
buildings                 88.6     90.4
chaparral                 88.0     91.2
dense residential         86.9     89.8
forest                    86.9     87.8
freeway                   88.2     91.9
golf course               87.1     87.9
harbor                    83.1     86.9
intersection              89.9     89.7
medium residential        84.1     90.7
mobile home park          88.6     91.8
overpass                  86.9     91.8
parking lot               86.9     86.8
river                     88.2     91.9
runway                    86.8     89.9
sparse residential        89.1     93.1
storage tanks             88.6     91.6
tennis court              85.7     88.9
Posted on: June 19, 2020