Detecting Ships from Aerial Imagery using Deep Learning

Using a ConvNet to detect the presence of ships in aerial images taken of the San Francisco Bay

Photo by MUTI on Dribbble

Satellite imagery provides unique insights into various markets, including agriculture, defense and intelligence, energy, and finance. New commercial imagery providers, such as Planet, are using constellations of small satellites to capture images of the entire Earth every day.

This flood of new imagery is outgrowing the ability of organizations to inspect each captured image manually, so machine learning and computer vision algorithms are needed to help automate the analysis.

The aim of this ConvNet is to help address the difficult task of detecting the presence of large ships in satellite images. Automating this task supports applications such as monitoring port activity levels and supply chain analysis.

The classifier can also serve as a building block for more complex object detection pipelines, in the spirit of YOLOv3 or Fast R-CNN, so that multiple ships in a port can be pinpointed with bounding boxes around them.


The dataset consists of image chips extracted from Planet satellite imagery collected over the San Francisco Bay and San Pedro Bay areas of California. It includes 4000 80×80 RGB images labeled with either a “ship” or “no-ship” classification.

Image chips were derived from PlanetScope full-frame visual scene products, which are orthorectified to a 3 meter pixel size.


  1. NumPy
  2. pandas
  3. scikit-image
  4. Matplotlib
  5. TensorFlow
  6. Keras
  7. Cainvas Notebook Server

Here’s How it Works

  • Our input is a training dataset that consists of N images, each labeled with one of 2 different classes.
  • Then, we use this training set to train a ConvNet to learn what each of the classes looks like.
  • In the end, we evaluate the quality of the classifier by asking it to predict labels for a new set of images that it has never seen before. We will then compare the true labels of these images to the ones predicted by the classifier.
  • The model and its weights are then saved for future uses.

We’ll be using the Cainvas platform, which provides powerful cloud resources for building robust models that can later be deployed on edge devices.


Let’s get started with the code.

We’ll be using Matplotlib to plot the images and the labels. TensorFlow and Keras are among the most commonly used deep learning frameworks, and we’ll be using them to build and train our ConvNet.

Loading the dataset

The images from the dataset are loaded into NumPy arrays, with the labels 0 and 1 corresponding to the classes no-ship and ship. NumPy arrays are used because they make data augmentation and upsampling/downsampling easier to perform.

Exploratory Data Analysis

Here we’ll perform a simple EDA of the dataset, using a bar plot to check whether the classes are imbalanced.
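As a minimal sketch of the count behind such a bar plot, the snippet below tallies the images per class. It assumes the labels sit in a 1-D NumPy array called `labels` (a hypothetical name) with 0 for no-ship and 1 for ship. The array here is synthetic stand-in data built to match the 1:3 ship to no-ship ratio reported in this article, not the real labels.

```python
import numpy as np

CLASS_NAMES = ["no-ship", "ship"]  # index 0 -> no-ship, index 1 -> ship

def class_distribution(labels):
    """Return {class name: image count} for an array of 0/1 labels."""
    values, counts = np.unique(labels, return_counts=True)
    return {CLASS_NAMES[int(v)]: int(c) for v, c in zip(values, counts)}

# Synthetic stand-in labels (assumed counts): 3000 no-ship and 1000 ship images.
labels = np.array([0] * 3000 + [1] * 1000)
distribution = class_distribution(labels)
print(distribution)
```

The resulting dictionary can be handed to Matplotlib, for example `plt.bar(list(distribution.keys()), list(distribution.values()))`, to draw the bar plot.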


Bar Plot of distribution of images per class

Data Augmentation

The images present in the ship class are augmented and then stored in the dataset, so that there is an equal representation of the classes.

The current ship to no-ship ratio is 1:3, meaning that for every image in the ship class there are 3 images in the no-ship class. To counter this, 2 augmented images are produced for each original ship image, which triples the size of the ship class and balances the dataset.
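The balancing step can be sketched as follows. The article does not say which augmentations are applied, so the horizontal flip and 90-degree rotation used here are assumptions chosen for illustration, and the function name `augment_ship_images` is hypothetical. The rotation also assumes square images, which holds for 80×80 chips.

```python
import numpy as np

def augment_ship_images(images, labels):
    """Append two augmented copies of every ship image (label 1).

    images: array of shape (N, H, W, 3) with H == W; labels: array of shape (N,).
    """
    ships = images[labels == 1]
    flipped = ships[:, :, ::-1, :]               # horizontal flip along the width axis
    rotated = np.rot90(ships, k=1, axes=(1, 2))  # 90-degree rotation in the image plane
    new_images = np.concatenate([images, flipped, rotated], axis=0)
    new_labels = np.concatenate(
        [labels, np.ones(2 * len(ships), dtype=labels.dtype)]
    )
    return new_images, new_labels

# Tiny synthetic stand-in: 3 no-ship images and 1 ship image of 8x8 RGB pixels.
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
labels = np.array([0, 0, 0, 1])
aug_images, aug_labels = augment_ship_images(images, labels)
print(aug_images.shape, np.bincount(aug_labels))
```

With a 1:3 starting ratio, adding two copies per ship image brings both classes to the same count.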

The labels array is then one-hot encoded using to_categorical from Keras. This keeps the classes on an equal footing with respect to their labels: each class gets its own output column instead of a position on a numeric scale.
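To make the encoding concrete, here is a small NumPy sketch of what one-hot encoding produces for 0/1 labels. It uses `np.eye` rather than Keras so that it runs without TensorFlow; the helper name `one_hot` is hypothetical, and its output should correspond to what to_categorical returns for the same integer labels.

```python
import numpy as np

def one_hot(labels, num_classes=2):
    """Turn integer class labels into one-hot rows, e.g. 1 -> [0., 1.]."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([0, 1, 1, 0])  # 0 = no-ship, 1 = ship
encoded = one_hot(labels)
print(encoded)
```

Each row has a single 1 in the column of its class, which is the shape a 2-neuron softmax output layer expects.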

Splitting the data

Instead of using train_test_split, the images and labels arrays are each shuffled randomly with the same seed value, 42. Because both arrays are shuffled in the same order, every image stays linked to its label after shuffling.

This method makes it possible to create all 3 datasets. The training and validation datasets are used while training the model, and the testing dataset is used to evaluate the model on unseen data. Unseen data simulates real-world prediction, since the model has never seen it before, and shows the developers how robust the model is.

The data has been split into –

  • 70% — Training
  • 20% — Validation
  • 10% — Testing
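The shuffle and the 70/20/10 split can be sketched as below. This is a variant under stated assumptions rather than the notebook's exact code: instead of shuffling each array separately with the same seed, it draws one permutation from seed 42 and applies it to both arrays, which likewise keeps images and labels aligned. The function name `shuffle_and_split` is hypothetical.

```python
import numpy as np

def shuffle_and_split(images, labels, seed=42):
    """Shuffle images and labels in the same order, then split 70/20/10."""
    n = len(images)
    order = np.random.RandomState(seed).permutation(n)  # one shared ordering
    images, labels = images[order], labels[order]
    n_train = n * 70 // 100
    n_val = n * 20 // 100
    train = (images[:n_train], labels[:n_train])
    val = (images[n_train:n_train + n_val], labels[n_train:n_train + n_val])
    test = (images[n_train + n_val:], labels[n_train + n_val:])
    return train, val, test

# Synthetic stand-in: image i is filled with the value i and "labelled" i,
# so alignment after shuffling is easy to verify.
ids = np.arange(100)
images = np.broadcast_to(ids[:, None, None, None], (100, 4, 4, 3)).astype("float32")
(train_x, train_y), (val_x, val_y), (test_x, test_y) = shuffle_and_split(images, ids)
print(len(train_x), len(val_x), len(test_x))
```

Integer arithmetic is used for the split sizes to avoid floating-point rounding surprises when computing the boundaries.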

Model Architecture

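Our model starts with a ZeroPadding2D layer followed by four Conv2D layers, each followed by batch normalization. The second, third, and fourth Conv2D layers are each followed by a 2×2 MaxPool2D pooling layer and dropout. The flattened features then pass through a dense layer of 64 neurons, another dropout layer, and a dense layer of 128 neurons. The hidden layers use the ReLU activation function, and the last layer has 2 neurons with the softmax activation function for making our class predictions.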

Model: "sequential"
Layer (type)                 Output Shape              Param #   
zero_padding2d (ZeroPadding2 (None, 58, 58, 3)         0         
conv2d (Conv2D)              (None, 56, 56, 16)        448       
batch_normalization (BatchNo (None, 56, 56, 16)        64        
conv2d_1 (Conv2D)            (None, 54, 54, 32)        4640      
batch_normalization_1 (Batch (None, 54, 54, 32)        128       
max_pooling2d (MaxPooling2D) (None, 27, 27, 32)        0         
dropout (Dropout)            (None, 27, 27, 32)        0         
conv2d_2 (Conv2D)            (None, 23, 23, 32)        25632     
batch_normalization_2 (Batch (None, 23, 23, 32)        128       
max_pooling2d_1 (MaxPooling2 (None, 11, 11, 32)        0         
dropout_1 (Dropout)          (None, 11, 11, 32)        0         
conv2d_3 (Conv2D)            (None, 9, 9, 64)          18496     
batch_normalization_3 (Batch (None, 9, 9, 64)          256       
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0         
dropout_2 (Dropout)          (None, 4, 4, 64)          0         
flatten (Flatten)            (None, 1024)              0         
dense (Dense)                (None, 64)                65600     
dropout_3 (Dropout)          (None, 64)                0         
dense_1 (Dense)              (None, 128)               8320      
dense_2 (Dense)              (None, 2)                 258       
Total params: 123,970
Trainable params: 123,682
Non-trainable params: 288
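The shapes and parameter counts in the summary can be checked by hand with a few lines of plain Python. The kernel sizes below (3, 3, 5, 3) are not stated in the article; they are inferred from the output shapes and parameter counts in the summary, assuming unpadded stride-1 convolutions and 2×2 pooling.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def conv_params(kernel, c_in, c_out):
    """Weights plus one bias per filter for a square-kernel Conv2D."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus one bias per neuron for a Dense layer."""
    return (n_in + 1) * n_out

def bn_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels

size, channels = 58, 3  # shape after ZeroPadding2D, taken from the summary
conv_counts, bn_counts = [], []
# (inferred kernel size, filters, followed by 2x2 max pooling?)
for kernel, filters, pooled in [(3, 16, False), (3, 32, True), (5, 32, True), (3, 64, True)]:
    conv_counts.append(conv_params(kernel, channels, filters))
    bn_counts.append(bn_params(filters))
    size, channels = conv_out(size, kernel), filters
    if pooled:
        size //= 2  # 2x2 pooling halves each spatial dimension (rounding down)

flat = size * size * channels
dense_counts = [dense_params(flat, 64), dense_params(64, 128), dense_params(128, 2)]
total = sum(conv_counts) + sum(bn_counts) + sum(dense_counts)
non_trainable = sum(bn_counts) // 2  # half of each BatchNormalization layer
print(conv_counts, bn_counts, dense_counts, total, non_trainable)
```

The computed values line up with the Param # column, the flattened size of 1024, and the trainable and non-trainable totals reported above.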

I defined checkpoints specifying when to save the model and implemented early stopping to make the training process more efficient.
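The idea behind checkpointing and early stopping can be illustrated with a conceptual sketch in plain Python. It is not the notebook's code: in Keras this behavior is typically configured through the ModelCheckpoint and EarlyStopping callbacks, and the article does not state which metric is monitored or what patience is used. The validation losses and the patience of 3 below are hypothetical.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    A "checkpoint" is taken whenever the validation loss improves; training
    stops once `patience` consecutive epochs pass without improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: this is where the model would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch (0-indexed).
val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.63]
stop_epoch, best_epoch = simulate_early_stopping(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

Here training halts three epochs after the best validation loss, and the saved checkpoint comes from that best epoch rather than from the last one.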

I used Adam as the optimizer because it gave much better results in my runs, and binary cross-entropy as the loss function since there are only 2 classes. I trained the model for 50 epochs. However, you’re free to use different hyperparameters as long as the accuracy of the predictions doesn’t suffer.
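One detail worth checking is how binary cross-entropy behaves with a 2-neuron softmax output and one-hot labels. The NumPy sketch below writes out both loss formulas and shows that, in this two-class setup, binary and categorical cross-entropy give the same value. It is an illustration of the formulas, not the Keras implementation (which also clips probabilities for numerical stability), and the predictions are hypothetical.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    """Mean element-wise binary cross-entropy over every output unit."""
    per_unit = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_unit))

def categorical_crossentropy(y_true, y_pred):
    """Mean over samples of -sum(y_true * log(y_pred)) across classes."""
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot labels and hypothetical softmax outputs (each row sums to 1,
# and every probability lies strictly between 0 and 1, so log is finite).
y_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y_pred = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(bce, cce)
```

The two agree because the softmax outputs sum to 1, so each of the two output units contributes the same term to the binary loss.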

A helper function plots the loss and accuracy recorded during training. After training, the model was saved and the results were plotted with this function.


Loss/Accuracy vs Epoch


The model reached a validation accuracy of roughly 70%, which is quite decent considering the size of the dataset and the fact that we are keeping the model lightweight for deployment.


I took a batch of 8 images from the test images and plotted them with the corresponding true and predicted labels after making the predictions using the model.
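Turning the model's softmax outputs into the labels shown on the plots can be sketched as follows. The probabilities and true labels are hypothetical stand-ins for what the model and the test set would provide, and `decode_predictions` is an illustrative helper name rather than a function from the notebook.

```python
import numpy as np

CLASS_NAMES = ["no-ship", "ship"]  # index 0 -> no-ship, index 1 -> ship

def decode_predictions(scores):
    """Map each row of class scores (or one-hot labels) to a class name."""
    return [CLASS_NAMES[i] for i in np.argmax(scores, axis=1)]

# Hypothetical softmax outputs for a batch of 4 images, with one-hot true labels.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
true = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

predicted_names = decode_predictions(probs)
true_names = decode_predictions(true)
accuracy = float(np.mean(np.array(predicted_names) == np.array(true_names)))
print(predicted_names, true_names, accuracy)
```

The same decoding works for both arrays because the index of the largest value in a one-hot row is the class index.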

Here are some of the predictions made by the model:

Predicted: no-ship

Predicted: ship

Predicted: ship


In this article, I’ve demonstrated how a ConvNet built from scratch can be used to detect ships in ports from aerial imagery. This simple use case can serve as a module in more complex object detection frameworks, where ships are detected and marked out within a single scene image.

Notebook Link: Here.

Credit: Ved Prakash Dubey
