Data Preparation
The chart above shows the steps taken to create the training and evaluation data for our model.
We began by training a model to detect logos in general (no brand classification) with the OpenLogo & Logo3kdet datasets.
We applied the resulting model to OpenLitterMap images to obtain cutouts of potential logos.
We then grouped the logo cutouts using hierarchical clustering, named the clusters by brand, and corrected mistakes.
We sought out additional examples of underrepresented brands using the Google Vision API (logo & text recognition) and tags provided by OpenLitterMap users.
We then cut classes with fewer than 20 examples, split the data into train, validation, and test sets, and further rebalanced class representation through augmentation.
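As a concrete illustration, the filtering and splitting step might look like the sketch below; the (image_path, brand) annotation format and helper names are assumptions for this example, not our actual code.

```python
import random
from collections import Counter

MIN_EXAMPLES = 20  # classes below this threshold were dropped

def filter_and_split(annotations, val_frac=0.1, test_frac=0.1, seed=42):
    """Drop rare brands, then split each brand separately so that every
    remaining class appears in the train, val and test sets."""
    counts = Counter(brand for _, brand in annotations)
    kept = [a for a in annotations if counts[a[1]] >= MIN_EXAMPLES]

    by_brand = {}
    for ann in kept:
        by_brand.setdefault(ann[1], []).append(ann)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for items in by_brand.values():
        rng.shuffle(items)
        n_val, n_test = int(len(items) * val_frac), int(len(items) * test_frac)
        val += items[:n_val]
        test += items[n_val:n_val + n_test]
        train += items[n_val + n_test:]
    return train, val, test
```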
Clustering & Class Rebalancing
The image on the right shows the dendrogram produced by our clustering process. Hierarchical clustering works by initializing each example in its own cluster and then iteratively merging the closest clusters until a distance limit is met. A core benefit of this approach is that you do not need to pre-determine the number of clusters, as you do with techniques like K-means.
We started by converting each image to a vector representation using pre-trained embedding models (imgbeddings & VGG16). We could then begin the clustering process. We elected to use cosine distance as our distance metric; it ignores the magnitude of each vector and compares only the direction, i.e. the angle between embeddings.
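A minimal sketch of this pipeline, assuming Keras's VGG16 as the feature extractor and SciPy's agglomerative clustering; the file names and the 0.4 distance cutoff are illustrative, not values from our pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    """Map one logo cutout to a 512-d VGG16 feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
    return model.predict(x, verbose=0)[0]

cutouts = ["cutout_001.jpg", "cutout_002.jpg"]  # hypothetical paths
X = np.stack([embed(p) for p in cutouts])

# Agglomerative clustering on cosine distance,
# d(u, v) = 1 - (u . v) / (||u|| ||v||),
# so only a distance cutoff is needed, not a cluster count.
Z = linkage(X, method="average", metric="cosine")
labels = fcluster(Z, t=0.4, criterion="distance")  # 0.4 is an assumed cutoff
# scipy.cluster.hierarchy.dendrogram(Z) reproduces a plot like the one shown.
```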
Here we can see the impact of our class rebalancing. The dataset is naturally imbalanced, which could cause our model to overly focus on the majority classes. We added additional training examples and applied augmentation (flip, rotation, blur, noise, brightness) to increase the representation of minority classes; a sketch of the augmentation step follows the charts below.
Charts: Count of Annotations by Brand ID, before and after rebalancing
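For illustration, the augmentations listed above could be expressed with the albumentations library roughly as follows; the library choice, probabilities, and bounding-box format are assumptions for this sketch.

```python
import albumentations as A

# One transform per operation listed above: flip, rotation, blur,
# noise, brightness. Bounding boxes are carried through each transform.
augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),
        A.GaussianBlur(blur_limit=(3, 5), p=0.3),
        A.GaussNoise(p=0.3),
        A.RandomBrightnessContrast(p=0.5),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["brands"]),
)

def oversample(image, boxes, brands, copies):
    """Create `copies` augmented variants of one minority-class example
    (image is a numpy array, boxes are [x1, y1, x2, y2] lists)."""
    out = []
    for _ in range(copies):
        t = augment(image=image, bboxes=boxes, brands=brands)
        out.append((t["image"], t["bboxes"], t["brands"]))
    return out
```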
Model Evaluation
The core evaluation metric was mean average precision (mAP): calculate the area under the precision-recall curve for each class (average precision), then average these values across classes to create an overall measure of effectiveness.
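In code, the per-class average precision can be approximated as below; this simplified sketch assumes detections have already been matched to ground truth, and standard benchmarks use interpolated variants of the same idea.

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """Area under the precision-recall curve for a single class."""
    order = np.argsort(-np.asarray(scores))          # rank by confidence
    tp = np.asarray(is_true_positive)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)    # TP / predictions so far
    recall = cum_tp / n_ground_truth                 # TP / all ground truth
    return float(np.trapz(precision, recall))        # trapezoidal area

# mAP is then the mean of the per-class APs:
# mAP = np.mean([average_precision(*cls) for cls in per_class_results])
```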
Since our interest was only in determining brand presence, we did not mind if the predicted boxes did not align perfectly with the ground truth. We determined that mAP at a 50% IoU (intersection over union) threshold was sufficient.
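For two boxes in (x1, y1, x2, y2) form, IoU is just the overlap area divided by the combined area; at our threshold a prediction counts as correct when IoU ≥ 0.5.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

assert iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1 / 7  # overlap 1, union 7
```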
We obtained a final mAP@50 of 68%.
Precision-Recall Curve
Precision, recall, and average precision by class
Model Evaluation: Examples
Success
Failure
Confusion Matrix
The confusion matrix shows that performance remains relatively uniform across all brands, with the exception of Amstel and Budweiser. While some instances may be missed, the model seldom confuses one brand for another.
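For reference, a matrix like this can be assembled from the matched detections in a few lines; scikit-learn is shown here as one convenient option, and the label lists are made up for the example.

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

brands = ["Amstel", "Budweiser", "Corona", "Heineken", "Miller"]
# True vs. predicted brand for each detection matched at IoU >= 0.5.
y_true = ["Corona", "Heineken", "Amstel", "Corona"]
y_pred = ["Corona", "Heineken", "Budweiser", "Corona"]

cm = confusion_matrix(y_true, y_pred, labels=brands)
ConfusionMatrixDisplay(cm, display_labels=brands).plot()
```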
The model's imperfect performance therefore has little effect on a user's perception of the brands. For example, when the model flags Corona and Heineken as the worst polluters, we can be certain it has not mistaken them for “Miller” or “Amstel”.