Data Preparation
The chart above shows the steps taken to create the training and evaluation data for our model.
We began by training a model to detect logos in general (no brand classification) with the OpenLogo & Logo3kdet datasets.
We applied the resulting model to OpenLitterMap images to obtain cutouts of potential logos.
We then grouped the logo cutouts using hierarchical clustering, named the clusters by brand, and corrected mistakes.
We sought out additional examples of underrepresented brands using the Google Vision API (logo & text recognition) and tags provided by OpenLitterMap users.
We then cut classes with fewer than 20 examples, split the data into train, validation, and test sets, and further rebalanced class representation through augmentation.
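As a concrete illustration, the filtering and splitting step might look like the sketch below; the (image_path, brand) annotation format and helper names are assumptions for this example, not our actual code.

```python
import random
from collections import Counter

MIN_EXAMPLES = 20  # classes below this threshold were dropped

def filter_and_split(annotations, val_frac=0.1, test_frac=0.1, seed=42):
    """Drop rare brands, then split each brand separately so that every
    remaining class appears in the train, val and test sets."""
    counts = Counter(brand for _, brand in annotations)
    kept = [a for a in annotations if counts[a[1]] >= MIN_EXAMPLES]

    by_brand = {}
    for ann in kept:
        by_brand.setdefault(ann[1], []).append(ann)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for items in by_brand.values():
        rng.shuffle(items)
        n_val, n_test = int(len(items) * val_frac), int(len(items) * test_frac)
        val += items[:n_val]
        test += items[n_val:n_val + n_test]
        train += items[n_val + n_test:]
    return train, val, test
```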
Clustering & Class Rebalancing
The image on the right shows the dendrogram produced by our clustering process. Hierarchical clustering works by initializing each example in its own cluster and then iteratively merging the closest clusters until a distance limit is met. A core benefit of this approach is that you do not need to pre-determine the number of clusters, as you do with techniques like K-means.
We started by converting each image to a vector representation using pre-trained embedding models (imgbeddings & VGG16). We could then begin the clustering process. We elected to use cosine distance as our distance metric; it ignores the magnitude of each vector and compares only the direction, i.e. the angle between embeddings.
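A minimal sketch of this pipeline, assuming Keras's VGG16 as the feature extractor and SciPy's agglomerative clustering; the file names and the 0.4 distance cutoff are illustrative, not values from our pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    """Map one logo cutout to a 512-d VGG16 feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
    return model.predict(x, verbose=0)[0]

cutouts = ["cutout_001.jpg", "cutout_002.jpg"]  # hypothetical paths
X = np.stack([embed(p) for p in cutouts])

# Agglomerative clustering on cosine distance,
# d(u, v) = 1 - (u . v) / (||u|| ||v||),
# so only a distance cutoff is needed, not a cluster count.
Z = linkage(X, method="average", metric="cosine")
labels = fcluster(Z, t=0.4, criterion="distance")  # 0.4 is an assumed cutoff
# scipy.cluster.hierarchy.dendrogram(Z) reproduces a plot like the one shown.
```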
Here we can see the impact of our class rebalancing. The dataset is naturally imbalanced, which could cause our model to overly focus on the majority classes. We added additional training examples and applied augmentation (flip, rotation, blur, noise, brightness) to increase the representation of minority classes; a sketch of the augmentation step follows the charts below.
Charts: Count of Annotations by Brand ID, before and after rebalancing
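For illustration, the augmentations listed above could be expressed with the albumentations library roughly as follows; the library choice, probabilities, and bounding-box format are assumptions for this sketch.

```python
import albumentations as A

# One transform per operation listed above: flip, rotation, blur,
# noise, brightness. Bounding boxes are carried through each transform.
augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),
        A.GaussianBlur(blur_limit=(3, 5), p=0.3),
        A.GaussNoise(p=0.3),
        A.RandomBrightnessContrast(p=0.5),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["brands"]),
)

def oversample(image, boxes, brands, copies):
    """Create `copies` augmented variants of one minority-class example
    (image is a numpy array, boxes are [x1, y1, x2, y2] lists)."""
    out = []
    for _ in range(copies):
        t = augment(image=image, bboxes=boxes, brands=brands)
        out.append((t["image"], t["bboxes"], t["brands"]))
    return out
```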
Model Evaluation
The core evaluation metric was mean average precision (mAP): calculate the area under the precision-recall curve for each class (average precision), then average these values across classes to create an overall measure of effectiveness.
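In code, the per-class average precision can be approximated as below; this simplified sketch assumes detections have already been matched to ground truth, and standard benchmarks use interpolated variants of the same idea.

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """Area under the precision-recall curve for a single class."""
    order = np.argsort(-np.asarray(scores))          # rank by confidence
    tp = np.asarray(is_true_positive)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)    # TP / predictions so far
    recall = cum_tp / n_ground_truth                 # TP / all ground truth
    return float(np.trapz(precision, recall))        # trapezoidal area

# mAP is then the mean of the per-class APs:
# mAP = np.mean([average_precision(*cls) for cls in per_class_results])
```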
Since our interest was only in determining brand presence, we did not mind if the predicted boxes did not align perfectly with the ground truth. We determined that mAP at a 50% IoU (intersection over union) threshold was sufficient.
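For two boxes in (x1, y1, x2, y2) form, IoU is just the overlap area divided by the combined area; at our threshold a prediction counts as correct when IoU ≥ 0.5.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

assert iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1 / 7  # overlap 1, union 7
```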
We obtained a final mAP@50 of 68%.
Precision-Recall Curve
Precision, recall, and average precision by class
Model Evaluation: Examples
Success
Failure
Confusion Matrix
The confusion matrix shows that performance remains relatively uniform across all brands, with the exception of Amstel and Budweiser. While some instances may be missed, the model seldom confuses one brand for another.
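For reference, a matrix like this can be assembled from the matched detections in a few lines; scikit-learn is shown here as one convenient option, and the label lists are made up for the example.

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

brands = ["Amstel", "Budweiser", "Corona", "Heineken", "Miller"]
# True vs. predicted brand for each detection matched at IoU >= 0.5.
y_true = ["Corona", "Heineken", "Amstel", "Corona"]
y_pred = ["Corona", "Heineken", "Budweiser", "Corona"]

cm = confusion_matrix(y_true, y_pred, labels=brands)
ConfusionMatrixDisplay(cm, display_labels=brands).plot()
```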
The model's imperfect performance therefore has little effect on a user's perception of the brands. For example, when the model flags Corona and Heineken as the worst polluters, we can be certain it has not mistaken them for “Miller” or “Amstel”.