Google just deployed a massive neural net to automatically categorize what’s in your photos. Some people reported that it was very good, even able to pick out a camouflaged snake in the background, or that it could categorize concepts like “jump”.
If true, this would be a giant leap forward for image recognition. But I was skeptical. Information about a photo often leaks in other ways, for instance through its use on social media. From other people’s reports alone it’s hard to tell how much Google is using available textual information, or EXIF data (such as the time and place the photo was taken).
So I uploaded the last year or so of my photos from Flickr.
Some of these are photos that have been tweeted, or have descriptions, and others are just random photo data with automatically generated titles. The hope was to determine what factors Google is using here, or at least get some indications. It’s a small sample set, but it was worth trying.
Plus does not categorize images instantly. As I expected, images go into a queue and are categorized some time later. In the first hour, I saw no results for unlabelled images. After 24 hours some were recognized, and after 72 hours even more.
OCR: I tried “hagioscopic”: this word appears in one of my photos, but Plus did not find it. I tried other easily legible words, and to my surprise, OCR does not seem to be done, at least not within the first 48 hours. I wonder if it was deemed a privacy issue.
Event: “xoxo”: this is the name of a conference in Portland. At first I was impressed because Plus correctly identified three of my photos taken at this conference. I thought maybe there was some kind of EXIF analysis combined with photos uploaded by other people. However, these are also the same three photos that come up in a search for “xoxo” in my photos on Flickr. This suggests that Google Plus is identifying photos already seen in a web crawl and using the text associated with them there. Searching for another such word, “banh”, confirms it.
Service: I see no evidence that Google is using words obtained from Tweets linking to the images. However it does seem to use Flickr.
Genre: “sunset”: I have a number of sunset photos. After 24 hours it only found one, but later found one more. These photos are not associated with the word “sunset” on Flickr. So this is some for-real image analysis. The other sunset photos did not include a flat horizon, or had other confounding items in the foreground. Similar results for “sky”, which identified photos with a lot of blue sky.
Location: “vancouver”, “portland”, “washington” all worked well. As far as I can tell it got this information from EXIF data. For some reason it fails to show results for “california” and “san francisco”. I wonder if the neural net has been inadvertently trained to ignore results from that area, if it gets a lot of those in its training data.
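As an aside, the EXIF GPS tags that make this kind of location matching possible are simple to decode: a coordinate is stored as degree/minute/second values plus a hemisphere reference. A minimal sketch of the conversion in Python (the function name and the sample values are my own, for illustration):

```python
# Sketch: how a latitude/longitude can be recovered from EXIF GPS tags.
# EXIF stores each coordinate as (degrees, minutes, seconds) rationals
# plus a reference letter (N/S or E/W).

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value

# Illustrative values, roughly central Seattle:
lat = dms_to_decimal(47, 37, 13.0, "N")
lon = dms_to_decimal(122, 20, 57.0, "W")
print(round(lat, 4), round(lon, 4))  # prints "47.6203 -122.3492"
```

Any service that reads these tags can place a photo on a map with no image analysis at all, which is why EXIF is the first suspect when location searches work.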
Subject: “blossom”: This should be easy as it’s associated with a color and genre of photo. It picked out two photos which are associated with the word “blossom” on Flickr, and two which are not. One of these is absolutely correct, and seems to be a result from image analysis (it’s relatively simple though, pinkish blobs on blue). The other is more confusing. It’s a white board which I thought had an interesting pattern of wet leaves on it. Presumably something about this photo - the blobbiness, the structure - triggers on “blossom”. It might be related to EXIF, as I’ve taken other “blossom” photos nearby.
Subject: “space needle”: I was a bit shocked it got this one. It’s not associated with those words on Flickr, exactly - I labelled it “Spaaaaaaaaaaaaaaaaaaaaaaaace Needle”. And searching for “needle” does not turn it up on Google Plus. So this is starting to show some real insight - a search for “needle” should not turn up pictures of the building; it might be using geolocation to help; and the Space Needle is a distinctive structure.
Subject: “leaves”: I have a whole set of fall leaves, overlapping to form a texture. Unsurprisingly Plus could not separate them from their background. However, some “leaves” photos turn up on a search for “flower”.
Subject: “dome”: results are confusing here. Dome might be the sort of thing an image analysis algorithm could pick up on, with the pattern of struts. Plus found a dome, one that has been labelled as such on Flickr, but it also found a picture of a nearby structure which is made of struts but is not a dome. And it also missed several other pictures which are more clearly of domes, and that are associated with that word on Flickr.
What Google is using: associated words seen in a web crawl from identical or similar images; geolocation data; and the much-vaunted feature recognition. The feature recognition is impressive, but not quite as creepy as has been suggested.
What Google is not using: optical character recognition, Twitter feeds.
For the moment, Google doesn’t seem to have an AI that’s truly capable of understanding photo content, at least not within 48 hours of an upload. And Google may have placed some arbitrary limits on what they do with photos (like avoiding OCR) to deal with privacy issues.
Normally I’m enthusiastic about new technology, but a truly smart AI for understanding photos creeps me out a little. Few people think about all the information they are potentially releasing with a photo, and I’m sure nobody grasps what can be done with correlations between photos.