One of the early lessons you pick up looking at product lifecycles is that some people hold out buying any new technology product or service longer than anyone else. You make it past the techies, the visionaries, the early majority, late majority and finally meet the laggards at the very right of the diagram (PDF version here). The normal way of selling at that end of the bell curve is to embed your product in something else; the person who swore they’d never buy a Microprocessor unknowingly have one inside the controls on their Microwave, or 50-100 ticking away in their car.
In 2016, Google started releasing access to its Vision API. They were routinely using their own Neural networks for several years; one typical application was taking the video footage from their Google Maps Streetview cars, and correlating house numbers from video footage onto GPS locations within each street. They even started to train their own models to pick out objects in photographs, and to be able to annotate a picture with a description of its contents – without any human interaction. They have also begun an effort to do likewise describing the story contained in hundreds of thousands of YouTube videos.
One example was to ask it to differentiate muffins and dogs:
This is does with aplomb, with usually much better than human performance. So, what’s next?
One notable time in Natural History was the explosion in the number of species on earth that occured in the Cambrian period, some 534 million years ago. This was the time when it appears life forms first developed useful eyes, which led to an arms race between predators and prey. Eyes everywhere, and brains very sensitive to signals that come that way; if something or someone looks like they’re staring at you, sirens in your conscience will be at full volume.
Once a neural network is taught (you show it 1000s of images, and tell it which contain what, then it works out a model to fit), the resulting learning can be loaded down into a small device. It usually then needs no further training or connection to a bigger computer nor cloud service. It can just sit there, and report back what it sees, when it sees it; the target of the message can be a person or a computer program anywhere else.
While Google have been doing the heavy lifting on building the learning models in the cloud, Apple have slipped in with their own CloudML data format, a sort of PDF for the resulting machine learning data formats. Then using the Graphics Processing Units on their iPhone and iPad devices to run the resulting models on the users device. They also have their ARkit libraries (as in “Augmented Reality”) to sense surfaces and boundaries live on the embedded camera – and to superimpose objects in the field of view.
With iOS 11 coming in the autumn, any handwritten notes get automatically OCR’d and indexed – and added to local search. When a document on your desk is photo’d from an angle, it can automatically flatten it to look like a hi res scan of the original – and which you can then annotate. There are probably many like features which will be in place by the time the new iPhone models arrive in September/October.
However, tip of the iceberg. When I drive out of the car park in the local shopping centre here, the barrier automatically raises given the person with the ticket issued to my car number plate has already paid. And I guess we’re going to see a Cambrian explosion as inexpensive “eyes” get embedded in everything around us in our service.
With that, one example of what Amazon are experimenting with in their “Amazon Go” shop in Seattle. Every visitor a shoplifter: https://youtu.be/NrmMk1Myrxc
Lots more to follow.
PS: as a footnote, an example drawing a ruler on a real object. This is 3 weeks after ARkit got released. Next: personalised shoe and clothes measurements, and mail order supply to size: http://www.madewitharkit.com/post/162250399073/another-ar-measurement-app-demo-this-time