Sunday, January 17, 2016

If you think Google's/Facebook's image recognition is impressive, that's only the start.



I'm fairly appalled, almost creeped out, every time I upload a picture to Facebook and the site gives correct tag suggestions for almost everyone featured in the photo. I'm also still not used to being able to use Google as a reverse image search engine, and that has been around for quite a while. These are some pretty ground-breaking features that came about within the last 4-5 years. I would just sit there and try to imagine how a search algorithm could trace through photos and return a result. It's already hard enough for me to comprehend the search algorithms Google uses for basic text-based search. There are so many factors, like relevance to keywords, domain names, exact names, and determining which sources are worthy. But apparently, that's only scratching the surface.

computer vision from http://www.mathworks.com

A recent Wired article refers to this kind of image and facial recognition as computer vision. The identification process these search algorithms use is part of what's known as deep learning, a "breed", as Wired calls it, of artificial intelligence. Deep learning is a branch of machine learning used to model high-level abstractions in data. The article goes on to describe the 2012 ImageNet image recognition competition for computers, which a team from the University of Toronto won by introducing deep neural nets, a technology that trains on massive collections of images in order to learn to identify new ones. A neural net works out its own rules for reaching a result rather than relying on rules written by humans.

example of a neural net from http://e-lab.github.io


The article features a more recent breakthrough. A team of researchers from Microsoft has found a way to expand on that concept. They recently won the latest ImageNet competition with a new approach called the deep residual network, which is essentially their version of a complex neural net that spans 152 layers of mathematical operations. This is tremendous, since most nets use 6-7 layers. Even with that few layers, it is often difficult for programmers to get the layers to communicate with one another, because the signal tends to weaken as it passes through each layer. With 152 layers, the Microsoft researchers resolved the problem by letting the signal skip over layers that are not needed at a given point, while keeping those layers available for when they are needed later. This alone keeps the signal much stronger and lets it span far more layers than other networks.
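To make the "skipping" idea a bit more concrete, here is a rough toy sketch in Python with NumPy. It is not Microsoft's actual network (theirs uses convolutional layers and a lot more machinery); the function names, the tiny 16-number "signal", and the weight scale are all my own assumptions for illustration. The only thing it tries to show is the core trick: a residual block adds its input back onto its output, so the signal has a shortcut around layers that contribute little.

```python
import numpy as np


def relu(x):
    """Standard rectifier: negative values become zero."""
    return np.maximum(x, 0.0)


def plain_block(x, w1, w2):
    """Two ordinary layers: the signal must pass through both."""
    return relu(relu(x @ w1) @ w2)


def residual_block(x, w1, w2):
    """Same two layers, plus a shortcut that adds the input back in."""
    f = relu(x @ w1) @ w2   # what the layers themselves contribute
    return relu(x + f)      # skip connection: input bypasses the layers


def run_stack(block, x, weights):
    """Feed x through one block per (w1, w2) pair in `weights`."""
    for w1, w2 in weights:
        x = block(x, w1, w2)
    return x


if __name__ == "__main__":
    # Hypothetical setup: 152 blocks with small random weights.
    rng = np.random.default_rng(0)
    dim, depth = 16, 152
    weights = [
        (rng.normal(scale=0.01, size=(dim, dim)),
         rng.normal(scale=0.01, size=(dim, dim)))
        for _ in range(depth)
    ]
    x = rng.normal(size=(1, dim))

    # Compare how much signal is left at the far end of each stack.
    print("plain    :", np.linalg.norm(run_stack(plain_block, x, weights)))
    print("residual :", np.linalg.norm(run_stack(residual_block, x, weights)))
```

With weights this small, the plain stack should squash the signal toward nothing long before the last block, while the residual stack should still carry most of it through, because each block can fall back on simply passing its input along.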

I'm impressed by Microsoft's findings. Their research is going to affect not only the future of image recognition, but also other areas of A.I. such as speech recognition and language understanding. Beyond that, I can't even imagine where this deep residual network could lead us.








4 comments:

  1. It is really interesting that you mention Microsoft's development in image processing. I actually heard of this site (http://how-old.net) last spring that Microsoft has been using to test what I think is the same technology. It basically tries to tell you how old you are based on the picture. It is kind of fun to play with but also shows an interesting implementation of the technology you were looking into.

  2. I agree with you that Facebook's facial recognition is creepy. In general I think it is cool, and your discussion is a great review of neural networks.

  3. I have a question about this development. If they were able to continue to improve these neural networks, could this tech be used not just for image recognition, but for recognizing the world around a robot? Would this make it possible for robots to 'see' and recognize not only the objects around them but also identify people? I feel like image recognition like this could potentially lead to a robot that sees and can identify the complete world around it. Anyone have thoughts?

    Replies
    1. This makes sense if the robot sees by taking pictures or video. Can this technology work on moving images? I imagine it would work on individual frames, which might mean it would take up a lot of processing power/memory, since that would be a lot to have going on essentially at once.
