By combining voice and image, artificial intelligence should allow us to solve many problems and answer increasingly complicated questions. A smartphone is, of course, required.
Rather than struggling with big fingers on a tiny keyboard, it will be enough, for example, to point at the sky and ask: “What are these clouds?” Or to aim at a bag that has caught your eye and ask: “How much does it cost?” or “Where can I buy it?” The system will analyze the image, decode what was said and carry out the search with as many contextual elements as possible.
The function is being rolled out this October in the Google Lens application. At the beginning of 2025, it will come as standard on the latest iPhones via an update (to activate it, simply press the photo button). The objective? To make searches more natural, as if you were asking a person. You could be quite vague and ask: “What is this thing for?” That is the whole point of combining voice and image: it helps the system better understand the context in which the request is made.
Another new feature: you can also ask a question about a video. Imagine you have a broken device with lots of lights flashing. Just film them and ask: “What is this fault?” Here again, the system will interpret the video, understand the question and its context, then retrieve the corresponding information from forums, blogs and websites. It is still easier to show your problem than to try to describe it. This function is also arriving in Google Lens, but you must first enable it in Google Labs.
Alas, it doesn’t work with sound. No point asking: “Why does my car make this funny noise?” That is the limit of the artificial intelligence models behind these tools: for the moment, they are trained only on images, video and text, not yet on sound. Oddly, this is not a priority for the tech giants, but it will probably become one if they want machines to better understand the environment in which they operate.