December 31, 2018 at 01:05PM
AI image processing for computer vision, facial recognition, image generation, and other applications saw marked improvements in 2018 (Andrew Tarantola/Engadget)
Computer scientists have spent more than two decades teaching, training and developing machines to see the world around them. Only recently have the artificial eyes begun to match (and occasionally exceed) their biological predecessors. 2018 has seen marked improvement in two areas of AI image processing: facial-recognition technology in both commerce and security, and image generation in — of all fields — art.
In September of this year, a team of researchers from Google’s DeepMind division published a paper outlining the operation of their newest Generative Adversarial Network. Dubbed BigGAN, this image-generation engine leverages Google’s massive cloud computing power to create extremely realistic images. But, even better, the system can be leveraged to generate dreamlike, almost nightmarish, visual mashups of objects, symbols and virtually anything else you train the system with. Google has already released the source code into the wilds of the internet and is allowing creators from anywhere in the world to borrow its processing capabilities to use the system as they wish.
“I’ve been really excited by all of the interactive web demos that people have started to turn these algorithms into,” Janelle Shane, who is a research scientist in optics by day and a neural-network programmer by night, told Engadget. She points out that in the past, researchers would typically publish their findings and call it a day. You’d be lucky to find even a YouTube video on the subject.
“But now,” she continued, “they will publish their model, they’ll publish their code and what’s even greater for the general creative world is that they will publish a kind of web application where you can try out their model for yourself.”
This is exactly what Joel Simon, developer of GANbreeder, has done. His web app lets users generate and remix BigGAN images over multiple generations to produce one-of-a-kind results. “With Simon’s web interface, you can look at what happens when you’re not generating pictures of just symbols, for example,” Shane points out. “But you’re generating something that’s a cross between a symbol and a comic book and a shark, for example.”
By providing access to these systems for people who might not otherwise know how to develop, program, train and operate complex neural networks, front-end applications like GANbreeder help the technology spread more widely and more quickly. “You can interact with these algorithms, get new artistic results out of them,” Shane argued. “Also kind of see where their limitations are, where they do well, where they fall flat.” That feedback, in turn, helps the technology mature faster.
“There’s a lot of room for playing in there,” Shane said. “But what Joel’s done is made it accessible to people who know nothing about programming, who definitely don’t have the computing power access to be able to train a model like this for themselves.”
In the coming years, these GAN-based systems will find far more uses beyond making striking artistic images. Shane points to a wide variety of commercial applications — from music video production to digital texture generation — for BigGAN and its ilk. “There are lots of creative artistic endeavors that we take on,” she argues, “where we have this as a tool to give us a shortcut to making really nice textures, or as a kind of creative jumpstart, or just allowing us to get a different sort of effect that we haven’t been able to achieve before.”
“Some of these textures, images of things that are just coming out of this utilitarian BigGAN are so compelling, so texturally rich, I would love to see more artists be able to play with that,” Shane said. “I would love to see there be a movie using this GAN Punk aesthetic (this sort of aesthetic that you get out of these mistakes and problems that these GAN algorithms have… melty clocks with way too many hands and illegible numbers), that would be very, very cool.”
This technology is not without its drawbacks, mind you. There is a very real possibility that scammers and trolls might use BigGAN to create counterfeit images and video, so-called Deep Fakes. “As we’re developing these image generating algorithms,” Shane warns, “we’ve got to also develop our capability to spot when an image has been generated, as opposed to an authentic picture that’s been taken.”
Digital artist Alex Reben, whose latest work leverages the BigGAN engine to generate artistic images, concurs. “In my own artistic practice, I am looking forward to some of the tools maturing and becoming more accessible,” he told Engadget. “While at the same time, I think AI will start becoming more apparent in people’s everyday lives, so discussions about implications of such technology are more important now than ever.”
In many ways, AI and machine learning are already becoming apparent in our day-to-day lives, especially in commerce. Specifically, retailers are turning to facial recognition systems to help them better target, market and sell their products to an increasingly harried and distracted shopping public.
“I see the recent advances such as deep learning technology for vision as one of the most profound technology leaps I’ve ever seen or come across,” Joe Jensen, Intel’s vice president for its internet of things group and general manager of its retail solutions division, told Engadget.
“We’ve got a partner in China that’s developed a vending machine that is just a glass door refrigerator and there’s a camera on the front,” he continued. The camera not only recognizes the shopper but also tracks the items that they remove from the case and bills their account accordingly. It’s essentially a miniature Amazon Go.
“It feels completely seamless from a customer perspective: You walk up, open the door, take what you want,” Jensen explained. “You can look at things and put them back, whatever looks good you take, then close the door and just walk away.” He points out that the machine costs barely half of what conventional vending machines do yet reportedly sells 40 percent more product than its traditional counterparts because of its ease of use.
Moving forward, Intel hopes to use similar, albeit anonymized, facial-recognition systems to expand this sense of seamlessness to other retail shopping situations. “We should be able to anonymously determine a few things about the shopper. What gender are they? How old are they?” Jensen queried. “With their [observed] size, what do we have in stock right now that we think would be interesting to a shopper like that?”
Associating biometric data with specific accounts isn’t nearly as important as using that data to understand the shopper’s mood and intentions — their “shopping mode” — Jensen argued. He points out that his behavior when shopping with his family (listlessly browsing through various racks of merchandise in an effort to kill time) is very different than when he is shopping for a specific item that he knows he’ll purchase (entering the store through the doors nearest the relevant department, walking directly to appropriate racks, and actively looking for items that match his size and style preferences).
Recognizing either of these shopping modes doesn’t require knowing who he is specifically in order to extract useful marketing information. “Knowing it’s Joe isn’t what’s really relevant,” he said. But understanding the shopper’s intention based on their actions and body language could prove invaluable.
Jensen also points out that just a decade ago, this sort of system would have been impossible to deploy. “Trying to recognize a person or how many persons walked by [a security camera], that was a really difficult computer vision challenge 10 years ago,” he argued, but those sorts of capabilities are “almost freeware today.”
This rapid spread and normalization of advanced computer vision technologies is already having an impact on how we shop and how retailers market their wares. Jensen noted that in May of this year, Walmart quietly began rolling out a fleet of stock-monitoring robots in more than four dozen of its stores nationwide. These autonomous machines cruise the store’s aisles, scanning shelves as they pass. Should the drones spot an empty shelf, they alert human employees, who can quickly restock the missing items. Target is currently testing a similar shelf-scanning system in its stores as well.
“I think, as a retailer, the fundamentals of retailing haven’t really changed,” Jensen figured. “You want to delight your customers, to have products that they want, available to them where they are. I think what we’re going to see is AI technologies are going to enable retailers to do the fundamentals of retailing better.”
These sorts of advancements are only the tip of the AI iceberg. Even more capable machine-vision systems are already in development thanks to foundational research currently being done by IBM and its partners.
For example, one of the biggest obstacles in creating new AI systems, especially those dealing with visual media, is the need for massive training data sets. However, in November, a team of IBM researchers published their research into a new technique dubbed Delta-encoding.
This methodology allows AI systems to train for “few-shot” object recognition. “Essentially what it’s trying to do is to learn and model the sample space around our labeled items,” Dr. John Smith, manager of AI Tech for IBM Research AI at the Watson Research Center, told Engadget.
So, say we have a labeled picture of a cat. Rather than feed the system hundreds or thousands more labeled pictures of cats, the Delta-encoder measures the “distances around all the points vested in that category, all the different variants of ‘cat’,” Smith explained. “As opposed to the representation of the cats themselves.”
Once the system learns the Delta model for cats, researchers can introduce an unknown image, say of a hippopotamus, and the AI will “synthetically generate new samples around the new ones that are given, which artificially create the training data for what we want it to learn,” Smith said. While this capability is still in early development, it could eventually help researchers and developers build and train more robust AI systems far more quickly than they can today.
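For readers who want a feel for the idea Smith describes, here is a deliberately simplified sketch in Python. It is not IBM's Delta-encoder, which is a trained neural network; it only illustrates the general notion of capturing the variation within a well-covered category and reapplying it around a single example of a new one. The function names, the two-dimensional "feature vectors" and the cat and hippopotamus values are all made up for illustration.

```python
import random


def learn_deltas(samples):
    """Collect the offsets between every ordered pair of same-class vectors.

    Assumption: within-class variation is modeled as plain vector
    differences, a stand-in for the learned encoder in the real system.
    """
    deltas = []
    for i, a in enumerate(samples):
        for j, b in enumerate(samples):
            if i != j:
                deltas.append([bj - aj for aj, bj in zip(a, b)])
    return deltas


def synthesize(example, deltas, count, rng):
    """Create synthetic samples by adding randomly chosen deltas to one example."""
    chosen = [rng.choice(deltas) for _ in range(count)]
    return [[x + d for x, d in zip(example, delta)] for delta in chosen]


# Hypothetical feature vectors for a category with plenty of labeled data.
cat_features = [[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]]
# A single labeled example of a previously unseen category.
hippo_example = [10.0, -4.0]

cat_deltas = learn_deltas(cat_features)
synthetic_hippos = synthesize(hippo_example, cat_deltas, count=5, rng=random.Random(0))
```

The synthetic points cluster around the lone hippopotamus example with the same spread seen among the cats, which is the sense in which the extra training data is "artificially" created. Whether variation learned from one category transfers sensibly to another is an assumption of this toy version, not a claim about the published method.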
But moving fast and rapidly designing AI won’t be worth much if researchers and developers don’t come to terms with existing issues such as the inherent bias within training data sets. To that end, IBM released in 2018 a pair of image data sets designed specifically to reduce the bias of systems trained on them: one, a million-picture-plus set built to help researchers combat bias in facial recognition; the other, a 36,000-image set with models “equally distributed across skin tones, genders, and ages.” Whether the company plans to leverage these data sets in its collaboration with the NYPD, which is reportedly developing an AI-backed facial recognition technology that would allow officials to scan security camera footage for suspects based on skin and hair color, remains unclear.
Art and commerce are just two areas within a galaxy of AI advancements that have taken place in 2018. Artificial intelligence and machine vision are revolutionizing the fields of medicine, transportation, manufacturing, design, science, health care, and law enforcement. This technology is no longer in the realm of science fiction; it’s already an integral part of the fabric of modern life. So the next time you casually flip off a security camera at the mall, you can be sure that the computer system monitoring it recognizes that gesture and has probably taken offense.