OpenAI’s DALL-E creates plausible images of literally anything you ask it to – TechCrunch

DALL-E is OpenAI’s latest strange yet fascinating creation, which by way of hasty summary might be called “GPT-3 for images.” It creates illustrations, photos, renders or whatever method you prefer, of anything you can intelligibly describe, from “a cat wearing a bow tie” to “a daikon radish in a tutu walking a dog.” But don’t write the obituaries of stock photography and illustration just yet.

As usual, OpenAI’s description of its invention is quite readable and not overly technical. But it needs a bit of context.

What researchers built with GPT-3 was an AI that, given a prompt, would try to produce a plausible version of what it describes. So if you say, “A story about a child who finds a witch in the woods,” it will try to write one – and if you press the button again, it will write it again, differently. And again, and again, and again.

Some of these attempts will be better than others; in fact, some will be barely coherent while others may be almost indistinguishable from something written by a human. But it does not produce garbage or serious grammatical errors, which makes it suitable for a variety of tasks, as startups and researchers are now exploring.

DALL-E (a combination of Dali and WALL-E) takes this concept one step further. Turning text into images has been done by AI agents for years, with varying but steadily increasing success. In this case the agent uses the language understanding and context provided by GPT-3 and its underlying structure to create a plausible image that matches a prompt.

As OpenAI puts it:

GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks. Image GPT showed that the same type of neural network can also be used to generate images with high fidelity. We extend these findings to show that manipulating visual concepts through language is now within reach.

What they mean is that an image generator of this type can be manipulated naturally, simply by telling it what to do. Sure, you could dig into its guts and find the token that represents color, and decode its pathways so that you can activate and change them, the way you might stimulate the neurons of a real brain. But you wouldn’t do that when asking your staff illustrator to make something blue instead of green. You just say “a blue car” instead of “a green car” and they get it.

So it is with DALL-E, which understands these prompts and rarely fails in any serious way, although it must be said that even when looking at the best of a hundred or a thousand attempts, many of the images it generates are more than a little… off. Of which later.

In the OpenAI post, the researchers give abundant interactive examples of how the system can be told to make minor variations of the same idea, and the results are plausible and often quite good. The truth is that these systems can be very fragile, as the researchers admit DALL-E is in some ways: saying “a green leather purse shaped like a pentagon” may produce what you expect, but “a blue suede purse shaped like a pentagon” may produce nightmare fuel. Why? Given the black-box nature of these systems, it is difficult to say.

Image courtesy: OpenAI

But DALL-E is remarkably robust to such changes, and reliably produces pretty much whatever you ask for. A squirt of guacamole, a sphere of zebra; a large blue block sitting on a small red block; a front view of a happy capybara, an isometric view of a sad capybara; and so on and so forth. You can play with all the examples in the post.

It also exhibited some unintended but useful behaviors, using intuitive logic to interpret requests such as asking it to make multiple sketches of the same (non-existent) cat, with the original at the top and the sketch at the bottom. There is no special coding here: “We did not anticipate that this capability would emerge, and made no modifications to the neural network or training procedure to encourage it.” This is fine.

Interestingly, another new system from OpenAI, CLIP, was used in conjunction with DALL-E to understand and rank the images in question, although it is slightly more technical and harder to understand. You can read about CLIP here.

The implications of this capability are many and varied, so much so that I will not try to go into them here. Even OpenAI punts:

In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer term ethical challenges implied by this technology.

Right now, like GPT-3, this technology is amazing and yet difficult to make clear predictions about.

In particular, very little of what it produces actually seems “final” – that is to say, I couldn’t ask it to create a lead image for anything I’ve written recently and expect it to put out something I could use without modification. Even a brief inspection reveals all kinds of AI weirdness (the specialty of Janelle Shane), and while these rough edges will certainly be buffed off over time, it is far from safe, in the same way that GPT-3 text can’t just be sent out unedited in place of human writing.

It helps to generate many images and pick the top few, as the following collection shows:

AI-generated images of radishes walking dogs.

The top eight out of a total of X generated, with X increasing to the right. Image courtesy: OpenAI
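
For the technically curious, the generate-many-and-keep-the-best idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenAI’s code: `generate_candidates` and `toy_score` are hypothetical stand-ins for an image generator such as DALL-E and a ranking model such as CLIP, replaced here by random numbers so the example runs on its own.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def top_k(candidates: List[T], score: Callable[[T], float], k: int) -> List[T]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=score, reverse=True)[:k]


def generate_candidates(prompt: str, n: int, rng: random.Random) -> List[dict]:
    # Hypothetical stand-in for an image generator: each "image" is just a
    # record carrying a random quality value between 0 and 1.
    return [{"prompt": prompt, "id": i, "quality": rng.random()} for i in range(n)]


def toy_score(candidate: dict) -> float:
    # Hypothetical stand-in for a ranking model that rates how well an
    # image matches its prompt.
    return candidate["quality"]


def best_of_n(prompt: str, n: int, k: int = 8, seed: int = 0) -> List[dict]:
    """Generate n candidates for the prompt and keep the k best."""
    rng = random.Random(seed)
    return top_k(generate_candidates(prompt, n, rng), toy_score, k)


if __name__ == "__main__":
    prompt = "a daikon radish in a tutu walking a dog"
    for n in (8, 64, 512):
        kept = best_of_n(prompt, n)
        print(n, round(toy_score(kept[-1]), 3))  # weakest of the kept eight
```

In this toy setup, each larger pool contains the smaller one (the seed is the same), so the weakest of the eight kept candidates can only hold steady or improve as the pool grows. Real generators and rankers are noisier, but the same logic explains why picking from more attempts tends to look better.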

None of this detracts from OpenAI’s achievement. It is fabulously interesting and powerful work, and like the company’s other projects it will no doubt develop into something even more spectacular and interesting before long.