Many of us have, by now, seen a new crop of images online that look not-quite-right, or not quite believable (in the sense of slightly wrong images of famous people doing strange things); and many of us know, or have heard about, the explosion in AI imaging through programs like DALL-E, Midjourney, and an ever-increasing number of others. Some of us have friends or online friends who are producing images that have us intrigued.
I have such a friend in Jonathan Hoefler, and as a discussion of the ethics and dangers of AI ensued on one of his Facebook posts, I decided I’d better check it out for myself before arguing either for or against. (For the purposes of this article, unless noted otherwise, when I refer to “AI” I mean specifically the image-generating form of AI, not the text-generating or any other kind or use.)
I was a bit afraid of getting into it because I was worried it might “imagine” better than I do, leaving me feeling useless as an artist. I’d also heard it’s addictive, and I was worried about that too. Most of the online concern I’d encountered was centered on copyright, so I wanted to experiment and see how easy it might be to rip off another artist or photographer (which I will do in the second post of this series). I also had some ideas of my own that I wondered if it could “help” me with. And finally, I do love the really fucked-up images I’ve seen, and I wanted to make some surreal, fucked-up images too.
AI is not stealing your images
I want to explain a bit about how these programs (or whatever they are) work. Their source material is billions (trillions?) of images on the internet. Initially it relies on tagging; otherwise it has no idea what the assemblage of pixels is supposed to represent. So let’s say it assembles a few hundred thousand images tagged #horse. These are photos and illustrations and paintings and sculptures, from all angles and at all sizes. From this it gets a general idea of horseness, which is different from the general idea of dogness or humanness or carrotness. It then uses that information to start collecting untagged images that it now identifies as #horse. If you’ve ever used the face recognition in Adobe Lightroom or any other image-sorting software, you understand how at first you have to tag #Janet several times before it starts finding #Janet (and not-Janet!) for you in other photos.
BUT, contrary to many people’s belief, when you type “horse” into one of the AI programs it does not pull up one of its millions of photos of horses and serve it to you … it generates “horse” from scratch, based on its training in what “horse” is. Similarly, it has “learned” about lighting, styles, techniques, mood, etc. from the #hashtags that people use (yes, you’ve been training them all along), and it can recreate (more or less) those attributes when you ask it to: again from scratch, based on its “understanding” of them. It can also approximate very famous people who have been tagged thousands of times.
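To make that distinction concrete, here is a toy sketch of my own (it is absolutely not Midjourney’s actual architecture, and the feature vectors are invented): “training” is imagined as averaging tagged feature vectors into a concept like “horseness”, and “generation” as sampling a brand-new vector near that concept, never handing back any stored training image.

```python
import random

# Toy illustration only: each "image" is a feature vector; "training"
# averages the vectors per tag into a concept centroid ("horseness"),
# and "generation" samples a NEW vector near that centroid instead of
# returning any stored training image.

def train(tagged_images):
    """tagged_images: dict mapping tag -> list of feature vectors."""
    concepts = {}
    for tag, vectors in tagged_images.items():
        dim = len(vectors[0])
        concepts[tag] = [sum(v[i] for v in vectors) / len(vectors)
                         for i in range(dim)]
    return concepts

def generate(concepts, tag, noise=0.1, rng=None):
    """Sample a fresh vector near the learned centroid for `tag`."""
    rng = rng or random.Random()
    return [x + rng.gauss(0, noise) for x in concepts[tag]]

# Two fake "horse" images and two fake "dog" images as 3-d features.
data = {"horse": [[1.0, 0.0, 0.2], [0.8, 0.1, 0.3]],
        "dog":   [[0.0, 1.0, 0.5], [0.1, 0.9, 0.4]]}
concepts = train(data)
img = generate(concepts, "horse", rng=random.Random(0))
# `img` resembles the horse centroid but matches no stored vector.
```

The point of the sketch is the last line: the output is near the learned idea of “horse” but is not any picture it was trained on, which is why a prompt never simply retrieves someone’s photo.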
AI is not intelligent
To test it out, I chose Midjourney, because it’s the one Jonathan uses. I had read that AI has trouble with hands, because #hands is not in common use, and I had seen examples showing how the AI seems to like adding fingers. It doesn’t know how many fingers humans have, so it just puts in a bunch.
My very first prompt was “Hands with carrot-fingers, holding a small white rabbit, moody dark, forest background”. It then generates 4 options; you can choose one or more to upscale, whereafter it adds detail and makes it larger. You can also create more variations based on one of the images, or create 4 more variations on the same prompt.
I was a bit puzzled. Where are my carrot-fingers? I spun again: I got 4 versions with no carrots (though the ears were starting to look a little carrotty), but more fingers and different positions for the rabbit. Again: more carrots, but none of them fingers. I could generate this many times, and each iteration would be slightly different, but none of them closer to what I wanted. I could add and subtract parameters to make the image more or less realistic, with different styles or lighting etc., but I might never get carrots for fingers.
So, this brings me to my second, and probably most important point. AI is not intelligent. NONE OF IT IS. AI should more accurately be called Massive Data Training, or something like that. It’s a system trained to recognize objects, styles, techniques, and even “concepts” to a very limited degree, but it doesn’t understand those things, or how they relate to each other in the real world. It’s a little bit smarter than a dog. You can easily train a dog to recognize the word “ball” and be able to apply that word to many kinds of “balls.” With effort you could train a dog to recognize the difference between the striped ball and the red ball in your house, but it would be unlikely to recognize the difference between all striped balls and plain balls; furthermore, a dog will never understand that “stripes” are something that can appear on a shirt, or a wall, or that there is any relationship whatsoever between a striped shirt and a striped ball. AI is similar to that, but with a much, much larger “understood” data set.
“Striped ball on box in room.”
Here you can clearly see that it knows “ball”, “stripe[d]”, and “room”, as well as “in”, but it has some trouble with “on”. Where to put the stripes, the box, or the ball is beyond it: it’s just applying them everywhere, in different combinations.
I’m friendly with Rodney Brooks, who was, for 10 years, the director of the MIT Artificial Intelligence Laboratory and then of the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL). Not many people know as much about AI as he does, and I remembered him saying that a small child can outperform AI in understanding and intelligence. So I decided to do a little test. Imagine this: “A rabbit wearing red shoes, holding hands with a carrot wearing black shoes.” Got it? I then asked neighbours with children to get them to draw it.
The kids nailed it: they even got the red shoes on the rabbit and the black shoes on the carrot. They also intuited that holding hands is something nice that people do with friends: all of them are happy. Here’s how Midjourney did with the exact same phrase:
It’s an idiot.
AI is getting better at generating things realistically and in different styles; and soon it will put only 5 fingers on each human hand, and stop making the little weirdnesses and glitches—but by Rodney Brooks’ account, and by others I’ve spoken to who know a lot more about this than I do, it will not come nearer to “understanding”.
So what is it good for?
At the moment, AI is super good at making surprising combinations. Jonathan describes “fighting with it” and then resigning himself to giving in to what it comes up with. Whatever he’s doing (and I have some ideas), the results have been fantastic.
For myself, after some experiments for this post, I started to encourage and embrace Midjourney’s ability to blow my mind. Instead of coming up with an idea of my own, I give it enough rope to hopefully hang itself. And it is totally addictive. To me it’s like playing slots: you put some stuff in, pull a lever and hope. Sometimes you’re rewarded and sometimes you’re disappointed, but I find it very, very hard not to make “just one more.”
But are these mine?
I am reasonably convinced that these images are unique in all the world: each time I run the same prompt I get something different; when I upscale an image it adds more random details (which sometimes I don’t like); and if I upscale the same image again, it will add different small details. If you used the same prompts I do, you’d eventually get similar results, but not identical ones.
I feel protective of these images in the same way I would if I had found something, and I’m reluctant to reveal the coordinates of where I found it (i.e. my prompts). This is how I would feel if I were a collector of, say, bottlecaps (or anything): I’d be very proud of my ownership of a certain special bottlecap, and reluctant to tell another bottlecap collector where I found it.
I also think this has some similarities to photography—particularly of scenery. Tourists can line up all day and take the same picture from the same location and the photos will be similar, but not identical. Some people, with knowledge and skill or the luck to find the right conditions, will take remarkably better photos of the same scene than others will. But that scene will always be there waiting to be “found”, if you know the location.
So I feel the same way about these images as I do about most of my photos. They’re mine, I like or even love them, but I take no particular pride in having made them—because I don’t feel I did make them. I found them: I held up the camera and pressed a button; I fed something into a machine and won a jackpot.
Garbage in, garbage out
Given that most people are idiots with poor taste, stuffed to the nuts with Marvel comics and fantasy TV, drunk on porn* and animé, it should come as no surprise that the vast majority of AI-generated material reflects these interests of the general populace. All you need to do is look at the Midjourney showcase, see these Midjourney prompt examples, or just Google “Midjourney images,” to see what I mean.
(*Re: “porn”: Midjourney has a large list of banned words to prevent the making of pornographic images. This doesn’t prevent the stereotypical renditions of “sexy” women with big tits etc., but it does prevent the otherwise inevitable tsunami of sex acts.)
Airy castles, princesses, warriors, kings, swords, futuristic cities, roided-up heroes and busty heroines, centaurs, pegasi, fairies, dragonflies … they’re all there in great abundance, piled fantasy-mountain high. This general aesthetic is so prevalent it’s actually difficult to get away from, and certain words are polluted beyond repair. If you want to avoid the fantasy look, you have to avoid some of these words. One of them is “hair”:
Nowhere in my prompt did I include woman, face, or anything relating to humans, but the word “hair” triggered the fantasy bias. Look what happened when I included the word “iron” in my prompt (the actual entire prompt was “iron edelweiss”):
Then I experimented with just the word “King” for a prompt:
Midjourney also has a propensity for ornament. Given my aesthetic history you might think this wouldn’t bother me, but I like my ornament thought out and controlled. I have often inveighed against the mindless regurgitation of ornamental splorp, and Midjourney will barf it up, again without provocation, often in the “upscale” stage of the process, thrown in as “detail.”
I have to assume that these AI programs are also learning from themselves—or rather from the people who use them—in which case this fantasy problem is only going to get worse as the algorithms get polluted with more and more of the same.
Furthermore, as “mistakes” get trained out of them, there’s a good chance that genuine surprises will be rarer. It won’t get smarter, it’ll get dumber and more predictable. That’s just my gut feeling, but who knows, really?
I’m still not sure what, if anything, I’m going to do with these. I have ideas, but as with all of my ideas, I’m not sure which are worth following. Images like the one above I’m tempted to just print and frame, because I really, really like it. Maybe that’s enough.
In my next post about imaging AI I’ll look at the controversies surrounding it in the illustration/design/photography industries, and issues of copyright and ownership.
A friend and I were trying to get less biased results from Midjourney and it really resists efforts to avoid specific, culturally-hot stylizations. "Woman with dark hair and eyes" produces an army of clone girls (clearly between 10 and 18 years of age) with the same shape mouth, face, nose, light dusting of freckles, and … LIGHT EYES. The darkest eyes are medium caramel hazel. Similar attempts to prompt a male face (no race specified) with dark eyes give you unlimited portraits of Timothee Chalamet from different angles and differing amounts of tanner applied. No variants ever stray from this weird idealized facial standard. Apparently the training the AI gets is subject to an enormous amount of bias, whether conscious or not.
Saw this article in PRINT magazine and came here to leave you a comment.
It's great to see an artist's take on MidJourney, and to read your conclusions about the strengths and weaknesses.
I've been exploring and fighting with MidJourney and, like your friend, giving in or giving up pretty regularly. But it can conjure some terrific stuff.
The bias is very pronounced. One day I decided to create images of gray hair - long, beautiful gray hair. After a few iterations, I thought I should add some ethnic diversity but no matter what prompts I added (except for African American) I got the same young, pert Caucasian faces. Hmmm.
Another weirdness: I was creating images of vegetables and fruit and I added "glamorous lighting" because I wanted very attractive food. Instead, I got images of a young, pert Caucasian girl. Hmmm.
Last weirdness: I was creating Valentine's Day images - plump Valentine's Day hearts - and MidJourney also included an image of a Corvette with flames coming out of the sides. ??? Hmmm.
Keep reporting your explorations, findings and conclusions.