As I
discussed in the previous part, a key philosophical question for the future of
AI image generation and art is the question of representation. Will future
iterations of MidJourney, DALL-E, Stable Diffusion, etc., just become ever more
proficient at translating complex prompts into accurate visual representations,
with a consequent loss of the wildness and "ghost in the machine"
type interventions, which for me are its main selling point? Or will the
technology retain, or even begin to coalesce around, its own artistic
tendencies, ending up less of a tool and more of a genuine collaborator?
One artist
who appreciated more than most the impact of new technology on artistic practice
was Francis Bacon. In interviews he used the expression "deepening the
game" to describe the approach painters needed to take in response to the
advent of photography and film, which had during his lifetime thoroughly
challenged the role of painting in the visual arts. After all, if photography
was now the dominant medium for representing reality, what could traditional
painting, and especially figurative painting, offer that went beyond mere
illustration? Bacon's answer to this challenge was to develop a style and
technique of painting that not only produced works of unsettling power, but
which also assimilated photography into their mode of production. He made no
secret of the fact that photographs - of his friends and lovers, from
newspapers and magazines, and even reproductions of old master paintings such as Velázquez's
portrait of Pope Innocent X - frequently formed the raw material for his own
images. While other artists of the time abandoned the figure and moved into
abstraction, Bacon took on photography and in a sense made the camera into a
collaborator.
Artificial
intelligence poses yet another challenge to artistic production, on perhaps an
even greater scale than photography. It not only takes aim at the skills of the
digital artist or graphic designer, but puts into question the very notion of
artistic authorship. As such, I've spent a fair amount of my GPU time on
MidJourney attempting to apply some "Baconian strategies" to how I
use it and the types of images I've tried to make. Some of this has consisted of
pictures produced in Bacon's style with a contemporary twist, while at other
times I've attempted to develop a more general approach to image generation in which
Bacon's concepts and insights on art are applied.
I should
say from the outset that each version of
MidJourney has a different take on Bacon's style which reflects the differing
levels of complexity in the coding and the way the developers have trained it.
Generally speaking, versions 3 and 4 will give you more abstraction, dissonance
and incoherence, whereas the different iterations of version 5 and the recently
released version 6 will offer more realism, better figurative representation
and more straightforward expressions of compositional coherence. As such,
producing Bacon-esque
images is not as straightforward as asking for so-and-so "in the style of
Francis Bacon". Indeed, as you will see, Bacon's paintings themselves must often
be brought into the prompts to provide guidance and set examples for the AI to
follow, and it also helps to move between versions and use counterintuitive phrases
to elicit more interesting results. Take for example the two images below
produced by version 6, both of which use the prompt "Francis Bacon painting of a dog sitting peacefully in front of a large
open window", with the addition that the one on the right also
includes a jpg file of Bacon's Dog
(1952) as part of the prompt.
Neither of these images really captures
much of Bacon's style. The most one could say is that some of the
"brushstrokes" for the surrounding rooms are vaguely reminiscent of
how Bacon would situate his figures perhaps in the 1940s and 50s. I also like
the way the flooring from Bacon's painting has been remixed into the picture on
the right. But the figure of the dog itself is just too well rendered, too
illustrative, and just, well, a bit dull. Switching down to V4 however gives us
a very different result. The picture on the left uses the exact same prompt
(including the image of Bacon's Dog) as the picture above on the right, but
gives us an entirely different style which is to my mind much more characteristic
of Bacon's compositions and stark framing of the figure. The dog is perhaps still
too much of a direct representation but the whole thing is more obviously modernist
in execution. The translation of the window, however, has entirely lost its
realism, and is now only hinted at with the arrangement of line and colour in
the upper half of the picture.
Finally the image on the right adds a
longer description in near-natural language: "Francis Bacon painting of a dog sitting peacefully in front of a large
open window, twisted and distorted anatomy. Mirrored reflections of 1940s
interior room in the background. Geometric lines frame the main figure. Yellow,
black and white colour palette with streaks of red and blue". Now
we're getting somewhere. The dog has been messed up a good deal with odd
proportions and the surreal inclusion of a lizard-like tail. The requested
colours are present, though the background and interesting use of forced
perspective lines are perhaps a little busy. But overall I'd say this is a
fairly good attempt at a Bacon-like dog, although the extra data in the
description - such as the mirrored reflections and 1940s interior - has not
directly translated into the image.
This example demonstrates one of the
key attractions of MidJourney, which is its penchant for chance and
indeterminacy when interpreting the data fed into it. The most convincing of
the prompts above includes a natural language element, with subject-predicate
and statement parts, plus an actual image approximating the subject and style
desired. Nevertheless, the output image is something quite different, neither
exactly what we asked for, nor a complete departure. Getting to that point took
me six prompts which each produced a grid of four images, many of which were
nowhere near the desired outcome. It's this element of chance and accident
that brings MidJourney into true Bacon territory. Don't believe me? Take it
from the man himself. The following is an exchange between Bacon and the art
critic David Sylvester in an interview in 1966.
FB: I want a very ordered image, but I want it to come
about by chance.
DS: But you're sufficiently puritanical not to want to
make the chance come too easily.
FB: I would like things to come easily, but you can't
order chance. Because if you could, you would only be imposing another type of
illustration.
DS: Are you aware of the moment when you find you are
becoming free and the thing is taking you over?
FB: Well, very often the involuntary marks are much more
deeply suggestive than others, and those are the moments when you feel that
anything can happen.
Those of
you conversant with the history of the surrealist movement will recognise the
ethos at work here. Chance encounters, the unconscious, or the marvellous
accident, were all themes and techniques developed in various ways by the
surrealists (and before them in Dada) as a means to get behind - or beyond -
the usual modes of representation in Western art. Bacon's courting of chance
and accident in his work in part continues this tradition, but it also formed a
key element of his personality, which came out in his often tempestuous
relationships and predilection for gambling. But as we can see in the quote
above he also wants to remain in overall control and desires "a very
ordered image", not an abstraction produced by chance encounters between
paint and canvas. Similar tensions are observable in his private life, which
despite his reputation for sexual and alcoholic excess was similarly marked by
a tendency towards control. I argued in an essay in 2018 that rather than being
a simple facet of his character, this constant tension between chance and order
constitutes Bacon's Form-of-Life: the wellspring, if you like, out of which
came his remarkable paintings.
Making a
picture in MidJourney without an obvious subject can lead in odd directions.
One way I like to start is by using the blend function in which up to five
images can be fed into the AI which will then attempt to produce a single image
which melds the aesthetic qualities of those in the prompt. It does this
without any additional word prompting, instead relying entirely on its own
internal logic. How exactly it does this, which elements it focuses on and
which it ignores, is more than a little mysterious, and again like so many
things with MidJourney, the results tend to vary between versions. One thing
I've noticed consistently is that it will latch on to any anatomical shapes it
can find and centre the image around them. If there is no central figure in the
images fed into it, it will sometimes create one by anthropomorphising
structures or textural elements. This can lead to a lot of strange results
where limbs, faces or composite body parts will appear. This is usually the
result of feeding highly dissimilar images into the prompt, but it can, if you
choose them wisely, lead to a distinctive starting point from which to build your
composition and improvise around the results, using additional word prompts or
substituting new images into the prompt to take it in different directions.
The
image to the left started out as a blend which included some older Bacon-type
images produced by MidJourney, plus some more photorealistic figurative data
and "textural" matter. I call input images textural if their
composition strongly influences the overall aesthetic style of the output
without defining the subject or dictating overall coherence. Such images are
often abstract or contain strong colours. After I had the initial blended image
I began to push it more deliberately using word prompts in addition to the
original images. First I started with "A young woman sits in a gloomy hotel
room", before adding a description of her clothing and extra
aesthetic filler such as "soft pastel thermal colour palette, variable contrast" and "dark urban skyline, shadowy figures, architectural
drawings". Eventually I fed one of the resulting images back into the
blend alongside material that suggested more of an urban setting. One of the
images in the resulting grid had altered the position of the figure such that
it looked reminiscent of a pole dancer. With this "involuntary mark"
made, I changed the word prompt to push it more towards such an outcome, before
finally switching a textural image into the prompt, which I knew would have
the effect of scrambling the anatomy of the figure (and losing its head!). The
result was the image you see here: a unique, unplanned composition that, while
utilising the indeterminacy of the AI's training, is also ordered, thematic and
recognisably figurative.
Bacon also
employed a degree of improvisation in his work, gambling - as he saw it - a
painting's fate on the next dramatic brush stroke. If it went wrong and could
not be recovered he would frequently destroy the painting. But this approach
could also yield desirable (though unconsciously produced) results, which for
Bacon were a route away from illustration and towards what he described as a
more violent return to reality itself. The most famous example of Bacon using
this method is his Painting 1946,
sometimes known as the butcher shop for its depiction of sides of beef. In an
unusually detailed account of how one of his paintings was created he describes
the intention of depicting a chimpanzee in grass, before then attempting to paint
a bird of prey landing in a field. None of this came to fruition and in his
laboured attempts he had produced such a strange mass of marks that it seemed
as if the figurations he finally arrived at - the final assemblage of which owes a good deal to
Poussin's The Adoration of the Golden
Calf - had formed by an act of his
unconscious.
Francis Bacon, Painting 1946
What
Bacon describes here - and what I've tried to apply with my image of the pole
dancer - is I think one of the best strategies for getting unique images out of
MidJourney. Images that do not privilege prompt following or exact
representational translation but nevertheless aim at an ordered final result.
The process is also not random, since in order to get good results the user has to learn
the aesthetic and interpretative tendencies of the AI and how to provoke it
into making images with the desired artistic qualities and level of
compositional coherence. It's also rarely instantaneous and involves a lot of
tweaking and remixing of prompts. Sometimes, it leads nowhere worthwhile, and
one is always pushing up against MidJourney's more conservative impulses, which
are usually kitsch and highly influenced by American popular culture. When this
happens, there is no choice but to abandon the thread and start off once again
in a completely new direction.
In practice
this involves, so to speak, mastering the dark side of MidJourney; deploying
dissonance and contradiction, and getting a feel for the kinds of prompts that
send the AI away from its basic representational paradigm. For example, getting
anything vaguely explicit into your compositions isn't easy. MidJourney's
interface on Discord is moderated to exclude explicit images or words from
prompts. This extends to slang or even phrases that could count as a double
entendre. Despite the moderation, which also varies depending on the version
you're using (earlier ones are generally more permissive for words, but also
can't tell the difference between the Venus de Milo and a pornographic image,
go figure) the system will frequently create images with explicit content,
usually around the naked human form. In short, MidJourney has a predilection
for boobs, and they sometimes pop up when you least expect them. It's just
another little morsel of indeterminacy that makes using it fun. And on the plus
side you're able to continue Western art's long tradition of fascination with
the female form.
Testing
the limits of the moderation, around both text and images, is worth doing, as it
can be frustrating to have your prompts knocked back for seemingly innocuous
infractions; for example, the phrase "tight white" cannot be used in any
context. That being said, some surprising things do get through, so roll the dice;
you never know what will come out the other end.
The
moderation is meant to extend to horror and gore but there seems to be some
intelligence to the way it is applied so that photorealistic images of injury
or warped anatomy are excluded, but more painterly or cartoonish prompts get
through. This is certainly beneficial if like me you want to produce figures
with distortions or that generally look a bit roughed up. Combine this with the
relative ease of making, if not explicit, then at least vaguely erotic images,
and you have a good set of levers for adding more adult content into your
compositions, which is no bad thing. As I described in the last piece, the
majority of content produced on MidJourney would in my opinion fall into the
category of infantile. Thus I'm all in favour of pushing it towards a more
mature audience. To my mind using cutting edge artificial intelligence to raise
the aesthetic standard of erotica is a more venerable project than using it to
produce endless Pokémon fan art.
I've grouped the images I've created using
these strategies into two broad categories: After-Bacons, which try to produce
images in Bacon's style but with subjects more in keeping with contemporary
life in 2024; and Meta-Bacons (yeah I know it's a lame tag) which apply the
conceptual toolkit described above to make images that do not look like
paintings by Francis Bacon but nevertheless owe something to his ethos and
approach to image-making.
One
example of the former I'm fond of is this picture of a man with a newspaper
standing in a modern glass box apartment. Some of the elements, such as his
white trainers and the fact he's holding a newspaper - which I recognise is
hardly symbolic of the digital-first era - were directly stated in the prompt,
as was the high setting over the city; but the rest of the composition came
about through trial and error, and improvising around marvellous accidents
thrown up by MidJourney itself. I especially like the mirroring of the
buildings outside the window on the presumably hyper-shiny floor of the flat.
The figure is a delightful quandary. He looks anxious and out of place, despite
his apparent attempt to look relaxed with his newspaper, which he holds
as if he were trying to hide something in his hands. The
clothing is also amusingly misplaced. Are those shorts over cut-off leggings? The
trainers are nicely done and draw attention to the shadow on the floor which,
instead of tracking the figure's silhouette, forms an incongruous rectangle.
This is a scene that nicely captures the ennui and out-of-jointness amid
superficial luxury that is symbolic of 2020s Western culture.
And
as an example of a Meta-Bacon the image on the right, which I've titled An Unexpected
Visitor, has all the qualities I'm looking for. The prompt developed over a
couple of days and multiple iterations during which different colour palettes
were tried out and the arrangement of the figure and background played around
with. I'd been experimenting with trying to add something of Man Ray's
surrealistic black and white photography to the mix and wanted a slightly
erotic theme around a woman putting on stockings in a bedroom. The trouble was
that most of the versions produced either an overly photorealistic representation
or a result that was just too abstract and unnatural looking. One of the images
included in the final prompt featured a classical statue without its head -
which I'd included to give a nudge towards the body shape I wanted - but
instead this seems to have filtered down into the final picture in the
wonderful accident seen above, where the woman's head is unnaturally large and
constructed as a kind of glowing neon sign. The head, turned to the figure's
left, wears an anxious expression, and the left hand is held up at the throat
as if startled by a sudden knock at the door. The dishevelled state of the bed
mirrors the degraded framing of the image, as if it were a collage of old
photographs.
The figure itself is beautifully formed, non-symmetrical in
appearance (which is something MidJourney often struggles with) and has these
suggestive marks that could be either surface damage (in a photo collage) or
symbolic of violence. Finally, there's the odd sprite-like entity at the head
of the bed, that in its diaphanous appearance seems hardly there. It's an image
that's full of intrigue and tension, combining a number of styles into a
striking final result. As with the picture of the man in the apartment above,
it grew out of a combination of intention and accident, where unintended
results could be built upon and improvised around. I had only an outline of the
subject I wanted, and the decision when to settle for a final image is somewhat
arbitrary; after all, images can be endlessly remixed and new versions
produced. In a similar fashion to how Bacon claimed his paintings were
"let out", pictures on MidJourney are not so much finished as abandoned.
The jargon
around AI such as ChatGPT and MidJourney would have the user known as a 'Prompt
Engineer'. One influencer on Twitter even speculated this could be a future job
title for anyone whose livelihood was under threat from the new technology.
Sadly I think for many that will be the case, much as in previous rounds of
automation, which rendered older skills obsolete: formerly hands-on builders
and creatives were pushed further away from the
product of their labour, and now mostly supervise the machines that actually
do the work. The threat now is that the inner logic and tendencies of the AI -
which, we should remember, is never neutral, and can only approximate our
world, with all its prejudices and darkness included - will come to supplant
the inexhaustible creative potential of human beings. Humans require paying;
MidJourney just needs a monthly subscription. Thus you can be sure that company
executives across the creative industries are gleefully adding up the possible
savings from eliminating those same pesky humans from the production process.
If the public gallery on MidJourney is anything to go by, then it is
well suited to being a workhorse for the mainstream of the culture industry.
But,
as I hope I've been able to show, this technology also has artistic potential
that can be brought out in collaboration with a user that seeks - so to speak -
to meet the ghost in the machine halfway. I think this form of use - which
rejects the standard representational paradigm, whereby AI should be a slave to
our most asinine dreams - is a real and open possibility. It's certainly no
more outlandish than the idea of artificial general intelligence or the notion
of the human/AI singularity that excites transhumanists. Art, like cruelty, is
one of the quintessential human things, and the leap from the Lascaux cave
paintings to Duchamp's Fountain is perhaps greater than what artificial intelligence
will do for us.
My bet is
that Francis Bacon would also not have been overly disturbed by AI image
generation. There is no possibility of perfect prompt following since the
meaning of the words pumped into it can have no objectively translatable
aesthetic counterpart. To suggest otherwise is a category error. It's the same
category error made by people who claim music can be reduced to maths. Adding
image prompts into the mix only compounds this basic truth. Consequently, there will
always be a fundamental degree of chance and indeterminacy at work; a ghost in
the machine, that by way of certain clairvoyant techniques can become the
collaborator in an artist's work. If he were around now, perhaps Bacon would
have used MidJourney much as he used photography: as a source of inspiration, of
subjects and new ways to show the figure. AI will undoubtedly accelerate
cultural production in the most shallow and commercially driven parts of
society. But alongside that certainty there is also the possibility of taking
up the challenge and deepening the game.