As I discussed in the previous part, a key philosophical question for the future of AI image generation and art is the question of representation. Will future iterations of MidJourney, DALL-E, Stable Diffusion, etc, just become ever more proficient at translating complex prompts into accurate visual representations, with a consequent loss of the wildness and "ghost in the machine" type interventions, which for me are its main selling point? Or will the technology retain, or even begin to coalesce around its own artistic tendencies, ending up less of a tool and more of a genuine collaborator?
One artist who appreciated more than most the impact of new technology on artistic practice was Francis Bacon. In interviews he used the expression "deepening the game" to describe the approach painters needed to take in response to the advent of photography and film, which had during his lifetime thoroughly challenged the role of painting in the visual arts. After all, if photography was now the dominant medium for representing reality, what could traditional painting, and especially figurative painting offer that went beyond mere illustration? Bacon's answer to this challenge was to develop a style and technique of painting that not only produced works of unsettling power, but which also assimilated photography into their mode of production. He made no secret of the fact that photographs - of his friends and lovers, from newspapers and magazines, and even pictures of old masters like Velazquez' portrait of Pope Innocent X - frequently formed the raw material for his own images. While other artists of the time abandoned the figure and moved into abstraction, Bacon took on photography and in a sense made the camera into a collaborator.
Artificial intelligence poses yet another challenge to artistic production, on perhaps an even greater scale than photography. It not only takes aim at the skills of the digital artist or graphic designer, but puts into question the very notion of artistic authorship. As such, I've spent a fair amount of my CPU time on MidJourney attempting to apply some "Baconian strategies" to how I use it and the types of images I've tried to make. Some of this has consisted of pictures produced in Bacon's style - with a contemporary twist - while at other times I've attempted to develop a more general approach to image generation in which Bacon's concepts and insights on art are applied.
I should say from the outset that each version of MidJourney has a different take on Bacon's style which reflects the differing levels of complexity in the coding and the way the developers have trained it. Generally speaking, versions 3 and 4 will give you more abstraction, dissonance and incoherence, whereas the different iterations of version 5 and the recently released version 6 will offer more realism, better figurative representation and more straightforward expressions of compositional coherence. As such, producing Bacon-esque images is not as straightforward as asking for so and so "in the style of Francis Bacon". Indeed, as you will see, Bacon's paintings themselves must often be brought into the prompts to provide guidance and set examples for the AI to follow, alongside moving between versions and using counterintuitive phrases to excite more interesting results. Take for example the two images below produced by version 6, both of which use the prompt "Francis Bacon painting of a dog sitting peacefully in front of a large open window", with the addition that the one on the right also includes a jpg file of Bacon's Dog (1952) as part of the prompt.
Neither of these images really captures much of Bacon's style. The most one could say is that some of the "brushstrokes" for the surrounding rooms are vaguely reminiscent of how Bacon would situate his figures, perhaps in the 1940s and 50s. I also like the way the flooring from Bacon's painting has been remixed into the picture on the right. But the figure of the dog itself is just too well rendered, too illustrative, and just, well, a bit dull. Switching down to V4, however, gives us a very different result. The picture on the left uses the exact same prompt (including the image of Bacon's Dog) as the picture above on the right, but gives us an entirely different style which to my mind is much more characteristic of Bacon's compositions and stark framing of the figure. The dog is perhaps still too much of a direct representation but the whole thing is more obviously modernist in execution. The translation of the window, however, has entirely lost its realism, and is now only hinted at with the arrangement of line and colour in the upper half of the picture.
Finally, the image on the right adds a longer description in near-natural language: "Francis Bacon painting of a dog sitting peacefully in front of a large open window, twisted and distorted anatomy. Mirrored reflections of 1940s interior room in the background. Geometric lines frame the main figure. Yellow, black and white colour palette with streaks of red and blue". Now we're getting somewhere. The dog has been messed up a good deal with odd proportions and the surreal inclusion of a lizard-like tail. The requested colours are present, though the background and the interesting use of forced perspective lines are perhaps a little busy. But overall I'd say this is a fairly good attempt at a Bacon-like dog, although the extra data in the description - such as the mirrored reflections and 1940s interior - has not directly translated into the image.
This example demonstrates one of the key attractions of MidJourney, which is its penchant for chance and indeterminacy when interpreting the data fed into it. The most convincing of the prompts above includes a natural language element, with subject-predicate and statement parts, plus an actual image approximating the subject and style desired. Nevertheless, the output image is something quite different, neither exactly what we asked for, nor a complete departure. Getting to that point took me six prompts, each of which produced a grid of four images, many of which were nowhere near the desired outcome. It's this element of chance and accident that brings MidJourney into true Bacon territory. Don't believe me? Take it from the man himself. The following is an exchange between Bacon and the art critic David Sylvester in an interview in 1966.
FB: I want a very ordered image, but I want it to come about by chance.
DS: But you're sufficiently puritanical not to want to make the chance come too easily.
FB: I would like things to come easily, but you can't order chance. Because if you could, you would only be imposing another type of illustration.
DS: Are you aware of the moment when you find you are becoming free and the thing is taking you over?
FB: Well, very often the involuntary marks are much more deeply suggestive than others, and those are the moments when you feel that anything can happen.
Those of you conversant with the history of the surrealist movement will recognise the ethos at work here. Chance encounters, the unconscious, or the marvellous accident, were all themes and techniques developed in various ways by the surrealists (and before them in Dada) as a means to get behind - or beyond - the usual modes of representation in Western art. Bacon's courting of chance and accident in his work in part continues this tradition, but it also formed a key element of his personality, which came out in his often tempestuous relationships and predilection for gambling. But as we can see in the quote above he also wants to remain in overall control and desires "a very ordered image", not an abstraction produced by chance encounters between paint and canvas. Similar tensions are observable in his private life, which despite the reputation for sexual and alcoholic excess was similarly marked by a tendency towards control. I argued in an essay in 2018 that rather than being a simple facet of his character, this constant tension between chance and order constitutes Bacon's Form-of-Life; the wellspring if you like, from out of which came his remarkable paintings.
Making a picture in MidJourney without an obvious subject can lead in odd directions. One way I like to start is by using the blend function in which up to five images can be fed into the AI which will then attempt to produce a single image which melds the aesthetic qualities of those in the prompt. It does this without any additional word prompting, instead relying entirely on its own internal logic. How exactly it does this, which elements it focuses on and which it ignores, is more than a little mysterious, and again like so many things with MidJourney, the results tend to vary between versions. One thing I've noticed consistently is that it will latch on to any anatomical shapes it can find and centre the image around them. If there is no central figure in the images fed into it, it will sometimes create one by anthropomorphising structures or textural elements. This can lead to a lot of strange results where limbs, faces or composite body parts will appear. This is usually the result of feeding highly non-comparable images into the prompt; but can, if you choose them wisely, lead to a fairly unique starting point to build your composition and improvise around results using additional word prompts or substituting new images into the prompt to take it in different directions.
The image to the left started out as a blend which included some older Bacon-type images produced by MidJourney, plus some more photorealistic figurative data and "textural" matter. I call input images textural if their composition strongly influences the overall aesthetic style of the output without defining the subject or dictating overall coherence. Such images are often abstract or contain strong colours. After I had the initial blended image I began to push it more deliberately using word prompts in addition to the original images. First I started with "A young woman sits in a gloomy hotel room", before adding a description of her clothing and extra aesthetic filler such as "soft pastel thermal colour palette, variable contrast" and "dark urban skyline, shadowy figures, architectural drawings". Eventually I fed one of the resulting images back into the blend alongside material that suggested more of an urban setting. One of the images in the resulting grid had altered the position of the figure such that it looked reminiscent of a pole dancer. With this "involuntary mark" made, I changed the word prompt to push it more towards such an outcome, before finally switching a textural image into the prompt, which I knew would have the effect of scrambling the anatomy of the figure (and losing its head!). The result was the image you see here: a unique, unplanned composition that, while utilising the indeterminacy of the AI's training, is also ordered, thematic and recognisably figurative.
Bacon also employed a degree of improvisation in his work, gambling - as he saw it - a painting's fate on the next dramatic brush stroke. If it went wrong and could not be recovered he would frequently destroy the painting. But this approach could also yield desirable (though unconsciously produced) results, which for Bacon were a route away from illustration and towards what he described as a more violent return to reality itself. The most famous example of Bacon using this method is his Painting 1946, sometimes known as the butcher shop for its depiction of sides of beef. In an unusually detailed account of how one of his paintings was created he describes the intention of depicting a chimpanzee in grass, before then attempting to paint a bird of prey landing in a field. None of this came to fruition and in his laboured attempts he had produced such a strange mass of marks that it seemed as if the figurations he finally arrived at - the final assemblage of which owes a good deal to Poussin's The Adoration of the Golden Calf - had formed by an act of his unconscious.
What Bacon describes here - and what I've tried to apply with my image of the pole dancer - is, I think, one of the best strategies for getting unique images out of MidJourney: images that do not privilege prompt following or exact representational translation but nevertheless aim at an ordered final result. The process is not random, since in order to get good results the user has to learn the aesthetic and interpretative tendencies of the AI and how to provoke it into making images with the desired artistic qualities and level of compositional coherence. It's also rarely instantaneous and involves a lot of tweaking and remixing of prompts. Sometimes it leads nowhere worthwhile, and one is always pushing up against MidJourney's more conservative impulses, which are usually kitsch and highly influenced by American popular culture. When this happens, there is no choice but to abandon the thread and start off once again in a completely new direction.
In practice this involves, so to speak, mastering the dark side of MidJourney: deploying dissonance and contradiction, and getting a feel for the kinds of prompts that send the AI away from its basic representational paradigm. For example, getting anything vaguely explicit into your compositions isn't easy. MidJourney's interface on Discord is moderated to exclude explicit images or words from prompts. This extends to slang or even phrases that could count as a double entendre. Despite the moderation, which also varies depending on the version you're using (earlier ones are generally more permissive for words, but also can't tell the difference between the Venus de Milo and a pornographic image, go figure), the system will frequently create images with explicit content, usually around the naked human form. In short, MidJourney has a predilection for boobs, and they sometimes pop up when you least expect them. It's just another little morsel of indeterminacy that makes using it fun. And on the plus side you're able to continue Western art's long tradition of fascination with the female form.
The moderation is meant to extend to horror and gore, but there seems to be some intelligence to the way it is applied, so that photorealistic images of injury or warped anatomy are excluded while more painterly or cartoonish prompts get through. This is certainly beneficial if, like me, you want to produce figures with distortions or that generally look a bit roughed up. Combine this with the relative ease of making, if not explicit, then at least vaguely erotic images, and you have a good set of levers for adding more adult content into your compositions, and this is no bad thing. As I described in the last piece, the majority of content produced on MidJourney would in my opinion fall into the category of infantile. Thus I'm all in favour of pushing it towards a more mature audience. To my mind using cutting edge artificial intelligence to raise the aesthetic standard of erotica is a more venerable project than using it to produce endless Pokémon fan art.
I've grouped the images I've created using these strategies into two broad categories: After-Bacons, which try to produce images in Bacon's style but with subjects more in keeping with contemporary life in 2024; and Meta-Bacons (yeah I know it's a lame tag), which apply the conceptual toolkit described above to make images that do not look like paintings by Francis Bacon but nevertheless owe something to his ethos and approach to image-making.
One example of the former I'm fond of is this picture of a man with a newspaper standing in a modern glass box apartment. Some of the elements, such as his white trainers and the fact he's holding a newspaper - which I recognise is hardly symbolic of the digital-first era - were directly stated in the prompt, as was the high setting over the city; but the rest of the composition came about through trial and error, and improvising around marvellous accidents thrown up by MidJourney itself. I especially like the mirroring of the buildings outside the window on the presumably hyper-shiny floor of the flat. The figure is a delightful quandary. He looks anxious and out of place, despite the seeming attempt to look relaxed with his newspaper, which he holds as if he were trying to hide something in his hands. The clothing is also amusingly misplaced. Is that shorts over cut-off leggings? The trainers are nicely done and draw attention to the shadow on the floor which, instead of tracking the figure's silhouette, forms an incongruous rectangle. This is a scene that nicely captures the ennui and out-of-jointness amid superficial luxury that is symbolic of 2020s Western culture.
And as an example of a Meta-Bacon, the image on the right, which I've titled An Unexpected Visitor, has all the qualities I'm looking for. The prompt developed over a couple of days and multiple iterations, during which different colour palettes were tried out and the arrangement of the figure and background played around with. I'd been experimenting with trying to add something of Man Ray's surrealistic black and white photography to the mix and wanted a slightly erotic theme around a woman putting on stockings in a bedroom. The trouble was that most of the versions produced either an overly photorealistic representation or a result that was just too abstract and unnatural looking. One of the images included in the final prompt featured a classical statue without its head - which I'd included to give a nudge towards the body shape I wanted - but instead this seems to have filtered down into the final picture in the wonderful accident seen above, where the woman's head is unnaturally large and constructed as a kind of glowing neon sign. The head, turned to the figure's left, wears an anxious expression, and the left hand is held up at the throat as if startled by a sudden knock at the door. The dishevelled state of the bed mirrors the degraded framing of the image, as if it were a collage of old photographs.
The figure itself is beautifully formed, non-symmetrical in appearance (which is something MidJourney often struggles with) and has these suggestive marks that could be either surface damage (in a photo collage) or symbolic of violence. Finally, there's the odd sprite-like entity at the head of the bed, which in its diaphanous appearance seems hardly there. It's an image that's full of intrigue and tension, combining a number of styles into a striking final result. As with the picture of the man in the apartment above, it grew out of a combination of intention and accident, where unintended results could be built upon and improvised around. I had only an outline of the subject I wanted, and the decision when to settle for a final image is somewhat arbitrary; after all, images can be endlessly remixed and new versions produced. In a similar fashion to how Bacon claimed his paintings were "let out", pictures on MidJourney are not so much finished as abandoned.
The jargon around AI such as ChatGPT and MidJourney would have the user known as a 'Prompt Engineer'. One influencer on Twitter even speculated this could be a future job title for anyone whose livelihood was under threat from the new technology. Sadly I think for many that will be the case. Previous rounds of automation rendered older skills obsolete: those former hands-on builders and creatives were pushed further away from the product of their labour, and now mostly supervise the machines that actually do the work. The threat now is that the inner logic and tendencies of the AI - which, we should remember, is never neutral, and can only approximate our world, with all its prejudices and darkness included - will come to supplant the inexhaustible creative potential of human beings. Humans do require paying; MidJourney just needs a monthly subscription. Thus you can be sure that company executives across the creative industries are gleefully adding up the possible savings from eliminating those same pesky humans from the production process. If the public gallery on MidJourney is anything to go by, then MidJourney is well suited to being a workhorse for the mainstream of the culture industry.
But, as I hope I've been able to show, this technology also has artistic potential that can be brought out in collaboration with a user that seeks - so to speak - to meet the ghost in the machine halfway. I think this form of use - which rejects the standard representational paradigm, whereby AI should be a slave to our most asinine dreams - is a real and open possibility. It's certainly no more outlandish than the idea of artificial general intelligence or the notion of the human/AI singularity that excites transhumanists. Art, like cruelty, is one of the quintessential human things, and the leap from the Lascaux cave paintings to Duchamp's Fountain is perhaps greater than what artificial intelligence will do for us.
My bet is that Francis Bacon would also not have been overly disturbed by AI image generation. There is no possibility of perfect prompt following since the meaning of the words pumped into it can have no objectively translatable aesthetic counterpart. To suggest otherwise is a category error. It's the same category error made by people who claim music can be reduced to maths. Adding image prompts into the mix only compounds this basic truth. Consequently, there will always be a fundamental degree of chance and indeterminacy at work; a ghost in the machine that, by way of certain clairvoyant techniques, can become the collaborator in an artist's work. If he were around now, perhaps Bacon would have used MidJourney much as he used photography: as a source of inspiration, of subjects and new ways to show the figure. The technology will undoubtedly accelerate cultural production in the most shallow and commercially driven parts of society. But alongside that certainty there is also the possibility of taking up the challenge and deepening the game.