TL;DR: While everyone fights about art / not art and other stuff, companies are building next-level AI systems that not only produce good-looking images but also use human-in-the-loop feedback to conquer the seemingly unautomatable part of the artistic process: picking what looks beautiful from a given set of images.
The Pro-AI and Anti-AI camps are playing volleyball with recriminations on Twitter and Reddit, and everyone knows the arguments and counterarguments thrown around. It's Sunday evening, and I'm not in the mood for acidity and high blood pressure today. So let's pretend that whatever you think about this topic is the correct view.
I want to talk about the even bigger picture in a slightly more nuanced way. Not just from the artist's perspective but from another angle: where is this thing going?
Some say people using these systems are 'prompt monkeys', and the other side says 'we are artists'.
I put people in this space in 2 categories:
Producers: They make stuff, but their workflow consists of just using a chat-based system and finding variations on artwork. They do the creative work of coming up with ideas for what they want to generate and learning how to prompt to get it. There is a level of skill that comes with experience here as well, and they are generally not given credit for their efforts because the output far outshines them. This person needs to have a good eye for beauty to pick the right image variations.
Power users: People who may use a chat-based system and even AI image generation tools, but the whole thing is part of a larger process.
This process may involve sketching, making 3D models, shooting videos, writing Python scripts, training models, Photoshop, etc. You get the idea.
There is definitely a skill gap between these 2 types of users and a gap in how much effort is put into generating each image. There are also a whole bunch of traditional and digital artists who use these systems, so it's pretty stupid to lump everyone together.
Coming to the models, there are 2 kinds of models in the AI image generation space.
Open models: These are models that have a known dataset and are freely available for anyone to download and use. This puts the power in the hands of ordinary people. They come with their own challenges in getting the models to work on your machine -- it's generally more DIY and hands-on.
Private models: These are models that are trained and owned by companies:
The dataset that goes into these models is not disclosed; that is their trade secret.
The models themselves are very closely guarded and won't be released to the public, as they are seen as core intellectual assets of these companies.
What they do allow is access to their service, where you can use their models. Right now, they are "free" or very low cost, but we'll get to that.
From what I see, right now -- there are 2 problems -- the technical and the aesthetic.
Level 1: Generating AI-assisted images -- getting them correct and making them look more realistic. This first problem is almost solved, with amazing-looking outputs and the things remaining to solve few enough to count on your fingers. lol!
Companies that run these Open Models just want to get technically correct-looking images -- what to generate isn't their concern -- that's the user's headache.
Level 2: The second problem is this: from the seemingly infinite sea of images that can be generated, which kinds of images do people prefer to see?
To answer this question, you need to answer the question, "What do most people find attractive or pleasing?"
This is a known problem. It's the number one thing that almost all creative industries have in mind, day in and day out: what do you produce that people will like and enjoy consuming, so that as a next step you can motivate them to do what you actually want them to do, i.e. buy your game, come watch your movie, buy your product?
This is a really tough question and we'll come to this in a bit.
Right now, all the discourse is from the angle of the 'fury of the Artists' who feel they've been shortchanged and ripped off. I feel for them. Except for some adolescents (of various ages) with a room-temperature IQ, everyone across the board appreciates and knows the work put in by artists. That mastery takes time, is not easy to acquire and must be respected.
The arguments right now are about 'Is AI-generated art really art or not?' and 'What if someone decided to call themselves an artist? How would society survive this catastrophe?' I won't bore you with this; you can decide for yourself and talk about it till the cows come home on Twitter.
I want to talk a little bit about the broader picture from my perspective. I have briefly used some of the closed-source services, once or twice, where you go to a Discord server, punch in a bunch of commands and get a response back from the bot.
My personal experience of the chat-based systems is:
1 - It has the feel of a communal bathhouse: you go in, get naked and shower in a common area meant for everyone. There is one common chat room where people type in what they want to generate, and the images generated from those commands are visible to everyone else in the chat. I can't put a finger on it, but it just feels very degrading to work like this.
2 - The second feeling you get, because you can see other people's images and their prompts, is that everyone here is 'just a prompt monkey'. I mean, type a bunch of stuff and the benevolent gods will serve you beautiful images. Seemingly at no cost.
Look, prompting is an art in itself. It's the same kind of skill as writing a SQL query to fetch records from a database, except you write it in natural language because the model is smart enough to understand it. It still takes skill and experience to manipulate the model into doing what you want.
It's not about whether you generate images using prompts or not. The reason I find it repulsive is that on these platforms you own absolutely nothing. Say today you generate a beautiful image. What about tomorrow, is it guaranteed? What about the day after that? If the service shuts down, can you still generate these images? If the answer is no, you are not an artist using a tool. You are a producer outsourcing the creative work to a vendor and curating it.
No doubt, the images from these platforms look just stunning.
But after using it for the second time, I stepped back and realized: wait a second, this thing costs a ton of money to run. What is the game plan? Why is it free? Then I put 2 and 2 together. These services are not meant for the people who are using them right now in these communal chat rooms.
This is the unsaid part: the people using it for free or at low cost are teaching the model the Level 2 skill -- 'how do you pick images that people find pleasant?'
Every prompt gives you 4 options - you pick one or do 4 more variations from one of the results and keep going till you find the one that you like.
Given 4 images, which one is the most pleasing? Do this over and over, millions of times, with a wide array of people across the globe and you start to learn what 'people' in general actually like. This in turn helps train the actual models themselves, so that they curate and build only aesthetic images.
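To make the 'pick 1 of 4' idea concrete, here is a minimal sketch of how such picks could be turned into preference scores. This is purely my assumption about the general approach, not how any particular service actually does it: each pick is treated as the chosen option beating the other three, and a simple Bradley-Terry model is fitted to those pairs. The function names and the style labels are hypothetical, made up for illustration.

```python
from collections import defaultdict


def pairwise_from_picks(rounds):
    """Turn each (shown_options, chosen_option) round into (winner, loser) pairs."""
    pairs = []
    for shown, chosen in rounds:
        pairs.extend((chosen, other) for other in shown if other != chosen)
    return pairs


def bradley_terry(pairs, iterations=100):
    """Estimate a relative 'appeal' score per option from (winner, loser) pairs.

    Assumes winner != loser in every pair. Scores are normalized to sum to 1.
    """
    wins = defaultdict(int)      # total pairwise wins per option
    matches = defaultdict(int)   # number of comparisons per unordered pair
    items = set()
    for winner, loser in pairs:
        wins[winner] += 1
        matches[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    scores = {item: 1.0 for item in items}
    for _ in range(iterations):
        new_scores = {}
        for item in items:
            denom = 0.0
            for pair, n in matches.items():
                if item in pair:
                    (other,) = pair - {item}
                    denom += n / (scores[item] + scores[other])
            new_scores[item] = wins[item] / denom
        total = sum(new_scores.values())
        scores = {item: s / total for item, s in new_scores.items()}
    return scores


if __name__ == "__main__":
    # Hypothetical data: four style variants shown together, one picked each time.
    options = ["warm", "cool", "flat", "noisy"]
    rounds = [(options, "warm"), (options, "warm"), (options, "cool"), (options, "warm")]
    print(bradley_terry(pairwise_from_picks(rounds)))
```

With millions of rounds instead of four, scores like these are the kind of signal that could be used to rank or filter generated images. The same caveat applies: this is a sketch under my own assumptions.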
If you want to actually use these services privately, as you would if you're producing work for anything serious, they cost an eye-watering $50 a month.
Right now, the output from these systems is way better than the Open model systems - for this exact reason.
In my opinion, if you can't download the model and run it on your own machine to produce images, and if you can't tweak and modify it on your terms, you should keep a couple of things in mind. A project like Stable Diffusion can be thought of as a tool in your workflow.
But the Discord services can at best be thought of as outside vendors, or at worst as temporary partners, where you provide the manual labor of picking the right images and training their model on what is aesthetic and what is not, and they give you free or low-cost images. This is a symbiotic relationship only for as long as they need you in that trainer/curator role. It shouldn't come as a shock if you later need to pay some or a lot of money to use the same services you helped train.
I think the real fight is not going to be about 'Is AI art real art?' -- this question has been answered by the big companies with a resounding yes. The photorealistic images can fool just about anyone casually glancing at the images. The consumer can't tell one image apart from another anyways (relax, the fingers will get solved).
What follows is my opinion. It's based on how I think things will play out, so take it with a pinch of salt.
I think the next claim from the AI image model companies will be:
'Not only do we know how to make coherent images, we now also authoritatively know how to make aesthetically pleasing images that people like'.
Question is, why would someone need something like that?
To answer this question, we go back to the problem we talked about, the one every creative industry is trying to solve: how to make compelling stuff that influences people to do what you want them to do (buy stuff, watch your movies, etc.).
What if you could automate a proven aesthetic art generation system and show variations of good-looking artworks (ads) to a whole bunch of people that you probably also know something about (from the advertising platform)? You could A/B test what really works and which demographic likes which ad, then do variations and tune the ads to be the most compelling.
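As a rough illustration of that A/B testing loop, here is a small sketch that tallies which ad variant gets the best click rate in each demographic. Everything here is hypothetical: the event format, the group labels and the function name are my own inventions, and a real system would also need to check sample sizes and statistical significance, which this ignores.

```python
from collections import defaultdict


def best_variant_per_group(events):
    """Pick the ad variant with the highest click rate for each demographic.

    events: iterable of (demographic, ad_variant, clicked) tuples.
    Returns {demographic: (best_variant, click_rate)}.
    """
    shown = defaultdict(int)    # impressions per (demographic, variant)
    clicked = defaultdict(int)  # clicks per (demographic, variant)
    for group, variant, did_click in events:
        shown[(group, variant)] += 1
        clicked[(group, variant)] += int(did_click)

    best = {}
    for (group, variant), impressions in shown.items():
        rate = clicked[(group, variant)] / impressions
        if group not in best or rate > best[group][1]:
            best[group] = (variant, rate)
    return best


if __name__ == "__main__":
    # Hypothetical impressions: (demographic, ad variant, did they click?)
    events = [
        ("18-24", "A", True), ("18-24", "A", False), ("18-24", "B", True),
        ("35-44", "A", True), ("35-44", "B", False),
    ]
    print(best_variant_per_group(events))
```

The winning variant per group would then be the seed for the next round of generated variations.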
Even if this doesn't happen, ultimately the people using the Chat-Based image generation systems are doing a second job of training those systems, whether they know it or not.