The Dangers of AI Fast Fashion Content

The shock brought by the release of ChatGPT, a state-of-the-art chatbot from the AI research laboratory OpenAI, will redefine how we relate to writing. ChatGPT is a large language model trained on a dataset of billions of words, and it can generate human-like text. It was all fun and games to ask ChatGPT to rewrite Bohemian Rhapsody as the life of a postdoc, to explain how to remove a peanut butter sandwich from a VCR in the style of the King James Version of the Bible, or to generate essays and Harvard applications in seconds. But the AI could also be manipulated into giving advice on hotwiring a car or breaking into someone’s house.

And that’s not all. OpenAI developed another AI system, DALL-E, which is trained on massive datasets of text-image pairs and generates images from a text prompt. The official Instagram account for DALL-E showcases seemingly endless possibilities.

A post shared by Izanami Art (@call_me_izanami)

Combining ChatGPT with DALL-E or similar AI systems is supercharging everything we know about art, creativity, education, ethics, and the future of jobs. Take, for example,

https://twitter.com/GuyP/status/1598020781065527296

in which ChatGPT was used to generate prompts that were then fed into Midjourney, another AI program that creates images from text prompts.

The ethical risks are incredibly high. First, there is the issue of training datasets. As reported by Euronews,

DALL-E 2 and Midjourney have not yet made their datasets public. However, the popular open-source tool Stable Diffusion has been more transparent about what it trains its AI on.

Even so, Stable Diffusion uses datasets from LAION (a German non-profit with a stated goal “to make large-scale machine learning models, datasets and related code available to the general public”), which in turn gets its image-text pairs from another non-profit, Common Crawl, which crawls the web and shares its dataset with the public. Twisted are the paths of copyright. According to Wikipedia,

The Common Crawl dataset includes copyrighted work and is distributed from the US under fair use claims. Researchers in other countries have made use of techniques such as shuffling sentences or referencing the common crawl dataset to work around copyright law in other legal jurisdictions.

Then there are livelihoods: millions of creative jobs are at risk, and illustrators, designers, writers, and customer service representatives face the peril of becoming obsolete. One could argue that, with these mighty engines at their fingertips, artists need to retrain with the new tools if they want their skills to remain relevant.

But some would say that AI text-to-image programs are appropriating the styles of artists who, when they published their portfolios online, most likely never agreed to have an AI trained on their work. Should AI companies reveal the datasets they use for training? Should an artist have the right to opt out of the web crawlers that supply the meat and bones of the datasets on which AI generators are trained? Should an artist be compensated depending on how often their works appear in datasets? Conversations about what is and isn’t ethical should have happened before such powerful engines were unleashed on the public.

And then there is the case of an entire children’s book published using ChatGPT prompts to generate the text and Midjourney prompts to render the images.

https://www.twitter.com/ammaar/status/1601284293363261441?cxt=HHwWgoCjpZbl87gsAAAA

The book is available to buy on Amazon UK and Amazon US. The reviews are appalling, but the cat is already out of the bag. Especially when AI-generated content targets vulnerable groups such as children, we need to think critically about what should be considered ethical and legal. As one user replied,

People have every right to be angry. This tech is built extremely unethically and is threatening the livelihoods of millions of people. If left unchecked it will leave our world soulless and hollow. Do you really want your kids to grow up on algorithmic art shat out by computers?
— Jon Neimeister (@Andantonius) December 12, 2022

Looking over the books I have read to my daughter over the years, I find many wonderful, quiet masterpieces. Wimmelbooks, where child and adult alike can spend hours poring over everyday scenes. Or the books by Clotilde Perrin, which make insanely imaginative use of all kinds of little flaps. Or the powerful message conveyed in just a few images and paragraphs of The Paper Bag Princess by Robert Munsch. Winnie-the-Pooh. Frog and Toad. The Missing Piece Meets the Big O. Fabulous storytelling about empathy, sharing, emotions, and friendship. And we haven’t even started on the classic chapter books.

There is something else that we must remember: children’s books are examples of high-context culture.

In low-context cultures, information is more explicit and taken at face value. Say what you mean and mean what you say. In high-context cultures, most communication happens through body language, facial expressions, silence, or tone of voice. Context is more important than words. Instead of saying “No,” we would say something like “Let’s see what I can do.” A yes is more subtle, like a glance or a slight nod. Imagine you have guests over and ask whether they want another serving, and one guest says no. In a low-context culture, that’s it: no more offers. In a high-context culture, you would insist and insist until the guest succumbs.

Storytelling in children’s books is high-context. In one book, blank space can signal the arrival of winter and the snow that brings two friends together (The Lion and the Bird); in another, it can mean death (Michael Rosen’s Sad Book).

E.B. White, the author of Charlotte’s Web, understood that:

Anyone who writes down to children is simply wasting his time. You have to write up, not down. Children are demanding. They are the most attentive, curious, eager, observant, sensitive, quick, and generally congenial readers on earth. They accept, almost without question, anything you present them with, as long as it is presented honestly, fearlessly, and clearly. I handed them, against the advice of experts, a mouse-boy, and they accepted it without a quiver. In Charlotte’s Web, I gave them a literate spider, and they took that.

Although White’s “presented honestly, fearlessly, and clearly” might suggest low-context communication, let’s not forget that most children will not assume literate spiders exist outside the context of make-believe. Context matters more than words.

How many of those who might turn to AI to publish children’s books would know how to adapt the generated writing to a high-context culture?

There is another ethical aspect. As a parent, I want to know if the book I buy for my child is AI-generated. How many self-publishing authors or publishing houses would disclose this information if it isn’t legally required? And how can one tell whether a book is AI-generated? Ironically, this is what ChatGPT told me when I asked:

a book that appears to be overly simple, repetitive, or formulaic in its language and content may be more likely to be AI-generated.

These AI tools do have benefits: they ease the pain of getting started. The generated text or images should be treated as the “shitty first drafts” that writer Anne Lamott describes in her book Bird by Bird:

Almost all good writing begins with terrible first efforts. You need to start somewhere. Start by getting something—anything—down on paper. What I’ve learned to do when I sit down to work on a shitty first draft is to quiet the voices in my head.

William Zinsser was onto something when describing the process of writing and editing:

Paragraph 1 is a disaster — a tissue of generalities that seem to have come out of a machine. No person could have written them. Paragraph 2 isn’t much better. But Paragraph 3 begins to have a somewhat human quality, and by Paragraph 4 you begin to sound like yourself. You’ve started to relax. It’s amazing how often an editor can throw away the first three or four paragraphs of an article, or even the first few pages, and start with the paragraph where the writer begins to sound like himself or herself.

William Zinsser – On Writing Well: An Informal Guide to Writing Nonfiction

So yes, ChatGPT can be used as a writing partner to bounce ideas off. But we also need to remember that it was tricked into writing a scientific article about the benefits of eating glass.

Imagine a world where people take the easy way out. Instead of pouring their attention into editing AI-generated drafts, they pump out tedious or biased content every millisecond. The proliferation of AI in publishing could lead to a situation similar to fast fashion, where large quantities of low-quality items are produced quickly and inexpensively. Future generations would not want to learn how to draw or write, seeing how easily books and drawings are spewed out and, most importantly, promoted. This would produce immense machine-generated training datasets to feed other AI programs, and so on, ad infinitum. What a disposable world that would be.

As Maggie Appleton brilliantly said,

Our new challenge as little snowflake humans will be to prove we aren’t language models. It’s the reverse Turing test.

It is already happening: an artist was banned from the art subreddit because their work looked too AI-like. As expected, other users were unhappy about the whole situation.

Perhaps more relevant than ever is the work of Nobel Prize in Literature laureate Svetlana Alexievich, whose books are fundamentally human. A Belarusian investigative journalist and oral historian, she interviewed thousands of people for her books.

In Alexievich’s words,

The books that I’m writing, you can write them only when you’re amongst your people. You’re not going to find it on the Internet. You’re not going to hear it there.

And above all,

All our lives, we fight for certain ideals, and they get diluted, and then we have to fight for them again.

Because the times we live in weren’t interesting enough.

Resources:

The Expanding Dark Forest and Generative AI, an impressive article from Maggie Appleton