This weekend I started making short AI-generated music videos, and the process has been much more fun and interesting than I expected. This was my favorite:
The idea started in a very simple way. I was watching YouTube videos with music that I liked, but the visuals were not really enough to hold my attention. A lot of them were essentially just background music with a still image, a loop, or something visually repetitive. I wanted something more immersive. I wanted to relax, listen to music, and almost “act like a vegetable” while staring at a screen, letting the sound and visuals carry me somewhere.
I asked ChatGPT to help me look for YouTube channels that combined AI-generated music with rich AI-generated visuals, but I did not find many examples that matched what I had in mind. So I decided to try making them myself.
Around the same time, I had done a consulting job helping someone create a business proposal, build a website concept, and start coding software using GPT Codex. With the money from that job, I bought a one-month unlimited subscription to Runway AI. Then I started experimenting.
The workflow quickly became a kind of creative pipeline. I used Midjourney for text-to-image generation, GPT Image 2 and Grok for additional images and ideas, ChatGPT for brainstorming and prompt refinement, Runway for image-to-video animation, Suno for text-to-song music generation, and iMovie for editing everything together.
The process usually started with a still image. I would prompt Midjourney, GPT Image 2, Grok, or ChatGPT to create colorful, cinematic images, often with an oil-painting feel. I liked images that felt vivid, dynamic, and slightly surreal: race cars in outer space, foxes racing in Miami, submarines underwater, or animals piloting strange vehicles in impossible environments.
Once I had an image I liked, I uploaded it into Runway and described the motion I wanted. I asked for aerial shots, bird’s-eye views, camera pans, drifting cars, racing motion, underwater movement, and cinematic gliding. Some clips worked beautifully. Others were strange, distorted, or unusable. A common problem was that Runway would begin a clip with the car standing still and only have it accelerate a few seconds in. When several clips did this in a row, the final video lost momentum, so I learned to cut the first few seconds from those clips in iMovie and keep only the parts where the motion was already underway.
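I did those cuts by hand in iMovie, but the step is easy to script if you have a folder of clips that all need roughly the same trim. Here is a minimal sketch in Python, assuming ffmpeg is installed; the folder names and the 2.5-second offset are placeholders for illustration, not values from my actual edit:

```python
from pathlib import Path
import subprocess

# Placeholder folders and trim length -- adjust to taste.
SRC = Path("runway_clips")    # raw image-to-video exports from Runway
DST = Path("trimmed_clips")   # clips with the static opening removed
TRIM_SECONDS = 2.5            # rough length of the "standing still" intro

DST.mkdir(exist_ok=True)

for clip in sorted(SRC.glob("*.mp4")):
    out = DST / clip.name
    # Putting -ss before -i seeks to the offset on the input;
    # -c copy copies the streams without re-encoding, so the trim
    # is fast and lossless.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(TRIM_SECONDS),
         "-i", str(clip), "-c", "copy", str(out)],
        check=True,
    )
    print(f"trimmed {clip.name}")
```

One caveat: with stream copy, the cut snaps to the nearest keyframe rather than the exact timestamp, so for frame-accurate trims you would drop the -c copy flag and let ffmpeg re-encode.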
The music came from Suno. I generated many songs, experimenting with retro wave, chillwave, synthwave, soul, R&B, soft rock, and 1980s-inspired sounds. I asked for piano keys, synthesizers, saxophone, mellow grooves, and atmospheric textures. Some songs were too short. Some missed the feeling I wanted. But others were surprisingly strong, and when the right song matched the right visual world, the whole piece started to feel alive.
ChatGPT was useful throughout the process. It helped me brainstorm titles, fictional worlds, racing leagues, animal themes, fake technical jargon, YouTube descriptions, and thumbnail text. It also helped generate infographic-style concepts, such as “Ursine Cosmic Drifting,” “Vulpine Racing,” and “Porcine Deep Sea Explorer.” These were intentionally serious and silly at the same time. I liked the idea of treating absurd animal racing worlds with the language of motorsport, biology, ecology, and speculative physics.
Each finished video took about three to six hours. At least half of that time was simply waiting for generations to complete. There was a lot of trial and error: generating images, selecting the best ones, animating them, rejecting bad clips, trimming clips, generating new music, editing, making thumbnails, and packaging everything for YouTube.
What surprised me most was how much the process felt like directing rather than simply “asking AI to make something.” The tools do not just produce a finished work on demand. You still have to make choices. You have to decide what the world is, what the mood is, what the camera should do, what to cut, what to keep, what to emphasize, and when something feels right.
The result is not traditional filmmaking, but it is not passive automation either. It is a new kind of creative collaboration with image models, video models, language models, and music models. Each system contributes something different, and the human role becomes one of orchestration, taste, revision, and direction.
For me, the project began as a desire to make something fun to watch while listening to music. But it became a way to explore how quickly a person can now build entire visual worlds using accessible AI tools. With enough patience, curiosity, and iteration, a simple idea can become a miniature cinematic universe.


