9 Tunnels

Notes on building, leading, and the journey between milestones. By Angelo Rodriguez.

Making a Soláyae ad without a film crew

Still frame from the final Soláyae ad

This is a still from a 15-second ad for Soláyae. The model walks down a cobblestone street in Intramuros, Manila, glances off-camera, smiles, and the logo lands.

It took about a day to make. Call it six hours of actual work. Over two hours of that was fighting with audio. I’ll get to that.

Why I’m doing this at all

Soláyae is one of three businesses I’m involved in. My wife Elisa founded it and runs everything that touches the bags themselves: the design, the materials, the weavers, the quality. I co-founded it with her and handle the other side: operations, marketing, getting the bags from the Philippines to the United States, the financials. She builds the soul of the brand. I build the plumbing around it.

We’re a startup, which is a polite way of saying our marketing budget has a ceiling. We also have a curiosity budget, and that one is bigger.

So this post is a prototype, not a pivot. I’m a co-founder playing with tools, not a marketing expert. The real marketing work at Soláyae belongs to Ella Naling, our digital marketing lead. In four months she has built our site and shaped our branding, and she runs our social presence (while quietly helping across my other businesses too). This particular experiment is mine. For now I’m just figuring out what the tools can and can’t do, and whether any of it is worth handing to her later.

The concept

Simple and intentional: old world against new. Intramuros is two centuries older than the United States (Spanish-founded in 1571, though much of what stands today was restored after the 1945 Battle of Manila), and I wanted that backdrop pressed up against a very modern, very clean designer bag.

Color-wise: muted grays and browns behind, black and cream on the bag, neutral clothing on the model. The bag is the point; everything else gets out of its way.

The action is almost nothing. A Filipina woman stands waiting on the sidewalk, starts walking, looks off-camera, smiles, and the logo appears. Fifteen seconds total.

Here’s the twist

I made this whole thing without a model, without a film crew, without a photographer, without leaving my desk.

The entire ad, except for the logo and the bag, was generated by AI.

Before anyone decides that’s either cool or cursed, let me say the loudest part first.

The ethics call, said out loud early

The model in the video is not a real person. That was deliberate, for two reasons.

First, consent. Using someone’s face, even if you pay them, gets complicated fast once it runs through an AI pipeline. I didn’t want to use a real person’s likeness without their explicit permission for that specific use. The simpler answer was to not use any real person’s likeness at all.

Second, I didn’t want to decide what a modern Filipina looks like. I asked the AI to generate its interpretation, looked at several candidates, and picked one. I’d rather let the model be imagined than type out a list of features and end up with my own assumptions rendered as a face.

A fair pushback: why not hire an actual Filipina model? We are. In the coming weeks we’re hiring three models, a videographer who is also our photographer, a makeup artist, and a marketing specialist to direct on-site. We’re paying venue and location fees across the Philippines, and shooting in places and with materials that can’t be replicated anywhere else in the world. The AI experiment isn’t replacing any of that. It’s sitting next to it. I’ll come back to where AI actually fits at the end.

For now, the process.

Making the scene

I started with a real photograph of Intramuros. I looked at a lot of options and picked this one: a sidewalk, a cobblestone street, cafe windows, old balconies, a place that reads as a place instead of a stage.

Reference photograph of a cobblestone street in Intramuros, Manila

Then I photographed the actual bag. AI is creative with logos (more on that later), and I wanted the generator to understand the shape, the stitching, and the hardware of the real product before it got anywhere near the final composition.

For the model, I generated several candidates in Nano Banana (Google’s image model) and picked one. Nothing dramatic, basically casting. Then I composited the three pieces together: model, bag, Intramuros backdrop. This is the still that the video was generated from.

AI-generated composite of a Filipina model holding a Soláyae handbag on a cobblestone street in Intramuros

Asking an AI to prompt an AI

For the video prompt itself, I used Claude. I fed it the composite image and asked it to write a prompt covering camera motion, pacing, framing, focus on the bag, aspect ratio, and length. Then I ran that prompt through Gemini Veo, and also experimented with Higgsfield on a few of the takes.
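If you’d rather script that step than paste into a chat window, it’s a few lines against the Anthropic Python SDK. A minimal sketch, assuming the composite is saved as composite.png; the model name, filenames, and prompt wording here are illustrative, not exactly what I used:

    import base64
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Send the composite still alongside the ask.
    with open("composite.png", "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": image_b64}},
                {"type": "text",
                 "text": "Write a video-generation prompt for this still. "
                         "Cover camera motion, pacing, framing, keeping focus "
                         "on the handbag, the aspect ratio, and an 8-second length."},
            ],
        }],
    )
    print(message.content[0].text)  # this goes into Veo or Higgsfield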

Then I ran into the first hard limit. The generators cap out at around eight seconds per clip. A 15-second ad had to be two prompts stitched together.
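The stitch itself is mechanical (I ended up doing mine in Final Cut, below, but scripting it is handy for previewing takes quickly). A sketch using ffmpeg’s concat demuxer; filenames are placeholders, and the no-re-encode copy only works because clips from the same generator share a codec and resolution:

    import subprocess

    # List the segments in playback order for ffmpeg's concat demuxer.
    with open("clips.txt", "w") as f:
        f.write("file 'segment_01.mp4'\n")
        f.write("file 'segment_02.mp4'\n")

    # Join without re-encoding; both clips must share codec and resolution.
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0",
         "-i", "clips.txt", "-c", "copy", "ad_15s.mp4"],
        check=True,
    )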

Multiple takes, except they’re prompts

I’m not a film production specialist, but the closest analogy I can think of is a director calling for another take. Generate a clip, it’s almost right, tweak the prompt, try again. Several takes before landing on each of the two segments.

One of those takes is worth showing on its own. Fair warning: it’s a little unsettling. Press play if you’re curious, or skip past to the next section if not.

Outtake clip: the head-turn failure

This is what happens when the AI loses track of where the head connects to the body. Mid-clip, her head rotates nearly 180 degrees on a torso that stays facing forward. Physics and biology both break in the same second. It’s funny and genuinely unsettling at once, which is not a tone I was aiming for in a handbag ad. It’s also the kind of mistake that’s instantly recognizable as AI, which is why most process posts about AI video quietly skip past this stage.

I’m not skipping it. It’s honest, and this is what “figuring it out” actually looks like.

The audio problem nobody talks about

Both Higgsfield and Gemini include audio on the videos they generate. That sounds convenient until you try to stitch two eight-second clips together and discover that the audio doesn’t carry across the cut.

So I needed one 15-second audio bed that fit the whole ad.

I tried Gemini. I tried Suno. I tried a couple of others. None of them could reliably produce exactly 15 seconds in the style I wanted. Too long, too short, wrong mood, abrupt endings.

I spent over two hours on this before giving up and going to Shutterstock. I found a 15-second clip that fit in under fifteen minutes.
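If you’re scripting the same fix, laying one continuous bed under the stitched video is a single ffmpeg call: take the video from one input and the audio from the other, and drop whatever per-clip audio the generators produced. Filenames are placeholders:

    import subprocess

    # Video from input 0, audio from input 1, stop at the shorter stream.
    subprocess.run(
        ["ffmpeg", "-i", "ad_15s.mp4", "-i", "bed_15s.mp3",
         "-map", "0:v", "-map", "1:a",
         "-c:v", "copy", "-c:a", "aac", "-shortest",
         "ad_scored.mp4"],
        check=True,
    )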

The last mile still isn’t AI. Not for this, not yet.

Final Cut Pro, the unsexy part

I brought the two video clips, the Shutterstock audio, and the Soláyae logo into Final Cut Pro and edited it the way you’d edit any other video.

The logo is its own story.

Soláyae logo: a stylized S flame mark above the wordmark with an accent on the á

AI video generators hallucinate the Soláyae name constantly. Sometimes the accent on the “á” disappears. Sometimes the letters rearrange. Sometimes the logo becomes a different logo entirely. With branding there’s no “almost right”. A company name is either spelled correctly or it isn’t. So the logo drop went in as the real image file above, placed in Final Cut, not as anything generated.
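For what it’s worth, even the mechanical part of a logo drop can be scripted: overlay the real image file for the final seconds instead of placing it in an editor. A hedged sketch with ffmpeg’s overlay filter; the timing, position, and filenames are illustrative (in practice I placed and timed it by eye in Final Cut):

    import subprocess

    # Composite the real logo PNG, centered, from t = 12s to the end.
    subprocess.run(
        ["ffmpeg", "-i", "ad_scored.mp4", "-i", "solayae_logo.png",
         "-filter_complex",
         "[0:v][1:v]overlay=(W-w)/2:(H-h)/2:enable='gte(t,12)'",
         "-c:a", "copy", "ad_final.mp4"],
        check=True,
    )

The script handles the placement. It doesn’t handle the judgment about when the logo should land, which is the part that kept me in the editor.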

This is the part that surprised me. For all the AI in the pipeline, the final ad still required a human sitting at an editor, moving clips around, placing a logo, nudging timing. The tools haven’t closed that loop.

The reveal

The final 15-second Soláyae ad

We’re not the first

Before anyone pushes back on the “AI in an ad” framing, it’s worth saying that fashion has been here for a while, including in handbags specifically.

In December 2024, Valentino released an AI-generated Instagram video for its Garavani DeVain handbag, a surreal piece with bodies morphing into the bag. It got called “AI slop” and “disturbing” by its own audience. A cautionary tale, but also a data point: a luxury house with every resource in the world still chose to try.

Smaller brands have had a warmer reception. Handbag maker Marina Raphael has been using Midjourney since 2024 for “factory of the future” style fantasy campaigns featuring her signature plexiglass-handled bags. The Business of Fashion piece frames her as part of a broader pattern: smaller brands using AI to produce work that would otherwise be unaffordable.

Outside of handbags: Mango’s Teen line launched an entirely AI-generated photo campaign in July 2024 called “Sunset Dream”, with real garments worn by AI-generated models in an AI-generated Moroccan medina. And Levi Strauss famously partnered with Lalaland.ai back in 2023 to test AI-generated models as a supplement to real ones, which drew significant backlash and a public clarification from the brand.

The pattern across all four: the bar is curation. When the work is good and the brand is honest about what it is, people mostly shrug. When it feels cheap, lazy, or dishonest, the audience smells it instantly.

Soláyae isn’t the first brand to try this. We’re just being unusually open about the part where it doesn’t work yet.

Where AI actually fits

For the record, here’s everything we’re investing in that isn’t AI. We already have a library of real product photography and video from prior shoots with our bags, done properly, with real cameras pointed at real products. In the coming weeks we’re adding to it: three Filipina models hired for a photoshoot in the Philippines, plus a videographer, a makeup artist, and a marketing specialist on the ground directing. Venue rentals, location licenses, and production logistics we’re learning to schedule as we go. Shooting on location across the Philippines: Intramuros, artisan workshops where buntal and raffia are woven by hand, beaches, islands, gardens, and colonial towns. Places with a light, a flora, and a texture you cannot replicate anywhere else in the world.

AI isn’t replacing any of that. It’s filling gaps we can’t otherwise afford to fill right now. When our marketing budget grows, some of what AI is doing today will move back to humans. Some of it probably won’t. That’s the honest answer.

The question (actually, two)

A bit of honesty first: I’m a co-founder who had never made an AI-generated ad before I started this one. Most of what’s in this post is me learning out loud. So there are two things I’d genuinely love your help with.

The first is the ethics question. As a small brand just getting started, where do you think AI belongs in the marketing mix? What feels fair, and what crosses a line? Would you rather a startup use AI to stay alive and fund its first real photoshoot, or skip the AI entirely and do less, more slowly? Is there a point where you stop noticing, and does that point matter?

The second is practical. Looking at the workflow I just walked through — Nano Banana for the composite, Gemini Veo and Higgsfield for the video, Claude for the prompt writing, Shutterstock for the audio, Final Cut Pro for the edit — what would you change? Better tools I should be trying? Steps I’m doing the hard way? A different audio pipeline that actually hits fifteen seconds? There are people reading this who have done this many more times than I have, and I’d rather learn from what you’ve already figured out than keep bumping into the same walls.

I don’t have the answers yet. I’m figuring it out, one head-turn at a time.