We run an AI film studio. We've made films for governments and billion-dollar companies, and for the last few months we've been deep inside Seedance 2.0… testing every technique, documenting what actually works.
This is the complete technical playbook, for free.
How to access Seedance 2.0 right now
There's only one way in, and it takes two steps. Download CapCut, set your VPN to Indonesia, and you get full access to Seedance 2.0 inside the app. No waitlist, no API application, no special access. Indonesian VPN, CapCut, you're in.
The constraints that shape everything
Seedance generates clips up to 15 seconds at 720p. That's the ceiling. No one-minute generations, no 4K native output.
The way you build a real video is with cuts. One frame becomes a 5-second clip. Another becomes 10. Another becomes 15. Keep going, stitch the clips together, and you have a one-minute video built from individual shots… cuts of 5, 10, or 15 seconds assembled into something that flows. A real editor works the exact same way.
Stop thinking "I need to generate a video" and start thinking "I need to generate shots," because the moment you make that switch, Seedance goes from a toy to a production tool.
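If you'd rather stitch outside CapCut, here's a minimal Python sketch using ffmpeg's concat demuxer… the helper and file names are our own, and it assumes every exported clip shares the same codec, resolution, and frame rate.

```python
# Minimal stitching sketch: concatenates exported Seedance clips in order.
# Assumes ffmpeg is installed and all clips share codec, resolution, and frame rate.
import subprocess
from pathlib import Path

def stitch_shots(clip_paths, output="final_cut.mp4"):
    # Write the list file that ffmpeg's concat demuxer expects.
    list_file = Path("shots.txt")
    list_file.write_text("".join(f"file '{p}'\n" for p in clip_paths))
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", str(list_file), "-c", "copy", output],
        check=True,
    )
    return output

# Example: three shots of 5, 10, and 15 seconds become one 30-second video.
stitch_shots(["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"])
```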
Break your idea into scenes first
Before you touch CapCut, map your video on paper. Scene 1: wide establishing, 5 seconds. Scene 2: close-up reaction, 10 seconds. Scene 3: reverse angle, 15 seconds. Write specific notes for every scene… setting, framing, action, mood, sound.
A one-minute video is four to five scenes. Two minutes is eight to twelve. Each generated separately, stitched in post. This works because Seedance is at its strongest when it focuses on one shot, one action, one moment at a time.
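Here's a minimal sketch of what that shot list can look like once it leaves paper… the field names are our own convention, not anything Seedance requires.

```python
# Minimal shot-list sketch for planning scenes before opening CapCut.
# Field names are our own convention, not a Seedance spec.
from dataclasses import dataclass

@dataclass
class Shot:
    scene: int
    framing: str     # "wide establishing", "close-up reaction", ...
    action: str      # one verb, one motion
    duration: int    # 5, 10, or 15 seconds (Seedance clip lengths)
    sound: str       # narration, ambient, mood

shots = [
    Shot(1, "wide establishing", "the city skyline emerges through fog", 5, "low wind, distant traffic"),
    Shot(2, "close-up reaction", "she slowly lifts her eyes to the window", 10, "quiet room tone"),
    Shot(3, "reverse angle", "he steps back from the glass", 15, "rain against the pane"),
]

total = sum(s.duration for s in shots)
print(f"{len(shots)} shots, {total} seconds total")  # sanity-check your runtime
```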
The decision that determines your workflow
Do you need consistent characters across the video, or just consistent environments and style?
No consistent characters: go full text-to-video. Set a style anchor, build a base prompt that locks the visual language, and generate scene by scene. Seedance creates audio and video in the same pass, so prompt the narration, the ambient sound, the mood… it all gets baked into each generation. Keep descriptive language consistent across every prompt and the model pulls from the same visual space, making the final stitch feel like one continuous piece.
Consistent characters, animated or illustrated: build storyboard frames in Nano Banana Pro first. One frame per scene with exact composition and character placement. NB Pro runs a thinking step before every generation and self-corrects for consistency, which is why it holds character identity better than anything else. Then run each frame into Seedance as image-to-video… image carries identity, prompt adds motion.
The prompt structure
Most people write prompts like essays. The model ignores 80 percent of them.
Every Seedance prompt should hit five blocks:
SUBJECT. Who or what is in the frame, wardrobe, setting, mood. Be specific and physical.
*"A man in a dark wool coat stands at the edge of a rain-soaked rooftop, shoulders tense, jaw clenched, city lights scattered behind him."*
ACTION. One single verb. One motion. "He slowly turns to face the camera." Not "he turns, walks forward, reaches out, and speaks." Multiple verbs confuse the model every time… one verb per generation, no exceptions.
CAMERA. Framing, movement, lens feel. The keywords that actually produce results:
*Movement:* "slow dolly push-in," "lateral tracking shot," "static locked-off frame," "slow pan left," "orbital movement around subject," "crane up," "handheld drift," "Steadicam follow," "POV shot."
Speed modifiers matter more than people think… "slowly" versus "rapidly" produces dramatically different output. Don't stack three moves. One primary, one optional at most.
*Framing:* "extreme close-up," "medium shot waist up," "wide establishing," "over-the-shoulder," "low angle looking up," "high angle looking down."
STYLE. One aesthetic anchor plus lighting plus color. This is where everyone writes "cinematic" and wonders why the output looks like stock footage.
Single keywords do nothing. Reinforcement pairs do everything:
"Cinematic" alone… generic, the model has seen ten million images tagged cinematic.
"Motivated warm lighting, natural film grain, shallow depth of field, lifted blacks"… the model knows exactly what you want.
*Film stock anchors that consistently hit:* "Kodak Vision3 500T" for warm cinematic tones. "ARRI Alexa color science" for high-end digital. "35mm film grain" for indie texture.
*Lighting keywords ranked by how strongly Seedance responds:* "Motivated lighting"… strongest cinematic cue in the model. "Practical light sources visible in frame"… instant realism. "Warm tungsten bounce"… intimate interiors. "Volumetric dust particles"… atmospheric depth. "Negative fill"… shapes faces with shadow and contrast.
QUALITY SUFFIX. On every single prompt, no exceptions: "4K, Ultra HD, Rich details, Sharp clarity, Cinematic texture, Natural colors, Stable picture."
Five blocks. Subject, action, camera, style, quality. Every prompt, every time.
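If you want the five blocks as one reusable template, here's a minimal Python sketch… the helper and its names are our own convention, not anything Seedance exposes; the block contents are the examples above.

```python
# Minimal sketch: assemble the five blocks into a single Seedance prompt.
# The block order and quality suffix follow the structure above; the helper is our own.

QUALITY_SUFFIX = ("4K, Ultra HD, Rich details, Sharp clarity, "
                  "Cinematic texture, Natural colors, Stable picture.")

def build_prompt(subject: str, action: str, camera: str, style: str) -> str:
    return " ".join([subject, action, camera, style, QUALITY_SUFFIX])

prompt = build_prompt(
    subject=("A man in a dark wool coat stands at the edge of a rain-soaked rooftop, "
             "shoulders tense, jaw clenched, city lights scattered behind him."),
    action="He slowly turns to face the camera.",
    camera="Slow dolly push-in, medium shot waist up.",
    style=("Motivated warm lighting, natural film grain, shallow depth of field, "
           "lifted blacks, Kodak Vision3 500T."),
)
print(prompt)
```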
Prompt length: the numbers
Text-to-video: 120 to 280 words. Below 30 and the model gets random. Above 280 and it starts dropping instructions.
Image-to-video: 50 to 80 words maximum, because the reference image carries identity and long prompts erode what the image established. The model splits attention between text and image, and with a long prompt the text overwrites the visual reference… which means your character drifts from the frame you gave it.
Never use negative prompts. Seedance rejects them. "Stable picture, sharp clarity" not "no blur, no shaking." Positive framing only, always.
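A quick sanity check before you paste anything into CapCut… a minimal sketch built on the word-count ranges above; the checker itself is our own convention.

```python
# Sanity-check sketch for prompt length and negative phrasing,
# using the word-count ranges above. The checker is our own convention.

NEGATIVE_MARKERS = ("no ", "not ", "without ", "avoid ")

def check_prompt(prompt: str, mode: str = "text") -> list[str]:
    warnings = []
    words = len(prompt.split())
    low, high = (120, 280) if mode == "text" else (50, 80)
    if words < low:
        warnings.append(f"{words} words: below the {low}-word floor for {mode}-to-video")
    if words > high:
        warnings.append(f"{words} words: above the {high}-word ceiling for {mode}-to-video")
    if any(marker in prompt.lower() for marker in NEGATIVE_MARKERS):
        warnings.append("possible negative phrasing: restate it positively")
    return warnings

print(check_prompt("A man turns. No blur, no shaking.", mode="image"))
```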
The reference image system
Seedance supports up to 12 reference files in total… up to 9 images, up to 3 videos, up to 3 audio clips.
@Image1 gets 40 to 50 percent more attention weight than any other slot. Your most important reference always goes in slot one.
For character consistency: three images per character… front, three-quarter, profile. That hits 75 to 85 percent identity consistency across scenes.
Individual face crops, never grid sheets. Turnaround grids cause mosaic confusion where the model reads each panel as a separate character. Crop each angle separately.
For image-to-video, always open the prompt with "@Image1 as the first frame." No character descriptions… the image carries all identity. Describe only action and environment. Over-describing the character in text actively erodes the reference, because the model starts reconstructing the face from your words instead of from the image.
Identity lock phrase that works: "Same person as @Image1. Do not alter facial proportions, eye shape, or hairstyle." Add it to every prompt when a character carries across scenes.
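Putting the opener, the action-only description, and the identity lock together… a minimal sketch, with helper names that are ours, not Seedance's.

```python
# Sketch: build an image-to-video prompt where the reference image carries identity.
# The opener and identity-lock phrase come from the text above; the helper is our own.

IDENTITY_LOCK = ("Same person as @Image1. Do not alter facial proportions, "
                 "eye shape, or hairstyle.")
QUALITY_SUFFIX = ("4K, Ultra HD, Rich details, Sharp clarity, "
                  "Natural skin texture, Stable picture.")

def image_to_video_prompt(action_and_environment: str, lock_identity: bool = True) -> str:
    parts = ["@Image1 as the first frame.", action_and_environment]
    if lock_identity:
        parts.append(IDENTITY_LOCK)
    parts.append(QUALITY_SUFFIX)
    return " ".join(parts)

print(image_to_video_prompt(
    "She walks slowly along the rain-soaked platform as the train pulls away. "
    "Sodium lights flicker overhead. Ambient station noise only."
))
```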
The Chinese prompt technique
Seedance was built by ByteDance and trained heavily on Chinese-language data. The model's token distribution skews toward Mandarin, which means certain visual concepts map more precisely in Chinese than they do in English… spatial relationships, fabric textures, weather descriptions, and architectural detail all resolve tighter in Mandarin because the training data is denser in those categories.
In practice: write your prompt in English using the five-block structure, translate to Mandarin, and run both versions. The Chinese prompt often nails material properties and environmental nuance that the English version approximates. It's not universally better, but on prompts with complex physical descriptions… fabric moving under wind, rain hitting stone, light refracting through glass… the Chinese version locks in where English stays soft.
We test both on every project. Five minutes of translation regularly saves an hour of re-generation.
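A minimal sketch of that bilingual step, assuming the third-party deep-translator package handles the machine translation… swap in any translator you trust, and give the Mandarin output a human read before you run it.

```python
# Sketch: keep an English and a Mandarin version of the same prompt and run both.
# Assumes the third-party deep-translator package (pip install deep-translator);
# a human check of the translation is still worth the extra minute.
from deep_translator import GoogleTranslator

def bilingual_prompt(english_prompt: str) -> dict:
    mandarin = GoogleTranslator(source="en", target="zh-CN").translate(english_prompt)
    return {"en": english_prompt, "zh": mandarin}

versions = bilingual_prompt(
    "Rain hits the stone steps of an old courtyard. A silk curtain moves in the wind. "
    "Slow dolly push-in, motivated warm lighting, 35mm film grain. "
    "4K, Ultra HD, Rich details, Sharp clarity, Stable picture."
)
print(versions["zh"])
```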
The viral content strategy
Seedance is fast enough to ride real-time trends, and that speed is a weapon.
Set notifications on Polymarket. When a topic spikes… a celebrity moment, a political event, a cultural flashpoint… you have a window of a few hours where millions of people are searching for content about that thing and almost nobody has video yet.
Grab the moment, open CapCut, generate four to five scenes, stitch into a 30 to 50-second video, post. You're the first person with a high-quality AI video about something that just happened, which means the algorithm pushes you hard because you're the only supply for massive demand.
We tested this with Meek Mill. Simple viral video, just to see if the pipeline could move fast enough. Generated start frames, ran each through Seedance as image-to-video, stitched the clips together. The prompt for a shot like that follows the exact structure:
"@Image1 as the first frame. Over-the-shoulder shot facing the monk. He leans forward and speaks with calm warmth: 'What about: I used to pray for times like this, to rhyme like this.' Candlelight flickers across his face. He pauses after speaking. No music, ambient room tone only. 4K, Ultra HD, Rich details, Sharp clarity, Natural skin texture, Stable picture."
About 60 words. Image carries the identity, prompt carries the motion and atmosphere. Start frame, start frame, start frame, start frame… stitched into one video. Clean enough that people couldn't tell it was AI, and it spread because the timing was right.
Same approach with Erika Kirk… trending topic, fast turnaround, Seedance pipeline, posted while demand was still peaking. The pattern works every time because you're solving a supply problem, not a quality problem.
Generating realistic humans
Everything above works inside CapCut with the Indonesian VPN… animated characters, illustrated styles, environments, style-driven content. But CapCut blocks photorealistic human generation. You can't get around that inside the app.
We use a completely different strategy for generating consistent, realistic human characters in Seedance 2.0. The Meek Mill video was proof it's possible. The full method… exact workflow, the bypass, the settings… is exclusively for our community.
Reply "ARQ" or say hello in the comments and we'll send you the Telegram link.
