Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models

¹Virginia Tech
²Adobe Research

We introduce Plot'n Polish, a training-free approach for creating and refining story visualizations. Our framework enables users to adjust story elements through fine- or coarse-grained edits. Users can alter elements such as hairstyles or clothing, transform objects or styles, and customize characters iteratively, all directed through text prompts.

Abstract

Text-to-image diffusion models have demonstrated significant capabilities in generating detailed and diverse visuals across various domains, with story visualization emerging as a particularly promising application. However, as their use in real-world creative domains increases, providing enhanced control, refinement, and the ability to modify images consistently after generation has become an important challenge. Existing methods often lack the flexibility to apply fine or coarse edits while maintaining visual and narrative consistency across multiple frames, which prevents creators from seamlessly crafting and refining their visual stories. To address these challenges, we introduce Plot'n Polish, a zero-shot framework that enables consistent story generation and provides fine-grained control over story visualizations at various levels of detail.

Method


An overview of Plot'n Polish. Users can provide a story plot and an image prompt for each frame, or an LLM can generate them from the story idea. The image prompts are used to create template images that visualize the story. The editing framework takes editing prompts, given as text or images, together with the initial images to edit and their extracted depth conditions.
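The data flow in this overview can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `plan_story`, `render_templates`, `edit_story` and the injected callables (`llm`, `text_to_image`, `depth_estimator`, `editor`) are hypothetical placeholders for the LLM, the text-to-image model, the depth extractor and the editing framework. Passing all selected frames to `editor` in a single call is also an assumption, chosen so that an editor could keep the edited frames consistent with one another.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple, Union

Image = object                  # hypothetical opaque image handle
DepthMap = object               # hypothetical opaque depth map
EditPrompt = Union[str, Image]  # editing prompts may be text or an image


@dataclass
class Frame:
    plot: str                      # narrative text for this frame
    image_prompt: str              # prompt used to render the template image
    image: Optional[Image] = None  # current image, replaced after each edit


def plan_story(
    story_idea: str,
    llm: Callable[[str], List[Tuple[str, str]]],
    user_frames: Optional[List[Tuple[str, str]]] = None,
) -> List[Frame]:
    """Use the user's (plot, image_prompt) pairs if given; otherwise ask the (hypothetical) LLM."""
    pairs = user_frames if user_frames is not None else llm(story_idea)
    return [Frame(plot=p, image_prompt=q) for p, q in pairs]


def render_templates(frames: List[Frame], text_to_image: Callable[[str], Image]) -> List[Frame]:
    """Create one template image per frame from its image prompt."""
    for f in frames:
        f.image = text_to_image(f.image_prompt)
    return frames


def edit_story(
    frames: List[Frame],
    edit_prompt: EditPrompt,
    depth_estimator: Callable[[Image], DepthMap],
    editor: Callable[[List[Image], List[DepthMap], EditPrompt], List[Image]],
    frame_ids: Optional[Sequence[int]] = None,
) -> List[Frame]:
    """Edit the selected frames (all by default) in one joint call to the hypothetical editor.

    Depth conditions are extracted from the current images before editing;
    calling this function repeatedly gives iterative refinement.
    """
    ids = list(range(len(frames))) if frame_ids is None else list(frame_ids)
    images = [frames[i].image for i in ids]
    depths = [depth_estimator(im) for im in images]
    edited = editor(images, depths, edit_prompt)
    for i, new_image in zip(ids, edited):
        frames[i].image = new_image
    return frames


# Usage with string stubs standing in for the real models.
frames = plan_story(
    "a fox finds a lantern",
    llm=lambda idea: [("The fox wakes up.", "a red fox waking in a forest"),
                      ("The fox finds a lantern.", "a red fox holding a lantern")],
)
render_templates(frames, text_to_image=lambda p: f"img[{p}]")
edit_story(frames, "give the fox a blue scarf",
           depth_estimator=lambda im: f"depth[{im}]",
           editor=lambda ims, ds, e: [f"{im} + {e}" for im in ims])
print(frames[1].image)
# img[a red fox holding a lantern] + give the fox a blue scarf
```

Keeping the models as injected callables makes the orchestration independent of any particular diffusion backend; a localized edit on a single panel is just `edit_story(..., frame_ids=[0])`.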

Qualitative Results


Qualitative results for Plot'n Polish. Our results demonstrate that Plot'n Polish excels at producing consistent visual narratives and supports a wide range of successful edits, including localized edits, character or object replacements, and personalization.

Qualitative Comparison


Qualitative comparison of our method with state-of-the-art story visualization methods, including StoryDiffusion, ConsiStory, AutoStudio, and Intelligent Grimm. Our method outperforms these competitors by keeping visual elements such as attire and character features consistent across all panels, ensuring narrative coherence. In contrast, existing methods suffer from inconsistencies and blending errors, which often break the narrative flow and reduce clarity.