arXiv:2508.19254v1 Announce Type: cross Abstract: This paper presents a real-time generative drawing system that interprets and integrates both formal intent – the structural, compositional, and stylistic attributes of a sketch – and contextual intent – the semantic and thematic meaning inferred from its visual content – into a unified transformation process. Unlike conventional text-prompt-based generative systems, which primarily capture high-level contextual descriptions, our approach simultaneously analyzes ground-level intuitive geometric features such as line trajectories, proportions, and spatial arrangement, and high-level semantic cues extracted via vision-language models. These dual intent signals are jointly conditioned in a multi-stage generation pipeline that combines contour-preserving structural control with style- and content-aware image synthesis. Implemented with a touchscreen-based interface and distributed inference architecture, the system achieves low-latency, two-stage transformation while supporting multi-user collaboration on shared canvases. The resulting platform enables participants, regardless of artistic expertise, to engage in synchronous, co-authored visual creation, redefining human-AI interaction as a process of co-creation and mutual enhancement.
Original: https://arxiv.org/abs/2508.19254