Drawing with Vision-LLMs (independent research). Summer → Winter 2024

Can LLMs successfully draw? This work explores an iterative generation process in which an LLM outputs a scene part by part in SVG format. To compensate for LLMs’ weak spatial understanding, an iterative placement agent handles each individual part, refining its position and scale until the part looks visually correct. This work also explores LLMs’ planning capabilities: once a component is placed, it can’t be moved. Can the model order its operations so that objects at the back are drawn first?
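As a rough illustration of the loop described above, here is a minimal Python sketch. It is not the project's actual code: the names `Part`, `wrap`, `render`, `place_part`, `draw_scene`, and `toy_critic` are all hypothetical, the `critic` callback stands in for the vision-LLM placement agent, and an explicit `depth` field with a sort replaces the ordering the LLM is meant to plan on its own. SVG paints later elements on top of earlier ones, so drawing back to front and never editing the canvas afterwards is what makes the ordering matter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Placement = Tuple[float, float, float]  # (x, y, scale)


@dataclass
class Part:
    name: str
    svg: str      # SVG fragment for this part, drawn around the origin
    depth: float  # assumption: larger means farther from the viewer


# Assumption: the critic sees the rendered candidate and returns a revised
# placement, or None once the placement looks correct.
Critic = Callable[[str, Part, Placement], Optional[Placement]]


def wrap(part: Part, placement: Placement) -> str:
    """Wrap a part in a group that applies its position and scale."""
    x, y, s = placement
    return f'<g id="{part.name}" transform="translate({x} {y}) scale({s})">{part.svg}</g>'


def render(body: str, width: int = 200, height: int = 200) -> str:
    """Wrap the accumulated groups in a complete SVG document."""
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">{body}</svg>')


def place_part(canvas: str, part: Part, critic: Critic,
               start: Placement = (0.0, 0.0, 1.0), max_rounds: int = 5) -> Placement:
    """Refine one part's placement until the critic accepts or rounds run out."""
    placement = start
    for _ in range(max_rounds):
        candidate = render(canvas + wrap(part, placement))
        revised = critic(candidate, part, placement)
        if revised is None:
            break
        placement = revised
    return placement


def draw_scene(parts: List[Part], critic: Critic) -> str:
    """Place parts back to front; the canvas is append-only once a part lands."""
    canvas = ""
    for part in sorted(parts, key=lambda p: p.depth, reverse=True):
        canvas += wrap(part, place_part(canvas, part, critic))
    return render(canvas)


def toy_critic(svg: str, part: Part, placement: Placement) -> Optional[Placement]:
    """Stand-in for the vision model: nudges each part to a fixed target."""
    targets = {"sky": (0.0, 0.0, 1.0), "pyramid": (100.0, 150.0, 2.0)}
    target = targets[part.name]
    return None if placement == target else target


parts = [
    Part("pyramid", '<polygon points="-20,0 20,0 0,-30" fill="goldenrod"/>', depth=1.0),
    Part("sky", '<rect width="200" height="200" fill="midnightblue"/>', depth=10.0),
]
scene = draw_scene(parts, toy_critic)
```

In this toy run the sky is emitted before the pyramid even though it appears second in the input list, so the pyramid is painted on top. A real system would replace `toy_critic` with a call that rasterizes the candidate SVG and asks a vision model whether the part looks right.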

[Images: pyramids; starry night]
