Drawing with Vision-LLMs (independent research). Summer → Winter 2024
Can LLMs successfully draw? This work explores an iterative generation process in which an LLM outputs a scene part by part in SVG format. To compensate for LLMs' weaknesses in spatial understanding, we use an iterative placement agent for each individual part, which refines that part's placement and scaling until it is visually correct. This work also explores LLMs' planning capabilities: once a component is placed, it can't be moved, so can the model order its operations so that objects at the back are drawn first?
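The back-to-front ordering question can be illustrated with a minimal sketch. It is not the project's actual code: the `Part` class, its `depth` field, and `compose_scene` are hypothetical names, and the fixed depth values stand in for the ordering the LLM would have to plan itself. The sketch relies only on the fact that SVG renders later elements on top of earlier ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a placed part cannot be moved afterwards
class Part:
    name: str
    svg: str            # SVG fragment for this part
    depth: float        # hypothetical field: larger means farther back
    x: float = 0.0      # placement chosen by the (assumed) placement step
    y: float = 0.0
    scale: float = 1.0


def compose_scene(parts, width=200, height=200):
    """Emit parts back to front so nearer parts are drawn on top."""
    ordered = sorted(parts, key=lambda p: p.depth, reverse=True)
    groups = [
        f'<g id="{p.name}" transform="translate({p.x} {p.y}) scale({p.scale})">'
        f"{p.svg}</g>"
        for p in ordered
    ]
    body = "\n".join(groups)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n{body}\n</svg>'
    )


if __name__ == "__main__":
    tree = Part("tree", '<rect width="20" height="60" fill="green"/>',
                depth=1.0, x=90, y=100)
    sky = Part("sky", '<rect width="200" height="200" fill="skyblue"/>',
               depth=10.0)
    # The sky is listed last but has the larger depth, so it is emitted first.
    print(compose_scene([tree, sky]))
```

In this sketch the depth is given up front. In the setting described above, the model has to infer that ordering on its own, because a part that is drawn too early cannot be repositioned later.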

