StarVector: A Multimodal Model for Generating SVG Code from Images and Text
What is StarVector?
StarVector is a multimodal vision-language model (VLM) designed specifically for Scalable Vector Graphics (SVG) generation. It can produce high-precision, semantically rich SVG code through both Image-to-SVG and Text-to-SVG methods. Unlike traditional curve vectorization techniques, StarVector operates directly at the SVG code level, allowing it to accurately utilize SVG primitives (such as ellipses, rectangles, polygons, and text), thus avoiding common distortions and artifacts seen in conventional methods.
Core Technologies of StarVector
1. Multimodal Architecture
StarVector employs a multimodal architecture capable of processing both images and text as inputs:
- Image-to-SVG: Converts images into visual tokens and then generates SVG code.
- Text-to-SVG: Creates new SVGs purely from text instructions, without needing an image.
The model is built upon StarCoder, enabling it to transfer coding capabilities to the SVG generation domain, ensuring concise and syntactically correct code.
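Syntactic correctness is something a downstream user can check independently of the model. The following is a minimal sketch using only the Python standard library; the function name `is_well_formed_svg` and the sample strings are illustrative assumptions and not part of StarVector. It only checks that the text parses as XML with an `<svg>` root, not that it renders as intended.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_well_formed_svg(code: str) -> bool:
    """Return True if `code` parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(code)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace}svg".
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")


# Hypothetical model outputs: one complete, one cut off mid-generation.
good = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">'
    '<circle cx="12" cy="12" r="10"/></svg>'
)
truncated = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="12" cy="12" r="10">'

print(is_well_formed_svg(good))       # True
print(is_well_formed_svg(truncated))  # False
```

A check like this is useful as a cheap filter before rendering, since generated output that hits a length limit is typically left with unclosed tags.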
Challenges in SVG Generation & StarVector’s Advantages
1. Overcoming Traditional Method Limitations
Traditional SVG generation methods, such as AutoTrace, Potrace, and VTracer, primarily rely on curve fitting and lack semantic understanding of images. This often results in distorted or overly complex path data, making it difficult to handle complex SVG elements.
StarVector’s advantages:
- Semantic Understanding: The model can analyze image content and correctly select appropriate SVG primitives (e.g., circles, rectangles, polylines).
- Concise Code: Directly outputs structured and compact SVG code instead of complex <path> data.
- Supports Various SVG Generation Scenarios: Including logos, technical diagrams, and icons.
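To make the "primitive versus path" contrast concrete, here is a small hand-written sketch (not StarVector output). It encodes the same circle twice: once as a `<circle>` primitive and once as the kind of four-segment cubic Bezier path a curve-fitting tracer might emit. The coordinates and the helper `element_tags` are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2000/svg}"

# One circle written as a primitive.
PRIMITIVE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="40"/>'
    '</svg>'
)

# Approximately the same circle as a closed path of four cubic Bezier
# segments (control offset k = 0.5523 * r, about 22.09 for r = 40).
PATH_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<path d="M 90 50 '
    'C 90 72.09 72.09 90 50 90 '
    'C 27.91 90 10 72.09 10 50 '
    'C 10 27.91 27.91 10 50 10 '
    'C 72.09 10 90 27.91 90 50 Z"/>'
    '</svg>'
)


def element_tags(svg_code):
    """List the tag names of the root's direct children, without the namespace."""
    root = ET.fromstring(svg_code)
    return [child.tag.replace(NS, "") for child in root]


for label, code in (("primitive", PRIMITIVE_SVG), ("path", PATH_SVG)):
    print(label, element_tags(code), len(code), "characters")
```

The primitive version is shorter and states the intent (a circle with a center and radius) directly, whereas the path version only approximates the shape and hides that intent in coordinates.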
2. More Accurate Evaluation Metrics
Many previous SVG generation methods relied on pixel-level evaluation metrics (e.g., MSE), which fail to measure the true semantic accuracy of SVGs. To address this, the StarVector team developed SVG-Bench, a benchmark specifically designed to assess SVG generation quality, covering 10 datasets and 3 types of SVG generation tasks:
- Image-to-SVG
- Text-to-SVG
- Diagram-to-SVG
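The weakness of pixel-level metrics can be shown with a toy example. The sketch below is an illustration under simple assumptions (8x8 grayscale rasters stored as nested lists, hypothetical helper names); it is not the SVG-Bench evaluation code. It compares a vertical bar against the same bar shifted by one pixel and against a blank image.

```python
def mse(a, b):
    """Mean squared error between two equally sized grayscale rasters."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def vertical_bar(column, size=8):
    """A size x size raster: 1.0 in the given column, 0.0 elsewhere."""
    return [[1.0 if x == column else 0.0 for x in range(size)] for _ in range(size)]


target = vertical_bar(3)
shifted = vertical_bar(4)               # same shape, moved one pixel right
blank = [[0.0] * 8 for _ in range(8)]   # shape missing entirely

print(mse(target, shifted))  # 0.25
print(mse(target, blank))    # 0.125
```

Under MSE the blank image scores better than the slightly shifted bar, even though the shifted bar is the semantically correct drawing. This is the kind of mismatch that motivates metrics based on learned visual features.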
StarVector Models & Evaluation Results
Currently, StarVector offers two model versions, both available for download on Hugging Face:
- 💫 StarVector-8B
- 💫 StarVector-1B
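In SVG-Bench testing, StarVector-8B achieved the highest DinoScore on every dataset, tying VTracer on SVG-Emoji and exceeding all baselines on the other four. StarVector-1B leads the baselines on SVG-Fonts, SVG-Icons, and SVG-Diagrams: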
| Method | SVG-Stack | SVG-Fonts | SVG-Icons | SVG-Emoji | SVG-Diagrams |
| --- | --- | --- | --- | --- | --- |
| AutoTrace | 0.942 | 0.954 | 0.946 | 0.975 | 0.874 |
| Potrace | 0.898 | 0.967 | 0.972 | 0.882 | 0.875 |
| VTracer | 0.954 | 0.964 | 0.940 | 0.981 | 0.882 |
| Im2Vec | 0.692 | 0.733 | 0.754 | 0.732 | - |
| LIVE | 0.934 | 0.956 | 0.959 | 0.969 | 0.870 |
| DiffVG | 0.810 | 0.821 | 0.952 | 0.814 | 0.822 |
| GPT-4-V | 0.852 | 0.842 | 0.848 | 0.850 | - |
| 💫 StarVector-1B | 0.926 | 0.978 | 0.975 | 0.929 | 0.943 |
| 💫 StarVector-8B | 0.966 | 0.982 | 0.984 | 0.981 | 0.959 |
Note: StarVector is not designed for natural images or illustrations since its training data primarily consists of icons, technical diagrams, charts, and logos.
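The DinoScore table above can be checked mechanically. The following sketch copies the reported values into a dictionary and finds the top-scoring method per dataset; the variable and function names are illustrative, and entries reported as "-" are stored as `None`.

```python
DATASETS = ["SVG-Stack", "SVG-Fonts", "SVG-Icons", "SVG-Emoji", "SVG-Diagrams"]

# DinoScore values copied from the table; None marks entries reported as "-".
SCORES = {
    "AutoTrace": [0.942, 0.954, 0.946, 0.975, 0.874],
    "Potrace": [0.898, 0.967, 0.972, 0.882, 0.875],
    "VTracer": [0.954, 0.964, 0.940, 0.981, 0.882],
    "Im2Vec": [0.692, 0.733, 0.754, 0.732, None],
    "LIVE": [0.934, 0.956, 0.959, 0.969, 0.870],
    "DiffVG": [0.810, 0.821, 0.952, 0.814, 0.822],
    "GPT-4-V": [0.852, 0.842, 0.848, 0.850, None],
    "StarVector-1B": [0.926, 0.978, 0.975, 0.929, 0.943],
    "StarVector-8B": [0.966, 0.982, 0.984, 0.981, 0.959],
}


def best_methods(scores, datasets):
    """Map each dataset to (top score, sorted list of methods achieving it)."""
    result = {}
    for i, name in enumerate(datasets):
        reported = {m: row[i] for m, row in scores.items() if row[i] is not None}
        top = max(reported.values())
        result[name] = (top, sorted(m for m, s in reported.items() if s == top))
    return result


for dataset, (top, methods) in best_methods(SCORES, DATASETS).items():
    print(f"{dataset}: {top:.3f} ({', '.join(methods)})")
```

Running this shows StarVector-8B at the top of every column, sharing first place with VTracer on SVG-Emoji at 0.981.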
SVG-Bench Dataset Overview
StarVector’s training data comes from SVG-Bench, a specialized dataset for SVG generation models, covering 10 sub-datasets, each targeting different SVG generation scenarios:
| Dataset | Training Set | Validation Set | Test Set | Avg. Token Length | Supported SVG Primitives | Annotation Type |
| --- | --- | --- | --- | --- | --- | --- |
| SVG-Stack | 2.1M | 108k | 5.7k | 1,822 ± 1,808 | All SVG Primitives | Image Annotation |
| SVG-Stack_sim | 601k | 30.1k | 1.5k | 2,000 ± 918 | Vector path | - |
| SVG-Diagrams | - | - | 472 | 3,486 ± 1,918 | All SVG Primitives | - |
| SVG-Fonts | 1.8M | 91.5k | 4.8k | 2,121 ± 1,868 | Vector path | Font Annotation |
| SVG-Fonts_sim | 1.4M | 71.7k | 3.7k | 1,722 ± 723 | Vector path | Font Annotation |
| SVG-Emoji | 8.7k | 667 | 668 | 2,551 ± 1,805 | All SVG Primitives | - |
| SVG-Emoji_sim | 580 | 57 | 96 | 2,448 ± 1,026 | Vector path | - |
| SVG-Icons | 80.4k | 6.2k | 2.4k | 2,449 ± 1,543 | Vector path | - |
| SVG-Icons_sim | 80.4k | 2.8k | 1.2k | 2,005 ± 824 | Vector path | - |
| SVG-FIGR | 270k | 27k | 3k | 5,342 ± 2,345 | Vector path | Image Classification & Annotation |
Conclusion: Why StarVector Matters
SVGs play a crucial role in icons, branding, technical diagrams, and map design. StarVector targets both Image-to-SVG and Text-to-SVG generation, and on the SVG-Bench DinoScore results reported above its 8B model scores at or above every listed baseline. Compared to traditional curve-fitting methods, it offers:
✅ Semantic Understanding, ensuring accurate image structure recognition
✅ Concise Code, generating more efficient SVGs
✅ More Accurate Evaluation Metrics, overcoming pixel-based limitations
✅ Hugging Face Availability, with models that developers can download for training and testing
StarVector makes AI-generated SVGs more precise and reliable, opening up new possibilities for vector graphics applications. 💡