Visual Struct
Overview
Visual Struct takes a UI screenshot or mockup and returns a hierarchical JSON description of every element it contains. Each element is typed (header, button, card, table, etc.), positioned (pixel-accurate bounding box), and linked to its parent so the whole image is represented as one tree.
The schema is shared with PDF to Visual Struct, so downstream consumers can accept both inputs through a single code path.
Endpoint
https://api.codia.ai/v1/open/image_to_design
Authentication is via bearer token. Get a key at codia.ai/dashboard/developer.
Request Options
Use one of these input modes:
| Mode | Content-Type | Required field | Description |
|---|---|---|---|
| JSON URL | application/json | image_url | Publicly-reachable URL of the source image. |
| Direct upload | multipart/form-data | image | Local image file. Field name file is also accepted for compatibility. |
Example
curl 'https://api.codia.ai/v1/open/image_to_design' \
-H 'Authorization: Bearer {codia_api_key}' \
-H 'Content-Type: application/json' \
--data '{ "image_url": "https://example.com/ui.png" }'
Or upload a local file directly:
curl 'https://api.codia.ai/v1/open/image_to_design' \
-H 'Authorization: Bearer {codia_api_key}' \
-F 'image=@./screenshot.png'
Codia checks credits before reading and uploading multipart files. If credits are insufficient, the request returns 402 without uploading the file.
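The two input modes above can also be sketched in JavaScript. This is a minimal sketch, assuming a runtime with global fetch, FormData, and Blob (Node 18+ or a browser). The helper names buildRequest and imageToDesign are hypothetical; only the endpoint URL, the image_url and image fields, the bearer header, and the 402 status come from this page.

```javascript
const ENDPOINT = 'https://api.codia.ai/v1/open/image_to_design'

// Build fetch() arguments for either input mode.
// `source` is { imageUrl } for JSON URL mode, or { fileBytes, fileName } for direct upload.
function buildRequest(apiKey, source) {
  const headers = { Authorization: `Bearer ${apiKey}` }
  if (source.imageUrl) {
    headers['Content-Type'] = 'application/json'
    const body = JSON.stringify({ image_url: source.imageUrl })
    return { url: ENDPOINT, init: { method: 'POST', headers, body } }
  }
  // Direct upload: leave Content-Type unset so fetch adds the multipart boundary itself.
  const form = new FormData()
  form.append('image', new Blob([source.fileBytes]), source.fileName)
  return { url: ENDPOINT, init: { method: 'POST', headers, body: form } }
}

// Send the request and surface the insufficient-credits case explicitly.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function imageToDesign(apiKey, source, fetchImpl = fetch) {
  const { url, init } = buildRequest(apiKey, source)
  const res = await fetchImpl(url, init)
  if (res.status === 402) throw new Error('Insufficient credits (HTTP 402)')
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`)
  return res.json()
}
```

Treating 402 separately from other failures lets a caller stop a batch early instead of retrying uploads that cannot succeed.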
Response
{
"elementId": "header_section_001",
"elementName": "HeaderSection",
"elementType": "header",
"displayName": "Header Section",
"layoutConfig": {
"positionMode": "flex",
"flexibleMode": "row",
"flexAttributes": {
"justifyContent": "space-between",
"alignItems": "center"
}
},
"childElements": [ /* nested elements */ ]
}
Field reference
| Field | Description |
|---|---|
| elementType | Element class: header, button, card, badge, icon, tab, table, chart, and others. |
| layoutConfig.positionMode | flex or absolute. Flex containers expose flexibleMode (row / column) and flexAttributes. |
| boundingBox | [x, y, width, height] in source-image pixel coordinates. |
| processingMeta.detectionScore | 0–1 confidence. |
| childElements | Recursive array of child elements. |
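Because parent links are expressed through nesting in childElements, a consumer that wants a row-per-element view has to walk the tree. The following is a minimal sketch; flattenTree and the sample data are hypothetical, and it assumes boundingBox sits directly on each element, which the field table suggests but the response example does not show.

```javascript
// Flatten the element tree into rows, recording each element's parent and depth.
function flattenTree(root) {
  const rows = []
  const walk = (node, parentId, depth) => {
    rows.push({
      elementId: node.elementId,
      elementType: node.elementType,
      parentId, // null for the root element
      depth,
      boundingBox: node.boundingBox ?? null, // [x, y, width, height] in source pixels
    })
    for (const child of node.childElements ?? []) {
      walk(child, node.elementId, depth + 1)
    }
  }
  walk(root, null, 0)
  return rows
}

// Hypothetical sample shaped like the response above.
const sample = {
  elementId: 'header_section_001',
  elementType: 'header',
  boundingBox: [0, 0, 1280, 64],
  childElements: [
    { elementId: 'logo_001', elementType: 'icon', boundingBox: [16, 16, 32, 32] },
    { elementId: 'cta_001', elementType: 'button', boundingBox: [1160, 12, 104, 40] },
  ],
}

console.log(flattenTree(sample).map((r) => `${r.depth} ${r.elementType} <- ${r.parentId}`))
// [ '0 header <- null', '1 icon <- header_section_001', '1 button <- header_section_001' ]
```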
Output formats
| Format | Use case |
|---|---|
| json | Default. Best for custom downstream pipelines. |
| svg | Lossless vector re-render of the detected scene, design-tool agnostic. |
| figma | Tree shaped for direct insertion into a Figma file via the REST API or plugin. |
All three formats are post-processed from the same underlying schema.
Capabilities
| Capability | Value |
|---|---|
| Output quality | Depends on source clarity and layout complexity |
| Typical latency | 2–5 s per image |
| OCR languages | 50+ |
| Max image size | 25 MB |
| Recommended resolution | 600–4000 px on the long edge |
| Production review | Enterprise review available for high-volume workloads |
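The size and resolution rows above can be checked on the client before a request is sent. This is a sketch only: preflight is a hypothetical helper, the caller is assumed to already know the file size and pixel dimensions, and reading "25 MB" as decimal megabytes is an assumption (the stricter of the two readings).

```javascript
const MAX_BYTES = 25 * 1000 * 1000 // "25 MB" read as decimal megabytes (assumption)
const MIN_LONG_EDGE = 600 // recommended range from the table, in pixels
const MAX_LONG_EDGE = 4000

// Returns { errors, warnings }. The size limit is treated as a hard error;
// the resolution range is only a recommendation, so it produces warnings.
function preflight({ byteLength, width, height }) {
  const errors = []
  const warnings = []
  if (byteLength > MAX_BYTES) errors.push('file exceeds 25 MB')
  const longEdge = Math.max(width, height)
  if (longEdge < MIN_LONG_EDGE) warnings.push(`long edge ${longEdge}px is below 600px`)
  if (longEdge > MAX_LONG_EDGE) warnings.push(`long edge ${longEdge}px is above 4000px`)
  return { errors, warnings }
}
```

Keeping the resolution check as a warning matches the table's wording: images outside the recommended range may still be accepted.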
Common patterns
Filter by confidence
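function visibleElements(tree, threshold = 0.6) {
  const out = []
  // Depth-first walk; a node without a detectionScore is treated as fully confident.
  const walk = (n) => {
    if ((n.processingMeta?.detectionScore ?? 1) >= threshold) {
      out.push(n)
      // Children are visited only when the parent passes, so a low-confidence
      // parent drops its whole subtree.
      ;(n.childElements ?? []).forEach(walk)
    }
  }
  walk(tree)
  return out
}
Tall screenshots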
Full-page marketing screenshots (10k+ pixels tall) work, but splitting the image at section boundaries before calling the API produces more stable top-level groupings.
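One way to plan the chunks is sketched below. planChunks and offsetTree are hypothetical helpers: the section boundaries are assumed to be supplied by the caller (the API does not report them), and the 4000 px default simply reuses the recommended long-edge limit. The image cropping itself is left to whatever image library you already use.

```javascript
// Greedily merge consecutive sections into chunks no taller than maxHeight.
// `boundaries` are y offsets (pixels) where a new section starts, sorted ascending.
// A single section taller than maxHeight is emitted as one oversized chunk.
function planChunks(totalHeight, boundaries, maxHeight = 4000) {
  const cuts = [0, ...boundaries.filter((y) => y > 0 && y < totalHeight), totalHeight]
  const chunks = []
  let start = 0
  for (let i = 1; i < cuts.length; i++) {
    const isLast = i === cuts.length - 1
    // Close the chunk here if including the next section would exceed maxHeight.
    if (isLast || cuts[i + 1] - start > maxHeight) {
      chunks.push({ top: start, height: cuts[i] - start })
      start = cuts[i]
    }
  }
  return chunks
}

// Shift a chunk's bounding boxes back into full-page coordinates.
function offsetTree(node, top) {
  const shifted = {
    ...node,
    childElements: (node.childElements ?? []).map((c) => offsetTree(c, top)),
  }
  if (node.boundingBox) {
    const [x, y, w, h] = node.boundingBox
    shifted.boundingBox = [x, y + top, w, h]
  }
  return shifted
}

console.log(planChunks(10000, [900, 3500, 6200, 8000]))
// [ { top: 0, height: 3500 }, { top: 3500, height: 2700 }, { top: 6200, height: 3800 } ]
```

After each chunk is processed, offsetTree(result, chunk.top) restores page-level coordinates so the per-chunk trees can be placed under one synthetic root.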
Dark-mode UIs
The detector does not rely on background assumptions. Extracted colors reflect rendered pixels — if you need semantic tokens (the brand color as declared, not as painted), map back to your token system on your side.
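Mapping painted colors back to tokens can be done with a nearest-color lookup. The sketch below is illustrative only: nearestToken, the token names, and the "#rrggbb" color format are all assumptions (this page does not document how colors appear in the response), and plain RGB distance is a crude stand-in for a perceptual color difference.

```javascript
// Parse "#rrggbb" into [r, g, b].
function parseHex(hex) {
  const n = parseInt(hex.slice(1), 16)
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255]
}

// Return the name of the token closest to the painted color (Euclidean RGB
// distance), or null when no token is within maxDistance.
function nearestToken(paintedHex, tokens, maxDistance = 30) {
  const [r, g, b] = parseHex(paintedHex)
  let best = null
  let bestDist = Infinity
  for (const [name, hex] of Object.entries(tokens)) {
    const [tr, tg, tb] = parseHex(hex)
    const dist = Math.hypot(r - tr, g - tg, b - tb)
    if (dist < bestDist) {
      best = name
      bestDist = dist
    }
  }
  return bestDist <= maxDistance ? best : null
}

// Hypothetical token table.
const tokens = { 'brand.primary': '#3b82f6', 'surface.dark': '#111827' }
console.log(nearestToken('#3a80f4', tokens)) // brand.primary
console.log(nearestToken('#ffffff', tokens)) // null
```

The maxDistance cutoff keeps anti-aliased or blended pixels from being forced onto a token they only loosely resemble.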
FAQ
What happens with mixed-language UIs?
OCR detects language per text block, so an interface with English controls and Japanese body copy is handled correctly in a single request.
Does the API keep uploaded images?
Images are retained only long enough to complete the request and generate the response URL. Retention can be customized for enterprise deployments.
Can I run this on-premises?
Yes — the same model ships as an isolated container for enterprises. Contact [email protected].
How do I handle scrollable regions?
The API describes the image as a static tree of what is painted; it has no concept of scroll overflow. Annotate scrolling on your side based on known UI patterns or element overflow.
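The element-overflow heuristic can be sketched as a bounding-box comparison between each container and its children. findOverflowing and the tolerance value are hypothetical, and the sketch assumes every boundingBox is in the same source-image coordinate space, as the field reference states.

```javascript
// Collect the elementId of every container with at least one child whose
// bounding box extends past the container's own box by more than `tolerance` pixels.
function findOverflowing(node, tolerance = 2, out = []) {
  const kids = node.childElements ?? []
  if (node.boundingBox) {
    const [px, py, pw, ph] = node.boundingBox
    const overflows = kids.some((c) => {
      if (!c.boundingBox) return false
      const [x, y, w, h] = c.boundingBox
      return (
        x < px - tolerance ||
        y < py - tolerance ||
        x + w > px + pw + tolerance ||
        y + h > py + ph + tolerance
      )
    })
    if (overflows) out.push(node.elementId)
  }
  kids.forEach((c) => findOverflowing(c, tolerance, out))
  return out
}

// Hypothetical list whose last row is cut off by the container's bottom edge.
const list = {
  elementId: 'list_001',
  boundingBox: [0, 0, 300, 200],
  childElements: [
    { elementId: 'row_001', boundingBox: [0, 0, 300, 60] },
    { elementId: 'row_004', boundingBox: [0, 180, 300, 60] },
  ],
}
console.log(findOverflowing(list)) // [ 'list_001' ]
```

A match is only a hint that the container probably scrolls; combine it with known UI patterns (lists, tables, carousels) before annotating.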
Next steps
- PDF to Visual Struct — same schema, PDF input.
- Design to Code — pipe the schema into code generation.
- Full endpoint reference at /api.