Codia

Visual Struct

Overview

Visual Struct takes a UI screenshot or mockup and returns a hierarchical JSON description of every element it contains. Each element is typed (header, button, card, table, etc.), positioned (pixel-accurate bounding box), and linked to its parent so the whole image is represented as one tree.

The schema is shared with PDF to Visual Struct, so downstream consumers can accept both inputs through a single code path.

Endpoint

POST https://api.codia.ai/v1/open/image_to_design

Authentication is via bearer token. Get a key at codia.ai/dashboard/developer.

Request Options

Use one of these input modes:

| Mode | Content-Type | Required field | Description |
| --- | --- | --- | --- |
| JSON URL | application/json | image_url | Publicly reachable URL of the source image. |
| Direct upload | multipart/form-data | image | Local image file. The field name file is also accepted for compatibility. |

Example

bash
curl 'https://api.codia.ai/v1/open/image_to_design' \
  -H 'Authorization: Bearer {codia_api_key}' \
  -H 'Content-Type: application/json' \
  --data '{ "image_url": "https://example.com/ui.png" }'

Or upload a local file directly:

bash
curl 'https://api.codia.ai/v1/open/image_to_design' \
  -H 'Authorization: Bearer {codia_api_key}' \
  -F 'image=@./screenshot.png'

Codia checks credits before reading and uploading multipart files. If credits are insufficient, the request returns 402 without uploading the file.
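The JSON URL mode can be sketched in JavaScript roughly as follows. This assumes a runtime with a global fetch (Node 18+ or a browser). The helper name imageToDesign and its injectable fetchImpl parameter are illustrative and not part of the Codia API, and treating 402 as a distinct error in JSON mode is an assumption extended from the multipart behavior described above.

```javascript
const ENDPOINT = 'https://api.codia.ai/v1/open/image_to_design'

// Hypothetical helper: posts an image URL and resolves to the parsed JSON tree.
// fetchImpl is injectable so the network call can be stubbed in tests.
async function imageToDesign(imageUrl, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(ENDPOINT, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ image_url: imageUrl }),
  })
  // Surface the insufficient-credits case separately from other failures.
  if (res.status === 402) {
    throw new Error('Insufficient credits (HTTP 402)')
  }
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`)
  }
  return res.json()
}
```

A caller would typically await imageToDesign('https://example.com/ui.png', apiKey) and pass the resolved tree to its own traversal code.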

Response

json
{
  "elementId": "header_section_001",
  "elementName": "HeaderSection",
  "elementType": "header",
  "displayName": "Header Section",
  "layoutConfig": {
    "positionMode": "flex",
    "flexibleMode": "row",
    "flexAttributes": {
      "justifyContent": "space-between",
      "alignItems": "center"
    }
  },
  "childElements": [ /* nested elements */ ]
}

Field reference

| Field | Description |
| --- | --- |
| elementType | Element class: header, button, card, badge, icon, tab, table, chart, and others. |
| layoutConfig.positionMode | flex or absolute. Flex containers expose flexibleMode (row / column) and flexAttributes. |
| boundingBox | [x, y, width, height] in source-image pixel coordinates. |
| processingMeta.detectionScore | Detection confidence, from 0 to 1. |
| childElements | Recursive array of child elements. |
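Because childElements is recursive, most consumers start with a tree walk. The following is a minimal sketch that collects the bounding boxes of every element of one type. The function names walkElements and boxesByType are illustrative, and the code assumes only the fields listed in the table above.

```javascript
// Depth-first walk over the tree, yielding every node with its depth.
function* walkElements(node, depth = 0) {
  yield { node, depth }
  for (const child of node.childElements ?? []) {
    yield* walkElements(child, depth + 1)
  }
}

// Collect the bounding boxes of all elements with the given elementType.
// Nodes without a boundingBox array are skipped.
function boxesByType(tree, elementType) {
  const boxes = []
  for (const { node } of walkElements(tree)) {
    if (node.elementType === elementType && Array.isArray(node.boundingBox)) {
      const [x, y, width, height] = node.boundingBox
      boxes.push({ elementId: node.elementId, x, y, width, height })
    }
  }
  return boxes
}
```

For example, boxesByType(tree, 'button') would return one entry per detected button, in document order.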

Output formats

| Format | Use case |
| --- | --- |
| json | Default. Best for custom downstream pipelines. |
| svg | Lossless vector re-render of the detected scene, design-tool agnostic. |
| figma | Tree shaped for direct insertion into a Figma file via the REST API or plugin. |

All three formats are post-processed from the same underlying schema.

Capabilities

| Capability | Value |
| --- | --- |
| Output quality | Depends on source clarity and layout complexity |
| Typical latency | 2–5 s per image |
| OCR languages | 50+ |
| Max image size | 25 MB |
| Recommended resolution | 600–4000 px on the long edge |
| Production review | Enterprise review available for high-volume workloads |
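These limits can be checked on the client before a request is sent. The sketch below is one way to do that. The function name checkImage and its input shape are hypothetical, and reading 25 MB as 25,000,000 bytes is a conservative assumption, since the table does not say whether MB or MiB is meant. The size limit is treated as an error, while the resolution range is only a recommendation and produces a warning.

```javascript
// Assumption: 25 MB is read as decimal megabytes, the more conservative reading.
const MAX_BYTES = 25 * 1000 * 1000
const MIN_LONG_EDGE = 600
const MAX_LONG_EDGE = 4000

// Hypothetical pre-flight check. Takes the file size in bytes and the image
// dimensions in pixels, and returns hard errors and soft warnings separately.
function checkImage({ bytes, width, height }) {
  const errors = []
  const warnings = []
  if (bytes > MAX_BYTES) {
    errors.push(`file is ${bytes} bytes, above the 25 MB limit`)
  }
  const longEdge = Math.max(width, height)
  if (longEdge < MIN_LONG_EDGE || longEdge > MAX_LONG_EDGE) {
    warnings.push(`long edge is ${longEdge} px, outside the recommended 600-4000 px`)
  }
  return { errors, warnings }
}
```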

Common patterns

Filter by confidence

js
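// Collect elements whose detection score meets the threshold.
// A node below the threshold is dropped together with its whole subtree.
function visibleElements(tree, threshold = 0.6) {
  const out = []
  const walk = (n) => {
    // Nodes without a score are treated as fully confident (score 1).
    if ((n.processingMeta?.detectionScore ?? 1) >= threshold) {
      out.push(n)
      ;(n.childElements ?? []).forEach(walk)
    }
  }
  walk(tree)
  return out
}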

Tall screenshots

Full-page marketing screenshots (10k+ pixels tall) work, but chunking at section boundaries before calling produces more stable top-level groupings.
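One way to plan such chunks is sketched below, assuming the section boundaries (y-coordinates in pixels) are already known from your own page structure; finding them and cropping the image are out of scope here. The names planChunks and offsetTree are illustrative. Since boundingBox is expressed in the coordinates of the submitted image, each chunk's tree needs its y-values shifted by the chunk's top offset before the results are merged.

```javascript
// Group consecutive sections into chunks no taller than maxHeight.
// boundaries: sorted y-coordinates of section edges, from 0 to the image height.
// A single section taller than maxHeight becomes its own oversized chunk.
function planChunks(boundaries, maxHeight) {
  const chunks = []
  let start = boundaries[0]
  for (let i = 1; i < boundaries.length; i++) {
    const next = boundaries[i + 1]
    // Close the chunk at the last boundary, or when the next section would not fit.
    if (next === undefined || next - start > maxHeight) {
      chunks.push({ top: start, height: boundaries[i] - start })
      start = boundaries[i]
    }
  }
  return chunks
}

// Return a copy of a chunk's tree shifted back into full-page coordinates.
function offsetTree(node, dy) {
  const shifted = { ...node }
  if (Array.isArray(node.boundingBox)) {
    const [x, y, width, height] = node.boundingBox
    shifted.boundingBox = [x, y + dy, width, height]
  }
  if (Array.isArray(node.childElements)) {
    shifted.childElements = node.childElements.map((c) => offsetTree(c, dy))
  }
  return shifted
}
```

Each planned chunk would be cropped and sent as its own request, and the returned tree passed through offsetTree(tree, chunk.top).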

Dark-mode UIs

The detector does not rely on background assumptions. Extracted colors reflect rendered pixels — if you need semantic tokens (the brand color as declared, not as painted), map back to your token system on your side.
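Mapping painted colors back to tokens could look roughly like the sketch below. This page does not specify how colors appear in the response, so the code assumes six-digit hex strings. The token names, the nearestToken helper, and the distance cutoff are all hypothetical. Plain Euclidean distance in RGB is used for brevity; it is a crude similarity measure, and a perceptual color space may suit production use better.

```javascript
// Parse '#rrggbb' (or 'rrggbb') into [r, g, b].
function hexToRgb(hex) {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex)
  if (!m) throw new Error(`Unsupported color: ${hex}`)
  const n = parseInt(m[1], 16)
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255]
}

// Return the name of the closest token, or null if nothing is within maxDistance.
// tokens: object mapping token name to hex color (your own token system).
function nearestToken(hex, tokens, maxDistance = 30) {
  const [r, g, b] = hexToRgb(hex)
  let best = null
  let bestDist = Infinity
  for (const [name, tokenHex] of Object.entries(tokens)) {
    const [tr, tg, tb] = hexToRgb(tokenHex)
    const dist = Math.hypot(r - tr, g - tg, b - tb)
    if (dist < bestDist) {
      best = name
      bestDist = dist
    }
  }
  return bestDist <= maxDistance ? best : null
}
```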

FAQ

What happens with mixed-language UIs?

OCR detects language per text block, so an interface with English controls and Japanese body copy is handled correctly in a single request.

Does the API keep uploaded images?

Images are retained only long enough to complete the request and generate the response URL. Retention can be customized for enterprise deployments.

Can I run this on-premises?

Yes — the same model ships as an isolated container for enterprises. Contact [email protected].

How do I handle scrollable regions?

The returned tree describes only what is visible in the source image; it has no concept of scroll overflow. Annotate scrolling on your side based on known UI patterns or element overflow.
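The element-overflow heuristic could be sketched as follows: flag a container when any child's bounding box extends past its own. Whether the detector ever reports child boxes that exceed their parent is not stated on this page, so treat this as an assumption to verify against real responses. The scrollHint property and both function names are hypothetical additions, not API fields.

```javascript
// True when any direct child's box extends past the parent's box
// by more than `tolerance` pixels on any side.
function overflowsParent(parent, tolerance = 2) {
  if (!Array.isArray(parent.boundingBox)) return false
  const [px, py, pw, ph] = parent.boundingBox
  return (parent.childElements ?? []).some((child) => {
    if (!Array.isArray(child.boundingBox)) return false
    const [cx, cy, cw, ch] = child.boundingBox
    return (
      cx < px - tolerance ||
      cy < py - tolerance ||
      cx + cw > px + pw + tolerance ||
      cy + ch > py + ph + tolerance
    )
  })
}

// Return a copy of the tree with a hypothetical scrollHint flag on every node.
function annotateScroll(node) {
  const out = { ...node, scrollHint: overflowsParent(node) }
  if (Array.isArray(node.childElements)) {
    out.childElements = node.childElements.map(annotateScroll)
  }
  return out
}
```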
