Visual Struct

Overview

Visual Struct takes a UI screenshot or mockup and returns a hierarchical JSON description of every element it contains. Each element is typed (header, button, card, table, etc.), positioned (pixel-accurate bounding box), and linked to its parent so the whole image is represented as one tree.

The schema is shared with PDF to Visual Struct, so downstream consumers can accept both inputs through a single code path.

Endpoint

POST https://openapi.codia.ai/v2/open/image_to_design

Authentication is via bearer token. Get a key at codia.ai/dashboard/developer.

Request Options

Use one of these input modes:

Mode	Content-Type	Required field	Description
JSON URL	`application/json`	`image_url`	Publicly-reachable URL of the source image.
Direct upload	`multipart/form-data`	`image`	Local image file. Field name `file` is also accepted for compatibility.

Example

```bash
curl 'https://openapi.codia.ai/v2/open/image_to_design' \
  -H 'Authorization: Bearer {codia_api_key}' \
  -H 'Content-Type: application/json' \
  --data '{ "image_url": "https://example.com/ui.png" }'
```

Or upload a local file directly:

```bash
curl 'https://openapi.codia.ai/v2/open/image_to_design' \
  -H 'Authorization: Bearer {codia_api_key}' \
  -F 'image=@./screenshot.png'
```

Codia checks your credit balance before it reads or uploads a multipart file. If credits are insufficient, the request fails with HTTP 402 (Payment Required) and the file is not uploaded.
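The two request modes above can also be issued from JavaScript. The following is a minimal sketch, assuming Node 18+ (global `fetch`, `FormData`, and `Blob`). The helper names `buildUrlRequest`, `buildUploadRequest`, and `callVisualStruct` are hypothetical; only the endpoint, the headers, the `image_url` and `image` fields, and the 402 status come from this page.

```javascript
const ENDPOINT = 'https://openapi.codia.ai/v2/open/image_to_design'

// Build fetch() options for the JSON URL mode.
function buildUrlRequest(apiKey, imageUrl) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ image_url: imageUrl }),
  }
}

// Build fetch() options for the direct-upload mode.
// No Content-Type header is set here: fetch adds the multipart
// boundary itself when the body is a FormData instance.
function buildUploadRequest(apiKey, bytes, filename = 'screenshot.png') {
  const form = new FormData()
  form.append('image', new Blob([bytes]), filename)
  return {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  }
}

// Send the request and return the parsed JSON body.
// Error handling beyond the 402 case is an assumption of this sketch.
async function callVisualStruct(options) {
  const res = await fetch(ENDPOINT, options)
  if (res.status === 402) throw new Error('Insufficient credits (HTTP 402)')
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`)
  return res.json()
}
```

A call would then look like `await callVisualStruct(buildUrlRequest(apiKey, 'https://example.com/ui.png'))`.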

Response

```json
{
  "elementId": "header_section_001",
  "elementName": "HeaderSection",
  "elementType": "header",
  "displayName": "Header Section",
  "layoutConfig": {
    "positionMode": "flex",
    "flexibleMode": "row",
    "flexAttributes": {
      "justifyContent": "space-between",
      "alignItems": "center"
    }
  },
  "childElements": [ /* nested elements */ ]
}
```
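As an illustration of consuming `layoutConfig`, the sketch below turns it into a CSS-like style object. It assumes that `flexibleMode` corresponds to CSS `flex-direction` and that the keys in `flexAttributes` correspond to the CSS properties of the same name; this page does not state that mapping, and the helper name `layoutToStyle` is hypothetical.

```javascript
// Translate a layoutConfig object into a CSS-like style object.
// Assumption: flexibleMode maps to flex-direction, and flexAttributes
// keys map to the same-named CSS properties.
function layoutToStyle(layoutConfig) {
  const cfg = layoutConfig ?? {}
  if (cfg.positionMode === 'absolute') {
    return { position: 'absolute' }
  }
  if (cfg.positionMode === 'flex') {
    const attrs = cfg.flexAttributes ?? {}
    const style = { display: 'flex' }
    if (cfg.flexibleMode) style.flexDirection = cfg.flexibleMode
    if (attrs.justifyContent) style.justifyContent = attrs.justifyContent
    if (attrs.alignItems) style.alignItems = attrs.alignItems
    return style
  }
  // Unknown or missing positionMode: emit no layout styles.
  return {}
}
```

For the sample response, this yields `display: flex`, `flexDirection: row`, `justifyContent: space-between`, and `alignItems: center`.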

Field reference

Field	Description
`elementType`	Element class — `header`, `button`, `card`, `badge`, `icon`, `tab`, `table`, `chart`, and others.
`layoutConfig.positionMode`	`flex` or `absolute`. Flex containers expose `flexibleMode` (`row` / `column`) and `flexAttributes`.
`boundingBox`	`[x, y, width, height]` in source-image pixel coordinates.
`processingMeta.detectionScore`	0–1 confidence.
`childElements`	Recursive array of child elements.
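Because the tree is recursive, many consumers flatten it first. The sketch below walks `childElements` and records each element's parent id, depth, bounding box, and detection score, using only the fields listed above. The function name `flattenTree` and the row shape are hypothetical choices for illustration.

```javascript
// Flatten the element tree into rows, one per element,
// keeping a link to the parent and the nesting depth.
function flattenTree(root) {
  const rows = []
  const walk = (node, parentId, depth) => {
    const [x, y, width, height] = node.boundingBox ?? []
    rows.push({
      id: node.elementId,
      type: node.elementType,
      parentId,
      depth,
      // boundingBox is [x, y, width, height]; null when absent.
      box: node.boundingBox ? { x, y, width, height } : null,
      score: node.processingMeta?.detectionScore ?? null,
    })
    for (const child of node.childElements ?? []) {
      walk(child, node.elementId, depth + 1)
    }
  }
  walk(root, null, 0)
  return rows
}
```

The resulting rows can be filtered by `type` or loaded into a table without further recursion.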

Output formats

Format	Use case
`json`	Default. Best for custom downstream pipelines.
`svg`	Lossless vector re-render of the detected scene, design-tool agnostic.
`figma`	Tree shaped for direct insertion into a Figma file via the REST API or plugin.

All three formats are post-processed from the same underlying schema.

Capabilities

Capability	Value
Output quality	Depends on source clarity and layout complexity
Typical latency	2–5 s per image
OCR languages	50+
Max image size	25 MB
Recommended resolution	600–4000 px on the long edge
Production review	Enterprise review available for high-volume workloads

Common patterns

Filter by confidence

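```javascript
// Collect elements whose detection score meets the threshold.
// Elements without a score are treated as fully confident (score 1).
function visibleElements(tree, threshold = 0.6) {
  const out = []
  const walk = (n) => {
    if ((n.processingMeta?.detectionScore ?? 1) >= threshold) {
      out.push(n)
      // Descend only below accepted elements: a rejected element
      // drops its whole subtree.
      for (const child of n.childElements ?? []) walk(child)
    }
  }
  walk(tree)
  return out
}
```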

Tall screenshots

Full-page marketing screenshots (10k+ pixels tall) work, but chunking at section boundaries before calling produces more stable top-level groupings.
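One way to plan such chunks is sketched below. It assumes you already know the y offsets where sections begin (the `boundaries` argument, supplied by the caller) and picks a default maximum chunk height of 4000 px, taken from the recommended resolution in the Capabilities table. The function name `planChunks` is hypothetical, and the actual cropping is left to whatever image library you use.

```javascript
// Split [0, imageHeight) into vertical ranges that end on section
// boundaries where possible. Greedy: each chunk extends to the furthest
// boundary that still fits within maxChunkHeight.
function planChunks(imageHeight, boundaries, maxChunkHeight = 4000) {
  const cuts = [...new Set(boundaries)]
    .filter((y) => y > 0 && y < imageHeight)
    .sort((a, b) => a - b)
  const chunks = []
  let start = 0
  while (imageHeight - start > maxChunkHeight) {
    const limit = start + maxChunkHeight
    const fitting = cuts.filter((y) => y > start && y <= limit)
    // Fall back to a hard cut at the limit when no boundary fits.
    const end = fitting.length ? fitting[fitting.length - 1] : limit
    chunks.push({ top: start, bottom: end })
    start = end
  }
  chunks.push({ top: start, bottom: imageHeight })
  return chunks
}
```

Each chunk is then sent as its own request. Bounding boxes in a chunk's response are relative to that cropped image, so add the chunk's `top` to each `y` value to recover full-page coordinates.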

Dark-mode UIs

The detector does not rely on background assumptions. Extracted colors reflect rendered pixels — if you need semantic tokens (the brand color as declared, not as painted), map back to your token system on your side.
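Mapping painted colors back to tokens could look like the sketch below, which picks the token with the smallest Euclidean RGB distance. It assumes you have the extracted color as a `#rrggbb` string; this section does not document how colors appear in the response, and the names `hexToRgb`, `nearestToken`, and the example token names are hypothetical.

```javascript
// Parse "#rrggbb" (leading "#" optional) into [r, g, b].
function hexToRgb(hex) {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex)
  if (!m) throw new Error(`Unsupported color: ${hex}`)
  const n = parseInt(m[1], 16)
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255]
}

// Return the name of the token closest to `hex` by Euclidean RGB
// distance, or null if no token lies within maxDistance.
function nearestToken(hex, tokens, maxDistance = 30) {
  const [r, g, b] = hexToRgb(hex)
  let best = null
  let bestDist = Infinity
  for (const [name, tokenHex] of Object.entries(tokens)) {
    const [tr, tg, tb] = hexToRgb(tokenHex)
    const dist = Math.sqrt((r - tr) ** 2 + (g - tg) ** 2 + (b - tb) ** 2)
    if (dist < bestDist) {
      best = name
      bestDist = dist
    }
  }
  return bestDist <= maxDistance ? best : null
}
```

With tokens such as `{ 'brand.primary': '#1a73e8' }`, a painted `#1b74e9` resolves to `brand.primary`, while a color far from every token resolves to `null`. Plain RGB distance is a simplification; a perceptual color space may match designer intent better.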

FAQ

What happens with mixed-language UIs?

OCR detects language per text block, so an interface with English controls and Japanese body copy is handled correctly in a single request.

Does the API keep uploaded images?

Images are retained only long enough to complete the request and generate the response URL. Retention can be customized for enterprise deployments.

Can I run this on-premises?

Yes — the same model ships as an isolated container for enterprises. Contact [email protected].

How do I handle scrollable regions?

The returned tree describes only the static layout visible in the image; it has no concept of scroll overflow. Annotate scrolling on your side based on known UI patterns or element overflow.
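An overflow-based annotation can be sketched as follows: flag every container that has at least one child whose bounding box extends past the container's own box. This assumes child boxes use the same source-image coordinates as their parent, as the field reference describes. The function name `findOverflowingContainers` and the `tolerance` parameter are hypothetical.

```javascript
// Return the ids of containers with at least one child whose box
// extends past the container's box by more than `tolerance` pixels.
function findOverflowingContainers(root, tolerance = 1) {
  const ids = []
  const walk = (node) => {
    const children = node.childElements ?? []
    if (node.boundingBox) {
      const [px, py, pw, ph] = node.boundingBox
      const overflows = children.some((child) => {
        if (!child.boundingBox) return false
        const [cx, cy, cw, ch] = child.boundingBox
        return (
          cx < px - tolerance ||
          cy < py - tolerance ||
          cx + cw > px + pw + tolerance ||
          cy + ch > py + ph + tolerance
        )
      })
      if (overflows) ids.push(node.elementId)
    }
    for (const child of children) walk(child)
  }
  walk(root)
  return ids
}
```

The flagged ids are only candidates for scrollable regions. Clipped content can also come from carousels or intentionally cropped artwork, so combine this with known UI patterns before marking a region as scrollable.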

Next steps

PDF to Visual Struct — same schema, PDF input.
Design to Code — pipe the schema into code generation.
Full endpoint reference at /api.