From Flat PDF to Structured UI: Inside the PDF-to-Visual-Struct Pipeline

Engineering · 2026-04-23

The problem with PDFs

PDFs are everywhere — product specs, financial reports, compliance forms, legacy design deliverables — and they are almost uniformly hostile to automation. A PDF is a flat stream of drawing instructions: "move to (120, 440), show the glyph 'H' in Helvetica 12, move right, show 'e'…". There is no notion of a button, a card, or a header. There is no hierarchy. There isn't even a reliable concept of a word.

That's the gap we built PDF to Visual Struct to close. Send a PDF to POST /v1/open/pdf_to_design, and you get back the same Visual Element Schema that powers Codia Studio: a hierarchical JSON tree of typed UI elements with bounding boxes, layout configs, and style specs. The tree is machine-readable, designer-readable, and one transform away from runnable code.

This post walks through the pipeline — what the API does, the schema it emits, and what you can build on top of it.

Calling the API

The request is a single multipart/form-data POST. No SDK is required.

bash
# No explicit Content-Type header: with --form, curl sets multipart/form-data and its boundary itself.
curl 'https://api.codia.ai/v1/open/pdf_to_design' \
  -H 'Authorization: Bearer {codia_api_key}' \
  --form 'pdf_file=@"invoice.pdf"' \
  --form 'page_no="0"'

Two inputs: the PDF file, and a zero-indexed page number. If you have a 40-page document and you want every page, loop and parallelize — the API is stateless per page and comfortable under concurrency. The advertised targets are 100+ pages per document, 50+ element types, and sub-two-second responses per page. In practice most single-page PDFs come back in 600–1200 ms once the file is on our ingress.
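The loop-and-parallelize pattern can be sketched in Python as follows. This is a minimal sketch, assuming the third-party requests library and the two form fields from the curl call above. The helper names, the worker count of 8, and the 30-second timeout are illustrative choices rather than documented values, and the caller supplies the page count because the endpoint shown here handles one page per request.

```python
from concurrent.futures import ThreadPoolExecutor

API_URL = "https://api.codia.ai/v1/open/pdf_to_design"


def fetch_page(pdf_path, page_no, api_key):
    """POST one page of the PDF and return the parsed JSON response."""
    import requests  # third-party; imported lazily so fetch_pages works with any fetcher

    with open(pdf_path, "rb") as fh:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"pdf_file": fh},           # sent as multipart/form-data
            data={"page_no": str(page_no)},   # zero-indexed page number
            timeout=30,                       # assumed value, tune as needed
        )
    resp.raise_for_status()
    return resp.json()


def fetch_pages(pdf_path, page_count, api_key, fetch=fetch_page, max_workers=8):
    """Fetch pages 0..page_count-1 concurrently; results come back in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, pdf_path, n, api_key) for n in range(page_count)]
        return [f.result() for f in futures]
```

Passing the fetcher as a parameter keeps the concurrency logic testable without network access. Pick max_workers to fit whatever rate limit applies to your key.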

Anatomy of the response

The top level looks like this:

json
{
  "configuration": {
    "scalingFactor": 1,
    "baseWidth": 1940,
    "measurementUnit": "px"
  },
  "pages": [
    { "visualElement": { /* root element */ } }
  ],
  "size": { "height": 1080, "width": 1940 }
}

configuration tells you how coordinates should be interpreted. baseWidth and measurementUnit define the reference frame: if you render at a different width, scale every bounding box by the ratio of your target width to baseWidth. size is the normalized canvas the layout was solved for.
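As a small worked example of that scaling rule, the Python sketch below multiplies each component of a bounding box by target width divided by baseWidth. The function name scale_box is hypothetical, and the sketch assumes uniform scaling on both axes and an [x, y, width, height] box.

```python
def scale_box(box, base_width, target_width):
    """Scale an [x, y, width, height] box from the base frame to a new width."""
    ratio = target_width / base_width
    return [value * ratio for value in box]


# A box laid out for baseWidth 1940, rendered in a 970 px wide viewport.
half_size = scale_box([100, 200, 400, 50], base_width=1940, target_width=970)
```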

Each page has a single visualElement root. Recursively walking its childElements arrays gives a depth-first traversal of the page. Every node carries the same core fields:

json
{
  "elementId": "pdf_page_1",
  "elementName": "PDF Document Page",
  "elementType": "Panel",
  "displayName": "First Page",
  "displayOrder": 0,
  "boundingBox": [0, 0, 595, 842],
  "layoutConfig": {
    "positionMode": "Normal",
    "flexibleMode": "Absolute"
  },
  "styleConfig": {
    "widthSpec": { "sizing": "FIXED", "value": 595 },
    "heightSpec": { "sizing": "FIXED", "value": 842 },
    "backgroundSpec": {
      "type": "COLOR",
      "backgroundColor": { "rgbValues": [255, 255, 255] }
    }
  },
  "processingMeta": {
    "surfaceArea": 501490,
    "detectionScore": 0.92,
    "textContainerized": false
  },
  "childElements": []
}
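The depth-first walk over childElements can be sketched in a few lines of Python. The helper names walk and outline are hypothetical, and the sketch assumes that array order is the order you want to visit children in (it ignores displayOrder) and that a missing childElements key means a leaf.

```python
def walk(element, depth=0):
    """Yield (depth, element) for a node and all of its descendants, depth-first."""
    yield depth, element
    for child in element.get("childElements", []):
        yield from walk(child, depth + 1)


def outline(response):
    """Return one indented 'elementType elementId' line per element, across all pages."""
    lines = []
    for page in response["pages"]:
        for depth, el in walk(page["visualElement"]):
            lines.append(f'{"  " * depth}{el["elementType"]} {el["elementId"]}')
    return lines
```

Printing the result of outline(response) is a quick way to eyeball the hierarchy before writing a generator against it.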

The fields worth knowing:

  • elementType: the strong type. Panel, Text, Image, Icon, Button, Table, Chart, and a few dozen more. This is what makes the output actionable; you can switch on it to generate HTML, Flutter widgets, or React components without heuristics.
  • boundingBox: [x, y, width, height] in the base coordinate system. Use this to position children in absolute layouts or to detect overlaps and groups when post-processing.
  • layoutConfig: tells you whether the parent laid children out as a flow, a flex row/column, or free-positioned (Absolute). The model is inspired by Figma's auto-layout, which makes round-tripping to Figma trivial.
  • styleConfig: width/height sizing strategy (FIXED, FILL_CONTAINER, HUG_CONTENT), background paint, border, corner radius, typography. Mirrors CSS concepts closely enough that generators can emit clean stylesheets.
  • processingMeta.detectionScore: a 0–1 confidence score the detector assigns to this element. Handy when you want to discard low-confidence nodes before generating downstream artifacts.
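To make "switch on elementType" concrete, here is a minimal Python sketch that turns a tree into absolutely positioned HTML. Several things in it are assumptions rather than documented behavior: the TAGS mapping is made up for illustration, bounding boxes are treated as page coordinates (so each child is offset against its parent's origin), and text content is left out because this post does not show which field carries it.

```python
from html import escape

# Hypothetical elementType -> HTML tag mapping; unknown types fall back to <div>.
TAGS = {"Panel": "div", "Text": "span", "Button": "button", "Table": "table"}


def to_html(el, origin=(0, 0), depth=0):
    """Render an element and its children as nested, absolutely positioned tags."""
    x, y, w, h = el["boundingBox"]
    tag = TAGS.get(el["elementType"], "div")
    # Assumes boxes are in page coordinates, so subtract the parent's origin.
    style = (
        f"position:absolute;left:{x - origin[0]}px;top:{y - origin[1]}px;"
        f"width:{w}px;height:{h}px"
    )
    pad = "  " * depth
    lines = [
        f'{pad}<{tag} id="{escape(el["elementId"])}" '
        f'data-type="{escape(el["elementType"])}" style="{style}">'
    ]
    for child in el.get("childElements", []):
        lines.append(to_html(child, (x, y), depth + 1))
    lines.append(f"{pad}</{tag}>")
    return "\n".join(lines)
```

A real generator would also read styleConfig and layoutConfig; this sketch only shows the dispatch on type and the use of boundingBox.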

What the pipeline actually does

Turning a PDF into this schema is not a single model call — it's a short pipeline:

  1. Rasterize + extract. We parse the PDF stream to pull text runs, vector paths, and embedded raster images. Scanned pages drop into an OCR pass that runs in parallel.
  2. Layout analysis. A vision model segments the page into regions — headers, body blocks, sidebars, figures, tables, captions — and assigns each one a coarse type. This is where the Panel boundaries come from.
  3. Element detection. A denser model inside each region identifies individual UI elements — buttons, icons, form fields, chart series. This stage produces the elementType classifications.
  4. Text grouping. Raw glyph positions are clustered into words, lines, and paragraphs, then attached to the nearest container element.
  5. Schema assembly. The hierarchy is materialized, parent-child links are resolved, and per-element styles are computed. Coordinates are normalized to baseWidth.

Steps 2 and 3 account for the bulk of the latency budget. Everything before and after is comparatively cheap.
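To give a feel for step 4, here is a toy Python sketch of clustering glyphs into lines and words. It is not the pipeline's actual algorithm: it uses a simple baseline tolerance and a horizontal gap threshold, both arbitrary values, and it assumes each glyph arrives as a (char, x, y, width) tuple with y increasing down the page.

```python
def group_glyphs(glyphs, line_tol=3.0, word_gap=2.5):
    """Cluster (char, x, y, width) glyphs into lines, each a list of word strings."""
    # Pass 1: bucket glyphs into lines by comparing y to the line's first glyph.
    lines = []
    for glyph in sorted(glyphs, key=lambda g: (g[2], g[1])):
        if lines and abs(lines[-1][0][2] - glyph[2]) <= line_tol:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])

    # Pass 2: within each line, split into words wherever the gap is wide enough.
    result = []
    for line in lines:
        line.sort(key=lambda g: g[1])  # left to right
        words, current = [], line[0][0]
        for prev, cur in zip(line, line[1:]):
            gap = cur[1] - (prev[1] + prev[3])  # space between adjacent glyphs
            if gap > word_gap:
                words.append(current)
                current = cur[0]
            else:
                current += cur[0]
        words.append(current)
        result.append(words)
    return result
```

Fixed thresholds like these break down on mixed font sizes; scaling them by glyph height would be a natural next step.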

What you can build on top

The schema is deliberately generic. A few concrete things teams have shipped on top of it:

Design import. Convert a legacy PDF spec into editable Figma frames in one pass. Because the layout config mirrors Figma auto-layout semantics, auto-layout rebuilds correctly on import.

Legacy modernization. Take a 500-page PDF manual and generate a responsive web version. Each page becomes a route; the element types drive component selection.

Visual regression testing. Render the same page in two builds, run both through the API, and diff the element trees. Element-level diffs are far more stable than pixel diffs under font shifts or subpixel anti-aliasing.
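A tree diff of this kind might look like the Python sketch below. It keys each element by its structural path (child index plus elementType) rather than by elementId, on the assumption that IDs may not be stable between runs; that assumption, the helper names, and the 1-unit tolerance are all illustrative.

```python
def flatten(el, path="root"):
    """Map structural paths like 'root/0:Button' to elements."""
    out = {path: el}
    for i, child in enumerate(el.get("childElements", [])):
        out.update(flatten(child, f"{path}/{i}:{child['elementType']}"))
    return out


def diff_trees(baseline, candidate, tol=1.0):
    """Report paths that were added, removed, or whose box moved by more than tol."""
    a, b = flatten(baseline), flatten(candidate)
    moved = sorted(
        key for key in a.keys() & b.keys()
        if any(abs(p - q) > tol for p, q in zip(a[key]["boundingBox"], b[key]["boundingBox"]))
    )
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "moved": moved,
    }
```

The tolerance absorbs the small coordinate jitter that would otherwise make every run look like a regression.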

RAG with layout awareness. When you index a PDF for retrieval, ditch the raw text dump. Chunk by element, preserve parent-child relationships, and you can answer "what's the CTA on page 4" or "what's the third column of the second table on page 11" without fuzzy heuristics.
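Chunking by element could be sketched in Python like this. The record shape is hypothetical, and because this post does not show which field holds a Text node's string, the sketch takes a text_of callback and falls back to displayName purely as a stand-in.

```python
def chunk_elements(root, page_no, text_of=lambda el: el.get("displayName", "")):
    """Return one retrieval record per element, keeping its ancestor chain."""
    chunks = []

    def visit(el, ancestors):
        chunks.append({
            "id": el["elementId"],
            "type": el["elementType"],
            "page": page_no,
            "path": ancestors,            # elementIds from the root down to the parent
            "bbox": el["boundingBox"],
            "text": text_of(el),          # stand-in; supply the real text accessor
        })
        for child in el.get("childElements", []):
            visit(child, ancestors + [el["elementId"]])

    visit(root, [])
    return chunks
```

Storing the path and page alongside each chunk is what lets a retriever answer position-based questions such as "the second table on page 11" by filtering on type and ancestry.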

Chart extraction. Chart nodes carry enough metadata (axis labels, series colors, grouped text) that you can lift them into live chart components instead of rendering a screenshot.

A note on accuracy

Detection quality depends on source quality, PDF structure, scan clarity, and layout complexity. Product-design PDFs — pitch decks, UI specs, mobile app exports — are easier than scanned legal documents and hand-annotated forms. Expect more OCR noise and lower detectionScore values on hard inputs, and plan to filter.
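Filtering on detectionScore can be as simple as the Python sketch below. The 0.5 threshold is an arbitrary starting point, not a recommended value, and the sketch makes two assumptions: a node without a score is kept, and dropping a node drops its whole subtree.

```python
def prune(el, min_score=0.5):
    """Return a copy of the tree without nodes scoring below min_score, or None."""
    score = el.get("processingMeta", {}).get("detectionScore", 1.0)
    if score < min_score:
        return None  # the node and everything under it are discarded
    kept = []
    for child in el.get("childElements", []):
        pruned = prune(child, min_score)
        if pruned is not None:
            kept.append(pruned)
    return {**el, "childElements": kept}
```

If losing the children of a weak container is too aggressive for your inputs, an alternative is to re-parent them to the nearest surviving ancestor.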

Getting started

  1. Grab an API key from codia.ai/dashboard/developer.
  2. Try the curl above against any PDF you have handy.
  3. Walk the response tree with your language of choice — the schema is plain JSON, no SDK needed.
  4. If you want to skip the parsing and go straight to a Figma file, check the PDF to Design Figma plugin built on the same pipeline.

Full reference, request/response shapes, and error codes are in the developer docs. Ping [email protected] if you need higher rate limits or a private deployment — the pipeline ships as an isolated container for enterprises with data residency requirements.

PDFs will not disappear. They've been the stubborn last mile for every design-automation effort for a decade. Giving them a clean, typed structure is the first step to making them first-class citizens again.
