Inside Figma MCP — How your UI gets turned into a Figma design

March 7th, 2026 · 6 min read

Recently, Figma launched its MCP server, opening up a new way for AI agents like Claude Code to interact directly with the design environment. I wanted to try it out, but more than just using it, I was curious about what was actually happening behind the scenes.

When tools feel a bit magical, there’s usually a clever architectural idea hiding underneath. To experiment with it, I used a small e-commerce app I built with Next.js and Hygraph (using the Hygraph MCP btw). I wrote about how I built it on their blog.

Now I wanted to move the UI into Figma. This workflow has quietly become the norm for a lot of teams building with AI-assisted development. In our case, while building Spidra, we've noticed a recurring pattern in how things evolve:

  1. We use AI agents to help generate ideas, layouts, and rough interfaces.
  2. Our designers take those ideas and turn them into something that actually feels intentional and usable.
  3. We then implement the refined design.

What’s been missing in this loop is a smooth bridge between the running application and the design tool. This is exactly where the Figma MCP becomes interesting.

With it, you can capture a live web page and convert it directly into a Figma design with no manual recreation. Just the actual UI translated into editable design layers.

But that’s not really the focus of this article. Before even trying it, I found myself wondering “what actually happens behind the scenes when a live website turns into a Figma design?”

Surprisingly, I couldn’t find a clear explanation of how the process works. After running the workflow myself and watching what happens step by step, the architecture started to make a lot more sense.

I’m sure the full implementation inside Figma is far more sophisticated than what I’ll describe here, but the process becomes surprisingly understandable once you look at the pieces involved.

The architecture behind the capture process

Having seen the whole thing run, I think I can say the capture process involves four main pieces working together:

  1. The AI agent
  2. The Figma MCP server
  3. A browser capture script
  4. Figma’s backend conversion pipeline

What initially feels like a single action of me typing “export this page to Figma” is actually a small orchestration of steps happening across these components.

When I asked the agent to export my running Next.js application to Figma, the first thing it did was talk to the Figma MCP server to create what’s essentially a capture session.

This session produces a unique capture ID, something that looks like a UUID. That ID represents a temporary, one-off capture request, and it appears to be what later ties the browser's upload back to this session.
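Based on what I observed, a session object needs little more than the ID itself plus enough context to route the result. Here's a rough TypeScript sketch of what such a session might hold; the field names and the `createCaptureSession` helper are my own guesses, not Figma's actual API:

```typescript
import { randomUUID } from "crypto";

// Hypothetical shape of a capture session; the real server-side
// structure inside Figma is not public.
interface CaptureSession {
  captureId: string;     // UUID identifying this one-off capture request
  createdAt: number;     // epoch ms, so stale sessions can expire
  targetFileKey: string; // the Figma file the result should land in
}

function createCaptureSession(targetFileKey: string): CaptureSession {
  return {
    captureId: randomUUID(),
    createdAt: Date.now(),
    targetFileKey,
  };
}
```

The important property is that the ID is unguessable and short-lived, which is all a temporary request really needs.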

Once that capture session exists, the rest of the process moves to the browser.

The capture script

I then noticed that the capture needs a small helper script running in the browser. In my case, the agent automatically injected a browser-side capture script into the Next.js document file.

It sits quietly on the page until a capture request is triggered. Instead of needing a browser extension or special configuration, the capture script simply watches the URL hash.

When the page is opened with something like #figmacapture=<capture-id> in the URL, the script wakes up and begins the capture process.
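The hash-watching part is easy to picture. The sketch below shows the detection logic as a pure function; the `#figmacapture=` pattern is what I saw in the URL, while the parsing details and the `startCapture` hook are assumptions of mine:

```typescript
// Extract the capture ID from a URL hash like "#figmacapture=<uuid>".
// Returns null when the hash is anything else, so the script stays idle.
function parseCaptureId(hash: string): string | null {
  const match = hash.match(/^#figmacapture=([\w-]+)$/);
  return match ? match[1] : null;
}

// In the browser, the injected script would wire this to the page,
// roughly like this (startCapture is a hypothetical entry point):
//
// window.addEventListener("hashchange", () => {
//   const id = parseCaptureId(location.hash);
//   if (id) startCapture(id);
// });
```

Because the trigger is just a hash fragment, the page itself never reloads and no server-side route is needed.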

Upon opening the URL, the script begins inspecting the rendered page. You will see a small toast showing “sending to Figma,” and then, when it’s done, you will see “Sent to Figma”. You can then check Figma, and voilà, it’s there.

This is a clever design choice in my opinion because it means the workflow works with any running website (local development servers, staging environments, or production sites) without requiring deep integration.

Capturing the rendered DOM

The key idea behind the capture process is that it does not try to understand your application code, parse React components, inspect JSX, or examine your framework. Instead, it captures what the browser has already done, which is the fully rendered DOM.

By the time the capture script runs, several things have already happened:

  • Next.js has rendered the page
  • data from Hygraph has been fetched
  • the browser has computed layout
  • styles have been applied
  • images have loaded

At this point, the browser has a complete representation of the UI. The capture script walks through the DOM tree and collects things like:

  • element structure
  • computed CSS styles
  • layout positions and dimensions
  • typography and fonts
  • images and background assets
  • SVG elements
  • shadows, borders, and radii

This is why the approach works well even for dynamic pages. In my case, the /products page pulls product data from Hygraph and renders cards dynamically. By the time the capture script runs, those product images, titles, and prices are already part of the DOM, so they get captured naturally.

The script isn’t interested in how the page was built, only what the browser ended up rendering.
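Putting those pieces together, the walk can be sketched as a recursive serializer. There's no public spec for the payload, so the shapes below are my guesses; in the real script the input would be live DOM elements read via `getComputedStyle()` and `getBoundingClientRect()`, and a plain-data stand-in is used here instead:

```typescript
// Minimal stand-in for a rendered DOM element; in the real script this
// would be an Element plus its computed styles and bounding rect.
interface RenderedElement {
  tag: string;
  styles: Record<string, string>; // computed CSS
  rect: { x: number; y: number; width: number; height: number };
  text?: string;
  children: RenderedElement[];
}

interface CapturedNode {
  tag: string;
  styles: Record<string, string>;
  rect: { x: number; y: number; width: number; height: number };
  text?: string;
  children: CapturedNode[];
}

// Only a subset of computed styles is likely relevant for reconstruction;
// this list is illustrative, not Figma's actual allowlist.
const RELEVANT = [
  "display", "color", "backgroundColor", "fontFamily",
  "fontSize", "borderRadius", "boxShadow",
];

function captureTree(el: RenderedElement): CapturedNode {
  const styles: Record<string, string> = {};
  for (const key of RELEVANT) {
    if (el.styles[key]) styles[key] = el.styles[key];
  }
  return {
    tag: el.tag,
    styles,
    rect: { ...el.rect }, // snapshot positions, not live references
    text: el.text,
    children: el.children.map(captureTree),
  };
}
```

The output is plain JSON-serializable data, which is exactly what you'd want to POST back to a capture session.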

Converting the page into Figma nodes

For this part, I did a little digging. When Figma receives the capture payload, it needs to translate the web structure into a format that fits its internal design model.

In practice, this means mapping common HTML constructs to their closest Figma equivalents. For example:

  • div containers → frames
  • images → image fills
  • text elements → text layers
  • flex layouts → auto layout frames
  • box shadows → drop shadow effects
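That mapping can be sketched as a small decision function. The node type names (`FRAME`, `TEXT`, `RECTANGLE`) come from Figma's public plugin API; the routing logic itself is my simplification of whatever Figma actually does server-side:

```typescript
// Node type names follow Figma's plugin API; the decision logic is
// an illustrative guess at the conversion step, not Figma's code.
type FigmaNodeType = "FRAME" | "TEXT" | "RECTANGLE";

function figmaNodeType(tag: string, hasText: boolean): FigmaNodeType {
  if (tag === "img") return "RECTANGLE"; // rectangle with an image fill
  if (hasText) return "TEXT";            // text content becomes a text layer
  return "FRAME";                        // divs and other containers become frames
}
```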

Layout properties such as padding, spacing, and alignment often translate fairly well into Figma’s auto-layout system, which is why the resulting designs remain editable rather than being flattened.
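A minimal sketch of that flex-to-auto-layout translation, assuming a simplified CSS input (uniform padding, row as the default direction) and using property names from Figma's plugin API:

```typescript
// Simplified CSS flex input; real computed styles have far more fields.
interface FlexStyles {
  flexDirection?: string; // "row" | "column"
  gap?: string;           // e.g. "16px"
  padding?: string;       // e.g. "8px" (uniform, for simplicity)
}

// layoutMode and itemSpacing are real Figma plugin API property names;
// collapsing padding to one number is a simplification for this sketch.
interface AutoLayoutProps {
  layoutMode: "HORIZONTAL" | "VERTICAL";
  itemSpacing: number;
  padding: number;
}

function toAutoLayout(css: FlexStyles): AutoLayoutProps {
  const px = (v?: string) => (v ? parseFloat(v) : 0);
  return {
    layoutMode: css.flexDirection === "column" ? "VERTICAL" : "HORIZONTAL",
    itemSpacing: px(css.gap),
    padding: px(css.padding),
  };
}
```

Because `gap` and `itemSpacing` model the same thing, this is one of the cleanest parts of the translation, and likely a big reason the output stays editable.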

Images are resolved using the URLs captured from the page, which is why assets from my Hygraph CDN appeared correctly in the resulting Figma file.

Once this translation is complete, the generated node tree is inserted into the specified Figma file.

Closing thoughts

After experimenting with the workflow, what initially looked like a bit of magic started to feel more like a well-designed pipeline:

  • A capture session gets created.
  • The browser serializes the rendered page.
  • Figma converts that structure into editable design nodes.

There are undoubtedly deeper layers inside Figma’s implementation than what I’ve outlined here, but understanding the core mechanics makes the process feel much less mysterious.

As MCP servers continue to appear across tools like design platforms, CMS systems, and development environments, we’re starting to see a new kind of workflow emerge, one where AI agents orchestrate tools across the entire product lifecycle.


Joel Olawanle

Software Engineer & Technical Writer. Building Spidra & NGN Market.
