
How the detector works

When you paste a URL, we run several parallel checks and combine the results into a single confidence score. Here's exactly what happens.

1

Fetching the page

We send a request that mimics a real browser, using the same User-Agent, Accept, and Accept-Language headers that Chrome sends. Most servers then return the full page rather than an empty or challenge response.

If the page is protected by a Cloudflare WAF or returns an error, we don't stop — we continue with the other checks below and flag the fetch failure in the result.
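
A minimal sketch of such a fetch, using only Python's standard library. The header values, the `build_request` and `fetch_page` names, and the timeout are assumptions for illustration; the exact values the detector sends are not listed here.

```python
import urllib.request

# Hypothetical browser-like header set (illustrative values only).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def fetch_page(url: str, timeout: float = 10.0) -> dict:
    """Fetch the page; on a network or HTTP error, record it instead of raising."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return {"ok": True, "html": resp.read().decode("utf-8", "replace")}
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return {"ok": False, "error": str(exc)}
```

Returning a result object rather than raising lets the remaining checks run and lets the fetch failure be flagged in the final result.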

2

HTML fingerprint matching

We scan the HTML for over 100 patterns that AI builders leave behind:

  • Generator meta tags — e.g. <meta name="generator" content="Lovable">
  • Script and asset paths specific to each builder — e.g. /lovable.dev/ CDN URLs
  • Data attributes injected by builder runtimes — e.g. data-lovable, data-slot=
  • Component library signatures — shadcn/ui, Radix UI, Lucide icons, cmdk
  • Visual design patterns in Tailwind classes — from-purple-500, backdrop-blur-xl, bg-clip-text text-transparent
  • Emoji density in headings — ✨ and 🚀 appear in a majority of Lovable/v0 landing pages
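
The pattern scan above could be sketched as a table of named regular expressions. The four patterns below are an illustrative subset taken from the examples in the list, and the names (`FINGERPRINTS`, `match_fingerprints`) are hypothetical; the real pattern set is larger.

```python
import re

# Illustrative subset of fingerprints, based only on the examples listed above.
FINGERPRINTS = {
    "generator-meta": re.compile(
        r'<meta\s+name="generator"\s+content="([^"]+)"', re.IGNORECASE
    ),
    "lovable-data-attr": re.compile(r"\bdata-lovable\b", re.IGNORECASE),
    "gradient-text": re.compile(r"bg-clip-text\s+text-transparent"),
    "glass-blur": re.compile(r"\bbackdrop-blur-xl\b"),
}


def match_fingerprints(html: str) -> list[str]:
    """Return the names of all fingerprints found in the HTML, in table order."""
    return [name for name, pattern in FINGERPRINTS.items() if pattern.search(html)]
```

For example, a page containing a Lovable generator tag and a gradient heading would match `generator-meta` and `gradient-text`.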

3

DNS lookups (TXT, CNAME, NS)

DNS runs in parallel with the page fetch, so even when a site blocks our crawler, we still get DNS data. We query three record types:

  • TXT records — platform verification strings like vc-domain-verify (Vercel), netlify-domain-verification, squarespace-domain-verification
  • CNAME records — reveal the hosting platform (e.g. CNAME pointing to herokudns.com)
  • NS records — nameserver patterns identify Cloudflare, Vercel, Netlify hosting
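
One way to turn already-resolved records into platform hints is a substring lookup per record type. This sketch assumes the records have been fetched elsewhere; the rule table is hypothetical and covers only the examples mentioned above.

```python
# Hypothetical substring rules per record type, drawn from the examples above.
DNS_RULES = {
    "TXT": {
        "vc-domain-verify": "Vercel",
        "netlify-domain-verification": "Netlify",
        "squarespace-domain-verification": "Squarespace",
    },
    "CNAME": {"herokudns.com": "Heroku"},
    "NS": {"ns.cloudflare.com": "Cloudflare"},
}


def classify_dns(records: dict[str, list[str]]) -> set[str]:
    """Map raw record values (keyed by record type) to platform names."""
    platforms = set()
    for record_type, values in records.items():
        rules = DNS_RULES.get(record_type, {})
        for value in values:
            for needle, platform in rules.items():
                if needle in value.lower():
                    platforms.add(platform)
    return platforms
```

Because the function takes plain strings, it works the same whether the records come from a resolver library or a DNS-over-HTTPS response.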

4

HTTP headers

Response headers frequently reveal the platform even when the HTML is minimal. We check server, x-powered-by, x-vercel-id, x-nf-request-id (Netlify), and dozens more. When the main GET is blocked, a lightweight HEAD request often still returns these headers.
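
A rough sketch of the header check, limited to the two platform-specific headers named above. The `HEADER_RULES` table and function name are assumptions for illustration.

```python
# Hypothetical mapping from header name to platform (only the headers named above).
HEADER_RULES = {
    "x-vercel-id": "Vercel",
    "x-nf-request-id": "Netlify",
}


def platforms_from_headers(headers: dict[str, str]) -> set[str]:
    """Return platforms whose marker header is present (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return {platform for header, platform in HEADER_RULES.items() if header in present}
```

Only header presence is checked here; value-based checks on `server` or `x-powered-by` would need their own rules.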

5

Aux resources — robots.txt & sitemap.xml

These files are rarely behind WAF protection. A sitemap URL like https://cdn.lovable.dev/… or a robots.txt disallow entry for /wp-admin reveals the platform without ever loading the main page.
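
As an illustration, the two robots.txt hints mentioned above can be read line by line. The function name and the two rules are hypothetical and stand in for a longer rule set.

```python
def robots_hints(robots_txt: str) -> set[str]:
    """Extract platform hints from the text of a robots.txt file."""
    hints = set()
    for line in robots_txt.splitlines():
        # Split "Field: value" at the first colon only, so URLs stay intact.
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "disallow" and value.startswith("/wp-admin"):
            hints.add("WordPress")
        if field == "sitemap" and "lovable.dev" in value.lower():
            hints.add("Lovable")
    return hints
```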

6

WHOIS / RDAP domain age

We query the RDAP registry for the domain's registration date. Domains registered in the past year score higher — not because new domains are always vibe coded, but because the overwhelming majority of AI-built projects use freshly registered domains.
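
RDAP responses are JSON and list lifecycle events, including one whose `eventAction` is `registration`. The sketch below assumes the response has already been parsed into a dict and that `eventDate` carries a timezone; the function names and the 365-day threshold are illustrative.

```python
from datetime import datetime
from typing import Optional


def registration_date(rdap: dict) -> Optional[datetime]:
    """Return the registration timestamp from a parsed RDAP domain response."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == "registration":
            # Normalise a trailing "Z" so older Python versions can parse it.
            return datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
    return None


def is_recent(rdap: dict, now: datetime, max_age_days: int = 365) -> bool:
    """True if the domain was registered fewer than max_age_days before now."""
    registered = registration_date(rdap)
    return registered is not None and (now - registered).days < max_age_days
```

Passing `now` explicitly (as a timezone-aware datetime) keeps the check deterministic and easy to test.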

7

Wayback Machine fallback

When both the live page fetch and HEAD request fail, we query the Internet Archive for the most recent snapshot of the URL. Archive.org serves the stored HTML directly — bypassing any WAF on the origin. We strip the Wayback toolbar (using the id_ flag) so the result is clean captured HTML.

If a Wayback snapshot is used, the result card shows a blue banner noting that results may reflect an older version of the site.
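
A sketch of how the raw snapshot URL might be built from the Internet Archive's availability API response. The helper names are hypothetical; the `id_` suffix after the timestamp asks the Wayback Machine for the captured HTML without its toolbar.

```python
from typing import Optional
from urllib.parse import quote


def availability_url(url: str) -> str:
    """URL of the Wayback availability lookup for the given page."""
    return "https://archive.org/wayback/available?url=" + quote(url, safe="")


def raw_snapshot_url(availability: dict, original_url: str) -> Optional[str]:
    """Build the toolbar-free snapshot URL from a parsed availability response."""
    closest = availability.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # no usable snapshot for this URL
    # Appending "id_" to the timestamp requests the original captured content.
    return f"https://web.archive.org/web/{closest['timestamp']}id_/{original_url}"
```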

8

Confidence score

Every signal has a weight. The final score is the sum, clamped to 0–100:

Signal                                               Score
Direct builder fingerprint (HTML/DNS)                +40 per tool
shadcn/ui component library detected                 +20
Vibe Design pattern (gradient, glassmorphism…)       +15 each
Domain registered < 1 year ago                       +15
AI buzzword density in copy (≥ 3 distinct)           +10
Traditional builder detected (WordPress, Shopify…)   −35 per builder

  • 0–24%: Unlikely
  • 25–49%: Possibly
  • 50–74%: Likely
  • 75–100%: Almost certainly
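
The weighted sum and clamp can be sketched as follows. The weights and bands come from the scoring rules in this section; the signal keys and function names are hypothetical, and each count is the number of times a signal fired.

```python
# Weights from the scoring rules above; the key names are illustrative.
WEIGHTS = {
    "builder_fingerprint": 40,   # per tool
    "shadcn_ui": 20,
    "vibe_design_pattern": 15,   # each
    "recent_domain": 15,
    "ai_buzzwords": 10,
    "traditional_builder": -35,  # per builder
}


def confidence(counts: dict[str, int]) -> int:
    """Sum weight * count over all fired signals, clamped to the range 0-100."""
    raw = sum(WEIGHTS[name] * count for name, count in counts.items())
    return max(0, min(100, raw))


def verdict(score: int) -> str:
    """Map a clamped score to its label."""
    if score >= 75:
        return "Almost certainly"
    if score >= 50:
        return "Likely"
    if score >= 25:
        return "Possibly"
    return "Unlikely"
```

For instance, one builder fingerprint, shadcn/ui, two design patterns, and a recent domain sum to 105, which clamps to 100. A shadcn/ui hit offset by one traditional builder sums to −15, which clamps to 0.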

Try it on any website

Paste a URL and see the full score breakdown — which signals fired, what the DNS says, and whether Wayback was used.
