How the detector works
When you paste a URL, we run several parallel checks and combine the results into a single confidence score. Here's exactly what happens.
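The "parallel checks, combined afterwards" design can be sketched with asyncio. This is a minimal illustration, not the detector's actual code. The check names, the `run_checks` helper and the two stand-in checks are hypothetical. The key choice is `return_exceptions=True`: one failing check (for example, a fetch blocked by a WAF) is recorded as an error instead of cancelling the DNS and other lookups.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical check signature: takes the URL, returns a dict of findings.
Check = Callable[[str], Awaitable[Dict[str, Any]]]


async def run_checks(url: str, checks: Dict[str, Check]) -> Dict[str, Any]:
    """Run every check concurrently; a failing check is recorded, not fatal."""
    names = list(checks)
    results = await asyncio.gather(
        *(checks[name](url) for name in names),
        return_exceptions=True,  # exceptions are returned as values instead of propagating
    )
    combined: Dict[str, Any] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            combined[name] = {"error": repr(result)}
        else:
            combined[name] = result
    return combined


# Stand-in checks for illustration only.
async def fake_fetch(url: str) -> Dict[str, Any]:
    raise ConnectionError("blocked by WAF")


async def fake_dns(url: str) -> Dict[str, Any]:
    await asyncio.sleep(0)  # pretend to do I/O
    return {"cname": "example.herokudns.com"}
```

Calling `asyncio.run(run_checks(url, {"fetch": fake_fetch, "dns": fake_dns}))` still returns the DNS finding even though the fetch raised.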
Fetching the page
We send a request that mimics a real browser, with the same User-Agent, Accept and Accept-Language headers Chrome sends. This persuades most servers to return the full page rather than an empty or challenge response.
If the page is protected by a Cloudflare WAF or returns an error, we don't stop — we continue with the other checks below and flag the fetch failure in the result.
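A browser-like fetch that never aborts the run might look like the sketch below. The header values are assumptions because Chrome's exact strings vary by version and platform. `build_request` and `fetch_page` are hypothetical names. Errors become a `fetch_failed` flag instead of an exception. Note that plain urllib cannot solve JavaScript challenges, so a WAF-protected page will usually land in the failure branch.

```python
import urllib.error
import urllib.request
from typing import Any, Dict

# Assumed header values; a real Chrome build sends version-specific strings.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str, method: str = "GET") -> urllib.request.Request:
    """Build a request carrying the browser-like headers (method may be "HEAD")."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method=method)


def fetch_page(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch the page; on any error return a flagged result instead of raising."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return {"html": body, "status": resp.status, "fetch_failed": False}
    except urllib.error.HTTPError as err:  # e.g. a 403 challenge page; must precede URLError
        return {"html": None, "status": err.code, "fetch_failed": True}
    except (urllib.error.URLError, OSError, ValueError) as err:  # DNS, refused, timeout, bad URL
        return {"html": None, "status": None, "fetch_failed": True, "error": str(err)}
```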
HTML fingerprint matching
We scan the HTML for over 100 patterns that AI builders leave behind:
- Generator meta tags, such as <meta name="generator" content="Lovable">
- Script and asset paths specific to each builder, such as /lovable.dev/ CDN URLs
- Data attributes injected by builder runtimes, such as data-lovable, data-slot=
- Component library signatures: shadcn/ui, Radix UI, Lucide icons, cmdk
- Visual design patterns in Tailwind classes: from-purple-500, backdrop-blur-xl, bg-clip-text text-transparent
- Emoji density in headings: ✨ and 🚀 appear in a majority of Lovable/v0 landing pages
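A few of the patterns above can be expressed as regular expressions, as in the sketch below. It covers only a handful of illustrative entries, not the detector's 100+ patterns. The group names (`"vibe-glass"` and so on) and the `match_fingerprints` / `emoji_heading_ratio` helpers are hypothetical. Regex scanning of raw HTML is a deliberately cheap approximation, and a DOM parser would be more robust.

```python
import re
from typing import Dict, List

# Illustrative subset of fingerprints, grouped by what they indicate.
FINGERPRINTS: Dict[str, List[re.Pattern]] = {
    "lovable": [
        re.compile(r'<meta[^>]+name="generator"[^>]+content="Lovable"', re.I),
        re.compile(r"lovable\.dev/", re.I),
        re.compile(r"\bdata-lovable\b"),
    ],
    "shadcn/ui": [re.compile(r"\bdata-slot=")],
    "vibe-gradient": [re.compile(r"\bfrom-purple-500\b")],
    "vibe-glass": [re.compile(r"\bbackdrop-blur-xl\b")],
    "vibe-gradient-text": [re.compile(r"\bbg-clip-text\b[^\"']*\btext-transparent\b")],
}

HEADING_RE = re.compile(r"<h[1-6][^>]*>(.*?)</h[1-6]>", re.I | re.S)


def match_fingerprints(html: str) -> Dict[str, List[str]]:
    """Return each fingerprint group that matched, with the patterns that fired."""
    hits: Dict[str, List[str]] = {}
    for name, patterns in FINGERPRINTS.items():
        found = [p.pattern for p in patterns if p.search(html)]
        if found:
            hits[name] = found
    return hits


def emoji_heading_ratio(html: str, emoji: str = "✨🚀") -> float:
    """Fraction of <h1>..<h6> headings containing any of the given emoji."""
    headings = HEADING_RE.findall(html)
    if not headings:
        return 0.0
    with_emoji = sum(1 for h in headings if any(e in h for e in emoji))
    return with_emoji / len(headings)
```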
DNS lookups (TXT, CNAME, NS)
DNS lookups run in parallel with the page fetch, so even when a site blocks our crawler, we still get DNS data. We query five record types, including:
- TXT records: platform verification strings such as vc-domain-verify (Vercel), netlify-domain-verification and squarespace-domain-verification
- CNAME records: reveal the hosting platform (e.g. a CNAME pointing to herokudns.com)
- NS records: nameserver patterns identify Cloudflare, Vercel or Netlify hosting
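Once the records are fetched, classifying them is a substring match per record type, sketched below. Fetching is out of scope here; a DNS library such as dnspython (`dns.resolver.resolve(name, "TXT")`) is one option. The rule table and `classify_dns` are hypothetical. The Vercel and Netlify CNAME targets and the Cloudflare and Vercel nameserver suffixes are assumptions added alongside the strings named above.

```python
from typing import Dict, List, Set, Tuple

# Illustrative rules: (record type, lowercase substring, platform).
DNS_RULES: List[Tuple[str, str, str]] = [
    ("TXT", "vc-domain-verify", "Vercel"),
    ("TXT", "netlify-domain-verification", "Netlify"),
    ("TXT", "squarespace-domain-verification", "Squarespace"),
    ("CNAME", "herokudns.com", "Heroku"),
    ("CNAME", "vercel-dns.com", "Vercel"),      # assumed CNAME target
    ("CNAME", "netlify.app", "Netlify"),        # assumed CNAME target
    ("NS", "ns.cloudflare.com", "Cloudflare"),  # assumed nameserver suffix
    ("NS", "vercel-dns.com", "Vercel"),         # assumed nameserver suffix
]


def classify_dns(records: Dict[str, List[str]]) -> Set[str]:
    """records maps a record type ("TXT", "CNAME", "NS") to its answer strings."""
    platforms: Set[str] = set()
    for rtype, needle, platform in DNS_RULES:
        for value in records.get(rtype, []):
            if needle in value.lower():
                platforms.add(platform)
    return platforms
```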
HTTP headers
Response headers frequently reveal the platform even when the HTML is minimal. We check server, x-powered-by, x-vercel-id, x-nf-request-id (Netlify), and dozens more. When the main GET is blocked, a lightweight HEAD request often still returns these headers.
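Header matching can be sketched as two small tables: headers whose mere presence is a signal, and headers whose value carries the platform name. The tables and `classify_headers` are hypothetical. The `server` and `x-powered-by` values listed are common defaults (Vercel, Netlify and Cloudflare set `server`; Next.js and Express set `x-powered-by`), but sites can change or strip them. The same function works on the headers from a HEAD request, e.g. one built with `urllib.request.Request(url, method="HEAD")`.

```python
from typing import Dict, List, Set, Tuple

# Headers whose presence alone points at a platform.
HEADER_PRESENCE: Dict[str, str] = {
    "x-vercel-id": "Vercel",
    "x-nf-request-id": "Netlify",
}

# (header name, lowercase substring of value, platform); illustrative subset.
HEADER_VALUE: List[Tuple[str, str, str]] = [
    ("server", "vercel", "Vercel"),
    ("server", "netlify", "Netlify"),
    ("server", "cloudflare", "Cloudflare"),
    ("x-powered-by", "next.js", "Next.js"),
    ("x-powered-by", "express", "Express"),
]


def classify_headers(headers: Dict[str, str]) -> Set[str]:
    """Match response headers case-insensitively against the tables above."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    found = {platform for name, platform in HEADER_PRESENCE.items() if name in lowered}
    for name, needle, platform in HEADER_VALUE:
        if needle in lowered.get(name, ""):
            found.add(platform)
    return found
```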
Aux resources — robots.txt & sitemap.xml
These files are rarely behind WAF protection. A sitemap URL like https://cdn.lovable.dev/… or a robots.txt disallow entry for /wp-admin reveals the platform without ever loading the main page.
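Both files are plain text and easy to scan, as the sketch below shows. The two hint tables contain only the examples from the paragraph above. `hints_from_sitemap` and `hints_from_robots` are hypothetical helpers. The sitemap scan matches `<loc>` hostnames by domain suffix, and the robots scan looks only at `Disallow` lines, ignoring trailing comments.

```python
import re
from typing import Dict, Set
from urllib.parse import urlparse

# Hypothetical hint tables built from the two examples in the text.
SITEMAP_HOSTS: Dict[str, str] = {"lovable.dev": "Lovable"}
ROBOTS_PATHS: Dict[str, str] = {"/wp-admin": "WordPress"}

LOC_RE = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.I | re.S)


def hints_from_sitemap(xml: str) -> Set[str]:
    """Platforms whose domain (or a subdomain of it) appears in a <loc> URL."""
    found: Set[str] = set()
    for loc in LOC_RE.findall(xml):
        host = (urlparse(loc).hostname or "").lower()
        for domain, platform in SITEMAP_HOSTS.items():
            if host == domain or host.endswith("." + domain):
                found.add(platform)
    return found


def hints_from_robots(text: str) -> Set[str]:
    """Platforms implied by Disallow paths in robots.txt."""
    found: Set[str] = set()
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        path = value.split("#", 1)[0].strip()  # drop trailing comments
        for prefix, platform in ROBOTS_PATHS.items():
            if path.startswith(prefix):
                found.add(platform)
    return found
```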
WHOIS / RDAP domain age
We query the RDAP registry for the domain's registration date. Domains registered in the past year score higher — not because new domains are always vibe coded, but because the overwhelming majority of AI-built projects use freshly registered domains.
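An RDAP domain response (RFC 9083) lists lifecycle events, and the one with `eventAction` equal to `"registration"` carries the registration date. The sketch below covers only parsing and scoring. How the response is fetched is left open; the public rdap.org redirector is one assumed option. The helper names are hypothetical, and "under one year" is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def registration_date(rdap: Dict[str, Any]) -> Optional[datetime]:
    """Return the 'registration' event date from an RDAP domain response, if any."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == "registration":
            # fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
            parsed = datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
            if parsed.tzinfo is None:  # treat naive timestamps as UTC
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    return None


def domain_age_points(rdap: Dict[str, Any], now: Optional[datetime] = None,
                      points: int = 15) -> int:
    """Award `points` when the domain was registered less than ~1 year before `now`."""
    registered = registration_date(rdap)
    if registered is None:
        return 0
    now = now or datetime.now(timezone.utc)
    return points if now - registered < timedelta(days=365) else 0
```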
Wayback Machine fallback
When both the live page fetch and HEAD request fail, we query the Internet Archive for the most recent snapshot of the URL. Archive.org serves the stored HTML directly — bypassing any WAF on the origin. We strip the Wayback toolbar (using the id_ flag) so the result is clean captured HTML.
If a Wayback snapshot is used, the result card shows a blue banner noting that results may reflect an older version of the site.
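One way to locate and request a clean snapshot is sketched below, assuming the Internet Archive's public availability endpoint (`https://archive.org/wayback/available?url=…`). It returns JSON whose `archived_snapshots.closest` entry holds a `timestamp`. Appending `id_` to that timestamp in a `web.archive.org/web/` URL asks for the originally captured bytes, without the toolbar. The helper names are hypothetical, and the network calls themselves are omitted.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """URL for asking the availability API about the closest snapshot of `url`."""
    return f"{AVAILABILITY_API}?{urlencode({'url': url})}"


def raw_snapshot_url(availability: Dict[str, Any], url: str) -> Optional[str]:
    """Build a toolbar-free (id_) snapshot URL from an availability response, if any."""
    closest = (availability.get("archived_snapshots") or {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return f"https://web.archive.org/web/{closest['timestamp']}id_/{url}"
```

When `raw_snapshot_url` returns a URL and its fetch succeeds, the result would also carry a flag so the UI can show the older-version banner.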
Confidence score
Every signal has a weight. The final score is the sum, clamped to 0–100:
| Signal | Score |
|---|---|
| Direct builder fingerprint (HTML/DNS) | +40 per tool |
| shadcn/ui component library detected | +20 |
| Vibe Design pattern (gradient, glassmorphism…) | +15 each |
| Domain registered < 1 year ago | +15 |
| AI buzzword density in copy (≥ 3 distinct) | +10 |
| Traditional builder detected (WordPress, Shopify…) | −35 per builder |
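The table translates directly into a weighted sum with one clamp at the end. In the sketch below the weights come from the table. The `Signals` field names are hypothetical, and so is the choice to count per-tool and per-pattern signals as integers while the others are booleans.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    builder_fingerprints: int = 0   # distinct AI builders matched in HTML/DNS (+40 each)
    shadcn_ui: bool = False         # +20
    vibe_patterns: int = 0          # distinct Vibe Design patterns (+15 each)
    young_domain: bool = False      # registered < 1 year ago (+15)
    distinct_buzzwords: int = 0     # +10 once at least 3 distinct buzzwords appear
    traditional_builders: int = 0   # WordPress, Shopify... (-35 each)


def confidence_score(s: Signals) -> int:
    """Sum the weighted signals, then clamp the total to the range 0 to 100."""
    total = (
        40 * s.builder_fingerprints
        + (20 if s.shadcn_ui else 0)
        + 15 * s.vibe_patterns
        + (15 if s.young_domain else 0)
        + (10 if s.distinct_buzzwords >= 3 else 0)
        - 35 * s.traditional_builders
    )
    return max(0, min(100, total))
```

For example, one builder fingerprint plus shadcn/ui plus two Vibe Design patterns gives 40 + 20 + 30 = 90. Because clamping happens only once, after summing, a WordPress detection (-35) can still pull an otherwise high total back down.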
Try it on any website
Paste a URL and see the full score breakdown — which signals fired, what the DNS says, and whether Wayback was used.