About StackScopeBot
StackScopeBot is our crawler that analyses new product launches to detect their tech stacks.
User-Agent
StackScopeBot/1.0 (+https://stackscope.dev/bot)
What it does
- Fetches the main page of each product launch (one HTML request)
- Renders the page in a headless browser to capture a screenshot and the final DOM (this executes JavaScript)
- Fetches up to 5 linked stylesheets to analyse CSS
- Checks a few well-known files: robots.txt, llms.txt, security.txt, sitemap.xml, ads.txt, humans.txt
- Looks up DNS records (MX, TXT, NS, CNAME, DNSKEY) for the domain
- Performs a TLS handshake to read the SSL certificate
- Looks up the domain's ASN (via Team Cymru DNS) and registration date (via RDAP)
- Fetches privacy policy and terms of service pages if linked from the homepage
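The fetching steps above can be sketched in a few lines. This is an illustrative outline only, not StackScope's actual implementation: the function names and the exact file list handling are assumptions, and the User-Agent string is the one documented above.

```python
import urllib.request

# User-Agent string documented above
USER_AGENT = "StackScopeBot/1.0 (+https://stackscope.dev/bot)"

# Root-level well-known files the bot checks (list from this page)
WELL_KNOWN = ["robots.txt", "llms.txt", "security.txt",
              "sitemap.xml", "ads.txt", "humans.txt"]

def well_known_urls(domain: str) -> list[str]:
    """URLs the crawler would probe for a given domain (root-level only)."""
    return [f"https://{domain}/{name}" for name in WELL_KNOWN]

def build_request(url: str) -> urllib.request.Request:
    """An HTTP request that identifies itself with the bot's User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

A request built this way lets site owners recognise (and, if they wish, block) the bot by its User-Agent in server logs.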
What it doesn't do
- Does not spider your site; it fetches only the homepage and explicitly linked legal/well-known pages
- Does not submit forms, click buttons, or interact with the page
- Does not attempt logins or access authenticated areas
- Re-crawls only on explicit user request, not automatically
Crawling etiquette
- Default rate: at most 1 request every 3 seconds globally, never parallel requests to the same domain
- Respects robots.txt rules for both StackScopeBot and wildcard (*) user agents
- Honours Crawl-delay directives; if your robots.txt specifies a longer delay, we follow it
- Does not attempt to crawl sensitive paths (admin, login, API endpoints)
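The rate rules above (a 3-second default interval, with a longer robots.txt Crawl-delay taking precedence) could be sketched as a small limiter. This is a hypothetical illustration, not StackScope's code; the class and method names are invented.

```python
import time

class RateLimiter:
    """Allow at most one request every `min_interval` seconds."""

    def __init__(self, min_interval: float = 3.0):
        self.min_interval = min_interval
        self._last = 0.0

    def effective_delay(self, crawl_delay=None) -> float:
        # Honour a robots.txt Crawl-delay only when it is *longer*
        # than the default interval, per the etiquette rules above.
        if crawl_delay is not None and crawl_delay > self.min_interval:
            return float(crawl_delay)
        return self.min_interval

    def wait(self, crawl_delay=None) -> None:
        """Sleep until the next request is permitted."""
        delay = self.effective_delay(crawl_delay)
        remaining = self._last + delay - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `wait()` before every fetch enforces the global interval; passing a site's Crawl-delay stretches it when the site asks for more.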
How to control access
To block StackScopeBot entirely, add this to your robots.txt:
User-agent: StackScopeBot
Disallow: /
To request a slower crawl rate:
User-agent: StackScopeBot
Crawl-delay: 10
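If you want to check how rules like these would be interpreted, Python's standard `urllib.robotparser` follows the same matching conventions most crawlers use. A small sketch (the example rules and URLs are illustrative, not a statement about StackScopeBot's parser):

```python
from urllib import robotparser

# Example robots.txt rules for StackScopeBot: slow it down and
# keep it out of /private/
rules = """\
User-agent: StackScopeBot
Crawl-delay: 10
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The homepage is still allowed, /private/ is not,
# and the Crawl-delay is picked up as 10 seconds.
rp.can_fetch("StackScopeBot/1.0", "https://example.com/")           # True
rp.can_fetch("StackScopeBot/1.0", "https://example.com/private/x")  # False
rp.crawl_delay("StackScopeBot/1.0")                                 # 10
```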
Removal requests
If you'd like your site removed from StackScope, or have any questions about our crawler, please contact us at [email protected].