7 min read · Legal · GDPR · Compliance

Auto-Generated Legal Pages: What's in Them, What's Not

Every SOSEI rebuild ships with privacy policy, terms of service, cookie policy, impressum, accessibility statement, and a 404 — generated from real site data, not boilerplate. What that includes, what it doesn't, and when you still need a lawyer.

Roughly nine out of ten small-business websites in the EU are operating with at least one missing or outdated legal page. The most common gap is an impressum that doesn’t list a current company register number; the second most common is a cookie policy that names cookies the site stopped using in 2021; the third is a privacy policy lifted from a competitor’s site and never adapted to the actual data the site collects.

Each of these is a soft enforcement target. None typically results in a catastrophic fine on its own, but each makes the site actionable in a wider GDPR or competition complaint. Closing them is cheap. SOSEI generates all six legal pages on every rebuild, in one parallel AI call, from real source-site data.

  • Privacy policy: what data, why, how long.
  • Terms of service: what the site promises.
  • Cookie policy: each cookie named.
  • Impressum: company register data.
  • Accessibility: WCAG conformance statement.
  • 404 page: on-brand error recovery.

Each page is generated from real scrape data.
The six legal pages SOSEI generates on every rebuild — what each one contains and what it doesn't cover.

Privacy policy

The privacy policy explains what personal data the site collects, why, how long it is retained, and what rights the visitor has under GDPR (access, rectification, erasure, portability, objection). The generated version is built from the actual data the site collects:

  • Contact form submissions — name, email, message, optional phone — retained for 24 months unless the user requests erasure.
  • The specific analytics in use (Google Analytics, Meta Pixel, etc., named individually from the scraped tracking codes).
  • The legal basis for each processing purpose (consent for analytics, legitimate interest for contact-form replies).
  • The data controller’s name and contact, populated from the impressum data on the source site.

What it doesn’t cover: anything the source site doesn’t actually do. If the original site has a hidden CRM integration or a server-side marketing-automation tool the scraper can’t see, that has to be added manually.
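As a rough illustration of how scrape findings could map onto policy entries, here is a minimal sketch. The `scrape` dictionary shape, the `KNOWN_TRACKERS` table, and the function name are hypothetical and not SOSEI's actual data model; the legal bases and the 24-month retention mirror the examples in the list above.

```python
from dataclasses import dataclass

# Hypothetical lookup: scraped tracker key -> (display name, legal basis).
KNOWN_TRACKERS = {
    "google_analytics": ("Google Analytics", "consent"),
    "meta_pixel": ("Meta Pixel", "consent"),
}


@dataclass
class ProcessingEntry:
    purpose: str
    data: list[str]
    legal_basis: str
    retention: str


def build_processing_entries(scrape: dict) -> list[ProcessingEntry]:
    """Turn scrape findings into policy entries; tools the scrape missed yield nothing."""
    entries = []
    form_fields = scrape.get("contact_form_fields") or []
    if form_fields:
        entries.append(ProcessingEntry(
            purpose="Replying to contact-form submissions",
            data=list(form_fields),
            legal_basis="legitimate interest",
            retention="24 months unless erasure is requested",
        ))
    for key in scrape.get("trackers") or []:
        if key not in KNOWN_TRACKERS:
            continue  # unknown tools are left for manual review
        name, basis = KNOWN_TRACKERS[key]
        entries.append(ProcessingEntry(
            purpose=f"Measurement via {name}",
            data=["usage data"],
            legal_basis=basis,
            retention="see cookie policy",
        ))
    return entries
```

Because the function only emits entries for what the scrape actually reports, a hidden CRM or server-side tool never appears in the output, which is why such integrations have to be added by hand.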

Terms of service

For most marketing sites this is the simplest of the six. The generated terms cover the relationship between the site owner and the visitor: ownership of content, acceptable use, disclaimers of warranty for the information presented, limitation of liability, and governing law (defaulted to the country of the impressum).

When the site sells products or services with binding consumer contracts (SaaS subscriptions, e-commerce orders, etc.), the generated terms are a starting point only — the specifics of cancellation, refund, and dispute resolution need legal review before you ship to real customers.

Cookie policy

Generated from the actual cookies the scraper detected on the source site. Each cookie is named, categorised (strictly necessary, analytics, marketing, preferences), described in plain language, and given a retention period. Examples from a typical scrape:

  • _ga — Google Analytics user ID, 24 months.
  • _fbp — Meta Pixel browser ID, 90 days.
  • _hjSession_* — Hotjar session, 30 minutes.
  • cookieconsent — consent state, 12 months.
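The naming and categorising step can be pictured as a small rule table matched against detected cookie names. This is only a sketch: the `COOKIE_RULES` table and `describe_cookie` function are hypothetical, and the category assigned to each cookie is an assumption for illustration. The descriptions and retention periods come from the four examples above.

```python
import fnmatch

# Hypothetical rules: (name pattern, category, description, retention).
# Categories are assumed here; descriptions and retention follow the examples.
COOKIE_RULES = [
    ("_ga", "analytics", "Google Analytics user ID", "24 months"),
    ("_fbp", "marketing", "Meta Pixel browser ID", "90 days"),
    ("_hjSession_*", "analytics", "Hotjar session", "30 minutes"),
    ("cookieconsent", "strictly necessary", "Consent state", "12 months"),
]


def describe_cookie(name: str) -> dict:
    """Return a policy row for a cookie, or flag it for review if no rule matches."""
    for pattern, category, description, retention in COOKIE_RULES:
        if fnmatch.fnmatchcase(name, pattern):
            return {
                "name": name,
                "category": category,
                "description": description,
                "retention": retention,
                "needs_review": False,
            }
    # No rule matched: do not guess a purpose, surface it to a human instead.
    return {
        "name": name,
        "category": "unclassified",
        "description": None,
        "retention": None,
        "needs_review": True,
    }
```

Flagging unmatched cookies instead of guessing keeps the policy from describing a cookie's purpose incorrectly.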

This page exists for compliance, but it’s also genuinely useful: visitors who read it learn what they’re consenting to. A vague “we use cookies for various purposes” page is increasingly treated as non-compliant by EU regulators.

Impressum

Mandatory in Germany (DDG § 5, which replaced TMG § 5 in May 2024) and Austria (ECG § 5), and broadly recommended across the EU under the e-Commerce Directive. The impressum is the “name and address of the company that operates this website” page. SOSEI extracts:

  • Company name and legal form (OÜ, GmbH, Oy, AS, etc.).
  • Registered office address.
  • Commercial register number and registering court.
  • VAT number where present.
  • Email and phone contact.
  • Authorised representative (managing director / board).

The data is pulled from the source site’s existing impressum or footer. If a field is missing on the source, it’s flagged in the dashboard for owner input rather than fabricated.
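The "flag, don't fabricate" rule is easy to express in code. The sketch below assumes a hypothetical `Impressum` record whose fields follow the list above; the field names and the `missing_fields` helper are illustrative, not SOSEI's schema. The VAT number is treated as optional because the list says "where present".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Impressum:
    company_name: Optional[str] = None
    legal_form: Optional[str] = None
    registered_address: Optional[str] = None
    register_number: Optional[str] = None
    registering_court: Optional[str] = None
    vat_number: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    authorised_representative: Optional[str] = None


# Listed as "where present", so an empty value is not an error.
OPTIONAL_FIELDS = {"vat_number"}


def missing_fields(imp: Impressum) -> list[str]:
    """Names of required fields that are empty and need owner input."""
    return [
        f.name
        for f in fields(imp)
        if f.name not in OPTIONAL_FIELDS
        and not (getattr(imp, f.name) or "").strip()
    ]
```

A dashboard could then show one prompt per name returned by `missing_fields` and leave every other value exactly as scraped.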

Accessibility statement

Required under the European Accessibility Act for many consumer-facing services, e-commerce in particular, since 28 June 2025; microenterprises that provide services are exempt. The statement declares:

  • Which WCAG version and level the site conforms to (2.1 AA).
  • The date the conformance was last evaluated.
  • The method used (automated tooling + manual review).
  • A feedback channel for users who encounter accessibility barriers.
  • Known issues, if any, and the timeline for fixing them.
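The five items above amount to a small record plus a renderer. A minimal sketch, assuming a hypothetical `AccessibilityStatement` class (the names and output wording are illustrative, not the generated page's actual text):

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AccessibilityStatement:
    standard: str                 # e.g. "WCAG 2.1 AA"
    evaluated_on: date            # date of the last conformance evaluation
    method: str                   # how conformance was checked
    feedback_contact: str         # channel for reporting barriers
    # Each known issue is a (description, planned fix date) pair.
    known_issues: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"Conformance target: {self.standard}",
            f"Last evaluated: {self.evaluated_on.isoformat()}",
            f"Evaluation method: {self.method}",
            f"Feedback: {self.feedback_contact}",
        ]
        if self.known_issues:
            lines.append("Known issues:")
            lines += [f"  - {issue} (fix planned: {when})"
                      for issue, when in self.known_issues]
        else:
            lines.append("Known issues: none reported")
        return "\n".join(lines)
```

Keeping the evaluation date as a real `date` value means the statement can be regenerated on each rebuild without anyone editing the text by hand.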

SOSEI’s renderer enforces WCAG 2.1 AA at the token level (see WCAG 2.1 AA checklist), so the generated statement can declare conformance honestly — unlike many sites where the statement is aspirational.

404 page

Not strictly “legal,” but generated in the same call because it follows the same pattern: page-shaped content the LLM needs to write once per language, per brand voice. The generated 404 carries the site’s typography, includes a helpful explanation, and links back to the homepage and main sections — rather than dumping a default Apache error.

How they’re generated

All six pages are written in a single Claude Haiku call that runs in parallel with the main site generation. The input is a compact JSON document of the source-site metadata (company name, address, tracking codes, contact details, detected language); the output is six structured page contents.
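To make the input and output shapes concrete, here is a hedged sketch of both ends of that call. The key names in `INPUT_KEYS` and `PAGE_KEYS` are assumptions chosen to match the description above; the real schema is not published in this article.

```python
import json

# Hypothetical key names for the metadata document and the six outputs.
INPUT_KEYS = ("company_name", "address", "tracking_codes", "contact", "language")
PAGE_KEYS = (
    "privacy_policy",
    "terms_of_service",
    "cookie_policy",
    "impressum",
    "accessibility_statement",
    "not_found",
)


def build_input(meta: dict) -> str:
    """Serialise only the expected metadata keys as compact JSON."""
    doc = {key: meta.get(key) for key in INPUT_KEYS}
    return json.dumps(doc, ensure_ascii=False, separators=(",", ":"))


def validate_pages(raw: str) -> dict:
    """Parse model output and check that all six pages are present and non-empty."""
    pages = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(pages, dict):
        raise ValueError("expected a JSON object with one key per page")
    missing = [key for key in PAGE_KEYS if not pages.get(key)]
    if missing:
        raise ValueError(f"missing pages: {missing}")
    return pages
```

Both failure modes surface as `ValueError`, so a caller can treat "did not parse" and "parsed but incomplete" the same way.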

Haiku is fast and cheap, but long-form structured JSON output occasionally fails to parse. SOSEI ships an auto-healing layer that re-invokes the call with the failed output as context and asks for repair. The recovery step runs inline at publish time and is also exposed as a manual button in the dashboard’s parity-warning banner — so no site ever ships with missing legal pages.
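The repair pattern described here (retry with the failed output as context) can be sketched in a few lines. This is not SOSEI's implementation: `call_model` is a stand-in for whatever function sends a prompt to the model and returns its text, and the repair prompt wording and `max_repairs` default are assumptions.

```python
import json
from typing import Any, Callable


def generate_with_repair(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw text out
    prompt: str,
    max_repairs: int = 2,
) -> Any:
    """Call the model, and on a JSON parse failure ask it to repair its own output."""
    raw = call_model(prompt)
    for attempt in range(max_repairs + 1):
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            if attempt == max_repairs:
                raise  # give up and let the caller surface a warning
            repair_prompt = (
                "The following output was meant to be valid JSON but failed to parse.\n"
                f"Parser error: {err}\n"
                "Return the corrected JSON only.\n\n"
                f"{raw}"
            )
            raw = call_model(repair_prompt)
```

Because the model is passed in as a plain callable, the same function can back both the inline publish-time step and a manual retry button, and it can be exercised in tests with a fake that returns broken JSON on the first call.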

When you still need a lawyer

The generated pages cover the legal baseline for a marketing site. They are not a substitute for review when:

  • You operate in a regulated industry (healthcare, financial services, legal services, insurance).
  • You process special categories of data (health, biometric, political, religious, sexual orientation) or data about criminal convictions and offences.
  • You sell physical goods cross-border to consumers and need country-specific cancellation rights and warranty terms.
  • You operate a B2B SaaS with binding subscription terms.
  • You process data on behalf of others as a data processor.

For everything else — the local service business, the consultancy, the agency, the cafe — the auto-generated pages are the modern equivalent of a well-formed default. Better than missing, better than copied from a competitor, and easy to keep current as the site changes.

Curious what the legal pages would look like for your specific site? Start a project — the generated set is in your dashboard within three minutes.
