Control what appears in search results

You can control which pages and content appear in your search results by configuring blacklist, whitelist, and no-index rules.

To get started:

  • Go to Search Studio > Data Sources > Website Crawling and navigate to the Black- and Whitelisting settings.
  • Click "Add first rule".

Blacklisting and whitelisting

URL and XPath patterns use regular expressions. Escape special characters like []\^$.|?*+(){} with a backslash \.
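
The sketch below (Python, illustrative only, not part of the product) shows the difference escaping makes: an unescaped dot matches any character, while \. matches only a literal dot, and re.escape() can escape a whole pattern for you.

  import re

  url = "https://website.com/docs/report.pdf"

  # Unescaped, "." matches any character, so "reportXpdf" would also match.
  loose = re.compile(r"report.pdf")
  # Escaped, "\." matches only a literal dot.
  strict = re.compile(r"report\.pdf")

  print(bool(loose.search(url)), bool(strict.search(url)))  # True True

  # re.escape() escapes every special character in one go.
  print(re.escape("/pricing(2024)?.html"))  # /pricing\(2024\)\?\.html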

Blacklist URL patterns

Use blacklists to prevent the crawler from indexing certain pages or file types.

Examples:
  • Exclude admin pages: /admin/
  • Ignore PDF files: \.pdf$ (the dot is escaped because patterns are regular expressions)
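
As a rough illustration (Python; the helper name and list are assumptions, not the crawler's actual code), blacklist rules behave like regexes tested against each discovered URL, where any match excludes the page:

  import re

  # Patterns mirror the examples above.
  blacklist = [r"/admin/", r"\.pdf$"]

  def is_blacklisted(url: str) -> bool:
      return any(re.search(p, url) for p in blacklist)

  print(is_blacklisted("https://website.com/admin/users"))      # True
  print(is_blacklisted("https://website.com/files/guide.pdf"))  # True
  print(is_blacklisted("https://website.com/blog/post-1"))      # False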

Whitelist URL patterns

Use whitelists to limit crawling to specific parts of your site.

Examples:
  • Index only blog posts: /blog/
  • Index only product pages: /products/

Your root URL must match your whitelist pattern (e.g. https://website.com/fr/). If it doesn’t, the root will be excluded and no content will be indexed.
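
A minimal sketch of the same idea for whitelists (again illustrative Python, not product code), including the root-URL caveat: if the root URL matches no pattern, the crawl has no starting point.

  import re

  whitelist = [r"/blog/", r"/products/"]

  def is_whitelisted(url: str) -> bool:
      return any(re.search(p, url) for p in whitelist)

  root_url = "https://website.com/"
  if not is_whitelisted(root_url):
      print("Root URL excluded: no content will be indexed.")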

No-index URL patterns

Exclude specific pages from appearing in search results, but still allow their links to be followed.

Example:
  • Hide a landing page but follow its links: /landing-page/

Use no-index for pages that should not show up in search but contain links to other valuable content (e.g. blog landing pages).
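
The behavior can be sketched like this (illustrative Python; the in-memory index and queue are stand-ins for the crawler's internals): the page itself is skipped at indexing time, but its outgoing links still enter the crawl queue.

  import re

  noindex_patterns = [r"/landing-page/"]
  index, queue = [], []

  def crawl(url: str, links: list[str]) -> None:
      # Index the page only if no no-index pattern matches...
      if not any(re.search(p, url) for p in noindex_patterns):
          index.append(url)
      # ...but follow its links either way.
      queue.extend(links)

  crawl("https://website.com/landing-page/",
        ["https://website.com/blog/post-1"])
  print(index)  # [] - the landing page itself is not indexed
  print(queue)  # ['https://website.com/blog/post-1'] - its links are still crawled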

No-index XPaths

Hide pages based on their HTML structure instead of their URL. This is useful when the pages you want to exclude don't share a common URL pattern.

Example:
  • Exclude pages with a specific element: //div[@class='hidden-content']
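
Illustrative only, using lxml as a stand-in for the crawler's HTML parser: if the XPath matches anything in the fetched page, the page is treated as no-index.

  from lxml import html

  page = html.fromstring(
      "<html><body><div class='hidden-content'>internal</div></body></html>"
  )

  # Any match means the page is kept out of the index.
  should_noindex = bool(page.xpath("//div[@class='hidden-content']"))
  print(should_noindex)  # True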

Whitelist XPaths

Use XPath whitelists to limit indexing to specific content areas, even if the URLs vary.

Example:
  • Index only the main content: //section[@id='main-content']
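
Sketched the same way (lxml is again an assumption for demonstration), an XPath whitelist indexes only the matched content area rather than the whole page:

  from lxml import html

  page = html.fromstring(
      "<html><body>"
      "<nav>site menu</nav>"
      "<section id='main-content'><p>Product details</p></section>"
      "</body></html>"
  )

  matches = page.xpath("//section[@id='main-content']")
  indexed_text = matches[0].text_content() if matches else ""
  print(indexed_text)  # 'Product details' - the nav is ignored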

For a product category to show in search results, it needs to have a URL and a search result image. You can configure these in Data Platform > Ontology > Categories.