Skip to main content

Use XPaths to customize search results

XPaths are expressions that help you precisely identify elements on a web page. They are similar to CSS selectors but with more precision, especially useful when configuring search results. For example, the XPath //img will select all image elements on a page.

The Zoovu crawler can automatically select the best titles and images for your search results, but you can customize these by using XPaths in the Data Structuring > Content Extraction settings.

Title extraction XPath

You can set up general rules in the Crawler settings, or create more specific rules in the Result Groups page if you're using Result Groups.

Find and test XPaths

To find and test XPaths on your site:

  • Install XPath Helper - a Google Chrome extension.
  • Go to the page you want to extract content from.
  • Click the XPath Helper icon in Chrome to reveal the current XPath expression.
  • Hold the Shift key and hover over the elements on your site. The XPath Helper will highlight them and display the corresponding XPath query.
  • Simplify the XPath by shortening it. For example, you can remove everything before an ID element and start the XPath with //.
  • Copy the XPath and paste it into the relevant section in Zoovu Search under Data Structuring -> Content Extraction.

Example:

  • Original XPath: /html/body[@id='body']/main/section[@class='u-pb-xxl u-pb-xl--sm']/article[@class='flex container'][1]/div[@class='main-feature__content col-6 col-12-sm']
  • Shortened XPath: //*[@id='body']//div[contains(@class,'main-feature__content')]

Test your XPath

  • Copy your shortened XPath query.
  • Navigate to Search Studio > Data Structuring > Content Extraction.
  • Paste your XPath into the appropriate section (Title XPaths, Image XPaths, Include Content XPaths, or Exclude Content XPaths).
  • Click the "Test" button and enter your webpage URL to see if the XPath works correctly.

If everything is correct, the extracted content, headline, or image URL will appear below the test field. You can also index a single URL to check what content will be extracted from that page.

Common XPaths for search results

Here are some typical use cases for XPaths in search results:

XPath TypeDescription
Title XPathsDefault is //h1. Adjust according to your site structure, e.g. by using //title to capture the page title.
Image XPathsDefault is often //img[@id='main']/@src. For lazy-loaded images, use something like //div[@class='product-detail-images']//img/@data-src.
Include content XPathsUse to specify which content blocks should be indexed.
Exclude content XPathsUse to tell the crawler which content blocks to ignore.
Search snippet XPathBy default, Zoovu shows content around the matching search terms. You can also use meta descriptions with //meta[@name="description"]/@content.