Extraction rules

Extraction rules help you automatically assign attributes to products based on the data you import.

When you import data, the Data Platform creates extraction rules for you. You can edit these rules or add new ones to improve accuracy for your catalog.

To view or edit extraction rules:

  • Go to Ontology > Attributes and click the edit icon next to an attribute.

  • In the modal that opens, switch to the Extraction rules tab.

How extraction rules work

Each attribute in your ontology can have extraction rules. Rules define:

  • when the attribute should be extracted
  • which values should be extracted

By default, an attribute-level rule is created (e.g. match “Color” in the incoming text). If that rule matches, the system then evaluates the value-level rules (e.g. “Black”, “Red”, “Green”).

For example:

  • Attribute: Color
  • Input data: “Color: Black”
  • Attribute rule: match “Color”
  • Value rule: match “Black”
  • Result: Color = Black

The attribute-value pair is extracted only if both the attribute rule and a value rule match. Extraction rules are not case-sensitive.
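
The two-stage evaluation described above can be sketched in a few lines of Python. This is only an illustration, not the Data Platform's implementation: the function name extract_textual and its parameters are hypothetical, and plain substring matching stands in for whichever rule type is actually configured.

```python
from typing import Optional, Sequence


def extract_textual(text: str, attribute_keyword: str,
                    value_keywords: Sequence[str]) -> Optional[str]:
    """Return the first matching value, or None if either stage fails.

    Hypothetical helper: substring matching is an assumption made for brevity.
    """
    lowered = text.lower()  # rules are not case-sensitive
    # Stage 1: the attribute-level rule must match first.
    if attribute_keyword.lower() not in lowered:
        return None
    # Stage 2: value-level rules are evaluated only after stage 1 matched.
    for value in value_keywords:
        if value.lower() in lowered:
            return value
    return None


print(extract_textual("Color: Black", "Color", ["Black", "Red", "Green"]))
print(extract_textual("Finish: Black", "Color", ["Black", "Red", "Green"]))
```

The first call yields Black because both stages match. The second yields None because the attribute rule fails, so the value rules are never consulted.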

Tips for using extraction rules

  • Start with the rules generated by the Data Platform, then refine them for your catalog.
  • Use phrase match and contains for flexible matching, especially if product data varies in format.
  • Try regex for advanced cases, like extracting sizes or codes.
  • For boolean attributes, add rules for custom keywords or phrases that indicate true or false.
  • Test your rules on sample data to make sure they work as expected.

Rule types

When defining rules, choose how the input data should be matched:

  • matches – input must be an exact match. “Black” only matches if the data is exactly “Black”.
  • phrase match – input must contain the whole phrase as separate text (word boundaries respected). “Black” is extracted from “Night Black” but not from “Blackest”.
  • contains – input must contain the phrase anywhere; it may appear inside another word. “Black” is extracted from “Blackest Night”.
  • regular expression (regex) – advanced pattern matching, for example to detect sizes such as 32GB or 64GB.
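
To make the differences between the four rule types concrete, here is a rough Python approximation. The function rule_matches is hypothetical, the word-boundary logic for phrase match is an assumption about how boundaries are treated, and the platform's regex dialect may differ from Python's re module.

```python
import re


def rule_matches(rule_type: str, pattern: str, text: str) -> bool:
    """Approximate the four rule types; all comparisons ignore case."""
    if rule_type == "matches":
        # The whole input must equal the pattern.
        return text.lower() == pattern.lower()
    if rule_type == "phrase match":
        # The pattern must appear as separate text, not inside a longer word.
        bounded = r"(?<!\w)" + re.escape(pattern) + r"(?!\w)"
        return re.search(bounded, text, re.IGNORECASE) is not None
    if rule_type == "contains":
        # The pattern may appear anywhere, including inside another word.
        return pattern.lower() in text.lower()
    if rule_type == "regex":
        return re.search(pattern, text, re.IGNORECASE) is not None
    raise ValueError(f"unknown rule type: {rule_type}")


print(rule_matches("phrase match", "Black", "Night Black"))       # True
print(rule_matches("phrase match", "Black", "Blackest"))          # False
print(rule_matches("contains", "Black", "Blackest Night"))        # True
print(rule_matches("regex", r"\b\d+\s?GB\b", "Storage: 64GB"))    # True
```

The size pattern \b\d+\s?GB\b is one possible way to detect values such as 32GB or 64GB. Test any pattern on sample data before relying on it.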

Attribute types and their rules

Different types of attributes use extraction rules in different ways:

  • textual attributes: Rules apply both to the attribute itself (e.g., Color) and to each possible value (e.g. Black, Red, Green). Both must match for extraction to work.
  • numeric and untyped attributes: Only the attribute has rules. Once the attribute-level rule matches, any numeric (or raw) value present is extracted (e.g. “Weight: 2.3 kg”).
  • boolean attributes: Standard representations like true/false and yes/no are interpreted automatically. You can also define domain-specific triggers that set the value.
    • Example: if a garment description contains “UPF” → extract Sun Protection = True.
    • Example: phrases like “supports wireless charging” → extract Wireless charging = True.
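
For numeric attributes, the behavior can be pictured as follows. This sketch assumes that the first number after the matched attribute keyword is the one extracted and that units are ignored. Neither detail is confirmed here, and extract_numeric is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: a value is an optionally signed integer or decimal number.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def extract_numeric(text: str, attribute_keyword: str) -> Optional[float]:
    """Return the first number after the attribute keyword, or None."""
    # Only the attribute-level rule is checked; there are no value rules.
    match = re.search(re.escape(attribute_keyword), text, re.IGNORECASE)
    if match is None:
        return None
    # Look for a number starting right after the matched keyword.
    number = NUMBER.search(text, match.end())
    return float(number.group()) if number else None


print(extract_numeric("Weight: 2.3 kg", "Weight"))  # 2.3
print(extract_numeric("Height: 10 cm", "Weight"))   # None
```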

Example: Tagging wireless charging

  • Go to Ontology > Attributes.
  • Select the attribute Wireless charging.
  • Open Extraction rules.
  • Add rules to detect phrases such as “wireless charging” or “supports wireless charging”.
  • Set the rule so that the value is True when a phrase is found and False when it is not.
  • Products with these keywords are tagged with Wireless charging = True.
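
The effect of these steps can be approximated in Python as shown below. The trigger list mirrors the phrases from the example, while the function name tag_wireless_charging and the use of contains-style matching are assumptions for illustration.

```python
# Trigger phrases taken from the example above. With contains-style matching
# the longer phrase is redundant, but it is kept to mirror the configured rules.
TRUE_TRIGGERS = ["wireless charging", "supports wireless charging"]


def tag_wireless_charging(description: str) -> bool:
    """Return True if any trigger phrase occurs in the description."""
    lowered = description.lower()  # rules are not case-sensitive
    return any(trigger in lowered for trigger in TRUE_TRIGGERS)


print(tag_wireless_charging("Supports Wireless Charging up to 15W"))  # True
print(tag_wireless_charging("USB-C cable included"))                  # False
```

Note that simple keyword matching of this kind would also return True for a description such as “does not support wireless charging”, which is one reason to test rules on sample data.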