Clean and organize product data
With the Zoovu Data Platform, you can automate the process of cleaning and refining your data to make sure it’s accurate and ready for use in your catalog.
Use the tools in the Pipelines tab to pull in your data, apply rules to clean it up, and export it back in a usable format.
- Start by selecting a Reader to pull your data from sources like CSV, Google Sheets, or APIs.
- Use Processors to remove duplicates, fix formatting, and apply filters to improve data quality.
- Finally, choose a Writer to export your cleaned data or update your catalog directly in Zoovu.
Start your data cleanup process
To get started with cleaning your product data, you need to create a data pipeline and set up the steps required to extract, process, and output your data.
- Create a new data pipeline in the Pipelines tab of the Data Platform.
- Select the Blank template.
- Click Add first step to upload your product data. Choose the option that matches where your data is stored, for example FTP upload if your data is on an FTP server.
- Add a second step by selecting Custom Step to define how your data will be processed or cleaned.
- Click the newly added step to open its configuration options.
- Choose Reader, Processor, or Writer, depending on what you want the step to do:
  - Reader: Extracts data from your data source.
  - Processor: Cleans, transforms, or filters your data.
  - Writer: Exports or updates your cleaned data.
- Once you’ve selected one, pick the appropriate type of reader, processor, or writer from the right-hand menu.
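You can think of the Reader, Processor, and Writer steps as a chain of functions: one produces rows, each processor transforms them in order, and the last step delivers them. The Python sketch below is only a conceptual model. `run_pipeline`, `csv_reader`, and `strip_values` are hypothetical names and are not part of the Zoovu Data Platform, which you configure in the Pipelines tab rather than in code.

```python
import csv
import io
from typing import Callable, Iterable

# A row is one product: column heading -> value.
Row = dict[str, str]

# Hypothetical stage signatures modelling the three step roles (not a Zoovu API).
Reader = Callable[[], list[Row]]
Processor = Callable[[list[Row]], list[Row]]
Writer = Callable[[list[Row]], None]


def run_pipeline(reader: Reader, processors: Iterable[Processor], writer: Writer) -> None:
    rows = reader()               # Reader: extract rows from the source
    for process in processors:    # Processors: applied in the order the steps are listed
        rows = process(rows)
    writer(rows)                  # Writer: hand the cleaned rows to the destination


def csv_reader(text: str) -> Reader:
    """Build a Reader that parses CSV text with a header row."""
    return lambda: list(csv.DictReader(io.StringIO(text)))


def strip_values(rows: list[Row]) -> list[Row]:
    """A simple Processor: remove leading and trailing whitespace from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


# Usage: the "Writer" here just collects rows into a list.
output: list[Row] = []
run_pipeline(
    csv_reader("sku,name\nA1,  Red shirt \nB2,Blue shirt\n"),
    [strip_values],
    output.extend,
)
print(output)  # [{'sku': 'A1', 'name': 'Red shirt'}, {'sku': 'B2', 'name': 'Blue shirt'}]
```

Modelling each step as a plain function makes the ordering explicit: every processor receives the output of the one before it.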
Use readers to extract product data
Readers extract data from your sources. Zoovu offers 10 types of Readers, each designed for a specific format or location. Here's a breakdown of the available options and when to use them:
- CSV: Use this for simple spreadsheets in which each row represents a product and each column holds a product attribute, such as price or color. Choose it if your product catalog is stored in CSV files.
- Google Sheets: Connect directly to your Google account to import spreadsheet data into the pipeline.
- JSON: Use this for structured data in JSON format, which is common for product feeds and API responses.
- XLSX: Use this for Excel spreadsheets (.xlsx), especially for more complex product data or workbooks with multiple sheets.
- XML: Best for extracting data from XML feeds, which are commonly used for web-based data exchange or API responses.
- Multi Resource: Use this to pull in data from multiple files or sources in formats such as CSV, JSON, or XML, for example when your product data is spread across several files.
- HTTP: Fetch product data directly from a web API endpoint, ideal when your data is stored in an external system that provides an API.
- SAP: Connect directly to your SAP system to extract data.
- HTTP with Parameters: Use this instead of HTTP when the API requires custom query parameters, for example for filtering or pagination.
- Custom CSV: Customize how you want to extract data from CSV files by setting up specific rules for reading the data.
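To show the kind of work a reader such as HTTP with Parameters has to do, here is a minimal Python sketch that pages through an API. It assumes a hypothetical endpoint that accepts `page` and `limit` query parameters and returns an empty list after the last page. Real APIs differ (cursor tokens, offsets), and the `fetch` callable stands in for the actual HTTP request so that the example runs offline.

```python
from typing import Callable
from urllib.parse import parse_qs, urlencode, urlparse

Product = dict[str, str]


def read_paginated(
    base_url: str,
    params: dict[str, str],
    fetch: Callable[[str], list[Product]],
    page_size: int = 100,
) -> list[Product]:
    """Request pages 1, 2, 3, ... until the API returns an empty page.

    Assumes hypothetical `page`/`limit` query parameters; `params` carries any
    extra filters the API requires (e.g. in_stock=true).
    """
    products: list[Product] = []
    page = 1
    while True:
        query = urlencode({**params, "page": page, "limit": page_size})
        batch = fetch(f"{base_url}?{query}")
        if not batch:
            break
        products.extend(batch)
        page += 1
    return products


# Offline stand-in for the API: serves a fixed catalog one page at a time.
CATALOG = [{"sku": f"P{i}"} for i in range(5)]
requested: list[str] = []


def fake_fetch(url: str) -> list[Product]:
    requested.append(url)
    query = parse_qs(urlparse(url).query)
    page, limit = int(query["page"][0]), int(query["limit"][0])
    start = (page - 1) * limit
    return CATALOG[start:start + limit]


items = read_paginated("https://api.example.com/products", {"in_stock": "true"}, fake_fetch, page_size=2)
print([item["sku"] for item in items])  # ['P0', 'P1', 'P2', 'P3', 'P4']
print(requested[0])  # https://api.example.com/products?in_stock=true&page=1&limit=2
```

Against a live JSON API, `fetch` could be something like `lambda url: json.load(urllib.request.urlopen(url))`. The stopping rule (an empty page) is an assumption that has to match how your API signals the end of the results.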
How to choose a Reader
Consider where your data is stored:
- If you’re using spreadsheets, choose CSV or XLSX.
- For structured data files, choose JSON or XML; for data served by an API, choose HTTP, or HTTP with Parameters if the API requires query parameters.
- If your data is spread across different formats, use Multi Resource.
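The idea behind a multi-format reader such as Multi Resource can be sketched in Python: pick a parser by file extension and append every file's rows to one list. This is an illustration only. The XML layout (`<products><product>` with one child element per attribute) and the JSON layout (a top-level array of objects) are assumptions, and Zoovu's own reader may handle other structures.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from pathlib import PurePath


def read_csv(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text: str) -> list[dict]:
    # Assumes the file is a top-level JSON array of product objects.
    return json.loads(text)


def read_xml(text: str) -> list[dict]:
    # Assumes <products><product><sku>..</sku><name>..</name></product>...</products>.
    root = ET.fromstring(text)
    return [{field.tag: field.text or "" for field in product} for product in root]


READERS = {".csv": read_csv, ".json": read_json, ".xml": read_xml}


def read_multi(resources: dict[str, str]) -> list[dict]:
    """Parse each (file name -> contents) entry with the reader for its extension.

    An unsupported extension raises KeyError.
    """
    rows: list[dict] = []
    for name, text in resources.items():
        parser = READERS[PurePath(name).suffix.lower()]
        rows.extend(parser(text))
    return rows


rows = read_multi({
    "feed_a.csv": "sku,name\nA1,Red shirt\n",
    "feed_b.json": '[{"sku": "B2", "name": "Blue shirt"}]',
    "feed_c.xml": "<products><product><sku>C3</sku><name>Green shirt</name></product></products>",
})
print(rows)
# [{'sku': 'A1', 'name': 'Red shirt'}, {'sku': 'B2', 'name': 'Blue shirt'}, {'sku': 'C3', 'name': 'Green shirt'}]
```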
Use processors to clean and transform your data
Processors allow you to clean up and reformat your data after it's been extracted. Each processor performs a specific action on your data, making sure it’s usable for your catalog. Here’s how you might use common processors:
- Coalesce: Merge multiple data fields into one. Use this when data about a single product is split across multiple columns.
- Remove Duplicates: Automatically find and delete duplicate entries in your dataset, essential for cleaning large catalogs with repeated data.
- Filter: Narrow down your dataset to only the rows that meet specific criteria (e.g., products priced above a certain threshold).
- Trim: Clean up extra spaces in text fields to ensure consistency, especially useful when importing data from inconsistent sources.
- Rename Columns: Adjust the names of columns to align with your data format, or to match Zoovu’s requirements.
- Add Column: Add new fields or attributes to your dataset, such as a "Discount Price" column for your products.
- Regex Extractor: Use regular expressions to extract specific patterns of text (e.g., product IDs or color codes) from your dataset.
- Math: Perform basic mathematical operations on numeric data, such as adjusting prices by a fixed percentage.
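To make the processor descriptions concrete, the Python sketch below imitates each one on a small list of product rows. The function names and exact behaviors are assumptions for illustration, not Zoovu's implementation. In particular, `coalesce` follows the SQL meaning (take the first non-empty value among several columns), which is one common way of merging split fields.

```python
import re
from typing import Callable

# Illustrative stand-ins for Zoovu processors (hypothetical names and semantics).
Row = dict[str, str]


def trim(rows: list[Row]) -> list[Row]:
    # Trim: strip leading/trailing whitespace from every value.
    return [{k: v.strip() for k, v in r.items()} for r in rows]


def remove_duplicates(rows: list[Row], key: str) -> list[Row]:
    # Remove Duplicates: keep the first row seen for each key value.
    seen: set[str] = set()
    unique = []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique


def coalesce(rows: list[Row], sources: list[str], target: str) -> list[Row]:
    # Coalesce (assumed SQL-style): target = first non-empty value among the sources.
    return [{**r, target: next((r[c] for c in sources if r.get(c)), "")} for r in rows]


def filter_rows(rows: list[Row], keep: Callable[[Row], bool]) -> list[Row]:
    # Filter: keep only rows that satisfy the condition.
    return [r for r in rows if keep(r)]


def rename_columns(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    # Rename Columns: old name -> new name; unmapped columns keep their names.
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def add_column(rows: list[Row], name: str, compute: Callable[[Row], str]) -> list[Row]:
    # Add Column: derive a new value for each row.
    return [{**r, name: compute(r)} for r in rows]


def regex_extract(rows: list[Row], column: str, pattern: str, target: str) -> list[Row]:
    # Regex Extractor: copy the first capture group of `pattern` into `target`.
    rx = re.compile(pattern)

    def first_group(r: Row) -> str:
        m = rx.search(r.get(column, ""))
        return m.group(1) if m else ""

    return add_column(rows, target, first_group)


def adjust_by_percent(rows: list[Row], column: str, percent: float, target: str) -> list[Row]:
    # Math: target = column * (1 + percent / 100), formatted with 2 decimals.
    return add_column(rows, target, lambda r: f"{float(r[column]) * (1 + percent / 100):.2f}")


raw = [
    {"id": " A1", "title": "", "name": "Red shirt ", "price": "20", "desc": "Color RD-01"},
    {"id": "A1", "title": "", "name": "Red shirt", "price": "20", "desc": "Color RD-01"},
    {"id": "B2", "title": "Blue shirt", "name": "", "price": "8", "desc": "Color BL-07"},
    {"id": "C3", "title": "Sample", "name": "", "price": "0", "desc": "n/a"},
]
rows = trim(raw)
rows = remove_duplicates(rows, "id")
rows = coalesce(rows, ["title", "name"], "product_name")
rows = filter_rows(rows, lambda r: float(r["price"]) > 0)
rows = rename_columns(rows, {"id": "sku"})
rows = regex_extract(rows, "desc", r"\b([A-Z]{2}-\d{2})\b", "color_code")
rows = adjust_by_percent(rows, "price", -10, "discount_price")
for r in rows:
    print(r["sku"], r["product_name"], r["color_code"], r["discount_price"])
# A1 Red shirt RD-01 18.00
# B2 Blue shirt BL-07 7.20
```

Trim runs before Remove Duplicates here, so " A1" and "A1" are recognized as the same product. This is one reason the order of processor steps can matter.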
How to choose a processor
Look at the specific cleanup tasks your data requires:
- Use Remove Duplicates if you’ve imported data from multiple sources that may overlap.
- Choose Filter if you want to focus on a subset of your products (e.g., only in-stock items).
- Use Rename Columns if your data needs clearer labels or if you’re importing data from multiple sources with different naming conventions.
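When you combine sources with different naming conventions, the three processors above often work together. In the Python sketch below, columns are renamed before duplicates are removed, because the de-duplication key must have the same name in every source. The feeds, helper names, the uppercase key normalization, and the rule that a later source wins a duplicate are all illustrative assumptions, not documented Zoovu behavior.

```python
# Hypothetical feeds and helper names, for illustration only.
Row = dict[str, str]


def unify(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Rename columns to the shared schema (like Rename Columns)."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def merge_sources(sources: list[list[Row]], key: str) -> list[Row]:
    """De-duplicate on a normalized key; a later source overwrites an earlier one."""
    merged: dict[str, Row] = {}
    for rows in sources:
        for row in rows:
            merged[row[key].strip().upper()] = row
    return list(merged.values())


erp = [{"ItemNo": "a1", "Qty": "5"}, {"ItemNo": "B2", "Qty": "0"}]
shop = [{"sku": "A1", "stock": "3"}, {"sku": "C3", "stock": "7"}]

combined = merge_sources([unify(erp, {"ItemNo": "sku", "Qty": "stock"}), shop], key="sku")
in_stock = [row for row in combined if int(row["stock"]) > 0]  # Filter step
print(in_stock)  # [{'sku': 'A1', 'stock': '3'}, {'sku': 'C3', 'stock': '7'}]
```

Which source should win a duplicate depends on which system you trust for each attribute. Here the shop feed overrides the ERP stock value for A1.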
Use writers to output your cleaned data
Once your data is processed and cleaned, writers let you export it into a usable format or update your product catalog in Zoovu. Here’s what the different writer options do:
- JSON: Export your cleaned data into JSON format, commonly used for APIs or integrating with other systems.
- CSV: Create CSV files, which are easy to import into various platforms or use for internal purposes like inventory management.
- Data Platform: Send your processed data directly to the Data Platform, updating your catalog so your product data is ready for use across Zoovu.
- SAP E-COMMERCE: Use this if you manage your products in SAP.
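If you use the Data Platform writer, click "Write to Data Platform" to edit its settings. Then map every column manually: for each field, enter the exact heading of the matching column in your source file, for example the heading of the column that contains offer URLs in your CSV file.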
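Because the mapping depends on exact heading names, you can run a quick check beforehand to catch typos, case differences, or stray spaces. The Python sketch below is a hypothetical helper, not a Zoovu feature. The field names on the left side of `mapping` (`id`, `name`, `offer_url`) are placeholders for whatever fields your Data Platform writer asks you to map.

```python
import csv
import io


def missing_headings(csv_text: str, mapping: dict[str, str]) -> list[str]:
    """Return mapped headings that do not match a CSV column heading exactly.

    Hypothetical helper, not a Zoovu feature. The comparison is deliberately
    exact (case- and whitespace-sensitive).
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [heading for heading in mapping.values() if heading not in header]


feed = "SKU,Product Name,Offer URL\nA1,Red shirt,https://shop.example.com/a1\n"

# Hypothetical target fields mapped to the exact CSV headings.
mapping = {"id": "SKU", "name": "Product Name", "offer_url": "Offer URL"}
print(missing_headings(feed, mapping))  # []

typo = {"id": "SKU", "offer_url": "Offer url "}
print(missing_headings(feed, typo))  # ['Offer url ']
```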
How to choose a writer
Choose a writer depending on where you need the cleaned data:
- Select Data Platform if you’re updating your Zoovu catalog.
- Use CSV or JSON if you need to export the data for other systems or internal use.
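For a sense of what a CSV or JSON export of cleaned rows typically contains, the Python sketch below serializes rows with the standard library. It is an illustration only and does not reproduce the output format of Zoovu's writers. The delimiter, line endings, and JSON indentation are assumptions.

```python
import csv
import io
import json

# Illustration only; not the output format of Zoovu's writers.
Row = dict[str, str]


def to_csv(rows: list[Row]) -> str:
    """Serialize rows as CSV with a header row built from every column seen."""
    if not rows:
        return ""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing values become ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[Row]) -> str:
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


cleaned = [{"sku": "A1", "price": "18.00"}, {"sku": "B2", "price": "7.20", "color": "BL-07"}]
print(to_csv(cleaned))
print(to_json(cleaned))
```

By default, Python's `csv` module writes `\r\n` line endings. If the receiving system expects Unix line endings, pass `lineterminator="\n"` to `csv.DictWriter`.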