Data Extraction | Intermediate | 45 mins

Multi-Site Marketing Data Extraction

Learn how to build a scalable system for extracting marketing data from multiple websites simultaneously using Krouly's extraction toolkit.

Node.js
Krouly SDK
MongoDB
Docker


Requirements

  • Node.js ≥ 16.x
  • MongoDB installed and running locally (default port 27017)
  • Basic JavaScript knowledge
  • An API key from the Krouly Dashboard
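Before running anything, you can check these requirements from a script. The sketch below is a hypothetical helper, not part of the Krouly SDK. It assumes the environment variable names used in the .env file of this tutorial and only verifies that they are non-empty, not that the API key is valid.

```javascript
// Hypothetical preflight check for the requirements above (not a Krouly API).
const REQUIRED_NODE_MAJOR = 16;
// Assumption: the same variable names as the .env file in step 2.
const REQUIRED_ENV_VARS = ['KROULY_API_KEY', 'MONGODB_URI'];

// Returns a list of human-readable problems; an empty list means ready to run.
function checkPrerequisites(nodeVersion, env) {
  const problems = [];

  // Accept "18.19.0" (process.versions.node) or "v18.19.0" (process.version).
  const major = Number.parseInt(String(nodeVersion).replace(/^v/, '').split('.')[0], 10);
  if (!Number.isInteger(major) || major < REQUIRED_NODE_MAJOR) {
    problems.push(`Node.js ${REQUIRED_NODE_MAJOR}.x or newer is required (found ${nodeVersion})`);
  }

  // Only checks presence; it cannot tell whether the API key is valid.
  for (const name of REQUIRED_ENV_VARS) {
    if (!env[name]) {
      problems.push(`Missing environment variable: ${name}`);
    }
  }

  return problems;
}
```

To use it, call checkPrerequisites(process.versions.node, process.env) at startup, after .env has been loaded, and exit with a non-zero code if the returned list is not empty.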

Quick Start

Installation
npm install @krouly/extraction-toolkit

Implementation

1. Initialize Project

Set up your project and install the required dependencies

Terminal
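mkdir marketing-extractor
cd marketing-extractor
npm init -y
# Enable ES module syntax (import/export) in .js files, as used by config.js and extract.js
npm pkg set type=module
npm install @krouly/extraction-toolkit mongodb dotenv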

2. Configure Environment

Set up your environment variables and configuration

.env
KROULY_API_KEY=your_api_key_here
MONGODB_URI=mongodb://localhost:27017/marketing-data
MAX_CONCURRENT_EXTRACTIONS=5
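MAX_CONCURRENT_EXTRACTIONS is defined here, but the extract.js script in step 4 never reads it. If you start one extraction per site (or per batch of sites), a small worker pool is one way to honor the limit. The helpers readConcurrency and mapWithConcurrency below are hypothetical plain JavaScript with no Krouly dependency, and they assume each unit of work is an async function; check the Krouly documentation in case the SDK already exposes its own concurrency setting.

```javascript
// Hypothetical helper: parse MAX_CONCURRENT_EXTRACTIONS, falling back to a
// default when it is unset or not a positive integer.
function readConcurrency(env, fallback = 5) {
  const value = Number.parseInt(env.MAX_CONCURRENT_EXTRACTIONS ?? '', 10);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Hypothetical worker pool: run `task` over every item with at most `limit`
// calls in flight. Results keep the order of `items`, and failures are
// recorded per item (like Promise.allSettled) so one site cannot abort the rest.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  async function worker() {
    // Claiming an index is safe: there is no await between the check and next++.
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: 'fulfilled', value: await task(items[index], index) };
      } catch (error) {
        results[index] = { status: 'rejected', reason: error };
      }
    }
  }

  // Start no more workers than there are items.
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, mapWithConcurrency(siteGroups, readConcurrency(process.env), (group) => extractMarketingData(group)) would keep at most five groups in flight with the .env value above, where siteGroups is a hypothetical array of site lists.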

3. Create Extraction Configuration

Define your extraction rules using Krouly's configuration syntax

config.js
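export const extractionConfig = {
  // Map each output field to a CSS selector on the target pages
  selectors: {
    title: 'h1.product-title',
    price: '.price-value',
    description: '.product-description',
    images: 'img.product-image::src'
  },
  rateLimit: 1000, // delay between requests (presumably milliseconds; confirm in the Krouly docs)
  maxPages: 10,
  followLinks: true,
  outputFormat: 'json'
};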
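Selectors like these typically yield raw page text, so prices may arrive as strings such as "$1,299.99". A normalization step before storage keeps the MongoDB documents consistent. The parsePrice and normalizeRecord functions below are a hypothetical sketch: they assume each raw record has string title, price and description fields and an images field holding one URL or an array of URLs, and the price parser assumes US-style formatting (comma thousands separator, dot decimal).

```javascript
// Hypothetical price parser. Assumes US-style formatting such as "$1,299.99":
// keep only digits and the decimal point, then parse. Returns null if no number.
function parsePrice(text) {
  if (typeof text !== 'string') return null;
  const value = Number.parseFloat(text.replace(/[^0-9.]/g, ''));
  return Number.isFinite(value) ? value : null;
}

// Hypothetical normalizer for records shaped like extractionConfig.selectors.
function normalizeRecord(raw) {
  const images = Array.isArray(raw.images) ? raw.images : [raw.images];
  return {
    title: typeof raw.title === 'string' ? raw.title.trim() : null,
    price: parsePrice(raw.price),
    description: typeof raw.description === 'string' ? raw.description.trim() : null,
    // Drop missing or blank entries, trim whitespace, and remove duplicate URLs.
    images: [
      ...new Set(
        images
          .filter((src) => typeof src === 'string' && src.trim() !== '')
          .map((src) => src.trim())
      )
    ],
    extractedAt: new Date()
  };
}
```

You could apply it inside the onData hook in step 4, for example records.map((record) => normalizeRecord(record)), before calling insertMany.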

4. Implement Extraction Logic

Create the main extraction script with error handling and data processing

extract.js
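import 'dotenv/config'; // load variables from .env into process.env
import { Krouly } from '@krouly/extraction-toolkit';
import { MongoClient } from 'mongodb';
import { extractionConfig } from './config.js';

export async function extractMarketingData(sites) {
  // Fail fast with a clear message instead of passing undefined to the clients
  if (!process.env.KROULY_API_KEY || !process.env.MONGODB_URI) {
    throw new Error('KROULY_API_KEY and MONGODB_URI must be set in .env');
  }

  const krouly = new Krouly(process.env.KROULY_API_KEY);
  const client = await MongoClient.connect(process.env.MONGODB_URI);
  const db = client.db(); // uses the database named in the URI (marketing-data)

  try {
    const extraction = await krouly.createExtraction({
      sites,
      config: extractionConfig,
      hooks: {
        // Persist each batch of extracted records as it arrives
        onData: async (data) => {
          const docs = Array.isArray(data) ? data : [data];
          // insertMany rejects an empty array, so skip empty batches
          if (docs.length > 0) {
            await db.collection('marketing').insertMany(docs);
          }
        },
        // Log errors reported during extraction
        onError: (error) => {
          console.error('Extraction error:', error);
        }
      }
    });

    await extraction.start();
    const results = await extraction.getResults();
    return results;
  } finally {
    // Always release the MongoDB connection, even if extraction fails
    await client.close();
  }
}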
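The onError hook above only logs problems. If a whole run can fail on transient issues such as network timeouts, one option is to retry extractMarketingData with exponential backoff. withRetry below is a hypothetical helper, independent of the Krouly SDK; its defaults (three retries, 500 ms base delay) are illustrative, not values recommended by Krouly.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical retry helper: call fn up to retries + 1 times, waiting
// baseDelayMs * 2^attempt between attempts (exponential backoff).
// The last error is rethrown once all attempts are used up.
async function withRetry(fn, { retries = 3, baseDelayMs = 500, onRetry } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await fn(attempt);
    } catch (error) {
      if (attempt >= retries) throw error;
      const delay = baseDelayMs * 2 ** attempt;
      // Optional callback for logging: (error, retryNumber, delayMs)
      if (onRetry) onRetry(error, attempt + 1, delay);
      await sleep(delay);
      attempt += 1;
    }
  }
}
```

For example, await withRetry(() => extractMarketingData(sites), { retries: 2 }). Because extractMarketingData closes its MongoDB client in a finally block, each retry starts with a fresh connection. A retried run may insert records that the failed attempt already stored, so consider a unique index or upserts if duplicates matter.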

Deployment

Docker

Deploy as a containerized application using Docker


AWS Lambda

Deploy as a serverless function on AWS Lambda


Next Steps