Data Extraction · Intermediate · 45 mins
Multi-Site Marketing Data Extraction
Learn how to build a scalable system for extracting marketing data from multiple websites simultaneously using Krouly's extraction toolkit.
Node.js
Krouly SDK
MongoDB
Docker
Requirements
- Node.js ≥ 16.x
- MongoDB installed locally
- Basic JavaScript knowledge
- API key from Krouly Dashboard
Quick Start
Installation
npm install @krouly/extraction-toolkit
Prerequisites
Make sure you have Node.js ≥ 16 installed, along with either the npm or yarn package manager.
Implementation
1. Initialize Project
Set up your project and install required dependencies
Terminal
mkdir marketing-extractor
cd marketing-extractor
npm init -y
npm install @krouly/extraction-toolkit mongodb dotenv
Note
The scripts in the following steps use ES module syntax (import/export). Add "type": "module" to the generated package.json. Without it, Node.js 16 treats .js files as CommonJS and rejects the import statements.
2. Configure Environment
Set up your environment variables and configuration
.env
KROULY_API_KEY=your_api_key_here
MONGODB_URI=mongodb://localhost:27017/marketing-data
MAX_CONCURRENT_EXTRACTIONS=5
Note
Never commit your .env file to version control. Add it to .gitignore.
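Values read from .env always arrive as strings, so it can help to validate them once at startup. The following is a minimal sketch of that idea: `loadSettings` is a hypothetical helper (not part of the Krouly SDK), and the fallback of 5 simply mirrors the sample .env above. It assumes `process.env` has already been populated, for example by importing `dotenv/config` at the top of the entry script.

```javascript
// Sketch: read and validate the settings defined in .env.
// `loadSettings` is a hypothetical helper, not part of the Krouly SDK.
function loadSettings(env = process.env) {
  const missing = ['KROULY_API_KEY', 'MONGODB_URI'].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Environment variables are strings, so parse the numeric one explicitly.
  // The fallback of '5' mirrors the value in the sample .env file.
  const maxConcurrent = Number.parseInt(env.MAX_CONCURRENT_EXTRACTIONS ?? '5', 10);
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new Error('MAX_CONCURRENT_EXTRACTIONS must be a positive integer');
  }

  return {
    apiKey: env.KROULY_API_KEY,
    mongoUri: env.MONGODB_URI,
    maxConcurrent,
  };
}
```

Failing fast here means a missing API key surfaces as a clear startup error instead of a rejected request later in the run.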
3. Create Extraction Configuration
Define your extraction rules using Krouly's configuration syntax
config.js
export const extractionConfig = {
  selectors: {
    title: 'h1.product-title',
    price: '.price-value',
    description: '.product-description',
    images: 'img.product-image::src'
  },
  rateLimit: 1000,
  maxPages: 10,
  followLinks: true,
  outputFormat: 'json'
};
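A typo in a config object tends to surface only once an extraction is already running, so a quick sanity check beforehand can save time. The sketch below is one way to do that. `validateConfig` is a hypothetical helper, and its rules are assumptions inferred from the sample config above, not Krouly's documented schema.

```javascript
// Sketch: sanity-check an extraction config before handing it to the SDK.
// `validateConfig` is a hypothetical helper; the rules below are assumptions
// drawn from the sample config, not Krouly's documented schema.
function validateConfig(config) {
  const problems = [];

  const selectors = config.selectors ?? {};
  if (Object.keys(selectors).length === 0) {
    problems.push('selectors must define at least one field');
  }
  for (const [field, selector] of Object.entries(selectors)) {
    if (typeof selector !== 'string' || selector.trim() === '') {
      problems.push(`selector for "${field}" must be a non-empty string`);
    }
  }

  // Both numeric options are assumed to be positive integers.
  for (const key of ['rateLimit', 'maxPages']) {
    if (!Number.isInteger(config[key]) || config[key] <= 0) {
      problems.push(`${key} must be a positive integer`);
    }
  }

  if (typeof config.followLinks !== 'boolean') {
    problems.push('followLinks must be a boolean');
  }

  return problems; // an empty array means no problems were found
}
```

Returning a list of problems, instead of throwing on the first one, lets the caller report every issue in a single pass.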
4. Implement Extraction Logic
Create the main extraction script with error handling and data processing
extract.js
import 'dotenv/config'; // Load .env into process.env before anything reads it
import { Krouly } from '@krouly/extraction-toolkit';
import { MongoClient } from 'mongodb';
import { extractionConfig } from './config.js';

async function extractMarketingData(sites) {
  const krouly = new Krouly(process.env.KROULY_API_KEY);
  const client = await MongoClient.connect(process.env.MONGODB_URI);
  // With no argument, db() uses the database named in the URI (marketing-data)
  const db = client.db();

  try {
    const extraction = await krouly.createExtraction({
      sites,
      config: extractionConfig,
      hooks: {
        onData: async (data) => {
          // insertMany throws on an empty array, so skip empty batches
          if (Array.isArray(data) && data.length > 0) {
            await db.collection('marketing').insertMany(data);
          }
        },
        onError: (error) => {
          console.error('Extraction error:', error);
        }
      }
    });

    await extraction.start();
    const results = await extraction.getResults();
    return results;
  } finally {
    // Close the connection whether the extraction succeeded or failed
    await client.close();
  }
}
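The .env file defines MAX_CONCURRENT_EXTRACTIONS, but the script above never reads it. If the SDK does not cap concurrency itself (this tutorial does not say), one generic way to apply such a cap is a small worker pool. The sketch below assumes nothing about Krouly: `runWithLimit` is a hypothetical helper that runs any async function over a list with at most `limit` calls in flight, and records a failure for one item without stopping the rest.

```javascript
// Sketch: run async tasks with at most `limit` in flight at once.
// `runWithLimit` is a hypothetical helper, not part of the Krouly SDK.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index]) };
      } catch (reason) {
        // Record the failure and keep going with the remaining items.
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  // Never start more runners than items, and always start at least one.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

A call such as `runWithLimit(sites, 5, (site) => extractMarketingData([site]))` would process five sites at a time. Note that extractMarketingData as written opens a MongoDB connection on every call, so sharing one client across calls would be worth considering in that setup.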