Why deduplication matters
Email clients and proxies often request tracking pixels multiple times for a single message open:
- Prefetch/preview: Email clients may load images before the user sees the message
- Re-renders: Mobile apps may reload images when scrolling
- Proxy caching: Gmail Image Proxy may fetch images multiple times
- Multiple devices: The user opens the same email on a phone and a computer
How deduplication works
Email Tracker uses time-window-based deduplication with multiple strategies:
- Exact match deduplication - Same IP + User-Agent within time window
- User-Agent-only deduplication - Same User-Agent for proxy requests
- Proxy window deduplication - Any Google proxy hit within time window
A hit matched by any strategy is recorded as a duplicate and does not increment open_count.
Deduplication window
The deduplication window is set with the DEDUP_WINDOW_MS environment variable (in milliseconds; the default is 30 seconds). It defines how long after an open event the system treats subsequent requests from the same source as duplicates.
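A minimal Python sketch of reading the window. It assumes the value is an integer number of milliseconds, as the `_MS` suffix suggests. `dedup_window_ms` is a hypothetical helper. The fallback to 30 seconds for unset or malformed values and the clamping of negative values to 0 are assumptions, not documented behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_DEDUP_WINDOW_MS = 30_000  # documented default: 30 seconds


def dedup_window_ms(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the deduplication window in milliseconds.

    Hypothetical helper: reads DEDUP_WINDOW_MS, falls back to the 30 s
    default when it is unset or not an integer (assumed behavior), and
    clamps negative values to 0, which disables deduplication.
    """
    source = os.environ if env is None else env
    raw = source.get("DEDUP_WINDOW_MS")
    if raw is None:
        return DEFAULT_DEDUP_WINDOW_MS
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_DEDUP_WINDOW_MS
    return max(value, 0)


# Example: DEDUP_WINDOW_MS=60000 widens the window to 60 seconds.
print(dedup_window_ms({"DEDUP_WINDOW_MS": "60000"}))  # 60000
```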
Deduplication strategies
Strategy 1: Exact match (IP + User-Agent)
The most precise strategy requires both the IP address and the User-Agent to match an earlier hit within the window.
Strategy 2: User-Agent-only (for proxies)
For proxy requests (e.g., Gmail Image Proxy), IP addresses may vary, so we deduplicate by User-Agent alone.
Strategy 3: Proxy window (any Google proxy)
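For Google Image Proxy, any proxy hit for the same email within the time window is considered a duplicate.
Proxy detection
Proxy requests are identified by their User-Agent: Gmail Image Proxy requests contain googleimageproxy.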
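The check can be sketched as a substring match on the User-Agent. `is_google_proxy` is a hypothetical name. The case-insensitive comparison is an assumption, so that `GoogleImageProxy` as sent by Gmail also matches the lowercase token.

```python
from typing import Optional


def is_google_proxy(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent looks like Gmail Image Proxy.

    Matches the token "googleimageproxy" anywhere in the header,
    ignoring case (an assumption); missing headers never match.
    """
    if not user_agent:
        return False
    return "googleimageproxy" in user_agent.lower()


# A sample User-Agent of the kind Gmail's image proxy sends.
sample = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)"
print(is_google_proxy(sample))  # True
```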
Deduplication logic flow
The deduplication decision combines all three strategies: a hit is a duplicate if any of them matches.
Database flags
Duplicate events are still stored in open_events, with is_duplicate set to 1, but they do not increment open_count.
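The decision flow and the flag can be sketched together in Python against SQLite. This is a minimal sketch under several assumptions:
- Only `open_events` and `is_duplicate` come from this page. The columns `email_id`, `ip`, `user_agent` and `created_at_ms` are illustrative.
- `is_duplicate_open` and `record_open` are hypothetical names.
- Only hits strictly inside the window (newer than `now_ms - window_ms`) are treated as matches.

```python
import sqlite3
from typing import Optional

# Hypothetical layout: open_events and is_duplicate appear in this documentation;
# the remaining column names are illustrative.
SCHEMA = """
CREATE TABLE open_events (
    id            INTEGER PRIMARY KEY,
    email_id      TEXT    NOT NULL,
    ip            TEXT,
    user_agent    TEXT,
    created_at_ms INTEGER NOT NULL,
    is_duplicate  INTEGER NOT NULL DEFAULT 0
);
"""
RECENT = "email_id = ? AND created_at_ms > ?"  # same email, inside the window


def is_google_proxy(user_agent: Optional[str]) -> bool:
    if not user_agent:
        return False
    return "googleimageproxy" in user_agent.lower()


def is_duplicate_open(conn: sqlite3.Connection, email_id: str, ip: Optional[str],
                      user_agent: Optional[str], now_ms: int, window_ms: int) -> bool:
    """Apply the three strategies in order; any match marks the hit as a duplicate."""
    if window_ms <= 0:  # DEDUP_WINDOW_MS=0 disables deduplication
        return False
    since = now_ms - window_ms

    def exists(extra_sql: str, *extra_args) -> bool:
        sql = f"SELECT 1 FROM open_events WHERE {RECENT} {extra_sql} LIMIT 1"
        return conn.execute(sql, (email_id, since, *extra_args)).fetchone() is not None

    # Strategy 1: exact match on IP + User-Agent.
    if exists("AND ip = ? AND user_agent = ?", ip, user_agent):
        return True
    if is_google_proxy(user_agent):
        # Strategy 2: proxy IPs vary, so match on User-Agent alone.
        if exists("AND user_agent = ?", user_agent):
            return True
        # Strategy 3: any earlier Google proxy hit for this email.
        if exists("AND LOWER(user_agent) LIKE '%googleimageproxy%'"):
            return True
    return False


def record_open(conn: sqlite3.Connection, email_id: str, ip: Optional[str],
                user_agent: Optional[str], now_ms: int, window_ms: int) -> bool:
    """Store every hit, flagging duplicates instead of discarding them."""
    dup = is_duplicate_open(conn, email_id, ip, user_agent, now_ms, window_ms)
    conn.execute(
        "INSERT INTO open_events (email_id, ip, user_agent, created_at_ms, is_duplicate)"
        " VALUES (?, ?, ?, ?, ?)",
        (email_id, ip, user_agent, now_ms, int(dup)),
    )
    return dup


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
proxy_ua = "Mozilla/5.0 (via ggpht.com GoogleImageProxy)"
print(record_open(conn, "msg-1", "66.249.0.1", proxy_ua, 0, 30_000))      # False
print(record_open(conn, "msg-1", "66.249.0.2", proxy_ua, 4_000, 30_000))  # True (Strategy 2)
```

Because `is_google_proxy` is the only proxy check in this sketch, Strategy 2 is effectively covered by Strategy 3. It is kept as a separate step to mirror the documented order and to leave room for non-Google proxies.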
Open count updates
The tracked_emails.open_count field is incremented only for opens that are neither duplicates nor sender-suppressed.
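A minimal Python sketch of the increment rule. `tracked_emails.open_count` comes from this page. The `id` key column and the helper name `apply_open_count` are assumptions.

```python
import sqlite3


def apply_open_count(conn: sqlite3.Connection, email_id: str,
                     is_duplicate: bool, is_sender_suppressed: bool) -> bool:
    """Increment tracked_emails.open_count only for counted opens.

    Returns True when the open was counted. The id column is assumed.
    """
    counted = not is_duplicate and not is_sender_suppressed
    if counted:
        conn.execute(
            "UPDATE tracked_emails SET open_count = open_count + 1 WHERE id = ?",
            (email_id,),
        )
    return counted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracked_emails (id TEXT PRIMARY KEY, open_count INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO tracked_emails (id) VALUES ('msg-1')")
apply_open_count(conn, "msg-1", is_duplicate=False, is_sender_suppressed=False)  # counted
apply_open_count(conn, "msg-1", is_duplicate=True, is_sender_suppressed=False)   # skipped
apply_open_count(conn, "msg-1", is_duplicate=False, is_sender_suppressed=True)   # skipped
print(conn.execute("SELECT open_count FROM tracked_emails WHERE id = 'msg-1'").fetchone()[0])  # 1
```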
Deduplication vs suppression
Deduplication and suppression are independent filters:
| Scenario | is_duplicate | is_sender_suppressed | suppression_reason | Counted? |
|---|---|---|---|---|
| First legitimate open | 0 | 0 | null | ✅ Yes |
| Duplicate open | 1 | 0 | "duplicate" | ❌ No |
| Sender self-open | 0 | 1 | "mark_suppress_next" | ❌ No |
| Duplicate sender open | 1 | 1 | "duplicate" | ❌ No |
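The rules in the table can be sketched as a small Python function. `classify_open` is a hypothetical name. `sender_reason` stands for whatever the sender-suppression check reports, such as "mark_suppress_next". The "duplicate" label wins whenever both flags are set, as in the last row.

```python
from typing import Optional, Tuple


def classify_open(is_duplicate: bool, is_sender_suppressed: bool,
                  sender_reason: Optional[str] = None) -> Tuple[Optional[str], bool]:
    """Return (suppression_reason, counted) for one open event.

    Deduplication takes precedence: a duplicate is labeled "duplicate"
    even when the sender was also suppressed.
    """
    if is_duplicate:
        reason: Optional[str] = "duplicate"
    elif is_sender_suppressed:
        reason = sender_reason
    else:
        reason = None
    counted = not is_duplicate and not is_sender_suppressed
    return reason, counted


print(classify_open(True, True, "mark_suppress_next"))  # ('duplicate', False)
```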
If an event is both a duplicate and sender-suppressed, suppression_reason is set to "duplicate" (deduplication takes precedence).
IP address normalization
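IPv4-mapped IPv6 addresses are normalized to plain IPv4 before deduplication, so the same client matches however its address was reported:
| Raw IP | Normalized IP |
|---|---|
| ::ffff:192.168.1.1 | 192.168.1.1 |
| 203.0.113.42 | 203.0.113.42 (unchanged) |
| 2001:db8::1 | 2001:db8::1 (unchanged) |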
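The mapping in the table can be sketched with Python's standard `ipaddress` module. `normalize_ip` is a hypothetical name. Returning non-IP strings unchanged (after trimming whitespace) is an assumption about how malformed input should be handled. Valid addresses come back in Python's canonical text form.

```python
import ipaddress


def normalize_ip(raw: str) -> str:
    """Collapse IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) to plain IPv4.

    Other valid addresses are returned in canonical form; strings that
    are not IP addresses are returned stripped but otherwise unchanged.
    """
    text = raw.strip()
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return text
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)


for raw in ("::ffff:192.168.1.1", "203.0.113.42", "2001:db8::1"):
    print(raw, "->", normalize_ip(raw))
```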
Configuring the deduplication window
Short window (10 seconds)
Ideal for high-traffic scenarios where you want to catch rapid duplicates but allow re-opens (DEDUP_WINDOW_MS=10000):
- Faster re-opens are counted
- User can open on multiple devices within a short time
- May miss slow proxy duplicates
- Prefetch requests may be counted separately
Medium window (30 seconds, default)
Balanced approach for most use cases (DEDUP_WINDOW_MS=30000):
- Catches most proxy duplicates
- Handles typical prefetch/preview delays
- User re-opens within 30s are ignored
Long window (2 minutes)
Ideal for high-precision tracking where you want to avoid counting any rapid re-opens (DEDUP_WINDOW_MS=120000):
- Very conservative duplicate filtering
- Catches slow proxy behavior
- Legitimate re-opens may be missed
- User switching devices may be ignored
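The three settings can be compared on one sequence of hits. This is a minimal Python sketch under stated assumptions:
- The hit times are illustrative, not measurements.
- Times are in seconds for readability.
- A hit counts as a duplicate when any earlier hit from the same source, counted or not, is less than one window older.
- `count_opens` is a hypothetical name.

```python
from typing import Iterable


def count_opens(hit_times_s: Iterable[float], window_s: float) -> int:
    """Count hits from one source that survive deduplication.

    A hit is a duplicate if any earlier hit (counted or not) is less
    than window_s seconds older. With sorted times the closest earlier
    hit is the previous one, so comparing against it is enough.
    """
    counted = 0
    previous = None
    for t in sorted(hit_times_s):
        if previous is None or t - previous >= window_s:
            counted += 1
        previous = t
    return counted


# Illustrative hit times (seconds after the first hit) for one IP + User-Agent.
hits = [0, 8, 25, 50, 100, 300]
for window in (10, 30, 120):
    print(f"{window:>3}s window -> {count_opens(hits, window)} opens counted")
```

With these times the sketch counts 5, 3 and 2 opens for the 10-second, 30-second and 2-minute windows. A window of 0 counts every hit.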
Transaction safety
Deduplication queries and open count updates run in a single SQLite transaction, which makes the deduplication check and the count increment atomic. Concurrent opens are serialized by SQLite's locking.
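A minimal sketch of such a transaction using Python's `sqlite3` module. These choices are assumptions:
- Starting with `BEGIN IMMEDIATE` takes the write lock before the duplicate check runs.
- Only the exact-match strategy is shown, for brevity.
- The table layout and the name `record_open_atomically` are hypothetical.

```python
import sqlite3

# Hypothetical layout; only the table names and open_count/is_duplicate appear in the docs.
SCHEMA = """
CREATE TABLE tracked_emails (id TEXT PRIMARY KEY, open_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE open_events (
    id INTEGER PRIMARY KEY, email_id TEXT NOT NULL, ip TEXT, user_agent TEXT,
    created_at_ms INTEGER NOT NULL, is_duplicate INTEGER NOT NULL DEFAULT 0
);
"""


def record_open_atomically(conn: sqlite3.Connection, email_id: str, ip: str,
                           user_agent: str, now_ms: int, window_ms: int) -> bool:
    """Run the duplicate check, the insert and the count update in one transaction.

    BEGIN IMMEDIATE acquires the write lock before the check, so a
    concurrent request waits and then sees this request's row.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        dup = conn.execute(
            "SELECT 1 FROM open_events WHERE email_id = ? AND ip = ? AND user_agent = ?"
            " AND created_at_ms > ? LIMIT 1",
            (email_id, ip, user_agent, now_ms - window_ms),
        ).fetchone() is not None
        conn.execute(
            "INSERT INTO open_events (email_id, ip, user_agent, created_at_ms, is_duplicate)"
            " VALUES (?, ?, ?, ?, ?)",
            (email_id, ip, user_agent, now_ms, int(dup)),
        )
        if not dup:
            conn.execute(
                "UPDATE tracked_emails SET open_count = open_count + 1 WHERE id = ?",
                (email_id,),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return dup


# isolation_level=None lets the function issue BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tracked_emails (id) VALUES ('msg-1')")
record_open_atomically(conn, "msg-1", "203.0.113.42", "Mozilla/5.0", 0, 30_000)
record_open_atomically(conn, "msg-1", "203.0.113.42", "Mozilla/5.0", 1_000, 30_000)
print(conn.execute("SELECT open_count FROM tracked_emails").fetchone()[0])  # 1
```

In a real server each request would use its own connection to the same database file. The `timeout` argument of `sqlite3.connect` (5 seconds by default) controls how long a blocked request waits for the lock.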
Dashboard queries
Dashboard APIs exclude duplicates by default.
Debugging deduplication
Check duplicate events
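Query open_events in SQLite for rows flagged with is_duplicate = 1.
Check time deltas
Compare the timestamps of consecutive hits for the same email: gaps shorter than DEDUP_WINDOW_MS between hits from the same source should line up with rows flagged as duplicates.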
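A minimal Python sketch of both checks. It assumes the same illustrative columns as elsewhere on this page (`email_id`, `ip`, `user_agent`, `created_at_ms`), which are hypothetical, as is `show_hits`. It relies on SQLite's `LAG` window function, which requires SQLite 3.25 or later.

```python
import sqlite3

# Gap between each hit and the previous hit for the same email (illustrative columns).
DELTAS_SQL = """
SELECT created_at_ms,
       created_at_ms - LAG(created_at_ms) OVER (ORDER BY created_at_ms) AS delta_ms,
       ip,
       user_agent,
       is_duplicate
FROM open_events
WHERE email_id = ?
ORDER BY created_at_ms
"""


def show_hits(conn: sqlite3.Connection, email_id: str) -> list:
    """Print every stored hit for an email with the gap to the previous hit."""
    rows = conn.execute(DELTAS_SQL, (email_id,)).fetchall()
    for created, delta, ip, ua, dup in rows:
        gap = "first hit" if delta is None else f"+{delta} ms"
        print(f"{created:>8} {gap:>10} dup={dup} {ip} {ua}")
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE open_events (id INTEGER PRIMARY KEY, email_id TEXT, ip TEXT,"
    " user_agent TEXT, created_at_ms INTEGER, is_duplicate INTEGER)"
)
conn.executemany(
    "INSERT INTO open_events (email_id, ip, user_agent, created_at_ms, is_duplicate)"
    " VALUES (?, ?, ?, ?, ?)",
    [("msg-1", "203.0.113.42", "Mozilla/5.0", 0, 0),
     ("msg-1", "203.0.113.42", "Mozilla/5.0", 4_000, 1),
     ("msg-1", "203.0.113.42", "Mozilla/5.0", 95_000, 0)],
)
show_hits(conn, "msg-1")

# Total flagged duplicates across the database.
print(conn.execute("SELECT COUNT(*) FROM open_events WHERE is_duplicate = 1").fetchone()[0])
```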
Monitor server logs
The server logs each pixel hit with its duplicate and suppression status.
Frequently asked questions
Why is DEDUP_WINDOW_MS 30 seconds by default?
30 seconds is a balance between:
- Catching duplicates: Most proxy prefetch/preview requests arrive within 10-30 seconds
- Allowing re-opens: Users who close and reopen an email after 30s get counted again
To shift the balance:
- Higher precision, fewer re-opens counted: Increase to 60-120 seconds
- More lenient, rapid re-opens counted: Decrease to 10-15 seconds
What if a user opens the same email on phone and computer?
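If both opens occur within the deduplication window and have:
- Same IP (e.g., same WiFi network)
- Different User-Agents (phone vs computer)
In that case both opens are counted, because exact-match deduplication requires both the IP and the User-Agent to match. The exception is when both requests arrive through Gmail Image Proxy, where User-Agent-only and proxy window deduplication apply.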
How does deduplication handle Gmail Image Proxy?
Gmail Image Proxy requests:
- Are detected via the User-Agent (googleimageproxy)
- Use User-Agent-only deduplication (the IP varies across Google data centers)
- Use proxy window deduplication (any Google proxy hit within the window is a duplicate)
Can I disable deduplication entirely?
Yes, set DEDUP_WINDOW_MS=0. This counts every pixel hit, including duplicates, and is not recommended for production.
Why are duplicate events stored in the database?
Storing duplicates provides:
- Audit trail: See all pixel hits, not just counted opens
- Debugging: Diagnose deduplication issues
- Analytics: Measure proxy behavior, prefetch rates, etc.
To see duplicates, query open_events directly (dashboard APIs exclude them).
What happens if two opens arrive simultaneously?
SQLite transactions serialize concurrent writes. If two pixel requests arrive at exactly the same time:
- First request acquires database lock
- First request runs deduplication queries (finds nothing)
- First request inserts event and increments count
- First request releases lock
- Second request acquires lock
- Second request runs deduplication queries (finds first request’s event)
- Second request marks its event as a duplicate and does not increment open_count
Related features
Email tracking
Learn how pixel tracking works end-to-end
Sender suppression
Understand identity-based suppression
Dashboard analytics
Explore dashboard APIs and analytics features