---
name: deanonymize-web-traffic
description: Use this skill when the user has a batch of IP addresses from web traffic and wants to identify the people or companies behind them — going beyond company-level to person-level identity.
---

# Deanonymize Web Traffic

Resolve a batch of IP addresses to person-level identities.

## Triggers

"deanonymise my website visitors", "who are these IPs from my logs", "identify anonymous traffic", "match IPs to people", "resolve my web traffic to individuals".

Chains IP → hashed emails → LinkedIn → full profiles. Resource-intensive: a credit check is mandatory before running.

## Chain

```
IP addresses
  ├── ip_to_company          (IP → company name + firmographics)
  └── ip_to_hem              (IP → hashed emails, MD5)
        └── hem_to_best_linkedin     (MD5 → LinkedIn URL)
              └── linkedin_to_business_profile  (LinkedIn → full profile)
```

## Set Expectations

Before firing any calls, give the user a brief, dry heads-up. Deadpan over enthusiastic — no filler, no corporate speak. This is the most intensive skill — set realistic expectations without overdramatising.

Facts to work with:
- 4 lookups across 3 sequential rounds: IP → company and hashed emails → LinkedIn → full business profiles
- Each round feeds the next; only matching records pass through
- ~10–15% of IPs typically yield a complete identity — not a bug, just the nature of the data
- Time: 60–120+ seconds for any meaningful batch
- Not-found results at each step are free

## Step 1 — Credit check (required)

Call `MoltSets:get_billing` and `MoltSets:get_usage` in parallel (both are free).

Estimate — match rates drop at each step:
```
N IPs × ip_to_company cost             = company lookup
N IPs × ip_to_hem cost                 = HEM lookup
~30% of N × hem_to_best_linkedin cost  = LinkedIn lookup (approx match rate)
~70% of above × linkedin_to_business_profile cost = profile lookup (approx match rate)
```

Conservative full-chain yield: ~10–15% of IPs produce a complete identity. "Not found" results are free.

**If estimated cost > balance:** stop. Report shortfall, how many IPs the balance covers, ask to proceed partially or top up.
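
The estimate and the shortfall check can be sketched in a few lines. This is a minimal sketch, not the billing logic itself: the `Prices` fields are hypothetical placeholders for whatever per-record prices the billing call reports, and the 30% and 70% defaults are the approximate match rates from the estimate above. It treats every Round 1 lookup as billed, so it is an upper bound.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Per-record credit prices. Hypothetical: fill these from the billing response."""
    ip_to_company: float
    ip_to_hem: float
    hem_to_best_linkedin: float
    linkedin_to_business_profile: float


def estimate_credits(n_ips: int, prices: Prices,
                     hem_rate: float = 0.30, linkedin_rate: float = 0.70) -> float:
    """Upper-bound estimate: every Round 1 lookup is assumed to be billed."""
    hems = n_ips * hem_rate            # records expected to reach the LinkedIn lookup
    profiles = hems * linkedin_rate    # records expected to reach the profile lookup
    return (n_ips * prices.ip_to_company
            + n_ips * prices.ip_to_hem
            + hems * prices.hem_to_best_linkedin
            + profiles * prices.linkedin_to_business_profile)


def affordable_ips(balance: float, prices: Prices) -> int:
    """Largest batch size whose estimate fits within the balance."""
    per_ip = estimate_credits(1, prices)
    return int(balance // per_ip) if per_ip > 0 else 0
```

If `estimate_credits(n, prices)` exceeds the balance, the shortfall is the difference, and `affordable_ips(balance, prices)` gives the number of IPs to offer for a partial run.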

## Step 2 — Parse and clean IPs

Accept: line-separated, CSV column, log format. Deduplicate.

**Skip private and loopback IPs** (10.x, 172.16–31.x, 192.168.x, 127.x) and tell the user these are excluded.

Report: "Processing X unique public IPs."
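
One way to implement this step with only the standard library is sketched below. It assumes IPv4 input (the source does not say whether IPv6 appears) and pulls candidate addresses out of free-form text, so the same function covers line-separated lists, CSV columns, and log lines. The names `clean_ips`, `EXCLUDED_NETWORKS`, and `IPV4_TOKEN` are illustrative.

```python
import ipaddress
import re

# Ranges this step excludes: RFC 1918 private space plus loopback.
EXCLUDED_NETWORKS = [ipaddress.ip_network(n) for n in
                     ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")]

# Candidate IPv4 tokens; real validation is left to the ipaddress module.
IPV4_TOKEN = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def clean_ips(text: str) -> tuple[list[str], list[str]]:
    """Return (unique public IPs, unique excluded IPs), both in first-seen order."""
    public, excluded = {}, {}
    for token in IPV4_TOKEN.findall(text):
        try:
            addr = ipaddress.IPv4Address(token)
        except ipaddress.AddressValueError:
            continue  # not a valid address, e.g. 999.1.1.1
        bucket = excluded if any(addr in net for net in EXCLUDED_NETWORKS) else public
        bucket[str(addr)] = None  # dict keys deduplicate and keep insertion order
    return list(public), list(excluded)
```

The report line then follows from `len(public)`, and the `excluded` list is what to mention to the user as skipped.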

## Step 3 — Chunk at 100

Process in batches of 100 using array params (`ip_addresses`, `md5s`, `linkedin_urls`).
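
A minimal chunking helper, as a sketch. The name `chunked` is illustrative; the batch size of 100 comes from this step.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 100  # batch size stated in this step


def chunked(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded list can be passed directly as the `ip_addresses`, `md5s`, or `linkedin_urls` array for one call.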

## Step 4 — Run chain

**Round 1 (parallel):**
- `MoltSets:ip_to_company` with `ip_addresses` array
- `MoltSets:ip_to_hem` with `ip_addresses` array (add `user_agent` if available, `include_sha256: true`)

**Round 2:**
- Collect MD5s from `ip_to_hem` results, keeping only the highest-confidence match per IP
- `MoltSets:hem_to_best_linkedin` with `md5s` array

**Round 3:**
- Collect all LinkedIn URLs
- `MoltSets:linkedin_to_business_profile` with `linkedin_urls` array
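
The Round 2 collection step (one MD5 per IP, highest confidence wins) might look like the sketch below. The record shape is an assumption: the field names `ip`, `md5`, and `confidence` are hypothetical and should be replaced with whatever `ip_to_hem` actually returns.

```python
def best_md5_per_ip(hem_records: list[dict]) -> dict[str, str]:
    """Map each IP to the MD5 of its highest-confidence record.

    Assumes records shaped like {"ip": ..., "md5": ..., "confidence": ...};
    these field names are hypothetical, not a documented response shape.
    """
    best: dict[str, dict] = {}
    for rec in hem_records:
        if not rec.get("md5"):
            continue  # not-found rows carry no hash to pass on
        current = best.get(rec["ip"])
        # Strict ">" keeps the first record seen when confidences tie.
        if current is None or rec.get("confidence", 0) > current.get("confidence", 0):
            best[rec["ip"]] = rec
    return {ip: rec["md5"] for ip, rec in best.items()}
```

The values of the returned mapping form the `md5s` array for `hem_to_best_linkedin`, and the mapping itself is kept for joining results back to IPs later.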

## Step 5 — Merge and output

Join all results back to the original IP.

| IP | Company (IP lookup) | Name | Title | Company (profile) | LinkedIn |
|---|---|---|---|---|---|

**Summary:**
```
IPs processed:    200
Company matches:  143  (72%)
Person matches:    61  (30%)
Full profiles:     48  (24%)
Credits used:     ~XXX
```
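
The join back to the original IP can be sketched as a chain of dictionary lookups. Everything about the data shapes here is an assumption: the four mappings are presumed to have been built from the earlier rounds, and the profile keys `name`, `title`, and `company` are hypothetical. The sketch also counts a "person match" as any IP that reached a LinkedIn URL and a "full profile" as any row with a name, which is one reasonable reading of the summary rather than a stated definition.

```python
def merge_rows(ips, companies, md5_by_ip, linkedin_by_md5, profile_by_url):
    """Build one output row per IP; missing links in the chain stay None."""
    rows = []
    for ip in ips:
        url = linkedin_by_md5.get(md5_by_ip.get(ip))
        profile = profile_by_url.get(url) or {}
        rows.append({
            "ip": ip,
            "company_ip": companies.get(ip),
            "name": profile.get("name"),
            "title": profile.get("title"),
            "company_profile": profile.get("company"),
            "linkedin": url,
        })
    return rows


def summarize(rows):
    """Count matches per stage for the summary block."""
    return {
        "ips_processed": len(rows),
        "company_matches": sum(r["company_ip"] is not None for r in rows),
        "person_matches": sum(r["linkedin"] is not None for r in rows),
        "full_profiles": sum(r["name"] is not None for r in rows),
    }
```

Rows contain only values present in the lookup results; anything missing stays `None` rather than being filled in.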

## Edge cases

- Multiple HEMs per IP → take highest-confidence match only (unless user asks for all)
- ISP/residential IPs → show under company lookup but note as "Residential/ISP — not a business"
- Stop immediately if credits run out mid-batch: output completed rows, report exact resume point
- Do not fabricate or infer identities — surface only what the API returns
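
The stop-and-resume rule can be sketched as a loop that returns what it finished plus the index of the first unfinished batch. `OutOfCredits` and `run_batch` are hypothetical: the source does not say how credit exhaustion is signalled, so the caller's lookup code is assumed to raise this exception. Resume granularity in this sketch is the batch, not the individual IP.

```python
class OutOfCredits(Exception):
    """Hypothetical signal raised by the caller's lookup code when credits run out."""


def run_batches(batches, run_batch):
    """Run batches in order; on OutOfCredits return finished rows and the resume index."""
    completed = []
    for index, batch in enumerate(batches):
        try:
            completed.extend(run_batch(batch))
        except OutOfCredits:
            return completed, index  # batch `index` did not finish; resume there
    return completed, None  # None means every batch finished
```

A non-`None` second value is the resume point to report alongside the completed rows.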
