How to Build an Enrichment Waterfall: Step-by-Step Setup Guide
No single data provider covers everything. ZoomInfo might nail direct dials for enterprise tech but miss on mid-market healthcare. Apollo's email coverage is strong at scale but the phone numbers are hit or miss. An enrichment waterfall runs multiple providers in sequence, using the next provider only when the previous one comes back empty. This approach typically improves overall fill rates by 15-30% compared to relying on a single provider. Here's how to build one that works without burning credits or creating data conflicts.
Map Your Data Gaps Before Picking Providers
An enrichment waterfall is only as good as the gaps it fills. Before selecting providers, you need to know exactly where your current data falls short.
Pull an enrichment coverage report from your CRM. For each contact field that matters (direct dial, mobile, work email, personal email, LinkedIn URL, job title, company size), calculate the fill rate. If you have 50,000 contacts and 30,000 have work emails, your email fill rate is 60%. Do this for every field.
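The fill-rate math is easy to script against a CRM export. Here's a minimal sketch, assuming your contacts come out as a list of dictionaries. The field names (`work_email`, `direct_dial`) and the `fill_rates` helper are illustrative, not part of any CRM's API.

```python
from typing import Iterable, Mapping


def is_filled(value: object) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def fill_rates(contacts: Iterable[Mapping[str, object]], fields: list[str]) -> dict[str, float]:
    """Return the share of contacts (0.0 to 1.0) with a value for each field."""
    rows = list(contacts)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_filled(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }


# Hypothetical export: 3 of 5 contacts have a work email, 1 of 5 has a direct dial.
contacts = [
    {"work_email": "a@example.com", "direct_dial": "+1 555 0100"},
    {"work_email": "b@example.com", "direct_dial": ""},
    {"work_email": "c@example.com"},
    {"work_email": None},
    {"work_email": "  "},
]
rates = fill_rates(contacts, ["work_email", "direct_dial"])
```

Blank strings and whitespace count as missing here, which matters because CRM exports often contain empty placeholders rather than true nulls.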
Then segment the gaps. A 60% overall email fill rate might break down as 80% for enterprise contacts and 35% for SMB contacts. Or 75% for US contacts and 40% for Europe. These segments determine which providers you need. A provider that's strong on US enterprise data won't help your European SMB gap.
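Segmenting is the same calculation grouped by a second attribute. A rough sketch, assuming each contact record carries a segment label; the `segment` key and its values are made up for illustration.

```python
from collections import defaultdict


def fill_rate_by_segment(contacts: list[dict], field: str, segment_key: str) -> dict[str, float]:
    """Return the fill rate of one field for each segment value."""
    totals: dict[str, int] = defaultdict(int)
    filled: dict[str, int] = defaultdict(int)
    for contact in contacts:
        segment = contact.get(segment_key) or "unknown"
        totals[segment] += 1
        if str(contact.get(field) or "").strip():
            filled[segment] += 1
    return {segment: filled[segment] / totals[segment] for segment in totals}


# Hypothetical records with a made-up "segment" attribute.
contacts = [
    {"segment": "enterprise", "work_email": "a@example.com"},
    {"segment": "enterprise", "work_email": "b@example.com"},
    {"segment": "smb", "work_email": "c@example.com"},
    {"segment": "smb", "work_email": ""},
    {"work_email": "d@example.com"},  # no segment recorded
]
by_segment = fill_rate_by_segment(contacts, "work_email", "segment")
```

Run it once per field and per segmentation (company size, region) to see which gaps a second provider actually needs to cover.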
Prioritize the gaps by revenue impact. A missing direct dial on a $200K opportunity matters more than a missing LinkedIn URL on a cold lead. Rank your field gaps by pipeline impact and focus your waterfall on the top 3-4 fields. Trying to fill every field from every provider creates unnecessary complexity and cost.
Select 2-3 Providers That Complement Each Other
The goal is coverage diversity, not provider quantity. Two providers with different data sources will outperform three providers that all scrape the same databases.
ZoomInfo and Cognism are a strong pairing because they have fundamentally different data collection methods. ZoomInfo relies heavily on community-contributed data and web scraping. Cognism uses phone-verified mobile numbers and has strong European coverage. The overlap between them is lower than you'd expect, which means the second provider fills gaps the first one misses.
Apollo and Clearbit complement each other differently. Apollo has broad contact coverage at high volume (260M+ contacts) but variable accuracy. Clearbit (now part of HubSpot) has stronger firmographic and technographic data with higher per-record accuracy but less depth on phone numbers. Using Apollo for initial enrichment and Clearbit for firmographic fill is an effective combination.
Avoid running three providers that all use the same underlying data sources. If Provider A and Provider B both license data from the same third-party aggregator, the second provider in your waterfall won't add much incremental coverage. Ask vendors about their data sources during your evaluation. Providers that combine their own proprietary collection with licensed data tend to add more incremental value in a waterfall.
For most mid-market teams, two providers plus a free/cheap supplement (LinkedIn Sales Navigator for manual research on high-value gaps) is the right balance between coverage and cost.
Design the Waterfall Logic: Provider Order and Field-Level Routing
The waterfall order matters. Your highest-accuracy provider should go first, even if it's not the highest-coverage provider. This ensures that when the first provider returns data, you can trust it. The second provider fills gaps, and you accept slightly lower accuracy in exchange for incremental coverage.
Build the logic at the field level, not the contact level. Your best phone number provider might be Cognism, but your best email provider might be Apollo. The waterfall should route phone lookups through Cognism first, then Apollo. And route email lookups through Apollo first, then Cognism. This field-level routing maximizes accuracy per field.
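Field-level routing comes down to a per-field list of providers. The sketch below assumes each provider lookup can be wrapped in a function that returns a value or `None`. The four lookup functions are stand-ins that return canned values; they don't call any real provider API, and the provider labels just mirror the example in the text.

```python
from typing import Callable, Optional

Lookup = Callable[[dict], Optional[str]]


# Stand-in lookups. A real version would call each provider's API.
def cognism_phone(contact: dict) -> Optional[str]:
    return None  # simulate a miss


def apollo_phone(contact: dict) -> Optional[str]:
    return "+44 20 7946 0000"


def apollo_email(contact: dict) -> Optional[str]:
    return "jane@example.com"


def cognism_email(contact: dict) -> Optional[str]:
    return "j.doe@example.com"


# Each field gets its own provider order.
ROUTES: dict[str, list[tuple[str, Lookup]]] = {
    "direct_dial": [("cognism", cognism_phone), ("apollo", apollo_phone)],
    "work_email": [("apollo", apollo_email), ("cognism", cognism_email)],
}


def enrich(contact: dict, routes: dict[str, list[tuple[str, Lookup]]]) -> tuple[dict, dict]:
    """Fill empty fields only; return (enriched contact, field -> source provider)."""
    enriched = dict(contact)
    sources: dict[str, str] = {}
    for field, providers in routes.items():
        if enriched.get(field):
            continue  # already filled, so spend no credits
        for name, lookup in providers:
            value = lookup(enriched)
            if value:
                enriched[field] = value
                sources[field] = name
                break  # stop at the first provider that returns a value
    return enriched, sources


enriched, sources = enrich({"name": "Jane Doe", "work_email": ""}, ROUTES)
```

In this run the phone lookup falls through from the first provider to the second, while the email lookup stops at the first provider and never touches the second.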
Set clear "accept" and "skip" rules. An accept rule defines when you keep the returned data: the field was empty and the provider returned a high-confidence value. A skip rule covers two cases. If the field is already filled, skip the lookup entirely and spend no credits. If the provider returns nothing or a low-confidence result, discard it and move to the next provider. Many providers return confidence scores (high/medium/low). Accept high-confidence results and route medium-confidence results to the next provider for verification.
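One way to keep these rules consistent is to put them in a single decision function that every lookup passes through. This is a sketch under assumptions: the `Decision` names are invented for illustration, and the high/medium/low labels stand in for whatever confidence format your providers actually return.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    SKIP_LOOKUP = "skip_lookup"            # field already filled, do not call anyone
    ACCEPT = "accept"                      # keep the returned value
    VERIFY_WITH_NEXT = "verify_with_next"  # hold the value, ask the next provider
    NEXT_PROVIDER = "next_provider"        # discard and move on


def decide(existing: Optional[str], returned: Optional[str], confidence: Optional[str]) -> Decision:
    """Apply the accept/skip rules to one provider result for one field."""
    if existing:
        return Decision.SKIP_LOOKUP
    if not returned:
        return Decision.NEXT_PROVIDER
    if confidence == "high":
        return Decision.ACCEPT
    if confidence == "medium":
        return Decision.VERIFY_WITH_NEXT
    return Decision.NEXT_PROVIDER  # low or unknown confidence
```

Treating an unknown confidence label the same as low is a conservative default; loosen it if a provider you trust doesn't return scores at all.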
Handle conflicts explicitly. If Provider A says the title is "VP of Sales" and Provider B says "Director of Sales," which do you keep? The simplest rule: first provider wins (don't overwrite). A better rule: keep the most recently verified result. The best rule: flag conflicts for manual review on high-value accounts and auto-accept first-provider results for everything else.
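The third rule (first provider wins, with a review flag on high-value accounts) can be sketched in a few lines. How you define a high-value account is up to you; here it's a boolean the caller passes in, and the `Resolution` type is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass
class Resolution:
    value: str
    needs_review: bool


def resolve_conflict(first_value: str, second_value: str, high_value_account: bool) -> Resolution:
    """Keep the first provider's value; flag disagreements on high-value accounts."""
    conflict = first_value.strip().lower() != second_value.strip().lower()
    return Resolution(value=first_value, needs_review=conflict and high_value_account)


flagged = resolve_conflict("VP of Sales", "Director of Sales", high_value_account=True)
auto = resolve_conflict("VP of Sales", "Director of Sales", high_value_account=False)
```

The comparison ignores case and surrounding whitespace so that "VP of Sales" and "vp of sales" don't land in the review queue.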
Build the Automation Layer With Clay, Zapier, or Custom Code
The waterfall logic needs an execution engine. You have three options depending on your technical resources and volume.
Clay is purpose-built for enrichment waterfalls. You define the waterfall sequence visually, and Clay executes it row by row: try Provider A, check the result, fall through to Provider B if empty, fall through to Provider C, then write the final enriched data back to your CRM. Clay charges per enrichment credit but supports 50+ data providers natively. For teams running 1,000-10,000 enrichments per month, Clay is the fastest path to a working waterfall.
Zapier or Make can handle simpler waterfalls. Build a multi-step zap: new CRM record triggers Provider A lookup, a filter step checks if the result is empty, and a branch triggers Provider B if needed. This works for low-volume waterfalls (under 500 records per month) but gets fragile at higher volumes. Error handling is limited, and debugging failed runs requires manual investigation.
Custom code gives you the most control. A Python script calls each provider's API in sequence, applies your accept/skip logic, handles rate limits and errors, and writes results to your CRM via API. This is the right approach for teams running 10,000+ enrichments per month or those with complex field-level routing rules. Budget 40-80 hours of engineering time for the initial build and 5-10 hours per month for maintenance.
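Rate-limit handling is the part of a custom build that's easiest to get wrong. Here's a minimal retry-with-backoff sketch. `RateLimitError` is a hypothetical exception standing in for however your provider client signals a rate limit (typically an HTTP 429 response), and `lookup` is any function that takes a contact and returns a result.

```python
import time
from typing import Callable, Optional


class RateLimitError(Exception):
    """Hypothetical error a provider client raises when it is rate limited."""


def call_with_backoff(
    lookup: Callable[[dict], Optional[str]],
    contact: dict,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry a lookup on rate limits, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return lookup(contact)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries, let the caller log and move on
            sleep(base_delay * 2 ** attempt)
    return None  # only reached when max_attempts is 0
```

The `sleep` parameter exists so you can swap in a no-op during tests instead of actually waiting.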
Regardless of which approach you choose, build in logging from day one. Every enrichment attempt should record which provider was called, what was returned, what was accepted, and what was skipped. Without logs, you can't debug failures or optimize provider ordering.
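A log entry only needs a handful of fields to be useful. The sketch below writes one JSON object per attempt; the field names and the `reason` codes are illustrative choices, not a standard.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def format_attempt(
    record_id: str,
    field: str,
    provider: str,
    returned: Optional[str],
    accepted: bool,
    reason: str,
) -> str:
    """Serialize one enrichment attempt as a single JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "record_id": record_id,
        "field": field,
        "provider": provider,
        "returned": returned,
        "accepted": accepted,
        "reason": reason,  # e.g. "empty_result", "low_confidence", "accepted"
    }
    return json.dumps(entry)


# Hypothetical attempt: the first provider returned nothing for a direct dial.
line = format_attempt("contact-001", "direct_dial", "provider_a", None, False, "empty_result")
```

Appending these lines to a file or a log table gives you the raw data for fill rate and cost-per-fill calculations later.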
Set Up Credit Budgets and Cost Controls
Enrichment waterfalls can burn credits fast if you're not careful. Every provider in the chain costs money per lookup, and an uncontrolled waterfall that queries three providers for every record triples your per-contact cost.
Set a per-record cost ceiling. If Provider A charges $0.05/lookup and Provider B charges $0.15/lookup, the maximum per-record cost is $0.20. For 10,000 records per month, that's $2,000. But if Provider A fills 70% of records, only 3,000 flow to Provider B, making the actual cost $0.05 x 10,000 + $0.15 x 3,000 = $950. The waterfall structure itself reduces cost compared to running every record through every provider.
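The same arithmetic generalizes to any number of providers: each one only sees the records the previous ones missed. A small sketch using the prices and fill rate from the example above; the function names are illustrative.

```python
def waterfall_cost(records: int, providers: list[tuple[float, float]]) -> float:
    """Expected spend when each provider only sees records the earlier ones missed.

    providers: (cost_per_lookup, fill_rate) pairs in waterfall order.
    """
    remaining = float(records)
    total = 0.0
    for cost_per_lookup, fill_rate in providers:
        total += remaining * cost_per_lookup
        remaining *= 1.0 - fill_rate
    return total


def cost_ceiling(records: int, providers: list[tuple[float, float]]) -> float:
    """Worst case: every record is sent to every provider."""
    return records * sum(cost for cost, _ in providers)


# Provider A: $0.05 per lookup, fills 70%. Provider B: $0.15 per lookup.
# The last provider's fill rate does not change the spend, so it is left at 0.
providers = [(0.05, 0.70), (0.15, 0.0)]
expected = waterfall_cost(10_000, providers)
ceiling = cost_ceiling(10_000, providers)
print(f"expected ${expected:,.2f} vs ceiling ${ceiling:,.2f}")
```

With these inputs the expected spend works out to the $950 from the example, against a $2,000 ceiling.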
Put the cheapest provider first when accuracy is comparable. If Apollo and ZoomInfo have similar accuracy for email lookups but Apollo charges $0.03/lookup vs. ZoomInfo's $0.10/lookup, put Apollo first. You'll fill most records at the cheaper rate and only spend on ZoomInfo for the gaps.
Set monthly credit budgets and alerts. Most providers let you set spending caps or receive alerts at usage thresholds. Configure these on day one. A runaway automation that enriches your entire database overnight is an expensive mistake. Also configure your waterfall to skip records that have been enriched within the last 90 days. Re-enriching fresh records wastes credits.
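Both controls can live in one guard that runs before any provider is called. A sketch, assuming you track the last enrichment timestamp per record and a running monthly spend; the parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_enrich(
    last_enriched_at: Optional[datetime],
    now: datetime,
    spent_this_month: float,
    monthly_budget: float,
    max_record_cost: float,
    freshness_days: int = 90,
) -> bool:
    """Return True only if the record is stale and the budget can absorb it."""
    if last_enriched_at is not None and now - last_enriched_at < timedelta(days=freshness_days):
        return False  # enriched recently, re-running would waste credits
    # Reserve the worst-case cost so a full fall-through cannot overshoot the cap.
    return spent_this_month + max_record_cost <= monthly_budget


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
fresh = should_enrich(now - timedelta(days=30), now, 500.0, 2000.0, 0.20)
stale = should_enrich(now - timedelta(days=120), now, 500.0, 2000.0, 0.20)
```

Checking against the worst-case per-record cost, rather than the first provider's price, keeps the guard safe even when a record falls through the whole chain.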
Track cost-per-fill by provider. If Provider B is filling 5% of records it receives at $0.15/lookup, you're paying $3.00 per incremental fill. Compare that to the value of that data point. A $3.00 direct dial that leads to a conversation is worth it. A $3.00 LinkedIn URL that your reps never use isn't.
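Cost-per-fill is total spend on a provider divided by the number of fields it actually filled. A minimal sketch using the numbers from the example above; the 3,000-lookup volume is an assumption for illustration.

```python
from typing import Optional


def cost_per_fill(cost_per_lookup: float, lookups: int, fills: int) -> Optional[float]:
    """Return spend per successful fill, or None if the provider filled nothing."""
    if fills == 0:
        return None
    return cost_per_lookup * lookups / fills


# Provider B receives 3,000 records at $0.15 per lookup and fills 5% of them (150).
provider_b = cost_per_fill(0.15, lookups=3_000, fills=150)
print(f"${provider_b:.2f} per incremental fill")
```

Returning `None` for zero fills avoids a division error and makes a provider that adds nothing easy to spot in a report.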
Test, Measure, and Reorder Every Quarter
Provider performance shifts over time. A provider that had 80% match rates on your segment six months ago might be at 65% today if their data sources changed or your target market shifted.
Run a quarterly audit. Pull 200 records that went through the waterfall in the past 90 days. From that pull, verify accuracy on a sample of 50 records per provider. Calculate fill rate, accuracy, and cost-per-fill by provider and by field, then compare the results to the previous quarter.
Reorder providers based on the data. If Provider B now has higher accuracy than Provider A on phone numbers, swap their positions in the phone waterfall. If a provider's fill rate drops below 10% on records that reach it, consider dropping it entirely. Every provider in the chain adds latency and cost.
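The reordering rule can be written down so it's applied the same way every quarter. In this sketch, providers below the 10% fill-rate floor are returned as drop candidates for a human to confirm, and the rest are sorted by accuracy with cost as the tiebreaker. The audit numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    name: str
    fill_rate: float        # share of records reaching the provider that it filled
    accuracy: float         # share of audited fills that were correct
    cost_per_lookup: float


def reorder(stats: list[ProviderStats], min_fill_rate: float = 0.10) -> tuple[list[str], list[str]]:
    """Return (new provider order, drop candidates) for one field's waterfall."""
    kept = [s for s in stats if s.fill_rate >= min_fill_rate]
    dropped = [s.name for s in stats if s.fill_rate < min_fill_rate]
    # Highest accuracy first; cheaper provider wins when accuracy is tied.
    ordered = sorted(kept, key=lambda s: (-s.accuracy, s.cost_per_lookup))
    return [s.name for s in ordered], dropped


# Made-up audit results for the phone waterfall.
phone_stats = [
    ProviderStats("provider_a", fill_rate=0.70, accuracy=0.88, cost_per_lookup=0.05),
    ProviderStats("provider_b", fill_rate=0.25, accuracy=0.93, cost_per_lookup=0.15),
    ProviderStats("provider_c", fill_rate=0.06, accuracy=0.90, cost_per_lookup=0.10),
]
order, drop_candidates = reorder(phone_stats)
```

Run it per field, since the phone waterfall and the email waterfall can end up with different orders from the same audit.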
Watch for provider changes. New data partnerships, acquisition announcements, or API changes can shift coverage overnight. Cognism's European phone data improved significantly after specific partnership deals. ZoomInfo's coverage patterns shift as their contributor network grows. Stay on top of provider release notes and re-test when major changes are announced.
Document your waterfall configuration, provider ordering rationale, and quarterly audit results. When you onboard a new ops team member or need to justify your data tool stack at budget time, this documentation pays for itself.
Frequently Asked Questions
How many providers should I include in an enrichment waterfall?
Two to three for most teams. Each additional provider adds incremental cost and complexity. The second provider typically adds 15-25% incremental fill rate. The third provider adds 5-10%. Beyond three, the diminishing returns rarely justify the cost and maintenance overhead.
Should the cheapest provider always go first in the waterfall?
No. Put your highest-accuracy provider first. When accuracy is comparable between providers, then use cost as the tiebreaker. The first provider's data is what you'll trust most, so optimize for quality at the top of the waterfall and cost at the bottom.
How do I handle conflicting data between waterfall providers?
The simplest approach is first-provider-wins: don't overwrite existing data. For high-value accounts, flag conflicts for manual review. Some teams use recency (keep the most recently verified result) as the tiebreaker for conflicting job titles or company data.
What tools can I use to build an enrichment waterfall?
Clay is purpose-built for this and supports 50+ providers natively. Zapier or Make work for low-volume waterfalls under 500 records/month. Custom Python scripts give the most control for high-volume operations. Some CRMs like HubSpot now have native waterfall-style enrichment built in.