Stitch input records missing from unified coalesced

Overview

A common issue reported by users is the absence of specific records in the unified_coalesced or unified_preprocessed_raw tables, even when those records appear to be correctly clustered and visible in raw input tables. This KB article outlines how to investigate these "missing" records and identify if the cause is a technical bug or a legitimate system process, such as Compliance Deletion.


Issue Summary

Users may observe that while an Amperity ID exists, certain expected Primary Keys (PKs) associated with that ID are missing from the unified output tables. This typically impacts non-match reporting and data lineage validation.

Diagnostic Symptoms:

  • Queries against unified_coalesced for specific PKs return zero results.

  • The PK might be missing from unified_preprocessed_raw as well, despite being present in the source data.

  • Internal Stitch diagnostic tables (like stitch_scores) may appear empty or missing, complicating standard troubleshooting.


Troubleshooting & Investigation Steps

1. Verify Data Presence Across the Lineage

Before assuming a Stitch failure, verify exactly where the record "disappears." Run a cross-table join to see if the PK exists in the preprocessing stage:

SQL

 
SELECT uc.amperity_id, uc.pk AS coalesced_pk, ur.pk AS preprocessed_pk
FROM Unified_Coalesced uc 
INNER JOIN unified_preprocessed_raw ur ON uc.amperity_id = ur.amperity_id
WHERE uc.amperity_id = 'target-amperity-id'

2. Inspect the Unified_Changes_PKS Table

The most definitive way to trace "lost" records is through the internal Unified_Changes_PKS table (found within the Stitch tables dataset). This table logs changes, including deletions, made during the stitching process.

What to look for:

  • Delete Actions: Look for the specific PKs in this table. If they are marked as deleted, the system intentionally removed them.

  • Job ID Reference: Identify the job_id associated with the deletion. This usually links back to a specific Daily_Load or Stitch workflow.

3. Check Compliance Processing

If records are present in Raw data but missing from all Unified tables, they may have been removed by a Compliance Workflow (GDPR/CCPA).

  • Check the Domain Tables (e.g., Credit:CrdtClassicAcctMstr) to see if the records were filtered out during a compliance job.

  • Compliance deletions occur before data is ingested into the unified layers, which explains why a record might appear to "vanish" during stitching.


Root Cause: Legitimate Deletion vs. Bug

ScenarioExplanation
Compliance DeletionThe record was part of a "Right to be Forgotten" request or filtered via a data retention policy. It is removed from domain tables and excluded from Unified tables.
Merged AccountsThe record might have been merged into a different "Master" PK. Check the merged_accounts_stitch table using the daas_acct_key.
Stitch Over-clusteringIf supersized_id is present, the record might be part of a cluster that was too large to process, causing it to be dropped for performance safety.

Resolution

If the investigation shows the records were deleted via a Unified_Changes_PKS entry linked to a compliance job, no technical fix is required. The system is performing as designed by respecting data privacy or retention rules.

Next Steps for Users:

  • Verify if the missing PKs belong to customers who requested data deletion.

  • Review compliance job logs for the timestamp identified in the Unified_Changes_PKS table.