This article explains why your courier may skip files that arrive in time-specific subfolders (e.g., /00/00/ and /12/00/ folders) and provides the recommended solution to ensure all data is successfully ingested.
Problem
Files are being skipped, or ingestion workflows are failing to pick up all expected data, even though the files are present in the source location.
The issue occurs when your source system delivers files at two different times within the day, typically resulting in files landing in separate hourly subdirectories (e.g., .../yyyy/MM/dd/00/00/ and .../yyyy/MM/dd/12/00/).
Your daily scheduled workflow ingests only the files from the earlier subdirectory (00/) and misses the later files (12/). Because the next day's courier run looks only for that day's files, the previous day's late files are permanently skipped.
Root Cause: File Pattern and Timing Mismatch
The root cause is that a single courier cannot reliably capture files delivered 12 hours apart on a daily schedule.
- Limited Window: The original courier's file pattern and execution window are configured to pick up files from a single time period (e.g., 00/00).
- Skipping Late Files: The courier runs, pulls the files in the 00/00 folder, and completes. The late-arriving files in the 12/00 folder either do not exist yet or fall outside the closed time window, so they are skipped.
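To make the timing problem concrete, the following Python sketch simulates a single courier that runs once a day and looks only for that day's files. Everything in it is hypothetical: the `deliveries` and `single_courier` helpers, the `data.json` file name, the midnight and noon arrival times, and the 06:00 run time. Any run time before the 12/00 batch lands gives the same result.

```python
from datetime import datetime, timedelta


def deliveries(day):
    """Hypothetical delivery: a 00/00 batch at midnight, a 12/00 batch at noon."""
    base = f"{day:%Y/%m/%d}"
    return [
        (f"{base}/00/00/data.json", day.replace(hour=0)),
        (f"{base}/12/00/data.json", day.replace(hour=12)),
    ]


def single_courier(days, run_hour=6):
    """Run one courier per day at run_hour; return (ingested, skipped)."""
    delivered = [f for d in days for f in deliveries(d)]
    ingested = set()
    for day in days:
        run_time = day.replace(hour=run_hour)
        prefix = f"{day:%Y/%m/%d}/"
        for path, arrived in delivered:
            # Only today's files that have already landed are visible to this run.
            if path.startswith(prefix) and arrived <= run_time:
                ingested.add(path)
    skipped = [p for p, _ in delivered if p not in ingested]
    return ingested, skipped


days = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(3)]
ingested, skipped = single_courier(days)
print(skipped)
# ['2024/01/01/12/00/data.json', '2024/01/02/12/00/data.json', '2024/01/03/12/00/data.json']
```

Every 12/00 batch is skipped because it lands after that day's run, and the following run has already moved on to the next date.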
Solution: Implement a Two-Courier Strategy
The most reliable solution is to create a second courier dedicated to the later 12/ delivery folder and to run it with a time offset, so that it picks up the previous day's late files.
Step-by-Step Implementation
This example uses a source that delivers files to the /00/00 and /12/00 subfolders.
1. Create the Second Courier (Targeting Late Files)
You need to copy your original courier and adjust its file pattern:
- Go to Sources. Find your original courier (e.g., AgencyData Courier), click the ellipsis (...), and select Make a copy.
- Name the Copy: Give the new courier a descriptive name, such as AgencyData_12.
- Update the File Pattern: Edit the new courier and change the file pattern to explicitly include the missing 12/ path.
  - Original Pattern Example: yyyy'/'MM'/'dd'/*/*/*.json
  - New Pattern: yyyy'/'MM'/'dd'/12/*/*.json
  - This ensures the new courier searches only the subdirectory where the late files land.
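The following Python sketch shows how the two patterns might resolve for a single day and which folders each one accepts. The `resolve_pattern` and `matches` helpers are hypothetical and are not the product's pattern engine: the sketch supports only the `yyyy`, `MM`, and `dd` tokens, copies quoted text literally, and assumes that an unterminated trailing quote (as in both patterns above) runs to the end of the pattern.

```python
import fnmatch
from datetime import date

# Hypothetical: only the date tokens used in this article are supported.
DATE_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def resolve_pattern(pattern: str, day: date) -> str:
    """Expand date tokens for one day; quoted text is copied literally."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "'":
            end = pattern.find("'", i + 1)
            if end == -1:  # assumed: an unterminated quote runs to the end
                end = len(pattern)
            out.append(pattern[i + 1:end])
            i = end + 1
            continue
        for token, fmt in DATE_TOKENS.items():
            if pattern.startswith(token, i):
                out.append(day.strftime(fmt))
                i += len(token)
                break
        else:
            out.append(pattern[i])  # any other character passes through
            i += 1
    return "".join(out)


def matches(glob: str, path: str) -> bool:
    """Segment-by-segment glob match, so '*' never crosses a '/'."""
    g_parts, p_parts = glob.split("/"), path.split("/")
    return len(g_parts) == len(p_parts) and all(
        fnmatch.fnmatchcase(p, g) for p, g in zip(p_parts, g_parts)
    )


day = date(2024, 1, 1)
files = ["2024/01/01/00/00/data.json", "2024/01/01/12/00/data.json"]
original = resolve_pattern("yyyy'/'MM'/'dd'/*/*/*.json", day)
late = resolve_pattern("yyyy'/'MM'/'dd'/12/*/*.json", day)
print(original)                                  # 2024/01/01/*/*/*.json
print([f for f in files if matches(original, f)])  # matches both files
print([f for f in files if matches(late, f)])      # ['2024/01/01/12/00/data.json']
```

Under these assumptions, the original pattern's wildcards accept any hour/minute folder for the day, while the new pattern accepts only folders under 12/.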
2. Add the New Courier to the Workflow Group (Set Offset)
You must add this new courier to your existing Courier Group with an offset to guarantee it looks back far enough to pick up the missed files from the previous day.
- Go to Activations and open your Daily Workflow Courier Group.
- Add the new courier (AgencyData_12).
- In the settings for this new courier, set the Load Offset to 1 (or 1 day).
Outcome: The primary courier ingests the early files. The new AgencyData_12 courier runs daily but looks back one day to pick up the previous day's late files from the 12/ folder, so all data is captured.
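The following Python sketch models the two-courier setup. All of it is illustrative rather than the product's implementation: the `COURIER_GROUP` dictionary, the `run_group` helper, the arrival and run times, and the assumed rule that a Load Offset of 1 simply moves the courier's target date back one day. Running it over three delivery days, plus one extra run day for the last noon batch, shows every delivered file being ingested.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase


def deliveries(day):
    """Hypothetical delivery: a 00/00 batch at midnight, a 12/00 batch at noon."""
    base = f"{day:%Y/%m/%d}"
    return [
        (f"{base}/00/00/data.json", day.replace(hour=0)),
        (f"{base}/12/00/data.json", day.replace(hour=12)),
    ]


# Hypothetical courier group: name -> (hour/minute folder glob, load offset in days).
COURIER_GROUP = {
    "AgencyData Courier": ("*/*", 0),  # original courier, original pattern
    "AgencyData_12": ("12/*", 1),      # copy restricted to 12/, Load Offset = 1
}


def run_group(delivery_days, run_days, run_hour=6):
    """Run every courier in the group once per run day; return (ingested, skipped)."""
    delivered = [f for d in delivery_days for f in deliveries(d)]
    ingested = {}  # path -> name of the courier that picked it up
    for day in run_days:
        run_time = day.replace(hour=run_hour)
        for name, (folder_glob, offset) in COURIER_GROUP.items():
            # Assumed semantics: the offset moves the target date back N days.
            target = day - timedelta(days=offset)
            glob = f"{target:%Y/%m/%d}/{folder_glob}/*.json"
            for path, arrived in delivered:
                # fnmatch's '*' can also match '/', which is harmless in this model.
                if arrived <= run_time and fnmatchcase(path, glob):
                    ingested.setdefault(path, name)
    skipped = [p for p, _ in delivered if p not in ingested]
    return ingested, skipped


delivery_days = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(3)]
# One extra run day so the last 12/ batch has a chance to be collected.
run_days = delivery_days + [datetime(2024, 1, 4)]
ingested, skipped = run_group(delivery_days, run_days)
print(skipped)                                 # []
print(ingested["2024/01/01/12/00/data.json"])  # AgencyData_12
```

In this model the primary courier still never sees a 12/00 batch on its own date; the offset courier collects each one on the following morning.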
Repeat this process for any other courier that suffers from the same twice-daily delivery schedule.