Overview
This article explains how to investigate and resolve Predictive workflow warnings where Amperity is unable to fetch the Databricks job status, even though the predictive jobs themselves have completed successfully.
This issue is caused by a temporary Databricks API availability issue and does not indicate a failure of the prediction models or workflow.
Symptoms
You may observe one or more of the following:
Workflow shows a warning or failure in the Predictions section
Error message similar to:
Failed to fetch Databricks job status for task <task_id>:
The service at /api/2.1/jobs/runs/get-output is temporarily unavailable.
Affected tasks are typically:
- Product Affinity – Inference
- PCLV – Inference
Additional context that points to this issue:
- No recent workflow or configuration changes
- The next scheduled workflow run completes successfully
Root Cause
The Databricks Jobs API was temporarily unavailable when Amperity attempted to retrieve the job output.
Important clarifications:
- The predictive job did run successfully
- Only the status/output fetch failed
- This is a transient infrastructure issue, not a tenant or query problem
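Transient "temporarily unavailable" responses of this kind are typically absorbed by retrying with backoff. The sketch below is illustrative only (Amperity's actual retry behavior is internal); `fetch_with_retry` and `TransientServiceError` are hypothetical names:

```python
import time

class TransientServiceError(Exception):
    """Stand-in for a 'temporarily unavailable' (HTTP 503-style) response."""

def fetch_with_retry(fetch, retries=3, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on transient errors.

    Illustrative sketch only: `fetch` is any zero-argument callable that
    raises TransientServiceError when the service is temporarily unavailable.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except TransientServiceError:
            if attempt == retries - 1:
                raise  # persistent failure: surface it (see "When to Escalate")
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

If the unavailability window outlasts every retry, the warning shown in Symptoms is what surfaces in the workflow, even though the underlying job ran.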
Investigation Steps
Follow the steps below in order.
Step 1: Identify the Affected Predictive Tasks
- Open the failed workflow run
- Note which predictive tasks show warnings or failures
- Capture the task ID(s) referenced in the error message
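When several warnings are present, the task IDs can be pulled out of the messages programmatically. A minimal sketch, assuming the message format shown in Symptoms (`extract_task_id` is a hypothetical helper):

```python
import re

def extract_task_id(message):
    """Pull the task ID out of a status-fetch warning message.

    Assumes the format shown in Symptoms:
    'Failed to fetch Databricks job status for task <task_id>: ...'
    Returns None if the message does not match.
    """
    match = re.search(
        r"Failed to fetch Databricks job status for task (\S+):", message
    )
    return match.group(1) if match else None
```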
Step 2: Confirm Error Type
Verify the error includes:
- /api/2.1/jobs/runs/get-output
Confirm the message states “temporarily unavailable”
- If yes, continue to Step 3
- If no, escalate for further investigation
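The two checks in this step can be expressed as a single predicate: both the `get-output` endpoint and the "temporarily unavailable" wording must appear. A sketch (`is_transient_status_fetch_error` is a hypothetical name):

```python
def is_transient_status_fetch_error(message):
    """Return True when a warning matches the transient pattern from Step 2.

    Both markers must be present: the get-output endpoint path and the
    'temporarily unavailable' wording. Anything else should be escalated.
    """
    return ("/api/2.1/jobs/runs/get-output" in message
            and "temporarily unavailable" in message.lower())
```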
Step 3: Verify Job Execution
- Confirm there is no evidence of job execution failure
- Ensure the error only relates to fetching job status, not job execution
- This confirms the issue is status retrieval only.
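If you have Databricks workspace access, the run can be double-checked directly via the Jobs API (`GET /api/2.1/jobs/runs/get?run_id=...`). A minimal helper for reading the response, assuming the standard 2.1 response shape (the network call itself is omitted; `run_succeeded` is a hypothetical name):

```python
def run_succeeded(run_response):
    """Check a Databricks Jobs API 2.1 runs/get response for success.

    Inspects state.life_cycle_state and state.result_state, which report
    "TERMINATED" / "SUCCESS" for a run that executed cleanly even when a
    later get-output call failed.
    """
    state = run_response.get("state", {})
    return (state.get("life_cycle_state") == "TERMINATED"
            and state.get("result_state") == "SUCCESS")
```

A `SUCCESS` result state alongside a get-output warning is direct confirmation that only the status retrieval failed.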
Step 4: Review Version History
Check workflow and prediction version history
Confirm:
- No recent edits
- No configuration changes prior to the run
- This rules out tenant-side causes.
Step 5: Check Logs (If Needed)
Review Honeycomb or internal logs
Validate that:
- The Databricks service was temporarily unavailable
- No repeated or persistent failures are logged
Step 6: Monitor the Next Run
Check the next scheduled workflow run
Confirm:
- Workflow completes successfully
- Predictive tasks run without warnings
- If the next run succeeds, the issue is resolved.
Resolution
No corrective action is required. Once the next run completes successfully:
- Inform the customer the issue was transient
- Confirm no impact to predictions or downstream data
When to Escalate
Escalate only if any of the following occur:
- The same error appears in multiple consecutive runs
- Predictive jobs fail to execute (not just status fetch)
- Multiple tenants experience the issue at the same time
If escalating, include:
- Tenant name and ID
- Workflow ID
- Task ID(s)
- Error timestamps