You may encounter a parse error when querying a table, even though the SQL syntax appears correct. In some cases, the query may succeed when selecting specific columns but fail when selecting all columns. This behavior typically indicates that specific columns contain values Spark cannot parse or cast, rather than a problem with the query logic itself.
Common Symptoms
- Queries fail with a parse or execution error when selecting all columns (SELECT *).
- Queries succeed when one or more specific columns are excluded.
- The error does not occur consistently across all rows and appears data-dependent.
- The failure may surface only when rows containing specific values are processed.
Root Cause
The most common cause is the presence of unsupported or invalid values in numeric columns, particularly when casting string values to numeric types.
In Spark SQL, special string values such as 'Infinity' are not valid numeric literals and cannot be cast directly to DOUBLE. When Spark encounters such values during query execution, it throws a parse or execution error.
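For example, with ANSI mode enabled, a direct cast fails only when an offending row is actually scanned, which is why the error looks data-dependent. The table and column names below are hypothetical:

```sql
-- Hypothetical table `events` with a STRING column `amount` that mostly
-- holds numeric text, but a few rows contain 'Infinity' or malformed values.
SET spark.sql.ansi.enabled = true;

-- The cast below errors only when a row with an unsupported value is
-- reached, so small samples or filtered queries may appear to work while
-- SELECT * over the full table fails.
SELECT CAST(amount AS DOUBLE) AS amount_d
FROM events;
```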
Resolution
- Identify the problematic column by selecting columns individually.
- Inspect the column data for unsupported values (e.g., 'Infinity', 'NaN', malformed strings).
- Avoid casting unsupported string values directly to numeric types.
- Replace invalid values with supported numeric equivalents, such as:
  - A large numeric constant (for example, 1e308) in place of 'Infinity'
  - Conditional logic to handle or normalize special values prior to casting
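Putting these steps together, a normalization sketch might look like the following. The table and column names are placeholders, and the replacement values should be adjusted to whatever your downstream logic expects:

```sql
SELECT
  CASE
    WHEN amount = 'Infinity'  THEN CAST(1e308 AS DOUBLE)   -- large finite stand-in
    WHEN amount = '-Infinity' THEN CAST(-1e308 AS DOUBLE)
    WHEN amount = 'NaN'       THEN NULL                    -- or another sentinel
    ELSE CAST(amount AS DOUBLE)                            -- normal values cast cleanly
  END AS amount_d
FROM events;
```

Note that malformed strings not covered by the CASE branches would still hit the final cast, so pair this with upstream validation.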
Best Practices
- Validate and cleanse data before applying numeric casts.
- Use explicit column selection instead of SELECT * when debugging.
- Apply defensive SQL logic (e.g., CASE WHEN) to handle edge-case values safely.
- Ensure data transformations are compatible with Spark SQL's supported data types.
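As a defensive complement, Spark 3.2 and later provide TRY_CAST, which returns NULL instead of raising an error when a value cannot be converted. This makes it easy to isolate the offending rows (hypothetical table and column names again):

```sql
-- NULL out anything that cannot be converted, then inspect the failures:
SELECT amount,
       TRY_CAST(amount AS DOUBLE) AS amount_d
FROM events
WHERE TRY_CAST(amount AS DOUBLE) IS NULL
  AND amount IS NOT NULL;   -- rows returned hold string values that fail the cast
```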
Following these guidelines will help prevent parse errors and ensure reliable query execution in Spark SQL.