It appears the Impala state_vectors_data4 table is both broken (Disk I/O error: Failed to open HDFS file) and missing the last 7 days of historic data.
Proof of missing data:
select count(*), FROM_unixtime(hour, 'yyyy/MM/dd') as rth_date, 'WINCHESTER-WEST' as area_of_record
FROM state_vectors_data4 where lat between 51.06152 and 51.09048
and lon between -1.366071 and -1.320889
and hour between (1658049843- (3600*24*10.0)) and (1658049843 - (3600*24*0))
group by rth_date order by rth_date desc;
Result is:
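+----------+------------+-----------------+
| count(*) | rth_date   | area_of_record  |
+----------+------------+-----------------+
| 481      | 2022/07/11 | WINCHESTER-WEST |
| 1023     | 2022/07/10 | WINCHESTER-WEST |
| 1031     | 2022/07/09 | WINCHESTER-WEST |
| 389      | 2022/07/08 | WINCHESTER-WEST |
+----------+------------+-----------------+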
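To make the gap explicit, here is a small Python sketch (not part of the Impala query) that lists the calendar days covered by the 10-day window and drops the days that came back in the result. It assumes the dates are rendered in UTC, and the variable names are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

query_end = 1658049843   # upper bound of the query window (epoch seconds)
window_days = 10         # the query looks back 10 days from query_end

# Days that actually appeared in the result set.
returned = {"2022/07/08", "2022/07/09", "2022/07/10", "2022/07/11"}

# Assumption: dates are interpreted in UTC.
end = datetime.fromtimestamp(query_end, tz=timezone.utc)
start = end - timedelta(days=window_days)

# Every calendar day touched by the window, oldest first.
expected = []
day = start.date()
while day <= end.date():
    expected.append(day.strftime("%Y/%m/%d"))
    day += timedelta(days=1)

missing = [d for d in expected if d not in returned]
print(missing)
```

Under those assumptions, every day from 2022/07/12 through 2022/07/17 is absent from the result, as is 2022/07/07 at the start of the window.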
If I now widen the date range by one more day (11 days instead of 10), I get the Hadoop I/O error.
Proof of Hadoop I/O issue:
select count(*), FROM_unixtime(hour, 'yyyy/MM/dd') as rth_date, 'WINCHESTER-WEST' as area_of_record
FROM state_vectors_data4 where lat between 51.06152 and 51.09048
and lon between -1.366071 and -1.320889
and hour between (1658049843- (3600*24*11.0)) and (1658049843 - (3600*24*0))
group by rth_date order by rth_date desc;
Result: Disk I/O error: Failed to open HDFS file hdfs://nameservice1/user/opensky/tables_v4/state_vectors/hour=1657105200/part-r-00169-858ba597-246d-4210-aa4d-91c6b44665d1.snappy.parquet
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /user/opensky/tables_v4/state_vectors/hour=1657105200/part-r-00169-858ba597-246d-4210-aa4d-91c6b44665d1.snappy.parquet
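As a sanity check on why the extra day triggers the error, this Python sketch converts the epoch values from the two queries and the hour= partition named in the error message. It assumes UTC, and the helper name utc is only illustrative.

```python
from datetime import datetime, timezone

DAY = 24 * 3600

query_end = 1658049843          # upper bound used in both queries
failing_partition = 1657105200  # hour=... value from the error message

def utc(ts):
    """Format an epoch timestamp as a UTC date-time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

start_10 = query_end - 10 * DAY  # lower bound of the query that works
start_11 = query_end - 11 * DAY  # lower bound of the query that fails

print("query end:         ", utc(query_end))          # 2022-07-17 09:24:03
print("10-day lower bound:", utc(start_10))           # 2022-07-07 09:24:03
print("11-day lower bound:", utc(start_11))           # 2022-07-06 09:24:03
print("failing partition: ", utc(failing_partition))  # 2022-07-06 11:00:00

in_10_day_window = start_10 <= failing_partition <= query_end
in_11_day_window = start_11 <= failing_partition <= query_end
print(in_10_day_window, in_11_day_window)             # False True
```

So the partition with the missing Parquet file (2022-07-06 11:00 UTC) lies just outside the 10-day window and inside the 11-day one, which is consistent with the first query succeeding and the second failing.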
Thanks all.
Richard Herson.
Founder of
www.aircrafttrafficsurvey.com - a community offering