Missing historic data and broken state_vectors_data4 table

1 year 9 months ago #2015 by Herson
It appears the Impala state_vectors_data4 table is both broken (Disk I/O error: Failed to open HDFS file) and missing the last 7 days of historic data.

Proof of missing data:

select count(*), FROM_unixtime(hour, 'yyyy/MM/dd') as rth_date, 'WINCHESTER-WEST' as area_of_record
FROM state_vectors_data4 where lat between 51.06152 and 51.09048
and lon between -1.366071 and -1.320889
and hour between (1658049843- (3600*24*10.0)) and (1658049843 - (3600*24*0))
group by rth_date order by rth_date desc;

Result is:  

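+----------+------------+-----------------+
| count(*) | rth_date   | area_of_record  |
+----------+------------+-----------------+
| 481      | 2022/07/11 | WINCHESTER-WEST |
| 1023     | 2022/07/10 | WINCHESTER-WEST |
| 1031     | 2022/07/09 | WINCHESTER-WEST |
| 389      | 2022/07/08 | WINCHESTER-WEST |
+----------+------------+-----------------+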
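
For anyone wanting to check the gap without eyeballing the table, here is a rough sketch that lists which calendar dates the 10-day window should touch and which of them are absent from the result above. It assumes FROM_unixtime renders dates in UTC; the names END_TS, WINDOW_DAYS and expected_dates are just illustrative, not anything from OpenSky.

```python
from datetime import datetime, timedelta, timezone

END_TS = 1658049843   # upper bound used in the query
WINDOW_DAYS = 10      # the query looks back 10 days

# Dates that actually appeared in the result set above.
returned = {"2022/07/08", "2022/07/09", "2022/07/10", "2022/07/11"}

def expected_dates(end_ts, days):
    """Every UTC calendar date touched by [end_ts - days, end_ts] (UTC is an assumption)."""
    end = datetime.fromtimestamp(end_ts, tz=timezone.utc).date()
    start = datetime.fromtimestamp(end_ts - days * 86400, tz=timezone.utc).date()
    out = []
    d = start
    while d <= end:
        out.append(d.strftime("%Y/%m/%d"))
        d += timedelta(days=1)
    return out

# Dates the window covers but the query returned no rows for.
missing = [d for d in expected_dates(END_TS, WINDOW_DAYS) if d not in returned]
print(missing)
```

A date can also be absent simply because no aircraft crossed the bounding box that day, so treat the list as a hint rather than proof.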

If I now extend the date range by one more day (11 days instead of 10), I get the Hadoop I/O error.

Proof of Hadoop I/O issue:

select count(*), FROM_unixtime(hour, 'yyyy/MM/dd') as rth_date, 'WINCHESTER-WEST' as area_of_record
FROM state_vectors_data4 where lat between 51.06152 and 51.09048
and lon between -1.366071 and -1.320889
and hour between (1658049843- (3600*24*11.0)) and (1658049843 - (3600*24*0))
group by rth_date order by rth_date desc;

Result:  Disk I/O error: Failed to open HDFS file hdfs://nameservice1/user/opensky/tables_v4/state_vectors/hour=1657105200/part-r-00169-858ba597-246d-4210-aa4d-91c6b44665d1.snappy.parquet
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /user/opensky/tables_v4/state_vectors/hour=1657105200/part-r-00169-858ba597-246d-4210-aa4d-91c6b44665d1.snappy.parquet
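
To see why the 11-day query fails while the 10-day one succeeds, the partition value in the failing path (hour=1657105200) can be decoded and compared against both lower bounds. This is only a small sketch using the numbers from the queries and the error message; the helper names are made up for illustration and UTC is assumed.

```python
from datetime import datetime, timezone

END_TS = 1658049843          # upper bound used in both queries
PARTITION_HOUR = 1657105200  # taken from the failing path hour=1657105200

def lower_bound(days):
    """Lower bound of the hour filter for a look-back of `days` days."""
    return END_TS - 3600 * 24 * days

def utc(ts):
    """Format a Unix timestamp as a UTC string (UTC is an assumption)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

in_10 = lower_bound(10) <= PARTITION_HOUR <= END_TS
in_11 = lower_bound(11) <= PARTITION_HOUR <= END_TS

print(utc(PARTITION_HOUR), in_10, in_11)
# prints: 2022-07-06 11:00:00 False True
```

So the missing Parquet file sits in the 2022-07-06 11:00 UTC partition, which only the 11-day window reaches.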

Thanks all.

Richard Herson.
Founder of www.aircrafttrafficsurvey.com - a community offering


1 year 8 months ago #2016 by huzaifa
Hi Herson, thanks for confirming. I built an application that queries the last 48 hours of flight departure data and noticed it hasn't been displaying any data for the last couple of days.

Is there anything that can be done to fix this?

Thanks and I really appreciate all the work that goes into OpenSky Network.


1 year 8 months ago #2024 by strohmeier
Yeah, there are some issues with the pipeline that gets the collected data into the Impala tables. It is of course being worked on, but the jobs are running slower than anticipated. A few days of the backlog have been added, but it takes time.
Nothing is lost and it will be added eventually, but for the time being assume that the most recent data won't be in Impala just yet.
The following user(s) said Thank You: liu1322


1 year 8 months ago #2028 by strohmeier
Update: We fixed some issues and things should slowly return to normal over the next week, including all missing data.
The following user(s) said Thank You: liu1322, huzaifa, Herson


1 year 8 months ago #2034 by Herson
It appears that the historic data held in the state_vectors_data4 table is now almost fully loaded and current. FYI: the number of rows returned for 2022/07/25 and 2022/07/27 looks very low, whereas all other dates look correct.

Does this also mean we can start using the Airport departure / arrival API?

Thanks.

Richard 

1 year 8 months ago #2035 by strohmeier
Yes, almost done.

You can always use whatever is there - you can't break it; the data just might not be complete.
