Data mining the CA:TWS info dump
May. 10th, 2016 02:29 pm
I have seen this idea more than once, and I find it INCREDIBLY victim blaming, that Tony should have already known about HYDRA/Winter Soldier killing his parents because of the data dump at the end of CA:TWS without Steve telling him.
That is TERABYTES worth of data. That is a fuckton of data. The government keeps records on everything, but that doesn’t mean those records are complete, well maintained, or organized in a usable fashion. In fact, since HYDRA is involved, they have a vested interest in making it difficult to navigate. You have to be in the know to find the file. And people are expecting a man who is Iron Man, and an Avenger, and the head of R&D for a multi-billion dollar tech company to have the time to look at that data in an in-depth manner.
The key to data mining is The Question. The Question will get you ¾ of the way to an answer. If Tony doesn’t have the proper question, ‘Who killed my parents’, he’s unlikely to find the answer ‘HYDRA killed your parents using the Winter Soldier.’
This is assuming that the SHIELDRA files had any linkage or meta tags that made sense. Tony would have to task JARVIS exclusively for at least a week, more like a month, to sort and organize all that information. They would also have to look for false data and analyse what the intent of an operation was before some tags could be applied: is it HYDRA hiding someone inside a department, or did someone misfile payroll? Compared to us, Tony has the advantage of JARVIS being able to see images, but that can speed things up only so much.
JARVIS’ first pass through the data would be to tag everything, which creates the characteristics for the tables and a pile of data that needs more input before it gets tagged. The second pass would be building the tables. And then the third pass would be building an actual relational database, which Tony could start searching through.
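To make those three passes concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is invented for illustration: the file names, the file contents, the keyword rules in TAG_RULES, and the table layout are all assumptions, and a real dump would need far more than keyword matching. It only shows the shape of the work: tag what you can, set aside what you can't, build tables from the tags, then load it all into something searchable.

```python
import sqlite3

# Hypothetical raw dump: names and contents are invented for illustration.
RAW_FILES = [
    ("ops_report_a.txt", "asset deployed; target vehicle"),
    ("payroll_2013.csv", "department transfer; salary adjustment"),
    ("scan_0042.img", ""),  # nothing readable yet, needs more input
]

# Hypothetical keyword-to-tag rules (an assumption, not a real scheme).
TAG_RULES = {"asset": "winter_soldier", "target": "operation", "salary": "payroll"}


def tag_pass(files):
    """Pass 1: tag what can be tagged, set aside what cannot."""
    tagged, untagged = [], []
    for name, text in files:
        tags = sorted({tag for word, tag in TAG_RULES.items() if word in text})
        (tagged if tags else untagged).append((name, tags))
    return tagged, untagged


def build_tables(conn):
    """Pass 2: create the tables that the tags call for."""
    conn.executescript(
        """
        CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
        CREATE TABLE file_tags (
            file_id INTEGER REFERENCES files(id),
            tag_id INTEGER REFERENCES tags(id)
        );
        """
    )


def load(conn, tagged):
    """Pass 3: populate the relational database so it can be searched."""
    for name, tags in tagged:
        file_id = conn.execute(
            "INSERT INTO files (name) VALUES (?)", (name,)
        ).lastrowid
        for label in tags:
            conn.execute("INSERT OR IGNORE INTO tags (label) VALUES (?)", (label,))
            tag_id = conn.execute(
                "SELECT id FROM tags WHERE label = ?", (label,)
            ).fetchone()[0]
            conn.execute("INSERT INTO file_tags VALUES (?, ?)", (file_id, tag_id))


def files_with_tag(conn, label):
    """Search step: list the files carrying a given tag."""
    rows = conn.execute(
        "SELECT f.name FROM files f "
        "JOIN file_tags ft ON ft.file_id = f.id "
        "JOIN tags t ON t.id = ft.tag_id "
        "WHERE t.label = ? ORDER BY f.name",
        (label,),
    )
    return [row[0] for row in rows]


tagged, untagged = tag_pass(RAW_FILES)
conn = sqlite3.connect(":memory:")
build_tables(conn)
load(conn, tagged)
```

Note that files_with_tag only helps if you already know which tag to ask for, and that the image scan never makes it into the database at all until someone gives it more input.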
And if Tony is looking at this pile of data in-depth, then why would he be looking at files over twenty years old? Tony’s first concern would be double checking that he hasn’t hired any ex-SHIELDRA employees and that the people jumping ship and applying to SI aren’t Nazis. After that, the most pressing documentation would be about the operations currently active and the ones that took place in the past 5-10 years. He would basically be performing triage on the data: ‘Who needs extraction now?’ ‘Who’s about to topple an ally’s government next month?’ ‘Whose water supply has been tainted?’
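That triage order can be sketched as a simple sort. This is a rough illustration only: the Record fields, the example titles, and DUMP_YEAR are all hypothetical, and the ten-year cutoff just mirrors the 5-10 year window mentioned above.

```python
from dataclasses import dataclass

DUMP_YEAR = 2014  # assumed year of the dump, used only to compute record age


@dataclass
class Record:
    title: str
    year: int
    active: bool = False          # operation still running
    names_si_staff: bool = False  # mentions SI employees or applicants


def triage_rank(rec):
    """Lower rank means read it sooner."""
    if rec.names_si_staff:
        return 0  # vet hires and applicants first
    if rec.active:
        return 1  # then operations currently active
    if DUMP_YEAR - rec.year <= 10:
        return 2  # then the past 5-10 years
    return 3      # everything older waits


def triage(records):
    # Sort by rank, newest first within the same rank.
    return sorted(records, key=lambda r: (triage_rank(r), -r.year))


# Invented example records.
records = [
    Record("archived operation", 1991),
    Record("closed operation", 2008),
    Record("ongoing extraction request", 2014, active=True),
    Record("SI applicant cross-check", 2014, names_si_staff=True),
]
queue = triage(records)
```

Under any ordering like this, a file more than twenty years old lands at the very bottom of the reading queue.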
And then the next flaw in the reasoning that Tony should have known is this: did Widow dump all of HYDRA’s files, or just the parts they stored with SHIELD? With the new knowledge that there’s a KGB branch of HYDRA, a complete dump seems unlikely. Why wouldn’t HYDRA store all Winter Soldier pertinent files in the KGB servers? How do we know they don’t have private server farms that no one’s found?
In short, it is entirely possible for Tony to have access to this data for 2 years and never find the one file that indicated everything he knew about his parents’ deaths was wrong. And that still doesn’t address the hubris and selfishness involved with Steve withholding even a hint that HYDRA was responsible. What was stopping Steve from sending that information to Pepper or Rhodey and letting them be there for Tony, if he didn’t want to face Tony’s reaction?