By now you may have heard of the Deep Web, a loose term for the invisible or hidden layer of the Web that search engines like Google and Bing don’t index (hence the iceberg metaphor, illustrated above). It’s sometimes conflated with the Dark Web, a collection of darknets that exist between trusted peers using non-standard protocols and ports. The term darknet originally described networks isolated from the ARPANET of the 1970s: you could connect and send data to them, but they didn’t respond to pings or other network inquiries.
At any rate, the Deep Web is by definition not easily searchable, and as a result it harbors all manner of illegal activity: havens for criminals, terrorists, sex traffickers, and other groups with nefarious purposes. In November 2014, a large-scale bust by the FBI and the UK’s National Crime Agency shut down more than two dozen Tor websites, including Silk Road 2.0, a successor to the original online black market that sold illegal drugs, weaponry, electronics, and other goods.
The Defense Advanced Research Projects Agency (DARPA) has developed tools (dubbed Memex) that can access and categorize this world for the purpose of tracking these people, and last month the agency open-sourced major portions of the code behind Memex so that other groups can take advantage of the tools. Now researchers at NASA’s Jet Propulsion Laboratory in Pasadena, California have joined in, and are looking to Deep Web search tools for an entirely different reason: mining the vast data stores returned by NASA spacecraft and supporting other science-related objectives.
The idea is to treat the Deep Web like Big Data — a term that usually refers to any tremendous store of structured, semi-structured, or unstructured data that’s too large to process or mine for information using traditional search and database techniques. In this case, it could easily apply to the data retrieved in space missions.
“We’re developing next-generation search technologies that understand people, places, things and the connections between them,” said Chris Mattmann, principal investigator for JPL’s work on Memex, in a statement. On the Deep Web today, Memex is smart enough to check not only text-based content but also online images, videos, pop-up ads, scripts, and forms, and it can associate content from one source with related content in a different format. That kind of search tech could be of great use during space missions, where spacecraft capture images, video, and other kinds of data with complex scientific instruments.
For example, a researcher could search through visual information from a planet or asteroid in order to learn more about its geological features, said Elizabeth Landau of JPL, or automatically analyze data from Earth-based missions monitoring the weather and climate. Scientists could also run deep searches on published scientific results from NASA data stores. All of this would be built on open-source code.
“We’re augmenting Web crawlers to behave like browsers — in other words, executing scripts and reading ads in ways that you would when you usually go online. This information is normally not catalogued by search engines,” Mattmann said. “We are developing open source, free, mature products and then enhancing them using DARPA investment and easily transitioning them via our roles to the scientific community.” It will be interesting to see what NASA engineers — not to mention other researchers — do with the project. The good news is, unlike with typical DARPA projects, we’ll actually get to see and use the results.
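Mattmann’s point about executing scripts can be illustrated with a minimal sketch. A traditional crawler parses only the static HTML it downloads, so any content that a script would have injected at runtime stays invisible to it, which is exactly the blind spot a browser-emulating crawler closes. The sketch below uses only the Python standard library; the sample page and its URLs are hypothetical, and this is not Memex code, just an illustration of the limitation.

```python
from html.parser import HTMLParser

class StaticCrawler(HTMLParser):
    """Parses raw HTML the way a traditional crawler does: it records
    the <a href> links it can follow, and counts the <script> blocks
    whose dynamically injected content it will never see."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.unexecuted_scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "script":
            # A static crawler sees the script tag but cannot run it.
            self.unexecuted_scripts += 1

# Hypothetical page: one static link, plus a script that would inject
# a second link if a real browser (or browser-like crawler) ran it.
SAMPLE_PAGE = """
<html><body>
  <a href="/static-page.html">Visible to any crawler</a>
  <script>
    // A browser-emulating crawler would execute this and discover
    // /hidden-page.html; a static crawler cannot.
    document.body.innerHTML += '<a href="/hidden-page.html">x</a>';
  </script>
</body></html>
"""

crawler = StaticCrawler()
crawler.feed(SAMPLE_PAGE)
print(crawler.links)               # only the static link is found
print(crawler.unexecuted_scripts)  # one script went unexecuted
```

Running this finds only `/static-page.html`; the link the script would have added never exists in the raw markup. Tools in the Memex vein instead drive a real rendering engine so that script-generated links, ads, and forms become part of the crawl.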