Originally published Monday, September 5, 2005 at 12:00 AM
Safecrackers open up the "deep Web"
Knight Ridder Newspapers
SAN JOSE, Calif. — You think the Web is big? In truth, it's far bigger than it appears.
The Web is made up of hundreds of billions of Web documents, far more than the 8 billion to 20 billion claimed by Google or Yahoo! But most of these pages are unreachable by most search engines because they are stored in databases that Web crawlers can't access.
Now a San Mateo, Calif., startup, Glenbrook Networks, says it has a way to tunnel far into the "deep Web" and extract this information.
Search engine
Glenbrook, run by a father-daughter team, demonstrated its technology by building a search engine that scoops up job listings from the databases of various Web sites, something the company claims most search engines cannot do.
But there are many other applications as well, the founders say.
"Most of the information out there, people want you to see," said Julia Komissarchik, Glenbrook Networks' vice president of products. "But it's not designed to be accessed by a machine like a search engine. It requires human intervention."
This is particularly true of Web pages that are stored in databases. Many ordinary Web pages are static files that exist permanently on a server somewhere.
But an untold number of pages do not exist until the very moment an individual fills out a form on a Web site and asks for the information. Online dictionaries, travel sites, library catalogs and medical databases are a few such examples.
Komissarchik and her father, Edward Komissarchik, say they have figured out how to analyze the forms on Web pages and understand the type of information the sites are looking for.
Then, Glenbrook's Web crawlers use artificial intelligence to walk themselves through sometimes-complex Web forms, answering questions, such as the location of their desired job, in the same way a human would.
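The article does not describe how Glenbrook actually analyzes forms, so the following is only a minimal sketch of the general idea, using Python's standard `html.parser` module. The sample form, its field names and the `describe_form` helper are all hypothetical. It shows how a crawler might list the questions a form asks and the answers a drop-down menu will accept.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collects each form field's name and, for drop-downs, its allowed values."""

    def __init__(self):
        super().__init__()
        self.fields = {}            # field name -> list of option values ([] means free text)
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and "name" in attrs:
            self.fields.setdefault(attrs["name"], [])
        elif tag == "select" and "name" in attrs:
            self._current_select = attrs["name"]
            self.fields[self._current_select] = []
        elif tag == "option" and self._current_select is not None:
            value = attrs.get("value")
            if value:               # skip empty placeholder options such as "Any"
                self.fields[self._current_select].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


# Hypothetical job-search form; not taken from any real site.
SAMPLE_FORM = """
<form action="/jobs/search" method="get">
  <input type="text" name="keywords">
  <select name="location">
    <option value="">Any</option>
    <option value="san-jose">San Jose</option>
    <option value="san-mateo">San Mateo</option>
  </select>
</form>
"""


def describe_form(html):
    """Return a mapping of field names to the values the form offers."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


print(describe_form(SAMPLE_FORM))
```

A real crawler would also have to handle scripts, multi-step forms and free-text fields, which this sketch leaves out.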
Julia Komissarchik likens the process to cracking a safe.
"The way to think of it is, you case the joint," she said. "The scout goes through the form and tries a few options to see what the results will be. Then you have a mastermind or safecracker who gets all this information from the scout and devises a method to open the forms."
Finally, she said, the "harvesters" spring into action to gather up all the information.
"As soon as you know the combination, then you can open all the micro-safes, if you will, that are sitting there," said Edward Komissarchik.
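The scout, mastermind and harvester roles in the analogy can be illustrated with a toy example. This is a sketch of the analogy only, not Glenbrook's method, which the article does not detail. The in-memory "site," the `submit_form` function and the rule the mastermind uses are all assumptions made for illustration.

```python
# Hypothetical stand-in for a database-backed site with one form field, "location".
FAKE_DATABASE = {
    "san-jose": ["Engineer", "Analyst"],
    "san-mateo": ["Designer"],
    "palo-alto": [],
}


def submit_form(location):
    """Pretend to submit the form; return the listings the site generates."""
    return FAKE_DATABASE.get(location, [])


def scout(options, sample_size=2):
    """Try a few options and record how many results each one returns."""
    return {opt: len(submit_form(opt)) for opt in options[:sample_size]}


def mastermind(options, observations):
    """Plan the queries to run, skipping options the scout saw come back empty.

    Options the scout never tried are kept, since nothing is known about them.
    """
    return [opt for opt in options if observations.get(opt, 1) > 0]


def harvest(plan):
    """Run every planned query and gather all the listings."""
    return {opt: submit_form(opt) for opt in plan}


options = ["palo-alto", "san-jose", "san-mateo"]
observations = scout(options)
plan = mastermind(options, observations)
listings = harvest(plan)
print(listings)
```

Splitting the work this way means only the scout probes the form by trial and error, while the harvesters simply repeat a plan that is already known to work.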
Russian immigrants
The father and daughter immigrated from Russia in 1990. Julia Komissarchik was a math major and computer-science minor at the University of California, Berkeley.
Her father, a graduate of Moscow University, was a math professor and researcher who studied databases.
The pair launched a startup in the late 1990s called Better Accent that does speech analysis and helps people learn English.
The Komissarchiks, who took on Palo Alto, Calif., startup consultant Jeff Clavier as a partner, built a jobs-search site to showcase their deep Web-search technology.
The site, www.glendor.com, culls job listings from hundreds of San Francisco Bay Area company Web sites and a major job-listing site, HotJobs.com.
For added effect, the company merged the listings with Google Maps, so that people can get a geographic sense of job opportunities.
Glenbrook's technology is not entirely new, says Gary Price, a Maryland research librarian who co-authored a book on the deep Web called "The Invisible Web."
"The whole idea of having technology fill out a form and pull results has been around for years," Price said. But he added, "I think this company could be able to do something with the idea of marketing all this data they've collected."
Glenbrook is far from alone in going after the deep Web.
Yahoo! announced partnerships with National Public Radio, the Library of Congress, the New York Public Library and others to index the content in their databases. And Google has added WorldCat, a comprehensive bibliographic database previously accessible only through libraries, to its search results.
Edward Komissarchik said one business opportunity for the company might be to collect and sell hard-to-get data to information brokers such as Dun & Bradstreet.
Another possibility is to launch a specialized Internet search site, focused on an area such as local business directories or jobs.
Edward Komissarchik said Glenbrook could extract the many job listings that are stashed away in databases on corporate Web sites.
"We can go directly to the owner of the information, which is the employer," he said.
Growing market
The number of job-related Web sites has mushroomed in recent years. Several smaller sites, including SimplyHired and Indeed, have emerged with the aim of scooping up all the job listings on various Web sites.
"It was really a great showcase for us," Clavier said of the Glendor jobs Web site. "But it's not where we see the biggest opportunity for the company. We don't plan to launch a competitor to SimplyHired or Indeed or WorkZoo or those guys."
The Komissarchiks are especially intrigued by the possibility of collecting detailed information about local businesses — such as business hours and product information — and making it more readily available.
"The deep Web is the future," Edward Komissarchik said.