WWW-VL: W3 SEARCH ENGINES
Technical Bases
Search Techniques
Text & News Searching
Image & Sound Searching
People Searching
National Engines
Virus & Spyware Protection
Website Tools
W3 Future
Search Engines Tips & Secrets
Online Privacy
Passwords
New: DeepPeep, a search engine specialized in Web forms. The current beta version tracks 13,000 forms across 7 domains (2009). From the WebDB Group, School of Computing, University of Utah: WebDB Group Publications [hints]
- Technical Bases
- The World Wide Web: A very short personal history, by Tim Berners-Lee, W3 inventor, W3C (World Wide Web Consortium), MIT, Cambridge, MA
- My Decade of Writing About Search Engines, SearchEngineWatch [Danny Sullivan, editor, a long-time search engine expert]
- "The Anatomy of a Large-Scale Hypertextual Web Search Engine," by Sergey Brin and Lawrence Page [the original Google search engine algorithm, designed by the Google founders]
- Hilltop: A Search Engine Based on Expert Documents, by Krishna Bharat and George A. Mihaila [a newer search algorithm used by Google] ["Algorithm operates on a special index of 'expert documents,' a subset of pages identified as directories of links to non-affiliated sources on specific topics. Results are ranked based on the match between the query and relevant descriptive text for hyperlinks on expert pages pointing to a given result page."]
- Efficient Crawling Through URL Ordering, by Junghoo Cho, Hector Garcia-Molina, and Lawrence Page, Stanford University ["the fraction of the Web that is visited (and kept up to date) is more meaningful"; .pdf file at archive.org, click on the most recent date]
- Handwriting Retrieval Demonstrations, Center for Intelligent Information Retrieval, University of Massachusetts Amherst [new software able to search handwritten text]
- The Structure of Information Networks, by Jon Kleinberg, Cornell University [Kleinberg did early work on web link topology, hyperlink structure, and associated text. Includes many citation links.] 2002
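The Hilltop description above has two steps: pick out "expert" directory pages, then rank result pages by how well the query matches the link text on those experts. A minimal Python sketch of that idea, assuming a hypothetical data model (each page is a list of `(anchor_text, target_url)` pairs) and using "links to at least three distinct hosts" as a stand-in for "directory of links to non-affiliated sources". The function names and thresholds are illustrative only; the published algorithm adds further conditions (for example on expert affiliation) that this sketch omits.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical data model: an expert page is a list of (anchor_text, target_url)
# pairs, i.e. the descriptive link text and the page that link points to.
ExpertPage = list[tuple[str, str]]


def is_expert(page: ExpertPage, min_hosts: int = 3) -> bool:
    """Treat a page as an 'expert' (a directory) if it links to at least
    `min_hosts` distinct hosts. Host name stands in for 'non-affiliated'."""
    return len({urlparse(url).netloc for _, url in page}) >= min_hosts


def hilltop_scores(query: str, pages: list[ExpertPage]) -> list[tuple[str, int]]:
    """Score each target by the number of query terms found in the anchor
    text of expert-page links that point to it, summed over expert pages."""
    terms = set(query.lower().split())
    scores: dict[str, int] = defaultdict(int)
    for page in pages:
        if not is_expert(page):
            continue  # only directory-like pages get a vote
        for anchor, target in page:
            matched = len(terms & set(anchor.lower().split()))
            if matched:
                scores[target] += matched
    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


experts = [
    [("python tutorial", "https://a.org/py"),
     ("java guide", "https://b.com/java"),
     ("python docs", "https://c.net/docs")],
    [("python tutorial for beginners", "https://a.org/py"),
     ("ruby notes", "https://d.io/rb"),
     ("c reference", "https://e.org/c")],
    # Links to only one host: not treated as an expert, so it casts no votes.
    [("python tutorial", "https://spam.example/x")],
]
print(hilltop_scores("python tutorial", experts))
# → [('https://a.org/py', 4), ('https://c.net/docs', 1)]
```

Note how the single-host page cannot push its own target into the results: only pages that look like independent directories contribute anchor-text votes.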
The Clever Project, IBM [early research by Kleinberg et al., predating Brin and Page's Google. The CLEVER search engine incorporated several algorithms that used hyperlink structure to discover high-quality information on the Web. Clever also looked backward from an authoritative page to see which pages point to it (compare Hilltop).]
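Looking backward from an authoritative page to the pages that point to it is the core of Kleinberg's hubs-and-authorities (HITS) computation, which the Clever work built on. A minimal Python sketch of that iteration on a hypothetical link graph: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, normalized each round. Clever combined this with other signals (such as link text) that are not modeled here.

```python
import math


def hits(links: dict[str, list[str]],
         iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores for a directed link graph."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority: look backward at the pages pointing here and add their hub scores.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub: add up the authority scores of the pages this one points to.
        hub = {n: sum(auth[dst] for dst in links.get(n, [])) for n in nodes}
        # Normalize to Euclidean length 1 so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth


# Hypothetical graph: three directory pages pointing at two content pages.
links = {"dir1": ["A", "B"], "dir2": ["A", "B"], "dir3": ["A"]}
hub, auth = hits(links)
print(sorted(auth, key=auth.get, reverse=True)[:2])  # → ['A', 'B']
```

Page A, linked from all three directories, ends up the strongest authority, while dir1 and dir2, which point at both good pages, end up stronger hubs than dir3.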
- OmniFind, IBM [uses the Unstructured Information Management Architecture (UIMA), which, according to IBM, will lead to a third generation in the ability to retrieve computerized data. It will use "discovery systems" that extract the underlying meaning from stored material no matter how it is structured (databases, e-mail files, audio recordings, pictures, or video files).]
Stuff I've Seen: A System for Personal Information Retrieval and Re-Use, S. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin, D. Robbins; Microsoft Research, 2003-2004, ACM [the SIS application is built on top of the modular MS Search indexing architecture; it searches files of many formats (doc, pdf, html, etc.) stored on a personal machine]
- Block-level Link Analysis, Microsoft Research ["... a web page contains multiple semantics and hence the web page might not be considered as the atomic node. ...the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block, block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node exactly represents a single semantic topic."]
- Spelling correction as an iterative process that exploits the collective knowledge of web users, S. Cucerzan & E. Brill, Microsoft Research [.pdf], University of Pennsylvania
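The Block-level Link Analysis quotation above describes building a graph from page-to-block and block-to-page relationships. A minimal Python sketch, assuming a page-to-block matrix X (how important each block is within its page) and a block-to-page matrix Z (which pages each block links to, split evenly); both weightings and the helper names are hypothetical and the paper's exact definitions may differ. Composing the two as X·Z gives a page-to-page graph, and Z·X gives a block-to-block graph whose nodes are single blocks.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def block_to_page(block_links: list[list[int]], n_pages: int) -> Matrix:
    """Z[i][j] = 1/s_i if block i links to page j, where s_i is the number
    of distinct pages block i links to (an assumed weighting)."""
    z = []
    for targets in block_links:
        row = [0.0] * n_pages
        distinct = set(targets)
        for p in distinct:
            row[p] = 1.0 / len(distinct)
        z.append(row)
    return z


# Page-to-block matrix X: X[p][b] = importance of block b within page p
# (hypothetical weights, e.g. from page segmentation; 0 if b is not on p).
X = [[0.7, 0.3, 0.0],   # page 0: main-content block 0, navigation block 1
     [0.0, 0.0, 1.0]]   # page 1: a single block 2
# Block 0 links to page 1, block 1 links to both pages, block 2 links to page 0.
Z = block_to_page([[1], [0, 1], [0]], n_pages=2)

page_graph = matmul(X, Z)   # 2 x 2: page-to-page weights routed through blocks
block_graph = matmul(Z, X)  # 3 x 3: block-to-block weights routed through pages
```

Because each row of X and of Z sums to 1 in this example, each row of both composed graphs also sums to 1, so either can be fed to a random-walk ranking such as PageRank.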
WebFountain Advanced Text Analytic Software, IBM Almaden Research Center [collects, stores and analyzes massive amounts of unstructured and semi-structured text]
Fast Multiresolution Image Querying, by Jacobs, Finkelstein, Salesin [the algorithm on which Retrievr (Flickr search) is based]
New perspective on visual information retrieval, Horst Eidenberger, Vienna University of Technology [.pdf file]
Search Techniques
- DSpace digital library system: capture, store, index, preserve, and redistribute the intellectual output of a research faculty in digital formats. [joint MIT Libraries and Hewlett-Packard project with free source code]
- Empirical Analysis of Google SafeSearch, Benjamin Edelman, Berkman Center for Internet & Society, Harvard Law School [part of a series of projects with Professor Jonathan Zittrain]
- Google agrees to censor Chinese search results, SearchEngineWatch [Danny Sullivan, editor; the results blocked in the China version include White House and Library of Congress results]
- Google Websearch Parameters, by Joost de Valk [.pdf file listing terms that, when added to a Google search URL, select different types of results]
- Search Term Tool, Overture [generates lists of search keywords]
- MySpace, YouTube and OurMuseums: Using Social Networks for Museums - Reaching Generation Why, Mountain-Plains Museums Association 2007 Web Session [a high percentage of searches now come through the Google bar on MySpace]
- Small Museums & Technology, American Association of Museums (AAM) 2008 session resources.
- Wordtracker [generates lists of related search terms; aimed at companies, to find keywords that are often searched for but not as often used in header titles, etc. It is also a very useful tool for searchers, giving you dozens of different words to try on the same search.]
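Adding a term from a parameter list such as de Valk's amounts to appending one more `key=value` pair to the query string of the search URL. A minimal Python sketch using the standard library; `google_search_url` is a hypothetical helper, `q` carries the search words, and `hl` (interface language) appears only as an example of an extra parameter. Google may change or ignore any parameter, so treat the output as a starting point, not a guaranteed result type.

```python
from urllib.parse import urlencode


def google_search_url(query: str, **params: str) -> str:
    """Return a Google web-search URL for `query`. Extra keyword arguments
    are appended as further query-string parameters, unchecked; pick them
    from a parameter reference rather than guessing."""
    return "https://www.google.com/search?" + urlencode({"q": query, **params})


print(google_search_url("web history"))
# → https://www.google.com/search?q=web+history
print(google_search_url("web history", hl="en"))  # hl: interface language
# → https://www.google.com/search?q=web+history&hl=en
```

`urlencode` handles the escaping (spaces become `+`, punctuation is percent-encoded), which is the part most often gotten wrong when URLs are assembled by hand.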
Submission Services
- FreeFind [free service providing indexes and search facilities for individual sites]
Major Search Engines
- A9.com Search Technologies, Amazon.com's search engine, now using Windows Live.com search (MSN) as of 2007; now aimed at e-commerce sites [includes a neat toolbar that allows web page note-taking and search history access from anywhere. Also roll-over site info, including traffic rank and inbound link count.]
- AlltheWeb LiveSearch [New URL: provides information on who owns the domain, the date of the last changes, and subdomains. Excellent tool for fast website information. Owned by Yahoo; shows nice selections, few duplicates, and correct addresses, if a little behind Google updates. Solid searches.]
- AltaVista Connections [owned by Yahoo; indicates, in hours, how recently pages were cached and indexed; an eclectic collection of sites.]
- AOL Search [uses Google results; provides a top ten search list and categories]
- Ask.com (formerly Ask Jeeves) [natural language interface, but dated links from Google.com with updated descriptions; merged with Teoma.com]
- AskCity [new local search service featuring Yahoo user-generated content]
- Collarity Relevance Engine [Beta demonstration: community-based search technology]
- Eurekster search swiki beta ["search engine powered by social networking technologies"; results are based on user-supplied information and are not useful for serious subject searches; lists a collection of other people's searches]
- Dig In Digital Integration System [automatically combines relevant information granules into expandable-collapsible hierarchies that objectively describe knowledge relationships within and between information sources; can be purchased for internal system search]
- Factbites ["cross a search engine with an encyclopedia"]
- Gigablast [fast, solid results, with percentage notations on site relevance; offers a unique search box that can be added for site searches; indexes all generic meta tags. Does seem to be based on Google, but with added sites; slow to reindex (months).]
- Google Search Engine [performs "fuzzy" searches with the PageRank matrix algorithm. 2007: Google still has problems finding actual web site addresses, instead finding computer root addresses (i.e., machine addresses) rather than human-viewed and linked addresses; first-page site results also come and go.]
- Google Desktop Search [searches files on the local computer; must be downloaded from Google. Beta test; users have raised privacy questions.]
- Google Local Search [new feature, but only useful for commercial sites; probably paid insertion? Will this cause the death of any truly local commercial directories? A Walmart-type example comes to mind. Works really well for restaurants. Limited non-commercial results.]
- Google Labs [new technology and software for searches]
- Google Personalized Search Beta [allows a personal profile to focus searches]
- Google Scholar Beta [searches journal articles, abstracts, and other scholarly literature, including .pdf files]
- jux2browser [installable button that provides one-click comparison searches on jux2 among Yahoo, Google, etc. Indicates which results are unique to one engine and which appear in all of them. "Search engines are more different than people think, typically sharing fewer than 3.5 of their top 10 results." May 2006.]
- KartOO visual meta search engine, kartoo.com [uses Flash to visually display search results as interactive maps. Amazingly bright search results. Also provides search history. HTML version available, but why bother?]
- MetaCrawler [uses results from Google, Yahoo, Ask Jeeves, About, Overture, AltaVista; lots of commercial results for non-commercial searches.]
- MSN.com Live Search [the MSN search engine seems to award ranking bonus points for site 'depth'. Solid results, with real differences compared to Google or Yahoo. Worth using.]
- NVI-DataNet Digital Library System [online data-gathering search based on uploaded data sets]
- Open Directory Project [Netscape's human-edited directory; provides results to AlltheWeb, AltaVista, Google, HotBot, Lycos, Teoma, Yahoo, etc. ODP editors' notes do not follow redirects]
- Powerset [uses natural-language queries, with results drawn entirely from Wikipedia]
- PINAKES, Heriot-Watt University, Edinburgh, Scotland [a subject search site]
- Proximic, "open contextual ecosystem to connect publishers, bloggers and end-users by delivering related news, articles, background information and ads in a better way." [2007; a free add-on program that requires Firefox]
- Resource Discovery Network (RDN) [limited to searches for free training in different subjects]
- Scirus for Scientific Information [2004 "Best Directory or Search Engine" WebAward from the WMA]
- search.com, CNET.com [Google results mixed with MSN, Ask Jeeves, LookSmart, etc.; a nice mix; provides related search options]
- Time Search [new search engine that limits results to wikipedia and google image search -- by year. From Bamber Gascoigne -- best known for his television role as chairman of University Challenge for twenty-five years, 1962-1987.]
- Vivisimo [metasearch engine, with results presented in clusters by subject]
- WebCrawler [odd mix of results from Google, Yahoo, AltaVista, Ask Jeeves, About, LookSmart, Overture, Teoma, FindWhat]
- Yahoo! [much improved, with very fast updates of moved or changed web site addresses; 2006. (Best to Larry Tesler.)]
- Yooci Meta Search Engine www.yooci.com [Meta search in six languages]
- ZapMeta Metacrawler [new, with a good review by Chris Sherman: "ZapMeta has another feature called 'result snapshots'...this...displays thumbnail images of the page next to the result information. Another...feature is the link to 'older versions' of result pages. This is a hardcoded link to the Wayback Machine, allowing you to view copies of the page that have been archived over the years."]
- ZoomInfo ["largest index of people in business in the world"]
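The Google entry above mentions PageRank, introduced in the Brin and Page paper listed under Technical Bases. A minimal power-iteration sketch in Python: each page's rank is (1 - d)/N plus d times the rank passed along by every page linking to it, each linking page splitting its rank evenly over its out-links. This is the normalized form, so the ranks sum to 1; the 1998 paper writes the first term as (1 - d). The handling of pages with no out-links, the damping value 0.85, and the fixed iteration count are assumptions of this sketch, not a description of Google's production system.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a dict of page -> outgoing links."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Every page gets the "random jump" share (1 - d) / N.
        new = {p: (1.0 - damping) / n for p in nodes}
        for p in nodes:
            out = set(links.get(p, []))
            if out:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for q in nodes:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


ranks = pagerank({"A": ["C"], "B": ["C"], "C": []})
print(max(ranks, key=ranks.get))  # → C
```

Page C, the only page with inbound links, comes out on top; in a simple cycle such as A → B → C → A every page would end up with the same rank of 1/3.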
Specific Function Search Engines
Topical Search Engines
Selected National Search Engines
- Indexes
- National Engines (in alphabetical order by country or region)
[Maintainer's note: This section has not been recently updated. Suggestions are welcome. Thanks to Philip George, Find Outer: Search Engines for the World, for suggestions.]
Virus and Spyware Protection
- Cache Bashing: Protect Your Company, by Brian Livingston [an attacker uses a link at Google that leads to a cached copy of your company's Web page, copies it, and posts it to a different site; Google then eliminates your page as a "duplicate".]
- Gibson Research Corp. [Steve Gibson, software pioneer, offers very useful anti-spyware and hard disk recovery programs]
- PC World [safe source of antivirus and antispyware program downloads]
- Yahoo! Toolbar with Anti-Spy [you can delete the toolbar after download if you wish, but keep the anti-spyware program: one of the best, and free]
Website Tools
HyperText Markup Language (HTML) Home Page, W3C (World Wide Web Consortium)
- HTML Validator, Web Design Group [very handy; unlike the W3C validator, it doesn't throw out your page if it has errors.]
Rensselaer Polytechnic Institute Writing Center [search engines like sites that are well written]
The SEO Book, by Aaron Wall [ebook, updated daily; Aaron Wall is a search engine optimization expert]
Search Engine Spam Reporting [Here are addresses for reporting search engine spam (not spam email); Google has a good definition.]
- Google Spam Report [amazingly fast service once documented]
- AlltheWeb.com/FAST Search: spam@fastsearch.com
- AltaVista: search@support.altavista.com
- Inktomi/Yahoo: spamcrusader@inktomi.com
- Teoma/Ask: info@teoma.com
Search Engines Tips and Secrets
Slashdot: News for nerds, stuff that matters [amazingly high PageRank; over 150,000 inbound links]
Web W3 Future
- Semantic Web Expert, Ben Adida, June 2008, Yahoo! Search Blog
- Google CEO Dr. Eric Schmidt discusses "Perspectives on the Information Industry", University of Washington, 2005 [one-hour video available as a Windows Media or QuickTime file]
- Interview with Google CEO Eric Schmidt, Fred Vogelstein, Wired
- MSN Hires Gary William Flake, Ph.D., Distinguished Engineer, MSN Search [formerly Yahoo! VP of Technology]
- KurzweilAI.net, Ray Kurzweil, Kurzweil Technologies [inventor of reading machines, OCR, and sound synthesis devices; leading edge]
- MIT Media Lab, Cambridge, MA [always a fun place to visit: see the future now]
- Marvin Minsky Home Page, MIT Media Lab and MIT AI Lab, Cambridge, MA [a father of AI, mentor to Danny Hillis, and author of The Society of Mind]
- W. Daniel Hillis, Applied Minds, Inc. [inventor of the Connection Machine]
- Patrick Henry Winston, MIT [Genesis Group for understanding intelligence; former Director, MIT AI Lab]
- Rodney A. Brooks Director, MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
- Virtual Reality on the Web, Media Literacy Review, University of Oregon, Eugene
- Stanford University Search Site [the first Google site, of course.]
Search Engine Industry
W3 Online Privacy
WWW-VL: History was established as HNSource (Kansas History Gateway) on 6 March 1993.
Site maintained by George Laughead, manager, WWW-VL: United States History.
Thanks to Dr. Lynn H. Nelson, original author of the WWW-VL: History Index (1993), the first site on the WWW-Virtual Library created by Tim Berners-Lee, WorldWideWeb inventor. Dr. Lynn H. Nelson's advice on search engines: "Content, content, content."
Updated: 23 February 2009.
Thanks to Janet Laughead, Century 21 Real Estate, Wellesley, MA.
WWW-VL: History: W3 Search Engines
URL: http://vlib.iue.it/history/search/