Grub

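Grub is an open source distributed search web crawler platform. On July 27, 2007, Jimmy Wales announced that Wikia, Inc., the for-profit company developing the open source search engine Wikia Search, had acquired Grub from LookSmart.[1]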

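The project was started in 2000 in Oklahoma City by Kord Campbell, Igor Stojanovski, and Ledio Ago.[2] In 2003, LookSmart, Ltd. acquired the intellectual property of Grub, Inc.[3] For a short time afterwards, the original team continued to work on the project and released several new versions of the software, although under a closed license.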

Several controversies surrounded the Grub project shortly after LookSmart acquired it. Grub had a slight tendency to ignore a few misconfigured robots.txt files on the sites it crawled.[citation needed] Even after the development team addressed these issues, a few webmasters continued to blame it for crawling their sites too heavily and for not respecting their robots.txt files.[citation needed]
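
For context, the kind of check a crawler is expected to make before fetching a page can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration, not Grub's own code: the robots.txt content, the `example-crawler` user agent, and the `allowed` helper are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a network fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "example-crawler" is a made-up user agent name.
ok = allowed(ROBOTS_TXT, "example-crawler", "http://example.com/index.html")
blocked = allowed(ROBOTS_TXT, "example-crawler", "http://example.com/private/data.html")
```

Here `ok` is true and `blocked` is false, because only paths under `/private/` are disallowed. How a crawler should treat a misconfigured file is a separate policy question that this sketch does not address.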

Another issue was the closing of the source code base and the apparent failure to use the crawled data for anything useful, such as a searchable index of the crawled sites. Grub appears to have been used for a short time to seed the URL list for NetNanny, another LookSmart acquisition.

Operations of Grub were shut down in late 2005. The site was reactivated on July 27, 2007, and is currently being updated. The original developers are assisting with the new deployment and investigating the robots.txt issue to ensure that it does not recur.

Users of Grub can download the peer-to-peer grubclient software and let it run during computer idle time. The client indexes the URLs and sends them back to the main Grub server in a highly compressed form. The collective crawl could then, in theory, be used by an indexing system, such as the one proposed for Wikia Search. Grub can quickly build a large snapshot by asking thousands of clients to each crawl and analyze a small portion of the web.
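
The division of labor described above can be illustrated with a minimal Python sketch. It is not Grub's actual protocol or client code: the function names (`partition`, `client_crawl`, `server_merge`), the round-robin assignment, the JSON record format, and the use of `zlib` for compression are all assumptions chosen to keep the example small, and `fake_fetch` stands in for a real HTTP request so that the sketch runs offline.

```python
import json
import zlib


def partition(urls, n_clients):
    """Split the URL list into n_clients work units, assigned round-robin."""
    return [urls[i::n_clients] for i in range(n_clients)]


def client_crawl(work_unit, fetch):
    """Hypothetical client step: fetch each URL, record a small summary,
    and return the whole batch as zlib-compressed JSON."""
    records = [{"url": u, "length": len(fetch(u))} for u in work_unit]
    return zlib.compress(json.dumps(records).encode("utf-8"))


def server_merge(payloads):
    """Hypothetical server step: decompress every payload and merge the records."""
    merged = {}
    for payload in payloads:
        for record in json.loads(zlib.decompress(payload).decode("utf-8")):
            merged[record["url"]] = record["length"]
    return merged


def fake_fetch(url):
    # Stand-in for a real HTTP request, so the sketch needs no network access.
    return "<html>" + url + "</html>"


urls = ["http://example.com/page%d" % i for i in range(10)]
units = partition(urls, 3)  # three simulated clients
payloads = [client_crawl(unit, fake_fetch) for unit in units]
snapshot = server_merge(payloads)
```

Each simulated client handles only its own slice of the URL list, and the server sees nothing but compressed batches, which mirrors the idea of many clients each covering a small portion of the web.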

Wikia has now released the entire Grub package under an open source software license.

References

  1. Wikia, Inc. Press release. "Jimmy Wales and Wikia Release Open Source Distributed Web Crawler Tool". 27 July 2007.
  2. "Grub Inc. Investors page as archived by Archive.org, December 2000". Archived from the original on 2000-12-09. Retrieved 2018-11-29.
  3. LookSmart SEC filing, 2003
