The crawl utility starts a depth-first traversal of the web at the
specified URLs. It stores all JPEG images that match the configured
constraints. Crawl is fairly fast and allows for graceful termination.
After terminating crawl, it is possible to restart it at exactly
the same spot where it was terminated. Crawl keeps a persistent
database that allows multiple crawls without revisiting sites.
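
The following is a minimal sketch, in C, of the idea described above:
a depth-first traversal that records every visited URL in a persistent
database, so a restarted crawl resumes without revisiting sites. This
is not crawl's actual code; the fetch step is stubbed out, and a flat
text file named visited.db stands in for the real database (both names
are hypothetical).

    /*
     * Sketch only: depth-first crawl with a persistent visited set.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_URLS 4096
    #define DB_FILE  "visited.db"   /* hypothetical database file */

    static char *visited[MAX_URLS];
    static size_t nvisited;

    /* Return nonzero if url was seen in this run or a previous one. */
    static int seen(const char *url)
    {
        size_t i;

        for (i = 0; i < nvisited; i++)
            if (strcmp(visited[i], url) == 0)
                return 1;
        return 0;
    }

    /* Record url in memory and append it to disk, so a restarted
     * crawl picks up exactly where this one stopped. */
    static void mark_visited(const char *url, FILE *db)
    {
        if (nvisited < MAX_URLS)
            visited[nvisited++] = strdup(url);
        fprintf(db, "%s\n", url);
        fflush(db);
    }

    /* Load the URLs recorded by earlier runs, if any. */
    static void load_db(void)
    {
        char line[2048];
        FILE *fp;

        if ((fp = fopen(DB_FILE, "r")) == NULL)
            return;
        while (fgets(line, sizeof(line), fp) && nvisited < MAX_URLS) {
            line[strcspn(line, "\n")] = '\0';
            visited[nvisited++] = strdup(line);
        }
        fclose(fp);
    }

    /* Stub for the real work: fetch the page, save matching JPEGs,
     * and return a NULL-terminated array of links found on it. */
    static char **fetch_and_extract_links(const char *url)
    {
        printf("fetching %s\n", url);
        return NULL;
    }

    /* Depth-first: follow each link fully before its siblings. */
    static void crawl_dfs(const char *url, FILE *db)
    {
        char **links;
        size_t i;

        if (seen(url))
            return;
        mark_visited(url, db);
        if ((links = fetch_and_extract_links(url)) == NULL)
            return;
        for (i = 0; links[i] != NULL; i++) {
            crawl_dfs(links[i], db);
            free(links[i]);
        }
        free(links);
    }

    int main(int argc, char *argv[])
    {
        FILE *db;
        int i;

        load_db();
        if ((db = fopen(DB_FILE, "a")) == NULL) {
            perror(DB_FILE);
            return 1;
        }
        for (i = 1; i < argc; i++)
            crawl_dfs(argv[i], db);
        fclose(db);
        return 0;
    }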

The main reason for writing crawl was the lack of simple open source
web crawlers. Crawl is only a few thousand lines of code and fairly
easy to debug and customize.

Some of the main features:
- Saves encountered JPEG images
- Image selection based on regular expressions and size constraints
  (see the sketch after this list)
- Resume previous crawl after graceful termination
- Persistent database of visited URLs
- Very small and efficient code
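
The sketch below shows one way selection by regular expression and
size constraints can be implemented, using POSIX regex(3). The pattern
and the size bounds are invented examples, not crawl's defaults.

    /*
     * Hypothetical example of image selection by regular expression
     * and size constraints; the pattern and bounds below are made up.
     */
    #include <sys/types.h>
    #include <regex.h>
    #include <stdio.h>

    /* Keep an image only if its URL matches pattern and its size in
     * bytes falls within [minsize, maxsize]. */
    static int want_image(const char *url, size_t size,
        const char *pattern, size_t minsize, size_t maxsize)
    {
        regex_t re;
        int match;

        if (size < minsize || size > maxsize)
            return 0;
        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return 0;
        match = regexec(&re, url, 0, NULL, 0) == 0;
        regfree(&re);
        return match;
    }

    int main(void)
    {
        /* Only JPEGs under /photos/, between 10 KB and 2 MB. */
        const char *pat = "/photos/.*\\.jpe?g$";

        printf("%d\n", want_image("http://example.com/photos/cat.jpg",
            50000, pat, 10240, 2097152));   /* 1: kept */
        printf("%d\n", want_image("http://example.com/icons/dot.jpg",
            50000, pat, 10240, 2097152));   /* 0: rejected */
        return 0;
    }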