
11.2. The scrapy Command

neo@MacBook-Pro ~/Documents/crawler % scrapy
Scrapy 1.4.0 - project: crawler

  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command


11.2.1. Creating a Project

neo@MacBook-Pro ~/Documents % scrapy startproject crawler
New Scrapy project 'crawler', using template directory '/usr/local/lib/python3.6/site-packages/scrapy/templates/project', created in:

You can start your first spider with:
    cd crawler
    scrapy genspider example example.com
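The new project uses the standard Scrapy layout; the tree below is what Scrapy 1.4 generates (minor details may vary between versions):

```
crawler/
    scrapy.cfg            # deploy configuration file
    crawler/              # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory where spiders are placed
            __init__.py
```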

11.2.2. Creating a Spider

neo@MacBook-Pro ~/Documents/crawler % scrapy genspider netkiller netkiller.cn
Created spider 'netkiller' using template 'basic' in module:

11.2.3. Listing Available Spiders

neo@MacBook-Pro ~/Documents/crawler % scrapy list

11.2.4. Running a Spider

neo@MacBook-Pro ~/Documents/crawler % scrapy crawl netkiller

Export the crawl results to a JSON file:

neo@MacBook-Pro ~/Documents/crawler % scrapy crawl netkiller -o output.json
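With -o and a .json target, Scrapy's feed exporter writes all yielded items into the file as a single JSON array. The sketch below fakes such a file and then reads it back with only the standard library; the 'title' and 'link' field names are hypothetical and depend entirely on what the spider yields:

```python
import json

# A sample of what `scrapy crawl netkiller -o output.json` might produce:
# the .json feed exporter serializes every yielded item into one JSON array.
# The 'title' and 'link' fields are illustrative, not Scrapy-defined.
sample = '[{"title": "Netkiller Linux", "link": "http://netkiller.cn/linux/"}]'
with open('output.json', 'w', encoding='utf-8') as f:
    f.write(sample)

# Consuming the exported file afterwards:
with open('output.json', encoding='utf-8') as f:
    items = json.load(f)

for item in items:
    print(item['title'])
```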