A multi-threaded tool that downloads images from Google search for the objects you specify. It collects the images needed to build a model dataset, reducing manual effort.

Features

  • You can specify the output directory
  • Runs in both headless and headed (visible browser) mode
  • You can specify the number of images to download
  • You can specify the maximum number of Google suggestions to use
  • You can specify the maximum number of workers for the ThreadPool

Screenshots

Example screenshot

Setup

Clone this repo using

git clone https://github.com/Anil-45/ImageCrawler.git

Install the required modules using

pip install -r requirements.txt

Usage

  • --object Comma-separated strings to search for
  • --out_dir Output directory (default: ./images)
  • --headless Run without opening the web driver GUI; omit this flag to run with the browser window visible
  • --max_count Maximum number of images to download (default: DEFAULT_IMG_COUNT, set in constants.py)
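The options above could be parsed with argparse roughly as follows. This is only a sketch and not necessarily how main.py is written: the helper names build_parser and split_objects are hypothetical, and treating --object as required is an assumption.

```python
import argparse

DEFAULT_IMG_COUNT = 50  # value documented for constants.py


def build_parser():
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Download images for the given objects")
    # Assumption: --object is mandatory; the README does not say either way.
    parser.add_argument("--object", required=True,
                        help="comma-separated strings to search for")
    parser.add_argument("--out_dir", default="./images", help="output directory")
    parser.add_argument("--headless", action="store_true",
                        help="run without opening the web driver GUI")
    parser.add_argument("--max_count", type=int, default=DEFAULT_IMG_COUNT,
                        help="maximum number of images to download")
    return parser


def split_objects(raw):
    """Split 'cat, dog' into ['cat', 'dog'], dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--object", "cat, dog", "--headless", "--max_count", "25"]
)
print(split_objects(args.object), args.out_dir, args.headless, args.max_count)
```

Stripping whitespace around each entry means "cat, dog" and "cat,dog" would be treated the same way in this sketch.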

Example to run in background:

python main.py --object "cat, dog" --headless --out_dir "./images" --max_count 25

Example to run in foreground:

python main.py --object "cat, dog" --out_dir "./images" --max_count 25

You can configure more parameters using constants.py

  • DEFAULT_IMG_COUNT = 50 specifies the number of images to download.
  • MAX_WORKERS = 50 specifies the maximum number of workers for the ThreadPool.
  • MAX_SUGGESTIONS = 25 specifies the number of URL suggestions by Google to be used. If you are trying to download a large number of images, keep MAX_SUGGESTIONS high.
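To illustrate how a count limit and a worker limit like DEFAULT_IMG_COUNT and MAX_WORKERS typically interact, here is a minimal sketch using Python's concurrent.futures. It is not the project's actual code: download_all and fake_fetch are hypothetical names, and the fetch function is a stand-in so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

DEFAULT_IMG_COUNT = 50  # values documented for constants.py
MAX_WORKERS = 50


def download_all(urls, fetch, max_count=DEFAULT_IMG_COUNT, max_workers=MAX_WORKERS):
    """Run fetch(url) across a thread pool; return the URLs that succeeded."""
    targets = list(urls)[:max_count]  # never attempt more than max_count URLs
    if not targets:
        return []
    succeeded = []
    # No point in starting more threads than there are URLs to fetch.
    with ThreadPoolExecutor(max_workers=min(max_workers, len(targets))) as pool:
        futures = {pool.submit(fetch, url): url for url in targets}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception:
                continue  # a failed download should not stop the others
            succeeded.append(futures[future])
    return succeeded


def fake_fetch(url):
    """Stand-in for a real download (assumption: real code would write a file)."""
    if "bad" in url:
        raise ValueError("simulated failure")
    return len(url)


done = download_all(["u1", "bad", "u2", "u3"], fake_fetch, max_count=3)
print(sorted(done))
```

In this sketch failed URLs count against the limit, so fewer than max_count images may be saved; having more candidate URLs available is one reason a higher suggestion count can help for large downloads.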

You can find the logs in image_crawler.log

Room for Improvement

  • Add user interface

Contact

Created by @Anil_Reddy

License

This project is available under the MIT License.

Disclaimer

This tool downloads images in the order Google ranks them. Some of them may be subject to copyright, so check the usage rights before using them.