What is Scrapy?

Scrapy is a powerful, open-source Python framework for web scraping and crawling. It automates the process of extracting data from websites and saves it in structured formats like JSON or CSV. With its asynchronous request handling, built-in support for proxies and cookies, and customizable spiders, Scrapy is a go-to tool for tasks such as:

  • Price Tracking

  • Market Research

  • Data Collection

Scrapy’s spider components allow you to define how to crawl and scrape specific data from web pages, making it both flexible and scalable for various scraping needs.

If you’re using Oculus Proxies to access search engines such as Google, Bing, or Yandex and are facing connection issues, the proxy type could be the reason. ISP Premium Proxies ensure stable, unrestricted access and prevent the blocks that standard proxies often encounter. Switching to ISP Premium Proxies can help maintain smooth and reliable performance.

How to Set Up Oculus Proxies With Scrapy

Step 1: Install Scrapy

Open your terminal and install Scrapy using pip:

pip install scrapy

Step 2: Create a New Scrapy Project

1. Start a new Scrapy project:

scrapy startproject <project_name>

Replace <project_name> with your desired project name.

2. Navigate into the project directory:

cd <project_name>
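For orientation, this is the typical directory layout that `scrapy startproject` generates (file names may vary slightly between Scrapy versions):

```shell
# Typical project layout created by `scrapy startproject <project_name>`:
# <project_name>/
#     scrapy.cfg            # deploy configuration
#     <project_name>/
#         __init__.py
#         items.py          # item definitions
#         middlewares.py    # downloader/spider middlewares
#         pipelines.py      # item pipelines
#         settings.py       # project settings
#         spiders/          # your spiders live here
#             __init__.py
```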

Step 3: Generate a New Spider

1. Create a spider to scrape a specific website:

scrapy genspider <spider_name> <target_url>

For example, to scrape http://httpbin.org/ip, let's create a spider named OculusExample:

scrapy genspider OculusExample http://httpbin.org/ip

2. This will create a new spider file inside the spiders/ directory.

Step 4: Configure Oculus Proxy in Your Spider

Edit your newly created spider (spiders/OculusExample.py) and configure the proxy:

import scrapy

class OculusExampleSpider(scrapy.Spider):
    name = "OculusExample"
    start_urls = ['http://httpbin.org/ip']

    def start_requests(self):
        # Define the proxy
        proxy = "http://[USERNAME]:[PASSWORD]@[HOST]:[PORT]"  # Replace with your Oculus Proxy credentials
        
        # Use the proxy for all requests
        for url in self.start_urls:
            yield scrapy.Request(url, meta={'proxy': proxy})

    def parse(self, response):
        # Parse and return the IP address
        yield {
            'proxy_ip': response.text
        }
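If you'd rather not attach meta={'proxy': ...} to every request by hand, a small downloader middleware can inject the proxy project-wide. This is a minimal sketch — the class name is illustrative, and the placeholders must be replaced with your Oculus Proxies credentials:

```python
# Minimal downloader-middleware sketch that assigns the Oculus proxy to
# every outgoing request. Class name is hypothetical; replace the
# [USERNAME]:[PASSWORD]@[HOST]:[PORT] placeholders with real credentials.
class OculusProxyMiddleware:
    PROXY = "http://[USERNAME]:[PASSWORD]@[HOST]:[PORT]"

    def process_request(self, request, spider):
        # Only set the proxy if the request doesn't already define one
        request.meta.setdefault("proxy", self.PROXY)
```

Enable it in settings.py via the DOWNLOADER_MIDDLEWARES setting, using an order value below 750 so it runs before Scrapy's built-in HttpProxyMiddleware, which reads meta['proxy'].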

For country-specific proxies, append a country flag to your username (for example, your-username-country-US) to receive a US exit node.
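As a sketch, the country-targeted proxy URL can be assembled like this — the username flag format comes from above, while [HOST] and [PORT] remain placeholders for your Oculus endpoint:

```python
# Build a country-targeted proxy URL. The "-country-US" suffix requests a
# US exit node; [HOST] and [PORT] are placeholders, not real values.
username = "your-username-country-US"
password = "your-password"
proxy = f"http://{username}:{password}@[HOST]:[PORT]"
print(proxy)  # http://your-username-country-US:your-password@[HOST]:[PORT]
```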

Step 5: Run the Spider

1. Navigate to your project directory and execute:

scrapy crawl OculusExample

2. To save the scraped data to a file, use:

scrapy crawl OculusExample -o output.json

With Scrapy 2.1 and later, -o appends to an existing file, while -O overwrites it.

Step 6: Verify the Output

When the spider runs successfully, output.json should contain the exit IP assigned by the proxy:

[
    {
        "proxy_ip": "{\n  \"origin\": \"123.45.67.89\"\n}"
    }
]
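Because the spider yields the raw response body, the origin IP arrives as a JSON string nested inside the item. A short post-processing snippet, using the sample output above, extracts it:

```python
import json

# One record as produced by `scrapy crawl OculusExample -o output.json`
record = {"proxy_ip": "{\n  \"origin\": \"123.45.67.89\"\n}"}

# The value is itself a JSON document, so decode it a second time
origin = json.loads(record["proxy_ip"])["origin"]
print(origin)  # 123.45.67.89
```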

Congratulations! You’ve successfully integrated Oculus Proxies with Scrapy. Now you can securely and efficiently scrape data while avoiding detection and IP bans.