CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a challenge-response test used in computing to determine whether the user is human.
CAPTCHAs exist for security reasons: they keep bots from automating traffic that a site wants to reserve for people. Their most common use is to stop bots from submitting forms automatically.
Selenium and CAPTCHA therefore pull in opposite directions. Selenium automates the browser, while CAPTCHA is there to stop automation. The challenge is getting past a CAPTCHA while web scraping with Selenium. Let's look at the solutions.
Avoid getting detected by reCAPTCHA
- Avoid conventional viewport sizes by setting an unusual window size in your Selenium script.
options = webdriver.ChromeOptions()
options.add_argument("--window-size=1100,1000")
- Remove automation tags from the browser
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
- Change the user agent on each request
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
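The line above sets one fixed user agent. To actually change it on each request, one option is to pick from a small pool before every page load. The following is a minimal sketch under that assumption: `USER_AGENTS`, `pick_user_agent`, and `rotate_user_agent` are hypothetical names, and the pool entries are illustrative strings that you would replace with user agents matching the browsers you really run.

```python
import random

# Hypothetical pool for illustration only; swap in user agents that match
# the browser builds you actually use.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
]


def pick_user_agent(previous=None):
    """Return a random user agent from the pool, avoiding an immediate repeat."""
    candidates = [ua for ua in USER_AGENTS if ua != previous] or USER_AGENTS
    return random.choice(candidates)


def rotate_user_agent(driver, previous=None):
    """Apply a freshly picked user agent through the Chrome DevTools Protocol."""
    user_agent = pick_user_agent(previous)
    driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": user_agent})
    return user_agent
```

Calling `current = rotate_user_agent(driver, current)` right before each `driver.get(...)` keeps two consecutive requests from sharing a user agent.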
- Slow down your script
import time
time.sleep(1) # sleep for 1 second
- Use random delays
import time
import random
time_to_sleep = random.randint(5,10)
time.sleep(time_to_sleep) # sleep between 5 and 10 seconds
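The two pauses above can be folded into one reusable helper. This is only a sketch: `human_pause` is a hypothetical name, and it uses `random.uniform` instead of `random.randint` on the assumption that fractional delays look less regular than whole seconds. The `sleep` parameter exists purely so the helper can be exercised without really waiting.

```python
import random
import time


def human_pause(min_seconds=5.0, max_seconds=10.0, sleep=time.sleep):
    """Sleep for a random, fractional duration and return the delay used."""
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

A call such as `human_pause()` between page loads, or `human_pause(0.5, 2.0)` between clicks, replaces the hard-coded `time.sleep` calls.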
Locate the reCAPTCHA element and click on it
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
service = Service(executable_path=r'C:\WebDrivers\chromedriver.exe') # Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(options=options, service=service)
driver.get("https://www.inipec.gov.it/cerca-pec/-/pecs/companies")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']"))) # wait for the reCAPTCHA iframe and switch into it
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[@id='recaptcha-anchor']"))).click() # click the checkbox inside the iframe
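One detail the script leaves open: after the click, the driver is still focused on the reCAPTCHA iframe, so any later lookup of elements on the main page would fail until it switches back. A small context manager is one possible way to guarantee that switch, even when a wait times out. `return_to_main_document` is a hypothetical helper name; it relies only on Selenium's `driver.switch_to.default_content()`.

```python
from contextlib import contextmanager


@contextmanager
def return_to_main_document(driver):
    """Run a block that may switch into an iframe, then always switch back."""
    try:
        yield driver
    finally:
        # Leave whatever frame the block ended in and refocus the top-level page.
        driver.switch_to.default_content()


# Intended usage with the waits from the script above:
#
# with return_to_main_document(driver):
#     WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it(...))
#     WebDriverWait(driver, 10).until(EC.element_to_be_clickable(...)).click()
# # From here on, the driver is back on the main page.
```

Because the switch happens in `finally`, a `TimeoutException` raised inside the block still propagates, but the driver is not left stranded inside the iframe.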
Conclusion
Many websites do not allow automation, and I am not advocating circumventing their rules. This article was written for educational purposes only, so exercise caution when using this script to get past Google reCAPTCHA, and enjoy automation.