Python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the club finder page
url = 'https://www.irishrugby.ie/club-finder/'

# Send a GET request to fetch the raw HTML content
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Initialize empty lists to store club data
clubs = []

# Find all club containers
club_containers = soup.find_all('div', class_='d-inline-block col-xs-12 py-0 px-1 pb-1 mt-1 bg-white map-list__club')

for container in club_containers:
    club_name = container.find('h3', class_='club-title').text.strip()
    club_address = container.find('p', class_='club.province').text.strip()
    club_contact = container.find('p', class_='club.website').text.strip()

    # Append club information to the list
    clubs.append({
        'Name': club_name,
        'Address': club_address,
        'Contact': club_contact
    })

# Create a DataFrame from the list
df = pd.DataFrame(clubs)

# Save the DataFrame to an Excel file
df.to_excel('irish_rugby_clubs.xlsx', index=False)

print('Data saved to irish_rugby_clubs.xlsx')


What I have tried:

I wrote this Python code, but it raises a NoneType error. Can anyone help?
Updated 17-Jun-24 0:26am
Comments
[no name] 17-Jun-24 7:55am    
You have not said which line causes the error. At a guess, it is the for statement trying to access the list of club_containers. If that is the case, then you need to find out why no clubs were found in the HTML.

I answered a similar question a couple of days ago. The problem is that you are assuming you always get a value back in lines like this:
Python
club_name = container.find('h3', class_='club-title').text.strip()
You have chained operations together without checking that each one returns a value. Suppose, for instance, that find() doesn't locate any h3 element with a class of club-title. In that case it returns None (the equivalent of null in other languages). You can't perform operations on None, such as getting the text, because there is nothing to perform them on. Wherever there is a chance of getting None back, break the chain into separate steps and check the intermediate result before using it. I show an example of how to do this in the linked answer.
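As a rough illustration of splitting the chain, here is a minimal sketch. The sample markup and the helper name text_or_default are made up for this example; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<div class="map-list__club"><h3 class="club-title"> Example RFC </h3></div>
<div class="map-list__club"><p>No title here</p></div>
"""


def text_or_default(parent, tag, class_name, default="N/A"):
    """Return the stripped text of the first matching child, or a default."""
    element = parent.find(tag, class_=class_name)
    if element is None:  # find() returns None when nothing matches
        return default
    return element.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
names = [
    text_or_default(container, "h3", "club-title")
    for container in soup.find_all("div", class_="map-list__club")
]
print(names)
```

The second container has no h3, so the helper falls back to 'N/A' instead of raising the NoneType error.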
 
Have you checked the source of the linked page?
It does contain a div with the class "d-inline-block col-xs-12 py-0 px-1 pb-1 mt-1 bg-white map-list__club", but there is no list of clubs.

Tip! You can use the print function:
Python
print(club_containers)

to see what the variable contains :)
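Going a step further, a small diagnostic like the sketch below can show whether the containers and the elements inside them are really in the HTML you downloaded. The html string is a made-up stand-in for response.text, assuming the container is present but empty.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text: the container exists but holds no clubs.
html = '<div class="d-inline-block map-list__club"></div>'
soup = BeautifulSoup(html, "html.parser")

# A CSS selector on one class avoids depending on the full class string.
containers = soup.select("div.map-list__club")
print(f"containers found: {len(containers)}")

for container in containers:
    title = container.find("h3", class_="club-title")
    # None here means the h3 is not present in this HTML.
    print("title element:", title)
```

If the title prints as None, the data is not in the raw HTML at all, and no change of selector in BeautifulSoup will find it.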
 
Hey!

You get the "'NoneType' object has no attribute 'text'" error because find() isn't matching any element, so there is no text to capture.

I used Selenium because it works better here for some reason, or maybe I'm not pointing to the right HTML tags. Anyway, the df has columns for name, address, region, and link.

I understand if you'd rather not use Selenium; it's significantly slower than BeautifulSoup.

Python
import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)

# URL of the club finder page
url = 'https://www.irishrugby.ie/club-finder/'

# Open the webpage
driver.get(url)

# Allow some time for the page to load
time.sleep(5)

# Initialize an empty list to store club data
clubs = []

# Find all club containers
club_containers = driver.find_elements(By.CSS_SELECTOR, 'div.d-inline-block.col-xs-12.py-0.px-1.pb-1.mt-1.bg-white.map-list__club')

for container in club_containers:
    # Extract each field, or fall back to 'N/A' if the element is not found
    try:
        club_name = container.find_element(By.CSS_SELECTOR, 'h3').text.strip()
    except NoSuchElementException:
        club_name = 'N/A'

    try:
        club_address = container.find_element(By.CSS_SELECTOR, 'div.map-list__address').text.strip()
    except NoSuchElementException:
        club_address = 'N/A'

    try:
        club_region = container.find_element(By.CSS_SELECTOR, 'p.m-0').text.strip()
    except NoSuchElementException:
        club_region = 'N/A'

    try:
        club_link = container.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')
    except NoSuchElementException:
        club_link = 'N/A'

    # Append club information to the list
    clubs.append({
        'Name': club_name,
        'Address': club_address,
        'Region': club_region,
        'Link': club_link
    })

# Close the WebDriver
driver.quit()

# Create a DataFrame from the list
df = pd.DataFrame(clubs)
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


