How do I get all the club details into Excel?

Question

Python

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the club finder page
url = 'https://www.irishrugby.ie/club-finder/'

# Send a GET request to fetch the raw HTML content
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Initialize empty lists to store club data
clubs = []

# Find all club containers
club_containers = soup.find_all('div', class_='d-inline-block col-xs-12 py-0 px-1 pb-1 mt-1 bg-white map-list__club')

for container in club_containers:
    club_name = container.find('h3', class_='club-title').text.strip()
    club_address = container.find('p', class_='club.province').text.strip()
    club_contact = container.find('p', class_='club.website').text.strip()

    # Append club information to the list
    clubs.append({
        'Name': club_name,
        'Address': club_address,
        'Contact': club_contact
    })

# Create a DataFrame from the list
df = pd.DataFrame(clubs)

# Save the DataFrame to an Excel file
df.to_excel('irish_rugby_clubs.xlsx', index=False)

print('Data saved to irish_rugby_clubs.xlsx')

What I have tried:

I wrote this Python code, but it raises a NoneType error. Can anyone help?

Posted 17-Jun-24 0:17am

dimpy tyagi

Updated 17-Jun-24 0:26am

Richard Deeming

v2

Comments

[no name] 17-Jun-24 7:55am

You have not said which line causes the error. At a guess, it is the for statement trying to access the list of club_containers. If that is the case, then you need to find out why no clubs were found in the HTML.

3 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Pete O'Hanlon · Answer 1 · 2024-06-27T21:49:00

I answered a similar question a couple of days ago. The problem is that you are assuming you get values back from lines like this:

Python

club_name = container.find('h3', class_='club-title').text.strip()

You have chained operations together without checking that each one returns a value. Suppose, for instance, that no h3 element with a class of club-title is found: find() then returns None (the equivalent of null in other languages). You can't perform operations on None, such as reading its text, because there is nothing to perform them on. Wherever a call might return None, break the chain into separate steps and check the result before using it. I show an example of how to do this in the linked answer.
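As a rough illustration of splitting the chain into steps, here is a minimal sketch. It runs against a small made-up HTML snippet rather than the real page, and the helper name `text_or_default` is hypothetical; it is not taken from the linked answer.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: the second container has no <h3 class="club-title">.
SAMPLE_HTML = """
<div class="map-list__club"><h3 class="club-title"> Club A </h3></div>
<div class="map-list__club"><p>No title here</p></div>
"""


def text_or_default(element, default="N/A"):
    """Return the stripped text of a tag, or a default when find() gave None."""
    if element is None:
        return default
    return element.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
names = []
for container in soup.find_all("div", class_="map-list__club"):
    # Step 1: look the element up; the result may be None.
    title = container.find("h3", class_="club-title")
    # Step 2: read the text only after checking what came back.
    names.append(text_or_default(title))

print(names)
```

The same helper could wrap the address and contact lookups in the question's loop, so one missing element no longer stops the whole run.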

Maciej Los · Answer 2 · 2024-06-19T08:44:00

Solution 1

Have you tried checking the source of the linked site?
There is a div with the class "d-inline-block col-xs-12 py-0 px-1 pb-1 mt-1 bg-white map-list__club", but it contains no list of clubs.

Tip! You can use the print function:

Python

print(club_containers)

to see what the variable contains :)
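Beyond printing the whole list, a short diagnostic can report how many containers matched and which inner elements are missing from each. The sketch below uses a made-up fragment in place of `response.text` (the real page may look different), and `report_missing` is a hypothetical helper written for this illustration.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for response.text: the container div exists,
# but it holds none of the elements the question's loop looks for.
SAMPLE_HTML = '<div class="bg-white map-list__club"></div>'

# (tag, class) pairs the question's loop expects inside each container.
EXPECTED = [("h3", "club-title"), ("p", "club.province"), ("p", "club.website")]


def report_missing(html):
    """Return (container count, missing 'tag[class=...]' labels per container)."""
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.find_all("div", class_="map-list__club")
    missing = [
        [f"{tag}[class={cls}]" for tag, cls in EXPECTED
         if c.find(tag, class_=cls) is None]
        for c in containers
    ]
    return len(containers), missing


count, missing = report_missing(SAMPLE_HTML)
print(count, missing)
```

A count of 0 would mean the loop body never runs. A count of 1 or more with missing entries is the situation that produces the NoneType error, because find() returns None for each missing element.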

User 16253776 · Answer 3 · 2024-06-26T12:50:00

Hey!

You get the "'NoneType' object has no attribute 'text'" error because find() isn't matching any element, so there is no text to read.

I used Selenium because it works better here for some reason, or maybe I wasn't pointing to the right HTML tags with BeautifulSoup. Anyway, the df has columns for name, address, region, and link.

I understand if you'd rather not use Selenium; it's significantly slower than BeautifulSoup.

Python

import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)

# URL of the club finder page
url = 'https://www.irishrugby.ie/club-finder/'

# Open the webpage
driver.get(url)

# Allow some time for the page to load
time.sleep(5)

# Initialize an empty list to store club data
clubs = []

# Find all club containers
club_containers = driver.find_elements(By.CSS_SELECTOR, 'div.d-inline-block.col-xs-12.py-0.px-1.pb-1.mt-1.bg-white.map-list__club')

for container in club_containers:
    # Extract each field, or fall back to 'N/A' if the element is not found
    try:
        club_name = container.find_element(By.CSS_SELECTOR, 'h3').text.strip()
    except NoSuchElementException:
        club_name = 'N/A'

    try:
        club_address = container.find_element(By.CSS_SELECTOR, 'div.map-list__address').text.strip()
    except NoSuchElementException:
        club_address = 'N/A'

    try:
        club_region = container.find_element(By.CSS_SELECTOR, 'p.m-0').text.strip()
    except NoSuchElementException:
        club_region = 'N/A'

    try:
        club_link = container.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')
    except NoSuchElementException:
        club_link = 'N/A'

    # Append club information to the list
    clubs.append({
        'Name': club_name,
        'Address': club_address,
        'Region': club_region,
        'Link': club_link
    })

# Close the WebDriver
driver.quit()

# Create a DataFrame from the list
df = pd.DataFrame(clubs)

# Save the DataFrame to an Excel file, as in the question's code
df.to_excel('irish_rugby_clubs.xlsx', index=False)
