Introduction
Selenium is a suite of tools that enables the automation of web browsers across multiple platforms. It is widely used for automated testing of websites and web apps, but its use is not limited to testing: other frequent, boring, repetitive and time-consuming web activities can (and should) be automated as well.
This is a cut-to-the-chase post on how to use one of Selenium's components, WebDriver, to automate the following use case. Pray continue.
Use Case
Get for-sale Honda Civic ads posted on LA’s Craigslist and furnish the related information in a spreadsheet:
- Navigate to Craigslist Los Angeles page (https://losangeles.craigslist.org/)
- Click the “cars+trucks” link under the “For sale” section.
- On the next page, click the “BY-OWNER ONLY” link.
- On the next page, enter “Honda civic” in the “MAKE AND MODEL” textbox; when a suggestion link appears, click on it.
- The main search page will display 120 ads.
- Go over the top 5 of them (this number can be changed by slightly tweaking the code, as described under processlinks below) and fetch these fields:
  - Title
  - Transmission
  - Fuel
  - Odometer
  - Ad link
- Furnish the fetched information in a spreadsheet.
- Once the top 5 ads are processed, save the spreadsheet.
Technologies Used
- Selenium WebDriver: automates browsers (Chrome, Firefox, Internet Explorer, Safari, etc.), driving each one natively as a user would on their own system. For this implementation, I have chosen the Firefox WebDriver (geckodriver).
- Node.js: The programming language of choice here is JavaScript, and the runtime is Node.js.
- ExcelJS: a library to read, write, create and manipulate Excel spreadsheets.
Setting Up
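The setup steps themselves are not reproduced in this section. As a minimal sketch, assuming the two packages used by the code below were installed into the project with npm install selenium-webdriver exceljs, the following script reports any package that Node.js cannot resolve. The checkDependencies name is hypothetical, and the check says nothing about Firefox or geckodriver.

```javascript
// Hypothetical helper: returns the names of npm packages that cannot be
// resolved from the current project. It only checks that the modules are
// installed; it does not verify that Firefox or geckodriver are present.
function checkDependencies(packages) {
  const missing = [];
  for (const name of packages) {
    try {
      require.resolve(name); // throws MODULE_NOT_FOUND if the package is absent
    } catch (err) {
      if (err.code !== 'MODULE_NOT_FOUND') throw err; // unexpected error
      missing.push(name);
    }
  }
  return missing;
}

// Usage: the two packages required by the code in this article.
const missing = checkDependencies(['selenium-webdriver', 'exceljs']);
if (missing.length > 0) {
  console.log('Missing packages, run: npm install ' + missing.join(' '));
} else {
  console.log('All packages found');
}
```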
Code Overview
The initialization block does the usual setup: it creates the Selenium objects (webdriver, By, until, firefox, firefoxOptions and the driver) along with the excel object (by requiring the 'exceljs' module).
const webdriver = require('selenium-webdriver');
const By = webdriver.By;
const until = webdriver.until;
const firefox = require('selenium-webdriver/firefox');
const excel = require('exceljs');

const firefoxOptions = new firefox.Options();
// Location of the Firefox binary on macOS; adjust (or remove) this line on other systems
firefoxOptions.setBinary('/Applications/Firefox.app/Contents/MacOS/firefox-bin');

// Build a Firefox driver (Selenium uses geckodriver to control Firefox)
const driver = new webdriver.Builder()
  .forBrowser('firefox')
  .setFirefoxOptions(firefoxOptions)
  .build();
Note: To enable headless browsing (no browser window is spawned), add the following line before the driver is built: firefoxOptions.addArguments('-headless');
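Instead of editing the source each time to switch headless mode on or off, the choice could come from an environment variable. The sketch below assumes a variable named HEADLESS (a hypothetical convention, not something Selenium reads by itself) and returns the Firefox arguments to add; -headless is Firefox's command-line flag for headless mode.

```javascript
// Hypothetical helper: decides the extra Firefox command-line arguments from
// an environment-like object. HEADLESS is an assumed variable name.
function firefoxArgsFromEnv(env) {
  const flag = String(env.HEADLESS || '').toLowerCase();
  return flag === '1' || flag === 'true' ? ['-headless'] : [];
}

// In the initialization block, before .build(), the result could be passed to
// the options object (needs selenium-webdriver, so it is shown as a comment):
//   firefoxOptions.addArguments(...firefoxArgsFromEnv(process.env));

console.log(firefoxArgsFromEnv(process.env));
```

With this in place, starting the script as HEADLESS=1 node app.js (app.js standing in for whatever the script file is called) would launch Firefox without a window.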
The rest of the code consists of three async functions:
getcarlinks
The following function retrieves the ad links on the first results page (120 of them) and returns them in an array. Its logical steps are:
- LA Craigslist main page ->
- cars+trucks ->
- By-Owner Only ->
- auto make model = "honda civic"
- On the main search page, collect all the car ad links and return them in an array
Source code:
async function getcarlinks() {
  await driver.get('https://losangeles.craigslist.org/');
  await driver.findElement(By.linkText('cars+trucks')).click();
  await driver.findElement(By.linkText('BY-OWNER ONLY')).click();
  await driver.findElement(By.name('auto_make_model')).sendKeys('honda civic');
  // Wait (up to 50 s) for the suggestion link to appear, then click it
  const suggestion = await driver.wait(until.elementLocated(By.linkText('honda civic')), 50000);
  await suggestion.click();
  // The previous page also lists results, so wait until it has been replaced
  // and the new results are present before reading them
  await driver.wait(until.stalenessOf(suggestion), 50000);
  const results = await driver.wait(until.elementsLocated(By.className('result-info')), 50000);
  // The first <a> tag of each result holds the link to the ad page
  return Promise.all(
    results.map(result => result.findElement(By.tagName('a')).getAttribute('href'))
  );
}
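getcarlinks() hands every href on the results page to the next step. A small, hypothetical post-processing helper like the one below could drop missing values (Selenium's getAttribute() resolves to null when the attribute is absent), skip repeated URLs and keep only the first max links, which is one way to make the "top 5" limit explicit. The URLs in the example are made up.

```javascript
// Hypothetical helper: cleans the href array returned by getcarlinks() and
// keeps only the first `max` distinct, non-empty entries, preserving the
// order of the results page.
function selectTopLinks(hrefs, max) {
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    if (result.length >= max) break;          // limit reached
    if (!href || seen.has(href)) continue;    // skip null/empty and repeats
    seen.add(href);
    result.push(href);
  }
  return result;
}

// Example with made-up URLs; returns the first two distinct URLs (ad/1, ad/2)
const sample = [
  'https://example.org/ad/1',
  null,
  'https://example.org/ad/2',
  'https://example.org/ad/1',
  'https://example.org/ad/3',
];
console.log(selectTopLinks(sample, 2));
```

It could be applied in startprocessing() as processlinks(selectTopLinks(carlinks, 5)); processlinks() would still apply its own limit.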
processlinks
This method:
- Is passed the car links array returned by the function above (getcarlinks)
- Sets up a new workbook
- Adds a new worksheet to the workbook, named 'CL Links Sheet'
- Adds these columns to the worksheet: Sr Num, Title, Transmission, Fuel, Odometer and the link to the car's ad page
- For each of the first 5 links in the array (processing all 120 links would take a long time; this number can be changed to whatever is deemed feasible), does the following:
  - Increments the Sr Num field in the spreadsheet
  - Navigates to the link (driver.get)
  - Inside the ad page, looks for the title, transmission, fuel and odometer
  - Adds a new row with the fetched info and the link
- After processing the links, saves the spreadsheet as output.xlsx
Source code:
async function processlinks(links) {
  // Number of ads to process; change it to process more (or fewer) of the 120 links
  const MAX_ADS = 5;

  const workbook = new excel.Workbook();
  const worksheet = workbook.addWorksheet('CL Links Sheet');
  worksheet.columns = [
    { header: 'Sr Num', key: 'sr', width: 5 },
    { header: 'Title', key: 'title', width: 25 },
    { header: 'Transmission', key: 'transmission', width: 25 },
    { header: 'Fuel', key: 'fuel', width: 25 },
    { header: 'Odometer', key: 'odometer', width: 25 },
    { header: 'link', key: 'link', width: 150 }
  ];

  for (const [index, link] of links.slice(0, MAX_ADS).entries()) {
    const row = { sr: index + 1, link: link };
    await driver.get(link);
    // This code expects two "attrgroup" blocks: the first holds the title,
    // the second holds "label: value" spans such as "fuel: gas"
    const elems = await driver.findElements(By.className('attrgroup'));
    if (elems.length === 2) {
      row.title = await elems[0].findElement(By.tagName('span')).getText();
      const otherspans = await elems[1].findElements(By.tagName('span'));
      for (const aspan of otherspans) {
        const text = await aspan.getText();
        const match = text.match(/:(.*)/); // everything after the first colon
        if (!match) continue; // span without a "label: value" pair
        const value = match[1].trim();
        const upper = text.toUpperCase();
        if (upper.includes('TRANSMISSION')) {
          row.transmission = value;
        } else if (upper.includes('FUEL')) {
          row.fuel = value;
        } else if (upper.includes('ODOMETER')) {
          row.odometer = value;
        }
      }
    }
    worksheet.addRow(row);
  }
  // Wait until the file is written so errors reach the caller before driver.quit()
  await workbook.xlsx.writeFile('output.xlsx');
}
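The text handling inside processlinks() can be pulled out into pure functions so it can be checked without launching a browser. In the sketch below, parseAttribute and toRow are hypothetical names; the span texts in the example are made up but follow the "label: value" format the code above relies on. One deliberate difference: the keywords are matched against the label only, not the whole span text.

```javascript
// Splits an attribute such as "odometer: 98000" at the first colon and returns
// the lower-cased label and trimmed value, or null when there is no colon.
function parseAttribute(text) {
  const colon = text.indexOf(':');
  if (colon === -1) return null;
  return {
    label: text.slice(0, colon).trim().toLowerCase(),
    value: text.slice(colon + 1).trim(),
  };
}

// Builds an object whose keys match the worksheet column keys
// (sr, title, transmission, fuel, odometer, link).
function toRow(position, link, title, spanTexts) {
  const row = { sr: position, title: title, link: link };
  for (const text of spanTexts) {
    const attr = parseAttribute(text);
    if (!attr) continue; // skip spans without a "label: value" shape
    if (attr.label.includes('transmission')) row.transmission = attr.value;
    else if (attr.label.includes('fuel')) row.fuel = attr.value;
    else if (attr.label.includes('odometer')) row.odometer = attr.value;
  }
  return row;
}

// Example with made-up span texts
console.log(toRow(1, 'https://example.org/ad/1', '2012 honda civic', [
  'condition: good',
  'fuel: gas',
  'odometer: 98000',
  'transmission: automatic',
]));
```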
startprocessing
This function chains getcarlinks and processlinks by calling them in sequence (each await waits for the previous function's promise to resolve before the next call runs). It is the entry point of the app, i.e., calling it starts the app:
Source code:
async function startprocessing() {
  try {
    const carlinks = await getcarlinks();
    await processlinks(carlinks);
    console.log('Finished processing');
  } catch (err) {
    console.log('Exception occurred while processing, details are: ', err);
  } finally {
    // Close the browser whether or not an error occurred
    await driver.quit();
  }
}

startprocessing();
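startprocessing() relies on await to run the two steps in order. The same sequencing can be written as an explicit promise chain, which makes the ordering visible. The stub functions below are hypothetical stand-ins for getcarlinks() and processlinks() so the chain runs without a browser; in the real code, the cleanup callback would call driver.quit().

```javascript
// Hypothetical stand-ins so the chain can run without a browser
async function getcarlinksStub() {
  return ['https://example.org/ad/1', 'https://example.org/ad/2'];
}
async function processlinksStub(links) {
  return links.length; // pretend each link was processed
}

// Same order as startprocessing(): each .then() runs only after the previous
// promise resolves; .catch() handles a failure in any step; .finally() runs
// the cleanup on both success and failure.
function startprocessingChained(cleanup) {
  return getcarlinksStub()
    .then(links => processlinksStub(links))
    .then(count => {
      console.log('Finished processing', count, 'links');
      return count;
    })
    .catch(err => {
      console.log('Exception occurred while processing, details are: ', err);
    })
    .finally(cleanup);
}

startprocessingChained(() => console.log('cleanup ran'));
```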
There you have it! You can download the attached source code to test this app and extend its functionality to better suit your requirements. The code is also available on my GitHub page.
Important Links