I am using a Bash script
sGDomain="idealista"
sGCitta="fucecchio-firenze"
sGTypo="vendita-case"
iGPagina=1
while :; do
    url="https://www.$sGDomain.it/$sGTypo/$sGCitta/lista-$iGPagina.htm"
    html_content=$(curl -s -L -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0" "$url")
    echo "$html_content" > htmlcompleto.txt
    if [[ ! $html_content =~ "Successiva" ]]; then
        break
    fi
    xidel_output=$(xidel --silent --xpath '
        //div[contains(@class, "item-info-container")] ! string-join(
            (
                ( "price=" || normalize-space(.//span[contains(@class, "item-price")]/text()[1]) ),
                ( "size=" || normalize-space(.//span[contains(@class, "item-detail") and contains(text(), "m2")]) ),
                ( "link=" || normalize-space(.//a[contains(@class, "item-link")]/@href) ),
                ( "desc=" || normalize-space(.//p[contains(@class, "ellipsis")]) )
            ),
            codepoints-to-string(9)
        )
    ' -)
    if [ -f "temp.txt" ]; then
        rm temp.txt
    fi
    echo "$xidel_output" | sed -e "s/desc=\(.*\)\(['\"]\)/desc=\1 /g" > semi.txt
    sed -i 's/\([0-9]\{1,\}\)\.\([0-9]\{1,\}\),[0-9]\{2\}/\1\2/g' semi.txt
    sed -i 's/m²//g' semi.txt
    cat semi.txt >> debugtxt.txt
    db_file="immo.db"
    while IFS= read -r line; do
        prezzo=$(echo "$line" | awk -F 'price=' '{print $2}' | awk -F 'size=' '{print $1}')
        size=$(echo "$line" | awk -F 'size=' '{print $2}' | awk -F 'link=' '{print $1}')
        link=$(echo "$line" | awk -F 'link=' '{print $2}' | awk -F 'desc=' '{print $1}')
        descrizione=$(echo "$line" | awk -F 'desc=' '{print $2}')
        if [[ $descrizione =~ "asta" ]]; then
            asta=1
        else
            asta=0
        fi
        sqlite3 "$db_file" "INSERT INTO $sGDomain (prezzo, link, descrizione, metratura, asta) VALUES ('$prezzo', '$link', '$descrizione', '$size', $asta)"
    done < semi.txt
    ((iGPagina++))
done
to search a web page with a specific XPath expression. Although I believe the XPath is correct, the script fails to find anything on the page. Web page URL: https://www.idealista.it/vendita-case/fucecchio-firenze/lista-18.htm. The XPath expression used is the one assigned to xidel_output in the script above.
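One thing that may be worth checking: in the script, xidel is given "-" as its data source, which means it reads from standard input, yet nothing appears to be piped or redirected into it, so $html_content may never reach the parser. The sketch below illustrates the difference using a here-string. It is only an illustration under assumptions: grep stands in for xidel so the effect is visible without xidel installed, the sample HTML is made up, and extract_prices is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pass HTML stored in a variable to a command that reads stdin.

# Made-up sample markup, not taken from the real page.
html_content='<div class="item-info-container"><span class="item-price">100.000</span></div>'

# Hypothetical helper: runs xidel against HTML supplied on stdin.
# The data source "-" is given first, then the expression.
extract_prices() {
    xidel --silent - --xpath '//span[contains(@class, "item-price")]/text()[1]'
}

# grep -c stands in for xidel here: it also reads stdin when given no file.
fed=$(grep -c 'item-info-container' <<< "$html_content")
unfed=$(grep -c 'item-info-container' < /dev/null)
echo "with here-string: $fed match(es); with empty stdin: $unfed"

# With xidel installed, the same pattern would be:
#   xidel_output=$(extract_prices <<< "$html_content")
```

If this is the cause, redirecting the variable into the existing command with a here-string, or passing htmlcompleto.txt as the data source instead of "-", should be enough to test it.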
Expected result: I expect to extract the price, the listing link, the description, and the square meters from each listing on the web page. I also tried this XPath expression:
//div[contains(@class, "items-container items-list")] ! string-join(
    (
        ( "price=" || normalize-space(.//span[contains(@class, "item-price")]/text()[1]) ),
        ( "size=" || normalize-space(.//span[contains(@class, "item-detail") and contains(text(), "m2")]) ),
        ( "link=" || normalize-space(.//a[contains(@class, "item-link")]/@href) ),
        ( "desc=" || normalize-space(.//p[contains(@class, "ellipsis")]) )
    ),
    codepoints-to-string(9)
)
' -)
that is, with "items-container items-list" as the container class, but it returns nothing.
What I have tried:
I also tried this XPath expression, with slightly different class strings and the URL passed directly to xidel:
xidel_output=$(xidel --xpath '
    //main//div[contains(@class, "item-info-container ")] ! string-join(
        (
            ( "price=" || normalize-space(.//span[contains(@class, "item-price h2-simulated")]/text()[1]) ),
            ( "size=" || normalize-space(.//span[contains(@class, "item-detail") and contains(text(), "m2")]) ),
            ( "link=" || normalize-space(.//a[contains(@class, "item-link")]/@href) ),
            ( "desc=" || normalize-space(.//p[contains(@class, "ellipsis")]) )
        ),
        codepoints-to-string(9)
    )
' "$url")
but this returns nothing either.
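Since every variant returns nothing, it may help to rule out the download before changing the XPath again. The sketch below checks whether the fetched HTML contains the class name the expressions select on. If it does not, the response is probably not the listing page at all (for example an error or block page; that is a guess, not something confirmed here). The function names page_has_listings and describe_page are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check the downloaded HTML before blaming the XPath.

# Returns 0 when the HTML contains the class name the XPath selects on.
page_has_listings() {
    grep -q 'item-info-container' <<< "$1"
}

# Prints a one-line verdict for the HTML passed as the first argument.
describe_page() {
    if page_has_listings "$1"; then
        echo "listings present"
    else
        echo "no listings in response"
    fi
}
```

A possible use with the file the script already writes: describe_page "$(cat htmlcompleto.txt)". If it reports that no listings are present, opening htmlcompleto.txt in a browser should show what the server actually returned.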