Hi, I have a bash script that scrapes a web page. It works, but one of my regular expressions does not work quite right. This is my code:
sGCitta="fucecchio"
sGTypo="vendita-case"
sGDomain="immobiliare"
url="https://www.$sGDomain.it/$sGTypo/$sGCitta"
html_content=$(curl -s -L "$url")
xidel_output=$(xidel --xpath '
//li//div[contains-token(@class, "in-listingCardPropertyContent")] ! string-join(
(
( "price=" || tokenize(div[@class = "in-listingCardPrice"])[last()] ),
( "size=" || normalize-space(div[contains-token(@class,"in-listingCardFeatureList")]/div[contains(.,"m²")]) ),
( "link=" || a[@class = "in-listingCardTitle"]/@href ),
( "desc=" || a[@class = "in-listingCardTitle"]/@title )
),
codepoints-to-string(9)
)
' "$url")
if [ -f "temp.txt" ]; then
rm temp.txt
fi
echo "$xidel_output" | awk -F 'price=' '{gsub(/\./,"",$2); gsub(/,[0-9]+/,"",$2); print $2}' | sed -e 's/size=/;/g' -e 's/link=/;/g' -e 's/desc=/;/g' -e 's/m²//g' > temp.txt
db_file="immo.db"
while IFS= read -r row; do
sqlite3 "$db_file" "INSERT INTO $sGDomain (prezzo, link, descrizione, metratura) VALUES ($row)"
done < temp.txt
And this is an example of what it extracts:
price=29.920,00 size=80 m² link=https:
I want to get something like this:
;29920;80;https:
But in my case it returns the following, which has no ; at the start and has every dot in the text removed:
29920 ;80 ;https:
It removes the . in the link part as well. Is it possible to tell awk to remove the dots only in the price, up to the next space?
What I have tried:
I also tried it this way:
#echo "$xidel_output" | sed -e 's/price=/;/g' -e 's/size=/;/g' -e 's/link=//g' -e 's/desc=/;/g' -e 's/m²/;/g' > temp.txt
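
For reference, the behaviour described above seems to come from `awk -F 'price='`: with that separator, `$2` is the whole rest of the line, so `gsub(/\./,"",$2)` also strips the dots in the link, and printing `$2` drops `price=` without putting a `;` in its place. A minimal sketch of one possible alternative is to split on the tab that the XPath inserts with `codepoints-to-string(9)` and clean each field separately. The function name `clean_row` and the sample line (including the `example.com` URL and the description text) are hypothetical and used only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample row; fields are tab-separated, as produced by codepoints-to-string(9).
sample=$'price=29.920,00\tsize=80 m²\tlink=https://example.com/a.b/1\tdesc=Casa in centro'

# clean_row (hypothetical helper): reads tab-separated rows on stdin and
# prints them as ;price;size;link;desc, touching dots only in the price field.
clean_row() {
  awk -F '\t' '{
    out = ""
    for (i = 1; i <= NF; i++) {
      f = $i
      if (f ~ /^price=/) {
        sub(/^price=/, "", f)     # drop the label
        gsub(/\./, "", f)         # remove thousands separators, price only
        sub(/,[0-9]+$/, "", f)    # drop the decimal part
      } else if (f ~ /^size=/) {
        sub(/^size=/, "", f)
        sub(/ *m².*$/, "", f)     # drop the unit
      } else {
        sub(/^[a-z]+=/, "", f)    # link= and desc=: drop the label only
      }
      out = out ";" f             # every field gets a leading ;
    }
    print out
  }'
}

printf '%s\n' "$sample" | clean_row
```

In the script, the same function could be fed with `echo "$xidel_output" | clean_row > temp.txt`. Because the dots are removed only inside the field that starts with `price=`, the URL in the link field is left untouched.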