I have the DataFrame below, which has a single column holding these values:

abc,1,2,345,765,876,Kumar r,Raghvan ,04041996

abc,1,2,345,765,876,"sam Bailey,20541789   # here a double quote is already present after the 6th comma

abc,1011,2,32,678,,,,,

I am looking for a regular expression in PySpark that wraps the text between the 6th comma and the trailing digits in double quotes.

The expected output for the values above is:

abc,1,2,345,765,876,"Kumar r,Raghvan" ,04041996

abc,1,2,345,765,876,"sam Bailey",20541789

abc,1011,2,32,678,,,,,

I have tried the code below, but it does not produce the expected output.

What I have tried:

Python
df_with_quotes = df.withColumn("data_with_quotes",regexp_replace(col("data"), r"((?:[^,],){6})([^"].[^"$])(,[^,]+$)", r'\1"\2"\3'))
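
A few things in the attempt above look like likely causes. `(?:[^,],){6}` only matches fields that are exactly one character long, so `345` never matches. The unescaped `"` inside the `r"..."` literal ends the Python string early. And as far as I know, Spark's `regexp_replace` uses Java regex syntax, where group references in the replacement are written `$1`, not `\1`. Below is a minimal sketch of one possible pattern, checked here with Python's `re` module against the three sample rows only. The names `PATTERN`, `add_quotes` and `add_quotes_column` are made up for illustration, and the PySpark wrapper assumes the column is called `data` as in the attempt.

```python
import re

# Group 1: the first six comma-terminated fields (fields may be empty).
# Then an optional opening quote that is dropped and re-added.
# Group 2: the text to quote, matched lazily and ending on a non-space character.
# Then an optional closing quote that is dropped and re-added.
# Group 3: optional whitespace, a comma, and the trailing digits.
PATTERN = r'^((?:[^,]*,){6})"?(.*?\S)"?(\s*,\d+)$'


def add_quotes(line: str) -> str:
    """Pure-Python check of the pattern (Python uses \\1 style references)."""
    return re.sub(PATTERN, r'\1"\2"\3', line)


def add_quotes_column(df, src="data", dst="data_with_quotes"):
    """Hypothetical PySpark wrapper; assumes Java-style $1 group references."""
    from pyspark.sql.functions import col, regexp_replace

    return df.withColumn(dst, regexp_replace(col(src), PATTERN, '$1"$2"$3'))


if __name__ == "__main__":
    samples = [
        'abc,1,2,345,765,876,Kumar r,Raghvan ,04041996',
        'abc,1,2,345,765,876,"sam Bailey,20541789',
        'abc,1011,2,32,678,,,,,',
    ]
    for s in samples:
        print(add_quotes(s))
```

Rows that do not end in a comma followed by digits, such as the third sample, do not match and are left unchanged. Data with other shapes (for example, digits in the middle of the name field) may need a stricter pattern.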
Posted
Updated 21-Sep-24 22:09pm
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)