Certainly! I have two Excel files, each containing multiple sheets. In my existing code, using Data frames, I've implemented logic to compare a specific sheet from both files, and I'm obtaining the desired results.
However, the issue is that my code currently only retrieves that specific sheet from both files. But I would like to modify my existing code to also include the remaining sheets from the input files in my output file. How can I achieve this?
What I have tried:
The code I tried:
import pandas as pd
import numpy as np
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Border, Side
df1= pd.read_excel(r'D:\excel1.xlsx','Sheet1',na_values=['NA']).fillna('')
df2= pd.read_excel(r'D:\excel2.xlsx','Sheet2',na_values=['NA']).fillna('')
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')
df3 = pd.concat([df1,df2], sort=True, copy=True)
df3a = df3.stack().explode().groupby(level=[0,1]).apply(lambda x: ', '.join(map(str, x.unique()))).unstack(1).copy()
df3a['status'] = ""
df3a.loc[~df3a.index.isin(df2.index),'status'] = 'old'
df3a.loc[~df3a.index.isin(df1.index),'status'] = 'new'
idx = df3.stack().groupby(level=[0,1]).nunique()
df3a.loc[idx.mask(idx <= 1).dropna().index.get_level_values(0),'status'] = 'modified'
df3a['status'] = df3a['status'].fillna('same')
reorder_columns = df1.columns.tolist() + ['status']
df3a = df3a[reorder_columns]
with pd.ExcelWriter(r'D:\excel_output.xlsx') as writer:
df3a.to_excel(writer, sheet_name='Sheet1', index=True)
workbook = load_workbook(r'D:\excel_output.xlsx')
sheet1 = workbook['Sheet1']
sheet1.delete_cols(13)
workbook.save(r'D:\excel_output.xlsx')