Introduction
Instead of merging ResX files by treating them as text files, which the various merge tools I've tried do, this tool treats the ResX files as ResX files. It loads them using a ResXResourceReader, then compares and merges them into a single ResX file using a ResXResourceWriter.
Background
I use ResX files (with no code generation) to store and retrieve localized strings. Whenever I do a build that merges branches from other developers on this project, I spend more and more time getting our primary ResX file to merge correctly. The problem has only gotten worse over time, to the point where I was spending more time merging this one file than on everything else in the build process combined.
The problem I was finding was that all the various merge tools treated ResX files as straight text files and tried to do line by line merging, instead of entry by entry merging. One solution I found recommended loading each ResX file, sorting it by key, spitting it out sorted, then having the various merge tools compare and merge the sorted ResX files.
I tried this - it actually made my problem far worse. I went from about 500 'merge conflicts' (where the merge tool didn't know which line to use and needed human input) to about 3,500 merge conflicts (this was from a ResX file that has grown to about 4,000 entries).
Finding no good alternatives to my growing problem, and not wanting to figure out a different localization scheme, I elected to develop my own ResX merge tool. I've used it for the last couple of builds and it works well and quickly on nearly 12,000 entries (about 4,000 entries per file). Now, instead of dealing with 500 merge conflicts, I deal with about 10, and it takes less than a minute to pick the ones I want to keep and get rid of the bad entries.
An example of a merge conflict is the attached image:
Note how the line that is conflicting is '</data>' versus '<data name="Axillary" xml:space="preserve">'. One of the reasons for this is not every item has a comment - so the line entries get mismatched from one branch of the code to the next. Another reason is that the merge tool is treating the resX file as a text file and merging it line by line.
If I'm not careful when merging this, I can end up with any number of problems resulting in a corrupt ResX file:
- I've had instances where I had extra '</data>' tags.
- I've also had instances where I had missing '</data>' tags.
- I've had comments show up merged into the wrong entry.
- I've lost entire entries.
- I've gotten entire duplicates of entries.
For instance, in the example above, if I don't merge it correctly, I can have an extra '</data>' before the entries and a missing one afterwards. I can also lose an entry if I pick only one of the two entries. I have tried picking both entries, but then I end up with duplicate entries. For instance, 'Integration Time' might be in the file on the right, just further down, and since I'm merging in entries from both sides, I get two 'Integration Time' entries.
The automatic conflict resolution doesn't work much better. I've had it make similar mistakes, where the lines get mismatched.
For smaller files, having 20 or so conflicts is not a big problem. But on larger files like mine, I was hitting 500+ merge conflicts every time I merged branches to do a build.
One solution I tried before this was Tom Clement's solution (Solving the .resx Merge Problem) where I had it sort each resX file before the merge, and then have KDiff3 do the actual merge. This made my problem far worse. I went from about 500 merge conflicts to nearly the entire file being a merge conflict.
For instance, let's say the beginning of the first file (after sorting) is:
<data name="A" xml:space="preserve">
<value>The letter 'A'.</value>
</data>
<data name="B" xml:space="preserve">
<value>The letter 'B'.</value>
</data>
Now the second file (after sorting) is:
<data name="A" xml:space="preserve">
<value>The letter 'A'.</value>
</data>
<data name="A1" xml:space="preserve">
<value>'A1' is a steak sauce.</value>
</data>
<data name="B" xml:space="preserve">
<value>The letter 'B'.</value>
</data>
In the above scenario, every entry starting with A1 and below it would now be a merge conflict, requiring me to manually pick the left file, the right file, or both. Again, for 20 or so entries that isn't much of a problem. But when you have around 4,000 entries and growing, that's a huge problem to have to manually merge 12,000 lines (every entry has 3 to 4 lines) every time you merge in a branch.
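To make the contrast concrete, here is a minimal sketch (not the tool's actual code) of comparing the two sorted files above by key instead of by line. The dictionaries stand in for parsed ResX entries, and the names KeyedDiff, AddedKeys, and ChangedKeys are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedDiff
{
    // Keys that exist in 'second' but not in 'first' (new entries).
    public static List<string> AddedKeys(
        IDictionary<string, string> first,
        IDictionary<string, string> second)
    {
        return second.Keys
            .Where(key => !first.ContainsKey(key))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }

    // Keys that exist in both files but whose values differ.
    public static List<string> ChangedKeys(
        IDictionary<string, string> first,
        IDictionary<string, string> second)
    {
        return second.Keys
            .Where(key => first.ContainsKey(key)
                && !string.Equals(first[key], second[key], StringComparison.Ordinal))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }
}
```

With the two files above, a comparison like this reports only 'A1' as added and nothing as changed, because 'B' is matched by its key no matter which line it has moved to.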
My solution:
- I use ResXResourceReader to parse each ResX file.
- Once I load a ResXDataNode, I check to see if its key (in lower case) is in my sorted conflict list (private SortedList<string, ResXConflictNode> mConflicts = new SortedList<string, ResXConflictNode>();).
- ResXConflictNode is a simple class:
    public class ResXConflictNode
    {
        // One slot per source file; a slot stays null if that file has no such entry.
        public ResXDataNode BaseNode;
        public ResXDataNode LocalNode;
        public ResXDataNode RemoteNode;

        public ResXConflictNode(ResXDataNode @base, ResXDataNode local, ResXDataNode remote)
        {
            BaseNode = @base;
            LocalNode = local;
            RemoteNode = remote;
        }
    }
- This class is used to store a copy of the 3 nodes: base, local, and remote.
- If the key is not in the list of conflicts, it checks to see if it exists in my other sorted list (private SortedList<string, ResXSourceNode> mOutput = new SortedList<string, ResXSourceNode>();).
- ResXSourceNode is also a simple class:
    public class ResXSourceNode
    {
        // Which file(s) this node was loaded from, and the node itself.
        public ResXSource Source;
        public ResXDataNode Node;

        public ResXSourceNode(ResXSource source, ResXDataNode node)
        {
            this.Source = source;
            this.Node = node;
        }
    }
- ResXSource is a simple enum that has the Flags attribute and 3 primary entries: base = 1, local = 2, and remote = 4.
- I use source to track where the node is being loaded from. Additionally, as long as the node is identical from one source to the next, source can be set to multiple sources. For instance, 3 would mean base and local but not remote, and 7 would mean the node is identical in all 3 files.
- If the node doesn't exist, I simply add it. If it does exist, I make sure the key/name, the value, and the comment are all equal (case sensitive). If they aren't, I remove the entry from this list and start a new conflict entry in the other list.
- After it finishes parsing all three files, it loads the sorted conflict nodes into the displayed DataGridView, followed by the nodes that were not conflicts. So you will always see conflict nodes first.
- Additionally, non-conflict nodes will only have a single row per entry in the DataGridView. Conflict nodes will have up to 3 rows (1 for each source). The user can then determine which entry to keep and delete the other conflict entries.
- When the user clicks 'Save', the tool sorts all the entries in the DataGridView by key/name, then uses a ResXResourceWriter to write each entry into the output file.
- One benefit of this is that it will throw an error (and display it to the user) if an entry in the DataGridView cannot be made into a valid ResXDataNode. This may be due to duplicate keys/names, or because the entry has invalid characters.
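The bookkeeping described in the list above can be sketched roughly as follows. This is a simplified illustration, not the tool's source: the Entry class is a stand-in for ResXDataNode so the sample runs without System.Windows.Forms, and the names Entry, Conflict, SourcedEntry, MergeSketch, and Add are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum ResXSource { None = 0, Base = 1, Local = 2, Remote = 4 }

// Stand-in for ResXDataNode (assumption): just name, value, and comment.
public sealed class Entry
{
    public readonly string Name, Value, Comment;
    public Entry(string name, string value, string comment)
    {
        Name = name; Value = value; Comment = comment;
    }

    // Case-sensitive comparison of name, value, and comment.
    public bool SameAs(Entry other)
    {
        return string.Equals(Name, other.Name, StringComparison.Ordinal)
            && string.Equals(Value, other.Value, StringComparison.Ordinal)
            && string.Equals(Comment, other.Comment, StringComparison.Ordinal);
    }
}

// Holds up to three versions of one conflicting entry.
public sealed class Conflict
{
    public Entry Base, Local, Remote;

    public void Set(ResXSource source, Entry entry)
    {
        if ((source & ResXSource.Base) != 0) Base = entry;
        if ((source & ResXSource.Local) != 0) Local = entry;
        if ((source & ResXSource.Remote) != 0) Remote = entry;
    }
}

// An entry plus the combined flags of every file it was found in.
public sealed class SourcedEntry
{
    public ResXSource Source;
    public Entry Node;
    public SourcedEntry(ResXSource source, Entry node) { Source = source; Node = node; }
}

public sealed class MergeSketch
{
    // Both lists are keyed by the lower-cased entry name.
    public readonly SortedList<string, Conflict> Conflicts = new SortedList<string, Conflict>();
    public readonly SortedList<string, SourcedEntry> Output = new SortedList<string, SourcedEntry>();

    public void Add(ResXSource source, Entry entry)
    {
        string key = entry.Name.ToLowerInvariant();
        Conflict conflict;
        SourcedEntry existing;

        if (Conflicts.TryGetValue(key, out conflict))
        {
            conflict.Set(source, entry);          // already a known conflict
        }
        else if (!Output.TryGetValue(key, out existing))
        {
            Output.Add(key, new SourcedEntry(source, entry)); // first time seen
        }
        else if (existing.Node.SameAs(entry))
        {
            existing.Source |= source;            // identical: record the extra source
        }
        else
        {
            Output.Remove(key);                   // differs: promote to a conflict
            conflict = new Conflict();
            conflict.Set(existing.Source, existing.Node);
            conflict.Set(source, entry);
            Conflicts.Add(key, conflict);
        }
    }
}
```

Feeding every entry from the base, local, and remote files through Add leaves the conflicts in one list and everything else in the other, with each non-conflicting entry carrying flags such as 3 (base and local) or 7 (all three files).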
Basically, instead of trying to parse and merge each ResX file as a text file, my tool parses and merges each ResX file as a ResX file. You don't have to worry about extra or missing </data> tags, or deleted/duplicate entries.
There's probably a more optimized way to handle the list of nodes - you can tweak/rework as needed.
Using the Code
What this tool does: merges 3 ResX files (base, local, and remote) into a single output ResX file.
What this tool does not do: merge anything else. You still need a more robust/general merge tool for all your other files.
Additional key information: I use GitHub, along with GitExtensions (a Visual Studio add-in), and KDiff3 for my primary merge tool. This tool has only been tested with that setup - any other setup may not work out of the box. However, the source code should get you on the right track.
How I use it:
- Using GitExtensions in Visual Studio, I configure it to use my ResX merge tool instead of KDiff3. I don't set it as a difftool either, only as my primary merge tool.
- I place my ResX merge tool inside of 'C:\Program Files\KDiff3\' where I have KDiff3 installed.
- The tool is not smart or configurable. It's hard coded to look for a KDiff3.exe file in the same folder. You will need to alter and recompile if you want it to work with a different merge tool.
- When I run a merge in Visual Studio, it launches my merge tool. My merge tool checks the command line arguments it receives. It is expecting 5 arguments:
- Full Path to Base file
- Full Path to Local file
- Full Path to Remote file
- "-o"
- Relative Path to output file
- If it doesn't receive exactly 5 arguments, it aborts, and passes the command line arguments on to KDiff3.exe and launches it.
- It may receive only a base file and a local file, or a base file and a remote file. Either way, the file is missing from one of the two branches, so there is no file content to compare: the user just needs to decide whether to keep or delete the file, which KDiff3 can handle.
- If the output file extension isn't 'resx', it also aborts and launches KDiff3 with the command line arguments. If it isn't a ResX file, the tool doesn't care about it.
- Once it passes these basic requirements, it loads and parses each of the 3 files, and then populates a DataGridView which can be directly edited:
- New rows can be added
- Existing rows can be altered
- Rows can be deleted
- After it populates the DataGridView, it sorts the rows so that conflicts are at the top, followed by entries that are only in the base file (this generally means the row was deleted in the branch), then by entries that are local or remote only (should mean new items), and then entries that exist in all the files.
- For conflicts, it compares key (case sensitive), value, and comments to determine if a row is different.
- For base only, there is a checkbox (on by default - but it should remember whether you turn it on/off the next time you run the tool) telling it to exclude base only entries from the output.
- I kept resolving conflicts simple: every conflict will have 3 rows: base, local, and remote. You pick which one is 'correct' and delete the other 2.
- Finally, I click Save. If it successfully saves, it closes out, and does not launch KDiff3.
- If it fails to save, it displays an error message of the Exception thrown. At which point, you can try to correct the problem, or you can click the Cancel button. A common error might be that a key is not unique - this is due to not deleting all the unwanted rows in a set of conflict rows.
- If instead I click cancel or close the tool, it defaults back to launching KDiff3 and passing the command line arguments.
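The argument check described above could look something like this minimal sketch. The class and method names are hypothetical, and two details are assumptions not stated in the article: that the fourth argument is verified to be "-o", and that the extension comparison ignores case.

```csharp
using System;
using System.IO;

public static class MergeArgs
{
    // Returns true when the arguments look like a three-way ResX merge:
    //   <base> <local> <remote> -o <output.resx>
    public static bool IsResXMerge(string[] args)
    {
        if (args == null || args.Length != 5)
        {
            return false; // anything else is handed to the general merge tool
        }
        if (args[3] != "-o")
        {
            return false; // assumption: the fourth argument must be the output switch
        }
        // Path.GetExtension includes the leading dot, e.g. ".resx".
        return string.Equals(Path.GetExtension(args[4]), ".resx",
            StringComparison.OrdinalIgnoreCase);
    }
}
```

When a check like this returns false, the tool would pass the unchanged arguments on to KDiff3.exe, as described above; when it returns true, it would load the three files into the DataGridView.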
One tidbit: I did not build any kind of undo/redo functionality. If you mess up and delete or alter an entry and forget what it originally was, you will need to start over (or build your own undo/redo functionality). However, I did include a 'Restart' button. It just clears out the DataGridView and repopulates it with the starting entries it had (so there is basically an 'undo all', but no undo/redo).
Anyway, if you are having trouble with ResX files being merged as text, download the source code and give it a try. You can manually test it with 3 ResX files if you launch it from the command line with the 5 arguments above.
Points of Interest
I'm not sure why, but Git passes the first 3 paths (base, local, and remote) as full/absolute paths, but for the output path, it overrides the default CurrentDirectory (System.IO.Directory.GetCurrentDirectory()) when it launches my tool, and then only passes a relative path for the output file.
To me, it would have made more sense to just pass all 4 paths as absolute paths. But oh well. It works.
42!
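For anyone adapting the tool, one way to cope with the relative output path is to resolve it against the working directory before writing. This is a small sketch under that assumption, not the tool's actual code; OutputPath and Resolve are hypothetical names.

```csharp
using System;
using System.IO;

public static class OutputPath
{
    // Resolves 'output' against 'workingDirectory'. If 'output' is already
    // an absolute path, Path.Combine returns it and the base is ignored.
    public static string Resolve(string workingDirectory, string output)
    {
        return Path.GetFullPath(Path.Combine(workingDirectory, output));
    }
}
```

A call such as OutputPath.Resolve(Directory.GetCurrentDirectory(), args[4]) would then give an absolute path for the output file, provided the working directory is still the one Git set when it launched the tool.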
History
- 24th October, 2016: Initial version
- Deeksha Shenoy corrected some things for me.
- I added more to the Background section in an attempt to better explain what the problem was, and how my tool works better.