On Building Git for Lawyers

(jordanbryan.substack.com)
162 points jpbryan | 1 comment
OutOfHere No.42141277
"git for x" is overrated.

Git can produce readable diffs of binary formats such as Word files by using custom diff drivers and external text conversion tools. You configure a converter, and Git runs each file through it to get a readable text form before diffing.

Define a custom diff driver. Create or edit a `.gitattributes` file in your repository and associate Word files with a specific diff driver, for example: `*.docx diff=word`. Then, configure Git to use an external text conversion tool for this driver. This is done by running a command like `git config --global diff.word.textconv "your-conversion-command"`, where `your-conversion-command` is a tool or script capable of extracting text from Word files.
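A minimal sketch of those two steps, run in a throwaway repository so nothing global is touched. The file name `contract.docx` is a placeholder, and the converter is set repo-locally here rather than with `--global`; it assumes `pandoc` is what you want as the converter.

```shell
# Throwaway repository so nothing in your real setup is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Route every .docx file through the diff driver named "word".
echo '*.docx diff=word' >> .gitattributes

# Tell the "word" driver how to turn a file into text.
# Repo-local config here; use --global to apply it to all repositories.
git config diff.word.textconv "pandoc --to=plain"

# Ask Git which diff driver it will pick for a Word file
# (contract.docx is a placeholder; the file does not need to exist).
git check-attr diff -- contract.docx
```

The last command should report that the `diff` attribute for `contract.docx` is `word`, which is a quick way to check the `.gitattributes` pattern before committing it.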

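`pandoc` can extract text from Word documents, and you can configure it by running `git config --global diff.word.textconv "pandoc --to=plain"`. Another option is `docx2txt`. Git appends the file path as the last argument of the textconv command, while `docx2txt.pl` expects the input file first and `-` (stdout) second, so it needs a small wrapper script that runs `docx2txt.pl "$1" -`; point `diff.word.textconv` at that wrapper.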
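As a rough sketch, a wrapper script can hide the difference between the two converters. The script name `docx-textconv` and its location are made up for illustration; the only thing it relies on from Git is that the textconv command receives the file path as its single argument.

```shell
# Hypothetical wrapper; the name docx-textconv and the temp location are assumptions.
bindir=$(mktemp -d)
cat > "$bindir/docx-textconv" <<'EOF'
#!/bin/sh
# Git calls this with one argument: the path of the file to convert.
if command -v pandoc >/dev/null 2>&1; then
    exec pandoc --from=docx --to=plain "$1"
elif command -v docx2txt.pl >/dev/null 2>&1; then
    # docx2txt.pl takes the input file first and "-" (stdout) second.
    exec docx2txt.pl "$1" -
else
    echo "docx-textconv: neither pandoc nor docx2txt.pl found" >&2
    exit 1
fi
EOF
chmod +x "$bindir/docx-textconv"

# To use it, run this inside a repository (left commented out on purpose):
# git config diff.word.textconv "$bindir/docx-textconv"
```

In practice you would put the script somewhere permanent on your `PATH` instead of a temp directory, so the config keeps working after a reboot.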

Running `git diff` on Word files will pass them through the configured conversion tool and display the differences in their text content. This approach is ideal for comparing the textual content of Word files but doesn’t account for formatting or binary-level differences. If you need a detailed binary comparison, you can integrate an external diff tool with Git by using `git difftool`.
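For the byte-level case, one possible sketch uses `cmp` as the external tool. The file below is a three-byte stand-in, not a real Word document, and `cmp` is just one choice of comparison tool; any command that accepts two file paths would do.

```shell
# Throwaway repository with a stand-in binary file (not a real Word document).
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '\000\001\002' > contract.docx
git add contract.docx
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first version"

# Change one byte in the working copy.
printf '\000\001\003' > contract.docx

# -y skips the per-file prompt; -x runs the given command as "<cmd> <old> <new>".
# cmp reports the first byte at which the two versions differ.
git difftool -y -x cmp -- contract.docx
```

By default `git difftool` ignores the exit status of the tool, so `cmp` returning non-zero on a difference does not make the command fail.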

replies(2): >>42141370 #>>42141566 #
_boffin_ No.42141566
you can also just have whatever tool unzip the docx and then do the diff on that directory, as it's just xml files inside a zip archive (the older binary .doc format isn't).
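A rough sketch of that idea, assuming `unzip` and `diff` are installed. `docx_xml_diff` is a made-up helper name, and it only applies to `.docx` files, which are zip archives.

```shell
# Sketch: unpack two versions of a .docx and diff the XML inside.
# docx_xml_diff is a hypothetical helper name.
docx_xml_diff() {
    old=$1
    new=$2
    tmp=$(mktemp -d) || return 2
    unzip -q "$old" -d "$tmp/old" || return 2
    unzip -q "$new" -d "$tmp/new" || return 2
    # diff exits 0 when the trees are identical, 1 when they differ.
    (cd "$tmp" && diff -ru old new)
    status=$?
    rm -rf "$tmp"
    return $status
}

# Usage (file names are placeholders):
# docx_xml_diff contract-v1.docx contract-v2.docx
```

Word tends to write `word/document.xml` with very few line breaks, so the raw XML diff can be noisy compared with the text-conversion approach.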
replies(1): >>42142059 #
1. OutOfHere No.42142059
Yes, but it makes sense to do it via git, considering that git is the solution for revision control.