Anyway, I received a corrupted file that seemed to retain a lot of its zip structure: it contained many instances of the text "PK" with XML file names nearby, which are the indicators of the individually zipped subfiles that make up the larger zip archive that constitutes a DOCX file. The only problem was that the null characters that usually surround the PK marker and the subfile name were all missing. (PK is a leftover designation from PKZIP, the first zip program, and I believe it stands for the initials of the zip format's developer, Phil Katz.) See the two screenshots below.
|How the file looked.|
|How the file should have looked.|
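The same check can be done without a screenshot. Below is a minimal sketch, not a recovery tool, that counts the standard four-byte zip record signatures in a file's raw bytes. The function name `count_zip_signatures` is my own invention for illustration; the signature values themselves come from the zip format.

```python
# Four-byte record signatures defined by the zip format.
ZIP_SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory entry",
    b"PK\x05\x06": "end of central directory",
}


def count_zip_signatures(data: bytes) -> dict:
    """Count bare "PK" markers and each complete zip signature in data."""
    counts = {"any PK": data.count(b"PK")}
    for signature, name in ZIP_SIGNATURES.items():
        counts[name] = data.count(signature)
    return counts


# Usage (the file name is hypothetical):
#   with open("corrupted.docx", "rb") as f:
#       print(count_zip_signatures(f.read()))
```

If "any PK" is high but the complete signatures are rare, the markers themselves have been damaged. If the signatures are intact, the damage is more likely in the header fields and data that follow them.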
So my question was: is there a way to reliably reverse this? I did find a possible answer in a program I reviewed earlier on this blog called fixgz. fixgz.exe, or its Linux version, can take a gzip file that has been transferred via FTP in text mode instead of the proper binary mode and fix it. I had high hopes that this was the cause of the corruption. However, the program didn't work for me, so either fixgz only works on gzip files, not ordinary zip ones, or transferring the DOCX via FTP in text mode was not the issue.
I then found this interesting post: http://superuser.com/questions/195612/recovering-corrupted-files-uploaded-in-wrong-ftp-mode. This gave me the idea that I really should open the file in a hex viewer and look for instances of the 0d 0a hex byte pair, which is the Windows line ending and might have been added because the file was handled as text rather than as binary. This may be wrongheaded, and 0d 0a byte pairs may simply be how all Windows files, both binary and text, indicate line returns. However, that notion didn't stop me from trying.
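Counting those byte pairs by eye in a hex viewer is tedious, so here is a rough sketch of the same check in code. The function names and the heuristic are my own assumptions, not something taken from the linked post: in compressed binary data a lone 0a byte should turn up now and then by chance, so a sizeable file that has 0d 0a pairs but no lone 0a at all is at least consistent with a text-mode transfer.

```python
def newline_stats(data: bytes) -> dict:
    """Count 0d 0a pairs and the 0a / 0d bytes that are not part of a pair."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "bare_lf": data.count(b"\n") - crlf,  # 0a not preceded by 0d
        "bare_cr": data.count(b"\r") - crlf,  # 0d not followed by 0a
    }


def looks_like_text_mode_transfer(data: bytes) -> bool:
    """Heuristic only: CRLF pairs are present and no lone 0a survives."""
    stats = newline_stats(data)
    return stats["crlf"] > 0 and stats["bare_lf"] == 0


# Usage (the file name is hypothetical):
#   with open("corrupted.docx", "rb") as f:
#       print(newline_stats(f.read()))
```

A `True` result proves nothing on a small file, and a `False` result only rules out this one kind of damage.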
Anyway, here are screenshots of the corrupted file and a good one in the HxD editor:
|Corrupted file in hex editor.|
|Good file in hex editor.|
So that's what I did. See below for my screenshots of the original file in ZipRepair Pro and of the file with the 20 bytes substituted with 00 bytes. The substituted file is the second screenshot.
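For anyone who would rather script the substitution than do it in a hex editor, here is a minimal sketch. The function names are hypothetical and it makes no claim to being a proper repair: it swaps every 20 byte for 00, including any 20 bytes that were legitimately part of the data, so it should only ever be run against a copy.

```python
from pathlib import Path


def replace_spaces_with_nulls(data: bytes) -> bytes:
    """Replace every 0x20 byte with 0x00 (blunt: genuine spaces are hit too)."""
    return data.replace(b"\x20", b"\x00")


def write_substituted_copy(src: str, dst: str) -> int:
    """Write a substituted copy of src to dst; return how many bytes changed."""
    original = Path(src).read_bytes()
    Path(dst).write_bytes(replace_spaces_with_nulls(original))
    return original.count(b"\x20")


# Usage (both file names are hypothetical):
#   changed = write_substituted_copy("corrupted.docx", "substituted.docx")
#   print(changed, "bytes replaced")
```

The output file is the same length as the input, since each byte is swapped one for one.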
|With the original, ZipRepair Pro wants to skip all the files because it can't recover any.|