Wednesday, November 23, 2016

How to Fix Corrupt OpenOffice and LibreOffice Files that Refuse to Open Producing "Format" or Other Errors

As some of you probably know, all OpenOffice and LibreOffice files (or at least the Writer, Calc and Impress files) are conventional zip archives containing mostly XML sub-files and images.

Follow these directions to fix OpenOffice and LibreOffice XML formatting errors (and maybe other errors as well) where the file refuses to open. The example will use a corrupt Impress Presentation that refuses to open, but this tutorial could apply to all of the different types of OpenOffice and LibreOffice files or at least the Writer, Calc and Impress files. 
  1. Try opening the file in LibreOffice Impress. Note the subfile where the error occurs. These errors may be confined to the "content.xml" or the "styles.xml" subfiles.
  2. Right click on your file and choose rename, or hit the F2 key, and add ".zip" to the end of the file name. For instance "Bhagavad.odp" becomes "Bhagavad.odp.zip".
  3. Unzip the file. If all goes well and you don't get an error, skip to step 6.

  4. If you get an error using the built-in unzipping facilities of Windows, Linux or Mac, repair the zip before trying to unzip again. Download the Info-ZIP project's Zip program (oddly enough, not Info-ZIP's UnZip program) and open up a command line.

    Use the command:

    zip.exe -FF "Bhagavad.odp" --out 

    Replace "Bhagavad.odp" with your own file's name.
  5. To get the most information from corrupt zipped files, in addition to using Info-ZIP's zip program to repair the file, I recommend using the 7-Zip project's standalone console version. Use the command:
    7za.exe x "" -o"Bhagavad_repaired_output"
    Notice that the lack of a space after the "-o" is deliberate. If the results are still unsatisfactory, try downloading the regular GUI version of 7-Zip and using the standalone executable found in its installation folder. It behaves a little differently from the standalone console version and may produce better results for our purposes.

  6. In Notepad++ (which is free), open the file the error referred to (either content.xml or styles.xml).
  7. Press Ctrl+H to open the Replace window.
  8. In the "Search Mode" section in the lower left-hand corner of the Replace window, move the radio button from "Normal" to "Extended".
  9. Without closing the Replace window, click back in the document and press Ctrl+A to select all the text.
  10. Click back in the Replace window to activate it again, then replace all instances of "><" with ">\r\n<": put "><" (without the quotes) in the "Find what" field and ">\r\n<" in the "Replace with" field, then hit the "Replace All" button. This puts a new line after every tag, which makes the errors much easier to locate. It does not harm the XML files to be in this form rather than in their former state, which is one long line with one tag following the other without any new lines.
  11. Save your changes to the content.xml or styles.xml file in Notepad++ and then switch back to your list of unzipped sub-files in Explorer. Press Ctrl+A to select all the files in your unzipped odp file folder. Right click on any part of the selection and send the files to a "Compressed (zipped) folder".
  12. Rename the resulting zip file to any name as long as it ends in ".odp" and try opening the file again in Impress.
  13. Make note of the now easier to find location of the error and return to your content.xml or styles.xml file in Notepad++.
  14. Go to the line number indicated in the error and move the cursor to the character indicated. The column number is the character position within the line. In this example's error message, for instance, the error begins at character 137.
  15. Notice that in the error line, 10933, the draw:mirror-vertical attribute is defined twice, once as equal to "false" and once as "true". In this example, we fix the XML error by removing the duplicate attribute. If your original intention for your drawing in this instance was to have the attribute equal "true", then after you have removed the second instance of the attribute, change the first one from false to true.
    At this point you can go through the rest of the document and look for similar errors, using the Find and Replace facilities of Notepad++. You'll have to be a little clever to figure out what the pattern of your errors is. For instance, in this example the second attribute is always followed by draw:type="ooxml-rect". So you could search for all the instances of draw:mirror-vertical="true" draw:type="ooxml-rect" and replace them with just draw:type="ooxml-rect".

    If you suspect you have fixed the file already, or each kind of XML error is different, you can also rezip all your unzipped files after each XML fix (be sure to save the content.xml or styles.xml changes you have made before doing this), change the extension back to ".odp" and try to open the file again to see where the next error is.

    Note too that I moved the radio button back to the "Normal" Search Mode before hitting the "Replace All" button.
  16. Also take a look at the end of your content.xml or styles.xml file. If it is a lot of nonsensical, mixed-up XML and/or it ends abruptly without the proper file-ending XML, like </office:presentation></office:body></office:document-content>, all is not lost. Look back through the file and try to identify where the XML becomes corrupt: look for the place where XML tags start missing their opening "<" or closing ">" brackets. Remove all this bad XML. The place where the bad XML begins is probably where your OpenOffice or LibreOffice program will indicate that the XML error is anyway.
    After you have truncated the file and left only good XML tags, you can either add the correct file-ending XML tags yourself if you are conversant with XML (few people are, I'm afraid) or use the program xmllint to do it for you. Xmllint is available for fast download for Windows here. For other operating systems and other Windows versions, try here.
    Use the following command to add proper end tags to the XML. Note that xmllint claims to be able to truncate bad XML and remove bad tags before adding proper ending tags, but I have found this facility to be lacking. Anyway, the command is:
    xmllint --recover content.xml -o content.xml
  17. If you haven't done it already, save your content.xml or styles.xml changes. Then delete your first rezipped version of your file in Explorer (you may need to shut down Impress), select all your unzipped files again and rezip them. Of course, if you have copied xmllint to your OpenOffice or LibreOffice unzip folder, you will want to avoid selecting it as part of the rezipped file collection.

    Rename the resulting zip file to your original file name plus an underscore and "repaired", like "Bhagavad_repaired.odp" or something like that, and try opening the repaired file again.
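
For readers who would rather script steps 10, 13 and 14 than click through Notepad++, here is a minimal sketch, assuming Python 3 is installed. The function names split_tags and first_xml_error are made up for this illustration. It puts each tag on its own line and then asks Python's built-in XML parser where the first error is. The position Python reports may not match the one Impress shows exactly (Python counts columns from 0), so treat it as a pointer to the right neighbourhood.

```python
import xml.etree.ElementTree as ET


def split_tags(xml_text, newline="\n"):
    """Put each tag on its own line, like replacing "><" with ">\\r\\n<" in step 10."""
    return xml_text.replace("><", ">" + newline + "<")


def first_xml_error(xml_text):
    """Return (line, column, message) for the first parse error, or None if none is found."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position as reported by Python's parser
        return line, column, str(err)
    return None
```

To try it, read your unzipped content.xml as text (for example with open("content.xml", encoding="utf-8")), pass the text through split_tags, write it back out, and call first_xml_error on the result. Only the first error is reported, so fix it and run the check again.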
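
The hunt for repeated attributes in step 15 can also be roughed out in code. This is only a heuristic sketch, assuming Python 3. The name find_duplicate_attributes and both regular expressions are my own illustration, not part of any OpenOffice or LibreOffice tool, and the simple tag pattern will be confused by a ">" inside an attribute value. It lists the line number and name of every attribute that appears more than once inside a single tag.

```python
import re
from collections import Counter

# Opening or self-closing tags only; skips closing tags, comments and declarations.
TAG_RE = re.compile(r"<[^<>!?/][^<>]*>")
# An attribute name followed by a quoted value.
ATTR_RE = re.compile(r"""([\w:.-]+)\s*=\s*(?:"[^"]*"|'[^']*')""")


def find_duplicate_attributes(xml_text):
    """Return a list of (line_number, attribute_name) for attributes repeated in one tag."""
    problems = []
    for tag in TAG_RE.finditer(xml_text):
        names = [m.group(1) for m in ATTR_RE.finditer(tag.group(0))]
        line_no = xml_text.count("\n", 0, tag.start()) + 1  # 1-based line of the tag
        for name, count in Counter(names).items():
            if count > 1:
                problems.append((line_no, name))
    return problems
```

Run it on the text of content.xml after the tags have been split onto separate lines, so that the line numbers are useful. Deciding which of the two values to keep is still up to you, as described in step 15.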
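
The rezipping in steps 11, 12 and 17 can be scripted too. Below is a minimal sketch, assuming Python 3. The function name rezip_folder and the skip_names parameter are hypothetical. One detail goes beyond the steps above: the OpenDocument packaging convention puts the "mimetype" entry first in the archive and stores it uncompressed. Zipping with Explorer as described above does not guarantee that, and the tutorial's method works without it, so take this as an optional nicety.

```python
import os
import zipfile


def rezip_folder(folder, out_path, skip_names=("xmllint.exe",)):
    """Zip the unzipped sub-files in `folder` into `out_path` (e.g. a new .odp).

    Keep `out_path` outside `folder`, or the archive would try to include itself.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        mimetype = os.path.join(folder, "mimetype")
        if os.path.isfile(mimetype):
            # OpenDocument convention: first entry, stored without compression.
            zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, folder).replace(os.sep, "/")
                if arcname == "mimetype" or name in skip_names:
                    continue  # already added, or a helper program left in the folder
                zf.write(full, arcname)
```

For example, rezip_folder("Bhagavad_repaired_output", "Bhagavad_repaired.odp") would build the repaired presentation from the folder created in step 5, leaving out a copy of xmllint.exe if one is sitting in that folder.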
