Embedded Graphics and Word File Bloat
By Thiravudh Khoman

Recently, on the "Bangkok General" mailing list, there was a question from Frank Lombard as to why Microsoft Word 97 files, when embedded with graphics files, balloon in size so dramatically. Or more to the point, why they grow several times larger than the embedded graphic files themselves.

The most salient responses from the Bangkok General community (sorry, I forgot who said what) suggested:

  • Using "Save As" rather than "Save" when saving files.
  • Disabling "Fast Save" in Word, so that revision information isn't saved and accumulated.
  • Given Word's inefficiencies, saving graphics-laden documents in Adobe Acrobat format instead.

While these are good suggestions - and I tend to agree with them - they still don't quite address the mystery of what causes those bloated Word files. Anyway, I decided to run a few tests.

I started out with a JPEG file, a photo shot with a digital camera. Since my digital camera is set to capture pictures at 768 x 1024 pixels, most of my saved files are in the range of 135-145 KB each. What happens when I insert this file into Word 97 and Word 2000 respectively? See below:

File Name    Size (bytes)   Notes
908.JPG      142,277        Unaltered graphic file
908-97.DOC   997,888        Saved with Word 97
908-2K.DOC   163,328        Saved with Word 2000

Oops, I seem to have re-created the problem. But wait, Word 2000 seems to handle the situation much better than Word 97. Let's try another file:

File Name    Size (bytes)   Notes
912.JPG      136,592        Unaltered graphic file
912-97.DOC   1,112,576      Saved with Word 97
912-2K.DOC   157,696        Saved with Word 2000

Yep, this seems to confirm the first example and the fact that Word 97 seems to be the culprit. But hold on - take a look at these:

File Name    Size (bytes)   Notes
945.JPG      140,961        Unaltered graphic file
945-97.DOC   160,768        Saved with Word 97
945-2K.DOC   161,792        Saved with Word 2000
947.JPG      133,875        Unaltered graphic file
947-97.DOC   153,600        Saved with Word 97
947-2K.DOC   154,624        Saved with Word 2000

Oops again, what gives? Where's the bloat? But wait, I'm going to confuse you some more. I "modified" the four graphic files (908.JPG, 912.JPG, 945.JPG and 947.JPG) somewhat - I'll explain how later - and here are the results I now get:

File Name    Size (bytes)   Notes
908.JPG      58,701         Altered graphic file
908-97.DOC   77,824         Saved with Word 97
908-2K.DOC   79,360         Saved with Word 2000
912.JPG      108,309        Altered graphic file
912-97.DOC   127,488        Saved with Word 97
912-2K.DOC   128,512        Saved with Word 2000
945.JPG      70,124         Altered graphic file
945-97.DOC   89,600         Saved with Word 97
945-2K.DOC   90,624         Saved with Word 2000
947.JPG      51,789         Altered graphic file
947-97.DOC   71,168         Saved with Word 97
947-2K.DOC   72,192         Saved with Word 2000

Whoa! Not only is the bloat gone from BOTH Word 97 and Word 2000, but the files are quite a bit smaller as well.

Here are the "answers" and some observations:

  1. Word 2000 DOES indeed handle graphic-embedded files better than Word 97. In none of the above cases did a Word 2000 file balloon several times larger than the graphic file itself as Word 97 is wont to do.
  2. Word 97 does NOT ALWAYS create bloated files, as 945.JPG and 947.JPG can attest - although it did muck up 908.JPG and 912.JPG pretty horrendously. (Note that the four files are all pretty close in size.) The only difference I can ascertain in these files is that 945.JPG and 947.JPG are both "lighter" (i.e. contain more "white") than 908.JPG and 912.JPG. Go figure.
  3. Okay, now what did I do to the graphic files in the last table? Answer: I removed all the "metadata" embedded in these files. When I need to fiddle with my graphic files, I usually reach for ACDSee v3.1 from ACD Systems (https://www.acdsystems.com). When manipulating files (cropping, reducing, enhancing, or even do-nothing re-saving), ACDSee ends up removing the metadata. Not only does this make the file smaller, but it also seems to remove the code which causes Word 97 to blow up files beyond all reason.
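
The pattern in the tables is easier to see as ratios and differences. What follows is a minimal Python sketch, not part of the original tests, that treats the table figures as byte counts (their magnitudes suggest bytes rather than kilobytes). The names `ORIGINAL`, `STRIPPED` and `summarize` are made up for illustration.

```python
# Sizes copied from the tables above: (name, JPEG, Word 97 DOC, Word 2000 DOC).
ORIGINAL = [
    ("908", 142277, 997888, 163328),
    ("912", 136592, 1112576, 157696),
    ("945", 140961, 160768, 161792),
    ("947", 133875, 153600, 154624),
]
# Same four photos after the metadata was removed.
STRIPPED = [
    ("908", 58701, 77824, 79360),
    ("912", 108309, 127488, 128512),
    ("945", 70124, 89600, 90624),
    ("947", 51789, 71168, 72192),
]


def summarize(rows):
    """Return (name, w97_ratio, w97_overhead, w2k_ratio, w2k_overhead) per row.

    The ratio is DOC size divided by JPEG size; the overhead is DOC size
    minus JPEG size, in the same units as the input.
    """
    summary = []
    for name, jpg, w97, w2k in rows:
        summary.append((name, w97 / jpg, w97 - jpg, w2k / jpg, w2k - jpg))
    return summary


if __name__ == "__main__":
    for label, rows in (("original", ORIGINAL), ("stripped", STRIPPED)):
        for name, r97, o97, r2k, o2k in summarize(rows):
            print(f"{label:8} {name}: Word 97 x{r97:.2f} (+{o97:,}), "
                  f"Word 2000 x{r2k:.2f} (+{o2k:,})")
```

On these figures, the two bloated Word 97 documents come out at roughly 7 and 8 times the size of their JPEGs, while every other document adds only about 19,000 to 21,000 on top of the image it contains.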

Now, what in the world is "metadata"? Metadata seems to be embedded data which documents how the file was created; in a way, it's similar to ID3 tags in MP3 files. ACDSee allows you to read the metadata when you look at a file's "Properties" (figure 1). I suspect it may be more than that, though, since I can't imagine how removing a few tags can affect the bloat factor so completely. But then I'm hardly an expert in these matters.
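
For anyone curious how much of a file the metadata actually occupies, here is a hedged Python sketch that lists the segments at the front of a JPEG and their sizes. It relies only on the general JPEG layout (a start marker, then a series of marker-plus-length segments before the image data); camera files commonly keep their Exif block, which may include a small preview image, in an APP1 segment. The function names are hypothetical, and this says nothing about what ACDSee does internally or about why Word 97 misbehaves.

```python
import struct


def marker_name(marker: int) -> str:
    """Give a readable name to the marker types this sketch cares about."""
    if 0xE0 <= marker <= 0xEF:
        return f"APP{marker - 0xE0}"   # application segments (APP1 often holds Exif)
    if marker == 0xFE:
        return "COM"                   # free-text comment segment
    return f"0x{marker:02X}"           # anything else: show the raw marker byte


def list_jpeg_segments(data: bytes):
    """Return (name, total_size_in_bytes) for each segment before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing start-of-image marker)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:             # start of scan: compressed image data follows
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker_name(marker), 2 + length))
        pos += 2 + length
    return segments
```

Reading a photo with `open(path, "rb").read()` and passing the bytes to `list_jpeg_segments` shows at a glance whether the APPn segments amount to a few hundred bytes of tags or to something much larger.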

Bottom line: Use a program like ACDSee v3.1 to remove the metadata from the graphic files first, and then embed the files into Word.
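
If ACDSee isn't at hand, the same idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of what ACDSee does: it treats the APP1 through APP15 and COM segments as "metadata", keeps APP0 and everything else, and copies the compressed image data untouched. The names `METADATA_MARKERS` and `strip_jpeg_metadata` are hypothetical, and I have not verified that files processed this way avoid the Word 97 bloat.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker that opens every JPEG
SOS = 0xDA          # start-of-scan marker: compressed image data follows

# Assumption for this sketch: APP1..APP15 and COM segments count as metadata.
METADATA_MARKERS = set(range(0xE1, 0xF0)) | {0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of the JPEG bytes without the segments listed above."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG stream (missing start-of-image marker)")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == SOS:
            # Copy the scan header, image data and end marker verbatim.
            out += data[pos:]
            return bytes(out)
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

To use it, read the photo's bytes, pass them through `strip_jpeg_metadata`, and write the result to a new file so the original stays untouched. Be aware that some application segments may carry colour information rather than descriptive tags, so the stripped copy is worth checking by eye before it goes into a document.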

Copyright © 2001, Thiravudh Khoman