Imported the OLE modules from ruby-msg (GPLv2)

git-svn-id: file:///home/svn/framework3/trunk@4541 4d416f70-5f16-0410-b530-b9f4589650da
unstable
HD Moore 2007-03-20 16:49:34 +00:00
parent 4b97911605
commit 9c3bfaeee4
8 changed files with 1681 additions and 2 deletions

@@ -45,7 +45,7 @@ module Exploit::Remote::HttpClient
OptBool.new('HTTP::method_random_case', [false, 'Use random casing for the HTTP method', false]),
OptBool.new('HTTP::uri_dir_self_reference', [false, 'Insert self-referential directories into the uri', false]),
OptBool.new('HTTP::uri_dir_fake_relative', [false, 'Insert fake relative directories into the uri', false]),
OptBool.new('HTTP::uri_use_backslaces', [false, 'Use back slashes instead of forward slashes in the uri ', false]),
OptBool.new('HTTP::uri_use_backslashes', [false, 'Use back slashes instead of forward slashes in the uri ', false]),
OptBool.new('HTTP::pad_fake_headers', [false, 'Insert random, fake headers into the HTTP request', false]),
OptInt.new('HTTP::pad_fake_headers_count', [false, 'How many fake headers to insert into the HTTP request', 0]),
OptBool.new('HTTP::pad_get_params', [false, 'Insert random, fake query string variables into the request', false]),
@@ -93,7 +93,7 @@ module Exploit::Remote::HttpClient
'method_random_case' => datastore['HTTP::method_random_case'],
'uri_dir_self_reference' => datastore['HTTP::uri_dir_self_reference'],
'uri_dir_fake_relative' => datastore['HTTP::uri_dir_fake_relative'],
'uri_use_backslaces' => datastore['HTTP::uri_use_backslaces'],
'uri_use_backslashes' => datastore['HTTP::uri_use_backslashes'],
'pad_fake_headers' => datastore['HTTP::pad_fake_headers'],
'pad_fake_headers_count' => datastore['HTTP::pad_fake_headers_count'],
'pad_get_params' => datastore['HTTP::pad_get_params'],
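This hunk corrects one thing: the misspelled option name `HTTP::uri_use_backslaces` becomes `HTTP::uri_use_backslashes`. The fix is applied both where the option is registered and where its value is copied into the client options hash. The sketch below illustrates what the setting is meant to do to a request URI. `apply_uri_backslashes` and the plain-hash datastore are hypothetical stand-ins, not the Rex HTTP client's actual code.

```ruby
# Illustrative only: a hypothetical helper, NOT the Rex implementation.
# It performs the transformation the option description promises:
# "Use back slashes instead of forward slashes in the uri".
def apply_uri_backslashes(uri, enabled)
  return uri unless enabled
  # Block form of gsub, so the backslash is inserted literally instead of
  # being interpreted as an escape in a replacement string.
  uri.gsub('/') { '\\' }
end

# Hypothetical stand-in for the module datastore after the rename.
datastore = { 'HTTP::uri_use_backslashes' => true }
opts = { 'uri_use_backslashes' => datastore['HTTP::uri_use_backslashes'] }

puts apply_uri_backslashes('/scripts/test.asp', opts['uri_use_backslashes'])
# prints \scripts\test.asp

# Nothing reads the misspelled key any more, so looking it up yields nil.
p datastore['HTTP::uri_use_backslaces'] # => nil
```

A side effect of the rename: a module or resource script that still sets the old misspelled name will have no effect, because only the corrected key is read.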

lib/ole/LICENSE Normal file

@@ -0,0 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

lib/ole/SOURCE Normal file

@@ -0,0 +1,15 @@
This code was borrowed from the ruby-msg project, licensed under GPLv2:
http://code.google.com/p/ruby-msg/
Checked out from Subversion on March 20th, 2007
URL: http://ruby-msg.googlecode.com/svn/trunk/lib/ole
Repository Root: http://ruby-msg.googlecode.com/svn
Repository UUID: c30d66de-b626-0410-988f-81f6512a6d81
Revision: 70
Node Kind: directory
Schedule: normal
Last Changed Author: aquasync
Last Changed Rev: 70
Last Changed Date: 2007-02-22 01:27:15 -0600 (Thu, 22 Feb 2007)

lib/ole/base.rb Normal file

@@ -0,0 +1,5 @@
module Ole # :nodoc:
Log = Logger.new_with_callstack
end

lib/ole/file_system.rb Normal file

@@ -0,0 +1,170 @@
=begin
full file_system module
will be available and recommended usage, allowing Ole::Storage, Dir, and Zip::ZipFile to be
used pretty interchangeably down the track. should be possible to write a recursive copy using
the plain api, such that you can copy dirs/files agnostically between any of ole docs, dirs,
and zip files.
i think it's okay to have an api like this on top, but there are certain things that ole
does that aren't captured.
ole::storage can have multiple files with the same name, for example, or with / in the
name, and other things that are probably invalid anyway.
i think this should remain an addon, built on top of my core api.
but still the ideas can be reflected in the core, ie, changing the read/write semantics.
once the core changes are complete, this will be a pretty straightforward file to complete.
=end
module Ole
class Storage
def file
@file ||= FileParent.new self
end
def dir
@dir ||= DirParent.new self
end
def dirent_from_path path_str
path = path_str.sub(/^\/*/, '').sub(/\/*$/, '')
dirent = @root
return dirent if path.empty?
path = path.split(/\/+/)
until path.empty?
raise "invalid path #{path_str.inspect}" if dirent.file?
if tmp = dirent[path.shift]
dirent = tmp
else
# allow write etc later.
raise "invalid path #{path_str.inspect}"
end
end
dirent
end
class FileParent
def initialize ole
@ole = ole
end
def open path_str, mode='r'
dirent = @ole.dirent_from_path path_str
# like Errno::EISDIR
raise "#{path_str.inspect} is a directory" unless dirent.file?
io = dirent.io
if block_given?
yield io
else
io
end
end
alias new :open
def read path
open(path) { |f| f.read }
end
# crappy copy from Dir.
def unlink path
dirent = @ole.dirent_from_path path
# EPERM
raise "operation not permitted #{path.inspect}" unless dirent.file?
# i think we should free all of our blocks. i think the best way to do that would be
# like:
# open(path) { |f| f.truncate 0 }. which should free all our blocks from the
# allocation table. then if we remove ourself from our parent, we won't be part of
# the bat at save time.
# i think if you run repack, all free blocks should get zeroed.
parent = @ole.dirent_from_path(('/' + path).sub(/\/[^\/]+$/, ''))
parent.children.delete dirent
1 # hmmm. as per ::File ?
end
end
class DirParent
def initialize ole
@ole = ole
end
def open path_str
dirent = @ole.dirent_from_path path_str
# like Errno::ENOTDIR
raise "#{path_str.inspect} is not a directory" unless dirent.dir?
dir = Dir.new dirent, path_str
if block_given?
yield dir
else
dir
end
end
# certain Dir class methods proxy in this fashion:
def entries path
open(path) { |dir| dir.entries }
end
# there are some other important ones, like:
# chroot (!), mkdir, chdir, rmdir, glob etc etc. for now, i think
# mkdir, and rmdir are the main ones we'd need to support
def rmdir path
dirent = @ole.dirent_from_path path
# repeating myself
raise "#{path.inspect} is not a directory" unless dirent.dir?
# ENOTEMPTY:
raise "directory not empty #{path.inspect}" unless dirent.children.empty?
# now delete it, how to do that? the canonical representation that is
# maintained is the root tree, and the children array. we must remove it
# from the children array.
# we need the parent then. this sucks but anyway:
parent = @ole.dirent_from_path(('/' + path).sub(/\/[^\/]+$/, ''))
# note that the way this currently works, on save and repack time this will get
# reflected. to work properly, ie to make a difference now it would have to re-write
# the dirent. i think that Ole::Storage#close will handle that. and maybe include a
# #repack.
parent.children.delete dirent
0 # hmmm. as per ::Dir ?
end
class Dir
include Enumerable
attr_reader :dirent, :path, :entries, :pos
def initialize dirent, path
@dirent, @path = dirent, path
@pos = 0
# FIXME: hack, and probably not really desired
@entries = %w[. ..] + @dirent.children.map(&:name)
end
def each(&block)
@entries.each(&block)
end
def close
end
def read
@entries[@pos]
ensure
@pos += 1 if @pos < @entries.length
end
def pos= pos
@pos = [[0, pos].max, @entries.length].min
end
def rewind
@pos = 0
end
alias tell :pos
alias seek :pos=
end
end
end
end
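`Ole::Storage#dirent_from_path` resolves a path in three steps. It strips leading and trailing slashes, splits on runs of slashes, and then walks down from the root, raising if it finds a missing child or tries to descend into a file. The following self-contained sketch reproduces that walk against a toy tree. The `Node` class is a hypothetical stand-in for the real `Dirent` in `lib/ole/storage.rb`. As a small deliberate difference, the sketch uses the string anchors `\A`/`\z` in place of the line anchors `^`/`$`.

```ruby
# Hypothetical stand-in for Ole::Storage::Dirent: a node whose children
# array is nil is a file, otherwise it is a directory (storage).
class Node
  attr_reader :name, :children

  def initialize(name, children = nil)
    @name, @children = name, children
  end

  def file?
    children.nil?
  end

  # Look up a direct child by name, as Dirent#[] does with a String.
  def [](child_name)
    children && children.find { |c| c.name == child_name }
  end
end

# Same walk as dirent_from_path: trim slashes, split on runs of '/',
# then descend one component at a time.
def resolve(root, path_str)
  path = path_str.sub(%r{\A/*}, '').sub(%r{/*\z}, '')
  dirent = root
  return dirent if path.empty?
  path.split(%r{/+}).each do |part|
    raise "invalid path #{path_str.inspect}" if dirent.file?
    dirent = dirent[part] or raise "invalid path #{path_str.inspect}"
  end
  dirent
end

doc  = Node.new('WordDocument')
root = Node.new('Root Entry', [doc, Node.new('ObjectPool', [])])

puts resolve(root, '/WordDocument/').name # prints WordDocument
puts resolve(root, '//').name             # prints Root Entry
```

As in the original, both failure modes raise a plain `RuntimeError` whose message starts with `invalid path`.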

lib/ole/io_helpers.rb Normal file

@@ -0,0 +1,186 @@
# move to support?
class IO
def self.copy src, dst
until src.eof?
buf = src.read(4096)
dst.write buf
end
end
end
#
# = Introduction
#
# +RangesIO+ is a basic class for wrapping another IO object allowing you to arbitrarily reorder
# slices of the input file by providing a list of ranges. Intended as an initial measure to curb
# inefficiencies in the Dirent#data method just reading all of a file's data in one hit, with
# no method to stream it.
#
# This class will encapsulate the ranges (corresponding to big or small blocks) of any ole file
# and thus allow reading/writing directly to the source bytes, in a streamed fashion (so just
# getting 16 bytes doesn't read the whole thing).
#
# In the simplest case it can be used with a single range to provide a limited io to a section of
# a file.
#
# = Limitations
#
# * No buffering, by design at the moment. Intended for large reads.
#
# = TODO
#
# On further reflection, this class is something of a joining/optimization of
# two separate IO classes. a SubfileIO, for providing access to a range within
# a File as a separate IO object, and a ConcatIO, allowing the presentation of
# a bunch of io objects as a single unified whole.
#
# I will need such a ConcatIO if I'm to provide Mime#to_io, a method that will
# convert a whole mime message into an IO stream, that can be read from.
# It will just be the concatenation of a series of IO objects, corresponding to
# headers and boundaries, as StringIO's, and SubfileIO objects, coming from the
# original message proper, or RangesIO as provided by the Attachment#data, that
# will then get wrapped by Mime in a Base64IO or similar, to get encoded on-the-
# fly. Thus the attachment, in its plain or encoded form, and the message as a
# whole never exists as a single string in memory, as it does now. This is a
# fair bit of work to achieve, but generally useful I believe.
#
# This class isn't ole specific, maybe move it to my general ruby stream project.
#
class RangesIO
attr_reader :io, :ranges, :size, :pos
# +io+ is the parent io object that we are wrapping.
#
# +ranges+ are byte offsets, either
# 1. an array of ranges [1..2, 4..5, 6..8] or
# 2. an array of arrays, where the second is length [[1, 1], [4, 1], [6, 2]] for the above
# (think the way String indexing works)
# The +ranges+ provide sequential slices of the file that will be read. they can overlap.
def initialize io, ranges, opts={}
@opts = {:close_parent => false}.merge opts
@io = io
# convert ranges to arrays. check for negative ranges?
@ranges = ranges.map { |r| Range === r ? [r.begin, r.end - r.begin] : r }
# calculate size
@size = @ranges.inject(0) { |total, (pos, len)| total + len }
# initial position in the file
@pos = 0
end
def pos= pos, whence=IO::SEEK_SET
# FIXME support other whence values
raise NotImplementedError, "#{whence.inspect} not supported" unless whence == IO::SEEK_SET
# just a simple pos calculation. invalidate buffers if we had them
@pos = pos
end
alias seek :pos=
alias tell :pos
def close
@io.close if @opts[:close_parent]
end
def range_and_offset pos
off = nil
r = ranges.inject(0) do |total, r|
to = total + r[1]
if pos <= to
off = pos - total
break r
end
to
end
# should be impossible for any valid pos, (0...size) === pos
raise "unable to find range for pos #{pos.inspect}" unless off
[r, off]
end
def eof?
@pos == @size
end
# read bytes from file, to a maximum of +limit+, or all available if unspecified.
def read limit=nil
data = ''
limit ||= size
# special case eof
return data if eof?
r, off = range_and_offset @pos
i = ranges.index r
# this may be conceptually nice (create sub-range starting where we are), but
# for a large range array it's pretty wasteful. even the previous way was. but
# i'm not trying to optimize this atm. it may even go to c later if necessary.
([[r[0] + off, r[1] - off]] + ranges[i+1..-1]).each do |pos, len|
@io.seek pos
if limit < len
# FIXME this += isn't correct if there is a read error
# or something.
@pos += limit
break data << @io.read(limit)
end
# this can also stuff up. if the ranges are beyond the size of the file, we can get
# nil here.
data << @io.read(len)
@pos += len
limit -= len
end
data
end
# you may override this call to update @ranges and @size, if applicable. then write
# support can grow below
def truncate size
raise NotImplementedError, 'truncate not supported'
end
# why not? :)
alias size= :truncate
def write data
# shortcut. needed because truncate 0 may return no ranges, instead of empty range,
# thus range_and_offset fails.
return 0 if data.empty?
data_pos = 0
# if we don't have room, we can use the truncate hook to make more space.
if data.length > @size - @pos
begin
truncate @pos + data.length
rescue NotImplementedError
# FIXME maybe warn instead, then just truncate the data?
raise "unable to satisfy write of #{data.length} bytes"
end
end
r, off = range_and_offset @pos
i = ranges.index r
([[r[0] + off, r[1] - off]] + ranges[i+1..-1]).each do |pos, len|
@io.seek pos
if data_pos + len > data.length
chunk = data[data_pos..-1]
@io.write chunk
@pos += chunk.length
data_pos = data.length
break
end
@io.write data[data_pos, len]
@pos += len
data_pos += len
end
data_pos
end
# this will be generalised to a module later
def each_read blocksize=4096
yield read(blocksize) until eof?
end
# write should look fairly similar to the above.
def inspect
# the rescue is for empty files
pos, len = *(range_and_offset(@pos)[0] rescue [nil, nil])
range_str = pos ? "#{pos}..#{pos+len}" : 'nil'
"#<#{self.class} io=#{io.inspect} size=#@size pos=#@pos "\
"current_range=#{range_str}>"
end
end
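The core of `RangesIO` is bookkeeping. `Range` arguments are normalized to `[offset, length]` pairs with `[r.begin, r.end - r.begin]`, and a read walks those pairs in order, seeking the parent IO to each offset. The sketch below strips that down to one hypothetical helper, `read_slices`, which reads every slice in one go over a `StringIO`. It does no position tracking, no partial reads, and no writes.

```ruby
require 'stringio'

# Hypothetical helper illustrating the RangesIO idea; not part of
# lib/ole/io_helpers.rb. Reads each slice of +io+ in order and
# concatenates them into a single string.
def read_slices(io, ranges)
  # Same normalization as RangesIO#initialize: a Range r becomes
  # [r.begin, r.end - r.begin], so 6..8 covers two bytes, not three.
  pairs = ranges.map { |r| r.is_a?(Range) ? [r.begin, r.end - r.begin] : r }
  pairs.each_with_object(String.new) do |(offset, length), out|
    io.seek(offset)
    # read returns nil past EOF; treat that as an empty slice.
    out << io.read(length).to_s
  end
end

parent = StringIO.new('0123456789')
puts read_slices(parent, [1..2, 4..5, 6..8])       # prints 1467
puts read_slices(parent, [[1, 1], [4, 1], [6, 2]]) # prints 1467
```

The two calls agree, matching the equivalence stated in the `RangesIO#initialize` comment. A streaming version would instead locate the range containing the current position, as `RangesIO#read` does via `range_and_offset`, and honour a byte limit.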

lib/ole/storage.rb Executable file

@@ -0,0 +1,936 @@
#! /usr/bin/ruby -w
$: << File.dirname(__FILE__) + '/..'
require 'iconv'
require 'date'
require 'support'
require 'stringio'
require 'tempfile'
require 'ole/base'
require 'ole/types'
# not strictly ole related
require 'ole/io_helpers'
module Ole # :nodoc:
#
# = Introduction
#
# <tt>Ole::Storage</tt> is a simple class intended to abstract away details of the
# access to OLE2 structured storage files, such as those produced by
# Microsoft Office, e.g. *.doc, *.msg, etc.
#
# Initially based on chicago's libole, source available at
# http://prdownloads.sf.net/chicago/ole.tgz
# Later augmented with some corrections by inspecting pole, and (purely
# for header definitions) gsf.
#
# = Usage
#
# Usage should be fairly straightforward:
#
# # get the parent ole storage object
# ole = Ole::Storage.open 'myfile.msg', 'r+'
# # => #<Ole::Storage io=#<File:myfile.msg> root=#<Dirent:"Root Entry">>
# # read some data
# ole.root[1].read 4
# # => "\001\000\376\377"
# # get the top level root object and output a tree structure for
# # debugging
# puts ole.root.to_tree
# # =>
# - #<Dirent:"Root Entry" size=3840 time="2006-11-03T00:52:53Z">
# |- #<Dirent:"__nameid_version1.0" size=0 time="2006-11-03T00:52:53Z">
# | |- #<Dirent:"__substg1.0_00020102" size=16 data="CCAGAAAAAADAAA...">
# ...
# |- #<Dirent:"__substg1.0_8002001E" size=4 data="MTEuMA==">
# |- #<Dirent:"__properties_version1.0" size=800 data="AAAAAAAAAAABAA...">
# \- #<Dirent:"__recip_version1.0_#00000000" size=0 time="2006-11-03T00:52:53Z">
# |- #<Dirent:"__substg1.0_0FF60102" size=4 data="AAAAAA==">
# ...
# # write some data, and finish up (note that open is 'r+', so this overwrites
# # but doesn't truncate)
# ole.root["\001CompObj"].open { |f| f.write "blah blah" }
# ole.close
#
# = TODO
#
# 1. tests. lock down how things work at the moment - mostly good.
# create from scratch works now, as does copying in a subtree of another doc, so
# ole embedded attachment serialization works now. i can save embedded xls in an msg
# into a separate file, and open it. this was a goal. now i would want to implement
# to_mime conversion for embedded attachments, that serializes them to ole, but handles
# some separately like various meta file types as plain .wmf attachments perhaps. this
# will give pretty good .eml's from emails with embedded attachments.
# the other todo is .rtf output, with full support for embedded ole objects...
# 2. lots of tidying up
# - main FIXME's in this regard are:
# * the custom header cruft for Header and Dirent needs some love.
# * i have a number of classes doing load/save combos: Header, AllocationTable, Dirent,
# and, in a manner of speaking, but arguably different, Storage itself.
# they have differing APIs, which would be nice to clean up.
# AllocationTable::Big must be created ahead of time now, as it is used for all subsequent reads.
# * ole types need work, can't serialize datetime at the moment.
# 3. need to fix META_BAT support in #flush.
#
class Storage
VERSION = '1.1.1'
# The top of the ole tree structure
attr_reader :root
# The tree structure in its original flattened form. only valid after #load, or #flush.
attr_reader :dirents
# The underlying io object to/from which the ole object is serialized, whether we
# should close it, and whether it is writeable
attr_reader :io, :close_parent, :writeable
# Low level internals, you probably shouldn't need to mess with these
attr_reader :header, :bbat, :sbat, :sb_file
# maybe include an option hash, and allow :close_parent => true, to be more general.
# +arg+ should be either a filename (opened with +mode+, default 'rb'), or a seekable
# +IO+ object, in which case +mode+ must not be given.
def initialize arg, mode=nil
# get the io object
@close_parent, @io = if String === arg
[true, open(arg, mode || 'rb')]
else
raise 'unable to specify mode string with io object' if mode
[false, arg]
end
# do we have this file opened for writing? don't know of a better way to tell
@writeable = begin
@io.flush
true
rescue IOError
false
end
# silence undefined warning in clear
@sb_file = nil
# if the io object has data, we should load it, otherwise start afresh
if @io.size > 0; load
else clear
end
end
def self.new arg, mode=nil
ole = super
if block_given?
begin yield ole
ensure; ole.close
end
else ole
end
end
class << self
# encouraged
alias open :new
# deprecated
alias load :new
end
# load document from file.
def load
# we always read 512 bytes for the header block. if the block size ends up being different,
# what happens to the 109 fat entries? are there more/fewer entries?
@io.rewind
header_block = @io.read 512
@header = Header.load header_block
# create an empty bbat
@bbat = AllocationTable::Big.new self
# extra mbat blocks
mbat_blocks = (0...@header.num_mbat).map { |i| i + @header.mbat_start }
bbat_chain = (header_block[Header::SIZE..-1] + @bbat.read(mbat_blocks)).unpack 'L*'
# am i using num_bat in the right way?
@bbat.load @bbat.read(bbat_chain[0, @header.num_bat])
# get block chain for directories, read it, then split it into chunks and load the
# directory entries. semantics changed - used to cut at first dir where dir.type == 0
@dirents = @bbat.read(@header.dirent_start).scan(/.{#{Dirent::SIZE}}/mo).
map { |str| Dirent.load self, str }.reject { |d| d.type_id == 0 }
# now reorder from flat into a tree
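# sibling links (prev/next) form a binary tree (a red-black tree, see Dirent#colour),
# and each dirent's child link points at the root of its children's tree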
# check that everything is visited at least, and at most once
# similarly with the blocks of the file.
# was thinking of moving this to Dirent.to_tree instead.
class << @dirents
def to_tree idx=0
return [] if idx == Dirent::EOT
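d = self[idx]
# check before recursing, so a cyclic tree raises instead of recursing forever. uses the
# name, as inspecting a file would try to read it before the sbat is loaded.
raise "directory #{d.name.inspect} used twice" if d.idx
d.idx = idx
d.children = to_tree d.child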
to_tree(d.prev) + [d] + to_tree(d.next)
end
end
@root = @dirents.to_tree.first
Log.warn "root name was #{@root.name.inspect}" unless @root.name == 'Root Entry'
unused = @dirents.reject(&:idx).length
Log.warn "* #{unused} unused directories" if unused > 0
# FIXME i don't currently use @header.num_sbat which i should
# hmm. nor do i write it. it means what exactly again?
@sb_file = RangesIOResizeable.new @bbat, @root.first_block, @root.size
@sbat = AllocationTable::Small.new self
@sbat.load @bbat.read(@header.sbat_start)
end
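=begin
The flat dirent list is rebuilt into a tree by an in-order walk over the prev/next
links, with each entry's child link pointing at the root of its own children. A
minimal standalone sketch of that walk, assuming a hypothetical Entry struct in
place of Dirent; like the check above, it marks an entry as seen before recursing,
so a cycle raises rather than overflowing the stack.

```ruby
# Sentinel meaning "no link", as with Dirent::EOT.
EOT = 0xffffffff

# Hypothetical stand-in for a Dirent: a name, three links, and bookkeeping.
Entry = Struct.new(:name, :prev, :next, :child, :children, :seen)

# In-order walk: left siblings (prev), the entry itself, right siblings (next).
# Each entry's own children hang off its child link.
def to_tree(entries, idx = 0)
  return [] if idx == EOT
  e = entries[idx]
  raise "entry #{idx} used twice" if e.seen
  e.seen = true
  e.children = to_tree(entries, e.child)
  to_tree(entries, e.prev) + [e] + to_tree(entries, e.next)
end

entries = [
  Entry.new('Root Entry', EOT, EOT, 2), # root's children start at entry 2
  Entry.new('a', EOT, EOT, EOT),
  Entry.new('b', 1, 3, EOT),            # "a" to the left, "c" to the right
  Entry.new('c', EOT, EOT, EOT)
]
root = to_tree(entries).first
p root.children.map(&:name) # prints ["a", "b", "c"]
```
=end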
def close
flush if @writeable
@sb_file.close
@io.close if @close_parent
end
# should have a #open_dirent i think. and use it in load and flush. neater.
# also was thinking about Dirent#open_padding. then i can more easily clean up the padding
# to be 0.chr
=begin
thoughts on fixes:
1. reterminate any chain not ending in EOC.
2. pass through all chain heads looking for collisions, and making sure nothing points to them
(i.e. they are really heads).
3. we know the locations of the bbat data, and mbat data. ensure that there are placeholder blocks
in the bat for them.
this stuff will ensure reliability of input better. otherwise, it's actually worth doing a repack
directly after read, to ensure the above is probably accounted for, before subsequent writes possibly
destroy things.
=end
def flush
# recreate dirs from our tree, split into dirs and big and small files
@root.type = :root
# for now.
@root.name = 'Root Entry'
@root.first_block = @sb_file.first_block
@root.size = @sb_file.size
@dirents = @root.flatten
#dirs, files = @dirents.partition(&:dir?)
#big_files, small_files = files.partition { |file| file.size > @header.threshold }
# maybe i should move the block form up to RangesIO, and get it for free at all levels.
# Dirent#open gets block form for free then
io = RangesIOResizeable.new @bbat, @header.dirent_start
io.truncate 0
@dirents.each { |dirent| io.write dirent.save }
padding = (io.size / @bbat.block_size.to_f).ceil * @bbat.block_size - io.size
#p [:padding, padding]
io.write 0.chr * padding
@header.dirent_start = io.first_block
io.close
# similarly for the sbat data.
io = RangesIOResizeable.new @bbat, @header.sbat_start
io.truncate 0
io.write @sbat.save
@header.sbat_start = io.first_block
@header.num_sbat = @bbat.chain(@header.sbat_start).length
io.close
# what follows will be slightly more complex for the bat fiddling.
# create RangesIOResizeable hooked up to the bbat. use that to claim bbat blocks using
# truncate. then when it's time to write, convert that chain and some chunk of blocks at
# the end, into META_BAT blocks. write out the chain, and those meta bat blocks, and its
# done.
@bbat.table.map! do |b|
b == AllocationTable::BAT || b == AllocationTable::META_BAT ?
AllocationTable::AVAIL : b
end
io = RangesIOResizeable.new @bbat, AllocationTable::EOC
# use crappy loop for now:
while true
bbat_data = @bbat.save
#mbat_data = bbat_data.length / @bbat.block_size * 4
mbat_chain = @bbat.chain io.first_block
raise NotImplementedError, "don't handle writing out extra META_BAT blocks yet" if mbat_chain.length > 109
# so we can ignore meta blocks in this calculation:
break if io.size >= bbat_data.length # it shouldn't be bigger right?
# this may grow the bbat, depending on existing available blocks
io.truncate bbat_data.length
end
# now extract the info we want:
ranges = io.ranges
mbat_chain = @bbat.chain io.first_block
io.close
mbat_chain.each { |b| @bbat.table[b] = AllocationTable::BAT }
@header.num_bat = mbat_chain.length
#p @bbat.truncated_table
#p ranges
#p mbat_chain
# not resizeable!
io = RangesIO.new @io, ranges
io.write @bbat.save
io.close
mbat_chain += [AllocationTable::AVAIL] * (109 - mbat_chain.length)
@header.mbat_start = AllocationTable::EOC
@header.num_mbat = 0
=begin
bbat_data = new_bbat.save
# must exist as linear chain stored in header.
@header.num_bat = (bbat_data.length / new_bbat.block_size.to_f).ceil
base = io.pos / new_bbat.block_size - 1
io.write bbat_data
# now that spanned a number of blocks:
mbat = (0...@header.num_bat).map { |i| i + base }
mbat += [AllocationTable::AVAIL] * (109 - mbat.length) if mbat.length < 109
header_mbat = mbat[0...109]
other_mbat_data = mbat[109..-1].pack 'L*'
@header.mbat_start = base + @header.num_bat
@header.num_mbat = (other_mbat_data.length / new_bbat.block_size.to_f).ceil
io.write other_mbat_data
=end
@root.type = :dir
# now seek back and write the header out
@io.seek 0
@io.write @header.save + mbat_chain.pack('L*')
@io.flush
end
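=begin
#flush pads the serialized dirents with zero bytes out to a whole number of big
blocks. The float/ceil expression used there gives the same result as a plain
integer modulo; a small standalone sketch, where padding_for is a hypothetical
helper name and 128-byte dirents (Dirent::SIZE) in 512-byte blocks are assumed.

```ruby
# Bytes of zero padding needed to round +size+ up to a multiple of +block_size+.
# Same result as (size / block_size.to_f).ceil * block_size - size, in integers.
def padding_for(size, block_size)
  (-size) % block_size
end

dirent_size = 128
block_size  = 512
[1, 4, 5].each do |n|
  puts "#{n} dirents -> #{padding_for(n * dirent_size, block_size)} bytes of padding"
end
# prints:
# 1 dirents -> 384 bytes of padding
# 4 dirents -> 0 bytes of padding
# 5 dirents -> 384 bytes of padding
```

Ruby's % takes the sign of the divisor, which is what makes (-size) % block_size
non-negative here.
=end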
def clear
# first step though is to support modifying pre-existing and saving, then this
# missing gap will be fairly straightforward: essentially initialize to
# equivalent of loading an empty ole document.
#raise NotImplementedError, 'unable to create new ole objects from scratch as yet'
Log.warn 'creating new ole storage object on non-writable io' unless @writeable
@header = Header.new
@bbat = AllocationTable::Big.new self
@root = Dirent.new self, :dir
@root.name = 'Root Entry'
@dirents = [@root]
@root.idx = 0
@root.children = []
# size shouldn't display for non-files
@root.size = 0
@sb_file.close if @sb_file
@sb_file = RangesIOResizeable.new @bbat, AllocationTable::EOC
@sbat = AllocationTable::Small.new self
# throw everything else the hell away
@io.truncate 0
end
# could be useful with mis-behaving ole documents. or to just clean them up.
def repack temp=:file
case temp
# the first argument to Tempfile.open is a basename, not a mode; it is opened read/write
when :file; Tempfile.open 'ole-repack', &method(:repack_using_io)
when :mem; StringIO.open(&method(:repack_using_io))
else raise "unknown temp backing #{temp.inspect}"
end
end
def repack_using_io temp_io
@io.rewind
IO.copy @io, temp_io
clear
Storage.open temp_io do |temp_ole|
temp_ole.root.type = :dir
Dirent.copy temp_ole.root, root
end
end
def bat_for_size size
# note >=, not > previously.
size >= @header.threshold ? @bbat : @sbat
end
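=begin
Streams of at least Header#threshold bytes (4096 in Header::DEFAULT) are placed in
big blocks via the bbat; smaller ones go in small blocks via the sbat. A standalone
sketch of that decision plus the resulting block count, assuming the default shifts
from Header::DEFAULT (b_shift 9, s_shift 6); placement is a hypothetical name.

```ruby
THRESHOLD   = 4096
BIG_BLOCK   = 1 << 9 # 512 bytes
SMALL_BLOCK = 1 << 6 # 64 bytes

# Which table a stream of +size+ bytes would use, and how many blocks it needs.
def placement(size)
  table, block = size >= THRESHOLD ? [:bbat, BIG_BLOCK] : [:sbat, SMALL_BLOCK]
  [table, (size + block - 1) / block]
end

p placement(100)  # prints [:sbat, 2]
p placement(4095) # prints [:sbat, 64]
p placement(4096) # prints [:bbat, 8]
```
=end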
def inspect
"#<#{self.class} io=#{@io.inspect} root=#{@root.inspect}>"
end
# A class which wraps the ole header
class Header < Struct.new(
:magic, :clsid, :minor_ver, :major_ver, :byte_order, :b_shift, :s_shift,
:reserved, :csectdir, :num_bat, :dirent_start, :transacting_signature, :threshold,
:sbat_start, :num_sbat, :mbat_start, :num_mbat
)
PACK = 'a8 a16 S2 a2 S2 a6 L3 a4 L5'
SIZE = 0x4c
# i have seen it pointed out that the first 4 bytes, 0xd0cf11e0 in hex,
# are supposed to spell out "docfile". hmmm :)
MAGIC = "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" # expected value of Header#magic
# what you get if creating new header from scratch.
# AllocationTable::EOC isn't available yet. meh.
EOC = 0xfffffffe
DEFAULT = [
MAGIC, 0.chr * 16, 59, 3, "\xfe\xff", 9, 6,
0.chr * 6, 0, 1, EOC, 0.chr * 4,
4096, EOC, 0, EOC, 0
]
# 2 basic initializations, from scratch, or from a data string.
# from scratch will be geared towards creating a new ole object
def initialize *values
super(*(values.empty? ? DEFAULT : values))
validate!
end
def self.load str
Header.new(*str.unpack(PACK))
end
def save
to_a.pack PACK
end
def validate!
raise "OLE2 signature is invalid" unless magic == MAGIC
if num_bat == 0 or # is that valid for a completely empty file?
# not sure about this one. basically to do max possible bat given size of mbat
num_bat > 109 && num_bat > 109 + num_mbat * (1 << b_shift - 2) or
# shouldn't need to use the mbat as there is enough space in the header block
num_bat < 109 && num_mbat != 0 or
# given the size of the header is 76, if b_shift <= 6, blocks address the header.
s_shift > b_shift or b_shift <= 6 or b_shift >= 31 or
# we only handle little endian
byte_order != "\xfe\xff"
raise "not valid OLE2 structured storage file"
end
# relaxed this, due to test-msg/qwerty_[1-3]*.msg they all had
# 3 for this value.
# transacting_signature != "\x00" * 4 or
if threshold != 4096 or
num_mbat == 0 && mbat_start != AllocationTable::EOC or
reserved != "\x00" * 6
Log.warn "may not be a valid OLE2 structured storage file"
end
true
end
end
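=begin
The header layout is fully described by Header::PACK. A standalone sketch that
packs the same values as Header::DEFAULT and checks the derived sizes. It uses
String#b (Ruby 2.0+) so the binary magic compares equal to unpacked data, which the
original 1.8-era code does not need; the constants are redefined locally.

```ruby
PACK  = 'a8 a16 S2 a2 S2 a6 L3 a4 L5'
MAGIC = "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1".b
EOC   = 0xfffffffe

# Field order as in Ole::Storage::Header: magic, clsid, minor_ver, major_ver,
# byte_order, b_shift, s_shift, reserved, csectdir, num_bat, dirent_start,
# transacting_signature, threshold, sbat_start, num_sbat, mbat_start, num_mbat.
defaults = [MAGIC, "\0" * 16, 59, 3, "\xfe\xff".b, 9, 6,
            "\0" * 6, 0, 1, EOC, "\0" * 4, 4096, EOC, 0, EOC, 0]

header = defaults.pack(PACK)
puts header.bytesize # prints 76 (Header::SIZE, 0x4c)

fields = header.unpack(PACK)
puts 1 << fields[5]  # prints 512, the big block size
puts 1 << fields[6]  # prints 64, the small block size
# The rest of the 512-byte header block holds the first bbat block indices.
puts((512 - header.bytesize) / 4) # prints 109
puts fields == defaults           # prints true
```
=end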
#
# +AllocationTable+s hold the chains corresponding to files. Given
# an initial index, <tt>AllocationTable#chain</tt> follows the chain, returning
# the blocks that make up that file.
#
# There are 2 allocation tables, the bbat and the sbat, for big and small
# blocks respectively. A chain's data should be read using
# <tt>AllocationTable#read</tt> (or <tt>AllocationTable#open</tt>) on the
# appropriate table, <tt>Storage#bbat</tt> or <tt>Storage#sbat</tt>.
#
# Whether big or small blocks are used for a file depends on whether its
# size is at or above the <tt>Header#threshold</tt> level (see
# <tt>Storage#bat_for_size</tt>).
#
# An <tt>Ole::Storage</tt> document is serialized as a series of directory objects,
# which are stored in blocks throughout the file. The blocks are either
# big or small, and are accessed using the <tt>AllocationTable</tt>.
#
# The indices of the blocks holding the bbat itself are stored in the spare room
# in the header block (up to 109 entries), and in extra blocks throughout the file
# as referenced by the meta bat. That chain is linear, as there is no higher level table.
#
class AllocationTable
# a free block (I don't currently leave any blocks free), although I do pad out
# the allocation table with AVAIL to the block size.
AVAIL = 0xffffffff
EOC = 0xfffffffe # end of a chain
# these blocks correspond to the bat, and aren't part of a file, nor available.
# (I don't currently output these)
BAT = 0xfffffffd
META_BAT = 0xfffffffc
attr_reader :ole, :io, :table, :block_size
def initialize ole
@ole = ole
@table = []
end
def load data
@table = data.unpack('L*')
end
def truncated_table
# this strips trailing AVAILs. come to think of it, this has the potential to break
# bogus ole. if you terminate using AVAIL instead of EOC, like I did before. but that is
# very broken. however, if a chain ends with AVAIL, it should probably be fixed to EOC
# at load time.
temp = @table.reverse
not_avail = temp.find { |b| b != AVAIL } and temp = temp[temp.index(not_avail)..-1]
temp.reverse
end
def save
table = truncated_table #@table
# pad it out some
num = @ole.bbat.block_size / 4
# do you really use AVAIL? they probably extend past end of file, and may shortly
# be used for the bat. not really good.
table += [AVAIL] * (num - (table.length % num)) if (table.length % num) != 0
table.pack 'L*'
end
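=begin
#save strips trailing AVAIL entries and then pads the table back out to a whole
big block, since the table is itself stored in big blocks. A standalone sketch of
the same two steps, assuming 512-byte blocks (128 four-byte entries per block); the
helper names are hypothetical, it packs with 'V*' (little-endian) where the original
uses native 'L*', and unlike truncated_table an all-AVAIL table becomes empty.

```ruby
AVAIL = 0xffffffff
EOC   = 0xfffffffe

# Drop trailing AVAIL entries.
def truncated(table)
  last_used = table.rindex { |b| b != AVAIL }
  last_used ? table[0..last_used] : []
end

# Pad to a multiple of the entries per block with AVAIL, then pack as 32-bit ints.
def save_table(table, block_size = 512)
  per_block = block_size / 4
  t = truncated(table)
  t += [AVAIL] * (-t.length % per_block)
  t.pack('V*')
end

data = save_table([1, 2, EOC, EOC, AVAIL, AVAIL])
puts data.bytesize           # prints 512
p data.unpack('V*').first(5) # prints [1, 2, 4294967294, 4294967294, 4294967295]
```
=end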
# rewriting this to be non-recursive. it broke on a large attachment
# building up the chain, causing a stack error. need tail-call elimination...
def chain start
a = []
idx = start
until idx >= META_BAT
raise "broken allocationtable chain" if idx < 0 || idx >= @table.length
a << idx
idx = @table[idx]
end
Log.warn "invalid chain terminator #{idx}" unless idx == EOC
a
end
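=begin
Following a chain is a walk through the table: each entry holds the index of the
next block, until a special marker (normally EOC) ends it. A standalone sketch of
#chain over a plain array; follow_chain is a hypothetical name, and the visited set
is an addition for illustration that the method above does not have.

```ruby
require 'set'

AVAIL    = 0xffffffff
EOC      = 0xfffffffe
META_BAT = 0xfffffffc

# Returns the block indices making up the chain that starts at +start+.
def follow_chain(table, start)
  blocks = []
  seen = Set.new
  idx = start
  until idx >= META_BAT # any special marker ends the walk
    raise "broken chain at #{idx}" if idx >= table.length
    raise "cycle at #{idx}" unless seen.add?(idx)
    blocks << idx
    idx = table[idx]
  end
  warn "invalid chain terminator #{idx}" unless idx == EOC
  blocks
end

# Two interleaved chains: 0 -> 2 -> 4 and 1 -> 3.
table = [2, 3, 4, EOC, EOC]
p follow_chain(table, 0) # prints [0, 2, 4]
p follow_chain(table, 1) # prints [1, 3]
```
=end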
def ranges chain, size=nil
chain = self.chain(chain) unless Array === chain
blocks_to_ranges chain, size
end
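# Turn a chain (an array given by +chain+) of blocks, optionally truncated
# to +size+, into an array of [offset, length] pairs describing the stretches
# of bytes in the backing io that it occupies: the chain is cut to the number
# of blocks +size+ needs, each block becomes a range of +block_size+ bytes,
# and the final range is trimmed so the total equals +size+.
#
# Big blocks are <tt>1 << Header#b_shift</tt> bytes and are stored directly in
# the parent file (see Big#blocks_to_ranges for the header offset); small
# blocks are <tt>1 << Header#s_shift</tt> bytes, stored inside the small block file.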
def blocks_to_ranges chain, size=nil
chain = chain[0...(size.to_f / block_size).ceil] if size
ranges = chain.map { |i| [block_size * i, block_size] }
ranges.last[1] -= (ranges.length * block_size - size) if ranges.last and size
ranges
end
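=begin
Once a chain is known, each block maps to a byte range in the backing io, with the
last range trimmed to the stream's size. Big blocks are shifted by one because
block 0 begins right after the 512-byte header block. A standalone sketch with
hypothetical names, assuming 512-byte big blocks.

```ruby
# Map block indices to [offset, length] pairs, trimming the last range so the
# total length equals +size+ (when given).
def blocks_to_ranges(chain, block_size, size = nil)
  chain = chain[0, (size + block_size - 1) / block_size] if size
  ranges = chain.map { |i| [block_size * i, block_size] }
  ranges.last[1] -= ranges.length * block_size - size if size && !ranges.empty?
  ranges
end

# Big blocks: shift indices by one to skip the header block.
def big_block_ranges(chain, size = nil, block_size = 512)
  blocks_to_ranges(chain.map { |b| b + 1 }, block_size, size)
end

p big_block_ranges([0, 2, 4], 1100)
# prints [[512, 512], [1536, 512], [2560, 76]]
```
=end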
# quick shortcut. chain can be either a head (in which case the table is used to
# turn it into a chain), or a chain. it is converted to ranges, then to rangesio.
# it's not resizeable or migrateable. it probably could be resizeable though, using
# self as the bat. but what would the first_block be?
def open chain, size=nil
io = RangesIO.new @io, ranges(chain, size)
if block_given?
begin yield io
ensure; io.close
end
else io
end
end
def read chain, size=nil
open chain, size, &:read
end
# ----------------------
def get_free_block
@table.each_index { |i| return i if @table[i] == AVAIL }
@table.push AVAIL
@table.length - 1
end
# must return first_block
def resize_chain first_block, size
new_num_blocks = (size / block_size.to_f).ceil
blocks = chain first_block
old_num_blocks = blocks.length
if new_num_blocks < old_num_blocks
# de-allocate some of our old blocks. TODO maybe zero them out in the file???
(new_num_blocks...old_num_blocks).each { |i| @table[blocks[i]] = AVAIL }
# if we have a chain, terminate it and return head, otherwise return EOC
if new_num_blocks > 0
@table[blocks[new_num_blocks-1]] = EOC
first_block
else EOC
end
elsif new_num_blocks > old_num_blocks
# need some more blocks.
last_block = blocks.last
(new_num_blocks - old_num_blocks).times do
block = get_free_block
# connect the chain. handle corner case of blocks being [] initially
if last_block
@table[last_block] = block
else
first_block = block
end
last_block = block
# this is just to inhibit the problem where it gets picked as being a free block
# again next time around.
@table[last_block] = EOC
end
first_block
else first_block
end
end
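=begin
#resize_chain grows a chain by claiming AVAIL entries and linking them on, or
shrinks it by releasing the tail and re-terminating with EOC. A compact standalone
sketch of the same idea over a plain array; all names are hypothetical and
512-byte blocks are assumed.

```ruby
AVAIL = 0xffffffff
EOC   = 0xfffffffe

def chain_of(table, start)
  blocks = []
  while start != EOC
    blocks << start
    start = table[start]
  end
  blocks
end

# First free entry, extending the table if none is free.
def free_block(table)
  idx = table.index(AVAIL)
  return idx if idx
  table << AVAIL
  table.length - 1
end

# Resize the chain starting at +first+ to hold +size+ bytes; returns the new head.
def resize_chain(table, first, size, block_size = 512)
  blocks = chain_of(table, first)
  wanted = (size + block_size - 1) / block_size
  if wanted < blocks.length
    blocks[wanted..-1].each { |b| table[b] = AVAIL }
    return EOC if wanted == 0
    table[blocks[wanted - 1]] = EOC
  else
    (wanted - blocks.length).times do
      b = free_block(table)
      table[b] = EOC # claim it so it is not handed out again
      if blocks.empty?
        first = b
      else
        table[blocks.last] = b
      end
      blocks << b
    end
  end
  first
end

table = []
head = resize_chain(table, EOC, 1000) # needs 2 blocks
p [head, table] # prints [0, [1, 4294967294]]
head = resize_chain(table, head, 100) # shrink to 1 block
p [head, table] # prints [0, [4294967294, 4294967295]]
```
=end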
class Big < AllocationTable
def initialize(*args)
super
@block_size = 1 << @ole.header.b_shift
@io = @ole.io
end
# Big blocks are kind of -1 based, in order to not clash with the header.
def blocks_to_ranges blocks, size
super blocks.map { |b| b + 1 }, size
end
end
class Small < AllocationTable
def initialize(*args)
super
@block_size = 1 << @ole.header.s_shift
@io = @ole.sb_file
end
end
end
# like normal RangesIO, but Ole::Storage specific. the ranges are backed by an
# AllocationTable, and can be resized. used for read/write to 3 kinds of stream:
# 1. serialized dirent data
# 2. sbat table data
# 3. all dirents but through RangesIOMigrateable below
#
# Note that all internal access to first_block is through accessors, as it is sometimes
# useful to redirect it.
class RangesIOResizeable < RangesIO
attr_reader :bat
attr_accessor :first_block
def initialize bat, first_block, size=nil
@bat = bat
self.first_block = first_block
super @bat.io, @bat.ranges(first_block, size)
end
def truncate size
# note that old_blocks is not necessarily == @ranges.length. i'm planning to write a
# merge_ranges function that merges sequential ranges into one as an optimization.
self.first_block = @bat.resize_chain first_block, size
@ranges = @bat.ranges first_block, size
# clamp the position to the new size (not the old one)
@pos = size if @pos > size
# don't know if this is required, but we explicitly request our @io to grow if necessary
# we never shrink it though. maybe this belongs in allocationtable, where smarter decisions
# can be made.
# maybe it's ok to just seek out there later??
max = @ranges.map { |pos, len| pos + len }.max || 0
@io.truncate max if max > @io.size
@size = size
end
end
# like RangesIOResizeable, but Ole::Storage::Dirent specific. provides for migration
# between bats based on size, and updating the dirent, instead of the ole copy back
# on close.
class RangesIOMigrateable < RangesIOResizeable
attr_reader :dirent
def initialize dirent
@dirent = dirent
super @dirent.ole.bat_for_size(@dirent.size), @dirent.first_block, @dirent.size
end
def truncate size
bat = @dirent.ole.bat_for_size size
if bat != @bat
# bat migration needed! we need to back up some data. the amount of data
# should be <= @dirent.ole.header.threshold, so we can just hold it all in one buffer.
# back this up
pos = @pos
@pos = 0
keep = read [@size, size].min
# this does a normal truncate to 0, removing our presence from the old bat, and
# rewrite the dirent's first_block
super 0
@bat = bat
# just change the underlying io from right under everyone :)
@io = bat.io
# important to do this now, before the write. as the below write will always
# migrate us back to sbat! this will now allocate us +size+ in the new bat.
super
@pos = 0
write keep
@pos = pos
else
super
end
# now just update the file
@dirent.size = size
end
# forward this to the dirent
def first_block
@dirent.first_block
end
def first_block= val
@dirent.first_block = val
end
end
#
# A class which wraps an ole directory entry. Can be either a directory
# (<tt>Dirent#dir?</tt>) or a file (<tt>Dirent#file?</tt>)
#
# Most interaction with <tt>Ole::Storage</tt> is through this class.
# The 2 most important functions are <tt>Dirent#children</tt> and
# <tt>Dirent#read</tt> (or <tt>Dirent#open</tt> for io-style access).
#
# was considering separate classes for dirs and files. some methods/attrs only
# applicable to one or the other.
class Dirent
MEMBERS = [
:name_utf16, :name_len, :type_id, :colour, :prev, :next, :child,
:clsid, :flags, # dirs only
:create_time_str, :modify_time_str, # files only
:first_block, :size, :reserved
]
PACK = 'a64 S C C L3 a16 L a8 a8 L2 a4'
SIZE = 128
EPOCH = DateTime.parse '1601-01-01'
TYPE_MAP = {
# this is temporary
0 => :empty,
1 => :dir,
2 => :file,
5 => :root
}
COLOUR_MAP = {
0 => :red,
1 => :black
}
# used in the next / prev / child stuff to show that the tree ends here.
# also used for first_block for directory.
EOT = 0xffffffff
# All +Dirent+ names are stored as UTF-16LE, which we convert to and from UTF-8
FROM_UTF16 = Iconv.new 'utf-8', 'utf-16le'
TO_UTF16 = Iconv.new 'utf-16le', 'utf-8'
include Enumerable
# Dirents should be created in 1 of 2 ways, either Dirent.new ole, [:dir/:file/:root],
# or Dirent.load ole, '... dirent data ...'
# it's a bit clunky, but that's how it is at the moment. you can assign to type, but
# shouldn't.
attr_accessor :idx
# This returns all the children of this +Dirent+. It is filled in
# when the tree structure is recreated.
attr_accessor :children
attr_reader :ole, :type, :create_time, :modify_time, :name
def initialize ole, type
@ole = ole
# this isn't really good enough. need default values put in there.
@values = [
0.chr * 2, 2, 0, # will get overwritten
1, EOT, EOT, EOT,
0.chr * 16, 0, nil, nil,
AllocationTable::EOC, 0, 0.chr * 4]
# maybe check types here.
@type = type
@create_time = @modify_time = nil
@children = []
if file?
@create_time = Time.now
@modify_time = Time.now
end
end
def self.load ole, str
# load should function without the need for the initializer.
dirent = Dirent.allocate
dirent.load ole, str
dirent
end
def load ole, str
@ole = ole
@values = str.unpack PACK
@name = FROM_UTF16.iconv name_utf16[0...name_len].sub(/\x00\x00$/, '')
@type = TYPE_MAP[type_id] or raise "unknown type #{type_id.inspect}"
if file?
@create_time = Types.load_time create_time_str
@modify_time = Types.load_time modify_time_str
end
end
# only defined for files really, and the above children stuff is only for directories.
# maybe i should have some sort of File and Dir class, that subclass Dirents? a dirent
# is just a data holder.
# this can be used for write support if the underlying io object was opened for writing.
# maybe take a mode string argument, and do truncation, append etc stuff.
def open
return nil unless file?
io = RangesIOMigrateable.new self
if block_given?
begin yield io
ensure; io.close
end
else io
end
end
def read limit=nil
open { |io| io.read limit }
end
def dir?
# to count root as a dir.
type != :file
end
def file?
type == :file
end
def time
# prefer the creation time, falling back to the modification time. both are only
# loaded for files at the moment, so this is nil for directories. older version:
#@time ||= file? ? nil : (Dirent.parse_time(secs1, days1) || Dirent.parse_time(secs2, days2))
create_time || modify_time
end
def each(&block)
@children.each(&block)
end
def [] idx
return children[idx] if Integer === idx
# path style look up.
# maybe take another arg to allow creation? or leave that to the filesystem
# add on.
# not sure if '/' is a valid char in a Dirent#name, so no splitting etc at
# this level.
# also what about warning about multiple hits for the same name?
children.find { |child| idx === child.name }
end
# solution for the above '/' thing for now.
def / path
self[path]
end
def to_tree
if children and !children.empty?
str = "- #{inspect}\n"
children.each_with_index do |child, i|
last = i == children.length - 1
child.to_tree.split(/\n/).each_with_index do |line, j|
str << " #{last ? (j == 0 ? "\\" : ' ') : '|'}#{line}\n"
end
end
str
else "- #{inspect}\n"
end
end
MEMBERS.each_with_index do |sym, i|
define_method(sym) { @values[i] }
define_method(sym.to_s + '=') { |val| @values[i] = val }
end
def to_a
@values
end
# flattens the tree starting from here into +dirents+. note it modifies its argument.
def flatten dirents=[]
@idx = dirents.length
dirents << self
children.each { |child| child.flatten dirents }
self.child = Dirent.flatten_helper children
dirents
end
# i think making the tree structure optimized is actually more complex than this, and
# requires some intelligent ordering of the children based on names, but as long as
# it is valid it's ok.
# actually, i think its ok. gsf for example only outputs a singly-linked-list, where
# prev is always EOT.
def self.flatten_helper children
return EOT if children.empty?
i = children.length / 2
this = children[i]
this.prev, this.next = [(0...i), (i+1..-1)].map { |r| flatten_helper children[r] }
this.idx
end
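=begin
flatten_helper lays a directory's children out as a binary tree by making the
middle child the subtree root and recursing on each half through prev/next. A
standalone sketch on a hypothetical Node struct, with an in-order walk to show the
original order comes back. Like the method above, it keeps the children's existing
order rather than sorting them by name.

```ruby
EOT = 0xffffffff
Node = Struct.new(:name, :idx, :prev, :next)

# Returns the idx of the subtree root built from +children+, setting each
# node's prev/next to the roots of the left and right halves.
def flatten_helper(children)
  return EOT if children.empty?
  i = children.length / 2
  this = children[i]
  this.prev = flatten_helper(children[0...i])
  this.next = flatten_helper(children[(i + 1)..-1])
  this.idx
end

# In-order walk over prev/next, used here only to check the layout.
def in_order(nodes, idx)
  return [] if idx == EOT
  n = nodes[idx]
  in_order(nodes, n.prev) + [n.name] + in_order(nodes, n.next)
end

nodes = %w[a b c d e].each_with_index.map { |name, i| Node.new(name, i) }
root = flatten_helper(nodes)
p root                  # prints 2 (the middle child, "c")
p in_order(nodes, root) # prints ["a", "b", "c", "d", "e"]
```
=end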
attr_accessor :name, :type
def save
tmp = TO_UTF16.iconv(name)
tmp = tmp[0, 62] if tmp.length > 62
tmp += 0.chr * 2
self.name_len = tmp.length
self.name_utf16 = tmp + 0.chr * (64 - tmp.length)
begin
self.type_id = TYPE_MAP.to_a.find { |id, name| @type == name }.first
rescue
raise "unknown type #{type.inspect}"
end
# for the case of files, it is assumed that that was handled already
# note not dir?, so as not to override root's first_block
self.first_block = Dirent::EOT if type == :dir
# serializing times isn't supported yet, so keep any time strings loaded from
# the file, and zero-fill those of new dirents (which start out as nil).
# note that `if 0` would always be true in ruby, as only nil and false are false.
self.create_time_str ||= 0.chr * 8
self.modify_time_str ||= 0.chr * 8
@values.pack PACK
end
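=begin
#save stores the name as UTF-16LE in a fixed 64-byte field: at most 62 bytes of
name (31 UTF-16 code units), a two-byte NUL terminator that is counted in name_len,
then zero padding. A sketch of that layout using String#encode (Ruby 1.9+) in place
of the Iconv objects used above; encode_dirent_name is a hypothetical name.

```ruby
# Lay out a dirent name as #save does. Returns [64-byte field, name_len].
def encode_dirent_name(name)
  utf16 = name.encode('UTF-16LE').b
  utf16 = utf16[0, 62] if utf16.bytesize > 62
  utf16 += "\0\0".b
  [utf16 + "\0".b * (64 - utf16.bytesize), utf16.bytesize]
end

field, name_len = encode_dirent_name('Root Entry')
p field.bytesize # prints 64
p name_len       # prints 22 (10 characters * 2 bytes, plus the terminator)
p field.byteslice(0, name_len - 2).force_encoding('UTF-16LE').encode('UTF-8')
# prints "Root Entry"
```
=end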
def inspect
str = "#<Dirent:#{name.inspect}"
# perhaps i should remove the data snippet. it's not that useful anymore.
if file?
tmp = read 9
data = tmp.length == 9 ? tmp[0, 5] + '...' : tmp
str << " size=#{size}" +
"#{time ? ' time=' + time.to_s.inspect : nil}" +
" data=#{data.inspect}"
else
# there is some dir specific stuff. like clsid, flags.
end
str + '>'
end
# --------
# and for creation of a dirent. don't like the name. is it a file or a directory?
# assign to type later? io will be empty.
def new_child type
child = Dirent.new ole, type
children << child
yield child if block_given?
child
end
def delete child
# remove from our child array, so that on reflatten and re-creation of @dirents, it will be gone
raise "#{child.inspect} not a child of #{self.inspect}" unless @children.delete child
# free our blocks
child.open { |io| io.truncate 0 }
end
def self.copy src, dst
# copies the contents of src to dst. must be the same type. this will throw an
# error on copying to root. maybe this will recurse too much for big documents??
raise "can't copy a #{src.type} onto a #{dst.type}" unless src.type == dst.type
dst.name = src.name
if src.dir?
src.children.each do |src_child|
dst.new_child(src_child.type) { |dst_child| Dirent.copy src_child, dst_child }
end
else
src.open do |src_io|
dst.open { |dst_io| IO.copy src_io, dst_io }
end
end
end
end
end
end
if $0 == __FILE__
puts Ole::Storage.open(ARGV[0]) { |ole| ole.root.to_tree }
end

28
lib/ole/types.rb Normal file

@@ -0,0 +1,28 @@
require 'ole/base'
module Ole # :nodoc:
# FIXME
module Types
# Parse an 8-byte FILETIME string into a DateTime.
# The time is stored as two 32 bit values, low then high, which together give
# the number of 100-nanosecond intervals since 1 January 1601 (the epoch).
# struct FILETIME. see e.g. http://msdn2.microsoft.com/en-us/library/ms724284.aspx
def self.load_time str
low, high = str.unpack 'L2'
time = EPOCH + (high * (1 << 32) + low) * 1e-7 / 86400 rescue return
# extra sanity check...
unless (1800...2100) === time.year
Log.warn "ignoring unlikely time value #{time.to_s}"
return nil
end
time
end
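=begin
A standalone sketch of the conversion load_time performs. It defines its own epoch
constant, reads the halves with 'V2' (explicitly little-endian) where the method
above uses native 'L2', and uses Rational arithmetic instead of the float
1e-7 / 86400 so whole seconds are not lost to rounding.

```ruby
require 'date'

FILETIME_EPOCH = DateTime.new(1601, 1, 1)
TICKS_PER_DAY  = 10_000_000 * 86_400 # 100ns ticks in one day

# Convert an 8-byte FILETIME string into a DateTime.
def filetime_to_datetime(str)
  low, high = str.unpack('V2')
  FILETIME_EPOCH + Rational((high << 32) | low, TICKS_PER_DAY)
end

# 116444736000000000 ticks after 1601-01-01 is the Unix epoch.
ticks = 116_444_736_000_000_000
str = [ticks & 0xffffffff, ticks >> 32].pack('V2')
puts filetime_to_datetime(str) # prints 1970-01-01T00:00:00+00:00
```
=end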
# turn a binary guid into something displayable.
# this will probably become a proper class later
def self.load_guid str
"{%08x-%04x-%04x-%02x%02x-#{'%02x' * 6}}" % str.unpack('L S S CC C6')
end
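=begin
load_guid prints a 16-byte GUID in registry style: one 32-bit and two 16-bit
little-endian fields, then eight single bytes. A standalone sketch using explicit
little-endian directives ('V', 'v') instead of the native 'L' and 'S' above, so it
behaves the same on any host; format_guid is a hypothetical name and the sample
bytes are arbitrary.

```ruby
def format_guid(str)
  "{%08x-%04x-%04x-%02x%02x-#{'%02x' * 6}}" % str.unpack('V v v C2 C6')
end

raw = [0x00020d0b, 0, 0, 0xc0, 0, 0, 0, 0, 0, 0, 0x46].pack('V v v C8')
puts format_guid(raw) # prints {00020d0b-0000-0000-c000-000000000046}
```
=end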
end
end