Storing Word Documents in Oracle
James Koopmann, jkoopmann@qwest.net

http://www.dbasupport.com/oracle/ora9i/storing_word_docs.shtml




Have you ever wondered about storing documents into your Oracle database and just didn't know where to start? Here is a quick introduction to the basics you need to know.



Manipulating Oracle Files with UTL_FILE showed you how to read the alert log and do some manipulation on the file while it was external to the database. You should review this article as it contains some background information you will need to know, along with some explanation of some of the procedures in this code that I will not go into here. The next logical extension to the last article is the manipulation of external files, such as documents, and the storage in the database. This article will take you through a brief overview of the datatypes and procedures in order to store word documents within the database.



The Datatypes

When talking about manipulating documents within a database, there are only a few choices for a datatype that can handle a large document. These large objects (LOBs) can use any one of the four datatypes depending on the characteristics of the object you are storing. These large objects can be in the form of text, graphics, video or audio.



Datatype Description

BLOB

Used to store unstructured binary data up to 4G. This datatype stores the full binary object in the database.

CLOB/NCLOB

Used to store up to 4G of character data. This datatype stores the full character data in the database.

BFILE

Used to point at large objects that are external to the database and in operating system files. The BFILE column also contains binary data and cannot be selected.

Benefits of LOBs

It use to be that the largest object you could store in the database was of the datatype LONG. Oracle has for the last few releases kept telling us to convert our LONG datatypes to a LOB datatype (maybe they will too). The reason for converting our LONGs to LOBs can be seen in this short list of benefits.

  1. LOB columns can reach the size of 4G.

     

  2. You can store LOB data internally within a table or externally.

     

  3. You can perform random access to the LOB data.

     

  4. It is easier to do transformations on LOB columns.

     

  5. You can replicate the tables that contain LOB columns.

     

Create a Table to Store the Document

In order to store the documents into the database you must obviously first create an object to store the information. Following is the DDL to create the table MY_DOCS. You will notice that there is a holder for the bfile location and a column (DOC_BLOB) to hold the document.

	CREATE TABLE my_docs 
		(doc_id   NUMBER, 
		bfile_loc BFILE,
		doc_title VARCHAR2(255),
		doc_blob  BLOB DEFAULT EMPTY_BLOB() );

The Load Procedure

The load procedure takes as arguments the document name and an id number for the document. The procedure will then prime a row for update based on the document id, BFILE location and document name (which becomes the document title). The procedure will then open internal and external BLOBs and load the internal from the external. At this point, the document has been loaded into the database table.

Code Meaning
bfile_loc := BFILENAME('DOC_DIR', in_doc);

In order to load the document, you must first point to the external object through a BFILE locator. The BFILENAME procedure takes a directory location and the document name.

INSERT INTO
  my_docs (doc_id, bfile_loc, doc_title) 
  VALUES (1, bfile_loc, in_doc);

This statement is to prime the row into which the external object will be inserted.

SELECT doc_blob INTO temp_blob 
  FROM my_docs WHERE doc_id = in_id
  FOR UPDATE;

Associate the temporary blob object to the table blob object for updating.

DBMS_LOB.OPEN(bfile_loc, DBMS_LOB.LOB_READONLY);

Open the external blob object for reading.

DBMS_LOB.OPEN(temp_blob, DBMS_LOB.LOB_READWRITE);

Open the temporary blob object for reading and writing.

DBMS_LOB.LOADFROMFILE
  (temp_blob, bfile_loc, Bytes_to_load);

Copy the entire external blob object (BFILE) into the internal temporary blob object.

The Search Procedure

The search procedure takes as arguments a document id and a search string. The search procedure takes as arguments a document id and a search string. The procedure then converts the search string into raw format and places it into the variable named PATTERN. Once the variable PATTERN is populated, it is used for searching the loaded temporary BLOB DOC_BLOB to see if the particular pattern exists.

Pattern    := utl_raw.cast_to_raw(in_search);

Take the input search characters and convert them to raw characters that can be used to search your document.

SELECT doc_blob INTO lob_doc
  FROM my_docs WHERE doc_id = in_id;

Put the document into a temporary BLOB for manipulation.

DBMS_LOB.OPEN (lob_doc, DBMS_LOB.LOB_READONLY);

Open the temporary BLOB for reading.

Position := DBMS_LOB.INSTR
  (lob_doc, Pattern, Offset, Occurrence);

Search the temporary BLOB for the supplied search string. If it finds the pattern then the variable POSITION will be not 0.

How to Use the code.

The procedures that I have given you are very simplistic in nature and are intended to be part of a larger application for managing external documents within a database. They are intended to setup a directory where your documents live, load the documents into a database, and then search for string patterns in the document id provided. I personally can see you taking out the reliance of supplying a document id and allowing the search to span multiple documents within your library. Below I have given a brief description on how to use the code as is but feel free to modify and integrate into your own set of procedures.

How to Use

  1. log into your database of choice as the SYS user
  2. compile the package
	SQL> @mydocs.sql
  1. set serveroutput on
	SQL> set serveroutput on
  1. initial setup of directory object where your documents live
	SQL> exec mydocs.doc_dir_setup
  1. Check to make sure you can read one of your documents on disk.
	SQL> exec mydocs.list(‘Your Document Here.doc');
  1. Load your document into the database
	SQL> exec mydocs.load(‘Your Document Here.doc', 1);
  1. Search your documents for a string pattern
	SQL> exec mydocs.search(‘Search Pattern', 1);

The Code

CREATE OR REPLACE PACKAGE 
          mydocs 
	  AS
 PROCEDURE doc_dir_setup;
 PROCEDURE list   (in_doc    IN VARCHAR2);
 PROCEDURE load   (in_doc    IN VARCHAR2,
                   in_id     IN NUMBER);
 PROCEDURE search (in_search IN VARCHAR2,
                   in_id     IN NUMBER);
END mydocs;
/
CREATE OR REPLACE PACKAGE BODY 
          mydocs 
	  AS
vexists      BOOLEAN;
vfile_length NUMBER;
vblocksize   NUMBER;

PROCEDURE doc_dir_setup IS
 BEGIN
  EXECUTE IMMEDIATE 
  'CREATE DIRECTORY DOC_DIR AS'||
  '''"E:\jkoopmann\publish\databasejournal\Oracle"''';
END doc_dir_setup;

PROCEDURE list (in_doc IN VARCHAR2) IS
BEGIN
      UTL_FILE.FGETATTR('DOC_DIR',
                        in_doc,
			vexists,
			vfile_length,
			vblocksize);
      IF vexists THEN
        dbms_output.put_line(in_doc||'    '||vfile_length); 
      END IF;
END list;

PROCEDURE load (in_doc IN VARCHAR2,
                in_id  IN NUMBER) IS
temp_blob         BLOB := empty_blob();
bfile_loc         BFILE;
Bytes_to_load     INTEGER := 4294967295;
BEGIN
  bfile_loc := BFILENAME('DOC_DIR', in_doc);
  INSERT INTO my_docs (doc_id, bfile_loc, doc_title) 
         VALUES (in_id, bfile_loc, in_doc);
  SELECT doc_blob INTO temp_blob 
    FROM my_docs WHERE doc_id = in_id 
     FOR UPDATE;
  DBMS_LOB.OPEN(bfile_loc, DBMS_LOB.LOB_READONLY);
  DBMS_LOB.OPEN(temp_blob, DBMS_LOB.LOB_READWRITE);
  DBMS_LOB.LOADFROMFILE(temp_blob, bfile_loc, Bytes_to_load);
  DBMS_LOB.CLOSE(temp_blob);
  DBMS_LOB.CLOSE(bfile_loc);
COMMIT;
END load;

PROCEDURE search (in_search VARCHAR2,
                  in_id     NUMBER) IS
lob_doc        BLOB;
Pattern        VARCHAR2(30);
Position       INTEGER := 0;
Offset         INTEGER := 1;
Occurrence     INTEGER := 1;
BEGIN
  Pattern    := utl_raw.cast_to_raw(in_search);
  SELECT doc_blob INTO lob_doc
    FROM my_docs WHERE doc_id = in_id;
  DBMS_LOB.OPEN (lob_doc, DBMS_LOB.LOB_READONLY);
  Position := DBMS_LOB.INSTR(lob_doc, Pattern, Offset, Occurrence);
  IF Position = 0 THEN
    DBMS_OUTPUT.PUT_LINE('Pattern not found');
  ELSE
    DBMS_OUTPUT.PUT_LINE('The pattern occurs at '|| position);
  END IF;
  DBMS_LOB.CLOSE (lob_doc);
END search;

BEGIN
  DBMS_OUTPUT.ENABLE(1000000);
END mydocs;
/