Saturday, May 24, 2008

How do I identify my scanned files. . .

All the files that we scan and convert to digital image files have to be given some sort of identifying "name" (e.g. Smith-John.PDF, 1234.PDF, etc.). We do this through a process called "Indexing". Indexing can be manual, automated, or a combination of both. Index data can also be used to name the folders and sub-folders in which the image files are placed. There is also "Metadata"; think of metadata as data about the data (e.g. file creation date, file size, color or black and white image, etc.). PDF Text Searchable and PDF Normal files also contain metadata - the text contained within the document.
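To make the idea concrete, here is a minimal sketch of how index data might be turned into a file name and folder path. The function name, the Last-First-ID naming pattern, and the folder layout (one sub-folder per first letter of the last name) are assumptions chosen for illustration, not a required convention.

```python
from pathlib import PurePosixPath


def indexed_path(last_name, first_name, record_id, root="scans"):
    """Build a folder/file path from index data (hypothetical layout)."""
    # Sub-folder is the first letter of the last name, upper-cased.
    folder = last_name[0].upper()
    # File name follows a Last-First-ID pattern.
    file_name = f"{last_name}-{first_name}-{record_id}.pdf"
    return PurePosixPath(root) / folder / file_name


example = indexed_path("Smith", "John", "1234")
print(example)  # scans/S/Smith-John-1234.pdf
```

Whatever pattern you choose, the point is that the same index fields drive both the file name and the folder it lands in.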

When planning a document imaging project, you will have to decide how the files will be named and organized. Many companies keep paper lists/logs describing the contents of their hardcopy documents and where those documents are located: which file cabinet, which building, etc. Start transferring that information to an Excel spreadsheet - NOW! You will want to have all of your lists/logs stored electronically. The information contained in the spreadsheet can then be imported into a database or enterprise content management system. The bottom line - make it electronic!
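As a rough illustration of why an electronic log is so useful, the sketch below loads a log saved from a spreadsheet as CSV into a small SQLite database and looks up where a box is stored. The column names, sample rows, and table name are all made up for the example; your own log will have its own fields.

```python
import csv
import io
import sqlite3

# Stand-in for a log saved from Excel as CSV; columns and rows are hypothetical.
LOG_CSV = """box_id,description,building,cabinet
B-001,Invoices 2005,Main Office,Cabinet 3
B-002,Personnel files A-F,Annex,Cabinet 1
"""


def load_log(csv_text, conn):
    """Import the rows of a CSV log into a SQLite table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_log "
        "(box_id TEXT PRIMARY KEY, description TEXT, building TEXT, cabinet TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT INTO file_log VALUES (:box_id, :description, :building, :cabinet)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_log(LOG_CSV, conn)
location = conn.execute(
    "SELECT building, cabinet FROM file_log WHERE box_id = ?", ("B-002",)
).fetchone()
print(count, location)
```

Once the log is in a table like this, the same data can be reused for indexing instead of being retyped.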

Having your data in an electronic format will greatly aid indexing during the scanning and conversion process. In many cases, handwritten information contained in lists/logs has to be manually entered during indexing. Depending on the amount of information needed, this can add significant cost to any scanning conversion project. On the other hand, using existing electronic data will help to minimize the amount of manual indexing required, reduce cost and increase data accuracy.

As an example, Twin Imaging Technology recently scanned and converted over 23,000 patient records for a medical group. The client required each patient file to be named using the patient's last name, first name, and ID number. Typing in all 23,000 names would have been time consuming and costly. Fortunately, our client was able to provide us with an electronic database containing all of the patient names and ID numbers. We used our batch scanning software, PSI Capture, to automatically "read" the patient ID number from each scanned file using zonal OCR. Once the ID number was read, PSI Capture automatically looked up the name associated with that ID number and named the file. Manually naming a batch of 100 patient files would have taken 15 to 20 minutes. Using the electronic data provided by the client allowed us to index each batch (100 patient records on average) in less than 10 seconds.
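The lookup step in that example can be sketched in a few lines. This is not how PSI Capture works internally; it is only an illustration of the idea, with a made-up patient list, hypothetical function names, and the OCR result passed in as a plain string rather than read from an image.

```python
import csv
import io

# Hypothetical client export: patient ID, last name, first name.
PATIENTS_CSV = """patient_id,last_name,first_name
100234,Smith,John
100235,Doe,Jane
"""


def load_patients(csv_text):
    """Index the client's patient list by ID for fast lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["patient_id"]: row for row in reader}


def name_for_scan(ocr_id, patients):
    """Return Last-First-ID.pdf for an OCR'd ID, or None if there is no match."""
    row = patients.get(ocr_id.strip())
    if row is None:
        # No match (for example, a misread digit): leave for manual indexing.
        return None
    return f"{row['last_name']}-{row['first_name']}-{row['patient_id']}.pdf"


patients = load_patients(PATIENTS_CSV)
print(name_for_scan(" 100234 ", patients))  # Smith-John-100234.pdf
print(name_for_scan("1OO234", patients))    # None (letter O read instead of zero)
```

Returning nothing for an unmatched ID, rather than guessing, keeps misread files out of the wrong patient's record and flags them for a person to review.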

Don't fret if you can't provide any of your data electronically. As I mentioned previously, we use a process called Zonal OCR to read text from a defined area of a document. Zonal OCR is not always accurate, but it does help to automate indexing and reduce the time devoted to it.

Next time we'll talk about how you can structure your documents to be more scanner and indexing friendly.

Sean Martin, Vice President, Twin Imaging Technology

