So you have a scanned document, and you want to produce a searchable PDF from the page images. In my case, I want to digitize some of my books. There are other tutorials out there, but none of them worked for me. Here's what I did; maybe it helps you.
First, get a Debian box and install the packages tesseract (this is the OCR software; the Debian package is named tesseract-ocr), xsltproc (dark magic), exactimage (for hocr2pdf) and of course pdftk.
Now, take your scan and use scantailor to split it into several neat black and white .tif files, one for each page. Then create a file called fix-hocr.xsl and put this in it:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- use on hocr file to fix for hocr2pdf 0.8.9 textbox placement -->
<xsl:template match="/html">
   <xsl:text>
</xsl:text>
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>
<xsl:template match="node()|@*">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>
<xsl:template match="span[@class='ocr_line']">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
   <xsl:element name="br">
</xsl:element>
</xsl:template>
</xsl:stylesheet>
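Before moving on, it may be worth confirming that the command-line tools used in this post are actually on your PATH. This is only a minimal sketch: the helper name missing_tools is made up for illustration, and the list of commands is simply the set this post invokes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every command passed as an
# argument that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Commands invoked in this post (scantailor is run separately, by hand).
missing="$(missing_tools tesseract xsltproc hocr2pdf pdftk pdf2ps gs pdf2djvu)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names end up on one line.
  echo "Not found on PATH:" $missing
fi
```

If nothing is printed, every tool was found.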
For each .tif file, we now do the following:
 * Perform OCR on it, and record where the letters are located in the image. That's what hOCR is all about.
 * Do some voodoo with fix-hocr.xsl on the hOCR output from tesseract, because the raw output causes problems with hocr2pdf. At least it did for me.
 * Use the hocr2pdf tool to create a PDF document which contains two layers: one layer of text information and the original tif image above it.
And here's how we do that.
# Newer Tesseract releases spell the -psm flag as --psm.
for pg in *.tif; do
 tesseract -l eng -psm 1 "$pg" stdout hocr |
 xsltproc -html -nonet -novalid fix-hocr.xsl - |
 hocr2pdf -i "$pg" -o "${pg%%.tif}.pdf"
done
Ignore any complaints about ?xml tags; they're bogus. Now you should have tons of searchable PDF pages, so let's merge them into a document.
pdftk *.pdf cat output book.pdf;
pdf2ps book.pdf book.ps;
gs -dCompatibilityLevel=1.4 -dBATCH -dNOPAUSE -dPDFSETTINGS=/ebook -dPDFA=2 -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=/RGB -dUseCIEColor -sPDFACompatibilityPolicy=2 -sOutputFile=book2.pdf book.ps;
pdf2djvu --loss-level=200 --dpi=299 --verbose --monochrome -o book.djvu book2.pdf;
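One thing to watch in the merge step: pdftk receives the pages in whatever order the shell expands *.pdf, and that order is lexicographic, so page-10.pdf sorts before page-2.pdf. The glob also picks up a book.pdf left over from an earlier run. If your page files are not already zero-padded, something like the following could bring them into page order first. It is a sketch under assumptions: the helper name pad_name and the page-N.pdf naming are hypothetical, and the loop only prints the mv commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn NAME<digits>.pdf into NAME<4-digit number>.pdf,
# e.g. page-2.pdf -> page-0002.pdf. Names without trailing digits pass through.
pad_name() {
  local name="$1"
  local stem="${name%.pdf}"
  local num="${stem##*[!0-9]}"   # trailing digits, empty if there are none
  local prefix
  prefix=${stem%"$num"}
  if [ -z "$num" ]; then
    printf '%s\n' "$name"
    return
  fi
  # Drop leading zeros so printf does not read the number as octal.
  while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
    num="${num#0}"
  done
  printf '%s%04d.pdf\n' "$prefix" "$num"
}

# Dry run: print the renames that would be needed, without touching any file.
for f in *.pdf; do
  [ -e "$f" ] || continue
  new="$(pad_name "$f")"
  [ "$f" = "$new" ] || echo mv -- "$f" "$new"
done
```

Once the printed commands look right, remove the echo to actually rename the files, then run the pdftk line.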