public class LuceneIndexer
extends java.lang.Object
implements java.io.Serializable
Note: The routines provided by this class are shared by the Search3 and CD publishing functions. Any changes made to this class should be tested with both functions.
Note: This class was intentionally not designed as a singleton class in order to avoid any conflicts that could occur if the Search3 index build is run at the same time that a CD batch is being published.
Modifier and Type | Field and Description |
---|---|
static char |
A_DIAERESIS
Uses the small letter a with diaeresis (ä) to prefix special characters, because it
is unlikely to appear in the input data and the Lucene standard
analyzer treats it as part of a word (it is not used as a token separator).
|
protected static int |
CHAR_ARRAY_ALLOCATE_SZ
Constant to define the number of characters to allocate to character arrays
|
protected java.text.DecimalFormat |
FMT_ATTRSEQ
A number formatter used for formatting the attribute descriptor
sequence number that is output for each descriptor processed.
|
static java.lang.String |
INDEX_FLD_CD_PROD_IMAGE
Index field name for product image (CD only)
|
static java.lang.String |
INDEX_FLD_CD_PROD_SUMMARY2
Index field name for product summary description (CD only)
|
static java.lang.String |
INDEX_FLD_CD_PROD_URL
Index field name for product URL (CD only)
|
static java.lang.String |
INDEX_FLD_PROD_ATTR
Index field name for product attributes
|
static java.lang.String |
INDEX_FLD_PROD_CODE
Index field name for product code
|
static java.lang.String |
INDEX_FLD_PROD_DESC_1
Index field name for product desc 1 (prod table desc)
|
static java.lang.String |
INDEX_FLD_PROD_DESC_2
Index field name for product desc 2 (short desc)
|
static java.lang.String |
INDEX_FLD_PROD_DESC_3
Index field name for product desc 3 (long desc)
|
static java.lang.String |
INDEX_FLD_PROD_DESC_BASE
Index base field name for the product descriptions (not an actual index field name)
|
static java.lang.String |
INDEX_FLD_PROD_UNIT
Index field name for product unit of measure (u/m)
|
static java.lang.String |
INDEX_FLD_TOPMOST_CATEGORIES
Index field name for topmost categories which include this product
|
protected org.apache.lucene.index.IndexWriter |
ivIdxWriter
The writer used to write search index documents to the file system.
|
protected java.lang.String |
ivIndexDir
The search index directory.
|
protected java.util.TreeMap<java.lang.String,java.lang.String> |
ivSiteAttrs
A collection of searchable product attribute descriptors and attribute
values.
|
protected java.util.HashMap<java.lang.String,java.util.HashSet<java.lang.String>> |
ivTopMostLookup
A map that stores the set of topmost categories for a given category.
|
protected static java.util.regex.Pattern |
PATTERN_DASH
A pattern that is used to find dash characters.
|
protected static java.util.regex.Pattern |
PATTERN_FEET
A pattern that is used to find single quote (feet) characters.
|
protected static java.util.regex.Pattern |
PATTERN_INCHES
A pattern that is used to find double quote (inch) characters.
|
protected static java.util.regex.Pattern |
PATTERN_LBS
A pattern that is used to find hash characters.
|
protected static java.util.regex.Pattern |
PATTERN_POUNDS
A pattern that is used to find hash characters.
|
private static long |
serialVersionUID
The class' serialization version id.
|
Constructor and Description |
---|
LuceneIndexer(java.lang.String indexdir)
Constructs a
LuceneIndexer using the specified search index
directory path. |
Modifier and Type | Method and Description |
---|---|
protected void |
addToSiteAttrs(ProductAttributeDescriptor pad,
ProductAttribute pa)
Adds the given product attribute descriptor and product attribute to the
sorted collection of searchable attributes.
|
void |
closeWriter()
Closes the writer that was used to write search index documents to the
file system.
|
void |
createIndex(java.lang.String[] prodinfo,
java.io.File content,
JDBCConnectionInfo cninf)
Creates the search index for the given set of product information.
|
void |
createIndex(java.lang.String[] prodinfo,
JDBCConnectionInfo cninf)
Creates the search index for the given set of product information.
|
protected org.apache.lucene.document.Document |
getDocument(java.lang.String[] prodinfo,
java.io.File content,
JDBCConnectionInfo cninf)
Returns a search index document for the given set of product information.
|
java.util.TreeMap<java.lang.String,java.lang.String> |
getSiteAttrs()
Returns the sorted collection of searchable attributes.
|
protected void |
indexDescriptionFile(org.apache.lucene.document.Document doc,
java.lang.String prodCode,
int size,
java.lang.String field,
java.lang.String cdField,
boolean forCD)
Places the contents of a product description file (large or small)
in the index document.
|
void |
openWriter()
Opens the writer that is used to write search index documents to the
file system.
|
static java.lang.String |
preprocessIndexData(java.lang.String s)
Looks for and adjusts special situations before a string is indexed.
|
java.lang.String |
preprocessIndexDataInstance(java.lang.String s)
Looks for and adjusts special situations before a string is indexed.
|
static java.lang.String |
readAll(java.io.Reader r)
Reads all data from the given reader into a
String object. |
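The summary above lists openWriter(), createIndex(...) and closeWriter() but does not state the order in which they are called. A minimal sketch of one plausible call sequence follows, assuming the writer is opened once, used for many createIndex calls, and closed afterwards even when indexing fails. The Indexer interface and RecordingIndexer class are hypothetical stand-ins (the real class also takes a JDBCConnectionInfo argument, omitted here), and treating prodinfo[0] as the product code is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexerLifecycleSketch {

    /** Hypothetical stand-in for the three lifecycle methods; not the real class. */
    interface Indexer {
        void openWriter() throws Exception;
        void createIndex(String[] prodinfo) throws Exception; // cninf argument omitted
        void closeWriter();
    }

    /** Records calls so the ordering can be inspected without Lucene or a database. */
    static class RecordingIndexer implements Indexer {
        final List<String> calls = new ArrayList<>();
        public void openWriter() { calls.add("open"); }
        // Assumption: the first array element identifies the product.
        public void createIndex(String[] prodinfo) { calls.add("index:" + prodinfo[0]); }
        public void closeWriter() { calls.add("close"); }
    }

    /** Opens the writer once, indexes every product, and always closes the writer. */
    static void indexAll(Indexer indexer, List<String[]> products) throws Exception {
        indexer.openWriter();
        try {
            for (String[] prodinfo : products) {
                indexer.createIndex(prodinfo);
            }
        } finally {
            indexer.closeWriter();
        }
    }
}
```

Because the class is not a singleton, a Search3 build and a CD batch would each hold their own instance, created with its own index directory.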
private static final long serialVersionUID
public static final char A_DIAERESIS
public static final java.lang.String INDEX_FLD_CD_PROD_IMAGE
public static final java.lang.String INDEX_FLD_CD_PROD_SUMMARY2
public static final java.lang.String INDEX_FLD_CD_PROD_URL
public static final java.lang.String INDEX_FLD_PROD_ATTR
public static final java.lang.String INDEX_FLD_PROD_CODE
public static final java.lang.String INDEX_FLD_PROD_DESC_BASE
public static final java.lang.String INDEX_FLD_PROD_DESC_1
public static final java.lang.String INDEX_FLD_PROD_DESC_2
public static final java.lang.String INDEX_FLD_PROD_DESC_3
public static final java.lang.String INDEX_FLD_PROD_UNIT
public static final java.lang.String INDEX_FLD_TOPMOST_CATEGORIES
protected static final int CHAR_ARRAY_ALLOCATE_SZ
protected static final java.util.regex.Pattern PATTERN_LBS
protected static final java.util.regex.Pattern PATTERN_POUNDS
protected static final java.util.regex.Pattern PATTERN_FEET
protected static final java.util.regex.Pattern PATTERN_INCHES
protected static final java.util.regex.Pattern PATTERN_DASH
protected final java.text.DecimalFormat FMT_ATTRSEQ
protected java.lang.String ivIndexDir
protected org.apache.lucene.index.IndexWriter ivIdxWriter
protected java.util.TreeMap<java.lang.String,java.lang.String> ivSiteAttrs
protected java.util.HashMap<java.lang.String,java.util.HashSet<java.lang.String>> ivTopMostLookup
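The ivTopMostLookup field maps a category to the set of topmost categories that contain it. A minimal sketch of how such a map could be filled and queried is shown below. The class name, method names, and category values are illustrative assumptions and are not taken from LuceneIndexer.

```java
import java.util.HashMap;
import java.util.HashSet;

public class TopMostLookupSketch {

    // Same shape as ivTopMostLookup: category -> set of topmost categories.
    private final HashMap<String, HashSet<String>> topMostLookup = new HashMap<>();

    /** Registers topMost as one of the topmost categories of the given category. */
    void addTopMost(String category, String topMost) {
        topMostLookup.computeIfAbsent(category, k -> new HashSet<>()).add(topMost);
    }

    /** Returns the topmost categories for a category, or an empty set if none are known. */
    HashSet<String> topMostFor(String category) {
        return topMostLookup.getOrDefault(category, new HashSet<>());
    }
}
```

Using a HashSet as the value keeps each topmost category unique even when a category is reachable through several paths.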
public LuceneIndexer(java.lang.String indexdir)
Constructs a LuceneIndexer using the specified search index directory path.
indexdir
- (String) The full path to the search index directory.
public void closeWriter()
public void openWriter() throws java.lang.Exception
java.lang.Exception
- if an error occurs while opening the index writer.
public void createIndex(java.lang.String[] prodinfo, JDBCConnectionInfo cninf) throws java.lang.Exception
prodinfo
- (String[]) An array of product field values.
cninf
- (JDBCConnectionInfo) Database connection information.
java.lang.Exception
- if an error occurs while creating the index.
public void createIndex(java.lang.String[] prodinfo, java.io.File content, JDBCConnectionInfo cninf) throws java.lang.Exception
prodinfo
- (String[]) An array of product field values.
content
- (File) The File object for the product's generated CD content (the product's details page). null if the index is not for CD publishing.
cninf
- (JDBCConnectionInfo) Database connection information.
java.lang.Exception
- if an error occurs while creating the index.
protected org.apache.lucene.document.Document getDocument(java.lang.String[] prodinfo, java.io.File content, JDBCConnectionInfo cninf) throws java.lang.Exception
prodinfo
- (String[]) An array of product field values.
content
- (File) The File object for the product's generated CD content (the product's details page). null if the document is not for CD publishing.
cninf
- (JDBCConnectionInfo) Database connection information.
java.lang.Exception
- if an error occurs while creating the document.
protected void indexDescriptionFile(org.apache.lucene.document.Document doc, java.lang.String prodCode, int size, java.lang.String field, java.lang.String cdField, boolean forCD) throws java.lang.Exception
doc
- (Document) The Document object that contains the search index field values for the specified product.
prodCode
- (String) The product code.
size
- (int) Designates which description file to index (e.g. cvFSHelper.SIZE_SMALL).
field
- (String) The index field to store data in.
cdField
- (String) An additional index field to store data in for CDs.
forCD
- (boolean) Flag to indicate if indexing for a CD.
java.lang.Exception
- if an error occurs while indexing.
protected void addToSiteAttrs(ProductAttributeDescriptor pad, ProductAttribute pa)
pad
- (ProductAttributeDescriptor) The attribute descriptor to be added.
pa
- (ProductAttribute) The product attribute to be added.
public java.util.TreeMap<java.lang.String,java.lang.String> getSiteAttrs()
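The documentation does not say how addToSiteAttrs builds its keys. A minimal sketch of one way a TreeMap and a DecimalFormat sequence number could work together is shown below, assuming the key starts with a zero-padded sequence number so that the map's lexicographic ordering matches the numeric order. The "000" pattern, the ":" separator, and the add method are all assumptions made for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.TreeMap;

public class SiteAttrsSketch {

    // Assumed pattern; the real FMT_ATTRSEQ pattern is not documented here.
    private final DecimalFormat fmtAttrSeq =
            new DecimalFormat("000", DecimalFormatSymbols.getInstance(Locale.ROOT));

    // Same shape as ivSiteAttrs: a sorted map of String keys to String values.
    private final TreeMap<String, String> siteAttrs = new TreeMap<>();

    /** Adds an attribute under a key that begins with the zero-padded sequence number. */
    void add(int seq, String descriptorName, String value) {
        siteAttrs.put(fmtAttrSeq.format(seq) + ":" + descriptorName, value);
    }

    /** Returns the sorted collection, mirroring getSiteAttrs(). */
    TreeMap<String, String> getSiteAttrs() {
        return siteAttrs;
    }
}
```

Without the padding, the key "10:Width" would sort before "2:Color", because String keys are compared character by character.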
public static java.lang.String readAll(java.io.Reader r) throws java.io.IOException
Reads all data from the given reader into a String object.
r
- (Reader) The reader containing the data.
java.io.IOException
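A minimal sketch of how a method with the readAll signature could be written is shown below, assuming a fixed-size character buffer is reused for each read. The buffer size of 4096 is an assumed value, since the actual value of CHAR_ARRAY_ALLOCATE_SZ is not given, and this is not necessarily how the class implements the method.

```java
import java.io.IOException;
import java.io.Reader;

public class ReadAllSketch {

    // Assumed value; the real CHAR_ARRAY_ALLOCATE_SZ is not documented here.
    static final int CHAR_ARRAY_ALLOCATE_SZ = 4096;

    /** Reads every character from the reader and returns them as one String. */
    static String readAll(Reader r) throws IOException {
        char[] buf = new char[CHAR_ARRAY_ALLOCATE_SZ];
        StringBuilder sb = new StringBuilder();
        int n;
        // read() returns -1 at end of stream; otherwise the number of chars read.
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

The sketch leaves closing the reader to the caller, for example with a try-with-resources block around the call.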
public java.lang.String preprocessIndexDataInstance(java.lang.String s)
s
- (String) The characters to process.
public static java.lang.String preprocessIndexData(java.lang.String s)
s
- (String) The characters to process.
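The exact adjustments made by preprocessIndexData are not documented here. A minimal sketch of the general idea behind A_DIAERESIS and the PATTERN_* constants is shown below, assuming each special character is replaced by ä followed by a short token so that the analyzer keeps it attached to the neighbouring word. The regular expressions and the replacement tokens ("ft", "in", "lb", "dash") are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

public class PreprocessSketch {

    static final char A_DIAERESIS = '\u00E4'; // small letter a with diaeresis

    // Assumed regular expressions; the real pattern definitions are not shown.
    static final Pattern PATTERN_FEET = Pattern.compile("'");
    static final Pattern PATTERN_INCHES = Pattern.compile("\"");
    static final Pattern PATTERN_POUNDS = Pattern.compile("#");
    static final Pattern PATTERN_DASH = Pattern.compile("-");

    /** Replaces each special character with the prefix char plus a hypothetical token. */
    static String preprocessIndexData(String s) {
        if (s == null) {
            return null;
        }
        String out = s;
        out = PATTERN_FEET.matcher(out).replaceAll(A_DIAERESIS + "ft");
        out = PATTERN_INCHES.matcher(out).replaceAll(A_DIAERESIS + "in");
        out = PATTERN_POUNDS.matcher(out).replaceAll(A_DIAERESIS + "lb");
        out = PATTERN_DASH.matcher(out).replaceAll(A_DIAERESIS + "dash");
        return out;
    }
}
```

A search query would need the same preprocessing before it is parsed, so that a term such as 6" is rewritten to the same token that was stored in the index.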