Boyne: fast indexing for slow cpu?

I have a large document that I want to build an index of for word
searching. (I hear this type of index is really called a concordance.)
Currently it takes about 10 minutes. Is there a fast way to do it?
Currently I iterate through each paragraph, and if I find a word I have not
encountered before, I add it to my word array, along with the paragraph
number in a subsidiary array. Any time I encounter that same word again, I
add the paragraph number to that word's subsidiary array:
associativeArray={chocolate:[10,30,35,200,50001],parsnips:[5,500,100403]}
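A minimal sketch of the loop described above, assuming JavaScript (the object literal suggests it). The function name buildIndex, the zero-based paragraph numbering, and the tokenizing pattern are illustrative assumptions, not taken from the actual program. It uses a Map keyed by word, so checking whether a word has been seen before is a hash lookup rather than a scan through the list of known words:

```javascript
// Sketch only: buildIndex and the word pattern are hypothetical choices.
function buildIndex(paragraphs) {
  const index = new Map(); // word -> array of paragraph numbers
  for (let p = 0; p < paragraphs.length; p++) {
    // Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    const words = paragraphs[p].toLowerCase().match(/[a-z0-9']+/g) || [];
    for (const word of words) {
      let postings = index.get(word);
      if (postings === undefined) {
        // First time this word is seen: start its paragraph list.
        postings = [];
        index.set(word, postings);
      }
      // Paragraphs are visited in order, so comparing with the last entry
      // is enough to avoid listing the same paragraph twice.
      if (postings[postings.length - 1] !== p) {
        postings.push(p);
      }
    }
  }
  return index;
}

// Usage with two made-up paragraphs.
const sample = ["Chocolate and parsnips.", "More chocolate, more chocolate."];
const sampleIndex = buildIndex(sample);
console.log(sampleIndex.get("chocolate")); // paragraphs 0 and 1
```

Written this way, the work is a single pass over the text. If the real loop instead searches an array for each word, that search is a likely place for the time to go, though this is a guess without seeing the code.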
This takes forever, well, 5 minutes or so. I tried converting this array
to a string, but it is so large it won't work to include in a program
file, even after removing stop words, and would take a while to convert
back to an array anyway.
Is there a faster way to build a text index than linear brute force?
I'm not looking for a product that will do the index for me, just the
fastest known algorithm. The index should be accurate, not fuzzy, and
there will be no need for partial searches.

Boyne

Tuesday, 3 September 2013