I am implementing a search bar that should search for two main strings, A and B. I prioritize the results as follows (from most to least important):
- a result combining A and B
- a result for B only
- a result for A only
So, for example, if I search for "Egypt" + "Pyramids", I want my first results to be things like "Egyptian Pyramids", followed by results about "Pyramids" in general (or as a geometric shape, etc.), and finally results for "Egypt".
I am trying several search APIs, such as Google and Bing. What I currently do is search for both terms first to get a result set X, then search for B only to get what I call a positive list, then search for A only to get a negative list. I score the results in X, penalizing those that also appear in the negative list and giving a bonus to those that also appear in the positive list. At the end, I append whatever is left in the positive list to X.
It works well, but still not well enough. I was wondering if someone could suggest an addition to this simple algorithm, or a totally different idea.
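For reference, here is a minimal Python sketch of the scoring scheme described above. The function name `rerank`, the `bonus` and `penalty` weights, and the use of the engine's rank as a base score are all assumptions made for illustration, not part of any search API. Results are assumed to be identified by URL strings, ordered best first.

```python
def rerank(combined, positive, negative, bonus=2.0, penalty=2.0):
    """Re-rank the combined (A and B) results using the B-only and A-only lists.

    combined, positive, negative: lists of result URLs, best first.
    bonus and penalty are hypothetical weights; tune them for your data.
    """
    pos, neg = set(positive), set(negative)
    scored = []
    for rank, url in enumerate(combined):
        score = -float(rank)      # assumed base score: earlier rank is better
        if url in pos:
            score += bonus        # also found by the B-only search
        if url in neg:
            score -= penalty      # also found by the A-only search
        scored.append((score, url))

    # Highest score first; Python's sort is stable, so ties keep engine order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [url for _, url in scored]

    # Append whatever is left in the positive list, in its original order.
    seen = set(ranked)
    ranked += [url for url in positive if url not in seen]
    return ranked
```

With these default weights, a combined result that also appears in the positive list moves up two places, and one that appears in the negative list moves down two.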
asked Nov 8 '11 at 16:11
You need to use something called a "set" for a task like this. http://en.wikipedia.org/wiki/Set_%28computer_science%29
If you search for "Egypt" + "Pyramids", create a 'set' for each of the individual search terms. The most important results are in what we call the 'intersection' of the sets (results that are in both the "Egypt" set and the "Pyramids" set).
The lower-priority results are in what we call the 'relative complements' of the sets. Say you want everything in B that isn't in A: we call this the relative complement of A in B, and it gives you the B-only results. The relative complement of B in A likewise gives you the A-only results.
Most programming languages have a library/package that provides an optimized set implementation for you.
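As a rough illustration of the idea, here is a short Python sketch using the built-in `set` type. The function name `prioritize` and the assumption that each search returns a list of URL strings (best first) are mine, not part of any search API.

```python
def prioritize(results_a, results_b):
    """Order results as: in both A and B, then B only, then A only.

    results_a, results_b: lists of result URLs from the A-only and
    B-only searches, each ordered best first (an assumption).
    """
    set_a, set_b = set(results_a), set(results_b)

    both = set_a & set_b      # intersection: results found by both searches
    only_b = set_b - set_a    # relative complement of A in B
    only_a = set_a - set_b    # relative complement of B in A

    # Sets are unordered, so walk the original lists to keep engine order.
    tier1 = [r for r in results_b if r in both]
    tier2 = [r for r in results_b if r in only_b]
    tier3 = [r for r in results_a if r in only_a]
    return tier1 + tier2 + tier3
```

For example, calling `prioritize(egypt_results, pyramid_results)` would put pages found by both searches first, then the "Pyramids"-only pages, then the "Egypt"-only pages.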