Search results merging algorithm

I am implementing a search bar that searches for two main strings, A and B. I prioritize the results as follows (from most to least important):

  1. a result combining A and B
  2. a result for B only
  3. a result for A only

For example, if I search for "Egypt" + "Pyramids", I want the first results to be about things like "Egyptian Pyramids", followed by results about "Pyramids" in general (or as a geometric shape, etc.), and finally results about "Egypt".

I am trying several search APIs, such as Google and Bing. Here is what I currently do: first I search for both terms together to get result set X, then I search for B only to get what I call the positive list, and then for A only to get a negative list. I score the results in X, penalizing those that also appear in the negative list and giving a bonus to those that also appear in the positive list. Finally, I append whatever is left in the positive list to X.
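A minimal sketch of the scoring scheme described above, assuming each search returns a ranked list of result IDs (for example URLs). The function name `merge_results`, the position-based base score, and the `bonus`/`penalty` weights are all hypothetical, since the question gives no concrete values.

```python
def merge_results(x, positive, negative, bonus=1.0, penalty=1.0):
    """Rank result IDs using the bonus/penalty scheme from the question.

    x        -- ranked results for the combined query A + B
    positive -- ranked results for B only
    negative -- ranked results for A only
    bonus, penalty -- hypothetical weights (the question gives no values)
    """
    positive_set = set(positive)
    negative_set = set(negative)
    scores = {}
    for rank, result in enumerate(x):
        if result in scores:  # ignore duplicates, keep the first (best) rank
            continue
        score = len(x) - rank  # base score: higher for earlier positions in X
        if result in positive_set:
            score += bonus
        if result in negative_set:
            score -= penalty
        scores[result] = score
    # sorted() is stable even with reverse=True, so ties keep their order from X.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Append whatever is left in the positive list, in its own order.
    seen = set(ranked)
    for result in positive:
        if result not in seen:
            ranked.append(result)
            seen.add(result)
    return ranked


print(merge_results(["cairo", "egypt-pyramids", "giza"],
                    ["giza", "pyramid-shape"],
                    ["cairo", "nile"], bonus=2, penalty=2))
# ['giza', 'egypt-pyramids', 'cairo', 'pyramid-shape']
```

Results that appear only in the negative list (here "nile") are never added, matching the description.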

It works reasonably well, but not well enough. Can anyone suggest an improvement to this simple algorithm, or a completely different approach?

asked Nov 8 '11 at 16:11

1 Answer

For a task like this, you can use a data structure called a "set": http://en.wikipedia.org/wiki/Set_%28computer_science%29

If you search for "Egypt" + "Pyramids", create a set of results for each individual search term. The most important results are in the 'intersection' of the sets (the results that are in both the "Egypt" set and the "Pyramids" set).
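A small illustration with Python's built-in `set`; the result IDs are hypothetical stand-ins for what each individual search might return. Note that a set does not preserve the ranking order the search API returned.

```python
# Hypothetical result IDs returned for each individual search term.
egypt = {"egyptian-pyramids", "cairo-travel", "nile-cruise"}
pyramids = {"egyptian-pyramids", "pyramid-geometry", "mayan-pyramids"}

# Intersection: results present in both sets (highest priority).
both = egypt & pyramids  # equivalent to egypt.intersection(pyramids)
print(both)  # {'egyptian-pyramids'}
```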

The lower-priority results are in the 'relative complements' of the sets. For example, everything in B that is not in A is called the relative complement of A in B (written B \ A). Likewise, A \ B holds the results that match only A.
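The three priority tiers from the question map onto these set operations: A ∩ B first, then B \ A, then A \ B. A minimal sketch, assuming each individual search returns a ranked list of result IDs; `tiered_merge` is a hypothetical name. Python's set difference (`set_b - set_a`) would give B \ A directly, but it discards ranking order, so the sketch uses sets only for membership tests and filters the original lists.

```python
def tiered_merge(results_a, results_b):
    """Order results as: A and B, then B only, then A only.

    results_a, results_b -- ranked lists of result IDs (e.g. URLs) returned
    by searching for A and for B separately.
    """
    # Drop duplicates while keeping each list's ranking order.
    results_a = list(dict.fromkeys(results_a))
    results_b = list(dict.fromkeys(results_b))
    set_a, set_b = set(results_a), set(results_b)

    in_both = [r for r in results_b if r in set_a]     # intersection A ∩ B
    b_only = [r for r in results_b if r not in set_a]  # relative complement B \ A
    a_only = [r for r in results_a if r not in set_b]  # relative complement A \ B
    return in_both + b_only + a_only


print(tiered_merge(["cairo-travel", "egyptian-pyramids", "nile-cruise"],
                   ["pyramid-geometry", "egyptian-pyramids", "mayan-pyramids"]))
# ['egyptian-pyramids', 'pyramid-geometry', 'mayan-pyramids', 'cairo-travel', 'nile-cruise']
```

One consequence of this design: a result reaches the top tier only if both separate searches returned it, which differs from running a combined A + B query.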

Most programming languages provide an optimized set implementation in their standard library or in a package.

answered Nov 8 '11 at 11:23
