Doing something meaningful with the Amazon BrowseNodes API

I have a website (www.7bks.com) where people create book lists. It's fairly simple at the moment. I'm already using the Amazon API to pull book information, images, etc. onto the site.

What I'd like to do is somehow use the Amazon API to pull back category and/or tag data, so that I can offer some way of browsing the lists on my site. Unfortunately, the tag API method has been discontinued.

The most likely candidate is the BrowseNodes method of the Amazon API (http://docs.amazonwebservices.com/AWSEcommerceService/2005-10-05/ApiReference/BrowseNodesResponseGroup.html) but the data returned from this call is pretty nonsensical and I was hoping we might be able to put our heads together and figure out how to make sense of it.

Here's a Google spreadsheet to show you the kind of data I get. I picked a sample list (http://www.7bks.com/list/549002) and ran its three books through the BrowseNodes API:

https://spreadsheets.google.com/ccc?key=0ApVjkgehRamudHd5SlNhYllPQkZDSDY1cllfQVBQM1E&hl=en&authkey=CN_MxoAO

Looking at the data as a human, you don't need to know what the books are to see that the list is probably about Sci-Fi and Fantasy. That's mainly because the eye is good at discarding meaningless categories such as "custom stores" and "fiction complete".

I tried de-duping the list of categories, or only looking at the categories that appear for all 3 books, but it's still fairly crap data. I would love your thoughts on how I can turn this data into something meaningful for users.
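For reference, the two things tried above (de-duping, and keeping only categories shared by every book) might look roughly like this, with a stoplist added to mimic the eye "discarding meaningless categories". This is a sketch, not the original code: `STOPLIST`, `shared_categories` and `ranked_categories` are hypothetical names, and the stoplist contents are just examples taken from the data in this post.

```python
from collections import Counter

# Hypothetical stoplist of store-navigation nodes that say nothing about the subject.
STOPLIST = {"custom stores", "fiction complete", "products"}

def shared_categories(categories_per_book):
    """Categories (minus the stoplist) that every book in the list has."""
    sets = [{c.lower() for c in cats} - STOPLIST for cats in categories_per_book]
    return set.intersection(*sets) if sets else set()

def ranked_categories(categories_per_book):
    """(category, number of books it appears on) pairs, most common first.

    Each book counts at most once per category, which is the de-duping step.
    """
    counts = Counter()
    for cats in categories_per_book:
        counts.update({c.lower() for c in cats} - STOPLIST)
    return counts.most_common()
```

Ranking by "number of books" rather than requiring "all books" is softer: one oddly filed book no longer wipes out the signal from the others.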

My best thought so far is just to scan the data and match to a hard-coded list. So something like:

if Count("science fiction & fantasy") > 3 then list is sci fi
if Count("business finance & law") > 3 then list is business

etc.
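In Python, that hard-coded rule might look something like the sketch below. The category strings and the threshold of 3 are copied from the pseudocode; `GENRE_RULES`, `THRESHOLD` and `classify_list` are hypothetical names, and the input is assumed to be the flat list of category names collected for all books on one list.

```python
from collections import Counter

# Hypothetical mapping from an Amazon category name (lower-cased) to a site genre,
# taken straight from the pseudocode above.
GENRE_RULES = {
    "science fiction & fantasy": "sci fi",
    "business finance & law": "business",
}
THRESHOLD = 3  # "Count(...) > 3" in the pseudocode

def classify_list(category_names):
    """Return every site genre whose Amazon category appears more than THRESHOLD times."""
    counts = Counter(name.lower() for name in category_names)
    return [genre for category, genre in GENRE_RULES.items()
            if counts[category] > THRESHOLD]
```

For example, `classify_list(["Science Fiction & Fantasy"] * 4 + ["Custom Stores"])` returns `["sci fi"]`.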

This is very rigid though and ideally I'd like to build something a little more flexible/powerful.

All suggestions welcome.

I think this is a high-level question, so it shouldn't be affected by HOW I'm calling the API, but for reference I'm using Python/App Engine/webapp.

Many thanks

tom

UPDATE: After much banging of head against desk, I've managed to fix this issue to my satisfaction. It's not that complicated, but I've hacked together some Python code that does what I want. I welcome anyone improving on my code or offering suggestions.

Basically, the logic underlying the code is this:

1) In the XML tree, the bottom node of a tree that starts Books > Subjects is the best guess at what the book is actually about. E.g. for http://www.amazon.co.uk/Surface-Detail-Iain-M-Banks/dp/1841498939/ it returns "science fiction". Bingo.
2) Typically, though, a lot of good information is thrown away by limiting ourselves to just the results that start Books > Subjects.
3) Therefore, I try getting a list of similar books and pulling the categories off them; if that fails, I just get the categories assigned to the original book.

Perhaps best explained by giving you the code as follows:

from google.appengine.ext import webapp

# API (from python-amazon-product-api), minidom_response_parser, AWS_KEY and
# SECRET_KEY are defined/imported elsewhere in the app and are not shown here.

def get_text(node, tag):
    # Assumed helper (not shown in the original): the text of the first <tag>
    # element below node. A BrowseNode's own <Name> comes before the <Name>s
    # nested inside its <Ancestors>, so this returns the node's own name.
    element = node.getElementsByTagName(tag)[0]
    return ''.join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)

#takes as input the xml output of the amazon api browsenodes call
def getcategories(xml):
    #fetches the names of all the nodes (in document order), stores them in a list
    categories = []
    for node in xml.getElementsByTagName('BrowseNode'):
        categories.append(get_text(node, 'Name'))

    #turn the one list into a series of individual lists
    #each individual list should be a particular tree from browsenode
    #each list will end 'Books'
    #the first item in the list should be the bottom of the tree
    taglists = []
    while 'Books' in categories:
        end = categories.index('Books') + 1
        taglists.append(categories[:end])
        del categories[:end]

    #now, we only return the first item from a list which contains 'Subjects'
    final = []
    for tagset in taglists:
        if 'Subjects' in tagset:
            final.append(tagset[0])
    return final

class Browsenodes(webapp.RequestHandler):
    def get(self):
        #get the asin of the target book
        asin = self.request.get('term')
        if asin:
            #create the API client with the AWS keys, UK locale
            api = API(AWS_KEY, SECRET_KEY, 'uk', processor=minidom_response_parser)
            try:
                #try getting a list of similar books - note the response group set to browsenodes
                result = api.similarity_lookup(asin, ResponseGroup='BrowseNodes')
            except Exception:
                #there isn't always a list of similar books, so as a failsafe just get the book I wanted.
                result = api.item_lookup(asin, ResponseGroup='BrowseNodes')
            final = getcategories(result)
            #turn it into a set to de-dupe multiple listings of the same category
            self.response.out.write(', '.join(set(final)))
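For comparison, here is a minimal sketch (not the code above) of step 1 done by walking each BrowseNode's `<Ancestors>` chain explicitly with the standard library's `xml.etree.ElementTree`, instead of flattening every `<Name>` and splitting on 'Books'. It assumes the response nests each category's parents under `<Ancestors>`, as the BrowseNodes response group does, and that the leaf categories are the BrowseNode elements sitting directly under `<BrowseNodes>`. Unlike minidom's `getElementsByTagName`, ElementTree sees the default XML namespace on Amazon responses, so the sketch strips it first. `strip_namespaces`, `ancestor_path` and `subject_leaves` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    # Amazon responses use a default namespace; drop it so plain tag names match.
    for element in root.iter():
        element.tag = element.tag.split('}')[-1]
    return root

def ancestor_path(browse_node):
    """Names from the leaf up to the top of its tree: [leaf, parent, ..., top]."""
    path = []
    node = browse_node
    while node is not None:
        path.append(node.findtext('Name'))
        node = node.find('Ancestors/BrowseNode')  # None once we reach the top
    return path

def subject_leaves(xml_text):
    """Leaf category names whose ancestor chain passes through 'Subjects'."""
    root = strip_namespaces(ET.fromstring(xml_text))
    leaves = []
    # Only BrowseNodes directly under <BrowseNodes> are leaves; nested ones are ancestors.
    for browse_nodes in root.iter('BrowseNodes'):
        for leaf in browse_nodes.findall('BrowseNode'):
            path = ancestor_path(leaf)
            if 'Subjects' in path:
                leaves.append(path[0])
    return leaves
```

Because it follows the actual parent links, this version doesn't depend on every tree ending in a node literally named 'Books'.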

To give you a flavour of the output:

Book: http://www.amazon.co.uk/Surface-Detail-Iain-M-Banks/dp/1841498939/
Tags: Contemporary Fiction Products Space Opera Science Fiction

Book: http://www.amazon.co.uk/Godel-Escher-Bach-Eternal-anniversary/dp/0140289208/
Tags: Psychology History of Mathematics Mathematical Logic General AAS Popular Maths Scientific, Technical & Medical Arts & Music Philosophy of Mind Amazon Maths Architecture & Logic Contemporary Philosophy: 1900- Logic Classics Physics Metaphysics Philosophy of Physics General Technology Algebraic Number Theory Artificial Intelligence History of Science

Book: http://www.amazon.co.uk/Flatland-Romance-Dimensions-Dover-Thrift/dp/048627263X/
Tags: Contemporary Fiction Philosophy of Mathematics General AAS Popular Maths Philosophy Scientific, Technical & Medical Philosophy of Mind Science Fiction Maths Contemporary Philosophy: 1900- Algebraic Number Theory Products Classics Metaphysical & Visionary Myths & Fairy Tales Topology General Topics General Theoretical Methods Metaphysics Artificial Intelligence History of Science

Book: http://www.amazon.co.uk/Victoria-Condor-Books-Knut-Hamsun/dp/0285647598/
Tags: Contemporary Fiction Literary Fiction Psychological General AAS Classics Short Stories

asked Jan 9 '11 at 08:01

2 Answers

My best thought so far is just to scan the data and match to a hard-coded list. So something like:

if Count("science fiction & fantasy") > 3 then list is sci fi
if Count("business finance & law") > 3 then list is business

I think this might not be a bad idea. Grab the top-level book categories from Amazon and just match against those. It's not very elegant, but it would work.
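One way to read "grab the top-level book categories" is to take, for each category chain, the node directly below the 'Subjects' root (e.g. "Science Fiction & Fantasy" or "Literature & Fiction") and let the books on a list vote. A minimal sketch, assuming leaf-first name chains like the `taglists` built in the question's code; `top_level_category`, `list_genre` and the `min_share` threshold are hypothetical.

```python
from collections import Counter

def top_level_category(path, root_name='Subjects'):
    """For a leaf-first chain like [leaf, ..., 'Subjects', 'Books'], return the
    category directly below the root, or None if the chain has no such node."""
    if root_name not in path:
        return None
    index = path.index(root_name)
    return path[index - 1] if index > 0 else None

def list_genre(paths, min_share=0.5):
    """Most common top-level category across a list's chains, provided it covers
    at least min_share of the chains that have a top-level category at all."""
    tops = (top_level_category(p) for p in paths)
    counts = Counter(t for t in tops if t)
    if not counts:
        return None
    category, count = counts.most_common(1)[0]
    return category if count >= min_share * sum(counts.values()) else None
```

Matching against this small, fixed set of top-level names is still a hard-coded approach, but the list of candidates comes from Amazon's own tree rather than being typed in by hand.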

Alternatively, perhaps you could use the dc:subject data from the Google Books API? (I haven't used it, though, so it may also be garbage.)

answered Jun 20 '20 at 12:06

Hmm... First of all, the current API version is dated 2011-08-01. Maybe you could do yourself a favor by looking at up-to-date documentation: Product Advertising API.

To me, the XML makes a lot of sense!

Maybe that's because, when I want to properly understand one of these responses, I copy the XML into the Visual Studio XML editor, where I can expand and collapse nodes.

The structure is something like this:

  <BrowseNodes>
    <BrowseNode>...</BrowseNode>
    <BrowseNode>...</BrowseNode>
    <BrowseNode>...</BrowseNode>
    <BrowseNode>...</BrowseNode>
  </BrowseNodes>

Then, inside each BrowseNode, it will look something like this:

<BrowseNode>
  <BrowseNodeId>10399</BrowseNodeId>
  <Name>Classics</Name>
  <Ancestors>
    <BrowseNode>
      <BrowseNodeId>17</BrowseNodeId>
      <Name>Literature &amp; Fiction</Name>
      <Ancestors>
        <BrowseNode>
          <BrowseNodeId>1000</BrowseNodeId>
          <Name>Subjects</Name>
          <IsCategoryRoot>1</IsCategoryRoot>
          ...
        </BrowseNode>
      </Ancestors>
    </BrowseNode>
  </Ancestors>
</BrowseNode>

Notice the "IsCategoryRoot"? There's no point going higher than that: the root is so generic that it makes no sense to use it as a category. Its name is "Subjects" for Books but "Categories" for eBooks, so it seems to make more sense to check the "IsCategoryRoot" element than the name.
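A minimal sketch of that idea: walk up from each leaf BrowseNode and stop at the node flagged `<IsCategoryRoot>1</IsCategoryRoot>`, whatever it is called, and drop trees that never reach a category root (store sections such as "Custom Stores"). It assumes namespace-free tags, as in the XML above; real responses carry a default namespace that you'd strip or include in the tag names first. `path_below_root` and `leaf_categories` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def path_below_root(browse_node):
    """Names [leaf, parent, ...] up to, but not including, the category root.
    Returns None if the chain never reaches a node with IsCategoryRoot = 1."""
    names = []
    node = browse_node
    while node is not None:
        if (node.findtext('IsCategoryRoot') or '').strip() == '1':
            return names
        names.append(node.findtext('Name'))
        node = node.find('Ancestors/BrowseNode')
    return None

def leaf_categories(browse_nodes):
    """For a <BrowseNodes> element, the below-root chain of each direct child
    BrowseNode that actually hangs off a category root."""
    chains = []
    for leaf in browse_nodes.findall('BrowseNode'):
        chain = path_below_root(leaf)
        if chain:
            chains.append(chain)
    return chains
```

For the Classics node above this yields `['Classics', 'Literature & Fiction']`, and the same code would work unchanged for an eBook tree whose root is named "Categories".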

I'm not 100% sure what you want to do, and I don't know Python much, but I do know databases. I would take the book's ASIN identifier (which is unique worldwide for Amazon, meaning you can look up the same ASIN on amazon.com, but also .co.uk, .fr, .de, and so on) and put it in a table, along with whatever other data you find useful. Then I would create a table for categories, holding their names and IDs, and a link table with one entry for each lowest-level BrowseNode, holding the BrowseNodeId and the book's ASIN. For the nested BrowseNodes (which are in fact the parents, or ancestors), store both the child's ID and their own. Obviously, before inserting a category, I would check that it doesn't already exist.

The goal here is to have one record per book, one record per category, and as many links between categories and books, and between categories themselves, as needed.

That way, it would be extremely easy to search books from categories, and vice versa.
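The answer doesn't give a schema, so here is one possible reading of it as a sketch, using Python's built-in `sqlite3` as a stand-in for whatever database you actually use (on App Engine that would be the datastore rather than SQL). All table, column and function names are hypothetical. `INSERT OR IGNORE` plays the role of "check it doesn't already exist", and a recursive query answers "which books are in this category or anything below it".

```python
import sqlite3

# Hypothetical schema: one row per book, one per category, plus two link tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS books (asin TEXT PRIMARY KEY, title TEXT);
CREATE TABLE IF NOT EXISTS categories (node_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS book_categories (
    asin TEXT REFERENCES books(asin),
    node_id INTEGER REFERENCES categories(node_id),
    PRIMARY KEY (asin, node_id));
CREATE TABLE IF NOT EXISTS category_parents (
    child_id INTEGER REFERENCES categories(node_id),
    parent_id INTEGER REFERENCES categories(node_id),
    PRIMARY KEY (child_id, parent_id));
"""

def store_book(conn, asin, title, chain):
    """chain: leaf-first list of (BrowseNodeId, Name) pairs for one BrowseNode."""
    conn.execute("INSERT OR IGNORE INTO books VALUES (?, ?)", (asin, title))
    for node_id, name in chain:
        conn.execute("INSERT OR IGNORE INTO categories VALUES (?, ?)", (node_id, name))
    # Link the book only to its lowest-level node...
    conn.execute("INSERT OR IGNORE INTO book_categories VALUES (?, ?)", (asin, chain[0][0]))
    # ...and each node to its parent.
    for (child_id, _), (parent_id, _) in zip(chain, chain[1:]):
        conn.execute("INSERT OR IGNORE INTO category_parents VALUES (?, ?)",
                     (child_id, parent_id))

def books_in_category(conn, node_id):
    """ASINs filed under node_id or any of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION
            SELECT child_id FROM category_parents JOIN sub ON parent_id = sub.id)
        SELECT DISTINCT asin FROM book_categories
        WHERE node_id IN (SELECT id FROM sub) ORDER BY asin""", (node_id,))
    return [row[0] for row in rows]
```

Usage: `conn = sqlite3.connect(':memory:')`, `conn.executescript(SCHEMA)`, then call `store_book` once per leaf BrowseNode of each book.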

Sorry if I have been a bit long, but there is no short answer to your question. Hope this helps.

Bernardo

answered May 3 '13 at 03:05
