So, I'd like to represent one of Shakespeare's plays, Hamlet, into the following objects (maybe this isn't the best representation, if so please tell me):
class Play(): acts =  ... def add_act(self, act): acts.append(act) class Act(): scenes =  ... def add_scene(self, scene): scenes.append(scene) class Scene(): elems =  def __init__(self, title, setting=""): ... def add_elem(self, elem): elems.append(elem) ... class StageDirection(): # elem def __init__(self, text): ... class Line(): # elem def __init__(self, id, text, character = None): ... # A None character represents a continuation from the previous line # id could be, for example, 1.1.1
There are other methods, of course, for printing and such in each of the classes.
The question is, how do I get a structure based on these classes (or something like them) from HTML 4 code that looks like this:
<H3>ACT I</h3> <h3>SCENE I. Elsinore. A platform before the castle.</h3> <p><blockquote> <i>FRANCISCO at his post. Enter to him BERNARDO</i> </blockquote> <A NAME=speech1><b>BERNARDO</b></a> <blockquote> <A NAME=1.1.1>Who's there?</A><br> </blockquote> <A NAME=speech2><b>FRANCISCO</b></a> <blockquote> <A NAME=1.1.2>Nay, answer me: stand, and unfold yourself.</A><br> </blockquote> <A NAME=speech3><b>BERNARDO</b></a> <blockquote> <A NAME=1.1.3>Long live the king!</A><br> </blockquote> <A NAME=speech4><b>FRANCISCO</b></a> <blockquote> <A NAME=1.1.4>Bernardo?</A><br> </blockquote> <A NAME=speech5><b>BERNARDO</b></a> <blockquote> <A NAME=1.1.5>He.</A><br> </blockquote> <!-- for more, see the source of shakespeare.mit.edu/hamlet/full.html -->
translating that into something like this:
play = Play() actI = Act() sceneI = Scene("Scene I", "Elsinore. A platform before the castle.") sceneI.add_elem(StageDirection("Francisco at his post. Enter to him Bernardo.")) sceneI.add_elem(Line("Bernardo", "Who's there?")) ...
Of course, I don't expect all the code—but what libraries and, when there aren't libraries, logic should I use?
(This is for a future opensource project and me learning Python for fun—not homework.)
preguntado el 09 de enero de 11 a las 02:01
lxml or a similar parser. They will read your HTML (XML?) into a document tree, which is basically a more generic version of the data structure which you have written.
You can then iterate over the tree generated and prune it or rebuild another tree in memory that looks the way you want to. But the HTML -> data structure step is a solved problem.
Wait, do you want to generate the actual Python code? Why on earth would you want that?
BTW, your code won't do what you want:
class Play(): acts =  ... def add_act(self, act): acts.append(act)
Prueba esto en su lugar:
class Play(): def __init__(self): self.acts =  ... def add_act(self, act): self.acts.append(act)