Pregunta de pereza de Haskell o por qué esta mónada no funciona como se esperaba

here is kinda huge piece of code, it finally got huge, because of monadic stuff, but task is simple: parse following string to data structure:

"hello(some, args)" -> [("fid","hello"),("sym","("),("args","some, args"),("sym",")")]

but code i written produces following:

"hello(some, args)" -> [("fid",""),("sym","("),("args",""),("sym",")")]

as you can see 'args' and 'fid' values are lost somewhere on the way, i suspect compiler decided not to calculate them for some mysterious reason.

I guess the code is completely bad, also i marked with "?"'s parts which seem useless to me, but compiler forced me to leave them in place :)

Y aquí está el código:

type PStream = String
type PToken a = (String, a)
data Pstate a = Pstate (String -> ([PToken String], PStream)) a

instance Monad Pstate where
    return x = Pstate (\_ -> ([("start", "")], "?")) x -- not used ?
    (Pstate bindparser v) >>= f  = Pstate newparser fv
            Pstate fparser fv = f v
            (btok, brest) = bindparser "this string also not used"
            (tok, rest) = fparser brest
            newparser _ = (btok ++ tok, rest)

-- parsers
parseFid :: Pstate String
parseFid = Pstate parser "???"
    where parser r = let (fid, rest) = span (/= '(') r in ([("fid", fid)],rest)

parseSym :: Char -> Pstate String
parseSym c = Pstate parser "???"
    where parser r = let rest = parseOne c r in ([("sym", [c])],rest)

parseOne s (h:t) = if h == s then t else error $ "symbol not match:" ++ [h] ++ " /= " ++ [s]
parseOne s []    = []

parseArgs :: Pstate String
parseArgs = Pstate parser "???"
    where parser r = let (args,rest) = span (/=')') r in ([("args", args)],rest)

-- util
load :: String -> Pstate String
load s = Pstate (\ls -> ([("load", "")],ls)) "???"

runP :: Pstate String -> ([PToken String], PStream)
runP (Pstate fparser fv) = fparser "???"

-- combined parser
parseFunction :: String -> Pstate String
parseFunction s = do
    load s --- should be 'return' here ?
    parseSym '('
    parseSym ')'

main = putStrLn $ show $ runP $ parseFunction "hello(a b c)"

preguntado el 09 de enero de 11 a las 04:01

It seems a bit like that you didn't really understood what the monad's doing in this case. Can you tell me a bit more about this code if my answer isn't what you're looking for? -

2 Respuestas

First, about the "???" you had to leave there. Consider your definition of Pstate:

data Pstate a = Pstate (String -> ([PToken String], PStream)) a

This means, that your data constructor has the following type:

Pstate :: (String -> ([PToken String], PStream)) -> a -> Pstate a

This is the default construct of a monad. If you define monadic combinators, it's actually not uncommon to have some combinators where this is not needed, so the convention is to leave it to () en este caso.

But actually I think that your code is very strange, it seems like you didn't grabbed the point of a stateful monad. Let me explain:

Usually, a stateful computation has this type:

data MyState a = MyState (TypeOfState -> (a, TypeOfState))

This means, that your monadic action is actually some kind of computation, that does something (possible with your piece of state) and than returns a result and a new state. The state is wrapped up in the monad, so you don't have to think about it.

In your code, you're using the same pattern, but somewhat different. It seems like that you fixed the result of the computation to [PToken String]. Let me fix up your definition a bit:

data Pstate a = Pstate (PStream -> (a, PStream))

So now, you get the return value of your computation by applying the combinators, which look like this:

instance Monad Pstate where
  -- return should just wrap the computation up, so no changes
  return x = Pstate (\p -> (x,p))
  parser1 >>= parser2  = Pstate $ \input -> let
    Pstate parser1' = parser1
    Pstate parser2' = parser2
    (output, rest) = parser1' input
    result = parser2' output rest
    in result

Now, you can look at the type signatures for your parsers, they should be something like this: parseFid :: Pstate [PToken PStream]. This means, your parser consumes some input and returns the parsed stuff as [PToken PStream] and sets the new input to what is left. Consider this definition of parseFid about how it could look like:

parseFid :: Pstate [PToken PStream]
parseFid = Pstate $ \r -> let (fid, rest) = span (/= '(') r in ([("fid", fid)],rest)

The rest is left as an exercise to the reader. I would suggest you to reformulate your parser using the State monad from Control.Monad.State.Strict instead. You'll see, that the monad above is basically the same.

Actually, it's most times easier to rely on existing and well known tools, instead of rolling down an own parser. Here's a parser for what you need created with Parsec, a state of the art library for parsing:

import Text.Parsec

parseFunction = do name   <- parseName
                   obrace <- parseOpeningBrace
                   args   <- parseArguments
                   cbrace <- parseClosingBrace
                   return [name,obrace,args,cbrace]

parseName         = many (noneOf "(") >>= \name -> return ("fid",name)
parseOpeningBrace = char '(' >> return ("sym","(")
parseArguments    = many (noneOf ")") >>= \name -> return ("args",name)
parseClosingBrace = char ')' >> return ("sym",")")

main = case parse parseFunction "" "hello(some, args)" of
  Left error   -> print error
  Right result -> print result

Aquí está el resultado:

[("fid","hello"),("sym","("),("args","some, args"),("sym",")")]

I actually would suggest you to think of some better representation of the parsed function, this may make things easier.

Respondido el 10 de enero de 11 a las 16:01

Thanks for such reply, it clears tings a lot, but still only problem is bind method, i guess those k a s' etc is hard to keep in memory, is it possible to name variables in terms of parsers ? Also (m s) look strange, you applying value to Pstate type, shouldn't it be unpacked first ? - Dfr

@Dfr, yes. Exactly. I actually just copied the code of Control.Monad.State.Lazy, and modified it to my need. There went something wrong obviously. I'm going to fix this soon. - fuz

@Dfr: Fixed the issues. Have a look again! - fuz

If you run the code as posted, you can see that the "this string also not used" string is in fact used, as you get this output:

([("load",""),("fid","this string also not used"),("sym","("),("args","this string also not used"),("sym",")")],"")

In fact the string is basically used as input for all the parsers. In the definition of >>=, the string is given as input to the bindparser. This parser then takes it as it's input and creates tokens from it. parseFid for example produces the token ("fid","this string also not used").

La newparser that is constructed in >>= ignores any input it might receive later, it just returns the result of parsing "this string also not used". Similar, the parser created with return ignores the value it should return.

The parsers created with bind should not ignore/override their inputs for parsing to work correctly.

Also you should decide what role the second parameter of Pstate should fulfill, since at the moment is mostly contains "???", which doesn't look particularly useful.

Respondido el 09 de enero de 11 a las 15:01

Yes, this is another weird spot i missed, i was thinking about it and come to conclusion that bindparser should ignore its input,because it happens to be newparser created from return and it comes from previous bind: newparser _ = (btok ++ tok, rest). But result is completely unexpected and still wrong because bindparser input passes unparsed. The second parameter it think it should be result of last parsing routine, and we could say fid <- parseFid. - Dfr

And the real parser is fparser, which does all parsing while bindparser just passes result through, that was an idea. - Dfr

No es la respuesta que estás buscando? Examinar otras preguntas etiquetadas or haz tu propia pregunta.