I would like to do the following:
register s3n://uw-cse344-code/myudfs.jar

-- load the test file into Pig
--raw = LOAD 's3n://uw-cse344-test/cse344-test-file' USING TextLoader as (line:chararray);
-- later you will load to other files, example:
raw = LOAD 's3n://uw-cse344/btc-2010-chunk-000' USING TextLoader as (line:chararray);

-- parse each line into ntriples
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);

--filter 1
subjects1 = filter ntriples by subject matches '.*rdfabout\\.com.*' PARALLEL 50;

--filter 2
subjects2 = subjects1;
but I get this error:
2012-03-10 01:19:18,039 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input ';' expecting LEFT_PAREN
Details at logfile: /home/hadoop/pig_1331342327467.log
So it seems Pig doesn't accept assigning one alias directly to another. How do I accomplish this?
asked Mar 10 '12 at 01:03
I don't think that kind of 'typical' assignment works in Pig. It's not really a programming language in the strict sense: it's a high-level data-flow language on top of Hadoop with some specialized functions.
I think you'll need to simply re-project the data from subjects1 into subjects2, such as:
subjects2 = foreach subjects1 generate $0, $1, $2;
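If listing every position by hand is inconvenient, the same re-projection can probably be written with the star expression, which keeps all fields and their names. The following is a minimal sketch only: the input path 'triples.tsv' and the alias names are hypothetical and chosen for illustration, and the schema simply mirrors the one in the question.

```pig
-- Hypothetical input path and aliases, for illustration only.
triples = LOAD 'triples.tsv' AS (subject:chararray, predicate:chararray, object:chararray);

-- A bare 'triples_copy = triples;' produced ERROR 1200 in the question,
-- so project every field into a new alias instead.
triples_copy = FOREACH triples GENERATE *;

-- Print the schema of the new alias to check that the field names carried over.
DESCRIBE triples_copy;
```

Unlike the positional form with $0, $1, $2, this version does not need to change if the number of fields changes.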
Another approach might be to use the LIMIT operator with some absurdly high parameter:
subjects2 = LIMIT subjects1 100000000;
There could be a lot of reasons why that doesn't make sense, but it's a thought.
I sense you are trying to do things the way you would in a general-purpose programming language. I have found that rarely works out the way you want, but you can always get the job done once you think like Pig.
As I understand it, your example is from the Data Science course on Coursera. Strangely, I ran into the same problem: the code works on one data set but not on another.
Because we need to rename the fields anyway, I used this code:
filtered2 = foreach filtered generate subject as subject2, predicate as predicate2, object as object2;
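Renaming the fields in the copy is mainly useful when the two aliases are combined again later, for example in a join. The sketch below assumes such a follow-up step; the input path 'triples.tsv', the alias chained, and the choice of join keys are assumptions made for illustration and are not taken from the posts above.

```pig
-- Hypothetical input path; the schema mirrors the one in the question.
filtered = LOAD 'triples.tsv' AS (subject:chararray, predicate:chararray, object:chararray);

-- Copy the relation under a new alias, renaming each field.
filtered2 = FOREACH filtered GENERATE subject AS subject2, predicate AS predicate2, object AS object2;

-- Assumed follow-up step: pair each triple with the triples whose subject equals its object.
chained = JOIN filtered BY object, filtered2 BY subject2;

-- Print the joined schema to see how the fields from both sides are named.
DESCRIBE chained;
```

With distinct field names on each side, the joined fields can be referred to by their short names rather than through the alias-qualified form such as filtered2::subject2.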