- copy input files
- run streaming jar (now located in contrib/streaming)
- hope.
1) copy input files to dfs:
2) run streaming job:
Step back and run a simpler job.
This is the example code. Why did it fail again? Is there a problem with my input file?
I'll pick this back up tomorrow (or later). A little birdie just told me it was 3:40am, well past my bedtime. I'd like to finish this up, but that's what I've been saying since I started at midnight. I have the Hadoop::Streaming::Mapper and ::Reducer modules all packaged up and ready to make an initial push to CPAN, but first I need to get an example running under Hadoop. I did finish writing the tests for the non-Hadoop case, and those are clean and ready. Feel free to follow along at my GitHub repository.
Success!
hadoop dfs -copyFromLocal examples/wordcount wordcount
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar \
-input wordcount \
-output wordcountout \
-mapper examples/wordcount/map.pl \
-reducer examples/wordcount/reduce.pl
packageJobJar: [/home/hadoop/tmp/hadoop-hadoop/hadoop-unjar5876487782773207253/] [] /tmp/streamjob4555454909817451366.jar tmpDir=null
10/01/05 03:29:34 INFO mapred.FileInputFormat: Total input paths to process : 1
10/01/05 03:29:35 INFO streaming.StreamJob: getLocalDirs(): [/home/hadoop/tmp/hadoop-hadoop/mapred/local]
10/01/05 03:29:35 INFO streaming.StreamJob: Running job: job_201001050303_0003
10/01/05 03:29:35 INFO streaming.StreamJob: To kill this job, run:
10/01/05 03:29:35 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201001050303_0003
10/01/05 03:29:35 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0003
10/01/05 03:29:36 INFO streaming.StreamJob: map 0% reduce 0%
10/01/05 03:30:19 INFO streaming.StreamJob: map 100% reduce 100%
10/01/05 03:30:19 INFO streaming.StreamJob: To kill this job, run:
10/01/05 03:30:19 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201001050303_0003
10/01/05 03:30:19 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0003
10/01/05 03:30:19 ERROR streaming.StreamJob: Job not Successful!
10/01/05 03:30:19 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Sigh. Failure. Time to debug the job. Definitely needs the -file flag to bundle the executables to the remote machine.
-file map.pl
-file reduce.pl
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar -input wordcount -output wordcountout3 -mapper /bin/cat -reducer /bin/wc
10/01/05 03:36:17 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0004
10/01/05 03:36:17 ERROR streaming.StreamJob: Job not Successful!
10/01/05 03:36:17 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
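When even a cat/wc job fails, it can help to take the cluster out of the picture. The sketch below approximates a streaming job on one machine as mapper | sort | reducer. `run_local` is a hypothetical helper name, not part of Hadoop, and the sort step is only a rough stand-in for the shuffle between map and reduce.

```shell
#!/bin/sh
# run_local INPUT MAPPER REDUCER
# Hypothetical helper (not a Hadoop command): approximates a streaming job
# on one machine by piping the input through the mapper, a sort, and the reducer.
run_local() {
    input=$1
    mapper=$2
    reducer=$3
    # sort stands in for the shuffle, which groups mapper output by key
    # before the reducer sees it.
    sh -c "$mapper" < "$input" | LC_ALL=C sort | sh -c "$reducer"
}

# Same shape as the simpler job above: cat as the mapper, wc as the reducer.
sample=$(mktemp)
printf 'perl cpan\nruby perl\nhaskell\n' > "$sample"
run_local "$sample" cat 'wc -l'
rm -f "$sample"
```

If the commands behave here but the job still fails on the cluster, the problem is more likely in how the job is submitted than in the commands themselves.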
Update
or maybe I'll just try a few more times...
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar -input wordcount -output wordcountout7 -mapper map.pl -reducer reduce.pl -file examples/wordcount/map.pl -file examples/wordcount/reduce.pl
packageJobJar: [examples/wordcount/map.pl, examples/wordcount/reduce.pl, /home/hadoop/tmp/hadoop-hadoop/hadoop-unjar390944251948922559/] [] /tmp/streamjob7610913425753318391.jar tmpDir=null
10/01/05 03:59:11 INFO mapred.FileInputFormat: Total input paths to process : 1
10/01/05 03:59:11 INFO streaming.StreamJob: getLocalDirs(): [/home/hadoop/tmp/hadoop-hadoop/mapred/local]
10/01/05 03:59:11 INFO streaming.StreamJob: Running job: job_201001050303_0010
10/01/05 03:59:11 INFO streaming.StreamJob: To kill this job, run:
10/01/05 03:59:11 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201001050303_0010
10/01/05 03:59:11 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0010
10/01/05 03:59:12 INFO streaming.StreamJob: map 0% reduce 0%
10/01/05 03:59:23 INFO streaming.StreamJob: map 100% reduce 0%
10/01/05 03:59:35 INFO streaming.StreamJob: map 100% reduce 100%
10/01/05 03:59:38 INFO streaming.StreamJob: Job complete: job_201001050303_0010
10/01/05 03:59:38 INFO streaming.StreamJob: Output: wordcountout7
hadoop dfs -ls wordcountout7;
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2010-01-05 03:59 /user/hadoop/wordcountout7/_logs
-rw-r--r-- 1 hadoop supergroup 125 2010-01-05 03:59 /user/hadoop/wordcountout7/part-00000
hadoop dfs -cat wordcountout7/part*
apple 2
bar 2
baz 1
c 1
c++ 2
cpan 9
foo 2
haskell 4
lang 1
lisp 1
ocaml 2
orange 2
perl 9
python 1
ruby 4
scheme 1
search 1
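For readers who want to see the shape of a word count without the Perl, here is a minimal sketch in shell and awk. It is not the map.pl and reduce.pl from the repository: `wc_map` and `wc_reduce` are hypothetical names, and the tab-separated key/value lines are an assumption based on streaming's default separator.

```shell
#!/bin/sh
# Illustrative word count; NOT the map.pl/reduce.pl used in this post.

# Mapper: emit "word<TAB>1" for every whitespace-separated word on stdin.
wc_map() {
    awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }'
}

# Reducer: input arrives sorted by key, so equal words are adjacent.
# Sum the counts for each run of identical keys.
wc_reduce() {
    awk -F '\t' '
        NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
        { key = $1; sum += $2 }
        END { if (NR > 0) print key "\t" sum }
    '
}

# Local run: the sort between the two stages plays the role of the shuffle.
printf 'perl cpan\nruby perl\n' | wc_map | LC_ALL=C sort | wc_reduce
```

The reducer relies on its input being sorted by key, which is why the local pipeline puts a sort between the two stages.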
4 comments:
you saved my day!
Nicolas,
I'm glad I was able to save your day! I'd love to hear more about what you're working on. What's your Hadoop project? Are you using my Hadoop::Streaming Perl module?
Hi,
Whenever I try to use Java class files as my mapper and/or reducer, I get the following error:
java.io.IOException: Cannot run program "MapperTst.class": java.io.IOException: error=2, No such file or directory
I executed the following command on the terminal:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -file /home/hadoop/codes/MapperTst.class -mapper /home/hadoop/codes/MapperTst.class -file /home/hadoop/codes/ReducerTst.class -reducer /home/hadoop/codes/ReducerTst.class -input gutenberg/* -output gutenberg-outputtstch27
Please let me know where I am going wrong.
Regards
Shrish
Shrish,
When you use -file to include a file in your jar, the file will be placed in the task's working directory. Refer to it there by its bare file name rather than by its original absolute path.
e.g.
-file /home/hadoop/codes/MapperTst.class
-file /home/hadoop/codes/ReducerTst.class
-mapper MapperTst.class
-reducer ReducerTst.class
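The rule above can be sketched as a small shell helper that turns two local paths into the matching flags. `streaming_flags` is a hypothetical name used only for illustration; it prints flags and does not run Hadoop. The example paths are the ones from the successful run in the post.

```shell
#!/bin/sh
# Hypothetical helper: given local paths to a mapper and a reducer, print the
# streaming flags that ship each file and then name it by its base name.
streaming_flags() {
    mapper_path=$1
    reducer_path=$2
    printf '%s\n' \
        "-file $mapper_path" \
        "-file $reducer_path" \
        "-mapper $(basename "$mapper_path")" \
        "-reducer $(basename "$reducer_path")"
}

streaming_flags examples/wordcount/map.pl examples/wordcount/reduce.pl
```

The -file flags keep the full local path because they are resolved on the submitting machine, while -mapper and -reducer use only the base name because they are resolved in the task's working directory.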