Create a simple RDD
Workaround
Put input.txt onto HDFS:
hdfs dfs -put /opt/spark/bin/input.txt hdfs://localhost:9000/Spark
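To confirm the upload landed where expected, list the target directory (same path as above):
hdfs dfs -ls hdfs://localhost:9000/Spark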
Update the RDD so it is generated from the HDFS path (sc.textFile pointed at a directory reads every file under it):
val inputfile = sc.textFile("hdfs://localhost:9000/Spark")
val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("output")
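Here _ + _ is Scala shorthand for a two-argument anonymous function, so the chain above is equivalent to the more explicit sketch below (assuming the same inputfile RDD):

val counts = inputfile
  .flatMap(line => line.split(" "))   // one record per word
  .map(word => (word, 1))             // pair each word with an initial count of 1
  .reduceByKey((a, b) => a + b)       // sum the counts for each distinct word

Note that saveAsTextFile throws a FileAlreadyExistsException if the output directory already exists, so remove it before re-running: hdfs dfs -rm -r output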
The result is saved on HDFS:
hdfs dfs -ls hdfs://localhost:9000/user/root/output
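The counts themselves can be inspected by catting the part files (names like part-00000 are Spark's default; the exact names may vary):
hdfs dfs -cat hdfs://localhost:9000/user/root/output/part-*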
Experiment on a big HDFS block:
hdfs dfs -cat hdfs://localhost:9000/user/root/titles/part-m-00000
val input = sc.textFile("hdfs://localhost:9000/user/root/titles/part-m-00000")
val counts = input.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("title_output")
The job finishes in a very short time. Check the result saved on HDFS:
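For example, assuming the same /user/root working directory as before:
hdfs dfs -ls hdfs://localhost:9000/user/root/title_output
hdfs dfs -cat hdfs://localhost:9000/user/root/title_output/part-*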
12. Scala