Create a simple RDD
Workaround
Put input.txt onto HDFS:
hdfs dfs -put /opt/spark/bin/input.txt hdfs://localhost:9000/Spark
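To confirm the upload landed where expected, list the target directory (same path as above):
hdfs dfs -ls hdfs://localhost:9000/Spark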
Update the RDD so it is generated from the HDFS path (sc.textFile pointed at a directory reads every file under it):
val inputfile = sc.textFile("hdfs://localhost:9000/Spark")
val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("output")
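Here _ + _ is Scala shorthand for a two-argument anonymous function, so the chain above is equivalent to the more explicit sketch below (assuming the same inputfile RDD):

val counts = inputfile
  .flatMap(line => line.split(" "))   // one record per word
  .map(word => (word, 1))             // pair each word with an initial count of 1
  .reduceByKey((a, b) => a + b)       // sum the counts for each distinct word

Note that saveAsTextFile throws a FileAlreadyExistsException if the output directory already exists, so remove it before re-running: hdfs dfs -rm -r output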
The result is saved on HDFS:
hdfs dfs -ls hdfs://localhost:9000/user/root/output
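The counts themselves can be inspected by catting the part files (names like part-00000 are Spark's default; the exact names may vary):
hdfs dfs -cat hdfs://localhost:9000/user/root/output/part-*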
Experiment on a big HDFS block:
hdfs dfs -cat hdfs://localhost:9000/user/root/titles/part-m-00000
val input = sc.textFile("hdfs://localhost:9000/user/root/titles/part-m-00000")
val counts = input.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("title_output")
The job finishes in a very short time. Check the result saved on HDFS:
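For example, assuming the same /user/root working directory as before:
hdfs dfs -ls hdfs://localhost:9000/user/root/title_output
hdfs dfs -cat hdfs://localhost:9000/user/root/title_output/part-*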
12. Scala