Running a Spark Job
Published: 2019-05-26

1. Start Hadoop (HDFS and YARN)
sh start-dfs.sh
sh start-yarn.sh
2. Start Spark
cd /appl/spark-1.4.0/
sbin/start-all.sh
3. Prepare the data (upload the input file to HDFS)
hadoop fs -put /mk/test/kmeans_data.txt /test/
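Because toDoubleArray in step 4 splits each line on a single space, a blank line, a tab, or a double space in the input makes Double.parseDouble throw and the Spark job fail. A small check such as the following, run on the local file before the hadoop fs -put, can catch that early. It is a hypothetical helper (KMeansDataCheck is not part of the original post) that parses each line the same way toDoubleArray does and verifies that every line has the same number of values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical pre-upload check for a K-means input file (not from the original post).
class KMeansDataCheck {

    // Parses one line exactly like the post's toDoubleArray: split on a single space.
    static double[] parseLine(String s) {
        String[] parts = s.split(" ");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    // Returns the common dimension, or throws describing the first bad line (1-based).
    static int checkDimension(List<String> lines) {
        int dim = -1;
        for (int n = 0; n < lines.size(); n++) {
            double[] v;
            try {
                v = parseLine(lines.get(n));
            } catch (NumberFormatException e) {
                // Blank lines, tabs and repeated spaces all end up here.
                throw new IllegalArgumentException("line " + (n + 1) + ": not single-space-separated numbers", e);
            }
            if (dim == -1) {
                dim = v.length;
            } else if (v.length != dim) {
                throw new IllegalArgumentException(
                    "line " + (n + 1) + ": expected " + dim + " values, got " + v.length);
            }
        }
        if (dim == -1) {
            throw new IllegalArgumentException("file is empty");
        }
        return dim;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the local path used in the post; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "/mk/test/kmeans_data.txt");
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        System.out.println("OK, dimension = " + checkDimension(lines));
    }
}
```

After compiling with javac, it can be run as java KMeansDataCheck /mk/test/kmeans_data.txt; an exception points at the first line that the Spark job would fail to parse.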
4. Write the program

Java

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

/* Test:
 * sh start-dfs.sh
 * sh start-yarn.sh
 * cd /appl/spark-1.4.0/
 * sbin/start-all.sh
 * hadoop fs -put /mk/test/kmeans_data.txt /test/
 * ./bin/spark-submit /mk/test/KMeansSim.jar
 */
public class KMeansSim {
  public static void main(String[] args) {
    // Environment initialization
    SparkConf conf = new SparkConf().setAppName("K-means Example");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load and parse data (same format as ${SPARK_HOME}/data/mllib/kmeans_data.txt)
    String path = "/test/kmeans_data.txt";
    JavaRDD<String> data = sc.textFile(path);
    JavaRDD<Vector> parsedData = data.map(
      new Function<String, Vector>() {
        public Vector call(String s) {
          return Vectors.dense(toDoubleArray(s));
        }
      }
    );
    // K-means makes several passes over the data, so keep it in memory
    parsedData.cache();

    // Cluster the data into two classes using KMeans
    int numClusters = 2;
    int numIterations = 20;
    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    double WSSSE = clusters.computeCost(parsedData.rdd());
    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

    // Save and load the model (saving fails if "myModelPath" already exists, so remove it before re-running)
    clusters.save(sc.sc(), "myModelPath");
    KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");

    // Predict the cluster index of a test point
    System.out.println("~~~predict:" + clusters.predict(Vectors.dense(toDoubleArray("1.0 2.1 3.8"))));

    // Shut down the context
    sc.stop();
  }

  // String to double[]: values must be separated by single spaces
  public static double[] toDoubleArray(String s) {
    String[] sarray = s.split(" ");
    double[] values = new double[sarray.length];
    for (int i = 0; i < sarray.length; i++)
      values[i] = Double.parseDouble(sarray[i]);
    return values;
  }
}
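To make clearer what KMeans.train, computeCost and predict compute, here is a minimal single-machine sketch of Lloyd's k-means in plain Java (no Spark). It is not MLlib's implementation: MLlib distributes the work and, by default, picks initial centers with k-means||, whereas this sketch naively uses the first k points as starting centers. The class LocalKMeansSketch and its methods are hypothetical names. The wssse method mirrors what computeCost reports: the sum, over every point x, of the squared distance from x to its nearest center.

```java
import java.util.Arrays;

// Hypothetical local illustration of k-means; not Spark MLlib's code.
class LocalKMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the nearest center, i.e. the cluster a point is assigned to.
    static int predict(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = squaredDistance(centers[c], p);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Up to maxIterations rounds of assign-then-update; stops early when no center moves.
    static double[][] train(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone(); // naive initialization (assumption): first k points
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: accumulate each point into its nearest center's sum.
            for (double[] p : points) {
                int c = predict(centers, p);
                counts[c]++;
                for (int i = 0; i < dim; i++) sums[c][i] += p[i];
            }
            // Update step: move each non-empty cluster's center to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old center
                for (int i = 0; i < dim; i++) {
                    double mean = sums[c][i] / counts[c];
                    if (mean != centers[c][i]) changed = true;
                    centers[c][i] = mean;
                }
            }
            if (!changed) break;
        }
        return centers;
    }

    // Within Set Sum of Squared Errors over all points.
    static double wssse(double[][] centers, double[][] points) {
        double total = 0.0;
        for (double[] p : points) {
            total += squaredDistance(centers[predict(centers, p)], p);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two well-separated groups of 3-D points.
        double[][] points = {
            {0.0, 0.0, 0.0}, {0.1, 0.1, 0.1}, {0.2, 0.2, 0.2},
            {9.0, 9.0, 9.0}, {9.1, 9.1, 9.1}, {9.2, 9.2, 9.2}
        };
        double[][] centers = train(points, 2, 20);
        System.out.println("centers = " + Arrays.deepToString(centers));
        System.out.println("WSSSE = " + wssse(centers, points));
        System.out.println("predict = " + predict(centers, new double[] {1.0, 2.1, 3.8}));
    }
}
```

Running main prints the two centers, the WSSSE and the cluster index for the same test point 1.0 2.1 3.8 that the Spark program predicts. Because initialization differs, the cluster numbering and, on less well-separated data, the result itself may differ from MLlib's.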

5. Run

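./bin/spark-submit /mk/test/KMeansSim.jar

This assumes the jar's manifest declares KMeansSim as its Main-Class; otherwise add --class KMeansSim before the jar path. Without --master (and without spark.master in conf/spark-defaults.conf), spark-submit runs the job locally as local[*] instead of on the standalone cluster started in step 2; add --master spark://<master-host>:7077 to submit it there.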
