博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
saprk2 structed streaming
阅读量:4565 次
发布时间:2019-06-08

本文共 2415 字,大约阅读时间需要 8 分钟。

netcat (windows) >nc -L -p 9999

import java.sql.Timestampimport org.apache.spark.sql.SparkSessionimport org.apache.spark.sql.functions._/**  */object Test extends App {  val host = "localhost"  val port = 9999  val windowSize = 10  val slideSize = 5  if (slideSize > windowSize) {    System.err.println("
must be less than or equal to
") } val windowDuration = s"$windowSize seconds" val slideDuration = s"$slideSize seconds" val spark = SparkSession .builder .appName("StructuredNetworkWordCountWindowed") .master("local[3]") .config("spark.sql.shuffle.partitions", 3) .getOrCreate() spark.sparkContext.setLogLevel("ERROR") import spark.implicits._ // Create DataFrame representing the stream of input lines from connection to host:port val lines = spark.readStream .format("socket") .option("host", host) .option("port", port) .option("includeTimestamp", true) .load() // Split the lines into words, retaining timestamps val words = lines.as[(String, Timestamp)].flatMap(line => line._1.split(" ").map(word => (word, line._2)) ).toDF("word", "timestamp") // Group the data by window and word and compute the count of each group val windowedCounts = words.groupBy( window($"timestamp", windowDuration, slideDuration), $"word" ).count().orderBy($"window".desc) // Start running the query that prints the windowed word counts to the console val query = windowedCounts.writeStream .outputMode("complete") .format("console") .option("truncate", "false") .start() query.awaitTermination()}

Result:

-------------------------------------------Batch: 1-------------------------------------------+---------------------------------------------+----+-----+|window                                       |word|count|+---------------------------------------------+----+-----+|[2017-10-24 16:09:30.0,2017-10-24 16:09:40.0]|b   |3    ||[2017-10-24 16:09:30.0,2017-10-24 16:09:40.0]|a   |3    ||[2017-10-24 16:09:30.0,2017-10-24 16:09:40.0]|c   |1    ||[2017-10-24 16:09:30.0,2017-10-24 16:09:40.0]|d   |1    ||[2017-10-24 16:06:40.0,2017-10-24 16:06:50.0]|a   |4    ||[2017-10-24 16:06:35.0,2017-10-24 16:06:45.0]|a   |8    ||[2017-10-24 16:06:30.0,2017-10-24 16:06:40.0]|a   |4    |+---------------------------------------------+----+-----+

窗口移动5秒,窗口宽度10秒。

聚合维度: window, {world}

转载于:https://www.cnblogs.com/luweiseu/p/7724034.html

你可能感兴趣的文章
UVALive 3635 Pie(二分法)
查看>>
win系统查看自己电脑IP
查看>>
Backup&recovery备份和还原 mysql
查看>>
全局变量、局部变量、静态全局变量、静态局部变量的区别
查看>>
一道面试题及扩展
查看>>
Unity 3D 我来了
查看>>
setup elk with docker-compose
查看>>
C++ GUI Qt4学习笔记03
查看>>
Java基础回顾 —反射机制
查看>>
c# 前台js 调用后台代码
查看>>
2017-02-20 可编辑div中如何在光标位置添加内容
查看>>
$.ajax()方法详解
查看>>
day42
查看>>
jquery操作select(增加,删除,清空)
查看>>
Sublimetext3安装Emmet插件步骤
查看>>
MySQL配置参数
查看>>
全面理解Java内存模型
查看>>
A - Mike and palindrome
查看>>
DOTween教程
查看>>
java web中java和python混合使用
查看>>