Flume支持众多的source和sink类型,详细手册可参考官方文档,更多source和sink组件

http://flume.apache.org/FlumeUserGuide.html

Flume官网入门指南:


1:Flume的概述和介绍:

(1):Flume是一个分布式、可靠、和高可用的海量日志采集、聚合和传输的系统。
(2):Flume可以采集文件,socket数据包等各种形式源数据,又可以将采集到的数据输出到HDFS、hbase、hive、kafka等众多外部存储系统中
(3):一般的采集需求,通过对flume的简单配置即可实现
(4):Flume针对特殊场景也具备良好的自定义扩展能力,因此,flume可以适用于大部分的日常数据采集场景

2:Flume的运行机制:

(1):Flume分布式系统中最核心的角色是agent,flume采集系统就是由一个个agent所连接起来形成。

(2):每一个agent相当于一个数据传递员,内部有三个组件:
    a):Source:采集源,用于跟数据源对接,以获取数据。主要作用是接受客户端发送的数据,并将数据发送到channel中,source和channel之间的关系是多对多的关系,不过一般情况下使用一个source对应多个channel。通过名称区分不同的source。flume常用的source有:Avro Source,Thrift Source,Exec Source,Kafka Source,Netcat Source。
    b):Sink:下沉地,采集数据的传送目的,用于往下一级agent传递数据或者往最终存储系统传递数据。主要作用就是定义数据写出方式,一般情况下sink从channel中获取数据,然后将数据写出到file,hdfs或者网络上。channel和sink之间的关系是一对多的关系,通过不同的名称来区分sink。Flume常用的sink有:hdfs sink,hive sink,file sink.hbase sink,avro sink,thrift sink,logger sink等等。
    c):Channel:angent内部的数据传输通道,用于从source将数据传递到sink。主要作用是提供一个数据传输通道,提供数据传输和数据存储等功能。source将数据放到channel中,sink从channel中拿数据。通过不同的命令来区分channel。Flume常用的channel有:memory channel,jdbc channel,kafka channel,file channel等等。

注意:Source 到 Channel 到 Sink之间传递数据的形式是Event事件;Event事件是一个数据流单元。

下面介绍单个Agent的fulme数据采集示意图:

多级agent之间串联示意图:

3:Flume的安装部署:

(1)、Flume的安装非常简单,只需要解压即可,当然,前提是已有hadoop环境:
  a):上传安装包到数据源所在节点上,上传过程省略。
  b):然后解压  tar -zxvf apache-flume-1.6.0-bin.tar.gz;

    [root@master package]# tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /home/hadoop/
  c):然后进入flume的目录,修改conf下的flume-env.sh,在里面配置JAVA_HOME;(由于conf目录下面是 flume-env.sh.template,所以我复制一个flume-env.sh,然后进行修改JAVA_HOME)

    [root@master conf]# cp flume-env.sh.template flume-env.sh

    [root@master conf]# vim flume-env.sh

    然后将#注释去掉,加上自己的JAVA_HOME:export JAVA_HOME=/home/hadoop/jdk1.7.0_65
(2)、根据数据采集的需求配置采集方案,描述在配置文件中(文件名可任意自定义);
(3)、指定采集方案配置文件,在相应的节点上启动flume agent;

(4)、可以先用一个最简单的例子来测试一下程序环境是否正常(在flume的conf目录下新建一个文件);

4:部署安装好,可以开始配置采集方案(这里是一个简单的采集方案配置的使用,从网络端口接收数据,然后下沉到logger), 然后需要配置一个文件,这个采集配置文件名称,netcat-logger.conf,采集配置文件netcat-logger.conf的内容如下所示:

 # example.conf: A single-node Flume configuration

 # Name the components on this agent
#定义这个agent中各组件的名字,给那三个组件sources,sinks,channels取个名字,是一个逻辑代号:
#a1是agent的代表。
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source 描述和配置source组件:r1
#类型, 从网络端口接收数据,在本机启动, 所以localhost, type=spoolDir采集目录源,目录里有就采
#type是类型,是采集源的具体实现,这里是接受网络端口的,netcat可以从一个网络端口接受数据的。netcat在linux里的程序就是nc,可以学习一下。
#bind绑定本机localhost。port端口号为44444。 a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = # Describe the sink 描述和配置sink组件:k1
#type,下沉类型,使用logger,将数据打印到屏幕上面。
a1.sinks.k1.type = logger # Use a channel which buffers events in memory 描述和配置channel组件,此处使用是内存缓存的方式
#type类型是内存memory。
#下沉的时候是一批一批的, 下沉的时候是一个个eventChannel参数解释:
#capacity:默认该通道中最大的可以存储的event数量,1000是代表1000条数据。
#trasactionCapacity:每次最大可以从source中拿到或者送到sink中的event数量。
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel 描述和配置source  channel   sink之间的连接关系
#将sources和sinks绑定到channel上面。
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

下面在flume的conf目录下面编辑这个文件netcat-logger.conf:

[root@master conf]# vim netcat-logger.conf

启动agent去采集数据,然后可以进行启动了,启动命令如下所示:

bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1  -Dflume.root.logger=INFO,console

  -c conf   指定flume自身的配置文件所在目录

  -f conf/netcat-logger.con  指定我们所描述的采集方案

  -n a1  指定我们这个agent的名字

 启动命令:
#告诉flum启动一个agent。
#--conf conf指定配置参数,。
#conf/netcat-logger.conf指定采集方案的那个文件(自命名)。
#--name a1:agent的名字,即agent的名字为a1。
#-Dflume.root.logger=INFO,console给log4j传递的参数。
$ bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console

演示如下所示:

启动的信息如下所示,可以启动到前台,也可以启动到后台:

 [root@master apache-flume-1.6.-bin]# bin/flume-ng agent --conf conf --conf-file conf/netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /home/hadoop/apache-flume-1.6.-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.4./bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop-2.4./share/hadoop/common/lib/slf4j-api-1.7..jar from classpath
Info: Excluding /home/hadoop/hadoop-2.4./share/hadoop/common/lib/slf4j-log4j12-1.7..jar from classpath
Info: Including Hive libraries found via () for Hive access
+ exec /home/hadoop/jdk1..0_65/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/home/hadoop/apache-flume-1.6.0-bin/conf:/home/hadoop/apache-flume-1.6.0-bin/lib/*:/home/hadoop/hadoop-2.4.1/etc/hadoop:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-el-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-io-2.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/hadoop-annotations-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/hadoop-auth-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/httpclient-4.2.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/httpcore-4.2.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jersey-json-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jets3t-0.9.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/junit-4.8.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/stax-api-1.0-2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/zookeeper-3.4.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1-tests.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/hadoop-nfs-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/common/jdiff:/home/hadoop/hadoop-2.4.1/share/hadoop/common/lib:/home/hadoop/hadoop-2.4.1/share/hadoop/common/sources:/home/hadoop/hadoop-2.4.1/share/hadoop/common/templates:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-io-2.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/hadoop-hdfs-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/hadoop-hdfs-2.4.1-tests.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/hadoop-hdfs-nfs-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/jdiff:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/sources:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/templates:/home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/webapps:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/activation-1.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-cli-1.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-codec-1.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-io-2.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-lang-2.6.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jackson-jaxrs-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jackson-xc-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jersey-client-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jersey-json-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jettison-1.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jetty-6.1.26.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jline-0.9.94.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/servlet-api-2.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/xz-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/zookeeper-3.4.5.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-api-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-client-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-common-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-common-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-tests-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/sources:/home/hadoop/hadoop-2.4.1/share/hadoop/yarn/test:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/hadoop-annotations-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/hamcrest-core-1.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/junit-4.10.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.1-tests.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib-examples:/home/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/sources:/home/hadoop/hadoop-2.4.1/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/home/hadoop/hadoop-2.4./lib/native org.apache.flume.node.Application --conf-file conf/netcat-logger.conf --name a1
-- ::, (lifecycleSupervisor--) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:)] Configuration provider starting
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:)] Reloading configuration file:conf/netcat-logger.conf
-- ::, (conf-file-poller-) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:)] Added sinks: k1 Agent: a1
-- ::, (conf-file-poller-) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:)] Processing:k1
-- ::, (conf-file-poller-) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:)] Processing:k1
-- ::, (conf-file-poller-) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:)] Post-validation flume configuration contains configuration for agents: [a1]
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:)] Creating channels
-- ::, (conf-file-poller-) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:)] Creating instance of channel c1 type memory
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:)] Created channel c1
-- ::, (conf-file-poller-) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:)] Creating instance of source r1, type netcat
-- ::, (conf-file-poller-) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:)] Creating instance of sink: k1, type: logger
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:)] Channel c1 connected to [r1, k1]
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:)] Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1ce79b8 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:)] Starting Channel c1
-- ::, (lifecycleSupervisor--) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
-- ::, (lifecycleSupervisor--) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:)] Component type: CHANNEL, name: c1 started
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:)] Starting Sink k1
-- ::, (conf-file-poller-) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:)] Starting Source r1
-- ::, (lifecycleSupervisor--) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:)] Source starting
-- ::, (lifecycleSupervisor--) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:]

然后可以向这个端口发送数据,就打印出来了,因为这里输出是在console的:

相当于产生数据的源:[root@master hadoop]# telnet localhost 44444

[root@master hadoop]# telnet localhost 44444
bash: telnet: command not found

我的机器没有安装telnet ,所以先安装一下telnet ,如下所示:

第一步:检测telnet-server的rpm包是否安装 ???
  [root@localhost ~]# rpm -qa telnet-server
  若无输入内容,则表示没有安装。出于安全考虑telnet-server.rpm是默认没有安装的,而telnet的客户端是标配。即下面的软件是默认安装的。

第二步:若未安装,则安装telnet-server:

   [root@localhost ~]#yum install telnet-server

第三步:3、检测telnet的rpm包是否安装 ???
  [root@localhost ~]# rpm -qa telnet
  telnet-0.17-47.el6_3.1.x86_64

第四步:若未安装,则安装telnet:

  [root@localhost ~]# yum install telnet

第五步:重新启动xinetd守护进程???

  由于telnet服务也是由xinetd守护的,所以安装完telnet-server,要启动telnet服务就必须重新启动xinetd
  [root@locahost ~]#service xinetd restart

完成以上步骤以后可以开始你的命令,如我的:

  [root@master hadoop]# telnet localhost 44444
  Trying ::1...
  telnet: connect to address ::1: Connection refused
  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.

解决完上面的错误以后就可以开始测试telnet数据源发送和flume的接受:

测试,先要往agent采集监听的端口上发送数据,让agent有数据可采集,随便在一个能跟agent节点联网的机器上:telnet localhost 44444

然后可以看到flume已经接受到了数据:

如何退出telnet呢???

  首先按ctrl+]退出到telnet > ,然后输入telnet> quit即可退出,记住,quit后面不要加;

5:flume监视文件夹案例:

 监视文件夹

 第一步:
首先 在flume的conf的目录下创建文件名称为:vim spool-logger.conf的文件。
将下面的内容复制到这个文件里面。 # Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
#监听目录,spoolDir指定目录, fileHeader要不要给文件夹前坠名
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flumespool
a1.sources.r1.fileHeader = true # Describe the sink
a1.sinks.k1.type = logger # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1 第二步:根据a1.sources.r1.spoolDir = /home/hadoop/flumespool配置的文件路径,创建相应的目录。必须先创建对应的目录,不然报错。java.lang.IllegalStateException: Directory does not exist: /home/hadoop/flumespool
[root@master conf]# mkdir /home/hadoop/flumespool 第三步:启动命令:
bin/flume-ng agent -c ./conf -f ./conf/spool-logger.conf -n a1 -Dflume.root.logger=INFO,console
[root@master apache-flume-1.6.0-bin]# bin/flume-ng agent --conf conf --conf-file conf/spool-logger.conf --name a1 -Dflume.root.logger=INFO,console
第四步:测试:
往/home/hadoop/flumeSpool放文件(mv ././xxxFile /home/hadoop/flumeSpool),但是不要在里面生成文件。

6:采集目录到HDFS案例:

(1)采集需求:某服务器的某特定目录下,会不断产生新的文件,每当有新文件出现,就需要把文件采集到HDFS中去
(2)根据需求,首先定义以下3大要素
  a):采集源,即source——监控文件目录 :  spooldir
  b):下沉目标,即sink——HDFS文件系统  :  hdfs sink
  c):source和sink之间的传递通道——channel,可用file channel 也可以用内存channel
(3):Channel参数解释:

  capacity:默认该通道中最大的可以存储的event数量;

  trasactionCapacity:每次最大可以从source中拿到或者送到sink中的event数量;

  keep-alive:event添加到通道中或者移出的允许时间;

配置文件编写:

 #定义三大组件的名称
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1 # 配置source组件
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /home/hadoop/logs/
agent1.sources.source1.fileHeader = false #配置拦截器
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname # 配置sink组件
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path =hdfs://master:9000/weblog/flume-collection/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = access_log
agent1.sinks.sink1.hdfs.maxOpenFiles =
agent1.sinks.sink1.hdfs.batchSize=
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat =Text
agent1.sinks.sink1.hdfs.rollSize =
agent1.sinks.sink1.hdfs.rollCount =
agent1.sinks.sink1.hdfs.rollInterval =
#agent1.sinks.sink1.hdfs.round = true
#agent1.sinks.sink1.hdfs.roundValue =
#agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive =
agent1.channels.channel1.capacity =
agent1.channels.channel1.transactionCapacity = # Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

7:采集文件到HDFS案例:

(1):采集需求:比如业务系统使用log4j生成的日志,日志内容不断增加,需要把追加到日志文件中的数据实时采集到hdfs

(2):根据需求,首先定义以下3大要素

  采集源,即source——监控文件内容更新 :  exec  ‘tail -F file’

  下沉目标,即sink——HDFS文件系统  :  hdfs sink

  Source和sink之间的传递通道——channel,可用file channel 也可以用 内存channel

配置文件编写:

 agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1 # Describe/configure tail -F source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /home/hadoop/logs/access_log
agent1.sources.source1.channels = channel1 #configure host for source
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname # Describe sink1
agent1.sinks.sink1.type = hdfs
#a1.sinks.k1.channel = c1
agent1.sinks.sink1.hdfs.path =hdfs://master:9000/weblog/flume-collection/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = access_log
agent1.sinks.sink1.hdfs.maxOpenFiles =
agent1.sinks.sink1.hdfs.batchSize=
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat =Text
agent1.sinks.sink1.hdfs.rollSize =
agent1.sinks.sink1.hdfs.rollCount =
agent1.sinks.sink1.hdfs.rollInterval =
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue =
agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true # Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive =
agent1.channels.channel1.capacity =
agent1.channels.channel1.transactionCapacity = # Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

待续......

日志采集框架Flume以及Flume的安装部署(一个分布式、可靠、和高可用的海量日志采集、聚合和传输的系统)的更多相关文章

  1. Flume的概述和安装部署

    一.Flume概述 Flume是一种分布式.可靠且可用的服务,用于有效的收集.聚合和移动大量日志文件数据.Flume具有基于流数据流的简单灵活的框架,具有可靠的可靠性机制和许多故障转移和恢复机制,具有 ...

  2. Redis Sentinel安装与部署,实现redis的高可用

    前言 对于生产环境,高可用是避免不了要面对的问题,无论什么环境.服务,只要用于生产,就需要满足高可用:此文针对的是redis的高可用. 接下来会有系列文章,该系列是对spring-session实现分 ...

  3. kubeadm安装集群系列-2.Master高可用

    Master高可用安装 VIP负载均衡可以使用haproxy+keepalive实现,云上用户可以使用对应的ULB实现 准备kubeadm-init.yaml文件 apiVersion: kubead ...

  4. .NetCore 分布式日志收集Exceptionless 在Windows下本地安装部署及应用实例

    自己安装时候遇到很多问题,接下来把这些问题写出来希望对大家有所帮助 搭建环境: 1.下载安装 java 8 SDK (不要安装最新的10.0) 并配置好环境变量(环境变量的配置就不做介绍了) 2.下载 ...

  5. Centos7.3安装部署Zabbix3.4.15(成功可用)

    1.Xshell 远程连接到Centos7.3.连接centos 系统后,首先关闭防火墙和SELINUX,如不关闭会各种拦截,网页访问等故障,容易造成蛋疼哦.#systemctl stop firew ...

  6. 在CentOS上安装部署MooseFS分布式文件系统

    参考资料: http://www.moosefs.org/tl_files/manpageszip/moosefs-step-by-step-tutorial-cn-v.1.1.pdf 环境介绍:OS ...

  7. 01_日志采集框架Flume简介及其运行机制

    离线辅助系统概览: 1.概述: 在一个完整的大数据处理系统中,除了hdfs+mapreduce+hive组成分析系统的核心之外,还需要数据采集.结果数据导出. 任务调度等不可或缺的辅助系统,而这些辅助 ...

  8. FlumeNG介绍及安装部署

    本节内容: Flume简介 Flume NG核心组件 Flume部署种类 Flume单机安装 一.Flume简介 Flume是一个分布式.可靠.高可用的海量日志聚合系统,支持在系统中定制各类数据发送方 ...

  9. 日志采集框架Flume

    前言 在一个完整的大数据处理系统中,除了hdfs+mapreduce+hive组成分析系统的核心之外,还需要数据采集.结果数据导出.任务调度等不可或缺的辅助系统,而这些辅助工具在hadoop生态体系中 ...

随机推荐

  1. 手机支持USB功能、驱动文件对应关系

    手机支持USB功能: 1.UMS(USB MASS Stronge) : 连接PC作为存储盘使用 2.ADB : 用于调试 3.MTP :连接PC作为存储盘使用(win XP需要安装WMP10 以上 ...

  2. 单实例Singleton

    单实例Singleton设计模式可能是被讨论和使用的最广泛的一个设计模式了,这可能也是面试中问得最多的一个设计模式了.这个设计模式主要目的是想在 整个系统中只能出现一个类的实例.这样做当然是有必然的, ...

  3. PHP学习心得(二)——实用脚本

    <?php 来表示 PHP 标识符的起始,然后放入 PHP 语句并通过加上一个终止标识符 ?> 来退出 PHP 模式 调用函数phpinfo(),将会看到很多自己系统的信息,以及预定义变量 ...

  4. 【HDOJ】2037 今年暑假不AC

    qsort排序后DP,水题.注意,数组开大点儿,把时间理解为0~23,开太小会wa. #include <stdio.h> #include <stdlib.h> #defin ...

  5. ActiveMQ持久化消息(转)

    ActiveMQ的另一个问题就是只要是软件就有可能挂掉,挂掉不可怕,怕的是挂掉之后把信息给丢了,所以本节分析一下几种持久化方式: 一.持久化为文件 ActiveMQ默认就支持这种方式,只要在发消息时设 ...

  6. dev Gridcontrol单元格值格式化及在模板中调用命令

    <dxg:GridColumn>                    <dxg:GridColumn.EditSettings>                        ...

  7. CareerCup All in One 题目汇总

    Chapter 1. Arrays and Strings 1.1 Unique Characters of a String 1.2 Reverse String 1.3 Permutation S ...

  8. [Win10]安装msi时2503,2502错误及其解决

    简述 刚安装了win10系统,在安装TortoiseGit和TortoiseSvn时,这两个软件是.msi后缀的安装文件,在点击安装时老是提示2503,2502错误,因此无法安装上 搜索了下一般都提到 ...

  9. linux cfs调度器_模型实现

    调度器真实模型的主要成员变量及与抽象模型的对应关系 I.cfs_rq结构体    a) struct sched_entity *curr        指向当前正在执行的可调度实体.调度器的调度单位 ...

  10. MongoDB入门---数据库&amp;&amp;&amp;集合的基本操作

    MongoDB作为一种nosql的数据库,它自己本身的增伤改查还有数据库集合的创建和展示与一般的数据库较之是有一部分差别的.我们今天就来看一下MongoDB的一些基本操作.    首先呢,就是先来数据 ...