File name:
Flume+Solr演示demo.pdf
Development tool:
File size: 5 MB
Downloads: 0
Upload date: 2019-09-01
Description: This mind map introduces the Flume+Solr demo. Please share it so everyone can download it!

tier1.sources.source1.channels=channel1
tier1.channels.channel1.type=memory
tier1.channels.channel1.capacity=10000000
tier1.channels.channel1.transactionCapacity=10000
tier1.channels.channel1.keep-alive=60
tier1.sinks.sink1.type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.morphlineFile=/home/ec2-user/morphline.conf
tier1.sinks.sink1.morphlineId=morphline1
tier1.sources=source1
tier1.channels=channel1
tier1.sinks=sink1
tier1.sources.source1.type=avro
#tier1.sources.source1.type=org.apache.flume.source.http.HTTPSource
tier1.sources.source1.bind=0.0.0.0
tier1.sources.source1.port=45678
#tier1.sources.source1.handler=org.apache.flume.sink.solr.morphline.BlobHandler
#tier1.sources.source1.handler.maxBlobLength=2000000000
#tier1.sources.source1.interceptors=uuidinterceptor
#tier1.sources.source1.interceptors.uuidinterceptor.type=org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
#tier1.sources.source1.interceptors.uuidinterceptor.headerName=id
tier1.sources.source1.channels=channel1
tier1.channels.channel1.type=memory
tier1.channels.channel1.capacity=10000000
tier1.channels.channel1.transactionCapacity=10000
tier1.channels.channel1.keep-alive=60
tier1.sinks.sink1.type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.morphlineFile=/home/ec2-user/morphline.conf
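
The agent definition above only works if every source and sink points at a channel that the agent actually declares. As a sanity check before starting Flume, that wiring can be verified with a small script. The following is a minimal Python sketch and not part of the original demo: `parse_properties` and `check_agent` are hypothetical helper names, the parser only understands plain `key=value` lines, and `SAMPLE` is an abbreviated copy of the configuration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_agent(props, agent="tier1"):
    """Return a list of channel-wiring problems for one Flume agent."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    # A source may list several channels under the plural key "channels".
    for name in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{name}.channels", "").split()
        if not used:
            problems.append(f"source {name} has no channels")
        for ch in used:
            if ch not in channels:
                problems.append(f"source {name} uses undeclared channel {ch}")
    # A sink reads from exactly one channel under the singular key "channel".
    for name in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{name}.channel")
        if ch is None:
            problems.append(f"sink {name} has no channel")
        elif ch not in channels:
            problems.append(f"sink {name} uses undeclared channel {ch}")
    return problems


# Abbreviated copy of the demo configuration.
SAMPLE = """\
tier1.sources=source1
tier1.channels=channel1
tier1.sinks=sink1
tier1.sources.source1.type=avro
tier1.sources.source1.channels=channel1
tier1.channels.channel1.type=memory
tier1.sinks.sink1.channel=channel1
"""
```

Calling `check_agent(parse_properties(SAMPLE))` returns an empty list when the wiring is consistent; each string in a non-empty result names the source or sink that needs attention.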
3. Prepare the morphline configuration file
# Specify server locations in a SOLR_LOCATOR variable; used later in
# variable substitutions:
SOLR_LOCATOR : {
  # Name of solr collection
  collection : collection1

  # ZooKeeper ensemble
  zkHost : "ip-172-31-12-213:2181/solr"
}

# Specify an array of one or more morphlines, each of which defines an ETL
# transformation chain. A morphline consists of one or more potentially
# nested commands. A morphline is a way to consume records such as Flume
# events, HDFS files or blocks, turn them into a stream of records, and pipe
# the stream of records through a set of easily configurable transformations
# on its way to Solr.
morphlines : [
  {
    # Name used to identify a morphline. For example, used if there are
    # multiple morphlines in a morphline config file.
    id : morphline1

    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.cloudera.example.**"]

    commands : [
      {
        readJson {}
      }

      {
        extractJsonPaths {
          flatten : false
          paths : {
            id : /id
            user_name : /user_screen_name
            created_at : /created_at
            text : /text
            text_cn : /text_cn
          }
        }
      }

      # Consume the output record of the previous command and pipe another
      # record downstream.
      #
      # convert timestamp field to native Solr timestamp format
      # such as 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
      {
        convertTimestamp {
          field : created_at
          inputFormats : ["yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd"]
          inputTimezone : America/Los_Angeles
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : UTC
        }
      }

      # Consume the output record of the previous command and pipe another
      # record downstream.
      #
      # This command deletes record fields that are unknown to Solr
      # schema.xml.
      #
      # Recall that Solr throws an exception on any attempt to load a document
      # that contains a field that is not specified in schema.xml.
      {
        sanitizeUnknownSolrFields {
          # Location from which to fetch Solr schema
          solrLocator : ${SOLR_LOCATOR}
        }
      }

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }

      # load the record into a Solr server or MapReduce Reducer
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]
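
To make the data flow of this command chain easier to follow, here is a rough Python analogue of three of its steps: parsing the JSON event body (`readJson`), copying selected JSON values into record fields (`extractJsonPaths`), and dropping fields the schema does not know (`sanitizeUnknownSolrFields`). It is only an illustrative sketch, not how Kite implements these commands. `SCHEMA_FIELDS`, the function names, and the `host` header used in the usage note are assumptions made for the example, and only top-level JSON keys are handled. Timestamp conversion and loading into Solr are left out.

```python
import json

# Record field -> top-level JSON key, mirroring the `paths` block above.
PATHS = {
    "id": "id",
    "user_name": "user_screen_name",
    "created_at": "created_at",
    "text": "text",
    "text_cn": "text_cn",
}

# Assumption: the field names declared in the collection's schema.xml.
SCHEMA_FIELDS = {"id", "user_name", "created_at", "text", "text_cn"}


def extract_paths(event_body, headers=None):
    """Parse a JSON event body and copy the mapped keys into a record."""
    doc = json.loads(event_body)
    record = dict(headers or {})  # fields that were already on the record
    for field, key in PATHS.items():
        if key in doc:
            record[field] = doc[key]
    return record


def sanitize_unknown_fields(record, schema_fields=SCHEMA_FIELDS):
    """Keep only the fields that the schema declares."""
    return {k: v for k, v in record.items() if k in schema_fields}
```

For example, `sanitize_unknown_fields(extract_paths(body, {"host": "h1"}))` keeps `id`, `user_name`, `created_at`, `text` and `text_cn` when they are present in `body`, and drops the `host` header because it is not in `SCHEMA_FIELDS`.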
ec2-user@ip-172-31-12-213:~> cat morphline.conf
# Specify server locations in a SOLR_LOCATOR variable; used later in
# variable substitutions:
SOLR_LOCATOR : {
  # Name of solr collection
  collection : collection1

  # ZooKeeper ensemble
  zkHost : "ip-172-31-12-213:2181/solr"
}

# Specify an array of one or more morphlines, each of which defines an ETL
# transformation chain. A morphline consists of one or more potentially
# nested commands. A morphline is a way to consume records such as Flume events,
# HDFS files or blocks, turn them into a stream of records, and pipe the stream
# of records through a set of easily configurable transformations on its way to
# Solr.
morphlines : [
  {
    # Name used to identify a morphline. For example, used if there are multiple
    # morphlines in a morphline config file.
    id : morphline1

    # Import all morphline commands in these java packages and their subpackages.
    # Other commands that may be present on the classpath are not visible to this
    # morphline.
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.cloudera.example.**"]

    commands : [
      {
        readJson {}
      }

      {
        extractJsonPaths {
          flatten : false
          paths : {
            id : /id
            user_name : /user_screen_name
            created_at : /created_at
            text : /text
            text_cn : /text_cn
          }
        }
      }

      # Consume the output record of the previous command and pipe another
      # record downstream.
      #
      # convert timestamp field to native Solr timestamp format
      # such as 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
      {
        convertTimestamp {
          field : created_at
          inputFormats : ["yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd"]
          inputTimezone : Ameri
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone
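4. Download the Chinese word-segmentation (analyzer) package that matches your CDH version from
https://repository.cloudera.com/artifactory/cdh-releases-rcs/org/apache/lucene/lucene-analyzers-smartcn/
and copy the downloaded JAR to /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib and
/opt/cloudera/parcels/CDH/lib/flume-ng/lib on every machine, then restart Solr and Flume
(this step is skipped during the demo).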
ec2-user@ip-172-31-12-213:/opt/cloudera/parcels/CDH/lib/flume-ng/lib> pwd
/opt/cloudera/parcels/CDH/lib/flume-ng/lib
ec2-user@ip-172-31-12-213:/opt/cloudera/parcels/CDH/lib/flume-ng/lib> ll lucene*
-rwxrwxrwx 1 root root 3602595 4 23 02:20 lucene-analyzers-smartcn-4.10.3-cdh5.6.0.jar
/parcels/CDH/lib/flume-ng/lib> ll /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib/*smartcn*
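
Because the JAR has to land in two library directories on every machine, the copy in step 4 is easy to script. The sketch below is one possible way to do it in Python and is not part of the original demo: `install_jar` is a hypothetical helper, it has to be run on each host separately with sufficient permissions, and restarting Solr and Flume is still a separate step.

```python
import shutil
from pathlib import Path

# The two library directories named in step 4.
LIB_DIRS = [
    "/opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib",
    "/opt/cloudera/parcels/CDH/lib/flume-ng/lib",
]


def install_jar(jar_path, lib_dirs=LIB_DIRS):
    """Copy jar_path into each existing lib directory; return the new paths."""
    jar = Path(jar_path)
    copied = []
    for directory in lib_dirs:
        target_dir = Path(directory)
        if not target_dir.is_dir():
            continue  # this host does not have that component installed
        target = target_dir / jar.name
        shutil.copy2(jar, target)  # copy2 also preserves file metadata
        copied.append(target)
    return copied
```

The returned list shows which directories actually received the JAR, which serves the same purpose as the `ll` checks in the terminal capture above.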
5. Create the schema file according to the data format
<field name="text_cn" type="text_ch" indexed="true" stored="true"/>
</fields>
<types>