Big Data Services Collection

2022-09-01 hadoop+spark

A first look at hadoop+spark

2022-11-11 zookeeper

Added distributed zookeeper to the cluster

2022-11-19 hive

Configuration adapted for CentOS 6.x is also included

2022-11-19 flume

Log collection


2022-09-01 A first look at hadoop and spark [1] hadoop and spark reference material
2022-11-11 Added distributed zookeeper [2] zookeeper reference material
2022-11-19 Added using the hive database together with MariaDB [3] hive reference material

hadoop+spark+zookeeper+hive distributed cluster deployment

1. Environment preparation

The environment preparation is based on an initialization script I wrote. I use the CentOS 7.x series myself; an older version of the script supports CentOS/RedHat 6, 7 and 8 but is a bit rough around the edges. If you need it, leave a message by email or on the blog.

os         ip               hostname  roles
centos7.9  192.168.222.226  master    rsmanager, datanode, namenode, snamenode, nmanager
centos7.9  192.168.222.227  node1     snamenode, nmanager, datanode
centos7.9  192.168.222.228  node2     datanode, nmanager

The code is hosted on a server outside China, so it may be blocked

# git clone https://github.com/linjiangyu2/K.git   //the clone may fail because the hosting server is overseas; just retry a few times
# cd K
# cat README.md //if you are not sure how to use it, read this file to learn what configuration the script asks for
# ./ksh //enter your own configuration step by step; be sure to read README.md before using the script for the first time
# if you sometimes need to change IP addresses and want it to be convenient, just put the ksh binary under /usr/bin so it can be run from anywhere
# mv ksh /usr/bin/ksh
After initializing with ksh, start the configuration

Code hosted on this site; use with confidence

# yum install -y https://mirrors.linjiangyu.com/centos/tianlin-release.noarch-7-1.x86_64.rpm
# yum --enablerepo='tianlin' install -y ksh-5.0-1.x86_64
# cat /usr/share/K/README.md //if you are not sure how to use it, read this file to learn what configuration the script asks for
# ksh //enter your own configuration step by step; be sure to read README.md before using the script for the first time
After initializing with ksh, continue with the configuration below

CDN-hosted copy; use with confidence

# wget https://cdn.staticaly.com/gh/linjiangyu2/K@master/ksh
# chmod +x ./ksh
# ./ksh //enter your own configuration step by step; before the first run be sure to read https://github.com/linjiangyu2/K
# if you sometimes need to change IP addresses and want it to be convenient, just put the ksh binary under /usr/bin so it can be run from anywhere
# mv ksh /usr/bin/ksh
After initializing with ksh, continue with the configuration below

Use your own IP addresses. It is best to keep the hostnames in /etc/hosts the same as mine; otherwise you will need to adjust the configuration files below to match your own hostnames.

[master]# cat > /etc/hosts <<END
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6
192.168.222.226 master
192.168.222.227 node1
192.168.222.228 node2
END
[master]# ssh-keygen -P '' -f ~/.ssh/id_rsa
[master]# for i in master node{1..2};do ssh-copy-id $i;done
[master]# for i in node{1..2};do rsync -av /etc/hosts root@$i:/etc/hosts;done
[master]# for i in master node{1..2};do ssh $i yum install -y openssl-devel;done
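As a quick sanity check (my own addition, not in the original steps), confirm that passwordless SSH and the hosts file reached every node before moving on:

[master]# for i in master node{1..2};do ssh $i hostname;done   //should print master, node1, node2 without any password prompt
[master]# for i in node{1..2};do ssh $i grep -c 192.168.222 /etc/hosts;done   //each node should report the 3 cluster entries (adjust the prefix to your own subnet)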

2. Setup

Distributed hadoop

Upload the jdk and hadoop tar packages

The binary packages are used here

Configuration

# "..." stands for the version I don't know you are using; press Tab to complete the name or edit it to match your package, here and below
[root@ master]# tar xf jdk...
[root@ master]# tar xf hadoop...
[root@ master]# mv hadoop... /opt/hadoop285
[root@ master]# mv jdk... /usr/local/jdk

# vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/opt/hadoop285
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH

# source !$
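A small sanity check of mine (not part of the original steps) that the environment variables took effect:

# java -version      //should print the JDK version from /usr/local/jdk
# hadoop version     //should print the Hadoop version from /opt/hadoop285
# which hadoop       //should resolve to /opt/hadoop285/bin/hadoop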

The following writes the configuration directly into the files; do this on the master server.

# cd /opt/hadoop285/etc/hadoop
# vim hadoop-env.sh //change the line export JAVA_HOME=${JAVA_HOME} in this file to
export JAVA_HOME=/usr/local/jdk
# vim yarn-env.sh //change the commented-out export JAVA_HOME near the top to
export JAVA_HOME=/usr/local/jdk
# vim core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/data/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/data/hdfs/data</value>
</property>
</configuration>
# vim yarn-site.xml 

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
# vim slaves
master
node1
node2

Then push the configuration from the master node to the other nodes.

[master]# for i in node{1..2};do rsync -av /usr/local/jdk root@$i:/usr/local/;done
# for i in node{1..2};do rsync -av /opt/hadoop285 root@$i:/opt/;done
# for i in node{1..2};do rsync -av /etc/profile root@$i:/etc/profile;done
# then on each of the other nodes
[node1,2]# source /etc/profile

Do this on node1 and node2, and finally on master

# hdfs namenode -format   //initialize
# ls -d /opt/data //if this directory exists, the initialization succeeded

Run on master

[root@ master]# start-all.sh

Finally, you can run the jps command on each node to see which daemons it is running

[root@ xxx]# jps

The web UIs are also available: point a browser at 192.168.222.226:8088 and 192.168.222.226:50070 (substitute your own IP address)
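If you prefer the command line, a quick check of mine (adjust the IP to your own master) that both UIs are answering:

# curl -sI http://192.168.222.226:8088 | head -1    //YARN ResourceManager UI, expect an HTTP 200/302 status line
# curl -sI http://192.168.222.226:50070 | head -1   //HDFS NameNode UI (Hadoop 2.x default port)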

Let's try running the first distributed hadoop job

[root@ master]# hdfs dfs -put /etc/passwd /t1
[root@ master]# hadoop jar /opt/hadoop285/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /t1 /output/00
[root@ master]# hdfs dfs -ls /output/00 //list the result files; the output data is in part-r-00000
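To actually read the counted words (my own addition), cat the result file:

[root@ master]# hdfs dfs -cat /output/00/part-r-00000 | head   //shows the first few words and their counts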

Distributed spark

Next, set up distributed spark

Spark version 3.3.0 is used here
# Upload the spark package to the machine and cd into its directory; spark-3.3.0-bin-hadoop3.tgz is used throughout this demo
[root@ master]# tar xf spark-3.3.0-bin-hadoop3.tgz
[root@ master]# mv spark-3.3.0-bin-hadoop3 /opt/spark
[root@ master]# vim /etc/profile
export PATH=/opt/spark/bin:/opt/spark/sbin:$PATH
[root@ master]# cd /opt/spark/conf
[root@ master]# mv spark-env.sh.template spark-env.sh
[root@ master]# vim spark-env.sh
export JAVA_HOME=/usr/local/jdk
export HADOOP_CONF_DIR=/opt/hadoop285/etc/hadoop
export SPARK_MASTER_IP=master #your own master machine's IP or its resolved hostname; if you followed the steps above, just write master
export SPARK_WORKER_MEMORY=1024m
export SPARK_WORKER_CORES=2
export SPARK_EXECUTOR_MEMORY=1024m
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_CORES=1
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://master:9000/spark_logs"
[root@ master]# cp spark-defaults.conf.template spark-defaults.conf
[root@ master]# vim spark-defaults.conf
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/spark_logs
[root@ master]# vim slaves //list your own three hosts' IP addresses or resolved hostnames (note: Spark 3.x normally reads conf/workers, so if conf/slaves is not picked up, put the same list in conf/workers)
master
node1
node2
[root@ master]# cd /opt/spark/sbin
[root@ master]# mv start-all.sh spark-start.sh
[root@ master]# mv stop-all.sh spark-stop.sh
[root@ master]# source /etc/profile
[root@ master]# scp -r /opt/spark root@node1:/opt/
[root@ master]# scp -r /opt/spark root@node2:/opt/
[root@ master]# scp -r /etc/profile root@node1:/etc/
[root@ master]# scp -r /etc/profile root@node2:/etc/
# then run on each worker node
[root@ node1,node2]# source /etc/profile
# back on the master node
[root@ master]# start-all.sh            //starts hadoop (spark's own start-all.sh was renamed above)
[root@ master]# hdfs dfs -mkdir /spark_logs
[root@ master]# spark-start.sh //start the spark cluster
[root@ master]# jps //check the daemons

That completes the distributed spark-on-hadoop cluster. Spark has its own web UI as well; point a browser at 192.168.222.226:8080 to view it (substitute your own IP address)
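To confirm the cluster really schedules work, a quick SparkPi run of mine (the examples jar name depends on your exact Spark build, so adjust it to whatever sits in /opt/spark/examples/jars):

[root@ master]# spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar 10   //should end with a line like "Pi is roughly 3.14..."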

Distributed zookeeper

The binary package is used here

Run on the master machine

# tar xf zookeeper*
# mv zookeeper* /opt/zookeeper
# mv /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
# vim /opt/zookeeper/conf/zoo.cfg
change
dataDir=/opt/data/zookeeper
and add
dataLogDir=/opt/data/zookeeper/logs
server.1=master:2888:3888
server.2=node1:2888:3888
server.3=node2:2888:3888
# use your own hostnames here

Run on every machine

# mkdir -p /opt/data/zookeeper/logs
# echo 1 > /opt/data/zookeeper/myid #master corresponds to server.1 above, so echo 1; node1 corresponds to server.2, so echo 2; node2 corresponds to server.3, so echo 3
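If you would rather set all three myid files from master in one go, here is a small sketch of mine (it assumes the passwordless SSH configured earlier):

[master]# n=1; for i in master node{1..2};do ssh $i "mkdir -p /opt/data/zookeeper/logs && echo $n > /opt/data/zookeeper/myid"; n=$((n+1)); done
[master]# for i in master node{1..2};do ssh $i cat /opt/data/zookeeper/myid;done   #should print 1, 2, 3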

Run on the master machine

# vim /etc/profile
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
# for i in node{1..2};do rsync -av /opt/zookeeper root@$i:/opt/;done
# for i in node{1..2};do rsync -av /etc/profile root@$i:/etc/profile;done

Run on every machine

# source /etc/profile
# zkServer.sh start #best run this on all machines at roughly the same time (e.g. in parallel terminals), because zookeeper is picky about timing and all sorts of other things
# zkServer.sh status # check on each node whether its role is leader or follower
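As an end-to-end check of my own (not in the original write-up), connect from node1 to the zookeeper server on master, create a znode and read it back:

[node1]# zkCli.sh -server master:2181
[zk: master:2181(CONNECTED) 0] create /ktest "hello"
[zk: master:2181(CONNECTED) 1] get /ktest
[zk: master:2181(CONNECTED) 2] quit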

hive

Mariadb

For convenience, mariadb is installed directly and used as MySQL. The steps differ between CentOS 7.x and CentOS 6.x (I wrote the CentOS 6 part for a friend, it nearly brought me to tears). In either case the machine needs access to the internet.

CentOS 7.x

[root@master ~]# yum install -y mariadb mariadb-server
[root@master ~]# systemctl enable mariadb && systemctl start mariadb
[root@master ~]# mysqladmin password abcd1234
[root@master ~]# mysql -uroot -pabcd1234 -e "create user 'root'@'%' identified by 'abcd1234';" -e "grant all privileges on *.* to 'root'@'%';"
[root@master ~]# mysql_secure_installation
Answer the prompts in order: abcd1234, n, y, n, y, y
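Since hive on the other nodes will reach this database over the network, a quick check of mine (assuming the abcd1234 password from above) that remote root login works:

[root@node1 ~]# yum install -y mariadb   #client only, just for the test
[root@node1 ~]# mysql -uroot -pabcd1234 -h master -e "select version();"   #should print the MariaDB version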

CentOS 6.x

[root@master ~]# mkdir /etc/yum.repos.d/bak
[root@master ~]# mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/bak/
[root@master ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo https://gcore.jsdelivr.net/gh/linjiangyu2/halo@master/repo/CentOS-Base.repo && yum install -y epel-release
[root@master ~]# vim /etc/yum.repos.d/mariadb.repo
[mariadb]
name=MariaDB
baseurl=https://mirrors.aliyun.com/mariadb/yum/10.4/centos6-amd64
enabled=1
gpgkey=https://mirrors.aliyun.com/mariadb/yum/RPM-GPG-KEY-MariaDB
gpgcheck=1
[root@master ~]# yum install -y mysql mysql-devel mysql-server
[root@master ~]# service mysql start && chkconfig --add mysql && chkconfig mysql on
[root@master ~]# mysqladmin password abcd1234
[root@master ~]# mysql -uroot -pabcd1234 -e "create user 'root'@'%' identified by 'abcd1234';" -e "grant all privileges on *.* to 'root'@'%';"
[root@master ~]# mysql_secure_installation
Answer the prompts in order: abcd1234, n, y, n, y, y

The hive binary package is used here

Upload the binary package to the /opt directory on the master machine

hive configuration

[root@master ~]# cd /opt
[root@master opt]# tar xf apache-hive-3.1.2-bin.tar.gz
[root@master opt]# mv apache-hive-3.1.2-bin hive
[root@master opt]# cd hive/conf
[root@master conf]# cp -a hive-env.sh.template hive-env.sh
[root@master conf]# vim hive-env.sh
add at the very top, matching the directories to your own
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/opt/hadoop285
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_HEAPSIZE=1024
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HIVE_AUX_JARS_PATH=${HIVE_HOME}/lib
[root@master conf]# vim hive-site.xml	// adjust the values below to your own setup, as indicated by the comments
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!--fill in the hostname your master node resolves to in /etc/hosts; mine is master. If this errors out, something in your setup is likely off, so fall back to the IP address here and in the corresponding entries below-->
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<!--MySQL login user-->
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<!--that user's password-->
<value>abcd1234</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<!--the hostname the second machine resolves to in /etc/hosts-->
<value>node1</value>
</property>
<property>
<name>hive.server3.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server3.thrift.bind.host</name>
<!--the hostname the third machine resolves to in /etc/hosts-->
<value>node2</value>
</property>
</configuration>
[root@master conf]# cp hive-log4j2.properties.template hive-log4j2.properties
[root@master conf]# vim hive-log4j2.properties
change every INFO to ERROR
[root@master conf]# vim /etc/profile
export HIVE_HOME=/opt/hive
export PATH=${HIVE_HOME}/bin:$PATH
[root@master conf]# source /etc/profile

Upload the jar needed to connect to MySQL

mysql-connector-java-8.0.17.jar
[root@master ~]# mv mysql-connector-java-8.0.17.jar /opt/hive/lib/
[root@master ~]# cd /opt/hive/bin
[root@master bin]# ./schematool -initSchema -dbType mysql // initialize the metastore schema
[root@master ~]# mysql -uroot -pabcd1234
mysql> show tables from hive; // if tables are listed, the initialization succeeded

Connection test

Starting hive requires the hadoop and spark services to be started first

[root@master]# start-all.sh && spark-start.sh
# put the service on different nodes to test connecting to the database
[root@master]# scp -r /opt/hive root@node1:/opt/
[root@master]# scp -r /opt/hive root@node2:/opt/
[root@master]# scp /etc/profile root@node1:/etc/
[root@master]# scp /etc/profile root@node2:/etc/
# then on each node run
# source /etc/profile
# back on the master machine
[root@master]# hiveserver2
# open a new terminal and start the metastore service that other nodes can connect to
[root@master ~]# hive --service metastore
# connect from node1 here; it may take a while before port 10000 comes up
[root@node1]# beeline # then follow along step by step
beeline> !connect jdbc:hive2://master:10000
Connecting to jdbc:hive2://master:10000
Enter username for jdbc:hive2://master:10000: root
Enter password for jdbc:hive2://master:10000: ***
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://master:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (1.442 seconds)

Table creation test

On the master machine, prepare the txt file that will be used and upload it to the HDFS filesystem

[root@master ~]# vim t.txt
1,linjiangyu,20
2,lintian,20
3,k,20
[root@master ~]# hdfs dfs -mkdir /t
[root@master ~]# hdfs dfs -put ./t.txt /t/
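A quick look (my own addition) to confirm the file landed in HDFS before querying it through hive:

[root@master ~]# hdfs dfs -cat /t/t.txt   //should print the three CSV lines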

Back on node1

0: jdbc:hive2://master:10000> create database k ;
No rows affected (0.267 seconds)

0: jdbc:hive2://master:10000> use k;
No rows affected (0.078 seconds)

0: jdbc:hive2://master:10000> create table k_user(kid int,kname string,kage int) row format delimited fields terminated by ',' location '/t';
No rows affected (0.558 seconds)

0: jdbc:hive2://master:10000> show tables;
+-----------+
| tab_name |
+-----------+
| k_user |
+-----------+
1 row selected (0.114 seconds)

0: jdbc:hive2://master:10000> select * from k_user;
+-------------+---------------+--------------+
| k_user.kid | k_user.kname | k_user.kage |
+-------------+---------------+--------------+
| 1 | linjiangyu | 20 |
| 2 | lintian | 20 |
| 3 | k | 20 |
+-------------+---------------+--------------+
3 rows selected (3.141 seconds)
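To push the test a little further (my own suggestion, not part of the original transcript), a count over the table should return 3 and a filter on kid = 2 should return lintian:

0: jdbc:hive2://master:10000> select count(*) from k_user;
0: jdbc:hive2://master:10000> select kname from k_user where kid = 2;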

flume

Apparently it is a log-collection application

  • Download apache-flume-bin.tar.gz and upload it to the system

    # tar xf apache-flume-1.11.0-bin.tar.gz && rm -f apache-flume-1.11.0-bin.tar.gz

    # mv apache-flume* /usr/local/flume

    # vim /etc/profile
    export FLUME_HOME=/usr/local/flume
    export PATH=${FLUME_HOME}/bin:$PATH

    # source /etc/profile

    # cd /usr/local/flume/conf/
    # cp flume-env.sh.template flume-env.sh
    # vim flume-env.sh
    // add at the very top
    export JAVA_HOME=/usr/local/jdk

    # vim netcat-logger.conf
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 44444
    a1.sinks.k1.type = logger
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    # flume-ng agent -n a1 -c ./ -f ./netcat-logger.conf -Dflume.root.logger=INFO,console // start the agent

    # yum install -y telnet
    # telnet 127.0.0.1 44444
    // type anything and press Enter; if OK comes back, it works
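    If the agent fails to start, a quick check of mine is to confirm the installation and PATH first (it simply prints the Flume version):

    # flume-ng version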

sqoop

  1. Download the sqoop package and put it under /opt

  2. Unpack and configure sqoop:
    # tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    # mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
    # cd sqoop
    # cp conf/sqoop-env-template.sh conf/sqoop-env.sh
    # vim conf/sqoop-env.sh   // set the following paths
    export HADOOP_COMMON_HOME=/opt/hadoop285

    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/opt/hadoop285

    #set the path to where bin/hbase is available
    #export HBASE_HOME=

    #Set the path to where bin/hive is available
    export HIVE_HOME=/opt/hive

    #Set the path for where zookeper config dir is
    export ZOOCFGDIR=/opt/zookeeper/conf

    # vim /etc/profile
    export SQOOP_HOME=/opt/sqoop
    export CLASSPATH=.:${JAVA_HOME}/lib:${SQOOP_HOME}/lib
    export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin:${SQOOP_HOME}/bin:$PATH
    # source /etc/profile
    # cp /opt/hive/lib/mysql-connector-java-8.0.17.jar /opt/sqoop/lib/
    # vim bin/configure-sqoop	// replace the whole file with the following
    #!/bin/bash
    #
    # Copyright 2011 The Apache Software Foundation
    #
    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # This is sourced in by bin/sqoop to set environment variables prior to
    # invoking Hadoop.

    bin="$1"

    if [ -z "${bin}" ]; then
    bin=`dirname $0`
    bin=`cd ${bin} && pwd`
    fi

    if [ -z "$SQOOP_HOME" ]; then
    export SQOOP_HOME=${bin}/..
    fi

    SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}

    if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
    . "${SQOOP_CONF_DIR}/sqoop-env.sh"
    fi

    # Find paths to our dependency systems. If they are unset, use CDH defaults.

    if [ -z "${HADOOP_COMMON_HOME}" ]; then
    if [ -n "${HADOOP_HOME}" ]; then
    HADOOP_COMMON_HOME=${HADOOP_HOME}
    else
    if [ -d "/usr/lib/hadoop" ]; then
    HADOOP_COMMON_HOME=/usr/lib/hadoop
    else
    HADOOP_COMMON_HOME=${SQOOP_HOME}/../hadoop
    fi
    fi
    fi

    if [ -z "${HADOOP_MAPRED_HOME}" ]; then
    HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
    if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
    if [ -n "${HADOOP_HOME}" ]; then
    HADOOP_MAPRED_HOME=${HADOOP_HOME}
    else
    HADOOP_MAPRED_HOME=${SQOOP_HOME}/../hadoop-mapreduce
    fi
    fi
    fi

    # We are setting HADOOP_HOME to HADOOP_COMMON_HOME if it is not set
    # so that hcat script works correctly on BigTop
    if [ -z "${HADOOP_HOME}" ]; then
    if [ -n "${HADOOP_COMMON_HOME}" ]; then
    HADOOP_HOME=${HADOOP_COMMON_HOME}
    export HADOOP_HOME
    fi
    fi

    if [ -z "${HBASE_HOME}" ]; then
    if [ -d "/usr/lib/hbase" ]; then
    HBASE_HOME=/usr/lib/hbase
    else
    HBASE_HOME=${SQOOP_HOME}/../hbase
    fi
    fi
    if [ -z "${HCAT_HOME}" ]; then
    if [ -d "/usr/lib/hive-hcatalog" ]; then
    HCAT_HOME=/usr/lib/hive-hcatalog
    elif [ -d "/usr/lib/hcatalog" ]; then
    HCAT_HOME=/usr/lib/hcatalog
    else
    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
    if [ ! -d ${HCAT_HOME} ]; then
    HCAT_HOME=${SQOOP_HOME}/../hcatalog
    fi
    fi
    fi
    if [ -z "${ACCUMULO_HOME}" ]; then
    if [ -d "/usr/lib/accumulo" ]; then
    ACCUMULO_HOME=/usr/lib/accumulo
    else
    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
    fi
    fi
    if [ -z "${ZOOKEEPER_HOME}" ]; then
    if [ -d "/usr/lib/zookeeper" ]; then
    ZOOKEEPER_HOME=/usr/lib/zookeeper
    else
    ZOOKEEPER_HOME=${SQOOP_HOME}/../zookeeper
    fi
    fi
    if [ -z "${HIVE_HOME}" ]; then
    if [ -d "/usr/lib/hive" ]; then
    export HIVE_HOME=/usr/lib/hive
    elif [ -d ${SQOOP_HOME}/../hive ]; then
    export HIVE_HOME=${SQOOP_HOME}/../hive
    fi
    fi

    # Check: If we can't find our dependencies, give up here.
    if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
    echo "Error: $HADOOP_COMMON_HOME does not exist!"
    echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
    exit 1
    fi
    if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
    echo "Error: $HADOOP_MAPRED_HOME does not exist!"
    echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
    exit 1
    fi

    ## Moved to be a runtime check in sqoop.
    #if [ ! -d "${HBASE_HOME}" ]; then
    # echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
    # echo 'Please set $HBASE_HOME to the root of your HBase installation.'
    #fi
    #
    ### Moved to be a runtime check in sqoop.
    #if [ ! -d "${HCAT_HOME}" ]; then
    # echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
    # echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
    #fi
    #
    #if [ ! -d "${ACCUMULO_HOME}" ]; then
    # echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
    # echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
    #fi
    #if [ ! -d "${ZOOKEEPER_HOME}" ]; then
    # echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
    # echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
    #fi

    # Where to find the main Sqoop jar
    SQOOP_JAR_DIR=$SQOOP_HOME

    # If there's a "build" subdir, override with this, so we use
    # the newly-compiled copy.
    if [ -d "$SQOOP_JAR_DIR/build" ]; then
    SQOOP_JAR_DIR="${SQOOP_JAR_DIR}/build"
    fi

    function add_to_classpath() {
    dir=$1
    for f in $dir/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
    done

    export SQOOP_CLASSPATH
    }

    # Add sqoop dependencies to classpath.
    SQOOP_CLASSPATH=""
    if [ -d "$SQOOP_HOME/lib" ]; then
    add_to_classpath $SQOOP_HOME/lib
    fi

    # Add HBase to dependency list
    #if [ -e "$HBASE_HOME/bin/hbase" ]; then
    # TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`$HBASE_HOME/bin/hbase classpath`
    # SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
    #fi
    #
    ## Add HCatalog to dependency list
    #if [ -e "${HCAT_HOME}/bin/hcat" ]; then
    # TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
    # if [ -z "${HIVE_CONF_DIR}" ]; then
    # TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
    # fi
    # SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
    #fi

    # Add Accumulo to dependency list
    if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
    for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
    done
    for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
    done
    fi

    ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}
    if [ -d "${ZOOCFGDIR}" ]; then
    SQOOP_CLASSPATH=$ZOOCFGDIR:$SQOOP_CLASSPATH
    fi

    SQOOP_CLASSPATH=${SQOOP_CONF_DIR}:${SQOOP_CLASSPATH}

    # If there's a build subdir, use Ivy-retrieved dependencies too.
    if [ -d "$SQOOP_HOME/build/ivy/lib/sqoop" ]; then
    for f in $SQOOP_HOME/build/ivy/lib/sqoop/*/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
    done
    fi

    add_to_classpath ${SQOOP_JAR_DIR}

    HADOOP_CLASSPATH="${SQOOP_CLASSPATH}:${HADOOP_CLASSPATH}"
    if [ ! -z "$SQOOP_USER_CLASSPATH" ]; then
    # User has elements to prepend to the classpath, forcibly overriding
    # Sqoop's own lib directories.
    export HADOOP_CLASSPATH="${SQOOP_USER_CLASSPATH}:${HADOOP_CLASSPATH}"
    fi

    export SQOOP_CLASSPATH
    export SQOOP_CONF_DIR
    export SQOOP_JAR_DIR
    export HADOOP_CLASSPATH
    export HADOOP_COMMON_HOME
    export HADOOP_MAPRED_HOME
    export HBASE_HOME
    export HCAT_HOME
    export HIVE_CONF_DIR
    export ACCUMULO_HOME
    export ZOOKEEPER_HOME
    # mysql -uroot -p123
    mysql> create user 'root'@'127.0.0.1' identified by '123';
    mysql> grant all privileges on *.* to 'root'@'127.0.0.1';
    mysql> flush privileges;
    mysql> exit
  3. Test the connection

    # sqoop list-databases -connect jdbc:mysql://localhost:3306/ --username root --password 123
    output:
    23/05/20 23:14:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    23/05/20 23:14:23 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    23/05/20 23:14:23 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
    mysql
    information_schema
    performance_schema
    sys
    hive
    rsyslog
    Syslo