

Setting Up a Single-Node Hadoop and Spark Environment on Ubuntu 18.04

Hadoop plays a pivotal role in the big data technology stack. It is both the foundation and the entry point, and how well you master the Hadoop basics goes a long way toward determining how far you can go with big data technologies.

I recently wanted to learn Spark, which first means setting up a Spark environment. Spark has several dependencies, including the Java JDK and Hadoop, so this article installs and configures each dependency step by step. I did this on a freshly installed Linux Ubuntu 18.04 system and recorded the whole process in detail.

Reading the installation instructions on the Spark website shows that Spark relies on Hadoop, the Java JDK, and so on (a "Hadoop free" build is also offered). This article starts with the Java JDK and works step by step toward a single-node Spark installation.


1. Installing Java JDK 8

Download JDK 8 from the Oracle website, choosing the build for your operating system; here we use the Linux x64 build:

          https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html


Put the downloaded archive into a directory of your choice; here we use /opt/java:

          linuxidc@linuxidc:~/www.linuxidc.com$ sudo cp /home/linuxidc/www.linuxidc.com/jdk-8u231-linux-x64.tar.gz /opt/java/
[sudo] password for linuxidc:
          linuxidc@linuxidc:~/www.linuxidc.com$ cd /opt/java/
          linuxidc@linuxidc:/opt/java$ ls
          jdk-8u231-linux-x64.tar.gz

Extract the archive with: tar -zxvf jdk-8u231-linux-x64.tar.gz

          linuxidc@linuxidc:/opt/java$ sudo tar -zxf jdk-8u231-linux-x64.tar.gz
          linuxidc@linuxidc:/opt/java$ ls
          jdk1.8.0_231  jdk-8u231-linux-x64.tar.gz


Edit the configuration file /etc/profile with: sudo nano /etc/profile

          linuxidc@linuxidc:/opt/java$ sudo nano /etc/profile

Append the following lines at the end of the file (adjust the paths to your own environment):

          export JAVA_HOME=/opt/java/jdk1.8.0_231
          export JRE_HOME=/opt/java/jdk1.8.0_231/jre
          export PATH=${JAVA_HOME}/bin:$PATH


Save and exit, then run source /etc/profile in the terminal so the changes take effect.

          linuxidc@linuxidc:/opt/java$ source /etc/profile

Verify the installation with java -version; output like the following means it succeeded.

          linuxidc@linuxidc:/opt/java$ java -version
java version "1.8.0_231"
          Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
          Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)
          linuxidc@linuxidc:/opt/java$


2. Installing Hadoop

Download Hadoop from https://hadoop.apache.org/releases.html; here we use version 2.7.7:

          http://www.apache.org/dist/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz

Hadoop needs passwordless SSH login and related functionality, so install ssh and rsync first.

Run:

          linuxidc@linuxidc:~/www.linuxidc.com$ sudo apt-get install ssh


          linuxidc@linuxidc:~/www.linuxidc.com$ sudo apt-get install rsync


Put the downloaded archive into a directory; here we use /opt/hadoop:

          linuxidc@linuxidc:~/www.linuxidc.com$ sudo cp /home/linuxidc/www.linuxidc.com/hadoop-2.7.7.tar.gz /opt/hadoop/


Extract it with: tar -zxvf hadoop-2.7.7.tar.gz, as sketched below.
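A minimal sketch of this step, mirroring the JDK extraction above (the original does not show the terminal output here):

cd /opt/hadoop
sudo tar -zxf hadoop-2.7.7.tar.gz
ls
# expected to list: hadoop-2.7.7  hadoop-2.7.7.tar.gz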

Here we choose the pseudo-distributed installation mode (Pseudo-Distributed).

In the extracted directory, edit etc/hadoop/hadoop-env.sh and set JAVA_HOME to this machine's JAVA_HOME path.
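With the JDK installed under /opt/java as above, that line in hadoop-env.sh would read (adjust to your own path):

export JAVA_HOME=/opt/java/jdk1.8.0_231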


Configure Hadoop's environment variables.

Run:

          linuxidc@linuxidc:/opt/hadoop/hadoop-2.7.7/etc/hadoop$ sudo nano /etc/profile

Add the following line:

          export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7

Then modify the PATH variable to include Hadoop's bin directory:

          export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH


In the extracted directory, edit etc/hadoop/core-site.xml:

          linuxidc@linuxidc:/opt/hadoop/hadoop-2.7.7/etc/hadoop$ sudo nano core-site.xml

          <configuration>
              <property>
                  <name>fs.defaultFS</name>
                  <value>hdfs://localhost:9000</value>
              </property>
          </configuration>


In the extracted directory, edit etc/hadoop/hdfs-site.xml:

          linuxidc@linuxidc:/opt/hadoop/hadoop-2.7.7/etc/hadoop$ sudo nano hdfs-site.xml

          <configuration>
              <property>
                  <name>dfs.replication</name>
                  <value>1</value>
              </property>
          </configuration>


Set up passwordless SSH login:

linuxidc@linuxidc:~/www.linuxidc.com$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/linuxidc/.ssh/id_rsa.
Your public key has been saved in /home/linuxidc/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:zY+ELQc3sPXwTBRfKlTwntek6TWVsuQziHtu3N/6L5w linuxidc@linuxidc
The key's randomart image is:
+---[RSA 2048]----+
|        . o.*+. .|
|        + B o o.|
|        o o =o+.o|
|        B..+oo=o|
|        S.*. ==.+|
|        +.o .oo.|
|        .o.o... |
|          oo .E .|
|          ..  o==|
+----[SHA256]-----+
          linuxidc@linuxidc:~/www.linuxidc.com$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
          linuxidc@linuxidc:~/www.linuxidc.com$ chmod 0600 ~/.ssh/authorized_keys


Verify with: ssh localhost. If you can log in without entering a password, it worked.

          linuxidc@linuxidc:~/www.linuxidc.com$ ssh localhost
          Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.4.0-999-generic x86_64)

           * Documentation:  https://help.ubuntu.com
           * Management:    https://landscape.canonical.com
           * Support:        https://ubuntu.com/advantage

           * Canonical Livepatch is available for installation.
            – Reduce system reboots and improve kernel security. Activate at:
              https://ubuntu.com/livepatch

188 packages can be updated.
0 updates are security updates.

          Your Hardware Enablement Stack (HWE) is supported until April 2023.
          Last login: Sat Nov 30 23:25:35 2019 from 127.0.0.1


Next, verify the Hadoop installation.

a. Format the file system

          linuxidc@linuxidc:/opt/hadoop/hadoop-2.7.7$ bin/hdfs namenode -format
          19/11/30 23:29:06 INFO namenode.NameNode: STARTUP_MSG:
          /************************************************************
          STARTUP_MSG: Starting NameNode
          STARTUP_MSG:  host = linuxidc/127.0.1.1
          STARTUP_MSG:  args = [-format]
          STARTUP_MSG:  version = 2.7.7
...


b. Start the NameNode and DataNode

          linuxidc@linuxidc:/opt/hadoop/hadoop-2.7.7$ sbin/start-dfs.sh
          Starting namenodes on [localhost]
          localhost: starting namenode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-linuxidc-namenode-linuxidc.out
          localhost: starting datanode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-linuxidc-datanode-linuxidc.out
          Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:OSXsQK3E9ReBQ8c5to2wvpcS6UGrP8tQki0IInUXcG0.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
          0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.7.7/logs/hadoop-linuxidc-secondarynamenode-linuxidc.out


c. Visit http://localhost:50070 in a browser
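As an optional sanity check (not part of the original walkthrough; these commands follow the standard Hadoop single-node setup guide, and the /user/linuxidc directory name is just an example), you can create a home directory in HDFS and copy some files into it from /opt/hadoop/hadoop-2.7.7:

bin/hdfs dfs -mkdir -p /user/linuxidc                # create an HDFS home directory
bin/hdfs dfs -put etc/hadoop /user/linuxidc/input    # upload the config files as sample data
bin/hdfs dfs -ls /user/linuxidc/input                # list them back to confirm HDFS is working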


3. Installing Scala

          下載地址:https://www.scala-lang.org/download/2.11.8.html


After downloading, extract it to /opt/scala:

          linuxidc@linuxidc:~/下載$ sudo tar zxf scala-2.11.8.tgz -C /opt/scala
[sudo] password for linuxidc:
          linuxidc@linuxidc:~/下載$ cd /opt/scala
          linuxidc@linuxidc:/opt/scala$ ls
          scala-2.11.8


Configure the environment variable:

          linuxidc@linuxidc:/opt/scala$ sudo nano /etc/profile

Add:

          export SCALA_HOME=/opt/scala/scala-2.11.8


          source /etc/profile
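The original only exports SCALA_HOME. If you also want the scala command available on the command line (an optional addition, not part of the original steps), you could add its bin directory to PATH and check the version:

export PATH=${SCALA_HOME}/bin:$PATH
scala -version    # should report version 2.11.8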

4. Installing Spark

Download Spark from the Spark website:

          https://spark.apache.org/downloads.html

Here we use the following build:

          spark-2.4.4-bin-hadoop2.7

Put the Spark archive into a directory; here we use /opt/spark.

Extract it with: tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz

          linuxidc@linuxidc:~/www.linuxidc.com$ sudo cp /home/linuxidc/www.linuxidc.com/spark-2.4.4-bin-hadoop2.7.tgz /opt/spark/
[sudo] password for linuxidc:
          linuxidc@linuxidc:~/www.linuxidc.com$ cd /opt/spark/
          linuxidc@linuxidc:/opt/spark$ ls
          spark-2.4.4-bin-hadoop2.7.tgz


          linuxidc@linuxidc:/opt/spark$ sudo tar -zxf spark-2.4.4-bin-hadoop2.7.tgz
[sudo] password for linuxidc:
          linuxidc@linuxidc:/opt/spark$ ls
          spark-2.4.4-bin-hadoop2.7  spark-2.4.4-bin-hadoop2.7.tgz


Test the Spark installation with: ./bin/run-example SparkPi 10
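If the installation works, the example's output should include a line of roughly this form (the exact digits vary from run to run, since the example estimates Pi by random sampling):

Pi is roughly 3.14...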

Configure the SPARK_HOME environment variable:

          linuxidc@linuxidc:/opt/spark/spark-2.4.4-bin-hadoop2.7$ sudo nano /etc/profile

          export SPARK_HOME=/opt/spark/spark-2.4.4-bin-hadoop2.7
          export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SPARK_HOME}/bin:$PATH


          source /etc/profile

Configure spark-env.sh.

Go into spark/conf/ and copy the template:

          sudo cp /opt/spark/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /opt/spark/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh

          linuxidc@linuxidc:/opt/spark/spark-2.4.4-bin-hadoop2.7/conf$ sudo nano spark-env.sh

          export JAVA_HOME=/opt/java/jdk1.8.0_231
          export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
          export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.7.7/etc/hadoop
          export SPARK_HOME=/opt/spark/spark-2.4.4-bin-hadoop2.7
          export SCALA_HOME=/opt/scala/scala-2.11.8
          export SPARK_MASTER_IP=127.0.0.1
          export SPARK_MASTER_PORT=7077
          export SPARK_MASTER_WEBUI_PORT=8099
          export SPARK_WORKER_CORES=3
          export SPARK_WORKER_INSTANCES=1
          export SPARK_WORKER_MEMORY=5G
          export SPARK_WORKER_WEBUI_PORT=8081
          export SPARK_EXECUTOR_CORES=1
          export SPARK_EXECUTOR_MEMORY=1G
          export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HADOOP_HOME/lib/native


Set the Java, Hadoop, and other paths according to your own environment.

Start spark-shell from the bin directory.
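From the Spark root directory (the original illustrates this step with a screenshot), the command is simply:

./bin/spark-shell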


You can see that we are now in the Scala REPL and can start writing code.

The spark-shell web UI is at http://127.0.0.1:4040


That's it for now. If you have any questions, please raise them in the Linux公社 comments section below.
