Cloudera CCA175 시험

CCA Spark and Hadoop Developer Exam 온라인 연습

최종 업데이트 시간: 2025년07월02일

당신은 온라인 연습 문제를 통해 Cloudera CCA175 시험지식에 대해 자신이 어떻게 알고 있는지 파악한 후 시험 참가 신청 여부를 결정할 수 있다.

시험을 100% 합격하고 시험 준비 시간을 35% 절약하기를 바라며 CCA175 덤프 (최신 실제 시험 문제)를 사용 선택하여 현재 최신 96개의 시험 문제와 답을 포함하십시오.

/ 2

Question No : 1

Problem Scenario 30: You have been given three csv files in hdfs as below.
EmployeeName.csv with the field (id, name)
EmployeeManager.csv (id, manager Name)
EmployeeSalary.csv (id, Salary)
Using Spark and its API you have to generate a joined output as below and save as a text tile (Separated by comma) for final distribution and output must be sorted by id.
ld, name, salary, managerName
EmployeeManager.csv
E01, Vishnu
E02, Satyam
E03, Shiv
E04, Sundar
E05, John
E06, Pallavi
E07, Tanvir
E08, Shekhar
E09, Vinod
E10, Jitendra
EmployeeName.csv
E01, Lokesh
E02, Bhupesh
E03, Amit
E04, Ratan
E05, Dinesh
E06, Pavan
E07, Tejas
E08, Sheela
E09, Kumar
E10, Venkat
EmployeeSalary.csv
E01, 50000
E02, 50000
E03, 45000
E04, 45000
E05, 50000
E06, 45000
E07, 50000
E08, 10000
E09, 10000
E10, 10000

정답: Solution:
Step 1: Create all three files in hdfs in directory called sparkl (We will do using Hue}. However, you can first create in local filesystem and then
Step 2: Load EmployeeManager.csv file from hdfs and create PairRDDs
val manager = sc.textFile("spark1/EmployeeManager.csv")
val managerPairRDD = manager.map(x=> (x.split(", ")(0), x.split(", ")(1)))
Step 3: Load EmployeeName.csv file from hdfs and create PairRDDs
val name = sc.textFile("spark1/EmployeeName.csv")
val namePairRDD = name.map(x=> (x.split(", ")(0), x.split('\")(1)))
Step 4: Load EmployeeSalary.csv file from hdfs and create PairRDDs
val salary = sc.textFile("spark1/EmployeeSalary.csv")
val salaryPairRDD = salary.map(x=> (x.split(", ")(0), x.split(", ")(1)))
Step 4: Join all pairRDDS
val joined = namePairRDD.join(salaryPairRDD}.join(managerPairRDD}
Step 5: Now sort the joined results, val joinedData = joined.sortByKey()
Step 6: Now generate comma separated data.
val finalData = joinedData.map(v=> (v._1, v._2._1._1, v._2._1._2, v._2._2))
Step 7: Save this output in hdfs as text file.
finalData.saveAsTextFile("spark1/result.txt")

Question No : 2

Create a file in local filesystem named file1.txt and put it to hdfs

정답: Solution:
Step 1: Create directory
hdfs dfs -mkdir hdfs_commands
Step 2: Create a file in hdfs named data.txt in hdfs_commands. hdfs dfs -touchz hdfs_commands/data.txt
Step 3: Now copy this data.txt file on local filesystem, however while copying file please make sure file properties are not changed e.g. file permissions.
hdfs dfs -copyToLocal -p hdfs_commands/data.txt/home/cloudera/Desktop/HadoopExam
Step 4: Now create a file in local directory named data_local.txt and move this file to hdfs in hdfs_commands directory.
touch data_local.txt
hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/dataJocal.txt hdfs_commands/
Step 5: Create a file data_hdfs.txt in hdfs_commands directory and copy it to local file system.
hdfs dfs -touchz hdfscommands/data hdfs.txt
hdfs dfs -getfrdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam/
Step 6: Create a file in local filesystem named filel .txt and put it to hdfs
touch filel.txt
hdfs dfs -put/home/cloudera/Desktop/HadoopExam/file1.txt hdfs_commands/

Question No : 3

Data should be written as text to hdfs

정답: Solution:
Step 1: Create directory mkdir /tmp/spooldir2
Step 2: Create flume configuration file, with below configuration for source, sink and channel and save it in flume8.conf.
agent1 .sources = source1
agent1.sinks = sink1a sink1b agent1.channels = channel1a channel1b
agent1.sources.source1.channels = channel1a channel1b
agent1.sources.source1.selector.type = replicating
agent1.sources.source1.selector.optional = channel1b
agent1.sinks.sink1a.channel = channel1a
agent1 .sinks.sink1b.channel = channel1b
agent1.sources.source1.type = spooldir
agent1 .sources.sourcel.spoolDir = /tmp/spooldir2
agent1.sinks.sink1a.type = hdfs
agent1 .sinks, sink1a.hdfs. path = /tmp/flume/primary
agent1 .sinks.sink1a.hdfs.tilePrefix = events
agent1 .sinks.sink1a.hdfs.fileSuffix = .log
agent1 .sinks.sink1a.hdfs.fileType = Data Stream
agent1 . sinks.sink1b.type = hdfs
agent1 . sinks.sink1b.hdfs.path = /tmp/flume/secondary
agent1 .sinks.sink1b.hdfs.filePrefix = events
agent1.sinks.sink1b.hdfs.fileSuffix = .log
agent1 .sinks.sink1b.hdfs.fileType = Data Stream
agent1.channels.channel1a.type = file
agent1.channels.channel1b.type = memory
step 4: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume8.conf --name age
Step 5: Open another terminal and create a file in /tmp/spooldir2/
echo "IBM, 100, 20160104" » /tmp/spooldir2/.bb.txt
echo "IBM, 103, 20160105" » /tmp/spooldir2/.bb.txt mv /tmp/spooldir2/.bb.txt /tmp/spooldir2/bb.txt
After few mins
echo "IBM.100.2, 20160104" »/tmp/spooldir2/.dr.txt
echo "IBM, 103.1, 20160105" » /tmp/spooldir2/.dr.txt mv /tmp/spooldir2/.dr.txt /tmp/spooldir2/dr.txt

Question No : 4

Data should be written as text to hdfs

정답: Solution:
Step 1: Create directory mkdir /tmp/spooldir/bb mkdir /tmp/spooldir/dr
Step 2: Create flume configuration file, with below configuration for
agent1.sources = source1 source2
agent1 .sinks = sink1
agent1.channels = channel1
agent1 .sources.source1.channels = channel1
agentl .sources.source2.channels = channell agent1 .sinks.sinkl.channel = channell
agent1 . sources.source1.type = spooldir
agent1 .sources.sourcel.spoolDir = /tmp/spooldir/bb
agent1 . sources.source2.type = spooldir
agent1 .sources.source2.spoolDir = /tmp/spooldir/dr
agent1 . sinks.sink1.type = hdfs
agent1 .sinks.sink1.hdfs.path = /tmp/flume/finance
agent1-sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1 .sinks.sink1.hdfs.inUsePrefix = _
agent1 .sinks.sink1.hdfs.fileType = Data Stream
agent1.channels.channel1.type = file
Step 4: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/fIumeconf/fIume7.conf --name agent1
Step 5: Open another terminal and create a file in /tmp/spooldir/
echo "IBM, 100, 20160104" » /tmp/spooldir/bb/.bb.txt
echo "IBM, 103, 20160105" » /tmp/spooldir/bb/.bb.txt mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After few mins
echo "IBM, 100.2, 20160104" » /tmp/spooldir/dr/.dr.txt
echo "IBM, 103.1, 20160105" »/tmp/spooldir/dr/.dr.txt mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt

Question No : 5

Data should be written as text to hdfs

정답: Solution:
Step 1: Create directory mkdir /tmp/nrtcontent
Step 2: Create flume configuration file, with below configuration for source, sink and channel and save it in flume6.conf.
agent1 .sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
agent1 .sources.source1.channels = channel1
agent1 .sinks.sink1.channel = channel1
agent1 . sources.source1.type = spooldir
agent1 .sources.source1.spoolDir = /tmp/nrtcontent
agent1 .sinks.sink1 .type = hdfs
agent1 . sinks.sink1.hdfs .path = /tmp/flume
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1 .sinks.sink1.hdfs.inUsePrefix = _
agent1 .sinks.sink1.hdfs.fileType = Data Stream
Step 4: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/fIumeconf/fIume6.conf --name agent1
Step 5: Open another terminal and create a file in /tmp/nrtcontent
echo "I am preparing for CCA175 from ABCTech m.com " > /tmp/nrtcontent/.he1.txt
mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt
After few mins
echo "I am preparing for CCA175 from TopTech .com " > /tmp/nrtcontent/.qt1.txt
mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt

Question No : 6

Problem Scenario 25: You have been given below comma separated employee information. That needs to be added in /home/cloudera/flumetest/in.txt file (to do tail source)
sex, name, city
1, alok, mumbai
1, jatin, chennai
1, yogesh, kolkata
2, ragini, delhi
2, jyotsana, pune
1, valmiki, banglore
Create a flume conf file using fastest non-durable channel, which write data in hive warehouse directory, in two separate tables called flumemaleemployee1 and flumefemaleemployee1
(Create hive table as well for given data}. Please use tail source with /home/cloudera/flumetest/in.txt file.
Flumemaleemployee1: will contain only male employees data flumefemaleemployee1: Will contain only woman employees data

정답: Solution:
Step 1: Create hive table for flumemaleemployeel and .'
CREATE TABLE flumemaleemployeel
(
sex_type int, name string, city string )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', ';
CREATE TABLE flumefemaleemployeel
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', ';
Step 2: Create below directory and file mkdir /home/cloudera/flumetest/ cd /home/cloudera/flumetest/
Step 3: Create flume configuration file, with below configuration for source, sink and channel and save it in flume5.conf.
agent.sources = tailsrc
agent.channels = mem1 mem2
agent.sinks = stdl std2
agent.sources.tailsrc.type = exec
agent.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt
agent.sources.tailsrc.batchSize = 1
agent.sources.tailsrc.interceptors = i1 agent.sources.tailsrc.interceptors.i1.type = regex_extractor agent.sources.tailsrc.interceptors.il.regex = A(\\d} agent.sources.tailsrc. interceptors. M.serializers = t1 agent.sources.tailsrc. interceptors, i1.serializers.t1 . name = type
agent.sources.tailsrc.selector.type = multiplexing agent.sources.tailsrc.selector.header = type agent.sources.tailsrc.selector.mapping.1 = memi agent.sources.tailsrc.selector.mapping.2 = mem2
agent.sinks.std1.type = hdfs
agent.sinks.stdl.channel = mem1
agent.sinks.stdl.batchSize = 1
agent.sinks.std1 .hdfs.path = /user/hive/warehouse/flumemaleemployeei
agent.sinks.stdl.rolllnterval = 0
agent.sinks.stdl.hdfs.tileType = Data Stream
agent.sinks.std2.type = hdfs
agent.sinks.std2.channel = mem2
agent.sinks.std2.batchSize = 1
agent.sinks.std2 .hdfs.path = /user/hi ve/warehouse/fIumefemaleemployee1
agent.sinks.std2.rolllnterval = 0
agent.sinks.std2.hdfs.tileType = Data Stream
agent.channels.mem1.type = memory agent.channels.meml.capacity = 100
agent.channels.mem2.type = memory agent.channels.mem2.capacity = 100
agent.sources.tailsrc.channels = mem1 mem2
Step 4: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/fIumeconf/flume5.conf --name agent
Step 5: Open another terminal create a file at /home/cloudera/flumetest/in.txt.
Step 6: Enter below data in file and save it.
l.alok.mumbai
1 jatin.chennai
1, yogesh, kolkata
2, ragini, delhi
2, jyotsana, pune
1, valmiki, banglore
Step 7: Open hue and check the data is available in hive table or not.
Step 8: Stop flume service by pressing ctrl+c

Question No : 7

While importing, make sure only male employee data is stored.

정답: Step 1: Create hive table for flumeemployee.'
CREATE TABLE flumemaleemployee
(
name string,
salary int,
sex string,
age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', ';
step 2: Create flume configuration file, with below configuration for source, sink and channel and save it in flume4.conf.
#Define source, sink, channel and agent.
agent1 .sources = source1
agent1 .sinks = sink1
agent1 .channels = channel1
# Describe/configure source1
agent1 . sources.source1.type = netcat
agent1 .sources.source1.bind = 127.0.0.1
agent1.sources.sourcel.port = 44444
#Define interceptors
agent1.sources.source1.interceptors=il
agent1 .sources.source1.interceptors.i1.type=regex_filter
agent1 .sources.source1.interceptors.i1.regex=female
agent1 .sources.source1.interceptors.i1.excludeEvents=true
## Describe sink1
agent1 .sinks, sinkl.channel = memory-channel
agent1.sinks.sink1.type = hdfs
agent1 .sinks, sinkl. hdfs. path = /user/hive/warehouse/flumemaleemployee
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
agentl .sinks.sink1.hdfs.fileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channell.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1 .sources.source1.channels = channel1
agent1 .sinks.sink1.channel = channel1
step 3: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume4.conf --name agentl
Step 4: Open another terminal and use the netcat service, nc localhost 44444
Step 5: Enter data line by line.
alok, 100000, male, 29
jatin, 105000, male, 32
yogesh, 134000, male, 39
ragini, 112000, female, 35
jyotsana, 129000, female, 39
valmiki.123000.male.29
Step 6: Open hue and check the data is available in hive table or not.
Step 7: Stop flume service by pressing ctrl+c
Step 8: Calculate average salary on hive table using below query. You can use either hive command line tool or hue. select avg(salary) from flumeemployee;

Question No : 8

Problem Scenario 23: You have been given log generating service as below.
Start_logs (It will generate continuous logs)
Tail_logs (You can check, what logs are being generated)
Stop_logs (It will stop the log service)
Path where logs are generated using above service: /opt/gen_logs/logs/access.log
Now write a flume configuration file named flume3.conf, using that configuration file dumps logs in HDFS file system in a directory called flumeflume3/%Y/%m/%d/%H/%M
Means every minute new directory should be created). Please us the interceptors to provide timestamp information, if message header does not have header info.
And also note that you have to preserve existing timestamp, if message contains it. Flume channel should have following property as well. After every 100 message it should be committed, use non-durable/faster channel and it should be able to hold maximum 1000 events.

정답: Solution:
Step 1: Create flume configuration file, with below configuration for source, sink and channel.
#Define source, sink, channel and agent,
agent1 .sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1 . sources.source1.type = exec
agentl.sources.source1.command = tail -F /opt/gen logs/logs/access.log
#Define interceptors
agent1 .sources.source1.interceptors=i1
agent1 .sources.source1.interceptors.i1.type=timestamp
agent1 .sources.source1.interceptors.i1.preserveExisting=true
## Describe sink1
agent1 .sinks.sink1.channel = memory-channel
agent1 . sinks.sink1.type = hdfs
agent1 . sinks.sink1.hdfs.path = flume3/%Y/%m/%d/%H/%M
agent1 .sinks.sjnkl.hdfs.fileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
Agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 2: Run below command which will use this configuration file and append data in hdfs.
Start log service using: start_logs
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume3.conf -DfIume.root.logger=DEBUG, INFO, console Cname agent1
Wait for few mins and than stop log service.
stop logs

Question No : 9

Write a hive query to read average salary of all employees.

정답: Solution:
Step 1: Create hive table forflumeemployee.'
CREATE TABLE flumeemployee
(
name string, salary int, sex string,
age int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ', ';
Step 2: Create flume configuration file, with below configuration for source, sink and channel and save it in flume2.conf.
#Define source, sink, channel and agent,
agent1 .sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
## Describe sink1
agent1 .sinks.sink1.channel = memory-channel
agent1.sinks.sink1.type = hdfs
agent1 . sinks.sink1.hdfs.path = /user/hive/warehouse/flumeemployee
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
agent1 .sinks.sink1.hdfs.tileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
Agent1 .sources.sourcel.channels = channell agent1 .sinks.sinkl.channel = channel1
Step 3: Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume2.conf --name agent1
Step 4: Open another terminal and use the netcat service.
nc localhost 44444
Step 5: Enter data line by line.
alok, 100000.male, 29
jatin, 105000, male, 32
yogesh, 134000, male, 39
ragini, 112000, female, 35
jyotsana, 129000, female, 39
valmiki, 123000, male, 29
Step 6: Open hue and check the data is available in hive table or not.
step 7: Stop flume service by pressing ctrl+c
Step 8: Calculate average salary on hive table using below query. You can use either hive command line tool or hue. select avg(salary) from flumeemployee;

Question No : 10

Problem Scenario 21: You have been given log generating service as below.
startjogs (It will generate continuous logs)
tailjogs (You can check, what logs are being generated)
stopjogs (It will stop the log service)
Path where logs are generated using above service: /opt/gen_logs/logs/access.log
Now write a flume configuration file named flumel.conf, using that configuration file dumps logs in HDFS file system in a directory called flumel. Flume channel should have following property as well. After every 100 message it should be committed, use non-durable/faster channel and it should be able to hold maximum 1000 events

정답: Solution:
Step 1: Create flume configuration file, with below configuration for source, sink and channel.
#Define source, sink, channel and agent,
agent1. sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1 . sources.source1.type = exec
agent1.sources.source1.command = tail -F /opt/gen logs/logs/access.log
## Describe sinkl
agentl .sinks.sinkl.channel = memory-channel
agentl .sinks.sinkl .type = hdfs
agentl . sinks.sink1.hdfs.path = flumel
agentl .sinks.sinkl.hdfs.fileType = Data Stream
# Now we need to define channell property.
agent1.channels.channel1.type = memory
agent1.channels.channell.capacity = 1000
agent1.channels.channell.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 2: Run below command which will use this configuration file and append data in hdfs.
Start log service using: startjogs
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flumel.conf-Dflume.root.logger=DEBUG, INFO, console
Wait for few mins and than stop log service.
Stop_logs

Question No : 11

Write a Sqoop Job which will import "retaildb.categories" table to hdfs, in a directory name "categories_targetJob".

정답: Solution:
Step 1: Connecting to existing MySQL Database mysql -user=retail_dba --password=cloudera retail_db
Step 2: Show all the available tables show tables;
Step 3: Below is the command to create Sqoop Job (Please note that - import space is mandatory)
sqoop job -create sqoopjob \ -- import \
-connect "jdbc:mysql://quickstart:3306/retail_db" \
-username=retail_dba \
-password=cloudera \
-table categories \
-target-dir categories_targetJob \
-fields-terminated-by '|' \
-lines-terminated-by '\n'
Step 4: List all the Sqoop Jobs sqoop job --list
Step 5: Show details of the Sqoop Job sqoop job --show sqoopjob
Step 6: Execute the sqoopjob sqoopjob --exec sqoopjob
Step 7: Check the output of import job
hdfs dfs -Is categories_target_job
hdfs dfs -cat categories_target_job/part*

Question No : 12

Import departments table from mysql to hdfs as parquet file in departments_parquet directory.

정답: Solution:
Step 1: Import departments table from mysql to hdfs as textfile
sqoop import \
-connect jdbc:mysql://quickstart:3306/retail_db \
~username=retail_dba \
-password=cloudera \
-table departments \
-as-textfile \
-target-dir=departments_text
verify imported data
hdfs dfs -cat departments_text/part"
Step 2: Import departments table from mysql to hdfs as sequncetlle
sqoop import \
-connect jdbc:mysql://quickstart:330G/retaiI_db \
~username=retail_dba \
-password=cloudera \
--table departments \
-as-sequencetlle \
-~target-dir=departments sequence
verify imported data
hdfs dfs -cat departments_sequence/part*
Step 3: Import departments table from mysql to hdfs as sequncetlle
sqoop import \
-connect jdbc:mysql://quickstart:330G/retaiI_db \
~username=retail_dba \
--password=cloudera \
--table departments \
--as-avrodatafile \
--target-dir=departments_avro
verify imported data
hdfs dfs -cat departments avro/part*
Step 4: Import departments table from mysql to hdfs as sequncetlle
sqoop import \
-connect jdbc:mysql://quickstart:330G/retaiI_db \
~username=retail_dba \
--password=cloudera \
-table departments \
-as-parquetfile \
-target-dir=departments_parquet
verify imported data
hdfs dfs -cat departmentsparquet/part*

Question No : 13

Now export data from hive table departments_hive01 in departments_hive02. While exporting, please note following. wherever there is a empty string it should be loaded as a null value in mysql.
wherever there is -999 value for int field, it should be created as null value.

정답: Solution:
Step 1: Create table in mysql db as well.
mysql ~user=retail_dba -password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name varchar(45), avg_salary int);
show tables;
Step 2: Now export data from hive table to mysql table as per the requirement.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
-username retaildba \
-password cloudera \
--table departments_hive02 \
-export-dir /user/hive/warehouse/departments_hive01 \
-input-fields-terminated-by '\001' \
--input-Iines-terminated-by '\n' \
--num-mappers 1 \
-batch \
-Input-null-string "" \
-input-null-non-string -999
step 3: Now validate the data, select * from departments_hive02;

Question No : 14

Now import data from mysql table departments_hive01 to this hive table. Please make sure that data should be visible using below hive command. Also, while importing if null value found for department_name column replace it with "" (empty string) and for id column with -999 select * from departments_hive;

정답: Solution:
Step 1: Create hive table as below.
hive
show tables;
create table departments_hive01(department_id int, department_name string, avgsalary int);
Step 2: Create table in mysql db as well.
mysql -user=retail_dba -password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);
show tables;
step 3: Insert data in mysql table.
insert into departments_hive01 select a.*, null from departments a;
check data inserts
select' from departments_hive01;
Now iserts null records as given in problem. insert into departments_hive01 values(777, "Not known", 1000); insert into departments_hive01 values(8888, null, 1000); insert into departments_hive01 values(666, null, 1100);
Step 4: Now import data in hive as per requirement.
sqoop import \
-connect jdbc:mysql://quickstart:3306/retail_db \
~username=retail_dba \
--password=cloudera \
-table departments_hive01 \
--hive-home /user/hive/warehouse \
--hive-import \
-hive-overwrite \
-hive-table departments_hive0l \
--fields-terminated-by '\001' \
--null-string M"\
--null-non-strlng -999 \
-split-by id \
-m 1
Step 5: Checkthe data in directory.
hdfs dfs -Is /user/hive/warehouse/departments_hive01
hdfs dfs -cat/user/hive/warehouse/departments_hive01/part"
Check data in hive table.
Select * from departments_hive01;

Question No : 15

Now import data from mysql table departments to this hive table. Please make sure that data should be visible using below hive command, select" from departments_hive

정답: Solution:
Step 1: Create hive table as said.
hive
show tables;
create table departments_hive(department_id int, department_name string);
Step 2: The important here is, when we create a table without delimiter fields. Then default delimiter for hive is ^A (\001). Hence, while importing data we have to provide proper delimiter.
sqoop import \
-connect jdbc:mysql://quickstart:3306/retail_db \
~username=retail_dba \
-password=cloudera \
--table departments \
--hive-home /user/hive/warehouse \
-hive-import \
-hive-overwrite \
--hive-table departments_hive \
--fields-terminated-by '\001'
Step 3: Check-the data in directory.
hdfs dfs -Is /user/hive/warehouse/departments_hive
hdfs dfs -cat/user/hive/warehouse/departmentshive/part'
Check data in hive table.
Select * from departments_hive;

/ 2

Cloudera