一个项目使用达梦数据库的一个表,有1400万数据,意欲导入oracle数据库.
采用windows客户端工具DBeaver直接导出了csv文件,这个DOS格式的导出文件,直接埋下了祸根,导致后续一直sqlldr导入报错,转换为unix格式后,一切变得简单.
根据达梦数据库的建表语句,编辑oracle建表语句如下:
DROP TABLE "ETPS"."SSXX2" PURGE; TRUNCATE TABLE "ETPS"."SSXX2"; CREATE TABLE "ETPS"."SSXX2" ( "UUID" VARCHAR2(100), "DJXH" VARCHAR2(100), "TYSHXYDM" VARCHAR2(20), "SSSERKHJ" NUMBER(22,6), "NSRZT" VARCHAR2(20), "ZZSE" NUMBER(22,6), "SSYF" VARCHAR2(6), "UPDATETIME" TIMESTAMP (6), "SS_TIMESTAMP" TIMESTAMP (6) ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS NOLOGGING TABLESPACE "ETPS" ; ALTER TABLE ETPS.SSXX2 NOLOGGING; --ALTER TABLE ETPS.SSXX2 LOGGING;
创建sqlldr的控制文件如下:
cat > ssxx.ctl << EOF LOAD DATA INFILE 'SSXX_202007091958.csv' APPEND INTO TABLE ETPS.SSXX2 FIELDS TERMINATED BY ',' TRAILING NULLCOLS ( UUID, DJXH, TYSHXYDM, SSSERKHJ, NSRZT, ZZSE, SSYF, UPDATETIME TIMESTAMP 'YYYY-MM-DD HH24:MI:SS', SS_TIMESTAMP TIMESTAMP 'YYYY-MM-DD HH24:MI:SS' ) EOF
开始导入:
timestamp=date +%Y%m%d%H%M
sqlldr_ctl=ssxx.ctl
sqlldr_log=ssxx.log.${timestamp}
sqlldr system/password CONTROL=${sqlldr_ctl} LOG=${sqlldr_log} SILENT=HEADER DIRECT=TRUE PARALLEL=TRUE BINDSIZE=20971520 READSIZE=20971520
以上各个设置是经历很多次失败之后才成功的,在此再次吐槽下oracle,其实很多东西是可以简化处理的,非要搞得这么复杂!
其中一个重要的错误处理,是源导出csv文件由于是在windows客户端导出的,默认是dos格式的文本,在linux下必须转换为unix格式才可以!否则一定会遇见奇怪的格式转换问题,比如如下错误:
Record 1: Rejected - Error on table ETPS.SSXX2, column UPDATETIME. ORA-01841: (full) year must be between -4713 and +9999, and not be 0 Record 2: Rejected - Error on table ETPS.SSXX2, column SS_TIMESTAMP. ORA-01840: input value not long enough for date format
在vim下通过:set ff?可以看到DOS或UNIX的字样,DOS格式转换为UNIX的方法是在vim下执行:set ff=unix或者使用vim和sed的全文替换:%s/^M//g,注意^M在linux下通过同时摁下Ctrl+V+M得到.
实践证明,只要配置得当,比如关闭日志,配置和合适的缓存参数等,1400万行记录的csv文件,导入时间仅需30秒左右:
Stream buffer bytes: 256000 Read buffer bytes:20971520 Total logical records skipped: 0 Total logical records read: 14388211 Total logical records rejected: 1 Total logical records discarded: 0 Total stream buffers loaded by SQL*Loader main thread: 2920 Total stream buffers loaded by SQL*Loader load thread: 3623 Run began on Fri Jul 10 11:11:38 2020 Run ended on Fri Jul 10 11:12:10 2020 Elapsed time was: 00:00:32.68 CPU time was: 00:00:28.66
以下为csv文件内容示意:
A43D013DBE9C5BDFE0531A4A0C4BD2EB,10111302010000045929,91130205MA07UJTA5Y,96.250000,正常,,201906,2020-05-22 16:23:43,2020-05-22 16:23:43
文章评论
[补充]在windows下导出csv文件时,format setting配置时行尾务必选择\n\r,这样在linux下面就不会有行尾的格式问题.