使用python
用到的包:xml.dom.minidom
需求:
有一個表,裡面數據量比較大,每天一更新,其字段可以通過配置文件進行配置,即,可能每次建表的字段不一樣。
上游跑時會根據配置從源文件中提取,到入庫這一步需要根據配置進行建表。
解決:
寫了一個簡單的xml,配置需要字段及類型
上游讀取到對應的數據
入庫這一步,先把原表刪除,根據配置建新表
XML文件
- <?xml version="1.0" encoding="UTF-8"?>
- <!-- 表名 ,數據庫名 可靈活配置插入哪個庫哪個表 -->
- <table name="top_query" db_name="evaluting_sys">
- <!-- 非業務主鍵,自增長,可配名,其他 INTEGER UNSIGNED AUTO_INCREMENT -->
- <primary_key>
- <name>id</name>
- </primary_key>
- <!-- 字段開始 -->
- <field>
- <name>query</name>
- <type>varchar(200)</type>
- <is_index>false</is_index>
- <description>query</description>
- </field>
- <field>
- <name>pv</name>
- <type>integer</type>
- <is_index>false</is_index>
- <description>pv</description>
- </field>
- <field>
- <name>avg_money</name>
- <type>integer</type>
- <is_index>false</is_index>
- <description></description>
- </field>
- <!-- 字段配置結束 -->
- </table>
腳本:
- #!/usr/bin/python
- # -*- coding:utf-8 -*-
- #author: ken
- #desc: use to read db xml config.
- #-----------------------
- #2012-02-18 created
-
- #----------------------
-
- import sys,os
- from xml.dom import minidom, Node
-
- def read_dbconfig_xml(xml_file_path):
- content = {}
-
- root = minidom.parse(<span style="background-color: rgb(255, 255, 255); ">xml_file_path)</span>
- table = root.getElementsByTagName("table")[0]
-
- #read dbname and table name.
- tabletable_name = table.getAttribute("name")
- db_name = table.getAttribute("db_name")
-
- if len(table_name) > 0 and len(db_name) > 0:
- db_sql = "create database if not exists `" + db_name +"`; use " + db_name + ";"
- table_drop_sql = "drop " + table_name + " if exists " + table_name + ";"
- content.update({"db_sql" : db_sql})
- content.update({"table_sql" : table_drop_sql })
- else:
- print "Error:attribute is not define well! db_name=" + db_name + " ;table_name=" + table_name
- sys.exit(1)
- #print table_name, db_name
-
- table_create_sql = "create table " + table_name +"("
-
- #read primary cell
- primary_key = table.getElementsByTagName("primary_key")[0]
- primary_keyprimary_key_name = primary_key.getElementsByTagName("name")[0].childNodes[0].nodeValue
-
- table_create_sql += primary_key_name + " INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,"
-
- #print primary_key.toxml()
- #read ordernary field
- fields = table.getElementsByTagName("field")
- f_index = 0
- for field in fields:
- f_index += 1
- name = field.getElementsByTagName("name")[0].childNodes[0].nodeValue
- type = field.getElementsByTagName("type")[0].childNodes[0].nodeValue
- table_create_sql += name + " " + type
- if f_index != len(fields):
- table_create_sql += ","
- is_index = field.getElementsByTagName("is_index")[0].childNodes[0].nodeValue
-
- table_create_sql += ");"
- content.update({"table_create_sql" : table_create_sql})
- #character set latin1 collate latin1_danish_ci;
- print content
-
-
- if __name__ == "__main__":
- read_dbconfig_xml(sys.argv[1])
涉及方法:
root = minidom.parse(xml_file_path) 獲取dom對象
root.getElementsByTagName("table") 根據tag獲取節點列表
table.getAttribute("name") 獲取屬性
primary_key.getElementsByTagName("name")[0].childNodes[0].nodeValue 獲取子節點的值(<name>id</name> 得到id)
-----------------------------------------------------
簡單的讀取就是這樣,這裡只是簡單使用了下
後續需去了解下python讀取xml的幾個庫
xml.dom.minidom
xml.etree.ElementTree
libxml2
lxml.etree
xpath