I want to build a DataFrame with Python's pandas and insert it into MSSQL. The DataFrames have 14 columns and roughly 250,000 rows on average.
Using sqlalchemy.create_engine (with DataFrame.to_sql) to insert the whole frame took 45 minutes locally.
Iterating with dataframe.iterrows() and inserting each row through cursor.execute() also takes a long time and is still running.
How can I insert a DataFrame into MSSQL quickly, given that building the frame itself takes less than 10 seconds?
The largest of the source JSON files is 316,650 KB.
python mssql sql-server
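For MSSQL specifically, a commonly used option is to keep DataFrame.to_sql but turn on pyodbc's fast_executemany, so the driver sends parameterized rows in large batches instead of one at a time. A minimal sketch, assuming an mssql+pyodbc connection string, an existing target table, and placeholder server/credential/table names (none of these come from the original post):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; server, credentials, and ODBC driver name are assumptions.
engine = create_engine(
    'mssql+pyodbc://user:password@SERVER/DATABASE?driver=ODBC+Driver+17+for+SQL+Server',
    fast_executemany=True,  # batch parameterized inserts at the driver level
)

# Stand-in for the 14-column, ~250,000-row frame described above.
df = pd.DataFrame({'num': range(5), 'id': ['id%s' % i for i in range(5)]})

# Append into an existing table; 'target_table' is a hypothetical name.
df.to_sql('target_table', engine, if_exists='append', index=False, chunksize=10000)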
import pandas as pd
import pymysql
import time

def makefile(name, q):
    # Write a CSV test file with q rows: ls,num,id,pass,name
    f = open(name, 'w')
    f.write('ls,num,id,pass,name\n')
    for i in range(q):
        txt = '%s,%s,id%s,pw%s,nm%s\n' % (i, i, i, i, i)
        f.write(txt)
    f.close()

def conn():
    # Connect to the test database and empty the target table
    db = pymysql.connect(host='localhost', port=3306, user='howoni', passwd='#', db='howoni')
    c = db.cursor()
    c.execute('truncate TABLES')
    return db

makefile('hi2.csv', 500000)
df = pd.read_csv('hi2.csv', index_col='ls')

b = conn()
a = b.cursor()

# 1) Batched insert: accumulate about 20 rows of placeholders per statement,
#    then execute the multi-row INSERT with the collected parameters.
s = time.time()
sql = 'insert into TABLES values '
data = []
for idx, i in enumerate(df.values):
    sql += '(%s, %s, %s, %s),'
    for j in i:
        data.append(j)
    if idx % 20 == 0:
        sql = sql.rstrip(',')
        a.execute(sql, tuple(data))
        sql = 'insert into TABLES values '
        data = []
# Flush the rows left over after the loop
sql = sql.rstrip(',')
a.execute(sql, data)
print(time.time() - s)

# 2) Row-by-row insert: one execute() call per row
s = time.time()
sql = 'insert into TABLES values (%s, %s, %s, %s)'
for i in df.values:
    a.execute(sql, tuple(i))
print(time.time() - s)

a.execute('select count(*) from TABLES')
r = a.fetchone()
print(r)

b.commit()
a.close()
1) "insert into [TABLES] values (%s, %s, %s, %s) * 20" * 25000 : 500000rows 15 seconds,
2) "insert into [TABLES] values (%s, %s, %s) * 500000 : 500000rows required 120 seconds
You can refer to this approach if you use pymssql and want something other than a bulk insert.
Stack Overflow has the same question with a variety of answers, so it is worth testing those as well.
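As a rough sketch of how the same batching idea could be adapted to MSSQL with pymssql (the connection parameters, credentials, and sample data here are placeholders, not from the original post), executemany with a parameterized INSERT is the simplest starting point:

import pymssql  # MSSQL driver; the test code above uses pymysql/MySQL instead

# Placeholder connection details.
db = pymssql.connect(server='localhost', user='sa', password='secret', database='testdb')
cur = db.cursor()

# Sample rows matching the num,id,pass,name layout used above.
rows = [(i, 'id%s' % i, 'pw%s' % i, 'nm%s' % i) for i in range(1000)]

# pymssql uses %d/%s placeholders; executemany sends one parameter tuple per row.
cur.executemany('insert into TABLES values (%d, %s, %s, %s)', rows)
db.commit()
cur.close()
db.close()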