模拟股指期货报撤单数据

Reading time ~8 minutes

模拟股指期货报撤单数据

开始

  • 因项目需要
  • 股指期货的报撤单数据属于绝对机密,中金所不肯提供
  • 那就只能模拟造出来了

逻辑及思路

  • 20万客户,每天3万交易客户
  • 每天20万笔报撤单记录
  • 一年大概五千万条数据
  • 需要模拟的数据
  • 交易日 tradingday : date
  • 客户id clientid : int
  • 合约id instrumentid : text
  • 买卖方向 direction : text
  • 报单日期 insertdate : date
  • 报单时间 inserttime : time

核心实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
import random
import pandas.io.sql as psql
import pandas as pd


#交易日,[2:6]
tradingday = ['20150101','20150102','20150103','20150104','20150105','20150106','20150107','20150108','20150109','20150110','20150111','20150112','20150113','20150114','20150115','20150116','20150117','20150118','20150119','20150120','20150121','20150122','20150123','20150124','20150125','20150126','20150127','20150128','20150129','20150130','20150131','20150201','20150202','20150203','20150204','20150205','20150206','20150207','20150208','20150209','20150210','20150211','20150212','20150213','20150214','20150215','20150216','20150217','20150218','20150219','20150220','20150221','20150222','20150223','20150224','20150225','20150226','20150227','20150228','20150301','20150302','20150303','20150304','20150305','20150306','20150307','20150308','20150309','20150310','20150311','20150312','20150313','20150314','20150315','20150316','20150317','20150318','20150319','20150320','20150321','20150322','20150323','20150324','20150325','20150326','20150327','20150328','20150329','20150330','20150331','20150401','20150402','20150403','20150404','20150405','20150406','20150407','20150408','20150409','20150410','20150411','20150412','20150413','20150414','20150415','20150416','20150417','20150418','20150419','20150420','20150421','20150422','20150423','20150424','20150425','20150426','20150427','20150428','20150429','20150430','20150501','20150502','20150503','20150504','20150505','20150506','20150507','20150508','20150509','20150510','20150511','20150512','20150513','20150514','20150515','20150516','20150517','20150518','20150519','20150520','20150521','20150522','20150523','20150524','20150525','20150526','20150527','20150528','20150529','20150530','20150531','20150601','20150602','20150603','20150604','20150605','20150606','20150607','20150608','20150609','20150610','20150611','20150612','20150613','20150614','20150615','20150616','20150617','20150618','20150619','20150620','20150621','20150622','20150623','20150624','20150625','20150626','20150627','20150628','20150629','20150630','20150701','20150702','20150703','20150704','20150705','20150706','20150707','20150708','20150709','20150710','20150711','20150712','20150713','20150714','20150715','20150716','20150717','20150718','20150719','20150720','20150721','20150722','20150723','20150724','20150725','20150726','20150727','20150728','20150729','20150730','20150731','20150801','20150802','20150803','20150804','20150805','20150806','20150807','20150808','20150809','20150810','20150811','20150812','20150813','20150814','20150815','20150816','20150817','20150818','20150819','20150820','20150821','20150822','20150823','20150824','20150825','20150826','20150827','20150828','20150829','20150830','20150831','20150901','20150902','20150903','20150904','20150905','20150906','20150907','20150908','20150909','20150910','20150911','20150912','20150913','20150914','20150915','20150916','20150917','20150918','20150919','20150920','20150921','20150922','20150923','20150924','20150925','20150926','20150927','20150928','20150929','20150930','20151001','20151002','20151003','20151004','20151005','20151006','20151007','20151008','20151009','20151010','20151011','20151012','20151013','20151014','20151015','20151016','20151017','20151018','20151019','20151020','20151021','20151022','20151023','20151024','20151025','20151026','20151027','20151028','20151029','20151030','20151031','20151101','20151102','20151103','20151104','20151105','20151106','20151107','20151108','20151109','20151110','20151111','20151112','20151113','20151114','20151115','20151116','20151117','20151118','20151119','20151120','20151121','20151122','20151123','20151124','20151125','20151126','20151127','20151128','20151129','20151130','20151201','20151202','20151203','20151204','20151205','20151206','20151207','20151208','20151209','20151210','20151211','20151212','20151213','20151214','20151215','20151216','20151217','20151218','20151219','20151220','20151221','20151222','20151223','20151224','20151225','20151226','20151227','20151228','20151229','20151230','20151231']
#客户id
clientid = []
#股指期货合约
instrumentid = 'IF'
#买卖方向
direction = ['0','1']
#交易日期
insertdate = []
#交易时间09:00:00--11:00:00 13:00:00--15:00:00
inserttime = []
hours = [9,10,11,13,14]

#随机生成不重复用户3万个
number = 30000
start = 100000
end = 200000
clientid = random.sample(range(start,end),number)

#生成一条报撤单数据
one = ()
#报撤单数据列表
datalist =[]

def getinstrumentid(tradingday):
	li = []
	li.append('IF' + tradingday[2:6])
	li.append('IF' + str(int(tradingday[2:6])+1))
	return random.choice(li)

#获取交易日期
def gettradingday():
	return random.choice(tradingday)

#获取用户id
def getclientid():
	return random.choice(clientid)

#从买房方向中选择一个
def getdirection():
	return random.choice(direction)

#获取随机交易时间
def gettime():
	hour = random.choice(hours)
	minute = random.randint(0,59)
	second = random.randint(0,59)
	return str(hour) + ':' + str(minute) + ':' + str(second)


if __name__=='__main__':

	for i in range(0,20):
		for d in range(0,2000000):
			day = gettradingday()
			one = (day,getclientid(),getinstrumentid(day),getdirection(),day,gettime())
			datalist.append(one)
		df = pd.DataFrame(datalist)
		csv = df.to_csv('/opt/IF' + str(i) + '.csv',index=False,header=False)
		print('%s is saved......' % i)
	print('IF data is all successful..........')

Greenplum创建表

1
2
3
4
5
6
7
8
9
10
create table if(
tradingday date,
clientid int,
instrumentid char(6),
direction char(1),
insertdate date,
inserttime time
)
WITH (appendonly=true,orientation=column,compresstype=QUICKLZ,COMPRESSLEVEL=1)  
distributed by (clientid)

使用GPLOAD载入数据库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
VERSION: 1.0.0.1
DATABASE: fitl
USER: gpadmin
HOST: mdw
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - mdw
         PORT: 8092
         FILE:
           - /fitl/IF/*.csv
    - COLUMNS:
          -  tradingday : date
          -  clientid : int
          -  instrumentid : text
          -  direction : text
          -  insertdate : date
          -  inserttime : time
    - FORMAT: csv
    - DELIMITER: ','
    - QUOTE: '"'
    - HEADER: false
    - ERROR_LIMIT: 500
    - ERROR_TABLE: public.if_err
   OUTPUT:
    - TABLE: public.if
    - MODE: INSERT

生成4亿6千2百万报撤单数据

1
2
3
4
5
6
7
2016-05-23 18:10:38|INFO|gpload session started 2016-05-23 18:10:38
2016-05-23 18:10:39|INFO|started gpfdist -p 8092 -P 8093 -f "/fitl/IF/IF20.csv" -t 30
2016-05-23 18:10:52|INFO|running time: 14.26 seconds
2016-05-23 18:10:53|INFO|rows Inserted          = 42000000
2016-05-23 18:10:53|INFO|rows Updated           = 0
2016-05-23 18:10:53|INFO|data formatting errors = 0
2016-05-23 18:10:53|INFO|gpload succeeded

IF数据

Puppet证书过期处理

Published on November 12, 2018

沪牌拍牌有多难

Published on March 12, 2018