Converting a text file (log) to CSV in Python

I captured a log from some equipment and tried to convert it to CSV, but the log has about 1 million lines, and VBA can neither display nor process all of them.
So I am trying to convert the log to CSV in Python instead, but I have no idea how to handle the following points.

① I want to add the date and time printed by the date command as columns to the left of ID, MAC Address, etc.
② Headers such as ID and MAC Address are printed every time the command is executed, so in the CSV I want the header to appear only once, at the top.
③ The lines "date", "wlc# show ap-discovered", and "Discovered APs and Stations (4249 entries)" are not needed in the CSV. I want to delete the commands and similar lines.

sample.py

file_name = "C:/Work/before.log"

with open(file_name, 'r', encoding='shift-jis') as f:
    lines = f.readlines()

for line in lines:
    newlines = line.split()
    print(newlines)

before.log → log output from the equipment (about 1 million lines; only a few are shown here)
after.csv → the format I want to produce from before.log

before.log

date
Tue Nov 11 00:00:00 JST 2022
wlc# show ap-discovered

ID MAC Address Type Channel Confirmed - Channel SSID BSSID Last Previous Current Pkts Rx RF Band Name            

40 aa:bb:cc:dd:ee:ff AP66 samples sample 11:22:33:44:55:66 00d:00h:00s 0-77 313382 802.11gn AP1        
40gg: hh:ii:jj:kk:ll AP67 samples sample 22:33:44:55:66:7700d:00h:00m:01s 0-752840 802.11gn AP2
    Discovered APs and Stations (4249 entries)
date
Tue Nov 11 00:05:00 JST 2022
wlc# show ap-discovered

ID MAC Address Type Channel Confirmed - Channel SSID BSSID Last Previous Current Pkts Rx RF Band Name            

40 aa:bb:cc:dd:ee:ff AP66 samples sample 11:22:33:44:55:66 00d:00h:00s 0-77 313382 802.11gn AP1        
40gg: hh:ii:jj:kk:ll AP67 samples sample 22:33:44:55:66:7700d:00h:00m:01s 0-752840 802.11gn AP2
date
Tue Nov 11 00:10:00 JST 2022
wlc# show ap-discovered

ID MAC Address Type Channel Confirmed - Channel SSID BSSID Last Previous Current Pkts Rx RF Band Name            

40 aa:bb:cc:dd:ee:ff AP66 samples sample 11:22:33:44:55:66 00d:00h:00s 0-77 313382 802.11gn AP1        
40gg: hh:ii:jj:kk:ll AP67 samples sample 22:33:44:55:66:7700d:00h:00m:01s 0-752840 802.11gn AP2
    Discovered APs and Stations (4249 entries)

after.csv

Day, Month, Date, Time, ID, MAC Address, Type, Channel, Confirmed-Channel, SSID, BSSID, Last, Previous, Current, Pkts, Rx, RF, Band, Name            
Tue, Nov, 11, 00:00:00, 40, aa:bb:cc:dd:ee:ff, AP, 6, 6, samples sample, 11:22:33:44:55:56, 00d:00h:00m:00s, 0,-77, 313382, 802.11gn, AP1
Tue, Nov, 11, 00:00:00, 40, gg: hh:ii:jj:kk:ll, AP, 6, 6, samples sample, 22:33:44:55:66:77, 00d:00m:00s, 0,-77, 313382, 802.11gn, AP2        
Tue, Nov, 11, 00:05:00, 40, aa:bb:cc:dd:ee:ff, AP, 6, 6, samples sample, 11:22:33:44:55:56, 00d:00h:00m:00s, 0,-77, 313382, 802.11gn, AP1
Tue, Nov, 11, 00:05:00, 40, gg: hh:ii:jj:kk:ll, AP, 6, 6, samples sample, 22:33:44:55:66:77, 00d:00m:00s, 0,-77, 313382, 802.11gn, AP2

I would appreciate it if you could provide some example code.


python
python3

2022-11-20 00:44

1 Answer


	
		
If you use readlines(), the entire contents of the file are loaded into memory.
With one million lines, that may be hard on memory.

The basic idea is: when you read a date line, remember its contents, and when you read a record line, output it together with the remembered date.
You only need to keep the most recent date, not everything you have read so far.

Something like this?
file_name = "C:/Work/before.log"

with open(file_name, 'r', encoding='shift-jis') as f:
    print('Day, Month, Date, Time, ID, MAC Address, Type, Channel, Confirmed-Channel, SSID, BSSID, Last, Previous, Current, Pkts, Rx, RF, Band, Name')

    # Holds the most recent date as a CSV prefix (empty until the first date line is seen).
    date_csv = ''

    # Read the file line by line instead of loading everything with readlines().
    line = f.readline()
    while line:

        # On a "date" line, read the next line, keep its first 4 fields, join them with commas, and store the result in date_csv.
        if line.startswith('date'):
            line = f.readline()
            date = line.split()[:4]
            date_csv = ', '.join(date) + ', '

        # A line that starts with a digit is treated as a record: output date_csv followed by its comma-joined fields.
        if line[:1].isdigit():
            print(date_csv + ', '.join(line.split()))

        line = f.readline()
* Since this is a log, I send the output to standard output; if you redirect it, it will go into a file. If you want to write to a fixed file instead, you can add code such as with open(file_name, 'w').
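The file-output variant mentioned above could be sketched roughly as follows. It assumes the same rules as the code above (the line after `date` holds the timestamp, and only record lines start with a digit). The function name `log_to_rows` and the in-memory sample data are made up for illustration, and the standard `csv` module is swapped in for the manual `', '.join(...)`.

```python
import csv
import io


def log_to_rows(lines):
    """Yield one list of fields per record line, prefixed with the latest date fields.

    `lines` can be any iterable of strings (for example an open file),
    so the log is streamed and never held in memory as a whole.
    """
    date_fields = []
    expect_date = False
    for line in lines:
        stripped = line.strip()
        if expect_date:
            # The line right after "date" carries the timestamp: keep Day, Month, Date, Time.
            date_fields = stripped.split()[:4]
            expect_date = False
        elif stripped.startswith("date"):
            expect_date = True
        elif stripped[:1].isdigit():
            # Assumption: record lines are the only lines that start with a digit.
            yield date_fields + stripped.split()


# Small in-memory sample standing in for before.log (hypothetical data).
sample = io.StringIO(
    "date\n"
    "Tue Nov 11 00:05:00 JST 2022\n"
    "wlc# show ap-discovered\n"
    "\n"
    "ID MAC Address Type\n"
    "\n"
    "40 aa:bb:cc:dd:ee:ff AP\n"
    "    Discovered APs and Stations (4249 entries)\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(log_to_rows(sample))
csv_text = out.getvalue()
print(csv_text, end="")
```

For the real log, `sample` would be replaced by `open("C:/Work/before.log", encoding="shift-jis")` and `out` by a file opened with `open(..., "w", newline="")`, with the header row written once before the loop. Note that splitting on whitespace puts each word of a value containing spaces (such as an SSID like "samples sample") into its own column, so the column count may not line up with the header.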
* I treat a line whose first character is a digit as a record line, but I don't know what the log actually contains, so if you want to do it properly, it may be better to match record lines with a regular expression.

The matching cost will be higher, though, so it may take some time over 1 million lines.
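The regular-expression approach could look something like the sketch below. The patterns are assumptions based on the sample log: a record is taken to be a numeric ID followed by a real hexadecimal MAC address (so placeholder values like gg:hh:... would not match), and a timestamp is taken to look like `Tue Nov 11 00:05:00 JST 2022`. The names `RECORD_RE`, `DATE_RE`, and `classify` are hypothetical. Compiling the patterns once, outside the loop, keeps the per-line cost down.

```python
import re

# Assumed record shape: numeric ID, whitespace, then six colon-separated hex pairs (the MAC address).
RECORD_RE = re.compile(r"^\d+\s+(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\s")

# Assumed timestamp shape: day name, month name, day of month, HH:MM:SS.
DATE_RE = re.compile(r"^(\w{3})\s+(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s")


def classify(line):
    """Return ("date", fields), ("record", fields), or ("other", [])."""
    m = DATE_RE.match(line)
    if m:
        # Keep only Day, Month, Date, Time, as in the target CSV.
        return "date", list(m.groups())
    if RECORD_RE.match(line):
        return "record", line.split()
    # Commands, headers, blank lines, and the "Discovered APs ..." summary end up here.
    return "other", []


print(classify("Tue Nov 11 00:05:00 JST 2022\n"))
print(classify("40 00:11:22:33:44:55 AP 6 6\n"))
print(classify("    Discovered APs and Stations (4249 entries)\n"))
```

With this, a line is only treated as a record when it really looks like one, so stray lines that happen to start with a digit are skipped rather than written to the CSV.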

		
		
			

				

					
				

				
					2022-11-20 10:27
				
			
		
	
			