Converting text files (logs) to CSV with Python

Asked 1 year ago, updated 1 year ago, 410 views

I collected a log from our equipment and tried to convert it to CSV, but the log has about 1 million lines, and VBA can neither display nor process all of them.
So I am trying to do the conversion in Python instead, but I have no idea how to handle the following three points.


①I want to add the date and time printed by the date command to the left of the columns such as ID and MAC Address.
②The header (ID, MAC Address, etc.) is printed every time the command is executed, but in the CSV I only want it once, at the top.
③The date line, the wlc# show ap-discovered line, and the Discovered APs and Stations (4249 entries) line do not need to go into the CSV. I want to delete these command lines and similar lines.


sample.py

file_name="C:/Work/before.log"

with open(file_name, 'r', encoding='shift-jis') as f:
    lines = f.readlines()

# split each line on whitespace and show the resulting fields
for line in lines:
    newlines = line.split()
    print(newlines)


before.log: the log output from the equipment (about 1 million lines in reality; only a few lines are shown here)
after.csv: the format I want to convert before.log into

before.log

date
Tue Nov 11 00:00:00 JST 2022
wlc# show ap-discovered

ID MAC Address Type Channel Confirmed - Channel SSID BSSID Last Previous Current Pkts Rx RF Band Name            

40 aa:bb:cc:dd:ee:ff AP66 samples sample 11:22:33:44:55:66 00d:00h:00s 0-77 313382 802.11gn AP1        
40gg: hh:ii:jj:kk:ll AP67 samples sample 22:33:44:55:66:7700d:00h:00m:01s 0-752840 802.11gn AP2
    Discovered APs and Stations (4249 entries)
date
Tue Nov 11 00:05:00 JST 2022
wlc# show ap-discovered

ID MAC Address Type Channel Confirmed - Channel SSID BSSID Last Previous Current Pkts Rx RF Band Name            

40 aa:bb:cc:dd:ee:ff AP66 samples sample 11:22:33:44:55:66 00d:00h:00s 0-77 313382 802.11gn AP1        
40gg: hh:ii:jj:kk:ll AP67 samples sample 22:33:44:55:66:7700d:00h:00m:01s 0-752840 802.11gn AP2
date
Tue Nov 11 00:10:00 JST 2022
wlc# show ap-discovered

ID MAC Address Type Channel Confirmed - Channel SSID BSSID Last Previous Current Pkts Rx RF Band Name            

40 aa:bb:cc:dd:ee:ff AP66 samples sample 11:22:33:44:55:66 00d:00h:00s 0-77 313382 802.11gn AP1        
40gg: hh:ii:jj:kk:ll AP67 samples sample 22:33:44:55:66:7700d:00h:00m:01s 0-752840 802.11gn AP2
    Discovered APs and Stations (4249 entries)

after.csv

Day, Month, Date, Time, ID, MAC Address, Type, Channel, Confirmed-Channel, SSID, BSSID, Last, Previous, Current, Pkts, Rx, RF, Band, Name            
Tue, Nov, 11, 00:00:00, 40, aa:bb:cc:dd:ee:ff, AP, 6, 6, samples sample, 11:22:33:44:55:56, 00d:00h:00m:00s, 0,-77, 313382, 802.11gn, AP1
Tue, Nov, 11, 00:00:00, 40, gg: hh:ii:jj:kk:ll, AP, 6, 6, samples sample, 22:33:44:55:66:77, 00d:00m:00s, 0,-77, 313382, 802.11gn, AP2        
Tue, Nov, 11, 00:05:00, 40, aa:bb:cc:dd:ee:ff, AP, 6, 6, samples sample, 11:22:33:44:55:56, 00d:00h:00m:00s, 0,-77, 313382, 802.11gn, AP1
Tue, Nov, 11, 00:05:00, 40, gg: hh:ii:jj:kk:ll, AP, 6, 6, samples sample, 22:33:44:55:66:77, 00d:00m:00s, 0,-77, 313382, 802.11gn, AP2          

I would appreciate it if you could provide some example code.

python python3

2022-11-20 00:44

1 Answer

readlines() loads the entire contents of the file into memory at once. With one million lines, that may be hard on memory.
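As an illustration of the alternative, a Python file object can be iterated directly with a for loop, which reads one line at a time instead of loading everything. A minimal sketch: count_record_lines is a hypothetical helper, and io.StringIO stands in for the real before.log so the example runs on its own.

```python
import io


def count_record_lines(f):
    """Count lines whose first character is a digit, reading one line at a time.

    f can be any iterable of lines, such as an open file object;
    the whole file is never held in memory at once.
    """
    count = 0
    for line in f:  # lazy iteration, unlike f.readlines()
        if line[:1].isdigit():
            count += 1
    return count


# io.StringIO stands in for open("C:/Work/before.log", encoding="shift-jis")
sample = io.StringIO(
    "date\n"
    "Tue Nov 11 00:00:00 JST 2022\n"
    "wlc# show ap-discovered\n"
    "40 aa:bb:cc:dd:ee:ff AP66 samples sample\n"
    "40gg: hh:ii:jj:kk:ll AP67 samples sample\n"
)
print(count_record_lines(sample))  # 2
```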

Basically: when you read a date line, remember that date.
When you read a record line, output it together with the remembered date.
You only ever need to remember the most recent date, not the lines you have already processed.

Something like this?

file_name = "C:/Work/before.log"

with open(file_name, 'r', encoding='shift-jis') as f:
    # lines = f.readlines()  # avoided: this would load the whole file into memory

    print('Day, Month, Date, Time, ID, MAC Address, Type, Channel, Confirmed-Channel, SSID, BSSID, Last, Previous, Current, Pkts, Rx, RF, Band, Name')

    date_csv = ''  # date prefix taken from the most recent "date" block

    # read the file line by line
    line = f.readline()
    while line:

        # If it's a date line, read the next line (the timestamp), keep only its
        # first 4 fields, join them with commas, and store the result in date_csv.
        if line.startswith('date'):
            line = f.readline()
            date = line.split()[:4]
            date_csv = ', '.join(date) + ', '

        # If the line starts with a digit, it is a record line: prefix it with
        # date_csv and output it as one comma-separated line.
        # line[:1] avoids an IndexError if the line is empty.
        if line[:1].isdigit():
            print(date_csv + ', '.join(line.split()))

        line = f.readline()

*For simplicity, the output goes to standard output; if you redirect it, the result is written to a file. If you want to always write to a fixed file, open one with code like with open(out_file_name, 'w') (use a different name from the input file, otherwise before.log would be overwritten).
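If you want to write to a fixed file instead of standard output, the standard library's csv module can handle the formatting. The sketch below is one possible variant, not the answer's exact code: convert and HEADER are hypothetical names, and it iterates with for line in src instead of readline(). Note that csv.writer separates fields with a bare comma, without the space the print version adds.

```python
import csv
import io

HEADER = ["Day", "Month", "Date", "Time", "ID", "MAC Address", "Type",
          "Channel", "Confirmed-Channel", "SSID", "BSSID", "Last",
          "Previous", "Current", "Pkts", "Rx", "RF", "Band", "Name"]


def convert(src, dst):
    """Read log lines from src and write CSV rows to dst.

    The header is written once; each record line (first character is a
    digit) is prefixed with the first 4 fields of the latest timestamp.
    """
    writer = csv.writer(dst)
    writer.writerow(HEADER)
    date_fields = []
    expect_timestamp = False
    for line in src:
        if expect_timestamp:
            # the line right after "date" holds the timestamp
            date_fields = line.split()[:4]
            expect_timestamp = False
        elif line.startswith("date"):
            expect_timestamp = True
        elif line[:1].isdigit():
            writer.writerow(date_fields + line.split())
        # command lines, headers, blank lines and footers are skipped


# With real files (newline="" is what the csv docs recommend for writing):
# with open("C:/Work/before.log", encoding="shift-jis") as src, \
#         open("C:/Work/after.csv", "w", newline="", encoding="utf-8") as dst:
#     convert(src, dst)

sample = io.StringIO(
    "date\n"
    "Tue Nov 11 00:00:00 JST 2022\n"
    "wlc# show ap-discovered\n"
    "\n"
    "ID MAC Address Type\n"
    "40 aa:bb:cc:dd:ee:ff AP66\n"
    "    Discovered APs and Stations (4249 entries)\n"
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue())
```

The output path C:/Work/after.csv in the comment is an assumption; only the input path appears in the original code.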

*The code treats any line whose first character is a digit as a record line. I don't know what else the log may contain, so to do this properly it might be better to match record lines with a regular expression.
However, a regular expression costs more per line, so it may take longer on 1 million lines.
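A sketch of the regular-expression approach, assuming (from the sample only) that a record line begins with a numeric ID followed by a MAC address written as six colon-separated hex pairs; RECORD_RE and is_record are hypothetical names, and the pattern should be adjusted to the real log format. Compiling the pattern once with re.compile and anchoring it at the start of the line with match keeps the per-line cost down, although it is typically still slower than checking a single character.

```python
import re

# Assumed record layout: numeric ID, whitespace, then a MAC address written
# as six colon-separated hex pairs, followed by whitespace.
RECORD_RE = re.compile(r"^\d+\s+(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\s")


def is_record(line):
    """Return True if line looks like an AP record line."""
    return RECORD_RE.match(line) is not None


print(is_record("40 aa:bb:cc:dd:ee:ff AP66 samples sample\n"))       # True
print(is_record("Tue Nov 11 00:00:00 JST 2022\n"))                    # False
print(is_record("    Discovered APs and Stations (4249 entries)\n"))  # False
```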


2022-11-20 10:27



© 2024 OneMinuteCode. All rights reserved.