When I build a DataFrame from nested dictionaries and save it to a file, part of the data is omitted

import time

import pandas as pd

# Assumes `driver` is an already-initialized Selenium WebDriver.
fleet_list = {}

airlines = ['lj-jna', 'ke-kal', 'oz-aar', '7c-jja', 'bx-abl', 'tw-twb', 'rs-asv']

for i in airlines:
    driver.get('https://www.flightradar24.com/data/airlines/%s' % i)
    driver.find_element_by_css_selector('nav > a:nth-child(3)').click()
    driver.find_element_by_css_selector('#list-aircraft > dt:nth-child(2)').click()
    try:
        # Expand the remaining aircraft-type sections; stop at the first missing one.
        for n in range(4, 51, 2):
            driver.find_element_by_css_selector('#list-aircraft > dt:nth-child(%s)' % n).click()
    except Exception:
        pass
    fleet = driver.find_elements_by_css_selector('td:nth-child(1) > a')
    airline_code = driver.find_element_by_css_selector('div.row.m-t-l.m-l-l > h2').text[3:]
    # Give every registration an empty list that will hold its tables later.
    for j in fleet:
        fleet_list.setdefault(airline_code, {})[j.text] = []
    time.sleep(2)

for i in fleet_list['ASV'].keys():
    driver.get('https://www.flightradar24.com/data/aircraft/%s' % i)
    html = driver.page_source
    raw = pd.read_html(html)
    fleet_list['ASV'][i].append(raw[0])
    time.sleep(3)

asv_data = pd.DataFrame(fleet_list)
asv_data.to_csv('sample2.csv')

When I run this code, the value of fleet_list is

{'ASV': {'HL7212': [                                       FLIGHTS HISTORY  ... Unnamed: 32
0    RS904 06 Oct 2020 0:50 Landed 03:22 STD 02:00A...  ...         NaN
1    RS903 06 Oct 2020 0:50 Landed 01:37 STD 00:15A...  ...         NaN
2    RS902 05 Oct 2020 0:49 Landed 23:49 STD 22:35A...  ...         NaN
3    RS901 05 Oct 2020 0:53 Landed 22:06 STD 21:00A...  ...         NaN
4    RS908 05 Oct 2020 0:48 Landed 11:40 STD 10:40A...  ...         NaN
..                                                 ...  ...         ...
96   RS906 22 Sep 2020 0:48 Landed 07:36 STD 06:25A...  ...         NaN
97   RS905 22 Sep 2020 0:55 Landed 05:29 STD 04:20A...  ...         NaN
98   RS904 22 Sep 2020 0:54 Landed 03:12 STD 02:00A...  ...         NaN
99   RS903 22 Sep 2020 0:51 Landed 01:15 STD 00:15A...  ...         NaN
100                                                NaN  ...         NaN
[101 rows x 33 columns]],
# (the rest of the output is omitted)

And if I save it as CSV,

https://res.cloudinary.com/eightcruz/image/upload/v1601955667/hdxfzlxixyyp1ws3cdxa.png

it comes out like this. The middle of each table is missing, and a literal "..." is written into the file in its place. How can I get all of the data written out?

The image would not display inline, so I have attached it as a link.

python dictionary list dataframe pandas

2022-09-20 19:55

1 Answer

I would do this.


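asv_list = []
for i in fleet_list["ASV"].keys():
    driver.get("https://www.flightradar24.com/data/aircraft/%s" % i)
    html = driver.page_source
    raw = pd.read_html(html)
    # fleet_list["ASV"][i].append(raw[0])   # original approach, no longer used
    df = raw[0]      # first table found on the page
    df["key"] = i    # record which aircraft each row belongs to
    asv_list.append(df)
    time.sleep(3)

# Stack the per-aircraft tables into one DataFrame and save it.
asv_data = pd.concat(asv_list)
asv_data.to_csv("sample2.csv")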

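The original structure is a nested dictionary with a list of data frames inside, so each cell of the resulting DataFrame holds a whole table, and to_csv can only write that table's abbreviated text form. Instead, add a column to each data frame that identifies the aircraft, combine the data frames of the same structure into one large data frame with pd.concat, and save that as CSV.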
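To try the idea without a browser, here is a minimal sketch that uses small made-up tables in place of the scraped ones. The registrations "AAA001" and "AAA002" and the column names are placeholders, not real data, and the CSV is written to an in-memory buffer rather than to sample2.csv.

```python
import io

import pandas as pd

# Made-up stand-ins for the per-aircraft tables returned by pd.read_html.
tables = {
    "AAA001": pd.DataFrame({"FLIGHT": ["RS904", "RS903"],
                            "STATUS": ["Landed", "Landed"]}),
    "AAA002": pd.DataFrame({"FLIGHT": ["RS901"],
                            "STATUS": ["Landed"]}),
}

# Tag each table with the aircraft it belongs to, then stack them.
frames = [df.assign(key=reg) for reg, df in tables.items()]
combined = pd.concat(frames, ignore_index=True)

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because every cell of the combined frame is a plain value rather than a whole table, every row is written in full.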

When you try this, the table you fetch will sometimes have a different structure. You will need to handle those cases yourself, for example by discarding them.
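One possible way to discard tables with an unexpected structure is to compare each table's column labels against the ones you expect before concatenating. This is only a sketch: the helper name keep_matching and the column names are made up for illustration, and you may prefer a looser check.

```python
import pandas as pd


def keep_matching(frames, expected_columns):
    """Return only the frames whose column labels equal expected_columns."""
    expected = list(expected_columns)
    return [df for df in frames if list(df.columns) == expected]


# Made-up examples: two tables with the usual layout and one odd one.
good = pd.DataFrame({"FLIGHT": ["RS904"], "STATUS": ["Landed"]})
odd = pd.DataFrame({"MESSAGE": ["No data"]})

kept = keep_matching([good, odd, good], expected_columns=["FLIGHT", "STATUS"])
combined = pd.concat(kept, ignore_index=True)
print(combined)
```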

When I searched Stack Overflow, I also found that the data can be saved split across several sheets of one Excel file.

https://stackoverflow.com/a/14225838/100093 : Using Pandas' ExcelWriter.
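A rough sketch of the multi-sheet approach, with one sheet per aircraft, might look like this. The function names are made up for illustration, and writing .xlsx files assumes that an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def sheet_name_for(name):
    """Excel limits sheet names to 31 characters, so truncate longer ones."""
    return str(name)[:31]


def write_sheets(frames_by_name, path):
    """Write each DataFrame in frames_by_name to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for name, df in frames_by_name.items():
            df.to_excel(writer, sheet_name=sheet_name_for(name), index=False)


# Example call, assuming `tables` maps a registration to its DataFrame:
# write_sheets(tables, "sample2.xlsx")
```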

2022-09-20 19:55
