Fill in a specific column of Python Pandas data frames with two given series

I have two data frames and two series. I'd like to fill the "Amount" column in df2 with the values I want. The amount matched by fruit id takes priority; if the fruit id has no match, the average amount for the country of origin should be used instead. It's easier said than done.

python machine-learning pandas

2022-09-20 13:14

2 Answers

This looked like a fun problem, so I gave it a try.

Pandas has Series, DataFrame, and Index objects, and it helps to think of them in the same terms as a dictionary, with the index playing the role of the keys. This question is a good example. If you simply assign df[column] = series, the values are matched by index label, so it works much like dict.update.
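
As a small illustration of that alignment, here is a minimal sketch on hypothetical toy data (the names `df`, `prices`, `qty`, and `price` are made up for this example and are not from the question). It only shows that column assignment matches on index labels rather than on position.

```python
import pandas as pd

# Hypothetical toy data, only to show label alignment on assignment.
df = pd.DataFrame({"qty": [1, 2, 3]}, index=["a", "b", "c"])
prices = pd.Series({"c": 30, "a": 10, "z": 99})  # different order, extra key "z"

# Values are matched by index label, not by position.
df["price"] = prices

print(df)
```

One difference from dict.update is worth keeping in mind: labels missing from the series (here "b") end up as NaN, and labels that exist only in the series (here "z") are ignored.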

Please read through the example code carefully to see how it works.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "Fruit name": ["Apple", "Tomato", "Banana", "Grape"],
        "Fruitid": [0, 1, 2, 3],
        "Amount": [600, 700, 300, 400],
        "Country of origin": ["Seoul", "Seoul", "Jeju", "Daejeon"],
        "Average amount of origin": [800, 800, 200, 500],
    }
)
df2 = pd.DataFrame(
    {
        "Fruit name": ["Apple", "Banana", "Geobong"],
        "Fruitid": [0, 2, 4],
        "Country of origin": ["Seoul", "Jeju", "Daejeon"],
        "Amount": [np.nan, np.nan, np.nan],
        "Required Value": [600, 300, 500],
    }
)
# s1: amount per fruit name, s2: average amount per country of origin
s1 = pd.Series([600, 700, 300, 400], index=["Apple", "Tomato", "Banana", "Grape"])
s2 = pd.Series([800, 200, 500], index=["Seoul", "Jeju", "Daejeon"])

columns_org = df2.columns  # remember the original column order

# Step 1: fill "Amount" by fruit name (assignment aligns on the index)
df2 = df2.set_index("Fruit name")
df2["Amount"] = s1
df2 = df2.reset_index()

print(df2.to_markdown())
"""
|    | Fruit name | Fruitid | Country of origin | Amount | Required Value |
|---:|:---------|---------:|:---------|-------:|-----------:|
|  0 | Apple | 0 | Seoul | 600 | 600 |
|  1 | Banana | 2 | Jeju | 300 | 300 |
|  2 | Geobong | 4 | Daejeon | nan | 500 |
"""

# Step 2: fill the rows that are still NaN by country of origin
df2 = df2.set_index("Country of origin")
df2.loc[df2["Amount"].isna(), "Amount"] = s2
df2 = df2.reset_index()

print(df2.to_markdown())
"""
|    | Country of origin | Fruit name | Fruitid | Amount | Required Value |
|---:|:---------|:---------|---------:|-------:|-----------:|
|  0 | Seoul | Apple | 0 | 600 | 600 |
|  1 | Jeju | Banana | 2 | 300 | 300 |
|  2 | Daejeon | Geobong | 4 | 500 | 500 |
"""

# Step 3: restore the original column order
df2 = df2[columns_org]
print(df2.to_markdown())
"""
|    | Fruit name | Fruitid | Country of origin | Amount | Required Value |
|---:|:---------|---------:|:---------|-------:|-----------:|
|  0 | Apple | 0 | Seoul | 600 | 600 |
|  1 | Banana | 2 | Jeju | 300 | 300 |
|  2 | Geobong | 4 | Daejeon | 500 | 500 |
"""

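The same two-step fill can probably also be written without moving the index back and forth. The following is a minimal sketch of that alternative, not the approach used in the answer above: it swaps in `Series.map` and `Series.fillna`, and it assumes `s1` is keyed by fruit name and `s2` by country of origin, as in the example data. The intermediate names `by_fruit` and `by_origin` are made up for illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "Fruit name": ["Apple", "Banana", "Geobong"],
        "Country of origin": ["Seoul", "Jeju", "Daejeon"],
        "Amount": [np.nan, np.nan, np.nan],
    }
)
s1 = pd.Series([600, 700, 300, 400], index=["Apple", "Tomato", "Banana", "Grape"])
s2 = pd.Series([800, 200, 500], index=["Seoul", "Jeju", "Daejeon"])

# First priority: amount looked up by fruit name (NaN where the fruit is unknown).
by_fruit = df2["Fruit name"].map(s1)
# Fallback: average amount looked up by country of origin.
by_origin = df2["Country of origin"].map(s2)

# Keep the fruit amount where it exists, otherwise use the origin average.
df2["Amount"] = by_fruit.fillna(by_origin)
print(df2)
```

Because the row index never changes here, the column order stays as it was and no reset_index step is needed.
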
2022-09-20 13:14

In short, the problem situation is

Is that right?
If so, I'm not familiar with data frames, so here it is in SQL. It's not impossible, but it's actually a bit tricky.

select
  total.item_name,
  total.area_name,
  ifnull(some.price, total.default_price) as possible_price -- use the known price from some if there is one, otherwise the default from total
from (
  select
    items.id as item_id,
    items.name as item_name,
    areas.id as area_id,
    areas.name as area_name,
    areas.default_value as default_price
  from items, areas -- cross join: every possible item/area combination
) as total
-- attach the already known fruit + origin prices to each possible combination
left join item_area_pricings some -- this table corresponds to df1
  on some.item_id = total.item_id and some.area_id = total.area_id -- both conditions must be met
order by total.item_id, total.area_id;
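
For readers who want to try the same idea in pandas, the query can be sketched roughly as follows. This is only an illustration under assumptions: the table and column names follow the SQL above, the rows are hypothetical sample data, and `merge(how="cross")` needs pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical rows; table and column names follow the SQL above.
items = pd.DataFrame({"id": [0, 2, 4], "name": ["Apple", "Banana", "Geobong"]})
areas = pd.DataFrame(
    {"id": [10, 11], "name": ["Seoul", "Jeju"], "default_value": [800, 200]}
)
item_area_pricings = pd.DataFrame(
    {"item_id": [0, 2], "area_id": [10, 11], "price": [600, 300]}
)

# Every possible item/area combination (the "from items, areas" subquery).
total = items.rename(columns={"id": "item_id", "name": "item_name"}).merge(
    areas.rename(
        columns={"id": "area_id", "name": "area_name", "default_value": "default_price"}
    ),
    how="cross",
)

# Left join the known prices, then fall back to the area default.
result = total.merge(item_area_pricings, on=["item_id", "area_id"], how="left")
result["possible_price"] = result["price"].fillna(result["default_price"])
result = result.sort_values(["item_id", "area_id"])[
    ["item_name", "area_name", "possible_price"]
]
print(result)
```

The `fillna` call plays the role of `ifnull`: rows without a matching price get the default price of their area.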

I hope it's useful.

2022-09-20 13:14
