machine learning when characteristic quantities are related to each other

I have a question about machine learning.

when the feature has several layers of structure
For machine learning with Python scikit-learn, etc. Please tell me what kind of data to treat and what model is appropriate.

Let me give you a simple example.(This may not be a good example, but...)

Based on the physical properties data of the aqueous solution in which substance A and substance B are mixed in various proportions,
Suppose you want to predict the physical properties of the aqueous solution mixed at an unknown rate.
At this point, the data is
Material A Material B Physical Properties 1% 1% 0.5
2% 2.5% 2.1
I think it will be like this.
If this is the case, the concentration of substance A and substance B is used as the characteristic quantity to predict the physical properties
I think I can make a model.
However, if substances A and B can also be represented by data (e.g., intrinsic values such as molecular weight and melting point)
This prediction model may also extend to substances C, D.
(Of course, you won't know until you try.)

If you want to do something like that,
A substance with a molecular weight of 〇、 and a melting point of △△ is 1%. 2% molecular weight □□, melting point ××,
The physical properties at this time are ~~
This data is a collection of data.
In other words, first, the molecular weight and melting point data determine what kind of substance
What percentage of the substance is concentrated?

In this case, although the molecular weight and melting point of substance A are related,
The molecular weight of substance A and the concentration of substance B are not related.

How do you represent this form of data as a data set?
Also, when processing such data, the general model of machine learning, etc., is
What do you have?

I would appreciate it if you could tell me only the websites and books that you can refer to.

python machine-learning

2022-09-30 21:48

2 Answers

I'm also a beginner in machine learning, so I don't know if it will be helpful...

データI haven't seen many examples like those of the questioner in the data analysis book, so I wonder if it is something that should be explored in machine learning.

分子 Since molecular weight and melting point are inherent values of matter,

from the viewpoint of multiple collinearity

sample1=['molecular weight of substance A', 'melting point of substance A', 'concentration of substance A',
           'Molecular weight of substance B', 'melting point of substance B', 'concentration of substance B',
           'Physical properties of the generated substance']

Then, since each molecular weight and melting point do not make sense in the first place,

sample1=['concentration of substance A', 'concentration of substance B', 'physical properties of the generated substance']

I think it is equivalent to .Therefore, the value of "physical properties of the generated substance" from "concentration of substance A" and "concentration of substance B" will be obtained from the regression plane.

物質 When it comes to predicting physical properties including molecular weight and melting point as well as the concentration of substances, I feel it is necessary to use new characteristic quantities that combine molecular weight and concentration with some meaningful calculation.

I understand what the questioner wants to say, but I am sorry that I can only answer with my senses.
I am very concerned, so I will follow your questions.

2022-09-30 21:48

If you put in various characteristic quantities and predict continuous values such as physical properties, it will be a problem called regression.
There are many types of regression, including multiple regression analysis, regression tree, and ensemble learning.

Next, I think it will depend on what the questioner wants to get.
If you want to create a model that can be used for any combination of substances, as you said, you can include molecular weight and melting point.

If you want to learn only two substances: A and B. As with the previous respondent's answer, molecular weight and melting point are meaningless.This is not because of the multi-collinear point of view, but because there is no unique change in the fixed value.I think it was like primary independence.

You probably want to do something like this, but if I were you,

Material A's molecular weight Material A's melting point Material A's concentration Material B's melting point Material B's concentration Material B's physical properties Material A's molecular weight Material A's melting point Material A's concentration 2 Material B's melting point Material B's concentration 2 Physical properties
...
Material A's molecular weight Material A's melting point Material A's concentration Material A's melting point Material C's melting point Material C's physical properties Material A's molecular weight Material A's melting point Material A's concentration 2 Material C's melting point Material C's concentration 2 Physical properties
...
Material D's molecular weight Material D's melting point Material D's concentration 1 Material E's melting point Material E's physical properties
Material D's molecular weight Material D's melting point Material D's concentration 2 Material E's melting point Material E's concentration 2 Physical properties

Detects physical properties by varying concentrations of and various substances Conduct multiple regression analysis
Molecular weight, melting point, or concentration can be determined by the value of t. Multi-collinearity is determined by VIF value
...
I think it's a trend.

2022-09-30 21:48

If you have any answers or tips

Popular Tags

python x 4647

android x 1593

java x 1494

javascript x 1427

c x 927

c++ x 878

ruby-on-rails x 696

php x 692

python3 x 685

html x 656