I want to remove the beginning and end of a URL string using a regular expression.

In Python 3, given a string like

https://hoge1.hoge2

I want to remove https:// and .hoge2.
Could you tell me how to extract only hoge1?

python python3

2022-09-30 19:10

2 Answers

I interpret the goal of the question as "retrieve the subdomain from a URL".

If you want to extract information from a URL, try urllib.parse first (I'll leave the regular-expression solution to someone else).

>>> from urllib.parse import urlparse
>>> o = urlparse('https://hoge1.hoge2')
>>> print(o.hostname.split('.')[0])
hoge1
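
The same idea can be wrapped in a small helper. This is only a sketch: the function name first_host_label is made up for illustration, and it assumes you want None back when the string has no scheme (urlparse then finds no hostname).

```python
from urllib.parse import urlparse


def first_host_label(url):
    """Return the first dot-separated label of the URL's hostname, or None."""
    # hostname is None when the URL has no network location (e.g. no "https://")
    host = urlparse(url).hostname
    if host is None:
        return None
    return host.split(".")[0]


print(first_host_label("https://hoge1.hoge2"))            # hoge1
print(first_host_label("https://hoge1.hoge2:8080/path"))  # hoge1
print(first_host_label("hoge1.hoge2"))                    # None
```

Because hostname already excludes the port and path, the helper does not need to strip them by hand.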

By the way, if you want to extract domains and subdomains more accurately, you can also use a package called tldextract.
https://pypi.python.org/pypi/tldextract

2022-09-30 19:10

If you want to use regular expressions:

(?<=https:\/\/)\w+(?=\.\w+)

This pattern matches hoge1.
(?<=...) is a lookbehind: the match succeeds only if it is immediately preceded by the pattern in the parentheses.
(?=...) is a lookahead: the match succeeds only if it is immediately followed by the pattern in the parentheses.
So https:\/\/ (with the slashes escaped) is given as the lookbehind,
\.\w+ (where \w is a word character, i.e. a letter, digit, or underscore; here it matches .hoge2) is given as the lookahead,
and the \w+ between the two is what gets extracted.
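
In Python 3 the pattern above could be applied with the re module roughly as follows. This is a minimal sketch: the names PATTERN and extract_label are made up for illustration, and it assumes the URL always starts with https:// and that the part you want contains only word characters (no hyphens).

```python
import re

# Same pattern as above: lookbehind for "https://", lookahead for ".something"
PATTERN = re.compile(r"(?<=https:\/\/)\w+(?=\.\w+)")


def extract_label(url):
    """Return the text between "https://" and the first dot, or None if no match."""
    match = PATTERN.search(url)
    return match.group(0) if match else None


print(extract_label("https://hoge1.hoge2"))  # hoge1
print(extract_label("http://hoge1.hoge2"))   # None (scheme is not https)
```

The lookbehind and lookahead are not part of the match itself, so group(0) contains only hoge1.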

2022-09-30 19:10
