I take the goal of the question to be "retrieve the subdomain from a URL".
If you want to extract information from a URL, try urllib.parse
first (I'll leave the regular-expression solution to someone else).
>>> from urllib.parse import urlparse
>>> o = urlparse('https://hoge1.hoge2')
>>> print(o.hostname.split('.')[0])
hoge1
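As a minimal sketch, the steps above can be wrapped in a small helper. The name first_label is hypothetical (it is not part of urllib), and the helper assumes that the first dot-separated label of the hostname is the part you want, which does not hold for a URL such as https://example.com.

```python
from urllib.parse import urlparse


def first_label(url):
    """Return the first dot-separated label of the URL's hostname, or None."""
    hostname = urlparse(url).hostname  # None when the URL has no network location
    if hostname is None:
        return None
    return hostname.split('.')[0]


print(first_label('https://hoge1.hoge2'))        # hoge1
print(first_label('https://www.example.com/a'))  # www
print(first_label('https://example.com'))        # example (not really a subdomain)
print(first_label('not a url'))                  # None
```

The third call shows the limit of simply splitting on dots: when there is no subdomain, the first label is the domain itself.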
By the way, if you want to get domains and subdomains more accurately, you can also use a package called tldextract.
https://pypi.python.org/pypi/tldextract
If you want to use a regular expression:
(?<=https:\/\/)\w+(?=\.\w+)
This pattern matches hoge1.
(?<=)
is a lookbehind: it succeeds only if the text immediately before the current position matches the pattern in the parentheses.
(?=)
is a lookahead: it succeeds only if the text immediately after the current position matches the pattern in the parentheses.
So, specify https:\/\/
(with the slashes escaped) as the lookbehind,
specify \.\w+
(where \w is a word character, i.e. a letter, digit, or underscore; here it matches .hoge2) as the lookahead, and
extract the \w+
(word characters) between the two.
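For reference, here is a minimal sketch of how the pattern could be used from Python's re module. The function name match_subdomain is made up for this example, and Python is an assumption, since the pattern itself is not tied to a particular language.

```python
import re

# Same pattern as above; the raw string keeps the backslashes intact.
PATTERN = re.compile(r'(?<=https:\/\/)\w+(?=\.\w+)')


def match_subdomain(url):
    """Return the text matched by PATTERN in url, or None if nothing matches."""
    m = PATTERN.search(url)
    return m.group(0) if m else None


print(match_subdomain('https://hoge1.hoge2'))  # hoge1
print(match_subdomain('http://hoge1.hoge2'))   # None (the lookbehind requires https://)
```

Note that \w does not match a hyphen, so a host such as my-site.example.com is not matched by this pattern as written.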