I want to use a regular expression to delete the trailing part of a URL.

Asked 2 years ago, Updated 2 years ago, 79 views

2 Answers

If you want to match the URL while excluding everything from the fourth slash after http:// onward:

if (preg_match('#http:/(?:/[^/]+){4}#', $url, $matches)) {
    $result = $matches[0];
}
// To restrict the match to a specific domain:
// #http://www\.amazon\.co\.jp(?:/[^/]+){3}#

It matches from http: through four consecutive blocks, each made of a slash followed by non-slash characters (/[^/]+); the domain name counts as the first block. If the URL has no fourth slash but does have a query string that you also want to remove, use [^/?] instead of [^/].
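Below is a minimal sketch that wraps this idea in a function so both variants can be compared. The function name truncateAfterFourSegments and the $stopAtQuery flag are hypothetical, and accepting https:// as well is an assumption beyond the original pattern. Note that the pattern begins with http:/ (a single slash): the first /[^/]+ block supplies the second slash of // together with the domain name.

```php
<?php
// Hypothetical helper: keeps "http(s):/" plus four "/segment" blocks
// (the domain counts as the first block) and drops everything after them.
// With $stopAtQuery = true the class becomes [^/?], so a query string
// right after the last kept block is dropped too.
function truncateAfterFourSegments(string $url, bool $stopAtQuery = false): ?string
{
    $segment = $stopAtQuery ? '[^/?]+' : '[^/]+';
    // "https?:/" has only one slash: the first block adds the second one.
    $pattern = '#https?:/(?:/' . $segment . '){4}#';
    return preg_match($pattern, $url, $matches) ? $matches[0] : null;
}

$long  = 'http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q/ref=sr_1_1?s=computers';
$short = 'http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q?s=computers';

echo truncateAfterFourSegments($long), "\n";        // http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q
echo truncateAfterFourSegments($short), "\n";       // http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q?s=computers
echo truncateAfterFourSegments($short, true), "\n"; // http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q
```

The second call shows why [^/?] matters: without a fourth slash, [^/]+ happily consumes the query string.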

If you only need to handle Amazon URLs, you can match the /dp/xxxx part explicitly for a more precise pattern.

preg_match('#https?://www\.amazon\.co\.jp/(?:[^/]+/)?dp/[A-Z0-9]+#', $url, $matches);

Run sample: https://regex101.com/r/nY8dN2/1
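If you also want the product ID on its own, a named capture group can be added to the Amazon-specific pattern. This is a minimal sketch; the function name matchAmazonProduct and the group name id are hypothetical, and the sample URL is shortened for readability.

```php
<?php
// Same Amazon pattern as above, plus a named group (?<id>...) so the
// product ID after /dp/ can be read separately from the full match.
function matchAmazonProduct(string $url): ?array
{
    $pattern = '#https?://www\.amazon\.co\.jp/(?:[^/]+/)?dp/(?<id>[A-Z0-9]+)#';
    if (!preg_match($pattern, $url, $m)) {
        return null; // not an amazon.co.jp /dp/ URL
    }
    return ['url' => $m[0], 'id' => $m['id']];
}

$url = 'http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q/ref=sr_1_1?s=computers&ie=UTF8';
$result = matchAmazonProduct($url);
if ($result !== null) {
    echo $result['url'], "\n"; // http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q
    echo $result['id'], "\n";  // B015DTB87Q
}
```

Because the product-name block (?:[^/]+/)? is optional, short URLs of the form /dp/ID also match.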


2022-09-30 20:23

Sorry, I couldn't do it with a single regular expression alone, but I think something like this works.

<?php
$url = 'http://www.amazon.co.jp/%E3%83%8E%E3%83%BC%E3%83%88%E3%83%91%E3%82%BD%E3%82%B3%E3%83%B3-EeeBook-X205TA-WHITE10-Windows10-11-6%E3%82%A4%E3%83%B3%E3%83%81%E3%83%AF%E3%82%A4%E3%83%89/dp/B015DTB87Q/ref=sr_1_1?s=computers&ie=UTF8&qid=1460353489&sr=1-1&keywords=%E3%83%91%E3%82%BD%E3%82%B3%E3%83%B3';

// Capture "http(s)://" plus the next four slash-terminated blocks
// (the lazy .*? stops at each slash), then strip the trailing slash.
preg_match_all('#(https?://(?:.*?/){4})#', $url, $m);

if (isset($m[1][0])) {
    echo rtrim($m[1][0], '/');
}

?>

Results

http://www.amazon.co.jp/%E3%83%8E%E3%83%BC%E3%83%88%E3%83%91%E3%82%BD%E3%82%B3%E3%83%B3-EeeBook-X205TA-WHITE10-Windows10-11-6%E3%82%A4%E3%83%B3%E3%83%81%E3%83%AF%E3%82%A4%E3%83%89/dp/B015DTB87Q
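Since the question is about deleting the tail rather than matching the part to keep, here is a minimal sketch of the same idea with preg_replace, which removes everything from the fourth slash after http(s):// onward in one step. The function name stripUrlTail is hypothetical, and the sample URL is shortened for readability.

```php
<?php
// Hypothetical helper: delete the fourth slash after "http(s)://" and
// everything that follows it. Group 1 keeps the scheme, three
// "segment/" blocks and the fourth segment; URLs with fewer segments
// do not match and are returned unchanged.
function stripUrlTail(string $url): string
{
    // preg_replace returns null on a regex error; fall back to the input.
    return preg_replace('#^(https?://(?:[^/]+/){3}[^/]+)/.*$#', '$1', $url) ?? $url;
}

$url = 'http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q/ref=sr_1_1?s=computers&ie=UTF8';
echo stripUrlTail($url), "\n"; // http://www.amazon.co.jp/Some-Item/dp/B015DTB87Q
```

As with the [^/] pattern in the first answer, a URL that has a query string but no fourth slash is left unchanged.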


2022-09-30 20:23



© 2024 OneMinuteCode. All rights reserved.