
Using PHP Regular Expressions to Convert Relative Paths to Absolute Paths

This article shows how to use regular expressions in PHP to convert relative paths into absolute paths.

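When writing a web crawler, the hyperlinks it collects need to be normalized into absolute paths. This article presents a regular-expression approach for processing the links a crawler finds.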

A crawler will typically come across links such as the following:

<!-- empty hyperlink -->
<a href=""></a>

<!-- whitespace only -->
<a href=" " rel="external nofollow"> </a>

<!-- a tag with other attributes -->
<a href="index.html" rel="external nofollow" alt="hyperlink">index.html </a>
<a href="/" rel="external nofollow" rel="external nofollow" target="_blank"> / target="_blank" </a>
<a target="_blank" href="/" alt="hyperlink"> target="_blank" /alt="hyperlink" </a>
<a target="_blank" title="hyperlink" href="/" alt="hyperlink"> target="_blank" title="hyperlink" /alt="hyperlink" </a>

<!-- root directory -->
<a href="/" rel="external nofollow"> / </a>
<a href="a" rel="external nofollow"> a </a>

<!-- with query parameters -->
<a href="/index.html?id=1" rel="external nofollow"> /index.html?id=1 </a>
<a href="?id=2" rel="external nofollow">?id=2 </a>

<!-- // -->
<a href="//index.html" rel="external nofollow"> //index.html </a>
<a href="//www.yuqingqi.com" rel="external nofollow">//www.yuqingqi.com </a>

<!-- internal (same-site) link -->
<a href="http://www.hole_1.com/index.html" rel="external nofollow">http://www.hole_1.com/index.html </a>

<!-- external links -->
<a href="http://www.yuqingqi.com" rel="external nofollow">http://www.yuqingqi.com </a>
<a href="http://www.numberer.net" rel="external nofollow">http://www.numberer.net </a>

<!-- links to image and text files -->
<a href="1.jpg" rel="external nofollow"> 1.jpg </a>
<a href="1.jpeg" rel="external nofollow">1.jpeg </a>
<a href="1.gif" rel="external nofollow"> 1.gif </a>
<a href="1.png" rel="external nofollow"> 1.png </a>
<a href="1.txt" rel="external nofollow"> 1.txt </a>

<!-- ordinary links -->
<a href="index.html" rel="external nofollow" rel="external nofollow">index.html </a>
<a href="index.html" rel="external nofollow" rel="external nofollow" >index.html </a>
<a href="./index.html" rel="external nofollow"> ./index.html </a>
<a href="../index.html" rel="external nofollow">../index.html </a>
<a href=".../" rel="external nofollow"> .../ </a>
<a href="..." rel="external nofollow">... </a>

<!-- not real links, but containing a colon -->
<a href="javascript:void(0)" rel="external nofollow"> javascript:void(0) </a>
<a href="a:b" rel="external nofollow"> a:b </a>
<a href="/a#a:b" rel="external nofollow"> /a#a:b </a>
<a href="mailto:'mafutian@126.com'" rel="external nofollow"> mailto:'mafutian@126.com' </a>
<a href="/tencent://message/?uin=335134463">/tencent://message/?uin=335134463 </a>

<!-- relative paths -->
<a href="." rel="external nofollow"> . </a>
<a href=".." rel="external nofollow">.. </a>
<a href="../" rel="external nofollow"> ../ </a>
<a href="/a/b/.." rel="external nofollow"> /a/b/.. </a>
<a href="/a" rel="external nofollow"> /a </a>
<a href="./b" rel="external nofollow"> ./b </a>
<a href="./././././././././b" rel="external nofollow"> ./././././././././b </a>

<!-- the previous link is really just ./b -->
<a href="../c" rel="external nofollow">../c </a>
<a href="../../d" rel="external nofollow"> ../../d </a>
<a href="../a/../b/c/../d" rel="external nofollow">../a/../b/c/../d </a>
<a href="./../e" rel="external nofollow"> ./../e </a>
<a href="http://www.hole_1.org/./../e" rel="external nofollow">http://www.hole_1.org/./../e </a>
<a href="./.././f" rel="external nofollow"> ./.././f </a>
<a href="http://www.hole_1.org/../a/.../../b/c/../d/..">http://www.hole_1.org/../a/.../../b/c/../d/.. </a>

<!-- with a port number -->
<a href=":8081/index.html" rel="external nofollow">:8081/index.html </a>
<a href="http://www.yuqingqi.com:80/index.html"> :80/index.html </a>
<a href="http://www.yuqingqi.com:8081/index.html"> http://www.yuqingqi.com:8081/index.html </a>
<a href="http://www.yuqingqi.com:8082/index.html"> http://www.yuqingqi.com:8082/index.html </a>
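
Links like the samples above have to be pulled out of the page and pre-filtered before any path handling. The following is a minimal sketch of that, not code from the original article: `extract_hrefs` and `is_crawlable` are hypothetical helper names, and the rule "skip empty hrefs and any scheme other than http/https" is an assumption about what a crawler wants to keep.

```php
<?php
// Hypothetical helper: collect the href values of <a> tags in an HTML string.
function extract_hrefs(string $html): array
{
    // Matches href="..." or href='...' anywhere inside an <a ...> tag.
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    return $matches[2];
}

// Hypothetical helper: true when the href is worth resolving to an absolute URL.
function is_crawlable(string $href): bool
{
    $href = trim($href);
    if ($href === '') {
        return false; // empty or whitespace-only href
    }
    // A leading "scheme:" other than http/https (javascript:, mailto:, a:b) is skipped.
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $href, $m)) {
        return in_array(strtolower($m[1]), ['http', 'https'], true);
    }
    return true; // relative paths, "/a#a:b", "?id=2", "//host" and similar
}

$html  = '<a href=" "> </a><a target="_blank" href="/a">a</a><a href="javascript:void(0)">js</a>';
$links = array_values(array_filter(extract_hrefs($html), 'is_crawlable'));
// $links now holds only '/a'
```

A regular expression is enough for a sketch like this; for arbitrary real-world HTML, PHP's DOMDocument is the more robust way to read attributes.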

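The first processing step is to turn each link into an absolute path, which at this point may still contain dot segments: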

http:// ... / ../ ../
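
The article does not show this first step, so here is one possible sketch of it. `prefix_with_base` is a hypothetical name, and `$pageUrl` (the absolute http or https URL of the page where the link was found) is an assumed input. The result deliberately keeps any './' and '../' segments, since removing them is the job of the next step.

```php
<?php
// Hypothetical helper: prefix a relative href with the URL of the page it came from.
// Assumes $pageUrl is an absolute http(s) URL, e.g. 'http://example.com/a/b/page.html'.
function prefix_with_base(string $pageUrl, string $href): string
{
    $href = trim($href);
    if ($href === '') {
        return $pageUrl;                       // empty href points at the page itself
    }
    if (preg_match('/^https?:\/\//i', $href)) {
        return $href;                          // already absolute
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path   = $parts['path'] ?? '/';
    $query  = isset($parts['query']) ? '?' . $parts['query'] : '';

    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href; // protocol-relative: reuse the scheme
    }
    if ($href[0] === '/') {
        return $origin . $href;                // root-relative
    }
    if ($href[0] === '?') {
        return $origin . $path . $href;        // query-only: same path, new query
    }
    if ($href[0] === '#') {
        return $origin . $path . $query . $href; // fragment-only: same page
    }
    // Anything else is resolved against the page's directory (path up to the last '/').
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

echo prefix_with_base('http://example.com/a/b/page.html', '../c'), "\n";
// prints http://example.com/a/b/../c
```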

The code below then removes the './', '../', and '/..' segments from that absolute path:

function url_to_absolute($relative) {
    // Remove every './' (the lookbehind keeps the './' that is part of a '../')
    $absolute = preg_replace('/(?<!\.)\.\//', '', $relative);

    // Repeatedly collapse every '/abc/../' into '/' until none are left.
    // The lookbehind (?<!\/) stops the host after 'http://' from being treated as a directory.
    do {
        $absolute = preg_replace('/(?<!\/)\/([^\/]{1,}?)\/\.\.\//', '/', $absolute, -1, $count);
    } while ($count >= 1);

    // Remove a trailing '/..', together with the directory in front of it if there is one
    $absolute = preg_replace('/(?<!\/)\/([^\/]{1,}?)\/\.\.$/', '/', $absolute);
    $absolute = preg_replace('/\/\.\.$/', '', $absolute);

    // Remove any remaining '../' (these would point above the root)
    $absolute = preg_replace('/(?<!\.)\.\.\//', '', $absolute);

    return $absolute;
}

$relative = 'http://www.aa.org/../a/.../../b/c/../d/..';
var_dump(url_to_absolute($relative)); // Output: string 'http://www.aa.org/a/b/' (length=22)
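
For comparison, the same clean-up can be sketched without regular expressions by splitting the path into segments and using a stack. This is an alternative technique, not the article's method: `normalize_url_path` is a hypothetical name, and its decisions to ignore a '..' at the root and to collapse doubled slashes are assumptions of this sketch.

```php
<?php
// Hypothetical alternative to the regex version: resolve '.' and '..' with a stack.
function normalize_url_path(string $url): string
{
    // Split off "scheme://host[:port]" and any "?query#fragment" so only the path changes.
    if (!preg_match('/^(https?:\/\/[^\/?#]+)([^?#]*)(.*)$/i', $url, $m)) {
        return $url; // not an absolute http(s) URL: leave unchanged
    }
    [, $origin, $path, $tail] = $m;

    $segments = explode('/', $path);
    $stack = [];
    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;           // empty segment (leading or doubled '/') or current directory
        }
        if ($segment === '..') {
            array_pop($stack);  // parent directory; a no-op when already at the root
            continue;
        }
        $stack[] = $segment;    // ordinary name, including odd ones such as '...'
    }

    // Keep a trailing slash when the path ended in '/', '.' or '..'.
    $last = end($segments);
    $endsInDirectory = ($last === '' || $last === '.' || $last === '..');
    $trailing = ($endsInDirectory && $stack) ? '/' : '';

    return $origin . '/' . implode('/', $stack) . $trailing . $tail;
}

var_dump(normalize_url_path('http://www.aa.org/../a/.../../b/c/../d/..'));
// string(22) "http://www.aa.org/a/b/"
```

On the sample input both versions give the same result; the stack version makes a single pass over the path and leaves the query string untouched.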


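That covers using PHP regular expressions to convert relative paths into absolute paths. Hopefully it is useful in your study or work; if you have questions, feel free to leave a comment.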
