
Converting MHTML (.mht) Files to HTML with PHP

This article shows how to parse an MHT (MHTML) file with PHP and convert it to HTML.

MHTML files

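An MHTML file is also known as an aggregate HTML document, a web archive, or a single-file web page. It stores all the elements of a web page, including text and images, in a single file.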

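This encapsulation lets you publish an entire site as a single aggregate HTML document with inline MIME-encoded content (MHTML), or send the whole site as an email message or attachment.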
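To make the layout concrete, here is a minimal sketch of what such a file looks like and how its parts can be pulled apart. The sample document, its boundary string, and the example.com locations are invented for illustration; real files saved by a browser carry more headers and much longer bodies.

```php
<?php
// Hypothetical two-part MHTML document, built by hand for illustration only.
$boundary = '----=_Sample_Boundary_001';
$sample = implode("\r\n", [
    'MIME-Version: 1.0',
    'Content-Type: multipart/related; type="text/html"; boundary="' . $boundary . '"',
    '',
    '--' . $boundary,
    'Content-Type: text/html; charset="utf-8"',
    'Content-Transfer-Encoding: quoted-printable',
    'Content-Location: http://example.com/index.html',
    '',
    '<p class=3D"note">Hello</p>',
    '--' . $boundary,
    'Content-Type: image/png',
    'Content-Transfer-Encoding: base64',
    'Content-Location: http://example.com/logo.png',
    '',
    base64_encode('fake image bytes'),
    '--' . $boundary . '--',
    '',
]);

// The boundary is announced in the top-level Content-Type header.
preg_match('/boundary="([^"]+)"/i', $sample, $m);
$found = $m[1];

// Every part follows a delimiter made of "--" plus the boundary.
$chunks = explode('--' . $found, $sample);
array_shift($chunks); // drop the top-level headers
array_pop($chunks);   // drop what follows the closing "--boundary--" marker

$decoded = [];
foreach ($chunks as $chunk) {
    // Within a part, the first blank line separates headers from the body.
    [$head, $body] = explode("\r\n\r\n", ltrim($chunk, "\r\n"), 2);
    $body = rtrim($body, "\r\n");
    if (stripos($head, 'Content-Transfer-Encoding: base64') !== false) {
        $body = base64_decode($body);
    } elseif (stripos($head, 'Content-Transfer-Encoding: quoted-printable') !== false) {
        $body = quoted_printable_decode($body);
    }
    $decoded[] = $body;
}

print_r($decoded);
```

The sketch relies only on core PHP functions. It assumes CRLF line endings and a quoted boundary value, which is not guaranteed for every MHTML file.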

 

Saving a web page as an MHTML file in the browser

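In Chrome, type "chrome://flags" in the address bar and press Enter. This opens Chrome's feature configuration page, which lists a large number of options.

Press Ctrl+F and search for "mhtml" to find the "Save Page as MHTML" option, then click "Enable" below it. This applies to Chrome versions that still expose this flag.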


 

PHP implementation

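<?php

/**
 * Splits an MHTML (.mht) file into its MIME parts and decodes them.
 */
class mhtparse
{
    public $file = '';        // path of the .mht file
    public $boundary = '';    // MIME boundary, including the leading "--"
    public $filedata = '';    // raw file contents; an array of parts after extract()
    public $countparts = 1;   // number of parts found
    public $log = '';         // log text returned by get_log()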

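    // Reads the file and splits it into parts. Always returns 1.
    public function extract()
    {
        $this->read_filedata();
        $this->file_parts();

        return 1;
    }

    // Sets the path of the .mht file to parse.
    public function set_file($p)
    {
        $this->file = $p;
    }

    // Returns the accumulated log text.
    public function get_log()
    {
        return $this->log;
    }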

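    // Finds the MIME boundary in the first 8 KB of the file and splits the
    // file into parts on it.
    public function file_parts()
    {
        $lines = explode("\n", substr($this->filedata, 0, 8192));
        foreach ($lines as $line) {
            $line = trim($line);
            // Header parameter names are case-insensitive, so use stripos().
            $pos = stripos($line, 'boundary');
            if (strpos($line, '=') !== false && $pos !== false) {
                // Search from "boundary" onwards so that an earlier quoted
                // parameter such as type="text/html" is not picked up instead.
                $range = $this->getrange($line, '"', '"', $pos);
                if ($range === false) {
                    continue; // no quoted value on this line
                }
                $this->boundary = "--" . $range['range'];
                $this->filedata = str_replace($line, '', $this->filedata);
                break;
            }
        }
        if ($this->boundary != '') {
            $this->filedata = explode($this->boundary, $this->filedata);
            unset($this->filedata[0]); // drop the top-level headers before the first part
            $this->filedata = array_values($this->filedata);
            $this->countparts = count($this->filedata);
        } else {
            // No boundary found: treat the whole file as a single part.
            $this->filedata = array($this->filedata);
        }
    }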

    // Returns the raw (still encoded) parts.
    public function get_all_part_file()
    {
        return $this->filedata;
    }

    // Decodes part $i and returns its body.
    // Returns 1 when the part is only the closing "--" marker.
    public function get_part_to_file($i)
    {
        $line_data_start = 0;
        $encoding = '';
        $part_lines = explode("\n", ltrim($this->filedata[$i]));
        foreach ($part_lines as $line_id => $line) {
            $line = trim($line);
            if ($line == '') {
                // The first blank line ends this part's headers.
                if (trim($part_lines[0]) == '--')
                    return 1;
                $line_data_start = $line_id;
                break;
            }
            $pos = strpos($line, ':');
            if ($pos !== false) {
                $k = strtolower(trim(substr($line, 0, $pos)));
                $v = trim(substr($line, $pos + 1));
                if ($k == 'content-transfer-encoding') {
                    $encoding = strtolower($v); // encoding names are case-insensitive
                }
                // Parsed for reference; this class does not use them further.
                if ($k == 'content-location') {
                    $location = $v;
                }
                if ($k == 'content-type') {
                    $contenttype = $v;
                }
            }
        }

        // Keep only the body lines and restore the line breaks removed by explode().
        $body = implode("\n", array_slice($part_lines, $line_data_start + 1));
        if ($encoding == 'base64')
            $body = base64_decode($body);
        elseif ($encoding == 'quoted-printable')
            // quoted_printable_decode() is a core function; imap_qprint() requires the IMAP extension.
            $body = quoted_printable_decode($body);

        return $body;
    }

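    // Loads the whole file into memory; writes a log entry if it cannot be read.
    public function read_filedata()
    {
        $data = @file_get_contents($this->file);
        if ($data === false) {
            $this->log .= "Cannot read file: " . $this->file . "\n";
            $data = '';
        }
        $this->filedata = $data;
    }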

    // Returns the text between $Beginmark_str and $Endmark_str, searching from
    // $Start_pos and allowing nested marks. Returns false if no range is found,
    // otherwise an array with the keys 'range', 'begin' and 'end'.
    public function getrange(&$subject, $Beginmark_str = '{', $Endmark_str = '}', $Start_pos = 0)
    {
        if (empty($Beginmark_str))
            $Beginmark_str = '{';
        $Beginmark_str_len = strlen($Beginmark_str);

        if (empty($Endmark_str))
            $Endmark_str = '}';
        $Endmark_str_len = strlen($Endmark_str);

        // Initialise all state up front; the original read these variables
        // before assigning them, which raises warnings in current PHP.
        $Begin_firstOccurence_pos = null;
        $End_pos_cache = null;
        $End_sequel_pos = 0;
        $range_end_pos = 0;
        $range_current_length = null;
        $rangeClean = 0;
        $Start_pos_cache = $Start_pos;

        do {
            /* test for a begin mark */
            $Start_pos_cache = strpos($subject, $Beginmark_str, $Start_pos_cache);

            /* this is a possible start for the range */
            if (is_int($Start_pos_cache)) {
                /* skip past the mark itself */
                $Start_pos_cache = ($Start_pos_cache + $Beginmark_str_len);
                /* a later begin mark: is it inside or beyond the current range? */
                if (is_int($Begin_firstOccurence_pos)) {
                    if ($Start_pos_cache < $range_end_pos)
                        $rangeClean = 0;
                    elseif ($Start_pos_cache > $range_end_pos)
                        $rangeClean = 1;
                }
                /* the first begin mark: the range starts here */
                if (!is_int($Begin_firstOccurence_pos))
                    $Begin_firstOccurence_pos = $Start_pos_cache;
            }

            if (!is_int($Start_pos_cache)) {
                /* no further begin mark: the previous one was the last possible start */
                if (is_int($Begin_firstOccurence_pos) and ($Start_pos_cache < $range_end_pos))
                    $rangeClean = 1;
                else
                    return false; /* error: no begin mark at all */
            }
            if (is_int($Begin_firstOccurence_pos) and ($rangeClean != 1)) {
                if (!is_int($End_pos_cache))
                    $End_sequel_pos = $Begin_firstOccurence_pos;

                $End_pos_cache = strpos($subject, $Endmark_str, $End_sequel_pos);

                /* error: end mark not found */
                if (!is_int($End_pos_cache))
                    return false;

                $range_current_length = ($End_pos_cache - $Begin_firstOccurence_pos);
                $End_sequel_pos = ($End_pos_cache + $Endmark_str_len);
                $range_end_pos = $End_pos_cache;
            }
        } while ($rangeClean < 1);

        if (!is_int($Begin_firstOccurence_pos) or !is_int($range_current_length))
            return false;

        return array(
            'range' => substr($subject, $Begin_firstOccurence_pos, $range_current_length),
            'begin' => $Begin_firstOccurence_pos,
            'end' => $End_sequel_pos
        );
    }
}
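
The article does not show how the class is called, so the following is only a sketch of one possible way to use it. The helper name mht_save_parts, the output file names, and the file name mhtparse.php are assumptions made for illustration. The sketch also assumes that the first part is the root HTML document, which is how browsers usually order the parts but is not checked here.

```php
<?php
/**
 * Hypothetical helper (not part of the original class): decodes every part
 * and writes it to $outDir. $parser is expected to expose the mhtparse
 * methods set_file(), extract(), get_all_part_file() and get_part_to_file().
 */
function mht_save_parts($parser, string $mhtPath, string $outDir): array
{
    $parser->set_file($mhtPath);
    $parser->extract();

    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Cannot create $outDir");
    }

    $written = [];
    $count = count($parser->get_all_part_file());
    for ($i = 0; $i < $count; $i++) {
        $data = $parser->get_part_to_file($i);
        // The class returns the integer 1 (or an empty body) for the
        // closing "--" marker, so skip anything that is not real content.
        if (!is_string($data) || $data === '') {
            continue;
        }
        // Assumption: part 0 is the root HTML page, later parts are resources.
        $name = ($i === 0) ? 'index.html' : "part_$i.bin";
        file_put_contents($outDir . '/' . $name, $data);
        $written[] = $name;
    }
    return $written;
}

// Usage, assuming the class above is saved as mhtparse.php:
// require 'mhtparse.php';
// print_r(mht_save_parts(new mhtparse(), 'page.mht', __DIR__ . '/out'));
```

The saved index.html still refers to images and stylesheets by their original URLs, because the class does not expose each part's Content-Location. A fuller converter would also need to rewrite those references to the extracted files.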

That concludes this example of converting MHTML (.mht) files to HTML with PHP. We hope it serves as a useful reference.
