
PHP to Read Keywords, Title and Description

Posted by paris on Apr 16, 2017 in Code
<?php
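if (!empty($_POST['textarea']))
{
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	// follow redirects (e.g. http:// to https://) and don't hang on slow sites
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 10);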
	echo "
	<table border='1'>
		<thead>
			<tr>
				<th>URL</th>
				<th>Title</th>
				<th>Description</th>
				<th>Keywords</th>
			</tr>
		</thead>
		<tbody>";
	// split on any line ending: browsers submit textarea lines as \r\n whatever the server OS
	foreach (preg_split('/\R/', $_POST['textarea']) as $url)
	{
		$url = trim($url);
		if (!preg_match("/^https?:\/\//i", $url))
		{
			continue;
		}
		else
		{
			curl_setopt($ch, CURLOPT_URL, $url);
			$html = curl_exec($ch);
			if (!$html)
				continue;
			$data = parse_page($html);
 
			// secure the data for printing (missing values become empty strings)
			$url = htmlentities($url, ENT_QUOTES, "UTF-8");
			foreach ($data as $key => $value)
				$data[$key] = htmlentities((string) $value, ENT_QUOTES, "UTF-8");
			echo "<tr>";
			echo "<td>{$url}</td>";
			echo "<td>{$data['title']}</td>";
			echo "<td>{$data['description']}</td>";
			echo "<td>{$data['keywords']}</td>";
			echo "</tr>";
		}
	}
	curl_close($ch);
	echo "
		</tbody>
	</table>";
}
 
 
function parse_page($html)
{
	/* get page's title */
	preg_match("/<title>(.+)<\/title>/siU", $html, $matches);
	$title = $matches ? $matches[1] : null;

	/* get page's keywords (expects <meta name="keywords" content="..."> in that attribute order) */
	$re = "<meta\s+name=['\"]??keywords['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
	preg_match("/$re/siU", $html, $matches);
	$keywords = $matches ? $matches[1] : null;

	/* get page's description (same attribute order assumption) */
	$re = "<meta\s+name=['\"]??description['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
	preg_match("/$re/siU", $html, $matches);
	$desc = $matches ? $matches[1] : null;

	/* parse links (collected, but not currently returned or displayed) */
	$re = "<a\s[^>]*href\s*=\s*(['\"]??)([^'\">]*?)\\1[^>]*>(.*)<\/a>";
	preg_match_all("/$re/siU", $html, $matches);
	$links = $matches ? $matches[2] : null;

	return array(
		"title" => $title,
		"description" => $desc,
		"keywords" => $keywords,
	);
}
?>
 
<form method="post" action="?">
<textarea name="textarea" cols="45" rows="5"><?php echo htmlentities($_POST['textarea'] ?? '', ENT_QUOTES, "UTF-8"); ?></textarea><br />
<input type="submit" name="button" id="button" value="Submit" />
</form>
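The regular expressions in parse_page() only match `<meta>` tags written with `name` first and `content` second, so a tag such as `<meta content="..." name="description">` is missed. As a minimal sketch, assuming PHP's bundled DOM extension is available and the fetched pages are UTF-8 (the function name parse_page_dom is hypothetical, not part of the original script), the same three fields can be read through DOMDocument, which does not care about attribute order, quoting style or letter case:

```php
<?php
/**
 * Hypothetical DOM-based alternative to parse_page().
 * Returns the same keys: title, description, keywords (null when absent).
 * Assumes the page is UTF-8; the <?xml ...> prefix tells libxml so.
 */
function parse_page_dom(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid: collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $result = ['title' => null, 'description' => null, 'keywords' => null];

    // First <title> element, with surrounding whitespace trimmed.
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    // Any <meta name="description|keywords" content="...">, attributes in any order.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        // Keep the first occurrence, like the preg_match() version does.
        if (($name === 'description' || $name === 'keywords') && $result[$name] === null) {
            $result[$name] = $meta->getAttribute('content');
        }
    }

    return $result;
}
```

Because it returns the same array keys, `$data = parse_page_dom($html);` could replace the parse_page() call in the loop. One side effect worth knowing: the DOM decodes entities such as `&amp;`, so the later htmlentities() call does not double-encode them.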

 


 

Regex To Remove Spam Links from WordPress Website

Posted by paris on Feb 18, 2014 in Code, Wordpress

Recently a WordPress site had multiple SQL injections that inserted spam links into the content, scattered randomly throughout its 100 or so blog posts. These included generic keywords such as:

  • levitra
  • cialis
  • payday
  • viagra
  • pharmacy
  • pfizer

The sites they linked to were:

http://masagro.mx/index.php/en/payday-loans-in-goldsboro-nc
http://simlesa.cimmyt.org/index.php/payday-loans-indiana
http://www.redclara.net/generic-viagra-us/
http://greatvines.com/cialis-online-fda
http://www.crackunit2.com/purchase-cheap-levitra/

Going through these with the Search and Replace plugin was going to take ages, so I looked for a regex script instead. I came across the following, courtesy of https://managewp.com/clean-link-injections-hacked-websites; however, it only looked for certain div tags, and I needed something to remove hyperlinks containing the keywords above. I modified the code as below, placed it in the theme's functions.php file, ran it with preview on and then off, and repeated this for each keyword in the list. It cleared about 1000 links!!

// Enter the keyword to look for in hyperlinks (it is matched anywhere in the href)
    $spamkeyword = "spamkeyword";

    // By default only preview infected posts. Change to 0 to clean posts
    $preview_only = 1;

    // Pattern to search for and replace with blank: an <a> tag whose href contains the keyword.
    // preg_quote() escapes regex characters in the keyword, [^>]* keeps the match inside the
    // opening tag, "i" matches any letter case and "s" lets the link text span several lines
    $pattern = '%<a href=["\'][^"\']*' . preg_quote($spamkeyword, '%') . '[^>]*>.*?</a>%is';

    // $wpdb has to be in scope before it is used to build the query
    global $wpdb;
    $num_cleaned = 0;

    // Fast SQL query to find suspicious posts; prepare() and esc_like() escape the keyword
    $query = $wpdb->prepare(
        "SELECT ID, post_content FROM {$wpdb->posts} WHERE post_content LIKE %s",
        '%' . $wpdb->esc_like($spamkeyword) . '%'
    );

    $posts = $wpdb->get_results($query);

    echo "Suspicious: " . count($posts) . " ";

    if ($preview_only)
      echo "Post IDs: ";

    // go through all suspicious posts
    foreach ($posts as $post)
    {
        //echo $post->post_content;

        if (!$preview_only)
        {
            // try the pattern
            $new_content = preg_replace($pattern, '', $post->post_content);

            // update the cleaned content
            if ($new_content != $post->post_content) {
              $wpdb->update(
                $wpdb->posts,
                array(
                    'post_content' => $new_content
                ),
                array( 'ID' => $post->ID ));

                $num_cleaned++;
            }
        }
        else echo $post->ID . " ";

    // Uncomment below to see the result of the preview before committing
    //echo preg_replace($pattern, '', $post->post_content);
    }

    if (!$preview_only)
      echo "Cleaned: $num_cleaned";

    // Remove this snippet from functions.php when finished: it runs on every page load
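The snippet above handles one keyword per run. As a hedged sketch (the function name strip_spam_links is hypothetical, and PHP 7.4+ is assumed for the arrow function), the whole keyword list can be checked in a single regex pass, with the matched links returned so the pattern can be tried on sample HTML outside WordPress before touching the database. Unlike the snippet above, this pattern also allows other attributes before `href`:

```php
<?php
/**
 * Hypothetical helper: remove every <a>...</a> whose href contains any of
 * the given keywords. Returns the cleaned HTML plus the removed links, so
 * they can be reviewed first (the same idea as $preview_only).
 */
function strip_spam_links(string $html, array $keywords): array
{
    // An empty alternation would match every link, so do nothing instead.
    if ($keywords === []) {
        return ['cleaned' => $html, 'removed' => []];
    }

    // Escape each keyword so characters like "." or "+" are matched literally.
    $alternation = implode('|', array_map(
        fn (string $k): string => preg_quote($k, '%'),
        $keywords
    ));

    // <a ... href="...keyword..." ...> link text </a>
    // [^"\']* keeps the keyword inside the href value, [^>]* keeps the match
    // inside the opening tag; "i" = any letter case, "s" = text may span lines.
    $pattern = '%<a\s[^>]*href\s*=\s*["\'][^"\']*(?:' . $alternation . ')[^"\']*["\'][^>]*>.*?</a>%is';

    preg_match_all($pattern, $html, $matches);
    $cleaned = preg_replace($pattern, '', $html);

    return ['cleaned' => $cleaned, 'removed' => $matches[0]];
}

// Usage with the keyword list from this post:
$keywords = ['levitra', 'cialis', 'payday', 'viagra', 'pharmacy', 'pfizer'];
$result = strip_spam_links(
    '<p>Hello <a href="http://example.com/cheap-VIAGRA/">pills</a> and <a href="https://wordpress.org/">WordPress</a></p>',
    $keywords
);
print_r($result['removed']);
```

Inside the loop above, `strip_spam_links($post->post_content, $keywords)['cleaned']` could stand in for the preg_replace() call. As with any regex over HTML, it is worth reading the `removed` entries before running it for real.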

 

While searching for help with this, I did have to smile at the irony of the regex help website having been hacked in the same fashion, although it is obviously all clear now!


Copyright © 2017 Welcome to Pariswells.com All rights reserved. Theme by Laptop Geek. Privacy Policy