Beginner’s guide to HTML parsing or web scraping with PHP

What is Web Scraping?

According to Wikipedia, “Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.”
It simply means that you can copy data from other websites to show or store it on your own website. Let’s say you want to build a currency exchange calculator, but you don’t want to enter the currency rates by hand every day because that is tedious work. You could pay for an API, which can cost around 500 USD or more, or you can simply parse the currency values from any website that already publishes them. Simple as that.

Let’s start with today’s project

I’ll extract information from the State Bank of Pakistan’s website to obtain the US Dollar, GB Pound, Japanese Yen and Euro to Pakistani Rupee exchange rates. In PHP, we need a library called PHP Simple HTML DOM Parser for this purpose, so let’s head over to its website and download the library.

Source Code

[php]
<?php
require_once("simpledom.php"); // Load the Simple HTML DOM library

$sHtml = file_get_contents("http://www.sbp.org.pk/"); // Download the web page you want to parse
if ($sHtml === false) {
    die("Could not download the page."); // Stop early if the request failed
}

$sPageContent = new simple_html_dom(); // Create a new parser object
$sPageContent->load($sHtml);           // Load the downloaded HTML into it

$sTable = $sPageContent->find("form",0)->find("table",22);   // Table no. 23 (index 22) inside the first form

$sUsd   = $sTable->find("table",0)->find("td",1)->plaintext;  // Plain text of the 2nd td in the 1st nested table
$sUsd   = substr($sUsd,strpos($sUsd," ")+1);                  // Eliminate "USD " from the result

$sGbp   = $sTable->find("table",1)->find("td",1)->plaintext;  // Rinse and repeat for the other currencies
$sGbp   = substr($sGbp,strpos($sGbp," ")+1);

$sJpy   = $sTable->find("table",2)->find("td",1)->plaintext;
$sJpy   = substr($sJpy,strpos($sJpy," ")+1);

$sEur   = $sTable->find("table",3)->find("td",1)->plaintext;
$sEur   = substr($sEur,strpos($sEur," ")+1);

print "USD : ".$sUsd." – GBP : ".$sGbp." – JPY : ".$sJpy." – EUR : ".$sEur; // Here is the output
?>

[/php]
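
The four extract-and-trim pairs above repeat the same string cleanup. Here is a minimal sketch of factoring that step out, assuming each cell’s text looks like "USD 104.8398" (currency code, a space, then the rate). The name `stripCurrencyCode` is a hypothetical helper, not part of the Simple HTML DOM library, and the sample strings are stand-ins for what the page returns.

```php
<?php
// Hypothetical helper: drops everything up to and including the first space,
// so "USD 104.8398" becomes "104.8398".
function stripCurrencyCode(string $cellText): string
{
    $cellText = trim($cellText);
    $spacePos = strpos($cellText, " ");
    // If there is no space, return the text unchanged instead of cutting it.
    if ($spacePos === false) {
        return $cellText;
    }
    return trim(substr($cellText, $spacePos + 1));
}

// Sample cell texts, assumed to match the format found on the scraped page.
$cells = ["USD 104.8398", "GBP 138.3047", "JPY 0.9938", "EUR 115.4916"];
$rates = array_map('stripCurrencyCode', $cells);

print implode(" - ", $rates) . "\n";
```

Checking for a missing space matters because `strpos` returns `false` in that case, and `false + 1` would silently chop off the first character of the value.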

Here are the currency rates as of 22 July, 2016.

USD : 104.8398 – GBP : 138.3047 – JPY : 0.9938 – EUR : 115.4916

These values are in Pakistani rupees, and we can do whatever we want with them. Hope you liked this tutorial. Here’s the detailed manual.
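
As one possible use, here is a rough sketch of the currency exchange calculator mentioned at the start, using the rates printed above. The function name `convertToPkr` and the `$ratesInPkr` array are hypothetical names for illustration. In a real script the scraped values arrive as strings, so you would cast them with `(float)` first.

```php
<?php
// Rupees per one unit of foreign currency, copied from the 22 July 2016 output.
$ratesInPkr = [
    "USD" => 104.8398,
    "GBP" => 138.3047,
    "JPY" => 0.9938,
    "EUR" => 115.4916,
];

// Hypothetical helper: converts an amount of foreign currency to rupees,
// rounded to two decimal places.
function convertToPkr(float $amount, string $currency, array $ratesInPkr): float
{
    if (!isset($ratesInPkr[$currency])) {
        throw new InvalidArgumentException("No rate available for " . $currency);
    }
    return round($amount * $ratesInPkr[$currency], 2);
}

print "100 USD = " . number_format(convertToPkr(100, "USD", $ratesInPkr), 2) . " PKR\n";
```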

You can reach Waqas Yousaf on Twitter @wiqi.
