HowTo: scraping technique for Pakistani Hum TV
Many members have asked how to extract a dynamic video link, so I am posting this tutorial. Every website takes a different approach, so I will only describe the general steps.
The tutorial has three parts:
1- General guide to extracting the streaming URL from a website
2- Programming the parsing method and building the scraping script with Python on Windows
3- Coding the HumTvplayer plugin
Requirements:
- HiDownload or any other URL sniffing software
- Python GUI http://www.python.org/getit/
- Google Chrome
To follow along you should have basic knowledge of HTML structure and Python.
I will take Pakistani Hum TV as the example: http://www.hum.tv/
1- First we need to know the structure and format of the streaming video URL. We can catch the URL with HiDownload:
http://lwx006.tunefi...4389651ab7b.flv
It is better to test the stream on the enigma2 box before proceeding to the next step, because the box may not accept the format.
The simplest way to test the stream: open http://www.vlc.eu.pn/index.php, enter the link mentioned above and submit it, then open TSmedia/user streams/user stream tester. You will find the stream at the top of the list; play it and see whether it works.
2- Now our target is to get the stream URL mentioned above from the site www.hum.tv. First we open the site, view the page source (right mouse click) and search for http://lwx006.tunefi...4389651ab7b.flv. Nothing is found, which is expected as a kind of URL protection.
So we start to suspect that the stream URL is stored in an external file loaded by the site, and with the help of Google Chrome (F12) we find this URL:
http://tune.pk/playe...240&autoplay=no
<iframe width="320" height="240" src="http://tune.pk/player/embed_player.php?vid=134121&folder=2013/07/10/&width=320&height=240&autoplay=no" frameborder="0" allowfullscreen scrolling="no"></iframe>
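The iframe lookup above can also be done in Python instead of by eye. This is only a minimal sketch, assuming the page HTML is already downloaded into a string and that the iframe is present in that HTML; the names SAMPLE_HTML and find_iframe_srcs are made up for illustration, and a regular expression is used only because the markup here is simple.

```python
import re

# Sample markup copied from the iframe shown above.
SAMPLE_HTML = (
    '<iframe width="320" height="240" '
    'src="http://tune.pk/player/embed_player.php?vid=134121&folder=2013/07/10/'
    '&width=320&height=240&autoplay=no" frameborder="0" allowfullscreen '
    'scrolling="no"></iframe>'
)

# Captures the src attribute of any <iframe ...> tag (single or double quotes).
IFRAME_SRC = re.compile(
    r'<iframe\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)


def find_iframe_srcs(html):
    """Return the src URL of every iframe found in the HTML text."""
    return IFRAME_SRC.findall(html)


if __name__ == "__main__":
    for src in find_iframe_srcs(SAMPLE_HTML):
        print(src)
```

If a page holds several iframes, the returned list can be filtered afterwards, for example by keeping only the entries that contain tune.pk.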
Open this URL and search for the extension .flv. Again nothing is found, which means the stream URL is not stored there. With the help of the Google Chrome Network tab (F12) we find the URL that contains the stream URL:
http://embed.tune.pk...121?autoplay=no
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Embedded Player</title>
</head>
<body style="margin:0px; padding:0px">
<iframe src="http://embed.tune.pk/play/134121?autoplay=no"
width="320" height="240"
frameborder="0" allowfullscreen scrolling="no"></iframe>
</body>
</html>
Now we open the source of http://embed.tune.pk...121?autoplay=no and, searching for the extension .flv, we find this:
sources: [
{
file: "http://lwx006.tunefiles.com/files/videos/2013/07/10/13734389651ab7b.flv",
width: "100%",
height: "100%",
label : "SD",
type : "flv",
default : true
The final stream URL caught by HiDownload is present in this code.
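The search for the .flv link in the player source can be sketched in Python as well. This is a rough example, assuming the player page keeps the file: "..." layout shown above; the names SAMPLE_PLAYER_SOURCE and find_flv_url are made up for illustration, and other sites will need a different pattern.

```python
import re

# Excerpt of the player page source, copied from the snippet above.
SAMPLE_PLAYER_SOURCE = """
sources: [
{
file: "http://lwx006.tunefiles.com/files/videos/2013/07/10/13734389651ab7b.flv",
width: "100%",
height: "100%",
label : "SD",
type : "flv",
default : true
"""

# Captures the quoted value after "file:" when it is an http link ending in .flv
FILE_URL = re.compile(r'file\s*:\s*["\'](http[^"\']+?\.flv)["\']')


def find_flv_url(page_source):
    """Return the first .flv URL found after a 'file:' key, or None."""
    match = FILE_URL.search(page_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(find_flv_url(SAMPLE_PLAYER_SOURCE))
```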
Now we take the stream URL and test it with the VLC tester mentioned above.
The stream URL extracted by the above method is not constant and will change, maybe after a minute, an hour ... so every time you need to play the stream you have to repeat this procedure. That is clearly impractical; with the aid of Python we can make it practical and easy.
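To give an idea of how the manual steps could be chained, here is a small sketch. It is not the final script of this tutorial, only an illustration under assumptions: it is written for Python 3 on a PC, it assumes the iframes appear in the downloaded HTML (not added later by JavaScript), and the names fetch, resolve_stream_url and max_hops are made up. The User-Agent header is also an assumption; the site may or may not need it.

```python
import re
from urllib.request import Request, urlopen

IFRAME_SRC = re.compile(
    r'<iframe\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)
FILE_URL = re.compile(r'file\s*:\s*["\'](http[^"\']+?\.flv)["\']')


def fetch(url):
    """Download a page with a plain HTTP GET and return its text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=15) as response:
        return response.read().decode("utf-8", errors="replace")


def resolve_stream_url(page_url, fetch_page=fetch, max_hops=3):
    """Follow tune.pk iframes from page_url until a .flv link is found.

    Returns the stream URL, or None if no link is found within max_hops pages.
    """
    url = page_url
    for _ in range(max_hops):
        source = fetch_page(url)
        match = FILE_URL.search(source)
        if match:
            return match.group(1)
        # Keep only iframes that point to the video host.
        frames = [src for src in IFRAME_SRC.findall(source) if "tune.pk" in src]
        if not frames:
            return None
        url = frames[0]
    return None


if __name__ == "__main__":
    print(resolve_stream_url("http://www.hum.tv/"))
```

The fetch_page parameter is there only so the function can be tried with saved pages instead of the live site. Because the stream URL changes over time, the function has to be called again each time before playing.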