| Author |
Message |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
|
Posted: Wed Apr 30, 2008 9:30 pm Post subject: Would there be any issues with this? |
|
|
I want to fill up my tutorial site's database with tutorials (just the links to them and a title). So I was thinking about creating a PHP script which could sort of "crawl" large tutorial sites and index the URLs and titles from those sites in my own database.
Would there be any copyright issues or anything by doing this? _________________ Tutorial Management Script - Version 1.4 Released
TutorialToday - Up and running, submit your tutorials!
Linux Tutorials - Coming Soon |
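For readers wondering what the "index the URLs and titles" step might look like, here is a minimal sketch. It assumes the page HTML has already been downloaded into a string and that PHP's DOM extension is available. The function name `extract_links` and the sample markup are made up for illustration and are not from Scott's script.

```php
<?php
// Hypothetical helper: collect url => title pairs from one HTML page.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href  = trim($anchor->getAttribute('href'));
        $title = trim($anchor->textContent);
        // Skip anchors with no target or no visible text.
        if ($href === '' || $title === '') {
            continue;
        }
        // Keep the first title seen for each URL.
        if (!isset($links[$href])) {
            $links[$href] = $title;
        }
    }
    return $links;
}

// Made-up sample page for illustration only.
$sample = '<html><body>'
        . '<a href="http://example.com/php-intro">PHP Intro</a>'
        . '<a href="http://example.com/php-intro">PHP Intro (again)</a>'
        . '<a href="">Empty link</a>'
        . '</body></html>';
print_r(extract_links($sample));
```

Keying the array by URL is one simple way to avoid storing the same link twice before it goes into the database.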
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
|
Posted: Thu May 01, 2008 4:28 am Post subject: |
|
|
Yes and no. It depends on the tutorial, on a case-by-case basis, I guess. _________________
| Quote: |
<bart416> I just realized something
<bart416> we celebrate the fact that this piece of rock made one rotation around a glowing ball of plasma that is kept together due to its own gravity well
<njsg> HAPPY NEW YEAR
<Easter> ^^
|
|
|
|
| |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
|
Posted: Thu May 01, 2008 5:58 am Post subject: |
|
|
How so? Would this not be like a search engine crawling pages on the internet and storing them? |
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
|
Posted: Thu May 01, 2008 6:22 am Post subject: |
|
|
Certain sites don't allow direct linking to their content, or only allow certain other sites to link to them. They could ask you to remove the links, though I think the chances of that happening are small. |
|
|
| |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
|
Posted: Thu May 01, 2008 9:10 pm Post subject: |
|
|
Another question on this topic: is it possible to find the URL a site redirects to in PHP? |
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
|
Posted: Fri May 02, 2008 2:41 am Post subject: |
|
|
Yes. Is it an HTTP redirect or a meta redirect? |
|
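The distinction matters because a meta redirect never shows up in the HTTP headers. It lives in the page body as a `<meta http-equiv="refresh">` tag. A rough sketch of detecting one, assuming the HTML is already in a string and the DOM extension is available (the function name `find_meta_refresh` is invented for this example):

```php
<?php
// Hypothetical helper: return the target of a meta refresh tag, or null.
function find_meta_refresh(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('http-equiv')) !== 'refresh') {
            continue;
        }
        // The content attribute looks like "5; url=http://example.com/new".
        $content = $meta->getAttribute('content');
        if (preg_match('/url\s*=\s*[\'"]?([^\'"\s]+)/i', $content, $match)) {
            return $match[1];
        }
    }
    return null;
}
```

A refresh tag with only a delay and no `url=` part just reloads the same page, so the sketch returns null for it.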
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
|
Posted: Fri May 02, 2008 10:16 am Post subject: |
|
|
You should use plain sockets for this one, because then you can simply read the HTTP headers. |
|
|
| |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
|
Posted: Fri May 02, 2008 11:57 am Post subject: |
|
|
OK, I've been messing around with fsockopen(), and the LLP people seem to block it by returning a 400 error. But how do you read the headers after you have opened a connection to a site? |
|
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
|
Posted: Fri May 02, 2008 1:21 pm Post subject: |
|
|
Send a regular HTTP request, like this one: http://pastebin.com/f685ee30b (on pastebin because of the forum's security mod).
Each line should end with a \r\n line break. |
|
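The pastebin contents are not reproduced in this thread, so the following is only a guess at what such a request might look like, not the original paste. It is a sketch of a minimal HEAD request sent over fsockopen(). The helper names, the 10-second timeouts and the choice of HEAD over GET are all assumptions. As a side note, a 400 response is commonly what a server sends when the request itself is malformed, for example an HTTP/1.1 request without a Host header, so it is not necessarily deliberate blocking.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.1 HEAD request.
function build_head_request(string $host, string $path = '/'): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"       // required by HTTP/1.1
         . "Connection: close\r\n"   // ask the server to close when done
         . "\r\n";                   // blank line ends the header block
}

// Hypothetical helper: send the request and return the raw response headers.
function fetch_raw_headers(string $host, string $path = '/', int $port = 80): ?string
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 10);
    if ($socket === false) {
        return null; // connection failed; $errstr holds the reason
    }
    stream_set_timeout($socket, 10);
    fwrite($socket, build_head_request($host, $path));
    $response = stream_get_contents($socket);
    fclose($socket);
    if ($response === false) {
        return null;
    }
    // Everything before the first blank line is the header block.
    $parts = explode("\r\n\r\n", $response, 2);
    return $parts[0];
}
```

Some servers answer HEAD differently from GET. If that turns out to be a problem, swapping the method for GET and discarding the body is the usual workaround.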
|
| |
krt ...

Joined: 11 Jan 2005 Posts: 4765 Location: Down Under
|
Posted: Fri May 02, 2008 6:59 pm Post subject: |
|
|
Use cURL with the CURLOPT_FOLLOWLOCATION option turned on. It is much simpler. cURL is also simpler when you have to deal with security, SSL, proxies and so on. This abridged list of cURL options covers the other options you may be interested in, whether for the things I mentioned above or for others: POST data, headers, cookies, user agent spoofing, etc.
Example: | Code: | <?php
$url = 'http://example.com';
// Create cURL resource
$ch = curl_init();
// Set options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return output
$output = curl_exec($ch);
// Close cURL resource
curl_close($ch);
// Do what you want with output
echo $output;
?> |
And no, there shouldn't be problems with what you are doing. To be on the safe side, I use and recommend a 10% maximum excerpt rule and an easy mechanism for content authors to report indexed pages. Most will just appreciate the link and the extra traffic. |
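Coming back to the cURL example in this post: it returns the body of the final page, but the original question was which URL the site redirects to. One way to get that, sketched here under the assumption that the cURL extension is installed, is to ask cURL for the effective URL after the transfer. The function name `resolve_final_url`, the redirect limit and the timeout are illustrative choices, not part of krt's example.

```php
<?php
// Hypothetical helper: follow redirects and report where the request ended up.
function resolve_final_url(string $url, int $maxRedirects = 5): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);             // headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects); // guard against loops
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    if (curl_exec($ch) === false) {
        return null; // network error, timeout, or too many redirects
    }
    // The last URL cURL actually requested after following redirects.
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}
```

Calling `resolve_final_url('http://example.com')` would return the same URL when there is no redirect, so comparing the result with the input is enough to tell whether a redirect happened.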
|
|
| |
Scott tutorialtoday.com

Joined: 24 Mar 2005 Posts: 2650 Location: Mississauga, Ontario
|
Posted: Fri May 02, 2008 7:30 pm Post subject: |
|
|
Thanks to both of you for the help. I had already written the script before I saw your post about cURL. I just used fsockopen(), as SolidRaven suggested, and then used preg_match_all() to check the headers for the Location.
I was thinking the same thing, that they wouldn't mind. Personally, I would prefer it if people automatically indexed my tutorials on their sites. |
|
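Scott's script is not shown in the thread, so this is only a sketch of the header check described above. It uses preg_match() rather than preg_match_all(), on the assumption that a single response carries at most one Location header. The function name and the sample headers are invented.

```php
<?php
// Hypothetical helper: pull the redirect target out of a raw header block.
function find_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive; the m flag lets ^ match at the
    // start of every line, so "Content-Location:" is not picked up.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $match)) {
        return $match[1];
    }
    return null;
}

// Made-up response headers for illustration only.
$headers = "HTTP/1.1 301 Moved Permanently\r\n"
         . "Location: http://example.com/new\r\n"
         . "Content-Length: 0\r\n";
echo find_location($headers), "\n";
```

The Location value can be a relative path such as `/new`, in which case it has to be resolved against the URL that was requested before it is stored.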
|
| |
LP-SolidRaven Dictator of the Dump

Joined: 06 Jun 2004 Posts: 7297 Location: The cheese is made out of moon
|
Posted: Sat May 03, 2008 12:29 am Post subject: |
|
|
The only problem with that, krt, is that cURL isn't installed everywhere. |
|
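A script that has to run on hosts with and without cURL could check for the extension at runtime and fall back to sockets. A minimal sketch, where the function name `pick_transport` and the returned labels are arbitrary:

```php
<?php
// Hypothetical helper: report which transport this PHP installation offers.
function pick_transport(): string
{
    if (extension_loaded('curl')) {
        return 'curl';   // preferred: handles redirects, SSL, proxies
    }
    if (function_exists('fsockopen')) {
        return 'socket'; // fall back to hand-written HTTP over a socket
    }
    return 'none';       // neither is available on this host
}

echo pick_transport(), "\n";
```

The calling code can then branch on the returned label and use whichever redirect lookup matches.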
|
| |
|
|
|