OJK
Registered: 06/08/03
Posts: 10,629
regular expressions problem - extracting a subset of URLs from HTML source
    #7459710 - 09/27/07 10:38 AM (16 years, 4 months ago)

Hey,

I'm trying to back up all the images hosted in my Photobucket account.

There isn't built-in functionality to do this, so I'm trying to hack something up.

I was hoping I could just use a spider to do it, but Photobucket doesn't link to the raw images, so a spider is out.

However, the URLs are there in the source, something like:

Code:
<!-- BEGIN: Album Thumbnail Element 0 -->

<li style="z-index:1;">
<input type="hidden" name="mediaUrl_0" id="mediaUrl_0" value="http://img.photobucket.com/albums/v290/Odiumjunkie/foo.jpg">
<input type="hidden" name="mediaDesc_0" id="mediaDesc_0" value="">

<p id="pDesc_0" class="desc" title="">
makebiosphere
</p>


So, I was hoping that I might be able to use regular expressions to trawl the source and copy every URL beginning with:

Code:
http://img.*

as these are the links to the full images.

Once I had such a list of URLs, spidering the contents of my account and saving to disk would be a cinch.

Is this possible/easy to do using regular expressions? Is there another way to do this that I've overlooked?

As always, thanks for the help :sun:
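
One possible way to do the extraction asked about above, sketched in Python rather than the PHP used in the reply further down the thread. Taken literally, `http://img.*` is greedy: `.*` keeps matching to the end of the line, so this sketch stops at the first quote, whitespace, or angle bracket instead. The function name `extract_image_urls` and the sample markup are illustrative assumptions, not Photobucket's actual page.

```python
import re

# "http://img" followed by any run of characters that cannot appear
# unescaped inside a quoted HTML attribute value.
IMG_URL = re.compile(r'http://img[^"\'\s<>]+')


def extract_image_urls(source):
    """Return each URL starting with http://img once, in page order."""
    seen = []
    for url in IMG_URL.findall(source):
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical sample modelled on the snippet in the post.
sample = (
    '<input type="hidden" name="mediaUrl_0" id="mediaUrl_0" '
    'value="http://img.photobucket.com/albums/v290/Odiumjunkie/foo.jpg">'
)
print(extract_image_urls(sample))
```

If thumbnail URLs on the page also start with `http://img`, this prefix match would pick them up too, so a pattern tied to the `mediaUrl_` inputs may be the safer choice.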


Edited by Odiumjunkie (09/27/07 10:39 AM)


Ythan
Registered: 08/08/97
Posts: 18,774
Re: regular expressions problem - extracting a subset of URLs from HTML source [Re: OJK]
    #7460093 - 09/27/07 12:13 PM (16 years, 4 months ago)

Something like this should work:

Code:

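// $source is assumed to hold the album page's HTML as a string.
preg_match_all('%<input type="hidden" name="mediaUrl_\d+" id="mediaUrl_\d+" value="([^"]+)">%', $source, $output);

// $output[1] holds the text captured by the first group, i.e. the URLs.
echo nl2br(print_r($output[1], true));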
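
For anyone not working in PHP, roughly the same match can be sketched in Python. This is a translation under assumptions, not a tested scraper: the function name `media_urls` and the sample page are made up for illustration, and the pattern only matches when the attributes appear in exactly this order with exactly this spacing.

```python
import re

# Same pattern as the PHP call above: capture the value attribute of each
# hidden input whose name and id are mediaUrl_<number>.
MEDIA_URL = re.compile(
    r'<input type="hidden" name="mediaUrl_\d+" id="mediaUrl_\d+" value="([^"]+)">'
)


def media_urls(source):
    """Return the captured URLs in page order (findall yields the one group)."""
    return MEDIA_URL.findall(source)


# Hypothetical page fragment modelled on the snippet in the first post.
page = (
    '<input type="hidden" name="mediaUrl_0" id="mediaUrl_0" '
    'value="http://img.photobucket.com/albums/v290/Odiumjunkie/foo.jpg">\n'
    '<input type="hidden" name="mediaDesc_0" id="mediaDesc_0" value="">\n'
)
for url in media_urls(page):
    print(url)
```

Because the pattern requires a non-empty `value`, the `mediaDesc_` inputs and any empty `mediaUrl_` inputs are skipped.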



insectvhore
Registered: 07/09/99
Posts: 1,233
Re: regular expressions problem - extracting a subset of URLs from HTML source [Re: OJK]
    #7462440 - 09/27/07 10:32 PM (16 years, 4 months ago)

Don't the resources of any web page you save get saved along with all the text and formatting?

So couldn't you click "view all" at the top and, once they are all loaded, right-click on the page and choose "save web page as..."?


Or am I incredibly stupid?


OJK
Registered: 06/08/03
Posts: 10,629
Re: regular expressions problem - extracting a subset of URLs from HTML source [Re: insectvhore]
    #7463443 - 09/28/07 08:39 AM (16 years, 4 months ago)

> So couldn't you click "view all" at the top and, once they are all loaded, right-click on the page and choose "save web page as..."?

That would save copies of the thumbnails, not the full-sized images.

Thanks Ythan, I'll give that a shot.
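
Once a list of full-size image URLs exists, the "saving to disk" step mentioned in the first post could look something like the Python sketch below. It uses only the standard library. The names `local_name` and `save_all` and the destination directory are hypothetical, and nothing here handles logins, retries, or rate limits.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def local_name(url):
    """Use the last path segment of the URL as the file name."""
    return os.path.basename(urlsplit(url).path) or "index"


def save_all(urls, dest_dir):
    """Download each URL into dest_dir and return the paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Caveat: two images with the same file name in different albums
        # would overwrite each other here.
        path = os.path.join(dest_dir, local_name(url))
        with urlopen(url) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

A call such as `save_all(urls, "photobucket_backup")` would then write one file per URL, where `urls` is whatever list the extraction step produced.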

