




BicycleT
Stranger
Registered: 10/20/18
Posts: 288
"Scraping on a Schedule with AWS Lambda and CloudWatch" by @kagemusha Jun 2018
    #25784865 - 02/01/19 10:32 PM (5 years, 1 month ago)

Web page data is often ephemeral. Think of the top story on a news site, airline or concert ticket prices, or eBay’s stream of Daily Deals. If we want to capture a historical time series of this data, scraping may be the only option.

There are many ways to architect a scraping task. We will use AWS Lambda to execute the task and CloudWatch to schedule it. For those not familiar, Lambda lets you deploy functions to the cloud without worrying about the infrastructure on which the code runs. AWS handles that, along with things like scaling resources to meet demand. The advantage of using services like Lambda and CloudWatch is that you don’t have to set up and maintain servers, or pay for dedicated instances. Lambda functions run only when you need them, and you pay only for execution. If you are only running your scraping tasks daily or every few hours, this will be much cheaper than paying for continually running instances. A more complete discussion of serverless architectures is here.
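To put rough numbers on the cost claim, here is a small Python sketch of the Lambda billing arithmetic: you pay per request plus per GB-second of compute (invocations × seconds per run × memory in GB). The default prices are assumptions (AWS’s long-standing x86 list prices of $0.20 per million requests and $0.0000166667 per GB-second); check the current Lambda pricing page before relying on them. The sketch also ignores the free tier and billed-duration rounding.

```python
def monthly_lambda_cost(runs_per_day, seconds_per_run, memory_mb,
                        price_per_gb_second=0.0000166667,   # assumed list price
                        price_per_million_requests=0.20,    # assumed list price
                        days=30):
    """Rough monthly Lambda bill in USD, ignoring the free tier."""
    invocations = runs_per_day * days
    # Compute is billed in GB-seconds: duration times allocated memory.
    gb_seconds = invocations * seconds_per_run * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


# Hourly scrape, 10 seconds per run, 128 MB of memory.
print(round(monthly_lambda_cost(runs_per_day=24, seconds_per_run=10, memory_mb=128), 4))
```

Under these assumptions an hourly 10-second scrape at 128 MB comes to about $0.015 a month; compare that with 720 hours of whatever instance you would otherwise keep running.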

We will give a small concrete scraping example in this post, but this simple cloud-based architecture is generalizable to much more complex use cases. The code is available on GitHub.

The big tasks we need to accomplish are:

    Write a function to scrape our page, parse it and save the result to S3. We’ll be using Python with the requests package to get the page, and Beautiful Soup to parse the page’s HTML.
    Deploy our function as an AWS Lambda function. We’ll use the Serverless Framework to facilitate this.
    Create an AWS CloudWatch event to run our Lambda function on a schedule.
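Here is a minimal sketch of the first task wired into a Lambda handler, under several assumptions: the target is a hypothetical page whose stories sit in `<h2>` tags, and `TARGET_URL` and `BUCKET` are hypothetical environment variables. Parsing uses the standard library’s `html.parser` instead of Beautiful Soup so the pure parts run without third-party packages; fetching uses requests and saving uses boto3’s `put_object`. Each run writes to a new timestamped key, which is what turns one-off scrapes into a time series.

```python
import json
import os
from datetime import datetime, timezone
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text inside every <h2> tag (a hypothetical target)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._inside = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._inside = False

    def handle_data(self, data):
        # Text of tags nested inside <h2> (e.g. <b>) is appended as well.
        if self._inside:
            self.headlines[-1] += data


def parse_headlines(html):
    """Parse step. With Beautiful Soup this would be roughly
    [h.get_text().strip() for h in BeautifulSoup(html, "html.parser").find_all("h2")]."""
    parser = HeadlineParser()
    parser.feed(html)
    return [h.strip() for h in parser.headlines]


def snapshot_key(now):
    """One S3 object per run, keyed by UTC time, so runs build a time series."""
    return now.strftime("snapshots/%Y/%m/%d/%H%M%S.json")


def run(fetch, save, now=None):
    """Fetch -> parse -> save. fetch and save are injected so this runs offline."""
    if now is None:
        now = datetime.now(timezone.utc)
    headlines = parse_headlines(fetch())
    key = snapshot_key(now)
    save(key, json.dumps({"scraped_at": now.isoformat(), "headlines": headlines}))
    return key


def handler(event, context):
    """Lambda entry point, invoked by the scheduled CloudWatch event.

    TARGET_URL and BUCKET are hypothetical environment variables."""
    # Imported here so the functions above can be tried without these installed.
    import boto3
    import requests

    s3 = boto3.client("s3")

    def fetch():
        resp = requests.get(os.environ["TARGET_URL"], timeout=10)
        resp.raise_for_status()
        return resp.text

    def save(key, body):
        s3.put_object(Bucket=os.environ["BUCKET"], Key=key,
                      Body=body.encode("utf-8"), ContentType="application/json")

    return {"key": run(fetch, save)}
```

Deployed with the Serverless Framework, `handler` would be the function the config points at, and the CloudWatch (now EventBridge) rule would invoke it on a schedule expression such as `rate(1 hour)` or `cron(0 12 * * ? *)` (daily at 12:00 UTC). requests is not part of the Lambda Python runtime, so it has to be packaged with the function. Injecting `fetch` and `save` into `run` lets you test the pipeline locally without network or AWS access.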

We’ve broken this down into steps, which are in the full article:

https://medium.com/@kagemusha_/scraping-on-a-schedule-with-aws-lambda-and-cloudwatch-caf65bc38848

Geedfod
Stranger
Registered: 02/13/19
Posts: 2
Re: "Scraping on a Schedule with AWS Lambda and CloudWatch" by @kagemusha Jun 2018 [Re: BicycleT]
    #25809644 - 02/13/19 09:23 AM (5 years, 1 month ago)

interesting too!



Similar Threads (Poster, Views, Replies, Last post)
* Awful hard disk slowness funnybunny 667 2 05/07/07 05:08 AM
by Seuss
* anyone not on the shroomery team for folding@home?
amyloid 87,048 293 10/20/07 03:41 PM
by Ythan
* What do you think about this? School project I'm working on... Hints? SymmetryGroup8 979 6 03/16/07 10:09 PM
by SymmetryGroup8
* Light to heat energy formula? Drink_Punk_Soda 2,664 7 02/12/05 04:26 PM
by TinMan
* Physic dudes, got a question... SymmetryGroup8 1,020 6 05/01/07 10:47 PM
by RiverMan
* Right Click menu invisible four20snakeman 998 6 06/04/06 01:54 AM
by doodoomaster
* Bye Microsoft!
YthanA 6,031 70 03/05/07 01:39 PM
by HELLA_TIGHT
* Downloading torrents on Ubuntu
foodsgoodtoo 3,131 29 02/01/10 01:06 PM
by frith


Copyright 1997-2024 Mind Media. Some rights reserved.
