Friday, April 17, 2015

R : Webcrawler Parser with Try-Catch

Well, I recently needed to analyze content hosted on an initial set of websites and then aggregate and plot the results. I put together a very simple parser that crawls these sites and parses the data read from them. I wrapped the reads in a tryCatch block for fault tolerance and resilience against errors (site down, unavailable, trust/certificate errors, etc.). Here is sample code using the tryCatch and readLines functions:

> myUrlStats <- function(urlToCrawl) {
    statData <- tryCatch(
        readLines(con = urlToCrawl),   # read the raw page content
        error = function(errorMessageStr) {
            message(paste("URL does not seem to exist:", urlToCrawl))
            message("Error message:")
            message(errorMessageStr)
            return(NA)                 # value to return in case of error
        },
        warning = function(warningMessageStr) {
            message(paste("URL caused a warning:", urlToCrawl))
            message("Warning message:")
            message(warningMessageStr)
            return(NA)                 # value to return in case of warning
        },
        finally = {
            ## Clean up code (close connections, log the URL, etc.)
        }
    )
    return(statData)
}

> myWebCrawlParser <- function(dataReadFromURL) {
    # Do your analysis/parsing here.
    # You can also mine outlinks from the data that was read,
    # to traverse further web-links (see the sketch below).
}
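
The parsing body is left as an exercise above. As a rough sketch of the outlink-mining idea, here is one hypothetical helper (the name extractOutlinks and the regex are mine, not part of the original code; a real crawler would be better served by an HTML parser such as the XML or rvest package):

# Hypothetical helper: pull href targets out of the raw lines returned by readLines().
extractOutlinks <- function(dataReadFromURL) {
    pageText <- paste(dataReadFromURL, collapse = " ")
    hrefMatches <- gregexpr('href="[^"]+"', pageText)
    links <- unlist(regmatches(pageText, hrefMatches))
    # Strip the surrounding href="..." wrapper, leaving just the link targets.
    gsub('^href="|"$', "", links)
}

These outlinks could then be fed back into myUrlStats for the next level of the crawl.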

> urlToCrawl <- c("http://example.com", "http://example.org")   # placeholder URLs; use your own list
> finalResult <- mapply(myUrlStats, urlToCrawl)
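
To tie this back to the aggregate-and-plot goal mentioned at the start, here is one minimal, hypothetical aggregation (assuming finalResult came back as a list, which is mapply's usual result when the pages differ in length): count the lines read per URL and plot them.

# Hypothetical aggregation: number of lines read per URL (failed URLs returned NA above).
lineCounts <- sapply(finalResult, function(pageLines) {
    if (length(pageLines) == 1 && is.na(pageLines[1])) 0 else length(pageLines)
})
barplot(lineCounts, las = 2, main = "Lines read per crawled URL")
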
Happy programming!
