Follow these steps to scrape data from web pages.
1. Create a txt file containing the URLs to scrape. Insert one URL per line. Below is an example with three URLs.
2. Save the file.
3. Click the “Load” button (point 4) and load the file you just created. Optionally, you can modify the file's contents by clicking the “Edit” button (point 2).
4. Select the type of data you want to extract (emails, links, proxies, etc.) from the menu (point 8). Optionally, you can edit the regular expressions by clicking the “Edit RegEx” button (point 7).
5. Click the "Apply Regex" button and the software will insert the regular expressions into the "Regex Pattern" column. If you want to clean your data, you can also add a regex pattern in the "Regex replace" column. See my guide to testing new regex.
6. Set the number of threads and the timeout (points 5 and 10). For more reliable data extraction, set a low number of threads and a high timeout.
7. Click the “Start” button and wait until the extraction is complete. During the process, the "Matched #" column shows how many matches have been found for each URL.
8. You can now export all extracted data to a txt file by clicking “Export” -> "Export results as txt file".
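
For readers who want to see the same workflow outside the GUI, the steps above can be sketched in Python. This is only an illustration under stated assumptions, not how the software is implemented: the function names (load_urls, fetch, grab, scrape, export_txt), the EMAIL_PATTERN expression, and the default values are all hypothetical, and the built-in regular expressions of the software may differ.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import urlopen

# Hypothetical email pattern for illustration; not the software's built-in regex.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"


def load_urls(path):
    """Steps 1-3: read the URL file, one URL per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch(url, timeout):
    """Download one page as text; undecodable bytes are replaced."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def grab(url, pattern, replace=None, timeout=30, fetcher=fetch):
    """Steps 4-5: apply the regex pattern, then the optional regex replace."""
    try:
        page = fetcher(url, timeout)
    except (OSError, ValueError, HTTPException):
        return []  # unreachable, timed-out, or malformed URL: zero matches
    matches = [m.group(0) for m in re.finditer(pattern, page)]
    if replace is not None:
        clean_pattern, replacement = replace
        matches = [re.sub(clean_pattern, replacement, m) for m in matches]
    return matches


def scrape(urls, pattern, replace=None, threads=2, timeout=30, fetcher=fetch):
    """Steps 6-7: process the URLs with a small thread pool.

    Returns {url: [matches]}; len() of each list plays the role of "Matched #".
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(
            lambda u: grab(u, pattern, replace, timeout, fetcher), urls
        )
        return dict(zip(urls, results))


def export_txt(results, path):
    """Step 8: write every match on its own line; return the number written."""
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        for matches in results.values():
            for match in matches:
                handle.write(match + "\n")
                written += 1
    return written
```

A run would then look like `results = scrape(load_urls("urls.txt"), EMAIL_PATTERN, threads=2, timeout=60)` followed by `export_txt(results, "results.txt")`, where both file names are placeholders. The low thread count and high timeout mirror the advice in step 6: fewer parallel requests and more patience per page tend to mean fewer pages dropped before they finish loading.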