Create your own price comparison web application
Summary: RumboJ is a price comparison web application that compares product
prices across different shopping websites. It currently supports only selected
product categories (mobile phones and watches), and it compares data only between
Amazon.com and Flipkart.com.
Architecture:
Description: RumboJ harvests web pages with web crawlers. The loaded documents
are parsed, reduced, and indexed, and the index data is kept in RAM. At
application start, RumboJ loads the index data from a backup file into RAM so
that subsequent searches are faster. An administrator controls the crawling
process through REST services, which let them start crawlers, stop crawlers, and
check crawling status from the GUI. Because the crawlers keep adding data to the
index, a background job periodically writes the in-RAM index data to a backup
directory. There is a separate crawler for each shopping website. The same
products found on multiple websites are then merged, and each website's price is
added to the product details.
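A minimal sketch of the RAM index lifecycle described above (load from backup at start-up, index parsed text, back up periodically) follows. It uses plain JDK collections purely for illustration: RumboJ itself uses Apache Lucene, and the `RamIndex` class, the lowercase-and-split "reduce" step, and the Java-serialization backup format are all assumptions, not the project's code.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a RAM-resident index (the real project uses Apache Lucene).
class RamIndex {
    // term -> IDs of products whose parsed text contains that term
    private final Map<String, Set<String>> postings = new ConcurrentHashMap<>();

    // Assumed "reduce" step: lowercase the parsed text and split it into alphanumeric terms.
    static List<String> reduce(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void index(String productId, String parsedText) {
        for (String term : reduce(parsedText)) {
            postings.computeIfAbsent(term, k -> ConcurrentHashMap.newKeySet()).add(productId);
        }
    }

    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Copies the in-RAM postings into plain serializable collections and writes them to disk.
    synchronized void backup(Path file) {
        HashMap<String, HashSet<String>> snapshot = new HashMap<>();
        postings.forEach((term, ids) -> snapshot.put(term, new HashSet<>(ids)));
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(snapshot);
            }
            // Write to a temp file first so an interrupted write leaves the previous backup intact.
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // At application start: rebuild the RAM index from the backup file, if one exists.
    @SuppressWarnings("unchecked")
    static RamIndex load(Path file) {
        RamIndex idx = new RamIndex();
        if (!Files.exists(file)) return idx;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            Map<String, Set<String>> snapshot = (Map<String, Set<String>>) in.readObject();
            snapshot.forEach((term, ids) -> {
                Set<String> set = ConcurrentHashMap.newKeySet();
                set.addAll(ids);
                idx.postings.put(term, set);
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return idx;
    }

    // Background job that periodically writes the RAM index to the backup location.
    ScheduledExecutorService scheduleBackups(Path file, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                backup(file);
            } catch (UncheckedIOException e) {
                e.printStackTrace(); // catch so one failed backup does not cancel later runs
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Catching the exception inside the scheduled task matters: `scheduleAtFixedRate` suppresses all later runs once a task throws.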
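The REST services for starting, stopping, and checking crawlers need only a thin service layer underneath them. The sketch below assumes one background task per shopping website; `CrawlerManager`, `CrawlStatus`, and the status values are hypothetical names, and the Spring MVC controller that would expose these methods as REST endpoints is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical crawler states reported to the admin GUI.
enum CrawlStatus { IDLE, RUNNING, STOPPED, FINISHED, FAILED }

// Hypothetical service layer the REST endpoints could delegate to: one crawler task per website.
class CrawlerManager {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<?>> tasks = new ConcurrentHashMap<>();
    private final Map<String, CrawlStatus> statuses = new ConcurrentHashMap<>();

    // Starts the crawler for `site`; returns false if it is already running.
    synchronized boolean start(String site, Runnable crawl) {
        Future<?> current = tasks.get(site);
        if (current != null && !current.isDone()) return false;
        statuses.put(site, CrawlStatus.RUNNING);
        tasks.put(site, pool.submit(() -> {
            try {
                crawl.run();
                // Only mark FINISHED if the admin did not stop the crawler meanwhile.
                statuses.replace(site, CrawlStatus.RUNNING, CrawlStatus.FINISHED);
            } catch (RuntimeException e) {
                statuses.replace(site, CrawlStatus.RUNNING, CrawlStatus.FAILED);
            }
        }));
        return true;
    }

    // Requests a stop by interrupting the crawler thread; returns false if nothing is running.
    synchronized boolean stop(String site) {
        Future<?> current = tasks.get(site);
        if (current == null || current.isDone()) return false;
        statuses.put(site, CrawlStatus.STOPPED);
        current.cancel(true);
        return true;
    }

    CrawlStatus status(String site) {
        return statuses.getOrDefault(site, CrawlStatus.IDLE);
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Stopping relies on `Future.cancel(true)`, which interrupts the crawler thread, so a crawl loop has to check for interruption (or call an interruptible method such as `Thread.sleep`) to actually stop.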
Components: RumboJ consists of a variety of components that handle indexing,
searching, updating multiple prices for each product, HTML parsing, and other
supporting operations. The following table shows the components used.
Description:
· Java 8, JEE 7
· Apache Lucene
· PhantomJS
· jQuery
· Bootstrap
· Spring MVC, Beans, Security
· Apache Tomcat
Achievements:
· The time taken to serve each user is approximately 1.5 s
· The data scraper simulates human behaviour (scrolling up and down, staying on
the page for some time) while loading shopping web pages, to avoid getting blocked
· During data extraction, a fresh Tor IP address was used at regular intervals
to avoid IP address blocks
· The data scraper sends carefully edited HTTP headers to mimic a real web browser
· Each product's data is stored in just 10 KB of RAM
· The same product from different shopping websites is matched using the
product title and product description
· A spell checker suggests product keywords when the entered search string is
misspelled
· A chat feature for product queries (yet to be implemented)
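The browser-like headers and human-like pacing mentioned in the achievements can be sketched as below. This assumes Java 11+ (`java.net.http`, newer than the Java 8 listed above) and only builds requests rather than sending them; the `PoliteFetcher` name, the header values, and the delay bounds are illustrative assumptions. Scrolling simulation and Tor IP rotation are not covered here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.Random;

// Hypothetical helper: builds requests carrying typical desktop-browser headers and
// picks randomized pauses between page loads.
class PoliteFetcher {
    // Example value in the style of a desktop browser; real values would need to be kept current.
    static final String USER_AGENT =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    private final Random random;

    PoliteFetcher(Random random) {
        this.random = random;
    }

    // Builds (but does not send) a GET request with browser-like headers.
    HttpRequest browserLikeRequest(String url, String referer) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(20))
            .header("User-Agent", USER_AGENT)
            .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
            .header("Accept-Language", "en-US,en;q=0.9")
            .header("Referer", referer)
            .GET()
            .build();
    }

    // Dwell time in [minMs, maxMs), so page loads do not arrive at a fixed, machine-like rhythm.
    long dwellMillis(long minMs, long maxMs) {
        return minMs + (long) (random.nextDouble() * (maxMs - minMs));
    }
}
```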
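Matching the same product across websites by title and description can be done in many ways; the sketch below uses one simple technique, Jaccard similarity over the lowercase tokens of the title plus description, with a greedy merge that records one price per website. The 0.6 threshold, the tokenizing rule, and the `Listing`, `MergedProduct`, and `ProductMatcher` names are assumptions, not RumboJ's actual matching logic.

```java
import java.util.*;

// Hypothetical listing scraped from a single shopping website.
class Listing {
    final String site;
    final String title;
    final String description;
    final double price;

    Listing(String site, String title, String description, double price) {
        this.site = site;
        this.title = title;
        this.description = description;
        this.price = price;
    }
}

// One product after merging, with a price for each website that sells it.
class MergedProduct {
    final String title;
    final Set<String> tokens;
    final Map<String, Double> pricesBySite = new LinkedHashMap<>();

    MergedProduct(String title, Set<String> tokens) {
        this.title = title;
        this.tokens = tokens;
    }
}

class ProductMatcher {
    // Assumed similarity cut-off; it would need tuning against real listings.
    static final double THRESHOLD = 0.6;

    // Lowercase alphanumeric tokens from the title and description combined.
    static Set<String> tokens(String title, String description) {
        Set<String> out = new HashSet<>();
        for (String t : (title + " " + description).toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Greedily attaches each listing to the most similar product so far, or starts a new product.
    static List<MergedProduct> merge(List<Listing> listings) {
        List<MergedProduct> products = new ArrayList<>();
        for (Listing listing : listings) {
            Set<String> toks = tokens(listing.title, listing.description);
            MergedProduct best = null;
            double bestScore = THRESHOLD;
            for (MergedProduct p : products) {
                // Never fold two listings from the same website into one product.
                if (p.pricesBySite.containsKey(listing.site)) continue;
                double score = jaccard(toks, p.tokens);
                if (score >= bestScore) {
                    best = p;
                    bestScore = score;
                }
            }
            if (best == null) {
                best = new MergedProduct(listing.title, toks);
                products.add(best);
            }
            best.pricesBySite.put(listing.site, listing.price);
        }
        return products;
    }
}
```

Token overlap tolerates differences in word order and punctuation between sites (for example "64GB" vs "(…, 64 GB)" still share most other tokens), but a production matcher would likely also compare structured attributes such as brand and model number.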
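The spell-checker feature can be approximated with Levenshtein edit distance against a dictionary of known product keywords. Lucene also ships a spell-checking package, but this sketch avoids the dependency; the `KeywordSpellChecker` class, the dictionary source, and the `maxEdits` and `limit` parameters are hypothetical.

```java
import java.util.*;

// Hypothetical keyword suggester built from terms found in indexed product data.
class KeywordSpellChecker {
    private final Set<String> dictionary = new HashSet<>();

    KeywordSpellChecker(Collection<String> words) {
        for (String w : words) dictionary.add(w.toLowerCase(Locale.ROOT));
    }

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Up to `limit` dictionary words within `maxEdits` of the input, closest first;
    // empty if the input is already a known keyword.
    List<String> suggest(String input, int maxEdits, int limit) {
        String word = input.toLowerCase(Locale.ROOT);
        if (dictionary.contains(word)) return Collections.emptyList();
        List<String> candidates = new ArrayList<>();
        for (String w : dictionary) {
            if (editDistance(word, w) <= maxEdits) candidates.add(w);
        }
        candidates.sort(Comparator.comparingInt((String w) -> editDistance(word, w))
            .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }
}
```

For example, with the dictionary `Samsung`, `Galaxy`, `Watch`, the call `suggest("samsng", 2, 3)` returns `[samsung]`. Scanning the whole dictionary is fine for a small keyword set; a larger vocabulary would call for an index-backed approach such as Lucene's.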
Visit the project on GitHub: https://github.com/RagulkumarRaj/RumboJ