PhantomJS – More than UI testing

PhantomJS is well known as a “headless website testing” tool. People often drive it through a WebDriver tool (such as Selenium) to test whether their web pages work as expected.
In this blog post, I will demonstrate part of this functionality and show you how I use PhantomJS to crawl/scrape web pages. Crawling is fun!

Foreword: This blog post assumes you’re at least familiar with JavaScript code. It doesn’t cover how to install PhantomJS, for example.

I. Installation

I myself chose to install the latest version of PhantomJS from source.

II. Capture page differences

Imagine the development team updates the website with a new version, and we want to capture the changes down to individual DOM elements. How can we even do that? Yes, we can use our eyes, but what about subtle changes that we can’t spot by eye? PhantomJS comes to the rescue!
First, install page-monitor, a tool built on top of PhantomJS (*).
Let’s say we’ll check http://vnexpress.net. Create a file named monitor.js with the following content:
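
A minimal sketch of such a file might look like this. It assumes page-monitor exports a Monitor constructor with a capture(callback) method; treat that API, and the helper name captureSnapshot, as assumptions to check against the page-monitor README rather than as the exact code used for this post.

```javascript
// Capture one snapshot of `url` with page-monitor.
// `Monitor` is passed in so the helper does not hard-code how the package is loaded.
// Assumption: page-monitor exposes `new Monitor(url)` and `monitor.capture(callback)`.
function captureSnapshot(Monitor, url, done) {
    var monitor = new Monitor(url);
    monitor.capture(function (code) {
        // `code` is whatever status page-monitor passes back (assumed: an exit code)
        done(code);
    });
    return monitor;
}

// Intended usage in monitor.js (needs page-monitor installed and PhantomJS available):
// captureSnapshot(require('page-monitor'), 'http://vnexpress.net', function (code) {
//     console.log('capture done, exit code ' + code);
// });
```

Passing the constructor in as an argument is only a convenience for trying the helper without the real package; a one-off script could just as well call require('page-monitor') at the top.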

Execute by running:

The output looks something like this:

[Screenshot: page_monitor_2]

Later (30 minutes or so), we come back and check the page again; the output will be almost, but not quite, the same.
vnexpress.net is a news site, so articles move up and down all the time. Within 30 minutes or so, something somewhere on the page will have changed.
We use a new file named check_diff.js to check the difference:
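
As a rough sketch of what check_diff.js could contain: the helper below assumes page-monitor offers a diff(left, right, callback) method that takes the two timestamps identifying earlier captures. Both that signature and the name diffSnapshots are assumptions, not confirmed details of the library or of the original script.

```javascript
// Compare two earlier captures of `url`.
// Assumption: `monitor.diff(left, right, callback)` takes the timestamps that
// identify two stored captures and reports a status code when it finishes.
function diffSnapshots(Monitor, url, leftTimestamp, rightTimestamp, done) {
    var monitor = new Monitor(url);
    monitor.diff(leftTimestamp, rightTimestamp, function (code) {
        done(code);
    });
    return monitor;
}

// Intended usage in check_diff.js; FIRST_CAPTURE and SECOND_CAPTURE are
// placeholders for the timestamps of the two captures being compared:
// diffSnapshots(require('page-monitor'), 'http://vnexpress.net',
//     FIRST_CAPTURE, SECOND_CAPTURE, function (code) {
//         console.log('diff done, exit code ' + code);
//     });
```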

 

The output of this process is an image that shows exactly what changed, visualized!
We can go much further, depending on what we need: for example, a dashboard that stores each captured state of the page as a revision and shows the diff between revisions.

III. Crawling

There are some important things to sort out when using PhantomJS to crawl a web page:
– How to log in (1)
– How to maintain that logged-in status (store and reuse cookies) (2)
(1) depends on how the website’s login system works.
(2) is just a technique we should remember with PhantomJS.
Here is the code. In it, we:
– Try to log in to Pluralsight.
– Check whether the existing cookies are still valid. If yes, we don’t need to log in again. If not, we go through the login process.
– Render the homepage to show that we are logged in.
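
The cookie part of these steps can be sketched roughly as follows; the full script is in the gist linked at the end of the post. This is a simplified illustration, not the gist itself: the file name cookies.json, the homepage URL, and the helper cookiesLookValid are hypothetical, and the validity check is only a local heuristic based on the expiry field (in seconds) of PhantomJS cookie objects. The site-specific login step is left out.

```javascript
// True when there is at least one stored cookie and none of them has expired.
// Assumes PhantomJS-style cookie objects whose `expiry` is in seconds since the epoch;
// cookies without a numeric `expiry` are treated as session cookies and accepted.
function cookiesLookValid(cookies, nowSeconds) {
    if (!Array.isArray(cookies) || cookies.length === 0) {
        return false;
    }
    return cookies.every(function (cookie) {
        return typeof cookie.expiry !== 'number' || cookie.expiry > nowSeconds;
    });
}

// Everything below only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = require('webpage').create();
    var COOKIE_FILE = 'cookies.json';              // hypothetical file name
    var HOME_URL = 'https://www.pluralsight.com/'; // assumed homepage URL

    var stored = fs.exists(COOKIE_FILE) ? JSON.parse(fs.read(COOKIE_FILE)) : [];

    if (cookiesLookValid(stored, Math.floor(Date.now() / 1000))) {
        // Reuse the saved session instead of logging in again.
        stored.forEach(function (cookie) { phantom.addCookie(cookie); });
        page.open(HOME_URL, function (status) {
            if (status === 'success') {
                // Save the current cookies for the next run, then take a screenshot.
                fs.write(COOKIE_FILE, JSON.stringify(phantom.cookies), 'w');
                page.render('home.png');
            }
            phantom.exit();
        });
    } else {
        // Login is site-specific and omitted: open the login page, submit the form,
        // then write phantom.cookies to COOKIE_FILE as in the branch above.
        console.log('No usable cookies found; a fresh login is needed.');
        phantom.exit();
    }
}
```

A cookie that has not expired locally can still be rejected by the server, so a more careful script would also check that the rendered page really shows the logged-in state.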

Execute by running:

Full code here: https://gist.github.com/tuannh99/ef247247fda68793efcb

I hope this is helpful to you all. If you’d like to discuss anything, drop me an email at tuan_nh@septeni-technology.jp, or leave a comment on the GitHub page.

(*): Install using npm, for example: npm install page-monitor