Saturday, 27 June 2009

(Off Topic) My Shiny Tiny WRAP Firewall Running Voyage Linux

So, I purchased a WRAP (Wireless Router Application Platform) from Yawarra; it can be used as a firewall or a wireless router. The advantage of buying from Yawarra is that they give you a nice chassis to work with.

Other bits I needed:
  • A 1GB CompactFlash card for the Operating System.
  • An all-in-one card reader/writer.
  • A null modem serial cable, for connecting to the WRAP.
  • A USB to RS232 Converter, because most desktop motherboards do not have serial ports these days.
Next step: installing Voyage Linux, which is based on Debian. The README is very easy to follow.

Victory! Below is a screenshot using CuteCom to connect to the router via a serial cable. Notice that I am using /dev/ttyUSB0 because I am using a USB to RS232 Converter.
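
If you prefer a terminal to CuteCom, here is a rough sketch of the same connection using GNU screen. The function name serial_console is made up for this example, and the 38400 baud default is an assumption (it is, as far as I know, the usual WRAP console speed), so check it against your own BIOS and Voyage settings.

```shell
#!/bin/sh
# Hypothetical helper: open a serial console with GNU screen.
# Assumes screen is installed and the USB to RS232 converter is /dev/ttyUSB0.
serial_console() {
    dev=${1:-/dev/ttyUSB0}
    baud=${2:-38400}    # assumed default speed; adjust to match your board
    if [ ! -c "$dev" ]; then
        echo "no serial device at $dev" >&2
        return 1
    fi
    screen "$dev" "$baud"
}

# Usage:
#   serial_console                    # /dev/ttyUSB0 at 38400 baud
#   serial_console /dev/ttyS0 9600    # a real serial port at another speed
```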



A picture of the WRAP:



Planned usage:

Thursday, 25 June 2009

How to compile ELinks from source on Debian or Ubuntu

ELinks is a rather handy text-based Web browser.

First install GnuTLS development files to allow SSL support:
aptitude install libgnutls-dev
Then, compile and install ELinks:
wget http://elinks.or.cz/download/elinks-0.12pre4.tar.bz2
tar -xjvf elinks-0.12pre4.tar.bz2
cd elinks-0.12pre4
./configure --with-gnutls
make
make install

Wednesday, 24 June 2009

Spidering 104: How to crawl a Web-site into a MySQL database (coming soon)

You were introduced to wget in lesson 1011. Now we are going to put the fetched documents into a database, along with meta information such as the URL, retrieval date and outgoing links. This will provide a simple point of integration with other parts of your application suite, and an indexed table in which to look up your data. An index lets MySQL find matching rows without scanning the whole table, and frequently used index pages are cached in memory for quick retrieval.
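
Until the full lesson is written, here is one possible shape for the tables described above. It is only a sketch: the table names (page, page_link), the column names and the sizes are all my own assumptions, not a finished design.

```shell
#!/bin/sh
# Hypothetical helper: print a possible schema for the crawl tables.
# All table and column names here are assumptions made for illustration.
crawl_schema() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS page (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
    url          VARCHAR(255) NOT NULL,
    retrieved_at DATETIME     NOT NULL,
    body         MEDIUMTEXT,
    PRIMARY KEY (id),
    UNIQUE KEY idx_page_url (url)
);
CREATE TABLE IF NOT EXISTS page_link (
    page_id    INT UNSIGNED NOT NULL,
    target_url VARCHAR(255) NOT NULL,
    KEY idx_link_page (page_id)
);
SQL
}

# Usage (assumes a database named "spider" already exists):
#   crawl_schema | mysql spider
```

The unique key on url is what makes lookups by URL fast, and the key on page_id does the same when listing the outgoing links of one page.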

Coming soon...

Spidering 103: How to analyse HTTP traffic

Analysing HTTP traffic is useful for discovering the personality of Web-sites.

You can analyse HTTP traffic with the Firebug extension for Mozilla Firefox. If you need to pretend to be using Internet Explorer for some reason, you can use a User Agent Switcher.

Wireshark and tcpdump are also useful, but they do not integrate with your Web browser, and because HTTPS traffic is encrypted they cannot show you its contents unless you supply the decryption keys.
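
For a quick look from the command line, wget can print the response headers itself with --server-response. The sketch below filters that output down to the lines that tend to reveal how a site is built. The function name summarise_headers and the choice of headers are my own assumptions.

```shell
#!/bin/sh
# Hypothetical helper: keep only the response lines that hint at a site's
# personality (status line, server software, cookies, content type).
summarise_headers() {
    grep -i -E '^[[:space:]]*(HTTP/|Server:|X-Powered-By:|Set-Cookie:|Content-Type:)' \
        | sed 's/^[[:space:]]*//'
}

# Usage (needs network access; wget writes the headers to stderr):
#   wget --server-response --spider http://example.org/ 2>&1 | summarise_headers
```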

(Off Topic) How to measure the performance of your application development team

If you ever end up working for a company that implements Key Performance Indicators, the below objectives, actions and KPIs (measures) may be useful for defining your role as an Analyst Programmer:

Objective 1: Design Simple Solutions That Meet Requirements

Actions:
  • Keep the design simple
  • Use known design patterns where appropriate
  • Split design into comprehensible components
  • Focus on deliverables during design phase
  • Maintain a high degree of discussion between developers
  • Maintain communication with business owners
  • Maintain communication with domain experts
  • Seek feedback from business owners early
  • Use mock-ups where appropriate to convey concepts clearly
Measures:
  • Perceived visibility to design process
  • Discuss effectiveness of design in post-release meeting

Objective 2: Efficiently Implement Maintainable & Reliable Solutions

Actions:
  • Write unit tests
  • Comment code (where appropriate; do not add useless comments)
  • Document APIs
  • Think carefully when naming classes, methods, properties etc
  • Use the Issue Tracker
  • Split work into smaller tasks
  • Plan all work (do not skip the design phase)
  • Conduct peer review
  • Communicate with other developers
  • Ensure testing processes are followed
  • Do not over-engineer implementations (KISS: Keep It Simple, Stupid)
  • Produce efficient applications (optimisation, high performance, scalability, lower hardware costs)
Measures:
  • Assess code readability and documentation
  • Unit test coverage reports
  • Feedback from peer review
  • Assess difficulty in refactoring
  • Assess perceived regression (in relation to scope) that is the result of not following process
  • Assess actual regression (in relation to scope) that is the result of not following process
  • Assess responsiveness and scalability of applications
  • Assess Issue Tracker usage and organisation skills

Objective 3: Effectively Deliver & Coordinate Projects According to Schedule

Actions:
  • Produce reliable time-estimates
  • Maintain high-visibility with a Gantt chart
  • Split work into comprehensible tasks
  • Be organised
  • Hold meetings
  • Communicate with business owners
  • Do not accept changes without a formal change request
  • Any changes to the design must be reflected in the schedule
  • Plan releases and patches
Measures:
  • Actual adherence to schedule
  • Perceived adherence to schedule (visibility)
  • Assess scope creep

Objective 4: Maintain Reliability of Deployed Systems

Actions:
  • Apply risk management discipline
  • Opt for low-risk, pragmatic solutions
  • Investigate alternative (“proper”) solutions regularly
  • Triage new bugs
  • Reduce key-person dependencies by sharing knowledge and using documentation
  • Develop maintainable solutions
  • Develop reliable solutions
Measures:
  • Assess actual regression from patches
  • Assess perceived regression from patches
  • Assess difficulty in understanding code
  • Assess difficulty in refactoring and making changes to code
  • Assess the Bus factor
  • Assess issue resolution handling

Objective 5: Foster Positive Customer Relationships

Actions:
  • Manage meetings effectively
  • Communicate with customers via a Technical Portal
  • Triage issues and maintain communication
Measures:
  • Customer feedback

Objective 6: Self Development

Actions:
  • Seek feedback from other developers
  • Approach new roles within the team
  • Further your own understanding
  • Raise questions and/or suggestions at weekly meetings
  • Demonstrate initiative (improve process, etc)
Measures:
  • Peer review
  • Team meetings
  • Assess initiatives undertaken
  • Assess courses undertaken that were relevant to the business

Objective 7: OH&S

Actions:
  • Use ergonomic human interface devices to avoid RSI
  • Maintain a healthy posture
  • Take frequent breaks away from the computer (split up your work day)
  • Avoid exposure to noisy hardware
  • Avoid eye strain

Spidering 1012: How to transform malformed HTML into easy to use XML (XHTML)

Step 1, Use HTML Tidy to transform the HTML document into XHTML:
tidy -asxhtml < bad.html > good.html
Now, Tidy sometimes fails on bad data (such as binary code)! No worries: this is where you manually write a script that removes any bad data that HTML Tidy chokes on. You will need to do string replacement of the bad data with a blank string. You may find regular expressions useful where the string to replace varies. Then, pipe it into Tidy as usual!
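
As a starting point for such a script, the sketch below deletes stray control bytes before the document reaches Tidy. It assumes the bad data really is control bytes; anything else will need its own replacement rule. The function name strip_control_bytes is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: delete control bytes from stdin, keeping
# tab (\011), newline (\012) and carriage return (\015).
strip_control_bytes() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037'
}

# Usage (assumes tidy is installed):
#   strip_control_bytes < bad.html | tidy -asxhtml > good.html
```

For bad data that varies from page to page, a sed expression can be added to the same pipeline ahead of tidy.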

Now we have nice clean XHTML that we can parse with an XML parser and manipulate very easily with DOM and XPath! Enjoy!
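
To give a taste of XPath without leaving the shell, here is a small sketch using xmllint from libxml2. It assumes xmllint is installed and recent enough to have the --xpath option. The function names are made up for this example. XHTML elements live in a namespace, so the expressions match on local-name() rather than the bare tag name.

```shell
#!/bin/sh
# Hypothetical helpers: query an XHTML file with XPath via xmllint.

# Print the text of the <title> element.
xhtml_title() {
    xmllint --xpath "string(//*[local-name()='title'])" "$1"
}

# Print the href attributes of all <a> elements.
xhtml_links() {
    xmllint --xpath "//*[local-name()='a']/@href" "$1"
}

# Usage:
#   xhtml_title good.html
#   xhtml_links good.html
```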

Spidering 1011: How to fetch an entire Web-site with wget

"GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies."

Basic wget usage is to fetch a single URL:
wget http://example.org/
But, we want to spider an entire site recursively! The manual is most useful for anybody who is competent:
man wget
You will want to use the --recursive and --level options.

Some HTTP daemons block strange user agents. You can masquerade as an ordinary browser with --user-agent.

Some Web scripts block requests that do not have a referrer (you must click a link to the URL and not access it directly). You can pretend that you were referred from a page with --referer (spelt with a single "r", like the HTTP header).

Other useful options for recursive spidering:
  • --accept/--reject: Specify comma-separated lists of file name suffixes or patterns to accept or reject.
  • --domains/--exclude-domains: Set domains to be followed.
  • --follow-tags/--ignore-tags: Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.
  • --span-hosts: Enable spanning across hosts when doing recursive retrieving.
  • --no-parent: Do not ever ascend to the parent directory when retrieving recursively.
Of course, there are many more options that will be useful, such as --include-directories and many more!
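
Putting several of these options together, a recursive fetch might look like the sketch below. The function name spider_site, the default depth of 3, the one second wait and the user agent string are all assumptions for illustration; tune them for the site you are spidering. Note that wget spells the referrer option --referer.

```shell
#!/bin/sh
# Hypothetical helper: recursively fetch a site with the options discussed above.
# Set DRY_RUN=1 to print the wget command instead of running it.
spider_site() {
    url=$1
    depth=${2:-3}    # assumed default recursion depth
    # The user agent below is only an illustrative value.
    ${DRY_RUN:+echo} wget \
        --recursive \
        --level="$depth" \
        --no-parent \
        --wait=1 \
        --user-agent="Mozilla/5.0 (X11; Linux i686) Firefox/3.0" \
        --referer="$url" \
        "$url"
}

# Usage:
#   DRY_RUN=1 spider_site http://example.org/ 2    # show the command only
#   spider_site http://example.org/                # fetch three levels deep
```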

Lynx (a text-based Web browser) is also a useful tool.