Archive for September, 2008


Eclipse TPTP usage notes..

Sep 30, 2008 in Eclipse

Notes on setting up profiling and performance monitoring in Eclipse with TPTP – Test and Performance Tools Platform.

Activate Vista administrator account ..

Sep 26, 2008 in Windows


  1. Log on to Vista using your usual account.
  2. Launch the command prompt; make sure you select ‘Run as administrator’.
  3. Set the administrator password: net user administrator p@ssw0rD
  4. Enable the account: net user administrator /active:yes
  5. Switch user, or log off.
  6. Log on as Administrator with the password p@ssw0rD.

Java profiler, notes on tracking memory leaks..

Sep 10, 2008 in Eclipse, Java, JavaUsage

Free profiler for Eclipse: this project is no longer active, use TPTP instead..

Notes on hunting memory leaks..

Notes from ‘Lucene in Action’ ..

Sep 07, 2008 in Books, Java, lucene

Lucene in Action

Ch. 1 – Introduction

  • Lucene is a high-performance, scalable Information Retrieval (IR) library.
  • Lucene’s creator is Doug Cutting.
  • Creating an index – see ‘’ (in ‘Files’, top right tabs)
  • Indexing API:
    — IndexWriter
    — Directory (RAMDirectory)
    — Analyzer
    — Document
    — Field

  • Searching an index – see ‘’ (in ‘Files’, top right tabs)
  • Searching API:
    — IndexSearcher
    — Term
    — Query
    — TermQuery
    — Hits

Ch. 2 – Indexing

  • The Analyzer tasks:
    — Decompose text into tokens.
    — Remove ‘stop words’.
    — Reduce words to their roots (stemming).
  • The ‘Inverted Index’ – an efficient method of finding documents
    that contain given words.
    In other words, instead of trying to answer the question “what words are contained
    in this document?” this structure is optimized for providing quick answers to
    “which documents contain word X?”
  • Lucene doesn’t offer an update(Document) method;
    instead, a Document must first be deleted from an index and then re-added to it.
  • Use ‘doc.setBoost(float)’ to adjust the importance of documents.
    Use ‘field.setBoost(float)’ to adjust the importance of individual fields.
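  • Indexing date/time fields at high resolution (milliseconds) may cause
    performance problems: every distinct timestamp becomes its own term, and a range
    query over them expands into one clause per matching term. Index dates only as
    precisely as your searches need (to the day, for example).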
  • Use indexable numeric fields for range queries (store the size of email messages,
    for example).
  • Tuning indexing performance: IndexWriter parameters, also settable as the
    system properties org.apache.lucene.X, where X is:
    — mergeFactor (default 10): controls segment merge frequency and size
    — maxMergeDocs (default Integer.MAX_VALUE): limits the number of documents per segment
    — minMergeDocs (default 10): controls the amount of RAM used when indexing
  • Use ‘addIndexes(Directory[])’ to merge the indexes held in other Directory
    instances into an IndexWriter's index; for example, build an index in a
    RAMDirectory, then merge it into an FSDirectory.
  • Limit Field sizes with maxFieldLength; the default is 10,000 terms per field,
    and terms beyond the limit are not indexed.
  • Optimizing an index
    — Optimizing merges the index's segments into a single segment.
    — Optimizing an index only affects the speed of searches
    against that index; it does not affect the speed of indexing.
    — API invoke pattern (open the existing index, optimize, close):
    IndexWriter writer = new IndexWriter("/path/to/index", analyzer, false);
    writer.optimize();
    writer.close();
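The ‘Inverted Index’ note above can be illustrated with a toy in-memory version: a map from each term to the set of document ids that contain it, so "which documents contain word X?" becomes a single lookup. This is a conceptual sketch only; the class and method names are hypothetical, the tokenization is a naive assumption, and none of it reflects Lucene's actual on-disk index format or API.

```java
import java.util.*;

/**
 * Toy inverted index: maps each term to the sorted set of document ids containing it.
 * Hypothetical illustration; not Lucene's IndexWriter or index format.
 */
public class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Naive analysis for the sketch: lowercase and split on anything that is not a letter or digit. */
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue; // a leading delimiter yields an empty first token
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Answers "which documents contain word X?" with one map lookup. */
    public SortedSet<Integer> documentsContaining(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Lucene is an IR library");
        index.addDocument(2, "Lucene in Action is a book");
        index.addDocument(3, "A book about search");
        System.out.println(index.documentsContaining("lucene")); // [1, 2]
        System.out.println(index.documentsContaining("book"));   // [2, 3]
    }
}
```

The lookup cost depends on the number of matching documents rather than on the total size of the collection, which is the point of the structure.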
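The notes on date resolution and numeric range queries both come down to how values turn into text terms. In the Lucene versions the book covers, field values are indexed as text and RangeQuery compares terms as strings, so numbers are commonly zero-padded to a fixed width, and dates can be stored at day resolution so that one term covers a whole day. The helpers below (padNumber, dayTerm) are hypothetical illustrations under those assumptions, not part of Lucene.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helpers showing how values can be encoded as sortable text terms.
 * Illustration only; these are not Lucene APIs.
 */
public class SortableTerms {
    // yyyyMMdd strings sort chronologically when compared as plain strings (UTC assumed).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /** Zero-pads a non-negative number so that string order matches numeric order. */
    public static String padNumber(long value, int width) {
        if (value < 0) throw new IllegalArgumentException("negative values need a different encoding");
        String digits = Long.toString(value);
        if (digits.length() > width) throw new IllegalArgumentException("value wider than " + width + " digits");
        return "0".repeat(width - digits.length()) + digits;
    }

    /** Truncates a millisecond timestamp to day resolution: every instant in one UTC day maps to one term. */
    public static String dayTerm(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "10" as a string; padded, the order is numeric.
        System.out.println("9".compareTo("10") > 0);                           // true
        System.out.println(padNumber(9, 10).compareTo(padNumber(10, 10)) < 0); // true
        System.out.println(padNumber(2048, 10));                               // 0000002048
        System.out.println(dayTerm(1222732800000L));                           // 20080930
    }
}
```

Storing email sizes with padNumber, for example, lets a string-ordered range query return the numerically correct messages, while dayTerm caps the number of distinct date terms at one per day.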

Ch. 3 – Search in applications

  • Scoring
    — tf(t in d) Term frequency factor for the term (t) in the document (d).
    — idf(t) Inverse document frequency of the term.
    — boost(t.field in d) Field boost, as set during indexing.
    — lengthNorm(t.field in d) Normalization value of a field, given the number of terms within the
    field. This value is computed during indexing and stored in the index.
    — coord(q, d) Coordination factor, based on the number of query terms the
    document contains.
    — queryNorm(q) Normalization value for a query, given the sum of the squared weights
    of each of the query terms.
  • Query types
    — TermQuery
    — RangeQuery
    — PrefixQuery
    — BooleanQuery
    — PhraseQuery
    — WildcardQuery
    — FuzzyQuery (matches terms similar to the query term, based on Levenshtein edit distance)
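For reference, the scoring factors listed above combine roughly as follows in the Lucene versions the book covers (later Lucene documentation writes the idf factor squared, because it also enters the query weight):

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t \text{ in } d)\cdot\mathrm{idf}(t)\cdot
  \mathrm{boost}(t.\mathrm{field} \text{ in } d)\cdot
  \mathrm{lengthNorm}(t.\mathrm{field} \text{ in } d)
```

With DefaultSimilarity, tf is the square root of the term's frequency, idf(t) = 1 + ln(numDocs / (docFreq + 1)), lengthNorm is 1 / sqrt(number of terms in the field), coord is (query terms matched) / (total query terms), and queryNorm is 1 / sqrt(sum of squared weights).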
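FuzzyQuery's notion of "similar" rests on Levenshtein distance: the minimum number of single-character insertions, deletions, or substitutions that turn one term into another. The sketch below is the classic dynamic-programming computation of that distance, keeping only two rows of the table. It is a standalone illustration of the measure, not Lucene's FuzzyQuery implementation; the class name is hypothetical.

```java
/**
 * Classic dynamic-programming Levenshtein (edit) distance.
 * Standalone illustration; not Lucene's FuzzyQuery code.
 */
public class Levenshtein {
    /** Minimum number of single-character insertions, deletions or substitutions turning a into b. */
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // "" -> first j chars of b: j insertions
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // first i chars of a -> "": i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp;        // prev now holds row i
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucine"));  // 1
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

Keeping two rows instead of the full table reduces memory to O(|b|) while producing the same distance.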
Ch. 4 – Analysis

  • Analysis operations:
    — Extract words
    — Discard punctuation
    — Remove accents from characters
    — Lowercase (also called normalizing)
    — Remove common words
    — Reduce words to a root form (stemming)
    — Change words into their basic dictionary form (lemmatization)
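Most of the analysis operations listed above can be chained in a few lines, which makes the order of operations concrete. This is a toy sketch: the stop-word list and the suffix-stripping "stemmer" are illustrative assumptions, not Lucene's StandardAnalyzer or the Porter stemmer, and lemmatization is left out because it needs a dictionary.

```java
import java.text.Normalizer;
import java.util.*;

/**
 * Toy analysis chain: extract words, discard punctuation, remove accents, lowercase,
 * remove stop words, then apply a deliberately crude suffix-stripping stemmer.
 * The stop list and suffix rules are illustrative assumptions, not Lucene's.
 */
public class AnalysisSketch {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "are", "in", "is", "of", "the", "to"));

    public static List<String> analyze(String text) {
        // Remove accents: decompose characters (NFD), then drop the combining marks.
        String unaccented = Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        List<String> tokens = new ArrayList<>();
        // Extract words and discard punctuation by splitting on non-letters/non-digits.
        for (String word : unaccented.split("[^\\p{L}\\p{Nd}]+")) {
            if (word.isEmpty()) continue;
            String lower = word.toLowerCase(Locale.ROOT); // normalize case
            if (STOP_WORDS.contains(lower)) continue;     // remove common words
            tokens.add(crudeStem(lower));                 // reduce to a rough root
        }
        return tokens;
    }

    /** Naive suffix stripping; a real stemmer such as Porter's uses far more careful rules. */
    private static String crudeStem(String word) {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Café is indexing documents, and searching!"));
        // [cafe, index, document, search]
    }
}
```

Applying the same chain at indexing time and at query time is what lets "Searching" in a query match "search" in the index.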

Penetration testing info ..

Sep 04, 2008 in Security