Lucene Index Toolbox (Luke): http://www.getopt.org/luke/ -> a very helpful tool for exploring the functionality of the Lucene search engine. It provides a cross-platform graphical UI for browsing indexes and documents, running searches, etc.
-> In this article, the author implements his own summarizer, based mainly on the simple summarization algorithms of Classifier4J (C4J) and Open Text Summarizer (OTS), on top of Lucene, an open-source search engine API.
2) Lucene Analyzer, Tokenizer and TokenFilter: http://mext.at/?p=26 -> how to use analyzers, tokenizers and token filters in Lucene
3) Lucene Indexing and Document Scoring: (search Google for "lucene indexing and document scoring") -> gives a comprehensive explanation of some basic Lucene concepts and definitions.
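To make the summarizer article above a bit more concrete, here is a minimal Java (9+) sketch of a frequency-based extractive summarizer. My assumption is that the C4J summarizer works roughly like this (count the most frequent content words, then keep the first sentence containing each of them); this is not C4J's or OTS's actual code, it does not use Lucene, and the class name FrequencySummarizer, the naive sentence splitter and the tiny stop-word list are all hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch only: a frequency-based extractive summarizer in the spirit of the
// simple C4J approach (NOT Classifier4J's actual code). Sentence splitting,
// tokenization and the stop-word list are illustrative assumptions.
class FrequencySummarizer {
    // Hypothetical, deliberately tiny stop-word list.
    private static final Set<String> STOP_WORDS = Set.of(
            "the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on", "for", "with", "as");

    // Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Lower-cased alphanumeric tokens with stop words removed.
    static List<String> contentWords(String sentence) {
        List<String> out = new ArrayList<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) out.add(w);
        }
        return out;
    }

    // Picks up to maxSentences sentences and returns them in document order.
    static List<String> summarize(String text, int maxSentences) {
        List<String> sentences = splitSentences(text);

        // 1. Count content-word frequencies over the whole text.
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String w : contentWords(s)) freq.merge(w, 1, Integer::sum);
        }

        // 2. Rank words by descending frequency (ties broken alphabetically).
        List<String> ranked = freq.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        // 3. For each frequent word, keep the first sentence that contains it.
        SortedSet<Integer> chosen = new TreeSet<>();
        for (String w : ranked) {
            if (chosen.size() >= maxSentences) break;
            for (int i = 0; i < sentences.size(); i++) {
                if (contentWords(sentences.get(i)).contains(w)) {
                    chosen.add(i);
                    break;
                }
            }
        }

        // 4. Emit the selected sentences in their original order.
        List<String> summary = new ArrayList<>();
        for (int i : chosen) summary.add(sentences.get(i));
        return summary;
    }
}
```

In a real Lucene-based implementation, a Lucene Analyzer would presumably replace the naive tokenization and stop-word handling used here.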
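The analyzer/tokenizer/filter idea from item 2) can be illustrated without Lucene itself: a tokenizer turns raw text into a token stream, and each token filter transforms that stream in turn. The sketch below mirrors that pipeline in plain Java (9+). MiniAnalyzer, TokenFilter, lowerCase() and stop() are hypothetical names, not Lucene classes, and Lucene's real API streams tokens through attributes rather than passing whole lists around.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Plain-Java illustration of the Analyzer = Tokenizer + TokenFilter chain.
// NOT Lucene's API: MiniAnalyzer, TokenFilter, lowerCase() and stop() are
// hypothetical stand-ins that only mirror the idea.
class MiniAnalyzer {
    // A filter stage consumes a token stream and returns a transformed one.
    interface TokenFilter extends UnaryOperator<List<String>> {}

    // Tokenizer stage: split raw text into runs of letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Filter that lower-cases every token.
    static TokenFilter lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        };
    }

    // Filter that drops tokens found in the given stop-word set (exact match).
    static TokenFilter stop(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) if (!stopWords.contains(t)) out.add(t);
            return out;
        };
    }

    private final List<TokenFilter> filters;

    MiniAnalyzer(List<TokenFilter> filters) {
        this.filters = filters;
    }

    // Runs the tokenizer, then each filter in the order given.
    List<String> analyze(String text) {
        List<String> tokens = tokenize(text);
        for (TokenFilter f : filters) tokens = f.apply(tokens);
        return tokens;
    }
}
```

Filter order matters: with lowerCase() before stop(Set.of("the", "a")), "The Quick, brown fox!" becomes [quick, brown, fox]; with the order swapped, the capitalized "The" is not in the lower-case stop set and survives.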
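For item 3), a toy inverted index helps show how indexing and scoring fit together. This sketch follows my reading of Lucene's classic TF-IDF similarity (DefaultSimilarity, later ClassicSimilarity): tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(number of terms in the document), and each matching query term contributes tf · idf² · lengthNorm, with the sum scaled by coord (the fraction of query terms the document matches). It is a simplification, not Lucene code: boosts and queryNorm are left out, norms are kept as exact doubles, and the class name ToyIndex is hypothetical. Recent Lucene versions (6.0 onward) use BM25 as the default similarity instead.

```java
import java.util.*;

// Toy inverted index with a SIMPLIFIED version of Lucene's classic TF-IDF
// scoring. Assumptions: whitespace tokenization, no boosts, no queryNorm,
// exact (not byte-encoded) length norms.
class ToyIndex {
    // term -> (docId -> frequency of the term in that doc)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<Integer> docLengths = new ArrayList<>();

    private static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    // Indexing: update the postings of every term and remember the doc length.
    int addDocument(String text) {
        int docId = docLengths.size();
        List<String> ts = terms(text);
        for (String t : ts) postings.computeIfAbsent(t, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        docLengths.add(ts.size());
        return docId;
    }

    // tf(t in d) = sqrt(freq)
    static double tf(int freq) { return Math.sqrt(freq); }

    // idf(t) = 1 + ln(numDocs / (docFreq + 1))
    double idf(String term) {
        int docFreq = postings.getOrDefault(term, Map.of()).size();
        return 1.0 + Math.log((double) docLengths.size() / (docFreq + 1));
    }

    // lengthNorm(d) = 1 / sqrt(number of terms in d)
    double lengthNorm(int docId) {
        int len = docLengths.get(docId);
        return len == 0 ? 0.0 : 1.0 / Math.sqrt(len);
    }

    // Returns docId -> score for every document matching at least one query term.
    Map<Integer, Double> search(String query) {
        List<String> qTerms = terms(query);
        Map<Integer, Double> sums = new HashMap<>();
        Map<Integer, Integer> matched = new HashMap<>();
        for (String t : qTerms) {
            double idf = idf(t);
            for (Map.Entry<Integer, Integer> e : postings.getOrDefault(t, Map.of()).entrySet()) {
                int doc = e.getKey();
                sums.merge(doc, tf(e.getValue()) * idf * idf * lengthNorm(doc), Double::sum);
                matched.merge(doc, 1, Integer::sum);
            }
        }
        // coord(q, d): fraction of query terms the document matched.
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Double> e : sums.entrySet()) {
            double coord = (double) matched.get(e.getKey()) / qTerms.size();
            scores.put(e.getKey(), coord * e.getValue());
        }
        return scores;
    }
}
```

With the documents "lucene search engine", "lucene lucene index" and "cooking recipes", the query "lucene" ranks the second document first (higher tf), while "lucene engine" ranks the first document first, since it matches both terms and "engine" is rarer.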
The idea of automatically creating technical surveys using AI algorithms seems interesting but, in my understanding, quite ambitious. It is an interdisciplinary research problem combining techniques from Natural Language Processing, Natural Language Understanding and Natural Language Generation. To some extent, it is really hard and still far from being achievable today :D.
See more in the newest paper at NAACL'09, "Using Citations to Generate Surveys of Scientific Paradigms"! As a first step, the authors combine the existing citation contexts of articles with state-of-the-art techniques for extractive multi-document summarization (e.g. Trimmer, LexRank, C-LexRank, C-RR), developed mostly for the news domain, to generate the surveys. They also draw some important conclusions:
- approaches from other domains can produce satisfactory results when applied to scientific articles
- citation contexts and abstracts contain much more useful information for summaries than the full texts of papers
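LexRank, one of the techniques the paper builds on, ranks sentences by their centrality in a graph whose edges connect sufficiently similar sentences, using a PageRank-like power iteration. The Java sketch below shows the thresholded variant under several simplifying assumptions: plain term-frequency cosine instead of the idf-modified cosine, no self-similarity edges, uniform redistribution for sentences with no neighbours, and a fixed number of iterations. The class name LexRankSketch and the parameter values mentioned below are illustrative, not taken from the paper.

```java
import java.util.*;

// Sketch of LexRank-style sentence centrality (thresholded graph variant).
// Assumptions: term-frequency cosine (no idf), no self-loops, dangling
// sentences spread their score uniformly, fixed number of iterations.
class LexRankSketch {
    static Map<String, Integer> termFreq(String sentence) {
        Map<String, Integer> tf = new HashMap<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) tf.merge(w, 1, Integer::sum);
        }
        return tf;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int v = e.getValue();
            normA += (double) v * v;
            dot += (double) v * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) normB += (double) v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns one centrality score per sentence; the scores sum to 1.
    static double[] rank(List<String> sentences, double threshold, double teleport, int iterations) {
        int n = sentences.size();
        List<Map<String, Integer>> vecs = new ArrayList<>();
        for (String s : sentences) vecs.add(termFreq(s));

        // Undirected similarity graph: edge i-j if cosine > threshold (i != j).
        boolean[][] edge = new boolean[n][n];
        int[] degree = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j && cosine(vecs.get(i), vecs.get(j)) > threshold) {
                    edge[i][j] = true;
                    degree[i]++;
                }
            }
        }

        // Power iteration for p = teleport/n + (1 - teleport) * M^T p,
        // where M is the degree-normalized adjacency matrix.
        double[] p = new double[n];
        Arrays.fill(p, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, teleport / n);
            for (int i = 0; i < n; i++) {
                double share = (1 - teleport) * p[i];
                for (int j = 0; j < n; j++) {
                    if (degree[i] == 0) next[j] += share / n;          // dangling: spread uniformly
                    else if (edge[i][j]) next[j] += share / degree[i]; // follow similarity edges
                }
            }
            p = next;
        }
        return p;
    }
}
```

For instance, with threshold 0.1, teleport 0.15 and 100 iterations, a sentence that shares words with two otherwise unrelated sentences gets the highest score; an extractive summary would then keep the top-ranked sentences.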
My comments on this are as follows:
- the specific features of scientific survey articles are not exploited yet, for example the structure of technical surveys, topic coherence, etc.
- information fusion: different citation contexts may contain overlapping information. How can we pinpoint it?