HOANG Cong Duy Vu's research logs: 2009-07-26

Thursday, 30 July 2009

Java stuff

Of course, there are many sites on this topic; below is one of them:

http://www.java2s.com/

(to be updated!)

--
Cheers,
Vu

Wednesday, 29 July 2009

Porter's Stemming Algorithm Online

http://maya.cs.depaul.edu/~classes/ds575/porter.html

Useful as a quick reference for the Porter stemming algorithm for English!
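
To give a flavour of the rules, here is a minimal sketch of step 1a only (plural handling) of the Porter algorithm. The function name `porter_step_1a` is my own, and the remaining steps (1b to 5), which depend on the consonant-vowel "measure" of the stem, are deliberately left out, so this is not a full stemmer.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer to a lowercase word.

    Rules, tried in this order (longest suffix first):
      SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]  # drop "es", keep "ss"
    if word.endswith("ies"):
        return word[:-2]  # drop "es", keep "i"
    if word.endswith("ss"):
        return word       # unchanged
    if word.endswith("s"):
        return word[:-1]  # drop the final "s"
    return word


for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The online demo linked above can be used to check what the complete algorithm does with the same words.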

--
Cheers,
Vu

Tuesday, 28 July 2009

These days, I am trying to use Lucene for my own research. Here are some resources that may be relevant and useful:

Lucene in general: http://lucene.apache.org
-> You can find more details on this site!

Lucene Index Toolbox (Luke): http://www.getopt.org/luke/
-> This tool is very helpful for exploring the functionality of the Lucene search engine. It supports index and document browsing, search, ... through a cross-platform graphical UI.

Great articles:

1) Summarization with Lucene: http://sujitpal.blogspot.com/2009/02/summarization-with-lucene.html

-> In this article, the author implements his own summarizer with Lucene, an open-source search engine API, based mainly on the simple summarization approaches of two tools: Classifier4J (C4J) and Open Text Summarizer (OTS).

2) Lucene Analyzer, Tokenizer and TokenFilter: http://mext.at/?p=26
-> How to use analyzers, tokenizers, and token filters in Lucene.

3) Lucene Indexing and Document Scoring: (search for the keywords "lucene indexing and document scoring")
-> Covers some basic Lucene concepts and definitions, with thorough explanations.

4) Understanding Lucene Scoring: http://www.opensourcereleasefeed.com/article/show/understanding-lucene-scori

5) Lucene Query Syntax: http://lucene.apache.org/java/2_3_2/queryparsersyntax.html (replace the version "2_3_2" if you are using a newer one)
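
As a note to self on item 2: the analyzer idea is a chain in which a tokenizer splits the raw text and each filter then transforms the token stream. Below is a minimal conceptual sketch in plain Python. It does not use the Lucene API; all names (`tokenize`, `lowercase_filter`, `stop_filter`, `analyze`) and the stop-word list are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, Iterator, List

# Illustrative stop-word list (an assumption, not Lucene's built-in list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text: str) -> Iterator[str]:
    """Tokenizer stage: split raw text into runs of letters and digits."""
    return iter(re.findall(r"[A-Za-z0-9]+", text))


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Filter stage: normalise every token to lowercase."""
    for token in tokens:
        yield token.lower()


def stop_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Filter stage: drop tokens that appear in the stop-word list."""
    for token in tokens:
        if token not in STOP_WORDS:
            yield token


def analyze(text: str) -> List[str]:
    """Analyzer: a tokenizer followed by a chain of token filters."""
    return list(stop_filter(lowercase_filter(tokenize(text))))


print(analyze("The Analyzer is a chain of a Tokenizer and TokenFilters"))
```

The order of the filters matters: lowercasing has to come before stop-word removal here, otherwise "The" would not match the lowercase stop-word list.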

(to be continued ...)

--
Cheers,
Vu

IBM Many Aspects Document Summarization Tool

http://www.alphaworks.ibm.com/tech/manyaspects

--
Cheers,
Vu

Monday, 27 July 2009

AI software

Summarization: http://summarizer.intellexer.com/index.html
Extractor: http://www.extractor.com/

Surprising! AI software indeed!

--
Cheers,
Vu

Keywords Co-Occurrence and Semantic Connectivity

http://www.miislita.com/semantics/c-index-1.html

I would like to adapt some of the techniques mentioned in this article to the problem of keyword co-occurrence in the scientific domain (e.g. the ACL Anthology).
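
A rough sketch of what I have in mind, assuming each paper is reduced to a set of keywords. The score used here is a Jaccard-style ratio n12 / (n1 + n2 - n12), where n1 and n2 count the papers containing each keyword and n12 counts the papers containing both; whether this matches the article's exact definition is an assumption on my part. The function name `cooccurrence_index` and the toy keyword sets are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_index(docs: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Score every keyword pair that co-occurs in at least one document.

    Returns {(kw_a, kw_b): n12 / (n1 + n2 - n12)} with kw_a < kw_b.
    """
    single = Counter()  # n1, n2: documents containing each keyword
    pair = Counter()    # n12: documents containing both keywords
    for keywords in docs:
        unique = set(keywords)
        single.update(unique)
        pair.update(combinations(sorted(unique), 2))
    return {
        (a, b): n12 / (single[a] + single[b] - n12)
        for (a, b), n12 in pair.items()
    }


# Toy data standing in for keyword sets extracted from papers.
papers = [
    {"summarization", "citation"},
    {"summarization", "lexrank"},
    {"summarization", "citation", "lexrank"},
    {"parsing"},
]
scores = cooccurrence_index(papers)
for pair_key, score in sorted(scores.items()):
    print(pair_key, round(score, 3))
```

A score of 1.0 would mean the two keywords always appear together; pairs that never co-occur are simply absent from the result.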

--
Cheers,
Vu

Sunday, 26 July 2009

Brown Coherence Toolkit

Link for download: http://www.cs.brown.edu/~melsner/egrid-distr.tgz
Manual: http://www.cs.brown.edu/~melsner/manual.html
Link to the author: http://www.cs.brown.edu/~melsner/

--
Cheers,
Vu

iOPENER

http://tangra.si.umich.edu/clair/iopener/index.html

The idea of automatically creating technical surveys using AI algorithms seems interesting but quite ambitious (as far as I understand it). This is interdisciplinary research combining various techniques from Natural Language Processing, Natural Language Understanding, and Natural Language Generation. To some extent, it is really hard and still far from achievable at present :D.

See more in the newest paper at NAACL'09, "Using Citations to Generate Surveys of Scientific Paradigms"! Initially, the authors use the existing citation contexts of articles, combined with state-of-the-art techniques (e.g. Trimmer, LexRank, C-LexRank, C-RR) from extractive multi-document summarization (mostly developed for the news domain), to generate the surveys. They also draw some important conclusions:
- approaches from other domains, applied to scientific text, can produce satisfactory results
- citation contexts and abstracts contain much more useful information for summaries than the full texts of papers
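
For my own reference, the graph-based idea behind LexRank, one of the techniques mentioned above, can be sketched roughly as follows: sentences are nodes, two sentences are linked when their similarity exceeds a threshold, and a PageRank-style power iteration scores each sentence by how central it is. This is a simplified sketch and not the paper's setup: it uses plain term-frequency cosine similarity (no IDF weighting), ignores self-links, and the names `lexrank_sketch`, `threshold`, `damping`, and `iterations` as well as their default values are my own assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_sketch(sentences: List[str], threshold: float = 0.1,
                   damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Score sentences by centrality in a thresholded similarity graph."""
    vectors = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(vectors)
    # Unweighted adjacency matrix: link i and j if they are similar enough.
    adj = [[1.0 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on an equal share of its current score.
            incoming = sum(scores[j] / degree[j]
                           for j in range(n) if adj[j][i] and degree[j])
            updated.append((1.0 - damping) / n + damping * incoming)
        scores = updated
    return scores


# Toy sentences: the first one shares a word with each of the others.
sentences = [
    "Citations summarize papers",
    "Citations help ranking",
    "Papers need reviews",
    "Summarize long documents",
]
scores = lexrank_sketch(sentences)
best = scores.index(max(scores))
print(best, sentences[best])
```

An extractive summarizer would then pick the top-scoring sentences, ideally with some redundancy check between them.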

My comments on this are as follows:
- The specific features of scientific survey articles are not used yet, for example the structure of technical surveys, topic coherence, ...
- Information fusion: different citation contexts may contain overlapping information. How can the overlaps be pinpointed?

--
Cheers,
Vu

Clair library

The Clair Library - A Perl package for Natural Language Processing, Information Retrieval and Network Analysis.

Just a note for further reference!

--
Cheers,
Vu

Scholarship links

Sites for finding scholarships at different levels (undergraduate, master's, PhD):

1) http://scholarship-position.blogspot.com/

2) http://scholarshipsboard.com/

--
Cheers,
Vu
