Wednesday 5 August 2009

Markov Logic

Markov Logic - a new graphical model for Natural Language Processing

I should study this model as soon as possible!

--
Cheers,
Vu


Sunday 2 August 2009

ACL-IJCNLP'09 participation

Day 1 - 02/08/2009

Tutorial 1
: Topics in Statistical Machine Translation by Kevin Knight (ISI) and Philipp Koehn (Edinburgh Univ.)

Some sub-topics within this tutorial that I need to look into are as follows:

- Minimum Bayes Risk (MBR) decoding
- Re-evaluation of phrase-based SMT outputs
- MT system combination
- Efficient decoding (e.g. using cube pruning)
- Discriminative training with various features
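
As a reminder to myself of what Minimum Bayes Risk decoding does, here is a minimal sketch. It is not taken from the tutorial slides: the n-best list, its probabilities, and the `unigram_f1` similarity (a simple stand-in for BLEU) are all assumptions made for illustration. The idea is to pick the hypothesis with the lowest expected loss against the other hypotheses, rather than the single most probable one.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Token-overlap F1 between two sentences (a simple stand-in for BLEU)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(nbest):
    """nbest is a list of (sentence, posterior probability) pairs.

    Returns the sentence with the lowest expected loss, where
    loss(e, e') = 1 - unigram_f1(e, e').
    """
    def expected_loss(candidate):
        return sum(p * (1.0 - unigram_f1(candidate, other)) for other, p in nbest)

    return min((sentence for sentence, _ in nbest), key=expected_loss)


# Hypothetical 3-best list with made-up posterior probabilities.
nbest = [
    ("the cat sat on the mat", 0.40),
    ("a cat sat on the mat", 0.35),
    ("a cat sat on a mat", 0.25),
]
best = mbr_decode(nbest)
print(best)
```

In this toy list the most probable hypothesis is the first one, but MBR selects the second, because it is the one closest on average to all the others.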

I am looking for a soft copy of their slides for future reference. If you have one, please share it with me. Thanks a lot!

Day 2 - 03/08/2009
Session 2B: Generation and Summarization 1

Talk 1: DEPEVAL (summ): Dependency-based Evaluation for Automatic Summaries by Karolina Owczarzak
- the main idea is to use dependency relations for summary evaluation
- better in comparison with ROUGE (2004) and BE (2005)
+ Question: difference between DEPEVAL and BE?
Note:
- Lexical-Functional Grammar (e.g. two syntactic structures to one functional structure)
- LFG parser
+ Charniak-Johnson syntactic parser (2005)
+ LFG annotation (2008)

Talk 2: Summarizing Definition from Wikipedia by Shiren Ye
- raises a new problem, with its own challenges, in summarizing Wikipedia articles
+ recursive links
+ hidden information
- single-document summarization
- uses an existing approach named Document Concept Lattice (IPM 2007)

Talk 3: Automatically Generating Wikipedia Articles: A Structure-aware Approach by C. Sauper
- a new problem: generating overview articles for Wikipedia from various resources crawled from the Internet
- template creation by clustering existing section topics in the database
- proposes a joint learning model that integrates Integer Linear Programming (ILP) into learning in order to optimize weights (for each section topic)
Note:
- evaluation of the quality of generated articles is subjective (Prof. Hovy asked about this)!

Talk 4: Learning to tell tales: A Data-driven Approach to Story Generation by Neil McIntyre
- an interesting problem that results in an end-to-end generation system
- content selection (content) -> content planning (grammar) -> generation (use LM)
Note:
- how to evaluate the quality of generated stories in terms of coherence and interestingness?

Day 3 - 04/08/2009
Relaxed to save my energy for the remaining interesting sessions, especially at EMNLP!

Day 4 - 05/08/2009

Talk 1: SMS based Interface for FAQ Retrieval
- I could not actually follow the speaker of this talk.

Talk 2: A Syntax-free Approach to Japanese Sentence Compression
- It is worth noting some material relevant to my current interests:
+ Intra-sentence positional term weighting
+ Patched language modeling
- Analysis of human-made reference compressions
-> very helpful to figure out challenges in specific problems!
- combinatorial optimization problem
-> used for parameter optimization (MCE, Minimum Classification Error, in this paper)
- Statistical significance using the Wilcoxon signed-rank test
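
For my own reference, here is a minimal sketch of a paired significance test, assuming my note refers to the Wilcoxon signed-rank test. The per-sentence scores for the two systems are made up, and the p-value uses the plain normal approximation (no tie or continuity correction), which is rough for so few pairs.

```python
import math


def wilcoxon_signed_rank(xs, ys):
    """Two-sided Wilcoxon signed-rank test with a normal approximation.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive their average rank.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current run of tied absolute differences.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Hypothetical integer quality scores for the same 8 sentences under two systems.
system_a = [72, 65, 80, 58, 90, 77, 69, 84]
system_b = [70, 60, 81, 50, 85, 70, 69, 80]
w_plus, w_minus, p_value = wilcoxon_signed_rank(system_a, system_b)
print(w_plus, w_minus, round(p_value, 3))
```

Integer scores are used on purpose so that ties are detected exactly; with floating-point scores the equality checks would need a tolerance.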

Talk 3: Application-driven Statistical Paraphrase Generation
- uses SMT-like techniques but proposes some new models within the noisy channel model
+ paraphrase model (adapt)
+ LM (re-use)
+ usability (propose)
- the error analysis is not compelling (only very good outputs of the proposed system are shown); which components of the proposed framework are most influential?
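
To picture how three sub-models like these could be combined, here is a minimal sketch. It assumes a weighted sum of log-scores (a log-linear combination), which is my own assumption and not something recorded in these notes; the weights, candidate names, and probabilities are all made up.

```python
import math


def combined_score(features, weights):
    """Weighted sum of log-probabilities from the individual sub-models."""
    return sum(weights[name] * math.log(p) for name, p in features.items())


# Hypothetical weights for the three sub-models listed above.
weights = {"paraphrase_model": 1.0, "language_model": 0.5, "usability_model": 0.5}

# Hypothetical sub-model probabilities for two candidate paraphrases.
candidates = {
    "candidate A": {"paraphrase_model": 0.6, "language_model": 0.2, "usability_model": 0.9},
    "candidate B": {"paraphrase_model": 0.7, "language_model": 0.05, "usability_model": 0.3},
}

best = max(candidates, key=lambda c: combined_score(candidates[c], weights))
print(best)
```

Setting one weight to zero and re-ranking is a cheap way to probe which component matters most, which is the kind of analysis I was missing in the talk.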

Talk 4: Word or Phrase? Learning Which Unit to Stress for Information Retrieval
This one does not interest me much: IR stuff!

Talk 5: A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
- Learned about query expansion techniques and how to integrate them into a specific problem (blog post retrieval in this paper)
+ worth noting: query expansion based on external resources

Talk 6: An Optimal-Time Binarization Algorithm for Linear Context-Free Rewriting Systems with Fan-Out Two
- a lot of parsing-related material (especially on algorithmic complexity) in this talk, which left me extremely confused!

Talk 7: A Polynomial-Time Parsing Algorithm for TT-MCTAG
- I could not understand any of the material!

Talk 8: Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web
- quite a simple idea (just in my opinion) based on observations from Wikipedia
- uses dependencies as key components

Closing session:
- very interesting and sometimes funny!
- Prof. Frederick Jelinek received the Lifetime Achievement Award, followed by an interesting talk sketching his biography.
- announcements about future NLP conferences (COLING'10, ACL'10, ACL'11, NAACL'10, IJCNLP'10, LREC'10)
- announcements about best paper awards (see more details in ACL-IJCNLP'09 website)

An exhausting but helpful day. Preparing for the next days of ACL workshops and EMNLP sessions!

Day 5 - 06/08/2009
Talk 1 (invited talk): Query-focused Summarization Using Text-to-Text Generation: when Information Comes from Multilingual Sources by Kathleen R. McKeown
- This is the first time I have seen Prof. Kathleen McKeown in person; she was the supervisor of my current supervisor (A/P Min-Yen KAN), hehe.
Some main points:
- typical approach for query-based summarization:
+ choose key sentences (word freq, position, clue words)
+ matches of query term against sentence terms
=> leads to
+ irrelevant sentences
+ sentences placed out of context -> misconceptions
<= - generate new sentences from selected phrases + fluent sentences -> disfluent sentences
+ edit references to people (focus mainly on names)
- remove irrelevant sentences using sentence simplification
+ project DARPA GALE
+ interactive question user input
- NIGHTINGALE
+ use Wikipedia to expand query
+ consider name translation in multilingual resources
+ better if operating over phrases
- GLARF parser from NYU
- long sentences -> shorter sentences using sentence simplification
- redundancy detection => pairwise similarity across all sentences to identify concepts
+ alignment of dependency parses --> hypergraph
+ BOW
- future research direction: text generation for QA
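
The redundancy-detection point in the notes above (pairwise similarity across all sentences, with a bag-of-words variant) can be sketched as follows. This is only my own illustration, not the system from the talk: the cosine measure, the 0.7 threshold, and the example sentences are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def bow(sentence):
    """Bag-of-words vector: lowercase token counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def redundant_pairs(sentences, threshold=0.7):
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold."""
    vectors = [bow(s) for s in sentences]
    return [
        (i, j)
        for i, j in combinations(range(len(sentences)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


sentences = [
    "the president visited paris on monday",
    "on monday the president visited paris",
    "stock markets fell sharply in asia",
]
pairs = redundant_pairs(sentences)
print(pairs)
```

Bag-of-words ignores word order, so the first two sentences are flagged as redundant; the dependency-parse alignment mentioned in the notes would presumably be stricter about structure.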

Talk 2: A Classification Algorithm for Predicting the Structure of Summaries
- Interesting motivating question: how to "paste" selected sentences during abstracting?
- abstracting
+ some of the material is not present
+ can be modeled by cut-and-paste operations (Mani, 01)
- use specific verbs (predicates), for example: present, conclude, include, ...
- language tools
+ GATE (POS, morpho)
+ SUPPLE parser

Talk 3: Entity Extraction via Ensemble Semantics
- web -> entities -> top-PMI (point-wise mutual information) entities
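
As a quick refresher on point-wise mutual information, PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), here is a minimal sketch. The toy documents and the use of document-level co-occurrence counts as probability estimates are my own assumptions, not the setup from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Compute PMI for every pair of terms that co-occur in some document.

    Probabilities are estimated from document-level counts. Each pair key
    is a tuple of the two terms in alphabetical order.
    """
    n_docs = len(documents)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        terms = sorted(set(doc))
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    table = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / n_docs
        p_x = term_counts[x] / n_docs
        p_y = term_counts[y] / n_docs
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


# Toy "documents", each given as a list of terms.
docs = [
    ["paris", "france", "capital"],
    ["paris", "france"],
    ["tokyo", "japan", "capital"],
    ["tokyo", "japan"],
]
scores = pmi_table(docs)
print(scores[("france", "paris")], scores[("capital", "paris")])
```

Pairs that always appear together get a positive score, while pairs that co-occur only as often as chance would predict score zero, so ranking by PMI surfaces the strongly associated entities.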

Talk 4: Clustering to Find Exemplar Terms for Keyphrase Extraction
- relatedness
+ co-occurrence (statistics)
+ Wikipedia-based (e.g. PMI)

Day 6 - 07/08/2009
TBA

Just for taking notes!

--
Cheers,
Vu