Day 1 - 02/08/2009
Tutorial 1: Topics in Statistical Machine Translation by Kevin Knight (ISI) and Philipp Koehn (Edinburgh Univ.)
Some sub-topics within this tutorial that I need to take into account are as follows:
- Minimum Bayes Risk decoding
- Re-evaluation of phrase-based SMT outputs
- MT system combination
- Efficient decoding (e.g. using cube pruning)
- Discriminative training with various features
I am looking for their slides (soft copies) for further reference. If you have them, please share with me, thanks a lot!
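For my own reference, the Minimum Bayes Risk decoding idea from the tutorial can be sketched as follows: instead of outputting the single highest-probability hypothesis, pick the one with the lowest expected loss against the rest of the n-best list. This is a minimal sketch; the unigram-overlap similarity (a cheap stand-in for BLEU) and the toy n-best list are my own inventions, not from the slides.

```python
def similarity(hyp_a, hyp_b):
    """Unigram overlap as a cheap stand-in for BLEU."""
    a, b = set(hyp_a.split()), set(hyp_b.split())
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))

def mbr_decode(nbest):
    """nbest: list of (hypothesis, posterior_prob) pairs.

    Returns the hypothesis with minimum expected loss (1 - similarity),
    where the expectation is taken over the (normalized) posteriors.
    """
    total = sum(p for _, p in nbest)
    best_hyp, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum((p / total) * (1.0 - similarity(hyp, other))
                   for other, p in nbest)
        if risk < best_risk:
            best_hyp, best_risk = hyp, risk
    return best_hyp

nbest = [("the cat sat on the mat", 0.4),
         ("a cat sat on the mat", 0.35),
         ("the dog barked loudly", 0.25)]
print(mbr_decode(nbest))  # -> the cat sat on the mat
```

The first hypothesis wins because it is close to the second (which carries a lot of probability mass), illustrating how MBR favors "consensus" translations.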
Day 2 - 03/08/2009
Session 2B: Generation and Summarization 1
Talk 1: DEPEVAL (summ): Dependency-based Evaluation for Automatic Summaries by Karolina Owczarzak
- the main idea is to use dependency relations for summary evaluation
- performs better in comparison with ROUGE (2004) and BE (2005)
+ Question: difference between DEPEVAL and BE?
Note:
- Lexical-Functional Grammar (e.g. two syntactic structures to one functional structure)
- LFG parser
+ Charniak-Johnson syntactic parser (2005)
+ LFG annotation (2008)
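The core idea of dependency-based summary evaluation can be sketched as overlap of (head, relation, dependent) triples between a candidate and a reference, scored with F1. The triples below are hand-written stand-ins for real LFG parser output, and this is only my sketch of the general idea, not the actual DEPEVAL metric.

```python
def dep_f1(cand_triples, ref_triples):
    """F1 over (head, relation, dependent) triples of candidate vs. reference."""
    cand, ref = set(cand_triples), set(ref_triples)
    if not cand or not ref:
        return 0.0
    matched = len(cand & ref)
    precision = matched / len(cand)
    recall = matched / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

ref = [("sat", "subj", "cat"), ("sat", "obl", "mat"), ("mat", "det", "the")]
cand = [("sat", "subj", "cat"), ("sat", "obl", "rug")]
print(round(dep_f1(cand, ref), 3))  # -> 0.4
```

Unlike n-gram overlap (ROUGE), matching on triples gives credit for the same grammatical relation even when word order differs.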
Talk 2: Summarizing Definition from Wikipedia by Shiren Ye
- raises a new problem and its challenges in summarizing Wikipedia articles
+ recursive links
+ hidden information
- single-document summarization
- uses an existing approach named Document Concept Lattice (IPM 2007)
Talk 3: Automatically Generating Wikipedia Articles: A Structure-aware Approach by C. Sauper
- a new problem: generating overview articles for Wikipedia using various resources crawled from the Internet
- template creation by clustering existing section topics in a database
- proposed a joint learning model that integrates Integer Linear Programming (ILP) into learning to optimize the weights for each section topic
Note:
- evaluating the quality of generated articles is subjective (Prof. Hovy asked about this)!
Talk 4: Learning to tell tales: A Data-driven Approach to Story Generation by Neil McIntyre
- an interesting problem that results in an end-to-end generation system
- content selection (content) -> content planning (grammar) -> generation (use LM)
Note:
- how to evaluate the quality of generated stories in terms of coherence and interestingness?
Day 3 - 04/08/2009
Relaxed to save my energy to enjoy the interesting remaining sessions, especially EMNLP!
Day 4 - 05/08/2009
Talk 1: SMS-based Interface for FAQ Retrieval
- actually could not follow the speaker of this talk.
Talk 2: A Syntax-free Approach to Japanese Sentence Compression
- It is worth noting some materials relevant to my current interests:
+ Intra-sentence positional term weighting
+ Patched language modeling
- Analysis of human-made reference compression
-> very helpful to figure out challenges in specific problems!
- combinatorial optimization problem
-> used for parameter optimization (MCE, Minimum Classification Error, in this paper)
- Statistical significance using the Wilcoxon signed-rank test
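A sketch of the Wilcoxon signed-rank statistic for paired system scores (e.g. the compressor vs. a baseline on the same sentences). The score lists are invented for the demo; a real test would compare W against critical-value tables or use a library such as SciPy to get a p-value.

```python
def wilcoxon_w(xs, ys):
    """W statistic: rank absolute paired differences, then take the
    smaller of the positive-rank sum and negative-rank sum."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # drop zero diffs
    ranked = sorted(diffs, key=abs)
    # assign ranks, averaging over ties in |diff|
    ranks = {}
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        avg = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    w_plus = sum(ranks[k] for k, d in enumerate(ranked) if d > 0)
    w_minus = sum(ranks[k] for k, d in enumerate(ranked) if d < 0)
    return min(w_plus, w_minus)

xs = [0.82, 0.77, 0.90, 0.65, 0.70]  # system A scores (made up)
ys = [0.78, 0.70, 0.85, 0.66, 0.60]  # system B scores (made up)
print(wilcoxon_w(xs, ys))  # -> 1.0
```

A small W (here only one negative difference, with the smallest rank) is what indicates a significant difference between the paired systems.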
Talk 3: Application-driven Statistical Paraphrase Generation
- use SMT-like techniques but propose some new models within noisy channel model
+ paraphrase model (adapt)
+ LM (re-use)
+ usability (propose)
- the error analysis does not seem compelling (only the very good outputs of the proposed system are exhibited); which components of the proposed framework are most influential?
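The three components listed above could be combined log-linearly to score paraphrase candidates, as is typical in SMT-style systems. This is only a sketch of the general scheme; the component probabilities, weights, and candidate phrases below are invented, not the paper's actual models.

```python
import math

def score(candidate, weights):
    """Weighted sum of log component scores (paraphrase model, LM, usability)."""
    return (weights["para"] * math.log(candidate["para"])
            + weights["lm"] * math.log(candidate["lm"])
            + weights["use"] * math.log(candidate["use"]))

# component scores per candidate paraphrase (all made up)
candidates = {
    "buy a ticket":      {"para": 0.30, "lm": 0.020, "use": 0.90},
    "purchase a ticket": {"para": 0.25, "lm": 0.015, "use": 0.95},
}
weights = {"para": 1.0, "lm": 1.0, "use": 1.0}

best = max(candidates, key=lambda c: score(candidates[c], weights))
print(best)  # -> buy a ticket
```

The usability component is what makes the model "application-driven": retuning its weight changes which paraphrase wins without touching the adapted paraphrase model or the reused LM.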
Talk 4: Word or Phrase? Learning Which Unit to Stress for Information Retrieval
It does not interest me much, IR stuff!
Talk 5: A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
- learn about query expansion techniques and how to integrate them into a specific problem (blog post retrieval in this paper)
+ worth noting: query expansion based on external resources
Talk 6: An Optimal-Time Binarization Algorithm for Linear Context-Free Rewriting Systems with Fan-Out Two
- a lot of parsing-related material (especially on algorithm complexity) in this talk that left me extremely confused!
Talk 7: A Polynomial-Time Parsing Algorithm for TT-MCTAG
- could not understand the material at all!
Talk 8: Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web
- quite a simple idea (just my opinion) based on observations from Wikipedia
- uses dependency relations as the key component
Closing session:
- very interesting and sometimes funny!
- Prof. Frederick Jelinek received the Lifetime Achievement Award, followed by interesting talks about his biographical sketch.
- announcements about future NLP conferences (COLING'10, ACL'10, ACL'11, NAACL'10, IJCNLP'10, LREC'10)
- announcements about best paper awards (see more details on the ACL-IJCNLP'09 website)
An exhausting but helpful day. Preparing for the next days with ACL workshops and EMNLP sessions!
Day 5 - 06/08/2009
Talk 1 (invited talk): Query-focused Summarization Using Text-to-Text Generation: when Information Comes from Multilingual Sources by Kathleen R. McKeown
- This is the first time I have seen Prof. Kathleen McKeown in person; she was the supervisor of my current supervisor (A/P Min-Yen Kan), hehe.
Some main points:
- typical approach for query-based summarization:
+ choose key sentences (word freq, position, clue words)
+ matches of query term against sentence terms
=> leads to
+ irrelevant sentences
+ sentences placed out of context -> misconceptions
<= instead: generate new sentences from selected phrases (risk: fluent sentences -> disfluent sentences)
+ edit references to people (focus mainly on names)
- remove irrelevant sentences using sentence simplification
+ project DARPA GALE
+ interactive user input for questions
- NIGHTINGALE
+ use Wikipedia to expand query
+ consider name translation in multilingual resources
+ better if operating over phrases
- GLARF parser from NYU
- long sentences -> shorter sentences using sentence simplification
- redundancy detection => pairwise similarity across all sentences to identify concepts
+ alignment of dependency parses --> hypergraph
+ BOW
- future research direction: text generation for QA
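The pairwise-similarity step for redundancy detection mentioned above can be sketched with plain bag-of-words cosine: sentence pairs above a threshold are treated as expressing the same concept. The sentences and the 0.5 threshold below are illustrative, not from the talk.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

sents = ["The storm hit the coast on Monday",
         "A storm struck the coast Monday",
         "Officials announced new evacuation plans"]

# all pairs whose similarity exceeds the threshold form a "concept" group
pairs = [(i, j)
         for i in range(len(sents)) for j in range(i + 1, len(sents))
         if cosine(sents[i], sents[j]) > 0.5]
print(pairs)  # -> [(0, 1)]
```

The alignment-of-dependency-parses variant noted above would replace this BOW vector with dependency triples, trading speed for precision.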
Talk 2: A Classification Algorithm for Predicting the Structure of Summaries
- Interesting motivating question: how to "paste" selected sentences during abstracting?
- abstracting
+ some of the materials are not present
+ be modeled by cut-and-paste operations (Mani, 01)
- use specific verbs (predicates), for example: present, conclude, include, ...
- language tools
+ GATE (POS, morpho)
+ SUPPLE parser
Talk 3: Entity Extraction via Ensemble Semantics
- web -> entities -> top-PMI (point-wise mutual information) entities
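Ranking entities by PMI with a seed term can be sketched from document co-occurrence counts: PMI(x, y) = log(p(x, y) / (p(x) p(y))). This is a generic sketch of the PMI ranking step, not the paper's system; the toy "documents" below are invented.

```python
import math

# toy document collection: each document is its set of terms (made up)
docs = [
    {"jaguar", "car", "speed"},
    {"jaguar", "cat", "jungle"},
    {"jaguar", "car", "engine"},
    {"car", "road", "traffic"},
    {"python", "snake"},
]

def pmi(x, y):
    """Pointwise mutual information estimated from document co-occurrence."""
    n = len(docs)
    p_x = sum(x in d for d in docs) / n
    p_y = sum(y in d for d in docs) / n
    p_xy = sum(x in d and y in d for d in docs) / n
    if p_xy == 0:
        return float("-inf")  # never co-occur
    return math.log(p_xy / (p_x * p_y))

candidates = ["car", "cat", "snake"]
ranked = sorted(candidates, key=lambda c: pmi("jaguar", c), reverse=True)
print(ranked)  # -> ['cat', 'car', 'snake']
```

Note that "cat" outranks "car" even with fewer co-occurrences: PMI rewards terms that appear almost exclusively with the seed, which is why top-PMI entities tend to be distinctive rather than merely frequent.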
Talk 4: Clustering to Find Exemplar Terms for Keyphrase Extraction
- relatedness
+ co-occurrence (statistics)
+ Wikipedia-based (e.g. PMI)
Day 6 - 07/08/2009
TBA
Just for taking notes!
--
Cheers,
Vu