Yanchuan Sim

Graduate student
Carnegie Mellon University
Language Technologies Institute (GHC 5719)
5000 Forbes Ave
Pittsburgh, PA 15213

Advisor: Noah Smith
Research group: Noah's Ark

Contact
email | +1 (217) 703-4454


Publications (BibTeX)

A Probabilistic Model for Canonicalizing Named Entity Mentions [PDF]
Dani Yogatama, Yanchuan Sim, Noah A. Smith
Annual Meeting of the Association for Computational Linguistics (ACL 2012). Jul, 2012. Jeju, Korea.

Discovering Factions in the Computational Linguistics Community [PDF]
Yanchuan Sim, Noah A. Smith, David A. Smith
Annual Meeting of the Association for Computational Linguistics (ACL 2012) Rediscovering 50 Years of Discoveries Workshop. Jeju, Korea. Jul, 2012.

Entity Linking with Effective Acronym Expansion, Instance Selection and Topic Modeling [PDF]
Wei Zhang, Yanchuan Sim, Jian Su, Chew Lim Tan
International Joint Conferences on Artificial Intelligence (IJCAI 2011). Barcelona, Spain. Jul, 2011.

NUS-I2R: Learning a Combined System for Entity Linking [PDF]
Wei Zhang, Yanchuan Sim, Jian Su, Chew Lim Tan
Text Analysis Conference (TAC 2010). Gaithersburg, MD, USA. Nov, 2010.



Projects

CS 499 Senior Thesis: Inducing lexical clusters from unannotated corpus through parsing [PDF]
Yanchuan Sim (2010)

CS 598 Advanced NLP Final Project: Learning a Factorial HMM for Joint Sequence Labeling [PDF] [Poster]
Yanchuan Sim (2010)

Reliable DVD Video Multicast using Network Coding [PDF]
Yanchuan Sim (2008)
Summer 2008 attachment at the Institute for Infocomm Research


Code
An assortment of tools and code that I use for my projects.

Ark-SAGE is a Java library that implements the L1-regularized version of Sparse Additive GenerativE models of Text (Eisenstein et al., 2011). SAGE is an algorithm for learning sparse representations of text (you can read more about it here).

A growing collection of handy utility modules for NLP with Python (mainly data processing related).

A basic Python script that handles compiling LaTeX and related files. It supports compiling LaTeX, gnuplot and eps files, and displays the output on the console in a clean manner.

A Java library built on top of Java Simple Argument Parser (JSAP) that allows the use of a default "configuration file" using the --config-file option.


