David Blei (Princeton)
Probabilistic Topic Models of Text and Users
Bin Yu (UC Berkeley)
Concise Comparative Summaries (CCS) of Large Text Corpora with a Human Experiment
In this talk, we propose a general framework for topic-specific summarization of large text corpora and illustrate how it can be used for the analysis of document collections. Our framework, concise comparative summarization (CCS), is built on sparse classification methods. It is a lightweight and flexible tool that offers a compromise between the simple word-frequency-based methods currently in wide use and more heavyweight, model-intensive methods such as latent Dirichlet allocation (LDA). We argue that sparse methods have much to offer for text analysis and hope CCS opens the door to a new branch of research in this important field.
Using news articles from the New York Times, we validate our tool by designing and conducting a human survey that compares the different summarizers with human understanding. We demonstrate our approach with two case studies: a media analysis of the framing of “Egypt” in the New York Times throughout the Arab Spring, and an informal comparison of the New York Times’ and Wall Street Journal’s coverage of “energy.” Overall, we find that the Lasso with L2 normalization can effectively summarize large corpora, regardless of document size. Finally, I will present preliminary results from an ongoing project to study legal opinion documents from the 9th Circuit Court of Appeals, through CCS in combination with LDA.
(Most of the talk is based on the paper by Miratrix, Jia, Yu, Gawalt, El Ghaoui, Barnesmoore, and Clavier (2014) at http://imstat.org/aoas/next_issue.html)
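As a rough illustration of the sparse-classification idea in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes that "L2 normalization" means rescaling each word-count column to unit Euclidean norm, that documents are labeled +1 (about the topic) or -1 (not), and that the summary is the set of words with positive Lasso coefficients. The function names, the toy documents, and the penalty value `lam` are hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the Lasso coordinate update)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_summary(docs, labels, lam=0.1, sweeps=1000, tol=1e-8):
    """Return (word, coefficient) pairs with positive Lasso weight.

    docs   : list of token lists (hypothetical toy input format)
    labels : +1 for documents about the topic of interest, -1 otherwise
    """
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    cols = []
    for w in vocab:
        counts = [float(doc.count(w)) for doc in docs]
        # Assumed reading of "L2 normalization": unit Euclidean norm per word.
        norm = math.sqrt(sum(c * c for c in counts))
        scaled = [c / norm for c in counts]
        # Centering columns and labels stands in for an unpenalized intercept.
        mean = sum(scaled) / n
        cols.append([s - mean for s in scaled])
    y_mean = sum(labels) / n
    resid = [y - y_mean for y in labels]
    sq_norms = [sum(v * v for v in col) for col in cols]
    coefs = [0.0] * len(vocab)
    # Cyclic coordinate descent on 0.5 * ||y - Xb||^2 + lam * ||b||_1.
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            if sq_norms[j] < 1e-12:
                continue  # constant column: absorbed by the intercept
            rho = sum(x * r for x, r in zip(col, resid)) + sq_norms[j] * coefs[j]
            new = soft_threshold(rho, lam) / sq_norms[j]
            delta = new - coefs[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, col)]
                coefs[j] = new
    chosen = [(w, c) for w, c in zip(vocab, coefs) if c > tol]
    return sorted(chosen, key=lambda pair: -pair[1])


# Toy corpus (invented for illustration only).
docs = [
    ["egypt", "protest", "cairo", "the"],
    ["egypt", "protest", "the", "the"],
    ["oil", "price", "the"],
    ["oil", "market", "the", "the"],
    ["market", "price", "the"],
]
topic_labels = [1, 1, -1, -1, -1]
summary = lasso_summary(docs, topic_labels)
print(summary)
```

In this toy corpus "egypt" and "protest" have identical count columns, so the Lasso is free to place the weight on either one; which word is reported depends on the update order. That behavior is typical of sparse methods with highly correlated features.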
Charles Manski (Northwestern)
Thomas Richardson (University of Washington)
Unifying the Counterfactual and Graphical Approaches to Causality via Single World Intervention Graphs (SWIGs)
Models based on potential outcomes, also known as counterfactuals, were introduced by Neyman (1923) and later applied to observational contexts by Rubin (1974). Such models are now used extensively within Biostatistics, Statistics, Political Science, Economics, and Epidemiology for reasoning about causation. Directed acyclic graphs (DAGs), introduced by Wright (1921), are another formalism used to represent causal systems. Graphs are also extensively used in Computer Science, Bioinformatics, Sociology, and Epidemiology.
In this talk I will present a simple approach to unifying these two frameworks via a new graph, termed the Single-World Intervention Graph (SWIG). The SWIG encodes the counterfactual independences associated with a specific hypothetical intervention on a set of treatment variables. The nodes of the SWIG are the corresponding counterfactual random variables. The SWIG is derived from a causal DAG via a simple node-splitting transformation. I will illustrate the theory with a number of examples. Finally, I will show that SWIGs avoid a number of pitfalls that are present in an alternative approach to unification, based on ‘twin networks’, that has been advocated by Pearl (2000).
This is joint work with James M. Robins (Harvard School of Public Health).
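The node-splitting transformation mentioned in the abstract above can be sketched in a few lines. This is an illustrative sketch only, assuming the usual description of the construction: each treated node A is split into a random half A, which keeps the incoming edges, and a fixed half a, which takes over the outgoing edges; every random node is then labeled with the fixed values that are its ancestors in the split graph. The function name `swig`, the dictionary-based graph encoding, and the example DAG are hypothetical choices made for this sketch.

```python
def swig(parents, interventions):
    """Build a SWIG from a causal DAG by node splitting.

    parents       : dict mapping every node to the list of its parents
    interventions : dict mapping each treated node to the name of its fixed value
    Returns (labels, edges). labels maps each original node to the label of its
    random node; edges is a list of (source, target) pairs in the SWIG.
    """

    def fixed_ancestors(node):
        """Treated nodes whose fixed halves are ancestors of `node`."""
        found = set()
        for p in parents[node]:
            if p in interventions:
                # The edge now leaves the fixed half of p, which has no
                # parents, so the search stops here.
                found.add(p)
            else:
                found |= fixed_ancestors(p)
        return found

    labels = {}
    for node in parents:
        anc = fixed_ancestors(node)
        values = [interventions[t] for t in interventions if t in anc]
        labels[node] = f"{node}({','.join(values)})" if values else node

    edges = []
    for node in parents:
        for p in parents[node]:
            # Outgoing edges of a treated node start at its fixed half.
            source = interventions[p] if p in interventions else labels[p]
            edges.append((source, labels[node]))
    return labels, edges


# Hypothetical sequential-treatment DAG: A0 -> L -> A1 -> Y and L -> Y.
dag = {"A0": [], "L": ["A0"], "A1": ["L"], "Y": ["L", "A1"]}
labels, edges = swig(dag, {"A0": "a0", "A1": "a1"})
print(labels)
print(edges)
```

In the resulting graph the random half of A1 keeps its incoming edge from L(a0), while Y(a0,a1) receives its edge from the fixed value a1 rather than from the random variable A1(a0).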
Victoria Stodden (Columbia)
When Should We Trust the Results of Data Science?