Amit Chakrabarti (Dartmouth College, Computer Science): Big Data, Communication Games, and an Inverse-Square Law
Thursday May 12, 2016, 5:30 PM - 6:30 PM, Kemeny 108
It is now common knowledge that we live in an era of "Big Data".
Science, engineering, technology, and even the routine activities of modern life are producing increasingly large data streams, at petabyte or exabyte scales. At these scales, what used to be routine algorithmic tasks for "small data"---such as estimating basic statistics of a population or understanding the connectivity structure of a graph---may now be challenging problems, with new theoretical principles needed to understand and solve them. Most of my work focuses on building such theoretical principles.
One key result, which I shall highlight in this talk, is an inverse-square law that can be summarized as follows. The working memory required for the statistical and graph-theoretic estimation tasks mentioned above need only grow sub-linearly in the input size (enabling efficient processing of big data) but must grow as the inverse square of the estimation error (revealing a fundamental computational limit).
Communication games---where two or more players collaborate to compute on a massively-long input distributed amongst them---play a crucial role in establishing such theoretical principles of big data analysis. I shall demonstrate this connection with several examples, and give a brief overview of the mathematics behind the above inverse-square law.
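As a rough illustration of the inverse-square trade-off described above (not the speaker's own construction), the sketch below estimates the number of distinct items in a stream with a standard k-minimum-values summary. The function names `hash01` and `estimate_distinct`, the choice of SHA-256 as the hash, and the setting k = ceil(1/eps^2) are assumptions made for this example. The memory used depends only on the target relative error eps, not on the length of the stream.

```python
import hashlib
import heapq
import math


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1) via SHA-256 (an assumed choice)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, eps):
    """Estimate the number of distinct items using a k-minimum-values sketch."""
    # Sketch size grows as the inverse square of the target error.
    k = max(2, math.ceil(1.0 / eps**2))
    heap = []        # negated values: a max-heap holding the k smallest hashes
    members = set()  # hash values currently stored in the heap
    for item in stream:
        h = hash01(item)
        if h in members:
            continue  # repeated item already in the sketch
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash beats the current k-th smallest: swap it in.
            evicted = -heapq.heapreplace(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct items: count is exact
    # The k-th smallest of n uniform hashes sits near k/n, so invert it.
    return (k - 1) / (-heap[0])
```

For example, `estimate_distinct(stream, 0.05)` keeps about 400 hash values however long the stream is; halving eps to 0.025 raises that to about 1600.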
Lorenzo Torresani (Dartmouth College, Computer Science): Computer Vision with Big Weakly-Labeled Data
Friday Feb 12, 2015, 6:00 PM - 7:00 PM, Kemeny 108
Most modern computer vision methods employ a strongly supervised learning paradigm that requires training on massive collections of richly-labeled images. These rich labels are provided by either human annotators or auxiliary sensors. Unfortunately, the reliance on time-consuming human labeling or sensory data collection greatly limits the applicability of these methods to new settings or novel domains. I will discuss the idea of eliminating or reducing the need for rich labels by leveraging existing large repositories of weakly-labeled images, i.e., photos annotated only with class labels indicating which objects are present but not their location. This talk reports on joint work with Loris Bazzani, Alessandro Bergamo, Drago Anguelov, and Haris Baig.
David Qian (MD/PhD Candidate, Geisel School of Medicine): Statistical Genetics of Lung Cancer Risk: a Pathway-Based Approach
Wednesday April 29, 2015, 6:00 PM - 7:00 PM, Kemeny 006
For most diseases, strong single-gene effects are the exception, not the rule. Genome-wide association studies (GWAS) have identified hundreds of risk-conferring germline mutations for common cancers, such as lung cancer, breast cancer, and prostate cancer. However, each individual mutation explains only a small fraction of phenotypic variation and is therefore a poor predictor of cancer development. The mutations' biomolecular mechanisms of augmenting cancer risk are also usually poorly understood. I conduct "pathway analysis" to evaluate the joint effects of many mutations in the context of cellular pathway disturbances. In contrast to the conventional study of single mutation-gene-protein influences, this approach captures how overall intracellular functions may be affected by groups of mutations without particular emphasis on any individual mutation. By coupling GWAS results with datasets of tissue-specific protein interactions and pathways, I identify pathways that are statistically enriched with the protein products of genes whose sequence or expression levels are altered by cancer-associated mutations. These derived pathways offer not only greater biological insights into cancer development, but also a more meaningful way to characterize patient risk in the clinic compared to existing gene panels. The two most common subtypes of lung cancer will be used for pathway analysis demonstrations.
Jane Wang (Cornell University): Informal Lunch
6th Nov. 2014, 12:00pm-1:30pm, Kemeny 100C
Drop by anytime to have Boloco and talk with Jane about anything mathematical.
Lauren M. Childs (Center for Communicable Disease Dynamics (CCDD) at the Harvard School of Public Health (HSPH)): Phenotypic variation allows for heightened pathogen virulence
22nd Oct. 2014, 5:00pm-6:00pm, Kemeny 004
Theoretical frameworks for understanding why some pathogens cause virulence in their hosts suggest that pathogens evolve to maximize their reproductive number, a quantity inextricably linked to pathogen growth within the host. Since pathogen-induced mortality is an unavoidable consequence of increased pathogen growth and transmission, a trade-off typically emerges, leading to maximized fitness at an intermediate virulence. While these frameworks are a useful starting point to think about transmission-virulence tradeoffs, they implicitly assume that each pathogen is associated with only a single virulence phenotype encoded by the genotype and need to be re-evaluated in the context of pathogens that exhibit variable phenotypes in identical hosts, such as the diversity of clinical syndromes observed in malaria infections. We develop a theoretical model to test whether a single genotype that gives rise to multiple disease phenotypes can account for the existence of heterogeneous disease outcomes. We find that variation in gene-specific virulence and transmission within a single strain can indeed contribute to its success, expanding regions of coexistence and dominance in competition with a strain having a single optimized virulence. Our results further demonstrate that expressing multiple virulence phenotypes, even when one is highly virulent, can be advantageous for a strain.