Office: 421A Thomas Bldg. Phone: 814-863-4918 Email: sesa at psu dot edu

Statistical approaches to data privacy and confidentiality

Statistical disclosure limitation (SDL) applies statistical tools to limit the release of sensitive information about individuals and groups in statistical databases while still allowing valid statistical inference. A major theme of SDL is the tradeoff between disclosure risk and data utility. This work is related to privacy-preserving data mining (PPDM). The research reported here has been supported in part by NSF Grant SES-0532407, awarded to the project titled "Statistical Disclosure Limitation Methods for Tabular Data", and by NSF BCS-0941553, awarded to the collaborative project titled "CDI-Type II: Collaborative Research: Integrating Statistical and Computational Approaches to Privacy".

Privacy Preserving GWAS Data Sharing. (with Uhler, C. and Fienberg, S.). Accepted.

Fibers of multi-way contingency tables given conditionals: relations to marginals, cell bounds and Markov bases. (with Zhu, X. and Petrovic, S). Submitted.

Partial Information Releases for Confidential Contingency Table Entries: Present and Future Research Efforts. Submitted.

Synthetic Two-Way Contingency Table Preserving Conditional Frequencies. (with J. Lee). Statistical Methodology, (2009) -- to appear.

Differential Privacy for Clinical Trial Data: Preliminary Evaluations. (with D. Vu). Proceedings of the International workshop on Privacy Aspects of Data Mining, PADM09, (2009) -- to appear.

Algebraic Geometry of 2 × 2 Contingency Tables. (with S. Fienberg). In Algebraic and Geometric Methods in Statistics, (2009), 63-81.

Valid Statistical Analysis for Logistic Regression with Multiple Sources. (with S. Fienberg, Y. Nardi) In Protecting Persons While Protecting the People. Lecture Notes in Computer Science No. 5661, (2009), 82-94.

Algebraic Statistics and Contingency Table Problems: Log-linear models, likelihood estimation, and disclosure limitation. (with Dobra, A., Fienberg, S.E., Rinaldo, A., Slavkovic, A. and Zhou, Y.). In Emerging Applications of Algebraic Geometry: IMA Volumes in Mathematics and its Applications, 148, (2009), 63-88.

Cell Bounds in Two-Way Contingency Tables Based on Conditional Frequencies. (with B. Smucker). In Privacy in Statistical Databases, Lecture Notes in Computer Science No. 5262, (2008), 64-77.

A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries. (with S. Fienberg). In Privacy-Preserving Data Mining: Models and Algorithms, Vol. 34, (2008), 291-312.

Coding: Statistical Data Masking Techniques. In Encyclopedia of Quantitative Risk Assessment. Vol 1, (2008).

"Secure" Logistic Regression of Horizontally and Vertically Partitioned Distributed Databases. (with Nardi, Y., Tibbits, M.M). Proceedings of Workshop on Privacy and Security Aspects of Data Mining, (2007), 723-728.

"Secure" Log-Linear and Logistic Regression Analysis of Distributed Databases. (with Fienberg, S.E, Fulp, W.J., Wrobel, T.). In Privacy in Statistical Databases, Lecture Notes in Computer Science No. 4302, (2006), 277-290.

The Space of Compatible Full Conditionals is a Unimodular Toric Variety. (with Sullivant, S.) Special Issue of Journal of Symbolic Computation. 41(2), (2006), 196-209.

Preserving the Confidentiality of Categorical Statistical Data Bases When Releasing Information for Association Rules. (with Fienberg, S.E.). Data Mining and Knowledge Discovery Journal. 11(2), (2005), 155-180.

Statistical Disclosure Limitation with Released Marginals and Conditionals for Contingency Tables. Proceedings of Workshop on Privacy and Security Aspects of Data Mining ICDM ’04, (2004), 13-20.

Bounds for Cell Entries in Two-Way Tables Given Conditional Frequencies. (with Fienberg, S.E.) Proceedings of Privacy in Statistical Databases 2004. Lecture Notes in Computer Science No. 3050, (2004), 30-43.

Making the Release of Confidential Data from Multi-Way Tables Count. (with Fienberg, S.E.). Chance. Vol.17, 3, (2004), 5-10.


Application of statistics in HCI and CSCW

Human-Computer Interaction (HCI) and Computer-Supported Cooperative Work (CSCW) are young scientific communities (1980s on) that lie at the interface of computer science, psychology, design and usability studies. We explore the applicability and validity of current statistical methodology and consider the development of new methods needed to address the complexity that arises in modeling data from this context.

Supporting Knowledge Sharing and Awareness in Distributed Emergency Management Planning: A Design Research Project. (with Convertino, G., Mentis, H.M., Rosson, M.B., Carroll, J.M.). ACM Transactions on Computer-Human Interaction (TOCHI). Special Issue on Media and Collaborative Systems for Crisis Management. Eds. Hiltz, R. and Diaz, P. - to appear 2011.

Supporting Content and Process Common Ground in Computer-Supported Teamwork. (with Convertino, G., Mentis, H.M., Rosson, M.B., Carroll, J.M.). In Proceedings of the 27th international conference on Human factors in computing systems, (2009), 2339-2348.

Articulating Common Ground in Cooperative Work: Content and Process. (with Convertino G., Mentis H., Rosson M.B., Carroll J.M., Ganoe, C.H.). In Proceeding of the 26th annual SIGCHI conference on Human factors in computing systems, (2008), 1637-1646.

Using a Large Projection Screen as an Alternative to Head Mounted Displays for Virtual Environments. (with Patrick, E., Cosgrove, D., Rode, J.A., Verratti, T., Chiselko, G.). Proceedings of the ACM CHI 2000 Conf. on Human Factors in Computing Systems, (2000), 478-485.

Novice Heuristics Evaluation of a Complex Interface. (with Cross, K.). Extended Abstracts of the ACM CHI 1999 Conference on Human Factors in Computing Systems, (1999), 96-101.

Statistical Modeling of Activity Awareness in Computer Supported Cooperative Work. (with Convertino, G., Rosson, M.B. and Carroll, J.). To be submitted to Annals of Applied Statistics.

Measuring Activity Awareness. (with Convertino G., Rosson M.B., and Carroll J.M.). To be submitted to ACM Transactions on Computer-Human Interaction (TOCHI).

Content and process common ground in computer-supported teamwork. (with Convertino G., Mentis H., Rosson M.B., Carroll J.M.). To be submitted to Human-Computer Interaction journal.

Other projects include

Causal inference in transportation safety studies

Causal Inference in Transportation Safety Studies: Comparison of the Potential Outcomes and Causal Bayesian Networks. (with Karwa, V., Donnell, E.T.). Annals of Applied Statistics. Vol 5, Issue 2B, pp. 1428-1455.

Resolving Isoform Expression using Digital Gene Expression Data. (with Altman, Wang, Karwa). In Journal of Indian Society of Agricultural Statistics. Special Issue on Statistical Genomics. (2010). Vol. 64, Issue 1, pages 19-31.

Scientific validity of the polygraph, and evaluation of polygraph data.

Automated Scoring of Polygraph Data. In Statistical Data Mining and Knowledge Discovery, (2004), 135-155.

Evaluating Polygraph Data. Technical Report 766. Department of Statistics. Carnegie Mellon University, (2002).

Usability methods in HCI and virtual environments.

Survey data analysis on cessation of smoking

Research interests: statistical disclosure limitation, algebraic statistics, characterization of discrete distributions, application of statistics to social sciences.


1940's U.S. Census confidentiality poster

CSCW project