August 2016: I graduated from Penn State and no longer maintain or update this page. It will remain available until the powers that be decide to take it down. For my updated page please visit kylewilliams.org.



News


- March 2016: Our papers on answers and abandonment in mobile search and on predicting satisfaction with intelligent assistants have been accepted at SIGIR 2016.
- January 2016: Our tutorial on Information Extraction for Scholarly Digital Libraries has been accepted for presentation at JCDL 2016.
- December 2015: Our paper on Detecting Good Abandonment in Mobile Search has been accepted at WWW 2016.
- September 2015: Our paper on using SimSeerX to detect fake scientific papers was accepted as a poster at SISAP 2015.
- February 2014: I will be doing a research internship with Microsoft Bing this summer.
- February 2014: SimSeerX now indexes Wikipedia and supports non-academic documents. Check it out at http://simseerx.ist.psu.edu.


Summary


I am a Ph.D. candidate at The Pennsylvania State University advised by Dr. C. Lee Giles. My research interests include information retrieval, digital libraries and applications of machine learning. Since beginning my Ph.D., I've interned at Microsoft and Oracle Labs; achieved the highest F1 score among all participants in the source retrieval task for plagiarism detection at PAN 2013 and PAN 2014; placed 2nd and 3rd at the Penn State Graduate Exhibition in successive years; and presented my work at several conferences. My work primarily involves investigating the use of machine learning and text inspection techniques for managing data collections. I'm also involved in the architecture, planning and day-to-day running of the CiteSeerX digital library.

Education


The Pennsylvania State University (2012-present)
PhD in Information Sciences and Technology, advised by Prof. Lee Giles.

University of Cape Town (2010-2012)
Master of Science in Computer Science by Dissertation, advised by A/Prof. Hussein Suleman.
Degree awarded with distinction

University of Cape Town (2006-2009)
Bachelor of Business Science in Management Studies in the field of Computer Science.
Degree awarded with second class division two honours

Publications


Click here for a list of my publications on DBLP
Click here for my ACM author profile page

Journals


2015

[1] Jian Wu, Kyle Williams, Hung-Hsuan Chen, Madian Khabsa, Cornelia Caragea, Suppawong Tuarob, Alexander Ororbia, Douglas Jordan, Prasenjit Mitra, C. Lee Giles. 2015. CiteSeerX: AI in a Digital Library Search Engine. In: Artificial Intelligence Magazine (AI Magazine) 36(3), pages 35-48.

Conferences/Workshops


2016

[2] Kyle Williams, Julia Kiseleva, Aidan C. Crook, Imed Zitouni, Ahmed Hassan Awadallah, Madian Khabsa. 2016. Detecting Good Abandonment in Mobile Search. To appear in: Proceedings of the 2016 International World Wide Web Conference (WWW '16).
GET: [PDF] [ACM] [Slideshare]

[3] Kyle Williams, Julia Kiseleva, Aidan C. Crook, Imed Zitouni, Ahmed Hassan Awadallah, Madian Khabsa. 2016. Is This Your Final Answer? Evaluating the Effect of Answers on Good Abandonment in Mobile Search. To appear in: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16).

[4] Kyle Williams, C. Lee Giles. 2016. Improving Similar Document Retrieval Using a Recursive Pseudo Relevance Feedback Strategy. To appear in: Proceedings of the 2016 International Joint Conference on Digital Libraries (JCDL '16).

[5] Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Imed Zitouni, Aidan C. Crook, Tasos Anastasakos. 2016. Predicting User Satisfaction with Intelligent Assistants. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16), pages 495-505.

[6] Kyle Williams, Jian Wu, Zhaohui Wu, C. Lee Giles. 2016. Information Extraction for Scholarly Digital Libraries. To appear in: Proceedings of the 2016 International Joint Conference on Digital Libraries (JCDL '16)
Tutorial.

[7] Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Imed Zitouni, Aidan C. Crook, Tasos Anastasakos. 2016. Understanding User Satisfaction with Intelligent Assistants. In: ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR '16), pages 121-130.
GET: [ACM] [Slideshare] [PDF]

[8] Chen Liang, Shuting Wang, Zhaohui Wu, Kyle Williams, Bart Pursel, Benjamin Brautigam, Sherwyn Saul, Hannah Williams, Kyle Bowen, C. Lee Giles. 2016. BBookX: Building Online Open Books for Personalized Learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI '16).

2015

[9] Kyle Williams, C. Lee Giles. 2015. On the Use of Similarity Search to Detect Fake Scientific Papers. In: Proceedings of the 2015 International Conference on Similarity Search and Applications (SISAP '15), pages 1-7.
GET: [PDF]

[10] Chen Liang, Shuting Wang, Zhaohui Wu, Kyle Williams, Bart Pursel, C. Lee Giles. 2015. BBookX: An Automatic Book Creation Framework. In: Proceedings of the 2015 ACM Symposium on Document Engineering (DocEng '15), pages 121-123.
GET: [PDF]

[11] Alexander Ororbia, Jian Wu, Madian Khabsa, Kyle Williams, C. Lee Giles. 2015. Big Scholarly Data in CiteSeerX: Information Extraction from the Web. In: BigScholar, The Second WWW Workshop on Big Scholarly Data, pages 597-602.
GET: [ACM] [PDF]

[12] Jian Wu, Jason Killian, Huaiyu Yang, Kyle Williams, Sagnik Ray Choudhury, Suppawong Tuarob, C. Lee Giles. 2015. PDFMEF: A Multi-Entity Knowledge Extraction Framework for Scholarly Documents and Semantic Search. In: Proceedings of the 8th International Conference on Knowledge Capture (K-Cap '15)
Best Paper Nomination.
GET: [PDF]

[13] Shuting Wang, Chen Liang, Zhaohui Wu, Kyle Williams, Bart Pursel, C. Lee Giles. 2015. Concept Hierarchy Extraction from Textbooks. In: Proceedings of the 2015 ACM Symposium on Document Engineering (DocEng '15).
GET: [PDF]

2014

[14] Kyle Williams, Jian Wu, C. Lee Giles. 2014. SimSeerX: A Similar Document Search Engine. In: Proceedings of the 2014 ACM Symposium on Document Engineering (DocEng '14), pages 143-146.
GET: [ACM] [PDF] [Slides]

[15] Kyle Williams, Hung-Hsuan Chen, C. Lee Giles. 2014. Classifying and Ranking Search Engine Results as Potential Sources of Plagiarism. In: Proceedings of the 2014 ACM Symposium on Document Engineering (DocEng '14), pages 97-106.
GET: [ACM] [PDF] [Slides]

[16] Kyle Williams, Lichi Li, Madian Khabsa, Jian Wu, Patrick C. Shih, C. Lee Giles. 2014. A Web Service for Scholarly Big Data Information Extraction. In: 21st IEEE International Conference on Web Services.
GET: [PDF] [Slides]

[17] Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, C. Lee Giles. 2014. Scholarly Big Data Information Extraction and Integration in the CiteSeerX Digital Library. In: 10th International Workshop on Information Integration on the Web, pages 68-73.
GET: [IEEEXplore] [PDF] [Slides]

[18] Jian Wu, Kyle Williams, Hung-Hsuan Chen, Madian Khabsa, Cornelia Caragea, Alexander Ororbia, Douglas Jordan, C. Lee Giles. 2014. CiteSeerX: AI in a Digital Library Search Engine. In: Twenty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence, pages 2930-2937.
GET: [AAAI] [PDF]

[19] Kyle Williams, Hung-Hsuan Chen, C. Lee Giles. 2014. Supervised Ranking for Plagiarism Source Retrieval - Notebook for PAN at CLEF 2014. In: CLEF 2014 Evaluation Labs and Workshop Working Notes Papers
Highest F1-score in Source Retrieval task at PAN 2014.
GET: [PDF]

[20] Jian Wu, Pradeep Teregowda, Kyle Williams, Madian Khabsa, Douglas Jordan, Eric Tree, Zhaohui Wu, C. Lee Giles. 2014. Migrating a Digital Library to a Private Cloud. In: IEEE International Conference on Cloud Engineering.
GET: [IEEEXplore] [PDF]

[21] Cornelia Caragea, Jian Wu, Kyle Williams, Sujatha G. Das, Madian Khabsa, Pradeep Teregowda, C. Lee Giles. 2014. Automatic Identification of Research Articles from Crawled Documents. In: Web-Scale Classification: Classifying Big Data from the Web (Workshop at WSDM 2014).
GET: [PDF]

[22] Jian Wu, Alexander Ororbia, Kyle Williams, Madian Khabsa, Zhaohui Wu, C. Lee Giles. 2014. Utility-Based Control Feedback in a Digital Library Search Engine: Cases in CiteSeerX. In: 9th International Workshop on Feedback Computing.
GET: [USENIX] [PDF] [Slides]

[23] Jian Wu, Kyle Williams, Madian Khabsa, C. Lee Giles. 2014. The Impact of User Corrections To Crawl-Based Digital Libraries: A CiteSeerX Perspective. In: 10th IEEE International Conference on Collaborative Computing.
GET: [EUDL] [PDF]

[24] Zhaohui Wu, Jian Wu, Madian Khabsa, Kyle Williams, Hung-Hsuan Chen, Wenyi Huang, Suppawong Tuarob, Sagnik Ray Choudhury, Alexander Ororbia, Prasenjit Mitra, C. Lee Giles. 2014. Towards Building a Scholarly Big Data Platform: Challenges, Lessons and Opportunities. In: International Conference on Digital Libraries.
GET: [PDF]

[25] Cornelia Caragea, Jian Wu, Alina Ciobanu, Kyle Williams, Juan Fernandez-Ramirez, Hung-Hsuan Chen, Zhaohui Wu, C. Lee Giles. 2014. CiteSeerX: A Scholarly Big Dataset. In: 36th European Conference on Information Retrieval, pages 311-322.
GET: [Springer] [PDF]

2013

[26] Kyle Williams, C. Lee Giles. 2013. Near Duplicate Detection in an Academic Digital Library. In: Proceedings of the 2013 ACM Symposium on Document Engineering (DocEng '13), pages 91-94, ACM, New York, NY, USA.
GET: [ACM] [CiteSeerX] [PDF]

[27] Kyle Williams, Hung-Hsuan Chen, Sagnik Ray Choudhury, C. Lee Giles. 2013. Unsupervised Ranking for Plagiarism Source Retrieval - Notebook for PAN at CLEF 2013. In: CLEF 2013 Evaluation Labs and Workshop Working Notes Papers
Highest F1-score in Source Retrieval task at PAN 2013.
GET: [CiteSeerX] [PDF] [Slides]

[28] Kyle Williams, Jorgina Paihama, Hussein Suleman. 2013. A Comparison of Machine Learning Techniques for Handwritten |Xam Word Recognition. In: Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (SAICSIT '13), pages 37-46, ACM, New York, NY, USA
Best Paper Award.
GET: [ACM] [CiteSeerX] [PDF]

2012

[29] Jorgina Paihama, Kyle Williams, Hussein Suleman. 2012. Assessing the Design of Web Interoperability Protocols. In: Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (SAICSIT '12), pages 353-362, ACM, New York, NY, USA.
GET: [ACM] [Institutional Repository] [PDF]

[30] Marius Nel, Kyle Williams, Hussein Suleman. 2012. Simple Large Image Support in DSpace. In: Proceedings of 14th International Conference on Asia-Pacific Digital Libraries (ICADL '12), Volume 7634 of Lecture Notes in Computer Science, pages 140-143, Springer Berlin / Heidelberg.
GET: [SpringerLink] [Institutional Repository] [PDF]

[31] Tresor Mvumbi, Flora Kundaeli, Zafika Manzi, Kyle Williams, Hussein Suleman. 2012. An Online Meeting Tool for Low Bandwidth Environments. In: Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (SAICSIT '12), pages 226-235, ACM, New York, NY, USA.
GET: [ACM] [Institutional Repository] [PDF]

[32] Lighton Phiri, Kyle Williams, Miles Robinson, Stuart Hammar, Hussein Suleman. 2012. Bonolo: A General Digital Library System for File-based Collections. In: Proceedings of 14th International Conference on Asia-Pacific Digital Libraries (ICADL '12), Volume 7634 of Lecture Notes in Computer Science, pages 49-58, Springer Berlin / Heidelberg.
GET: [SpringerLink] [Institutional Repository] [PDF]

[33] Marwan Nour, Kyle Williams, Hussein Suleman. 2012. ORchiD: Evaluating Simple Repository Deposit for Open Educational Resources. In: Proceedings of 14th International Conference on Asia-Pacific Digital Libraries (ICADL '12), Volume 7634 of Lecture Notes in Computer Science, pages 289-298, Springer Berlin / Heidelberg.
GET: [SpringerLink] [Institutional Repository] [PDF]

[34] Michelle Havenga, Kyle Williams, Hussein Suleman. 2012. Motivating Users to Build Heritage Collections Using Games on Social Networks. In: Proceedings of 14th International Conference on Asia-Pacific Digital Libraries (ICADL '12), Volume 7634 of Lecture Notes in Computer Science, pages 279-288, Springer Berlin / Heidelberg.
GET: [SpringerLink] [Institutional Repository] [PDF]

2011

[35] Kyle Williams, Hussein Suleman. 2011. Creating a Handwriting Recognition Corpus for Bushman Languages. In: Proceedings of 13th International Conference on Asia-Pacific Digital Libraries (ICADL '11), Volume 7008 of Lecture Notes in Computer Science, pages 222-231, Springer Berlin / Heidelberg
Honorable Mention.
GET: [SpringerLink] [Institutional Repository] [PDF] [Slides]

[36] Kyle Williams, Hussein Suleman. 2011. Using a Hidden Markov Model to Transcribe Handwritten Bushman Texts. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL '11), pages 445-446, ACM, New York, NY, USA.
GET: [ACM] [Institutional Repository] [PDF] [Poster] [Minute Madness Slide]

2010

[37] Rizmari Versfeldi, Spencer Lee, Edward A. Fox, Hussein Suleman, Kyle Williams. 2010. Digital Library in a 3D Virtual World: The Digital Bleek and Lloyd Collection in Second Life. In: Proceedings of the 14th European Conference on Research and Advanced Technology for Digital Libraries (ECDL '10), Volume 6273 of Lecture Notes in Computer Science, pages 550-553, Springer Berlin / Heidelberg.
GET: [SpringerLink]

[38] Kyle Williams, Sanvir Manilal, Lebogang Molwantoa, Hussein Suleman. 2010. A Visual Dictionary for an Extinct Language. In: Proceedings of 12th International Conference on Asia-Pacific Digital Libraries (ICADL '10), Volume 6102 of Lecture Notes in Computer Science, pages 1-4, Springer Berlin / Heidelberg.
GET: [SpringerLink] [Institutional Repository] [PDF] [Slides]

[39] Kyle Williams, Hussein Suleman. 2010. Translating Handwritten Bushman Texts. In: Proceedings of the 10th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL '10), pages 109-118, ACM, New York, NY, USA.
GET: [ACM] [Institutional Repository] [PDF] [Slides]

Technical Reports


[40] Christopher Parker, Kyle Williams, Hussein Suleman. 2012. A Lightweight Interface to Local Grid Scheduling Systems. In: Technical Report CS12-05-00, Department of Computer Science, University of Cape Town.
GET: [Institutional Repository] [PDF]

[41] Kyle Williams. 2010. Feasibility of Automatic Transcription of Neatly Rewritten Bushman Texts. In: Technical Report CS12-06-00, Department of Computer Science, University of Cape Town.
GET: [Institutional Repository] [PDF]

Other Academic Output


[42] Kyle Williams, C. Lee Giles. 2015. Classifying Search Engine Results as Potential Sources of Plagiarism. In: The Pennsylvania State University Annual Graduate Exhibition.
GET: [Poster]

[43] Kyle Williams, C. Lee Giles. 2014. Using Documents to Search for Documents. In: The Pennsylvania State University Annual Graduate Exhibition
Third place winner in the Engineering category.
GET: [Poster]

[44] Kyle Williams, C. Lee Giles. 2013. Automatic Document Collection Management: The Case of Duplicates. In: The Pennsylvania State University Annual Graduate Exhibition
Second place winner in the Engineering category.
GET: [PDF] [Poster]

[45] Kyle Williams, Hussein Suleman. 2010. Learning to Read Bushman. In: SAICSIT 2010 Postgraduate Symposium.
GET: [Institutional Repository] [PDF] [Poster] [Slides]

Theses


[46] Kyle Williams. 2012. Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Languages. MSc Thesis. In: Department of Computer Science, University of Cape Town.
GET: [Institutional Repository] [PDF]