Data Science for Everyone
Abstract: Data is the single most critical commodity of the modern world. Every field, business, and industry wants to use it to empower decision making, but each explores its usage and applications in its own specific way. We see AI services, Big Data, and Cloud Computing as the ABC of the digital economy, and data science as the foundation of every academic discipline and industry in that economy. The Graduate School of Data Science at Seoul National University aims to change higher education systems to broaden data science education. In this talk, we introduce our initiative of “data science for everyone” and highlight how we incorporate the emerging trend of Ambient AI into our curriculum.
Bio (Sang Cha) : Professor Sang Kyun Cha has been the founding dean of the Graduate School of Data Science of Seoul National University since February of 2020. He led the effort of establishing this new graduate school to help transform Korea’s leading higher education institution in the age of data-driven innovation and AI since he took the role of founding director of SNU Big Data Institute in April of 2014.
Before taking this transformative role at the university, Prof. Cha was an innovator and entrepreneur who founded ‘Transact In Memory, Inc.’ in Silicon Valley in the early 2000s based on his research on in-memory data management. After SAP AG’s acquisition of the company in late 2005, he led SAP’s research toward SAP HANA and served as co-founding chief architect of SAP HANA until early 2014, when the industry shifted toward the in-memory paradigm triggered by SAP HANA.
He received his BS and MS from Seoul National University and his Ph.D. from Stanford University.
Towards Usability and Trust for Data-Driven Models
Abstract: The growing volumes of data we are accumulating open up new opportunities to create robust, empirical models that can advance science as well as support data-driven decision and policy making. However, creating models from data demands complex computational processes that are difficult for data scientists to assemble and are out of reach for domain experts who lack training in computing. Automatic Machine Learning (AutoML) techniques have emerged to address this challenge and streamline model creation. In this talk, I will discuss recent research on a specific class of AutoML techniques -- the automatic synthesis of ML pipelines -- its benefits and practical limitations. I will present systems we have developed that insert users into the AutoML loop by empowering them to understand and customize models, and that guide them through the many tasks required in model construction, including data discovery and understanding. I will also reflect on the importance of reproducibility as a means to debug and build trust in the results produced by ML pipelines.
Bio : Juliana Freire is a Professor of Computer Science and Data Science at New York University. She was the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD), served as a council member of the Computing Research Association’s Computing Community Consortium (CCC), and was the NYU lead investigator for the Moore-Sloan Data Science Environment, a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. This spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, and different application areas, including urban analytics, predictive modeling, and computational reproducibility. Freire has co-authored over 200 technical papers (including 11 award-winning publications) and several open-source systems, and is an inventor of 12 U.S. patents. According to Google Scholar, her h-index is 61 and her work has received over 15,800 citations. She is an ACM Fellow and a recipient of an NSF CAREER, two IBM Faculty awards, and a Google Faculty Research award. She was awarded the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo! and IBM. She received a B.S. degree in computer science from the Federal University of Ceara (Brazil), and M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.
ML in Microsoft Teams and The Future of Software
Abstract: Software 2.0 – the augmentation and replacement of traditional code with models, especially deep neural networks – is changing how we develop, deploy, and maintain software. In this talk, I will describe the challenges and opportunities that this change brings with it and how we use deep learning to improve Microsoft Teams.
Bio : Johannes Gehrke is a Technical Fellow at Microsoft, the Managing Director of Microsoft Research at Redmond, and the CTO and head of machine learning for Microsoft Teams. He has received an Alfred P. Sloan Fellowship, a Humboldt Research Award, and the 2011 IEEE Computer Society Technical Achievement Award, and he is an ACM Fellow and an IEEE Fellow. Johannes co-authored the undergraduate textbook Database Management Systems (McGraw-Hill, 2002; currently in its third edition), used at universities all over the world. He is a member of the ACM SIGKDD Executive Committee. From 1999 to 2015, Johannes was on the faculty in the Department of Computer Science at Cornell University, where he graduated 25 PhD students and, at the time of his departure, held the endowed chair of Tisch University Professor. From 2005 to 2008, he was Chief Scientist at FAST Search and Transfer. He has been in product groups at Microsoft since 2012, first building Delve and the Office Graph, then building people and feed experiences across all of Microsoft 365, and then serving as the chief architect and head of machine learning for the Microsoft Teams backend. Since June 2020, he has held a dual role across research and product, leading all of Microsoft Research in Redmond and leading AI in Teams.
Learning Rules and Taxonomies for Better Explanation
Abstract: The new wave of interest in rule learning is driven by the shortcomings of Deep Learning models and the growing need for Interpretable and Explainable AI. In this talk we consider the issues of rule learning in its relationship to learning taxonomies of subject domains given by exemplars and their descriptions. Although generating a compact representation of rules is generally intractable, certain realistic constraints and approximation schemes result in efficient algorithmic approaches.
Bio : Sergei O. Kuznetsov graduated from the Moscow Institute of Physics and Technology (MIPT), now one of the main Russian centers for AI. After receiving his doctoral degree and habilitation, on an approach to machine learning based on closed descriptions, from the Russian Academy of Sciences, he spent three years at TU Dresden, first as a Humboldt fellow and then as an invited professor. Since 2014, Sergei Kuznetsov has been the head of the Department of Data Analysis and AI at the National Research University Higher School of Economics, one of the top five Russian universities. His scientific interests lie in the field of efficient algorithms for knowledge discovery, data mining and formal concept analysis.
Time series classification at scale
Abstract: Time series classification is a fundamental data science task, providing understanding of dynamic processes as they evolve over time. The recent introduction of ensemble techniques has revolutionised this field, greatly increasing accuracy, but at a cost of increasing already burdensome computational overheads. I present new time series classification technologies that achieve the same accuracy as recent state-of-the-art developments, but with many orders of magnitude greater efficiency and scalability. These make time series classification feasible at hitherto unattainable scale.
Bio : Professor Geoff Webb is Research Director of the Monash University Data Futures Institute. He was editor in chief of Data Mining and Knowledge Discovery from 2005 to 2014. He has been Program Committee Chair of both ACM SIGKDD and IEEE ICDM, as well as General Chair of ICDM and member of the ACM SIGKDD Executive. He is a Technical Advisor to machine learning as a service startup BigML Inc and to recommender systems startup FROOMLE. He developed many of the key mechanisms of support-confidence association discovery in the 1980s. His OPUS search algorithm remains the state-of-the-art in rule search. He pioneered multiple research areas as diverse as black-box user modelling, interactive data analytics and statistically-sound pattern discovery. He has developed many useful machine learning algorithms that are widely deployed. His many awards include IEEE Fellow and the inaugural Eureka Prize for Excellence in Data Science (2017).
Knowledge Graphs 2021: Achievements, Challenges and Opportunities
Abstract: Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing vision and challenge of AI. Over the last two decades, huge knowledge bases, also known as knowledge graphs, have been automatically constructed from web data and text sources, and have become a key asset for search engines and other use cases. Machine knowledge can be harnessed to semantically interpret text in news, social media and web tables, contributing to question answering, natural language processing and data analytics. This talk reviews these advances and discusses lessons learned (see http://dx.doi.org/10.1561/1900000064 for a comprehensive survey). Moreover, the talk identifies open challenges and new research opportunities.
Bio : Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics in Saarbruecken, Germany, and an Adjunct Professor at Saarland University. He co-authored a comprehensive textbook on transactional systems, received the VLDB Test-of-Time Award 2002 for his work on automatic database tuning, and is one of the creators of the YAGO knowledge base which was recognized by the WWW Test-of-Time Award in 2018. Weikum is an ACM Fellow and elected member of various academies. He received the ACM SIGMOD Contributions Award in 2011, a Google Focused Research Award in 2011, an ERC Synergy Grant in 2014, and the ACM SIGMOD Edgar F. Codd Innovations Award in 2016.
Industry Insights Keynotes
Deploying Machine Learning to help make Better Decisions
Abstract: In August 2011, Marc Andreessen wrote a prophetic article in the Wall Street Journal, explaining how he believed industries would transform as “Software was Eating the World.” Over the last ten years, this theme has played out as companies across sectors have focused on Digital Transformation. The first iteration of Digital Transformation focused on building better experiences for stakeholders. Bringing data together was essential for creating better experiences, which led to deploying APIs, data lakes and other data middleware.
The success of Digital Transformation has resulted in data systems with better experiences. However, having fancy dashboards and better user experiences has not necessarily provided insights to individuals to make better decisions. Digital Transformation 2.0 will address this and will be about assisting stakeholders in making better decisions. In this session, I propose to share how machine learning and artificial intelligence can exploit the data infrastructure we have set up to help make better decisions.
We have created an internal data lake that collects data from different internal data sources. In the last three years, our team at Persistent Systems has experimented with building systems that embed machine learning and artificial intelligence in our processes and automation to deliver precise guidance to stakeholders to make better decisions. As we started our experiments focusing on better decisions, we realized that the data we had collected was not best suited to derive our insights. We also observed that traditional business intelligence dashboards can be overwhelming and do not necessarily provide the insights for better decisions.
In this session, I propose sharing what we have done and what we learned from these experiments as we embedded machine learning in our processes and as part of our automation infrastructure. This has helped our team get better insights precisely when needed.
Value creation in Classifieds, E-Commerce and Marketplace business with AI / ML and its challenges.
Bio : Tech Executive with more than 20 years of experience in founding and growing global Tech businesses in Europe and the US. After building the leading European car classifieds site AutoScout24, he built out Ciao! as the leading European price comparison site and sold it to Microsoft in 2008.
Other roles he has held in the past decade include CIO of the media and technology company Axel Springer SE, Group CTO of the leading Affiliate Marketing platform Zanox/AWIN and Group CTO of the company builder / accelerator HitFox/IONIQ, which launched many successful startups in Fintech like the Banking-as-a-Service platform SolarisBank.
Today Daniel is Group CTO of Visable Group and CEO of Visable Labs, operator of Europe’s leading B2B marketplaces, and also advises and invests in tech startups.
Machine Learning Modeling Best Practices
Abstract: I will talk about best practices when developing Machine Learning models such as generating uncertainty estimates for predictions, balancing exploitation with exploration, generating explanations for predictions, handling noise in training data, etc. I will show how we apply these modeling best practices at Amazon using real-world examples from our India e-commerce business.
Bio : Rajeev Rastogi is the Vice President of Machine Learning at Amazon India. He leads the development of Machine Learning solutions for Amazon’s India business in the areas of Search, Advertising, Deals, Catalog Quality, Payments, Forecasting, Question Answering, Grocery Grading, etc. Previously, he was Vice President of Yahoo! Labs Bangalore and the founding Director of the Bell Labs Research Center in Bangalore, India. Rajeev is an ACM Fellow and a Bell Labs Fellow. He has published over 125 papers, and holds over 100 patents. Rajeev received his B. Tech degree from IIT Bombay, and a PhD degree in Computer Science from the University of Texas, Austin.
Scaling AI, Responsibly
Abstract: AI is expected to make a strong positive impact across important sectors such as Healthcare, Agriculture, Education, Smart Cities & Infrastructure, Mobility & Transportation across the world. In India, the government and the private sector have been steadily investing in AI in the last four years, as has the Indian IT Industry with products, platforms and services.
In this context, Ananth will explore what it takes to scale AI in real world scenarios. With his experience in overseeing AI Research and Innovation in industry, he will present new developments in increasing performance and efficiency. He will discuss the seamless integration of AI/ML inference and training into production IT architectures. Incorporating probabilistic AI/ML predictions into any production process brings with it the need to monitor correctness and a workflow to correct errors and feed these back for model re-training on a regular basis. Ananth will highlight the balance AI deployments must maintain to ensure the cost of the corrective workflows does not outweigh the benefits of AI/ML. Finally, he will touch upon Cybersecurity, Safety, Legal and Ethical issues relating to AI.
Bio : Ananth directs Research, Innovation and Co-Innovation in TCS. Under his leadership, TCS has created a significant range of new products and services with a wide IP portfolio. Ananth has architected an agile model for innovation at scale, across the entire organization. He has been a member of the TCS Corporate leadership since 1999, and has led several strategic initiatives.
Ananth has served on several Governing Councils of Academia, Industry Advisory boards, and Government and Alumni committees.
In 2013, he was elected a Fellow of the Indian National Academy of Engineering (INAE) in recognition of his contributions towards engineering. He was named a Distinguished Alumnus of IIT Delhi in 2009. He has been listed in Computerworld’s Premier 100 IT Leaders (2007), and in Infoworld’s Top 25 CTOs (2007).
Ananth holds an M.Tech. in Computer Science and an M.Sc. in Physics from the Indian Institute of Technology, Delhi.