GIAN Course on
Big Data Stream Analytics
26 October - 01 November, 2016
Over the last decade, we have witnessed the emergence of data-intensive applications that must handle very large flows of data. Examples include network monitoring, financial applications, security log analysis, and sensor applications. All of these applications show a growing need for new techniques capable of monitoring or analyzing these streams to detect outliers, intrusions, unusual or anomalous activities, complex correlations, extreme events, or the emergence of patterns. These techniques are grouped together under the term Big Data stream analytics, and they must process the input data quickly enough to keep pace with the rate of the stream. The solution adopted to process such data streams is to trade accuracy for space. Indeed, one characteristic of data-intensive applications is that they do not require exact responses to queries; high-quality approximate responses computed from a summary of the data are acceptable. Streaming algorithms meet these requirements by maintaining synopses. They read their input data sequentially, with no requirement on the order in which data items are received, and they use working memory whose size is much smaller than both the input size and the domain from which items are drawn. There is a wide range of streaming algorithms, which vary according to the number of passes they need over their input, the size of the memory they use, the time needed to process each item read, and whether they are randomized or deterministic.
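To make the idea of a synopsis concrete, here is a minimal Python sketch of one well-known randomized streaming summary, the Count-Min sketch. It is an illustration only, not necessarily the structure used in the course: the class name, the parameters `width`, `depth` and `seed`, and the choice of hash family are all assumptions made for this example. It reads the stream once, keeps only `depth * width` counters, and returns frequency estimates that can overestimate but never underestimate when all updates are non-negative.

```python
import random


class CountMinSketch:
    """Approximate frequency counter backed by depth x width counters."""

    # Mersenne prime used as the modulus of the hash family (illustrative choice).
    PRIME = (1 << 61) - 1

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row defines h(x) = ((a*x + b) mod PRIME) mod width.
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]

    def _index(self, row, item):
        a, b = self.params[row]
        x = hash(item) % self.PRIME  # map any hashable item to an integer key
        return ((a * x + b) % self.PRIME) % self.width

    def update(self, item, count=1):
        """Process one stream item: increment one counter in every row."""
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        """Return the minimum over rows, an upper bound on the true count."""
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


# Usage: a single pass over a toy stream of integer identifiers.
sketch = CountMinSketch(width=64, depth=4, seed=1)
stream = [i % 10 for i in range(1000)] + [7] * 500
for item in stream:
    sketch.update(item)

print(sketch.estimate(7))  # at least the true count of item 7 in the stream
```

The memory used depends only on `width` and `depth`, not on the length of the stream or the size of the item domain. Larger values give tighter estimates at the cost of more space.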
The course will introduce participants to the wide range of applications of data stream analysis techniques. It aims to give a comprehensive survey of these techniques across several practical uses (statistical metrics over the distribution of a stream, distributed system safety and network security, cloud monitoring, and so on). The course will present some well-known models of this research area along with advanced algorithms, and the basics of these will be covered as well. The course is aimed at a general audience including mathematicians, statisticians, computer scientists and electrical engineers. Applications will be illustrated by practical examples. By the end of the course, participants should know the material well enough to use the methods in their own applications and research.
Click here for the course brochure.
Click here to know about GIAN
Schedule of the Course:
1) We will have two lectures of one hour each in the forenoon every day (except Sunday, 30 October, which will be a free day).
2) There will be 1.5 to 2 hours of tutorial every afternoon. The afternoon of the last day, 01 November, will be free (i.e., the course will end with the forenoon session of 01 November). In addition, we may have one or two short talks by participants in the afternoon. Each short talk will last at most 20 minutes.
(Participants are advised to brush up on basic concepts of probability and statistics: the notions of probability distribution, random variable, mathematical expectation, and conditional distribution; Monte Carlo integration methods and basic sampling methods; the Markov and Chebyshev inequalities, the Chernoff bound, and 2-universal hash functions; Kullback-Leibler divergence, Jensen-Shannon divergence, the Bhattacharyya coefficient (BC), and covariance; and special probability distributions such as Poisson, Pareto, Binomial, Normal, Zipf, Pascal (Negative binomial), and Uniform.)
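For reference, a few of the notions listed above can be stated as follows. These are the standard textbook forms, given here as a reminder. The notation is an assumption of this summary and not necessarily the one used in the lectures: X is a random variable with mean \mu and standard deviation \sigma, P and Q are discrete distributions on the same set, and \mathcal{H} is a family of hash functions into m buckets.

```latex
\begin{align*}
  &\text{Markov (for } X \ge 0,\ a > 0\text{):}
    && \Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a} \\
  &\text{Chebyshev (for } k > 0\text{):}
    && \Pr\bigl[\,|X - \mu| \ge k\sigma\,\bigr] \le \frac{1}{k^{2}} \\
  &\text{2-universal family } \mathcal{H}\text{ (for } x \ne y\text{):}
    && \Pr_{h \in \mathcal{H}}\bigl[h(x) = h(y)\bigr] \le \frac{1}{m} \\
  &\text{Kullback-Leibler divergence:}
    && D(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \\
  &\text{Jensen-Shannon divergence:}
    && \mathrm{JSD}(P, Q) = \tfrac{1}{2} D(P \,\|\, M) + \tfrac{1}{2} D(Q \,\|\, M),
       \quad M = \tfrac{1}{2}(P + Q) \\
  &\text{Bhattacharyya coefficient:}
    && \mathrm{BC}(P, Q) = \sum_{x} \sqrt{P(x)\, Q(x)}
\end{align*}
```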
The tutorials will provide hands-on training on Apache Storm. The goals of the tutorials are the following:
- Learn Storm's basic concepts, its parallelism, and stream grouping
- Set up the development environment with Java
- Implement and run a toy topology in local mode
- Learn the Storm system architecture
- Update the development environment
- Implement and deploy a more elaborate topology
- Enhance it with a sketch-based data structure to extract statistics in real time
Participants must bring their own laptops with the following installed:
- The latest release of the Java JDK 1.8
- Eclipse (eclipse.org)
- Windows users: in addition to the above, SuperPutty and WinSCP for SSH access (an SSH client is already included with Mac and Linux).
(The above fee includes lunch, instructional materials for tutorials and assignments, and 24-hour internet access. Accommodation will be provided to participants on a payment basis.)
Registration and Accommodation
Click here for registration at the GIAN portal.
Click here for registration at the GIAN portal of IIT Indore.
We request participants to register first at the GIAN portal and then, once selected, at the GIAN portal of IIT Indore. (Alternatively, applicants can request participation by sending an email to ashokm[at]iiti[dot]ac[dot]in along with a short CV. After hearing from the organizers about their selection, they can proceed to register at the GIAN portal of IIT Indore.)
Bank Details for Fund Transfer or e-payment:
Name of the Beneficiary: IIT Indore Project and Consultancy A/c
Name of Bank: Canara Bank
Branch: Indore Navlakha
Beneficiary Account Number: 1476101027440
Bank MICR Code: 452015003
Bank IFS Code: CNRB0001476
Accommodation for all participants (Guest House for faculty and Students' Hostel for students) is arranged in Silver Springs township (Phase I), where the IIT Indore students' hostel is also situated. The Silver Springs township is situated on the Agra-Bombay Bypass Road, about 14 km from Indore railway station (INDB) and about 20 km from Devi Ahilyabai Holkar Airport, Indore (IDR). The best way to reach Silver Springs township (from both the railway station and the airport) is to hire a taxi/auto.
The IIT Indore campus is about 23 km from the Silver Springs township. Institute buses run from Silver Springs township to the IIT Indore campus, and every day the participants will be taken to the campus by institute bus.
Outstation participants who require accommodation should send a request to ashokm[at]iiti[dot]ac[dot]in with the subject "Request for Accommodation".
Student participants who have requested accommodation should fill in the following application form and send their request to hostel[at]iiti[dot]ac[dot]in (with a cc to ashokm[at]iiti[dot]ac[dot]in).
Click here for application form for hostel accommodation.
Registration on 26 October will take place from 09:30 to 10:00. The inaugural session will start at 10:00. Institute buses have been arranged to commute between Silver Springs and the IIT Indore campus. Bus no. 05 will start from Silver Springs at 08:45 on 26 Oct. and at 09:00 on all other days (27, 28, 29, 31 Oct. and 01 Nov.). Bus no. 03 will take the participants from IITI back to Silver Springs.
The venue of the course is SB 309 (School Building, IIT Indore).
Participants who require assistance in reaching Silver Springs, or any other assistance in the hostel or on the campus, can contact any of the following volunteers.
Mr Istkhar Ali (+91 843 572 4770) or Mr Atin Gayen (+91 738 995 6853)
Professor Yann Busnel has been an Associate Professor at ENSAI, the National School for Statistics and Information Analysis, since September 2014. He is head of the Computer Science department and co-head of the MSc in Big Data. He is a member of CREST (Research Center in Economics and Statistics), in the Laboratory of Statistics and Models. Since April 2015, he has also been an associate member of the Inria Research Center Rennes - Bretagne Atlantique, in the Dionysos team. Finally, he is also an associate member of LINA (Computer Science Laboratory of Nantes Atlantic).
Previously, he spent five years as an assistant professor in the Computer Science Department of the University of Nantes. He obtained his PhD in Computer Science at the University of Rennes (France) in November 2008, and then spent one year in Italy, at the University "La Sapienza" of Rome, between 2008 and 2009. In 2016, he was appointed Invited Professor at La Sapienza for three months, and he has held a national grant for Scientific Excellence since 2011. His research topics include (but are not limited to) large-scale distributed data streams (for Big Data or safety contexts, for instance) and distributed system models. He has published more than 40 international papers in these fields.
For more details, visit: http://www.ensai.fr/enseignant/alias/yann-busnel.html
Dr. M. Ashok Kumar received his B.Sc. and M.Sc. from the Manonmaniam Sundaranar University, Tirunelveli, India, in 1999 and 2001, respectively. He taught Mathematics in educational institutions from 2001 to 2007. He obtained his PhD from the Indian Institute of Science, Bangalore, India in 2015. He was a Visiting Scientist at the Indian Statistical Institute, Bangalore from December 2014 to May 2015 and subsequently a postdoctoral fellow at the Andrew and Erna Viterbi Faculty of Electrical Engineering, Technion-Israel Institute of Technology, from May 2015 to December 2015. He has been an Assistant Professor in the Discipline of Mathematics of the Indian Institute of Technology Indore since January 2016. His research interests broadly lie in Information Theory and Statistics. He is particularly interested in Measures of Information, Statistical Inference Based on Distance Functions, and Information Geometry. For more details, visit http://iiti.ac.in/people/~ashokm/
Dr. Sk. Safique Ahmad received his B.Sc. from Bhadrak College, M.Sc. from Utkal University and M.Phil. from Ravenshaw University. He obtained his Ph.D from the Indian Institute of Technology Guwahati in 2008. He was a Research Associate in the Supercomputer Education and Research Centre of IISc, Bangalore from January 2008 to January 2009. He was a post doctoral fellow at the Institut für Mathematik, Universität Berlin, Germany from February 2009 to December 2009. He has been an Assistant Professor in the Discipline of Mathematics of IIT Indore since December 2009. His research interests lie in Numerical Linear Algebra and the study of logarithmic norm for matrix pencils which are associated with Differential Algebraic Equations (DAE), Differential Equations (DEs), and Stochastic Differential Equations (SDEs). For more details, visit