Search results for "Data mining techniques jiawei han"
Jiawei Han
 
01:19:26
Views: 1684 SonicNU
What is data mining?
 
04:28
These days much of the information we have is digital. All of that data tells stories and helps us understand the world better, because it hides patterns that can reveal something we did not know... but how do we uncover those patterns? Support me on Patreon: https://goo.gl/GYb3Jj Buy me a coffee: ko-fi.com/mindmachinetv ====================================================== Social media: Twitter: https://goo.gl/LNyICo Facebook: https://goo.gl/lcb4Ab Instagram: https://goo.gl/fmLa4J ====================================================== Sources that made this video possible: Data Mining: Concepts and Techniques: http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition-Morgan-Kaufmann-2011.pdf ====================================================== Software I use: Adobe After Effects, Adobe Illustrator, Ableton Live 9. Equipment I use: Huion 680s, Audio-Technica ATR2500-USB ====================================================== Music: https://soundcloud.com/musicadfondo Download backgrounds: http://mindmachinetv.tumblr.com/
Views: 7991 MindMachineTV
Mining Reliable Information from Passively and Actively Crowdsourced Data (Part 1)
 
01:03:52
Authors: Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign; Wei Fan, Baidu, Inc.; Bo Zhao, LinkedIn Corporation; Qi Li, Department of Computer Science and Engineering, University at Buffalo; Jing Gao, Department of Computer Science and Engineering, University at Buffalo. Abstract: Recent years have witnessed an astonishing growth of crowd-contributed data, which has become a powerful information source that covers almost every aspect of our lives. This big treasure trove of information has fundamentally changed the ways in which we learn about our world. Crowdsourcing has attracted considerable attention, with various approaches developed to utilize this enormous volume of crowdsourced data from different perspectives. From the data-collection perspective, crowdsourced data can be divided into two types: "passively" crowdsourced data and "actively" crowdsourced data; from the task perspective, crowdsourcing research includes information aggregation, budget allocation, worker incentive mechanisms, etc. To meet the need for a systematic introduction to the field and a comparison of its techniques, we present an organized picture of crowdsourcing methods in this tutorial. The covered topics will be of interest to both advanced researchers and beginners in this field. More on http://www.kdd.org/kdd2016/ KDD2016 conference is published on http://videolectures.net/
Views: 161 KDD2016 video
Outlier Analysis
 
16:55
This video was made as a project for the Data Mining course. Lecturer: Dewi Suryani. Class: LD01. Group members (name - student ID): Bryan Karunachandra - 2001542153, Edwin Tjeng - 2001558832, Elvin Christianto Lienardi - 2001543282, Jeffrey Ivan Limarga - 2001550155, Willie Chandra Putra - 2001581412. References: - Jiawei Han (2011). Data Mining: Concepts and Techniques, 3rd Edition. Morgan Kaufmann Publishers, Boston. - https://en.wikipedia.org/wiki/Local_outlier_factor - https://towardsdatascience.com/density-based-algorithm-for-outlier-detection-8f278d2f7983 - http://lijiancheng0614.github.io/scikit-learn/auto_examples/covariance/plot_outlier_detection.html - http://bunda-bisa.blogspot.com/2013/04/perbedaan-statistik-parametrik-dan.html - https://www.youtube.com/watch?v=afvYEVbo9qA - https://en.wikipedia.org/wiki/mixture_model - https://en.wikipedia.org/wiki/grubbs%27_test_for_outliers - https://en.wikipedia.org/wiki/chi-squared_test Enjoy watching! :)
Views: 181 Bryan Karunachandra
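The entry above surveys outlier-analysis techniques such as the Local Outlier Factor and density-based detection. As a minimal, hedged sketch (not the group's own project code), the snippet below flags outliers in synthetic 2-D data with scikit-learn's LocalOutlierFactor; the data, n_neighbors and contamination values are illustrative assumptions.

```python
# Minimal sketch: density-based outlier detection with Local Outlier Factor.
# Assumes scikit-learn and NumPy are available; the data is synthetic.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(100, 2))   # dense cluster
outliers = rng.uniform(low=-6.0, high=6.0, size=(5, 2))   # scattered points
X = np.vstack([inliers, outliers])

# fit_predict returns +1 for inliers and -1 for outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
labels = lof.fit_predict(X)

print("flagged as outliers:", np.where(labels == -1)[0])
# negative_outlier_factor_ holds the negated LOF scores; lower means more anomalous.
print("most anomalous score:", lof.negative_outlier_factor_.min())
```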
I Dread Data Mining & Warframe Drama
 
08:02
Many words are being spoken about Datamining, Warframe and Void_Glitch, though very little is actually being said. I want to clarify what Datamining actually is so we can move on from this dreadful situation. ****************************** References: 1. http://webcache.googleusercontent.com/search?q=cache:i3-XBI5RnKMJ:https://github.com/VoiDGlitch/WarframeData/blob/master/MissionDecks.txt&num=1&hl=fr&strip=0&vwsrc=0 2. https://docs.google.com/document/d/1ajTaNflzEj3lplS4htrHbBTdOqAlhPXiXtH1cDxWB8M/mobilebasic?pli=1 3. https://www.reddit.com/r/Warframe/comments/3sleik/leaked_primed_mods/ 4. https://www.reddit.com/r/Warframe/comments/6an10t/images_of_umbra_excal_cause_the_post_was_deleted/ 5. https://forums.warframe.com/topic/808245-drop-rates-datamines-and-digital-extremes-ddd/ - https://imgur.com/a/RTzVl 6. https://www.reddit.com/r/Warframe/comments/6j70z7/quick_question_about_data_mining_laws/ 7. https://www.reddit.com/r/Warframe/comments/6j248z/open_letter_to_derebecca/ 8. Han, Jiawei, Micheline Kamber & Jian Pei. "Data Mining: Concepts and Techniques." Elsevier. 2012. P. XXIII (https://books.google.nl/books?hl=nl&lr=&id=pQws07tdpjoC&oi=fnd&pg=PP1&dq=data+mining+twitter&ots=tyMyZ-kCVW&sig=JcB_MjFaI_2-0DrERCFolAVT2vg#v=onepage&q=data%20mining%20twitter&f=false) 9. Fayyad Usama, Gregory Piatetsky-Shapiro, and Padhraic Smyth. "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17.3:(1996) p. 43 10. https://forums.warframe.com/topic/808245-drop-rates-datamines-and-digital-extremes-ddd/?page=10#comment-8773094 ***************** Welcome to the "I Dread" series, where I talk about everything I dread within Warframe. Does this mean I don't like the game? Actually it's the exact opposite, but I don't like dreadful things in it. ****************************** Make Sure you Follow Me on: Twitter: https://www.twitter.com/inglriousb Twitch: https://www.twitch.tv/TNL_official Join the Discord: https://discord.gg/XNYEJhV ************************* Want to start Warframe with a booster whilst supporting me at the same time? Click the Referral link below and start off with a 7 day affinity booster. https://www.warframe.com/signup?referrerId=519728441a4d806e71000033 ************************* Credits: Kevin MacLeod - "Killing Time" Kevin MacLeod - "Wallpaper" Kevin MacLeod - "Cold Sober" The footage and images featured in the video are for critical review and parody. Video footage is credited in the video. Stalker Model by Coverop ************************* Glyph Code PC: 837E-27F5-FDD5-7896 Glyph Code PS4: 0736-AA13-87EC-232A Glyph Code XB1: B4B4-92A2-ECEB-EA2C Redeem your TNL Glyph on https://www.warframe.com/promocode. Find out how to get your own TNL glyph in this video: https://www.youtube.com/watch?v=v4FCdaWn0Io
Views: 741 Michel Postma
MINING INTERACTION PATTERNS AMONG BRAIN REGIONS BY CLUSTERING USING JAVA
 
05:59
Title: Mining Interaction Patterns among Brain Regions by Clustering. Domain: Java. Description: Functional magnetic resonance imaging (fMRI) provides the potential to study brain function in a non-invasive way. Massive in volume and complex in terms of information content, fMRI data requires effective and efficient data mining techniques. Recent results from neuroscience suggest a modular organization of the brain. To understand the complex interaction patterns among brain regions we propose a novel clustering technique. We model each subject as a multi-dimensional time series, where the single dimensions represent the fMRI signal at different anatomical regions. In contrast to previous approaches, we base our cluster notion on the interactions between the univariate time series within a data object. Our objective is to assign objects exhibiting a similar intrinsic interaction pattern to a common cluster. To formalize this idea, we define a cluster by a set of mathematical models describing the cluster-specific interaction pattern. Based on this novel cluster notion, we propose interaction K-means (IKM), an efficient algorithm for partitioning clustering. An extensive experimental evaluation on benchmark data demonstrates the effectiveness and efficiency of our approach. The results on two real fMRI studies demonstrate the potential of IKM to contribute to a better understanding of normal brain function and the alterations characteristic of psychiatric disorders. Buy the whole project kit, which includes: • 1st Review PPT • 2nd Review PPT • Full coding with the described algorithm • Video file • Full document Note: *For bulk purchase of projects and for outsourcing in various domains such as Java, .Net, PHP, NS2, Matlab, Android, Embedded, Bio-Medical, Electrical, Robotics etc. contact us. *Contact for real-time projects, web development and web hosting services. *Comment and share this video and win exciting developed projects free of cost. Contact for more details: Ph: 044-43548566 Mob: 8110081181 Mail id: [email protected]
Views: 431 SHPINE TECHNOLOGIES
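The IKM algorithm described above clusters subjects by cluster-specific interaction models. The sketch below is NOT IKM; it is a much simpler stand-in, assumed for illustration only, that summarizes each subject by the correlation matrix of its regional time series and clusters those summaries with ordinary K-means. All data is synthetic.

```python
# Simplified illustration only -- NOT the paper's Interaction K-Means (IKM).
# Idea: summarize each subject's multivariate time series by its region-by-region
# correlation matrix (a crude stand-in for an "interaction pattern") and cluster those.
import numpy as np
from sklearn.cluster import KMeans

def interaction_features(ts):
    """ts: array of shape (timepoints, regions); returns the upper triangle
    of the correlation matrix as a feature vector."""
    corr = np.corrcoef(ts, rowvar=False)
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

rng = np.random.default_rng(1)
subjects = [rng.normal(size=(200, 8)) for _ in range(20)]  # synthetic fMRI-like data
X = np.array([interaction_features(ts) for ts in subjects])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```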
Mining Reliable Information from Passively and Actively Crowdsourced Data (Part 2)
 
42:47
Authors: Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign; Wei Fan, Baidu, Inc.; Bo Zhao, LinkedIn Corporation; Qi Li, Department of Computer Science and Engineering, University at Buffalo; Jing Gao, Department of Computer Science and Engineering, University at Buffalo. Abstract: Recent years have witnessed an astonishing growth of crowd-contributed data, which has become a powerful information source that covers almost every aspect of our lives. This big treasure trove of information has fundamentally changed the ways in which we learn about our world. Crowdsourcing has attracted considerable attention, with various approaches developed to utilize this enormous volume of crowdsourced data from different perspectives. From the data-collection perspective, crowdsourced data can be divided into two types: "passively" crowdsourced data and "actively" crowdsourced data; from the task perspective, crowdsourcing research includes information aggregation, budget allocation, worker incentive mechanisms, etc. To meet the need for a systematic introduction to the field and a comparison of its techniques, we present an organized picture of crowdsourcing methods in this tutorial. The covered topics will be of interest to both advanced researchers and beginners in this field. More on http://www.kdd.org/kdd2016/ KDD2016 conference is published on http://videolectures.net/
Views: 68 KDD2016 video
Mining Knowledge from Databases: An Information Network Analysis Approach
 
01:17:53
Most people consider a database to be merely a data repository that supports data storage and retrieval. Actually, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology. In this talk, we introduce database-oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency, facilitate data integration, and generate interesting knowledge. Moreover, we present interesting case studies on real datasets, including DBLP and Flickr, and show how interesting and organized knowledge can be generated from database-oriented information networks.
Views: 71 Microsoft Research
Mining Reliable Information from Passively and Actively Crowdsourced Data (Part 3)
 
01:24:17
Authors: Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign; Wei Fan, Baidu, Inc.; Bo Zhao, LinkedIn Corporation; Qi Li, Department of Computer Science and Engineering, University at Buffalo; Jing Gao, Department of Computer Science and Engineering, University at Buffalo. Abstract: Recent years have witnessed an astonishing growth of crowd-contributed data, which has become a powerful information source that covers almost every aspect of our lives. This big treasure trove of information has fundamentally changed the ways in which we learn about our world. Crowdsourcing has attracted considerable attention, with various approaches developed to utilize this enormous volume of crowdsourced data from different perspectives. From the data-collection perspective, crowdsourced data can be divided into two types: "passively" crowdsourced data and "actively" crowdsourced data; from the task perspective, crowdsourcing research includes information aggregation, budget allocation, worker incentive mechanisms, etc. To meet the need for a systematic introduction to the field and a comparison of its techniques, we present an organized picture of crowdsourcing methods in this tutorial. The covered topics will be of interest to both advanced researchers and beginners in this field. More on http://www.kdd.org/kdd2016/ KDD2016 conference is published on http://videolectures.net/
Views: 68 KDD2016 video
Data Looksee - Text Analysis Clustering
 
00:35
Describes how clustering can be used to identify the differences in topics discussed between positive and negative sentiments.
Views: 23 PolyVista
Maximize Plant Performance: Data Mining to Implement a Better O&M Strategy
 
34:32
In this webinar AWS Truepower President, Dr. Bruce Bailey, and Chief Engineer, Daniel Bernadett, discuss how performance issues can be addressed through diagnostic data mining. They share a case study demonstrating how the implementation of diagnostic techniques led to increased overall performance at a wind facility by modifying the operations and maintenance strategy.
Views: 589 AWS Truepower
K-Means
 
04:29
An animated tutorial video about K-Means. Hopefully it is useful; apologies for any mistakes or shortcomings. Don't forget to LIKE & COMMENT to discuss and dig deeper into K-Means. :) References: - Jiawei Han (2011). Data Mining: Concepts and Techniques, 3rd Edition. Morgan Kaufmann Publishers, Boston. ISBN: 9780123814791. - Practicum Case O161-COMP6140-BO01
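To make the K-Means idea in the entry above concrete, here is a minimal NumPy sketch of the standard Lloyd iteration (assign points to the nearest centroid, recompute centroids, repeat); the toy data and the value of k are assumptions for illustration, not material from the video.

```python
# Minimal NumPy sketch of the standard K-Means (Lloyd's) iteration:
# assign each point to its nearest centroid, then move each centroid to the
# mean of its assigned points, and repeat until the assignments stop changing.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid -> nearest-centroid labels
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# toy data: two well-separated blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```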
Towards scalable critical alert mining (KDD 2014 Presentation)
 
14:42
Towards Scalable Critical Alert Mining (KDD 2014 Presentation). Authors: Bo Zong, Yinghui Wu, Jie Song, Ambuj K. Singh, Hasan Cam, Jiawei Han, Xifeng Yan. Performance monitoring software for data centers typically generates a great number of alert sequences. These alert sequences indicate abnormal network events. Given a set of observed alert sequences, it is important to identify the most critical alerts that are potentially the causes of others. While the need for mining critical alerts over large-scale alert sequences is evident, most alert analysis techniques stop at modeling and mining the causal relations among the alerts. This paper studies the critical alert mining problem: given a set of alert sequences, we aim to find a set of k critical alerts such that the number of alerts potentially triggered by them is maximized. We show that the problem is intractable; therefore, we resort to approximation and heuristic algorithms. First, we develop an approximation algorithm that obtains a near-optimal alert set in quadratic time, and propose pruning techniques to improve its runtime performance. Moreover, we show that a faster approximation exists when the alerts follow a certain causal structure. Second, we propose two fast heuristic algorithms based on tree sampling techniques. On real-life data, these algorithms identify a critical alert from up to 270,000 mined causal relations in 5 seconds; meanwhile, they preserve more than 80% of solution quality, and are up to 5,000 times faster than their approximation counterparts.
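For intuition only: the paper above proves hardness and develops approximation and tree-sampling algorithms, which are not reproduced here. The sketch below is a plain greedy baseline, assumed for illustration, that repeatedly picks the alert whose reachable set over toy causal relations covers the most new alerts.

```python
# Plain greedy baseline for the flavor of the problem described above -- NOT the
# paper's approximation or sampling algorithms. Given causal edges "a triggers b",
# repeatedly pick the alert whose set of reachable alerts adds the most new coverage.
from collections import defaultdict

def reachable(graph, src):
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def greedy_critical_alerts(edges, k):
    graph = defaultdict(list)
    for cause, effect in edges:
        graph[cause].append(effect)
    # only alerts with outgoing edges can trigger anything, so they are the candidates
    reach = {a: reachable(graph, a) for a in graph}
    covered, chosen = set(), []
    for _ in range(k):
        best = max(reach, key=lambda a: len(reach[a] - covered), default=None)
        if best is None or not (reach[best] - covered):
            break
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered

edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "F")]   # toy causal relations
print(greedy_critical_alerts(edges, k=2))                  # e.g. ['A', 'E'] cover B, C, D, F
```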
Mining Reliable Information from Passively and Actively Crowdsourced .. (KDD 2016)
 
03:10:34
Mining Reliable Information from Passively and Actively Crowdsourced Data. KDD 2016. Authors: Jing Gao, Qi Li, Bo Zhao, Wei Fan, Jiawei Han. Recent years have witnessed an astonishing growth of crowd-contributed data, which has become a powerful information source that covers almost every aspect of our lives. This big treasure trove of information has fundamentally changed the ways in which we learn about our world. Crowdsourcing has attracted considerable attention, with various approaches developed to utilize this enormous volume of crowdsourced data from different perspectives. From the data-collection perspective, crowdsourced data can be divided into two types: passively crowdsourced data and actively crowdsourced data; from the task perspective, crowdsourcing research includes information aggregation, budget allocation, worker incentive mechanisms, etc. To meet the need for a systematic introduction to the field and a comparison of its techniques, we present an organized picture of crowdsourcing methods in this tutorial. The covered topics will be of interest to both advanced researchers and beginners in this field.
Data mining
 
47:46
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction of data itself. It is also a buzzword, and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support systems, including artificial intelligence, machine learning, and business intelligence. The popular book "Data Mining: Practical Machine Learning Tools and Techniques with Java" (which covers mostly machine learning material) was originally to be named just "Practical Machine Learning", and the term "data mining" was only added for marketing reasons. Often the more general terms "(large-scale) data analysis" or "analytics" -- or, when referring to actual methods, artificial intelligence and machine learning -- are more appropriate. This video is targeted to blind users. Attribution: article text available under CC-BY-SA; Creative Commons image source in video.
Views: 1669 Audiopedia
[PURDUE MLSS] Divide and Recombine for the Analysis of Big Data by William S. Cleveland (Part 2/8)
 
51:55
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/cleveland.pdf Abstract: D&R consists of the general approach of parallelizing big data, statistical methods for division and recombination, sampling and display methods for visualization of samples of subsets, computational methods, and computational environments. In D&R, the data are broken up into structured subsets, general analysis methods are applied to each subset, and the results of the analyses recombined. The necessary steps of data division and recombination open up an exciting area of research in statistical theory and methods, and there are already a number of very useful results. The steps also open up research in computational methods and hardware-software environments, and here, too, there are important results. By introducing the exploitable parallelization of the data, D&R succeeds in making it possible to apply to big data almost any existing analysis method from statistics, machine learning, and visualization. This enables detailed, comprehensive analysis of big data at all stages of the analysis process, starting with the raw data. This includes detailed visualization at all stages, not just to reduced data such as summary statistics, results of dimension reduction methods, fitted models, and the output of algorithms applied to the detailed data. Visualization at all stages substantially reduces the chances of losing critical information in the data. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 413 Purdue University
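The lecture above describes D&R as dividing the data into subsets, applying a statistical method to each subset, and recombining the results. Below is a minimal sketch under that reading, with synthetic data and simple coefficient averaging as the recombination step (one of several possible recombinations, chosen here only for brevity).

```python
# Minimal sketch of the divide-and-recombine (D&R) idea: split the data into
# subsets, apply the same analysis (here, ordinary least squares) to each subset
# independently (this step is what can be parallelized), then recombine the
# per-subset results by averaging the coefficient estimates.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

def fit_ols(Xs, ys):
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return coef

# divide -> apply -> recombine
subsets = np.array_split(np.arange(n), 20)
per_subset = [fit_ols(X[idx], y[idx]) for idx in subsets]   # embarrassingly parallel
beta_dr = np.mean(per_subset, axis=0)

print("D&R estimate:", beta_dr)
print("all-data estimate:", fit_ols(X, y))
```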
Towards Interactive Construction of Topical Hierarchy
 
20:52
Authors: Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han Abstract: Automatic construction of user-desired topical hierarchies over large volumes of text data is a highly desirable but challenging task. This study proposes to give users freedom to construct topical hierarchies via interactive operations such as expanding a branch and merging several branches. Existing hierarchical topic modeling techniques are inadequate for this purpose because (1) they cannot consistently preserve the topics when the hierarchy structure is modified; and (2) the slow inference prevents swift response to user requests. In this study, we propose a novel method, called STROD, that allows efficient and consistent modification of topic hierarchies, based on a recursive generative model and a scalable tensor decomposition inference algorithm with theoretical performance guarantee. Empirical evaluation shows that STROD reduces the runtime of construction by several orders of magnitude, while generating consistent and quality hierarchies. ACM DL: http://dl.acm.org/citation.cfm?id=2783288 DOI: http://dx.doi.org/10.1145/2783258.2783288
Decision Tree 4: Information Gain
 
07:20
Full lecture: http://bit.ly/D-Tree After a split, we end up with several subsets, which will have different values of entropy (purity). Information Gain (aka mutual information) is the parent set's entropy minus the average of these subset entropies, weighted by the size of each subset.
Views: 151653 Victor Lavrenko
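A small worked example of the quantities from this lecture, entropy and information gain, using made-up label counts (the 9-positive/5-negative parent set is a common toy example; the split below is invented for illustration).

```python
# Worked example of the quantities in the lecture: entropy of a labelled set,
# the size-weighted average entropy of the subsets produced by a split, and
# information gain = parent entropy - weighted average of subset entropies.
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, subsets):
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ["yes"] * 9 + ["no"] * 5                      # 9 positive, 5 negative examples
left   = ["yes"] * 6 + ["no"] * 2                      # one branch of a hypothetical split
right  = ["yes"] * 3 + ["no"] * 3                      # the other branch
print(round(entropy(parent), 3))                       # ~0.940 bits
print(round(information_gain(parent, [left, right]), 3))
```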
Data Mining Open Flights Social Networking Presentation INFS770
 
13:47
This is a presentation created for the final assignment in my social networking class. It contains a social network analysis of the Open Flights data that I performed in Gephi.
Views: 216 Tom Austin
[PURDUE MLSS] Optimization for Machine Learning by S.V.N Vishwanathan (Part 4/5)
 
50:49
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/vishy.pdf Optimization for Machine Learning Machine learning poses data-driven optimization problems. Computing the function value and gradients for these problems is challenging because they often involve thousands of variables and millions of training data points. These problems can often be cast as convex optimization problems, so a lot of recent research has focused on designing specialized optimization algorithms for them. In this talk, I will present a high-level overview of a few such algorithms that were recently developed. The talk will be broadly accessible and will have plenty of fun pictures and illustrations! See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 550 Purdue University
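As a hedged illustration of the kind of data-driven convex problem the lecture discusses (not an algorithm taken from the lecture itself), here is plain batch gradient descent on l2-regularized logistic regression with synthetic data; the step size and regularization strength are arbitrary assumptions.

```python
# Hedged sketch: regularized logistic regression trained with batch gradient descent.
# Everything here (data, step size, lambda) is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)   # labels in {-1, +1}

def loss_and_grad(w, X, y, lam=0.01):
    margins = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-margins))) + 0.5 * lam * w @ w
    # derivative of log(1 + exp(-m)) w.r.t. w is -sigmoid(-m) * y * x
    sigma = 1.0 / (1.0 + np.exp(margins))
    grad = -(X * (sigma * y)[:, None]).mean(axis=0) + lam * w
    return loss, grad

w = np.zeros(d)
step = 0.5
for it in range(200):
    loss, grad = loss_and_grad(w, X, y)
    w -= step * grad
print("final loss:", round(loss, 4))
```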
Examination of the Effects of Heterogeneous Organization of RyR Clusters, Myofibrils...
 
00:26
Examination of the Effects of Heterogeneous Organization of RyR Clusters, Myofibrils and Mitochondria on Ca2+ Release Patterns in Cardiomyocytes. Vijay Rajagopal et al (2015), PLoS Computational Biology http://dx.doi.org/10.1371/journal.pcbi.1004417 Spatio-temporal dynamics of intracellular calcium, [Ca2+]i, regulate the contractile function of cardiac muscle cells. Measuring [Ca2+]i flux is central to the study of mechanisms that underlie both normal cardiac function and calcium-dependent etiologies in heart disease. However, current imaging techniques are limited in the spatial resolution to which changes in [Ca2+]i can be detected. Using spatial point process statistics techniques we developed a novel method to simulate the spatial distribution of RyR clusters, which act as the major mediators of contractile Ca2+ release, upon a physiologically-realistic cellular landscape composed of tightly-packed mitochondria and myofibrils. We applied this method to computationally combine confocal-scale (~ 200 nm) data of RyR clusters with 3D electron microscopy data (~ 30 nm) of myofibrils and mitochondria, both collected from adult rat left ventricular myocytes. Using this hybrid-scale spatial model, we simulated reaction-diffusion of [Ca2+]i during the rising phase of the transient (first 30 ms after initiation). At 30 ms, the average peak of the simulated [Ca2+]i transient and of the simulated fluorescence intensity signal, F/F0, reached values similar to that found in the literature ([Ca2+]i ≈1 μM; F/F0≈5.5). However, our model predicted the variation in [Ca2+]i to be between 0.3 and 12.7 μM (~3 to 100 fold from resting value of 0.1 μM) and the corresponding F/F0 signal ranging from 3 to 9.5. We demonstrate in this study that: (i) heterogeneities in the [Ca2+]i transient are due not only to heterogeneous distribution and clustering of mitochondria; (ii) but also to heterogeneous local densities of RyR clusters. Further, we show that: (iii) these structure-induced heterogeneities in [Ca2+]i can appear in line scan data. Finally, using our unique method for generating RyR cluster distributions, we demonstrate the robustness in the [Ca2+]i transient to differences in RyR cluster distributions measured between rat and human cardiomyocytes.
Views: 44 ScienceVio
CSSE Lecture: "Recent Research Progress from DBG@UNSW in Processing Big Graph Data"
 
01:06:33
Professor Xuemin Lin presents "Recent Research Progress from DBG@UNSW in Processing Big Graph Data". Abstract: Graphs are widely used for modeling complex structured data, with a broad spectrum of applications such as biochemistry, bioinformatics, web search, social networks, road networks, etc. Over the last decade, tremendous research efforts have been devoted to many fundamental problems in managing and analyzing network data. Some interesting problems include graph structure search, node-pair similarity search, graph compression, uncertain graphs, and graph pattern matching. In this talk, I will provide an overview of recent advances in network-data-based search, and address some of the open issues. The technical part will mainly consist of two parts: graph structure search and node-pair similarity search. More specifically, (1) graph structure search includes subgraph containment search, similarity subgraph search, supergraph containment search, similarity supergraph search, exact all-matching, and similarity all-matching. Most of these problems are challenging due to their NP-hardness, and they need to be solved by developing efficient and effective techniques to cope with large network data. (2) Node-pair similarity search involves two promising link-based similarity measures: SimRank and SimFusion. Due to their self-referential definition, the sheer size of the Web has presented striking challenges to their fast computation. I will introduce novel techniques for efficiently and effectively processing large graphs. About the Speaker: Xuemin Lin is a Professor in the School of Computer Science and Engineering at UNSW, and the head of the database group. His research lies in (big) data analytics, including graph data, spatial-temporal data, string/text data, streaming data, and uncertain data. In the above fields, he has co-authored more than 214 papers. Many of them (90 in total) have recently been published in top venues including SIGMOD, VLDB, KDD, SIGIR, WWW, ICDE, TODS, VLDBJ, and TKDE. He has also co-authored 12 best papers at international conferences, including the ICDE07 best student paper award, ICDE10 one of the best papers, SIGMOD11 one of the best papers, ICDE12 one of the best papers, ICDE13 one of the best papers, the TKDE 2011 Dec Spotlight paper, and the PAKDD14 best paper runner-up award. He was an associate editor of ACM Transactions on Database Systems (Jan 2008 - Jan 2014). Currently, he is an associate editor of IEEE Transactions on Knowledge and Data Engineering and the WWW Journal.
Views: 626 UON FEBE
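The talk mentions SimRank as a link-based node-pair similarity measure. Below is a naive fixed-point iteration of the textbook SimRank recursion on a toy graph, assuming in-neighbor lists as input; it is meant only to show the self-referential definition and does not reflect the scalable techniques the talk covers.

```python
# Naive fixed-point iteration for SimRank. This textbook O(n^2 * d^2) version
# only illustrates the recursion; it does not scale to web-sized graphs.
import itertools

def simrank(in_neighbors, C=0.8, iters=10):
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a, b in itertools.product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            Ia, Ib = in_neighbors[a], in_neighbors[b]
            if not Ia or not Ib:
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in Ia for j in Ib)
            new[(a, b)] = C * total / (len(Ia) * len(Ib))
        sim = new
    return sim

# toy graph given as in-neighbor lists
in_neighbors = {"u": [], "v": [], "a": ["u", "v"], "b": ["u", "v"], "c": ["a", "b"]}
s = simrank(in_neighbors)
print(round(s[("a", "b")], 3))   # a and b share the same in-neighbors -> similarity 0.4
```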
Towards Interactive Construction of Topical Hierarchy: A Recursive Te.. (KDD 2015)
 
20:52
Towards Interactive Construction of Topical Hierarchy: A Recursive Tensor Decomposition Approach KDD 2015 Chi Wang Xueqing Liu Yanglei Song Jiawei Han Automatic construction of user-desired topical hierarchies over large volumes of text data is a highly desirable but challenging task. This study proposes to give users freedom to construct topical hierarchies via interactive operations such as expanding a branch and merging several branches. Existing hierarchical topic modeling techniques are inadequate for this purpose because (1) they cannot consistently preserve the topics when the hierarchy structure is modified; and (2) the slow inference prevents swift response to user requests. In this study, we propose a novel method, called STROD, that allows efficient and consistent modification of topic hierarchies, based on a recursive generative model and a scalable tensor decomposition inference algorithm with theoretical performance guarantee. Empirical evaluation shows that STROD reduces the runtime of construction by several orders of magnitude, while generating consistent and quality hierarchies.
High Dimensional Heterogeneous Data Analysis in a Big Data World
 
01:08:47
Session 2 of the 2015 D-STOP Symposium, featuring CTR's Chandra Bhat, UT Austin's Joydeep Ghosh, UT Austin's Doug Fearing, and Cisco Systems' Xiaoqing Zhu. These PowerPoint presentations will be available at http://ctr.utexas.edu/research/d-stop/education/annual-symposium/.
GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media (KDD 2016)
 
18:28
GMove: Group-Level Mobility Modeling Using Geo-Tagged Social Media KDD 2016 Chao Zhang Keyang Zhang Quan Yuan Luming Zhang Tim Hanratty Jiawei Han Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data (e.g., GPS traces), the recent growth of geo-tagged social media (GeoSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GeoSM data remains a challenge that cannot be readily addressed by existing techniques. We propose GMove, a group-level mobility modeling method using GeoSM data. Our insight is that the GeoSM data usually contains multiple user groups, where the users within the same group share significant movement regularity. Meanwhile, user grouping and mobility modeling are two intertwined tasks: (1) better user grouping offers better within-group data consistency and thus leads to more reliable mobility models; and (2) better mobility models serve as useful guidance that helps infer the group a user belongs to. GMove thus alternates between user grouping and mobility modeling, and generates an ensemble of Hidden Markov Models (HMMs) to characterize group-level movement regularity. Furthermore, to reduce text sparsity of GeoSM data, GMove also features a text augmenter. The augmenter computes keyword correlations by examining their spatiotemporal distributions. With such correlations as auxiliary knowledge, it performs sampling-based augmentation to alleviate text sparsity and produce high-quality HMMs. Our extensive experiments on two real-life data sets demonstrate that GMove can effectively generate meaningful group-level mobility models. Moreover, with context-aware location prediction as an example application, we find that GMove significantly outperforms baseline mobility models in terms of prediction accuracy.
KDD2016 paper 908
 
02:30
Title: GMove: Group-Level Mobility Modeling using Geo-Tagged Social Media Authors: Chao Zhang*, University of Illinois at Urbana-Champaign Keyang Zhang, University of Illinois at Urbana-Champaign Quan Yuan, University of Illinois at Urbana-Champaign Luming Zhang, University of Illinois at Urbana-Champaign Tim Hanratty, University of Illinois at Urbana-Champaign Jiawei Han, University of Illinois at Urbana-Champaign Abstract: Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data (e.g., GPS traces), the explosively growing geo-tagged social media (GSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GSM data remains a challenge that cannot be readily addressed by existing techniques. We propose GMove, a group-level mobility modeling method for GSM data. Our key insight is that, the GSM data usually contains multiple user groups, where the users within the same group share significant movement regularity. Meanwhile, user grouping and mobility modeling are two intertwined tasks: (1) better user grouping offers better within-group data consistency and thus leads to more reliable mobility models; and (2) better mobility models serve as useful guidance that helps infer the group a user belongs to. GMove thus alternates between user grouping and mobility modeling, and generates an ensemble of Hidden Markov Models (HMMs) to characterize group-level movement regularity. Furthermore, to reduce text sparsity of GSM data, GMove also features a text augmenter. The augmenter computes keyword correlations by examining their spatiotemporal distributions. With such correlations as auxiliary knowledge, it performs sampling-based augmentation to alleviate text sparsity and produce high-quality HMMs. Our extensive experiments on two real-life data sets demonstrate that GMove can effectively generate meaningful group-level mobility models. Moreover, with context-aware location prediction as an example application, we observe that GMove significantly outperforms baseline mobility models in terms of prediction accuracy. More on http://www.kdd.org/kdd2016/ KDD2016 Conference will be recorded and published on http://videolectures.net/
Views: 247 KDD2016 video
[PURDUE MLSS] Optimization for Machine Learning by S.V.N Vishwanathan (Part 5/5)
 
58:43
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/vishy.pdf Optimization for Machine Learning Machine learning poses data-driven optimization problems. Computing the function value and gradients for these problems is challenging because they often involve thousands of variables and millions of training data points. These problems can often be cast as convex optimization problems, so a lot of recent research has focused on designing specialized optimization algorithms for them. In this talk, I will present a high-level overview of a few such algorithms that were recently developed. The talk will be broadly accessible and will have plenty of fun pictures and illustrations! See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 468 Purdue University
Software Defined Network for seamless authentication in heterogeneous networks
 
10:19
BabelTEN Panel discussion "Software Defined Network for seamless authentication in heterogeneous networks" - CEO Giovanni Guerri at IEEE LANMAN Bruxelles April 19th 2013
Views: 75 Guglielmo RD
[PURDUE MLSS] Privacy Issues with Machine Learning: Fears, Facts, and Opportunities by Chris Clifton
 
01:21:39
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/clifton.pdf Privacy Issues with Machine Learning: Fears, Facts, and Opportunities Increasing collection of data, and the increased ability of machines to understand it, have led to highly public privacy concerns. Call it "data mining" instead of "machine learning", and you might even find your funding cut... This talk will briefly review the history and legal background of privacy issues in machine learning. We will then look at a sampling of specific challenges and solutions where privacy concerns and machine learning interact. Each will be capped with a discussion of new research opportunities and what it takes to work in the area. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 387 Purdue University
Structured Learning from Heterogeneous Behavior for Social Identity Linkage
 
14:37
2015 IEEE Transactions on Knowledge and Data Engineering. For more details contact: K. Manjunath - 09535866270, http://www.tmksinfotech.com and http://www.bemtechprojects.com Bangalore - Karnataka
Views: 535 manju nath
Distributed Cooperative Caching in Social Wireless Networks
 
00:53
To get this project ONLINE or through TRAINING sessions, contact: JP INFOTECH, 45, KAMARAJ SALAI, THATTANCHAVADY, PUDUCHERRY-9. Landmark: Opposite to Thattanchavady Industrial Estate, Next to VVP Nagar Arch. Mobile: (0) 9952649690, Email: [email protected], web: www.jpinfotech.org Blog: www.jpinfotech.blogspot.com Distributed Cooperative Caching in Social Wireless Networks This paper introduces cooperative caching policies for minimizing electronic content provisioning cost in Social Wireless Networks (SWNETs). SWNETs are formed by mobile devices, such as data-enabled phones, electronic book readers, etc., sharing common interests in electronic content and physically gathering together in public places. Electronic object caching in such SWNETs is shown to be able to reduce the content provisioning cost, which depends heavily on the service and pricing dependencies among various stakeholders including content providers (CP), network service providers, and End Consumers (EC). Drawing motivation from Amazon's Kindle electronic book delivery business, this paper develops practical network, service, and pricing models which are then used for creating two object caching strategies for minimizing content provisioning costs in networks with homogeneous and heterogeneous object demands. The paper constructs analytical and simulation models for analyzing the proposed caching strategies in the presence of selfish users that deviate from network-wide cost-optimal policies. It also reports results from an Android-phone-based prototype SWNET, validating the presented analytical and simulation results.
Views: 670 jpinfotechprojects
[PURDUE MLSS] Classic and Modern Data Clustering by Marina Meilă (Part 8/8)
 
55:50
Lecture slides: http://learning.stat.purdue.edu/mlss/_media/mlss/meila.pdf Abstract of the lecture: Clustering, or finding groups in data, is as old as machine learning itself. However, as more people use clustering in a variety of settings, the last few years have brought unprecedented developments in this field. This tutorial will survey the most important clustering methods in use today from a unifying perspective, and will then present some of the current paradigm shifts in data clustering. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 283 Purdue University
2011-08-31 CERIAS - Non-homogeneous anonymizations
 
57:10
Recorded: 08/31/2011 CERIAS Security Seminar at Purdue University. Non-homogeneous anonymizations. Tamir Tassa, The Open University, Israel. Privacy Preserving Data Publishing (PPDP) is an evolving research field that is targeted at developing anonymization techniques to enable publishing data so that privacy is preserved while data distortion is minimized. Until recently, most of the research on PPDP considered partition-based anonymization models. The approach in such models is to partition the database records into groups and then homogeneously generalize the quasi-identifiers in all records within a group, as a countermeasure against linking attacks. We describe in this talk alternative anonymization models which are not based on partitioning and homogeneous generalization. Such models extend the set of acceptable anonymizations of a given table, whence they allow achieving similar privacy goals with much less information loss. We shall briefly review the basic models of homogeneous anonymization (e.g. k-anonymity and l-diversity) and then define non-homogeneous anonymization, discuss its privacy, describe algorithms and demonstrate the advantage of such anonymizations in reducing the information loss. We shall then discuss the usefulness of those models for data mining purposes. In particular, we will show that the reduced information loss that characterizes such anonymizations also translates to enhanced accuracy when using the anonymized tables to learn classification models. Based on joint work with Aris Gionis, Arnon Mazza, Mark Last and Sasha Zhmudyak. Tamir Tassa is a member of the Department of Mathematics and Computer Science at The Open University of Israel. Previously, he served as a lecturer and researcher in the School of Mathematical Sciences at Tel Aviv University, and in the Department of Computer Science at Ben Gurion University. During the years 1993-1996 he served as an assistant professor of Computational and Applied Mathematics at the University of California, Los Angeles. He earned his Ph.D. in applied mathematics from Tel Aviv University in 1993. His current research interests include cryptography, privacy-preserving data publishing and data mining. (Visit: www.cerias.purdue.edu)
Views: 191 ceriaspurdue
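Since the talk above contrasts its models against the baseline of k-anonymity, here is a tiny sketch, over assumed toy records, that merely checks whether a table is k-anonymous with respect to its quasi-identifiers; it does not implement any of the homogeneous or non-homogeneous anonymization algorithms the talk discusses.

```python
# Simple illustration of the partition-based notion referenced in the talk:
# a table is k-anonymous when every combination of quasi-identifier values
# is shared by at least k records. This only checks the property.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age": "3*", "zip": "4790*", "disease": "flu"},
    {"age": "3*", "zip": "4790*", "disease": "cold"},
    {"age": "2*", "zip": "4760*", "disease": "flu"},
    {"age": "2*", "zip": "4760*", "disease": "asthma"},
]
print(is_k_anonymous(records, quasi_identifiers=("age", "zip"), k=2))   # True
```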
[PURDUE MLSS] Classic and Modern Data Clustering by Marina Meilă (Part 3/8)
 
55:05
Lecture slides: http://learning.stat.purdue.edu/mlss/_media/mlss/meila.pdf Abstract of the lecture: Clustering, or finding groups in data, is as old as machine learning itself. However, as more people use clustering in a variety of settings, the last few years have brought unprecedented developments in this field. This tutorial will survey the most important clustering methods in use today from a unifying perspective, and will then present some of the current paradigm shifts in data clustering. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 312 Purdue University
[PURDUE MLSS] Classic and Modern Data Clustering by Marina Meilă (Part 7/8)
 
49:06
Lecture slides: http://learning.stat.purdue.edu/mlss/_media/mlss/meila.pdf Abstract of the lecture: Clustering, or finding groups in data, is as old as machine learning itself. However, as more people use clustering in a variety of settings, the last few years have brought unprecedented developments in this field. This tutorial will survey the most important clustering methods in use today from a unifying perspective, and will then present some of the current paradigm shifts in data clustering. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 158 Purdue University
[PURDUE MLSS] The MASH project by Francois Fleuret
 
51:55
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/fleuret.pdf The MASH project. An open platform for the collaborative development of feature extractors. The MASH project is a European research initiative that aims at developing novel tools for the design of complex machine learning systems. The project's main objective is to create an open-source web platform allowing the collaborative development of complex families of image feature extractors for scene understanding and vision-based goal-planning algorithms. Performance will be measured on standard image classification and object detection data sets, and using a real robotic arm and a 3D simulator for goal-planning tasks. I will present the motivations behind the project, the scientific and technical challenges to address, and the current status of the platform. I will then show how to participate and some early results. http://mash-project.eu/ See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 381 Purdue University
[PURDUE MLSS] Graphical Models for the Internet by Alexander Smola (Part 6/8)
 
53:28
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/smola.pdf Graphical Models for the Internet Information extraction from webpages, social networks, news and user interactions crucially relies on inferring the hidden parameters of interaction between entities. For instance, in factorization models for movie recommendation we are interested in the underlying hidden properties of users and movies, so as to suggest new movies. Likewise, when extracting topics from webpages we want to find the hidden topics representing documents and words. Finally, when modeling user behavior it is worthwhile to find the latent factors, cluster variables, causes, etc. that drive a user's interaction with websites. All these problems can be described in a coherent statistical framework. While much has been published about how to deal with these problems at moderate sizes, there is little information available on how to perform efficient scalable estimation at the scale of the internet. In this tutorial we present both the theory and algorithms for achieving these goals. In particular, we will describe inference algorithms for collaborative filtering, recommendation, latent Dirichlet allocation, and advanced clustering models. The course will cover basic issues of inference with graphical models and give a self-contained tutorial. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 368 Purdue University
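The tutorial's motivating example of factorization models for movie recommendation can be illustrated with a tiny stochastic-gradient matrix factorization; the ratings, learning rate, and regularization below are made-up assumptions, not material from the lecture.

```python
# Tiny stochastic-gradient sketch of a factorization model: each user and each
# item gets a latent vector, and a rating is approximated by their dot product.
# The ratings here are made up; no real recommender data is used.
import numpy as np

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2

rng = np.random.default_rng(0)
P = 0.1 * rng.normal(size=(n_users, k))   # user factors
Q = 0.1 * rng.normal(size=(n_items, k))   # item factors
lr, lam = 0.05, 0.02

for epoch in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - lam * P[u])   # gradient step on squared error
        Q[i] += lr * (err * P[u] - lam * Q[i])   # uses the freshly updated P[u]: a common simplification

pred = P[0] @ Q[1]
print("predicted rating of user 0 for item 1:", round(pred, 2))
```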
[PURDUE MLSS] Survey of Boosting from an Optimization Perspective by Manfred K. Warmuth (Part 6/6)
 
56:26
Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/warmuth.pdf Survey of Boosting from an Optimization Perspective Boosting has become a well-known ensemble method. The algorithm maintains a distribution on the binary labeled examples, and a new base learner is added in a greedy fashion. The goal is to obtain a small linear combination of base learners that clearly separates the examples. We focus on a recent view of Boosting where the update algorithm for the distribution on the examples is characterized by a minimization problem that uses a relative entropy as a regularization. The best-known boosting algorithm is AdaBoost. This algorithm approximately maximizes the hard margin when the data is separable. We focus on recent algorithms that provably maximize the soft margin when the data is noisy. We will teach the new algorithms, give a unified and versatile view of Boosting in terms of relative entropy regularization, and show how to solve large-scale problems based on state-of-the-art optimization techniques. We also discuss lower and upper bounds on the number of iterations required for any greedy boosting method, and propose a way to circumvent these lower bounds. Joint work with S. V. N. Vishwanathan. See other lectures at Purdue MLSS Playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all
Views: 334 Purdue University
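To ground the description above (a distribution maintained over the examples, base learners added greedily), here is a minimal AdaBoost sketch with exhaustive decision stumps on synthetic data; it shows the standard exponential reweighting only, not the soft-margin or entropy-regularized variants the survey focuses on.

```python
# Minimal AdaBoost sketch: keep a distribution w over the training examples,
# greedily add the decision stump with the lowest weighted error, and reweight
# so that misclassified examples gain weight.
import numpy as np

def best_stump(X, y, w):
    """Exhaustively pick (feature, threshold, sign) minimizing weighted error."""
    best = (0, X[0, 0], 1, np.inf)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, f] <= thr, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, thr, sign, err)
    return best

def adaboost(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # distribution over examples
    ensemble = []
    for _ in range(rounds):
        f, thr, sign, err = best_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this base learner
        pred = sign * np.where(X[:, f] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)           # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, f, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, f] <= t, 1, -1) for a, f, t, s in ensemble)
    return np.sign(score)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)       # toy, roughly separable labels
model = adaboost(X, y, rounds=20)
print("training accuracy:", (predict(model, X) == y).mean())
```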
Knowledge mining with Neuromation
 
40:30
Democratizing access to the tools of Artificial Intelligence, generating synthetic data for deep learning applications, Neuromation puts to good use the mining power of the computers securing blockchain networks. In this conversation with Andrew Rabinovich, Director of Deep Learning at Magic Leap, David Orban discusses the implications of this approach for the development of innovative decentralized AI applications.
Views: 985 David Orban
