Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/151
Full metadata record
DC Field: Value
dc.contributor.advisor: Jotwani, Naresh D.
dc.contributor.author: Paluri, Santosh Kumar
dc.date.accessioned: 2017-06-10T14:37:10Z
dc.date.available: 2017-06-10T14:37:10Z
dc.date.issued: 2007
dc.identifier.citation: Paluri, Santosh Kumar (2007). Web content outlier detection using latent semantic indexing. Dhirubhai Ambani Institute of Information and Communication Technology, vii, 36 p. (Acc.No: T00114)
dc.identifier.uri: http://drsr.daiict.ac.in/handle/123456789/151
dc.description.abstract: Outliers are data elements that differ from the other elements in the category from which they are mined. Finding outliers in web data is known as web outlier mining. This thesis explores web content outlier mining, which has applications in electronic commerce, novelty detection in text, etc. Web content outliers are text documents whose content differs from that of the other documents taken from the same domain. Existing approaches to this problem use lexical matching techniques such as n-grams, which are prone to problems like synonymy (expressing the same concept with different words); this leads to poor recall (an important measure for evaluating a search strategy). In this thesis we use Latent Semantic Indexing (LSI) to represent the documents and terms as vectors in a reduced-dimensional space, thereby separating the outlying documents from the rest of the corpus. Experimental results using embedded outliers, reported in chapter four, indicate that the proposed idea is successful and performs better than the existing approaches to mining web content outliers.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Content analysis
dc.subject: Communication
dc.subject: Data mining
dc.subject: Web sites
dc.subject: Web databases
dc.subject: Semantics
dc.subject: Semantics of data
dc.subject: Semantic database models
dc.classification.ddc: 006.312 PAL
dc.title: Web content outlier detection using latent semantic indexing
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 200511015
dc.accession.number: T00114
Appears in Collections: M Tech Dissertations
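
The abstract describes representing documents as vectors in a reduced-dimensional LSI space so that outlying documents separate from the rest of the corpus. The following is a minimal sketch of that general idea, not the thesis's implementation. The function name `lsi_outlier_scores`, the use of raw term counts, the choice k=2, the toy corpus, and the scoring rule (one minus the mean cosine similarity to the other documents) are all assumptions made for illustration; the record does not state how the thesis weights terms or scores outliers.

```python
import numpy as np

def lsi_outlier_scores(docs, k=2):
    """Score each tokenised document by how dissimilar it is to the rest in LSI space.

    Illustrative only: raw counts and the mean-cosine scoring rule are assumptions.
    """
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}

    # Term-document matrix of raw counts (rows: terms, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[index[term], j] += 1.0

    # Truncated SVD: keep the k largest singular values and their vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (Vt[:k, :] * s[:k, None]).T  # one k-dimensional row per document

    # Cosine similarity between every pair of documents.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.maximum(norms, 1e-12)
    sim = unit @ unit.T

    # Outlier score: 1 minus the mean similarity to all other documents.
    n = len(docs)
    mean_sim = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
    return 1.0 - mean_sim

# Hypothetical toy corpus: four product pages and one embedded cooking page.
docs = [
    "laptop battery screen price".split(),
    "laptop screen price discount".split(),
    "battery laptop price warranty".split(),
    "laptop price screen battery discount".split(),
    "recipe pasta tomato basil".split(),
]
scores = lsi_outlier_scores(docs, k=2)
most_outlying = int(np.argmax(scores))
print(most_outlying, np.round(scores, 2))
```

In this toy corpus the cooking page shares no terms with the product pages, so it should receive the highest score. On real web data the number of retained dimensions and the term weighting would need tuning, and a threshold on the score would be needed to decide which documents count as outliers.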

Files in This Item:
File: 200511015.pdf (Restricted Access)
Size: 197.17 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.