dc.description.abstract | Community Question Answering (CQA) services such as Yahoo! Answers and Quora were created mainly to overcome the limitations of Web search engines by helping users obtain information from a community. A CQA system stores a very large number of questions, each with several possible answers, and many questions are asked repeatedly. If the system understands the user intent behind a question, it can recognize similar questions, find relevant answers, and hence recommend potential answers more efficiently and effectively.
This thesis classifies CQA questions by user intent into three categories: objective, subjective, and social. To capture user intent, we first extract text features and metadata features, then use machine learning algorithms to build a predictive model that classifies questions into these three categories. This is a supervised learning model. However, we have only a small number of labeled questions and a large number of unlabeled ones. To improve question classification, we therefore also use co-training, a semi-supervised learning algorithm that uses a small set of labeled questions together with a large number of unlabeled questions. Our results show that the co-training approach, which treats text features and metadata features as two views, works better than the supervised learning approach that simply applies both types of features together. This is because co-training, as a semi-supervised method, can make use of a large amount of unlabeled questions in addition to the small set of labeled questions. | |