SOQDE: A Supervised Learning Based Question Difficulty Estimation Model for Stack Overflow

Document Type

Conference Proceeding

Publication Date



Keywords

Prediction model, Question Difficulty, Reputation, StackOverflow


Abstract

StackOverflow (SO), the most popular community Q&A site, rewards answerers with reputation scores to encourage answers from volunteer participants. However, irrespective of the difficulty of a question, the contributor of an accepted answer is awarded the same reputation score, which may discourage a user from spending additional effort to answer a difficult question. To facilitate a question-difficulty-aware rewarding system, this study proposes SOQDE (Stack Overflow Question Difficulty Estimation), a supervised-learning-based question difficulty estimation model for StackOverflow. To design SOQDE, we randomly selected 936 questions from an SO data dump exported in September 2017. Two of the authors independently labeled those questions into three categories (basic, intermediate, or advanced), and conflicting labels were resolved through a tie-breaking vote from a third author. We performed an empirical study to determine how the difficulty of a question impacts its outcomes, such as number of votes and resolution time. Our results suggest that the answers to a basic question receive more votes and would therefore generate more reputation points for an answerer. Because the incentives are lower relative to the effort an answerer must spend, intermediate and advanced questions encounter significantly longer delays than basic questions, which further validates the need for a model like SOQDE. To build our model, we identified textual and contextual features of a question and divided them into two categories: pre-hoc and post-hoc features. Using only answer-independent pre-hoc features, a model based on Random Forest achieved the highest mean accuracy (67.6%). By also accommodating answer-dependent post-hoc features, we were able to improve the mean accuracy of our model to 75.2%.

Recommended Citation

Hassan, Sk Adnan, Dipto Das, Anindya Iqbal, Amiangshu Bosu, Rifat Shahriyar, and Toufique Ahmed. "SOQDE: A Supervised Learning based Question Difficulty Estimation Model for Stack Overflow." In 2018 25th Asia-Pacific Software Engineering Conference (APSEC), pp. 445-454. IEEE, 2018.