Lexicon-based Sentiment Analysis for Product Design and Development

This paper discusses the role of text mining and sentiment analysis in collecting and analyzing customers’ verbatim comments (the voice of the customer) for product design and development. In the illustrated case of designing a car underbody, the data were collected from an online car discussion forum and processed using text mining techniques with “underbody” as the keyword. The analysis reveals worries about underbody durability against rust and damage. This finding is used as a reference point for the car underbody design process. DOI: https://doi.org/10.24002/ijieem.v3i1.4351


INTRODUCTION
The first thing to be addressed in product development is identifying customers' expectations of the would-be product. Customers' expectations and the product's derived value are important aspects of quality (Ikiz & Özdağoğlu, 2015; Mulay & Khanna, 2017). An appealing product has an impact on image formation and on positive brand and product evaluations by customers (Kreuzbauer & Malter, 2005).
Customer expectations, often called Voice of Customer (VoC), are collected through surveys, interviews, focus group discussions, dedicated feedback forms, and other tools. However, these methods face several drawbacks in most situations, such as cost, time, and place limitations. To overcome these limitations, current technology offers many platforms for customers to share their views, thoughts, and experiences through social media, product reviews, blogs, and forum discussions. These large data sets enable researchers to find hidden patterns in customers' wants and needs.
Text mining is a method to extract knowledge from texts (Khedr et al., 2017). Text mining enables the information of what customers expect of a product to be extracted without going through lengthy and complicated methods when facing large sets of data (Ikiz & Özdağoğlu, 2015). Sentiment analysis is one of the text mining methods to determine the contextual polarity of writing, whether it expresses negative, neutral, or positive opinions (Shukri et al., 2015). The method explores customers' views, thoughts, and experiences on one or more keywords related to the product and helps frame customers' expectations.
This paper discusses the role of text mining and sentiment analysis in product design and development, especially its role in the idea screening stage. This concept is illustrated through its application to car underbody design. Previous research on underbody design mainly focuses on technical performance, such as improving aerodynamics (Buljac et al., 2016; Desai et al., 2008; Heidemann et al., 2018; Yuan & Wang, 2017). Other researchers discuss tackling underbody issues such as squeak, or using better materials to improve performance and tailor the product more closely to customers' specifications (Dittmar & Plaggenborg, 2019; Moos, 2014; H. Park, 2019). To the best of the Author's knowledge, there is no research on underbody design that specifically considers what customers want in an underbody. This study tries to fill that gap by using text mining to collect VoC on underbody design in order to create an appealing product for customers.

Product development
Kotler & Keller (2006) proposed eight steps of new product development. The first is idea generation, where ideas are formulated through personnel brainstorming. The generated ideas are selected and narrowed down in the idea screening step. The concept development and testing step defines a product's parameters and measures customers' purchase intention based on the selected idea. Detailed strategies are developed and analyzed through the marketing strategy and business analysis steps. The product development, test marketing, and commercialization steps focus on producing the new product, testing it in a trial market, and launching it. Finding customers' expectations of a product through VoC provides input to the idea generation process.
Customers' verbatim comments are an essential resource for companies, as they include customers' expectations and requirements and help companies improve their products or services (Aguwa et al., 2017). Text mining helps to remove the labor-intensive and time-consuming process of surveys, interviews, and other commonly used data collection methods for VoC. In their study on text mining as a supporting process for VoC clarification, Ikiz & Özdağoğlu (2015) found that text mining techniques help to speed up VoC processes.

Sentiment analysis
Sentiment classification can be performed at three levels: document level, sentence level, and aspect and feature level (Singh & Sharma, 2013). As the names imply, document-level sentiment classification determines the polarity of an entire document, and sentence-level classification determines the polarity of each sentence. Aspect and feature level sentiment classification classifies documents or sentences according to certain aspects of sentences.
In general, sentiment classification is done by performing the following actions in order: document pre-processing, feature selection, feature extraction, and text classification (Shah & Patel, 2016). Feature extraction is a process of extracting new features based on the features generated in the feature selection phase. It can be done with unigrams, bigrams, unigrams with bigrams, or unigrams with part-of-speech (POS) tags (Abdulla et al., 2013; Alsaeedi & Khan, 2019; Shah & Patel, 2016). In unigram-based sentiment classification, the basic approach is to collect the sentiment score or label associated with each unigram from a designated lexicon with assigned polarity scores (Dey et al., 2018).
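As a minimal sketch of the feature extraction options listed above, the following Python snippet builds unigram, bigram, and combined feature sets from a token list. The `ngrams` helper and the example sentence are illustrative assumptions and are not taken from the cited studies.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example sentence, already lowercased and split on whitespace.
tokens = "the underbody is not rusty".split()

unigrams = ngrams(tokens, 1)               # single words
bigrams = ngrams(tokens, 2)                # adjacent word pairs
unigrams_with_bigrams = unigrams + bigrams  # combined feature set

print(bigrams)
```

In this example the bigram `("not", "rusty")` keeps the negation attached to the sentiment word, whereas the unigram `("rusty",)` alone does not.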
The use of sentiment analysis in the idea generation step of product design and development has been discussed in previous studies. Jeong & Yoon (2016) used sentiment analysis to identify product development opportunities. Product features with high opportunity could be identified and used to formulate product development. Using sentiment analysis on feature-based review data, J. Park et al. (2018) identified features that should be given more importance in the improvement of tires. These studies show the role of sentiment analysis in the idea generation step of new product development, whether for developing new products or improving existing products according to customers' expectations and requirements.

Underbody design
Underbody design is of great interest in the automotive industry (Desai et al., 2008). One of the reasons for this interest is the potential decrease in fuel consumption. Improved fuel economy could be achieved with an underbody design that aerodynamically reduces total drag force (Heidemann et al., 2018; Muyl et al., 2004; Yuan & Wang, 2017). Huminic & Huminic (2020) found that the use of curved diffusers in the underbody leads to small values of drag and, from a design point of view, interferes little with other components of the car.
In their research on drag-reduction devices for a trailer underbody and base, Ortega & Salari (2004) found that curved base flaps were the most effective drag-reduction add-on device for a trailer. Meanwhile, Cho et al. (2017) did similar research on a sedan and found that a combination of an undercover and a side air dam gave the best result in reducing aerodynamic drag. A synergy effect was obtained when combining the two devices, maximizing the undercover's diffusing effect and thus increasing rear surface pressure and reducing the wake region.

PROPOSED FRAMEWORK
This study explores the use of sentiment analysis to discover customers' expectations and requirements of a product. The requirements are used as guidelines to determine the direction of the idea in the first step of the product development phase, and hence to fulfill customer expectations. The proposed method of this research is shown in Figure 1. The data are collected by web scraping social media, online discussion forums, blogs, or product reviews to capture the customers' thoughts verbatim.
After the data are collected, common text preprocessing techniques, i.e., data cleaning, tokenization, stemming, and stop word removal, are used to clean the data. Data cleaning is a process to remove unwanted characters or gibberish from the text. Tokenization decomposes sentences into words or phrases called tokens (Ikiz & Özdağoğlu, 2015). In the stemming process, each token is reduced to its root word by removing suffixes. Commonly used words (stop words) are removed to give more emphasis to words that define the meaning of the text. This process leaves only nouns and adjectives as tokens.
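The preprocessing steps described above can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the pipeline used in the study: the stop word list is a tiny hypothetical sample, the `stem` function is a naive suffix stripper standing in for a proper stemmer, and the part-of-speech filtering that keeps only nouns and adjectives is omitted.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "on", "of", "and", "my", "it", "to", "in"}


def clean(text: str) -> str:
    """Lowercase the text and replace every non-letter, non-space character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text: str) -> List[str]:
    """Split cleaned text into word tokens on whitespace."""
    return text.split()


def stem(token: str) -> str:
    """Naive suffix stripping (assumption: stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Clean, tokenize, stem, then drop stop words, in the order given in the text."""
    stemmed = [stem(t) for t in tokenize(clean(text))]
    return [t for t in stemmed if t not in STOP_WORDS]


print(preprocess("The underbody is rusting!!! Damaged panels on my 2015 sedan."))
# -> ['underbody', 'rust', 'damag', 'panel', 'sedan']
```

The stem "damag" shows why stemmed tokens may need to be mapped back to readable words before they are looked up in a sentiment lexicon.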
Once the data are pre-processed, advanced analysis techniques can be deployed for further inferences, i.e., assigning a sentiment to each token using the sentiment lexicon of Hu & Liu (2004). The labeled unigrams/tokens are ranked by word frequency and are used as the basis of idea generation.
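A minimal Python sketch of this labeling and ranking step is given below. The `LEXICON` dictionary is a hypothetical five-word stand-in, not the actual Hu & Liu (2004) word lists, and the token list is invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon for illustration only; NOT the Hu & Liu (2004) lexicon.
LEXICON: Dict[str, str] = {
    "rust": "negative",
    "damage": "negative",
    "issues": "negative",
    "pretty": "positive",
    "good": "positive",
}


def rank_sentiment_tokens(
    tokens: List[str], lexicon: Dict[str, str]
) -> List[Tuple[str, str, int]]:
    """Label tokens found in the lexicon and rank them by frequency, highest first."""
    counts = Counter(t for t in tokens if t in lexicon)
    return [(word, lexicon[word], n) for word, n in counts.most_common()]


# Invented pre-processed tokens; words missing from the lexicon are ignored.
tokens = ["rust", "underbody", "pretty", "rust", "damage", "rust", "pretty", "coating"]
for word, polarity, count in rank_sentiment_tokens(tokens, LEXICON):
    print(f"{word:8s} {polarity:9s} {count}")
```

Tokens that do not appear in the lexicon, such as "underbody" here, carry no sentiment label and are left out of the ranking.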

ILLUSTRATED CASE STUDY
The application of the proposed framework is illustrated with the design process of a car underbody. Several social media platforms and forums were considered as data sources. The initial selection was based on the Author's existing knowledge of the most popular social media platforms and forums: Facebook, Twitter, and Reddit. The other forums were found using Google search with "car forum" or "forum mobil" as keywords.
The next step was to determine the criteria for website selection. The Author determined four criteria for website selection: the platform has to have more than one million users, specifically discusses cars, has an active discussion about car-related topics, and is not region-specific. The result of the selection is summarized in Table 1.
Based on the selection criteria in the table above, Reddit, specifically the r/Cars subforum, was chosen as the data source. Cen (2020) stated that Reddit offers better external validity compared to other niche web forums. One of the criteria used in scraping data from Reddit was that a thread should have at least fifty comments, to ensure the scraped data came from active threads.
The data were extracted with RStudio using "underbody" as a keyword to search all threads containing the word. The extracted data were stored in a data frame, as illustrated in Table 2.
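The thread selection rule can be sketched in Python as follows. The study itself used RStudio, so this is only an illustration of the filtering logic: the `Thread` record, its field names, and the sample threads are hypothetical, and matching the keyword against the title and body only is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Thread:
    """Hypothetical record for one scraped forum thread."""
    title: str
    body: str
    num_comments: int


def select_threads(
    threads: List[Thread], keyword: str = "underbody", min_comments: int = 50
) -> List[Thread]:
    """Keep threads that mention the keyword and have at least min_comments comments."""
    keyword = keyword.lower()
    return [
        t
        for t in threads
        if t.num_comments >= min_comments
        and keyword in f"{t.title} {t.body}".lower()
    ]


# Invented sample data for illustration.
sample = [
    Thread("Underbody rust after one winter", "Any coating advice?", 120),
    Thread("Best first car?", "Budget is tight.", 300),
    Thread("Underbody wash worth it?", "Asking before winter.", 12),
]
selected = select_threads(sample)
print([t.title for t in selected])
```

Only the first sample thread passes both conditions: the second lacks the keyword and the third has fewer than fifty comments.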

RESULT AND DISCUSSION
The use of sentiment analysis in the idea generation step of product development is illustrated with the process of designing a car underbody. Gurusami (1999) stated that there are fifteen criteria that need to be considered in designing an underbody: investment, variable cost, timing, weight, customer satisfaction, NVH (noise, vibration, harshness), vehicle dynamics, rear crash performance, craftsmanship, durability, fuel tank, seating location, rear overhang, types of rear suspension, and types of exhaust.
The result of the sentiment analysis for the keyword "underbody" in Figure 2 shows that "rust" is the most mentioned negative sentiment word. This result is in line with Rana et al.'s (2018) finding that underbody coating is prone to corrosion. Moreover, other words such as "damage" and "issues" are related to the aforementioned durability criterion. Although the customers' verbatim comments did not mention durability explicitly, we can infer that these words fit into the durability criterion. On the other hand, "pretty" is the most mentioned positive sentiment word, which we can infer fits into an aesthetics criterion that is not included in the aforementioned fifteen criteria. This finding shows that by scraping an online discussion forum and processing the data using text mining, we obtained a voice of the customer that focuses more on the aesthetic aspects of the underbody.
Previous research mainly focuses on the manufacturing and technicalities of underbody design. The proposed framework complements the existing design process by bringing together the technical aspects and what the customers want. By capturing customers' perceptions through specific words, companies are able to crosscheck whether their definition of the criteria is aligned with customers' definitions. This finding is valuable in product design and development. It can be used as a guideline for narrowing down the options and setting the direction the product should take to meet customers' expectations. Although customer expectations can be fulfilled in a later phase by tailoring the specifications according to customers' wants (Moos, 2014), integrating the expectations in the idea generation phase will incur a lower cost of change (Folkestad & Johnson, 2001). In the idea generation phase of the illustrated case, the ideas focusing on underbody durability and aesthetics should be prioritized.
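The crosscheck between customer words and design criteria can be sketched as a simple lookup. The word-to-criterion mapping below encodes the inferences made in this section (it is a manual interpretation, not an automatic result), and the `crosscheck` helper is a hypothetical name introduced for illustration.

```python
from typing import Dict, List, Set, Tuple

# The fifteen underbody design criteria attributed to Gurusami (1999) in the text.
DESIGN_CRITERIA: Set[str] = {
    "investment", "variable cost", "timing", "weight", "customer satisfaction",
    "nvh", "vehicle dynamics", "rear crash performance", "craftsmanship",
    "durability", "fuel tank", "seating location", "rear overhang",
    "types of rear suspension", "types of exhaust",
}

# Manual interpretation of sentiment words, as inferred in the discussion.
WORD_TO_CRITERION: Dict[str, str] = {
    "rust": "durability",
    "damage": "durability",
    "issues": "durability",
    "pretty": "aesthetics",
}


def crosscheck(
    word_to_criterion: Dict[str, str], criteria: Set[str]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group words by criterion, split into criteria already listed and those missing."""
    covered: Dict[str, List[str]] = {}
    missing: Dict[str, List[str]] = {}
    for word, criterion in word_to_criterion.items():
        target = covered if criterion in criteria else missing
        target.setdefault(criterion, []).append(word)
    return covered, missing


covered, missing = crosscheck(WORD_TO_CRITERION, DESIGN_CRITERIA)
print("Already in the design criteria:", covered)
print("Raised by customers but not listed:", missing)
```

Under this mapping, durability is already covered by the existing criteria, while aesthetics appears only in the customers' words.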

CONCLUSION
This research explores the role of sentiment analysis in the idea screening stage of product development. Text mining demonstrates an advantage in both time and cost for VoC data collection and analysis. In the illustrated case discussed above, sentiment analysis contributes to the idea-generating stage of product development, specifically in narrowing down the options and determining the product's direction. This research also finds that sentiment analysis is able to verify companies' perspectives on criteria against customers' perspectives.
Based on the Author's findings, there remains a gap in knowledge about sentiment analysis in the car underbody design process, namely the ambiguity of one-word sentiments. Further research on the use of n-grams in sentiment analysis is hypothesized to reduce this ambiguity.

Buljac, A., Kozmar, H., & Džijan, I. (2016)