Market Analysis Cheat Sheet
Session 1 - Segmentation
- Data Types:
- Geo-demographics - basic user information
- Psychographics - user psychology
- Behavioral - user behavior
- Benefits & Needs - user needs

- How (to do segmentation)? - Cluster analysis
- Hierarchical Clustering - Recursively group entities based on how similar they are
- Compute the distance between every pair of points (Euclidean distance)
- Select Min {Dij} and join i and j at that distance
- How do we compute the distance from the remaining points to a cluster that has already been formed?
- Minimum (single) linkage - Distance to Closest Point
- Average linkage - Average Distance over All Points
- Maximum (complete) linkage - Distance to Furthest point
- Ward linkage - Minimize the within-cluster variance
- Example: add {4} to {1,2} to form cluster {1,2,4}; distance = variance of {1,2,4} - (variance of {1,2} + variance of {4})
- K-Means - Minimize within-cluster variance, maximize between-cluster variance (k centers)
- Initialize centroids (centers); the end result depends on the initialization
- Assign points (observations) to the nearest centroid
- Re-compute centers
- Stop when no change
- How to choose K?
- Inertia - measure of how internally coherent clusters are, lower = better (Always decreases with the number of clusters)
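The clustering steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names (`cluster_distance`, `agglomerate`, `kmeans`) and the sample points are made up for this example, Ward linkage is left out for brevity, and the K-Means starting centroids are passed in explicitly because the result depends on them.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(c1, c2, linkage="single"):
    """Distance between two clusters (lists of points) under a linkage rule."""
    d = [dist(a, b) for a in c1 for b in c2]
    if linkage == "single":       # distance to the closest point
        return min(d)
    if linkage == "complete":     # distance to the furthest point
        return max(d)
    if linkage == "average":      # average distance over all pairs
        return sum(d) / len(d)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, n_clusters, linkage="single"):
    """Start from singletons and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: cluster_distance(
            clusters[ij[0]], clusters[ij[1]], linkage))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]           # j > i, so index i is unaffected
    return clusters

def kmeans(points, centroids, max_iter=100):
    """K-Means from given starting centroids; returns labels, centroids, inertia."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: dist(p, centroids[k])) for p in points]
        if new_labels == labels:  # stop when assignments no longer change
            break
        labels = new_labels
        # Re-compute each center as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:           # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
print(agglomerate(points, 2, linkage="complete"))
labels, centroids, inertia = kmeans(points, centroids=[points[0], points[3]])
print(labels, inertia)
```

Running `kmeans` for K = 1, 2, 3, ... and recording the inertia each time gives the numbers one would plot when looking for an elbow.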

- Elbow Plot: Increase the number of clusters and monitor the inertia
- Ratio Plot: Increase the number of clusters and monitor (total between sum of squares/total sum of squares)
- These approaches are collectively called "Determine by Fit"; the alternative is "Determine by Interpretability" (capture meaningful differences)
- Characteristics of ideal segments: Large, Identifiable, Distinctive, Stable (LIDS) and, more importantly, actionable
- Chi-square Test
- Determine whether a difference between two categorical variables is due to chance or a relationship between them
- Expected count = (row total) * (column total) / total sample size

- with degrees of freedom = (# of rows - 1)(# of columns - 1)
- Reject the null when p-value of
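A small sketch of the chi-square calculation, assuming the standard Pearson statistic (the sum over cells of (observed - expected)^2 / expected). The function name `chi_square` and the counts in `table` are hypothetical. The p-value is not computed here, since that needs a chi-square CDF from a statistics library.

```python
def chi_square(observed):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = (row total) * (column total) / total sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof, expected

# Hypothetical counts: rows = segment A / B, columns = bought / did not buy.
table = [[30, 20], [10, 40]]
stat, dof, expected = chi_square(table)
print(stat, dof, expected)
```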
Session 6
Utility function
- Consumers preferences for alternatives are represented by utility functions. Rational consumers choose the alternative with the highest utility.
- Utility = F(Consumer Characteristics, Alternative Attributes) is deterministic (consistent)
- To simulate real-world inconsistency, we add a stochastic part: $$ U_{ij} = V_{ij} + \epsilon_{ij} $$ where $\epsilon_{ij}$ (e_ij) represents the total impact of all unobserved attributes and demographics relevant to a given choice occasion
- For ‘Alternative Attributes’ - coefficients are the same across alternatives
- For ‘Consumer Characteristics’ - coefficients differ across alternatives
- e_ij varies across alternatives j and across consumers i, but can be assumed coming from a probability distribution (Gumbel distribution)
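- So the probability that customer i chooses alternative j is exp(V_ij) / Σ_k exp(V_ik): the exponentiated utility of j for i, divided by the sum of the exponentiated utilities of all alternatives for i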
- Identification: Only differences in utility matter
- Need to set one alternative specific constant to zero
- Need to set the coefficients of the individual characteristics to zero for one alternative
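The choice probability and the identification point can be checked numerically. The sketch below assumes the multinomial logit form implied by the Gumbel assumption; the function name `choice_probabilities` and the utility values are hypothetical. Adding the same constant to every alternative's utility leaves the probabilities unchanged, which is why one alternative's constant has to be fixed at zero.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    m = max(utilities)                    # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

v = [1.0, 0.5, 0.0]                       # hypothetical deterministic utilities V_ij
p = choice_probabilities(v)
p_shifted = choice_probabilities([x + 10.0 for x in v])  # same differences in utility
print(p)
print(p_shifted)
```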
Elasticities & IIA
Session 7
4 Basic Approaches: Simple Summaries, Sentiment Analysis, Topic Modeling, Large Language Models.
Sentiment Analysis
Lexicon-based Sentiment Analysis = Classification of words
- One popular choice LIWC = Linguistic Inquiry and Word Count
- Logistic Regression on Y = churn and X = LIWC Proportions (Page 122)
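As an illustration of lexicon-based scoring, the sketch below counts what share of a text's words falls into each category. The tiny `LEXICON` and the function name `category_proportions` are invented for this example; the real LIWC dictionary is much larger and is a licensed product.

```python
import re

# Hypothetical mini-lexicon, for illustration only (not the LIWC dictionary).
LEXICON = {
    "positive": {"great", "love", "happy", "excellent"},
    "negative": {"bad", "terrible", "cancel", "slow"},
}

def category_proportions(text, lexicon=LEXICON):
    """Share of the words in `text` that belong to each lexicon category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in lexicon}
    return {cat: sum(w in vocab for w in words) / len(words)
            for cat, vocab in lexicon.items()}

review = "Support was slow and the app is terrible, I want to cancel"
props = category_proportions(review)
print(props)
```

Proportions like these, computed per customer, would serve as the X variables in the churn logistic regression.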
Topic Modeling
Automatic summarization of documents through topics (sets of commonly co-occurring words)
Most Common Model = Latent Dirichlet Allocation (LDA)
- Output 1: Which words belong to which topics (i.e., what are the topics)?
- Output 2: Which topics best describe each document (i.e., what percentage of the words in a given document are from topic 1, topic 2, …)?
- Perplexity = measure of predictive performance for a language model (Lower = Better)
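Perplexity is commonly defined as the exponential of the average negative log-probability the model assigns to each observed word. A minimal sketch under that definition follows; the function name `perplexity` and the probability values are hypothetical.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that spreads probability evenly over 4 words is "as confused as"
# a uniform 4-way guess; a more confident model scores lower.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.5, 0.5, 0.5, 0.5])
print(uniform, confident)
```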

Large Language Model
A category of foundation models (large deep learning model trained on generalized and unlabeled data and used as a starting point for other models) trained on immense amounts of data making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks
- Word Embeddings = representation of a word as a vector of numbers (Able to perform word algebra)
- Similar meaning = similar representation
- Some uses in marketing
- Text summarization – summarizing customer reviews, complaints, etc.
- Sentiment analysis – extracting more nuanced sentiment (e.g., granular emotions)
- Text generation – creating product descriptions, social media posts, emails
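The word-algebra idea behind embeddings can be shown with toy vectors. The 3-dimensional vectors in `emb` are hand-made for illustration only; real embeddings are learned from data and have many more dimensions. The helper names `cosine` and `nearest` are also made up for this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity: near 1 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical hand-made vectors (not from any trained model).
emb = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (-0.5, 0.2, 0.2),
}

def nearest(target, exclude=()):
    """Word whose vector is most similar to `target`, skipping `exclude`."""
    candidates = [w for w in emb if w not in exclude]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Word algebra: king - man + woman
target = tuple(k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"]))
answer = nearest(target, exclude={"king", "man", "woman"})
print(answer)
```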
Session 8
Products are bundles of independent attributes: “Products = the sum of their parts”
Attributes consist of levels. A combination of levels forms a profile. The value derived from a level is its part-worth. The total of the part-worths is the utility.
- How do we estimate consumers’ part-worths?
- Step 1: Ask them to rate many potential profiles.
- Which profiles? Fractional factorial design - the minimal number of questions to get the information we need (design determined by the number of attributes / levels)
- Step 2: Analyze the data!
Conjoint analysis
- Rating based Conjoint = Multiple Regression - Result Coefficients = Part-worth
- Baseline = 0
- Importance = range of an attribute's part-worths / sum of the ranges across all attributes
- We can use these coefficients to predict the utility of new profiles (newcomers)
- Limitations: You can never include all the attributes
- Other Approaches
- Eye Tracking
- Learn Preferences Faster / Better
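To make the part-worth arithmetic concrete, the sketch below computes attribute importance and the utility of a profile. The `part_worths` values are hypothetical numbers standing in for regression coefficients, with the baseline level of each attribute set to 0; the function names are made up for this example.

```python
# Hypothetical part-worths (regression coefficients); baseline levels are 0.
part_worths = {
    "brand": {"A": 0.0, "B": 1.2},
    "price": {"$10": 0.0, "$15": -0.8, "$20": -2.0},
    "size":  {"small": 0.0, "large": 0.8},
}

def importance(part_worths):
    """Range of each attribute's part-worths divided by the sum of all ranges."""
    ranges = {attr: max(levels.values()) - min(levels.values())
              for attr, levels in part_worths.items()}
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}

def utility(profile, part_worths):
    """Total utility of a profile = sum of the part-worths of its levels."""
    return sum(part_worths[attr][level] for attr, level in profile.items())

weights = importance(part_worths)
new_profile = {"brand": "B", "price": "$15", "size": "large"}
print(weights)
print(utility(new_profile, part_worths))
```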
Diffusion of Innovation
- The Bass model - Predict the adoption curve: Number of adopters in period t = Adoption rate in period t x Number of potential adopters in period t (a contagion model)
- Application: Can forecast both our own innovation and an innovation we rely on
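A minimal discrete-time sketch of the Bass model, assuming the usual form in which the adoption rate in a period is p + q * (cumulative adopters / m). Here p is commonly called the coefficient of innovation, q the coefficient of imitation, and m the market potential. The function name `bass_adoptions` and the parameter values are illustrative assumptions, not estimates from any dataset.

```python
def bass_adoptions(m, p, q, periods):
    """New adopters per period: n_t = (p + q * N_prev / m) * (m - N_prev)."""
    cumulative = 0.0
    new_adopters = []
    for _ in range(periods):
        rate = p + q * cumulative / m     # adoption rate in this period
        n_t = rate * (m - cumulative)     # rate x remaining potential adopters
        new_adopters.append(n_t)
        cumulative += n_t
    return new_adopters

# Illustrative parameters only.
curve = bass_adoptions(m=1000.0, p=0.03, q=0.38, periods=15)
peak_period = curve.index(max(curve)) + 1
print([round(n, 1) for n in curve])
print(peak_period)
```

The resulting curve rises, peaks, and falls as the pool of potential adopters is used up, which is the contagion-like shape the model is meant to capture.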


Session 9
Bass model
m: estimated from demographic data; p, q: estimated from historical analysis of analogous innovations
Factors to take into account when evaluating analogies: environmental situation, market structure, buyer behavior, marketing mix strategy, and characteristics of the innovation itself
Diffusion Speed Has Generally Increased Over Time
Generative AI
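Generative AI is an artificial intelligence technology that can generate new content (such as text, images, audio, and video). Unlike traditional discriminative AI, generative AI can not only recognize and classify data but also create entirely new, original content based on existing data.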
- Generative Models
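- Generative Adversarial Networks (GANs)
GANs consist of a generator and a discriminator: the generator produces realistic data samples, and the discriminator distinguishes generated data from real data. (Image-related)
- Variational Autoencoders (VAEs)
An encoder maps the input data to a latent space, and a decoder then generates new data from that latent space. (Data-related)
- Transformers (attention-based deep learning models)
The Transformer is a neural network architecture for processing sequence data (such as text, audio, and time series). It is built entirely on the attention mechanism and does not depend on processing the sequence in order, which allows greater parallelism and better performance.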
- Applications in marketing
- Perceptual Maps (Classification)
- WTP
- Prompt Engineering - providing clear instructions to a generative model to get what you want
Explainable AI
Session 10
Pricing + Placing
Session 11
A/B test
Session 12
Bias & Fairness
Source slides download
Market Analysis Session 1
Market Analysis Session 2-5
Market Analysis Session 6
Market Analysis Session 6-12
Business Analysis Session 1-10
- Title: Market Analysis Cheat Sheet
- Author: Konata
- Created: 2025-11-28 15:54:08
- Updated: 2025-11-28 16:40:14
- Link: http://blog.suzumiyaharuhi.net/2025/11/28/Ma-Cheat-Sheet/
- License: This article is licensed under CC BY-NC-SA 4.0.





