Analyzing how to cluster human relationships in a "comic story"

Abstract

Recently I learned about sociograms.
With a sociogram we can express concisely how near or far two people are, as long as we define a familiarity score e_{ij}.*1
I wondered whether it could be applied to the relationships in a "comic story", but I could not find any hints on how to do so.
Anyway, let's apply it to a "comic story".
(Note: this report is only a hobby project, so I take no responsibility for its content. I would be pleased if you enjoy it as a kind of thought experiment.)

How to analyze

If we define the familiarity score as e_{ij}, we can express the magnitude of the familiarity score as the Euclidean distance between two points. (When i=j, e_{ij}=0.)

For example, the magnitude of the familiarity score is expressed by the formula below.
Q=-\sum_{i}^{n}\sum_{j}^{n}{e_{ij}(x_i-x_j)^2}
We maximize the value of Q under the conditions that the average of the x_i is 0 and their variance is constant.
In other words, it is enough to find the x_i that give {\rm max} Q.
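
To make the role of Q concrete, here is a small sketch that evaluates the formula directly. The function name and the three-person scores are hypothetical and are not taken from the tables in this report. Both candidate placements have average 0 and the same variance, so only the arrangement differs.

```python
def familiarity_q(e, x):
    """Return Q = -sum_i sum_j e[i][j] * (x[i] - x[j])**2."""
    n = len(x)
    return -sum(e[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))


# Hypothetical scores: persons 0 and 1 rate each other highly, person 2 is distant.
e = [[0, 3, 1],
     [2, 0, 1],
     [0, 1, 0]]

close_pair_together = [-1.0, -1.0, 2.0]  # persons 0 and 1 share a point
close_pair_apart = [-1.0, 2.0, -1.0]     # persons 0 and 1 are separated

q_together = familiarity_q(e, close_pair_together)
q_apart = familiarity_q(e, close_pair_apart)
print(q_together, q_apart)
```

Placing the two people with high mutual scores at the same point gives the larger Q, which is the behavior the maximization relies on.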

When we define Q^\ast=Q/\sigma_x (\sigma_x is the standard deviation), the condition becomes
\frac{\partial Q^\ast}{\partial x_i}=0.
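
As an intermediate step (my own working, assuming e_{ii}=0 as stated earlier), differentiating Q itself with respect to a single coordinate x_k collects the terms with i=k and the terms with j=k:

```latex
\frac{\partial Q}{\partial x_k}
  = -2\sum_{j}^{n} e_{kj}(x_k-x_j) + 2\sum_{i}^{n} e_{ik}(x_i-x_k)
  = -2\sum_{j}^{n} (e_{kj}+e_{jk})(x_k-x_j)
```

Only the sum e_{kj}+e_{jk} appears, so the two directed scores of a pair enter the solution through their sum.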

Finally, if we define T_i=\sum_{j}^{n}(e_{ij}+e_{ji}) and S_{ij}=-(e_{ij}+e_{ji}), the above condition can be solved by solving the formula below.
\left(\begin{matrix}T_1&S_{12}&\cdots&S_{1n}\\S_{21}&T_2&\cdots&S_{2n}\\\vdots&\vdots&\ddots&\vdots\\S_{n1}&S_{n2}&\cdots&T_n\\\end{matrix}\right)\left(\begin{matrix}x_1\\x_2\\\vdots\\x_n\\\end{matrix}\right)=0
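
A minimal sketch of how this system might be handled numerically, assuming NumPy is available. It builds the matrix from T_i and S_{ij}. As an assumption of mine, it then takes the eigenvector belonging to the second-smallest eigenvalue as the coordinates x_i: the vector with all x_i equal always satisfies the homogeneous system but has zero variance, and for non-negative scores on a connected group the second eigenvector is the one that maximizes Q under the average-0, fixed-variance conditions. The function name and the example scores are hypothetical.

```python
import numpy as np


def sociogram_coordinates(e):
    """Return one coordinate per person from a directed score table e."""
    e = np.asarray(e, dtype=float)
    w = e + e.T                        # e_ij + e_ji for every pair
    np.fill_diagonal(w, 0.0)           # e_ii = 0
    m = np.diag(w.sum(axis=1)) - w     # T_i on the diagonal, S_ij elsewhere
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(m)
    x = eigenvectors[:, 1]             # assumption: second-smallest eigenvalue
    return x - x.mean()                # keep the average at 0


# Hypothetical scores: persons 0-1 are close, persons 2-3 are close.
e_example = [[0, 5, 1, 0],
             [4, 0, 0, 1],
             [1, 0, 0, 5],
             [0, 1, 4, 0]]

x_example = sociogram_coordinates(e_example)
print(x_example)
```

The overall sign of an eigenvector is arbitrary, so only the relative positions are meaningful: here persons 0 and 1 land on one side of the origin and persons 2 and 3 on the other.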


Target of analysis

I wondered which "comic story" would be appropriate for this analysis.
Although I do not know many "comic stories", I imagined one.

There are five main characters: the hero, heroine A, heroine B, the rival, and the classmate.
At first heroine A is interested in the hero, but does not pay much attention to the hero.
As the story proceeds, heroine B, who likes only the hero, appears, and heroine A starts to get confused.
If a similar story comes to your mind, I think that is fine, because it is the ideal story in your own mind.
(By the way, I use "hero" and "heroine", but I do not intend any particular gender. They are like Bob and Alice in quantum computing.)


Preparation

To draw a sociogram, a table of familiarity scores e_{ij} is necessary.
(Each score is directed from one person to another, so the table is naturally asymmetric.)
I also think the result is more striking if a time scale is included, because the sequence of events becomes clear.
So I show images of the tables of familiarity scores e_{ij} below.

Table 1. Image of familiarity scores for the 1st story

Table 2. Image of familiarity scores for the 10th story


Table 3. Image of familiarity scores for the last story

As Tables 1-3 show, heroine A's familiarity score toward the hero increases as the story proceeds.
Meanwhile, heroine B approaches the hero from the middle of the story.
The classmate and the rival never get a chance to approach the hero or the heroines.


Next, I build the matrix from the above tables and solve the simultaneous equations.
The computation is too tough to do by hand, so I used a computer to obtain the result.
The result is shown below.


Figure 1. Result of computing

The raw numbers do not make much sense to the human eye.
So I plot the result on a number line.


Figure 2. Clustering on number line
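
A number line like this can also be drawn as plain text. The sketch below is only an illustration: the function name, the labels, and the coordinates are placeholders of mine, not the values behind Figure 2.

```python
def render_number_line(coords, width=41):
    """Return one text row per person, with an 'o' at the scaled position."""
    lo, hi = min(coords.values()), max(coords.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    rows = []
    for name, value in sorted(coords.items(), key=lambda item: item[1]):
        column = round((value - lo) / span * (width - 1))
        rows.append("." * column + "o" + "." * (width - 1 - column) + "  " + name)
    return rows


# Placeholder coordinates for three people.
rows = render_number_line({"p1": -1.0, "p2": 0.0, "p3": 1.0}, width=5)
print("\n".join(rows))
```

Rendering one such block per table makes it easy to compare how the positions move as the story proceeds.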


Using Figure 2, we can see how similar any two members are.
At first heroines A and B do not care much about the hero, but from the middle of the story they start to take a liking to the hero.
Finally, in the last story, the hero's position sits in the middle between heroines A and B, which I think is the typical situation of a "comic story".
Japanese comic books of the 1980s seemingly often show this relationship in their stories.

On the other hand, we can see that the classmate and the rival form a cluster far from the cluster of the hero and the heroines.
Come to think of it, I guess this also epitomizes the stories of Japanese comic books.


Conclusion

Looking at the result on the number line, I am surprised that it expresses the story I imagined for the "comic story" almost exactly.
I did not decide the distance between two points arbitrarily; rather, the eigenvector derived from the random scores in the tables produces the appropriate points.


The good point of this method is that it is easy to extend from the distance between two people to n people.
For example, we can easily express the distances among 10 people for cluster analysis.
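
Once everyone has a coordinate, one simple way to read two clusters off the number line is to cut at the widest gap between neighbors. This rule is my own assumption for illustration, not a step of the analysis above, and the function name, labels, and coordinates are placeholders.

```python
def split_at_largest_gap(coords):
    """Split people into two groups at the widest gap on the number line."""
    ordered = sorted(coords.items(), key=lambda item: item[1])
    # Distance between each pair of neighbors (needs at least two people).
    gaps = [ordered[k + 1][1] - ordered[k][1] for k in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    left = [name for name, _ in ordered[:cut]]
    right = [name for name, _ in ordered[cut:]]
    return left, right


# Placeholder coordinates for four people.
groups = split_at_largest_gap({"p1": -0.5, "p2": -0.4, "p3": 0.3, "p4": 0.6})
print(groups)
```

The same function works unchanged for 10 people or more, since it only needs one coordinate per person.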

Also, we can apply this method to any kind of score, as long as the score is bidirectional and labeled.
A complex chart can then be expressed more concisely.
For example, player trades among baseball teams over the past 20 years would be an interesting case.
I hope someone will analyze it.