DataTau
Show Me the Data: Using Graphics for Exploratory Data Analysis (insightdatascience.com)
13 points by johnjoo 3382 days ago | 3 comments


3 points by rent0n 3381 days ago | link

Does anyone know how the PCA scatter plot with histograms was generated? In other words, how did he add the histograms to the scatter plot?

-----

2 points by astrobiased 3376 days ago | link

Here's the code I used to generate the PCA plot.

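    import matplotlib.pyplot as plt
    import numpy as np  # needed for np.array below
    import seaborn as sns
    from sklearn.decomposition import PCA as sklearnPCA

    # Project the data onto its first two principal components.
    sklearn_pca = sklearnPCA(n_components=2)

    tmp = np.array(df)  # df is a Pandas DataFrame
    proj = sklearn_pca.fit_transform(tmp)

    sns.set_style("white")
    sns.set_context('talk')

    # Scatter plot in the joint axes, histograms in the marginal axes.
    # seaborn >= 0.9 names the figure-size argument height= (earlier releases used size=),
    # and recent releases require x= and y= as keywords.
    g = sns.JointGrid(x=proj[:,0], y=proj[:,1], space=0, height=8)
    # distplot is deprecated in newer seaborn; sns.histplot is its replacement there.
    g.plot_marginals(sns.distplot, kde=False, color=".7", bins=30)
    g.plot_joint(plt.scatter, color=".5", edgecolor="none", alpha=1)
    g.set_axis_labels(xlabel='PC1', ylabel='PC2')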

-----

1 point by isms 3381 days ago | link

Check out `jointplot` in the `seaborn` Python library:

Docs: http://stanford.edu/~mwaskom/software/seaborn/generated/seab...

Example with hexbins: http://stanford.edu/~mwaskom/software/seaborn/examples/hexbi...

Example with kernel density estimate: http://stanford.edu/~mwaskom/software/seaborn/examples/joint...

-----



