Skip to main content

One Click Exploratory Data Analysis

The Exploration Project feature of OneNineAI platform analyses the uploaded data in the following grounds.

  1. Data Overview
  2. Variable Analysis
  3. Interactions
  4. Correlations
  5. Missing Value Analysis

Data Overview


  • Dataset Statistics

    • Includes high level statistics about the data such as Dimension of the data, memory size, variable types etc.
  • Dataset Insights

    • Includes auto-generated intuitive verbal insights for easy and quick understanding

Data Overview

Variable Analysis


Numeric Variables

In case of numeric variables, the following insights can be obtained.

  • Summary Statistics

    • Provides important statistical information about the variable. It includes quantile statistics such as Minimum, Maximum, Q1, Q3, 5th Percentile, 95th percentile, Median, Range and IQR and Descriptive statistics such as Mean, Standard Deviation, Variance, Sum, Skewness, Kurtosis, Coefficient of Variation and the total sum
  • KDE Plot

    • A KDE plot is nothing but the Kernel Density Plot which visualizes the distribution of observations.
    • It displays the probability density at different values in a continuous variable
  • Normal Q-Q Plot

    • A Q-Q plot is a scatter plot created by plotting two sets of quantiles against one another.
    • If both sets of quantiles came from the same distribution, we should see the points forming a line that's roughly straight
  • Box Plot

    • The Box plot represents the summary statistics of the data in a graphical manner.

Categorical Variables

In case of categorical variables, the following insights can be obtained

  • Stats

    • Includes Length stats, Sample Stats and Letter Stats
  • Pie Chart

    • A frequency Pie chart representing the frequency of each category
  • Word Cloud

    • A visual representation of words from the categorical variable where the font size of the word is proportional to the frequency
  • Word Frequency

    • A bar plot representing the frequency of each category in descending order
  • Word Length

    • A bar plot of word lengths against the frequency

Interaction


  • Dynamic Interactive scatter plots between numeric variables. A scatter plot basically uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point

Scatter Plots

Correlations


  • Correlation denotes the extent of association between the variables.
  • Correlation can be measured using Correlation coefficients whose value range between -1 to 1.
  • When the correlation coefficient is -1, then it means that the variables are strongly inversely associated (When one increases, the other decreases) .
  • A correlation coefficient of 1 means that the variables are strongly directly associated (When one increases, the other also increases).
  • A correlation of 0 means that there is no correlation or association between the variables. There are a lot of ways in which we can calculate the correlation coefficient.
  • In this report, Correlation analysis with three different types of formulas can be obtained.
  • Pearson Correlation

    • This is the most common form of correlation which measures the linear relationship.
    • Suitable for variables that follow normal distribution.
  • Spearman Correlation

    • It is a non-parametric test that measures a monotonic relationship using ranked or ordinal data.
    • Does not require continuous data.
  • KendallTau Correlation

    • Similar to Spearman, Kendall’s is non-parametric meaning that it does not require the two variables to fall into a bell curve.
    • Kendall’s also does not require continuous data.

Missing Value Analysis


  • The number of missing values in each variable are represented using the various forms of visualizations such as Bar Chart, Spectrum Chart, Heat Map and Dendrogram

  • Bar Chart

    • In general, A bar chart represents data with rectangular bars where the heights or lengths proportional to the values that they represent.
    • Here, the count of of missing values are represented in terms of percentage
  • Spectrum

    • Similar to Bar chart, Spectrum also displays the count of missing values in a two dimensional space against the row indices
  • Heat Map

    • Generally, A heat map is a data visualization technique that shows magnitude of a missing values as color in two dimensions
  • Dendrogram

    • The dendrogram is a visual representation of the compound missing data.
    • The individual compounds are arranged along the bottom of the dendrogram and referred to as leaf nodes.
    • Compound clusters are formed by joining individual compounds or existing compound clusters with the join point referred to as a node