Today we are going to learn:
EDA
Stem and Leaf
Visual representations
Uses For Data Analysis
Data analytics refers to the qualitative and quantitative techniques and processes used
to enhance productivity and business gain. Data is collected and categorised
so that patterns in it can be identified and analysed, and the techniques vary according to
organisational requirements.
Data analysis can be displayed using pencil and paper,
or in computer spreadsheets, in different ways.
One of those ways is EDA.
There are some rules for data analysis, and here they are:
1. It must display and visually reveal the behavior of the data and the structure of the analyses.
2. Resistance ensures that a few extraordinary data values do not unduly influence the results of analysis.
3. You must terminate any iterative processes that are no longer needed (constant unneeded processes).
4. Use economic theory (and common sense).
5. Avoid Type III errors (the right answer to the wrong question).
6. Know the context.
7. Inspect the data.
8. Keep it simple.
9. Weigh the costs and benefits of data mining (the practice of examining large pre-existing databases in order to generate new information).
10. Report a sensitivity analysis.
11. Compromise when needed.
12. Don't confuse statistical significance with magnitude.
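Rule 2 (resistance) can be illustrated with a small sketch. The pulse readings below are made up for illustration, and the median is used here as one example of a resistant summary; the rules above do not name a specific statistic.

```python
from statistics import mean, median

# Hypothetical pulse readings; the extra value 190 is a deliberate outlier.
pulses = [62, 64, 65, 67, 70]
pulses_with_outlier = pulses + [190]

def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

clean_mean, clean_median = summarize(pulses)
outlier_mean, outlier_median = summarize(pulses_with_outlier)

# The mean is pulled strongly by one extreme value; the median barely moves.
print(f"mean moved by   {outlier_mean - clean_mean:.1f}")
print(f"median moved by {outlier_median - clean_median:.1f}")
```

With this data, one extraordinary value shifts the mean by about 20 beats while the median moves by only 1, which is what "resistant" means in practice.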
----------------------------------------------------------------------------------------------------------------------------
The top algorithm as voted on kdnuggets.com is C4.5.
-----------------------------------------------------------------------------------------------------------------------------
You may think this is hard and looks a lot like
a large line graph, but it is surprisingly simple:
put the data values into a spreadsheet and eliminate unneeded information.
That is the first step. The second is to put the data to use, for example in software or hardware development, or to determine how to add new features to the software or hardware.
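The first step (eliminating unneeded information) might look like this once the spreadsheet is exported as CSV. This is only a sketch: the column names, the sample rows and the helper `keep_columns` are all hypothetical.

```python
import csv
import io

# Hypothetical spreadsheet export; in practice this would be a .csv file
# saved from the spreadsheet program.
raw = """name,age,pulse,favourite_colour
Ann,34,72,blue
Bob,51,80,green
"""

KEEP = ["age", "pulse"]  # columns assumed to be useful for the analysis

def keep_columns(text, columns):
    """Return the rows of a CSV string, keeping only the named columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row[col] for col in columns} for row in reader]

cleaned = keep_columns(raw, KEEP)
print(cleaned)
```

Note that `csv.DictReader` returns every value as text, so numbers would still need converting before any calculation.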
Algorithms
C4.5 constructs a classifier in the form of a decision tree.
In order to do this, C4.5 is given a set of data representing things that are
already classified.
Wait, what's a classifier?
A classifier is a tool in data mining that takes a bunch of data representing things
we want to classify and attempts to predict
which class the new data belongs to.
Example
You are a doctor and you have a bunch of patients.
You know their age, pulse and blood pressure.
These are called attributes.
Now, given these attributes, we want to predict whether the patient will get a disease.
The patient can fall into 1 of 2 classes: will get a disease or won’t get a disease.
C4.5 is told the class for each patient.
Using the set of patients' attributes, C4.5 constructs a decision tree that can predict the class
for each patient based on the data given.
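The text above does not say how C4.5 decides which question to ask first. It is usually described as picking the split with the best gain ratio (information gain divided by split information). The sketch below scores two candidate questions that way. It is a simplified illustration, not the full algorithm (no recursion, no pruning), and the patient data, thresholds and function names are all made up.

```python
from collections import Counter
from math import log2

# Hypothetical training data: (age, pulse, blood_pressure, class label).
patients = [
    (35, 70, 118, "no disease"),
    (42, 72, 121, "no disease"),
    (50, 88, 140, "disease"),
    (61, 90, 150, "disease"),
    (58, 75, 125, "no disease"),
    (66, 95, 160, "disease"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attr_index, threshold):
    """Gain ratio of splitting rows on rows[attr_index] <= threshold."""
    left = [r[-1] for r in rows if r[attr_index] <= threshold]
    right = [r[-1] for r in rows if r[attr_index] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing is useless
    total = len(rows)
    weights = [len(left) / total, len(right) / total]
    gain = entropy([r[-1] for r in rows]) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info

pulse_score = gain_ratio(patients, 1, 80)  # question: pulse <= 80?
age_score = gain_ratio(patients, 0, 55)    # question: age <= 55?
print(pulse_score, age_score)
```

In this made-up data the pulse question separates the two classes perfectly, so it scores higher than the age question and would be asked first.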
Decision trees
Decision tree learning creates something similar to a flowchart to classify new data.
Using the same patient example, one particular path in the flowchart could be:
Patient has a history of having this disease.
Patient is expressing a gene highly correlated with diseased patients
Patient has tumors
Patient’s tumor size is greater than 5
At each point in the flowchart there is a question about the value of some attribute,
and depending on those values, the patient gets classified.
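The path above can be sketched as a chain of questions in code. The dictionary keys and the return strings are hypothetical names chosen for this illustration; a real tree would also have branches for the "no" answers, which are not described here.

```python
def classify_patient(patient):
    """Follow one path of a hypothetical decision tree for a patient dict."""
    if not patient["has_history"]:
        return "not classified by this path"
    if not patient["expresses_gene"]:
        return "not classified by this path"
    if not patient["has_tumors"]:
        return "not classified by this path"
    if patient["tumor_size"] > 5:
        return "has the disease"
    return "not classified by this path"

# A made-up patient who answers "yes" at every point on the path.
example = {
    "has_history": True,
    "expresses_gene": True,
    "has_tumors": True,
    "tumor_size": 7,
}
print(classify_patient(example))
```

Each `if` is one point in the flowchart: a question about one attribute, with the answer deciding where the patient goes next.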
You can find lots of examples of decision trees.
If this were put into a visual display, it would be a tree diagram. A stem and leaf display is a different display: it groups data values by their leading digits.
Why use C4.5?
Arguably, the best selling point of decision trees
is their ease of interpretation and explanation.
They are also quite fast and quite popular,
and the output is easily read.
This works for computers and laptops only.
Run Microsoft Excel or a Mac spreadsheet program,
or open Microsoft Word or a Mac word processor
and draw a _ column table for speed.
In one column put the name of your
first class,
in the second column your second class,
and so on...
These will be your categories.
You can put your information in manually,
or you can buy an algorithm bot or get one free.
Then put this information into a Carroll diagram.
This will show you attributes.
Instead of Yes and No you can put in the severity or possibility as a number out of 100.
Divide these numbers by 100 so they become fractions of 1 (?/1), which are now just plain numbers.
Now multiply them by the number of categories, then multiply by 100 (if in decimals) to get the % chance of
getting into each category.
EDA - Exploratory Data Analysis
In statistics, exploratory data analysis (EDA) is an approach to analysing data sets
to summarize their main characteristics, often with visual methods.
A statistical model can be used or not, but primarily EDA is for seeing what
the data can tell us beyond the formal modeling or hypothesis testing task.
How to display EDA
EDA results can be shown as a stem and leaf display (or a C4.5 decision tree), or put into a line graph.
Pretty simple...
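A stem and leaf display can be built in a few lines. This sketch assumes a non-empty list of non-negative whole numbers, with the tens as the stem and the units as the leaf; the ages and the function name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for whole numbers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems too, so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical patient ages.
ages = [23, 25, 31, 34, 34, 38, 42, 47, 61]
for line in stem_and_leaf(ages):
    print(line)
```

Each row keeps the actual values, so the display works like a bar chart turned on its side that can still be read back as data.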
Uses for Data Analysis
By now you know the basics of data analysis, but it may seem
pointless. There are many uses for data analysis:
In hospitals: determining the chances of diseases that will infect a patient
Movies, TV and brands: determining if they will do well in the future
Court: chances of guilty or not guilty
And much more...
Introduction
Rules
Quick Info
An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
How to put it into visual display