Analytics for Amateurs

Welcome to Data Analysis. This will require some level of computer knowledge.

Difficulty : Medium - Runs Best In Chrome

Data Analysis and The Computing Behind it

Created Using HTML4

Today we are going to learn:

EDA

Stem and Leaf

Visual representations

Uses For Data Analysis

Introduction

Data analytics refers to qualitative and quantitative techniques and processes used

to enhance productivity and business gain. Data is collected and categorised

to identify and analyse patterns, and the techniques vary according to

organisational requirements.

Data analysis can be displayed using pencil and paper

or in computer spreadsheets, and in different ways,

such as EDA - Exploratory Data Analysis.

Rules

Data has some limitations, so here are the rules to follow:

1. The analysis must display and visually reveal the behavior of the data and the structure of the analysis.

2. Resistance ensures that a few extraordinary data values do not unduly influence the results of the analysis.

3. You must terminate any iterative processes - no constant unneeded processing.

4. Use economic theory (and common sense).

5. Avoid Type III errors - giving the right answer to the wrong question.

6. Know the context.

7. Inspect the data.

8. Keep it simple.

9. Weigh the costs and benefits of data mining (the practice of examining large pre-existing databases in order to generate new information).

10. Report a sensitivity analysis.

11. Compromise when needed.

12. Don't confuse statistical significance with magnitude.
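
Rule 2 (resistance) is easier to see with numbers. The sketch below uses made-up pulse readings where one value is a typing mistake. The mean is dragged far away by that single value, while the median barely moves. The data and variable names are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical pulse readings; the last value is a data-entry mistake (70 typed as 700).
pulses = [62, 64, 66, 68, 70, 72, 700]

# The same readings with the mistaken value left out.
clean = pulses[:-1]

print("mean with the mistake:     ", round(mean(pulses), 1))
print("mean without the mistake:  ", round(mean(clean), 1))
print("median with the mistake:   ", median(pulses))
print("median without the mistake:", median(clean))
```

The median is called a resistant statistic because one extraordinary value cannot pull it far, which is why it is a safer first summary when the data has not been inspected yet.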

----------------------------------------------------------------------------------------------------------------------------

Quick Info

An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.

The top algorithm as voted by kdnuggets.com is C4.5.

-----------------------------------------------------------------------------------------------------------------------------

You may think this is hard and looks a lot like a large line graph, but it is surprisingly simple: put the data values into a spreadsheet and eliminate unneeded information.

That is the first step. The second is to put the data to use, for example in software or hardware development, or to determine how to add new features to the software/hardware.

Algorithms

C4.5 constructs a classifier in the form of a decision tree.

In order to do this, C4.5 is given a set of data representing things that are

already classified.

Wait, what's a classifier?

A classifier is a tool in data mining that takes a bunch of data representing things

we want to classify and attempts to predict

which class the new data belongs to.

Example

You are a doctor and you have a bunch of patients.

You know their age, pulse and blood pressure.

These are called attributes.

Now.

Given these attributes, we want to predict whether the patient will get a disease.

The patient can fall into 1 of 2 classes: will get a disease or won’t get a disease.

C4.5 is told the class for each patient.

Using each patient's attributes and class, C4.5 constructs a decision tree that can predict the class

for a new patient based on the data given.
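
To make this more concrete, here is a rough sketch of how a single question in such a tree could be chosen. C4.5 scores possible splits with a measure called the gain ratio. The code below is a heavily simplified illustration and not the full C4.5 algorithm: it only picks one split, it only tries the observed values as thresholds, and it does no pruning. The patient records and the field names (`age`, `pulse`, `bp`, `disease`) are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting into 'value <= threshold' and 'value > threshold'."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that puts everyone on one side tells us nothing
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info

# Hypothetical, already classified patients (all numbers are made up).
patients = [
    {"age": 34, "pulse": 70, "bp": 118, "disease": False},
    {"age": 41, "pulse": 72, "bp": 121, "disease": False},
    {"age": 52, "pulse": 88, "bp": 150, "disease": True},
    {"age": 60, "pulse": 65, "bp": 155, "disease": True},
    {"age": 29, "pulse": 90, "bp": 115, "disease": False},
    {"age": 67, "pulse": 76, "bp": 160, "disease": True},
    {"age": 36, "pulse": 80, "bp": 148, "disease": True},
    {"age": 38, "pulse": 68, "bp": 125, "disease": False},
]

attributes = ["age", "pulse", "bp"]
labels = [p["disease"] for p in patients]

# Every (attribute, threshold) pair we are willing to ask about.
candidates = [
    (name, threshold)
    for name in attributes
    for threshold in sorted({p[name] for p in patients})
]

def score(candidate):
    """Gain ratio of the question 'is attribute <= threshold?'."""
    name, threshold = candidate
    values = [p[name] for p in patients]
    return gain_ratio(values, labels, threshold)

best = max(candidates, key=score)
print("best first question:", best[0], "<=", best[1], "with gain ratio", score(best))
```

With this made-up data the best first question is about blood pressure, because asking whether it is 125 or lower separates the two classes completely. A real tree would keep asking questions on each side until the groups are pure enough.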

Decision trees

Decision tree learning creates something similar to a flowchart to classify new data.

Using the same patient example, one particular path in the flowchart could be:

Patient has a history of having this disease.

Patient is expressing a gene highly correlated with diseased patients

Patient has tumors

Patient’s tumor size is greater than 5

At each point in the flowchart is a question about the value of some attribute,

and depending on those values, the patient gets classed as having the disease or not.
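
The path described above can be written as a chain of questions. This is only a sketch of that one path: the field names are invented, the tumor size has no unit because the example does not give one, and a real tree would have other branches to handle patients who leave this path.

```python
def follows_disease_path(patient):
    """Walk the single flowchart path from the example, one question at a time."""
    if not patient["has_history"]:
        return False  # left the path at question 1
    if not patient["expresses_gene"]:
        return False  # left the path at question 2
    if not patient["has_tumors"]:
        return False  # left the path at question 3
    # Final question: is the tumor size greater than 5?
    return patient["tumor_size"] > 5

# Hypothetical patients (field names and values are made up).
patient_a = {"has_history": True, "expresses_gene": True, "has_tumors": True, "tumor_size": 7}
patient_b = {"has_history": True, "expresses_gene": False, "has_tumors": True, "tumor_size": 9}

print(follows_disease_path(patient_a))  # True: every question on the path is answered yes
print(follows_disease_path(patient_b))  # False: leaves the path at the gene question
```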

You can find lots of examples of decision trees.

If this were put into a visual display it would be a tree diagram, which is a different kind of display from a stem and leaf display.

Why use C4.5?

Arguably, the best selling point of decision trees

is their ease of interpretation and explanation.

They are also quite fast and quite popular,

and the output is easily read.

How to put it into visual display

This works for computers and laptops only.

Run Microsoft Excel or Apple Numbers

or you can open Microsoft Word or Apple Pages

and draw a _ column table for speed,

and in one column put in the name of your

first class,

in the second column your second class

and so on...

These will be your categories.

You can put your information in manually

or you can buy an algorithm bot or get one free.

Then put this information into a Carroll diagram.

This will show you attributes.

Instead of Yes and No you can put in the severity or possibility as a number out of 100.

Divide the fractions by 100, and any ?/1 numbers are now just cardinal numbers.

Now multiply them by the number of categories and multiply them by 100 (if in decimals) to get the % chance of

getting into each category.
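
The Carroll diagram step from earlier in this section sorts each item into a box according to its Yes or No answers on two attributes. Here is a minimal sketch of that sorting, assuming five made-up patients and two made-up attributes (`has_history` and `has_tumors`).

```python
# Hypothetical patients: (name, has_history, has_tumors). All values are made up.
patients = [
    ("Ann", True, True),
    ("Bob", True, False),
    ("Cat", False, True),
    ("Dan", False, False),
    ("Eve", True, True),
]

def carroll_diagram(items):
    """Sort each item into one of four boxes keyed by its two yes/no answers."""
    boxes = {(a, b): [] for a in (True, False) for b in (True, False)}
    for name, has_history, has_tumors in items:
        boxes[(has_history, has_tumors)].append(name)
    return boxes

def label(flag):
    """Turn True/False into the Yes/No wording used in the diagram."""
    return "Yes" if flag else "No"

boxes = carroll_diagram(patients)

# Print one line per box of the diagram.
for has_history in (True, False):
    for has_tumors in (True, False):
        names = ", ".join(boxes[(has_history, has_tumors)]) or "-"
        print(f"History: {label(has_history):3} | Tumors: {label(has_tumors):3} | {names}")
```

Every patient lands in exactly one box, so the four boxes together always hold the whole group.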

EDA - Exploratory Data Analysis

In statistics, exploratory data analysis (EDA) is an approach to analysing data sets

to summarize their main characteristics, often with visual methods.

A statistical model can be used or not, but primarily EDA is for seeing what

the data can tell us beyond the formal modeling or hypothesis testing task.
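
A common first step in EDA is to summarise a column of numbers before drawing anything. The sketch below computes a five-number summary (minimum, lower quartile, median, upper quartile, maximum) using Python's built-in `statistics` module. The ages are made up, and quartiles can be calculated in slightly different ways, so other tools may give slightly different quartile values.

```python
from statistics import mean, median, quantiles

# Hypothetical patient ages (made up for illustration).
ages = [23, 25, 31, 34, 36, 36, 41, 47, 52, 58]

def five_number_summary(values):
    """Return the minimum, the three quartiles and the maximum of the values."""
    q1, q2, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

summary = five_number_summary(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
print(f"  mean: {mean(ages)}")
```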

How to display EDA

EDA is usually displayed with simple plots such as a stem & leaf display or a line graph.

Pretty simple...
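
A stem and leaf display groups numbers by their leading digits (the stem) and lists the last digit of each number (the leaf) next to it. Here is a small sketch for whole numbers that are zero or larger, using made-up ages. The function name is just for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines: the tens part is the stem, the units digit is the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems too, so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical patient ages (made up for illustration).
ages = [23, 25, 31, 34, 36, 36, 41, 47, 52, 58]

for line in stem_and_leaf(ages):
    print(line)
# 2 | 3 5
# 3 | 1 4 6 6
# 4 | 1 7
# 5 | 2 8
```

Turned on its side, the display looks like a bar chart, but it still shows every original value.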

Uses for Data Analysis

By now you know the basics of data analysis, but it may seem

pointless. There are many uses for data analysis:

In hospitals - determining the chances of diseases that will infect a patient

Movies, TV and brands - determining if they will do well in the future

Court - chances of guilty or not guilty

And much more...

HTML5.