The following data give the count of each nucleotide (A, C, G, T) in a gene sequence. The example uses the Zyxin gene, which plays an important role in cell adhesion (Golub et al., 1999). The accession number (X94991.1) of one of its variants can be found in a database such as NCBI (UniGene). These data are used to illustrate the construction of a pie chart from the frequency table of the four nucleotides.
Nucleotide   A     C     G     T
Frequency    410   789   573   394
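The pie chart can be sketched from this frequency table as follows. This is a minimal illustration that assumes ggplot2 is installed; the object name `nucleotides` and its column names are hypothetical, while the counts are the ones given above.

```r
library(ggplot2)

# Frequency table of the four nucleotides (counts taken from the table above).
# The object and column names are illustrative assumptions.
nucleotides <- data.frame(
  Nucleotide = c("A", "C", "G", "T"),
  Frequency  = c(410, 789, 573, 394)
)

# Percentage share of each nucleotide, used as slice labels.
nucleotides$Percent <- round(100 * nucleotides$Frequency / sum(nucleotides$Frequency), 1)

# In ggplot2 a pie chart is a single stacked bar drawn in polar coordinates.
pie <- ggplot(data = nucleotides, mapping = aes(x = "", y = Frequency, fill = Nucleotide)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = paste0(Percent, "%")), position = position_stack(vjust = 0.5)) +
  labs(title = "Pie Chart", fill = "Nucleotide") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5))

pie
```

The slice angles are proportional to `Frequency`, so the largest slice belongs to C.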
Data from GenBank can also be imported directly with the following code.
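One possible way to do this is sketched here, assuming the ape package and its `read.GenBank()` function; this may not be the code originally intended at this point. Both helper names are hypothetical. The download needs an internet connection, so the wrapper is only defined and not called.

```r
# Hypothetical helper: tabulate the nucleotides in a character vector of bases.
count_nucleotides <- function(bases) {
  table(factor(toupper(bases), levels = c("A", "C", "G", "T")))
}

# Hypothetical wrapper around ape::read.GenBank(); it needs the ape package
# and network access, so it is defined here but not executed.
fetch_nucleotide_counts <- function(accession = "X94991.1") {
  sequences <- ape::read.GenBank(accession, as.character = TRUE)
  count_nucleotides(sequences[[1]])
}

# Offline check of the counting step on a short toy sequence (not real data).
count_nucleotides(c("a", "c", "g", "t", "a", "a"))
```

Calling `fetch_nucleotide_counts()` with a working connection would return a frequency table for the downloaded sequence.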
# A tibble: 20 × 2
Student Grade
<dbl> <chr>
1 1 A
2 2 B
3 3 B
4 4 C
5 5 A
6 6 D
7 7 F
8 8 C
9 9 B
10 10 D
11 11 F
12 12 A
13 13 B
14 14 B
15 15 C
16 16 D
17 17 C
18 18 B
19 19 C
20 20 D
# A tibble: 20 × 3
Student Gender RS
<dbl> <chr> <chr>
1 1 Male B
2 2 Male NB
3 3 Female NB
4 4 Female B
5 5 Female B
6 6 Female NB
7 7 Male NB
8 8 Male B
9 9 Male NB
10 10 Male NB
11 11 Female NB
12 12 Female B
13 13 Male NB
14 14 Male NB
15 15 Female NB
16 16 Female NB
17 17 Male NB
18 18 Male NB
19 19 Male NB
20 20 Male B
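The plotting calls that follow use an object named `df5`. As a minimal sketch, it can be rebuilt from the twenty rows listed above. A base `data.frame` is used here instead of a tibble to keep the example dependency-free, and the name `freq5` is hypothetical.

```r
# Rebuild df5 from the twenty rows printed above.
df5 <- data.frame(
  Student = 1:20,
  Gender = c("Male", "Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male", "Male",
             "Female", "Female", "Male", "Male", "Female",
             "Female", "Male", "Male", "Male", "Male"),
  RS = c("B", "NB", "NB", "B", "B",
         "NB", "NB", "B", "NB", "NB",
         "NB", "B", "NB", "NB", "NB",
         "NB", "NB", "NB", "NB", "B")
)

# Two-way frequency table: these counts are the bar heights that
# geom_bar(position = "dodge") computes from the raw rows.
freq5 <- table(Gender = df5$Gender, RS = df5$RS)
freq5
```

Because `geom_bar()` counts rows itself, the raw data frame can be passed to `ggplot()` directly; the table is only shown to make the bar heights explicit.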
ggplot(data = df5, mapping = aes(x = Gender, fill = RS)) +
  geom_bar(position = "dodge") +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "Multiple Bar Chart", x = "Gender", fill = "Residential Status", y = "Frequency") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

ggplot(data = df5, mapping = aes(x = RS, fill = Gender)) +
  geom_bar(position = "dodge") +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "Multiple Bar Chart", x = "Residential Status", fill = "Gender", y = "Frequency") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
Source: OMB Statistical Policy Working Paper 22. https://www.hhs.gov/sites/default/files/spwp22.pdf
The following data set contains information on delinquent children. The recorded variables are the number of delinquent children by county and the education level of the household head.
# A tibble: 16 × 3
Delinquent EduLevel Freq
<fct> <fct> <dbl>
1 Alpha Low 15
2 Alpha Medium 0
3 Alpha High 5
4 Alpha Very High 0
5 Beta Low 20
6 Beta Medium 10
7 Beta High 10
8 Beta Very High 15
9 Gamma Low 5
10 Gamma Medium 10
11 Gamma High 10
12 Gamma Very High 0
13 Delta Low 10
14 Delta Medium 15
15 Delta High 5
16 Delta Very High 5
          EduLevel
Delinquent Low Medium High Very High
     Alpha  15      0    5         0
     Beta   20     10   10        15
     Gamma   5     10   10         0
     Delta  10     15    5         5
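The long-format data and the cross-tabulation above can be reproduced with a short sketch. It assumes `df6` holds the sixteen rows shown earlier; the helper names `county_levels`, `edu_levels` and `tab6` are hypothetical, and a base `data.frame` stands in for the tibble.

```r
# Factor levels in the order used by the printed tables.
county_levels <- c("Alpha", "Beta", "Gamma", "Delta")
edu_levels    <- c("Low", "Medium", "High", "Very High")

# Rebuild df6 in long format: one row per county and education level.
df6 <- data.frame(
  Delinquent = factor(rep(county_levels, each = 4), levels = county_levels),
  EduLevel   = factor(rep(edu_levels, times = 4), levels = edu_levels),
  Freq       = c(15, 0, 5, 0,
                 20, 10, 10, 15,
                 5, 10, 10, 0,
                 10, 15, 5, 5)
)

# Cross-tabulate the frequencies: counties in rows, education levels in columns.
tab6 <- xtabs(Freq ~ Delinquent + EduLevel, data = df6)
tab6
```

Since the frequencies are already aggregated, the bar charts for these data pass `y = Freq` with `stat = "identity"` instead of letting `geom_bar()` count rows.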
Multiple Bar Charts
ggplot(data = df6, mapping = aes(x = Delinquent, y = Freq, fill = EduLevel)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "Multiple Bar Chart", x = "Delinquent", fill = "Education Level", y = "Frequency") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

ggplot(data = df6, mapping = aes(x = EduLevel, y = Freq, fill = Delinquent)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "Multiple Bar Chart", x = "Education Level", fill = "Delinquent", y = "Frequency") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
Component Bar Charts
ggplot(data = df6, mapping = aes(x = Delinquent, y = Freq, fill = EduLevel)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "Component Bar Chart", x = "Delinquent", fill = "Education Level", y = "Frequency") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

ggplot(data = df6, mapping = aes(x = EduLevel, y = Freq, fill = Delinquent)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "Component Bar Chart", x = "Education Level", fill = "Delinquent", y = "Frequency") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
Count Data
Example
The following data show the number of notebooks kept by each student in a sample of twenty students.
The golub table contains gene expression values for 3051 genes taken from 38 leukemia patients. Twenty-seven patients were diagnosed with acute lymphoblastic leukemia (ALL) and eleven with acute myeloid leukemia (AML). The golub.gnames table contains information on each gene, including gene index, manufacturing ID, and biological name. The following table presents the gene expression values by tumor type.
# A tibble: 38 × 2
genevalue tumortype
<dbl> <chr>
1 2.11 ALL
2 1.52 ALL
3 1.96 ALL
4 2.34 ALL
5 1.85 ALL
6 1.99 ALL
7 2.07 ALL
8 1.82 ALL
9 2.18 ALL
10 1.81 ALL
# ℹ 28 more rows
Frequency Distribution
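# Range (R), number of classes (k) and class width (h)
df11 <- df10 %>%
  summarize(
    R = max(genevalue) - min(genevalue),
    k = floor(1 + 3.3 * log10(length(genevalue))),
    h = R / k
  )

# Frequency distribution with k equal-width classes
df10Freq <- df10 %>%
  mutate(
    Classes = cut(x = genevalue, breaks = df11$k, include.lowest = TRUE, right = FALSE)
  ) %>%
  count(Classes) %>%
  tidyr::separate(col = Classes, into = c("LB", "UB"), sep = ",", remove = FALSE) %>%
  rename(f = n) %>%
  mutate(
    LB = readr::parse_number(x = LB),  # lower class boundary
    UB = readr::parse_number(x = UB),  # upper class boundary
    rf = f / sum(f),                   # relative frequency
    pf = f / sum(f) * 100,             # percentage frequency
    cf = cumsum(f),                    # cumulative frequency
    MidPoint = (LB + UB) / 2
  )

df10Freq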
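The number of classes above comes from k = floor(1 + 3.3 log10(n)), a common form of Sturges' rule. The arithmetic can be checked with a small base R sketch. The helper name `sturges_classes` is hypothetical, and only the first ten gene expression values printed in the table are used, so the range and class width shown here are illustrative and not the values for the full data.

```r
# First ten gene expression values as printed in the table above;
# the remaining 28 rows are not shown, so this is a partial illustration.
genevalue <- c(2.11, 1.52, 1.96, 2.34, 1.85, 1.99, 2.07, 1.82, 2.18, 1.81)

# Hypothetical helper returning range, number of classes and class width.
sturges_classes <- function(x) {
  R <- max(x) - min(x)                    # range
  k <- floor(1 + 3.3 * log10(length(x)))  # number of classes
  h <- R / k                              # class width
  c(R = R, k = k, h = h)
}

sturges_classes(genevalue)

# Number of classes for the full sample of 38 patients.
k_full <- floor(1 + 3.3 * log10(38))
k_full
```

For n = 38 the rule gives 1 + 3.3 × 1.58 ≈ 6.21, so the full data set is grouped into 6 classes.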