+353-1-416-8900REST OF WORLD
+44-20-3973-8888REST OF WORLD
1-917-300-0470EAST COAST U.S
1-800-526-8630U.S. (TOLL FREE)


Molecular Data Analysis Using R

  • ID: 3641136
  • Book
  • February 2017
  • 352 Pages
  • John Wiley and Sons Ltd
1 of 3
This book addresses the difficulties experienced by wet lab researchers with the statistical analysis of molecular biology related data.  The authors explain how to use R and Bioconductor for the analysis of experimental data in the field of molecular biology.  The content is based upon two university courses for bioinformatics and experimental biology students (Biological Data Analysis with R and High–throughput Data Analysis with R). The material is divided into chapters based upon the experimental methods used in the laboratories.

Key features include:

 Broad appeal––the authors target their material to researchers in several levels, ensuring that the basics are always covered.

 First book to explain how to use R and Bioconductor for the analysis of several types of experimental data in the field of molecular biology.

 Focuses on R and Bioconductor, which are widely used for data analysis. One great benefit of R and Bioconductor is that there is a vast user community and very active discussion in place, in addition to the practice of sharing codes. Further, R is the platform for implementing new analysis approaches, therefore novel methods are available early for R users.

About the Authors
Csaba Ortutay
is a bioinformatician from Finland who has taught several bioinformatics courses at different European universities (Finland, Ireland, and Hungary) for over a decade. He is also active as a researcher publishing in the field of computational immunology.

Zsuzsanna Ortutay is a molecular immunologist at the University of Tampere, Finland, frequently utilizing diverse molecular lab methods.
Note: Product cover images may vary from those shown
2 of 3

Foreword, xiii

Preface, xv

Acknowledgements, xix

About the Companion Website, xxi

1 Introduction to R statistical environment, 1

Why R?, 1

Installing R, 2

Interacting with R, 2

Graphical interfaces and integrated development environment (IDE) integration, 3

Scripting and sourcing, 3

The R history and the R environment file, 4

Packages and package repositories, 4

Comprehensive R Archive Network, 5

Bioconductor, 6

Working with data, 7

Basic operations in R, 8

Some basics of graphics in R, 10

Getting help in R, 12

Files for practicing, 13

Study exercises and questions, 14

References, 14

Webliography, 15

2 Simple sequence analysis, 17

Sequence files, 17

FASTA sequence format, 18

GenBank flat file format, 19

Reading sequence files into R, 20

Obtaining sequences from remote databases, 21

Seqinr package, 21

Ape package, 22

Descriptive statistics of nucleotide sequences, 24

Descriptive statistics of proteins, 28

Aligned sequences, 31

Visualization of genes and transcripts in a professional way, 34

Files for practicing, 37

Study exercises and questions, 38

References, 38

Webliography, 39

Packages, 40

3 Annotating gene groups, 41

Enrichment analysis: an overview, 41

Overview of two different methods, 41

Enrichment analysis results, 42

Common aspects of the two different approaches, 43

Overrepresentation analysis, 46

Hypergeometric test using GOstats, 47

ORA analysis using topGO, 48

Enrichment analysis of microarray sets with topGO, 51

Gene set enrichment analysis, 52

GSEA with R, 56

Files for practicing, 61

Study exercises and questions, 61

References, 62

Webliography, 62

Packages, 63

4 Next–generation sequencing: introduction and genomic applications, 65

High–throughput sequencing background, 65

Experimental background, 66

Single–end and paired–end sequencing reads, 67

Assemble reads, 69

How many reads? Depth of coverage, 71

Storing data in files, 72


SAM and BAM files, 76

Variant call format files, 77

General data analysis workflow, 77

Data processing considerations, 78

Quality checking and screening read sequences, 80

Quality checking for one file, 82

Quality inspection for multiple files in a project, 82

Quality filtering of FASTQ files, 83

Handling alignment files and genomic variants, 84

Alignment and variation visualization, 88

Simple handling of VCF files, 89

Genomic applications: low– and medium–depth sequencing, 91

Aneuploidity sequencing and copy number variation identification, 92

SNP identification and validation, 92

Exome sequencing, 93

Genomic region resequencing, 93

Full genome and metagenome sequencing, 94

Files for practicing, 94

Study exercises and questions, 94

References, 95

Webliography, 97

Packages, 97

5 Quantitative transcriptomics: qRT–PCR, 99

Transcriptome, 99

Polymerase chain reaction, 100

Standards for qPCR, 102

R packages, 104

Understanding delta Ct, 104

Calculation of delta Ct, 105

Requirements for real delta Ct calculations, 107

Absolute quantification, 110

Value prediction, the professional way, 114

Relative quantification using the ddCt method, 115

Comparison of two conditions, 116

Comparison of multiple experimental conditions, 118

Quality control with melting curve, 121

Files for practicing, 123

Study exercises and questions, 123

References, 123

Webliography, 124

Packages, 124

6 Advanced transcriptomics: gene expression microarrays, 125

Microarray analysis: probes and samples, 125

Experimental background, 126

Archiving and publishing microarray data, 128

Minimum information standard, 128

Data preprocessing, 128

Accessing data from CEL files, 129

Quality control, 131

Normalization, 132

Differential gene expression, 133

Annotating results, 136

Creating normalized expression set from Illumina data, 138

Automated data access from GEO, 140

Files for practicing, 142

Study exercises and questions, 142

References, 143

Webliography, 144

Packages, 144

7 Next–generation sequencing in transcriptomics: RNA–seq experiments, 145

High–throughput RNA sequencing background, 145

Experimental background, 145

RNA–seq applications, 146

Differential expression with different resolutions, 147

Preparing count tables, 148

Alignment files to read counts, 148

Differential expression in simple comparison, 151

A naive t–test approach, 151

Single factor analysis with edgeR, 153

Differential expression with DESeq, 156

Complex experimental arrangements, 159

Experimental factors and design matrix, 160

GLM with edgeR, 161

GLMs with DESeq, 162

Heatmap visualization, 163

Files for practicing, 164

Study exercises and questions, 164

References, 165

Webliography, 166

Packages, 166

8 Deciphering the regulome: from ChIP to ChIP–seq, 167

Chromatin immunoprecipitation, 167

Experimental background, 168

Fragment analysis, 168

ChIP data in ENCODE, 169

ChIP with tiling microarrays, 169

High–throughput sequencing of ChIP fragments, 176

Connecting annotation to peaks, 181

Analysis of binding site motifs, 182

Files for practicing, 186

Study exercises and questions, 187

References, 187

Webliography, 188

Packages, 189

9 Inferring regulatory and other networks from gene expression data, 191

Gene regulatory networks, 191

Data for gene network inference, 192

Reconstruction of co–expression networks, 193

Gene regulatory network inference focusing of master regulators, 201

Integrated interpretation of genes with GeneAnswers, 207

Files for practicing, 211

Study exercises and questions, 212

References, 213

Packages, 214

10 Analysis of biological networks, 215

A gentle introduction to networks, 215

Networks and their components and features, 215

Random networks, 220

Biological networks, 221

Files for storing network information, 223

Important network metrics in biology, 227

Distance–based measures, 228

Degree and related measures, 230

Vulnerability, 231

Community structure of a network, 234

Graph visualization, 236

Cytoscape, 240

Files for practicing, 241

Study exercises and questions, 241

References, 242

Webliography, 243

Packages, 243

11 Proteomics: mass spectrometry, 245

Mass spectrometry and proteomics: why and how?, 245

File formats for MS data, 246

Accessing the raw data of published studies, 247

Identification of peptides in the samples, 249

Peptide mass fingerprinting, 249

Peptide identification by using MS/MS spectra, 250

Quantitative proteomics, 254

Getting protein–specific annotation, 258

Files for practicing, 259

Study exercises and questions, 259

References, 259

Webliography, 260

Packages, 260

12 Measuring protein abundance with ELISA, 261

Enzyme–linked immunosorbent assays, 261

Accessing ELISA data, 264

Concentration calculation with a standard curve, 264

Preparing reference data, 267

Fitting linear model, 268

Fitting of a logistic model, 269

Concentration calculations by employing models, 270

Comparative calculations using concentrations, 271

Files for practicing, 277

Study exercises and questions, 277

References, 277

Packages, 278

13 Flow cytometry: counting and sorting stained cells, 279

Theoretical aspects of flow cytometry, 279

Experiment types: diagnosis versus discovery, 280

Measurement arrangements, 281

Fluorescent dyes, 281

Tubes versus plates, 285

Instruments, 285

What about data?, 287

Files, 287

Workflows, 288

Data preprocessing, 289

Handling all samples together, 290

Compensation, 292

Quality assurance, 292

Using workflow objects and transformation, 296

Normalization, 298

Cell population identification, 299

Manual gating, 300

Automatic gating, 304

Relating cell populations to external variables, 305

Reporting results, 307

MIFlowCyt, 307

FlowRepository.org, 308

Files for practicing, 308

Study exercises and questions, 309

References, 309

Webliography, 310

Packages, 310

Glossary, 311

Index, 323

Note: Product cover images may vary from those shown
3 of 3


4 of 3
Csaba Ortutay
Zsuzsanna Ortutay
Note: Product cover images may vary from those shown