Principles of Big Data helps readers avoid the common mistakes that endanger all Big Data projects. By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book. The book demonstrates how adept analysts can find relationships among data objects held in disparate Big Data resources, when the data objects are endowed with semantic support (i.e., organized in classes of uniquely identified data objects). Readers will learn how their data can be integrated with data from other resources, and how the data extracted from Big Data resources can be used for purposes beyond those imagined by the data creators.
- Learn general methods for specifying Big Data in a way that is understandable to humans and to computers
- Avoid the pitfalls in Big Data design and analysis
- Understand how to create and use Big Data safely and responsibly with a set of laws, regulations and ethical standards that apply to the acquisition, distribution and integration of Big Data resources
1. Big Data Moves to the Center of the Universe
4. Identification, De-identification, and Re-identification
5. Ontologies and Semantics: How information is endowed with meaning
6. Standards and their Versions
7. Legacy Data
8. Hypothesis Testing
14. Social and Ethical Issues
Jules Berman holds two bachelor of science degrees from MIT (Mathematics, and Earth and Planetary Sciences), a PhD from Temple University, and an MD, from the University of Miami. He was a graduate researcher in the Fels Cancer Research Institute, at Temple University, and at the American Health Foundation in Valhalla, New York. His post-doctoral studies were completed at the U.S. National Institutes of Health, and his residency was completed at the George Washington University Medical Center in Washington, D.C. Dr. Berman served as Chief of Anatomic Pathology, Surgical Pathology and Cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he transferred to the U.S. National Institutes of Health, as a Medical Officer, and as the Program Director for Pathology Informatics in the Cancer Diagnosis Program at the National Cancer Institute. Dr. Berman is a past President of the Association for Pathology Informatics, and the 2011 recipient of the association's Lifetime Achievement Award. He is a listed author on over 200 scientific publications and has written more than a dozen books in his three areas of expertise: informatics, computer programming, and cancer biology. Dr. Berman is currently a free-lance writer.