About the Book –
Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. With the open-source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple data sets.
Pandas for Everyone, 2nd Edition, brings together practical knowledge and insight for solving real problems with Pandas, even if you’re new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world data science problems, such as using regularization to prevent overfitting or deciding when to use unsupervised machine learning methods to find the underlying structure in a data set.
Features –
1. Extended coverage of plotting and the seaborn data visualization library
2. Expanded examples and resources
3. Updated Python 3.9 code and package coverage, including the statsmodels and scikit-learn libraries
About the Author
Daniel Y. Chen, PhD, MPH, completed his PhD at Virginia Tech in Genetics, Bioinformatics, and Computational Biology (GBCB). His dissertation was on data science education in the medical and biomedical sciences.
He completed a Master of Public Health in Epidemiology at the Columbia University Mailman School of Public Health, where he studied how attitudes toward behaviors diffuse and spread in social networks. In a past life, he studied psychology and neuroscience at the Macaulay Honors College at CUNY Hunter College and worked in a bench laboratory, using microscopy to study proteins in the brain associated with learning and memory.
Contents –
Part I: Introduction –
1. Pandas DataFrame Basics 2. Pandas Data Structures Basics 3. Plotting Basics 4. Tidy Data 5. Apply Functions
Part II: Data Processing –
6. Data Assembly 7. Data Normalization 8. Groupby Operations: Split-Apply-Combine
Part III: Data Types –
9. Missing Data 10. Data Types 11. Strings and Text Data 12. Dates and Times
Part IV: Data Modeling –
13. Linear Regression (Continuous Outcome Variable) 14. Generalized Linear Models 15. Survival Analysis 16. Model Diagnostics 17. Regularization 18. Clustering
Part V: Conclusion –
19. Life Outside of Pandas 20. It's Dangerous to Go Alone!
Appendix