000 03782dam a2200325Ii 4500
001 0000372243
003 0001
008 221213s2020 sz# ob 001 0 eng d
020 _a9783030575915 (paperback)
_q(electronic bk.)
020 _a3030575926
_q(electronic bk.)
020 _z3030575918
020 _z9783030575915
035 _a(OCoLC)1205607630
_z(OCoLC)1225563876
_z(OCoLC)1227334462
040 _aYDX
_beng
_erda
_cYDX
_dSFB
_dUAB
_dOCLCF
_dEBLCP
_dGW5XE
_dOCLCO
082 0 4 _a005.74
_223
084 _a005.74
_bBAD-S
_223
100 1 _aBadia, Antonio,
_0http://id.loc.gov/authorities/names/nb2009013244
_eauthor
245 1 0 _aSQL for data science
_hBook :
_bdata cleaning, wrangling and analytics with relational databases /
_cAntonio Badia.
300 _axi, 285 pages ;
_c23 cm.
365 _a01
_b11,347.18
440 0 _aData-centric systems and applications.
_0http://id.loc.gov/authorities/names/no2003128521
_x2197-9723
490 1 _aData-centric systems and applications,
_x2197-9723
500 _aIncludes bibliographical references and index.
520 _aThis textbook explains SQL within the context of data science and introduces the different parts of SQL as they are needed for the tasks usually carried out during data analysis. Using the framework of the data life cycle, it focuses on the steps that are very often given short shrift in traditional textbooks, such as data loading, cleaning and pre-processing. The book is organized as follows. Chapter 1 describes the data life cycle, i.e., the sequence of stages, from data acquisition to archiving, that data goes through as it is prepared and then actually analyzed, together with the different activities that take place at each stage. Chapter 2 gets into databases proper, explaining how relational databases organize data. Non-traditional data, like XML and text, are also covered. Chapter 3 introduces SQL queries, but unlike traditional textbooks, queries and their parts are described around typical data analysis tasks like data exploration, cleaning and transformation. Chapter 4 introduces some basic techniques for data analysis and shows how SQL can be used for some simple analyses without too much complication. Chapter 5 introduces additional SQL constructs that are important in a variety of situations and thus completes the coverage of SQL queries. Lastly, Chapter 6 briefly explains how to use SQL from within R and from within Python programs. It focuses on how these languages can interact with a database, and how what has been learned about SQL can be leveraged to make life easier when using R or Python. All chapters contain many examples and exercises along the way, and readers are encouraged to install the two open-source database systems (MySQL and Postgres) that are used throughout the book in order to practice and work on the exercises, because simply reading the book is much less useful than actually using it. This book is for anyone interested in data science and/or databases.
It demands only a bit of computer fluency and no specific background in databases or data analysis. All concepts are introduced intuitively and with a minimum of specialized jargon. After going through this book, readers should be able to profitably learn more about data mining, machine learning, and database management from more advanced textbooks and courses.
521 _aAll.
650 0 _aDatabase management.
_0http://id.loc.gov/authorities/subjects/sh85035848
650 0 _aBig data.
_0http://id.loc.gov/authorities/subjects/sh2012003227
650 0 _aSQL (Computer program language)
_0http://id.loc.gov/authorities/subjects/sh86006628
852 _p10001000062606
_911347.18
_vAllied Book Company
_dBooks
999 _c64119
_d64119