Big data : concepts, technology and architecture / Balamurugan Balusamy, Nandhini Abirami. R, Seifedine Kadry, and Amir H. Gandomi.

Bibliographic Details
Main Authors: Balusamy, Balamurugan (Author), R, Nandhini Abirami (Author), Kadry, Seifedine, 1977- (Author), Gandomi, Amir Hossein (Author)
Format: eBook
Language: English
Published: Hoboken, NJ : John Wiley and Sons, Inc., 2021.
Edition: First edition.
Table of Contents:
  • Big Data: Concepts, Technology and Architecture
  • Book Description
  • 1.1 Understanding Big Data
  • 1.2 Evolution of Big Data
  • 1.3 Failure of Traditional database in handling Big Data
  • 1.3 (a) Data Mining Vs Big Data
  • 1.4 3 V's of Big Data
  • 1.4.1 Volume
  • 1.4.2 Velocity
  • 1.4.3 Variety
  • 1.5 Sources of Big Data
  • 1.6 Different Types of Data
  • 1.6.1 Structured Data
  • 1.6.2 Unstructured Data
  • 1.6.3 Semi-Structured Data
  • 1.7 Big Data Infrastructure
  • 1.8 Big Data Life Cycle
  • 1.8.1 Big Data Generation
  • 1.8.2 Data Aggregation
  • 1.8.3 Data Preprocessing
  • 1.7.3 Big Data Analytics
  • 1.7.4 Visualizing Big Data
  • 1.8 Big Data Technology
  • 1.8.1 Challenges faced by Big Data technology
  • 1.8.1 Heterogeneity and incompleteness
  • 1.8.2 Volume and velocity of the Data
  • 1.8.3 Data Storage
  • 1.8.4 Data Privacy
  • 1.9 Big Data Applications
  • 1.10 Big Data Use Cases
  • 1.9.1 Healthcare
  • 1.9.2 Telecom
  • 1.9.3 Financial Services
  • Chapter 1 refresher
  • Conceptual short Questions with answers
  • Frequently asked Interview questions
  • Chapter Objective
  • Big Data Storage Concepts
  • 2.1 Cluster computing
  • 2.1.1 Types of cluster
  • 2.1.1.1 High availability cluster
  • 2.1.1.2 Load balancing cluster
  • 2.1.2 Cluster structure
  • 2.3 Distribution Models
  • 2.3.1 Sharding
  • 2.3.2 Data Replication
  • 2.3.2.1 Master-Slave model
  • 2.3.2.2 Peer-to-Peer model
  • 2.3.3 Sharding and Replication
  • 2.4 Distributed file system
  • 2.5 Relational and Non Relational Databases
  • CoursesOffered
  • Figure 2.12 Data divided across multiple related tables
  • 2.4.2 RDBMS Databases
  • 2.4.3 NoSQL Databases
  • 2.4.4 NewSQL Databases
  • 2.5 Scaling Up and Scaling Out Storage
  • Chapter 2 refresher
  • Conceptual short questions with answers
  • Chapter Objective
  • 3.1 Introduction to NoSQL
  • 3.2 Why NoSQL
  • 3.3 CAP theorem
  • 3.4 ACID
  • 3.5 BASE
  • 3.6 Schemaless Database
  • 3.7 NoSQL (Not Only SQL)
  • 3.7.1 NoSQL Vs RDBMS
  • 3.7.2 Features of NoSQL database
  • 3.7.3 Types of NoSQL Technologies
  • 3.7.3.1 Key-Value store database
  • 3.7.3.2 Column-store database
  • 3.7.3.3 Document Oriented Database
  • 3.7.3.4 Graph-oriented Database
  • 3.7.4 NoSQL Operations
  • 3.9 Migrating from RDBMS to NoSQL
  • Chapter 3 refresher
  • Conceptual short questions with answers
  • Chapter Objective
  • 4.1 Data Processing
  • 4.2 Shared Everything Architecture
  • 4.2.1 Symmetric multiprocessing architecture
  • 4.2.2 Distributed Shared memory
  • 4.3 Shared nothing architecture
  • 4.4 Batch Processing
  • 4.5 Real-Time Data Processing
  • 4.6 Parallel Computing
  • 4.7 Distributed Computing
  • 4.8 Big Data Virtualization
  • 4.8.1 Attributes of Virtualization
  • 4.8.1.1 Encapsulation
  • 4.8.1.2 Partitioning
  • 4.8.1.3 Isolation
  • 4.8.2 Big Data Server Virtualization
  • 4.9 Introduction
  • 4.10 Cloud computing types
  • 4.11 Cloud Services
  • 4.12 Cloud Storage
  • 4.12.1 Architecture of GFS
  • 4.12.1.1 Master
  • 4.12.1.2 Client
  • 4.13 Cloud Architecture
  • Cloud Challenges
  • Chapter 4 Refresher
  • Conceptual short questions with answers
  • Chapter Objective
  • 5.1 Apache Hadoop
  • 5.1.1 Architecture of Apache Hadoop
  • 5.1.2 Hadoop Ecosystem Components Overview
  • 5.2 Hadoop Storage
  • 5.2.1 HDFS (Hadoop Distributed File System)
  • 5.2.2 Why HDFS?
  • 5.2.3 HDFS Architecture
  • 5.2.4 HDFS Read/Write Operation
  • 5.2.5 Rack Awareness
  • 5.2.6 Features of HDFS
  • 5.2.6.1 Cost-effective
  • 5.2.6.2 Distributed storage
  • 5.2.6.3 Data Replication
  • 5.3 Hadoop Computation
  • 5.3.1 MapReduce
  • 5.3.1.1 Mapper
  • 5.3.1.2 Combiner
  • 5.3.1.3 Reducer
  • 5.3.1.4 JobTracker and TaskTracker
  • 5.3.2 MapReduce Input Formats
  • 5.3.3 MapReduce Example
  • 5.3.4 MapReduce Processing
  • 5.3.5 MapReduce Algorithm
  • 5.3.6 Limitations of MapReduce
  • 5.4 Hadoop 2.0
  • 5.4.1 Hadoop 1.0 limitations
  • 5.4.2 Features of Hadoop 2.0
  • 5.4.3 Yet Another Resource Negotiator (YARN)
  • 5.4.3 Core components of YARN
  • 5.4.3.1 ResourceManager
  • 5.4.3.2 NodeManager
  • 5.4.4 YARN Scheduler
  • 5.4.4.1 FIFO scheduler
  • 5.4.4.2 Capacity Scheduler
  • 5.4.4.3 Fair Scheduler
  • 5.4.5 Failures in YARN
  • 5.4.5.1 ResourceManager failure
  • 5.4.5.2 ApplicationMaster failure
  • 5.4.5.3 NodeManager failure
  • 5.4.5.4 Container Failure
  • 5.3 HBASE
  • 5.4 Apache Cassandra
  • 5.5 SQOOP
  • 5.6 Flume
  • 5.6.1 Flume Architecture
  • 5.6.1.1 Event
  • 5.6.1.2 Agent
  • 5.7 Apache Avro
  • 5.8 Apache Pig
  • 5.9 Apache Mahout
  • 5.10 Apache Oozie
  • 5.10.1 Oozie Workflow
  • 5.10.2 Oozie Coordinators
  • 5.10.3 Oozie Bundles
  • 5.11 Apache Hive
  • Hive Architecture
  • Hadoop Distributions
  • Chapter 5 refresher
  • Conceptual short questions with answers
  • Frequently asked Interview Questions
  • Chapter Objective
  • 6.1 Terminologies of Big Data Analytics
  • Data Warehouse
  • Business Intelligence
  • Analytics
  • 6.2 Big Data Analytics
  • 6.2.1 Descriptive Analytics
  • 6.2.2 Diagnostic Analytics
  • 6.2.3 Predictive Analytics
  • 6.2.4 Prescriptive Analytics
  • 6.3 Data Analytics Lifecycle
  • 6.3.1 Business case evaluation and Identify the source data
  • 6.3.2 Data preparation
  • 6.3.3 Data Extraction and Transformation
  • 6.3.4 Data Analysis and visualization
  • 6.3.5 Analytics application
  • 6.4 Big Data Analytics Techniques
  • 6.4.1 Quantitative Analysis
  • 6.4.3 Statistical analysis
  • 6.4.3.1 A/B testing
  • 6.4.3.2 Correlation
  • 6.4.3.3 Regression
  • 6.5 Semantic Analysis
  • 6.5.1 Natural Language Processing
  • 6.5.2 Text Analytics
  • 6.7 Big Data Business Intelligence
  • 6.7.1 Online Transaction Processing (OLTP)
  • 6.7.2 Online Analytical Processing (OLAP)
  • 6.7.3 Real-Time Analytics Platform (RTAP)
  • 6.6 Big Data Real Time Analytics Processing
  • 6.7 Enterprise Data Warehouse
  • Chapter 6 Refresher
  • Concept