Search Results

You searched for: big data
9,464 Results returned

  • Defining architecture components of the Big Data Ecosystem

    Demchenko, Y. ; De Laat, C. ; Membrey, P.
    Collaboration Technologies and Systems (CTS), 2014 International Conference on

    DOI: 10.1109/CTS.2014.6867550
    Publication Year: 2014 , Page(s): 104 - 112

    IEEE Conference Publications

    Big Data is becoming a new technology focus in both science and industry and motivates a technology shift to data-centric architecture and operational models. There is a vital need to define the basic information/semantic models, architecture components and operational models that together comprise a so-called Big Data Ecosystem. This paper discusses the nature of Big Data, which may originate from different scientific, industrial and social activity domains, and proposes an improved Big Data definition that includes the following parts: Big Data properties (also called the Big Data 5V: Volume, Velocity, Variety, Value and Veracity), data models and structures, data analytics, infrastructure and security. The paper discusses the paradigm change from traditional host- or service-based to data-centric architecture and operational models in Big Data. The Big Data Architecture Framework (BDAF) is proposed to address all aspects of the Big Data Ecosystem and includes the following components: Big Data Infrastructure, Big Data Analytics, Data Structures and Models, Big Data Lifecycle Management, and Big Data Security. The paper analyses requirements for these components and suggests how they can address the main Big Data challenges. The presented work intends to provide a consolidated view of the Big Data phenomenon and the related challenges to modern technologies, and to initiate wide discussion.

  • BigDataBench: A big data benchmark suite from internet services

    Lei Wang ; Jianfeng Zhan ; Chunjie Luo ; Yuqing Zhu ; Qiang Yang ; Yongqiang He ; Wanling Gao ; Zhen Jia ; Yingjie Shi ; Shujie Zhang ; Chen Zheng ; Gang Lu ; Zhan, K. ; Xiaona Li ; Bizhu Qiu
    High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on

    DOI: 10.1109/HPCA.2014.6835958
    Publication Year: 2014 , Page(s): 488 - 499
    Cited by:  Papers (4)

    IEEE Conference Publications

    As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure to benchmark and evaluate these systems rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data systems pose great challenges for big data benchmarking. Considering the broad use of big data systems, for the sake of fairness, big data benchmarks must include diverse data and workloads, which is a prerequisite for evaluating big data systems and architecture. Most state-of-the-art big data benchmarking efforts target specific types of applications or system software stacks, and hence are not suited to the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, not only covers broad application scenarios but also includes diverse and representative data sets. Currently, we choose 19 big data benchmarks along the dimensions of application scenarios, operations/algorithms, data types, data sources, software stacks, and application types; together they are comprehensive enough to fairly measure and evaluate big data systems and architecture. BigDataBench is publicly available from the project home page http://prof.ict.ac.cn/BigDataBench. We also comprehensively characterize the 19 big data workloads included in BigDataBench with varying data inputs. On a typical state-of-practice processor, the Intel Xeon E5645, we make the following observations. First, in comparison with traditional benchmarks (PARSEC, HPCC, and SPECCPU), big data applications have very low operation intensity, defined as the ratio of the total number of instructions to the total number of bytes of memory accessed. Second, the volume of the data input has a non-negligible impact on micro-architecture characteristics, which may pose challenges for simulation-based big data architecture research. Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the L1 instruction cache (L1I) misses per 1000 instructions (in short, MPKI) of the big data applications are higher than in the traditional benchmarks; we also find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
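
    The two metrics the abstract relies on are straightforward to compute from hardware-counter readings. The sketch below (with made-up counter values, not the paper's measurements) shows both:

      # Sketch with hypothetical hardware-counter values; not the paper's data.

      def operation_intensity(total_instructions, total_memory_bytes):
          # Operation intensity = total instructions / total bytes of memory accessed.
          return total_instructions / total_memory_bytes

      def mpki(misses, total_instructions):
          # Misses per kilo-instruction, e.g. for the L1 instruction cache (L1I).
          return misses / (total_instructions / 1000)

      # A hypothetical workload: 2e9 instructions, 8e9 bytes of memory traffic,
      # and 6e6 L1I misses.
      print(operation_intensity(2_000_000_000, 8_000_000_000))  # 0.25 instructions/byte
      print(mpki(6_000_000, 2_000_000_000))                     # 3.0 L1I MPKI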

  • Prominence of MapReduce in Big Data Processing

    Pandey, S. ; Tokekar, V.
    Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on

    DOI: 10.1109/CSNT.2014.117
    Publication Year: 2014 , Page(s): 555 - 560

    IEEE Conference Publications

    Big Data has arrived in great haste as a key enabler for social business: it presents an opportunity to create extraordinary business advantage and better service delivery, and it is bringing positive change to the decision-making processes of various business organizations. Alongside these offerings, Big Data brings several issues and challenges related to Big Data management, processing and analysis. Big Data is characterized by the 3Vs: Volume (a large amount of data), Velocity (data arriving at high speed) and Variety (data coming from heterogeneous sources). In the Big Data definition, "Big" means a dataset that grows so large that it becomes difficult to manage using existing data management concepts and tools. MapReduce plays a very significant role in the processing of Big Data: it is elastically scalable, efficient and fault tolerant for analysing large data sets. This paper gives a brief overview of Big Data and its related issues, emphasizes the role of MapReduce in Big Data processing, and highlights the features of MapReduce, in comparison with other design models, that make it a popular tool for processing large-scale data. Analysis of the performance factors of MapReduce shows that eliminating their inverse effects through optimization improves the performance of MapReduce.
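
    The abstract refers to the MapReduce programming model without showing it. Below is a minimal single-process sketch of the model on the classic word-count task; a real MapReduce framework runs the same two phases distributed across many nodes. This is an illustrative sketch, not the paper's code:

      from collections import defaultdict
      from itertools import chain

      def map_phase(document):
          # Map: emit one (word, 1) pair per word in the input split.
          return [(word, 1) for word in document.split()]

      def reduce_phase(pairs):
          # Shuffle: group intermediate values by key.
          groups = defaultdict(list)
          for key, value in pairs:
              groups[key].append(value)
          # Reduce: aggregate each group independently (hence parallelizable).
          return {key: sum(values) for key, values in groups.items()}

      docs = ["big data needs big systems", "map reduce processes big data"]
      print(reduce_phase(chain.from_iterable(map_phase(d) for d in docs)))
      # {'big': 3, 'data': 2, 'needs': 1, 'systems': 1, 'map': 1, 'reduce': 1, 'processes': 1}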

  • Service-Generated Big Data and Big Data-as-a-Service: An Overview

    Zibin Zheng ; Jieming Zhu ; Lyu, M.R.
    Big Data (BigData Congress), 2013 IEEE International Congress on

    DOI: 10.1109/BigData.Congress.2013.60
    Publication Year: 2013 , Page(s): 403 - 410
    Cited by:  Papers (3)

    IEEE Conference Publications

    With the prevalence of service computing and cloud computing, more and more services are emerging on the Internet, generating huge volumes of data, such as trace logs, QoS information, service relationships, etc. The overwhelming service-generated data have become too large and complex to be processed effectively by traditional approaches. How to store, manage, and create value from service-oriented big data has become an important research problem. On the other hand, with the increasingly large amounts of data, a single infrastructure providing common functionality for managing and analyzing different types of service-generated big data is urgently required. To address this challenge, this paper provides an overview of service-generated big data and Big Data-as-a-Service. First, three types of service-generated big data are exploited to enhance system performance. Then, Big Data-as-a-Service, comprising Big Data Infrastructure-as-a-Service, Big Data Platform-as-a-Service, and Big Data Analytics Software-as-a-Service, is employed to provide common big data related services (e.g., access to service-generated big data and to data analytics results) to users, to enhance efficiency and reduce cost.

  • Toward Scalable Systems for Big Data Analytics: A Technology Tutorial (Open Access)

    Han Hu ; Yonggang Wen ; Tat-Seng Chua ; Xuelong Li
    Access, IEEE

    Volume: 2
    DOI: 10.1109/ACCESS.2014.2332453
    Publication Year: 2014 , Page(s): 652 - 687
    Cited by:  Papers (3)

    IEEE Journals & Magazines

    Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics compared with traditional data. For instance, big data is commonly unstructured and requires more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and to instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from the research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.

  • Technical aspects and case study of big data based condition monitoring of power apparatuses

    Jinxin Huang ; Lin Niu ; Jie Zhan ; Xiaosheng Peng ; Junyang Bai ; Shijie Cheng
    Power and Energy Engineering Conference (APPEEC), 2014 IEEE PES Asia-Pacific

    DOI: 10.1109/APPEEC.2014.7066164
    Publication Year: 2014 , Page(s): 1 - 4

    IEEE Conference Publications

    This paper presents the key technologies of big data based condition monitoring of power apparatuses. Firstly, the characteristics of big data in general and of power-system big data are discussed. Secondly, the application prospects of big data based condition monitoring of power apparatuses are presented and the key technologies of such a system are discussed, in terms of big data analysis, management, processing and visualization technologies. Thirdly, big data based condition assessment techniques for power apparatuses are discussed, including fusion of signals collected from different sensors, historical trend analysis, and association analysis of combined equipment. Finally, to further introduce big data techniques, an integrated condition monitoring system for transformers, GIS and power cables is presented, including the system hardware structure and big data based condition assessment.

  • The opportunity and challenge of Big Data's application in distribution grids

    Xin Miao ; Dongxia Zhang
    Electricity Distribution (CICED), 2014 China International Conference on

    DOI: 10.1109/CICED.2014.6991847
    Publication Year: 2014 , Page(s): 962 - 964

    IEEE Conference Publications

    Big Data must be harnessed in order to meet its challenges, enhance the intelligence of the distribution grid, and better serve power users. Starting from the 4V characteristics of Big Data and the relative maturity of the six links of the power-supply industrial chain (planning, design, construction, operation, management & regulation of distribution grids, and equipment design and manufacturing), the needs for Big Data applications in distribution networks are analysed. Using the SWOT method, the double-edged-sword effect of Big Data on the distribution grid is analysed: it provides both opportunities and challenges. The benefits and opportunities are that Big Data brings a data view, changes thinking methods and tools, expands application scenarios, provides better service to society, and enhances value. At the same time, Big Data leads to challenges for the distribution grid: because of the security challenges of Big Data itself, the greater concentration of Big Data makes the safety challenges of the distribution grid more serious; Big Data also raises energy consumption challenges and threatens the privacy of the distribution grid and its users. The demand for Big Data applications across the distribution-grid industrial chain, from strong to weak, is: management & regulation of distribution grids, operation, equipment design and manufacturing, construction, design, and planning. Big Data sourced from the internal operations of power supply enterprises comprises three parts: physical grid operation, marketing services, and grid enterprise operation. Power-system technology innovation is shifting from a three-wheel-drive paradigm (experimental science, theoretical science, computational science) to a four-wheel-drive paradigm that adds data-intensive science (data exploration science). Big Data is still "exploding", and mastering this fourth paradigm, data-intensive science, requires redoubled effort. Unstructured data in distribution grids will grow rapidly, exceeding 50% of total data volume within the next five years.

  • Reducing the Search Space for Big Data Mining for Interesting Patterns from Uncertain Data

    Leung, C.K.-S. ; MacKinnon, R.K. ; Fan Jiang
    Big Data (BigData Congress), 2014 IEEE International Congress on

    DOI: 10.1109/BigData.Congress.2014.53
    Publication Year: 2014 , Page(s): 315 - 322
    Cited by:  Papers (1)

    IEEE Conference Publications

    Many existing data mining algorithms search for interesting patterns in transactional databases of precise data. However, there are situations in which data are uncertain. Items in each transaction of these probabilistic databases of uncertain data are usually associated with existential probabilities, which express the likelihood of those items being present in the transaction. Compared with mining precise data, the search space for mining uncertain data is much larger due to the presence of the existential probabilities. This problem is worsened as we move into the era of Big Data. Furthermore, in many real-life applications, users may be interested in only a tiny portion of this large search space. Without providing opportunities for users to express which patterns are interesting to them, many existing data mining algorithms return numerous patterns, of which only some are interesting. In this paper, we propose an algorithm that (i) allows users to express their interest in terms of constraints and (ii) uses the MapReduce model to mine uncertain Big Data for frequent patterns that satisfy the user-specified constraints. By exploiting properties of the constraints, our algorithm greatly reduces the search space for Big Data mining of uncertain data, and returns only those patterns that are interesting to the users for Big Data analytics.
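
    As an illustration of the existential probabilities the abstract describes, here is a minimal sketch (with made-up data, not the paper's algorithm) of the expected support of an itemset in a probabilistic database, assuming item independence:

      from math import prod

      # Hypothetical probabilistic database: per transaction, each item maps
      # to its existential probability (likelihood of being present).
      transactions = [
          {"a": 0.9, "b": 0.6, "c": 0.4},
          {"a": 0.5, "b": 0.8},
          {"b": 0.7, "c": 0.9},
      ]

      def expected_support(itemset, db):
          # Sum, over transactions containing all items, of the probability
          # that the items co-occur (product of probabilities, if independent).
          return sum(
              prod(t[i] for i in itemset)
              for t in db
              if all(i in t for i in itemset)
          )

      print(expected_support({"a", "b"}, transactions))  # 0.9*0.6 + 0.5*0.8 = 0.94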

  • Smart data — How you and I will exploit Big Data for personalized digital health and many other activities (Freely Available from IEEE)

    Sheth, A.
    Big Data (Big Data), 2014 IEEE International Conference on

    DOI: 10.1109/BigData.2014.7004204
    Publication Year: 2014 , Page(s): 2 - 3

    IEEE Conference Publications

  • Study of the key technologies of electric power big data and its application prospects in smart grid

    Jie Zhan ; Jinxin Huang ; Lin Niu ; Xiaosheng Peng ; Diyuan Deng ; Shijie Cheng
    Power and Energy Engineering Conference (APPEEC), 2014 IEEE PES Asia-Pacific

    DOI: 10.1109/APPEEC.2014.7066162
    Publication Year: 2014 , Page(s): 1 - 4

    IEEE Conference Publications

    The application of big data techniques in power systems will contribute to the sustainable development of power industry companies and the establishment of a strong smart grid. This article introduces a universal framework for an electric power big data platform, based on an analysis of the relationships among big data, cloud computing and the smart grid. Key techniques of electric power big data are then discussed in four aspects: big data management, big data analysis, big data processing and big data visualization. Finally, the article presents three typical application examples of electric power big data techniques: new and renewable energy integration, wind turbine condition monitoring and assessment, and integrated database backup for electric power enterprises.

  • Big Data Density Analytics Using Parallel Coordinate Visualization

    Jinson Zhang ; Mao Lin Huang ; Wen Bo Wang ; Liang Fu Lu ; Zhao-Peng Meng
    Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on

    DOI: 10.1109/CSE.2014.219
    Publication Year: 2014 , Page(s): 1115 - 1120

    IEEE Conference Publications

    Parallel coordinates are a popular tool for visualizing high-dimensional data and analyzing multivariate data. With the rapid growth of data size and complexity, data clutter in parallel coordinates is a major issue for Big Data visualization. This gives rise to three problems: (1) how to rearrange the parallel axes without losing data patterns, (2) how to shrink the data attributes on each axis without losing data trends, and (3) how to visualize structured and unstructured data patterns for Big Data analysis. In this paper, we introduce the 5Ws dimensions as the parallel axes and establish the 5Ws sending density and receiving density as additional axes for Big Data visualization. Our model not only shows Big Data attributes and patterns, but also reduces data overlapping by up to 80 percent without loss of data patterns. Experiments show that this new model can be used efficiently for Big Data analysis and visualization.
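
    For readers unfamiliar with the base technique, the sketch below draws a plain parallel-coordinates plot with pandas' built-in helper on made-up data; the paper's 5Ws axes and sending/receiving density axes are its own extension and are not reproduced here:

      import pandas as pd
      import matplotlib.pyplot as plt
      from pandas.plotting import parallel_coordinates

      # Hypothetical records: one row per observation, one column per dimension;
      # each dimension becomes one vertical axis, each row one polyline.
      df = pd.DataFrame({
          "what":  [3, 1, 4, 2],
          "where": [1, 2, 2, 3],
          "when":  [2, 3, 1, 1],
          "class": ["normal", "normal", "attack", "attack"],  # colours the lines
      })

      parallel_coordinates(df, class_column="class")
      plt.show()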

  • Next Big Thing in Big Data: The Security of the ICT Supply Chain

    Tianbo Lu ; Xiaobo Guo ; Bing Xu ; Lingling Zhao ; Yong Peng ; Hongyu Yang
    Social Computing (SocialCom), 2013 International Conference on

    DOI: 10.1109/SocialCom.2013.172
    Publication Year: 2013 , Page(s): 1066 - 1073

    IEEE Conference Publications

    In contemporary society, with supply chains becoming more and more complex, the data in supply chains is increasing in volume, variety and velocity. Big data has risen at the proper time to offer the nodes in supply chains advantages in solving previously difficult problems. For any big data project to succeed, it must first depend on high-quality data, not merely on quantity. Further, it will become increasingly important in many big data projects to add external data to the mix, and companies will eventually turn from only looking inward to also looking outward into the market, which means the use of big data must be broadened considerably. Hence data supply chains, both internal and external, become of prime importance. ICT (Information and Communication Technology) supply chain management is especially important, as supply chains link the world closely and the ICT supply chain is the base of all supply chains in today's world. Though many supply chain security initiatives have been developed and put into practice, most of them focus on the physical supply chain, which is concerned with transporting cargo; research on ICT supply chain security is still at a preliminary stage. The use of big data can promote the normal operation of the ICT supply chain, as it greatly improves data collecting and processing capacity; in turn, the ICT supply chain is a necessary carrier of big data, as it produces all the software, hardware and infrastructure for big data's collection, storage and application. The close relationship between big data and the ICT supply chain makes analysis of ICT supply chain security an effective way to research big data security. This paper first analyzes the security problems that the ICT supply chain is facing in information management, system integrity and cyberspace, and then introduces several famous international models of both the physical supply chain and the ICT supply chain. After that, the authors describe a case of communication equipment with big data in the ICT supply chain and propose a series of recommendations, across five dimensions, conducive to developing a secure big data supply chain.

  • Attribute Relationship Evaluation Methodology for Big Data Security

    Sung-Hwan Kim ; Nam-Uk Kim ; Tai-Myoung Chung
    IT Convergence and Security (ICITCS), 2013 International Conference on

    DOI: 10.1109/ICITCS.2013.6717808
    Publication Year: 2013 , Page(s): 1 - 4

    IEEE Conference Publications

    There has been increasing interest in big data and big data security with the development of network technology and cloud computing. However, big data is not an entirely new technology but an extension of data mining. In this paper, we describe the background of big data, data mining and big data features, and propose an attribute selection methodology for protecting the value of big data. Extracting valuable information is the main goal of analyzing big data, and it is this value that needs to be protected. Therefore, the relevance between attributes of a dataset is a very important element for big data analysis. We focus on two things. Firstly, attribute relevance in big data is a key element for extracting information; from this perspective, we study how to secure big data by protecting the valuable information inside it. Secondly, it is impossible to protect all big data and all of its attributes, so we consider big data as a single object with its own attributes and assume that an attribute with higher relevance is more important than the others.
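
    The abstract does not name a concrete relevance measure, so the sketch below uses mean absolute pairwise correlation as a hypothetical stand-in for ranking which attributes carry the most extractable (and hence most protection-worthy) information:

      import pandas as pd

      # Hypothetical dataset: rows are records, columns are attributes.
      df = pd.DataFrame({
          "income":   [30, 45, 52, 61, 75],
          "spending": [22, 30, 35, 44, 50],
          "age":      [25, 31, 38, 44, 52],
          "noise":    [ 7,  1,  9,  3,  5],
      })

      # Mean absolute correlation of each attribute with every other attribute
      # (subtract the self-correlation of 1 before averaging).
      corr = df.corr().abs()
      relevance = (corr.sum() - 1) / (len(df.columns) - 1)
      print(relevance.sort_values(ascending=False))
      # Higher-scoring attributes would be prioritized for protection.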

  • Big data analytics for drug discovery (Freely Available from IEEE)

    Chan, K.C.C.
    Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on

    DOI: 10.1109/BIBM.2013.6732448
    Publication Year: 2013 , Page(s): 1

    IEEE Conference Publications

  • A Holistic Framework for Big Scientific Data Management

    Kantere, V.
    Big Data (BigData Congress), 2014 IEEE International Congress on

    DOI: 10.1109/BigData.Congress.2014.39
    Publication Year: 2014 , Page(s): 220 - 226

    IEEE Conference Publications

    Most domains of science are facing an explosion in the data that they have to collect and process in order to conduct research. This is true both for scientific domains dealing with experimental data (e.g. biology, sociology, astronomy) and for those dealing with simulation data (e.g. seismology, physics). To maximize the potential outcome of scientific data analysis, the respective data management applications need to fulfil the following coarse tasks: fast on-demand data processing, and effective storage and consolidation of diverse data collections. These two tasks are in general hard to realize because of: (a) the big data size, (b) the diversity of data formats, (c) their conceptual dependencies, (d) dispersed data locations, and (e) the intensive and systematic nature of scientific queries. We present the characteristics of big scientific data collections and their necessities in terms of data management. Based on this discussion, we describe the structure of a framework for the processing and consolidation of heterogeneous scientific data collections. Such a framework aims to mediate between the user and a set of available data management technologies, such as relational DBMSs, key-value stores and column stores, in order to efficiently direct data management operations (insertions, updates) and especially requests (queries) to the appropriate data management application. The framework aims to distribute, dissect, and schedule data management actions, as well as to integrate results, in a way that reduces response time. This entails the accommodation of methods for selective parallelism and serialization depending on partial results and response times. It also entails the accommodation of methods for the gradual alteration of data formats and storage, e.g. moving semi-structured data or raw data in files into relational databases. Furthermore, we discuss the processing of scientific query bulks, or workflows, with the possibility to retrieve early partial results and calibrate query parameters.
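
    The mediation idea, directing each operation or query to the most suitable backing store, can be pictured as a simple dispatcher. The routing rules below are hypothetical illustrations, not the framework's actual logic:

      # Stand-in names for the three store families the abstract mentions.
      STORES = {"relational", "key_value", "column"}

      def route(request):
          # Point lookups by key suit a key-value store; analytical scans over
          # a few columns suit a column store; everything else goes to the RDBMS.
          if request.get("op") == "get" and "key" in request:
              return "key_value"
          if request.get("op") == "scan" and len(request.get("columns", [])) <= 3:
              return "column"
          return "relational"

      print(route({"op": "get", "key": "run-42"}))                 # key_value
      print(route({"op": "scan", "columns": ["t", "magnitude"]}))  # column
      print(route({"op": "join", "tables": ["runs", "samples"]}))  # relational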

  • Inconsistencies in big data

    Du Zhang
    Cognitive Informatics & Cognitive Computing (ICCI*CC), 2013 12th IEEE International Conference on

    DOI: 10.1109/ICCI-CC.2013.6622226
    Publication Year: 2013 , Page(s): 61 - 67
    Cited by:  Papers (3)

    IEEE Conference Publications

    We are faced with a torrent of data generated and captured in digital form as a result of advances in science, engineering and technology, and of various social, economic and human activities. This big data phenomenon ushers in a new era in which human endeavors and scientific pursuits will be aided not only by human capital and by physical and financial assets, but also by data assets. Research issues in big data and big data analysis are embedded in multi-dimensional scientific and technological spaces. In this paper, we first take a close look at the dimensions of big data and big data analysis, and then focus our attention on inconsistencies in big data and their impact on big data analysis. We offer a classification of four types of inconsistencies in big data and point out the utility of inconsistency-induced learning as a tool for big data analysis.

  • Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases

    Panahiazar, M. ; Taslimitehrani, V. ; Jadhav, A. ; Pathak, J.
    Big Data (Big Data), 2014 IEEE International Conference on

    DOI: 10.1109/BigData.2014.7004307
    Publication Year: 2014 , Page(s): 790 - 795

    IEEE Conference Publications

    In healthcare, big data tools and technologies have the potential to create significant value by improving outcomes while lowering costs for each individual patient. Diagnostic images, genetic test results and biometric information are increasingly generated and stored in electronic health records, presenting us with data that is by nature high in volume, variety and velocity, thereby necessitating novel ways to store, manage and process big data. This presents an urgent need to develop new, scalable and expandable big data infrastructure and analytical methods that can enable healthcare providers to access knowledge for the individual patient, yielding better decisions and outcomes. In this paper, we briefly discuss the nature of big data and the role of the semantic web and data analysis in generating “smart data”, which offers actionable information that supports better decisions for personalized medicine. In our view, the biggest challenge is to create a system that makes big data robust and smart for healthcare providers and patients, leading to more effective clinical decision-making, improved health outcomes and, ultimately, managed healthcare costs. We highlight some of the challenges in using big data and propose the need for a semantic data-driven environment to address them. We illustrate our vision with practical use cases, and discuss a path for empowering personalized medicine using big data and semantic web technology.

  • Big data dimensional analysis

    Gadepally, V. ; Kepner, J.
    High Performance Extreme Computing Conference (HPEC), 2014 IEEE

    DOI: 10.1109/HPEC.2014.7040944
    Publication Year: 2014 , Page(s): 1 - 6

    IEEE Conference Publications

    The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity and variety. One of the main challenges associated with big data variety is automatically understanding the underlying structures and patterns of the data. Such an understanding is required as a prerequisite to the application of advanced analytics to the data. Further, big data sets often contain anomalies and errors that are difficult to know a priori. Current approaches to understanding data structure are drawn from traditional database ontology design. These approaches can be effective, but often require too much human involvement to cope with the volume, velocity and variety of data encountered by big data systems. Dimensional Data Analysis (DDA) is a proposed technique that allows big data analysts to quickly understand the overall structure of a big dataset and determine anomalies. DDA exploits structures that exist in a wide class of data to quickly determine the nature of the data and its statistical anomalies. DDA leverages existing schemas that are employed in big data databases today. This paper presents DDA, applies it to a number of data sets, and measures its performance. The overhead of DDA is low, and it can be applied to existing big data systems without greatly impacting their computing requirements.

  • Integrating legacy system into big data solutions: Time to make the change

    Jha, S. ; Jha, M. ; O'Brien, L. ; Wells, M.
    Computer Science and Engineering (APWC on CSE), 2014 Asia-Pacific World Congress on

    DOI: 10.1109/APWCCSE.2014.7053872
    Publication Year: 2014 , Page(s): 1 - 10

    IEEE Conference Publications

    Storing, analyzing and accessing data is a growing problem for organizations. Competitive pressures and new regulations are requiring organizations to efficiently handle increasing volumes and varieties of data, but this doesn't come cheap. And as the demands of Big Data exceed the constraints of traditional relational databases, evaluating legacy infrastructure and assessing new technology has become a necessity for most organizations, not only to gain competitive advantage but also for compliance purposes. The challenge is how well the organization's legacy infrastructure integrates Big Data; one way or another, Big Data must be accommodated by legacy systems. Legacy systems contain significant and invaluable business logic that the organization cannot afford to throw away or replace: these systems are assets, and their encoded 'business logic' represents many years of coding, development, real-life experience, enhancements, modifications, debugging and so on. Most legacy systems were developed without the process models or data models now needed to support and integrate Big Data. To integrate Big Data into a legacy system, modernization of the legacy system is required. There are many approaches to the modernization of legacy systems, but none of them focuses on integrating Big Data. Legacy systems also hold valuable data, too important to be lost in the process of modernization. However, addressing the issues and scope related to incorporating Big Data with legacy systems allows mature legacy systems to become part of groundswell changes. Many areas of the integration of Big Data into legacy systems remain unaddressed: incorporating data from new sources, specifically "live" sources, into existing legacy systems is a technical challenge, and the sheer volume of Big Data can be daunting. Our paper presents the scope of integrating Big Data into the modernization of legacy systems.

  • Instructional Model for Building Effective Big Data Curricula for Online and Campus Education

    Demchenko, Y. ; Gruengard, E. ; Klous, S.
    Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on

    DOI: 10.1109/CloudCom.2014.162
    Publication Year: 2014 , Page(s): 935 - 941

    IEEE Conference Publications

    This paper presents current results and ongoing work to develop effective educational courses on Big Data (BD) and Data Intensive Science and Technologies (DIST) being done at the University of Amsterdam in cooperation with KPMG, and by Laureate Online Education (the online partner of the University of Liverpool). The paper introduces the main Big Data concepts: the multicomponent Big Data definition and the Big Data Architecture Framework, which provide the basis for defining the course structure and a Common Body of Knowledge for the Data Science and Big Data technology domains. The paper presents details of the approach, learning model, and course content for two courses, at Laureate Online Education/University of Liverpool and at the University of Amsterdam. The paper also provides background information about existing initiatives and activities related to information exchange and coordination on developing educational materials and programs on Big Data, Data Science, and Research Data Management.

  • Sharing best practices for the implementation of Big Data applications in government and science communities

    Aron, J.L. ; Niemann, B.
    Big Data (Big Data), 2014 IEEE International Conference on

    DOI: 10.1109/BigData.2014.7004469
    Publication Year: 2014 , Page(s): 8 - 10

    IEEE Conference Publications

    The Federal Big Data Working Group supports the Federal Big Data Initiative but is not endorsed by the Federal Government or its agencies. This working group uses meetups with onsite and virtual participation to share best practices for the implementation of Big Data applications in government and science communities. Decision-makers and the scientific community interact with data science in order to take advantage of the Big Data transformation of how information is used in science, decision support, data discovery and data publishing. The working group federates use cases, data publications, solutions and technologies. The range of topics is illustrated in a keynote and panel discussion at a recent Big Data conference and in a summary of recent working group meetups.

  • From Big Data to Big Projects: A Step-by-Step Roadmap

    Mousannif, H. ; Sabah, H. ; Douiji, Y. ; Sayad, Y.O.
    Future Internet of Things and Cloud (FiCloud), 2014 International Conference on

    DOI: 10.1109/FiCloud.2014.66
    Publication Year: 2014 , Page(s): 373 - 378

    IEEE Conference Publications

    While technologies to build and run big data projects have started to mature and proliferate over the last couple of years, exploiting the full potential of big data is still at a relatively early stage. In fact, building effective big data projects inside organizations is hindered by the lack of a clear data-driven and analytical roadmap for moving businesses and organizations from an opinion-operated era, where human skills are a necessity, to a data-driven and smart era, where big data analytics plays a major role in discovering unexpected insights in the oceans of data routinely generated or collected. This paper provides a solid and well-founded methodology for organizations to build big data projects and reap the most rewards from their data. It covers all aspects of big data project implementation, from data collection to final project evaluation. In each stage of the process, we introduce different sets of platforms and tools to assist IT professionals and managers in gaining a comprehensive understanding of the methods and technologies involved and in making the best use of them. We complete the picture by illustrating the process through several real-world big data project implementations.

  • Challenges for MapReduce in Big Data

    Grolinger, K. ; Hayes, M. ; Higashino, W.A. ; L'Heureux, A. ; Allison, D.S. ; Capretz, M.A.M.
    Services (SERVICES), 2014 IEEE World Congress on

    DOI: 10.1109/SERVICES.2014.41
    Publication Year: 2014 , Page(s): 182 - 189
    Cited by:  Papers (1)

    IEEE Conference Publications

    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. By identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.

  • Big data: Issues, challenges, tools and Good practices

    Katal, A. ; Wazid, M. ; Goudar, R.H.
    Contemporary Computing (IC3), 2013 Sixth International Conference on

    DOI: 10.1109/IC3.2013.6612229
    Publication Year: 2013 , Page(s): 404 - 409
    Cited by:  Papers (7)

    IEEE Conference Publications

    Big data is defined as large amounts of data that require new technologies and architectures to make it possible to extract value from them through capture and analysis. Because of the sheer size of the data, it becomes very difficult to perform effective analysis using existing traditional techniques. Big data, with its various properties such as volume, velocity, variety, variability, value and complexity, puts forward many challenges. Since big data is a recent, upcoming technology that can bring huge benefits to business organizations, it is necessary to bring to light the various challenges and issues in adopting and adapting to it. This paper introduces big data technology along with its importance in the modern world, and surveys existing projects that are effective and important in changing the concept of science into big science, and of society too. The various challenges and issues in adapting and accepting big data technology and its tools (Hadoop) are discussed in detail, along with the problems Hadoop is facing. The paper concludes with the good big data practices to be followed.

  • A layer based architecture for provenance in big data

    Agrawal, R. ; Imran, A. ; Seay, C. ; Walker, J.
    Big Data (Big Data), 2014 IEEE International Conference on

    DOI: 10.1109/BigData.2014.7004468
    Publication Year: 2014 , Page(s): 1 - 7

    IEEE Conference Publications

    Big data is a new technology wave that makes the world awash in data. Various organizations accumulate data that are difficult to exploit. Government databases, social media, healthcare databases, etc., are examples of big data. Big data covers absorbing and analyzing huge amounts of data that may have originated or been processed outside the organization. Data provenance can be defined as the origin and processing history of data. It carries significant information about a system and can be useful for debugging, auditing, measuring performance, and establishing trust in data. Data provenance in big data is a relatively unexplored topic: it is necessary to track the creation and collection process of the data appropriately to provide context and reproducibility. In this paper, we propose an intuitive layer-based architecture for data provenance and visualization. In addition, we show a complete workflow for tracking the provenance information of big data.
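
    Following the abstract's definition of provenance as the origin and processing history of data, here is a hypothetical minimal record structure (an illustration, not the paper's layered architecture):

      from dataclasses import dataclass, field
      from datetime import datetime, timezone

      @dataclass
      class ProvenanceRecord:
          origin: str                                 # where the data first came from
          steps: list = field(default_factory=list)   # ordered processing history

          def record(self, process, actor):
              # Append one processing step with a timestamp for auditability.
              self.steps.append(
                  (datetime.now(timezone.utc).isoformat(), process, actor)
              )

      prov = ProvenanceRecord(origin="social_media_stream")
      prov.record("deduplicate", actor="ingest-service")
      prov.record("anonymize", actor="privacy-filter")
      print(prov.origin, prov.steps)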
