Yogesh Simmhan
All Publications

    Key Author / Editor / Organization Title Year Journal / Conference / Book Pub Type Keywords
    jha:cep:2016 Jha, S.; Katz, D.S.; Luckow, A.; Rana, O.; Simmhan, Y. & Chue Hong, N.
    Introducing Distributed Dynamic Data-intensive (D3) Science: Understanding Applications and Infrastructure
    2016 Concurrency and Computation: Practice and Experience   article peer reviewed, iisc
    BibTeX:
    @article{jha:cep:2016,
      author = {Shantenu Jha and Daniel S. Katz and Andre Luckow and Omer Rana and Yogesh Simmhan and Neil Chue Hong},
      title = {Introducing Distributed Dynamic Data-intensive (D3) Science: Understanding Applications and Infrastructure},
      journal = {Concurrency and Computation: Practice and Experience},
      year = {2016},
      note = {To Appear},
      url = {https://github.com/radical-project/3DPAS}
    }
    					
    simmhan:cpe:2016 Simmhan, Y.; Ramakrishnan, L.; Antoniu, G. & Goble, C.
    Editorial: Cloud computing for data-driven science and engineering
    2016 Concurrency and Computation: Practice and Experience   article iisc, editorial
    BibTeX:
    @article{simmhan:cpe:2016,
      author = {Yogesh Simmhan and Lavanya Ramakrishnan and Gabriel Antoniu and Carole Goble},
      title = {Editorial: Cloud computing for data-driven science and engineering},
      journal = {Concurrency and Computation: Practice and Experience},
      year = {2016},
      url = {http://onlinelibrary.wiley.com/doi/10.1002/cpe.3668/full},
      doi = {http://doi.org/10.1002/cpe.3668}
    }
    					
    dindokar:ccgrid:2016 Dindokar, R. & Simmhan, Y.
    Elastic Partition Placement for Non-stationary Graph Algorithms
    2016 IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)   inproceedings goffish, peer reviewed, iisc, graph, cloud
    BibTeX:
    @inproceedings{dindokar:ccgrid:2016,
      author = {Ravikant Dindokar and Yogesh Simmhan},
      title = {Elastic Partition Placement for Non-stationary Graph Algorithms},
      booktitle = {IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)},
      year = {2016},
      note = {Short Paper [Core A]}
    }
    					
    jamadagni:ccgrid:2016 Jamadagni, N. & Simmhan, Y.
    GoDB: From Batch Processing to Distributed Querying over Property Graphs
    2016 IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)   inproceedings godb, goffish, peer reviewed, iisc, graph
    BibTeX:
    @inproceedings{jamadagni:ccgrid:2016,
      author = {Nitin Jamadagni and Yogesh Simmhan},
      title = {GoDB: From Batch Processing to Distributed Querying over Property Graphs},
      booktitle = {IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid)},
      year = {2016},
      note = {[Core A]}
    }
    					
    shukla:vldbw:2016 Shukla, A. & Simmhan, Y.
    Benchmarking Distributed Stream Processing Platforms for IoT Applications
    2016 TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC)   inproceedings iot, peer reviewed, iisc, stream, benchmark
    BibTeX:
    @inproceedings{shukla:vldbw:2016,
      author = {Anshu Shukla and Yogesh Simmhan},
      title = {Benchmarking Distributed Stream Processing Platforms for IoT Applications},
      booktitle = {TPC Technology Conference on Performance Evaluation \& Benchmarking (TPCTC)},
      year = {2016},
      note = {To Appear},
      url = {https://arxiv.org/abs/1606.07621}
    }
    					
    aluru:jpdc:2015 Aluru, S. & Simmhan, Y.
    Editorial: Scalable Systems for Big Data Management and Analytics
    2015 Journal of Parallel and Distributed Computing (JPDC)
    Vol. 79 and 80 , pp. 1-2  
    article editorial, iisc, big data
    BibTeX:
    @article{aluru:jpdc:2015,
      author = {Srinivas Aluru and Yogesh Simmhan},
      title = {Editorial: Scalable Systems for Big Data Management and Analytics},
      journal = {Journal of Parallel and Distributed Computing (JPDC)},
      year = {2015},
      volume = {79 and 80},
      pages = {1-2}
    }
    					
    Aman:tkde:2015 Aman, S.; Simmhan, Y. & Prasanna, V.
    Holistic Measures for Evaluating Prediction Models in Smart Grids
    2015 IEEE Transactions on Knowledge and Data Engineering (TKDE)
    Vol. 27 (2) , pp. 475-488  
    article usc, machine learning, smart grid, peer reviewed, iisc
    BibTeX:
    @article{Aman:tkde:2015,
      author = {Saima Aman and Yogesh Simmhan and Viktor Prasanna},
      title = {Holistic Measures for Evaluating Prediction Models in Smart Grids},
      journal = {IEEE Transactions on Knowledge and Data Engineering (TKDE)},
      year = {2015},
      volume = {27},
      number = {2},
      pages = {475-488},
      note = {[IF 2.476, CORE A]},
      doi = {http://doi.org/10.1109/TKDE.2014.2327022}
    }
    					
    kumbhare:tcc:2015 Kumbhare, A.G.; Simmhan, Y.; Frincu, M. & Prasanna, V.K.
    Reactive Resource Provisioning Heuristics for Dynamic Dataflows on Cloud Infrastructure
    2015 IEEE Transactions on Cloud Computing (TCC)
    Vol. 3 (2) , pp. 105-118  
    article peer reviewed, iisc, stream processing, cloud
    BibTeX:
    @article{kumbhare:tcc:2015,
      author = {Alok Gautam Kumbhare and Yogesh Simmhan and Marc Frincu and Viktor K. Prasanna},
      title = {Reactive Resource Provisioning Heuristics for Dynamic Dataflows on Cloud Infrastructure},
      journal = {IEEE Transactions on Cloud Computing (TCC)},
      year = {2015},
      volume = {3},
      number = {2},
      pages = {105-118},
      doi = {http://doi.org/10.1109/TCC.2015.2394316}
    }
    					
    mishra:iotn:2015 Misra, P.; Simmhan, Y. & Warrior, J.
    Towards a Practical Architecture for Internet of Things: An India-centric View
    2015 IEEE Internet of Things Newsletter , pp. 1-2   article iot, iisc, peer reviewed
    BibTeX:
    @article{mishra:iotn:2015,
      author = {Prasant Misra and Yogesh Simmhan and Jay Warrior},
      title = {Towards a Practical Architecture for Internet of Things: An India-centric View},
      journal = {IEEE Internet of Things Newsletter},
      year = {2015},
      pages = {1-2},
      url = {http://iot.ieee.org/newsletter/january-2015/towards-a-practical-architecture-for-internet-of-things-an-india-centric-view.html}
    }
    					
    aman:sgcomm:2015 Aman, S.; Frincu, M.; Chelmis, C.; Noor, M.; Simmhan, Y. & Prasanna, V.K.
    Prediction Models for Dynamic Demand Response: Requirements, Challenges, and Insights
    2015 IEEE International Conference on Smart Grid Communications (SmartGridComm) , pp. 1-6   inproceedings iisc, peer reviewed, smart grid, iot
    BibTeX:
    @inproceedings{aman:sgcomm:2015,
      author = {Saima Aman and Marc Frincu and Charalampos Chelmis and Muhammad Noor and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Prediction Models for Dynamic Demand Response: Requirements, Challenges, and Insights},
      booktitle = {IEEE International Conference on Smart Grid Communications (SmartGridComm)},
      year = {2015},
      pages = {1-6}
    }
    					
    dindokar:parlearning:2015 Dindokar, R.; Choudhury, N. & Simmhan, Y.
    Analysis of Subgraph-centric Distributed Shortest Path Algorithm
    2015 International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics (ParLearning) , pp. 1185-1190   inproceedings peer reviewed, iisc, graph processing
    BibTeX:
    @inproceedings{dindokar:parlearning:2015,
      author = {Ravikant Dindokar and Neel Choudhury and Yogesh Simmhan},
      title = {Analysis of Subgraph-centric Distributed Shortest Path Algorithm},
      booktitle = {International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics (ParLearning)},
      year = {2015},
      pages = {1185-1190}
    }
    					
    kumbhare:icdcs:2015 Kumbhare, A.; Frincu, M.; Simmhan, Y. & Prasanna, V.K.
    Fault-Tolerant and Elastic Streaming MapReduce with Decentralized Coordination
    2015 IEEE International Conference on Distributed Computing Systems (ICDCS) , pp. 328-338   inproceedings iisc, peer reviewed, mapreduce, stream processing
    BibTeX:
    @inproceedings{kumbhare:icdcs:2015,
      author = {Alok Kumbhare and Marc Frincu and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Fault-Tolerant and Elastic Streaming MapReduce with Decentralized Coordination},
      booktitle = {IEEE International Conference on Distributed Computing Systems (ICDCS)},
      year = {2015},
      pages = {328--338},
      note = {[Core A]}
    }
    					
    shukla:hipcw:2015 Shukla, A.; Sharma, T. & Simmhan, Y.
    Characterizing Distributed Stream Processing Systems for IoT Applications
    2015 Workshop on Architectural Support and Middleware for InfoSymbiotics/ Dynamic Data Driven Applications Systems (DDDAS), co-located with High Performance Computing Conference (HiPC) , pp. 1-1   inproceedings iisc, iot, stream processing, peer reviewed
    BibTeX:
    @inproceedings{shukla:hipcw:2015,
      author = {Anshu Shukla and Tarun Sharma and Yogesh Simmhan},
      title = {Characterizing Distributed Stream Processing Systems for IoT Applications},
      booktitle = {Workshop on Architectural Support and Middleware for InfoSymbiotics/ Dynamic Data Driven Applications Systems (DDDAS), co-located with High Performance Computing Conference (HiPC)},
      year = {2015},
      pages = {1-1},
      note = {Extended abstract}
    }
    					
    simmhan:ipdps:2015 Simmhan, Y.; Choudhury, N.; Wickramaarachchi, C.; Kumbhare, A.; Frincu, M.; Raghavendra, C. & Prasanna, V.
    Distributed Programming over Time-series Graphs
    2015 IEEE International Parallel & Distributed Processing Symposium (IPDPS) , pp. 809-818   inproceedings graph processing, timeseries, goffish, iisc, usc, peer reviewed
    BibTeX:
    @inproceedings{simmhan:ipdps:2015,
      author = {Yogesh Simmhan and Neel Choudhury and Charith Wickramaarachchi and Alok Kumbhare and Marc Frincu and Cauligi Raghavendra and Viktor Prasanna},
      title = {Distributed Programming over Time-series Graphs},
      booktitle = {IEEE International Parallel \& Distributed Processing Symposium (IPDPS)},
      year = {2015},
      pages = {809--818},
      note = {[Core A]}
    }
    					
    simmhan:wbdb:2015 Simmhan, Y.; Shukla, A. & Verma, A.
    Benchmarking Fast Data Platforms for the Aadhaar Biometric Database
    2015 (arxiv:1510.04160) Workshop on Big Data Benchmarking (WBDB) , pp. 1-9 School: CoRR   inproceedings iisc, stream processing, uidai, benchmark, peer reviewed
    BibTeX:
    @inproceedings{simmhan:wbdb:2015,
      author = {Yogesh Simmhan and Anshu Shukla and Arun Verma},
      title = {Benchmarking Fast Data Platforms for the Aadhaar Biometric Database},
      booktitle = {Workshop on Big Data Benchmarking (WBDB)},
      school = {CoRR},
      year = {2015},
      number = {arxiv:1510.04160},
      pages = {1-9},
      url = {http://arxiv.org/abs/1510.04160}
    }
    					
    choudhury:arxiv:2015 Choudhury, N.; Dindokar, R.; Dixit, A. & Simmhan, Y.
    Partitioning Strategies for Load Balancing Subgraph-centric Distributed Graph Processing
    2015 (arxiv:1508.04265) , pp. 1-12 School: CoRR   techreport iisc, goffish
    BibTeX:
    @techreport{choudhury:arxiv:2015,
      author = {Neel Choudhury and Ravikant Dindokar and Akshay Dixit and Yogesh Simmhan},
      title = {Partitioning Strategies for Load Balancing Subgraph-centric Distributed Graph Processing},
      school = {CoRR},
      year = {2015},
      number = {arxiv:1508.04265},
      pages = {1-12},
      url = {http://arxiv.org/abs/1508.04265}
    }
    					
    dindokar:arxiv:2015 Dindokar, R. & Simmhan, Y.
    Elastic Resource Allocation for Distributed Graph Processing Platforms
    2015 (arXiv:1510.03145) , pp. 1-11 School: CoRR   techreport iisc, goffish
    BibTeX:
    @techreport{dindokar:arxiv:2015,
      author = {Ravikant Dindokar and Yogesh Simmhan},
      title = {Elastic Resource Allocation for Distributed Graph Processing Platforms},
      school = {CoRR},
      year = {2015},
      number = {arXiv:1510.03145},
      pages = {1-11},
      note = {Under review}
    }
    					
    mishra:arxiv:2015 Misra, P.; Rajaraman, V.; Dhotrad, K.; Warrior, J. & Simmhan, Y.
    An Interoperable Realization of Smart Cities with Plug and Play based Device Management
    2015 (arXiv:1503.00923) , pp. 1-5 School: CoRR   techreport iisc, rbccps, iot
    BibTeX:
    @techreport{mishra:arxiv:2015,
      author = {Prasant Misra and Vasanth Rajaraman and Kumaresh Dhotrad and Jay Warrior and Yogesh Simmhan},
      title = {An Interoperable Realization of Smart Cities with Plug and Play based Device Management},
      school = {CoRR},
      year = {2015},
      number = {arXiv:1503.00923},
      pages = {1-5},
      url = {http://arxiv.org/abs/1503.00923}
    }
    					
    mishra:arxiv:2015b Misra, P.; Simmhan, Y. & Warrior, J.
    Towards a Practical Architecture for the Next Generation Internet of Things
    2015 (arXiv:1502.00797) , pp. 1-3 School: CoRR   techreport iisc, rbccps, iot
    BibTeX:
    @techreport{mishra:arxiv:2015b,
      author = {Prasant Misra and Yogesh Simmhan and Jay Warrior},
      title = {Towards a Practical Architecture for the Next Generation Internet of Things},
      school = {CoRR},
      year = {2015},
      number = {arXiv:1502.00797},
      pages = {1-3},
      url = {http://arxiv.org/abs/1502.00797}
    }
    					
    simmhan:arxiv:2015 Simmhan, Y.; Shukla, A. & Verma, A.
    Benchmarking Fast Data Platforms for the Aadhaar Biometric Database
    2015 (arxiv:1510.04160) , pp. 1-9 School: CoRR   techreport iisc, stream processing, uidai
    BibTeX:
    @techreport{simmhan:arxiv:2015,
      author = {Yogesh Simmhan and Anshu Shukla and Arun Verma},
      title = {Benchmarking Fast Data Platforms for the Aadhaar Biometric Database},
      school = {CoRR},
      year = {2015},
      number = {arxiv:1510.04160},
      pages = {1-9},
      url = {http://arxiv.org/abs/1510.04160}
    }
    					
    badam:comad:2014 Badam, N.C. & Simmhan, Y.
    Subgraph Rank: PageRank for Subgraph-Centric Distributed Graph Processing
    2014 International Conference on Management of Data (COMAD)   inproceedings iisc, graph, goffish, algorithm, peer reviewed
    BibTeX:
    @inproceedings{badam:comad:2014,
      author = {Nitin Chandra Badam and Yogesh Simmhan},
      title = {Subgraph Rank: PageRank for Subgraph-Centric Distributed Graph Processing},
      booktitle = {International Conference on Management of Data (COMAD)},
      year = {2014},
      note = {[Core A]}
    }
    					
    choudhury:hipcsrs:2014 Choudhury, N. & Simmhan, Y.
    Towards Scalable, Sequentially-Dependent Algorithms on Time-Series Graphs
    2014 HiPC Student Research Symposium   inproceedings graph processing, timeseries, goffish, algorithm, iisc, short paper
    BibTeX:
    @inproceedings{choudhury:hipcsrs:2014,
      author = {Neel Choudhury and Yogesh Simmhan},
      title = {Towards Scalable, Sequentially-Dependent Algorithms on Time-Series Graphs},
      booktitle = {HiPC Student Research Symposium},
      year = {2014}
    }
    					
    chu:ipdps:2014 Chu, H.-Y. & Simmhan, Y.
    Cost-efficient and Resilient Job Life-cycle Management on Hybrid Clouds
    2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS)   inproceedings usc, cloud, peer reviewed, iisc
    BibTeX:
    @inproceedings{chu:ipdps:2014,
      author = {Hsuan-Yi Chu and Yogesh Simmhan},
      title = {Cost-efficient and Resilient Job Life-cycle Management on Hybrid Clouds},
      booktitle = {IEEE International Parallel \& Distributed Processing Symposium (IPDPS)},
      year = {2014},
      note = {[Core A]}
    }
    					
    govindarajan:comad:2014 Govindarajan, N.; Simmhan, Y.; Jamadagni, N. & Misra, P.
    Event Processing across Edge and the Cloud for Internet of Things Applications
    2014 International Conference on Management of Data (COMAD)   inproceedings iisc, event processing, cep, iot, peer reviewed, poster
    BibTeX:
    @inproceedings{govindarajan:comad:2014,
      author = {Nithyashri Govindarajan and Yogesh Simmhan and Nitin Jamadagni and Prasant Misra},
      title = {Event Processing across Edge and the Cloud for Internet of Things Applications},
      booktitle = {International Conference on Management of Data (COMAD)},
      year = {2014},
      note = {Short Paper [Core B]}
    }
    					
    kumbhare:ccgrid:2014 Kumbhare, A.; Simmhan, Y. & Prasanna, V.K.
    PLAStiCC: Predictive Look-Ahead Scheduling for Continuous dataflows on Clouds
    2014 IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)   inproceedings continuous dataflow, workflow, floe, cloud, iisc, usc, peer reviewed
    BibTeX:
    @inproceedings{kumbhare:ccgrid:2014,
      author = {Alok Kumbhare and Yogesh Simmhan and Viktor K. Prasanna},
      title = {PLAStiCC: Predictive Look-Ahead Scheduling for Continuous dataflows on Clouds},
      booktitle = {IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)},
      year = {2014},
      note = {[Core A]}
    }
    					
    kushwaha:ccem:2014 Kushwaha, V. & Simmhan, Y.
    An Analysis of Spot-Priced Clouds for Practical Job Scheduling
    2014 Cloud Computing for Emerging Markets (CCEM)   inproceedings iisc, cloud, spot, peer reviewed
    BibTeX:
    @inproceedings{kushwaha:ccem:2014,
      author = {Vedsar Kushwaha and Yogesh Simmhan},
      title = {An Analysis of Spot-Priced Clouds for Practical Job Scheduling},
      booktitle = {Cloud Computing for Emerging Markets (CCEM)},
      year = {2014}
    }
    					
    sharma:hipcsrs:2014 Sharma, T. & Simmhan, Y.
    Online Update Strategies for Distributed Stream Processing in Apache Storm
    2014 HiPC Student Research Symposium   inproceedings stream processing, storm, iisc, short paper
    BibTeX:
    @inproceedings{sharma:hipcsrs:2014,
      author = {Tarun Sharma and Yogesh Simmhan},
      title = {Online Update Strategies for Distributed Stream Processing in Apache Storm},
      booktitle = {HiPC Student Research Symposium},
      year = {2014}
    }
    					
    simmhan:europar:2014 Simmhan, Y.; Kumbhare, A.; Wickramaarachchi, C.; Nagarkar, S.; Ravi, S.; Raghavendra, C. & Prasanna, V.
    GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
    2014 International European Conference on Parallel Processing (EuroPar)   inproceedings graphs, goffish, cluster, usc, peer reviewed, iisc
    BibTeX:
    @inproceedings{simmhan:europar:2014,
      author = {Yogesh Simmhan and Alok Kumbhare and Charith Wickramaarachchi and Soonil Nagarkar and Santosh Ravi and Cauligi Raghavendra and Viktor Prasanna},
      title = {GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics},
      booktitle = {International European Conference on Parallel Processing (EuroPar)},
      year = {2014},
      note = {[Core A]}
    }
    					
    misra:arxiv:2014 Misra, P.; Simmhan, Y. & Warrior, J.
    Towards a Practical Architecture for India Centric Internet of Things
    2014 (arXiv:1407.0434) School: CoRR   techreport iot, iisc, rbccps
    BibTeX:
    @techreport{misra:arxiv:2014,
      author = {Prasant Misra and Yogesh Simmhan and Jay Warrior},
      title = {Towards a Practical Architecture for India Centric Internet of Things},
      school = {CoRR},
      year = {2014},
      number = {arXiv:1407.0434},
      note = {Appears in IoT Newsletter},
      url = {http://arxiv.org/abs/1407.0434}
    }
    					
    simmhan:arxiv:2014a Simmhan, Y.; Wickramaarachchi, C.; Kumbhare, A.G.; Frincu, M.; Nagarkar, S.; Ravi, S.; Raghavendra, C.S. & Prasanna, V.K.
    Scalable Analytics over Distributed Time-series Graphs using GoFFish
    2014 (arXiv:1406.5975) School: CoRR   techreport iisc, graphs, goffish, timeseries, cloud
    BibTeX:
    @techreport{simmhan:arxiv:2014a,
      author = {Yogesh Simmhan and Charith Wickramaarachchi and Alok Gautam Kumbhare and Marc Frincu and Soonil Nagarkar and Santosh Ravi and Cauligi S. Raghavendra and Viktor K. Prasanna},
      title = {Scalable Analytics over Distributed Time-series Graphs using GoFFish},
      school = {CoRR},
      year = {2014},
      number = {arXiv:1406.5975},
      note = {Modified version appears in IPDPS 2015},
      url = {http://arxiv.org/abs/1406.5975}
    }
    					
    simmhan:arxiv:2014b Simmhan, Y. & Kumbhare, A.G.
    Floe: A Continuous Dataflow Framework for Dynamic Cloud Applications
    2014 (arXiv:1406.5977) School: CoRR   techreport iisc, streaming, dataflow, cloud, floe
    BibTeX:
    @techreport{simmhan:arxiv:2014b,
      author = {Yogesh Simmhan and Alok Gautam Kumbhare},
      title = {Floe: A Continuous Dataflow Framework for Dynamic Cloud Applications},
      school = {CoRR},
      year = {2014},
      number = {arXiv:1406.5977},
      url = {http://arxiv.org/abs/1406.5977}
    }
    					
    Aman:comm:2013 Aman, S.; Simmhan, Y. & Prasanna, V.K.
    Energy Management Systems: State of the Art and Emerging Trends
    2013 IEEE Communications Magazine
    Vol. 51 (1) , pp. 114-119  
    article smart grid, peer reviewed, usc
    BibTeX:
    @article{Aman:comm:2013,
      author = {Saima Aman and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Energy Management Systems: State of the Art and Emerging Trends},
      journal = {IEEE Communications Magazine},
      publisher = {IEEE},
      year = {2013},
      volume = {51},
      number = {1},
      pages = {114--119},
      note = {[IF 3.785]},
      doi = {http://doi.org/10.1109/MCOM.2013.6400447}
    }
    					
    simmhan:cise:2013 Simmhan, Y.; Aman, S.; Kumbhare, A.; Liu, R.; Stevens, S.; Zhou, Q. & Prasanna, V.
    Cloud-based Software Platform for Data-Driven Smart Grid Management
    2013 IEEE/AIP Computing in Science and Engineering
    Vol. July/August , pp. 1-11  
    article usc, smart grid, cloud, peer reviewed
    BibTeX:
    @article{simmhan:cise:2013,
      author = {Yogesh Simmhan and Saima Aman and Alok Kumbhare and Rongyang Liu and Sam Stevens and Qunzhi Zhou and Viktor Prasanna},
      title = {Cloud-based Software Platform for Data-Driven Smart Grid Management},
      journal = {IEEE/AIP Computing in Science and Engineering},
      publisher = {IEEE and AIP},
      year = {2013},
      volume = {July/August},
      pages = {1-11},
      note = {[IF 1.422, CORE C]},
      url = {http://ceng.usc.edu/~simmhan/pubs/simmhan-cise-2013.pdf}
    }
    					
    kumbhare:sc:2013 Kumbhare, A.; Simmhan, Y. & Prasanna, V.
    Exploiting Cloud Elasticity to Enhance the Value of Dynamic, Continuous Dataflows
    2013 IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)   inproceedings usc, cloud, workflow, continuous dataflow, peer reviewed
    BibTeX:
    @inproceedings{kumbhare:sc:2013,
      author = {Alok Kumbhare and Yogesh Simmhan and Viktor Prasanna},
      title = {Exploiting Cloud Elasticity to Enhance the Value of Dynamic, Continuous Dataflows},
      booktitle = {IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)},
      year = {2013},
      note = {[Core A]}
    }
    					
    redekopp:ipdps:2013 Redekopp, M.; Simmhan, Y. & Prasanna, V.K.
    Optimizations and Analysis of BSP Graph Processing Models on Public Clouds
    2013 International Parallel & Distributed Processing Symposium (IPDPS) , pp. 1-12   inproceedings usc, cloud, graphs, azure, peer reviewed
    BibTeX:
    @inproceedings{redekopp:ipdps:2013,
      author = {Mark Redekopp and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Optimizations and Analysis of BSP Graph Processing Models on Public Clouds},
      booktitle = {International Parallel \& Distributed Processing Symposium (IPDPS)},
      publisher = {IEEE},
      year = {2013},
      pages = {1--12},
      note = {[CORE A]},
      url = {http://ceng.usc.edu/~simmhan/pubs/redekopp-ipdps-2013.pdf}
    }
    					
    simmhan:smartcities:2013 Simmhan, Y. & Noor, M.U.
    Scalable Prediction of Energy Consumption using Incremental Time Series Clustering
    2013 Workshop on Big Data and Smarter Cities   inproceedings smart grid, analytics, usc, peer reviewed
    BibTeX:
    @inproceedings{simmhan:smartcities:2013,
      author = {Yogesh Simmhan and Muhammad Usman Noor},
      title = {Scalable Prediction of Energy Consumption using Incremental Time Series Clustering},
      booktitle = {Workshop on Big Data and Smarter Cities},
      year = {2013}
    }
    					
    Wickramaarachchi:escience:2013 Wickramaarachchi, C. & Simmhan, Y.
    Continuous Dataflow Update Strategies for Mission-Critical Applications
    2013 IEEE International Conference on eScience (eScience)   inproceedings usc, cloud, workflow, continuous dataflow, peer reviewed
    BibTeX:
    @inproceedings{Wickramaarachchi:escience:2013,
      author = {Charith Wickramaarachchi and Yogesh Simmhan},
      title = {Continuous Dataflow Update Strategies for Mission-Critical Applications},
      booktitle = {IEEE International Conference on eScience (eScience)},
      year = {2013},
      note = {[CORE A]},
      url = {http://ceng.usc.edu/~simmhan/pubs/wickramaarachchi-escience-2013.pdf}
    }
    					
    zhou:bigdata:2013 Zhou, Q.; Simmhan, Y. & Prasanna, V.
    Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing
    2013 IEEE International Conference on Big Data (BigData)   inproceedings smart grid, cep, usc, peer reviewed, short
    BibTeX:
    @inproceedings{zhou:bigdata:2013,
      author = {Qunzhi Zhou and Yogesh Simmhan and Viktor Prasanna},
      title = {Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing},
      booktitle = {IEEE International Conference on Big Data (BigData)},
      year = {2013}
    }
    					
    chu:usctr:2013 Chu, H.-Y. & Simmhan, Y.
    Resource Allocation Strategies on Hybrid Cloud for Resilient Jobs
    2013 School: University of Southern California   techreport usc, cloud
    BibTeX:
    @techreport{chu:usctr:2013,
      author = {Hsuan-Yi Chu and Yogesh Simmhan},
      title = {Resource Allocation Strategies on Hybrid Cloud for Resilient Jobs},
      school = {University of Southern California},
      year = {2013},
      note = {Appeared in IPDPS 2014},
      url = {http://ceng.usc.edu/~simmhan/pubs/chu-usctr-2013.pdf}
    }
    					
    simmhan:usctr:2013a Simmhan, Y.; Kumbhare, A.; Wickramaarachchi, C.; Ma, N.; Nagarkar, S.; Ravi, S.; Raghavendra, C. & Prasanna, V.
    GoFFish: A Framework for Distributed Analytics Over Timeseries Graphs
    2013 (13-936) School: University of Southern California, Computer Science Department   techreport cloud, graph, goffish, usc
    BibTeX:
    @techreport{simmhan:usctr:2013a,
      author = {Yogesh Simmhan and Alok Kumbhare and Charith Wickramaarachchi and Nam Ma and Soonil Nagarkar and Santosh Ravi and Cauligi Raghavendra and Viktor Prasanna},
      title = {GoFFish: A Framework for Distributed Analytics Over Timeseries Graphs},
      school = {University of Southern California, Computer Science Department},
      year = {2013},
      number = {13-936},
      url = {http://www.cs.usc.edu/assets/007/87995.pdf}
    }
    					
    swenson:arxiv:2013 Swenson, S.; Simmhan, Y.; Prasanna, V.; Parashar, M.; Riedy, J.; Bader, D. & Vuduc, R.
    Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms
    2013 (arXiv:1309.1828) School: CoRR   techreport usc, nsf, cloud
    BibTeX:
    @techreport{swenson:arxiv:2013,
      author = {Shel Swenson and Yogesh Simmhan and Viktor Prasanna and Manish Parashar and Jason Riedy and David Bader and Richard Vuduc},
      title = {Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms},
      school = {CoRR},
      year = {2013},
      number = {arXiv:1309.1828},
      url = {http://arxiv.org/abs/1309.1828}
    }
    					
    Kumbhare:cloud:2012 Kumbhare, A.; Simmhan, Y. & Prasanna, V.
    Cryptonite: A Secure and Performant Data Repository on Public Clouds
    2012 International Cloud Computing Conference (CLOUD) , pp. 510-517   inproceedings usc, smart grid, security, data privacy, cloud, azure, peer reviewed
    BibTeX:
    @inproceedings{Kumbhare:cloud:2012,
      author = {Alok Kumbhare and Yogesh Simmhan and Viktor Prasanna},
      title = {Cryptonite: A Secure and Performant Data Repository on Public Clouds},
      booktitle = {International Cloud Computing Conference (CLOUD)},
      year = {2012},
      pages = {510--517},
      note = {[CORE B]},
      url = {https://ganges.usc.edu/svn/pg/pubs/preprint/kumbhare-cloud-2012.pdf}
    }
    					
    Simmhan:cloudfutures:2012 Simmhan, Y.; Den, L.; Kumbhare, A.; Redekopp, M. & Prasanna, V. Microsoft Research
    Scalable, Secure Analysis of Social Sciences Data on the Azure Platform
    2012 Cloud Futures Workshop   inproceedings azure, cloud, social informatics, graph, security, usc
    BibTeX:
    @inproceedings{Simmhan:cloudfutures:2012,
      author = {Yogesh Simmhan and Litao Den and Alok Kumbhare and Mark Redekopp and Viktor Prasanna},
      title = {Scalable, Secure Analysis of Social Sciences Data on the Azure Platform},
      booktitle = {Cloud Futures Workshop},
      year = {2012},
      url = {http://ceng.usc.edu/~simmhan/pubs/simmhan-cloudfutures-2012.pdf}
    }
    					
    Simmhan:scale:2012 Simmhan, Y.; Agarwal, V.; Aman, S.; Kumbhare, A.; Natarajan, S.; Rajguru, N.; Robinson, I.; Stevens, S.; Yin, W.; Zhou, Q. & Prasanna, V.
    Adaptive Energy Forecasting and Information Diffusion for Smart Power Grids
    2012 IEEE International Scalable Computing Challenge (SCALE) , pp. 1-4   inproceedings hadoop, openplanet, floe, workflow, information integration, smart grid, peer reviewed, usc, short
    BibTeX:
    @inproceedings{Simmhan:scale:2012,
      author = {Yogesh Simmhan and Vaibhav Agarwal and Saima Aman and Alok Kumbhare and Sreedhar Natarajan and Nikhil Rajguru and Ian Robinson and Samuel Stevens and Wei Yin and Qunzhi Zhou and Viktor Prasanna},
      title = {Adaptive Energy Forecasting and Information Diffusion for Smart Power Grids},
      booktitle = {IEEE International Scalable Computing Challenge (SCALE)},
      year = {2012},
      pages = {1--4},
      note = {First Prize},
      url = {http://ceng.usc.edu/~simmhan/pubs/simmhan-scale-2012.pdf}
    }
    					
    Yin:mapreduce:2012 Yin, W.; Simmhan, Y. & Prasanna, V.
    Scalable Regression Tree Learning on Hadoop using OpenPlanet
    2012 International Workshop on MapReduce and its Applications (MAPREDUCE) , pp. 57-64   inproceedings cloud, machine learning, map reduce, hadoop, smart grid, peer reviewed, usc
    Abstract: As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. The regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited for addressing such data-intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
    BibTeX:
    @inproceedings{Yin:mapreduce:2012,
      author = {Wei Yin and Yogesh Simmhan and Viktor Prasanna},
      title = {Scalable Regression Tree Learning on Hadoop using OpenPlanet},
      booktitle = {International Workshop on MapReduce and its Applications (MAPREDUCE)},
      year = {2012},
      pages = {57--64},
      url = {http://ceng.usc.edu/ simmhan/pubs/yin-mapreduce-2012.pdf}
    }
    					
    Zhao:ipaw:2012 Zhao, J.; Simmhan, Y. & Prasanna, V.
    Presenting Apropos Provenance for Situation Awareness and Forensics
    2012
    Vol. 7525 International Provenance and Annotation Workshop , pp. 250-253  
    inproceedings provenance, smart grid, usc, peer reviewed, short
    BibTeX:
    @inproceedings{Zhao:ipaw:2012,
      author = {Jing Zhao and Yogesh Simmhan and Viktor Prasanna},
      title = {Presenting Apropos Provenance for Situation Awareness and Forensics},
      booktitle = {International Provenance and Annotation Workshop},
      publisher = {Springer},
      year = {2012},
      volume = {7525},
      pages = {250--253},
      note = {Poster},
      url = {http://dx.doi.org/10.1007/978-3-642-34222-6_30},
      doi = {http://doi.org/10.1007/978-3-642-34222-6_30}
    }
    					
    Zhou:iswc:2012 Zhou, Q.; Simmhan, Y. & Prasanna, V.K.
    Incorporating Semantic Knowledge into Stream Processing for Smart Grid Applications
    2012 International Semantic Web Conference (ISWC) , pp. 1-16   inproceedings peer reviewed, smart grid, cep, usc
    BibTeX:
    @inproceedings{Zhou:iswc:2012,
      author = {Qunzhi Zhou and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Incorporating Semantic Knowledge into Stream Processing for Smart Grid Applications},
      booktitle = {International Semantic Web Conference (ISWC)},
      year = {2012},
      pages = {1--16},
      note = {[CORE A]},
      url = {http://iswc2012.semanticweb.org/sites/default/files/76500254.pdf},
      doi = {http://doi.org/10.1007/978-3-642-35173-0_17}
    }
    					
    Zhou:itng:2012 Zhou, Q.; Natarajan, S.; Simmhan, Y. & Prasanna, V.
    Semantic Information Modeling for Emerging Applications in Smart Grid
    2012 International Conference on Information Technology: New Generations (ITNG) , pp. 775-782   inproceedings usc, smart grid, semantic, information integration, peer reviewed
    Abstract: Smart Grid modernizes the power grid by integrating digital and information technologies. Millions of smart meters, intelligent appliances and communication infrastructures are under deployment allowing advanced IT applications to be developed to protect and optimize power grid operations. Demand response (DR) is one such emerging application to optimize electricity demand by curtailing/shifting power load when peak load occurs. Existing DR approaches are mostly based on static plans such as pricing policies and load shedding schedules. However, improvements to power management applications rely on data emanating from existing and new information sources with the growth of the Smart Grid information space. In particular, dynamic DR algorithms may depend on information from smart meters that report interval-based power consumption measurement, HVAC systems that monitor building heat and humidity, and even weather forecast services. In order for emerging Smart Grid applications to take advantage of the diverse data influx, extensible information integration is required. In this paper, we develop an integrated Smart Grid information model using Semantic Web techniques and present case studies of using semantic information for dynamic DR. We show the semantic model facilitates information integration and knowledge representation for developing the next generation Smart Grid applications.
    BibTeX:
    @inproceedings{Zhou:itng:2012,
      author = {Qunzhi Zhou and Sreedhar Natarajan and Yogesh Simmhan and Viktor Prasanna},
      title = {Semantic Information Modeling for Emerging Applications in Smart Grid},
      booktitle = {International Conference on Information Technology: New Generations (ITNG)},
      year = {2012},
      pages = {775--782},
      url = {http://dx.doi.org/10.1109/ITNG.2012.150},
      doi = {http://doi.org/10.1109/ITNG.2012.150}
    }
    					
    Simmhan:sciencecloud:2012 Simmhan, Y.; Antoniu, G.; Goble, C. & Ramakrishnan, L. Simmhan, Y.; Antoniu, G.; Goble, C. & Ramakrishnan, L. (Eds.)
    Proceedings of the 3rd International Workshop on Scientific Cloud Computing (ScienceCloud)
    2012   proceedings editorial, usc
    BibTeX:
    @proceedings{Simmhan:sciencecloud:2012,
      author = {Yogesh Simmhan and Gabriel Antoniu and Carole Goble and Lavanya Ramakrishnan},
      title = {Proceedings of the 3rd International Workshop on Scientific Cloud Computing (ScienceCloud)},
      publisher = {ACM},
      year = {2012}
    }
    					
    Zhou:usctr:2012 Zhou, Q.; Simmhan, Y. & Prasanna, V.
    SCEPter: Semantic Complex Event Processing over End-to-End Data Flows
    2012 (12-926) Institution: Computer Science Department, University of Southern California   techreport usc, smart grid, cep
    BibTeX:
    @techreport{Zhou:usctr:2012,
      author = {Qunzhi Zhou and Yogesh Simmhan and Viktor Prasanna},
      title = {SCEPter: Semantic Complex Event Processing over End-to-End Data Flows},
      institution = {Computer Science Department, University of Southern California},
      year = {2012},
      number = {12-926},
      url = {http://www.cs.usc.edu/research/12-926.pdf}
    }
    					
    Zhou:usctrddr:2012 Zhou, Q.; Simmhan, Y. & Prasanna, V.K.
    On Using Semantic Complex Event Processing for Dynamic Demand Response Optimization
    2012 Institution: USC Computer Science Department   techreport smart grid, complex event processing, usc
    BibTeX:
    @techreport{Zhou:usctrddr:2012,
      author = {Qunzhi Zhou and Yogesh Simmhan and Viktor K. Prasanna},
      title = {On Using Semantic Complex Event Processing for Dynamic Demand Response Optimization},
      institution = {USC Computer Science Department},
      year = {2012}
    }
    					
    Chebotko:ijca:2011 Chebotko, A.; Simmhan, Y. & Missier, P.
    Guest Editorial: Scientific Workflows, Provenance and Their Applications
    2011 International Journal of Computers and Their Applications (IJCA)
    Vol. 18 (3) , pp. 130-132  
    article usc, provenance, workflows, special issue, editorial
    BibTeX:
    @article{Chebotko:ijca:2011,
      author = {Artem Chebotko and Yogesh Simmhan and Paolo Missier},
      title = {Guest Editorial: Scientific Workflows, Provenance and Their Applications},
      journal = {International Journal of Computers and Their Applications (IJCA)},
      publisher = {ISCA},
      year = {2011},
      volume = {18},
      number = {3},
      pages = {130--132},
      url = {http://ceng.usc.edu/ simmhan/pubs/chebotko-ijca-2011.pdf}
    }
    					
    Moreau:fgcs:2011 Moreau, L.; Clifford, B.; Freire, J.; Futrelle, J.; Gil, Y.; Groth, P.; Kwasnikowska, N.; Miles, S.; Missier, P.; Myers, J.; Plale, B.; Simmhan, Y.; Stephan, E. & Van den Bussche, J. Simmhan, Y.; Groth, P. & Moreau, L. (Eds.)
    The Open Provenance Model core specification (v1.1)
    2011 Future Generation Computer Systems (FGCS)
    Vol. 27 , pp. 743-756  
    article msr, provenance, opm, representation, inter-operability, peer reviewed
    Abstract: The Open Provenance Model is a model of provenance that is designed to meet the following requirements: (1) Allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) Allow developers to build and share tools that operate on such a provenance model. (3) Define provenance in a precise, technology-agnostic manner. (4) Support a digital representation of provenance for any “thing”, whether produced by computer systems or not. (5) Allow multiple levels of description to coexist. (6) Define a core set of rules that identify the valid inferences that can be made on provenance representation. This document contains the specification of the Open Provenance Model (v1.1) resulting from a community effort to achieve inter-operability in the Provenance Challenge series.
    BibTeX:
    @article{Moreau:fgcs:2011,
      author = {Luc Moreau and Ben Clifford and Juliana Freire and Joe Futrelle and Yolanda Gil and Paul Groth and Natalia Kwasnikowska and Simon Miles and Paolo Missier and Jim Myers and Beth Plale and Yogesh Simmhan and Eric Stephan and Jan {Van den Bussche}},
      title = {The Open Provenance Model core specification (v1.1)},
      journal = {Future Generation Computer Systems (FGCS)},
      publisher = {Elsevier},
      year = {2011},
      volume = {27},
      pages = {743--756},
      note = {[IF 2.43, CORE A]},
      url = {http://ceng.usc.edu/ simmhan/pubs/moreau-fgcs-2011.pdf},
      doi = {http://doi.org/10.1016/j.future.2010.07.005}
    }
    					
    Simmhan:fgcs:2011 Simmhan, Y. & Barga, R. Simmhan, Y.; Groth, P. & Moreau, L. (Eds.)
    Analysis of approaches for supporting the Open Provenance Model: A case study of the Trident workflow workbench
    2011 Future Generation Computer Systems (FGCS)
    Vol. 27 , pp. 790-796  
    article msr, provenance, opm, trident, workflow, inter-operability, provenance challenge, peer reviewed
    Abstract: The Trident workbench is a platform for composing, executing and managing scientific workflows. While Trident collects provenance in its native provenance model, the third provenance challenge was an opportunity to build support for the Open Provenance Model into Trident. There are several possible approaches to harmonize our native model with OPM, and such choices are also available to other existing provenance and workflow systems working towards OPM compatibility. We identify and analyze the relative merits of these approaches in an effort to inform practitioners planning to support OPM in their existing provenance/workflow systems. Further, we describe our experience with using the integration approach we chose to interoperate with other teams as part of the challenge.
    BibTeX:
    @article{Simmhan:fgcs:2011,
      author = {Yogesh Simmhan and Roger Barga},
      title = {Analysis of approaches for supporting the Open Provenance Model: A case study of the Trident workflow workbench},
      journal = {Future Generation Computer Systems (FGCS)},
      publisher = {Elsevier},
      year = {2011},
      volume = {27},
      pages = {790--796},
      note = {[IF 2.43, CORE A]},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-fgcs-2011.pdf},
      doi = {http://doi.org/10.1016/j.future.2010.10.005}
    }
    					
    Simmhan:fgcs:2011a Simmhan, Y.; Groth, P. & Moreau, L. Simmhan, Y.; Groth, P. & Moreau, L. (Eds.)
    Special Section: The third provenance challenge on using the open provenance model for interoperability
    2011 Future Generation Computer Systems (FGCS)
    Vol. 27 , pp. 737-742  
    article msr, provenance, opm, trident, workflow, inter-operability, provenance challenge, editorial
    Abstract: The third provenance challenge was organized to evaluate the efficacy of the Open Provenance Model (OPM) in representing and sharing provenance with the goal of improving the specification. A data loading scientific workflow that ingests data files into a relational database for the Pan-STARRS sky survey project was selected as a candidate for collecting provenance. Challenge participants record provenance, run queries over it, and export/import provenance as OPM documents with other teams to verify interoperability. Fourteen teams participated in the challenge that concluded at a workshop in June 2009 in Amsterdam. The experiences of several participating teams are included in this special issue. In this editorial, we describe the challenge in detail, review its outcome, and introduce articles included in this special issue.
    BibTeX:
    @article{Simmhan:fgcs:2011a,
      author = {Yogesh Simmhan and Paul Groth and Luc Moreau},
      title = {Special Section: The third provenance challenge on using the open provenance model for interoperability},
      journal = {Future Generation Computer Systems (FGCS)},
      publisher = {Elsevier},
      year = {2011},
      volume = {27},
      pages = {737--742},
      note = {[IF 2.43, CORE A]},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-fgcs-2011a.pdf},
      doi = {http://doi.org/10.1016/j.future.2010.11.020}
    }
    					
    Simmhan:ijca:2011 Simmhan, Y. & Plale, B.
    Using Provenance for Personalized Quality Ranking of Scientific Datasets
    2011 International Journal of Computers and Their Applications (IJCA)
    Vol. 18 (3) , pp. 180-195  
    article usc, provenance, iu, peer reviewed, karma, special issue
    Abstract: The rapid growth of eScience has led to an explosion in the creation and availability of scientific datasets that include raw instrument data and derived datasets from model simulations. A large number of these datasets are surfacing online in public and private catalogs, often annotated with XML metadata, as part of community efforts to foster open research. With this rapid expansion comes the challenge of filtering and selecting datasets that best match the needs of scientists. We address a key aspect of the scientific data discovery process by ranking search results according to a personalized data quality score based on a declarative quality profile to help scientists select the most suitable data for their applications. Our quality model is resilient to missing metadata using a novel strategy that uses provenance in its absence. Intuitively, our premise is that the quality score for a dataset depends on its provenance – the scientific task and its inputs that created the dataset – and it is possible to define a quality function based on provenance metadata that predicts the same quality score as one evaluated using the user’s quality profile over the complete metadata. Here, we present a model and architecture for data quality scoring, apply machine learning techniques to construct a quality function that uses provenance as a proxy for missing metadata, and empirically test the prediction power of our quality function. Our results show that for some scientific tasks, quality scores based on provenance closely track the quality scores based on complete metadata properties, with error margins between 1 – 29%.
    BibTeX:
    @article{Simmhan:ijca:2011,
      author = {Yogesh Simmhan and Beth Plale},
      title = {Using Provenance for Personalized Quality Ranking of Scientific Datasets},
      journal = {International Journal of Computers and Their Applications (IJCA)},
      publisher = {ISCA},
      year = {2011},
      volume = {18},
      number = {3},
      pages = {180--195},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-ijca-2011.pdf}
    }
    					
    Zhao:ijca:2011 Zhao, J.; Simmhan, Y.; Gomadam, K. & Prasanna, V.K.
    Querying Provenance Information in Distributed Environments
    2011 International Journal of Computers and Their Applications (IJCA)
    Vol. 18 (3) , pp. 196-215  
    article usc, smart oilfield, provenance, peer reviewed, special issue
    Abstract: The growing recognition of the importance of provenance for data intensive and multidisciplinary domains is leading to careful collection of provenance. One consequence of this is the proliferation of provenance repositories hosted for individual organizations or communities, with limited ability to reconstruct and query for and on provenance across them. Community standards like the Open Provenance Model (OPM) allow uniform interpretation and exchange of provenance metadata but do not prescribe query or service specifications to access provenance. If data reuse and sharing across institutions is not accompanied by passing provenance at the time of data exchange, we need to track the provenance and query for them or over them across distributed provenance repositories. In this article, we present approaches for querying over distributed provenance information, and address two common provenance query models that we formalize: provenance retrieval query and provenance filter query. Our problem is motivated by Smart Oilfield applications in the energy informatics domain, and we evaluate the performance of our algorithms using synthetic workflows based on the domain.
    BibTeX:
    @article{Zhao:ijca:2011,
      author = {Jing Zhao and Yogesh Simmhan and Karthik Gomadam and Viktor K. Prasanna},
      title = {Querying Provenance Information in Distributed Environments},
      journal = {International Journal of Computers and Their Applications (IJCA)},
      publisher = {ISCA},
      year = {2011},
      volume = {18},
      number = {3},
      pages = {196--215},
      url = {http://ceng.usc.edu/ simmhan/pubs/zhao-ijca-2011.pdf}
    }
    					
    Simmhan:greenit:2011 Simmhan, Y.; Zhou, Q. & Prasanna, V.K. Kim, J.H. & Lee, M.J. (Eds.)
    Semantic Information Integration for Smart Grid Applications (Green IT: Technologies and Applications)
    2011 Green IT: Technologies and Applications , pp. 361-380   inbook usc, smart grid, semantic, information integration, peer reviewed
    Abstract: The Los Angeles Smart Grid Project aims to use informatics techniques to bring about a quantum leap in the way demand response load optimization is performed in utilities. Semantic information integration, from sources as diverse as Internet-connected smart meters and social networks, is a linchpin to support the advanced analytics and mining algorithms required for this. In association with it, a semantic complex event processing system will allow consumer and utility managers to easily specify and enact energy policies continuously. We present the information systems architecture for the project that is under development, and discuss research issues that emerge from having to design a system that supports 1.4 million customers and a rich ecosystem of Smart Grid applications from users, third party vendors, the utility and regulators.
    BibTeX:
    @inbook{Simmhan:greenit:2011,
      author = {Yogesh Simmhan and Qunzhi Zhou and Viktor K. Prasanna},
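      chapter = {Semantic Information Integration for Smart Grid Applications},
      editor = {Kim, J. H. and Lee, M. J.},
      title = {Green IT: Technologies and Applications},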
      publisher = {Springer Berlin Heidelberg},
      year = {2011},
      pages = {361--380},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-greenit-2011.pdf},
      doi = {http://doi.org/10.1007/978-3-642-22179-8_19}
    }
    					
    Aman:dddm:2011 Aman, S.; Simmhan, Y. & Prasanna, V.K.
    Improving Energy Use Forecast for Campus Micro-grids using Indirect Indicators
    2011 International Workshop on Domain Driven Data Mining (DDDM) , pp. 1-9   inproceedings usc, smart grid, machine learning, peer reviewed
    BibTeX:
    @inproceedings{Aman:dddm:2011,
      author = {Saima Aman and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Improving Energy Use Forecast for Campus Micro-grids using Indirect Indicators},
      booktitle = {International Workshop on Domain Driven Data Mining (DDDM)},
      year = {2011},
      pages = {1--9},
      url = {http://ceng.usc.edu/ simmhan/pubs/aman-dddm-2011.pdf}
    }
    					
    Aman:socalsgs:2011 Aman, S.; Yin, W.; Simmhan, Y. & Prasanna, V.
    Machine Learning for Demand Forecasting in Smart Grid
    2011 Southern California Smart Grid Research Symposium (SoCalSGS)   inproceedings usc, smart grid, machine learning, map reduce
    BibTeX:
    @inproceedings{Aman:socalsgs:2011,
      author = {Saima Aman and Wei Yin and Yogesh Simmhan and Viktor Prasanna},
      title = {Machine Learning for Demand Forecasting in Smart Grid},
      booktitle = {Southern California Smart Grid Research Symposium (SoCalSGS)},
      year = {2011},
      note = {Poster},
      url = {http://ceng.usc.edu/ simmhan/pubs/aman-socalsgs-2011.pdf}
    }
    					
    Kumbhare:datacloud:2011 Kumbhare, A.; Simmhan, Y. & Prasanna, V.
    Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds
    2011 International Workshop on Data Intensive Computing in the Clouds (DataCloud-SC11) , pp. 31-40   inproceedings peer reviewed, cloud, azure, security, smart grid, usc
    BibTeX:
    @inproceedings{Kumbhare:datacloud:2011,
      author = {Alok Kumbhare and Yogesh Simmhan and Viktor Prasanna},
      title = {Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds},
      booktitle = {International Workshop on Data Intensive Computing in the Clouds (DataCloud-SC11)},
      year = {2011},
      pages = {31--40},
      url = {http://ceng.usc.edu/ simmhan/pubs/kumbhare-datacloud-2011.pdf}
    }
    					
    Redekopp:pargraph:2011 Redekopp, M.; Simmhan, Y. & Prasanna, V.K.
    Performance Analysis of Vertex-centric Graph Algorithms on the Azure Cloud Platform
    2011 Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs (ParGraph) , pp. 1-8   inproceedings graphs, azure, cloud, peer reviewed, usc
    Abstract: Finding key vertices in large graphs is an important problem in many applications such as social networks, bioinformatics, and distribution networks. Betweenness centrality is a popular algorithm for finding such vertices and has been studied extensively, yielding several parallel formulations suitable to supercomputers and clusters. In this paper we implement and study betweenness centrality in the context of cloud-based platforms using Microsoft Windows Azure as our case study. We demonstrate scalable parallel performance and investigate key issues related to a cloud-based implementation including mitigating penalties associated with VM failures as well as the impact of communication overheads in the cloud. We use a combination of empirical and analytical evaluation using both synthetic small-world and real-world social interaction graphs.
    BibTeX:
    @inproceedings{Redekopp:pargraph:2011,
      author = {Mark Redekopp and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Performance Analysis of Vertex-centric Graph Algorithms on the Azure Cloud Platform},
      booktitle = {Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs (ParGraph)},
      year = {2011},
      pages = {1--8},
      url = {http://ceng.usc.edu/ simmhan/pubs/redekopp-pargraph-2011.pdf}
    }
    					
    Simmhan:buildsys:2011 Simmhan, Y.; Prasanna, V.; Aman, S.; Natarajan, S.; Yin, W. & Zhou, Q.
    Towards Data-driven Demand-Response Optimization in a Campus Microgrid
    2011 Workshop On Embedded Sensing Systems For Energy-Efficiency In Buildings (BuildSys) , pp. 1-2   inproceedings usc, smart grid, information integration, cep, machine learning, peer reviewed, demo
    Abstract: We describe and demonstrate a prototype software architecture to support data-driven demand response optimization (DR) in the USC campus microgrid, as part of the Los Angeles Smart Grid Demonstration Project. The architecture includes a semantic information repository that integrates diverse data sources to support DR, demand forecasting using scalable machine-learned models, and detection of load curtailment opportunities by matching complex event patterns.
    BibTeX:
    @inproceedings{Simmhan:buildsys:2011,
      author = {Yogesh Simmhan and Viktor Prasanna and Saima Aman and Sreedhar Natarajan and Wei Yin and Qunzhi Zhou},
      title = {Towards Data-driven Demand-Response Optimization in a Campus Microgrid},
      booktitle = {Workshop On Embedded Sensing Systems For Energy-Efficiency In Buildings (BuildSys)},
      publisher = {ACM},
      year = {2011},
      pages = {1--2},
      note = {Demo},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-buildsys-2011.pdf}
    }
    					
    Simmhan:cloud:2011 Simmhan, Y.; Kumbhare, A.; Cao, B. & Prasanna, V.K.
    An Analysis of Security and Privacy Issues in Smart Grid Software Architectures on Clouds
    2011 International Cloud Computing Conference (CLOUD) , pp. 582-589   inproceedings usc, cloud, security, privacy, smart grid, peer reviewed
    Abstract: Power utilities globally are increasingly upgrading to Smart Grids that use bi-directional communication with the consumer to enable an information-driven approach to distributed energy management. Clouds offer features well suited for Smart Grid software platforms and applications, such as elastic resources and shared services. However, the security and privacy concerns inherent in an information-rich Smart Grid environment are further exacerbated by their deployment on Clouds. Here, we present an analysis of security and privacy issues in a Smart Grid software architecture operating on different Cloud environments, in the form of a taxonomy. We use the Los Angeles Smart Grid Project that is underway in the largest U.S. municipal utility to drive this analysis that will benefit both Cloud practitioners targeting Smart Grid applications, and Cloud researchers investigating security and privacy.
    BibTeX:
    @inproceedings{Simmhan:cloud:2011,
      author = {Yogesh Simmhan and Alok Kumbhare and Baohua Cao and Viktor K. Prasanna},
      title = {An Analysis of Security and Privacy Issues in Smart Grid Software Architectures on Clouds},
      booktitle = {International Cloud Computing Conference (CLOUD)},
      publisher = {IEEE},
      year = {2011},
      pages = {582--589},
      note = {[CORE B]},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-cloud-2011.pdf},
      doi = {http://doi.org/10.1109/CLOUD.2011.107}
    }
    					
    Simmhan:hpcdb:2011 Simmhan, Y.; van Ingen, C.; Heasley, J. & Szalay, A.
    Stargazing through a Digital Veil: Managing a Large Scale Sky Survey using Distributed Databases on HPC Clusters
    2011 Workshop on High-Performance Computing meets Databases (HPCDB) , pp. 33-36   inproceedings usc, msr, escience, data management, hpc, graywulf, panstarrs, databases, peer reviewed
    BibTeX:
    @inproceedings{Simmhan:hpcdb:2011,
      author = {Yogesh Simmhan and Catharine van Ingen and Jim Heasley and Alex Szalay},
      title = {Stargazing through a Digital Veil: Managing a Large Scale Sky Survey using Distributed Databases on HPC Clusters},
      booktitle = {Workshop on High-Performance Computing meets Databases (HPCDB)},
      year = {2011},
      pages = {33--36},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-hpcdb-2011.pdf}
    }
    					
    Simmhan:sciencecloud:2011 Simmhan, Y.; Cao, B.; Giakkoupis, M. & Prasanna, V.K.
    Adaptive rate stream processing for smart grid applications on clouds
    2011 International Workshop on Scientific Cloud Computing (ScienceCloud) , pp. 33-38   inproceedings usc, smart grid, cloud, streaming, peer reviewed, short paper
    Abstract: Pervasive smart meters that continuously measure power usage by consumers within a smart (power) grid are providing utilities and power systems researchers with unprecedented volumes of information through streams that need to be processed and analyzed in near real time. We introduce the use of Cloud platforms to perform scalable, latency sensitive stream processing for eEngineering applications in the smart grid domain. One unique aspect of our work is the use of adaptive rate control to throttle the rate of generation of power events by smart meters, which meets accuracy requirements of smart grid applications while consuming 50% less bandwidth resources in the Cloud.
    BibTeX:
    @inproceedings{Simmhan:sciencecloud:2011,
      author = {Yogesh Simmhan and Baohua Cao and Michail Giakkoupis and Viktor K. Prasanna},
      title = {Adaptive rate stream processing for smart grid applications on clouds},
      booktitle = {International Workshop on Scientific Cloud Computing (ScienceCloud)},
      publisher = {ACM},
      year = {2011},
      pages = {33--38},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-sciencecloud-2011.pdf},
      doi = {http://doi.org/10.1145/1996109.1996116}
    }
    					
    Zhou:debs:2011 Zhou, Q.; Simmhan, Y. & Prasanna, V.K.
    Towards an inexact semantic complex event processing framework
    2011 International Conference on Distributed Event-Based System (DEBS) , pp. 401-402   inproceedings usc, smart grid, cep, semantic, peer reviewed, poster
    Abstract: Complex event processing (CEP) deals with detecting real-time situations, represented as event patterns, from among an event cloud. The state-of-the-art CEP systems process events as plain data tuples and are limited to detecting precisely defined patterns. Emerging application areas like optimization in smart power grids require CEP to incorporate semantic knowledge of the domain for easier pattern specification, and detect inexact patterns in the presence of uncertainties. In this paper, we present motivating use cases, discuss limitations of existing CEP systems and describe our work towards an Inexact Semantic Complex Event Processing (InSCEP) framework.
    BibTeX:
    @inproceedings{Zhou:debs:2011,
      author = {Qunzhi Zhou and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Towards an inexact semantic complex event processing framework},
      booktitle = {International Conference on Distributed Event-Based System (DEBS)},
      publisher = {ACM},
      year = {2011},
      pages = {401--402},
      note = {Poster},
      url = {http://ceng.usc.edu/ simmhan/pubs/zhou-debs-2011.pdf},
      doi = {http://doi.org/10.1145/2002259.2002331}
    }
    					
    Zhou:socalsgs:2011 Zhou, Q.; Natarajan, S.; Simmhan, Y. & Prasanna, V.
    Semantic Information Integration and Processing for Demand Response Optimization
    2011 Southern California Smart Grid Research Symposium (SoCalSGS)   inproceedings usc, smart grid, information integration, semantic web
    BibTeX:
    @inproceedings{Zhou:socalsgs:2011,
      author = {Qunzhi Zhou and Sreedhar Natarajan and Yogesh Simmhan and Viktor Prasanna},
      title = {Semantic Information Integration and Processing for Demand Response Optimization},
      booktitle = {Southern California Smart Grid Research Symposium (SoCalSGS)},
      year = {2011},
      note = {Poster},
      url = {http://ceng.usc.edu/ simmhan/pubs/aman-socalsgs-2011.pdf}
    }
    					
    Zinn:ccgrid:2011 Zinn, D.; Hart, Q.; McPhillips, T.M.; Ludäscher, B.; Simmhan, Y.; Giakkoupis, M. & Prasanna, V.K.
    Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms
    2011 International Symposium on Cluster, Cloud and Grid Computing (CCGRID) , pp. 235-244   inproceedings usc, smart grid, cloud, streaming, peer reviewed, escience
    Abstract: Scientific workflows are commonplace in eScience applications. Yet, the lack of integrated support for data models, including streaming data, structured collections and files, is limiting the ability of workflows to support emerging applications in energy informatics that are stream oriented. This is compounded by the absence of Cloud data services that support reliable and performant streams. In this paper, we propose and present a scientific workflow framework that supports streams as first-class data, and is optimized for performant and reliable execution across desktop and Cloud platforms. The workflow framework features and its empirical evaluation on a private Eucalyptus Cloud are presented.
    BibTeX:
    @inproceedings{Zinn:ccgrid:2011,
      author = {Daniel Zinn and Quinn Hart and Timothy M. McPhillips and Bertram Ludäscher and Yogesh Simmhan and Michail Giakkoupis and Viktor K. Prasanna},
      title = {Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms},
      booktitle = {International Symposium on Cluster, Cloud and Grid Computing (CCGRID)},
      publisher = {IEEE},
      year = {2011},
      pages = {235--244},
      note = {[CORE A]},
      url = {http://ceng.usc.edu/ simmhan/pubs/zinn-ccgrid-2011.pdf},
      doi = {http://doi.org/10.1109/CCGrid.2011.74}
    }
    					
    Raicu:ScienceCloud2011 Raicu, I.; Beckman, P.; Foster, I.T. & Simmhan, Y. Raicu, I.; Beckman, P.; Foster, I.T. & Simmhan, Y. (Eds.)
    Proceedings of the 2nd International Workshop on Scientific Cloud Computing (ScienceCloud)
    2011   proceedings editorial, usc
    BibTeX:
    @proceedings{Raicu:ScienceCloud2011,
      author = {Ioan Raicu and Pete Beckman and Ian T. Foster and Yogesh Simmhan},
      title = {Proceedings of the 2nd International Workshop on Scientific Cloud Computing (ScienceCloud)},
      publisher = {ACM},
      year = {2011},
      url = {http://dx.doi.org/10.1145/1996109}
    }
    					
    Simmhan:HiPC:2011 Simmhan, Y. & Srinivasan, A. Simmhan, Y. & Srinivasan, A. (Eds.)
    HiPC 2011 Student Research Symposium: Message from the co-chairs
    2011 High Performance Computing Conference (HiPC)   proceedings editorial, usc
    BibTeX:
    @proceedings{Simmhan:HiPC:2011,
      author = {Yogesh Simmhan and Ashok Srinivasan},
      title = {HiPC 2011 Student Research Symposium: Message from the co-chairs},
      booktitle = {High Performance Computing Conference (HiPC)},
      year = {2011}
    }
    					
    Klyne:w3cprov:2011 Klyne, G.; Groth, P.; Moreau, L.; Hartig, O.; Simmhan, Y.; Myers, J.; Lebo, T.; Belhajjame, K. & Miles, S.
    PROV-AQ: Provenance Access and Query
    2011 School: World Wide Web Consortium (W3C)   techreport usc, provenance
    Abstract: This document specifies how to use standard Web protocols, including HTTP, to obtain information about the provenance of Web resources. We describe both simple access mechanisms for locating provenance information associated with web pages or resources, and provenance query services for more complex deployments. This is part of the larger W3C Prov provenance framework.
    BibTeX:
    @techreport{Klyne:w3cprov:2011,
      author = {Graham Klyne and Paul Groth and Luc Moreau and Olaf Hartig and Yogesh Simmhan and James Myers and Timothy Lebo and Khalid Belhajjame and Simon Miles},
      title = {PROV-AQ: Provenance Access and Query},
      institution = {World Wide Web Consortium (W3C)},
      year = {2011},
      note = {W3C Editor's Draft},
      url = {http://dvcs.w3.org/hg/prov/raw-file/tip/paq/prov-aq.html}
    }
    					
    Simmhan:usctr:2011 Simmhan, Y.; Aman, S.; Cao, B.; Giakkoupis, M.; Kumbhare, A.; Zhou, Q.; Paul, D.; Fern, C.; Sharma, A. & Prasanna, V.K.
    An Informatics Approach to Demand Response Optimization in Smart Grids
    2011 School: University of Southern California   techreport usc, smart grid, informatics
    BibTeX:
    @techreport{Simmhan:usctr:2011,
      author = {Yogesh Simmhan and Saima Aman and Baohua Cao and Mike Giakkoupis and Alok Kumbhare and Qunzhi Zhou and Donald Paul and Carol Fern and Aditya Sharma and Viktor K. Prasanna},
      title = {An Informatics Approach to Demand Response Optimization in Smart Grids},
      institution = {University of Southern California},
      year = {2011},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-usctr-2011.pdf}
    }
    					
    Barga:deb:2010 Barga, R.; Simmhan, Y.; Withana, E.C.; Sahoo, S.; Jackson, J. & Araujo, N. Tan, W.-C. (Eds.)
    Provenance for Scientific Workflows: Towards Reproducible Research
    2010 Data Engineering Bulletin (DEB)
    Vol. 33 (3) , pp. 50-59  
    article msr, provenance, trident, workflow, peer reviewed
    BibTeX:
    @article{Barga:deb:2010,
      author = {Roger Barga and Yogesh Simmhan and Eran Chinthaka Withana and Satya Sahoo and Jared Jackson and Nelson Araujo},
      title = {Provenance for Scientific Workflows: Towards Reproducible Research},
      journal = {Data Engineering Bulletin (DEB)},
      publisher = {IEEE},
      year = {2010},
      volume = {33},
      number = {3},
      pages = {50--59},
      url = {http://sites.computer.org/debull/A10sept/barga.pdf}
    }
    					
    Aman:ef:2010 Aman, S.; Simmhan, Y. & Prasanna, V.K.
    Smart Communication of Energy Use and Prediction in a Smart Grid Software Architecture
    2010 Energy Forum: A System Approach Toward Green Energy Production and Adaptive Power Distribution   inproceedings usc, energy informatics, smart grid, natural language processing, machine learning, poster
    BibTeX:
    @inproceedings{Aman:ef:2010,
      author = {Saima Aman and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Smart Communication of Energy Use and Prediction in a Smart Grid Software Architecture},
      booktitle = {Energy Forum: A System Approach Toward Green Energy Production and Adaptive Power Distribution},
      publisher = {IEEE Coastal Los Angeles Section},
      year = {2010},
      note = {Poster},
      url = {http://ceng.usc.edu/ simmhan/pubs/aman-ef-2010.pdf}
    }
    					
    Simmhan:cloud:2010 Simmhan, Y.; van Ingen, C.; Subramanian, G. & Li, J.
    Bridging the Gap between Desktop and the Cloud for eScience Applications
    2010 International Cloud Computing Conference (CLOUD) , pp. 474-481   inproceedings msr, cloud, workflow, escience, generic worker, genomics, peer reviewed
    Abstract: The widely discussed scientific data deluge creates a need to computationally scale out eScience applications beyond the local desktop and cope with variable loads over time. Cloud computing offers a scalable, economic, on-demand model well matched to these needs. Yet cloud computing creates gaps that must be crossed to move existing science applications to the cloud. In this article, we propose a Generic Worker framework to deploy and invoke science applications in the cloud with minimal user effort and predictable cost-effective performance. Our framework addresses three distinct challenges posed by the cloud: the complexity of application deployment, invocation of cloud applications from desktop clients, and efficient transparent data transfers across desktop and the cloud. We present an implementation of the Generic Worker for the Microsoft Azure Cloud and evaluate its use for a genomics application. Our evaluation shows that the user complexity to port and scale the application is substantially reduced while introducing a negligible performance overhead of < 5% for the genomics application when scaling to 20 VM instances.
    BibTeX:
    @inproceedings{Simmhan:cloud:2010,
      author = {Yogesh Simmhan and Catharine van Ingen and Girish Subramanian and Jie Li},
      title = {Bridging the Gap between Desktop and the Cloud for eScience Applications},
      booktitle = {International Cloud Computing Conference (CLOUD)},
      publisher = {IEEE},
      year = {2010},
      pages = {474--481},
      note = {[CORE B]},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-cloud-2010.pdf},
      doi = {http://doi.org/10.1109/CLOUD.2010.72}
    }
    					
    Simmhan:cloudcom:2010 Simmhan, Y.; Giakkoupis, M.; Cao, B. & Prasanna, V.K.
    On Using Cloud Platforms in a Software Architecture for Smart Energy Grids
    2010 International Conference on Cloud Computing Technology and Science (CloudCom)   inproceedings usc, energy informatics, smart grid, cloud, poster, peer reviewed
    Abstract: Increasing concern about energy consumption is leading to infrastructure that continuously monitors consumer energy usage and allows power utilities to provide dynamic feedback to curtail peak power load. Smart Grid infrastructure being deployed globally needs scalable software platforms to rapidly integrate and analyze information streaming from millions of smart meters, forecast power usage and respond to operational events. Cloud platforms are well suited to support such data and compute intensive, always-on applications. We examine opportunities and challenges of using cloud platforms for such applications in the emerging domain of energy informatics.
    BibTeX:
    @inproceedings{Simmhan:cloudcom:2010,
      author = {Yogesh Simmhan and Michail Giakkoupis and Baohua Cao and Viktor K. Prasanna},
      title = {On Using Cloud Platforms in a Software Architecture for Smart Energy Grids},
      booktitle = {International Conference on Cloud Computing Technology and Science (CloudCom)},
      publisher = {IEEE},
      year = {2010},
      note = {Poster [CORE C]},
      url = {http://salsahpc.indiana.edu/CloudCom2010/EPoster/cloudcom2010_submission_269.pdf}
    }
    					
    Simmhan:ipaw:2010 Simmhan, Y. & Gomadam, K. McGuinness, D.; Michaelis, J. & Moreau, L. (Eds.)
    Social Web-Scale Provenance in the Cloud
    2010
    Vol. 6378 International Provenance and Annotation Workshop (IPAW) , pp. 298-300  
    inproceedings msr, provenance, social network, cloud, poster, peer reviewed, short paper
    Abstract: The lower barrier to entry for users to create and share resources through applications like Facebook and Twitter, and the commoditization of social Web data has heightened issues of privacy, attribution, and copyright. These make it important to track the provenance of social Web data. We outline and discuss key engineering, privacy, and monetization challenges in collecting and analyzing provenance of social Web resources.
    BibTeX:
    @inproceedings{Simmhan:ipaw:2010,
      author = {Yogesh Simmhan and Karthik Gomadam},
      title = {Social Web-Scale Provenance in the Cloud},
      booktitle = {International Provenance and Annotation Workshop (IPAW)},
      publisher = {Springer Berlin / Heidelberg},
      year = {2010},
      volume = {6378},
      pages = {298--300},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-ipaw-2010.pdf},
      doi = {http://doi.org/10.1007/978-3-642-17819-1_39}
    }
    					
    Simmhan:sciencecloud:2010 Simmhan, Y. & Ramakrishnan, L.
    Comparison of resource platform selection approaches for scientific workflows
    2010 International Workshop on Scientific Cloud Computing (ScienceCloud) , pp. 445-450   inproceedings msr, cloud, escience, hpc, resource management, workflows, azure, scheduling, peer reviewed, short paper
    Abstract: Cloud computing is increasingly considered as an additional computational resource platform for scientific workflows. The cloud offers the opportunity to scale out applications from desktops and local cluster resources. Each platform has different properties (e.g., queue wait times in high performance systems, virtual machine startup overhead in clouds) and characteristics (e.g., custom environments in cloud) that make choosing from these diverse resource platforms for a workflow execution a challenge for scientists. Scientists are often faced with deciding resource platform selection trade-offs with limited information on the actual workflows. While many workflow planning methods have explored resource selection or task scheduling, these methods often require fine-scale characterization of the workflow that is onerous for a scientist. In this paper, we describe our early exploratory work in using blackbox characteristics for a cost-benefit analysis of using different resource platforms. In our blackbox method, we use only limited high-level information on the workflow length, width, and data sizes. The length and width are indicative of the workflow duration and parallelism. We compare the effectiveness of this approach to other resource selection models using two exemplar scientific workflows on desktop, local cluster, HPC center, and cloud platforms. Early results suggest that the blackbox model often makes the same resource selections as a more fine-grained whitebox model. We believe the simplicity of the blackbox model can help inform a scientist on the applicability of a new resource platform, such as cloud resources, even before porting an existing workflow.
    BibTeX:
    @inproceedings{Simmhan:sciencecloud:2010,
      author = {Yogesh Simmhan and Lavanya Ramakrishnan},
      title = {Comparison of resource platform selection approaches for scientific workflows},
      booktitle = {International Workshop on Scientific Cloud Computing (ScienceCloud)},
      publisher = {ACM},
      year = {2010},
      pages = {445--450},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-sciencecloud-2010.pdf},
      doi = {http://doi.org/10.1145/1851476.1851541}
    }
    					
    Simmhan:socalsgs:2010 Simmhan, Y.; Aman, S.; Cao, B.; Giakkoupis, M.; Kumbhare, A.; Zhou, Q.; Gomadam, K. & Prasanna, V.K.
    Scalable, Secure Energy Information Management for DR Analysis
    2010 Southern California Smart Grid Research Symposium (SoCalSGS)   inproceedings usc, energy informatics, smart grid, cloud, streaming, security, privacy, complex event processing, machine learning, poster
    Abstract: The advent and growth of smart energy grids is increasing the ability to monitor and communicate power supply, pricing, and demand among utility providers and consumers. While the smart meter infrastructure is expanding at a rapid rate to enable communication using emerging standards, the software architecture to collect, manage, analyze, scale, and secure the information constantly streaming from the grid is still being designed. Effective integration, analysis, and feedback of energy information are essential for the benefits of smart grid to propagate to the various stakeholders: power utilities, residential, commercial, and institutional consumers, and service and application providers. This can lead to a lower peak demand on utilities to ensure regular supply of quality power, reduced consumption, and costs for consumers by making them aware of and giving them control over their power profile, and help organizations to plan and optimize energy usage to meet sustainability goals. Managing the energy information lifecycle – from the events streaming from smart meters through the smart grid, to meaningful analysis and feedback to utilities and consumers – presents several opportunities for software systems research identified below.
    BibTeX:
    @inproceedings{Simmhan:socalsgs:2010,
      author = {Yogesh Simmhan and Saima Aman and Baohua Cao and Michail Giakkoupis and Alok Kumbhare and Qunzhi Zhou and Karthik Gomadam and Viktor K. Prasanna},
      title = {Scalable, Secure Energy Information Management for DR Analysis},
      booktitle = {Southern California Smart Grid Research Symposium (SoCalSGS)},
      publisher = {University of Southern California},
      year = {2010},
      note = {Poster},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-socalsgs-2010.pdf}
    }
    					
    Simmhan:works:2010 Simmhan, Y.; Soroush, E.; van Ingen, C.; Agarwal, D. & Ramakrishnan, L.
    BReW: Blackbox resource selection for e-Science workflows
    2010 Workshop on Workflows in Support of Large-Scale Science (WORKS) , pp. 1-10   inproceedings msr, escience, workflow, cloud, scheduling, peer reviewed
    Abstract: Workflows are commonly used to model data intensive scientific analysis. As computational resource needs increase for eScience, emerging platforms like clouds present additional resource choices for scientists and policy makers. We introduce BReW, a tool that enables users to make rapid, high-level platform selection for their workflows using limited workflow knowledge. This helps make informed decisions on whether to port a workflow to a new platform. Our analysis of synthetic and real eScience workflows shows that using just total runtime length, maximum task fanout, and total data used and produced by the workflow, BReW can provide platform predictions comparable to whitebox models with detailed workflow knowledge.
    BibTeX:
    @inproceedings{Simmhan:works:2010,
      author = {Yogesh Simmhan and Emad Soroush and Catharine van Ingen and Deb Agarwal and Lavanya Ramakrishnan},
      title = {BReW: Blackbox resource selection for e-Science workflows},
      booktitle = {Workshop on Workflows in Support of Large-Scale Science (WORKS)},
      publisher = {IEEE},
      year = {2010},
      pages = {1--10},
      url = {http://ceng.usc.edu/ simmhan/pubs/simmhan-works-2010.pdf},
      doi = {http://doi.org/10.1109/WORKS.2010.5671857}
    }
    					
    Zhou:ef:2010 Zhou, Q.; Simmhan, Y. & Prasanna, V.K.
    Semantic Complex Event Processing for Smart Grid Information Integration and Management
    2010 Energy Forum: A System Approach Toward Green Energy Production and Adaptive Power Distribution   inproceedings usc, energy informatics, smart grid, semantic web, complex event processing, poster
    BibTeX:
    @inproceedings{Zhou:ef:2010,
      author = {Qunzhi Zhou and Yogesh Simmhan and Viktor K. Prasanna},
      title = {Semantic Complex Event Processing for Smart Grid Information Integration and Management},
      booktitle = {Energy Forum: A System Approach Toward Green Energy Production and Adaptive Power Distribution},
      publisher = {IEEE Coastal Los Angeles Section},
      year = {2010},
      note = {Poster},
      url = {http://ceng.usc.edu/ simmhan/pubs/zhou-ef-2010.pdf}
    }
    					
    Zinn:works:2010 Zinn, D.; Hart, Q.; Ludäscher, B. & Simmhan, Y.
    Streaming satellite data to cloud workflows for on-demand computing of environmental data products
    2010 Workshop on Workflows in Support of Large-Scale Science (WORKS) , pp. 1-8   inproceedings usc, streaming, workflow, cloud, escience, peer reviewed
    Abstract: Environmental data arriving constantly from satellites and weather stations are used to compute weather coefficients that are essential for agriculture and viticulture. For example, the reference evapotranspiration (ET0) coefficient, overlaid on regional maps, is provided each day by the California Department of Water Resources to local farmers and turf managers to plan daily water use. Scaling out single-processor compute/data intensive applications operating on real-time data to support more users and higher-resolution data poses data engineering challenges. Cloud computing helps data providers expand resource capacity to meet growing needs besides supporting scientific needs like reprocessing historic data using new models. In this article, we examine migration of a legacy script used for daily ET0 computation by CIMIS to a workflow model that eases deployment to and scaling on the Windows Azure Cloud. Our architecture incorporates a direct streaming model into Cloud virtual machines (VMs) that improves the performance by 130% to 160% for our workflow over the commonly used approach of staging data through Cloud storage. The streaming workflows achieve runtimes comparable to desktop execution for single VMs and a linear speed-up when using multiple VMs, thus allowing computation of environmental coefficients at a much larger resolution than done presently.
    BibTeX:
    @inproceedings{Zinn:works:2010,
      author = {Daniel Zinn and Quinn Hart and Bertram Ludäscher and Yogesh Simmhan},
      title = {Streaming satellite data to cloud workflows for on-demand computing of environmental data products},
      booktitle = {Workshop on Workflows in Support of Large-Scale Science (WORKS)},
      publisher = {IEEE},
      year = {2010},
      pages = {1--8},
      url = {http://ceng.usc.edu/ simmhan/pubs/zinn-works-2010.pdf},
      doi = {http://doi.org/10.1109/WORKS.2010.5671841}
    }
    					
    Prasanna:nsfs2i2:2010 Prasanna, V.K.; Bader, D.A.; Aluru, S.; Athanas, P.; Balaji, P.; Biros, G.; Brewer, T.; Brower, R.; Casselman, S.; Chien, A.; Crago, S.; French, M.; Gokhale, M.; Gomadam, K.; Guo, Z.; Gupta, A.; Hollingsworth, J.; Kindratenko, V.; Mahoney, M.; Mertoguno, S.; Meyerhenke, H.; Nomura, K.-i.; Riedy, J.; Simmhan, Y.; Strenski, D.; Sundararajan, P.; Vuduc, R.; Walker, R. & Waltzman, R. Prasanna, V.K. & Bader, D.A. (Eds.)
    Report on NSF Workshop on Center Scale Activities Related to Accelerators for Data Intensive Applications
    2010 School: University of Southern California   techreport usc, cloud, workshop
    BibTeX:
    @techreport{Prasanna:nsfs2i2:2010,
      author = {Viktor K. Prasanna and David A. Bader and Srinivas Aluru and Peter Athanas and Pavan Balaji and George Biros and Tony Brewer and Richard Brower and Steve Casselman and Andrew Chien and Steve Crago and Matt French and Maya Gokhale and Karthik Gomadam and Zhi Guo and Anshul Gupta and Jeff Hollingsworth and Volodymyr Kindratenko and Michael Mahoney and Sukarno Mertoguno and Henning Meyerhenke and Ken-ichi Nomura and Jason Riedy and Yogesh Simmhan and David Strenski and Prasanna Sundararajan and Rich Vuduc and Ross Walker and Rand Waltzman},
      title = {Report on NSF Workshop on Center Scale Activities Related to Accelerators for Data Intensive Applications},
      institution = {University of Southern California},
      year = {2010},
      url = {http://ceng.usc.edu/ simmhan/pubs/prasanna-nsfs2i2-2010.pdf}
    }
    					
    Zinn:ucdcstr:2010 Zinn, D.; Hart, Q.; McPhillips, T.; Ludäscher, B.; Simmhan, Y.; Giakkoupis, M. & Prasanna, V.K.
    Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms
    2010 (CSE-2010-23) School: Computer Science Department, UC Davis   techreport usc, streaming, workflow, cloud, escience
    Abstract: Scientific workflows are commonplace in eScience applications. Yet, the lack of integrated support for data models, including streaming data, structured collections and files, is limiting the ability of workflows to support emerging applications in energy informatics that are stream oriented. This is compounded by the absence of Cloud data services that support reliable and performant streams. In this paper, we propose and present a scientific workflow framework that supports streams as first-class data, and is optimized for performant and reliable execution across desktop and Cloud platforms. The workflow framework features and its empirical evaluation on the Eucalyptus cloud are presented.
    BibTeX:
    @techreport{Zinn:ucdcstr:2010,
      author = {Daniel Zinn and Quinn Hart and Timothy McPhillips and Bertram Ludäscher and Yogesh Simmhan and Michail Giakkoupis and Viktor K. Prasanna},
      title = {Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms},
      institution = {Computer Science Department, UC Davis},
      year = {2010},
      number = {CSE-2010-23},
      note = {Extended version of CCGrid 2011},
      url = {http://www.cs.ucdavis.edu/research/tech-reports/2010/CSE-2010-23.pdf}
    }
    					
    Cao:swf:2009 Cao, B.; Plale, B.; Subramanian, G.; Robertson, E. & Simmhan, Y.
    Provenance Information Model of Karma Version 3
    2009 International Workshop on Scientific Workflows (SWF) , pp. 348-351   inproceedings msr, karma, provenance, workflow, peer reviewed
    Abstract: Provenance that captures e-Science activity has long term value only if the right amount and kind of information is collected. In this paper, we propose a two-layer model for representing provenance information capable of representing both execution information and higher level process details. The information model forms the basis for efficient relational database storage and query, and sets the stage for investigation of the necessary and sufficient information for long-term preservation.
    BibTeX:
    @inproceedings{Cao:swf:2009,
      author = {Bin Cao and Beth Plale and Girish Subramanian and Ed Robertson and Yogesh Simmhan},
      title = {Provenance Information Model of Karma Version 3},
      booktitle = {International Workshop on Scientific Workflows (SWF)},
      publisher = {IEEE},
      year = {2009},
      pages = {348--351},
      doi = {http://doi.org/10.1109/SERVICES-I.2009.54}
    }
    					
    Cao:swpm:2009 Cao, B.; Plale, B.; Subramanian, G.; Missier, P.; Goble, C. & Simmhan, Y. Freire, J.; Missier, P. & Sahoo, S.S. (Eds.)
    Semantically Annotated Provenance in the Life Science Grid
    2009
    Vol. 526 International Workshop on the role of Semantic Web in Provenance Management (SWPM)  
    inproceedings msr, provenance, karma, lsg, semantic web, life sciences, escience, peer reviewed
    Abstract: Selected semantic annotation on raw provenance data can help bridge the gap between low level provenance events (e.g., service invocations, data creation, message passing) and the high-level view that the user has of his/her investigation (e.g., data retrieval and analysis). In this initial investigation we added semantically annotated provenance to the Life Science Grid, a cyber-infrastructure framework supporting interactive data exploration and automated data analysis tools, through (i) automated data provenance collection and (ii) automated semantic enrichment of the collected provenance metadata. We use a paradigmatic life sciences use case of interactive data exploration to show that semantically annotated provenance can help users recognize the occurrence of specific patterns of investigation from an otherwise low-level sequence of elementary interaction events.
    BibTeX:
    @inproceedings{Cao:swpm:2009,
      author = {Bin Cao and Beth Plale and Girish Subramanian and Paolo Missier and Carole Goble and Yogesh Simmhan},
      title = {Semantically Annotated Provenance in the Life Science Grid},
      booktitle = {International Workshop on the role of Semantic Web in Provenance Management (SWPM)},
      publisher = {CEUR-WS.org},
      year = {2009},
      volume = {526},
      url = {http://ceur-ws.org/Vol-526/paper_5.pdf}
    }
    					
    Simmhan:advcomp:2009 Simmhan, Y.; Barga, R.; van Ingen, C.; Lazowska, E. & Szalay, A.
    Building the Trident Scientific Workflow Workbench for Data Management in the Cloud
    2009 Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP) , pp. 41-50   inproceedings msr, workflows, escience, data management, cloud, hpc, trident, panstarrs, peer reviewed
    Abstract: Scientific workflows have gained popularity for modeling and executing in silico experiments by scientists for problem-solving. These workflows primarily engage in computation and data transformation tasks to perform scientific analysis in the Science Cloud. Increasingly workflows are gaining use in managing the scientific data when they arrive from external sensors and are prepared for becoming science ready and available for use in the Cloud. While not directly part of the scientific analysis, these workflows operating behind the Cloud on behalf of the "data valets" play an important role in end-to-end management of scientific data products. They share several features with traditional scientific workflows: both are data intensive and use Cloud resources. However, they also differ in significant respects, for example, in the reliability required, scheduling constraints and the use of provenance collected. In this article, we investigate these two classes of workflows - Science Application workflows and Data Preparation workflows - and use these to drive common and distinct requirements from workflow systems for eScience in the Cloud. We use workflow examples from two collaborations, the NEPTUNE oceanography project and the Pan-STARRS astronomy project, to draw out our comparison. Our analysis of these workflow classes can guide the evolution of workflow systems to support emerging applications in the Cloud and the Trident Scientific Workbench is one such workflow system that has directly benefitted from this to meet the needs of these two eScience projects.
    BibTeX:
    @inproceedings{Simmhan:advcomp:2009,
      author = {Yogesh Simmhan and Roger Barga and Catharine van Ingen and Ed Lazowska and Alex Szalay},
      title = {Building the Trident Scientific Workflow Workbench for Data Management in the Cloud},
      booktitle = {Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP)},
      publisher = {IEEE},
      year = {2009},
      pages = {41--50},
      doi = {http://doi.org/10.1109/ADVCOMP.2009.14}
    }
    					
    Simmhan:escience:2009 Simmhan, Y.; van Ingen, C.; Szalay, A.; Barga, R. & Heasley, J.
    Building Reliable Data Pipelines for Managing Community Data Using Scientific Workflows
    2009 International Conference on eScience (eScience) , pp. 321-328   inproceedings msr, workflows, data management, cloud, panstarrs, escience, peer reviewed
    Abstract: The growing amount of scientific data from sensors and field observations is posing a challenge to "data valets" responsible for managing them in data repositories. These repositories built on commodity clusters need to reliably ingest data continuously and ensure its availability to a wide user community. Workflows provide several benefits to modeling data-intensive science applications and many of these benefits can help manage the data ingest pipelines too. But using workflows is not a panacea in itself and data valets need to consider several issues when designing workflows that behave reliably on fault prone hardware while retaining the consistency of the scientific data. In this paper, we propose workflow designs for reliable data ingest in a distributed environment and identify workflow framework features to support resilience. We illustrate these using the data pipeline for the Pan-STARRS repository, one of the largest digital surveys that accumulates 100TB of data annually to support 300 astronomers.
    BibTeX:
    @inproceedings{Simmhan:escience:2009,
      author = {Yogesh Simmhan and Catharine van Ingen and Alex Szalay and Roger Barga and Jim Heasley},
      title = {Building Reliable Data Pipelines for Managing Community Data Using Scientific Workflows},
      booktitle = {International Conference on eScience (eScience)},
      publisher = {IEEE},
      year = {2009},
      pages = {321--328},
      note = {[CORE A]},
      doi = {http://doi.org/10.1109/e-Science.2009.52}
    }
    					
    Simmhan:hicss:2009 Simmhan, Y.; Barga, R.; van Ingen, C.; Nieto-Santisteban, M.; Dobos, L.; Li, N.; Shipway, M.; Szalay, A.S.; Werner, S. & Heasley, J.
    GrayWulf: Scalable Software Architecture for Data Intensive Computing
    2009 Hawaii International Conference on System Sciences (HICSS) , pp. 1-10   inproceedings msr, workflows, escience, data management, cloud, hpc, trident, graywulf, panstarrs, peer reviewed
    Abstract: Big data presents new challenges to both cluster infrastructure software and parallel application design. We present a set of software services and design principles for data intensive computing with petabyte data sets, named GrayWulf. These services are intended for deployment on a cluster of commodity servers similar to the well-known Beowulf clusters. We use the Pan-STARRS system currently under development as an example of the architecture and principles in action.
    BibTeX:
    @inproceedings{Simmhan:hicss:2009,
      author = {Yogesh Simmhan and Roger Barga and Catharine van Ingen and Maria Nieto-Santisteban and Laszlo Dobos and Nolan Li and Michael Shipway and Alexander S. Szalay and Sue Werner and Jim Heasley},
      title = {GrayWulf: Scalable Software Architecture for Data Intensive Computing},
      booktitle = {Hawaii International Conference on System Sciences (HICSS)},
      publisher = {IEEE},
      year = {2009},
      pages = {1--10},
      note = {[CORE A]},
      doi = {http://doi.org/10.1109/HICSS.2009.235}
    }
    					
    Subramanian:msrescience:2009 Subramanian, G. & Simmhan, Y.
    Tools for Genome Haplotyping in the Windows Azure Cloud
    2009 Microsoft Research eScience Workshop   inproceedings msr, cloud, azure, generic worker, escience, talk
    Abstract: With the increasing throughput of Next Generation DNA sequencing machines, it has become important to come up with efficient ways of processing the sequence data and producing assembled whole human genome sequences for research and diagnostic purposes. In this paper, we describe our efforts in scaling HapCUT, a haplotype phasing tool from UCSD, using a parallel implementation that runs on Windows Azure Cloud infrastructure. One of our novel contributions is a tool to reduce the effort required to port, deploy and execute existing methods in a .NET library or a Windows executable within the cloud; we use this tool to run the extant phasing libraries within Azure. We are currently conducting experiments to study the performance implications and advantages of running the haplotype phasing on the cloud as compared to a local Windows HPC cluster.
    BibTeX:
    @inproceedings{Subramanian:msrescience:2009,
      author = {Girish Subramanian and Yogesh Simmhan},
      title = {Tools for Genome Haplotyping in the Windows Azure Cloud},
      booktitle = {Microsoft Research eScience Workshop},
      publisher = {Microsoft},
      year = {2009},
      note = {Talk},
      url = {http://research.microsoft.com/en-us/UM/redmond/events/eScience2009/17830/lecture.htm}
    }
    					
    Simmhan:msrtr:2009 Simmhan, Y.; van Ingen, C.; Barga, R.; Szalay, A. & Heasley, J.
    Reliable Management of Community Data Pipelines using Scientific Workflows
    2009 (MSR-TR-2009-125) Institution: Microsoft Research   techreport msr, workflows, data management, cloud, panstarrs, escience
    Abstract: The pervasive availability of scientific data from sensors and field observations is posing a challenge to data valets responsible for accumulating and managing them in data repositories. Science collaborations, big and small, are standing up repositories built on commodity clusters that need to reliably ingest data constantly and ensure its availability to a wide user community. Workflows provide several benefits to model data-intensive science applications and many of these benefits can be transmitted effectively to manage the data ingest pipelines. But using workflows is not a panacea in itself, and data valets need to consider several issues when designing workflows that behave reliably on fault prone hardware while retaining the consistency of the scientific data, and when selecting workflow frameworks that support these requirements. In this paper, we propose workflow design models for reliable data ingest in a distributed environment and identify workflow framework features to support resilience. We illustrate these using the data ingest pipeline for the Pan-STARRS sky survey, one of the largest digital surveys that accumulates 100TB of data annually, where these concepts are applied.
    BibTeX:
    @techreport{Simmhan:msrtr:2009,
      author = {Yogesh Simmhan and Catharine van Ingen and Roger Barga and Alex Szalay and Jim Heasley},
      title = {Reliable Management of Community Data Pipelines using Scientific Workflows},
      institution = {Microsoft Research},
      year = {2009},
      number = {MSR-TR-2009-125},
      note = {Extended version of IEEE eScience 2009},
      url = {http://research.microsoft.com/apps/pubs/default.aspx?id=102521}
    }
    					
    Simmhan:msrtr:2009a Simmhan, Y.; van Ingen, C.; Subramanian, G. & Li, J.
    Bridging the Gap between the Cloud and an eScience Application Platform
    2009 (MSR-TR-2009-2021) Institution: Microsoft Research   techreport msr, cloud, workflow, escience, generic worker, genomics
    Abstract: The widely discussed scientific data deluge creates not only a need to computationally scale an application from a local desktop or cluster to a supercomputer, but also the need to cope with variable data loads over time. Cloud computing offers a scalable, economic, on-demand model well matched to the evolving eScience needs. Yet cloud computing creates gaps that must be crossed to move science applications to the cloud. In this article, we propose a Generic Worker framework to deploy and invoke science applications in the Cloud with minimal user effort and predictable, cost-effective performance. Our framework is an evolution of the Grid computing application factory pattern and addresses the distinct challenges posed by the Cloud such as efficient data transfers to and from the Cloud, and the transient nature of its VMs. We present an implementation of the Generic Worker for the Microsoft Azure Cloud and evaluate its use in a genome sequencing application pipeline. Our results show that the user overhead to port and run the application seamlessly across desktop and the Cloud can be substantially reduced without significant performance penalties, while providing on-demand scalability.
    BibTeX:
    @techreport{Simmhan:msrtr:2009a,
      author = {Yogesh Simmhan and Catharine van Ingen and Girish Subramanian and Jie Li},
      title = {Bridging the Gap between the Cloud and an eScience Application Platform},
      institution = {Microsoft Research},
      year = {2009},
      number = {MSR-TR-2009-2021},
      note = {Extended version of IEEE Cloud 2010},
      url = {http://research.microsoft.com/apps/pubs/default.aspx?id=118329}
    }
    					
    Moreau:cpe:2008 Moreau, L.; Ludäscher, B.; Altintas, I.; Barga, R.S.; Bowers, S.; Callahan, S.; Chin Jr., G.; Clifford, B.; Cohen, S.; Cohen-Boulakia, S.; Davidson, S.; Deelman, E.; Digiampietri, L.; Foster, I.; Freire, J.; Frew, J.; Futrelle, J.; Gibson, T.; Gil, Y.; Goble, C.; Golbeck, J.; Groth, P.; Holland, D.A.; Jiang, S.; Kim, J.; Koop, D.; Krenek, A.; McPhillips, T.; Mehta, G.; Miles, S.; Metzger, D.; Munroe, S.; Myers, J.; Plale, B.; Podhorszki, N.; Ratnakar, V.; Santos, E.; Scheidegger, C.; Schuchardt, K.; Seltzer, M.; Simmhan, Y.L.; Silva, C.; Slaughter, P.; Stephan, E.; Stevens, R.; Turi, D.; Vo, H.; Wilde, M.; Zhao, J. & Zhao, Y.
    Special Issue: The First Provenance Challenge
    2008 Concurrency and Computation: Practice & Experience, Special Issue on The First Provenance Challenge
    Vol. 20 , pp. 409-418  
    article iu, provenance, provenance challenge
    Abstract: The first Provenance Challenge was set up in order to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a functional magnetic resonance imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge, and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarize the participants' contributions. Copyright © 2007 John Wiley & Sons, Ltd.
    BibTeX:
    @article{Moreau:cpe:2008,
      author = {Luc Moreau and Bertram Ludäscher and Ilkay Altintas and Roger S. Barga and Shawn Bowers and Steven Callahan and Chin, Jr., George and Ben Clifford and Shirley Cohen and Sarah Cohen-Boulakia and Susan Davidson and Ewa Deelman and Luciano Digiampietri and Ian Foster and Juliana Freire and James Frew and Joe Futrelle and Tara Gibson and Yolanda Gil and Carole Goble and Jennifer Golbeck and Paul Groth and David A. Holland and Sheng Jiang and Jihie Kim and David Koop and Ales Krenek and Timothy McPhillips and Gaurang Mehta and Simon Miles and Dominic Metzger and Steve Munroe and Jim Myers and Beth Plale and Norbert Podhorszki and Varun Ratnakar and Emanuele Santos and Carlos Scheidegger and Karen Schuchardt and Margo Seltzer and Yogesh L. Simmhan and Claudio Silva and Peter Slaughter and Eric Stephan and Robert Stevens and Daniele Turi and Huy Vo and Mike Wilde and Jun Zhao and Yong Zhao},
      title = {Special Issue: The First Provenance Challenge},
      journal = {Concurrency and Computation: Practice \& Experience, Special Issue on The First Provenance Challenge},
      publisher = {John Wiley and Sons Ltd.},
      year = {2008},
      volume = {20},
      pages = {409--418},
      note = {[CORE A]},
      doi = {http://doi.org/10.1002/cpe.v20:5}
    }
    					
    Simmhan:cpe:2008 Simmhan, Y.L.; Plale, B. & Gannon, D.
    Query capabilities of the Karma provenance framework
    2008 Concurrency and Computation: Practice & Experience, Special Issue on The First Provenance Challenge
    Vol. 20 , pp. 441-451  
    article iu, provenance, data provenance, process provenance, provenance queries, workflows, karma, escience, provenance challenge, peer reviewed
    Abstract: Provenance metadata in e-Science captures the derivation history of data products generated from scientific workflows. Provenance forms a glue linking workflow execution with associated data products, and finds use in determining the quality of derived data, tracking resource usage, and for verifying and validating scientific experiments. In this article, we discuss the scope of provenance collected in the Karma provenance framework used in the LEAD Cyberinfrastructure project, distinguishing provenance metadata from generic annotations. We further describe our approaches to querying for different forms of provenance in Karma in the context of queries in the first provenance challenge. We use an incremental, building-block method to construct provenance queries based on the fundamental querying capabilities provided by the Karma service centered on the provenance data model. This has the advantage of keeping the Karma service generic and simple, and yet supports a wide range of queries. Karma successfully answers all but one challenge query. Copyright © 2007 John Wiley & Sons, Ltd.
    BibTeX:
    @article{Simmhan:cpe:2008,
      author = {Yogesh L. Simmhan and Beth Plale and Dennis Gannon},
      title = {Query capabilities of the Karma provenance framework},
      journal = {Concurrency and Computation: Practice \& Experience, Special Issue on The First Provenance Challenge},
      publisher = {John Wiley and Sons Ltd.},
      year = {2008},
      volume = {20},
      pages = {441--451},
      note = {[IF 0.636, CORE A]},
      doi = {http://doi.org/10.1002/cpe.v20:5}
    }
    					
    Simmhan:ijwsr:2008 Simmhan, Y.L.; Plale, B. & Gannon, D.
    Karma2: Provenance Management for Data-Driven Workflows
    2008 International Journal of Web Services Research (IJWSR)
    Vol. 5 (2) , pp. 1-22  
    article msr, provenance, karma, workflow, escience, peer reviewed
    Abstract: The increasing ability for the sciences to sense the world around us is resulting in a growing need for data-driven e-Science applications that are under the control of workflows composed of services on the Grid. The focus of our work is on provenance collection for these workflows that are necessary to validate the workflow and to determine quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework is based on generating discrete provenance activities during the lifecycle of a workflow execution that can be aggregated to form complex data and process provenance graphs that can span across workflows. The implementation uses a loosely coupled publish-subscribe architecture for propagating these activities, and the capabilities of the system satisfy the needs of detailed provenance collection. A performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight-service workflow using 271 data products).
    BibTeX:
    @article{Simmhan:ijwsr:2008,
      author = {Yogesh L. Simmhan and Beth Plale and Dennis Gannon},
      title = {Karma2: Provenance Management for Data-Driven Workflows},
      journal = {International Journal of Web Services Research (IJWSR)},
      publisher = {IGI Publishing},
      year = {2008},
      volume = {5},
      number = {2},
      pages = {1--22},
      note = {[IF 0.371, CORE C]},
      doi = {http://doi.org/10.4018/jwsr.2008040101}
    }
    					
    Gannon:hpcbook:2008 Gannon, D.; Plale, B.; Christie, M.; Huang, Y.; Jensen, S.; Liu, N.; Marru, S.; Pallickara, S.; Perera, S.; Shirasuna, S.; Simmhan, Y.; Slominski, A.; Sun, Y. & Vijayakumar, N. Grandinetti, L. (Ed.)
    Building Grid Portals for e-Science: A Service Oriented Architecture (High Performance Computing and Grids in Action)
    2008 High Performance Computing and Grids in Action
    Vol. 16 , pp. 149-166  
    inbook iu, escience, portal, web service, lead, peer reviewed
    Abstract: Grids are built by communities who need a shared cyberinfrastructure to make progress on the critical problems they are currently confronting. An e-science portal is a conventional Web portal that sits on top of a rich collection of web services that allow a community of users access to shared data and application resources without exposing them to the details of Grid computing. In this chapter we describe a service-oriented architecture to support this type of portal.
    BibTeX:
    @inbook{Gannon:hpcbook:2008,
      author = {Dennis Gannon and Beth Plale and Marcus Christie and Yi Huang and Scott Jensen and Ning Liu and Suresh Marru and Sangmi Pallickara and Srinath Perera and Satoshi Shirasuna and Yogesh Simmhan and Aleksander Slominski and Yiming Sun and Nithya Vijayakumar},
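      title = {High Performance Computing and Grids in Action},
      chapter = {Building Grid Portals for e-Science: A Service Oriented Architecture},
      editor = {L. Grandinetti},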
      publisher = {IOS Press},
      year = {2008},
      volume = {16},
      pages = {149--166},
      url = {http://www.booksonline.iospress.nl/Content/View.aspx?piid=8567}
    }
    					
    Barga:clade:2008 Barga, R.S.; Fay, D.; Guo, D.; Newhouse, S.; Simmhan, Y. & Szalay, A.
    Efficient scheduling of scientific workflows in a high performance computing cluster
    2008 International Workshop on Challenges of Large Applications in Distributed Environments (CLADE) , pp. 63-68   inproceedings msr, data intensive, escience, scheduling, workflow, hpc, peer reviewed
    Abstract: The scientific computing community, especially academia, is clearly in need of technology to handle and organize the 1-100+ Terabyte datasets coming from computer simulations and scientific instrumentation. In this paper we briefly describe GrayWulf, an exemplar cluster for data intensive applications using SQL Server and HPC Clusters. One of the key software components of GrayWulf is Trident, a scientific workflow workbench that performs automatic scheduling of workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf, algorithms to improve performance, and present early results from applying Trident to schedule data loading workflows on GrayWulf for an actual e-Science project.
    BibTeX:
    @inproceedings{Barga:clade:2008,
      author = {Roger S. Barga and Dan Fay and Dean Guo and Steven Newhouse and Yogesh Simmhan and Alex Szalay},
      title = {Efficient scheduling of scientific workflows in a high performance computing cluster},
      booktitle = {International Workshop on Challenges of Large Applications in Distributed Environments (CLADE)},
      publisher = {ACM},
      year = {2008},
      pages = {63--68},
      note = {[CORE C]},
      doi = {http://doi.org/10.1145/1383529.1383545}
    }
    					
    Barga:escience:2008 Barga, R.; Jackson, J.; Araujo, N.; Guo, D.; Gautam, N. & Simmhan, Y.
    The Trident Scientific Workflow Workbench
    2008 International Conference on eScience (eScience) , pp. 317-318   inproceedings msr, workflows, escience, trident, panstarrs, neptune, demo, peer reviewed
    Abstract: In our demonstration we present Trident, a scientific workflow workbench built on top of a commercial workflow system to leverage existing functionality to the extent possible. Trident is being developed in collaboration with the scientific computing community for use in a number of ongoing eScience projects that make use of scientific workflows, in particular the Pan-STARRS sky survey project and the Ocean Observatory Initiative. In our demonstration of Trident we will illustrate the ability to utilize both local and cloud resources for storage and execution, as well as services such as provenance, monitoring, logging and scheduling workflows over clusters. Our goal is to release Trident in early 2009 as an open source accelerator for others to use for eScience projects and to continue extending with support for new workflow features and services.
    BibTeX:
    @inproceedings{Barga:escience:2008,
      author = {Roger Barga and Jared Jackson and Nelson Araujo and Dean Guo and Nitin Gautam and Yogesh Simmhan},
      title = {The Trident Scientific Workflow Workbench},
      booktitle = {International Conference on eScience (eScience)},
      publisher = {IEEE},
      year = {2008},
      pages = {317--318},
      note = {Demo [CORE A]},
      doi = {http://doi.org/10.1109/eScience.2008.126}
    }
    					
    Nieto-Santisteban:adass:2008 Nieto-Santisteban, M.A.; Budavari, T.; Dobos, L.; Li, N.; Shipway, M.; Szalay, A.; Thakar, A.; Werner, S.; Wilton, R.; Simmhan, Y.; van Ingen, C.; Heasley, J. & Holmberg, C.
    GrayWulf: Conquering Astronomical Databases
    2008 Astronomical Data Analysis Software and Systems (ADASS)   inproceedings msr, panstarrs, escience, data management, workflow, talk
    Abstract: Astronomy is posing imminent big challenges to database management systems. Projects such as Pan-STARRS will build a 300 TB database system by year 2011. In order to achieve such an ambitious goal, we must divide to conquer. We present the GrayWulf framework where computational and data resources are integrated through powerful workflow and query tools. The system is built on top of a cluster of commodity servers. Its scalable architecture makes it a great host for data intensive applications such as large scale cross-matching.
    BibTeX:
    @inproceedings{Nieto-Santisteban:adass:2008,
      author = {Maria A. Nieto-Santisteban and Tamas Budavari and Laszlo Dobos and Nolan Li and Michael Shipway and Alexander Szalay and Ani Thakar and Suzanne Werner and Richard Wilton and Yogesh Simmhan and Catharine van Ingen and Jim Heasley and Conrad Holmberg},
      title = {GrayWulf: Conquering Astronomical Databases},
      booktitle = {Astronomical Data Analysis Software and Systems (ADASS)},
      publisher = {Astronomical Society of the Pacific},
      year = {2008},
      note = {Talk},
      url = {http://adass2008.artisan.net/seeabstract.php?id=193}
    }
    					
    Nieto-Santisteban:msrescience:2008 Nieto-Santisteban, M.; Simmhan, Y.; Barga, R.; Dobos, L.; Heasley, J.; Holmberg, C.; Li, N.; Shipway, M.; Szalay, A.S.; van Ingen, C. & Werner, S.
    Pan-STARRS: Learning to Ride the Data Tsunami
    2008 Microsoft Research eScience Workshop   inproceedings msr, panstarrs, escience, data management, trident, workflow, hpc, talk
    Abstract: The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is the next generation of digital sky surveys that builds on the success of the Sloan Digital Sky Survey (SDSS). The Pan-STARRS consortium is centered at the University of Hawai`i, Institute for Astronomy, and includes nine other institutions worldwide. The next generation system leverages SQL Server 2008, Windows Workflow Foundation and the Trident Scientific Workbench. This updated technology is needed to address the much larger data generated by Pan-STARRS and the need to make that data available to astronomers promptly. SDSS released survey data of about 4TB in size every 6 months; PS will have about 30TB of data per year, incrementally updated every week. The currently deployed PS1 telescope is one of the four Pan-STARRS telescopes. PS1 tests the system end to end, including the optics, cameras, image processing algorithms, data loading workflows, user facing databases, and science analysis. The PS1 survey over the next 3.5 years
    BibTeX:
    @inproceedings{Nieto-Santisteban:msrescience:2008,
      author = {Maria Nieto-Santisteban and Yogesh Simmhan and Roger Barga and Laszlo Dobos and Jim Heasley and Conrad Holmberg and Nolan Li and Michael Shipway and Alexander S. Szalay and Catharine van Ingen and Sue Werner},
      title = {Pan-STARRS: Learning to Ride the Data Tsunami},
      booktitle = {Microsoft Research eScience Workshop},
      publisher = {Microsoft},
      year = {2008},
      note = {Talk}
    }
    					
    Simmhan:agu:2008 Simmhan, Y.; Barga, R. & van Ingen, C.
    Automatic Provenance Recording for Scientific Data using Trident
    2008 American Geophysical Union (AGU) Fall Meeting   inproceedings msr, provenance, trident, escience, poster
    Abstract: Provenance is increasingly recognized as being critical to the understanding and reuse of scientific datasets. Given the rapid generation of scientific data from sensors and computational model results, it is not practical to manually record provenance for data, and automated techniques for provenance capture are essential. Scientific workflows provide a framework for representing computational models and complex transformations of scientific data, and present a means for tracking the operations performed to derive a dataset. The Trident Scientific Workbench is a workflow system that natively incorporates provenance capture of data derived as part of the workflow execution. The applications used as part of a Trident workflow can execute on a remote computational cluster, such as a supercomputing center or in the Cloud, or on the local desktop of the researcher, and provenance on data derived by the applications is seamlessly captured. Scientists also have the option to annotate the provenance metadata using domain specific tags, such as GCMD keywords. The provenance records thus captured can be exported in the Open Provenance Model XML standard that is emerging or visualized as a graph. The Trident system and the provenance recorded by it have been successfully applied in the Neptune oceanography project and are presently being tested in the Pan-STARRS astronomy project.
    BibTeX:
    @inproceedings{Simmhan:agu:2008,
      author = {Yogesh Simmhan and Roger Barga and Catharine van Ingen},
      title = {Automatic Provenance Recording for Scientific Data using Trident},
      booktitle = {American Geophysical Union (AGU) Fall Meeting},
      publisher = {AGU},
      year = {2008},
      note = {Poster},
      url = {http://adsabs.harvard.edu/abs/2008AGUFMIN11C1048S}
    }
    					
    Simmhan:escience:2008 Simmhan, Y.; Barga, R.; van Ingen, C.; Lazowska, E. & Szalay, A.
    On Building Scientific Workflow Systems for Data Management in the Cloud
    2008 International Conference on eScience (eScience) , pp. 434-435   inproceedings msr, workflows, escience, data management, cloud, hpc, trident, panstarrs, poster, peer reviewed
    Abstract: Scientific workflows have become an archetype to model in silico experiments in the Cloud by scientists. There is a class of workflows that are used by "data valets" to prepare raw data from scientific instruments into a science-ready form for use by scientists. These share data-intensive traits with traditional scientific workflows, yet differ significantly, for example, in the required degree of reliability and the type of provenance collected. We compare and contrast science application and data valet workflows through exemplar eScience projects to drive shared and unique requirements for scientific workflows across diverse users in a Science Cloud.
    BibTeX:
    @inproceedings{Simmhan:escience:2008,
      author = {Yogesh Simmhan and Roger Barga and Catharine van Ingen and Ed Lazowska and Alex Szalay},
      title = {On Building Scientific Workflow Systems for Data Management in the Cloud},
      booktitle = {International Conference on eScience (eScience)},
      publisher = {IEEE},
      year = {2008},
      pages = {434--435},
      note = {Poster [CORE A]},
      doi = {http://doi.org/10.1109/eScience.2008.150}
    }
    					
    Simmhan:swf:2008 Simmhan, Y.
    End-to-End Scientific Data Management Using Workflows
    2008 International Workshop on Scientific Workflows (SWF) , pp. 472-473   inproceedings msr, workflows, escience, data management, invited
    Abstract: Workflows have evolved as the natural tool for scientists to model their eScience experiments. With the scientific world producing data at an explosive rate, workflows have an important part to play in the end-to-end management of scientific data. To illustrate, workflows can help with fault tolerance and ease of administration when ingesting massive quantities of data using commodity hardware. The ability for workflows to automatically collect provenance on derived scientific data improves data discovery and publication capabilities. With better support for interoperating with data centric tools, workflows can become ubiquitous systems for scientific collaboration.
    BibTeX:
    @inproceedings{Simmhan:swf:2008,
      author = {Yogesh Simmhan},
      title = {End-to-End Scientific Data Management Using Workflows},
      booktitle = {International Workshop on Scientific Workflows (SWF)},
      publisher = {IEEE},
      year = {2008},
      pages = {472--473},
      note = {Invited talk},
      doi = {http://doi.org/10.1109/SERVICES-1.2008.22}
    }
    					
    Moreau:sotontr:2008 Moreau, L.; Plale, B.; Miles, S.; Goble, C.; Missier, P.; Barga, R.; Simmhan, Y.; Futrelle, J.; McGrath, R.; Myers, J.; Paulson, P.; Bowers, S.; Ludaescher, B.; Kwasnikowska, N.; Van den Bussche, J.; Ellkvist, T.; Freire, J. & Groth, P.
    The Open Provenance Model (v1.01)
    2008 (16148) Institution: Intelligence, Agents, Multimedia Group, University of Southampton   techreport msr, opm, provenance, model
    Abstract: In this paper, we introduce the Open Provenance Model, a model for provenance that is designed to meet the following requirements: (1) To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. (2) To allow developers to build and share tools that operate on such a provenance model. (3) To define the model in a precise, technology-agnostic manner. (4) To support a digital representation of provenance for any "thing", whether produced by computer systems or not. (5) To define a core set of rules that identify the valid inferences that can be made on provenance graphs.
    BibTeX:
    @techreport{Moreau:sotontr:2008,
      author = {Luc Moreau and Beth Plale and Simon Miles and Carole Goble and Paolo Missier and Roger Barga and Yogesh Simmhan and Joe Futrelle and Robert McGrath and Jim Myers and Patrick Paulson and Shawn Bowers and Bertram Ludaescher and Natalia Kwasnikowska and {Van den Bussche}, Jan and Tommy Ellkvist and Juliana Freire and Paul Groth},
      title = {The Open Provenance Model (v1.01)},
      institution = {Intelligence, Agents, Multimedia Group, University of Southampton},
      year = {2008},
      number = {16148},
      url = {http://eprints.ecs.soton.ac.uk/16148/}
    }
    					
    Simmhan:msrtr:2008 Simmhan, Y.; Barga, R.; van Ingen, C.; Nieto-Santisteban, M.; Dobos, L.; Li, N.; Shipway, M.; Szalay, A.S.; Werner, S. & Heasley, J.
    GrayWulf: Scalable Software Architecture for Data Intensive Computing
    2008 (MSR-TR-2008-186) Institution: Microsoft Research   techreport msr, workflows, escience, data management, cloud, hpc, trident, graywulf, panstarrs
    Abstract: Big data presents new challenges to both cluster infrastructure software and parallel application design. We present a set of software services and design principles for data intensive computing with petabyte data sets, named GrayWulf. These services are intended for deployment on a cluster of commodity servers similar to the well-known Beowulf clusters. We use the Pan-STARRS system currently under development as an example of the architecture and principles in action.
    BibTeX:
    @techreport{Simmhan:msrtr:2008,
      author = {Yogesh Simmhan and Roger Barga and Catharine van Ingen and Maria Nieto-Santisteban and Laszlo Dobos and Nolan Li and Michael Shipway and Alexander S. Szalay and Sue Werner and Jim Heasley},
      title = {GrayWulf: Scalable Software Architecture for Data Intensive Computing},
      institution = {Microsoft Research},
      year = {2008},
      number = {MSR-TR-2008-186},
      note = {Extended version of HICSS 2009},
      url = {http://research.microsoft.com/apps/pubs/default.aspx?id=79430}
    }
    					
    Gannon:wfbook:2007 Gannon, D.; Plale, B.; Marru, S.; Kandaswamy, G.; Simmhan, Y. & Shirasuna, S. Gannon, D.; Deelman, E.; Shields, M. & Taylor, I. (Eds.)
    Dynamic, Adaptive Workflows for Mesoscale Meteorology (Workflows for eScience: Scientific Workflows for Grids)
    2007 Workflows for eScience: Scientific Workflows for Grids , pp. 126-142   inbook iu, workflows, grid, escience, peer reviewed
    Abstract: The Linked Environments for Atmospheric Discovery (LEAD) [122] is a National Science Foundation funded project to change the paradigm for mesoscale weather prediction from one of static, fixed-schedule computational forecasts to one that is adaptive and driven by weather events. It is a collaboration of eight institutions, led by Kelvin Droegemeier of the University of Oklahoma, with the goal of enabling far more accurate and timely predictions of tornadoes and hurricanes than previously considered possible. The traditional approach to weather prediction is a four-phase activity. In the first phase, data from sensors are collected. The sensors include ground instruments such as humidity and temperature detectors, and lightning strike detectors and atmospheric measurements taken from balloons, commercial aircraft, radars, and satellites. The second phase is data assimilation, in which the gathered data are merged together into a set of consistent initial and boundary conditions for a large simulation. The third phase is the weather prediction, which applies numerical equations to measured conditions in order to project future weather conditions. The final phase is the generation of visual images of the processed data products that are analyzed to make predictions. Each phase of activity is performed by one or more application components.
    BibTeX:
    @inbook{Gannon:wfbook:2007,
      author = {Dennis Gannon and Beth Plale and Suresh Marru and Gopi Kandaswamy and Yogesh Simmhan and Satoshi Shirasuna},
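      title = {Workflows for eScience: Scientific Workflows for Grids},
      chapter = {Dynamic, Adaptive Workflows for Mesoscale Meteorology},
      editor = {D. Gannon and E. Deelman and M. Shields and I. Taylor},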
      publisher = {Springer London},
      year = {2007},
      pages = {126--142},
      doi = {http://doi.org/10.1007/978-1-84628-757-2_9}
    }
    					
    Ramakrishnan:iccs:2007 Ramakrishnan, L.; Simmhan, Y. & Plale, B. Shi, Y.; van Albada, G.; Dongarra, J. & Sloot, P. (Eds.)
    Realization of Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Four Years Down the Road
    2007
    Vol. 4487 International Conference on Computational Science (ICCS) , pp. 1122-1129  
    inproceedings iu, lead, escience, workflow, peer reviewed
    Abstract: Linked Environments for Atmospheric Discovery (LEAD) is a large-scale cyberinfrastructure effort in support of mesoscale meteorology. One of the primary goals of the infrastructure is support for real-time dynamic, adaptive response to severe weather. In this paper we revisit the conception of dynamic adaptivity as it appeared in our 2005 DDDAS workshop paper, and discuss changes since the original conceptualization and lessons learned in working with a complex service oriented architecture in support of data driven science.
    BibTeX:
    @inproceedings{Ramakrishnan:iccs:2007,
      author = {Ramakrishnan, Lavanya and Simmhan, Yogesh and Plale, Beth},
      title = {Realization of Dynamically Adaptive Weather Analysis and Forecasting in LEAD: Four Years Down the Road},
      booktitle = {International Conference on Computational Science (ICCS)},
      publisher = {Springer Berlin / Heidelberg},
      year = {2007},
      volume = {4487},
      pages = {1122--1129},
      note = {[CORE A]},
      doi = {http://doi.org/10.1007/978-3-540-72584-8_147}
    }
    					
    Simmhan:gbpse:2006 Simmhan, Y.; Pallickara, S.; Vijayakumar, N. & Plale, B. Gaffney, P. & Pool, J. (Eds.)
    Data Management in Dynamic Environment-driven Computational Science
    2007
    Vol. 239 Grid-Based Problem Solving Environments , pp. 317-333  
    inproceedings iu, data management, lead, provenance, portal, mylead, karma, calder, escience, peer reviewed
    Abstract: Advances in numerical modeling, computational hardware and problem solving environments have driven the growth of computational science over the past decades. Science gateways, based on service oriented architectures and scientific workflows, provide yet another step in democratizing access to advanced numerical and scientific tools, computational resources and massive data storage, and fostering collaborations. Dynamic, data-driven applications, such as those found in weather forecasting, present interesting challenges to Science Gateways, which are being addressed as part of the LEAD Cyberinfrastructure project. In this article, we discuss three important data related problems faced by such adaptive data-driven environments: managing a user’s personal workspace and metadata on the Grid, tracking the provenance of scientific workflows and data products, and continuous data mining over observational weather data.
    BibTeX:
    @inproceedings{Simmhan:gbpse:2006,
      author = {Yogesh Simmhan and Sangmi Pallickara and Nithya Vijayakumar and Beth Plale},
      title = {Data Management in Dynamic Environment-driven Computational Science},
      booktitle = {Grid-Based Problem Solving Environments},
      publisher = {Springer Boston},
      year = {2007},
      volume = {239},
      pages = {317--333},
      doi = {http://doi.org/10.1007/978-0-387-73659-4_17}
    }
    					
    Simmhan:icws:2006 Simmhan, Y.L.; Plale, B. & Gannon, D.
    A Framework for Collecting Provenance in Data-Centric Scientific Workflows
    2006 International Conference on Web Services (ICWS) , pp. 427-436   inproceedings iu, provenance, escience, karma, workflows, peer reviewed
    Abstract: The increasing ability for the earth sciences to sense the world around us is resulting in a growing need for data-driven applications that are under the control of data-centric workflows composed of grid- and web- services. The focus of our work is on provenance collection for these workflows, necessary to validate the workflow and to determine quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely-coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection while a performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight service workflow using 271 data products).
    BibTeX:
    @inproceedings{Simmhan:icws:2006,
      author = {Yogesh L. Simmhan and Beth Plale and Dennis Gannon},
      title = {A Framework for Collecting Provenance in Data-Centric Scientific Workflows},
      booktitle = {International Conference on Web Services (ICWS)},
      publisher = {IEEE},
      year = {2006},
      pages = {427--436},
      note = {[CORE A]},
      doi = {http://doi.org/10.1109/ICWS.2006.5}
    }
    					
    Simmhan:ipaw:2006 Simmhan, Y.L.; Plale, B. & Gannon, D. Moreau, L. & Foster, I. (Eds.)
    Performance Evaluation of the Karma Provenance Framework for Scientific Workflows
    2006
    Vol. 4145 International Provenance and Annotation Workshop (IPAW) , pp. 222-236  
    inproceedings iu, provenance, escience, karma, workflows, peer reviewed
    Abstract: Provenance about workflow executions and data derivations in scientific applications helps estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows and is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This paper presents a performance analysis of the Karma service as compared against the contemporary PReServ provenance service. Our study finds that Karma scales exceedingly well for collecting and querying provenance records, showing linear or sub-linear scaling with increasing number of provenance records and clients when tested against workloads in the order of 10,000 application-service invocations and over 36 concurrent clients.
    BibTeX:
    @inproceedings{Simmhan:ipaw:2006,
      author = {Yogesh L. Simmhan and Beth Plale and Dennis Gannon},
      title = {Performance Evaluation of the Karma Provenance Framework for Scientific Workflows},
      booktitle = {International Provenance and Annotation Workshop (IPAW)},
      publisher = {Springer Berlin / Heidelberg},
      year = {2006},
      volume = {4145},
      pages = {222--236},
      doi = {http://doi.org/10.1007/11890850_23}
    }
    					
    Simmhan:sciflow:2006 Simmhan, Y.L.; Plale, B. & Gannon, D.
    Towards a Quality Model for Effective Data Selection in Collaboratories
    2006 Workshop on Workflow and Data Flow for Scientific Applications (SciFlow) , pp. 1-4   inproceedings iu, provenance, escience, karma, workflows, short paper, peer reviewed
    Abstract: Data-driven scientific applications utilize workflow frameworks to execute complex dataflows, resulting in derived data products of unknown quality. We discuss our on-going research on a quality model that provides users with an integrated estimate of the data quality that is tuned to their application needs, and is available as a numerical quality score that enables uniform comparison of datasets, and increases the community’s trust in derived data.
    BibTeX:
    @inproceedings{Simmhan:sciflow:2006,
      author = {Yogesh L. Simmhan and Beth Plale and Dennis Gannon},
      title = {Towards a Quality Model for Effective Data Selection in Collaboratories},
      booktitle = {Workshop on Workflow and Data Flow for Scientific Applications (SciFlow)},
      publisher = {IEEE},
      year = {2006},
      pages = {1--4},
      doi = {http://doi.org/10.1109/ICDEW.2006.150}
    }
    					
    Gannon:ieee:2005 Gannon, D.; Alameda, J.; Chipara, O.; Christie, M.; Dukle, V.; Fang, L.; Farrellee, M.; Fox, G.; Hampton, S.; Kandaswamy, G.; Kodeboyina, D.; Moad, C.; Pierce, M.; Plale, B.; Rossi, A.; Simmhan, Y.; Sarangi, A.; Slominski, A.; Shirasuna, S. & Thomas, T.
    Building Grid Portal Applications from a Web-Service Component Architecture
    2005 Proceedings of the IEEE, Special issue on Grid Computing
    Vol. 93 (3) , pp. 551-563  
    article iu, grid, portal, web service, peer reviewed
    Abstract: This paper describes an approach to building Grid applications based on the premise that users who wish to access and run these applications prefer to do so without becoming experts on Grid technology. We describe an application architecture based on wrapping user applications and application workflows as web services and web service resources. These services are visible to the users and to resource providers through a family of Grid portal components that can be used to configure, launch and monitor complex applications in the scientific language of the end user. The applications in this model are instantiated by an application factory service. The layered design of the architecture makes it possible for an expert to configure an application factory service with a custom user interface client that may be dynamically loaded into the portal.
    BibTeX:
    @article{Gannon:ieee:2005,
      author = {Dennis Gannon and Jay Alameda and Octav Chipara and Marcus Christie and Vinayak Dukle and Liang Fang and Matthew Farrellee and Geoffrey Fox and Shawn Hampton and Gopi Kandaswamy and Deepti Kodeboyina and Charlie Moad and Marlon Pierce and Beth Plale and Albert Rossi and Yogesh Simmhan and Anuraag Sarangi and Aleksander Slominski and Satoshi Shirasuna and Thomas Thomas},
      title = {Building Grid Portal Applications from a Web-Service Component Architecture},
      journal = {Proceedings of the IEEE, Special issue on Grid Computing},
      publisher = {IEEE},
      year = {2005},
      volume = {93},
      number = {3},
      pages = {551--563},
      note = {[IF 6.81]},
      doi = {http://doi.org/10.1109/JPROC.2004.842756}
    }
    					
    Simmhan:record:2005 Simmhan, Y.; Plale, B. & Gannon, D.
    A Survey of Data Provenance in e-Science
    2005 SIGMOD Record
    Vol. 34 (3) , pp. 31-36  
    article iu, provenance, escience, peer reviewed
    Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.
    BibTeX:
    @article{Simmhan:record:2005,
      author = {Yogesh Simmhan and Beth Plale and Dennis Gannon},
      title = {A Survey of Data Provenance in e-Science},
      journal = {SIGMOD Record},
      publisher = {ACM},
      year = {2005},
      volume = {34},
      number = {3},
      pages = {31--36},
      note = {[IF 0.667]},
      doi = {http://doi.org/10.1145/1084805.1084812}
    }
    					
    Gannon:icsoc:2005 Gannon, D.; Plale, B.; Christie, M.; Fang, L.; Huang, Y.; Jensen, S.; Kandaswamy, G.; Marru, S.; Pallickara, S.L.; Shirasuna, S.; Simmhan, Y.; Slominski, A. & Sun, Y. Benatallah, B.; Casati, F. & Traverso, P. (Eds.)
    Service Oriented Architectures for Science Gateways on Grid Systems
    2005
    Vol. 3826 International Conference on Service-Oriented Computing (ICSOC) , pp. 21-32  
    inproceedings iu, portal, web service, grid, peer reviewed
    Abstract: Grid computing is about allocating distributed collections of resources including computers, storage systems, networks and instruments to form a coherent system devoted to a “virtual organization” of users who share a common interest in solving a complex problem or building an efficient agile enterprise. Service oriented architectures have emerged as the standard way to build Grids. This paper provides a brief look at the Open Grid Service Architecture, a standard being proposed by the Global Grid Forum, which provides the foundational concepts of most Grid systems. Above this Grid foundation is a layer of application-oriented services that are managed by workflow tools and “science gateway” portals that provide users transparent access to the applications that use the resources of a Grid. In this paper we will also describe these Gateway framework services and discuss how they relate to and use Grid services.
    BibTeX:
    @inproceedings{Gannon:icsoc:2005,
      author = {Dennis Gannon and Beth Plale and Marcus Christie and Liang Fang and Yi Huang and Scott Jensen and Gopi Kandaswamy and Suresh Marru and Sangmi Lee Pallickara and Satoshi Shirasuna and Yogesh Simmhan and Aleksander Slominski and Yiming Sun},
      title = {Service Oriented Architectures for Science Gateways on Grid Systems},
      booktitle = {International Conference on Service-Oriented Computing (ICSOC)},
      publisher = {Springer Berlin / Heidelberg},
      year = {2005},
      volume = {3826},
      pages = {21--32},
      note = {[CORE A]},
      doi = {http://doi.org/10.1007/11596141_3}
    }
    					
    Simmhan:iucstr:2005 Simmhan, Y.L.; Plale, B. & Gannon, D.
    A Survey of Data Provenance Techniques
    2005 (612) Technical Report TR-612, Computer Science Department, Indiana University School: Computer Science Department, Indiana University   techreport iu, provenance, escience
    Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update a data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.
    BibTeX:
    @techreport{Simmhan:iucstr:2005,
      author = {Yogesh L. Simmhan and Beth Plale and Dennis Gannon},
      title = {A Survey of Data Provenance Techniques},
      type = {Technical Report},
      institution = {Computer Science Department, Indiana University},
      year = {2005},
      number = {612},
      note = {Extended version of SIGMOD Record 2005},
      url = {http://www.cs.indiana.edu/pub/techreports/TR618.pdf}
    }
    					
    Gannon:clade:2004 Gannon, D.; Krishnan, S.; Fang, L.; Kandaswamy, G.; Simmhan, Y. & Slominski, A. IEEE
    On Building Parallel and Grid Applications: Component Technology and Distributed Services
    2004 International Workshop on Challenges of Large Applications in Distributed Environments (CLADE) , pp. 44-51   inproceedings iu, grid, web service, escience, component, peer reviewed
    Abstract: Software Component Frameworks are well known in the commercial business application world and now this technology is being explored with great interest as a way to build large-scale scientific applications on parallel computers. In the case of Grid systems, the current architectural model is based on the emerging web services framework. In this paper we describe progress that has been made on the Common Component Architecture model (CCA) and discuss its success and limitations when applied to problems in Grid computing. Our primary conclusion is that a component model fits very well with a services-oriented Grid, but the model of composition must allow for a very dynamic (both in space and in time) control of composition. We note that this adds a new dimension to conventional service workflow and it extends the “Inversion of Control” aspects of most component systems.
    BibTeX:
    @inproceedings{Gannon:clade:2004,
      author = {Dennis Gannon and Sriram Krishnan and Liang Fang and Gopi Kandaswamy and Yogesh Simmhan and Aleksander Slominski},
      title = {On Building Parallel and Grid Applications: Component Technology and Distributed Services},
      booktitle = {International Workshop on Challenges of Large Applications in Distributed Environments (CLADE)},
      year = {2004},
      pages = {44--51},
      note = {[CORE C]},
      doi = {http://doi.org/10.1109/CLADE.2004.1309091}
    }
    					
    Gannon:dbgs:2003 Gannon, D.; Christie, M.; Chipara, O.; Fang, L.; Farrellee, M.; Kandaswamy, G.; Lu, W.; Plale, B.; Slominski, A.; Sarangi, A. & Simmhan, Y.L.
    Building Grid Services for User Portals
    2003 Workshop on Designing and Building Grid Services (DBGS)   inproceedings iu, portal, grid, web service, escience, peer reviewed
    BibTeX:
    @inproceedings{Gannon:dbgs:2003,
      author = {Dennis Gannon and Marcus Christie and Octav Chipara and Liang Fang and Matthew Farrellee and Gopi Kandaswamy and Wei Lu and Beth Plale and Aleksander Slominski and Anuraag Sarangi and Yogesh L. Simmhan},
      title = {Building Grid Services for User Portals},
      booktitle = {Workshop on Designing and Building Grid Services (DBGS)},
      publisher = {GGF},
      year = {2003},
      url = {http://www.mcs.anl.gov/~keahey/DBGS/DBGS_files/dbgs_papers/gannon.pdf}
    }
    					
    Gannon:cluster:2002 Gannon, D.; Bramley, R.; Fox, G.; Smallen, S.; Rossi, A.; Ananthakrishnan, R.; Bertrand, F.; Chiu, K.; Farrellee, M.; Govindaraju, M.; Krishnan, S.; Ramakrishnan, L.; Simmhan, Y.; Slominski, A.; Ma, Y.; Olariu, C. & Rey-Cenvaz, N.
    Programming the Grid: Distributed Software Components, P2P and Grid Web Services for Scientific Applications
    2002 Cluster Computing
    Vol. 5 (3) , pp. 325-336  
    article iu, component, grid, web service, escience, peer reviewed
    Abstract: Computational Grids have become an important asset in large-scale scientific and engineering research. By providing a set of services that allow a widely distributed collection of resources to be tied together into a relatively seamless computing framework, teams of researchers can collaborate to solve problems that they could not have attempted before. Unfortunately the task of building Grid applications remains extremely difficult because there are few tools available to support developers. To build reliable and re-usable Grid applications, programmers must be equipped with a programming framework that hides the details of most Grid services and allows the developer a consistent, non-complex model in which applications can be composed from well tested, reliable sub-units. This paper describes experiences with using a software component framework for building Grid applications. The framework, which is based on the DOE Common Component Architecture (CCA), allows individual components to export function/service interfaces that can be remotely invoked by other components. The framework also provides a simple messaging/event system for asynchronous notification between application components. The paper also describes how the emerging Web-Services model fits with a component-oriented application design philosophy. To illustrate the connection between web services and Grid application programming we describe a simple design pattern for application factory services which can be used to simplify the task of building reliable Grid programs. Finally we address several issues of Grid programming that are better understood from the perspective of Peer-to-Peer (P2P) systems. In particular we describe how models for collaboration and resource sharing fit well with many grid application scenarios.
    BibTeX:
    @article{Gannon:cluster:2002,
      author = {Dennis Gannon and Randall Bramley and Geoffrey Fox and Shava Smallen and Al Rossi and Rachana Ananthakrishnan and Felipe Bertrand and Kenneth Chiu and Matt Farrellee and Madhusudhan Govindaraju and Sriram Krishnan and Lavanya Ramakrishnan and Yogesh Simmhan and Aleksander Slominski and Yu Ma and Caroline Olariu and Nicolas Rey-Cenvaz},
      title = {Programming the Grid: Distributed Software Components, P2P and Grid Web Services for Scientific Applications},
      journal = {Cluster Computing},
      publisher = {Springer Netherlands},
      year = {2002},
      volume = {5},
      number = {3},
      pages = {325--336},
      note = {[IF 0.519]},
      doi = {http://doi.org/10.1023/A:1015633507128}
    }
    					
    Krishnan:sciprog:2002 Krishnan, S.; Bramley, R.; Gannon, D.; Ananthakrishnan, R.; Govindaraju, M.; Slominski, A.; Simmhan, Y.; Alameda, J.; Alkire, R.; Drews, T. & Webb, E.
    The XCAT Science Portal
    2002 Scientific Programming
    Vol. 10 (4) , pp. 303-317  
    article iu, component, portal, escience, peer reviewed
    Abstract: This paper describes the design and prototype implementation of the XCAT Grid Science Portal. The portal lets grid application programmers script complex distributed computations and package these applications with simple interfaces for others to use. Each application is packaged as a notebook which consists of webpages and editable parameterized scripts. The portal is a workstation-based specialized personal web server, capable of executing the application scripts and launching remote grid applications for the user. The portal server can receive event streams published by the application and grid resource information published by Network Weather Service(NWS) or Autopilot sensors. Notebooks can be published and stored in web based archives for others to retrieve and modify. The XCAT Grid Science Portal has been tested with various applications, including the distributed simulation of chemical processes in semiconductor manufacturing and collaboratory support for X-ray crystallographers.
    BibTeX:
    @article{Krishnan:sciprog:2002,
      author = {Sriram Krishnan and Randall Bramley and Dennis Gannon and Rachana Ananthakrishnan and Madhusudhan Govindaraju and Aleksander Slominski and Yogesh Simmhan and Jay Alameda and Richard Alkire and Timothy Drews and Eric Webb},
      title = {The XCAT Science Portal},
      journal = {Scientific Programming},
      publisher = {IOS Press},
      year = {2002},
      volume = {10},
      number = {4},
      pages = {303--317},
      note = {[IF 0.967]},
      url = {http://iospress.metapress.com/content/UEYBQKHHPGJUGWT2}
    }
    					
    Slominski:iucstr:2002 Slominski, A.; Simmhan, Y.; Rossi, A.L.; Farrellee, M. & Gannon, D.
    XEvents/XMessages: Application Events and Messaging Framework for Grid
    2002 Technical Report, Extreme! Computing Lab, Indiana University School: Extreme! Computing Lab, Indiana University   techreport iu, grid, web service
    BibTeX:
    @techreport{Slominski:iucstr:2002,
      author = {Aleksander Slominski and Yogesh Simmhan and Albert Louis Rossi and Matthew Farrellee and Dennis Gannon},
      title = {XEvents/XMessages: Application Events and Messaging Framework for Grid},
      type = {Technical Report},
      institution = {Extreme! Computing Lab, Indiana University},
      year = {2002},
      url = {http://www.extreme.indiana.edu/xgws/papers/xevents_xmessages_tr.pdf}
    }
    					

    Created by JabRef on 22/09/2016.