Distributed Computing Archive: November 2009
November 24, 2009
11/18 Hadoop Bay Area User Group recap
Hi Hadoopers
Thanks, everyone, for joining us last Wednesday night at Yahoo!’s Sunnyvale campus. We had more than a hundred attendees, a nice record to mark the last meeting of 2009. I’m happy to see cool presentations coming out of the growing community and many interesting solutions implemented on top of this amazing technology.
Jason Rutherglen and Jason Venner presenting "Searching at Scale" at Hadoop User Group.
Credit: YDN
For those of you who could not attend in person, please find the video recordings and slides posted below.
Jason Rutherglen and Jason Venner presented an interesting solution for Searching at Scale using Katta, Solr, Lucene and Hadoop.
Video recording:
Slideshare:
Sanjay Radia from the Yahoo! Hadoop Team walked us through the new file system API.
Video recording:
Slideshare:
Paul Tarjan shared an exciting solution for building a Hadoop Record Reader in Python.
As always, we are looking for exciting technologies and experiences you want to share.
Please email presentation requests to dekel at yahoo hyphen inc dot com.
See you all on Jan 20th, 2010.
Dekel Tankel
Director, Product Management, Cloud Computing
Posted by eldridge at 1:53 PM
Yahoo!'s India Hadoop Team is growing!
Do you love the challenges of working with systems that host petabytes of data and many tens of thousands of cores? The Hadoop team in Bangalore, India, is a core contributor to developing such large scale distributed systems at Yahoo! The team is heavily engaged directly in the open source Apache Hadoop project. If open source development and large scale distributed systems are your forte, send your resume to hadoop-jobs-2009@yahoo-inc.com.
Hadoop Performance Engineers and Architect
Driving up Hadoop performance and resource utilization is a challenge at Yahoo!'s scale of operation (25K+ servers running Hadoop). The Grid Performance Engineering team is looking for analytically strong engineers with a solid background in distributed systems, operating systems, and system internals to study, benchmark, and push the performance limits of Hadoop MapReduce and the Hadoop Distributed File System. Preferred background is an MTech/MS/PhD in Computer Science with 9+ years of industry experience. Background in Performance Optimization, Benchmarking or High Performance Computing (HPC) is a plus.
Hadoop Quality Engineering Architect
The Hadoop QE team is looking for a Quality Engineering Architect with 10+ years of experience in development, white box testing, and the design and implementation of automated test frameworks using Java. This highly technical role requires hands-on experience in Java or C++ and proficiency in scripting. You will be responsible for providing technical leadership to the quality engineering team members, as well as designing and implementing automated test frameworks for Hadoop and its related components. You will also be responsible for delivering quality software products and frequently interacting with the open source community, developers, operations and managers. Preferred background is an MTech/MS/PhD in Computer Science or a related field.
Data Quality Engineering Architect
The Data QE team is looking for a Quality Engineering Architect with 10+ years of experience in development, white box testing, and the design and implementation of automated test frameworks using Java. This highly technical role requires hands-on experience in data analysis, data validation, data cleansing, and data verification. Excellent Java (or C++) and scripting skills are required. You will be responsible for providing technical leadership to the data QE team, as well as designing and implementing an automated test framework for data handling using Hadoop. You will also be responsible for delivering quality software products and frequently interacting with the open source community, developers, operations and managers. Preferred background is an MTech/MS/PhD in Computer Science or a related field.
Hadoop Architect
The Hadoop team in Bangalore is looking for Architects with 10+ years of experience to help develop massively scalable, highly performant, and reliable platforms, including scheduling for such environments. These platforms handle the data manipulation, mining and storage needs of applications that work with several multi-terabyte data sets. Developing this infrastructure requires solving many technical challenges in the areas of parallel and distributed computing, multi-terabyte storage systems, and high-performance computing. It calls for skills in distributed algorithms and file systems, software design principles, and systems programming, as well as expertise in Java and C/C++. You will be expected to build scalable and modular systems, measure and optimize system performance, and ensure that systems run reliably in a 24/7 production environment.
Sr. Manager, Hadoop Development
The Hadoop team in Bangalore is looking for Senior Managers with 10+ years of experience to lead a team developing hugely scalable, highly performant, and reliable platforms, including scheduling for such environments. These platforms handle the data manipulation, mining and storage needs of applications that work with several multi-terabyte data sets. We are looking for a software leader with a strong systems management and development background. This job involves leading and growing a team of 10 engineers, managers and architects at the heart of the Hadoop open source community. We are looking for someone with a history of driving delivery of complex distributed systems. 6+ years of software development and an additional 4+ years of management are required. Experience with Java, Unix, C++, agile development and open source development is desired.
Once again, please send your resume to hadoop-jobs-2009@yahoo-inc.com
Chid Kollengode
Director of Engineering
Hadoop Team
Bangalore, India
Posted by ndaley at 10:20 AM
November 15, 2009
Do you have what it takes to join Yahoo!'s US Hadoop Team? [UPDATED]
Update: Added a few more US based jobs.
First, an introduction. I'm Mark Tsimelzon, a recent addition to the Hadoop team. I'm Director of Engineering at Yahoo!, managing MapReduce and a bunch of projects with cute animal names that build database abstractions on top of Apache Hadoop. Having spent most of my career in various startups, I was not sure what I was getting myself into when I joined Yahoo!. To my amazement, what I discovered here was not so different from a startup. The Hadoop team at Yahoo! is filled with extremely smart, hard-working people, who care deeply about their job, Yahoo!, and Open Source. The team moves as fast as any startup does, even though the scale of the problems it solves would make any startup founder deeply envious.
The best part of being a part of the Hadoop team at Yahoo! is that, despite the current global economic situation, this team is growing fast! This is not surprising: all of Yahoo!'s batch data processing is moving to Hadoop, and we need many more great people to join this team. What follows is a quick list of openings we currently have. It includes openings for developers, testers, architects, managers and directors. If you are interested in applying for any of these positions, please send your resume together with a few lines on why you want to work with us on Hadoop to hadoop-jobs-2009@yahoo-inc.com.
Senior Software Engineer
We are looking for great software engineers who have a wealth of experience with complex software systems, distributed systems, algorithms, data structures, and performance optimizations. Understanding of MapReduce, grid computing, databases, data warehouses, and database internals is a big plus. Expert Java skills are required. Experience with agile development and open source development is desired. 6+ years of software development experience are desired.
Software Architect / Team Leads
We are looking for great architects / technical team leads who have a proven track record of designing and delivering complex software systems. Thorough knowledge of distributed systems, algorithms, data structures, performance optimization, scalability, and reliability issues is required. Understanding of grid computing, databases, data warehouses, and especially database internals is a big plus. Solid Java skills are required. Experience with agile development and open source development is desired. 8+ years of experience, including 4+ years in the architect / team lead role, are desired.
Senior Java Performance Engineer
Our Grid Hadoop Performance/Utilization team is looking for a senior performance engineer with expert Java/JVM knowledge to help us: evaluate and propose optimal JVM tuning options for best performance; evaluate various JVMs for best performance and stability as a continuous process; profile the Java code of the Grid software stack to find performance bottlenecks and propose innovative ways to eliminate them; characterize all aspects of HDFS and MapReduce performance; participate in design reviews and propose innovative solutions for performance and scalability improvement throughout the product life cycle; measure system resource utilization and efficiency, and mine and analyze large amounts of logs and traces to identify improvement opportunities; and champion techniques for writing high-performance Java code.
Senior Engineer for Oozie
As more data processing projects both within Yahoo! and worldwide move to using Hadoop, a comprehensive workflow management and coordination system becomes critical. Oozie is intended to solve precisely this problem. The challenge is to make the system scalable, fast, data aware, fault tolerant, and extensible. It needs to support many different job types that can run on a Grid, such as MapReduce, Pig, Hadoop Streaming, HDFS, etc. The Oozie team is rapidly growing. If building a core component of the open source Hadoop project is the kind of challenge that appeals to you, this is the place to be. Excellent Java coding skills are required for this position.
Senior Engineering Manager, MapReduce Team
Our MapReduce team is growing, and we are looking for a dedicated Senior Engineering Manager to lead it. We are looking for someone with a history of driving delivery of complex distributed systems and/or complex systems software. Experience with MapReduce is a Plus. Experience with MapReduce internals is a Big Plus. Experience leading a geographically distributed team is strongly desired. Our environment is open, collaborative and fast paced. It is filled with very smart and independent people. We require our leaders to nurture such an environment while demanding delivery and high standards in our work product. 10+ years of experience, including 5+ years of management experience, are preferred. Experience with Java, Unix, C++, agile development and open source development is desired.
Director of Software Engineering, Hadoop Systems
Our Hadoop development team is looking for a world class software leader with a strong systems management and architecture background to lead our investments in HDFS, ZooKeeper and Hadoop performance. This job involves leading and growing a team of 20 engineers, managers and architects at the heart of the Hadoop open source community. We are looking for someone with a history of driving delivery of complex distributed systems and/or complex systems software such as file systems and operating systems. Our environment is open, collaborative and fast paced. It is filled with very smart and independent people. We require our leaders to nurture such an environment while demanding delivery and high standards in our work product. 8+ years of software development and an additional 8+ years of architecture / management are required. Experience with Java, Unix, C++, agile development and open source development is desired.
Hadoop Software Quality Engineer Architect
As a QE Architect with 10+ years of experience, you will lead the design and implementation of test plans, test cases, and test frameworks across a number of open source Apache projects related to data processing that underpin the Yahoo! Cloud Computing infrastructure: Pig, Owl, and Zebra. In this highly technical role, you will interface with QA managers, other architects, leads, developers, product managers and operation teams to complete projects. You should possess skills in architecting and leading test efforts for backend components as well as system performance and reliability testing. Excellent Java or C++ coding skills are required for this hands-on position.
Senior Whitebox Quality Engineering Lead for Owl
As a Whitebox QE Lead with 5+ years of experience, you will help lead, design, and implement test plans, test cases, and tool-based validation for complex, distributed software. You will interface with QA managers, other leads, developers, product managers and operation teams to complete projects. You should possess skills in leading and testing APIs of backend components as well as system performance testing. Excellent Java or C++ coding skills are required for this position.
Senior Whitebox Quality Engineering Lead for ZooKeeper
Think you've got what it takes to engineer testing for ZooKeeper's 5 guarantees (Sequential Consistency, Atomicity, Single System Image, Reliability, Timeliness)? We're looking for a Whitebox QE Lead with at least 5 years of experience to lead, design and implement test strategy, test infrastructure, and test cases. Working with a small development team, you will interface with QA managers, other leads, developers, product managers and operation teams to complete projects. You should possess skills in leading and testing APIs of backend components as well as system performance testing. Excellent Java or C++ coding skills are required for this position.
Senior Manager of Data Processing Quality Engineering
The Hadoop team at Yahoo! is seeking an experienced, hands-on Sr. Whitebox QE Manager with strong technical skills (including coding) to lead and grow a team of quality engineers in delivering world-class data processing services as part of the Yahoo! Cloud Computing platform. You will define test strategy, execute test plans, review tests and product specifications, participate in defining and selecting appropriate test tools and automation strategy, and lead a strong team to deliver a high quality product. You will work closely with architects, development, program management, operations, and customers to test and release products.
Senior Release Engineer
The Grid Computing team at Yahoo! is looking for an experienced, self-motivated, and energetic release engineer. If you love working with big distributed systems and handling critical configuration management issues, then this position is for you. You also must be able to create order out of chaos by introducing necessary processes and controls in a fast moving, cross functional environment. The ideal candidate is hardworking, detail oriented and has experience as a system administrator or release engineer.
Once again, please send your resume to hadoop-jobs-2009@yahoo-inc.com
Mark Tsimelzon
Director of Engineering
Hadoop Team
Posted by ndaley at 1:55 AM
November 12, 2009
Hadoop Bay Area User Group - Nov 18 at Yahoo!, Sunnyvale
Yahoo! is hosting the Bay Area Hadoop User Group (HUG) next week on Wednesday, November 18. Whether you are an active submitter of patches or completely new to Hadoop -- we'd love to see you.
Learn more about Hadoop and see some of the great ways people are using it to process their data. The notes and slides from previous meetups can be found on the Yahoo! Hadoop blog.
You can sign up and view the agenda for the upcoming meetup on the Bay Area HUG Meetup page. It's happening on Wednesday, November 18, at 6pm at the Yahoo! campus in Sunnyvale (see the Meetup page for location specifics).
This is the last HUG for 2009, as the December HUG is canceled due to the holidays.
Hope to see you there next week.
Dekel Tankel
Director, Product Management, Cloud Computing
Posted by eldridge at 11:53 AM

