Google Open Source Blog

GitHub on BigQuery: Analyze all the code

Wednesday, June 29, 2016

Posted by Felipe Hoffa, Google Developer Advocate

Google BigQuery, GitHub Archive project, Google BigQuery Public Datasets

With BigQuery everyone gets a terabyte every month to run queries. If you've never tried BigQuery before, follow these getting started instructions.

The contents table has all the non-binary files in GitHub that are less than 1 MB. It's a huge table, with more than 1.5 terabytes of data! This means the monthly terabyte for BigQuery queries won't last long if you want to query this table. To make your life easier, we've created extracts with only a sample of 10% of all files of the most popular projects, as well as another dataset with all the .go, .rb, .js, .php, .py, and .java code. Use them to make your free quota last!
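To make the quota point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each query reads the whole table it touches, which is a simplification. The 1 TB quota and the 1.5 TB table size come from the post; the extract size is a made-up placeholder, since the post does not state how large the extracts are.

```python
FREE_QUOTA_TB = 1.0      # monthly free query allowance stated in the post
CONTENTS_TABLE_TB = 1.5  # lower bound: the post says "more than 1.5 terabytes"
EXTRACT_TB = 0.15        # hypothetical extract size, NOT a figure from the post


def scans_per_month(scanned_tb, quota_tb=FREE_QUOTA_TB):
    """Return how many queries of the given scan size fit in the quota."""
    return quota_tb / scanned_tb


full_table_scans = scans_per_month(CONTENTS_TABLE_TB)
extract_scans = scans_per_month(EXTRACT_TB)

print(f"Full contents table: {full_table_scans:.2f} scans per month")
print(f"Hypothetical extract: {extract_scans:.2f} scans per month")
```

Under these assumptions, not even one full scan of the contents table fits in a month of free quota, while a smaller extract leaves room for several queries.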

If these tables are not enough, you can always create your own extracts (but you'll be billed for the respective storage). To do so, you could sign up for $300 in Google Cloud Platform credits. These credits could be used to store terabytes (and more) of data in BigQuery.

BigQuery makes it easy to join different datasets. How about ranking coding patterns by the number of stars their projects get? See a related post looking at the Hacker News effect on a project’s GitHub stars.
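As a rough illustration of the kind of join described above, the sketch below uses Python's built-in sqlite3 module rather than BigQuery itself. The table names, column names, and rows are all hypothetical toy data, and the SQL is SQLite's dialect, so treat it as a picture of the idea rather than a query to paste into BigQuery.

```python
import sqlite3

# In-memory database with two hypothetical tables: which coding pattern
# appears in which repository, and how many stars each repository has.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE file_patterns (repo TEXT, pattern TEXT);
    CREATE TABLE repo_stars (repo TEXT, stars INTEGER);

    INSERT INTO file_patterns VALUES
        ('a/one', 'tabs'),
        ('a/one', 'spaces'),
        ('b/two', 'spaces'),
        ('c/three', 'tabs');

    INSERT INTO repo_stars VALUES
        ('a/one', 50),
        ('b/two', 200),
        ('c/three', 10);
    """
)

# Join the two tables on the repository name, then rank each pattern by
# the total stars of the repositories that use it.
ranking = conn.execute(
    """
    SELECT p.pattern,
           SUM(s.stars) AS total_stars,
           COUNT(DISTINCT p.repo) AS repos
    FROM file_patterns AS p
    JOIN repo_stars AS s ON s.repo = p.repo
    GROUP BY p.pattern
    ORDER BY total_stars DESC
    """
).fetchall()

for pattern, total_stars, repos in ranking:
    print(pattern, total_stars, repos)
```

The same shape (join on a repository key, group by the pattern, order by the aggregate) is what a star-ranking query over real datasets would need, with the real table and column names substituted.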

Is SQL not enough? Learn how BigQuery allows you to run arbitrary JavaScript code inside SQL to enable a full range of possibilities.

GitHub's announcement, sample queries, reddit.com/r/bigquery, Hacker News, post on Medium

More statistics from Google Summer of Code 2016

Tuesday, June 28, 2016

Google Summer of Code, stats post

Country | School | 2016 Accepted Students | 2015 Accepted Students | 12 Year Total
India | International Institute of Information Technology - Hyderabad | | | 252
Sri Lanka | University of Moratuwa | | | 320
Romania | University POLITEHNICA of Bucharest | | | 155
India | Birla Institute of Technology and Science Pilani, Goa Campus | | | 110
India | Birla Institute of Technology and Science, Pilani Campus | | | 116
India | Indian Institute of Technology, Bombay | | |
India | Indian Institute of Technology, Kharagpur | | |
India | Indian Institute of Technology, Roorkee | | |
India | Indraprastha Institute of Information Technology Delhi | | |
India | Amrita School of Engineering, Amrita University, Amritapuri Campus | | |
India | Indian Institute of Technology, Guwahati | | |
Cameroon | University of Buea | | |
India | Delhi Technological University | | |
India | Indian Institute of Technology BHU Varanasi | | |
Germany | TU Munich | | |

Grace Hopper, Black Girls Code

By Mary Radomile, Open Source Programs Office

Coding has begun for Google Summer of Code 2016

Monday, May 23, 2016

Google Summer of Code, about 1,200 students, 178, program, timeline

By Josh Simmons, Open Source Programs Office

Google Summer of Code 2016 statistics: Part one

Monday, May 23, 2016

Google Summer of Code

Country | Accepted Students | Country | Accepted Students | Country | Accepted Students
Albania | | Greece | | Romania |
Algeria | | Guatemala | | Russian Federation |
Argentina | | Hong Kong | | Serbia |
Armenia | | Hungary | | Singapore |
Australia | | India | 454 | Slovak Republic |
Austria | | Ireland | | Slovenia |
Belarus | | Israel | | South Africa |
Belgium | | Italy | | South Korea |
Bosnia-Herzegovina | | Japan | | Spain |
Brazil | | Kazakhstan | | Sri Lanka |
Bulgaria | | Kenya | | Sweden |
Cambodia | | Latvia | | Switzerland |
Cameroon | | Lithuania | | Taiwan |
Canada | | Luxembourg | | Thailand |
China | | Macedonia | | Turkey |
Croatia | | Mexico | | Ukraine |
Czech Republic | | Netherlands | | United Kingdom |
Denmark | | New Zealand | | United States | 118
Egypt | | Pakistan | | Uruguay |
Estonia | | Paraguay | | Venezuela |
Finland | | Philippines | | Vietnam |
France | | Poland | | |
Germany | | Portugal | | |

By Mary Radomile, Open Source Programs
Correction: A previous version of this blog post erroneously reported the total number of students as 1,202 and the number of students from Cameroon as 1. This has been updated to reflect the actual totals as 1,206 and 16 respectively.

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source

Friday, May 13, 2016

Originally posted on the Google Research Blog
By Slav Petrov, Senior Staff Research Scientist

SyntaxNet, TensorFlow, Natural Language Understanding, Parsey McParseface, most accurate such model in the world

How does SyntaxNet work?

syntactic parser
Alice saw Bob
whom did Alice see? who saw Bob? what had Alice been reading about? when did Alice see Bob?

Why is Parsing So Hard For Computers to Get Right?

Alice drove down the street in her car
prepositional phrase attachment ambiguity, beam search
I booked a ticket to Google
paper, integrate learning and search, SyntaxNet, TensorFlow, Universal Treebanks

So How Accurate is Parsey McParseface?

Penn Treebank, better than any previous approach, Google WebTreebank, SyntaxNet

Googlers on the road: OSCON 2016 in Austin

Monday, May 9, 2016

OSCON, Community Leadership Summit (CLS)

OSCON 2014 program chairs including Googler Sarah Novotny.
Photo licensed by O'Reilly Media under CC-BY-NC 2.0.

This year we have 10 Googlers hosting sessions covering topics including web development, machine learning, devops, astronomy and open source. A list of all of the talks hosted by Googlers alongside related events can be found below.
If you’re a student, educator, mentor, past or present participant in Google Summer of Code or Google Code-in, or just interested in learning more about the two programs, make sure to join us Monday evening for our Birds of a Feather session.

Have questions about Kubernetes, Google Summer of Code, open source at Google or just want to meet some Googlers? Stop by booth #307 in the Expo Hall.

Thursday, May 12th - GDG Austin
7:00pm   Google Developers Group Austin Meetup

Sunday, May 15th - Community Leadership Summit
10:00am  Occupational Hazard by Josh Simmons

Monday, May 16th
9:00am   Kubernetes: From scratch to production in 2 days by Brian Dorsey and Jeff Mendoza
7:00pm   Google Summer of Code and Google Code-in Birds of a Feather

Tuesday, May 17th
9:00am   Kubernetes: From scratch to production in 2 days by Brian Dorsey and Jeff Mendoza
9:00am   Diving into machine learning through TensorFlow by Julia Ferraioli, Amy Unruh and Eli Bixby

Wednesday, May 18th
1:50pm   Open source lessons from the TODO Group by Chris DiBona, Chris Aniszczyk, Nithya Ruff, Jeff McAffer and Benjamin VanEvery
5:10pm   Scalable bidirectional communication over the Web by Wenbo Zhu

Thursday, May 19th
11:00am  Kubernetes hackathon at OSCON Contribute hosted by Brian Dorsey, Nikhil Jindal, Janet Kuo, Jeff Mendoza, John Mulhausen, Sarah Novotny, Terrence Ryan and Chao Xu
2:40pm   Blocks in containers: Lessons learned from containerizing Minecraft by Julia Ferraioli
5:10pm   PANOPTES: Open source planet discovery by Jennifer Tong and Wilfred Gee
5:10pm   Stop writing JavaScript frameworks by Joseph Gregorio

Haven’t registered for OSCON yet? You can knock 25% off the cost of registration by using discount code Google25, or attend parts of the event including our Birds of a Feather session for free by using discount code OSCON16XPO.

See you at OSCON!
By Josh Simmons, Open Source Programs Office

XRay: a function call tracing system

Tuesday, May 3, 2016

XRay, Bigtable, white paper describing the technical details of XRay, LLVM

By Dean Michael Berris, Google Engineering

Students announced for Google Summer of Code 2016

Friday, April 22, 2016

1,206 students, Google Summer of Code, 178 mentoring organizations, program website
By Josh Simmons, Open Source Programs Office

CCTZ v2.0 — now with more civil time

Tuesday, April 12, 2016

Last September we announced an open source project called CCTZ, a C++ library that enables computing with arbitrary time zones. Today we're announcing CCTZ v2.0, which introduces a new civil time library. Civil time is a legally recognized representation of time used by humans (i.e., year, month, day, hour, minute and second). The most common example of a civil time is a time-zone-independent date. In version 2.0, CCTZ's time zone and new civil time libraries cooperate with the standard C++ <chrono> library to give programmers a complete (and simple!) framework in which to reason about and solve even the most complicated time programming problems.
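The distinction between civil time and absolute time can be sketched in a few lines. The example below uses Python's standard datetime module, not the CCTZ API, and the two fixed-offset zones are hypothetical stand-ins chosen for illustration; real time zones have rules such as daylight saving that a fixed offset does not capture.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones, used only to keep the example
# self-contained. They are not real time zone definitions.
ZONE_WEST = timezone(timedelta(hours=-5), "UTC-5")
ZONE_EAST = timezone(timedelta(hours=11), "UTC+11")

# A civil time: year, month, day, hour, minute, second, with no zone attached.
civil = datetime(2016, 1, 1, 9, 0, 0)

# The same civil time names a different absolute instant in each zone.
instant_west = civil.replace(tzinfo=ZONE_WEST)
instant_east = civil.replace(tzinfo=ZONE_EAST)
gap = instant_west - instant_east

# Going the other way: one absolute instant reads as a different civil time
# depending on the zone it is viewed from.
west_instant_as_east_civil = instant_west.astimezone(ZONE_EAST).replace(tzinfo=None)

print("Gap between the two instants:", gap)
print("West 09:00 seen from the east zone:", west_instant_as_east_civil)
```

The point of the sketch is the mental model: a time zone is what maps between the two kinds of time, so a civil time alone does not identify a moment, and a moment alone does not identify a wall-clock reading.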

To learn more, please check out the project page on GitHub. Pay particular attention to the fundamental concepts section, which establishes a simple, cross-platform and language-agnostic mental model that will help you reason about time programming challenges with ease and confidence. And don't forget to subscribe to the new CCTZ mailing list to ask questions and learn about future announcements.

By Greg Miller and Bradley White, Google Engineering

Google Summer of Code marches on!

Friday, April 1, 2016

Google Summer of Code 2016 (GSoC) is well underway and we’ve already seen some impressive numbers — all record highs!

18,981 total registered students (up 36% from 2015)

17.34% female registrants

142 countries

5,107 students submitting 7,543 project proposals

Student proposals are currently being reviewed by over 2,300 mentors and organization administrators from the 180 participating mentor organizations. We will announce accepted students on April 22, 2016 on the Open Source blog and on the program site.
Last week, members of the Google Open Source Programs team attended FOSSASIA in Singapore, Asia’s premier open technology event, to talk about GSoC and Google Code-in. There, we met dozens of former GSoC and GCI students and mentors who were excited to embark on another great year. To learn more about Google Summer of Code, please visit our program site.
By Stephanie Taylor, Open Source Programs
