diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
deleted file mode 100644
index 85567507..00000000
--- a/.github/ISSUE_TEMPLATE.md
+++ /dev/null
@@ -1,39 +0,0 @@
-Please make sure these boxes are checked before submitting your issue – thank you!
-
-- [ ] You can actually watch the video in your browser or mobile application, but not download them with `you-get`.
-- [ ] Your `you-get` is up-to-date.
-- [ ] I have read and tried to do so.
-- [ ] The issue is not yet reported on or . If so, please add your comments under the existing issue.
-- [ ] The issue (or question) is really about `you-get`, not about some other code or project.
-
-Run the command with the `--debug` option, and paste the full output inside the fences:
-
-```
-[PASTE IN ME]
-```
-
-If there's anything else you would like to say (e.g. in case your issue is not about downloading a specific video; it might as well be a general discussion or proposal for a new feature), fill in the box below; otherwise, you may want to post an emoji or meme instead:
-
-> [WRITE SOMETHING]
-> [OR HAVE SOME :icecream:!]
-
-汉语翻译最终日期:2016年02月26日
-
-在提交前,请确保您已经检查了以下内容!
-
-- [ ] 你可以在浏览器或移动端中观看视频,但不能使用`you-get`下载.
-- [ ] 您的`you-get`为最新版.
-- [ ] 我已经阅读并按 中的指引进行了操作.
-- [ ] 您的问题没有在 , 报告,否则请在原有issue下报告.
-- [ ] 本问题确实关于`you-get`, 而不是其他项目.
-
-请使用`--debug`运行,并将输出粘贴在下面:
-
-```
-[在这里粘贴完整日志]
-```
-
-如果您有其他附言,例如问题只在某个视频发生,或者是一般性讨论或者提出新功能,请在下面添加;或者您可以卖个萌:
-
-> [您的内容]
-> [舔 :icecream:!]
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
deleted file mode 100644
index 79a43f6b..00000000
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ /dev/null
@@ -1,48 +0,0 @@
-**(PLEASE DELETE ALL THESE AFTER READING)**
-
-Thank you for the pull request! `you-get` is a growing open source project, which would not have been possible without contributors like you.
-
-Here are some simple rules to follow, please recheck them before sending the pull request:
-
-- [ ] If you want to propose two or more unrelated patches, please open separate pull requests for them, instead of one;
-- [ ] All pull requests should be based upon the latest `develop` branch;
-- [ ] Name your branch (from which you will send the pull request) properly; use a meaningful name like `add-this-shining-feature` rather than just `develop`;
-- [ ] All commit messages, as well as comments in code, should be written in understandable English.
-
-As a contributor, you must be aware that
-
-- [ ] You agree to contribute your code to this project, under the terms of the MIT license, so that any person may freely use or redistribute them; of course, you will still reserve the copyright for your own authorship.
-- [ ] You may not contribute any code not authored by yourself, unless they are licensed under either public domain or the MIT license, literally.
-
-Not all pull requests can eventually be merged. I consider merged / unmerged patches as equally important for the community: as long as you think a patch would be helpful, someone else might find it helpful, too, therefore they could take your fork and benefit in some way. In any case, I would like to thank you in advance for taking your time to contribute to this project.
-
-Cheers,
-Mort
-
-**(PLEASE REPLACE ALL ABOVE WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**
-
-
-汉语翻译最后日期:2016年02月26日
-
-**(阅读后请删除所有内容)**
-
-感谢您的pull request! `you-get`是稳健成长的开源项目,感谢您的贡献.
-
-以下简单检查项目望您复查:
-
-- [ ] 如果您预计提出两个或更多不相关补丁,请为每个使用不同的pull requests,而不是单一;
-- [ ] 所有的pull requests应基于最新的`develop`分支;
-- [ ] 您预计提出pull requests的分支应有有意义名称,例如`add-this-shining-feature`而不是`develop`;
-- [ ] 所有的提交信息与代码中注释应使用可理解的英语.
-
-作为贡献者,您需要知悉
-
-- [ ] 您同意在MIT协议下贡献代码,以便任何人自由使用或分发;当然,你仍旧保留代码的著作权
-- [ ] 你不得贡献非自己编写的代码,除非其属于公有领域或使用MIT协议.
-
-不是所有的pull requests都会被合并,然而我认为合并/不合并的补丁一样重要:如果您认为补丁重要,其他人也有可能这么认为,那么他们可以从你的fork中提取工作并获益。无论如何,感谢您费心对本项目贡献.
-
-祝好,
-Mort
-
-**(请将本内容完整替换为PULL REQUEST的详细内容)**
diff --git a/.travis.yml b/.travis.yml
index 2d780e81..8433fe75 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,15 +1,23 @@
 # https://travis-ci.org/soimort/you-get
 language: python
 python:
-  - "3.2"
-  - "3.3"
   - "3.4"
   - "3.5"
   - "3.6"
-  - "nightly"
   - "pypy3"
+matrix:
+  include:
+    - python: "3.7"
+      dist: xenial
+    - python: "3.8-dev"
+      dist: xenial
+    - python: "nightly"
+      dist: xenial
+before_install:
+  - pip install flake8
+before_script:
+  - flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
 script: make test
-sudo: false
 notifications:
   webhooks:
     urls:
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b7b6ba42..36816948 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,27 +1,27 @@
-# How to Contribute
+# How to Report an Issue
-`you-get` is currently experimenting with an aggressive approach to handling issues. Namely, a bug report must be addressed with some code via a pull request.
+If you would like to report a problem you find when using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls), which should include:
-## Report a broken extractor
+1. A detailed description of the encountered problem;
+2. At least one commit, addressing the problem through some unit test(s).
+   * Examples of good commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)
-**How-To:** Please open a new pull request with the following changes:
+PRs that fail to meet the above criteria may be closed summarily with no further action.
-* Add a new test case in [tests/test.py](https://github.com/soimort/you-get/blob/develop/tests/test.py), with the failing URL(s).
+A valid PR will remain open until its addressed problem is fixed.
-The Travis CI build will (ideally) fail showing a :x:, which means you have successfully reported a broken extractor.
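In this scheme, a bug report is literally a unit test that fails in CI. A self-contained sketch of the pattern follows; the `download` stub and its failure condition are hypothetical stand-ins for a real `you_get.extractors` entry point, which is not importable here:

```python
import unittest


def download(url, info_only=False):
    # Hypothetical stand-in for a you-get extractor entry point; a real
    # report would call e.g. you_get.extractors.<site>.download instead.
    if 'jNQXAC9IVRw' not in url:
        raise RuntimeError('extractor failed to parse the page')


class YouGetTests(unittest.TestCase):
    def test_youtube(self):
        # A reporter adds a case like this with the URL that fails for
        # them; the red Travis build then serves as the bug report.
        download('https://www.youtube.com/watch?v=jNQXAC9IVRw',
                 info_only=True)
```

When the extractor is later fixed, the same test goes green and documents the regression.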
-Such a valid PR will be either *closed* if it's fixed by another PR, or *merged* if it's fixed by follow-up commits from the reporter himself/herself.
-## Report other issues / Suggest a new feature
+# 如何汇报问题
-**How-To:** Please open a pull request with the proposed changes directly.
+为了防止对 GitHub Issues 的滥用,本项目不接受一般的 Issue。
-A valid PR need not be complete (i.e., can be WIP), but it should contain at least one sensible, nontrivial commit.
+如您在使用 `you-get` 的过程中发现任何问题,请开启一个 [Pull Request](https://github.com/soimort/you-get/pulls)。该 PR 应当包含:
-## Hints
+1. 详细的问题描述;
+2. 至少一个 commit,其内容是**与问题相关的**单元测试。**不要通过随意修改无关文件的方式来提交 PR!**
+   * 有效的 commit 示例:[#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)
-* The [`develop`](https://github.com/soimort/you-get/tree/develop) branch is where your pull request goes.
-* Remember to rebase.
-* Document your PR clearly, and if applicable, provide some sample links for reviewers to test with.
-* Write well-formatted, easy-to-understand commit messages. If you don't know how, look at existing ones.
-* We will not ask you to sign a CLA, but you must assure that your code can be legally redistributed (under the terms of the MIT license).
+不符合以上条件的 PR 可能被直接关闭。
+
+有效的 PR 将会被一直保留,直至相应的问题得以修复。
diff --git a/LICENSE.txt b/LICENSE.txt
index 7b25d906..5964bf20 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,15 +1,14 @@
-==============================================
-This is a copy of the MIT license.
-==============================================
-Copyright (C) 2012-2017 Mort Yao
-Copyright (C) 2012 Boyu Guo
+MIT License
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
-of the Software, and to permit persons to whom the Software is furnished to do
-so, subject to the following conditions:
+Copyright (c) 2012-2019 Mort Yao
+Copyright (c) 2012 Boyu Guo
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
diff --git a/README.md b/README.md
index 86c5e4e9..360b5d0b 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,10 @@
 [![Build Status](https://travis-ci.org/soimort/you-get.svg)](https://travis-ci.org/soimort/you-get)
 [![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
+**NOTICE: Read [this](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) if you are looking for the conventional "Issues" tab.**
+
+---
+
 [You-Get](https://you-get.org/) is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.
Here's how you use `you-get` to download a video from [YouTube](https://www.youtube.com/watch?v=jNQXAC9IVRw): @@ -49,10 +53,10 @@ Are you a Python programmer? Then check out [the source](https://github.com/soim ### Prerequisites -The following dependencies are required and must be installed separately, unless you are using a pre-built package or chocolatey on Windows: +The following dependencies are necessary: -* **[Python 3](https://www.python.org/downloads/)** -* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/) +* **[Python](https://www.python.org/downloads/)** 3.2 or above +* **[FFmpeg](https://www.ffmpeg.org/)** 1.0 or above * (Optional) [RTMPDump](https://rtmpdump.mplayerhq.hu/) ### Option 1: Install via pip @@ -61,17 +65,13 @@ The official release of `you-get` is distributed on [PyPI](https://pypi.python.o $ pip3 install you-get -### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen) +### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen) (for Zsh users) Add the following line to your `.zshrc`: antigen bundle soimort/you-get -### Option 3: Use a pre-built package (Windows only) - -Download the `exe` (standalone) or `7z` (all dependencies included) from: . - -### Option 4: Download from GitHub +### Option 3: Download from GitHub You may either download the [stable](https://github.com/soimort/you-get/archive/master.zip) (identical with the latest release on PyPI) or the [develop](https://github.com/soimort/you-get/archive/develop.zip) (more hotfixes, unstable features) branch of `you-get`. Unzip it, and put the directory containing the `you-get` script into your `PATH`. @@ -89,7 +89,7 @@ $ python3 setup.py install --user to install `you-get` to a permanent path. -### Option 5: Git clone +### Option 4: Git clone This is the recommended way for all developers, even if you don't often code in Python. 
@@ -99,13 +99,7 @@ $ git clone git://github.com/soimort/you-get.git Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path. -### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only) - -``` -> choco install you-get -``` - -### Option 7: Homebrew (Mac only) +### Option 5: Homebrew (Mac only) You can install `you-get` easily via: @@ -113,6 +107,14 @@ You can install `you-get` easily via: $ brew install you-get ``` +### Option 6: pkg (FreeBSD only) + +You can install `you-get` easily via: + +``` +# pkg install you-get +``` + ### Shell completion Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](https://github.com/soimort/you-get/tree/develop/contrib/completion). Please consult your shell's manual for how to take advantage of them. @@ -131,12 +133,6 @@ or download the latest release via: $ you-get https://github.com/soimort/you-get/archive/master.zip ``` -or use [chocolatey package manager](https://chocolatey.org): - -``` -> choco upgrade you-get -``` - In order to get the latest ```develop``` branch without messing up the PIP, you can try: ``` @@ -154,22 +150,54 @@ $ you-get -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw' site: YouTube title: Me at the zoo streams: # Available quality and codecs + [ DASH ] ____________________________________ + - itag: 242 + container: webm + quality: 320x240 + size: 0.6 MiB (618358 bytes) + # download-with: you-get --itag=242 [URL] + + - itag: 395 + container: mp4 + quality: 320x240 + size: 0.5 MiB (550743 bytes) + # download-with: you-get --itag=395 [URL] + + - itag: 133 + container: mp4 + quality: 320x240 + size: 0.5 MiB (498558 bytes) + # download-with: you-get --itag=133 [URL] + + - itag: 278 + container: webm + quality: 192x144 + size: 0.4 MiB (392857 bytes) + # download-with: you-get --itag=278 [URL] + + - itag: 160 + container: mp4 + quality: 192x144 + size: 0.4 MiB (370882 bytes) + # download-with: you-get 
--itag=160 [URL] + + - itag: 394 + container: mp4 + quality: 192x144 + size: 0.4 MiB (367261 bytes) + # download-with: you-get --itag=394 [URL] + [ DEFAULT ] _________________________________ - itag: 43 container: webm quality: medium - size: 0.5 MiB (564215 bytes) + size: 0.5 MiB (568748 bytes) # download-with: you-get --itag=43 [URL] - itag: 18 container: mp4 - quality: medium - # download-with: you-get --itag=18 [URL] - - - itag: 5 - container: flv quality: small - # download-with: you-get --itag=5 [URL] + # download-with: you-get --itag=18 [URL] - itag: 36 container: 3gp @@ -182,23 +210,24 @@ streams: # Available quality and codecs # download-with: you-get --itag=17 [URL] ``` -The format marked with `DEFAULT` is the one you will get by default. If that looks cool to you, download it: +By default, the one on the top is the one you will get. If that looks cool to you, download it: ``` $ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw' site: YouTube title: Me at the zoo stream: - - itag: 43 + - itag: 242 container: webm - quality: medium - size: 0.5 MiB (564215 bytes) - # download-with: you-get --itag=43 [URL] + quality: 320x240 + size: 0.6 MiB (618358 bytes) + # download-with: you-get --itag=242 [URL] -Downloading zoo.webm ... -100.0% ( 0.5/0.5 MB) ├████████████████████████████████████████┤[1/1] 7 MB/s +Downloading Me at the zoo.webm ... + 100% ( 0.6/ 0.6MB) ├██████████████████████████████████████████████████████████████████████████████┤[2/2] 2 MB/s +Merging video parts... Merged into Me at the zoo.webm -Saving Me at the zoo.en.srt ...Done. +Saving Me at the zoo.en.srt ... Done. ``` (If a YouTube video has any closed captions, they will be downloaded together with the video file, in SubRip subtitle format.) @@ -298,7 +327,7 @@ However, the system proxy setting (i.e. the environment variable `http_proxy`) i ### Watch a video -Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. 
`mplayer` or `vlc`, instead of downloading it: +Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mpv` or `vlc`, instead of downloading it: ``` $ you-get -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw' @@ -374,11 +403,10 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the | **niconico
ニコニコ動画** | |✓| | | | **163
网易视频
网易云音乐** |
|✓| |✓| | 56网 | |✓| | | -| **AcFun** | |✓| | | +| **AcFun** | |✓| | | | **Baidu
百度贴吧** | |✓|✓| | | 爆米花网 | |✓| | | | **bilibili
哔哩哔哩** | |✓| | | -| Dilidili | |✓| | | | 豆瓣 | |✓| |✓| | 斗鱼 | |✓| | | | Panda
熊猫 | |✓| | | @@ -407,15 +435,16 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the | **Youku
优酷** | |✓| | | | 战旗TV | |✓| | | | 央视网 | |✓| | | -| 花瓣 | | |✓| | | Naver
네이버 | |✓| | | | 芒果TV | |✓| | | | 火猫TV | |✓| | | -| 全民直播 | |✓| | | | 阳光宽频网 | |✓| | | | 西瓜视频 | |✓| | | | 快手 | |✓|✓| | | 抖音 | |✓| | | +| TikTok | |✓| | | +| 中国体育(TV) |
|✓| | | +| 知乎 | |✓| | | For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page. @@ -423,7 +452,7 @@ For all other sites not on the list, the universal extractor will take care of f If something is broken and `you-get` can't get you things you want, don't panic. (Yes, this happens all the time!) -Check if it's already a known problem on . If not, follow the guidelines on [how to report a broken extractor](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md#report-a-broken-extractor). +Check if it's already a known problem on . If not, follow the guidelines on [how to report an issue](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md). ## Getting Involved diff --git a/src/you_get/common.py b/src/you_get/common.py index a4a036a4..b2bca0a5 100755 --- a/src/you_get/common.py +++ b/src/you_get/common.py @@ -10,6 +10,7 @@ import socket import locale import logging import argparse +import ssl from http import cookiejar from importlib import import_module from urllib import request, parse, error @@ -24,6 +25,7 @@ sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8') SITES = { '163' : 'netease', '56' : 'w56', + '365yg' : 'toutiao', 'acfun' : 'acfun', 'archive' : 'archive', 'baidu' : 'baidu', @@ -36,13 +38,11 @@ SITES = { 'cbs' : 'cbs', 'coub' : 'coub', 'dailymotion' : 'dailymotion', - 'dilidili' : 'dilidili', 'douban' : 'douban', 'douyin' : 'douyin', 'douyu' : 'douyutv', 'ehow' : 'ehow', 'facebook' : 'facebook', - 'fantasy' : 'fantasy', 'fc2' : 'fc2video', 'flickr' : 'flickr', 'freesound' : 'freesound', @@ -50,7 +50,6 @@ SITES = { 'google' : 'google', 'giphy' : 'giphy', 'heavy-music' : 'heavymusic', - 'huaban' : 'huaban', 'huomao' : 'huomaotv', 'iask' : 'sina', 'icourses' : 'icourses', @@ -64,6 +63,7 @@ SITES = { 'iqiyi' : 'iqiyi', 'ixigua' : 'ixigua', 'isuntv' : 'suntv', + 'iwara' : 'iwara', 'joy' : 'joy', 'kankanews' : 'bilibili', 'khanacademy' : 'khan', 
@@ -74,6 +74,7 @@ SITES = { 'le' : 'le', 'letv' : 'le', 'lizhi' : 'lizhi', + 'longzhu' : 'longzhu', 'magisto' : 'magisto', 'metacafe' : 'metacafe', 'mgtv' : 'mgtv', @@ -81,16 +82,15 @@ SITES = { 'mixcloud' : 'mixcloud', 'mtv81' : 'mtv81', 'musicplayon' : 'musicplayon', + 'miaopai' : 'yixia', 'naver' : 'naver', '7gogo' : 'nanagogo', 'nicovideo' : 'nicovideo', - 'panda' : 'panda', 'pinterest' : 'pinterest', 'pixnet' : 'pixnet', 'pptv' : 'pptv', 'qingting' : 'qingting', 'qq' : 'qq', - 'quanmin' : 'quanmin', 'showroom-live' : 'showroom', 'sina' : 'sina', 'smgbb' : 'bilibili', @@ -98,6 +98,7 @@ SITES = { 'soundcloud' : 'soundcloud', 'ted' : 'ted', 'theplatform' : 'theplatform', + 'tiktok' : 'tiktok', 'tucao' : 'tucao', 'tudou' : 'tudou', 'tumblr' : 'tumblr', @@ -117,30 +118,32 @@ SITES = { 'xiaojiadianvideo' : 'fc2video', 'ximalaya' : 'ximalaya', 'yinyuetai' : 'yinyuetai', - 'miaopai' : 'yixia', 'yizhibo' : 'yizhibo', 'youku' : 'youku', - 'iwara' : 'iwara', 'youtu' : 'youtube', 'youtube' : 'youtube', 'zhanqi' : 'zhanqi', - '365yg' : 'toutiao', + 'zhibo' : 'zhibo', + 'zhihu' : 'zhihu', } dry_run = False json_output = False force = False +skip_existing_file_size_check = False player = None extractor_proxy = None cookies = None output_filename = None +auto_rename = False +insecure = False fake_headers = { 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', # noqa 'Accept-Charset': 'UTF-8,*;q=0.5', 'Accept-Encoding': 'gzip,deflate,sdch', 'Accept-Language': 'en-US,en;q=0.8', - 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0', # noqa + 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0', # noqa } if sys.stdout.isatty(): @@ -268,7 +271,15 @@ def matchall(text, patterns): def launch_player(player, urls): import subprocess import shlex - subprocess.call(shlex.split(player) + list(urls)) + if (sys.version_info >= (3, 3)): + import shutil + exefile=shlex.split(player)[0] + if 
shutil.which(exefile) is not None: + subprocess.call(shlex.split(player) + list(urls)) + else: + log.wtf('[Failed] Cannot find player "%s"' % exefile) + else: + subprocess.call(shlex.split(player) + list(urls)) def parse_query_param(url, param): @@ -366,20 +377,30 @@ def get_decoded_html(url, faker=False): return data -def get_location(url): +def get_location(url, headers=None, get_method='HEAD'): logging.debug('get_location: %s' % url) - response = request.urlopen(url) - # urllib will follow redirections and it's too much code to tell urllib - # not to do that - return response.geturl() + if headers: + req = request.Request(url, headers=headers) + else: + req = request.Request(url) + req.get_method = lambda: get_method + res = urlopen_with_retry(req) + return res.geturl() def urlopen_with_retry(*args, **kwargs): retry_time = 3 for i in range(retry_time): try: - return request.urlopen(*args, **kwargs) + if insecure: + # ignore ssl errors + ctx = ssl.create_default_context() + ctx.check_hostname = False + ctx.verify_mode = ssl.CERT_NONE + return request.urlopen(*args, context=ctx, **kwargs) + else: + return request.urlopen(*args, **kwargs) except socket.timeout as e: logging.debug('request attempt %s timeout' % str(i + 1)) if i + 1 == retry_time: @@ -423,17 +444,17 @@ def get_content(url, headers={}, decoded=True): # Decode the response body if decoded: charset = match1( - response.getheader('Content-Type'), r'charset=([\w-]+)' + response.getheader('Content-Type', ''), r'charset=([\w-]+)' ) if charset is not None: - data = data.decode(charset) + data = data.decode(charset, 'ignore') else: data = data.decode('utf-8', 'ignore') return data -def post_content(url, headers={}, post_data={}, decoded=True): +def post_content(url, headers={}, post_data={}, decoded=True, **kwargs): """Post the content of a URL via sending a HTTP POST request. Args: @@ -444,14 +465,19 @@ def post_content(url, headers={}, post_data={}, decoded=True): Returns: The content as a string. 
""" - - logging.debug('post_content: %s \n post_data: %s' % (url, post_data)) + if kwargs.get('post_data_raw'): + logging.debug('post_content: %s\npost_data_raw: %s' % (url, kwargs['post_data_raw'])) + else: + logging.debug('post_content: %s\npost_data: %s' % (url, post_data)) req = request.Request(url, headers=headers) if cookies: cookies.add_cookie_header(req) req.headers.update(req.unredirected_hdrs) - post_data_enc = bytes(parse.urlencode(post_data), 'utf-8') + if kwargs.get('post_data_raw'): + post_data_enc = bytes(kwargs['post_data_raw'], 'utf-8') + else: + post_data_enc = bytes(parse.urlencode(post_data), 'utf-8') response = urlopen_with_retry(req, data=post_data_enc) data = response.read() @@ -493,7 +519,7 @@ def urls_size(urls, faker=False, headers={}): return sum([url_size(url, faker=faker, headers=headers) for url in urls]) -def get_head(url, headers={}, get_method='HEAD'): +def get_head(url, headers=None, get_method='HEAD'): logging.debug('get_head: %s' % url) if headers: @@ -502,7 +528,7 @@ def get_head(url, headers={}, get_method='HEAD'): req = request.Request(url) req.get_method = lambda: get_method res = urlopen_with_retry(req) - return dict(res.headers) + return res.headers def url_info(url, faker=False, headers={}): @@ -596,29 +622,60 @@ def url_save( # the key must be 'Referer' for the hack here if refer is not None: tmp_headers['Referer'] = refer - file_size = url_size(url, faker=faker, headers=tmp_headers) + if type(url) is list: + file_size = urls_size(url, faker=faker, headers=tmp_headers) + is_chunked, urls = True, url + else: + file_size = url_size(url, faker=faker, headers=tmp_headers) + is_chunked, urls = False, [url] - if os.path.exists(filepath): - if not force and file_size == os.path.getsize(filepath): - if not is_part: - if bar: - bar.done() - print( - 'Skipping {}: file already exists'.format( - tr(os.path.basename(filepath)) - ) - ) + continue_renameing = True + while continue_renameing: + continue_renameing = False + if 
os.path.exists(filepath): + if not force and (file_size == os.path.getsize(filepath) or skip_existing_file_size_check): + if not is_part: + if bar: + bar.done() + if skip_existing_file_size_check: + log.w( + 'Skipping {} without checking size: file already exists'.format( + tr(os.path.basename(filepath)) + ) + ) + else: + log.w( + 'Skipping {}: file already exists'.format( + tr(os.path.basename(filepath)) + ) + ) + else: + if bar: + bar.update_received(file_size) + return else: - if bar: - bar.update_received(file_size) - return - else: - if not is_part: - if bar: - bar.done() - print('Overwriting %s' % tr(os.path.basename(filepath)), '...') - elif not os.path.exists(os.path.dirname(filepath)): - os.mkdir(os.path.dirname(filepath)) + if not is_part: + if bar: + bar.done() + if not force and auto_rename: + path, ext = os.path.basename(filepath).rsplit('.', 1) + finder = re.compile(' \([1-9]\d*?\)$') + if (finder.search(path) is None): + thisfile = path + ' (1).' + ext + else: + def numreturn(a): + return ' (' + str(int(a.group()[2:-1]) + 1) + ').' + thisfile = finder.sub(numreturn, path) + ext + filepath = os.path.join(os.path.dirname(filepath), thisfile) + print('Changing name to %s' % tr(os.path.basename(filepath)), '...') + continue_renameing = True + continue + if log.yes_or_no('File with this name already exists. Overwrite?'): + log.w('Overwriting %s ...' 
% tr(os.path.basename(filepath))) + else: + return + elif not os.path.exists(os.path.dirname(filepath)): + os.mkdir(os.path.dirname(filepath)) temp_filepath = filepath + '.download' if file_size != float('inf') \ else filepath @@ -633,70 +690,78 @@ def url_save( else: open_mode = 'wb' - if received < file_size: - if faker: - tmp_headers = fake_headers - ''' - if parameter headers passed in, we have it copied as tmp_header - elif headers: - headers = headers - else: - headers = {} - ''' - if received: - tmp_headers['Range'] = 'bytes=' + str(received) + '-' - if refer: - tmp_headers['Referer'] = refer + for url in urls: + received_chunk = 0 + if received < file_size: + if faker: + tmp_headers = fake_headers + ''' + if parameter headers passed in, we have it copied as tmp_header + elif headers: + headers = headers + else: + headers = {} + ''' + if received and not is_chunked: # only request a range when not chunked + tmp_headers['Range'] = 'bytes=' + str(received) + '-' + if refer: + tmp_headers['Referer'] = refer - if timeout: - response = urlopen_with_retry( - request.Request(url, headers=tmp_headers), timeout=timeout - ) - else: - response = urlopen_with_retry( - request.Request(url, headers=tmp_headers) - ) - try: - range_start = int( - response.headers[ - 'content-range' - ][6:].split('/')[0].split('-')[0] - ) - end_length = int( - response.headers['content-range'][6:].split('/')[1] - ) - range_length = end_length - range_start - except: - content_length = response.headers['content-length'] - range_length = int(content_length) if content_length is not None \ - else float('inf') + if timeout: + response = urlopen_with_retry( + request.Request(url, headers=tmp_headers), timeout=timeout + ) + else: + response = urlopen_with_retry( + request.Request(url, headers=tmp_headers) + ) + try: + range_start = int( + response.headers[ + 'content-range' + ][6:].split('/')[0].split('-')[0] + ) + end_length = int( + response.headers['content-range'][6:].split('/')[1] + ) + 
range_length = end_length - range_start + except: + content_length = response.headers['content-length'] + range_length = int(content_length) if content_length is not None \ + else float('inf') - if file_size != received + range_length: - received = 0 - if bar: - bar.received = 0 - open_mode = 'wb' - - with open(temp_filepath, open_mode) as output: - while True: - buffer = None - try: - buffer = response.read(1024 * 256) - except socket.timeout: - pass - if not buffer: - if received == file_size: # Download finished - break - # Unexpected termination. Retry request - tmp_headers['Range'] = 'bytes=' + str(received) + '-' - response = urlopen_with_retry( - request.Request(url, headers=tmp_headers) - ) - continue - output.write(buffer) - received += len(buffer) + if is_chunked: # always append if chunked + open_mode = 'ab' + elif file_size != received + range_length: # is it ever necessary? + received = 0 if bar: - bar.update_received(len(buffer)) + bar.received = 0 + open_mode = 'wb' + + with open(temp_filepath, open_mode) as output: + while True: + buffer = None + try: + buffer = response.read(1024 * 256) + except socket.timeout: + pass + if not buffer: + if is_chunked and received_chunk == range_length: + break + elif not is_chunked and received == file_size: # Download finished + break + # Unexpected termination. 
Retry request + if not is_chunked: # when + tmp_headers['Range'] = 'bytes=' + str(received) + '-' + response = urlopen_with_retry( + request.Request(url, headers=tmp_headers) + ) + continue + output.write(buffer) + received += len(buffer) + received_chunk += len(buffer) + if bar: + bar.update_received(len(buffer)) assert received == os.path.getsize(temp_filepath), '%s == %s == %s' % ( received, os.path.getsize(temp_filepath), temp_filepath @@ -820,13 +885,16 @@ class DummyProgressBar: pass -def get_output_filename(urls, title, ext, output_dir, merge): +def get_output_filename(urls, title, ext, output_dir, merge, **kwargs): # lame hack for the --output-filename option global output_filename if output_filename: + result = output_filename + if kwargs.get('part', -1) >= 0: + result = '%s[%02d]' % (result, kwargs.get('part')) if ext: - return output_filename + '.' + ext - return output_filename + result = '%s.%s' % (result, ext) + return result merged_ext = ext if (len(urls) > 1) and merge: @@ -843,7 +911,11 @@ def get_output_filename(urls, title, ext, output_dir, merge): merged_ext = 'mkv' else: merged_ext = 'ts' - return '%s.%s' % (title, merged_ext) + result = title + if kwargs.get('part', -1) >= 0: + result = '%s[%02d]' % (result, kwargs.get('part')) + result = '%s.%s' % (result, merged_ext) + return result def print_user_agent(faker=False): urllib_default_user_agent = 'Python-urllib/%d.%d' % sys.version_info[:2] @@ -863,7 +935,10 @@ def download_urls( return if dry_run: print_user_agent(faker=faker) - print('Real URLs:\n%s' % '\n'.join(urls)) + try: + print('Real URLs:\n%s' % '\n'.join(urls)) + except: + print('Real URLs:\n%s' % '\n'.join([j for i in urls for j in i])) return if player: @@ -883,9 +958,13 @@ def download_urls( output_filepath = os.path.join(output_dir, output_filename) if total_size: - if not force and os.path.exists(output_filepath) \ - and os.path.getsize(output_filepath) >= total_size * 0.9: - print('Skipping %s: file already exists' % 
output_filepath) + if not force and os.path.exists(output_filepath) and not auto_rename\ + and (os.path.getsize(output_filepath) >= total_size * 0.9\ + or skip_existing_file_size_check): + if skip_existing_file_size_check: + log.w('Skipping %s without checking size: file already exists' % output_filepath) + else: + log.w('Skipping %s: file already exists' % output_filepath) print() return bar = SimpleProgressBar(total_size, len(urls)) @@ -903,16 +982,16 @@ def download_urls( bar.done() else: parts = [] - print('Downloading %s.%s ...' % (tr(title), ext)) + print('Downloading %s ...' % tr(output_filename)) bar.update() for i, url in enumerate(urls): - filename = '%s[%02d].%s' % (title, i, ext) - filepath = os.path.join(output_dir, filename) - parts.append(filepath) + output_filename_i = get_output_filename(urls, title, ext, output_dir, merge, part=i) + output_filepath_i = os.path.join(output_dir, output_filename_i) + parts.append(output_filepath_i) # print 'Downloading %s [%s/%s]...' % (tr(filename), i + 1, len(urls)) bar.update_piece(i + 1) url_save( - url, filepath, bar, refer=refer, is_part=True, faker=faker, + url, output_filepath_i, bar, refer=refer, is_part=True, faker=faker, headers=headers, **kwargs ) bar.done() @@ -1225,27 +1304,89 @@ def download_main(download, download_playlist, urls, playlist, **kwargs): def load_cookies(cookiefile): global cookies - try: - cookies = cookiejar.MozillaCookieJar(cookiefile) - cookies.load() - except Exception: - import sqlite3 + if cookiefile.endswith('.txt'): + # MozillaCookieJar treats prefix '#HttpOnly_' as comments incorrectly! 
+ # do not use its load() + # see also: + # - https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.MozillaCookieJar + # - https://github.com/python/cpython/blob/4b219ce/Lib/http/cookiejar.py#L2014 + # - https://curl.haxx.se/libcurl/c/CURLOPT_COOKIELIST.html#EXAMPLE + #cookies = cookiejar.MozillaCookieJar(cookiefile) + #cookies.load() + from http.cookiejar import Cookie cookies = cookiejar.MozillaCookieJar() - con = sqlite3.connect(cookiefile) - cur = con.cursor() - try: - cur.execute("""SELECT host, path, isSecure, expiry, name, value - FROM moz_cookies""") - for item in cur.fetchall(): - c = cookiejar.Cookie( - 0, item[4], item[5], None, False, item[0], - item[0].startswith('.'), item[0].startswith('.'), - item[1], False, item[2], item[3], item[3] == '', None, - None, {}, - ) + now = time.time() + ignore_discard, ignore_expires = False, False + with open(cookiefile, 'r') as f: + for line in f: + # last field may be absent, so keep any trailing tab + if line.endswith("\n"): line = line[:-1] + + # skip comments and blank lines XXX what is $ for? + if (line.strip().startswith(("#", "$")) or + line.strip() == ""): + if not line.strip().startswith('#HttpOnly_'): # skip for #HttpOnly_ + continue + + domain, domain_specified, path, secure, expires, name, value = \ + line.split("\t") + secure = (secure == "TRUE") + domain_specified = (domain_specified == "TRUE") + if name == "": + # cookies.txt regards 'Set-Cookie: foo' as a cookie + # with no name, whereas http.cookiejar regards it as a + # cookie with no value. 
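The hand-rolled loop here replaces `MozillaCookieJar.load()` because the stdlib parser discards `#HttpOnly_` lines as comments. A standalone sketch of the same tab-separated field layout (hypothetical sample line; note that unlike the code above, this sketch strips the `#HttpOnly_` marker before splitting, rather than leaving it attached to the domain field):

```python
# Sketch: split one Netscape cookies.txt line, honoring the '#HttpOnly_'
# prefix that http.cookiejar's MozillaCookieJar would discard as a comment.
def parse_cookie_line(line):
    line = line.rstrip('\n')
    if line.startswith('#HttpOnly_'):
        line = line[len('#HttpOnly_'):]   # keep the cookie, drop the marker
    elif line.startswith(('#', '$')) or not line.strip():
        return None                       # real comment or blank line
    domain, domain_specified, path, secure, expires, name, value = line.split('\t')
    return {
        'domain': domain,
        'domain_specified': domain_specified == 'TRUE',
        'path': path,
        'secure': secure == 'TRUE',
        'expires': expires or None,
        'name': name,
        'value': value,
    }

# Fabricated example line in Netscape format, marked HttpOnly.
sample = '#HttpOnly_.example.com\tTRUE\t/\tFALSE\t2145916800\tsid\tabc123\n'
cookie = parse_cookie_line(sample)
```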
+ name = value + value = None + + initial_dot = domain.startswith(".") + if not line.strip().startswith('#HttpOnly_'): # skip for #HttpOnly_ + assert domain_specified == initial_dot + + discard = False + if expires == "": + expires = None + discard = True + + # assume path_specified is false + c = Cookie(0, name, value, + None, False, + domain, domain_specified, initial_dot, + path, False, + secure, + expires, + discard, + None, + None, + {}) + if not ignore_discard and c.discard: + continue + if not ignore_expires and c.is_expired(now): + continue cookies.set_cookie(c) - except Exception: - pass + + elif cookiefile.endswith(('.sqlite', '.sqlite3')): + import sqlite3, shutil, tempfile + temp_dir = tempfile.gettempdir() + temp_cookiefile = os.path.join(temp_dir, 'temp_cookiefile.sqlite') + shutil.copy2(cookiefile, temp_cookiefile) + + cookies = cookiejar.MozillaCookieJar() + con = sqlite3.connect(temp_cookiefile) + cur = con.cursor() + cur.execute("""SELECT host, path, isSecure, expiry, name, value + FROM moz_cookies""") + for item in cur.fetchall(): + c = cookiejar.Cookie( + 0, item[4], item[5], None, False, item[0], + item[0].startswith('.'), item[0].startswith('.'), + item[1], False, item[2], item[3], item[3] == '', None, + None, {}, + ) + cookies.set_cookie(c) + + else: + log.e('[error] unsupported cookies format') # TODO: Chromium Cookies # SELECT host_key, path, secure, expires_utc, name, encrypted_value # FROM cookies @@ -1332,6 +1473,10 @@ def script_main(download, download_playlist, **kwargs): '-f', '--force', action='store_true', default=False, help='Force overwriting existing files' ) + download_grp.add_argument( + '--skip-existing-file-size-check', action='store_true', default=False, + help='Skip existing file without checking file size' + ) download_grp.add_argument( '-F', '--format', metavar='STREAM_ID', help='Set video format to STREAM_ID' @@ -1370,6 +1515,15 @@ def script_main(download, download_playlist, **kwargs): '-l', '--playlist', 
action='store_true', help='Prefer to download a playlist' ) + download_grp.add_argument( + '-a', '--auto-rename', action='store_true', default=False, + help='Automatically rename files with conflicting names' + ) + + download_grp.add_argument( + '-k', '--insecure', action='store_true', default=False, + help='Ignore SSL errors' + ) proxy_grp = parser.add_argument_group('Proxy options') proxy_grp = proxy_grp.add_mutually_exclusive_group() @@ -1409,16 +1563,24 @@ def script_main(download, download_playlist, **kwargs): logging.getLogger().setLevel(logging.DEBUG) global force + global skip_existing_file_size_check global dry_run global json_output global player global extractor_proxy global output_filename - + global auto_rename + global insecure output_filename = args.output_filename extractor_proxy = args.extractor_proxy info_only = args.info + if args.force: + force = True + if args.skip_existing_file_size_check: + skip_existing_file_size_check = True + if args.auto_rename: + auto_rename = True if args.url: dry_run = True if args.json: @@ -1438,6 +1600,11 @@ def script_main(download, download_playlist, **kwargs): player = args.player caption = False + if args.insecure: + # ignore ssl + insecure = True + + if args.no_proxy: set_http_proxy('') else: @@ -1523,9 +1690,9 @@ def google_search(url): url = 'https://www.google.com/search?tbm=vid&q=%s' % parse.quote(keywords) page = get_content(url, headers=fake_headers) videos = re.findall( - r'([^<]+)<', page + r'

([^<]+)<', page ) - vdurs = re.findall(r'([^<]+)<', page) + vdurs = re.findall(r'([^<]+)<', page) durs = [r1(r'(\d+:\d+)', unescape_html(dur)) for dur in vdurs] print('Google Videos search:') for v in zip(videos, durs): @@ -1554,6 +1721,11 @@ def url_to_module(url): domain = r1(r'(\.[^.]+\.[^.]+)$', video_host) or video_host assert domain, 'unsupported url: ' + url + # all non-ASCII code points must be quoted (percent-encoded UTF-8) + url = ''.join([ch if ord(ch) in range(128) else parse.quote(ch) for ch in url]) + video_host = r1(r'https?://([^/]+)/', url) + video_url = r1(r'https?://[^/]+(.*)', url) + k = r1(r'([^.]+)', domain) if k in SITES: return ( @@ -1561,15 +1733,11 @@ def url_to_module(url): url ) else: - import http.client - video_host = r1(r'https?://([^/]+)/', url) # .cn could be removed - if url.startswith('https://'): - conn = http.client.HTTPSConnection(video_host) - else: - conn = http.client.HTTPConnection(video_host) - conn.request('HEAD', video_url, headers=fake_headers) - res = conn.getresponse() - location = res.getheader('location') + try: + location = get_location(url) # t.co isn't happy with fake_headers + except: + location = get_location(url, headers=fake_headers) + if location and location != url and not location.startswith('/'): return url_to_module(location) else: diff --git a/src/you_get/extractor.py b/src/you_get/extractor.py index 4c9ccaa5..c4315935 100644 --- a/src/you_get/extractor.py +++ b/src/you_get/extractor.py @@ -1,10 +1,11 @@ #!/usr/bin/env python -from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run +from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run, player from .common import print_more_compatible as print from .util import log from . 
import json_output import os +import sys class Extractor(): def __init__(self, *args): @@ -32,7 +33,8 @@ class VideoExtractor(): self.out = False self.ua = None self.referer = None - self.danmuku = None + self.danmaku = None + self.lyrics = None if args: self.url = args[0] @@ -105,7 +107,7 @@ class VideoExtractor(): if 'quality' in stream: print(" quality: %s" % stream['quality']) - if 'size' in stream and stream['container'].lower() != 'm3u8': + if 'size' in stream and 'container' in stream and stream['container'].lower() != 'm3u8': if stream['size'] != float('inf') and stream['size'] != 0: print(" size: %s MiB (%s bytes)" % (round(stream['size'] / 1048576, 1), stream['size'])) @@ -130,6 +132,8 @@ class VideoExtractor(): print(" url: %s" % self.url) print() + sys.stdout.flush() + def p(self, stream_id=None): maybe_print("site: %s" % self.__class__.name) maybe_print("title: %s" % self.title) @@ -154,9 +158,10 @@ class VideoExtractor(): for stream in itags: self.p_stream(stream) # Print all other available streams - print(" [ DEFAULT ] %s" % ('_' * 33)) - for stream in self.streams_sorted: - self.p_stream(stream['id'] if 'id' in stream else stream['itag']) + if self.streams_sorted: + print(" [ DEFAULT ] %s" % ('_' * 33)) + for stream in self.streams_sorted: + self.p_stream(stream['id'] if 'id' in stream else stream['itag']) if self.audiolang: print("audio-languages:") @@ -164,6 +169,8 @@ class VideoExtractor(): print(" - lang: {}".format(i['lang'])) print(" download-url: {}\n".format(i['url'])) + sys.stdout.flush() + def p_playlist(self, stream_id=None): maybe_print("site: %s" % self.__class__.name) print("playlist: %s" % self.title) @@ -195,7 +202,13 @@ class VideoExtractor(): else: # Download stream with the best quality from .processor.ffmpeg import has_ffmpeg_installed - stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag'] + if has_ffmpeg_installed() and player is None and self.dash_streams or not 
self.streams_sorted: + #stream_id = list(self.dash_streams)[-1] + itags = sorted(self.dash_streams, + key=lambda i: -self.dash_streams[i]['size']) + stream_id = itags[0] + else: + stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag'] if 'index' not in kwargs: self.p(stream_id) @@ -211,7 +224,7 @@ class VideoExtractor(): ext = self.dash_streams[stream_id]['container'] total_size = self.dash_streams[stream_id]['size'] - if ext == 'm3u8': + if ext == 'm3u8' or ext == 'm4a': ext = 'mp4' if not urls: @@ -226,9 +239,11 @@ class VideoExtractor(): output_dir=kwargs['output_dir'], merge=kwargs['merge'], av=stream_id in self.dash_streams) + if 'caption' not in kwargs or not kwargs['caption']: - print('Skipping captions or danmuku.') + print('Skipping captions or danmaku.') return + for lang in self.caption_tracks: filename = '%s.%s.srt' % (get_filename(self.title), lang) print('Saving %s ... ' % filename, end="", flush=True) @@ -237,11 +252,18 @@ class VideoExtractor(): 'w', encoding='utf-8') as x: x.write(srt) print('Done.') - if self.danmuku is not None and not dry_run: + + if self.danmaku is not None and not dry_run: filename = '{}.cmt.xml'.format(get_filename(self.title)) print('Downloading {} ...\n'.format(filename)) with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp: - fp.write(self.danmuku) + fp.write(self.danmaku) + + if self.lyrics is not None and not dry_run: + filename = '{}.lrc'.format(get_filename(self.title)) + print('Downloading {} ...\n'.format(filename)) + with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp: + fp.write(self.lyrics) # For main_dev() #download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size']) diff --git a/src/you_get/extractors/__init__.py b/src/you_get/extractors/__init__.py index 46e5c89c..2961f015 100755 --- a/src/you_get/extractors/__init__.py +++ 
b/src/you_get/extractors/__init__.py @@ -13,20 +13,17 @@ from .ckplayer import * from .cntv import * from .coub import * from .dailymotion import * -from .dilidili import * from .douban import * from .douyin import * from .douyutv import * from .ehow import * from .facebook import * -from .fantasy import * from .fc2video import * from .flickr import * from .freesound import * from .funshion import * from .google import * from .heavymusic import * -from .huaban import * from .icourses import * from .ifeng import * from .imgur import * @@ -41,6 +38,7 @@ from .kugou import * from .kuwo import * from .le import * from .lizhi import * +from .longzhu import * from .magisto import * from .metacafe import * from .mgtv import * @@ -53,7 +51,6 @@ from .nanagogo import * from .naver import * from .netease import * from .nicovideo import * -from .panda import * from .pinterest import * from .pixnet import * from .pptv import * @@ -66,6 +63,7 @@ from .sohu import * from .soundcloud import * from .suntv import * from .theplatform import * +from .tiktok import * from .tucao import * from .tudou import * from .tumblr import * @@ -87,3 +85,5 @@ from .ted import * from .khan import * from .zhanqi import * from .kuaishou import * +from .zhibo import * +from .zhihu import * diff --git a/src/you_get/extractors/acfun.py b/src/you_get/extractors/acfun.py index c521422f..61f6cae8 100644 --- a/src/you_get/extractors/acfun.py +++ b/src/you_get/extractors/acfun.py @@ -65,7 +65,7 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals elif sourceType == 'tudou': tudou_download_by_iid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only) elif sourceType == 'qq': - qq_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only) + qq_download_by_vid(sourceId, title, True, output_dir=output_dir, merge=merge, info_only=info_only) elif sourceType == 'letv': letvcloud_download_by_vu(sourceId, '2d8c027396', title, 
output_dir=output_dir, merge=merge, info_only=info_only) elif sourceType == 'zhuzhan': @@ -85,9 +85,13 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals _, _, seg_size = url_info(url) size += seg_size #fallback to flvhd is not quite possible - print_info(site_info, title, 'mp4', size) + if re.search(r'fid=[0-9A-Z\-]*.flv', preferred[0][0]): + ext = 'flv' + else: + ext = 'mp4' + print_info(site_info, title, ext, size) if not info_only: - download_urls(preferred[0], title, 'mp4', size, output_dir=output_dir, merge=merge) + download_urls(preferred[0], title, ext, size, output_dir=output_dir, merge=merge) else: raise NotImplementedError(sourceType) @@ -105,27 +109,46 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals pass def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs): - assert re.match(r'http://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url) - html = get_content(url) + assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', url) - title = r1(r'data-title="([^"]+)"', html) + if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url): + html = get_content(url) + json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});") + json_data = json.loads(json_text) + vid = json_data.get('currentVideoInfo').get('id') + up = json_data.get('user').get('name') + title = json_data.get('title') + video_list = json_data.get('videoList') + if len(video_list) > 1: + title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0] + # bangumi + elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/ab(\d+)", url): + html = get_content(url) + tag_script = match1(html, r'') + json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1] + json_data = json.loads(json_text) + title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title'] + vid = str(json_data['videoId']) + up = "acfun" + else: + raise NotImplementedError + +
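The rewritten `acfun_download` above pulls its metadata out of a JSON blob embedded in the page (`videoInfo = {...};`) instead of scraping `data-*` attributes. A toy illustration of that pattern (the HTML fragment and field values are made up; the real page is fetched with `get_content`):

```python
import json
import re

# Hypothetical page fragment mimicking the embedded "videoInfo = {...};" blob.
html = 'var videoInfo = {"title": "demo", "currentVideoInfo": {"id": 42}};'

# Same idea as match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});"):
# lazily capture from the first '{' up to the first '};'.
m = re.search(r'(?s)videoInfo\s*=\s*(\{.*?\});', html)
data = json.loads(m.group(1))
vid = data['currentVideoInfo']['id']
title = data['title']
```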
assert title and vid title = unescape_html(title) title = escape_file_path(title) - assert title - if match1(url, r'_(\d+)$'): # current P - title = title + " " + r1(r'active">([^<]*)', html) - - vid = r1('data-vid="(\d+)"', html) - up = r1('data-name="([^"]+)"', html) p_title = r1('active">([^<]+)', html) title = '%s (%s)' % (title, up) - if p_title: title = '%s - %s' % (title, p_title) + if p_title: + title = '%s - %s' % (title, p_title) + + acfun_download_by_vid(vid, title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs) -site_info = "AcFun.tv" + +site_info = "AcFun.cn" download = acfun_download download_playlist = playlist_not_supported('acfun') diff --git a/src/you_get/extractors/baidu.py b/src/you_get/extractors/baidu.py index 6f558e31..7914667e 100644 --- a/src/you_get/extractors/baidu.py +++ b/src/you_get/extractors/baidu.py @@ -38,7 +38,7 @@ def baidu_get_song_title(data): def baidu_get_song_lyric(data): lrc = data['lrcLink'] - return None if lrc is '' else "http://music.baidu.com%s" % lrc + return "http://music.baidu.com%s" % lrc if lrc else None def baidu_download_song(sid, output_dir='.', merge=True, info_only=False): @@ -123,12 +123,22 @@ def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only= elif re.match('http://tieba.baidu.com/', url): try: # embedded videos - embed_download(url, output_dir, merge=merge, info_only=info_only) + embed_download(url, output_dir, merge=merge, info_only=info_only, **kwargs) except: # images html = get_html(url) title = r1(r'title:"([^"]+)"', html) + vhsrc = re.findall(r'"BDE_Image"[^>]+src="([^"]+\.mp4)"', html) or \ + re.findall(r'vhsrc="([^"]+)"', html) + if len(vhsrc) > 0: + ext = 'mp4' + size = url_size(vhsrc[0]) + print_info(site_info, title, ext, size) + if not info_only: + download_urls(vhsrc, title, ext, size, + output_dir=output_dir, merge=False) + items = re.findall( r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html) urls = ['http://imgsrc.baidu.com/forum/pic/item/' + 
i diff --git a/src/you_get/extractors/bilibili.py b/src/you_get/extractors/bilibili.py index e5abccab..668f40f8 100644 --- a/src/you_get/extractors/bilibili.py +++ b/src/you_get/extractors/bilibili.py @@ -1,362 +1,573 @@ #!/usr/bin/env python -__all__ = ['bilibili_download'] +from ..common import * +from ..extractor import VideoExtractor import hashlib -import re -import time -import json -import http.cookiejar -import urllib.request -import urllib.parse -from xml.dom.minidom import parseString - -from ..common import * -from ..util.log import * -from ..extractor import * - -from .qq import qq_download_by_vid -from .sina import sina_download_by_vid -from .tudou import tudou_download_by_id -from .youku import youku_download_by_vid class Bilibili(VideoExtractor): - name = 'Bilibili' - live_api = 'http://live.bilibili.com/api/playurl?cid={}&otype=json' - api_url = 'http://interface.bilibili.com/playurl?' - bangumi_api_url = 'http://bangumi.bilibili.com/player/web_api/playurl?' - live_room_init_api_url = 'https://api.live.bilibili.com/room/v1/Room/room_init?id={}' - live_room_info_api_url = 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id={}' + name = "Bilibili" - SEC1 = '1c15888dc316e05a15fdd0a02ed6584f' - SEC2 = '9b288147e5474dd2aa67085f716c560d' + # Bilibili media encoding options, in descending quality order. 
stream_types = [ - {'id': 'hdflv'}, - {'id': 'flv720'}, - {'id': 'flv'}, - {'id': 'hdmp4'}, - {'id': 'mp4'}, - {'id': 'live'}, - {'id': 'vc'} + {'id': 'flv_p60', 'quality': 116, 'audio_quality': 30280, + 'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P60'}, + {'id': 'hdflv2', 'quality': 112, 'audio_quality': 30280, + 'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P+'}, + {'id': 'flv', 'quality': 80, 'audio_quality': 30280, + 'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P'}, + {'id': 'flv720_p60', 'quality': 74, 'audio_quality': 30280, + 'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P60'}, + {'id': 'flv720', 'quality': 64, 'audio_quality': 30280, + 'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P'}, + {'id': 'hdmp4', 'quality': 48, 'audio_quality': 30280, + 'container': 'MP4', 'video_resolution': '720p', 'desc': '高清 720P (MP4)'}, + {'id': 'flv480', 'quality': 32, 'audio_quality': 30280, + 'container': 'FLV', 'video_resolution': '480p', 'desc': '清晰 480P'}, + {'id': 'flv360', 'quality': 16, 'audio_quality': 30216, + 'container': 'FLV', 'video_resolution': '360p', 'desc': '流畅 360P'}, + # 'quality': 15? 
+ {'id': 'mp4', 'quality': 0}, ] - fmt2qlt = dict(hdflv=4, flv=3, hdmp4=2, mp4=1) @staticmethod - def bilibili_stream_type(urls): - url = urls[0] - if 'hd.flv' in url or '-112.flv' in url: - return 'hdflv', 'flv' - if '-64.flv' in url: - return 'flv720', 'flv' - if '.flv' in url: - return 'flv', 'flv' - if 'hd.mp4' in url or '-48.mp4' in url: - return 'hdmp4', 'mp4' - if '.mp4' in url: - return 'mp4', 'mp4' - raise Exception('Unknown stream type') - - def api_req(self, cid, quality, bangumi, bangumi_movie=False, **kwargs): - ts = str(int(time.time())) - if not bangumi: - params_str = 'cid={}&player=1&quality={}&ts={}'.format(cid, quality, ts) - chksum = hashlib.md5(bytes(params_str+self.SEC1, 'utf8')).hexdigest() - api_url = self.api_url + params_str + '&sign=' + chksum + def height_to_quality(height): + if height <= 360: + return 16 + elif height <= 480: + return 32 + elif height <= 720: + return 64 else: - mod = 'movie' if bangumi_movie else 'bangumi' - params_str = 'cid={}&module={}&player=1&quality={}&ts={}'.format(cid, mod, quality, ts) - chksum = hashlib.md5(bytes(params_str+self.SEC2, 'utf8')).hexdigest() - api_url = self.bangumi_api_url + params_str + '&sign=' + chksum + return 80 - xml_str = get_content(api_url, headers={'referer': self.url, 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}) - return xml_str + @staticmethod + def bilibili_headers(referer=None, cookie=None): + # a reasonable UA + ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36' + headers = {'User-Agent': ua} + if referer is not None: + headers.update({'Referer': referer}) + if cookie is not None: + headers.update({'Cookie': cookie}) + return headers - def parse_bili_xml(self, xml_str): - urls_list = [] - total_size = 0 - doc = parseString(xml_str.encode('utf8')) - durls = doc.getElementsByTagName('durl') - for durl in durls: - 
size = durl.getElementsByTagName('size')[0] - total_size += int(size.firstChild.nodeValue) - url = durl.getElementsByTagName('url')[0] - urls_list.append(url.firstChild.nodeValue) - stream_type, container = self.bilibili_stream_type(urls_list) - if stream_type not in self.streams: - self.streams[stream_type] = {} - self.streams[stream_type]['src'] = urls_list - self.streams[stream_type]['size'] = total_size - self.streams[stream_type]['container'] = container + @staticmethod + def bilibili_api(avid, cid, qn=0): + return 'https://api.bilibili.com/x/player/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&fnver=0&fnval=16' % (avid, cid, qn) - def download_by_vid(self, cid, bangumi, **kwargs): - stream_id = kwargs.get('stream_id') - # guard here. if stream_id invalid, fallback as not stream_id - if stream_id and stream_id in self.fmt2qlt: - quality = stream_id - else: - quality = 'hdflv' if bangumi else 'flv' + @staticmethod + def bilibili_audio_api(sid): + return 'https://www.bilibili.com/audio/music-service-c/web/url?sid=%s' % sid - info_only = kwargs.get('info_only') - for qlt in range(4, -1, -1): - api_xml = self.api_req(cid, qlt, bangumi, **kwargs) - self.parse_bili_xml(api_xml) - if not info_only or stream_id: - self.danmuku = get_danmuku_xml(cid) + @staticmethod + def bilibili_audio_info_api(sid): + return 'https://www.bilibili.com/audio/music-service-c/web/song/info?sid=%s' % sid + + @staticmethod + def bilibili_audio_menu_info_api(sid): + return 'https://www.bilibili.com/audio/music-service-c/web/menu/info?sid=%s' % sid + + @staticmethod + def bilibili_audio_menu_song_api(sid, ps=100): + return 'https://www.bilibili.com/audio/music-service-c/web/song/of-menu?sid=%s&pn=1&ps=%s' % (sid, ps) + + @staticmethod + def bilibili_bangumi_api(avid, cid, ep_id, qn=0): + return 'https://api.bilibili.com/pgc/player/web/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&ep_id=%s&fnver=0&fnval=16' % (avid, cid, qn, ep_id) + + @staticmethod + def bilibili_interface_api(cid, qn=0): + 
entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg' + appkey, sec = ''.join([chr(ord(i) + 2) for i in entropy[::-1]]).split(':') + params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, cid, qn, qn) + chksum = hashlib.md5(bytes(params + sec, 'utf8')).hexdigest() + return 'https://interface.bilibili.com/v2/playurl?%s&sign=%s' % (params, chksum) + + @staticmethod + def bilibili_live_api(cid): + return 'https://api.live.bilibili.com/room/v1/Room/playUrl?cid=%s&quality=0&platform=web' % cid + + @staticmethod + def bilibili_live_room_info_api(room_id): + return 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id=%s' % room_id + + @staticmethod + def bilibili_live_room_init_api(room_id): + return 'https://api.live.bilibili.com/room/v1/Room/room_init?id=%s' % room_id + + @staticmethod + def bilibili_space_channel_api(mid, cid, pn=1, ps=100): + return 'https://api.bilibili.com/x/space/channel/video?mid=%s&cid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (mid, cid, pn, ps) + + @staticmethod + def bilibili_space_favlist_api(vmid, fid, pn=1, ps=100): + return 'https://api.bilibili.com/x/space/fav/arc?vmid=%s&fid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (vmid, fid, pn, ps) + + @staticmethod + def bilibili_space_video_api(mid, pn=1, ps=100): + return 'https://space.bilibili.com/ajax/member/getSubmitVideos?mid=%s&page=%s&pagesize=%s&order=0&jsonp=jsonp' % (mid, pn, ps) + + @staticmethod + def bilibili_vc_api(video_id): + return 'https://api.vc.bilibili.com/clip/v1/video/detail?video_id=%s' % video_id + + @staticmethod + def url_size(url, faker=False, headers={},err_value=0): + try: + return url_size(url,faker,headers) + except: + return err_value def prepare(self, **kwargs): - if socket.getdefaulttimeout() == 600: # no timeout specified - socket.setdefaulttimeout(2) # fail fast, very speedy! 
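`bilibili_interface_api` above derives its appkey and signing secret by reversing an obfuscated constant and shifting every code point up by two, then signs the query string with MD5. The derivation in isolation (the `entropy` constant is copied from the code above; the cid/qn values below are arbitrary placeholders):

```python
import hashlib

entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg'

# Reverse the string and Caesar-shift each character by +2; the single '8'
# becomes a ':' separator splitting the appkey from the signing secret.
appkey, sec = ''.join([chr(ord(i) + 2) for i in entropy[::-1]]).split(':')

# Sign a query string the same way the API helper does
# (1176840 and 16 are made-up cid/qn values for illustration).
params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, 1176840, 16, 16)
sign = hashlib.md5(bytes(params + sec, 'utf8')).hexdigest()
```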
- - # handle "watchlater" URLs - if '/watchlater/' in self.url: - aid = re.search(r'av(\d+)', self.url).group(1) - self.url = 'http://www.bilibili.com/video/av{}/'.format(aid) - - self.ua = fake_headers['User-Agent'] - self.url = url_locations([self.url])[0] - frag = urllib.parse.urlparse(self.url).fragment - # http://www.bilibili.com/video/av3141144/index_2.html#page=3 - if frag: - hit = re.search(r'page=(\d+)', frag) - if hit is not None: - page = hit.group(1) - aid = re.search(r'av(\d+)', self.url).group(1) - self.url = 'http://www.bilibili.com/video/av{}/index_{}.html'.format(aid, page) - self.referer = self.url - self.page = get_content(self.url) - - m = re.search(r'(.*?)

', self.page) or re.search(r'

', self.page) - if m is not None: - self.title = m.group(1) - if self.title is None: - m = re.search(r'property="og:title" content="([^"]+)"', self.page) - if m is not None: - self.title = m.group(1) - if 'subtitle' in kwargs: - subtitle = kwargs['subtitle'] - self.title = '{} {}'.format(self.title, subtitle) - - if 'bangumi.bilibili.com/movie' in self.url: - self.movie_entry(**kwargs) - elif 'bangumi.bilibili.com' in self.url: - self.bangumi_entry(**kwargs) - elif 'bangumi/' in self.url: - self.bangumi_entry(**kwargs) - elif 'live.bilibili.com' in self.url: - self.live_entry(**kwargs) - elif 'vc.bilibili.com' in self.url: - self.vc_entry(**kwargs) - else: - self.entry(**kwargs) - - def movie_entry(self, **kwargs): - patt = r"var\s*aid\s*=\s*'(\d+)'" - aid = re.search(patt, self.page).group(1) - page_list = json.loads(get_content('http://www.bilibili.com/widget/getPageList?aid={}'.format(aid))) - # better ideas for bangumi_movie titles? - self.title = page_list[0]['pagename'] - self.download_by_vid(page_list[0]['cid'], True, bangumi_movie=True, **kwargs) - - def entry(self, **kwargs): - # tencent player - tc_flashvars = re.search(r'"bili-cid=\d+&bili-aid=\d+&vid=([^"]+)"', self.page) - if tc_flashvars: - tc_flashvars = tc_flashvars.group(1) - if tc_flashvars is not None: - self.out = True - qq_download_by_vid(tc_flashvars, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only']) - return - - has_plist = re.search(r' bangumi/play/ep + # redirect: bangumi.bilibili.com/anime -> bangumi/play/ep + elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ss(\d+)', self.url) or \ + re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)/play', self.url): + initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME + initial_state = json.loads(initial_state_text) + ep_id = initial_state['epList'][0]['id'] + self.url = 'https://www.bilibili.com/bangumi/play/ep%s' % ep_id + html_content = 
get_content(self.url, headers=self.bilibili_headers()) + + # sort it out + if re.match(r'https?://(www\.)?bilibili\.com/audio/au(\d+)', self.url): + sort = 'audio' + elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url): + sort = 'bangumi' + elif match1(html_content, r'', html) + + # video_guessulike = r1(r"window.xgData =([s\S'\s\.]*)\'\;[\s\S]*window.vouchData", video_html) + video_url = r1(r"window.vurl = \'([s\S'\s\.]*)\'\;[\s\S]*window.imgurl", video_html) + part_urls.append(video_url) + ext = video_url.split('.')[-1] + + print_info(site_info, title, ext, total_size) + if not info_only: + download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge) + + +def zhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs): + if 'video.zhibo.tv' in url: + zhibo_vedio_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs) + return + + # if 'v.zhibo.tv' in url: + # http://v.zhibo.tv/31609372 + html = get_html(url) + title = r1(r'([\s\S]*)', html) + is_live = r1(r"window.videoIsLive=\'([s\S'\s\.]*)\'\;[\s\S]*window.resDomain", html) + if is_live != "1": + raise ValueError("The live stream is not online! (Errno:%s)" % is_live) + + match = re.search(r""" + ourStreamName .*? + '(.*?)' .*? + rtmpHighSource .*? + '(.*?)' .*? 
+ '(.*?)' + """, html, re.S | re.X) + real_url = match.group(3) + match.group(1) + match.group(2) + + print_info(site_info, title, 'flv', float('inf')) + if not info_only: + download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge) + +site_info = "zhibo.tv" +download = zhibo_download +download_playlist = playlist_not_supported('zhibo') diff --git a/src/you_get/extractors/zhihu.py b/src/you_get/extractors/zhihu.py new file mode 100644 index 00000000..64f81423 --- /dev/null +++ b/src/you_get/extractors/zhihu.py @@ -0,0 +1,79 @@ +#!/usr/bin/env python + +__all__ = ['zhihu_download', 'zhihu_download_playlist'] + +from ..common import * +import json + + +def zhihu_download(url, output_dir='.', merge=True, info_only=False, **kwargs): + paths = url.split("/") + # question or column + if len(paths) < 3 and len(paths) < 6: + raise TypeError("URL does not conform to specifications, Support column and question only." + "Example URL: https://zhuanlan.zhihu.com/p/51669862 or " + "https://www.zhihu.com/question/267782048/answer/490720324") + + if ("question" not in paths or "answer" not in paths) and "zhuanlan.zhihu.com" not in paths: + raise TypeError("URL does not conform to specifications, Support column and question only." 
+ "Example URL: https://zhuanlan.zhihu.com/p/51669862 or " + "https://www.zhihu.com/question/267782048/answer/490720324") + + html = get_html(url, faker=True) + title = match1(html, r'data-react-helmet="true">(.*?)') + for index, video_id in enumerate(matchall(html, [r' '0') or (vers[0] == 'avconv') - #set version to 1.0 for nightly build and print warning try: - version = [int(i) for i in vers[2].split('.')] + v = vers[2][1:] if vers[2][0] == 'n' else vers[2] + version = [int(i) for i in v.split('.')] except: - print('It seems that your ffmpeg is a nightly build.') - print('Please switch to the latest stable if merging failed.') version = [1, 0] return cmd, 'ffprobe', version except: @@ -60,14 +59,25 @@ def ffmpeg_concat_av(files, output, ext): params = [FFMPEG] + LOGLEVEL for file in files: if os.path.isfile(file): params.extend(['-i', file]) - params.extend(['-c:v', 'copy']) - if ext == 'mp4': - params.extend(['-c:a', 'aac']) - elif ext == 'webm': - params.extend(['-c:a', 'vorbis']) - params.extend(['-strict', 'experimental']) + params.extend(['-c', 'copy']) params.append(output) - return subprocess.call(params, stdin=STDIN) + if subprocess.call(params, stdin=STDIN): + print('Merging without re-encode failed.\nTry again re-encoding audio... 
', end="", flush=True) + try: os.remove(output) + except FileNotFoundError: pass + params = [FFMPEG] + LOGLEVEL + for file in files: + if os.path.isfile(file): params.extend(['-i', file]) + params.extend(['-c:v', 'copy']) + if ext == 'mp4': + params.extend(['-c:a', 'aac']) + params.extend(['-strict', 'experimental']) + elif ext == 'webm': + params.extend(['-c:a', 'opus']) + params.append(output) + return subprocess.call(params, stdin=STDIN) + else: + return 0 def ffmpeg_convert_ts_to_mkv(files, output='output.mkv'): for file in files: @@ -210,7 +220,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'): def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.', stream=True): """str, str->True WARNING: NOT THE SAME PARMS AS OTHER FUNCTIONS!!!!!! - You can basicly download anything with this function + You can basically download anything with this function but better leave it alone with """ output = title + '.' + ext @@ -257,6 +267,7 @@ def ffmpeg_concat_audio_and_video(files, output, ext): if has_ffmpeg_installed: params = [FFMPEG] + LOGLEVEL params.extend(['-f', 'concat']) + params.extend(['-safe', '0']) # https://stackoverflow.com/questions/38996925/ffmpeg-concat-unsafe-file-name for file in files: if os.path.isfile(file): params.extend(['-i', file]) diff --git a/src/you_get/util/fs.py b/src/you_get/util/fs.py index d49a117d..c04a10a7 100644 --- a/src/you_get/util/fs.py +++ b/src/you_get/util/fs.py @@ -1,8 +1,8 @@ #!/usr/bin/env python -import platform +from .os import detect_os -def legitimize(text, os=platform.system()): +def legitimize(text, os=detect_os()): """Converts a string to a valid filename. 
""" @@ -13,7 +13,8 @@ def legitimize(text, os=platform.system()): ord('|'): '-', }) - if os == 'Windows': + # FIXME: do some filesystem detection + if os == 'windows' or os == 'cygwin' or os == 'wsl': # Windows (non-POSIX namespace) text = text.translate({ # Reserved in Windows VFAT and NTFS @@ -28,10 +29,11 @@ def legitimize(text, os=platform.system()): ord('>'): '-', ord('['): '(', ord(']'): ')', + ord('\t'): ' ', }) else: # *nix - if os == 'Darwin': + if os == 'mac': # Mac OS HFS+ text = text.translate({ ord(':'): '-', diff --git a/src/you_get/util/log.py b/src/you_get/util/log.py index a2c77ab5..67b26b78 100644 --- a/src/you_get/util/log.py +++ b/src/you_get/util/log.py @@ -96,3 +96,9 @@ def wtf(message, exit_code=1): print_log(message, RED, BOLD) if exit_code is not None: sys.exit(exit_code) + +def yes_or_no(message): + ans = str(input('%s (y/N) ' % message)).lower().strip() + if ans == 'y': + return True + return False diff --git a/src/you_get/util/os.py b/src/you_get/util/os.py new file mode 100644 index 00000000..1a00d2b5 --- /dev/null +++ b/src/you_get/util/os.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python + +from platform import system + +def detect_os(): + """Detect operating system. 
+ """ + + # Inspired by: + # https://github.com/scivision/pybashutils/blob/78b7f2b339cb03b1c37df94015098bbe462f8526/pybashutils/windows_linux_detect.py + + syst = system().lower() + os = 'unknown' + + if 'cygwin' in syst: + os = 'cygwin' + elif 'darwin' in syst: + os = 'mac' + elif 'linux' in syst: + os = 'linux' + # detect WSL https://github.com/Microsoft/BashOnWindows/issues/423 + try: + with open('/proc/version', 'r') as f: + if 'microsoft' in f.read().lower(): + os = 'wsl' + except: pass + elif 'windows' in syst: + os = 'windows' + elif 'bsd' in syst: + os = 'bsd' + + return os diff --git a/src/you_get/version.py b/src/you_get/version.py index 2d4ff9d0..48bf3b5f 100644 --- a/src/you_get/version.py +++ b/src/you_get/version.py @@ -1,4 +1,4 @@ #!/usr/bin/env python script_name = 'you-get' -__version__ = '0.4.1025' +__version__ = '0.4.1328' diff --git a/tests/test.py b/tests/test.py index 6562d7ca..9584ac51 100644 --- a/tests/test.py +++ b/tests/test.py @@ -7,6 +7,7 @@ from you_get.extractors import ( magisto, youtube, bilibili, + toutiao, ) @@ -31,14 +32,6 @@ class YouGetTests(unittest.TestCase): info_only=True ) - def test_bilibili(self): - bilibili.download( - 'https://www.bilibili.com/video/av16907446/', info_only=True - ) - bilibili.download( - 'https://www.bilibili.com/video/av13228063/', info_only=True - ) - if __name__ == '__main__': unittest.main() diff --git a/tests/test_util.py b/tests/test_util.py index 239083bc..88743b03 100644 --- a/tests/test_util.py +++ b/tests/test_util.py @@ -6,6 +6,7 @@ from you_get.util.fs import * class TestUtil(unittest.TestCase): def test_legitimize(self): - self.assertEqual(legitimize("1*2", os="Linux"), "1*2") - self.assertEqual(legitimize("1*2", os="Darwin"), "1*2") - self.assertEqual(legitimize("1*2", os="Windows"), "1-2") + self.assertEqual(legitimize("1*2", os="linux"), "1*2") + self.assertEqual(legitimize("1*2", os="mac"), "1*2") + self.assertEqual(legitimize("1*2", os="windows"), "1-2") + 
self.assertEqual(legitimize("1*2", os="wsl"), "1-2") diff --git a/you-get.json b/you-get.json index 594742c2..56f8212a 100644 --- a/you-get.json +++ b/you-get.json @@ -25,6 +25,7 @@ "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", + "Programming Language :: Python :: 3.7", "Topic :: Internet", "Topic :: Internet :: WWW/HTTP", "Topic :: Multimedia",
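For reference, the WSL-aware OS detection this patch introduces in `src/you_get/util/os.py` can be exercised standalone. The sketch below mirrors the patched code (same probe of `/proc/version`, since WSL's `platform.system()` reports plain `Linux`); the function and return values follow the diff, not any external API:

```python
#!/usr/bin/env python
# Standalone sketch of the detect_os() helper added in
# src/you_get/util/os.py. WSL identifies itself as Linux,
# so an extra /proc/version check separates it from native Linux
# (see https://github.com/Microsoft/BashOnWindows/issues/423).

from platform import system

def detect_os():
    """Return one of: 'cygwin', 'mac', 'linux', 'wsl',
    'windows', 'bsd', or 'unknown'."""
    syst = system().lower()
    os_name = 'unknown'

    if 'cygwin' in syst:
        os_name = 'cygwin'
    elif 'darwin' in syst:
        os_name = 'mac'
    elif 'linux' in syst:
        os_name = 'linux'
        # WSL kernels embed "microsoft" in /proc/version
        try:
            with open('/proc/version', 'r') as f:
                if 'microsoft' in f.read().lower():
                    os_name = 'wsl'
        except OSError:
            pass
    elif 'windows' in syst:
        os_name = 'windows'
    elif 'bsd' in syst:
        os_name = 'bsd'

    return os_name

if __name__ == '__main__':
    print(detect_os())
```

This is why the patched `legitimize()` in `src/you_get/util/fs.py` switches from comparing against `platform.system()` strings (`'Windows'`, `'Darwin'`) to these lowercase tokens, and treats `'wsl'` and `'cygwin'` like `'windows'`: files written from WSL or Cygwin usually land on an NTFS volume, so Windows filename restrictions still apply.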