mirror of
https://github.com/soimort/you-get.git
synced 2025-01-23 21:45:02 +03:00
Merge pull request #1 from soimort/develop
commit
c3fd8d51c0
39
.github/ISSUE_TEMPLATE.md
vendored
@ -1,39 +0,0 @@
Please make sure these boxes are checked before submitting your issue – thank you!

- [ ] You can actually watch the video in your browser or mobile application, but cannot download it with `you-get`.
- [ ] Your `you-get` is up-to-date.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and followed its advice.
- [ ] The issue is not yet reported on <https://github.com/soimort/you-get/issues> or <https://github.com/soimort/you-get/wiki/Known-Bugs>. If it has already been reported, please add your comments under the existing issue.
- [ ] The issue (or question) is really about `you-get`, not about some other code or project.

Run the command with the `--debug` option, and paste the full output inside the fences:

```
[PASTE IN ME]
```

If there's anything else you would like to say (e.g. in case your issue is not about downloading a specific video; it might as well be a general discussion or proposal for a new feature), fill in the box below; otherwise, you may want to post an emoji or meme instead:

> [WRITE SOMETHING]
> [OR HAVE SOME :icecream:!]

Chinese translation last updated: 2016-02-26

Before submitting, please make sure you have checked the following!

- [ ] You can watch the video in a browser or on a mobile device, but cannot download it with `you-get`.
- [ ] Your `you-get` is the latest version.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and followed its guidance.
- [ ] Your issue has not been reported on <https://github.com/soimort/you-get/issues>, <https://github.com/soimort/you-get/wiki/FAQ> or <https://github.com/soimort/you-get/wiki/Known-Bugs>; otherwise, please report it under the existing issue.
- [ ] This issue is really about `you-get`, not about some other project.

Please run with `--debug` and paste the output below:

```
[PASTE THE FULL LOG HERE]
```

If you have anything else to add, e.g. the problem occurs only with a certain video, or this is a general discussion or a feature proposal, please add it below; or you can just act cute:

> [YOUR CONTENT]
> [LICK SOME :icecream:!]
48
.github/PULL_REQUEST_TEMPLATE.md
vendored
@ -1,48 +0,0 @@
**(PLEASE DELETE ALL THESE AFTER READING)**

Thank you for the pull request! `you-get` is a growing open source project, which would not have been possible without contributors like you.

Here are some simple rules to follow; please recheck them before sending the pull request:

- [ ] If you want to propose two or more unrelated patches, please open separate pull requests for them, instead of one;
- [ ] All pull requests should be based upon the latest `develop` branch;
- [ ] Name your branch (from which you will send the pull request) properly; use a meaningful name like `add-this-shining-feature` rather than just `develop`;
- [ ] All commit messages, as well as comments in code, should be written in understandable English.

As a contributor, you must be aware that

- [ ] You agree to contribute your code to this project, under the terms of the MIT license, so that any person may freely use or redistribute it; of course, you will still retain the copyright for your own authorship.
- [ ] You may not contribute any code not authored by yourself, unless it is either in the public domain or licensed under the MIT license, literally.

Not all pull requests can eventually be merged. I consider merged / unmerged patches as equally important for the community: as long as you think a patch would be helpful, someone else might find it helpful too, and they could take your fork and benefit from it in some way. In any case, I would like to thank you in advance for taking your time to contribute to this project.

Cheers,
Mort

**(PLEASE REPLACE ALL ABOVE WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**


Chinese translation last updated: 2016-02-26

**(PLEASE DELETE ALL OF THIS AFTER READING)**

Thank you for your pull request! `you-get` is a steadily growing open source project; thank you for your contribution.

Please double-check the following simple items:

- [ ] If you intend to propose two or more unrelated patches, please open a separate pull request for each, rather than a single one;
- [ ] All pull requests should be based on the latest `develop` branch;
- [ ] The branch from which you send the pull request should have a meaningful name, such as `add-this-shining-feature` rather than `develop`;
- [ ] All commit messages and code comments should be written in understandable English.

As a contributor, you need to be aware that

- [ ] You agree to contribute your code under the MIT license, so that anyone may freely use or distribute it; of course, you still retain the copyright to your code
- [ ] You may not contribute code that you did not write yourself, unless it is in the public domain or under the MIT license.

Not all pull requests will be merged; however, I consider merged and unmerged patches equally important: if you think a patch is important, others may think so too, and they can take the work from your fork and benefit from it. In any case, thank you for taking the trouble to contribute to this project.

Best wishes,
Mort

**(PLEASE REPLACE ALL OF THIS WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**
39
.github/workflows/python-package.yml
vendored
Normal file
@ -0,0 +1,39 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: develop

on:
  push:
    branches: [ develop ]
  pull_request:
    branches: [ develop ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.5, 3.6, 3.7, 3.8, pypy3]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with unittest
      run: |
        make test
8
.gitignore
vendored
@ -81,3 +81,11 @@ _*
*.xml
/.env
/.idea
*.m4a
*.DS_Store
*.txt

*.zip

.vscode
18
.travis.yml
@ -1,18 +0,0 @@
# https://travis-ci.org/soimort/you-get
language: python
python:
  - "3.2"
  - "3.3"
  - "3.4"
  - "3.5"
  - "nightly"
  - "pypy3"
script: make test
sudo: false
notifications:
  webhooks:
    urls:
      - https://webhooks.gitter.im/e/43cd57826e88ed8f2152
    on_success: change # options: [always|never|change] default: always
    on_failure: always # options: [always|never|change] default: always
    on_start: never # options: [always|never|change] default: always
27
CONTRIBUTING.md
Normal file
@ -0,0 +1,27 @@
# How to Report an Issue

If you would like to report a problem you find when using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls), which should include:

1. A detailed description of the encountered problem;
2. At least one commit, addressing the problem through some unit test(s).
    * Examples of good commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)

PRs that fail to meet the above criteria may be closed summarily with no further action.

A valid PR will remain open until the problem it addresses is fixed.



# How to Report a Problem

To prevent abuse of GitHub Issues, this project does not accept general issues.

If you encounter any problem while using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls). The PR should include:

1. A detailed description of the problem;
2. At least one commit containing unit test(s) **related to the problem**. **Do not submit a PR by arbitrarily modifying unrelated files!**
    * Examples of valid commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)

PRs that do not meet the above criteria may be closed directly.

A valid PR will be kept open until the corresponding problem is fixed.
22
LICENSE.txt
@ -1,15 +1,15 @@
==============================================
This is a copy of the MIT license.
==============================================
Copyright (C) 2012, 2013, 2014, 2015, 2016 Mort Yao <mort.yao@gmail.com>
Copyright (C) 2012 Boyu Guo <iambus@gmail.com>
MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
Copyright (c) 2012-2020 Mort Yao <mort.yao@gmail.com> and other contributors
(https://github.com/soimort/you-get/graphs/contributors)
Copyright (c) 2012 Boyu Guo <iambus@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
175
README.md
@ -1,22 +1,32 @@
# You-Get

[![Build Status](https://github.com/soimort/you-get/workflows/develop/badge.svg)](https://github.com/soimort/you-get/actions)
[![PyPI version](https://img.shields.io/pypi/v/you-get.svg)](https://pypi.python.org/pypi/you-get/)
[![Build Status](https://travis-ci.org/soimort/you-get.svg)](https://travis-ci.org/soimort/you-get)
[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

**NOTICE: Read [this](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) if you are looking for the conventional "Issues" tab.**

---

[You-Get](https://you-get.org/) is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.

Here's how you use `you-get` to download a video from [this web page](http://www.fsf.org/blogs/rms/20140407-geneva-tedx-talk-free-software-free-society):
Here's how you use `you-get` to download a video from [YouTube](https://www.youtube.com/watch?v=jNQXAC9IVRw):

```console
$ you-get http://www.fsf.org/blogs/rms/20140407-geneva-tedx-talk-free-software-free-society
Site: fsf.org
Title: TEDxGE2014_Stallman05_LQ
Type: WebM video (video/webm)
Size: 27.12 MiB (28435804 Bytes)
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
stream:
    - itag: 43
      container: webm
      quality: medium
      size: 0.5 MiB (564215 bytes)
    # download-with: you-get --itag=43 [URL]

Downloading TEDxGE2014_Stallman05_LQ.webm ...
100.0% ( 27.1/27.1 MB) ├████████████████████████████████████████┤[1/1] 12 MB/s
Downloading Me at the zoo.webm ...
100% ( 0.5/ 0.5MB) ├██████████████████████████████████┤[1/1] 6 MB/s

Saving Me at the zoo.en.srt ... Done.
```

And here's why you might want to use it:
@ -43,10 +53,10 @@ Are you a Python programmer? Then check out [the source](https://github.com/soim
### Prerequisites

The following dependencies are required and must be installed separately, unless you are using a pre-built package or chocolatey on Windows:
The following dependencies are necessary:

* **[Python 3](https://www.python.org/downloads/)**
* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
* **[Python](https://www.python.org/downloads/)** 3.2 or above
* **[FFmpeg](https://www.ffmpeg.org/)** 1.0 or above
* (Optional) [RTMPDump](https://rtmpdump.mplayerhq.hu/)

### Option 1: Install via pip
@ -55,17 +65,13 @@ The official release of `you-get` is distributed on [PyPI](https://pypi.python.o
    $ pip3 install you-get

### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen)
### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen) (for Zsh users)

Add the following line to your `.zshrc`:

    antigen bundle soimort/you-get

### Option 3: Use a pre-built package (Windows only)

Download the `exe` (standalone) or `7z` (all dependencies included) from: <https://github.com/soimort/you-get/releases/latest>.

### Option 4: Download from GitHub
### Option 3: Download from GitHub

You may either download the [stable](https://github.com/soimort/you-get/archive/master.zip) (identical with the latest release on PyPI) or the [develop](https://github.com/soimort/you-get/archive/develop.zip) (more hotfixes, unstable features) branch of `you-get`. Unzip it, and put the directory containing the `you-get` script into your `PATH`.
@ -83,7 +89,7 @@ $ python3 setup.py install --user
to install `you-get` to a permanent path.

### Option 5: Git clone
### Option 4: Git clone

This is the recommended way for all developers, even if you don't often code in Python.
@ -93,13 +99,7 @@ $ git clone git://github.com/soimort/you-get.git
Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path.

### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only)

```
> choco install you-get
```

### Option 7: Homebrew (Mac only)
### Option 5: Homebrew (Mac only)

You can install `you-get` easily via:
@ -107,9 +107,17 @@ You can install `you-get` easily via:
$ brew install you-get
```

### Option 6: pkg (FreeBSD only)

You can install `you-get` easily via:

```
# pkg install you-get
```

### Shell completion

Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](contrib/completion). Please consult your shell's manual for how to take advantage of them.
Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](https://github.com/soimort/you-get/tree/develop/contrib/completion). Please consult your shell's manual for how to take advantage of them.

## Upgrading
@ -125,12 +133,6 @@ or download the latest release via:
$ you-get https://github.com/soimort/you-get/archive/master.zip
```

or use [chocolatey package manager](https://chocolatey.org):

```
> choco upgrade you-get
```

In order to get the latest ```develop``` branch without messing up the PIP, you can try:

```
@ -148,22 +150,54 @@ $ you-get -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
streams: # Available quality and codecs
    [ DASH ] ____________________________________
    - itag: 242
      container: webm
      quality: 320x240
      size: 0.6 MiB (618358 bytes)
    # download-with: you-get --itag=242 [URL]

    - itag: 395
      container: mp4
      quality: 320x240
      size: 0.5 MiB (550743 bytes)
    # download-with: you-get --itag=395 [URL]

    - itag: 133
      container: mp4
      quality: 320x240
      size: 0.5 MiB (498558 bytes)
    # download-with: you-get --itag=133 [URL]

    - itag: 278
      container: webm
      quality: 192x144
      size: 0.4 MiB (392857 bytes)
    # download-with: you-get --itag=278 [URL]

    - itag: 160
      container: mp4
      quality: 192x144
      size: 0.4 MiB (370882 bytes)
    # download-with: you-get --itag=160 [URL]

    - itag: 394
      container: mp4
      quality: 192x144
      size: 0.4 MiB (367261 bytes)
    # download-with: you-get --itag=394 [URL]

    [ DEFAULT ] _________________________________
    - itag: 43
      container: webm
      quality: medium
      size: 0.5 MiB (564215 bytes)
      size: 0.5 MiB (568748 bytes)
    # download-with: you-get --itag=43 [URL]

    - itag: 18
      container: mp4
      quality: medium
    # download-with: you-get --itag=18 [URL]

    - itag: 5
      container: flv
      quality: small
    # download-with: you-get --itag=5 [URL]
    # download-with: you-get --itag=18 [URL]

    - itag: 36
      container: 3gp
@ -176,23 +210,24 @@ streams: # Available quality and codecs
    # download-with: you-get --itag=17 [URL]
```

The format marked with `DEFAULT` is the one you will get by default. If that looks cool to you, download it:
By default, the one on the top is the one you will get. If that looks cool to you, download it:

```
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
stream:
    - itag: 43
    - itag: 242
      container: webm
      quality: medium
      size: 0.5 MiB (564215 bytes)
    # download-with: you-get --itag=43 [URL]
      quality: 320x240
      size: 0.6 MiB (618358 bytes)
    # download-with: you-get --itag=242 [URL]

Downloading zoo.webm ...
100.0% ( 0.5/0.5 MB) ├████████████████████████████████████████┤[1/1] 7 MB/s
Downloading Me at the zoo.webm ...
100% ( 0.6/ 0.6MB) ├██████████████████████████████████████████████████████████████████████████████┤[2/2] 2 MB/s
Merging video parts... Merged into Me at the zoo.webm

Saving Me at the zoo.en.srt ...Done.
Saving Me at the zoo.en.srt ... Done.
```

(If a YouTube video has any closed captions, they will be downloaded together with the video file, in SubRip subtitle format.)
@ -292,7 +327,7 @@ However, the system proxy setting (i.e. the environment variable `http_proxy`) i
### Watch a video

Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mplayer` or `vlc`, instead of downloading it:
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mpv` or `vlc`, instead of downloading it:

```
$ you-get -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
@ -333,33 +368,29 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| VK | <http://vk.com/> |✓|✓| |
| Vine | <https://vine.co/> |✓| | |
| Vimeo | <https://vimeo.com/> |✓| | |
| Vidto | <http://vidto.me/> |✓| | |
| Videomega | <http://videomega.tv/> |✓| | |
| Veoh | <http://www.veoh.com/> |✓| | |
| **Tumblr** | <https://www.tumblr.com/> |✓|✓|✓|
| TED | <http://www.ted.com/> |✓| | |
| SoundCloud | <https://soundcloud.com/> | | |✓|
| SHOWROOM | <https://www.showroom-live.com/> |✓| | |
| Pinterest | <https://www.pinterest.com/> | |✓| |
| MusicPlayOn | <http://en.musicplayon.com/> |✓| | |
| MTV81 | <http://www.mtv81.com/> |✓| | |
| Mixcloud | <https://www.mixcloud.com/> | | |✓|
| Metacafe | <http://www.metacafe.com/> |✓| | |
| Magisto | <http://www.magisto.com/> |✓| | |
| Khan Academy | <https://www.khanacademy.org/> |✓| | |
| JPopsuki TV | <http://www.jpopsuki.tv/> |✓| | |
| Internet Archive | <https://archive.org/> |✓| | |
| **Instagram** | <https://instagram.com/> |✓|✓| |
| InfoQ | <http://www.infoq.com/presentations/> |✓| | |
| Imgur | <http://imgur.com/> | |✓| |
| Heavy Music Archive | <http://www.heavy-music.ru/> | | |✓|
| **Google+** | <https://plus.google.com/> |✓|✓| |
| Freesound | <http://www.freesound.org/> | | |✓|
| Flickr | <https://www.flickr.com/> |✓|✓| |
| FC2 Video | <http://video.fc2.com/> |✓| | |
| Facebook | <https://www.facebook.com/> |✓| | |
| eHow | <http://www.ehow.com/> |✓| | |
| Dailymotion | <http://www.dailymotion.com/> |✓| | |
| Coub | <http://coub.com/> |✓| | |
| CBS | <http://www.cbs.com/> |✓| | |
| Bandcamp | <http://bandcamp.com/> | | |✓|
| AliveThai | <http://alive.in.th/> |✓| | |
@ -368,14 +399,12 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **niconico<br/>ニコニコ動画** | <http://www.nicovideo.jp/> |✓| | |
| **163<br/>网易视频<br/>网易云音乐** | <http://v.163.com/><br/><http://music.163.com/> |✓| |✓|
| 56网 | <http://www.56.com/> |✓| | |
| **AcFun** | <http://www.acfun.tv/> |✓| | |
| **AcFun** | <http://www.acfun.cn/> |✓| | |
| **Baidu<br/>百度贴吧** | <http://tieba.baidu.com/> |✓|✓| |
| 爆米花网 | <http://www.baomihua.com/> |✓| | |
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓| | |
| Dilidili | <http://www.dilidili.com/> |✓| | |
| 豆瓣 | <http://www.douban.com/> | | |✓|
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓|✓|✓|
| 豆瓣 | <http://www.douban.com/> |✓| |✓|
| 斗鱼 | <http://www.douyutv.com/> |✓| | |
| Panda<br/>熊猫 | <http://www.panda.tv/> |✓| | |
| 凤凰视频 | <http://v.ifeng.com/> |✓| | |
| 风行网 | <http://www.fun.tv/> |✓| | |
| iQIYI<br/>爱奇艺 | <http://www.iqiyi.com/> |✓| | |
@ -387,26 +416,32 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| 荔枝FM | <http://www.lizhi.fm/> | | |✓|
| 秒拍 | <http://www.miaopai.com/> |✓| | |
| MioMio弹幕网 | <http://www.miomio.tv/> |✓| | |
| MissEvan<br/>猫耳FM | <http://www.missevan.com/> | | |✓|
| 痞客邦 | <https://www.pixnet.net/> |✓| | |
| PPTV聚力 | <http://www.pptv.com/> |✓| | |
| 齐鲁网 | <http://v.iqilu.com/> |✓| | |
| QQ<br/>腾讯视频 | <http://v.qq.com/> |✓| | |
| 企鹅直播 | <http://live.qq.com/> |✓| | |
| 阡陌视频 | <http://qianmo.com/> |✓| | |
| THVideo | <http://thvideo.tv/> |✓| | |
| Sina<br/>新浪视频<br/>微博秒拍视频 | <http://video.sina.com.cn/><br/><http://video.weibo.com/> |✓| | |
| Sohu<br/>搜狐视频 | <http://tv.sohu.com/> |✓| | |
| 天天动听 | <http://www.dongting.com/> | | |✓|
| **Tudou<br/>土豆** | <http://www.tudou.com/> |✓| | |
| 虾米 | <http://www.xiami.com/> | | |✓|
| 虾米 | <http://www.xiami.com/> |✓| |✓|
| 阳光卫视 | <http://www.isuntv.com/> |✓| | |
| **音悦Tai** | <http://www.yinyuetai.com/> |✓| | |
| **Youku<br/>优酷** | <http://www.youku.com/> |✓| | |
| 战旗TV | <http://www.zhanqi.tv/lives> |✓| | |
| 央视网 | <http://www.cntv.cn/> |✓| | |
| 花瓣 | <http://huaban.com/> | |✓| |
| Naver<br/>네이버 | <http://tvcast.naver.com/> |✓| | |
| 芒果TV | <http://www.mgtv.com/> |✓| | |
| 火猫TV | <http://www.huomao.com/> |✓| | |
| 阳光宽频网 | <http://www.365yg.com/> |✓| | |
| 西瓜视频 | <https://www.ixigua.com/> |✓| | |
| 新片场 | <https://www.xinpianchang.com/> |✓| | |
| 快手 | <https://www.kuaishou.com/> |✓|✓| |
| 抖音 | <https://www.douyin.com/> |✓| | |
| TikTok | <https://www.tiktok.com/> |✓| | |
| 中国体育(TV) | <http://v.zhibo.tv/><br/><http://video.zhibo.tv/> |✓| | |
| 知乎 | <https://www.zhihu.com/> |✓| | |

For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.
@ -414,19 +449,13 @@ For all other sites not on the list, the universal extractor will take care of f
If something is broken and `you-get` can't get you things you want, don't panic. (Yes, this happens all the time!)

Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>, and search on the [list of open issues](https://github.com/soimort/you-get/issues). If it has not been reported yet, open a new issue, with detailed command-line output attached.
Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>. If not, follow the guidelines on [how to report an issue](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md).

## Getting Involved

You can reach us on the Gitter channel [#soimort/you-get](https://gitter.im/soimort/you-get) (here's how you [set up your IRC client](http://irc.gitter.im) for Gitter). If you have a quick question regarding `you-get`, ask it there.

All kinds of pull requests are welcome. However, there are a few guidelines to follow:

* The [`develop`](https://github.com/soimort/you-get/tree/develop) branch is where your pull request should go.
* Remember to rebase.
* Document your PR clearly, and if applicable, provide some sample links for reviewers to test with.
* Write well-formatted, easy-to-understand commit messages. If you don't know how, look at existing ones.
* We will not ask you to sign a CLA, but you must assure that your code can be legally redistributed (under the terms of the MIT license).
If you are seeking to report an issue or contribute, please make sure to read [the guidelines](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) first.

## Legal Issues
@ -450,6 +479,6 @@ We only ship the code here, and how you are going to use it is left to your own
## Authors

Made by [@soimort](https://github.com/soimort), who is in turn powered by :coffee:, :pizza: and :ramen:.
Made by [@soimort](https://github.com/soimort), who is in turn powered by :coffee:, :beer: and :ramen:.

You can find the [list of all contributors](https://github.com/soimort/you-get/graphs/contributors) here.
6
setup.py
@ -41,5 +41,9 @@ setup(
    classifiers = proj_info['classifiers'],

    entry_points = {'console_scripts': proj_info['console_scripts']}
    entry_points = {'console_scripts': proj_info['console_scripts']},

    extras_require={
        'socks': ['PySocks'],
    }
)
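The `extras_require` entry above declares PySocks as an optional dependency; with standard setuptools extras syntax it would be pulled in by `pip install you-get[socks]`. A minimal sketch of how calling code might check whether that extra is present follows. The helper names `socks_available` and `proxy_support_message` are hypothetical and not part of `you-get`; the only assumption about PySocks is that it installs a top-level module named `socks`.

```python
import importlib.util


def socks_available() -> bool:
    """Report whether the optional PySocks module (`socks`) can be imported."""
    # find_spec returns None when the top-level module is not installed.
    return importlib.util.find_spec("socks") is not None


def proxy_support_message() -> str:
    """Build a human-readable hint about SOCKS proxy support (illustrative only)."""
    if socks_available():
        return "SOCKS proxy support is available."
    return "SOCKS proxy support is missing; install the 'socks' extra to enable it."


if __name__ == "__main__":
    print(proxy_support_message())
```

Checking with `find_spec` avoids importing the module just to learn whether it exists, which keeps the check cheap when the extra is not installed.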
@ -1,7 +1,9 @@
#!/usr/bin/env python

''' WIP
def main():
    script_main('you-get', any_download, any_download_playlist)

if __name__ == "__main__":
    main()
'''
File diff suppressed because it is too large
@ -1,10 +1,11 @@
#!/usr/bin/env python

from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run, player
from .common import print_more_compatible as print
from .util import log
from . import json_output
import os
import sys

class Extractor():
    def __init__(self, *args):
@ -22,12 +23,18 @@ class VideoExtractor():
        self.url = None
        self.title = None
        self.vid = None
        self.m3u8_url = None
        self.streams = {}
        self.streams_sorted = []
        self.audiolang = None
        self.password_protected = False
        self.dash_streams = {}
        self.caption_tracks = {}
        self.out = False
        self.ua = None
        self.referer = None
        self.danmaku = None
        self.lyrics = None

        if args:
            self.url = args[0]
@ -39,6 +46,8 @@ class VideoExtractor():
        if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
            set_proxy(parse_host(kwargs['extractor_proxy']))
        self.prepare(**kwargs)
        if self.out:
            return
        if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
            unset_proxy()
@ -98,9 +107,13 @@ class VideoExtractor():
        if 'quality' in stream:
            print(" quality: %s" % stream['quality'])

        if 'size' in stream:
        if 'size' in stream and 'container' in stream and stream['container'].lower() != 'm3u8':
            if stream['size'] != float('inf') and stream['size'] != 0:
                print(" size: %s MiB (%s bytes)" % (round(stream['size'] / 1048576, 1), stream['size']))

        if 'm3u8_url' in stream:
            print(" m3u8_url: {}".format(stream['m3u8_url']))

        if 'itag' in stream:
            print(" # download-with: %s" % log.sprint("you-get --itag=%s [URL]" % stream_id, log.UNDERLINE))
        else:
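The size check in this hunk can be read as a small pure function. The sketch below is an illustration only: `size_line` is a hypothetical name, and it returns the text instead of printing it so the rule is easy to test. It mirrors the two conditions in the hunk: a size is shown only when the stream has both `size` and `container`, the container is not `m3u8`, and the size is neither `0` nor infinity.

```python
def size_line(stream):
    """Return the text shown for a stream's size, or None when it is skipped."""
    # Streams lacking size or container information print nothing.
    if 'size' not in stream or 'container' not in stream:
        return None
    # m3u8 (playlist) containers are excluded, compared case-insensitively.
    if stream['container'].lower() == 'm3u8':
        return None
    size = stream['size']
    # Unknown sizes are represented as 0 or infinity and are not shown.
    if size == float('inf') or size == 0:
        return None
    # 1048576 bytes = 1 MiB; the MiB figure is rounded to one decimal place.
    return "size: %s MiB (%s bytes)" % (round(size / 1048576, 1), size)


if __name__ == "__main__":
    print(size_line({'container': 'webm', 'size': 564215}))
```

With the byte count from the README sample (`564215`), this yields the same `0.5 MiB (564215 bytes)` figure that appears in the listing there.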
@ -119,6 +132,8 @@ class VideoExtractor():
            print(" url: %s" % self.url)
        print()

        sys.stdout.flush()

    def p(self, stream_id=None):
        maybe_print("site: %s" % self.__class__.name)
        maybe_print("title: %s" % self.title)
@ -143,6 +158,7 @@ class VideoExtractor():
                for stream in itags:
                    self.p_stream(stream)
            # Print all other available streams
            if self.streams_sorted:
                print(" [ DEFAULT ] %s" % ('_' * 33))
                for stream in self.streams_sorted:
                    self.p_stream(stream['id'] if 'id' in stream else stream['itag'])
@ -153,6 +169,8 @@ class VideoExtractor():
                print(" - lang: {}".format(i['lang']))
                print(" download-url: {}\n".format(i['url']))

        sys.stdout.flush()

    def p_playlist(self, stream_id=None):
        maybe_print("site: %s" % self.__class__.name)
        print("playlist: %s" % self.title)
@ -183,6 +201,13 @@ class VideoExtractor():
                stream_id = kwargs['stream_id']
            else:
                # Download stream with the best quality
                from .processor.ffmpeg import has_ffmpeg_installed
                if has_ffmpeg_installed() and player is None and self.dash_streams or not self.streams_sorted:
                    #stream_id = list(self.dash_streams)[-1]
                    itags = sorted(self.dash_streams,
                                   key=lambda i: -self.dash_streams[i]['size'])
                    stream_id = itags[0]
                else:
                    stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']

            if 'index' not in kwargs:
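The default-stream condition in this hunk mixes `and` and `or` without parentheses. Because `and` binds tighter than `or` in Python, it reads as `(ffmpeg installed and no player and DASH streams exist) or (no regular streams)`. The sketch below makes that grouping explicit. `pick_stream_id` and its parameters are hypothetical stand-ins for the method's state (`ffmpeg_ok` replaces the call to `has_ffmpeg_installed()`), not the project's API.

```python
def pick_stream_id(dash_streams, streams_sorted, ffmpeg_ok, player=None):
    """Choose a default stream id following the rule in the hunk above."""
    # `and` binds tighter than `or`, so this is the grouping Python applies.
    prefer_dash = (ffmpeg_ok and player is None and bool(dash_streams)) \
        or not streams_sorted
    if prefer_dash:
        # Largest DASH stream first: sort by descending size.
        itags = sorted(dash_streams, key=lambda i: -dash_streams[i]['size'])
        return itags[0]
    first = streams_sorted[0]
    return first['id'] if 'id' in first else first['itag']


if __name__ == "__main__":
    dash = {'242': {'size': 618358}, '395': {'size': 550743}}
    regular = [{'itag': '43'}]
    print(pick_stream_id(dash, regular, ffmpeg_ok=True))
    print(pick_stream_id(dash, regular, ffmpeg_ok=False))
```

One consequence of the grouping is that when there are no regular streams the DASH branch is taken even without FFmpeg or with a player set; if `dash_streams` is also empty, `itags[0]` raises `IndexError` in this sketch.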
@ -199,16 +224,26 @@ class VideoExtractor():
                ext = self.dash_streams[stream_id]['container']
                total_size = self.dash_streams[stream_id]['size']

            if ext == 'm3u8' or ext == 'm4a':
                ext = 'mp4'

            if not urls:
                log.wtf('[Failed] Cannot extract video source.')
            # For legacy main()
            download_urls(urls, self.title, ext, total_size,
            headers = {}
            if self.ua is not None:
                headers['User-Agent'] = self.ua
            if self.referer is not None:
                headers['Referer'] = self.referer
            download_urls(urls, self.title, ext, total_size, headers=headers,
                          output_dir=kwargs['output_dir'],
                          merge=kwargs['merge'],
                          av=stream_id in self.dash_streams)
            if not kwargs['caption']:
                print('Skipping captions.')

            if 'caption' not in kwargs or not kwargs['caption']:
                print('Skipping captions or danmaku.')
                return

            for lang in self.caption_tracks:
                filename = '%s.%s.srt' % (get_filename(self.title), lang)
                print('Saving %s ... ' % filename, end="", flush=True)
@@ -218,7 +253,20 @@ class VideoExtractor():
|
||||
x.write(srt)
|
||||
print('Done.')
|
||||
|
||||
if self.danmaku is not None and not dry_run:
|
||||
filename = '{}.cmt.xml'.format(get_filename(self.title))
|
||||
print('Downloading {} ...\n'.format(filename))
|
||||
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
|
||||
fp.write(self.danmaku)
|
||||
|
||||
if self.lyrics is not None and not dry_run:
|
||||
filename = '{}.lrc'.format(get_filename(self.title))
|
||||
print('Downloading {} ...\n'.format(filename))
|
||||
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
|
||||
fp.write(self.lyrics)
|
||||
|
||||
# For main_dev()
|
||||
#download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size'])
|
||||
|
||||
keep_obj = kwargs.get('keep_obj', False)
|
||||
if not keep_obj:
|
||||
self.__init__()
|
||||
|
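Reviewer note: the default-stream branch in the `VideoExtractor` hunks above sorts `dash_streams` by descending `size` and takes the first key. The following is a minimal standalone sketch of that selection, not code from the commit; `pick_largest_stream` and the sample table are hypothetical names introduced only for illustration.

```python
def pick_largest_stream(dash_streams):
    """Return the key of the entry with the largest 'size' value."""
    # Same ordering as the diff: sort keys by descending size, take the first.
    itags = sorted(dash_streams, key=lambda i: -dash_streams[i]['size'])
    return itags[0]


# Hypothetical stream table; the ids and sizes are made up for illustration.
sample = {
    '18': {'container': 'mp4', 'size': 1000},
    '137': {'container': 'mp4', 'size': 5000},
    '22': {'container': 'mp4', 'size': 3000},
}
best = pick_largest_stream(sample)
```

Under these assumptions, `max(dash_streams, key=lambda i: dash_streams[i]['size'])` should select the same key without building the sorted list.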
@@ -11,9 +11,10 @@ from .bokecc import *
|
||||
from .cbs import *
|
||||
from .ckplayer import *
|
||||
from .cntv import *
|
||||
from .coub import *
|
||||
from .dailymotion import *
|
||||
from .dilidili import *
|
||||
from .douban import *
|
||||
from .douyin import *
|
||||
from .douyutv import *
|
||||
from .ehow import *
|
||||
from .facebook import *
|
||||
@@ -23,7 +24,7 @@ from .freesound import *
|
||||
from .funshion import *
|
||||
from .google import *
|
||||
from .heavymusic import *
|
||||
from .huaban import *
|
||||
from .icourses import *
|
||||
from .ifeng import *
|
||||
from .imgur import *
|
||||
from .infoq import *
|
||||
@@ -32,12 +33,15 @@ from .interest import *
|
||||
from .iqilu import *
|
||||
from .iqiyi import *
|
||||
from .joy import *
|
||||
from .jpopsuki import *
|
||||
from .khan import *
|
||||
from .ku6 import *
|
||||
from .kakao import *
|
||||
from .kuaishou import *
|
||||
from .kugou import *
|
||||
from .kuwo import *
|
||||
from .le import *
|
||||
from .lizhi import *
|
||||
from .longzhu import *
|
||||
from .magisto import *
|
||||
from .metacafe import *
|
||||
from .mgtv import *
|
||||
@@ -45,41 +49,41 @@ from .miaopai import *
|
||||
from .miomio import *
|
||||
from .mixcloud import *
|
||||
from .mtv81 import *
|
||||
from .musicplayon import *
|
||||
from .nanagogo import *
|
||||
from .naver import *
|
||||
from .netease import *
|
||||
from .nicovideo import *
|
||||
from .panda import *
|
||||
from .pinterest import *
|
||||
from .pixnet import *
|
||||
from .pptv import *
|
||||
from .qianmo import *
|
||||
from .qie import *
|
||||
from .qingting import *
|
||||
from .qq import *
|
||||
from .showroom import *
|
||||
from .sina import *
|
||||
from .sohu import *
|
||||
from .soundcloud import *
|
||||
from .suntv import *
|
||||
from .ted import *
|
||||
from .theplatform import *
|
||||
from .thvideo import *
|
||||
from .tiktok import *
|
||||
from .tucao import *
|
||||
from .tudou import *
|
||||
from .tumblr import *
|
||||
from .twitter import *
|
||||
from .ucas import *
|
||||
from .veoh import *
|
||||
from .videomega import *
|
||||
from .vimeo import *
|
||||
from .vine import *
|
||||
from .vk import *
|
||||
from .w56 import *
|
||||
from .wanmen import *
|
||||
from .xiami import *
|
||||
from .xinpianchang import *
|
||||
from .yinyuetai import *
|
||||
from .yixia import *
|
||||
from .youku import *
|
||||
from .youtube import *
|
||||
from .ted import *
|
||||
from .khan import *
|
||||
from .zhanqi import *
|
||||
from .zhibo import *
|
||||
from .zhihu import *
|
@@ -1,92 +1,213 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['acfun_download']
|
||||
|
||||
from ..common import *
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
from .le import letvcloud_download_by_vu
|
||||
from .qq import qq_download_by_vid
|
||||
from .sina import sina_download_by_vid
|
||||
from .tudou import tudou_download_by_iid
|
||||
from .youku import youku_download_by_vid, youku_open_download_by_vid
|
||||
class AcFun(VideoExtractor):
|
||||
name = "AcFun"
|
||||
|
||||
import json, re
|
||||
stream_types = [
|
||||
{'id': '2160P', 'qualityType': '2160p'},
|
||||
{'id': '1080P60', 'qualityType': '1080p60'},
|
||||
{'id': '720P60', 'qualityType': '720p60'},
|
||||
{'id': '1080P+', 'qualityType': '1080p+'},
|
||||
{'id': '1080P', 'qualityType': '1080p'},
|
||||
{'id': '720P', 'qualityType': '720p'},
|
||||
{'id': '540P', 'qualityType': '540p'},
|
||||
{'id': '360P', 'qualityType': '360p'}
|
||||
]
|
||||
|
||||
def get_srt_json(id):
|
||||
url = 'http://danmu.aixifan.com/V2/%s' % id
|
||||
return get_html(url)
|
||||
def prepare(self, **kwargs):
|
||||
assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', self.url)
|
||||
|
||||
def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
"""str, str, str, bool, bool ->None
|
||||
if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', self.url):
|
||||
html = get_content(self.url, headers=fake_headers)
|
||||
json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});")
|
||||
json_data = json.loads(json_text)
|
||||
vid = json_data.get('currentVideoInfo').get('id')
|
||||
up = json_data.get('user').get('name')
|
||||
self.title = json_data.get('title')
|
||||
video_list = json_data.get('videoList')
|
||||
if len(video_list) > 1:
|
||||
self.title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0]
|
||||
currentVideoInfo = json_data.get('currentVideoInfo')
|
||||
|
||||
Download Acfun video by vid.
|
||||
elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/aa(\d+)", self.url):
|
||||
html = get_content(self.url, headers=fake_headers)
|
||||
tag_script = match1(html, r'<script>\s*window\.pageInfo([^<]+)</script>')
|
||||
json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1]
|
||||
json_data = json.loads(json_text)
|
||||
self.title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title']
|
||||
vid = str(json_data['videoId'])
|
||||
up = "acfun"
|
||||
currentVideoInfo = json_data.get('currentVideoInfo')
|
||||
|
||||
Call Acfun API, decide which site to use, and pass the job to its
|
||||
extractor.
|
||||
"""
|
||||
|
||||
#first call the main parsing API
|
||||
info = json.loads(get_html('http://www.acfun.tv/video/getVideo.aspx?id=' + vid))
|
||||
|
||||
sourceType = info['sourceType']
|
||||
|
||||
#decide sourceId to know which extractor to use
|
||||
if 'sourceId' in info: sourceId = info['sourceId']
|
||||
# danmakuId = info['danmakuId']
|
||||
|
||||
#call extractor decided by sourceId
|
||||
if sourceType == 'sina':
|
||||
sina_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'youku':
|
||||
youku_download_by_vid(sourceId, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
elif sourceType == 'tudou':
|
||||
tudou_download_by_iid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'qq':
|
||||
qq_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'letv':
|
||||
letvcloud_download_by_vu(sourceId, '2d8c027396', title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'zhuzhan':
|
||||
#As of Jul.28.2016, Acfun uses embsig for anti-hotlinking, so we need to pass this
|
||||
embsig = info['encode']
|
||||
a = 'http://api.aixifan.com/plays/%s' % vid
|
||||
s = json.loads(get_content(a, headers={'deviceType': '2'}))
|
||||
if s['data']['source'] == "zhuzhan-youku":
|
||||
sourceId = s['data']['sourceId']
|
||||
youku_open_download_by_vid(client_id='908a519d032263f8', vid=sourceId, title=title, output_dir=output_dir,merge=merge, info_only=info_only, embsig = embsig, **kwargs)
|
||||
else:
|
||||
raise NotImplementedError(sourceType)
|
||||
raise NotImplementedError()
|
||||
|
||||
if not info_only and not dry_run:
|
||||
if not kwargs['caption']:
|
||||
print('Skipping danmaku.')
|
||||
if 'ksPlayJson' in currentVideoInfo:
|
||||
durationMillis = currentVideoInfo['durationMillis']
|
||||
ksPlayJson = json.loads( currentVideoInfo['ksPlayJson'] )
|
||||
representation = ksPlayJson.get('adaptationSet')[0].get('representation')
|
||||
stream_list = representation
|
||||
|
||||
for stream in stream_list:
|
||||
m3u8_url = stream["url"]
|
||||
size = durationMillis * stream["avgBitrate"] / 8
|
||||
# size = float('inf')
|
||||
container = 'mp4'
|
||||
stream_id = stream["qualityLabel"]
|
||||
quality = stream["qualityType"]
|
||||
|
||||
stream_data = dict(src=m3u8_url, size=size, container=container, quality=quality)
|
||||
self.streams[stream_id] = stream_data
|
||||
|
||||
assert self.title and m3u8_url
|
||||
self.title = unescape_html(self.title)
|
||||
self.title = escape_file_path(self.title)
|
||||
p_title = r1('active">([^<]+)', html)
|
||||
self.title = '%s (%s)' % (self.title, up)
|
||||
if p_title:
|
||||
self.title = '%s - %s' % (self.title, p_title)
|
||||
|
||||
|
||||
def download(self, **kwargs):
|
||||
if 'json_output' in kwargs and kwargs['json_output']:
|
||||
json_output.output(self)
|
||||
elif 'info_only' in kwargs and kwargs['info_only']:
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
# Display the stream
|
||||
stream_id = kwargs['stream_id']
|
||||
if 'index' not in kwargs:
|
||||
self.p(stream_id)
|
||||
else:
|
||||
self.p_i(stream_id)
|
||||
else:
|
||||
# Display all available streams
|
||||
if 'index' not in kwargs:
|
||||
self.p([])
|
||||
else:
|
||||
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
|
||||
self.p_i(stream_id)
|
||||
|
||||
else:
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
# Download the stream
|
||||
stream_id = kwargs['stream_id']
|
||||
else:
|
||||
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
|
||||
|
||||
if 'index' not in kwargs:
|
||||
self.p(stream_id)
|
||||
else:
|
||||
self.p_i(stream_id)
|
||||
if stream_id in self.streams:
|
||||
url = self.streams[stream_id]['src']
|
||||
ext = self.streams[stream_id]['container']
|
||||
total_size = self.streams[stream_id]['size']
|
||||
|
||||
|
||||
if ext == 'm3u8' or ext == 'm4a':
|
||||
ext = 'mp4'
|
||||
|
||||
if not url:
|
||||
log.wtf('[Failed] Cannot extract video source.')
|
||||
# For legacy main()
|
||||
headers = {}
|
||||
if self.ua is not None:
|
||||
headers['User-Agent'] = self.ua
|
||||
if self.referer is not None:
|
||||
headers['Referer'] = self.referer
|
||||
|
||||
download_url_ffmpeg(url, self.title, ext, output_dir=kwargs['output_dir'], merge=kwargs['merge'])
|
||||
|
||||
if 'caption' not in kwargs or not kwargs['caption']:
|
||||
print('Skipping captions or danmaku.')
|
||||
return
|
||||
try:
|
||||
title = get_filename(title)
|
||||
print('Downloading %s ...\n' % (title + '.cmt.json'))
|
||||
cmt = get_srt_json(vid)
|
||||
with open(os.path.join(output_dir, title + '.cmt.json'), 'w', encoding='utf-8') as x:
|
||||
x.write(cmt)
|
||||
except:
|
||||
pass
|
||||
|
||||
def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
assert re.match(r'http://[^\.]+.acfun.[^\.]+/\D/\D\D(\d+)', url)
|
||||
html = get_html(url)
|
||||
for lang in self.caption_tracks:
|
||||
filename = '%s.%s.srt' % (get_filename(self.title), lang)
|
||||
print('Saving %s ... ' % filename, end="", flush=True)
|
||||
srt = self.caption_tracks[lang]
|
||||
with open(os.path.join(kwargs['output_dir'], filename),
|
||||
'w', encoding='utf-8') as x:
|
||||
x.write(srt)
|
||||
print('Done.')
|
||||
|
||||
title = r1(r'data-title="([^"]+)"', html)
|
||||
if self.danmaku is not None and not dry_run:
|
||||
filename = '{}.cmt.xml'.format(get_filename(self.title))
|
||||
print('Downloading {} ...\n'.format(filename))
|
||||
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
|
||||
fp.write(self.danmaku)
|
||||
|
||||
if self.lyrics is not None and not dry_run:
|
||||
filename = '{}.lrc'.format(get_filename(self.title))
|
||||
print('Downloading {} ...\n'.format(filename))
|
||||
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
|
||||
fp.write(self.lyrics)
|
||||
|
||||
# For main_dev()
|
||||
#download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size'])
|
||||
keep_obj = kwargs.get('keep_obj', False)
|
||||
if not keep_obj:
|
||||
self.__init__()
|
||||
|
||||
|
||||
def acfun_download(self, url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', url)
|
||||
|
||||
def getM3u8UrlFromCurrentVideoInfo(currentVideoInfo):
|
||||
if 'playInfos' in currentVideoInfo:
|
||||
return currentVideoInfo['playInfos'][0]['playUrls'][0]
|
||||
elif 'ksPlayJson' in currentVideoInfo:
|
||||
ksPlayJson = json.loads( currentVideoInfo['ksPlayJson'] )
|
||||
representation = ksPlayJson.get('adaptationSet')[0].get('representation')
|
||||
reps = []
|
||||
for one in representation:
|
||||
reps.append( (one['width']* one['height'], one['url'], one['backupUrl']) )
|
||||
return max(reps)[1]
|
||||
|
||||
|
||||
if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url):
|
||||
html = get_content(url, headers=fake_headers)
|
||||
json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});")
|
||||
json_data = json.loads(json_text)
|
||||
vid = json_data.get('currentVideoInfo').get('id')
|
||||
up = json_data.get('user').get('name')
|
||||
title = json_data.get('title')
|
||||
video_list = json_data.get('videoList')
|
||||
if len(video_list) > 1:
|
||||
title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0]
|
||||
currentVideoInfo = json_data.get('currentVideoInfo')
|
||||
m3u8_url = getM3u8UrlFromCurrentVideoInfo(currentVideoInfo)
|
||||
elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/aa(\d+)", url):
|
||||
html = get_content(url, headers=fake_headers)
|
||||
tag_script = match1(html, r'<script>\s*window\.pageInfo([^<]+)</script>')
|
||||
json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1]
|
||||
json_data = json.loads(json_text)
|
||||
title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title']
|
||||
vid = str(json_data['videoId'])
|
||||
up = "acfun"
|
||||
|
||||
currentVideoInfo = json_data.get('currentVideoInfo')
|
||||
m3u8_url = getM3u8UrlFromCurrentVideoInfo(currentVideoInfo)
|
||||
|
||||
else:
|
||||
raise NotImplementedError()
|
||||
|
||||
assert title and m3u8_url
|
||||
title = unescape_html(title)
|
||||
title = escape_file_path(title)
|
||||
assert title
|
||||
p_title = r1('active">([^<]+)', html)
|
||||
title = '%s (%s)' % (title, up)
|
||||
if p_title:
|
||||
title = '%s - %s' % (title, p_title)
|
||||
|
||||
vid = r1('data-vid="(\d+)"', html)
|
||||
up = r1('data-name="([^"]+)"', html)
|
||||
title = title + ' - ' + up
|
||||
acfun_download_by_vid(vid, title,
|
||||
output_dir=output_dir,
|
||||
merge=merge,
|
||||
info_only=info_only,
|
||||
**kwargs)
|
||||
print_info(site_info, title, 'm3u8', float('inf'))
|
||||
if not info_only:
|
||||
download_url_ffmpeg(m3u8_url, title, 'mp4', output_dir=output_dir, merge=merge)
|
||||
|
||||
site_info = "AcFun.tv"
|
||||
download = acfun_download
|
||||
site = AcFun()
|
||||
site_info = "AcFun.cn"
|
||||
download = site.download_by_url
|
||||
download_playlist = playlist_not_supported('acfun')
|
||||
|
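Reviewer note: the new AcFun `prepare()` above estimates each stream's size as `durationMillis * stream["avgBitrate"] / 8`. The sketch below spells out the unit arithmetic. It assumes `avgBitrate` is in kilobits per second, which the diff does not state; `estimate_size_bytes` is a hypothetical helper, not part of the commit.

```python
def estimate_size_bytes(duration_ms, avg_bitrate_kbps):
    """Estimate a stream's size in bytes from its duration and average bitrate.

    Assumption: the bitrate is in kilobits per second, so
    milliseconds * kilobits/second = bits, and bits / 8 = bytes.
    """
    return duration_ms * avg_bitrate_kbps / 8


# A hypothetical 60-second stream at 800 kbit/s.
size = estimate_size_bytes(60000, 800)
```

If the bitrate were reported in another unit, the estimate would be off by the corresponding factor, so it is best read as an approximation for display rather than an exact byte count.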
@@ -38,7 +38,7 @@ def baidu_get_song_title(data):
|
||||
|
||||
def baidu_get_song_lyric(data):
|
||||
lrc = data['lrcLink']
|
||||
return None if lrc is '' else "http://music.baidu.com%s" % lrc
|
||||
return "http://music.baidu.com%s" % lrc if lrc else None
|
||||
|
||||
|
||||
def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
|
||||
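Reviewer note: the hunk above replaces the identity test `lrc is ''` with a truthiness test. The sketch below shows the resulting behavior in isolation. `lyric_url` is a hypothetical stand-in for `baidu_get_song_lyric`, and the use of `dict.get` (so a missing key also yields `None`) is an assumption of this sketch, not something the commit does.

```python
def lyric_url(data):
    """Return the absolute lyric URL, or None when the link is missing or empty."""
    # Assumption: a missing key is treated the same as an empty link.
    lrc = data.get('lrcLink')
    # Truthiness covers both '' and None; `is ''` compares object identity,
    # which is not a reliable way to test for an empty string.
    return "http://music.baidu.com%s" % lrc if lrc else None


with_link = lyric_url({'lrcLink': '/data/lyric/123.lrc'})
empty_link = lyric_url({'lrcLink': ''})
no_link = lyric_url({})
```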
@@ -104,42 +104,54 @@ def baidu_download_album(aid, output_dir='.', merge=True, info_only=False):
|
||||
|
||||
def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only=False, **kwargs):
|
||||
|
||||
if re.match(r'http://pan.baidu.com', url):
|
||||
if re.match(r'https?://pan.baidu.com', url):
|
||||
real_url, title, ext, size = baidu_pan_download(url)
|
||||
print_info('BaiduPan', title, ext, size)
|
||||
if not info_only:
|
||||
print('Hold on...')
|
||||
time.sleep(5)
|
||||
download_urls([real_url], title, ext, size,
|
||||
output_dir, url, merge=merge, faker=True)
|
||||
elif re.match(r'http://music.baidu.com/album/\d+', url):
|
||||
id = r1(r'http://music.baidu.com/album/(\d+)', url)
|
||||
elif re.match(r'https?://music.baidu.com/album/\d+', url):
|
||||
id = r1(r'https?://music.baidu.com/album/(\d+)', url)
|
||||
baidu_download_album(id, output_dir, merge, info_only)
|
||||
|
||||
elif re.match('http://music.baidu.com/song/\d+', url):
|
||||
id = r1(r'http://music.baidu.com/song/(\d+)', url)
|
||||
elif re.match('https?://music.baidu.com/song/\d+', url):
|
||||
id = r1(r'https?://music.baidu.com/song/(\d+)', url)
|
||||
baidu_download_song(id, output_dir, merge, info_only)
|
||||
|
||||
elif re.match('http://tieba.baidu.com/', url):
|
||||
elif re.match('https?://tieba.baidu.com/', url):
|
||||
try:
|
||||
# embedded videos
|
||||
embed_download(url, output_dir, merge=merge, info_only=info_only)
|
||||
embed_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
except:
|
||||
# images
|
||||
html = get_html(url)
|
||||
title = r1(r'title:"([^"]+)"', html)
|
||||
|
||||
vhsrc = re.findall(r'"BDE_Image"[^>]+src="([^"]+\.mp4)"', html) or \
|
||||
re.findall(r'vhsrc="([^"]+)"', html)
|
||||
if len(vhsrc) > 0:
|
||||
ext = 'mp4'
|
||||
size = url_size(vhsrc[0])
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(vhsrc, title, ext, size,
|
||||
output_dir=output_dir, merge=False)
|
||||
|
||||
items = re.findall(
|
||||
r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
|
||||
urls = ['http://imgsrc.baidu.com/forum/pic/item/' + i
|
||||
r'//tiebapic.baidu.com/forum/w[^"]+/([^/"]+)', html)
|
||||
urls = ['http://tiebapic.baidu.com/forum/pic/item/' + i
|
||||
for i in set(items)]
|
||||
|
||||
# handle albums
|
||||
kw = r1(r'kw=([^&]+)', html) or r1(r"kw:'([^']+)'", html)
|
||||
tid = r1(r'tid=(\d+)', html) or r1(r"tid:'([^']+)'", html)
|
||||
album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s' % (
|
||||
kw, tid)
|
||||
album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s&pe=%s' % (kw, tid, 1000)
|
||||
album_info = json.loads(get_content(album_url))
|
||||
for i in album_info['data']['pic_list']:
|
||||
urls.append(
|
||||
'http://imgsrc.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
|
||||
'http://tiebapic.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
|
||||
|
||||
ext = 'jpg'
|
||||
size = float('Inf')
|
||||
@@ -210,9 +222,6 @@ def baidu_pan_download(url):
|
||||
title_wrapped = json.loads('{"wrapper":"%s"}' % title)
|
||||
title = title_wrapped['wrapper']
|
||||
logging.debug(real_url)
|
||||
print_info(site_info, title, ext, size)
|
||||
print('Hold on...')
|
||||
time.sleep(5)
|
||||
return real_url, title, ext, size
|
||||
|
||||
|
||||
|
@@ -6,6 +6,16 @@ from ..common import *
|
||||
|
||||
import urllib
|
||||
|
||||
def baomihua_headers(referer=None, cookie=None):
|
||||
# a reasonable UA
|
||||
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
|
||||
headers = {'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'User-Agent': ua}
|
||||
if referer is not None:
|
||||
headers.update({'Referer': referer})
|
||||
if cookie is not None:
|
||||
headers.update({'Cookie': cookie})
|
||||
return headers
|
||||
|
||||
def baomihua_download_by_id(id, title=None, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html('http://play.baomihua.com/getvideourl.aspx?flvid=%s&devicetype=phone_app' % id)
|
||||
host = r1(r'host=([^&]*)', html)
|
||||
@@ -14,11 +24,12 @@ def baomihua_download_by_id(id, title=None, output_dir='.', merge=True, info_onl
|
||||
assert type
|
||||
vid = r1(r'&stream_name=([^&]*)', html)
|
||||
assert vid
|
||||
url = "http://%s/pomoho_video/%s.%s" % (host, vid, type)
|
||||
_, ext, size = url_info(url)
|
||||
dir_str = r1(r'&dir=([^&]*)', html).strip()
|
||||
url = "http://%s/%s/%s.%s" % (host, dir_str, vid, type)
|
||||
_, ext, size = url_info(url, headers=baomihua_headers())
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge = merge)
|
||||
download_urls([url], title, ext, size, output_dir, merge = merge, headers=baomihua_headers())
|
||||
|
||||
def baomihua_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url)
|
||||
|
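Reviewer note: `baomihua_headers()` above builds a header dictionary and adds `Referer` and `Cookie` only when they are supplied. The sketch below shows the same pattern attached to a standard-library request object, without sending anything over the network. `build_headers`, the `ExampleUA/1.0` string, and the `example.com` URLs are placeholders for illustration, not values from the commit.

```python
import urllib.request


def build_headers(user_agent, referer=None, cookie=None):
    """Build request headers, adding Referer and Cookie only when supplied."""
    headers = {
        'Accept': '*/*',
        'Accept-Language': 'en-US,en;q=0.5',
        'User-Agent': user_agent,
    }
    if referer is not None:
        headers['Referer'] = referer
    if cookie is not None:
        headers['Cookie'] = cookie
    return headers


# Placeholder values; constructing the Request does not open a connection.
headers = build_headers('ExampleUA/1.0', referer='http://example.com/')
req = urllib.request.Request('http://example.com/video.mp4', headers=headers)
```

Passing the dictionary per request, as the diff does for `url_info` and `download_urls`, keeps site-specific headers out of any shared global state.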
@@ -1,196 +1,770 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['bilibili_download']
|
||||
|
||||
from ..common import *
|
||||
|
||||
from .sina import sina_download_by_vid
|
||||
from .tudou import tudou_download_by_id
|
||||
from .youku import youku_download_by_vid
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
import hashlib
|
||||
import re
|
||||
|
||||
appkey = 'f3bb208b3d081dc8'
|
||||
SECRETKEY_MINILOADER = '1c15888dc316e05a15fdd0a02ed6584f'
|
||||
class Bilibili(VideoExtractor):
|
||||
name = "Bilibili"
|
||||
|
||||
def get_srt_xml(id):
|
||||
url = 'http://comment.bilibili.com/%s.xml' % id
|
||||
return get_html(url)
|
||||
# Bilibili media encoding options, in descending quality order.
|
||||
stream_types = [
|
||||
{'id': 'hdflv2_4k', 'quality': 120, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '2160p', 'desc': '超清 4K'},
|
||||
{'id': 'flv_p60', 'quality': 116, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P60'},
|
||||
{'id': 'hdflv2', 'quality': 112, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P+'},
|
||||
{'id': 'flv', 'quality': 80, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P'},
|
||||
{'id': 'flv720_p60', 'quality': 74, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P60'},
|
||||
{'id': 'flv720', 'quality': 64, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P'},
|
||||
{'id': 'hdmp4', 'quality': 48, 'audio_quality': 30280,
|
||||
'container': 'MP4', 'video_resolution': '720p', 'desc': '高清 720P (MP4)'},
|
||||
{'id': 'flv480', 'quality': 32, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '480p', 'desc': '清晰 480P'},
|
||||
{'id': 'flv360', 'quality': 16, 'audio_quality': 30216,
|
||||
'container': 'FLV', 'video_resolution': '360p', 'desc': '流畅 360P'},
|
||||
# 'quality': 15?
|
||||
{'id': 'mp4', 'quality': 0},
|
||||
|
||||
{'id': 'jpg', 'quality': 0},
|
||||
]
|
||||
|
||||
def parse_srt_p(p):
|
||||
fields = p.split(',')
|
||||
assert len(fields) == 8, fields
|
||||
time, mode, font_size, font_color, pub_time, pool, user_id, history = fields
|
||||
time = float(time)
|
||||
@staticmethod
|
||||
def height_to_quality(height, qn):
|
||||
if height <= 360 and qn <= 16:
|
||||
return 16
|
||||
elif height <= 480 and qn <= 32:
|
||||
return 32
|
||||
elif height <= 720 and qn <= 64:
|
||||
return 64
|
||||
elif height <= 1080 and qn <= 80:
|
||||
return 80
|
||||
elif height <= 1080 and qn <= 112:
|
||||
return 112
|
||||
else:
|
||||
return 120
|
||||
|
||||
mode = int(mode)
|
||||
assert 1 <= mode <= 8
|
||||
# mode 1~3: scrolling
|
||||
# mode 4: bottom
|
||||
# mode 5: top
|
||||
# mode 6: reverse?
|
||||
# mode 7: position
|
||||
# mode 8: advanced
|
||||
@staticmethod
|
||||
def bilibili_headers(referer=None, cookie=None):
|
||||
# a reasonable UA
|
||||
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
|
||||
headers = {'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'User-Agent': ua}
|
||||
if referer is not None:
|
||||
headers.update({'Referer': referer})
|
||||
if cookie is not None:
|
||||
headers.update({'Cookie': cookie})
|
||||
return headers
|
||||
|
||||
pool = int(pool)
|
||||
assert 0 <= pool <= 2
|
||||
# pool 0: normal
|
||||
# pool 1: srt
|
||||
# pool 2: special?
|
||||
@staticmethod
|
||||
def bilibili_api(avid, cid, qn=0):
|
||||
return 'https://api.bilibili.com/x/player/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&fnver=0&fnval=16&fourk=1' % (avid, cid, qn)
|
||||
|
||||
font_size = int(font_size)
|
||||
@staticmethod
|
||||
def bilibili_audio_api(sid):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/url?sid=%s' % sid
|
||||
|
||||
font_color = '#%06x' % int(font_color)
|
||||
@staticmethod
|
||||
def bilibili_audio_info_api(sid):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/song/info?sid=%s' % sid
|
||||
|
||||
return pool, mode, font_size, font_color
|
||||
@staticmethod
|
||||
def bilibili_audio_menu_info_api(sid):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/menu/info?sid=%s' % sid
|
||||
|
||||
@staticmethod
|
||||
def bilibili_audio_menu_song_api(sid, ps=100):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/song/of-menu?sid=%s&pn=1&ps=%s' % (sid, ps)
|
||||
|
||||
def parse_srt_xml(xml):
|
||||
d = re.findall(r'<d p="([^"]+)">(.*)</d>', xml)
|
||||
for x, y in d:
|
||||
p = parse_srt_p(x)
|
||||
raise NotImplementedError()
|
||||
@staticmethod
|
||||
def bilibili_bangumi_api(avid, cid, ep_id, qn=0, fnval=16):
|
||||
return 'https://api.bilibili.com/pgc/player/web/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&ep_id=%s&fnver=0&fnval=%s' % (avid, cid, qn, ep_id, fnval)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_interface_api(cid, qn=0):
|
||||
entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg'
|
||||
appkey, sec = ''.join([chr(ord(i) + 2) for i in entropy[::-1]]).split(':')
|
||||
params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, cid, qn, qn)
|
||||
chksum = hashlib.md5(bytes(params + sec, 'utf8')).hexdigest()
|
||||
return 'https://interface.bilibili.com/v2/playurl?%s&sign=%s' % (params, chksum)
|
||||
|
||||
def parse_cid_playurl(xml):
|
||||
from xml.dom.minidom import parseString
|
||||
@staticmethod
|
||||
def bilibili_live_api(cid):
|
||||
return 'https://api.live.bilibili.com/room/v1/Room/playUrl?cid=%s&quality=0&platform=web' % cid
|
||||
|
||||
@staticmethod
|
||||
def bilibili_live_room_info_api(room_id):
|
||||
return 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id=%s' % room_id
|
||||
|
||||
@staticmethod
|
||||
def bilibili_live_room_init_api(room_id):
|
||||
return 'https://api.live.bilibili.com/room/v1/Room/room_init?id=%s' % room_id
|
||||
|
||||
@staticmethod
|
||||
def bilibili_space_channel_api(mid, cid, pn=1, ps=100):
|
||||
return 'https://api.bilibili.com/x/space/channel/video?mid=%s&cid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (mid, cid, pn, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_space_favlist_api(fid, pn=1, ps=20):
|
||||
return 'https://api.bilibili.com/x/v3/fav/resource/list?media_id=%s&pn=%s&ps=%s&order=mtime&type=0&tid=0&jsonp=jsonp' % (fid, pn, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_space_video_api(mid, pn=1, ps=100):
|
||||
return "https://api.bilibili.com/x/space/arc/search?mid=%s&pn=%s&ps=%s&tid=0&keyword=&order=pubdate&jsonp=jsonp" % (mid, pn, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_vc_api(video_id):
|
||||
return 'https://api.vc.bilibili.com/clip/v1/video/detail?video_id=%s' % video_id
|
||||
|
||||
@staticmethod
|
||||
def bilibili_h_api(doc_id):
|
||||
return 'https://api.vc.bilibili.com/link_draw/v1/doc/detail?doc_id=%s' % doc_id
|
||||
|
||||
@staticmethod
|
||||
def url_size(url, faker=False, headers={},err_value=0):
|
||||
try:
|
||||
doc = parseString(xml.encode('utf-8'))
|
||||
urls = [durl.getElementsByTagName('url')[0].firstChild.nodeValue for durl in doc.getElementsByTagName('durl')]
|
||||
return urls
|
||||
return url_size(url,faker,headers)
|
||||
except:
|
||||
return []
|
||||
return err_value
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
self.stream_qualities = {s['quality']: s for s in self.stream_types}
|
||||
|
||||
def bilibili_download_by_cids(cids, title, output_dir='.', merge=True, info_only=False):
|
||||
urls = []
|
||||
for cid in cids:
|
||||
sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
|
||||
url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
|
||||
urls += [i
|
||||
if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
|
||||
else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
|
||||
for i in parse_cid_playurl(get_content(url))]
|
||||
try:
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers(referer=self.url))
|
||||
except:
|
||||
html_content = '' # live always returns 400 (why?)
|
||||
#self.title = match1(html_content,
|
||||
# r'<h1 title="([^"]+)"')
|
||||
|
||||
type_ = ''
|
||||
size = 0
|
||||
for url in urls:
|
||||
_, type_, temp = url_info(url)
|
||||
size += temp
|
||||
# redirect: watchlater
|
||||
if re.match(r'https?://(www\.)?bilibili\.com/watchlater/#/(av(\d+)|BV(\S+)/?)', self.url):
|
||||
avid = match1(self.url, r'/(av\d+)') or match1(self.url, r'/(BV\w+)')
|
||||
p = int(match1(self.url, r'/p(\d+)') or '1')
|
||||
self.url = 'https://www.bilibili.com/video/%s?p=%s' % (avid, p)
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers())
|
||||
|
||||
print_info(site_info, title, type_, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def bilibili_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
|
||||
sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
|
||||
url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
|
||||
urls = [i
|
||||
if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
|
||||
else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
|
||||
for i in parse_cid_playurl(get_content(url))]
|
||||
|
||||
type_ = ''
|
||||
size = 0
|
||||
for url in urls:
|
||||
_, type_, temp = url_info(url)
|
||||
size += temp or 0
|
||||
|
||||
print_info(site_info, title, type_, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def bilibili_live_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
|
||||
api_url = 'http://live.bilibili.com/api/playurl?cid=' + cid
|
||||
urls = parse_cid_playurl(get_content(api_url))
|
||||
|
||||
for url in urls:
|
||||
_, type_, _ = url_info(url)
|
||||
size = 0
|
||||
print_info(site_info, title, type_, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, type_, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_content(url)
|
||||
|
||||
if re.match(r'https?://bangumi\.bilibili\.com/', url):
|
||||
# quick hack for bangumi URLs
|
||||
url = r1(r'"([^"]+)" class="v-av-link"', html)
|
||||
html = get_content(url)
|
||||
|
||||
title = r1_of([r'<meta name="title" content="\s*([^<>]{1,999})\s*" />',
|
||||
r'<h1[^>]*>\s*([^<>]+)\s*</h1>'], html)
|
||||
if title:
|
||||
title = unescape_html(title)
|
||||
title = escape_file_path(title)
|
||||
|
||||
flashvars = r1_of([r'(cid=\d+)', r'(cid: \d+)', r'flashvars="([^"]+)"',
|
||||
r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
|
||||
assert flashvars
|
||||
flashvars = flashvars.replace(': ', '=')
|
||||
t, cid = flashvars.split('=', 1)
|
||||
cid = cid.split('&')[0]
|
||||
if t == 'cid':
|
||||
if re.match(r'https?://live\.bilibili\.com/', url):
|
||||
title = r1(r'<title>\s*([^<>]+)\s*</title>', html)
|
||||
bilibili_live_download_by_cid(cid, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
# redirect: bangumi/play/ss -> bangumi/play/ep
|
||||
# redirect: bangumi.bilibili.com/anime -> bangumi/play/ep
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ss(\d+)', self.url) or \
|
||||
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)/play', self.url):
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
ep_id = initial_state['epList'][0]['id']
|
||||
self.url = 'https://www.bilibili.com/bangumi/play/ep%s' % ep_id
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
# sort it out
|
||||
if re.match(r'https?://(www\.)?bilibili\.com/audio/au(\d+)', self.url):
|
||||
sort = 'audio'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
|
||||
sort = 'bangumi'
|
||||
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
|
||||
sort = 'bangumi'
|
||||
elif re.match(r'https?://live\.bilibili\.com/', self.url):
|
||||
sort = 'live'
|
||||
elif re.match(r'https?://vc\.bilibili\.com/video/(\d+)', self.url):
|
||||
sort = 'vc'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/video/(av(\d+)|(BV(\S+)))', self.url):
|
||||
sort = 'video'
|
||||
elif re.match(r'https?://h\.?bilibili\.com/(\d+)', self.url):
|
||||
sort = 'h'
|
||||
else:
|
||||
# multi-P
|
||||
cids = []
|
||||
pages = re.findall('<option value=\'([^\']*)\'', html)
|
||||
titles = re.findall('<option value=.*>\s*([^<>]+)\s*</option>', html)
|
||||
for i, page in enumerate(pages):
|
||||
html = get_html("http://www.bilibili.com%s" % page)
|
||||
flashvars = r1_of([r'(cid=\d+)',
|
||||
r'flashvars="([^"]+)"',
|
||||
r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
|
||||
if flashvars:
|
||||
t, cid = flashvars.split('=', 1)
|
||||
cids.append(cid.split('&')[0])
|
||||
if url.endswith(page):
|
||||
cids = [cid.split('&')[0]]
|
||||
titles = [titles[i]]
|
||||
break
|
||||
|
||||
# no multi-P
|
||||
if not pages:
|
||||
cids = [cid]
|
||||
titles = [r1(r'<option value=.* selected>\s*([^<>]+)\s*</option>', html) or title]
|
||||
|
||||
for i in range(len(cids)):
|
||||
bilibili_download_by_cid(cids[i],
|
||||
titles[i],
|
||||
output_dir=output_dir,
|
||||
merge=merge,
|
||||
info_only=info_only)
|
||||
|
||||
elif t == 'vid':
|
||||
sina_download_by_vid(cid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif t == 'ykid':
|
||||
youku_download_by_vid(cid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif t == 'uid':
|
||||
tudou_download_by_id(cid, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
else:
|
||||
raise NotImplementedError(flashvars)
|
||||
|
||||
if not info_only and not dry_run:
|
||||
if not kwargs['caption']:
|
||||
print('Skipping danmaku.')
|
||||
self.download_playlist_by_url(self.url, **kwargs)
|
||||
return
|
||||
title = get_filename(title)
|
||||
print('Downloading %s ...\n' % (title + '.cmt.xml'))
|
||||
xml = get_srt_xml(cid)
|
||||
with open(os.path.join(output_dir, title + '.cmt.xml'), 'w', encoding='utf-8') as x:
|
||||
x.write(xml)
|
||||
|
||||
# regular av video
|
||||
if sort == 'video':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
|
||||
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo = json.loads(playinfo_text) if playinfo_text else None
|
||||
|
||||
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
|
||||
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
|
||||
|
||||
# warn if it is a multi-part video
|
||||
pn = initial_state['videoData']['videos']
|
||||
if pn > 1 and not kwargs.get('playlist'):
|
||||
log.w('This is a multipart video. (use --playlist to download all parts.)')
|
||||
|
||||
# set video title
|
||||
self.title = initial_state['videoData']['title']
|
||||
# refine title for a specific part, if it is a multi-part video
|
||||
p = int(match1(self.url, r'[\?&]p=(\d+)') or match1(self.url, r'/index_(\d+)') or
|
||||
'1') # use URL to decide p-number, not initial_state['p']
|
||||
if pn > 1:
|
||||
part = initial_state['videoData']['pages'][p - 1]['part']
|
||||
self.title = '%s (P%s. %s)' % (self.title, p, part)
|
||||
|
||||
# construct playinfos
|
||||
avid = initial_state['aid']
|
||||
cid = initial_state['videoData']['pages'][p - 1]['cid'] # use p-number, not initial_state['videoData']['cid']
|
||||
current_quality, best_quality = None, None
|
||||
if playinfo is not None:
|
||||
current_quality = playinfo['data']['quality'] or None # 0 indicates an error, fallback to None
|
||||
if 'accept_quality' in playinfo['data'] and playinfo['data']['accept_quality'] != []:
|
||||
best_quality = playinfo['data']['accept_quality'][0]
|
||||
playinfos = []
|
||||
if playinfo is not None:
|
||||
playinfos.append(playinfo)
|
||||
if playinfo_ is not None:
|
||||
playinfos.append(playinfo_)
|
||||
# get alternative formats from API
|
||||
for qn in [120, 112, 80, 64, 32, 16]:
|
||||
# automatic format for durl: qn=0
|
||||
# for dash, qn does not matter
|
||||
if current_quality is None or qn < current_quality:
|
||||
api_url = self.bilibili_api(avid, cid, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
else:
|
||||
message = api_playinfo['data']['message']
|
||||
if best_quality is None or qn <= best_quality:
|
||||
api_url = self.bilibili_interface_api(cid, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
api_playinfo_data = json.loads(api_content)
|
||||
if api_playinfo_data.get('quality'):
|
||||
playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
|
||||
if not playinfos:
|
||||
log.w(message)
|
||||
# use bilibili error video instead
|
||||
url = 'https://static.hdslb.com/error.mp4'
|
||||
_, container, size = url_info(url)
|
||||
self.streams['flv480'] = {'container': container, 'size': size, 'src': [url]}
|
||||
return
|
||||
|
||||
for playinfo in playinfos:
|
||||
quality = playinfo['data']['quality']
|
||||
format_id = self.stream_qualities[quality]['id']
|
||||
container = self.stream_qualities[quality]['container'].lower()
|
||||
desc = self.stream_qualities[quality]['desc']
|
||||
|
||||
if 'durl' in playinfo['data']:
|
||||
src, size = [], 0
|
||||
for durl in playinfo['data']['durl']:
|
||||
src.append(durl['url'])
|
||||
size += durl['size']
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
|
||||
|
||||
# DASH formats
|
||||
if 'dash' in playinfo['data']:
|
||||
audio_size_cache = {}
|
||||
for video in playinfo['data']['dash']['video']:
|
||||
# prefer the latter codecs!
|
||||
s = self.stream_qualities[video['id']]
|
||||
format_id = 'dash-' + s['id'] # prefix
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
audio_quality = s['audio_quality']
|
||||
baseurl = video['baseUrl']
|
||||
size = self.url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
# find matching audio track
|
||||
if playinfo['data']['dash']['audio']:
|
||||
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
|
||||
for audio in playinfo['data']['dash']['audio']:
|
||||
if int(audio['id']) == audio_quality:
|
||||
audio_baseurl = audio['baseUrl']
|
||||
break
|
||||
if not audio_size_cache.get(audio_quality, False):
|
||||
audio_size_cache[audio_quality] = self.url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
size += audio_size_cache[audio_quality]
|
||||
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl], [audio_baseurl]], 'size': size}
|
||||
else:
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl]], 'size': size}
|
||||
|
||||
# get danmaku
|
||||
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
|
||||
|
||||
# bangumi
|
||||
elif sort == 'bangumi':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
|
||||
# warn if this bangumi has more than 1 video
|
||||
epn = len(initial_state['epList'])
|
||||
if epn > 1 and not kwargs.get('playlist'):
|
||||
log.w('This bangumi currently has %s videos. (use --playlist to download all videos.)' % epn)
|
||||
|
||||
# set video title
|
||||
self.title = initial_state['h1Title']
|
||||
|
||||
# construct playinfos
|
||||
ep_id = initial_state['epInfo']['id']
|
||||
avid = initial_state['epInfo']['aid']
|
||||
cid = initial_state['epInfo']['cid']
|
||||
playinfos = []
|
||||
api_url = self.bilibili_bangumi_api(avid, cid, ep_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
else:
|
||||
log.e(api_playinfo['message'])
|
||||
return
|
||||
current_quality = api_playinfo['result']['quality']
|
||||
# get alternative formats from API
|
||||
for fnval in [8, 16]:
|
||||
for qn in [120, 112, 80, 64, 32, 16]:
|
||||
# automatic format for durl: qn=0
|
||||
# for dash, qn does not matter
|
||||
if qn != current_quality:
|
||||
api_url = self.bilibili_bangumi_api(avid, cid, ep_id, qn=qn, fnval=fnval)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
|
||||
for playinfo in playinfos:
|
||||
if 'durl' in playinfo['result']:
|
||||
quality = playinfo['result']['quality']
|
||||
format_id = self.stream_qualities[quality]['id']
|
||||
container = self.stream_qualities[quality]['container'].lower()
|
||||
desc = self.stream_qualities[quality]['desc']
|
||||
|
||||
src, size = [], 0
|
||||
for durl in playinfo['result']['durl']:
|
||||
src.append(durl['url'])
|
||||
size += durl['size']
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
|
||||
|
||||
# DASH formats
|
||||
if 'dash' in playinfo['result']:
|
||||
for video in playinfo['result']['dash']['video']:
|
||||
# playinfo['result']['quality'] does not reflect the correct quality of DASH stream
|
||||
quality = self.height_to_quality(video['height'], video['id']) # convert height to quality code
|
||||
s = self.stream_qualities[quality]
|
||||
format_id = 'dash-' + s['id'] # prefix
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
audio_quality = s['audio_quality']
|
||||
baseurl = video['baseUrl']
|
||||
size = url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
# find matching audio track
|
||||
audio_baseurl = playinfo['result']['dash']['audio'][0]['baseUrl']
|
||||
for audio in playinfo['result']['dash']['audio']:
|
||||
if int(audio['id']) == audio_quality:
|
||||
audio_baseurl = audio['baseUrl']
|
||||
break
|
||||
size += url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl], [audio_baseurl]], 'size': size}
|
||||
|
||||
# get danmaku
|
||||
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
|
||||
|
||||
# vc video
|
||||
elif sort == 'vc':
|
||||
video_id = match1(self.url, r'https?://vc\.?bilibili\.com/video/(\d+)')
|
||||
api_url = self.bilibili_vc_api(video_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo = json.loads(api_content)
|
||||
|
||||
# set video title
|
||||
self.title = '%s (%s)' % (api_playinfo['data']['user']['name'], api_playinfo['data']['item']['id'])
|
||||
|
||||
height = api_playinfo['data']['item']['height']
|
||||
quality = self.height_to_quality(height) # convert height to quality code
|
||||
s = self.stream_qualities[quality]
|
||||
format_id = s['id']
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
|
||||
playurl = api_playinfo['data']['item']['video_playurl']
|
||||
size = int(api_playinfo['data']['item']['video_size'])
|
||||
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': [playurl]}
|
||||
|
||||
# live
|
||||
elif sort == 'live':
|
||||
m = re.match(r'https?://live\.bilibili\.com/(\w+)', self.url)
|
||||
short_id = m.group(1)
|
||||
api_url = self.bilibili_live_room_init_api(short_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
room_init_info = json.loads(api_content)
|
||||
|
||||
room_id = room_init_info['data']['room_id']
|
||||
api_url = self.bilibili_live_room_info_api(room_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
room_info = json.loads(api_content)
|
||||
|
||||
# set video title
|
||||
self.title = room_info['data']['title'] + '.' + str(int(time.time()))
|
||||
|
||||
api_url = self.bilibili_live_api(room_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
video_info = json.loads(api_content)
|
||||
|
||||
durls = video_info['data']['durl']
|
||||
playurl = durls[0]['url']
|
||||
container = 'flv' # enforce FLV container
|
||||
self.streams['flv'] = {'container': container, 'quality': 'unknown',
|
||||
'size': 0, 'src': [playurl]}
|
||||
|
||||
# audio
|
||||
elif sort == 'audio':
|
||||
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/au(\d+)', self.url)
|
||||
sid = m.group(1)
|
||||
api_url = self.bilibili_audio_info_api(sid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
song_info = json.loads(api_content)
|
||||
|
||||
# set audio title
|
||||
self.title = song_info['data']['title']
|
||||
|
||||
# get lyrics
|
||||
self.lyrics = get_content(song_info['data']['lyric'])
|
||||
|
||||
api_url = self.bilibili_audio_api(sid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
audio_info = json.loads(api_content)
|
||||
|
||||
playurl = audio_info['data']['cdns'][0]
|
||||
size = audio_info['data']['size']
|
||||
container = 'mp4' # enforce MP4 container
|
||||
self.streams['mp4'] = {'container': container,
|
||||
'size': size, 'src': [playurl]}
|
||||
|
||||
# h images
|
||||
elif sort == 'h':
|
||||
m = re.match(r'https?://h\.?bilibili\.com/(\d+)', self.url)
|
||||
doc_id = m.group(1)
|
||||
api_url = self.bilibili_h_api(doc_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
h_info = json.loads(api_content)
|
||||
|
||||
urls = []
|
||||
for pic in h_info['data']['item']['pictures']:
|
||||
img_src = pic['img_src']
|
||||
urls.append(img_src)
|
||||
size = urls_size(urls)
|
||||
|
||||
self.title = doc_id
|
||||
container = 'jpg' # enforce JPG container
|
||||
self.streams[container] = {'container': container,
|
||||
'size': size, 'src': urls}
|
||||
|
||||
def prepare_by_cid(self,avid,cid,title,html_content,playinfo,playinfo_,url):
|
||||
# handling for interactive videos
|
||||
# mainly for interactive videos: parts are told apart by cid rather than by URL
|
||||
|
||||
self.stream_qualities = {s['quality']: s for s in self.stream_types}
|
||||
self.title = title
|
||||
self.url = url
|
||||
|
||||
current_quality, best_quality = None, None
|
||||
if playinfo is not None:
|
||||
current_quality = playinfo['data']['quality'] or None # 0 indicates an error, fallback to None
|
||||
if 'accept_quality' in playinfo['data'] and playinfo['data']['accept_quality'] != []:
|
||||
best_quality = playinfo['data']['accept_quality'][0]
|
||||
playinfos = []
|
||||
if playinfo is not None:
|
||||
playinfos.append(playinfo)
|
||||
if playinfo_ is not None:
|
||||
playinfos.append(playinfo_)
|
||||
# get alternative formats from API
|
||||
for qn in [80, 64, 32, 16]:
|
||||
# automatic format for durl: qn=0
|
||||
# for dash, qn does not matter
|
||||
if current_quality is None or qn < current_quality:
|
||||
api_url = self.bilibili_api(avid, cid, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
else:
|
||||
message = api_playinfo['data']['message']
|
||||
if best_quality is None or qn <= best_quality:
|
||||
api_url = self.bilibili_interface_api(cid, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo_data = json.loads(api_content)
|
||||
if api_playinfo_data.get('quality'):
|
||||
playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
|
||||
if not playinfos:
|
||||
log.w(message)
|
||||
# use bilibili error video instead
|
||||
url = 'https://static.hdslb.com/error.mp4'
|
||||
_, container, size = url_info(url)
|
||||
self.streams['flv480'] = {'container': container, 'size': size, 'src': [url]}
|
||||
return
|
||||
|
||||
for playinfo in playinfos:
|
||||
quality = playinfo['data']['quality']
|
||||
format_id = self.stream_qualities[quality]['id']
|
||||
container = self.stream_qualities[quality]['container'].lower()
|
||||
desc = self.stream_qualities[quality]['desc']
|
||||
|
||||
if 'durl' in playinfo['data']:
|
||||
src, size = [], 0
|
||||
for durl in playinfo['data']['durl']:
|
||||
src.append(durl['url'])
|
||||
size += durl['size']
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
|
||||
|
||||
# DASH formats
|
||||
if 'dash' in playinfo['data']:
|
||||
audio_size_cache = {}
|
||||
for video in playinfo['data']['dash']['video']:
|
||||
# prefer the latter codecs!
|
||||
s = self.stream_qualities[video['id']]
|
||||
format_id = 'dash-' + s['id'] # prefix
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
audio_quality = s['audio_quality']
|
||||
baseurl = video['baseUrl']
|
||||
size = self.url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
# find matching audio track
|
||||
if playinfo['data']['dash']['audio']:
|
||||
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
|
||||
for audio in playinfo['data']['dash']['audio']:
|
||||
if int(audio['id']) == audio_quality:
|
||||
audio_baseurl = audio['baseUrl']
|
||||
break
|
||||
if not audio_size_cache.get(audio_quality, False):
|
||||
audio_size_cache[audio_quality] = self.url_size(audio_baseurl,
|
||||
headers=self.bilibili_headers(referer=self.url))
|
||||
size += audio_size_cache[audio_quality]
|
||||
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl], [audio_baseurl]], 'size': size}
|
||||
else:
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl]], 'size': size}
|
||||
|
||||
# get danmaku
|
||||
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
|
||||
|
||||
def extract(self, **kwargs):
|
||||
# set UA and referer for downloading
|
||||
headers = self.bilibili_headers(referer=self.url)
|
||||
self.ua, self.referer = headers['User-Agent'], headers['Referer']
|
||||
|
||||
if not self.streams_sorted:
|
||||
# no stream is available
|
||||
return
|
||||
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
# extract the stream
|
||||
stream_id = kwargs['stream_id']
|
||||
if stream_id not in self.streams and stream_id not in self.dash_streams:
|
||||
log.e('[Error] Invalid video format.')
|
||||
log.e('Run \'-i\' command with no specific video format to view all available formats.')
|
||||
exit(2)
|
||||
else:
|
||||
# extract stream with the best quality
|
||||
stream_id = self.streams_sorted[0]['id']
|
||||
|
||||
def download_playlist_by_url(self, url, **kwargs):
|
||||
self.url = url
|
||||
kwargs['playlist'] = True
|
||||
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
# sort it out
|
||||
if re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
|
||||
sort = 'bangumi'
|
||||
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
|
||||
sort = 'bangumi'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/media/md(\d+)', self.url) or \
|
||||
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)', self.url):
|
||||
sort = 'bangumi_md'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/video/(av(\d+)|BV(\S+))', self.url):
|
||||
sort = 'video'
|
||||
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url):
|
||||
sort = 'space_channel'
|
||||
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url):
|
||||
sort = 'space_favlist'
|
||||
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url):
|
||||
sort = 'space_video'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/audio/am(\d+)', self.url):
|
||||
sort = 'audio_menu'
|
||||
else:
|
||||
log.e('[Error] Unsupported URL pattern.')
|
||||
exit(1)
|
||||
|
||||
# regular av video
|
||||
if sort == 'video':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
aid = initial_state['videoData']['aid']
|
||||
pn = initial_state['videoData']['videos']
|
||||
if pn != len(initial_state['videoData']['pages']):  # interactive video
|
||||
search_node_list = []
|
||||
download_cid_set = set([initial_state['videoData']['cid']])
|
||||
params = {
|
||||
'id': 'cid:{}'.format(initial_state['videoData']['cid']),
|
||||
'aid': str(aid)
|
||||
}
|
||||
urlcontent = get_content('https://api.bilibili.com/x/player.so?'+parse.urlencode(params), headers=self.bilibili_headers(referer='https://www.bilibili.com/video/av{}'.format(aid)))
|
||||
graph_version = json.loads(urlcontent[urlcontent.find('<interaction>')+13:urlcontent.find('</interaction>')])['graph_version']
|
||||
params = {
|
||||
'aid': str(aid),
|
||||
'graph_version': graph_version,
|
||||
'platform': 'pc',
|
||||
'portal': 0,
|
||||
'screen': 0,
|
||||
}
|
||||
node_info = json.loads(get_content('https://api.bilibili.com/x/stein/nodeinfo?'+parse.urlencode(params)))
|
||||
|
||||
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo = json.loads(playinfo_text) if playinfo_text else None
|
||||
|
||||
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
|
||||
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
|
||||
|
||||
self.prepare_by_cid(aid, initial_state['videoData']['cid'], initial_state['videoData']['title'] + ('P{}. {}'.format(1, node_info['data']['title'])),html_content,playinfo,playinfo_,url)
|
||||
self.extract(**kwargs)
|
||||
self.download(**kwargs)
|
||||
for choice in node_info['data']['edges']['choices']:
|
||||
search_node_list.append(choice['node_id'])
|
||||
if not choice['cid'] in download_cid_set:
|
||||
download_cid_set.add(choice['cid'])
|
||||
self.prepare_by_cid(aid,choice['cid'],initial_state['videoData']['title']+('P{}. {}'.format(len(download_cid_set),choice['option'])),html_content,playinfo,playinfo_,url)
|
||||
self.extract(**kwargs)
|
||||
self.download(**kwargs)
|
||||
while len(search_node_list)>0:
|
||||
node_id = search_node_list.pop(0)
|
||||
params.update({'node_id':node_id})
|
||||
node_info = json.loads(get_content('https://api.bilibili.com/x/stein/nodeinfo?'+parse.urlencode(params)))
|
||||
if node_info['data'].__contains__('edges'):
|
||||
for choice in node_info['data']['edges']['choices']:
|
||||
search_node_list.append(choice['node_id'])
|
||||
if not choice['cid'] in download_cid_set:
|
||||
download_cid_set.add(choice['cid'])
|
||||
self.prepare_by_cid(aid,choice['cid'],initial_state['videoData']['title']+('P{}. {}'.format(len(download_cid_set),choice['option'])),html_content,playinfo,playinfo_,url)
|
||||
try:
|
||||
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
|
||||
except:
|
||||
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
|
||||
self.extract(**kwargs)
|
||||
self.download(**kwargs)
|
||||
else:
|
||||
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo = json.loads(playinfo_text) if playinfo_text else None
|
||||
|
||||
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
|
||||
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
|
||||
p = int(match1(self.url, r'[\?&]p=(\d+)') or match1(self.url, r'/index_(\d+)') or '1')-1
|
||||
for pi in range(p,pn):
|
||||
self.prepare_by_cid(aid,initial_state['videoData']['pages'][pi]['cid'],'%s (P%s. %s)' % (initial_state['videoData']['title'], pi+1, initial_state['videoData']['pages'][pi]['part']),html_content,playinfo,playinfo_,url)
|
||||
try:
|
||||
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
|
||||
except:
|
||||
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
|
||||
self.extract(**kwargs)
|
||||
self.download(**kwargs)
|
||||
# purl = 'https://www.bilibili.com/video/av%s?p=%s' % (aid, pi+1)
|
||||
# self.__class__().download_by_url(purl, **kwargs)
|
||||
|
||||
elif sort == 'bangumi':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
epn, i = len(initial_state['epList']), 0
|
||||
for ep in initial_state['epList']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
ep_id = ep['id']
|
||||
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
|
||||
self.__class__().download_by_url(epurl, **kwargs)
|
||||
|
||||
elif sort == 'bangumi_md':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
epn, i = len(initial_state['mediaInfo']['episodes']), 0
|
||||
for ep in initial_state['mediaInfo']['episodes']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
ep_id = ep['ep_id']
|
||||
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
|
||||
self.__class__().download_by_url(epurl, **kwargs)
|
||||
|
||||
elif sort == 'space_channel':
|
||||
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url)
|
||||
mid, cid = m.group(1), m.group(2)
|
||||
api_url = self.bilibili_space_channel_api(mid, cid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
channel_info = json.loads(api_content)
|
||||
# TBD: channel of more than 100 videos
|
||||
|
||||
epn, i = len(channel_info['data']['list']['archives']), 0
|
||||
for video in channel_info['data']['list']['archives']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/video/av%s' % video['aid']
|
||||
self.__class__().download_playlist_by_url(url, **kwargs)
|
||||
|
||||
elif sort == 'space_favlist':
|
||||
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url)
|
||||
vmid, fid = m.group(1), m.group(2)
|
||||
api_url = self.bilibili_space_favlist_api(fid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
favlist_info = json.loads(api_content)
|
||||
pc = favlist_info['data']['info']['media_count'] // len(favlist_info['data']['medias'])
|
||||
if favlist_info['data']['info']['media_count'] % len(favlist_info['data']['medias']) != 0:
|
||||
pc += 1
|
||||
for pn in range(1, pc + 1):
|
||||
log.w('Extracting %s of %s pages ...' % (pn, pc))
|
||||
api_url = self.bilibili_space_favlist_api(fid, pn=pn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
favlist_info = json.loads(api_content)
|
||||
|
||||
epn, i = len(favlist_info['data']['medias']), 0
|
||||
for video in favlist_info['data']['medias']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/video/av%s' % video['id']
|
||||
self.__class__().download_playlist_by_url(url, **kwargs)
|
||||
|
||||
elif sort == 'space_video':
|
||||
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url)
|
||||
mid = m.group(1)
|
||||
api_url = self.bilibili_space_video_api(mid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
videos_info = json.loads(api_content)
|
||||
pc = videos_info['data']['page']['count'] // videos_info['data']['page']['ps']
|
||||
|
||||
for pn in range(1, pc + 1):
|
||||
api_url = self.bilibili_space_video_api(mid, pn=pn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
videos_info = json.loads(api_content)
|
||||
|
||||
epn, i = len(videos_info['data']['list']['vlist']), 0
|
||||
for video in videos_info['data']['list']['vlist']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/video/av%s' % video['aid']
|
||||
self.__class__().download_playlist_by_url(url, **kwargs)
|
||||
|
||||
elif sort == 'audio_menu':
|
||||
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/am(\d+)', self.url)
|
||||
sid = m.group(1)
|
||||
#api_url = self.bilibili_audio_menu_info_api(sid)
|
||||
#api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
#menu_info = json.loads(api_content)
|
||||
api_url = self.bilibili_audio_menu_song_api(sid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
menusong_info = json.loads(api_content)
|
||||
epn, i = len(menusong_info['data']['data']), 0
|
||||
for song in menusong_info['data']['data']:
|
||||
i += 1; log.w('Extracting %s of %s songs ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/audio/au%s' % song['id']
|
||||
self.__class__().download_by_url(url, **kwargs)
|
||||
|
||||
|
||||
site_info = "bilibili.com"
|
||||
download = bilibili_download
|
||||
download_playlist = bilibili_download
|
||||
site = Bilibili()
|
||||
download = site.download_by_url
|
||||
download_playlist = site.download_playlist_by_url
|
||||
|
||||
bilibili_download = download
|
||||
|
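Review note on the `space_video` branch above: the page count is computed with floor division (`count // ps`). If `count` is the total number of videos and `ps` is the page size (an assumption about the API fields, not something this diff confirms), a final partial page would be skipped. A minimal sketch of a ceiling-division alternative, using the hypothetical helper names `page_count` and `page_numbers`:

```python
def page_count(total_items, page_size):
    """Return how many pages are needed to cover total_items.

    Hypothetical helper; assumes total_items is the overall item count
    and page_size is the number of items returned per page.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: rounds up without using floats.
    return (total_items + page_size - 1) // page_size


def page_numbers(total_items, page_size):
    """Return the 1-based page numbers to request."""
    return range(1, page_count(total_items, page_size) + 1)
```

With 61 items and a page size of 30 this yields pages 1 to 3, where floor division would stop at page 2.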
@ -52,10 +52,13 @@ class BokeCC(VideoExtractor):
|
||||
raise
|
||||
|
||||
if title is None:
|
||||
self.title = '_'.join([i.text for i in tree.iterfind('video/videomarks/videomark/markdesc')])
|
||||
self.title = '_'.join([i.text for i in self.tree.iterfind('video/videomarks/videomark/markdesc')])
|
||||
else:
|
||||
self.title = title
|
||||
|
||||
if not title:
|
||||
self.title = vid
|
||||
|
||||
for i in self.tree.iterfind('video/quality'):
|
||||
quality = i.attrib ['value']
|
||||
url = i[0].attrib['playurl']
|
||||
|
@ -6,10 +6,9 @@
|
||||
|
||||
__all__ = ['ckplayer_download']
|
||||
|
||||
from xml.etree import cElementTree as ET
|
||||
from xml.etree import ElementTree as ET
|
||||
from copy import copy
|
||||
from ..common import *
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def ckplayer_get_info_by_xml(ckinfo):
|
||||
"""str->dict
|
||||
@ -20,20 +19,22 @@ def ckplayer_get_info_by_xml(ckinfo):
|
||||
'links': [],
|
||||
'size': 0,
|
||||
'flashvars': '',}
|
||||
if '_text' in dictify(e)['ckplayer']['info'][0]['title'][0]: #title
|
||||
video_dict['title'] = dictify(e)['ckplayer']['info'][0]['title'][0]['_text'].strip()
|
||||
dictified = dictify(e)['ckplayer']
|
||||
if 'info' in dictified:
|
||||
if '_text' in dictified['info'][0]['title'][0]: #title
|
||||
video_dict['title'] = dictified['info'][0]['title'][0]['_text'].strip()
|
||||
|
||||
#if dictify(e)['ckplayer']['info'][0]['title'][0]['_text'].strip(): #duration
|
||||
#video_dict['title'] = dictify(e)['ckplayer']['info'][0]['title'][0]['_text'].strip()
|
||||
|
||||
if '_text' in dictify(e)['ckplayer']['video'][0]['size'][0]: #size exists for 1 piece
|
||||
video_dict['size'] = sum([int(i['size'][0]['_text']) for i in dictify(e)['ckplayer']['video']])
|
||||
if '_text' in dictified['video'][0]['size'][0]: #size exists for 1 piece
|
||||
video_dict['size'] = sum([int(i['size'][0]['_text']) for i in dictified['video']])
|
||||
|
||||
if '_text' in dictify(e)['ckplayer']['video'][0]['file'][0]: #link exist
|
||||
video_dict['links'] = [i['file'][0]['_text'].strip() for i in dictify(e)['ckplayer']['video']]
|
||||
if '_text' in dictified['video'][0]['file'][0]: #link exist
|
||||
video_dict['links'] = [i['file'][0]['_text'].strip() for i in dictified['video']]
|
||||
|
||||
if '_text' in dictify(e)['ckplayer']['flashvars'][0]:
|
||||
video_dict['flashvars'] = dictify(e)['ckplayer']['flashvars'][0]['_text'].strip()
|
||||
if '_text' in dictified['flashvars'][0]:
|
||||
video_dict['flashvars'] = dictified['flashvars'][0]['_text'].strip()
|
||||
|
||||
return video_dict
|
||||
|
||||
|
@ -1,49 +1,67 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['cntv_download', 'cntv_download_by_id']
|
||||
|
||||
from ..common import *
|
||||
|
||||
import json
|
||||
import re
|
||||
|
||||
from ..common import get_content, r1, match1, playlist_not_supported
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
def cntv_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
|
||||
assert id
|
||||
info = json.loads(get_html('http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + id))
|
||||
title = title or info['title']
|
||||
video = info['video']
|
||||
alternatives = [x for x in video.keys() if x.endswith('hapters')]
|
||||
#assert alternatives in (['chapters'], ['lowChapters', 'chapters'], ['chapters', 'lowChapters']), alternatives
|
||||
chapters = video['chapters'] if 'chapters' in video else video['lowChapters']
|
||||
urls = [x['url'] for x in chapters]
|
||||
ext = r1(r'\.([^.]+)$', urls[0])
|
||||
assert ext in ('flv', 'mp4')
|
||||
size = 0
|
||||
for url in urls:
|
||||
_, _, temp = url_info(url)
|
||||
size += temp
|
||||
__all__ = ['cntv_download', 'cntv_download_by_id']
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
# avoid corrupted files - don't merge
|
||||
download_urls(urls, title, ext, size, output_dir = output_dir, merge = False)
|
||||
|
||||
def cntv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
class CNTV(VideoExtractor):
|
||||
name = 'CNTV.com'
|
||||
stream_types = [
|
||||
{'id': '1', 'video_profile': '1280x720_2000kb/s', 'map_to': 'chapters4'},
|
||||
{'id': '2', 'video_profile': '1280x720_1200kb/s', 'map_to': 'chapters3'},
|
||||
{'id': '3', 'video_profile': '640x360_850kb/s', 'map_to': 'chapters2'},
|
||||
{'id': '4', 'video_profile': '480x270_450kb/s', 'map_to': 'chapters'},
|
||||
{'id': '5', 'video_profile': '320x180_200kb/s', 'map_to': 'lowChapters'},
|
||||
]
|
||||
|
||||
ep = 'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid={}'
|
||||
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.api_data = None
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
self.api_data = json.loads(get_content(self.__class__.ep.format(self.vid)))
|
||||
self.title = self.api_data['title']
|
||||
for s in self.api_data['video']:
|
||||
for st in self.__class__.stream_types:
|
||||
if st['map_to'] == s:
|
||||
urls = self.api_data['video'][s]
|
||||
src = [u['url'] for u in urls]
|
||||
stream_data = dict(src=src, size=0, container='mp4', video_profile=st['video_profile'])
|
||||
self.streams[st['id']] = stream_data
|
||||
|
||||
|
||||
def cntv_download_by_id(rid, **kwargs):
|
||||
CNTV().download_by_vid(rid, **kwargs)
|
||||
|
||||
|
||||
def cntv_download(url, **kwargs):
|
||||
if re.match(r'http://tv\.cntv\.cn/video/(\w+)/(\w+)', url):
|
||||
id = match1(url, r'http://tv\.cntv\.cn/video/\w+/(\w+)')
|
||||
rid = match1(url, r'http://tv\.cntv\.cn/video/\w+/(\w+)')
|
||||
elif re.match(r'http(s)?://tv\.cctv\.com/\d+/\d+/\d+/\w+.shtml', url):
|
||||
rid = r1(r'var guid = "(\w+)"', get_content(url))
|
||||
elif re.match(r'http://\w+\.cntv\.cn/(\w+/\w+/(classpage/video/)?)?\d+/\d+\.shtml', url) or \
|
||||
re.match(r'http://\w+.cntv.cn/(\w+/)*VIDE\d+.shtml', url) or \
|
||||
re.match(r'http://(\w+).cntv.cn/(\w+)/classpage/video/(\d+)/(\d+).shtml', url) or \
|
||||
re.match(r'http://\w+.cctv.com/\d+/\d+/\d+/\w+.shtml', url) or \
|
||||
re.match(r'http(s)?://\w+.cctv.com/\d+/\d+/\d+/\w+.shtml', url) or \
|
||||
re.match(r'http://\w+.cntv.cn/\d+/\d+/\d+/\w+.shtml', url):
|
||||
id = r1(r'videoCenterId","(\w+)"', get_html(url))
|
||||
page = get_content(url)
|
||||
rid = r1(r'videoCenterId","(\w+)"', page)
|
||||
if rid is None:
|
||||
guid = re.search(r'guid\s*=\s*"([0-9a-z]+)"', page).group(1)
|
||||
rid = guid
|
||||
elif re.match(r'http://xiyou.cntv.cn/v-[\w-]+\.html', url):
|
||||
id = r1(r'http://xiyou.cntv.cn/v-([\w-]+)\.html', url)
|
||||
rid = r1(r'http://xiyou.cntv.cn/v-([\w-]+)\.html', url)
|
||||
else:
|
||||
raise NotImplementedError(url)
|
||||
|
||||
cntv_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
CNTV().download_by_vid(rid, **kwargs)
|
||||
|
||||
site_info = "CNTV.com"
|
||||
download = cntv_download
|
||||
|
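Review note on the new `CNTV.prepare` above: it maps keys of the API's `video` object onto entries of `stream_types` with a nested loop. The same mapping can be sketched with a dictionary lookup. `build_streams` is a hypothetical standalone helper, and the sample payload in the usage note is made up for illustration; only the `stream_types` table is taken from the diff.

```python
STREAM_TYPES = [
    {'id': '1', 'video_profile': '1280x720_2000kb/s', 'map_to': 'chapters4'},
    {'id': '2', 'video_profile': '1280x720_1200kb/s', 'map_to': 'chapters3'},
    {'id': '3', 'video_profile': '640x360_850kb/s', 'map_to': 'chapters2'},
    {'id': '4', 'video_profile': '480x270_450kb/s', 'map_to': 'chapters'},
    {'id': '5', 'video_profile': '320x180_200kb/s', 'map_to': 'lowChapters'},
]


def build_streams(video):
    """Map the API's chapter lists onto stream ids (hypothetical helper)."""
    by_key = {st['map_to']: st for st in STREAM_TYPES}
    streams = {}
    for key, chapters in video.items():
        st = by_key.get(key)
        # Ignore keys that are not known chapter lists (e.g. scalar metadata).
        if st is None or not isinstance(chapters, list):
            continue
        streams[st['id']] = {
            'src': [c['url'] for c in chapters],
            'size': 0,
            'container': 'mp4',
            'video_profile': st['video_profile'],
        }
    return streams
```

For a payload containing only `chapters` and `lowChapters`, this produces streams `'4'` and `'5'`.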
105
src/you_get/extractors/coub.py
Normal file
@ -0,0 +1,105 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['coub_download']
|
||||
|
||||
from ..common import *
|
||||
from ..processor import ffmpeg
|
||||
from ..util.fs import legitimize
|
||||
|
||||
|
||||
def coub_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_content(url)
|
||||
|
||||
try:
|
||||
json_data = get_coub_data(html)
|
||||
title, video_url, audio_url = get_title_and_urls(json_data)
|
||||
video_file_name, video_file_path = get_file_path(merge, output_dir, title, video_url)
|
||||
audio_file_name, audio_file_path = get_file_path(merge, output_dir, title, audio_url)
|
||||
download_url(audio_url, merge, output_dir, title, info_only)
|
||||
download_url(video_url, merge, output_dir, title, info_only)
|
||||
if not info_only:
|
||||
try:
|
||||
fix_coub_video_file(video_file_path)
|
||||
audio_duration = float(ffmpeg.ffprobe_get_media_duration(audio_file_path))
|
||||
video_duration = float(ffmpeg.ffprobe_get_media_duration(video_file_path))
|
||||
loop_file_path = get_loop_file_path(title, output_dir)
|
||||
single_file_path = audio_file_path
|
||||
if audio_duration > video_duration:
|
||||
write_loop_file(round(audio_duration / video_duration), loop_file_path, video_file_name)
|
||||
else:
|
||||
single_file_path = video_file_path
|
||||
write_loop_file(round(video_duration / audio_duration), loop_file_path, audio_file_name)
|
||||
|
||||
ffmpeg.ffmpeg_concat_audio_and_video([loop_file_path, single_file_path], title + "_full", "mp4")
|
||||
cleanup_files([video_file_path, audio_file_path, loop_file_path])
|
||||
except EnvironmentError as err:
|
||||
print("Error preparing full coub video. {}".format(err))
|
||||
except Exception as err:
|
||||
print("Error while downloading files. {}".format(err))
|
||||
|
||||
|
||||
def write_loop_file(records_number, loop_file_path, file_name):
|
||||
with open(loop_file_path, 'a') as file:
|
||||
for i in range(records_number):
|
||||
file.write("file '{}'\n".format(file_name))
|
||||
|
||||
|
||||
def download_url(url, merge, output_dir, title, info_only):
|
||||
mime, ext, size = url_info(url)
|
||||
print_info(site_info, title, mime, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
|
||||
def fix_coub_video_file(file_path):
|
||||
with open(file_path, 'r+b') as file:
|
||||
file.seek(0)
|
||||
file.write(bytes(2))
|
||||
|
||||
|
||||
def get_title_and_urls(json_data):
|
||||
title = legitimize(re.sub(r'[\s*]', "_", json_data['title']))
|
||||
video_info = json_data['file_versions']['html5']['video']
|
||||
if 'high' not in video_info:
|
||||
if 'med' not in video_info:
|
||||
video_url = video_info['low']['url']
|
||||
else:
|
||||
video_url = video_info['med']['url']
|
||||
else:
|
||||
video_url = video_info['high']['url']
|
||||
audio_info = json_data['file_versions']['html5']['audio']
|
||||
if 'high' not in audio_info:
|
||||
if 'med' not in audio_info:
|
||||
audio_url = audio_info['low']['url']
|
||||
else:
|
||||
audio_url = audio_info['med']['url']
|
||||
else:
|
||||
audio_url = audio_info['high']['url']
|
||||
return title, video_url, audio_url
|
||||
|
||||
|
||||
def get_coub_data(html):
|
||||
coub_data = r1(r'<script id=\'coubPageCoubJson\' type=\'text/json\'>([\w\W]+?(?=</script>))</script>', html)
|
||||
json_data = json.loads(coub_data)
|
||||
return json_data
|
||||
|
||||
|
||||
def get_file_path(merge, output_dir, title, url):
|
||||
mime, ext, size = url_info(url)
|
||||
file_name = get_output_filename([], title, ext, output_dir, merge)
|
||||
file_path = os.path.join(output_dir, file_name)
|
||||
return file_name, file_path
|
||||
|
||||
|
||||
def get_loop_file_path(title, output_dir):
|
||||
return os.path.join(output_dir, get_output_filename([], title, "txt", None, False))
|
||||
|
||||
|
||||
def cleanup_files(files):
|
||||
for file in files:
|
||||
os.remove(file)
|
||||
|
||||
|
||||
site_info = "coub.com"
|
||||
download = coub_download
|
||||
download_playlist = playlist_not_supported('coub')
|
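Review note on `get_title_and_urls` above: the nested `if 'high' not in ...` blocks implement a "best available quality" fallback twice, once for video and once for audio. A minimal sketch of the same fallback as a loop, using the hypothetical helper name `pick_best_url`; the `high`/`med`/`low` keys and the `url` field are taken from the diff.

```python
def pick_best_url(versions, order=('high', 'med', 'low')):
    """Return the URL of the first quality in `order` present in `versions`.

    Hypothetical helper; `versions` is expected to look like
    {'high': {'url': ...}, 'med': {'url': ...}, 'low': {'url': ...}}
    with any subset of the keys present.
    """
    for quality in order:
        if quality in versions:
            return versions[quality]['url']
    raise KeyError('no known quality in %r' % sorted(versions))
```

Unlike the nested version, this raises a descriptive `KeyError` when none of the three qualities is present, which is a design choice of the sketch rather than behaviour of the original code.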
@ -3,29 +3,36 @@
|
||||
__all__ = ['dailymotion_download']
|
||||
|
||||
from ..common import *
|
||||
import urllib.parse
|
||||
|
||||
def dailymotion_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
def rebuilt_url(url):
|
||||
path = urllib.parse.urlparse(url).path
|
||||
aid = path.split('/')[-1].split('_')[0]
|
||||
return 'http://www.dailymotion.com/embed/video/{}?autoplay=1'.format(aid)
|
||||
|
||||
def dailymotion_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
"""Downloads Dailymotion videos by URL.
|
||||
"""
|
||||
|
||||
html = get_content(url)
|
||||
html = get_content(rebuilt_url(url))
|
||||
info = json.loads(match1(html, r'qualities":({.+?}),"'))
|
||||
title = match1(html, r'"video_title"\s*:\s*"([^"]+)"') or \
|
||||
match1(html, r'"title"\s*:\s*"([^"]+)"')
|
||||
title = unicodize(title)
|
||||
|
||||
for quality in ['720','480','380','240','auto']:
|
||||
for quality in ['1080','720','480','380','240','144','auto']:
|
||||
try:
|
||||
real_url = info[quality][0]["url"]
|
||||
real_url = info[quality][1]["url"]
|
||||
if real_url:
|
||||
break
|
||||
except KeyError:
|
||||
pass
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
mime, ext, size = url_info(real_url)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
print_info(site_info, title, mime, size)
|
||||
if not info_only:
|
||||
download_urls([real_url], title, ext, size, output_dir, merge = merge)
|
||||
download_urls([real_url], title, ext, size, output_dir=output_dir, merge=merge)
|
||||
|
||||
site_info = "Dailymotion.com"
|
||||
download = dailymotion_download
|
||||
|
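Review note on `rebuilt_url` above: it takes the last path segment of the page URL, cuts it at the first underscore, and builds an embed URL from the result. A standalone sketch of that step follows. `embed_url` is a hypothetical name, the trailing-slash handling is an addition not present in the diff, and the example URLs in the usage note are made up.

```python
from urllib.parse import urlparse


def embed_url(url):
    """Build a Dailymotion embed URL from a video page URL (sketch)."""
    # Strip a trailing slash so the last segment is never empty
    # (an addition of this sketch, not part of the diff).
    path = urlparse(url).path.rstrip('/')
    # The video id is the part of the last segment before the first '_'.
    video_id = path.split('/')[-1].split('_')[0]
    return 'http://www.dailymotion.com/embed/video/{}?autoplay=1'.format(video_id)
```

For example, a page path of `/video/x2abcde_some-title` maps to the embed path `/embed/video/x2abcde`.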
@ -1,77 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['dilidili_download']
|
||||
|
||||
from ..common import *
|
||||
from .ckplayer import ckplayer_download
|
||||
|
||||
headers = {
|
||||
'DNT': '1',
|
||||
'Accept-Encoding': 'gzip, deflate, sdch, br',
|
||||
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36',
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
|
||||
'Cache-Control': 'max-age=0',
|
||||
'Referer': 'http://www.dilidili.com/',
|
||||
'Connection': 'keep-alive',
|
||||
'Save-Data': 'on',
|
||||
}
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def dilidili_parser_data_to_stream_types(typ ,vid ,hd2 ,sign, tmsign, ulk):
|
||||
"""->list"""
|
||||
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
|
||||
html = get_content(parse_url, headers=headers)
|
||||
|
||||
info = re.search(r'(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})', html).groups()
|
||||
info = [i.strip('{}').split('->') for i in info]
|
||||
info = {i[0]: i [1] for i in info}
|
||||
|
||||
stream_types = []
|
||||
for i in zip(info['deft'].split('|'), info['defa'].split('|')):
|
||||
stream_types.append({'id': str(i[1][-1]), 'container': 'mp4', 'video_profile': i[0]})
|
||||
return stream_types
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def dilidili_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
if re.match(r'http://www.dilidili.com/watch\S+', url):
|
||||
html = get_content(url)
|
||||
title = match1(html, r'<title>(.+)丨(.+)</title>') #title
|
||||
|
||||
# player loaded via internal iframe
|
||||
frame_url = re.search(r'<iframe src=\"(.+?)\"', html).group(1)
|
||||
#print(frame_url)
|
||||
|
||||
#https://player.005.tv:60000/?vid=a8760f03fd:a04808d307&v=yun&sign=a68f8110cacd892bc5b094c8e5348432
|
||||
html = get_content(frame_url, headers=headers, decoded=False).decode('utf-8')
|
||||
|
||||
match = re.search(r'(.+?)var video =(.+?);', html)
|
||||
vid = match1(html, r'var vid="(.+)"')
|
||||
hd2 = match1(html, r'var hd2="(.+)"')
|
||||
typ = match1(html, r'var typ="(.+)"')
|
||||
sign = match1(html, r'var sign="(.+)"')
|
||||
tmsign = match1(html, r'tmsign=([A-Za-z0-9]+)')
|
||||
ulk = match1(html, r'var ulk="(.+)"')
|
||||
|
||||
# here s the parser...
|
||||
stream_types = dilidili_parser_data_to_stream_types(typ, vid, hd2, sign, tmsign, ulk)
|
||||
|
||||
#get best
|
||||
best_id = max([i['id'] for i in stream_types])
|
||||
|
||||
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = best_id, sign = sign, tmsign = tmsign, ulk = ulk)
|
||||
|
||||
ckplayer_download(parse_url, output_dir, merge, info_only, is_xml = True, title = title, headers = headers)
|
||||
|
||||
#type_ = ''
|
||||
#size = 0
|
||||
|
||||
#type_, ext, size = url_info(url)
|
||||
#print_info(site_info, title, type_, size)
|
||||
#if not info_only:
|
||||
#download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
site_info = "dilidili"
|
||||
download = dilidili_download
|
||||
download_playlist = playlist_not_supported('dilidili')
|
@ -1,55 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
__all__ = ['dongting_download']
|
||||
|
||||
from ..common import *
|
||||
|
||||
_unit_prefixes = 'bkmg'
|
||||
|
||||
def parse_size(size):
|
||||
m = re.match(r'([\d.]+)(.(?:i?B)?)', size, re.I)
|
||||
if m:
|
||||
return int(float(m.group(1)) * 1024 **
|
||||
_unit_prefixes.index(m.group(2).lower()))
|
||||
else:
|
||||
return 0
|
||||
|
||||
def dongting_download_lyric(lrc_url, file_name, output_dir):
|
||||
j = get_html(lrc_url)
|
||||
info = json.loads(j)
|
||||
lrc = j['data']['lrc']
|
||||
filename = get_filename(file_name)
|
||||
with open(output_dir + "/" + filename + '.lrc', 'w', encoding='utf-8') as x:
|
||||
x.write(lrc)
|
||||
|
||||
def dongting_download_song(sid, output_dir = '.', merge = True, info_only = False):
|
||||
j = get_html('http://ting.hotchanson.com/detail.do?neid=%s&size=0' % sid)
|
||||
info = json.loads(j)
|
||||
|
||||
song_title = info['data']['songName']
|
||||
album_name = info['data']['albumName']
|
||||
artist = info['data']['singerName']
|
||||
ext = 'mp3'
|
||||
size = parse_size(info['data']['itemList'][-1]['size'])
|
||||
url = info['data']['itemList'][-1]['downUrl']
|
||||
|
||||
print_info(site_info, song_title, ext, size)
|
||||
if not info_only:
|
||||
file_name = "%s - %s - %s" % (song_title, album_name, artist)
|
||||
download_urls([url], file_name, ext, size, output_dir, merge = merge)
|
||||
lrc_url = ('http://lp.music.ttpod.com/lrc/down?'
|
||||
'lrcid=&artist=%s&title=%s') % (
|
||||
parse.quote(artist), parse.quote(song_title))
|
||||
try:
|
||||
dongting_download_lyric(lrc_url, file_name, output_dir)
|
||||
except:
|
||||
pass
|
||||
|
||||
def dongting_download(url, output_dir = '.', stream_type = None, merge = True, info_only = False, **kwargs):
|
||||
if re.match('http://www.dongting.com/\?song_id=\d+', url):
|
||||
id = r1(r'http://www.dongting.com/\?song_id=(\d+)', url)
|
||||
dongting_download_song(id, output_dir, merge, info_only)
|
||||
|
||||
site_info = "Dongting.com"
|
||||
download = dongting_download
|
||||
download_playlist = playlist_not_supported("dongting")
|
@ -7,7 +7,18 @@ from ..common import *
|
||||
|
||||
def douban_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
html = get_html(url)
|
||||
if 'subject' in url:
|
||||
|
||||
if re.match(r'https?://movie', url):
|
||||
title = match1(html, 'name="description" content="([^"]+)')
|
||||
tid = match1(url, 'trailer/(\d+)')
|
||||
real_url = 'https://movie.douban.com/trailer/video_url?tid=%s' % tid
|
||||
type, ext, size = url_info(real_url)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([real_url], title, ext, size, output_dir, merge = merge)
|
||||
|
||||
elif 'subject' in url:
|
||||
titles = re.findall(r'data-title="([^"]*)">', html)
|
||||
song_id = re.findall(r'<li class="song-item" id="([^"]*)"', html)
|
||||
song_ssid = re.findall(r'data-ssid="([^"]*)"', html)
|
||||
|
46
src/you_get/extractors/douyin.py
Normal file
@ -0,0 +1,46 @@
|
||||
# coding=utf-8
|
||||
|
||||
import re
|
||||
import json
|
||||
|
||||
from ..common import (
|
||||
url_size,
|
||||
print_info,
|
||||
get_content,
|
||||
fake_headers,
|
||||
download_urls,
|
||||
playlist_not_supported,
|
||||
)
|
||||
|
||||
|
||||
__all__ = ['douyin_download_by_url']
|
||||
|
||||
|
||||
def douyin_download_by_url(url, **kwargs):
|
||||
page_content = get_content(url, headers=fake_headers)
|
||||
match_rule = re.compile(r'var data = \[(.*?)\];')
|
||||
video_info = json.loads(match_rule.findall(page_content)[0])
|
||||
video_url = video_info['video']['play_addr']['url_list'][0]
|
||||
# fix: https://www.douyin.com/share/video/6553248251821165832
|
||||
# if there is no title, use desc
|
||||
cha_list = video_info['cha_list']
|
||||
if cha_list:
|
||||
title = cha_list[0]['cha_name']
|
||||
else:
|
||||
title = video_info['desc']
|
||||
video_format = 'mp4'
|
||||
size = url_size(video_url, faker=True)
|
||||
print_info(
|
||||
site_info='douyin.com', title=title,
|
||||
type=video_format, size=size
|
||||
)
|
||||
if not kwargs['info_only']:
|
||||
download_urls(
|
||||
urls=[video_url], title=title, ext=video_format, total_size=size,
|
||||
faker=True,
|
||||
**kwargs
|
||||
)
|
||||
|
||||
|
||||
download = douyin_download_by_url
|
||||
download_playlist = playlist_not_supported('douyin')
|
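Review note on `douyin_download_by_url` above: it pulls a JSON object out of an inline `var data = [...]` script and falls back from `cha_list` to `desc` for the title. A minimal sketch of those two steps in isolation, with hypothetical helper names `extract_video_info` and `choose_title`; the explicit error on a missing match and the `.get` defaults are additions of the sketch.

```python
import json
import re

# Same pattern as in the diff, compiled once; DOTALL is an addition so a
# payload spanning several lines would still match.
DATA_RE = re.compile(r'var data = \[(.*?)\];', re.DOTALL)


def extract_video_info(page_content):
    """Return the first object of the inline `var data = [...]` array."""
    match = DATA_RE.search(page_content)
    if match is None:
        raise ValueError('inline video data not found')
    return json.loads(match.group(1))


def choose_title(video_info):
    """Prefer the first challenge name; fall back to the description."""
    cha_list = video_info.get('cha_list') or []
    if cha_list:
        return cha_list[0]['cha_name']
    return video_info.get('desc', '')
```

Note that the non-greedy pattern stops at the first `];` in the page, so it only works while the payload holds a single object and contains no `];` itself.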
@ -3,53 +3,79 @@
|
||||
__all__ = ['douyutv_download']
|
||||
|
||||
from ..common import *
|
||||
from ..util.log import *
|
||||
import json
|
||||
import hashlib
|
||||
import time
|
||||
import uuid
|
||||
import urllib.parse, urllib.request
|
||||
import re
|
||||
|
||||
def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
html = get_content(url)
|
||||
room_id_patt = r'"room_id"\s*:\s*(\d+),'
|
||||
headers = {
|
||||
'user-agent': 'Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4'
|
||||
}
|
||||
|
||||
def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ep = 'http://vmobile.douyu.com/video/getInfo?vid='
|
||||
patt = r'show/([0-9A-Za-z]+)'
|
||||
title_patt = r'<h1>(.+?)</h1>'
|
||||
|
||||
hit = re.search(patt, url)
|
||||
if hit is None:
|
||||
log.wtf('Unknown url pattern')
|
||||
vid = hit.group(1)
|
||||
|
||||
page = get_content(url, headers=headers)
|
||||
hit = re.search(title_patt, page)
|
||||
if hit is None:
|
||||
title = vid
|
||||
else:
|
||||
title = hit.group(1)
|
||||
|
||||
meta = json.loads(get_content(ep + vid))
|
||||
if meta['error'] != 0:
|
||||
log.wtf('Error from API server')
|
||||
m3u8_url = meta['data']['video_url']
|
||||
print_info('Douyu Video', title, 'm3u8', 0, m3u8_url=m3u8_url)
|
||||
if not info_only:
|
||||
urls = general_m3u8_extractor(m3u8_url)
|
||||
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
|
||||
def douyutv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
if 'v.douyu.com/show/' in url:
|
||||
douyutv_video_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
url = re.sub(r'.*douyu\.com', 'https://m.douyu.com/room', url)
|
||||
html = get_content(url, headers)
|
||||
room_id_patt = r'"rid"\s*:\s*(\d+),'
|
||||
room_id = match1(html, room_id_patt)
|
||||
if room_id == "0":
|
||||
room_id = url[url.rfind('/')+1:]
|
||||
room_id = url[url.rfind('/') + 1:]
|
||||
|
||||
json_request_url = "http://m.douyu.com/html5/live?roomId=%s" % room_id
|
||||
content = get_content(json_request_url)
|
||||
data = json.loads(content)['data']
|
||||
server_status = data.get('error',0)
|
||||
if server_status is not 0:
|
||||
api_url = "http://www.douyutv.com/api/v1/"
|
||||
args = "room/%s?aid=wp&client_sys=wp&time=%d" % (room_id, int(time.time()))
|
||||
auth_md5 = (args + "zNzMV1y4EMxOHS6I5WKm").encode("utf-8")
|
||||
auth_str = hashlib.md5(auth_md5).hexdigest()
|
||||
json_request_url = "%s%s&auth=%s" % (api_url, args, auth_str)
|
||||
|
||||
content = get_content(json_request_url, headers)
|
||||
json_content = json.loads(content)
|
||||
data = json_content['data']
|
||||
server_status = json_content.get('error', 0)
|
||||
if server_status != 0:
|
||||
raise ValueError("Server returned error:%s" % server_status)
|
||||
|
||||
title = data.get('room_name')
|
||||
show_status = data.get('show_status')
|
||||
if show_status is not "1":
|
||||
if show_status != "1":
|
||||
raise ValueError("The live stream is not online! (Errno:%s)" % server_status)
|
||||
|
||||
tt = int(time.time() / 60)
|
||||
did = uuid.uuid4().hex.upper()
|
||||
sign_content = '{room_id}{did}A12Svb&%1UUmf@hC{tt}'.format(room_id = room_id, did = did, tt = tt)
|
||||
sign = hashlib.md5(sign_content.encode('utf-8')).hexdigest()
|
||||
|
||||
json_request_url = "http://www.douyu.com/lapi/live/getPlay/%s" % room_id
|
||||
payload = {'cdn': 'ws', 'rate': '0', 'tt': tt, 'did': did, 'sign': sign}
|
||||
postdata = urllib.parse.urlencode(payload)
|
||||
req = urllib.request.Request(json_request_url, postdata.encode('utf-8'))
|
||||
with urllib.request.urlopen(req) as response:
|
||||
content = response.read()
|
||||
|
||||
data = json.loads(content.decode('utf-8'))['data']
|
||||
server_status = data.get('error',0)
|
||||
if server_status is not 0:
|
||||
raise ValueError("Server returned error:%s" % server_status)
|
||||
|
||||
real_url = data.get('rtmp_url')+'/'+data.get('rtmp_live')
|
||||
real_url = data.get('rtmp_url') + '/' + data.get('rtmp_live')
|
||||
|
||||
print_info(site_info, title, 'flv', float('inf'))
|
||||
if not info_only:
|
||||
download_url_ffmpeg(real_url, title, 'flv', None, output_dir = output_dir, merge = merge)
|
||||
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
site_info = "douyu.com"
|
||||
download = douyutv_download
|
||||
|
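Review note on the new `douyutv_download` above: it rewrites any douyu.com room URL to the mobile site and, when the page reports room id `"0"`, falls back to the last path segment. A minimal sketch of those two steps, with hypothetical helper names `mobile_room_url` and `room_id_from_url`; the dot in the pattern is escaped here, and the example URL in the usage note is made up.

```python
import re


def mobile_room_url(url):
    """Rewrite a douyu.com room URL to its m.douyu.com/room form (sketch)."""
    # Everything up to and including 'douyu.com' is replaced; the rest of
    # the path (the room name or id) is kept as-is.
    return re.sub(r'.*douyu\.com', 'https://m.douyu.com/room', url)


def room_id_from_url(url):
    """Fallback used when the page does not expose a usable room id."""
    return url[url.rfind('/') + 1:]
```

For instance, `https://www.douyu.com/12345` becomes `https://m.douyu.com/room/12345`, and the fallback id is `12345`.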
@ -1,7 +1,11 @@
|
||||
__all__ = ['embed_download']
|
||||
|
||||
import urllib.parse
|
||||
|
||||
from ..common import *
|
||||
|
||||
from .bilibili import bilibili_download
|
||||
from .dailymotion import dailymotion_download
|
||||
from .iqiyi import iqiyi_download_by_vid
|
||||
from .le import letvcloud_download_by_vu
|
||||
from .netease import netease_download
|
||||
@ -11,6 +15,8 @@ from .tudou import tudou_download_by_id
|
||||
from .vimeo import vimeo_download_by_id
|
||||
from .yinyuetai import yinyuetai_download_by_id
|
||||
from .youku import youku_download_by_vid
|
||||
from . import iqiyi
|
||||
from . import bokecc
|
||||
|
||||
"""
|
||||
refer to http://open.youku.com/tools
|
||||
@ -25,7 +31,7 @@ youku_embed_patterns = [ 'youku\.com/v_show/id_([a-zA-Z0-9=]+)',
|
||||
"""
|
||||
http://www.tudou.com/programs/view/html5embed.action?type=0&code=3LS_URGvl54&lcode=&resourceId=0_06_05_99
|
||||
"""
|
||||
tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_]+)\&',
|
||||
tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_-]+)\&',
|
||||
'www\.tudou\.com/v/([a-zA-Z0-9_-]+)/[^"]*v\.swf'
|
||||
]
|
||||
|
||||
@ -42,8 +48,26 @@ netease_embed_patterns = [ '(http://\w+\.163\.com/movie/[^\'"]+)' ]
|
||||
|
||||
vimeo_embed_patters = [ 'player\.vimeo\.com/video/(\d+)' ]
|
||||
|
||||
dailymotion_embed_patterns = [ 'www\.dailymotion\.com/embed/video/(\w+)' ]
|
||||
|
||||
def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwargs):
|
||||
"""
|
||||
check the share button on http://www.bilibili.com/video/av5079467/
|
||||
"""
|
||||
bilibili_embed_patterns = [ 'static\.hdslb\.com/miniloader\.swf.*aid=(\d+)' ]
|
||||
|
||||
|
||||
'''
|
||||
http://open.iqiyi.com/lib/player.html
|
||||
'''
|
||||
iqiyi_patterns = [r'(?:\"|\')(https?://dispatcher\.video\.qiyi\.com\/disp\/shareplayer\.swf\?.+?)(?:\"|\')',
|
||||
r'(?:\"|\')(https?://open\.iqiyi\.com\/developer\/player_js\/coopPlayerIndex\.html\?.+?)(?:\"|\')']
|
||||
|
||||
bokecc_patterns = [r'bokecc\.com/flash/pocle/player\.swf\?siteid=(.+?)&vid=(.{32})']
|
||||
|
||||
recur_limit = 3
|
||||
|
||||
|
||||
def embed_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
content = get_content(url, headers=fake_headers)
|
||||
found = False
|
||||
title = match1(content, '<title>([^<>]+)</title>')
|
||||
@ -51,35 +75,78 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
|
||||
vids = matchall(content, youku_embed_patterns)
|
||||
for vid in set(vids):
|
||||
found = True
|
||||
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
vids = matchall(content, tudou_embed_patterns)
|
||||
for vid in set(vids):
|
||||
found = True
|
||||
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
vids = matchall(content, yinyuetai_embed_patterns)
|
||||
for vid in vids:
|
||||
found = True
|
||||
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
vids = matchall(content, iqiyi_embed_patterns)
|
||||
for vid in vids:
|
||||
found = True
|
||||
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
urls = matchall(content, netease_embed_patterns)
|
||||
for url in urls:
|
||||
found = True
|
||||
netease_download(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
netease_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
urls = matchall(content, vimeo_embed_patters)
|
||||
for url in urls:
|
||||
found = True
|
||||
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only, referer=url, **kwargs)
|
||||
|
||||
if not found:
|
||||
urls = matchall(content, dailymotion_embed_patterns)
|
||||
for url in urls:
|
||||
found = True
|
||||
dailymotion_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
aids = matchall(content, bilibili_embed_patterns)
|
||||
for aid in aids:
|
||||
found = True
|
||||
url = 'http://www.bilibili.com/video/av%s/' % aid
|
||||
bilibili_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
iqiyi_urls = matchall(content, iqiyi_patterns)
|
||||
for url in iqiyi_urls:
|
||||
found = True
|
||||
iqiyi.download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
bokecc_metas = matchall(content, bokecc_patterns)
|
||||
for meta in bokecc_metas:
|
||||
found = True
|
||||
bokecc.bokecc_download_by_id(meta[1], output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
if found:
|
||||
return True
|
||||
|
||||
# Try harder, check all iframes
|
||||
if 'recur_lv' not in kwargs or kwargs['recur_lv'] < recur_limit:
|
||||
r = kwargs.get('recur_lv')
|
||||
if r is None:
|
||||
r = 1
|
||||
else:
|
||||
r += 1
|
||||
iframes = matchall(content, [r'<iframe.+?src=(?:\"|\')(.*?)(?:\"|\')'])
|
||||
for iframe in iframes:
|
||||
if not iframe.startswith('http'):
|
||||
src = urllib.parse.urljoin(url, iframe)
|
||||
else:
|
||||
src = iframe
|
||||
found = embed_download(src, output_dir=output_dir, merge=merge, info_only=info_only, recur_lv=r, **kwargs)
|
||||
if found:
|
||||
return True
|
||||
|
||||
if not found and 'recur_lv' not in kwargs:
|
||||
raise NotImplementedError(url)
|
||||
else:
|
||||
return found
|
||||
|
||||
site_info = "any.any"
|
||||
download = embed_download
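The iframe fallback above can be illustrated in isolation. The following is a minimal sketch, not the extractor's own code: `iframe_sources` and `next_call_kwargs` are hypothetical helper names, and only the regular expression, the `urljoin` step and the `recur_lv` counter are taken from the diff. Copying `kwargs` before setting `recur_lv` is a choice made for this sketch, so the key is never passed twice.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the diff: capture the src attribute of each iframe.
IFRAME_SRC = re.compile(r'<iframe.+?src=(?:"|\')(.*?)(?:"|\')')


def iframe_sources(page_url, content):
    """Return an absolute URL for every iframe src found in content (hypothetical helper)."""
    sources = []
    for src in IFRAME_SRC.findall(content):
        if not src.startswith('http'):
            # Relative sources are resolved against the embedding page.
            src = urljoin(page_url, src)
        sources.append(src)
    return sources


def next_call_kwargs(kwargs, limit):
    """Return kwargs for the next recursive call, or None at the depth limit (hypothetical helper)."""
    level = kwargs.get('recur_lv')
    if level is not None and level >= limit:
        return None
    updated = dict(kwargs)  # copy, so the caller's dict is left untouched
    updated['recur_lv'] = 1 if level is None else level + 1
    return updated
```

A caller would iterate over `iframe_sources(url, content)` and recurse with `next_call_kwargs(kwargs, recur_limit)` until it returns `None`.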
|
||||
|
@ -6,16 +6,21 @@ from ..common import *
|
||||
import json
|
||||
|
||||
def facebook_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
url = re.sub(r'//.*?facebook.com','//facebook.com',url)
|
||||
html = get_html(url)
|
||||
|
||||
title = r1(r'<title id="pageTitle">(.+)</title>', html)
|
||||
|
||||
if title is None:
|
||||
title = url
|
||||
|
||||
sd_urls = list(set([
|
||||
unicodize(str.replace(i, '\\/', '/'))
|
||||
for i in re.findall(r'"sd_src_no_ratelimit":"([^"]*)"', html)
|
||||
for i in re.findall(r'sd_src_no_ratelimit:"([^"]*)"', html)
|
||||
]))
|
||||
hd_urls = list(set([
|
||||
unicodize(str.replace(i, '\\/', '/'))
|
||||
for i in re.findall(r'"hd_src_no_ratelimit":"([^"]*)"', html)
|
||||
for i in re.findall(r'hd_src_no_ratelimit:"([^"]*)"', html)
|
||||
]))
|
||||
urls = hd_urls if hd_urls else sd_urls
|
||||
|
||||
|
@ -1,39 +1,228 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['flickr_download']
|
||||
__all__ = ['flickr_download_main']
|
||||
|
||||
from ..common import *
|
||||
|
||||
def flickr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
page = get_html(url)
|
||||
title = match1(page, r'<meta property="og:title" content="([^"]*)"')
|
||||
photo_id = match1(page, r'"id":"([0-9]+)"')
|
||||
import json
|
||||
|
||||
try: # extract video
|
||||
html = get_html('https://secure.flickr.com/apps/video/video_mtl_xml.gne?photo_id=%s' % photo_id)
|
||||
node_id = match1(html, r'<Item id="id">(.+)</Item>')
|
||||
secret = match1(html, r'<Item id="photo_secret">(.+)</Item>')
|
||||
pattern_url_photoset = r'https?://www\.flickr\.com/photos/.+/(?:(?:sets)|(?:albums))?/([^/]+)'
|
||||
pattern_url_photostream = r'https?://www\.flickr\.com/photos/([^/]+)(?:/|(?:/page))?$'
|
||||
pattern_url_single_photo = r'https?://www\.flickr\.com/photos/[^/]+/(\d+)'
|
||||
pattern_url_gallery = r'https?://www\.flickr\.com/photos/[^/]+/galleries/(\d+)'
|
||||
pattern_url_group = r'https?://www\.flickr\.com/groups/([^/]+)'
|
||||
pattern_url_favorite = r'https?://www\.flickr\.com/photos/([^/]+)/favorites'
|
||||
|
||||
html = get_html('https://secure.flickr.com/video_playlist.gne?node_id=%s&secret=%s' % (node_id, secret))
|
||||
app = match1(html, r'APP="([^"]+)"')
|
||||
fullpath = unescape_html(match1(html, r'FULLPATH="([^"]+)"'))
|
||||
url = app + fullpath
|
||||
pattern_inline_title = r'<title>([^<]*)</title>'
|
||||
pattern_inline_api_key = r'api\.site_key\s*=\s*"([^"]+)"'
|
||||
pattern_inline_img_url = r'"url":"([^"]+)","key":"[^"]+"}}'
|
||||
pattern_inline_NSID = r'"nsid"\s*:\s*"([^"]+)"'
|
||||
pattern_inline_video_mark = r'("mediaType":"video")'
|
||||
|
||||
# (api_key, method, ext, page)
|
||||
tmpl_api_call = (
|
||||
'https://api.flickr.com/services/rest?'
|
||||
'&format=json&nojsoncallback=1'
|
||||
# UNCOMMENT FOR TESTING
|
||||
#'&per_page=5'
|
||||
'&per_page=500'
|
||||
# per_page has no effect on 'flickr.galleries.getPhotos',
# although the documentation says it should;
# that method always behaves as if per_page were 500
|
||||
'&api_key=%s'
|
||||
'&method=flickr.%s'
|
||||
'&extras=url_sq,url_q,url_t,url_s,url_n,url_m,url_z,url_c,url_l,url_h,url_k,url_o,media'
|
||||
'%s&page=%d'
|
||||
)
|
||||
|
||||
tmpl_api_call_video_info = (
|
||||
'https://api.flickr.com/services/rest?'
|
||||
'&format=json&nojsoncallback=1'
|
||||
'&method=flickr.video.getStreamInfo'
|
||||
'&api_key=%s'
|
||||
'&photo_id=%s'
|
||||
'&secret=%s'
|
||||
)
|
||||
|
||||
tmpl_api_call_photo_info = (
|
||||
'https://api.flickr.com/services/rest?'
|
||||
'&format=json&nojsoncallback=1'
|
||||
'&method=flickr.photos.getInfo'
|
||||
'&api_key=%s'
|
||||
'&photo_id=%s'
|
||||
)
|
||||
|
||||
# it looks like flickr won't return urls for all the sizes
# requested in the 'extras' field unless the request carries an acceptable header
|
||||
dummy_header = {
|
||||
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0'
|
||||
}
|
||||
def get_content_headered(url):
|
||||
return get_content(url, dummy_header)
|
||||
|
||||
def get_photoset_id(url, page):
|
||||
return match1(url, pattern_url_photoset)
|
||||
|
||||
def get_photo_id(url, page):
|
||||
return match1(url, pattern_url_single_photo)
|
||||
|
||||
def get_gallery_id(url, page):
|
||||
return match1(url, pattern_url_gallery)
|
||||
|
||||
def get_api_key(page):
|
||||
match = match1(page, pattern_inline_api_key)
|
||||
# this happens only when the url points to a gallery page
# that contains no inline api_key (and never makes xhr api calls)
# in fact, fetching the key from the homepage might be a better approach
# for getting a temporary api key, since the homepage has no place where
# a user could add custom content that would mislead the regex
|
||||
if not match:
|
||||
return match1(get_html('https://flickr.com'), pattern_inline_api_key)
|
||||
return match
|
||||
|
||||
def get_NSID(url, page):
|
||||
return match1(page, pattern_inline_NSID)
|
||||
|
||||
# [
|
||||
# (
|
||||
# regex_match_url,
|
||||
# remote_api_method,
|
||||
# additional_query_parameter_for_method,
|
||||
# parser_for_additional_parameter,
|
||||
# field_where_photourls_are_saved
|
||||
# )
|
||||
# ]
|
||||
url_patterns = [
|
||||
# www.flickr.com/photos/{username|NSID}/sets|albums/{album-id}
|
||||
(
|
||||
pattern_url_photoset,
|
||||
'photosets.getPhotos',
|
||||
'photoset_id',
|
||||
get_photoset_id,
|
||||
'photoset'
|
||||
),
|
||||
# www.flickr.com/photos/{username|NSID}/{pageN}?
|
||||
(
|
||||
pattern_url_photostream,
|
||||
# according to the flickr api documentation, this method needs to be
# authenticated in order to filter photos visible to the calling user,
# but it seems to work fine anonymously as well
|
||||
'people.getPhotos',
|
||||
'user_id',
|
||||
get_NSID,
|
||||
'photos'
|
||||
),
|
||||
# www.flickr.com/photos/{username|NSID}/galleries/{gallery-id}
|
||||
(
|
||||
pattern_url_gallery,
|
||||
'galleries.getPhotos',
|
||||
'gallery_id',
|
||||
get_gallery_id,
|
||||
'photos'
|
||||
),
|
||||
# www.flickr.com/groups/{groupname|groupNSID}/
|
||||
(
|
||||
pattern_url_group,
|
||||
'groups.pools.getPhotos',
|
||||
'group_id',
|
||||
get_NSID,
|
||||
'photos'
|
||||
),
|
||||
# www.flickr.com/photos/{username|NSID}/favorites/*
|
||||
(
|
||||
pattern_url_favorite,
|
||||
'favorites.getList',
|
||||
'user_id',
|
||||
get_NSID,
|
||||
'photos'
|
||||
)
|
||||
]
|
||||
|
||||
def flickr_download_main(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
urls = None
|
||||
size = 'o' # works for collections only
|
||||
title = None
|
||||
if 'stream_id' in kwargs:
|
||||
size = kwargs['stream_id']
|
||||
if match1(url, pattern_url_single_photo):
|
||||
url, title = get_single_photo_url(url)
|
||||
urls = [url]
|
||||
else:
|
||||
urls, title = fetch_photo_url_list(url, size)
|
||||
index = 0
|
||||
for url in urls:
|
||||
mime, ext, size = url_info(url)
|
||||
|
||||
print_info(site_info, title, mime, size)
|
||||
print_info('Flickr.com', title, mime, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge, faker=True)
|
||||
suffix = '[%d]' % index
|
||||
download_urls([url], title + suffix, ext, False, output_dir, None, False, False)
|
||||
index = index + 1
|
||||
|
||||
except: # extract images
|
||||
image = match1(page, r'<meta property="og:image" content="([^"]*)')
|
||||
ext = 'jpg'
|
||||
_, _, size = url_info(image)
|
||||
def fetch_photo_url_list(url, size):
|
||||
for pattern in url_patterns:
|
||||
# FIXME: fix multiple matching since the match group is dropped
|
||||
if match1(url, pattern[0]):
|
||||
return fetch_photo_url_list_impl(url, size, *pattern[1:])
|
||||
raise NotImplementedError('Flickr extractor is not supported for %s.' % url)
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls([image], title, ext, size, output_dir, merge=merge)
|
||||
def fetch_photo_url_list_impl(url, size, method, id_field, id_parse_func, collection_name):
|
||||
page = get_html(url)
|
||||
api_key = get_api_key(page)
|
||||
ext_field = ''
|
||||
if id_parse_func:
|
||||
ext_field = '&%s=%s' % (id_field, id_parse_func(url, page))
|
||||
page_number = 1
|
||||
urls = []
|
||||
while True:
|
||||
call_url = tmpl_api_call % (api_key, method, ext_field, page_number)
|
||||
photoset = json.loads(get_content_headered(call_url))[collection_name]
|
||||
pagen = photoset['page']
|
||||
pages = photoset['pages']
|
||||
for info in photoset['photo']:
|
||||
url = get_url_of_largest(info, api_key, size)
|
||||
urls.append(url)
|
||||
page_number = page_number + 1
|
||||
# the types of 'page' and 'pages' may differ between api methods
|
||||
if str(pagen) == str(pages):
|
||||
break
|
||||
return urls, match1(page, pattern_inline_title)
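The paging loop above can be sketched on its own. This is an illustration under stated assumptions, not the extractor itself: `fetch_page` is a hypothetical callable standing in for the JSON API request, and the early exit on an empty page is a safeguard added for this sketch that the original loop does not have.

```python
def collect_all_pages(fetch_page):
    """Gather items from every page of a paged reply.

    fetch_page(n) is a hypothetical callable returning a dict with the
    keys 'page', 'pages' and 'photo', as in the replies used above.
    """
    items = []
    page_number = 1
    while True:
        reply = fetch_page(page_number)
        items.extend(reply['photo'])
        # 'page' and 'pages' may arrive as int or str, so compare them as text.
        if str(reply['page']) == str(reply['pages']):
            break
        # Added safeguard (not in the original): stop when a page is empty.
        if not reply['photo']:
            break
        page_number += 1
    return items
```

Comparing the two counters as strings mirrors the `str(pagen) == str(pages)` check in the diff.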
|
||||
|
||||
# image size suffixes used in inline json 'key' field
|
||||
# listed in descending order
|
||||
size_suffixes = ['o', 'k', 'h', 'l', 'c', 'z', 'm', 'n', 's', 't', 'q', 'sq']
|
||||
|
||||
def get_orig_video_source(api_key, pid, secret):
|
||||
parsed = json.loads(get_content_headered(tmpl_api_call_video_info % (api_key, pid, secret)))
|
||||
for stream in parsed['streams']['stream']:
|
||||
if stream['type'] == 'orig':
|
||||
return stream['_content'].replace('\\', '')
|
||||
return None
|
||||
|
||||
def get_url_of_largest(info, api_key, size):
|
||||
if info['media'] == 'photo':
|
||||
sizes = size_suffixes
|
||||
if size in sizes:
|
||||
sizes = sizes[sizes.index(size):]
|
||||
for suffix in sizes:
|
||||
if 'url_' + suffix in info:
|
||||
return info['url_' + suffix].replace('\\', '')
|
||||
return None
|
||||
else:
|
||||
return get_orig_video_source(api_key, info['id'], info['secret'])
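The size fallback for photos can be tried without any network access. The sketch below follows the logic shown above; `pick_photo_url` is a hypothetical name, and the sample URLs in the usage note are made up for illustration.

```python
# Size suffixes in descending order, as listed in the diff.
SIZE_SUFFIXES = ['o', 'k', 'h', 'l', 'c', 'z', 'm', 'n', 's', 't', 'q', 'sq']


def pick_photo_url(info, size='o'):
    """Return the largest available URL no larger than the requested size (hypothetical helper)."""
    candidates = SIZE_SUFFIXES
    if size in candidates:
        # Start at the requested size and fall back to smaller ones.
        candidates = candidates[candidates.index(size):]
    for suffix in candidates:
        key = 'url_' + suffix
        if key in info:
            # Strip the backslashes left over from JSON-escaped slashes.
            return info[key].replace('\\', '')
    return None
```

For an `info` dict that only carries `url_l` and `url_m`, the default request for `'o'` falls through to `url_l`, while a request for `'c'` falls through to `url_m`.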
|
||||
|
||||
def get_single_photo_url(url):
|
||||
page = get_html(url)
|
||||
pid = get_photo_id(url, page)
|
||||
title = match1(page, pattern_inline_title)
|
||||
if match1(page, pattern_inline_video_mark):
|
||||
api_key = get_api_key(page)
|
||||
reply = get_content(tmpl_api_call_photo_info % (api_key, get_photo_id(url, page)))
|
||||
secret = json.loads(reply)['photo']['secret']
|
||||
return get_orig_video_source(api_key, pid, secret), title
|
||||
# the last match always has the best resolution
|
||||
match = match1(page, pattern_inline_img_url)
|
||||
return 'https:' + match.replace('\\', ''), title
|
||||
|
||||
site_info = "Flickr.com"
|
||||
download = flickr_download
|
||||
download_playlist = playlist_not_supported('flickr')
|
||||
download = flickr_download_main
|
||||
download_playlist = playlist_not_supported('flickr')
|
||||
|
@ -1,150 +1,223 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
import json
|
||||
import urllib.parse
|
||||
import base64
|
||||
import binascii
|
||||
import re
|
||||
|
||||
from ..extractors import VideoExtractor
|
||||
from ..util import log
|
||||
from ..common import get_content, playlist_not_supported
|
||||
|
||||
__all__ = ['funshion_download']
|
||||
|
||||
from ..common import *
|
||||
import urllib.error
|
||||
import json
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
""""""
|
||||
if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url): #single video
|
||||
funshion_download_by_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif re.match(r'http://www.fun.tv/vplay/.*g-(\w+)', url): #whole drama
|
||||
funshion_download_by_drama_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
class KBaseMapping:
|
||||
def __init__(self, base=62):
|
||||
self.base = base
|
||||
mapping_table = [str(num) for num in range(10)]
|
||||
for i in range(26):
|
||||
mapping_table.append(chr(i + ord('a')))
|
||||
for i in range(26):
|
||||
mapping_table.append(chr(i + ord('A')))
|
||||
|
||||
self.mapping_table = mapping_table[:self.base]
|
||||
|
||||
def mapping(self, num):
|
||||
res = []
|
||||
while num > 0:
|
||||
res.append(self.mapping_table[num % self.base])
|
||||
num = num // self.base
|
||||
return ''.join(res[::-1])
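The digit mapping above is ordinary positional base conversion over the alphabet `0-9`, `a-z`, `A-Z`. A minimal standalone sketch follows; `to_base` is a hypothetical name, and unlike the class above it returns `'0'` for zero instead of an empty string, which is a deliberate difference in this sketch.

```python
import string

# Same digit order as the mapping table above: 0-9, then a-z, then A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base(num, base=62):
    """Render a non-negative integer in the given base (hypothetical helper)."""
    if not 2 <= base <= len(ALPHABET):
        raise ValueError('base must be between 2 and %d' % len(ALPHABET))
    if num == 0:
        return ALPHABET[0]  # differs from the class above, which yields ''
    digits = []
    while num > 0:
        num, remainder = divmod(num, base)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first.
    return ''.join(reversed(digits))
```

This is the kind of symbol table typically used by packed JavaScript, which is why the extractor needs it to map packed symbols back to names.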
|
||||
|
||||
|
||||
class Funshion(VideoExtractor):
|
||||
name = "funshion"
|
||||
stream_types = [
|
||||
{'id': 'sdvd'},
|
||||
{'id': 'sdvd_h265'},
|
||||
{'id': 'hd'},
|
||||
{'id': 'hd_h265'},
|
||||
{'id': 'dvd'},
|
||||
{'id': 'dvd_h265'},
|
||||
{'id': 'tv'},
|
||||
{'id': 'tv_h265'}
|
||||
]
|
||||
a_mobile_url = 'http://m.fun.tv/implay/?mid=302555'
|
||||
video_ep = 'http://pv.funshion.com/v7/video/play/?id={}&cl=mweb&uc=111'
|
||||
media_ep = 'http://pm.funshion.com/v7/media/play/?id={}&cl=mweb&uc=111'
|
||||
coeff = None
|
||||
|
||||
@classmethod
|
||||
def fetch_magic(cls, url):
|
||||
def search_dict(a_dict, target):
|
||||
for key, val in a_dict.items():
|
||||
if val == target:
|
||||
return key
|
||||
|
||||
magic_list = []
|
||||
|
||||
page = get_content(url)
|
||||
src = re.findall(r'src="(.+?)"', page)
|
||||
js = [path for path in src if path.endswith('.js')]
|
||||
|
||||
host = 'http://' + urllib.parse.urlparse(url).netloc
|
||||
js_path = [urllib.parse.urljoin(host, rel_path) for rel_path in js]
|
||||
|
||||
for p in js_path:
|
||||
if 'mtool' in p or 'mcore' in p:
|
||||
js_text = get_content(p)
|
||||
hit = re.search(r'\(\'(.+?)\',(\d+),(\d+),\'(.+?)\'\.split\(\'\|\'\),\d+,\{\}\)', js_text)
|
||||
|
||||
code = hit.group(1)
|
||||
base = hit.group(2)
|
||||
size = hit.group(3)
|
||||
names = hit.group(4).split('|')
|
||||
|
||||
mapping = KBaseMapping(base=int(base))
|
||||
sym_to_name = {}
|
||||
for no in range(int(size), 0, -1):
|
||||
no_in_base = mapping.mapping(no)
|
||||
val = names[no] if no < len(names) and names[no] else no_in_base
|
||||
sym_to_name[no_in_base] = val
|
||||
|
||||
moz_ec_name = search_dict(sym_to_name, 'mozEcName')
|
||||
push = search_dict(sym_to_name, 'push')
|
||||
patt = r'{}\.{}\("(.+?)"\)'.format(moz_ec_name, push)
|
||||
ec_list = re.findall(patt, code)
|
||||
magic_list.extend(sym_to_name[ec] for ec in ec_list)
|
||||
return magic_list
|
||||
|
||||
@classmethod
|
||||
def get_coeff(cls, magic_list):
|
||||
magic_set = set(magic_list)
|
||||
no_dup = []
|
||||
for item in magic_list:
|
||||
if item in magic_set:
|
||||
magic_set.remove(item)
|
||||
no_dup.append(item)
|
||||
# really necessary?
|
||||
|
||||
coeff = [0, 0, 0, 0]
|
||||
for num_pair in no_dup:
|
||||
idx = int(num_pair[-1])
|
||||
val = int(num_pair[:-1], 16)
|
||||
coeff[idx] = val
|
||||
|
||||
return coeff
|
||||
|
||||
@classmethod
|
||||
def funshion_decrypt(cls, a_bytes, coeff):
|
||||
res_list = []
|
||||
pos = 0
|
||||
while pos < len(a_bytes):
|
||||
a = a_bytes[pos]
|
||||
if pos == len(a_bytes) - 1:
|
||||
res_list.append(a)
|
||||
pos += 1
|
||||
else:
|
||||
return
|
||||
b = a_bytes[pos + 1]
|
||||
m = a * coeff[0] + b * coeff[2]
|
||||
n = a * coeff[1] + b * coeff[3]
|
||||
res_list.append(m & 0xff)
|
||||
res_list.append(n & 0xff)
|
||||
pos += 2
|
||||
return bytes(res_list).decode('utf8')
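The byte transform above treats each pair of bytes as a vector and multiplies it by a 2x2 coefficient matrix modulo 256, passing a trailing odd byte through unchanged. The sketch below reproduces only that arithmetic; `transform_pairs` is a hypothetical name, it returns raw bytes rather than decoding to UTF-8, and the coefficient values in the usage note are illustrative, not the ones the site uses.

```python
def transform_pairs(data, coeff):
    """Apply a 2x2 linear map modulo 256 to consecutive byte pairs (hypothetical helper)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        a = data[pos]
        if pos == len(data) - 1:
            # A trailing unpaired byte is copied as-is.
            out.append(a)
            pos += 1
        else:
            b = data[pos + 1]
            out.append((a * coeff[0] + b * coeff[2]) & 0xff)
            out.append((a * coeff[1] + b * coeff[3]) & 0xff)
            pos += 2
    return bytes(out)
```

With the identity coefficients `[1, 0, 0, 1]` the input comes back unchanged, and `[0, 1, 1, 0]` swaps the two bytes of each pair, which makes the layout of `coeff` easy to check by hand.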
|
||||
|
||||
# Logic for single videos (up to the drama section)
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_download_by_url(url, output_dir = '.', merge = False, info_only = False):
|
||||
"""lots of stuff->None
|
||||
Main wrapper for single video download.
|
||||
"""
|
||||
@classmethod
|
||||
def funshion_decrypt_str(cls, a_str, coeff):
|
||||
# r'.{27}0' pattern, untested
|
||||
if len(a_str) == 28 and a_str[-1] == '0':
|
||||
data_bytes = base64.b64decode(a_str[:27] + '=')
|
||||
clear = cls.funshion_decrypt(data_bytes, coeff)
|
||||
return binascii.hexlify(clear.encode('utf8')).upper()
|
||||
|
||||
data_bytes = base64.b64decode(a_str[2:])
|
||||
return cls.funshion_decrypt(data_bytes, coeff)
|
||||
|
||||
@classmethod
|
||||
def checksum(cls, sha1_str):
|
||||
if len(sha1_str) != 41:
|
||||
return False
|
||||
if not re.match(r'[0-9A-Za-z]{41}', sha1_str):
|
||||
return False
|
||||
sha1 = sha1_str[:-1]
|
||||
if (15 & sum([int(char, 16) for char in sha1])) == int(sha1_str[-1], 16):
|
||||
return True
|
||||
return False
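The check above treats the decrypted string as 40 hexadecimal digits followed by one check digit, which must equal the sum of the 40 digits modulo 16. A minimal sketch of that rule follows; `has_valid_check_digit` is a hypothetical name, and restricting the input to hexadecimal characters is an assumption of this sketch (the pattern above also admits other letters, which `int(char, 16)` would then reject with an exception).

```python
import re


def has_valid_check_digit(text):
    """Validate 40 hex digits plus a trailing check digit (hypothetical helper)."""
    # Assumption of this sketch: only hexadecimal characters are accepted.
    if not re.fullmatch(r'[0-9A-Fa-f]{41}', text):
        return False
    digest, check = text[:-1], text[-1]
    # The check digit is the low four bits of the digit sum.
    return (sum(int(ch, 16) for ch in digest) & 15) == int(check, 16)
```

For example, forty `a` digits sum to 400, whose low four bits are 0, so the only valid check digit for that input is `0`.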
|
||||
|
||||
@classmethod
|
||||
def get_cdninfo(cls, hashid):
|
||||
url = 'http://jobsfe.funshion.com/query/v1/mp4/{}.json'.format(hashid)
|
||||
meta = json.loads(get_content(url, decoded=False).decode('utf8'))
|
||||
return meta['playlist'][0]['urls']
|
||||
|
||||
@classmethod
|
||||
def dec_playinfo(cls, info, coeff):
|
||||
res = None
|
||||
clear = cls.funshion_decrypt_str(info['infohash'], coeff)
|
||||
if cls.checksum(clear):
|
||||
res = dict(hashid=clear[:40], token=cls.funshion_decrypt_str(info['token'], coeff))
|
||||
else:
|
||||
clear = cls.funshion_decrypt_str(info['infohash_prev'], coeff)
|
||||
if cls.checksum(clear):
|
||||
res = dict(hashid=clear[:40], token=cls.funshion_decrypt_str(info['token_prev'], coeff))
|
||||
return res
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
if self.__class__.coeff is None:
|
||||
magic_list = self.__class__.fetch_magic(self.__class__.a_mobile_url)
|
||||
self.__class__.coeff = self.__class__.get_coeff(magic_list)
|
||||
|
||||
if 'title' not in kwargs:
|
||||
url = 'http://pv.funshion.com/v5/video/profile/?id={}&cl=mweb&uc=111'.format(self.vid)
|
||||
meta = json.loads(get_content(url))
|
||||
self.title = meta['name']
|
||||
else:
|
||||
self.title = kwargs['title']
|
||||
|
||||
ep_url = self.__class__.video_ep if 'single_video' in kwargs else self.__class__.media_ep
|
||||
|
||||
url = ep_url.format(self.vid)
|
||||
meta = json.loads(get_content(url))
|
||||
streams = meta['playlist']
|
||||
for stream in streams:
|
||||
definition = stream['code']
|
||||
for s in stream['playinfo']:
|
||||
codec = 'h' + s['codec'][2:]
|
||||
# h.264 -> h264
|
||||
for st in self.__class__.stream_types:
|
||||
s_id = '{}_{}'.format(definition, codec)
|
||||
if codec == 'h264':
|
||||
s_id = definition
|
||||
if s_id == st['id']:
|
||||
clear_info = self.__class__.dec_playinfo(s, self.__class__.coeff)
|
||||
cdn_list = self.__class__.get_cdninfo(clear_info['hashid'])
|
||||
base_url = cdn_list[0]
|
||||
vf = urllib.parse.quote(s['vf'])
|
||||
video_size = int(s['filesize'])
|
||||
token = urllib.parse.quote(base64.b64encode(clear_info['token'].encode('utf8')))
|
||||
video_url = '{}?token={}&vf={}'.format(base_url, token, vf)
|
||||
self.streams[s_id] = dict(size=video_size, src=[video_url], container='mp4')
|
||||
|
||||
|
||||
def funshion_download(url, **kwargs):
|
||||
if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url):
|
||||
match = re.search(r'http://www.fun.tv/vplay/v-(\d+)(.?)', url)
|
||||
vid = match.group(1)
|
||||
funshion_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
vid = re.search(r'http://www.fun.tv/vplay/v-(\w+)', url).group(1)
|
||||
Funshion().download_by_vid(vid, single_video=True, **kwargs)
|
||||
elif re.match(r'http://www.fun.tv/vplay/.*g-(\w+)', url):
|
||||
epid = re.search(r'http://www.fun.tv/vplay/.*g-(\w+)', url).group(1)
|
||||
url = 'http://pm.funshion.com/v5/media/episode?id={}&cl=mweb&uc=111'.format(epid)
|
||||
meta = json.loads(get_content(url))
|
||||
drama_name = meta['name']
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_download_by_vid(vid, output_dir = '.', merge = False, info_only = False):
|
||||
"""vid->None
|
||||
Secondary wrapper for single video download.
|
||||
"""
|
||||
title = funshion_get_title_by_vid(vid)
|
||||
url_list = funshion_vid_to_urls(vid)
|
||||
|
||||
for url in url_list:
|
||||
type, ext, size = url_info(url)
|
||||
print_info(site_info, title, type, size)
|
||||
|
||||
if not info_only:
|
||||
download_urls(url_list, title, ext, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_get_title_by_vid(vid):
|
||||
"""vid->str
|
||||
Single video vid to title."""
|
||||
html = get_content('http://pv.funshion.com/v5/video/profile?id={vid}&cl=aphone&uc=5'.format(vid = vid))
|
||||
c = json.loads(html)
|
||||
return c['name']
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_vid_to_urls(vid):
|
||||
"""str->str
|
||||
Select one resolution for single video download."""
|
||||
html = get_content('http://pv.funshion.com/v5/video/play/?id={vid}&cl=aphone&uc=5'.format(vid = vid))
|
||||
return select_url_from_video_api(html)
|
||||
|
||||
# Logic for dramas (up to the helper functions)
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_download_by_drama_url(url, output_dir = '.', merge = False, info_only = False):
|
||||
"""str->None
|
||||
url = 'http://www.fun.tv/vplay/g-95785/'
|
||||
"""
|
||||
id = r1(r'http://www.fun.tv/vplay/.*g-(\d+)', url)
|
||||
video_list = funshion_drama_id_to_vid(id)
|
||||
|
||||
for video in video_list:
|
||||
funshion_download_by_id((video[0], id), output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
# id is the drama id; these vids are not the same as the ones used for single videos
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_download_by_id(vid_id_tuple, output_dir = '.', merge = False, info_only = False):
|
||||
"""single_episode_id, drama_id->None
|
||||
Secondary wrapper for single drama video download.
|
||||
"""
|
||||
(vid, id) = vid_id_tuple
|
||||
title = funshion_get_title_by_id(vid, id)
|
||||
url_list = funshion_id_to_urls(vid)
|
||||
|
||||
for url in url_list:
|
||||
type, ext, size = url_info(url)
|
||||
print_info(site_info, title, type, size)
|
||||
|
||||
if not info_only:
|
||||
download_urls(url_list, title, ext, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_drama_id_to_vid(episode_id):
|
||||
"""int->[(int,int),...]
|
||||
id: 95785
|
||||
->[('626464', '1'), ('626466', '2'), ('626468', '3'),...
|
||||
Drama ID to vids used in drama.
|
||||
|
||||
**THIS VID IS NOT THE SAME AS THE ONES USED FOR SINGLE VIDEOS!!**
|
||||
"""
|
||||
html = get_content('http://pm.funshion.com/v5/media/episode?id={episode_id}&cl=aphone&uc=5'.format(episode_id = episode_id))
|
||||
c = json.loads(html)
|
||||
#{'definition': [{'name': '流畅', 'code': 'tv'}, {'name': '标清', 'code': 'dvd'}, {'name': '高清', 'code': 'hd'}], 'retmsg': 'ok', 'total': '32', 'sort': '1', 'prevues': [], 'retcode': '200', 'cid': '2', 'template': 'grid', 'episodes': [{'num': '1', 'id': '624728', 'still': None, 'name': '第1集', 'duration': '45:55'}, ], 'name': '太行山上', 'share': 'http://pm.funshion.com/v5/media/share?id=201554&num=', 'media': '201554'}
|
||||
return [(i['id'], i['num']) for i in c['episodes']]
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_id_to_urls(id):
|
||||
"""int->list of URL
|
||||
Select video URL for single drama video.
|
||||
"""
|
||||
html = get_content('http://pm.funshion.com/v5/media/play/?id={id}&cl=aphone&uc=5'.format(id = id))
|
||||
return select_url_from_video_api(html)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def funshion_get_title_by_id(single_episode_id, drama_id):
|
||||
"""single_episode_id, drama_id->str
|
||||
This is for full drama.
|
||||
Get title for single drama video."""
|
||||
html = get_content('http://pm.funshion.com/v5/media/episode?id={id}&cl=aphone&uc=5'.format(id = drama_id))
|
||||
c = json.loads(html)
|
||||
|
||||
for i in c['episodes']:
|
||||
if i['id'] == str(single_episode_id):
|
||||
return c['name'] + ' - ' + i['name']
|
||||
|
||||
# Helper functions.
|
||||
#----------------------------------------------------------------------
|
||||
def select_url_from_video_api(html):
|
||||
"""str(html)->str(url)
|
||||
|
||||
Choose the best one.
|
||||
|
||||
Used in both single and drama download.
|
||||
|
||||
code definition:
|
||||
{'tv': 'liuchang',
|
||||
'dvd': 'biaoqing',
|
||||
'hd': 'gaoqing',
|
||||
'sdvd': 'chaoqing'}"""
|
||||
c = json.loads(html)
|
||||
#{'retmsg': 'ok', 'retcode': '200', 'selected': 'tv', 'mp4': [{'filename': '', 'http': 'http://jobsfe.funshion.com/query/v1/mp4/7FCD71C58EBD4336DF99787A63045A8F3016EC51.json', 'filesize': '96748671', 'code': 'tv', 'name': '流畅', 'infohash': '7FCD71C58EBD4336DF99787A63045A8F3016EC51'}...], 'episode': '626464'}
|
||||
video_dic = {}
|
||||
for i in c['mp4']:
|
||||
video_dic[i['code']] = i['http']
|
||||
quality_preference_list = ['sdvd', 'hd', 'dvd', 'sd']
|
||||
url = [video_dic[quality] for quality in quality_preference_list if quality in video_dic][0]
|
||||
html = get_html(url)
|
||||
c = json.loads(html)
|
||||
#'{"return":"succ","client":{"ip":"107.191.**.**","sp":"0","loc":"0"},"playlist":[{"bits":"1638400","tname":"dvd","size":"555811243","urls":["http:\\/\\/61.155.217.4:80\\/play\\/1E070CE31DAA1373B667FD23AA5397C192CA6F7F.mp4",...]}]}'
|
||||
return [i['urls'][0] for i in c['playlist']]
|
||||
extractor = Funshion()
|
||||
for ep in meta['episodes']:
|
||||
title = '{}_{}_{}'.format(drama_name, ep['num'], ep['name'])
|
||||
extractor.download_by_vid(ep['id'], title=title, **kwargs)
|
||||
else:
|
||||
log.wtf('Unknown url pattern')
|
||||
|
||||
site_info = "funshion"
|
||||
download = funshion_download
|
||||
|
33
src/you_get/extractors/giphy.py
Normal file
@ -0,0 +1,33 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['giphy_download']
|
||||
|
||||
from ..common import *
|
||||
import json
|
||||
|
||||
def giphy_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url)
|
||||
|
||||
url = list(set([
|
||||
unicodize(str.replace(i, '\\/', '/'))
|
||||
for i in re.findall(r'<meta property="og:video:secure_url" content="(.*?)">', html)
|
||||
]))
|
||||
|
||||
title = r1(r'<meta property="og:title" content="(.*?)">', html)
|
||||
|
||||
if title is None:
|
||||
title = url[0]
|
||||
|
||||
type, ext, size = url_info(url[0], True)
|
||||
size = urls_size(url)
|
||||
|
||||
type = "video/mp4"
|
||||
ext = "mp4"
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls(url, title, ext, size, output_dir, merge=False)
|
||||
|
||||
site_info = "Giphy.com"
|
||||
download = giphy_download
|
||||
download_playlist = playlist_not_supported('giphy')
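The Open Graph lookup in the new extractor can be exercised on a static string. The sketch below is an illustration only: `og_video_urls` is a hypothetical name, and it removes duplicates with `dict.fromkeys` so that the first URL on the page stays first, whereas the `list(set(...))` form above gives no ordering guarantee.

```python
import re

# Same meta tag pattern as in the extractor above.
OG_VIDEO = re.compile(r'<meta property="og:video:secure_url" content="(.*?)">')


def og_video_urls(html):
    """Return the distinct og:video:secure_url values in page order (hypothetical helper)."""
    found = (url.replace('\\/', '/') for url in OG_VIDEO.findall(html))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))
```

Keeping the order stable matters when only the first element is used for the title fallback and for `url_info`.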
|
@ -51,7 +51,7 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
|
||||
# attempt to extract images first
|
||||
# TBD: posts with > 4 images
|
||||
# TBD: album links
|
||||
html = get_html(parse.unquote(url))
|
||||
html = get_html(parse.unquote(url), faker=True)
|
||||
real_urls = []
|
||||
for src in re.findall(r'src="([^"]+)"[^>]*itemprop="image"', html):
|
||||
t = src.split('/')
|
||||
@ -59,14 +59,15 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
|
||||
u = '/'.join(t)
|
||||
real_urls.append(u)
|
||||
if not real_urls:
|
||||
real_urls = [r1(r'<meta property="og:image" content="([^"]+)', html)]
|
||||
post_date = r1(r'"(20\d\d-[01]\d-[0123]\d)"', html)
|
||||
real_urls = re.findall(r'<meta property="og:image" content="([^"]+)', html)
|
||||
real_urls = [re.sub(r'w\d+-h\d+-p', 's0', u) for u in real_urls]
|
||||
post_date = r1(r'"?(20\d\d[-/]?[01]\d[-/]?[0123]\d)"?', html)
|
||||
post_id = r1(r'/posts/([^"]+)', html)
|
||||
title = post_date + "_" + post_id
|
||||
|
||||
try:
|
||||
url = "https://plus.google.com/" + r1(r'"(photos/\d+/albums/\d+/\d+)', html)
|
||||
html = get_html(url)
|
||||
url = "https://plus.google.com/" + r1(r'(photos/\d+/albums/\d+/\d+)\?authkey', html)
|
||||
html = get_html(url, faker=True)
|
||||
temp = re.findall(r'\[(\d+),\d+,\d+,"([^"]+)"\]', html)
|
||||
temp = sorted(temp, key = lambda x : fmt_level[x[0]])
|
||||
urls = [unicodize(i[1]) for i in temp if i[0] == temp[0][0]]
|
||||
@ -77,7 +78,7 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
|
||||
post_author = r1(r'/\+([^/]+)/posts', post_url)
|
||||
if post_author:
|
||||
post_url = "https://plus.google.com/+%s/posts/%s" % (parse.quote(post_author), r1(r'posts/(.+)', post_url))
|
||||
post_html = get_html(post_url)
|
||||
post_html = get_html(post_url, faker=True)
|
||||
title = r1(r'<title[^>]*>([^<\n]+)', post_html)
|
||||
|
||||
if title is None:
|
||||
@ -98,20 +99,34 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
|
||||
|
||||
elif service in ['docs', 'drive'] : # Google Docs
|
||||
|
||||
html = get_html(url)
|
||||
html = get_content(url, headers=fake_headers)
|
||||
|
||||
title = r1(r'"title":"([^"]*)"', html) or r1(r'<meta itemprop="name" content="([^"]*)"', html)
|
||||
if len(title.split('.')) > 1:
|
||||
title = ".".join(title.split('.')[:-1])
|
||||
|
||||
docid = r1(r'"docid":"([^"]*)"', html)
|
||||
docid = r1('/file/d/([^/]+)', url)
|
||||
|
||||
request.install_opener(request.build_opener(request.HTTPCookieProcessor()))
|
||||
|
||||
request.urlopen(request.Request("https://docs.google.com/uc?id=%s&export=download" % docid))
|
||||
real_url ="https://docs.google.com/uc?export=download&confirm=no_antivirus&id=%s" % docid
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
real_url = "https://docs.google.com/uc?export=download&confirm=no_antivirus&id=%s" % docid
|
||||
redirected_url = get_location(real_url)
|
||||
if real_url != redirected_url:
|
||||
# tiny file - get real url here
|
||||
type, ext, size = url_info(redirected_url)
|
||||
real_url = redirected_url
|
||||
else:
|
||||
# huge file - real_url is a confirmation page that contains the real url
|
||||
confirm_page = get_content(real_url)
|
||||
hrefs = re.findall(r'href="(.+?)"', confirm_page)
|
||||
for u in hrefs:
|
||||
if u.startswith('/uc?export=download'):
|
||||
rel = unescape_html(u)
|
||||
confirm_url = 'https://docs.google.com' + rel
|
||||
real_url = get_location(confirm_url)
|
||||
_, ext, size = url_info(real_url, headers=fake_headers)
|
||||
if size is None:
|
||||
size = 0
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
|
@ -1,85 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import math
|
||||
import traceback
|
||||
import urllib.parse as urlparse
|
||||
|
||||
from ..common import *
|
||||
|
||||
__all__ = ['huaban_download']
|
||||
|
||||
site_info = '花瓣 (Huaban)'
|
||||
|
||||
LIMIT = 100
|
||||
|
||||
|
||||
class Board:
|
||||
def __init__(self, title, pins):
|
||||
self.title = title
|
||||
self.pins = pins
|
||||
self.pin_count = len(pins)
|
||||
|
||||
|
||||
class Pin:
|
||||
host = 'http://img.hb.aicdn.com/'
|
||||
|
||||
def __init__(self, pin_json):
|
||||
img_file = pin_json['file']
|
||||
self.id = str(pin_json['pin_id'])
|
||||
self.url = urlparse.urljoin(self.host, img_file['key'])
|
||||
self.ext = img_file['type'].split('/')[-1]
|
||||
|
||||
|
||||
def construct_url(url, **params):
|
||||
param_str = urlparse.urlencode(params)
|
||||
return url + '?' + param_str
|
||||
|
||||
|
||||
def extract_json_data(url, **params):
|
||||
url = construct_url(url, **params)
|
||||
html = get_content(url, headers=fake_headers)
|
||||
json_string = match1(html, r'app.page\["board"\] = (.*?});')
|
||||
json_data = json.loads(json_string)
|
||||
return json_data
|
||||
|
||||
|
||||
def extract_board_data(url):
|
||||
json_data = extract_json_data(url, limit=LIMIT)
|
||||
pin_list = json_data['pins']
|
||||
title = json_data['title']
|
||||
pin_count = json_data['pin_count']
|
||||
pin_count -= len(pin_list)
|
||||
|
||||
while pin_count > 0:
|
||||
json_data = extract_json_data(url, max=pin_list[-1]['pin_id'],
|
||||
limit=LIMIT)
|
||||
pins = json_data['pins']
|
||||
pin_list += pins
|
||||
pin_count -= len(pins)
|
||||
|
||||
return Board(title, list(map(Pin, pin_list)))
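The paging loop in `extract_board_data` can be sketched in isolation as follows. This is a minimal sketch, not the extractor's own code: `collect_pins`, `fetch_page` and `fake_board` are hypothetical names, `fetch_page` stands in for `extract_json_data`, and the empty-page guard is an addition that the original loop does not have.

```python
def collect_pins(fetch_page, limit=100):
    """Gather every pin of a board, one page at a time.

    fetch_page is a hypothetical callable standing in for extract_json_data;
    it must return a dict with 'pins' and 'pin_count' keys.
    """
    data = fetch_page(limit=limit)
    pins = list(data['pins'])
    remaining = data['pin_count'] - len(pins)
    while remaining > 0:
        # Ask for the page that follows the last pin seen so far.
        batch = fetch_page(max=pins[-1]['pin_id'], limit=limit)['pins']
        if not batch:
            # Assumption: an empty page means the reported count was stale.
            break
        pins += batch
        remaining -= len(batch)
    return pins


def fake_board(total):
    """In-memory stand-in for the site: pin ids run from total down to 1."""
    all_pins = [{'pin_id': i} for i in range(total, 0, -1)]

    def fetch_page(limit, max=None):
        older = [p for p in all_pins if max is None or p['pin_id'] < max]
        return {'pins': older[:limit], 'pin_count': total}

    return fetch_page
```

Without the guard, a board whose reported `pin_count` is larger than the number of pins actually served would keep the loop running indefinitely.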
|
||||
|
||||
|
||||
def huaban_download_board(url, output_dir, **kwargs):
|
||||
kwargs['merge'] = False
|
||||
board = extract_board_data(url)
|
||||
output_dir = os.path.join(output_dir, board.title)
|
||||
print_info(site_info, board.title, 'jpg', float('Inf'))
|
||||
for pin in board.pins:
|
||||
download_urls([pin.url], pin.id, pin.ext, float('Inf'),
|
||||
output_dir=output_dir, faker=True, **kwargs)
|
||||
|
||||
|
||||
def huaban_download(url, output_dir='.', **kwargs):
|
||||
if re.match(r'http://huaban\.com/boards/\d+/', url):
|
||||
huaban_download_board(url, output_dir, **kwargs)
|
||||
else:
|
||||
print('Only board (画板) pages are supported currently')
|
||||
print('ex: http://huaban.com/boards/12345678/')
|
||||
|
||||
|
||||
download = huaban_download
|
||||
download_playlist = playlist_not_supported("huaban")
|
@ -6,7 +6,7 @@ from ..common import *
|
||||
|
||||
|
||||
def get_mobile_room_url(room_id):
|
||||
return 'http://www.huomao.com/mobile/mob_live?cid=%s' % room_id
|
||||
return 'http://www.huomao.com/mobile/mob_live/%s' % room_id
|
||||
|
||||
|
||||
def get_m3u8_url(stream_id):
|
||||
|
396
src/you_get/extractors/icourses.py
Normal file
@ -0,0 +1,396 @@
|
||||
#!/usr/bin/env python
|
||||
from ..common import *
|
||||
from urllib import parse, error
|
||||
import random
|
||||
from time import sleep
|
||||
import datetime
|
||||
import hashlib
|
||||
import base64
|
||||
import logging
|
||||
import re
|
||||
from xml.dom.minidom import parseString
|
||||
|
||||
__all__ = ['icourses_download', 'icourses_playlist_download']
|
||||
|
||||
|
||||
def icourses_download(url, output_dir='.', **kwargs):
|
||||
if 'showResDetail.action' in url:
|
||||
hit = re.search(r'id=(\d+)&courseId=(\d+)', url)
|
||||
url = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}'.format(hit.group(1), hit.group(2))
|
||||
if re.match(r'http://www.icourses.cn/coursestatic/course_(\d+).html', url):
|
||||
raise Exception('This is a course index page; download it with the -l (playlist) flag')
|
||||
icourses_parser = ICousesExactor(url=url)
|
||||
icourses_parser.basic_extract()
|
||||
title = icourses_parser.title
|
||||
size = None
|
||||
for i in range(5):
|
||||
try:
|
||||
# use this url only for size
|
||||
size_url = icourses_parser.generate_url(0)
|
||||
_, type_, size = url_info(size_url, headers=fake_headers)
|
||||
except error.HTTPError:
|
||||
logging.warning('Failed to fetch the video file! Retrying...')
|
||||
sleep(random.Random().randint(2, 5))  # random pause so the server does not block us
|
||||
else:
|
||||
print_info(site_info, title, type_, size)
|
||||
break
|
||||
|
||||
if size is None:
|
||||
raise Exception('Failed to fetch the video file after 5 attempts')
|
||||
|
||||
if not kwargs['info_only']:
|
||||
real_url = icourses_parser.update_url(0)
|
||||
headers = fake_headers.copy()
|
||||
headers['Referer'] = url
|
||||
download_urls_icourses(real_url, title, 'flv',total_size=size, output_dir=output_dir, max_size=15728640, dyn_callback=icourses_parser.update_url)
|
||||
return
|
||||
|
||||
|
||||
def get_course_title(url, course_type, page=None):
|
||||
if page is None:
|
||||
try:
|
||||
# a shared course page may be GBK-encoded even though it declares charset="utf-8"
|
||||
page = get_content(url, decoded=False).decode('gbk')
|
||||
except UnicodeDecodeError:
|
||||
page = get_content(url, decoded=False).decode('utf8')
|
||||
|
||||
if course_type == 'shared_old':
|
||||
patt = r'<div\s+class="top_left_til">(.+?)<\/div>'
|
||||
elif course_type == 'shared_new':
|
||||
patt = r'<h1>(.+?)<\/h1>'
|
||||
else:
|
||||
patt = r'<div\s+class="con">(.+?)<\/div>'
|
||||
|
||||
return re.search(patt, page).group(1)
|
||||
|
||||
|
||||
def public_course_playlist(url, page=None):
|
||||
host = 'http://www.icourses.cn/'
|
||||
patt = r'<a href="(.+?)"\s*title="(.+?)".+?>(?:.|\n)+?</a>'
|
||||
|
||||
if page is None:
|
||||
page = get_content(url)
|
||||
playlist = re.findall(patt, page)
|
||||
return [(host+i[0], i[1]) for i in playlist]
|
||||
|
||||
|
||||
def public_course_get_title(url, page=None):
|
||||
patt = r'<div\s*class="kcslbut">.+?第(\d+)讲'
|
||||
|
||||
if page is None:
|
||||
page = get_content(url)
|
||||
seq_num = int(re.search(patt, page).group(1)) - 1
|
||||
course_main_title = get_course_title(url, 'public', page)
|
||||
return '{}_第{}讲_{}'.format(course_main_title, seq_num+1, public_course_playlist(url, page)[seq_num][1])
|
||||
|
||||
|
||||
def icourses_playlist_download(url, output_dir='.', **kwargs):
|
||||
page_type_patt = r'showSectionNode\(this,(\d+),(\d+)\)'
|
||||
resid_courseid_patt = r'changeforvideo\(\'(\d+)\',\'(\d+)\',\'(\d+)\'\)'
|
||||
ep = 'http://www.icourses.cn/jpk/viewCharacterDetail.action?sectionId={}&courseId={}'
|
||||
change_for_video_ip = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}'
|
||||
video_list = []
|
||||
|
||||
if 'viewVCourse' in url:
|
||||
playlist = public_course_playlist(url)
|
||||
for video in playlist:
|
||||
icourses_download(video[0], output_dir=output_dir, **kwargs)
|
||||
return
|
||||
elif 'coursestatic' in url:
|
||||
course_page = get_content(url)
|
||||
page_navi_vars = re.search(page_type_patt, course_page)
|
||||
|
||||
if page_navi_vars is None: # type 2 shared course
|
||||
video_list = icourses_playlist_new(url, course_page)
|
||||
else: # type 1 shared course
|
||||
sec_page = get_content(ep.format(page_navi_vars.group(2), page_navi_vars.group(1)))
|
||||
video_list = re.findall(resid_courseid_patt, sec_page)
|
||||
elif 'viewCharacterDetail.action' in url or 'changeforVideo.action' in url:
|
||||
page = get_content(url)
|
||||
video_list = re.findall(resid_courseid_patt, page)
|
||||
|
||||
if not video_list:
|
||||
raise Exception('Unknown url pattern')
|
||||
|
||||
for video in video_list:
|
||||
video_url = change_for_video_ip.format(video[0], video[1])
|
||||
sleep(random.Random().randint(0, 5))  # random pause so the server does not block us
|
||||
icourses_download(video_url, output_dir=output_dir, **kwargs)
|
||||
|
||||
|
||||
def icourses_playlist_new(url, page=None):
|
||||
# 2 helpers using same interface in the js code
|
||||
def to_chap(course_id, chap_id, mod):
|
||||
ep = 'http://www.icourses.cn/jpk/viewCharacterDetail2.action?courseId={}&characId={}&mod={}'
|
||||
req = post_content(ep.format(course_id, chap_id, mod), post_data={})
|
||||
return req
|
||||
|
||||
def to_sec(course_id, chap_id, mod):
|
||||
ep = 'http://www.icourses.cn/jpk/viewCharacterDetail2.action?courseId={}&characId={}&mod={}'
|
||||
req = post_content(ep.format(course_id, chap_id, mod), post_data={})
|
||||
return req
|
||||
|
||||
def show_sec(course_id, chap_id):
|
||||
ep = 'http://www.icourses.cn/jpk/getSectionNode.action?courseId={}&characId={}&mod=2'
|
||||
req = post_content(ep.format(course_id, chap_id), post_data={})
|
||||
return req
|
||||
|
||||
if page is None:
|
||||
page = get_content(url)
|
||||
chap_patt = r'<h3>.+?id="parent_row_(\d+)".+?onclick="(\w+)\((.+)\)"'
|
||||
to_chap_patt = r'this,(\d+),(\d+),(\d)'
|
||||
show_sec_patt = r'this,(\d+),(\d+)'
|
||||
res_patt = r'res_showResDetail\(\'(\d+)\',\'.+?\',\'\d+\',\'mp4\',\'(\d+)\'\)'
|
||||
l = re.findall(chap_patt, page)
|
||||
for i in l:
|
||||
if i[1] == 'ajaxtocharac':
|
||||
hit = re.search(to_chap_patt, i[2])
|
||||
page = to_chap(hit.group(1), hit.group(2), hit.group(3))
|
||||
hit_list = re.findall(res_patt, page)
|
||||
if hit_list:
|
||||
return get_playlist(hit_list[0][0], hit_list[0][1])
|
||||
for hit in hit_list:
|
||||
print(hit)
|
||||
elif i[1] == 'showSectionNode2':
|
||||
hit = re.search(show_sec_patt, i[2])
|
||||
page = show_sec(hit.group(1), hit.group(2))
|
||||
# print(page)
|
||||
patt = r'ajaxtosection\(this,(\d+),(\d+),(\d+)\)'
|
||||
hit_list = re.findall(patt, page)
|
||||
# print(hit_list)
|
||||
for hit in hit_list:
|
||||
page = to_sec(hit[0], hit[1], hit[2])
|
||||
vlist = re.findall(res_patt, page)
|
||||
if vlist:
|
||||
return get_playlist(vlist[0][0], vlist[0][1])
|
||||
raise Exception("No video found in this playlist")
|
||||
|
||||
|
||||
def get_playlist(res_id, course_id):
|
||||
ep = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}'
|
||||
req = get_content(ep.format(res_id, course_id))
|
||||
|
||||
patt = r'<a.+?changeforvideo\(\'(\d+)\',\'(\d+)\',\'(\d+)\'\).+?title=\"(.+?)\"'
|
||||
return re.findall(patt, req)
|
||||
|
||||
|
||||
class ICousesExactor(object):
|
||||
PLAYER_BASE_VER = '150606-1'
|
||||
ENCRYPT_MOD_VER = '151020'
|
||||
ENCRYPT_SALT = '3DAPmXsZ4o'  # It took a really long time to find this...
|
||||
|
||||
def __init__(self, url):
|
||||
self.url = url
|
||||
self.title = ''
|
||||
self.flashvars = ''
|
||||
self.api_data = {}
|
||||
self.media_url = ''
|
||||
self.common_args = {}
|
||||
self.enc_mode = True
|
||||
self.page = get_content(self.url)
|
||||
return
|
||||
|
||||
def get_title(self):
|
||||
if 'viewVCourse' in self.url:
|
||||
self.title = public_course_get_title(self.url, self.page)
|
||||
return
|
||||
title_a_patt = r'<div class="con"> <a.*?>(.*?)</a>'
|
||||
title_b_patt = r'<div class="con"> <a.*?/a>((.|\n)*?)</div>'
|
||||
title_a = match1(self.page, title_a_patt).strip()
|
||||
title_b = match1(self.page, title_b_patt).strip()
|
||||
title = title_a + title_b
|
||||
title = re.sub('( +|\n|\t|\r| )', '', unescape_html(title).replace(' ', ''))
|
||||
self.title = title
|
||||
|
||||
def get_flashvars(self):
|
||||
patt = r'var flashvars\s*=\s*(\{(?:.|\n)+?\});'
|
||||
hit = re.search(patt, self.page)
|
||||
if hit is None:
|
||||
raise Exception('Cannot find flashvars')
|
||||
flashvar_str = hit.group(1)
|
||||
|
||||
uuid = re.search(r'uuid\s*:\s*\"?(\w+)\"?', flashvar_str).group(1)
|
||||
other = re.search(r'other\s*:\s*"(.*?)"', flashvar_str).group(1)
|
||||
isvc = re.search(r'IService\s*:\s*\'(.+?)\'', flashvar_str).group(1)
|
||||
|
||||
player_time_patt = r'MPlayer.swf\?v\=(\d+)'
|
||||
player_time = re.search(player_time_patt, self.page).group(1)
|
||||
|
||||
self.flashvars = dict(IService=isvc, uuid=uuid, other=other, v=player_time)
|
||||
|
||||
def api_req(self, url):
|
||||
xml_str = get_content(url)
|
||||
dom = parseString(xml_str)
|
||||
status = dom.getElementsByTagName('result')[0].getAttribute('status')
|
||||
if status != 'success':
|
||||
raise Exception('API returned fail')
|
||||
|
||||
api_res = {}
|
||||
meta = dom.getElementsByTagName('metadata')
|
||||
for m in meta:
|
||||
key = m.getAttribute('name')
|
||||
val = m.firstChild.nodeValue
|
||||
api_res[key] = val
|
||||
self.api_data = api_res
|
||||
|
||||
def basic_extract(self):
|
||||
self.get_title()
|
||||
self.get_flashvars()
|
||||
api_req_url = '{}?{}'.format(self.flashvars['IService'], parse.urlencode(self.flashvars))
|
||||
self.api_req(api_req_url)
|
||||
|
||||
def do_extract(self, received=0):
|
||||
self.basic_extract()
|
||||
return self.generate_url(received)
|
||||
|
||||
def update_url(self, received):
|
||||
args = self.common_args.copy()
|
||||
play_type = 'seek' if received else 'play'
|
||||
received = received if received else -1
|
||||
args['ls'] = play_type
|
||||
args['start'] = received + 1
|
||||
args['lt'] = self.get_date_str()
|
||||
if self.enc_mode:
|
||||
ssl_ts, sign = self.get_sign(self.media_url)
|
||||
extra_args = dict(h=sign, r=ssl_ts, p=self.__class__.ENCRYPT_MOD_VER)
|
||||
args.update(extra_args)
|
||||
return '{}?{}'.format(self.media_url, parse.urlencode(args))
|
||||
|
||||
@classmethod
|
||||
def get_date_str(self):
|
||||
fmt_str = '%-m-%-d/%-H:%-M:%-S'
|
||||
now = datetime.datetime.now()
|
||||
try:
|
||||
date_str = now.strftime(fmt_str)
|
||||
except ValueError:  # msvcrt (Windows) rejects the non-padded '%-m' style directives
|
||||
date_str = '{}-{}/{}:{}:{}'.format(now.month, now.day, now.hour, now.minute, now.second)
|
||||
return date_str
|
||||
|
||||
def generate_url(self, received):
|
||||
media_host = self.get_media_host(self.api_data['host'])
|
||||
media_url = media_host + self.api_data['url']
|
||||
self.media_url = media_url
|
||||
|
||||
common_args = dict(lv=self.__class__.PLAYER_BASE_VER)
|
||||
h = self.api_data.get('h')
|
||||
r = self.api_data.get('p', self.__class__.ENCRYPT_MOD_VER)
|
||||
|
||||
if self.api_data['ssl'] != 'true':
|
||||
self.enc_mode = False
|
||||
common_args.update(dict(h=h, r=r))
|
||||
else:
|
||||
self.enc_mode = True
|
||||
common_args['p'] = self.__class__.ENCRYPT_MOD_VER
|
||||
self.common_args = common_args
|
||||
return self.update_url(received)
|
||||
|
||||
def get_sign(self, media_url):
|
||||
media_host = parse.urlparse(media_url).netloc
|
||||
ran = random.randint(0, 9999999)
|
||||
ssl_callback = get_content('http://{}/ssl/ssl.shtml?r={}'.format(media_host, ran)).split(',')
|
||||
ssl_ts = int(datetime.datetime.strptime(ssl_callback[1], "%b %d %H:%M:%S %Y").timestamp() + int(ssl_callback[0]))
|
||||
sign_this = self.__class__.ENCRYPT_SALT + parse.urlparse(media_url).path + str(ssl_ts)
|
||||
arg_h = base64.b64encode(hashlib.md5(bytes(sign_this, 'utf-8')).digest(), altchars=b'-_')
|
||||
return ssl_ts, arg_h.decode('utf-8').strip('=')
|
||||
|
||||
def get_media_host(self, ori_host):
|
||||
res = get_content(ori_host + '/ssl/host.shtml').strip()
|
||||
path = parse.urlparse(ori_host).path
|
||||
return ''.join([res, path])
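The signing step in `get_sign` can be looked at separately from the network calls around it. The sketch below is an illustration under stated assumptions: `sign_media_url` is a hypothetical helper, the salt, host and timestamp in the example are made up, and the server-side check itself is not reproduced here.

```python
import base64
import hashlib
from urllib import parse


def sign_media_url(media_url, timestamp, salt):
    """Return the URL-safe, unpadded base64 of md5(salt + path + timestamp)."""
    path = parse.urlparse(media_url).path
    digest = hashlib.md5((salt + path + str(timestamp)).encode('utf-8')).digest()
    # altchars=b'-_' swaps '+' and '/' for URL-safe characters.
    token = base64.b64encode(digest, altchars=b'-_').decode('ascii')
    # An MD5 digest is 16 bytes, so the padded form is 24 characters ending in '=='.
    return token.rstrip('=')


# Hypothetical inputs, for illustration only.
example = sign_media_url('http://media.example.com/path/clip.flv?x=1', 1500000000, 'salt')
```

Only the URL path enters the hash, so the query string can change between segment requests without invalidating the signature.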
|
||||
|
||||
|
||||
def download_urls_icourses(url, title, ext, total_size, output_dir='.', headers=None, **kwargs):
|
||||
if dry_run or player:
|
||||
log.wtf('Non standard protocol')
|
||||
|
||||
title = get_filename(title)
|
||||
|
||||
filename = '%s.%s' % (title, ext)
|
||||
filepath = os.path.join(output_dir, filename)
|
||||
if not force and os.path.exists(filepath):
|
||||
print('Skipping {}: file already exists\n'.format(filepath))
|
||||
return
|
||||
bar = SimpleProgressBar(total_size, 1)
|
||||
print('Downloading %s ...' % tr(filename))
|
||||
url_save_icourses(url, filepath, bar, total_size, headers=headers, **kwargs)
|
||||
bar.done()
|
||||
|
||||
print()
|
||||
|
||||
|
||||
def url_save_icourses(url, filepath, bar, total_size, dyn_callback=None, is_part=False, max_size=0, headers=None):
|
||||
def dyn_update_url(received):
|
||||
if callable(dyn_callback):
|
||||
logging.debug('Calling callback %s for new URL from %s' % (dyn_callback.__name__, received))
|
||||
return dyn_callback(received)
|
||||
if bar is None:
|
||||
bar = DummyProgressBar()
|
||||
if os.path.exists(filepath):
|
||||
if not force:
|
||||
if not is_part:
|
||||
bar.done()
|
||||
print('Skipping %s: file already exists' % tr(os.path.basename(filepath)))
|
||||
else:
|
||||
filesize = os.path.getsize(filepath)
|
||||
bar.update_received(filesize)
|
||||
return
|
||||
else:
|
||||
if not is_part:
|
||||
bar.done()
|
||||
print('Overwriting %s' % os.path.basename(filepath), '...')
|
||||
elif not os.path.exists(os.path.dirname(filepath)):
|
||||
os.mkdir(os.path.dirname(filepath))
|
||||
|
||||
temp_filepath = filepath + '.download'
|
||||
received = 0
|
||||
if not force:
|
||||
open_mode = 'ab'
|
||||
|
||||
if os.path.exists(temp_filepath):
|
||||
tempfile_size = os.path.getsize(temp_filepath)
|
||||
received += tempfile_size
|
||||
bar.update_received(tempfile_size)
|
||||
else:
|
||||
open_mode = 'wb'
|
||||
|
||||
if received:
|
||||
url = dyn_update_url(received)
|
||||
|
||||
if headers is None:
|
||||
headers = {}
|
||||
response = urlopen_with_retry(request.Request(url, headers=headers))
|
||||
# Do not update content-length here.
|
||||
# Only the 1st segment's content-length is the content-length of the file.
|
||||
# For other segments, content-length is the standard one, 15 * 1024 * 1024
|
||||
|
||||
with open(temp_filepath, open_mode) as output:
|
||||
before_this_uri = received
|
||||
# received - before_this_uri is size of the buf we get from one uri
|
||||
while True:
|
||||
update_bs = 256 * 1024
|
||||
left_bytes = total_size - received
|
||||
to_read = left_bytes if left_bytes <= update_bs else update_bs
|
||||
# calc the block size to read -- The server can fail to send an EOF
|
||||
buffer = response.read(to_read)
|
||||
if not buffer:
|
||||
logging.debug('Got EOF from server')
|
||||
break
|
||||
output.write(buffer)
|
||||
received += len(buffer)
|
||||
bar.update_received(len(buffer))
|
||||
if received >= total_size:
|
||||
break
|
||||
if max_size and (received - before_this_uri) >= max_size:
|
||||
url = dyn_update_url(received)
|
||||
before_this_uri = received
|
||||
response = urlopen_with_retry(request.Request(url, headers=headers))
|
||||
|
||||
assert received == os.path.getsize(temp_filepath), '%s == %s' % (received, os.path.getsize(temp_filepath))
|
||||
|
||||
if os.access(filepath, os.W_OK):
|
||||
os.remove(filepath) # on Windows rename could fail if destination filepath exists
|
||||
os.rename(temp_filepath, filepath)
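The way `url_save_icourses` and `update_url` cooperate can be summarised as a request plan. This is a sketch under a simplifying assumption: every response delivers exactly one full segment (the real loop reads in 256 KiB blocks and switches URL once `max_size` bytes have arrived). `plan_requests` is a hypothetical helper, not part of the extractor.

```python
SEGMENT = 15 * 1024 * 1024  # the max_size value passed by icourses_download


def plan_requests(total_size, resume_from=0, segment=SEGMENT):
    """List the (ls, start) arguments in the order they would be requested."""
    received = resume_from
    plan = []
    while received < total_size:
        # Mirrors update_url(): 'play' from offset 0, otherwise 'seek' to received + 1.
        ls = 'seek' if received else 'play'
        start = received + 1 if received else 0
        plan.append((ls, start))
        received = min(received + segment, total_size)
    return plan
```

For a 40 MiB file this gives one `play` request followed by two `seek` requests; resuming from a partial `.download` file starts directly with a `seek`.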
|
||||
|
||||
site_info = 'icourses.cn'
|
||||
download = icourses_download
|
||||
download_playlist = icourses_playlist_download
|
@ -21,12 +21,18 @@ def ifeng_download_by_id(id, title = None, output_dir = '.', merge = True, info_
|
||||
download_urls([url], title, ext, size, output_dir, merge = merge)
|
||||
|
||||
def ifeng_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
id = r1(r'/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\.shtml$', url)
|
||||
# old pattern /uuid.shtml
|
||||
# now it could be #uuid
|
||||
id = r1(r'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})', url)
|
||||
if id:
|
||||
return ifeng_download_by_id(id, None, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
html = get_html(url)
|
||||
html = get_content(url)
|
||||
uuid_pattern = r'"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"'
|
||||
id = r1(r'var vid="([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"', html)
|
||||
if id is None:
|
||||
video_pattern = r'"vid"\s*:\s*' + uuid_pattern
|
||||
id = match1(html, video_pattern)
|
||||
assert id, "can't find video info"
|
||||
return ifeng_download_by_id(id, None, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
|
@ -52,20 +52,16 @@ class Imgur(VideoExtractor):
|
||||
else:
|
||||
# gallery image
|
||||
content = get_content(self.url)
|
||||
image = json.loads(match1(content, r'image\s*:\s*({.*}),'))
|
||||
ext = image['ext']
|
||||
url = match1(content, r'(https?://i.imgur.com/[^"]+)')
|
||||
_, container, size = url_info(url)
|
||||
self.streams = {
|
||||
'original': {
|
||||
'src': ['http://i.imgur.com/%s%s' % (image['hash'], ext)],
|
||||
'size': image['size'],
|
||||
'container': ext[1:]
|
||||
},
|
||||
'thumbnail': {
|
||||
'src': ['http://i.imgur.com/%ss%s' % (image['hash'], '.jpg')],
|
||||
'container': 'jpg'
|
||||
'src': [url],
|
||||
'size': size,
|
||||
'container': container
|
||||
}
|
||||
}
|
||||
self.title = image['title']
|
||||
self.title = r1(r'i\.imgur\.com/([^./]*)', url)
|
||||
|
||||
def extract(self, **kwargs):
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
|
57
src/you_get/extractors/instagram.py
Normal file → Executable file
@ -5,24 +5,65 @@ __all__ = ['instagram_download']
|
||||
from ..common import *
|
||||
|
||||
def instagram_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
url = r1(r'([^?]*)', url)
|
||||
html = get_html(url)
|
||||
|
||||
vid = r1(r'instagram.com/p/([^/]+)', url)
|
||||
description = r1(r'<meta property="og:title" content="([^"]*)"', html)
|
||||
vid = r1(r'instagram.com/\w+/([^/]+)', url)
|
||||
description = r1(r'<meta property="og:title" content="([^"]*)"', html) or \
|
||||
r1(r'<title>\s([^<]*)</title>', html) # with logged-in cookies
|
||||
title = "{} [{}]".format(description.replace("\n", " "), vid)
|
||||
|
||||
stream = r1(r'<meta property="og:video" content="([^"]*)"', html)
|
||||
if stream:
|
||||
_, ext, size = url_info(stream)
|
||||
else:
|
||||
image = r1(r'<meta property="og:image" content="([^"]*)"', html)
|
||||
ext = 'jpg'
|
||||
_, _, size = url_info(image)
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
url = stream if stream else image
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
download_urls([stream], title, ext, size, output_dir, merge=merge)
|
||||
else:
|
||||
data = re.search(r'window\._sharedData\s*=\s*(.*);</script>', html)
|
||||
if data is not None:
|
||||
info = json.loads(data.group(1))
|
||||
post = info['entry_data']['PostPage'][0]
|
||||
else:
|
||||
# with logged-in cookies
|
||||
data = re.search(r'window\.__additionalDataLoaded\(\'[^\']+\',(.*)\);</script>', html)
|
||||
if data is None:
|
||||
log.e('[Error] Cookies needed.')
|
||||
post = json.loads(data.group(1))
|
||||
|
||||
if 'edge_sidecar_to_children' in post['graphql']['shortcode_media']:
|
||||
edges = post['graphql']['shortcode_media']['edge_sidecar_to_children']['edges']
|
||||
for edge in edges:
|
||||
title = edge['node']['shortcode']
|
||||
image_url = edge['node']['display_url']
|
||||
if 'video_url' in edge['node']:
|
||||
image_url = edge['node']['video_url']
|
||||
ext = image_url.split('?')[0].split('.')[-1]
|
||||
size = int(get_head(image_url)['Content-Length'])
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(urls=[image_url],
|
||||
title=title,
|
||||
ext=ext,
|
||||
total_size=size,
|
||||
output_dir=output_dir)
|
||||
else:
|
||||
title = post['graphql']['shortcode_media']['shortcode']
|
||||
image_url = post['graphql']['shortcode_media']['display_url']
|
||||
if 'video_url' in post['graphql']['shortcode_media']:
|
||||
image_url = post['graphql']['shortcode_media']['video_url']
|
||||
ext = image_url.split('?')[0].split('.')[-1]
|
||||
size = int(get_head(image_url)['Content-Length'])
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(urls=[image_url],
|
||||
title=title,
|
||||
ext=ext,
|
||||
total_size=size,
|
||||
output_dir=output_dir)
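The sidecar and single-post branches above repeat the same steps, so they could be folded into one generator. This is a sketch only: `iter_media` is a hypothetical helper, the sample URLs in the usage note are invented, and it uses `node.get('video_url') or ...` where the original tests `'video_url' in node`, which differs when the key is present but empty.

```python
def iter_media(shortcode_media):
    """Yield (title, url, ext) for each item of a post, sidecar or single."""
    sidecar = shortcode_media.get('edge_sidecar_to_children')
    if sidecar:
        nodes = [edge['node'] for edge in sidecar['edges']]
    else:
        nodes = [shortcode_media]
    for node in nodes:
        # Prefer the video when there is one, else fall back to the image.
        url = node.get('video_url') or node['display_url']
        # Drop the query string before taking the extension.
        ext = url.split('?')[0].split('.')[-1]
        yield node['shortcode'], url, ext
```

A caller would then loop once over `iter_media(post['graphql']['shortcode_media'])` and run the size lookup, `print_info` and `download_urls` for each tuple.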
|
||||
|
||||
site_info = "Instagram.com"
|
||||
download = instagram_download
|
||||
|
@ -3,14 +3,18 @@
|
||||
__all__ = ['iqilu_download']
|
||||
|
||||
from ..common import *
|
||||
import json
|
||||
|
||||
def iqilu_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
''''''
|
||||
if re.match(r'http://v.iqilu.com/\w+', url):
|
||||
patt = r'url\s*:\s*\[([^\]]+)\]'
|
||||
|
||||
#URL in webpage
|
||||
html = get_content(url)
|
||||
url = match1(html, r"<input type='hidden' id='playerId' url='(.+)'")
|
||||
player_data = '[' + match1(html, patt) + ']'
|
||||
urls = json.loads(player_data)
|
||||
url = urls[0]['stream_url']
|
||||
|
||||
#grab title
|
||||
title = match1(html, r'<meta name="description" content="(.*?)\"\W')
|
||||
|
@ -20,7 +20,7 @@ Changelog:
|
||||
use @fffonion 's method in #617.
|
||||
Add AVM (asasm) trace code to Iqiyi's encode function at the point where the salt is put into the encode array, then reassemble the SWF with RABCDasm (or WinRABCDasm). Use Fiddler's AutoResponder to serve the modified file in place of the original, point the browser at the Fiddler proxy, and play the video with the debug version of Flash Player. The result ends up in flashlog.txt (its location is easy to find with a search engine).
|
||||
The code looks like the following (leave out the text after #comment:); it does the equivalent of: trace("{IQIYI_SALT}:"+salt_array.join(""))
|
||||
```(Postion After getTimer)
|
||||
```(Position After getTimer)
|
||||
findpropstrict QName(PackageNamespace(""), "trace")
|
||||
pushstring "{IQIYI_SALT}:" #comment for you to locate the salt
|
||||
getscopeobject 1
|
||||
@ -97,7 +97,9 @@ class Iqiyi(VideoExtractor):
|
||||
{'id': '4k', 'container': 'm3u8', 'video_profile': '4k'},
|
||||
{'id': 'BD', 'container': 'm3u8', 'video_profile': '1080p'},
|
||||
{'id': 'TD', 'container': 'm3u8', 'video_profile': '720p'},
|
||||
{'id': 'TD_H265', 'container': 'm3u8', 'video_profile': '720p H265'},
|
||||
{'id': 'HD', 'container': 'm3u8', 'video_profile': '540p'},
|
||||
{'id': 'HD_H265', 'container': 'm3u8', 'video_profile': '540p H265'},
|
||||
{'id': 'SD', 'container': 'm3u8', 'video_profile': '360p'},
|
||||
{'id': 'LD', 'container': 'm3u8', 'video_profile': '210p'},
|
||||
]
|
||||
@ -108,8 +110,8 @@ class Iqiyi(VideoExtractor):
|
||||
stream_to_bid = { '4k': 10, 'fullhd' : 5, 'suprt-high' : 4, 'super' : 3, 'high' : 2, 'standard' :1, 'topspeed' :96}
|
||||
'''
|
||||
ids = ['4k','BD', 'TD', 'HD', 'SD', 'LD']
|
||||
vd_2_id = {10: '4k', 19: '4k', 5:'BD', 18: 'BD', 21: 'HD', 2: 'HD', 4: 'TD', 17: 'TD', 96: 'LD', 1: 'SD'}
|
||||
id_2_profile = {'4k':'4k', 'BD': '1080p','TD': '720p', 'HD': '540p', 'SD': '360p', 'LD': '210p'}
|
||||
vd_2_id = {10: '4k', 19: '4k', 5:'BD', 18: 'BD', 21: 'HD_H265', 2: 'HD', 4: 'TD', 17: 'TD_H265', 96: 'LD', 1: 'SD', 14: 'TD'}
|
||||
id_2_profile = {'4k':'4k', 'BD': '1080p','TD': '720p', 'HD': '540p', 'SD': '360p', 'LD': '210p', 'HD_H265': '540p H265', 'TD_H265': '720p H265'}
|
||||
|
||||
|
||||
|
||||
@ -117,10 +119,10 @@ class Iqiyi(VideoExtractor):
|
||||
self.url = url
|
||||
|
||||
video_page = get_content(url)
|
||||
videos = set(re.findall(r'<a href="(http://www\.iqiyi\.com/v_[^"]+)"', video_page))
|
||||
videos = set(re.findall(r'<a href="(?=https?:)?(//www\.iqiyi\.com/v_[^"]+)"', video_page))
|
||||
|
||||
for video in videos:
|
||||
self.__class__().download_by_url(video, **kwargs)
|
||||
self.__class__().download_by_url('https:' + video, **kwargs)
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
assert self.url or self.vid
|
||||
@ -129,15 +131,17 @@ class Iqiyi(VideoExtractor):
|
||||
html = get_html(self.url)
|
||||
tvid = r1(r'#curid=(.+)_', self.url) or \
|
||||
r1(r'tvid=([^&]+)', self.url) or \
|
||||
r1(r'data-player-tvid="([^"]+)"', html)
|
||||
r1(r'data-player-tvid="([^"]+)"', html) or r1(r'tv(?:i|I)d=(.+?)\&', html) or r1(r'param\[\'tvid\'\]\s*=\s*"(.+?)"', html)
|
||||
videoid = r1(r'#curid=.+_(.*)$', self.url) or \
|
||||
r1(r'vid=([^&]+)', self.url) or \
|
||||
r1(r'data-player-videoid="([^"]+)"', html)
|
||||
r1(r'data-player-videoid="([^"]+)"', html) or r1(r'vid=(.+?)\&', html) or r1(r'param\[\'vid\'\]\s*=\s*"(.+?)"', html)
|
||||
self.vid = (tvid, videoid)
|
||||
self.title = match1(html, '<title>([^<]+)').split('-')[0]
|
||||
info_u = 'http://pcw-api.iqiyi.com/video/video/playervideoinfo?tvid=' + tvid
|
||||
json_res = get_content(info_u)
|
||||
self.title = json.loads(json_res)['data']['vn']
|
||||
tvid, videoid = self.vid
|
||||
info = getVMS(tvid, videoid)
|
||||
assert info['code'] == 'A00000', 'can\'t play this video'
|
||||
assert info['code'] == 'A00000', "can't play this video"
|
||||
|
||||
for stream in info['data']['vidl']:
|
||||
try:
|
||||
@ -145,8 +149,8 @@ class Iqiyi(VideoExtractor):
|
||||
if stream_id in self.stream_types:
|
||||
continue
|
||||
stream_profile = self.id_2_profile[stream_id]
|
||||
self.streams[stream_id] = {'video_profile': stream_profile, 'container': 'm3u8', 'src': [stream['m3u']], 'size' : 0}
|
||||
except:
|
||||
self.streams[stream_id] = {'video_profile': stream_profile, 'container': 'm3u8', 'src': [stream['m3u']], 'size' : 0, 'm3u8_url': stream['m3u']}
|
||||
except Exception as e:
|
||||
log.i("vd: {} is not handled".format(stream['vd']))
|
||||
log.i("info is {}".format(stream))
|
||||
|
||||
@ -199,9 +203,7 @@ class Iqiyi(VideoExtractor):
|
||||
# For legacy main()
|
||||
|
||||
#Here's the change!!
|
||||
download_url_ffmpeg(urls[0], self.title, 'mp4',
|
||||
output_dir=kwargs['output_dir'],
|
||||
merge=kwargs['merge'],)
|
||||
download_url_ffmpeg(urls[0], self.title, 'mp4', output_dir=kwargs['output_dir'], merge=kwargs['merge'], stream=False)
|
||||
|
||||
if not kwargs['caption']:
|
||||
print('Skipping captions.')
|
||||
|
50
src/you_get/extractors/iwara.py
Normal file
@ -0,0 +1,50 @@
|
||||
#!/usr/bin/env python
|
||||
__all__ = ['iwara_download']
|
||||
from ..common import *
|
||||
headers = {
|
||||
'DNT': '1',
|
||||
'Accept-Encoding': 'gzip, deflate, sdch, br',
|
||||
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36',
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
|
||||
'Cache-Control': 'max-age=0',
|
||||
'Connection': 'keep-alive',
|
||||
'Save-Data': 'on',
|
||||
'Cookie':'has_js=1;show_adult=1',
|
||||
}
|
||||
stream_types = [
|
||||
{'id': 'Source', 'container': 'mp4', 'video_profile': '原始'},
|
||||
{'id': '540p', 'container': 'mp4', 'video_profile': '540p'},
|
||||
{'id': '360p', 'container': 'mp4', 'video_profile': '360P'},
|
||||
]
|
||||
def iwara_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
global headers
|
||||
video_hash = match1(url, r'https?://\w+.iwara.tv/videos/(\w+)')
|
||||
video_url = match1(url, r'(https?://\w+.iwara.tv)/videos/\w+')
|
||||
html = get_content(url, headers=headers)
|
||||
title = r1(r'<title>(.*)</title>', html)
|
||||
api_url = video_url + '/api/video/' + video_hash
|
||||
content = get_content(api_url, headers=headers)
|
||||
data = json.loads(content)
|
||||
down_urls = 'https:' + data[0]['uri']
|
||||
type, ext, size = url_info(down_urls, headers=headers)
|
||||
print_info(site_info, title+data[0]['resolution'], type, size)
|
||||
|
||||
if not info_only:
|
||||
download_urls([down_urls], title, ext, size, output_dir, merge=merge, headers=headers)
|
||||
|
||||
def download_playlist_by_url( url, **kwargs):
|
||||
video_page = get_content(url)
|
||||
# url_first=re.findall(r"(http[s]?://[^/]+)",url)
|
||||
url_first=match1(url, r"(http[s]?://[^/]+)")
|
||||
# print (url_first)
|
||||
videos = set(re.findall(r'<a href="(/videos/[^"]+)"', video_page))
|
||||
if(len(videos)>0):
|
||||
for video in videos:
|
||||
iwara_download(url_first+video, **kwargs)
|
||||
else:
|
||||
maybe_print('no videos found on this page')
|
||||
site_info = "Iwara"
|
||||
download = iwara_download
|
||||
download_playlist = download_playlist_by_url
|
157
src/you_get/extractors/ixigua.py
Normal file
@ -0,0 +1,157 @@
|
||||
#!/usr/bin/env python
|
||||
import base64
|
||||
|
||||
import binascii
|
||||
|
||||
from ..common import *
|
||||
import random
|
||||
import string
|
||||
import ctypes
|
||||
from json import loads
|
||||
from urllib import request
|
||||
|
||||
__all__ = ['ixigua_download', 'ixigua_download_playlist_by_url']
|
||||
|
||||
headers = {
|
||||
"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 "
|
||||
"Safari/537.36",
|
||||
}
|
||||
|
||||
|
||||
def int_overflow(val):
|
||||
maxint = 2147483647
|
||||
if not -maxint - 1 <= val <= maxint:
|
||||
val = (val + (maxint + 1)) % (2 * (maxint + 1)) - maxint - 1
|
||||
return val
|
||||
|
||||
|
||||
def unsigned_right_shitf(n, i):
|
||||
if n < 0:
|
||||
n = ctypes.c_uint32(n).value
|
||||
if i < 0:
|
||||
return -int_overflow(n << abs(i))
|
||||
return int_overflow(n >> i)
|
||||
|
||||
|
||||
def get_video_url_from_video_id(video_id):
|
||||
"""Build the signed video-details URL for the given video ID."""
|
||||
# from js
|
||||
data = [""] * 256
|
||||
for index, _ in enumerate(data):
|
||||
t = index
|
||||
for i in range(8):
|
||||
t = -306674912 ^ unsigned_right_shitf(t, 1) if 1 & t else unsigned_right_shitf(t, 1)
|
||||
data[index] = t
|
||||
|
||||
def tmp():
|
||||
rand_num = random.random()
|
||||
path = "/video/urls/v/1/toutiao/mp4/{video_id}?r={random_num}".format(video_id=video_id,
|
||||
random_num=str(rand_num)[2:])
|
||||
e = o = r = -1
|
||||
i, a = 0, len(path)
|
||||
while i < a:
|
||||
e = ord(path[i])
|
||||
i += 1
|
||||
if e < 128:
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ e)]
|
||||
else:
|
||||
if e < 2048:
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (192 | e >> 6 & 31))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
|
||||
else:
|
||||
if 55296 <= e < 57344:
|
||||
e = (1023 & e) + 64
|
||||
o = 1023 & ord(path[i])  # low surrogate of the pair
|
||||
i += 1
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (240 | e >> 8 & 7))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 2 & 63))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | o >> 6 & 15 | (3 & e) << 4))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & o))]
|
||||
else:
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (224 | e >> 12 & 15))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 6 & 63))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
|
||||
|
||||
return "https://ib.365yg.com{path}&s={param}".format(path=path, param=unsigned_right_shitf(r ^ -1, 0))
|
||||
|
||||
while 1:
|
||||
url = tmp()
|
||||
if url.split("=")[-1][0] != "-":  # the s parameter must not be negative
|
||||
return url
|
||||
|
||||
|
||||
def ixigua_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
# example url: https://www.ixigua.com/i6631065141750268420/#mid=63024814422
|
||||
resp = urlopen_with_retry(request.Request(url))
|
||||
html = resp.read().decode('utf-8')
|
||||
|
||||
_cookies = []
|
||||
for c in resp.getheader('Set-Cookie').split("httponly,"):
|
||||
_cookies.append(c.strip().split(' ')[0])
|
||||
headers['cookie'] = ' '.join(_cookies)
|
||||
|
||||
conf = loads(match1(html, r"window\.config = (.+);"))
|
||||
if not conf:
|
||||
log.e("Get window.config from url failed, url: {}".format(url))
|
||||
return
|
||||
verify_url = conf['prefix'] + conf['url'] + '?key=' + conf['key'] + '&psm=' + conf['psm'] \
|
||||
+ '&_signature=' + ''.join(random.sample(string.ascii_letters + string.digits, 31))
|
||||
try:
|
||||
ok = get_content(verify_url)
|
||||
except Exception as e:
|
||||
ok = getattr(e, 'msg', str(e))
|
||||
if ok != 'OK':
|
||||
log.e("Verify failed, verify_url: {}, result: {}".format(verify_url, ok))
|
||||
return
|
||||
html = get_content(url, headers=headers)
|
||||
|
||||
video_id = match1(html, r"\"vid\":\"([^\"]+)")
|
||||
title = match1(html, r"\"player__videoTitle\">.*?<h1.*?>(.*)<\/h1><\/div>")
|
||||
if not video_id:
|
||||
log.e("video_id not found, url:{}".format(url))
|
||||
return
|
||||
video_info_url = get_video_url_from_video_id(video_id)
|
||||
video_info = loads(get_content(video_info_url))
|
||||
if video_info.get("code", 1) != 0:
|
||||
log.e("Get video info from {} error: server return code {}".format(video_info_url, video_info.get("code", 1)))
|
||||
return
|
||||
if not video_info.get("data", None):
|
||||
log.e("Get video info from {} error: The server returns JSON value"
|
||||
" without data or data is empty".format(video_info_url))
|
||||
return
|
||||
if not video_info["data"].get("video_list", None):
|
||||
log.e("Get video info from {} error: The server returns JSON value"
|
||||
" without data.video_list or data.video_list is empty".format(video_info_url))
|
||||
return
|
||||
if not video_info["data"]["video_list"].get("video_1", None):
|
||||
log.e("Get video info from {} error: The server returns JSON value"
|
||||
" without data.video_list.video_1 or data.video_list.video_1 is empty".format(video_info_url))
|
||||
return
|
||||
bestQualityVideo = list(video_info["data"]["video_list"].keys())[-1]  # video_1 is not the only entry; video_2 and later may exist, so take the last key
|
||||
size = int(video_info["data"]["video_list"][bestQualityVideo]["size"])
|
||||
print_info(site_info=site_info, title=title, type="mp4", size=size)  # this site only serves mp4 files
|
||||
if not info_only:
|
||||
video_url = base64.b64decode(video_info["data"]["video_list"][bestQualityVideo]["main_url"].encode("utf-8"))
|
||||
download_urls([video_url.decode("utf-8")], title, "mp4", size, output_dir, merge=merge, headers=headers, **kwargs)
|
||||
|
||||
|
||||
def ixigua_download_playlist_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
assert "user" in url, "Only user video lists are supported; please provide a URL like: " \
|
||||
"https://www.ixigua.com/c/user/6907091136/"
|
||||
|
||||
user_id = url.split("/")[-2] if url[-1] == "/" else url.split("/")[-1]
|
||||
params = {"max_behot_time": "0", "max_repin_time": "0", "count": "20", "page_type": "0", "user_id": user_id}
|
||||
while 1:
|
||||
url = "https://www.ixigua.com/c/user/article/?" + "&".join(["{}={}".format(k, v) for k, v in params.items()])
|
||||
video_list = loads(get_content(url, headers=headers))
|
||||
params["max_behot_time"] = video_list["next"]["max_behot_time"]
|
||||
for video in video_list["data"]:
|
||||
ixigua_download("https://www.ixigua.com/i{}/".format(video["item_id"]), output_dir, merge, info_only,
|
||||
**kwargs)
|
||||
if video_list["next"]["max_behot_time"] == 0:
|
||||
break
|
||||
|
||||
|
||||
site_info = "ixigua.com"
|
||||
download = ixigua_download
|
||||
download_playlist = ixigua_download_playlist_by_url
|
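The table-driven loop in `get_video_url_from_video_id` looks like a port of a JavaScript CRC-32 routine: -306674912 is the signed 32-bit form of the reflected polynomial 0xEDB88320, and the register starts at -1 and is inverted at the end. A minimal sketch under that assumption, masking to 32 bits instead of emulating signed overflow. The names `crc32_table`, `crc32_ascii` and `agrees_with_zlib` are illustrative and not part of you-get.

```python
import zlib

POLY = 0xEDB88320  # same bit pattern as the signed constant -306674912


def crc32_table():
    """Build the 256-entry lookup table, one entry per byte value."""
    table = []
    for index in range(256):
        t = index
        for _ in range(8):
            t = (t >> 1) ^ POLY if t & 1 else t >> 1
        table.append(t)
    return table


def crc32_ascii(text, table=None):
    """CRC-32 of an ASCII string, kept unsigned by masking to 32 bits."""
    table = table or crc32_table()
    r = 0xFFFFFFFF  # the ported code starts from -1
    for byte in text.encode("ascii"):
        r = (r >> 8) ^ table[(r ^ byte) & 0xFF]
    return r ^ 0xFFFFFFFF  # final inversion, written as r ^ -1 in the port


def agrees_with_zlib(text):
    """Cross-check the sketch against the standard library implementation."""
    return crc32_ascii(text) == zlib.crc32(text.encode("ascii"))
```

If this reading holds, `zlib.crc32(path.encode())` would give the `s` value directly as an unsigned number, so a retry loop that discards negative results would not be needed. That is an inference from the code, not something the commit states.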
@ -1,23 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['jpopsuki_download']
|
||||
|
||||
from ..common import *
|
||||
|
||||
def jpopsuki_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url, faker=True)
|
||||
|
||||
title = r1(r'<meta name="title" content="([^"]*)"', html)
|
||||
if title.endswith(' - JPopsuki TV'):
|
||||
title = title[:-14]
|
||||
|
||||
url = "http://jpopsuki.tv%s" % r1(r'<source src="([^"]*)"', html)
|
||||
type, ext, size = url_info(url, faker=True)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge, faker=True)
|
||||
|
||||
site_info = "JPopsuki.tv"
|
||||
download = jpopsuki_download
|
||||
download_playlist = playlist_not_supported('jpopsuki')
|
50
src/you_get/extractors/kakao.py
Normal file
@ -0,0 +1,50 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
from ..common import *
|
||||
from .universal import *
|
||||
|
||||
__all__ = ['kakao_download']
|
||||
|
||||
|
||||
def kakao_download(url, output_dir='.', info_only=False, **kwargs):
|
||||
json_request_url = 'https://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?vid={}'
|
||||
|
||||
# playlists are not supported here, so strip the query string (including playlistId)
|
||||
# supporting playlists would require changing this
|
||||
if re.search('playlistId', url):
|
||||
url = re.search(r"(.+)\?.+?", url).group(1)
|
||||
|
||||
page = get_content(url)
|
||||
try:
|
||||
vid = re.search(r"<meta name=\"vid\" content=\"(.+)\">", page).group(1)
|
||||
title = re.search(r"<meta name=\"title\" content=\"(.+)\">", page).group(1)
|
||||
|
||||
meta_str = get_content(json_request_url.format(vid))
|
||||
meta_json = json.loads(meta_str)
|
||||
|
||||
standard_preset = meta_json['output_list']['standard_preset']
|
||||
output_videos = meta_json['output_list']['output_list']
|
||||
size = ''
|
||||
if meta_json['svcname'] == 'smr_pip':
|
||||
for v in output_videos:
|
||||
if v['preset'] == 'mp4_PIP_SMR_480P':
|
||||
size = int(v['filesize'])
|
||||
break
|
||||
else:
|
||||
for v in output_videos:
|
||||
if v['preset'] == standard_preset:
|
||||
size = int(v['filesize'])
|
||||
break
|
||||
|
||||
video_url = meta_json['location']['url']
|
||||
|
||||
print_info(site_info, title, 'mp4', size)
|
||||
if not info_only:
|
||||
download_urls([video_url], title, 'mp4', size, output_dir, **kwargs)
|
||||
except:
|
||||
universal_download(url, output_dir, info_only=info_only, **kwargs)
|
||||
|
||||
|
||||
site_info = "tv.kakao.com"
|
||||
download = kakao_download
|
||||
download_playlist = playlist_not_supported('kakao')
|
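The two loops in `kakao_download` both scan `output_list` for an entry whose `preset` matches and read its `filesize`. A small sketch of that lookup as one helper. `size_for_preset` is a hypothetical name, and the sample data in the usage note is made up; only the `preset` and `filesize` keys come from the diff.

```python
def size_for_preset(output_videos, preset):
    """Return the integer filesize of the first entry with the given preset.

    Returns None when no entry matches, so the caller can decide on a fallback.
    """
    for video in output_videos:
        if video.get("preset") == preset:
            return int(video["filesize"])
    return None
```

With illustrative data such as `[{"preset": "mp4_PIP_SMR_480P", "filesize": "1024"}]`, calling `size_for_preset(data, "mp4_PIP_SMR_480P")` gives `1024`. Returning `None` rather than an empty string is a design choice of this sketch.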
@ -14,7 +14,7 @@ def ku6_download_by_id(id, title = None, output_dir = '.', merge = True, info_on
|
||||
title = title or t
|
||||
assert title
|
||||
urls = f.split(',')
|
||||
ext = re.sub(r'.*\.', '', urls[0])
|
||||
ext = match1(urls[0], r'.*\.(\w+)\??[^\.]*')
|
||||
assert ext in ('flv', 'mp4', 'f4v'), ext
|
||||
ext = {'f4v': 'flv'}.get(ext, ext)
|
||||
size = 0
|
||||
@ -37,6 +37,30 @@ def ku6_download(url, output_dir = '.', merge = True, info_only = False, **kwarg
|
||||
r'http://my.ku6.com/watch\?.*v=(.*)\.\..*']
|
||||
id = r1_of(patterns, url)
|
||||
|
||||
if id is None:
|
||||
# http://www.ku6.com/2017/detail-zt.html?vid=xvqTmvZrH8MNvErpvRxFn3
|
||||
page = get_content(url)
|
||||
meta = re.search(r'detailDataMap=(\{.+?\});', page)
|
||||
if meta is not None:
|
||||
meta = meta.group(1)
|
||||
else:
|
||||
raise Exception('Unsupported url')
|
||||
vid = re.search(r'vid=([^&]+)', url)
|
||||
if vid is not None:
|
||||
vid = vid.group(1)
|
||||
else:
|
||||
raise Exception('Unsupported url')
|
||||
this_meta = re.search('"?'+vid+'"?:\{(.+?)\}', meta)
|
||||
if this_meta is not None:
|
||||
this_meta = this_meta.group(1)
|
||||
title = re.search('title:"(.+?)"', this_meta).group(1)
|
||||
video_url = re.search('playUrl:"(.+?)"', this_meta).group(1)
|
||||
video_size = url_size(video_url)
|
||||
print_info(site_info, title, 'mp4', video_size)
|
||||
if not info_only:
|
||||
download_urls([video_url], title, 'mp4', video_size, output_dir, merge=merge, **kwargs)
|
||||
return
|
||||
|
||||
ku6_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
def baidu_ku6(url):
|
||||
@ -48,6 +72,10 @@ def baidu_ku6(url):
|
||||
if isrc is not None:
|
||||
h2 = get_html(isrc)
|
||||
id = match1(h2, r'http://v.ku6.com/show/(.*)\.\.\.html')
|
||||
# fix #1746
|
||||
# some ku6 URLs really do end with three dots (a bug?)
|
||||
if id is None:
|
||||
id = match1(h2, r'http://v.ku6.com/show/(.*)\.html')
|
||||
|
||||
return id
|
||||
|
||||
|
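The first ku6 hunk changes how the file extension is taken from `urls[0]` so that a trailing query string no longer ends up in `ext`. A sketch of the same goal using the standard library instead of a regular expression. `url_extension` is a hypothetical helper, and the URLs in the usage note are placeholders.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url):
    """Return the extension of the URL path, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lstrip(".")
```

For example, `url_extension("http://example.com/a/video.f4v?start=0.5")` gives `"f4v"`, even though the query string itself contains a dot.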
42
src/you_get/extractors/kuaishou.py
Normal file
@ -0,0 +1,42 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
import urllib.request
|
||||
import urllib.parse
|
||||
import json
|
||||
import re
|
||||
|
||||
from ..util import log
|
||||
from ..common import get_content, download_urls, print_info, playlist_not_supported, url_size
|
||||
|
||||
__all__ = ['kuaishou_download_by_url']
|
||||
|
||||
|
||||
def kuaishou_download_by_url(url, info_only=False, **kwargs):
|
||||
page = get_content(url)
|
||||
# size = video_list[-1]['size']
|
||||
# (disabled: this yields the wrong size, so url_size() is used below)
|
||||
try:
|
||||
search_result=re.search(r"\"playUrls\":\[(\{\"quality\"\:\"\w+\",\"url\":\".*?\"\})+\]", page)
|
||||
all_video_info_str = search_result.group(1)
|
||||
all_video_infos=re.findall(r"\{\"quality\"\:\"(\w+)\",\"url\":\"(.*?)\"\}", all_video_info_str)
|
||||
# pick the best-quality entry
|
||||
video_url = all_video_infos[0][1].encode("utf-8").decode('unicode-escape')
|
||||
title = re.search(r"<meta charset=UTF-8><title>(.*?)</title>", page).group(1)
|
||||
size = url_size(video_url)
|
||||
video_format = "flv"#video_url.split('.')[-1]
|
||||
print_info(site_info, title, video_format, size)
|
||||
if not info_only:
|
||||
download_urls([video_url], title, video_format, size, **kwargs)
|
||||
except:# extract image
|
||||
og_image_url = re.search(r"<meta\s+property=\"og:image\"\s+content=\"(.+?)\"/>", page).group(1)
|
||||
image_url = og_image_url
|
||||
title = url.split('/')[-1]
|
||||
size = url_size(image_url)
|
||||
image_format = image_url.split('.')[-1]
|
||||
print_info(site_info, title, image_format, size)
|
||||
if not info_only:
|
||||
download_urls([image_url], title, image_format, size, **kwargs)
|
||||
|
||||
site_info = "kuaishou.com"
|
||||
download = kuaishou_download_by_url
|
||||
download_playlist = playlist_not_supported('kuaishou')
|
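The kuaishou extractor turns a URL captured from inline JavaScript, where slashes may appear as `\u002F`, into a plain URL via `.encode("utf-8").decode('unicode-escape')`. A sketch of that step next to a JSON-based alternative. Both function names are illustrative, and the JSON variant assumes the captured text contains no unescaped double quotes.

```python
import json


def unescape_with_codec(raw):
    """Same approach as the extractor: reinterpret backslash escapes."""
    return raw.encode("utf-8").decode("unicode-escape")


def unescape_with_json(raw):
    """Alternative: wrap the text in quotes and let the JSON parser resolve escapes."""
    return json.loads('"' + raw + '"')
```

One difference worth knowing: the `unicode-escape` codec treats non-escape bytes as Latin-1, so non-ASCII characters already present in `raw` would come out garbled, while the JSON route leaves them intact. For plain ASCII URLs the two agree.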
@ -8,46 +8,88 @@ from base64 import b64decode
|
||||
import re
|
||||
import hashlib
|
||||
|
||||
|
||||
def kugou_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
|
||||
if url.lower().find("5sing")!=-1:
|
||||
#for 5sing.kugou.com
|
||||
html=get_html(url)
|
||||
ticket=r1(r'"ticket":\s*"(.*)"',html)
|
||||
j=loads(str(b64decode(ticket),encoding="utf-8"))
|
||||
url=j['file']
|
||||
title=j['songName']
|
||||
if url.lower().find("5sing") != -1:
|
||||
# for 5sing.kugou.com
|
||||
html = get_html(url)
|
||||
ticket = r1(r'"ticket":\s*"(.*)"', html)
|
||||
j = loads(str(b64decode(ticket), encoding="utf-8"))
|
||||
url = j['file']
|
||||
title = j['songName']
|
||||
songtype, ext, size = url_info(url)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
elif url.lower().find("hash") != -1:
|
||||
return kugou_download_by_hash(url, output_dir, merge, info_only)
|
||||
else:
|
||||
#for the www.kugou.com/
|
||||
# for the www.kugou.com/
|
||||
return kugou_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
# raise NotImplementedError(url)
|
||||
|
||||
def kugou_download_by_hash(title,hash_val,output_dir = '.', merge = True, info_only = False):
|
||||
#sample
|
||||
#url_sample:http://www.kugou.com/yy/album/single/536957.html
|
||||
#hash ->key md5(hash+kgcloud")->key decompile swf
|
||||
#cmd 4 for mp3 cmd 3 for m4a
|
||||
key=hashlib.new('md5',(hash_val+"kgcloud").encode("utf-8")).hexdigest()
|
||||
html=get_html("http://trackercdn.kugou.com/i/?pid=6&key=%s&acceptMp3=1&cmd=4&hash=%s"%(key,hash_val))
|
||||
j=loads(html)
|
||||
url=j['url']
|
||||
|
||||
def kugou_download_by_hash(url, output_dir='.', merge=True, info_only=False):
|
||||
# sample
|
||||
# url_sample:http://www.kugou.com/song/#hash=93F7D2FC6E95424739448218B591AEAF&album_id=9019462
|
||||
hash_val = match1(url, 'hash=(\w+)')
|
||||
album_id = match1(url, 'album_id=(\d+)')
|
||||
if not album_id:
|
||||
album_id = 123
|
||||
html = get_html("http://www.kugou.com/yy/index.php?r=play/getdata&hash={}&album_id={}&mid=123".format(hash_val, album_id))
|
||||
j = loads(html)
|
||||
url = j['data']['play_url']
|
||||
title = j['data']['audio_name']
|
||||
# some songs can't be played because of copyright protection
|
||||
if (url == ''):
|
||||
return
|
||||
songtype, ext, size = url_info(url)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
def kugou_download_playlist(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
html=get_html(url)
|
||||
pattern=re.compile('title="(.*?)".* data="(\w*)\|.*?"')
|
||||
pairs=pattern.findall(html)
|
||||
for title,hash_val in pairs:
|
||||
kugou_download_by_hash(title,hash_val,output_dir,merge,info_only)
|
||||
|
||||
def kugou_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
urls = []
|
||||
|
||||
# download music leaderboard
|
||||
# sample: http://www.kugou.com/yy/html/rank.html
|
||||
if url.lower().find('rank') != -1:
|
||||
html = get_html(url)
|
||||
pattern = re.compile('<a href="(http://.*?)" data-active=')
|
||||
res = pattern.findall(html)
|
||||
for song in res:
|
||||
res = get_html(song)
|
||||
pattern_url = re.compile('"hash":"(\w+)".*"album_id":(\d+)')
|
||||
hash_val, album_id = res = pattern_url.findall(res)[0]
|
||||
if not album_id:
|
||||
album_id = 123
|
||||
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (hash_val, album_id))
|
||||
|
||||
# download album
|
||||
# album sample: http://www.kugou.com/yy/album/single/1645030.html
|
||||
elif url.lower().find('album') != -1:
|
||||
html = get_html(url)
|
||||
pattern = re.compile('var data=(\[.*?\]);')
|
||||
res = pattern.findall(html)[0]
|
||||
for v in json.loads(res):
|
||||
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v['hash'], v['album_id']))
|
||||
|
||||
# download the playlist
|
||||
# playlist sample:http://www.kugou.com/yy/special/single/487279.html
|
||||
else:
|
||||
html = get_html(url)
|
||||
pattern = re.compile('data="(\w+)\|(\d+)"')
|
||||
for v in pattern.findall(html):
|
||||
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
|
||||
print('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
|
||||
|
||||
# download the list by hash
|
||||
for url in urls:
|
||||
kugou_download_by_hash(url, output_dir, merge, info_only)
|
||||
|
||||
|
||||
site_info = "kugou.com"
|
||||
download = kugou_download
|
||||
# download_playlist = playlist_not_supported("kugou")
|
||||
download_playlist=kugou_download_playlist
|
||||
download_playlist = kugou_download_playlist
|
||||
|
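The reworked `kugou_download_by_hash` takes a song URL whose parameters sit after the `#` and pulls out `hash` and `album_id` with two regular expressions, falling back to `123` when no album id is present. A sketch of the same extraction with `urllib.parse`. `parse_song_fragment` is a hypothetical helper, not part of the commit.

```python
from urllib.parse import parse_qs, urlsplit


def parse_song_fragment(url, default_album_id="123"):
    """Extract (hash, album_id) from a URL whose parameters follow '#'."""
    params = parse_qs(urlsplit(url).fragment)
    hash_val = params.get("hash", [None])[0]
    album_id = params.get("album_id", [default_album_id])[0]
    return hash_val, album_id
```

Applied to the sample URL from the diff, `http://www.kugou.com/song/#hash=93F7D2FC6E95424739448218B591AEAF&album_id=9019462`, this returns the hash and `"9019462"` as strings.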
@ -2,20 +2,23 @@
|
||||
|
||||
__all__ = ['letv_download', 'letvcloud_download', 'letvcloud_download_by_vu']
|
||||
|
||||
import json
|
||||
import base64
|
||||
import hashlib
|
||||
import random
|
||||
import xml.etree.ElementTree as ET
|
||||
import base64, hashlib, urllib, time, re
|
||||
import urllib
|
||||
|
||||
from ..common import *
|
||||
|
||||
#@DEPRECATED
|
||||
|
||||
# @DEPRECATED
|
||||
def get_timestamp():
|
||||
tn = random.random()
|
||||
url = 'http://api.letv.com/time?tn={}'.format(tn)
|
||||
result = get_content(url)
|
||||
return json.loads(result)['stime']
|
||||
#@DEPRECATED
|
||||
|
||||
|
||||
# @DEPRECATED
|
||||
def get_key(t):
|
||||
for s in range(0, 8):
|
||||
e = 1 & t
|
||||
@ -24,70 +27,72 @@ def get_key(t):
|
||||
t += e
|
||||
return t ^ 185025305
|
||||
|
||||
|
||||
def calcTimeKey(t):
|
||||
ror = lambda val, r_bits, : ((val & (2**32-1)) >> r_bits%32) | (val << (32-(r_bits%32)) & (2**32-1))
|
||||
return ror(ror(t,773625421%13)^773625421,773625421%17)
|
||||
ror = lambda val, r_bits,: ((val & (2 ** 32 - 1)) >> r_bits % 32) | (val << (32 - (r_bits % 32)) & (2 ** 32 - 1))
|
||||
magic = 185025305
|
||||
return ror(t, magic % 17) ^ magic
|
||||
# return ror(ror(t,773625421%13)^773625421,773625421%17)
|
||||
|
||||
|
||||
def decode(data):
|
||||
version = data[0:5]
|
||||
if version.lower() == b'vc_01':
|
||||
#get real m3u8
|
||||
# get real m3u8
|
||||
loc2 = data[5:]
|
||||
length = len(loc2)
|
||||
loc4 = [0]*(2*length)
|
||||
loc4 = [0] * (2 * length)
|
||||
for i in range(length):
|
||||
loc4[2*i] = loc2[i] >> 4
|
||||
loc4[2*i+1]= loc2[i] & 15;
|
||||
loc6 = loc4[len(loc4)-11:]+loc4[:len(loc4)-11]
|
||||
loc7 = [0]*length
|
||||
loc4[2 * i] = loc2[i] >> 4
|
||||
loc4[2 * i + 1] = loc2[i] & 15
|
||||
loc6 = loc4[len(loc4) - 11:] + loc4[:len(loc4) - 11]
|
||||
loc7 = [0] * length
|
||||
for i in range(length):
|
||||
loc7[i] = (loc6[2 * i] << 4) +loc6[2*i+1]
|
||||
loc7[i] = (loc6[2 * i] << 4) + loc6[2 * i + 1]
|
||||
return ''.join([chr(i) for i in loc7])
|
||||
else:
|
||||
# directly return
|
||||
return data
|
||||
return str(data)
|
||||
|
||||
|
||||
|
||||
|
||||
def video_info(vid,**kwargs):
|
||||
url = 'http://api.letv.com/mms/out/video/playJson?id={}&platid=1&splatid=101&format=1&tkey={}&domain=www.letv.com'.format(vid,calcTimeKey(int(time.time())))
|
||||
def video_info(vid, **kwargs):
|
||||
url = 'http://player-pc.le.com/mms/out/video/playJson?id={}&platid=1&splatid=105&format=1&tkey={}&domain=www.le.com&region=cn&source=1000&accesyx=1'.format(vid, calcTimeKey(int(time.time())))
|
||||
r = get_content(url, decoded=False)
|
||||
info=json.loads(str(r,"utf-8"))
|
||||
|
||||
info = json.loads(str(r, "utf-8"))
|
||||
info = info['msgs']
|
||||
|
||||
stream_id = None
|
||||
support_stream_id = info["playurl"]["dispatch"].keys()
|
||||
if "stream_id" in kwargs and kwargs["stream_id"].lower() in support_stream_id:
|
||||
stream_id = kwargs["stream_id"]
|
||||
else:
|
||||
print("Current Video Supports:")
|
||||
for i in support_stream_id:
|
||||
print("\t--format",i,"<URL>")
|
||||
if "1080p" in support_stream_id:
|
||||
stream_id = '1080p'
|
||||
elif "720p" in support_stream_id:
|
||||
stream_id = '720p'
|
||||
else:
|
||||
stream_id =sorted(support_stream_id,key= lambda i: int(i[1:]))[-1]
|
||||
stream_id = sorted(support_stream_id, key=lambda i: int(i[1:]))[-1]
|
||||
|
||||
url =info["playurl"]["domain"][0]+info["playurl"]["dispatch"][stream_id][0]
|
||||
url = info["playurl"]["domain"][0] + info["playurl"]["dispatch"][stream_id][0]
|
||||
uuid = hashlib.sha1(url.encode('utf8')).hexdigest() + '_0'
|
||||
ext = info["playurl"]["dispatch"][stream_id][1].split('.')[-1]
|
||||
url+="&ctv=pc&m3v=1&termid=1&format=1&hwtype=un&ostype=Linux&tag=letv&sign=letv&expect=3&tn={}&pay=0&iscpn=f9051&rateid={}".format(random.random(),stream_id)
|
||||
url = url.replace('tss=0', 'tss=ios')
|
||||
url += "&m3v=1&termid=1&format=1&hwtype=un&ostype=MacOS10.12.4&p1=1&p2=10&p3=-&expect=3&tn={}&vid={}&uuid={}&sign=letv".format(random.random(), vid, uuid)
|
||||
|
||||
r2=get_content(url,decoded=False)
|
||||
info2=json.loads(str(r2,"utf-8"))
|
||||
r2 = get_content(url, decoded=False)
|
||||
info2 = json.loads(str(r2, "utf-8"))
|
||||
|
||||
# not finished yet:
|
||||
# the m3u8 is encoded and still has to be decoded
|
||||
m3u8 = get_content(info2["location"],decoded=False)
|
||||
suffix = '&r=' + str(int(time.time() * 1000)) + '&appid=500'
|
||||
m3u8 = get_content(info2["location"] + suffix, decoded=False)
|
||||
m3u8_list = decode(m3u8)
|
||||
urls = re.findall(r'^[^#][^\r]*',m3u8_list,re.MULTILINE)
|
||||
return ext,urls
|
||||
urls = re.findall(r'(http.*?)#', m3u8_list, re.MULTILINE)
|
||||
return ext, urls
|
||||
|
||||
def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,**kwargs):
|
||||
ext , urls = video_info(vid,**kwargs)
|
||||
|
||||
def letv_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ext, urls = video_info(vid, **kwargs)
|
||||
size = 0
|
||||
for i in urls:
|
||||
_, _, tmp = url_info(i)
|
||||
@ -97,27 +102,29 @@ def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,
|
||||
if not info_only:
|
||||
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def letvcloud_download_by_vu(vu, uu, title=None, output_dir='.', merge=True, info_only=False):
|
||||
#ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
|
||||
#str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
|
||||
argumet_dict ={'cf' : 'flash', 'format': 'json', 'ran': str(int(time.time())), 'uu': str(uu),'ver': '2.2', 'vu': str(vu), }
|
||||
sign_key = '2f9d6924b33a165a6d8b5d3d42f4f987' #ALL YOUR BASE ARE BELONG TO US
|
||||
# ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
|
||||
# str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
|
||||
argumet_dict = {'cf': 'flash', 'format': 'json', 'ran': str(int(time.time())), 'uu': str(uu), 'ver': '2.2', 'vu': str(vu), }
|
||||
sign_key = '2f9d6924b33a165a6d8b5d3d42f4f987' # ALL YOUR BASE ARE BELONG TO US
|
||||
str2Hash = ''.join([i + argumet_dict[i] for i in sorted(argumet_dict)]) + sign_key
|
||||
sign = hashlib.md5(str2Hash.encode('utf-8')).hexdigest()
|
||||
request_info = urllib.request.Request('http://api.letvcloud.com/gpc.php?' + '&'.join([i + '=' + argumet_dict[i] for i in argumet_dict]) + '&sign={sign}'.format(sign = sign))
|
||||
request_info = urllib.request.Request('http://api.letvcloud.com/gpc.php?' + '&'.join([i + '=' + argumet_dict[i] for i in argumet_dict]) + '&sign={sign}'.format(sign=sign))
|
||||
response = urllib.request.urlopen(request_info)
|
||||
data = response.read()
|
||||
info = json.loads(data.decode('utf-8'))
|
||||
type_available = []
|
||||
for video_type in info['data']['video_info']['media']:
|
||||
type_available.append({'video_url': info['data']['video_info']['media'][video_type]['play_url']['main_url'], 'video_quality': int(info['data']['video_info']['media'][video_type]['play_url']['vtype'])})
|
||||
urls = [base64.b64decode(sorted(type_available, key = lambda x:x['video_quality'])[-1]['video_url']).decode("utf-8")]
|
||||
urls = [base64.b64decode(sorted(type_available, key=lambda x: x['video_quality'])[-1]['video_url']).decode("utf-8")]
|
||||
size = urls_size(urls)
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
|
||||
qs = parse.urlparse(url).query
|
||||
vu = match1(qs, r'vu=([\w]+)')
|
||||
@ -125,16 +132,24 @@ def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
|
||||
title = "LETV-%s" % vu
|
||||
letvcloud_download_by_vu(vu, uu, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
def letv_download(url, output_dir='.', merge=True, info_only=False ,**kwargs):
|
||||
|
||||
def letv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
url = url_locations([url])[0]
|
||||
if re.match(r'http://yuntv.letv.com/', url):
|
||||
letvcloud_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif 'sports.le.com' in url:
|
||||
html = get_content(url)
|
||||
vid = match1(url, r'video/(\d+)\.html')
|
||||
title = match1(html, r'<h2 class="title">([^<]+)</h2>')
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
else:
|
||||
html = get_content(url)
|
||||
vid = match1(url, r'http://www.letv.com/ptv/vplay/(\d+).html') or \
|
||||
match1(url, r'http://www.le.com/ptv/vplay/(\d+).html') or \
|
||||
match1(html, r'vid="(\d+)"')
|
||||
title = match1(html,r'name="irTitle" content="(.*?)"')
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only,**kwargs)
|
||||
title = match1(html, r'name="irTitle" content="(.*?)"')
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
|
||||
site_info = "Le.com"
|
||||
download = letv_download
|
||||
|
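Two request-signing steps appear in the Le.com extractor: `calcTimeKey` rotates the timestamp right within 32 bits and XORs it with a magic constant, and `letvcloud_download_by_vu` hashes the sorted key/value pairs followed by a secret. A sketch of both as plain functions. `ror32`, `calc_time_key` and `sign_params` are illustrative names, and the secret in the usage note is a placeholder rather than a real key.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def ror32(value, bits):
    """Rotate a 32-bit unsigned integer right by `bits` positions."""
    bits %= 32
    value &= MASK32
    return ((value >> bits) | (value << (32 - bits))) & MASK32


def calc_time_key(timestamp, magic=185025305):
    """Mirror of the updated calcTimeKey: rotate by magic % 17, then XOR magic."""
    return ror32(timestamp, magic % 17) ^ magic


def sign_params(params, secret):
    """MD5 hex digest over key+value pairs in sorted key order, then the secret."""
    payload = "".join(key + params[key] for key in sorted(params)) + secret
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Sorting the keys before hashing makes the signature independent of dictionary order, so `sign_params({"b": "2", "a": "1"}, "placeholder")` and `sign_params({"a": "1", "b": "2"}, "placeholder")` give the same digest.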
@ -2,39 +2,66 @@
|
||||
|
||||
__all__ = ['lizhi_download']
|
||||
import json
|
||||
import datetime
|
||||
from ..common import *
|
||||
|
||||
def lizhi_download_playlist(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
# like this http://www.lizhi.fm/#/31365/
|
||||
#api desc: s->start l->length band->some radio
|
||||
#http://www.lizhi.fm/api/radio_audios?s=0&l=100&band=31365
|
||||
band_id = match1(url,r'#/(\d+)')
|
||||
#try to get a considerable large l to reduce html parsing task.
|
||||
api_url = 'http://www.lizhi.fm/api/radio_audios?s=0&l=65535&band='+band_id
|
||||
content_json = json.loads(get_content(api_url))
|
||||
for sound in content_json:
|
||||
title = sound["name"]
|
||||
res_url = sound["url"]
|
||||
songtype, ext, size = url_info(res_url,faker=True)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
#no referer no speed!
|
||||
download_urls([res_url], title, ext, size, output_dir, merge=merge ,refer = 'http://www.lizhi.fm',faker=True)
|
||||
pass
|
||||
#
|
||||
# Worked well but not perfect.
|
||||
# TODO: add option --format={sd|hd}
|
||||
#
|
||||
def get_url(ep):
|
||||
readable = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
|
||||
return 'http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(readable, ep['id'])
|
||||
|
||||
def lizhi_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
# url like http://www.lizhi.fm/#/549759/18864883431656710
|
||||
api_id = match1(url,r'#/(\d+/\d+)')
|
||||
api_url = 'http://www.lizhi.fm/api/audio/'+api_id
|
||||
content_json = json.loads(get_content(api_url))
|
||||
title = content_json["audio"]["name"]
|
||||
res_url = content_json["audio"]["url"]
|
||||
songtype, ext, size = url_info(res_url,faker=True)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
#no referer no speed!
|
||||
download_urls([res_url], title, ext, size, output_dir, merge=merge ,refer = 'http://www.lizhi.fm',faker=True)
|
||||
# radio_id: e.g. 549759 from http://www.lizhi.fm/549759/
|
||||
#
|
||||
# Returns a list of tuples (audio_id, title, url) for each episode
|
||||
# (audio) in the radio playlist. url is the direct link to the audio
|
||||
# file.
|
||||
def lizhi_extract_playlist_info(radio_id):
|
||||
# /api/radio_audios API parameters:
|
||||
#
|
||||
# - s: starting episode
|
||||
# - l: count (per page)
|
||||
# - band: radio_id
|
||||
#
|
||||
# We use l=65535 for poor man's pagination (that is, no pagination
|
||||
# at all -- hope all fits on a single page).
|
||||
#
|
||||
# TODO: Use /api/radio?band={radio_id} to get number of episodes
|
||||
# (au_cnt), then handle pagination properly.
|
||||
api_url = 'http://www.lizhi.fm/api/radio_audios?s=0&l=65535&band=%s' % radio_id
|
||||
api_response = json.loads(get_content(api_url))
|
||||
return [(ep['id'], ep['name'], get_url(ep)) for ep in api_response]
|
||||
|
||||
def lizhi_download_audio(audio_id, title, url, output_dir='.', info_only=False):
|
||||
filetype, ext, size = url_info(url)
|
||||
print_info(site_info, title, filetype, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir=output_dir)
|
||||
|
||||
def lizhi_download_playlist(url, output_dir='.', info_only=False, **kwargs):
|
||||
# Sample URL: http://www.lizhi.fm/549759/
|
||||
radio_id = match1(url,r'/(\d+)')
|
||||
if not radio_id:
|
||||
raise NotImplementedError('%s not supported' % url)
|
||||
for audio_id, title, url in lizhi_extract_playlist_info(radio_id):
|
||||
lizhi_download_audio(audio_id, title, url, output_dir=output_dir, info_only=info_only)
|
||||
|
||||
def lizhi_download(url, output_dir='.', info_only=False, **kwargs):
|
||||
# Sample URL: http://www.lizhi.fm/549759/18864883431656710/
|
||||
m = re.search(r'/(?P<radio_id>\d+)/(?P<audio_id>\d+)', url)
|
||||
if not m:
|
||||
raise NotImplementedError('%s not supported' % url)
|
||||
radio_id = m.group('radio_id')
|
||||
audio_id = m.group('audio_id')
|
||||
# Look for the audio_id among the full list of episodes
|
||||
for aid, title, url in lizhi_extract_playlist_info(radio_id):
|
||||
if aid == audio_id:
|
||||
lizhi_download_audio(audio_id, title, url, output_dir=output_dir, info_only=info_only)
|
||||
break
|
||||
else:
|
||||
raise NotImplementedError('Audio #%s not found in playlist #%s' % (audio_id, radio_id))
|
||||
|
||||
site_info = "lizhi.fm"
|
||||
download = lizhi_download
|
||||
|
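The new `get_url` helper derives the CDN path from an episode's `create_time` (milliseconds since the epoch) and its `id`. A sketch of that composition with two assumptions made explicit: the timestamp is interpreted as UTC here, whereas the diff uses `datetime.fromtimestamp` without a timezone (local time), and the `quality` parameter is a guess prompted by the `--format={sd|hd}` TODO, not a confirmed URL scheme. `build_audio_url` is a hypothetical name.

```python
from datetime import datetime, timezone


def build_audio_url(ep, quality="hd"):
    """Compose the audio URL from create_time (ms) and id.

    Assumption: the date segment is computed in UTC; which zone the CDN
    actually expects is not stated in the source.
    """
    created = datetime.fromtimestamp(int(ep["create_time"]) / 1000, tz=timezone.utc)
    return "http://cdn5.lizhi.fm/audio/{}/{}_{}.mp3".format(
        created.strftime("%Y/%m/%d"), ep["id"], quality
    )
```

Pinning the timezone makes the result reproducible across machines. With local time, an episode created near midnight could map to a different date directory depending on where the code runs.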
74
src/you_get/extractors/longzhu.py
Normal file
@ -0,0 +1,74 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['longzhu_download']
|
||||
|
||||
import json
|
||||
from ..common import (
|
||||
get_content,
|
||||
general_m3u8_extractor,
|
||||
match1,
|
||||
print_info,
|
||||
download_urls,
|
||||
playlist_not_supported,
|
||||
)
|
||||
from ..common import player
|
||||
|
||||
def longzhu_download(url, output_dir = '.', merge=True, info_only=False, **kwargs):
|
||||
web_domain = url.split('/')[2]
|
||||
if (web_domain == 'star.longzhu.com') or (web_domain == 'y.longzhu.com'):
|
||||
domain = url.split('/')[3].split('?')[0]
|
||||
m_url = 'http://m.longzhu.com/{0}'.format(domain)
|
||||
m_html = get_content(m_url)
|
||||
room_id_patt = r'var\s*roomId\s*=\s*(\d+);'
|
||||
room_id = match1(m_html,room_id_patt)
|
||||
|
||||
json_url = 'http://liveapi.plu.cn/liveapp/roomstatus?roomId={0}'.format(room_id)
|
||||
content = get_content(json_url)
|
||||
data = json.loads(content)
|
||||
streamUri = data['streamUri']
|
||||
if len(streamUri) <= 4:
|
||||
raise ValueError('The live stream is not online!')
|
||||
title = data['title']
|
||||
streamer = data['userName']
|
||||
title = '{}: {}'.format(streamer, title)
|
||||
|
||||
steam_api_url = 'http://livestream.plu.cn/live/getlivePlayurl?roomId={0}'.format(room_id)
|
||||
content = get_content(steam_api_url)
|
||||
data = json.loads(content)
|
||||
isonline = data.get('isTransfer')
|
||||
if isonline == '0':
|
||||
raise ValueError('The live stream is not online!')
|
||||
|
||||
real_url = data['playLines'][0]['urls'][0]['securityUrl']
|
||||
|
||||
print_info(site_info, title, 'flv', float('inf'))
|
||||
|
||||
if not info_only:
|
||||
download_urls([real_url], title, 'flv', None, output_dir, merge=merge)
|
||||
|
||||
elif web_domain == 'replay.longzhu.com':
|
||||
videoid = match1(url, r'(\d+)$')
|
||||
json_url = 'http://liveapi.longzhu.com/livereplay/getreplayfordisplay?videoId={0}'.format(videoid)
|
||||
content = get_content(json_url)
|
||||
data = json.loads(content)
|
||||
|
||||
username = data['userName']
|
||||
title = data['title']
|
||||
title = '{0}:{1}'.format(username, title)
|
||||
real_url = data['videoUrl']
|
||||
|
||||
if player:
|
||||
print_info('Longzhu Video', title, 'm3u8', 0)
|
||||
download_urls([real_url], title, 'm3u8', 0, output_dir, merge=merge)
|
||||
else:
|
||||
urls = general_m3u8_extractor(real_url)
|
||||
print_info('Longzhu Video', title, 'm3u8', 0)
|
||||
if not info_only:
|
||||
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
else:
|
||||
raise ValueError('Wrong url or unsupported link ... {0}'.format(url))
|
||||
|
||||
site_info = 'longzhu.com'
|
||||
download = longzhu_download
|
||||
download_playlist = playlist_not_supported('longzhu')
|
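A note on the title composition in this extractor: when `str.format` is called in its unbound form, the first positional argument is used as the template, so any remaining arguments are dropped unless that template contains placeholders. The following is a minimal sketch of the pitfall and of an explicit-placeholder alternative; `compose_title` and the sample strings are hypothetical names used only for illustration, not part of you-get.

```python
def compose_title(streamer, title):
    """Join a streamer name and a room title as 'streamer: title'."""
    return '{0}: {1}'.format(streamer, title)


# Pitfall: the unbound call treats 'alice' as the format template,
# so the two remaining arguments are silently ignored.
pitfall = str.format('alice', ': ', 'My stream')

# Explicit placeholders keep both parts.
good = compose_title('alice', 'My stream')
```

Using numbered placeholders keeps the intent visible at the call site and makes a missing field fail loudly instead of disappearing.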
@ -3,15 +3,19 @@
|
||||
__all__ = ['magisto_download']
|
||||
|
||||
from ..common import *
|
||||
import json
|
||||
|
||||
def magisto_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url)
|
||||
|
||||
title1 = r1(r'<meta name="twitter:title" content="([^"]*)"', html)
|
||||
title2 = r1(r'<meta name="twitter:description" content="([^"]*)"', html)
|
||||
video_hash = r1(r'http://www.magisto.com/video/([^/]+)', url)
|
||||
title = "%s %s - %s" % (title1, title2, video_hash)
|
||||
url = r1(r'<source type="[^"]+" src="([^"]*)"', html)
|
||||
video_hash = r1(r'video\/([a-zA-Z0-9]+)', url)
|
||||
api_url = 'https://www.magisto.com/api/video/{}'.format(video_hash)
|
||||
content = get_html(api_url)
|
||||
data = json.loads(content)
|
||||
title1 = data['title']
|
||||
title2 = data['creator']
|
||||
title = "%s - %s" % (title1, title2)
|
||||
url = data['video_direct_url']
|
||||
type, ext, size = url_info(url)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
|
@ -12,22 +12,25 @@ import re
|
||||
class MGTV(VideoExtractor):
|
||||
name = "芒果 (MGTV)"
|
||||
|
||||
# Last updated: 2015-11-24
|
||||
# Last updated: 2016-11-13
|
||||
stream_types = [
|
||||
{'id': 'hd', 'container': 'flv', 'video_profile': '超清'},
|
||||
{'id': 'sd', 'container': 'flv', 'video_profile': '高清'},
|
||||
{'id': 'ld', 'container': 'flv', 'video_profile': '标清'},
|
||||
{'id': 'hd', 'container': 'ts', 'video_profile': '超清'},  # "super HD"
|
||||
{'id': 'sd', 'container': 'ts', 'video_profile': '高清'},  # "HD"
|
||||
{'id': 'ld', 'container': 'ts', 'video_profile': '标清'},  # "SD"
|
||||
]
|
||||
|
||||
id_dic = {i['video_profile']:(i['id']) for i in stream_types}
|
||||
|
||||
api_endpoint = 'http://v.api.mgtv.com/player/video?video_id={video_id}'
|
||||
api_endpoint = 'http://pcweb.api.mgtv.com/player/video?video_id={video_id}'
|
||||
|
||||
@staticmethod
|
||||
def get_vid_from_url(url):
|
||||
"""Extracts video ID from URL.
|
||||
"""
|
||||
return match1(url, 'http://www.mgtv.com/v/\d/\d+/\w+/(\d+).html')
|
||||
vid = match1(url, r'https?://www.mgtv.com/(?:b|l)/\d+/(\d+).html')
|
||||
if not vid:
|
||||
vid = match1(url, r'https?://www.mgtv.com/hz/bdpz/\d+/(\d+).html')
|
||||
return vid
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
@staticmethod
|
||||
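The hunk above tries one URL pattern and then falls back to a second one. A minimal sketch of the same idea as an ordered pattern list follows; it assumes plain `re.search` in place of you-get's `match1`, escapes the dots in the host name (a tightening that the original patterns do not make), and uses made-up example URLs.

```python
import re

# Ordered from most to least common; the first pattern that matches wins.
VID_PATTERNS = [
    r'https?://www\.mgtv\.com/(?:b|l)/\d+/(\d+)\.html',
    r'https?://www\.mgtv\.com/hz/bdpz/\d+/(\d+)\.html',
]


def get_vid_from_url(url):
    """Return the numeric video ID embedded in the URL, or None."""
    for pattern in VID_PATTERNS:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None


vid_b = get_vid_from_url('https://www.mgtv.com/b/12345/678910.html')
vid_hz = get_vid_from_url('http://www.mgtv.com/hz/bdpz/111/222.html')
vid_none = get_vid_from_url('https://www.mgtv.com/about.html')
```

Keeping the patterns in a list means a further URL layout can be supported by adding one entry rather than another `if not vid:` branch.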
@ -44,10 +47,15 @@ class MGTV(VideoExtractor):
|
||||
|
||||
content = get_content(content['info'])  # fetch the real M3U playlist; this step may change later
|
||||
segment_list = []
|
||||
segments_size = 0
|
||||
for i in content.split():
|
||||
if not i.startswith('#'):  # not ideal; the m3u8 package would be a better fit
|
||||
segment_list.append(base_url + i)
|
||||
return segment_list
|
||||
# use the EXT-MGTV-File-SIZE tags to compute the total size quickly
|
||||
elif i.startswith('#EXT-MGTV-File-SIZE:'):
|
||||
segments_size += int(i[i.rfind(':')+1:])
|
||||
|
||||
return m3u_url, segments_size, segment_list
|
||||
|
||||
def download_playlist_by_url(self, url, **kwargs):
|
||||
pass
|
||||
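The playlist handling in this hunk can be sketched in isolation: every token that does not start with `#` is a segment path, and every `#EXT-MGTV-File-SIZE:` tag contributes to the total size. This is a minimal sketch under stated assumptions: `parse_playlist`, the sample playlist text and the base URL are hypothetical, and the real response may carry other tags.

```python
def parse_playlist(text, base_url):
    """Return (total_size, segment_urls) for a whitespace-separated playlist."""
    segments = []
    total_size = 0
    for token in text.split():
        if token.startswith('#EXT-MGTV-File-SIZE:'):
            # The size is whatever follows the last colon of the tag.
            total_size += int(token.rsplit(':', 1)[1])
        elif not token.startswith('#'):
            segments.append(base_url + token)
    return total_size, segments


# Hypothetical playlist body used only to exercise the parser.
sample = """#EXTM3U
#EXT-MGTV-File-SIZE:1000
#EXTINF:10,
seg0.ts
#EXT-MGTV-File-SIZE:2500
#EXTINF:10,
seg1.ts
#EXT-X-ENDLIST
"""

size, urls = parse_playlist(sample, 'http://cdn.example.invalid/v/')
```

Summing the size tags avoids one HTTP request per segment, which is what the per-segment `url_info` loop removed later in this diff had to do.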
@ -58,8 +66,9 @@ class MGTV(VideoExtractor):
|
||||
content = get_content(self.api_endpoint.format(video_id = self.vid))
|
||||
content = loads(content)
|
||||
self.title = content['data']['info']['title']
|
||||
domain = content['data']['stream_domain'][0]
|
||||
|
||||
#stream_avalable = [i['name'] for i in content['data']['stream']]
|
||||
#stream_available = [i['name'] for i in content['data']['stream']]
|
||||
stream_available = {}
|
||||
for i in content['data']['stream']:
|
||||
stream_available[i['name']] = i['url']
|
||||
@ -68,15 +77,11 @@ class MGTV(VideoExtractor):
|
||||
if s['video_profile'] in stream_available.keys():
|
||||
quality_id = self.id_dic[s['video_profile']]
|
||||
url = stream_available[s['video_profile']]
|
||||
url = re.sub( r'(\&arange\=\d+)', '', url) #Un-Hum
|
||||
segment_list_this = self.get_mgtv_real_url(url)
|
||||
url = domain + re.sub( r'(\&arange\=\d+)', '', url) #Un-Hum
|
||||
m3u8_url, m3u8_size, segment_list_this = self.get_mgtv_real_url(url)
|
||||
|
||||
container_this_stream = ''
|
||||
size_this_stream = 0
|
||||
stream_fileid_list = []
|
||||
for i in segment_list_this:
|
||||
_, container_this_stream, size_this_seg = url_info(i)
|
||||
size_this_stream += size_this_seg
|
||||
stream_fileid_list.append(os.path.basename(i).split('.')[0])
|
||||
|
||||
#make pieces
|
||||
@ -85,10 +90,11 @@ class MGTV(VideoExtractor):
|
||||
pieces.append({'fileid': i[0], 'segs': i[1],})
|
||||
|
||||
self.streams[quality_id] = {
|
||||
'container': 'flv',
|
||||
'container': s['container'],
|
||||
'video_profile': s['video_profile'],
|
||||
'size': size_this_stream,
|
||||
'pieces': pieces
|
||||
'size': m3u8_size,
|
||||
'pieces': pieces,
|
||||
'm3u8_url': m3u8_url
|
||||
}
|
||||
|
||||
if not kwargs['info_only']:
|
||||
@ -107,6 +113,44 @@ class MGTV(VideoExtractor):
|
||||
# Extract stream with the best quality
|
||||
stream_id = self.streams_sorted[0]['id']
|
||||
|
||||
def download(self, **kwargs):
|
||||
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
stream_id = kwargs['stream_id']
|
||||
else:
|
||||
stream_id = 'null'
|
||||
|
||||
# print video info only
|
||||
if 'info_only' in kwargs and kwargs['info_only']:
|
||||
if stream_id != 'null':
|
||||
if 'index' not in kwargs:
|
||||
self.p(stream_id)
|
||||
else:
|
||||
self.p_i(stream_id)
|
||||
else:
|
||||
# Display all available streams
|
||||
if 'index' not in kwargs:
|
||||
self.p([])
|
||||
else:
|
||||
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
|
||||
self.p_i(stream_id)
|
||||
|
||||
# default to use the best quality
|
||||
if stream_id == 'null':
|
||||
stream_id = self.streams_sorted[0]['id']
|
||||
|
||||
stream_info = self.streams[stream_id]
|
||||
|
||||
if not kwargs['info_only']:
|
||||
if player:
|
||||
# pass the m3u8 URL because some video players (e.g. mpv) can process it directly
|
||||
launch_player(player, [stream_info['m3u8_url']])
|
||||
else:
|
||||
download_urls(stream_info['src'], self.title, stream_info['container'], stream_info['size'],
|
||||
output_dir=kwargs['output_dir'],
|
||||
merge=kwargs.get('merge', True))
|
||||
# av=stream_id in self.dash_streams)
|
||||
|
||||
site = MGTV()
|
||||
download = site.download_by_url
|
||||
download_playlist = site.download_playlist_by_url
|
@ -2,42 +2,127 @@
|
||||
|
||||
__all__ = ['miaopai_download']
|
||||
|
||||
import string
|
||||
import random
|
||||
from ..common import *
|
||||
import urllib.error
|
||||
import urllib.parse
|
||||
from ..util import fs
|
||||
|
||||
def miaopai_download_by_url(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
'''Source: Android mobile'''
|
||||
if re.match(r'http://video.weibo.com/show\?fid=(\d{4}:\w{32})\w*', url):
|
||||
fake_headers_mobile = {
|
||||
fake_headers_mobile = {
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
|
||||
'Accept-Charset': 'UTF-8,*;q=0.5',
|
||||
'Accept-Encoding': 'gzip,deflate,sdch',
|
||||
'Accept-Language': 'en-US,en;q=0.8',
|
||||
'User-Agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'
|
||||
}
|
||||
webpage_url = re.search(r'(http://video.weibo.com/show\?fid=\d{4}:\w{32})\w*', url).group(1) + '&type=mp4' #mobile
|
||||
}
|
||||
|
||||
#grab download URL
|
||||
a = get_content(webpage_url, headers= fake_headers_mobile , decoded=True)
|
||||
url = match1(a, r'<video src="(.*?)\"\W')
|
||||
def miaopai_download_by_fid(fid, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
'''Source: Android mobile'''
|
||||
page_url = 'http://video.weibo.com/show?fid=' + fid + '&type=mp4'
|
||||
|
||||
#grab title
|
||||
b = get_content(webpage_url) #normal
|
||||
title = match1(b, r'<meta name="description" content="([\s\S]*?)\"\W')
|
||||
|
||||
type_, ext, size = url_info(url)
|
||||
print_info(site_info, title, type_, size)
|
||||
mobile_page = get_content(page_url, headers=fake_headers_mobile)
|
||||
url = match1(mobile_page, r'<video id=.*?src=[\'"](.*?)[\'"]\W')
|
||||
if url is None:
|
||||
wb_mp = re.search(r'<script src=([\'"])(.+?wb_mp\.js)\1>', mobile_page).group(2)
|
||||
return miaopai_download_by_wbmp(wb_mp, fid, output_dir=output_dir, merge=merge,
|
||||
info_only=info_only, total_size=None, **kwargs)
|
||||
title = match1(mobile_page, r'<title>((.|\n)+?)</title>')
|
||||
if not title:
|
||||
title = fid
|
||||
title = title.replace('\n', '_')
|
||||
ext, size = 'mp4', url_info(url)[2]
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
""""""
|
||||
if re.match(r'http://video.weibo.com/show\?fid=(\d{4}:\w{32})\w*', url):
|
||||
miaopai_download_by_url(url, output_dir, merge, info_only)
|
||||
elif re.match(r'http://weibo.com/p/230444\w+', url):
|
||||
_fid = match1(url, r'http://weibo.com/p/230444(\w+)')
|
||||
miaopai_download_by_url('http://video.weibo.com/show?fid=1034:{_fid}'.format(_fid = _fid), output_dir, merge, info_only)
|
||||
|
||||
def miaopai_download_by_wbmp(wbmp_url, fid, info_only=False, **kwargs):
|
||||
headers = {}
|
||||
headers.update(fake_headers_mobile)
|
||||
headers['Host'] = 'imgaliyuncdn.miaopai.com'
|
||||
wbmp = get_content(wbmp_url, headers=headers)
|
||||
appid = re.search(r'appid:\s*?([^,]+?),', wbmp).group(1)
|
||||
jsonp = re.search(r'jsonp:\s*?([\'"])(\w+?)\1', wbmp).group(2)
|
||||
population = [i for i in string.ascii_lowercase] + [i for i in string.digits]
|
||||
info_url = '{}?{}'.format('http://p.weibo.com/aj_media/info', parse.urlencode({
|
||||
'appid': appid.strip(),
|
||||
'fid': fid,
|
||||
jsonp.strip(): '_jsonp' + ''.join(random.sample(population, 11))
|
||||
}))
|
||||
headers['Host'] = 'p.weibo.com'
|
||||
jsonp_text = get_content(info_url, headers=headers)
|
||||
jsonp_dict = json.loads(match1(jsonp_text, r'\(({.+})\)'))
|
||||
if jsonp_dict['code'] != 200:
|
||||
log.wtf('[Failed] "%s"' % jsonp_dict['msg'])
|
||||
video_url = jsonp_dict['data']['meta_data'][0]['play_urls']['l']
|
||||
title = jsonp_dict['data']['description']
|
||||
title = title.replace('\n', '_')
|
||||
ext = 'mp4'
|
||||
headers['Host'] = 'f.us.sinaimg.cn'
|
||||
print_info(site_info, title, ext, url_info(video_url, headers=headers)[2])
|
||||
if not info_only:
|
||||
download_urls([video_url], fs.legitimize(title), ext, headers=headers, **kwargs)
|
||||
|
||||
|
||||
def miaopai_download_story(url, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
data_url = 'https://m.weibo.cn/s/video/object?%s' % url.split('?')[1]
|
||||
data_content = get_content(data_url, headers=fake_headers_mobile)
|
||||
data = json.loads(data_content)
|
||||
title = data['data']['object']['summary']
|
||||
stream_url = data['data']['object']['stream']['url']
|
||||
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
|
||||
if not info_only:
|
||||
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
|
||||
|
||||
|
||||
def miaopai_download_direct(url, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
mobile_page = get_content(url, headers=fake_headers_mobile)
|
||||
try:
|
||||
title = re.search(r'([\'"])title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
except:
|
||||
title = re.search(r'([\'"])status_title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
title = title.replace('\n', '_')
|
||||
try:
|
||||
stream_url = re.search(r'([\'"])stream_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
except:
|
||||
page_url = re.search(r'([\'"])page_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
return miaopai_download_story(page_url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
|
||||
if not info_only:
|
||||
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
|
||||
|
||||
|
||||
def miaopai_download(url, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
if re.match(r'^https?://.*\.weibo\.com/\d+/.+', url):
|
||||
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
if re.match(r'^https?://.*\.weibo\.(com|cn)/s/video/.+', url):
|
||||
return miaopai_download_story(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
# FIXME!
|
||||
if re.match(r'^https?://.*\.weibo\.com/tv/v/(\w+)', url):
|
||||
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
fid = match1(url, r'\?fid=(\d{4}:\w+)')
|
||||
if fid is not None:
|
||||
miaopai_download_by_fid(fid, output_dir, merge, info_only)
|
||||
elif '/p/230444' in url:
|
||||
fid = match1(url, r'/p/230444(\w+)')
|
||||
miaopai_download_by_fid('1034:'+fid, output_dir, merge, info_only)
|
||||
else:
|
||||
mobile_page = get_content(url, headers = fake_headers_mobile)
|
||||
hit = re.search(r'"page_url"\s*:\s*"([^"]+)"', mobile_page)
|
||||
if not hit:
|
||||
raise Exception('Unknown pattern')
|
||||
else:
|
||||
escaped_url = hit.group(1)
|
||||
miaopai_download(urllib.parse.unquote(escaped_url), output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
|
||||
site_info = "miaopai"
|
||||
download = miaopai_download
|
||||
|
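`miaopai_download_by_wbmp` above generates a random JSONP callback name and then strips the callback wrapper from the response before parsing it as JSON. A minimal sketch of those two steps follows; `random_callback`, `unwrap_jsonp` and the sample response are hypothetical, and the real endpoint may format its reply differently.

```python
import json
import random
import re
import string


def random_callback(length=11):
    """Build a callback name such as '_jsonp' plus 11 distinct random characters."""
    population = string.ascii_lowercase + string.digits
    return '_jsonp' + ''.join(random.sample(population, length))


def unwrap_jsonp(text):
    """Extract and parse the JSON object inside 'callback({...})'."""
    match = re.search(r'\((\{.+\})\)', text, re.S)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


callback = random_callback()
# Hypothetical response body wrapped in the generated callback.
response = callback + '({"code": 200, "data": {"description": "demo"}});'
payload = unwrap_jsonp(response)
```

Raising on a missing wrapper gives a clearer failure than passing `None` to `json.loads`, which is what happens when the pattern does not match.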
361
src/you_get/extractors/missevan.py
Normal file
@ -0,0 +1,361 @@
|
||||
"""
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2019 WaferJay
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
|
||||
from ..common import get_content, urls_size, log, player, dry_run
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 ' \
|
||||
'(KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
|
||||
|
||||
|
||||
class _NoMatchException(Exception):
|
||||
pass
|
||||
|
||||
|
||||
class _Dispatcher(object):
|
||||
|
||||
def __init__(self):
|
||||
self.entry = []
|
||||
|
||||
def register(self, patterns, fun):
|
||||
if not isinstance(patterns, (list, tuple)):
|
||||
patterns = [patterns]
|
||||
|
||||
patterns = [re.compile(reg) for reg in patterns]
|
||||
self.entry.append((patterns, fun))
|
||||
|
||||
def endpoint(self, *patterns):
|
||||
assert patterns, 'patterns must not be empty'
|
||||
def _wrap(fun):
|
||||
self.register(patterns, fun)
|
||||
return fun
|
||||
return _wrap
|
||||
|
||||
def test(self, url):
|
||||
return any(pa.search(url) for pas, _ in self.entry for pa in pas)
|
||||
|
||||
def dispatch(self, url, *args, **kwargs):
|
||||
|
||||
for patterns, fun in self.entry:
|
||||
|
||||
for pa in patterns:
|
||||
|
||||
match = pa.search(url)
|
||||
if not match:
|
||||
continue
|
||||
|
||||
kwargs.update(match.groupdict())
|
||||
return fun(*args, **kwargs)
|
||||
|
||||
raise _NoMatchException()
|
||||
|
||||
missevan_stream_types = [
|
||||
{'id': 'source', 'quality': '源文件', 'url_json_key': 'soundurl',  # "source file"
|
||||
'resource_url_fmt': 'sound/{resource_url}'},
|
||||
{'id': '320', 'quality': '320 Kbps', 'url_json_key': 'soundurl_64'},
|
||||
{'id': '128', 'quality': '128 Kbps', 'url_json_key': 'soundurl_128'},
|
||||
{'id': '32', 'quality': '32 Kbps', 'url_json_key': 'soundurl_32'},
|
||||
{'id': 'covers', 'desc': '封面图', 'url_json_key': 'cover_image',  # "cover image"
|
||||
'default_src': 'covers/nocover.png',
|
||||
'resource_url_fmt': 'covers/{resource_url}'},
|
||||
{'id': 'coversmini', 'desc': '封面缩略图', 'url_json_key': 'cover_image',  # "cover thumbnail"
|
||||
'default_src': 'coversmini/nocover.png',
|
||||
'resource_url_fmt': 'coversmini/{resource_url}'}
|
||||
]
|
||||
|
||||
def _get_resource_uri(data, stream_type):
|
||||
uri = data[stream_type['url_json_key']]
|
||||
if not uri:
|
||||
return stream_type.get('default_src')
|
||||
|
||||
uri_fmt = stream_type.get('resource_url_fmt')
|
||||
if not uri_fmt:
|
||||
return uri
|
||||
return uri_fmt.format(resource_url=uri)
|
||||
|
||||
def is_covers_stream(stream):
|
||||
stream = stream or ''
|
||||
return stream.lower() in ('covers', 'coversmini')
|
||||
|
||||
def get_file_extension(file_path, default=''):
|
||||
_, suffix = os.path.splitext(file_path)
|
||||
if suffix:
|
||||
# remove dot
|
||||
suffix = suffix[1:]
|
||||
return suffix or default
|
||||
|
||||
def best_quality_stream_id(streams, stream_types):
|
||||
for stream_type in stream_types:
|
||||
if streams.get(stream_type['id']):
|
||||
return stream_type['id']
|
||||
|
||||
raise AssertionError('no stream selected')
|
||||
|
||||
|
||||
class MissEvanWithStream(VideoExtractor):
|
||||
|
||||
name = 'MissEvan'
|
||||
stream_types = missevan_stream_types
|
||||
|
||||
def __init__(self, *args):
|
||||
super().__init__(*args)
|
||||
self.referer = 'https://www.missevan.com/'
|
||||
self.ua = _UA
|
||||
|
||||
@classmethod
|
||||
def create(cls, title, streams, *, streams_sorted=None):
|
||||
obj = cls()
|
||||
obj.title = title
|
||||
obj.streams.update(streams)
|
||||
streams_sorted = streams_sorted or cls._setup_streams_sorted(streams)
|
||||
obj.streams_sorted.extend(streams_sorted)
|
||||
return obj
|
||||
|
||||
def set_danmaku(self, danmaku):
|
||||
self.danmaku = danmaku
|
||||
return self
|
||||
|
||||
@staticmethod
|
||||
def _setup_streams_sorted(streams):
|
||||
streams_sorted = []
|
||||
for key, stream in streams.items():
|
||||
copy_stream = stream.copy()
|
||||
copy_stream['id'] = key
|
||||
streams_sorted.append(copy_stream)
|
||||
|
||||
return streams_sorted
|
||||
|
||||
def download(self, **kwargs):
|
||||
stream_id = kwargs.get('stream_id') or self.stream_types[0]['id']
|
||||
stream = self.streams[stream_id]
|
||||
if 'size' not in stream:
|
||||
stream['size'] = urls_size(stream['src'])
|
||||
|
||||
super().download(**kwargs)
|
||||
|
||||
def unsupported_method(self, *args, **kwargs):
|
||||
raise AssertionError('Unsupported')
|
||||
|
||||
download_by_url = unsupported_method
|
||||
download_by_vid = unsupported_method
|
||||
prepare = unsupported_method
|
||||
extract = unsupported_method
|
||||
|
||||
|
||||
class MissEvan(VideoExtractor):
|
||||
|
||||
name = 'MissEvan'
|
||||
stream_types = missevan_stream_types
|
||||
|
||||
def __init__(self, *args):
|
||||
super().__init__(*args)
|
||||
self.referer = 'https://www.missevan.com/'
|
||||
self.ua = _UA
|
||||
self.__headers = {'User-Agent': self.ua, 'Referer': self.referer}
|
||||
|
||||
__prepare_dispatcher = _Dispatcher()
|
||||
|
||||
@__prepare_dispatcher.endpoint(
|
||||
re.compile(r'missevan\.com/sound/(?:player\?.*?id=)?(?P<sid>\d+)', re.I))
|
||||
def prepare_sound(self, sid, **kwargs):
|
||||
json_data = self._get_json(self.url_sound_api(sid))
|
||||
sound = json_data['info']['sound']
|
||||
|
||||
self.title = sound['soundstr']
|
||||
if sound.get('need_pay'):
|
||||
log.e('Paid resources cannot be downloaded')
|
||||
return
|
||||
|
||||
if not is_covers_stream(kwargs.get('stream_id')) and not dry_run:
|
||||
self.danmaku = self._get_content(self.url_danmaku_api(sid))
|
||||
|
||||
self.streams = self.setup_streams(sound)
|
||||
|
||||
@classmethod
|
||||
def setup_streams(cls, sound):
|
||||
streams = {}
|
||||
|
||||
for stream_type in cls.stream_types:
|
||||
uri = _get_resource_uri(sound, stream_type)
|
||||
resource_url = cls.url_resource(uri) if uri else None
|
||||
|
||||
if resource_url:
|
||||
container = get_file_extension(resource_url)
|
||||
stream_id = stream_type['id']
|
||||
streams[stream_id] = {'src': [resource_url], 'container': container}
|
||||
quality = stream_type.get('quality')
|
||||
if quality:
|
||||
streams[stream_id]['quality'] = quality
|
||||
return streams
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
if self.vid:
|
||||
self.prepare_sound(self.vid, **kwargs)
|
||||
return
|
||||
|
||||
try:
|
||||
self.__prepare_dispatcher.dispatch(self.url, self, **kwargs)
|
||||
except _NoMatchException:
|
||||
log.e('[Error] Unsupported URL pattern.')
|
||||
exit(1)
|
||||
|
||||
@staticmethod
|
||||
def download_covers(title, streams, **kwargs):
|
||||
if not is_covers_stream(kwargs.get('stream_id')) \
|
||||
and not kwargs.get('json_output') \
|
||||
and not kwargs.get('info_only') \
|
||||
and not player:
|
||||
kwargs['stream_id'] = 'covers'
|
||||
MissEvanWithStream \
|
||||
.create(title, streams) \
|
||||
.download(**kwargs)
|
||||
|
||||
_download_playlist_dispatcher = _Dispatcher()
|
||||
|
||||
@_download_playlist_dispatcher.endpoint(
|
||||
re.compile(r'missevan\.com/album(?:info)?/(?P<aid>\d+)', re.I))
|
||||
def download_album(self, aid, **kwargs):
|
||||
json_data = self._get_json(self.url_album_api(aid))
|
||||
album = json_data['info']['album']
|
||||
self.title = album['title']
|
||||
sounds = json_data['info']['sounds']
|
||||
|
||||
output_dir = os.path.abspath(kwargs.pop('output_dir', '.'))
|
||||
output_dir = os.path.join(output_dir, self.title)
|
||||
kwargs['output_dir'] = output_dir
|
||||
|
||||
for sound in sounds:
|
||||
sound_title = sound['soundstr']
|
||||
if sound.get('need_pay'):
|
||||
log.w('Skipping paid resource: ' + sound_title)
|
||||
continue
|
||||
|
||||
streams = self.setup_streams(sound)
|
||||
extractor = MissEvanWithStream.create(sound_title, streams)
|
||||
if not dry_run:
|
||||
sound_id = sound['id']
|
||||
danmaku = self._get_content(self.url_danmaku_api(sound_id))
|
||||
extractor.set_danmaku(danmaku)
|
||||
extractor.download(**kwargs)
|
||||
|
||||
self.download_covers(sound_title, streams, **kwargs)
|
||||
|
||||
@_download_playlist_dispatcher.endpoint(
|
||||
re.compile(r'missevan\.com(?:/mdrama)?/drama/(?P<did>\d+)', re.I))
|
||||
def download_drama(self, did, **kwargs):
|
||||
json_data = self._get_json(self.url_drama_api(did))
|
||||
|
||||
drama = json_data['info']['drama']
|
||||
if drama.get('need_pay'):
|
||||
log.w('This drama contains paid resources; they will be skipped')
|
||||
|
||||
self.title = drama['name']
|
||||
output_dir = os.path.abspath(kwargs.pop('output_dir', '.'))
|
||||
output_dir = os.path.join(output_dir, self.title)
|
||||
kwargs['output_dir'] = output_dir
|
||||
|
||||
episodes = json_data['info']['episodes']
|
||||
for each in episodes['episode']:
|
||||
if each.get('need_pay'):
|
||||
log.w('Skipping paid resource: ' + each['soundstr'])
|
||||
continue
|
||||
sound_id = each['sound_id']
|
||||
MissEvan().download_by_vid(sound_id, **kwargs)
|
||||
|
||||
def download_playlist_by_url(self, url, **kwargs):
|
||||
self.url = url
|
||||
try:
|
||||
self._download_playlist_dispatcher.dispatch(url, self, **kwargs)
|
||||
except _NoMatchException:
|
||||
log.e('[Error] Unsupported URL pattern with --playlist option.')
|
||||
exit(1)
|
||||
|
||||
def download_by_url(self, url, **kwargs):
|
||||
if not kwargs.get('playlist') and self._download_playlist_dispatcher.test(url):
|
||||
log.w('This is an album or drama; use the --playlist option to download all of it.')
|
||||
else:
|
||||
super().download_by_url(url, **kwargs)
|
||||
|
||||
def download(self, **kwargs):
|
||||
kwargs['keep_obj'] = True  # keep self.streams so the cover can be downloaded afterwards
|
||||
super().download(**kwargs)
|
||||
self.download_covers(self.title, self.streams, **kwargs)
|
||||
|
||||
def extract(self, **kwargs):
|
||||
stream_id = kwargs.get('stream_id')
|
||||
|
||||
# fetch the size of every stream when printing info or JSON
|
||||
if kwargs.get('info_only') and not stream_id \
|
||||
or kwargs.get('json_output'):
|
||||
|
||||
for _, stream in self.streams.items():
|
||||
stream['size'] = urls_size(stream['src'])
|
||||
return
|
||||
|
||||
# fetch size of the selected stream only
|
||||
if not stream_id:
|
||||
stream_id = best_quality_stream_id(self.streams, self.stream_types)
|
||||
|
||||
stream = self.streams[stream_id]
|
||||
if 'size' not in stream:
|
||||
stream['size'] = urls_size(stream['src'])
|
||||
|
||||
def _get_content(self, url):
|
||||
return get_content(url, headers=self.__headers)
|
||||
|
||||
def _get_json(self, url):
|
||||
content = self._get_content(url)
|
||||
return json.loads(content)
|
||||
|
||||
@staticmethod
|
||||
def url_album_api(album_id):
|
||||
return 'https://www.missevan.com/sound' \
|
||||
'/soundalllist?albumid=' + str(album_id)
|
||||
|
||||
@staticmethod
|
||||
def url_sound_api(sound_id):
|
||||
return 'https://www.missevan.com/sound' \
|
||||
'/getsound?soundid=' + str(sound_id)
|
||||
|
||||
@staticmethod
|
||||
def url_drama_api(drama_id):
|
||||
return 'https://www.missevan.com/dramaapi' \
|
||||
'/getdrama?drama_id=' + str(drama_id)
|
||||
|
||||
@staticmethod
|
||||
def url_danmaku_api(sound_id):
|
||||
return 'https://www.missevan.com/sound/getdm?soundid=' + str(sound_id)
|
||||
|
||||
@staticmethod
|
||||
def url_resource(uri):
|
||||
return 'https://static.missevan.com/' + uri
|
||||
|
||||
site = MissEvan()
|
||||
site_info = 'MissEvan.com'
|
||||
download = site.download_by_url
|
||||
download_playlist = site.download_playlist_by_url
|
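The `_Dispatcher` class in this file routes a URL to a handler by regular expression and passes the named groups as keyword arguments. A minimal standalone sketch of that pattern follows; the class and handler names and the `example.com` routes are hypothetical and only mirror the structure of the code above.

```python
import re


class NoMatch(Exception):
    """Raised when no registered pattern matches the URL."""


class Dispatcher:
    def __init__(self):
        self.entries = []

    def endpoint(self, *patterns):
        """Decorator that registers a handler for one or more patterns."""
        compiled = [re.compile(pattern) for pattern in patterns]

        def wrap(fun):
            self.entries.append((compiled, fun))
            return fun
        return wrap

    def dispatch(self, url, *args, **kwargs):
        for patterns, fun in self.entries:
            for pattern in patterns:
                match = pattern.search(url)
                if match:
                    # Named groups become keyword arguments of the handler.
                    kwargs.update(match.groupdict())
                    return fun(*args, **kwargs)
        raise NoMatch(url)


routes = Dispatcher()


@routes.endpoint(r'example\.com/sound/(?P<sid>\d+)')
def handle_sound(sid):
    return ('sound', sid)


@routes.endpoint(r'example\.com/album/(?P<aid>\d+)')
def handle_album(aid):
    return ('album', aid)


sound_result = routes.dispatch('https://example.com/sound/42')
album_result = routes.dispatch('https://example.com/album/7')
```

Because the decorator returns the function unchanged, a handler stays callable directly, which is how `prepare` can call `prepare_sound` with a known ID without going through the dispatcher.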
@ -1,38 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
from ..common import *
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
import json
|
||||
|
||||
class MusicPlayOn(VideoExtractor):
|
||||
name = "MusicPlayOn"
|
||||
|
||||
stream_types = [
|
||||
{'id': '720p HD'},
|
||||
{'id': '360p SD'},
|
||||
]
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
content = get_content(self.url)
|
||||
|
||||
self.title = match1(content,
|
||||
r'setup\[\'title\'\] = "([^"]+)";')
|
||||
|
||||
for s in self.stream_types:
|
||||
quality = s['id']
|
||||
src = match1(content,
|
||||
r'src: "([^"]+)", "data-res": "%s"' % quality)
|
||||
if src is not None:
|
||||
url = 'http://en.musicplayon.com%s' % src
|
||||
self.streams[quality] = {'url': url}
|
||||
|
||||
def extract(self, **kwargs):
|
||||
for i in self.streams:
|
||||
s = self.streams[i]
|
||||
_, s['container'], s['size'] = url_info(s['url'])
|
||||
s['src'] = [s['url']]
|
||||
|
||||
site = MusicPlayOn()
|
||||
download = site.download_by_url
|
||||
# TBD: implement download_playlist
|
@ -17,6 +17,10 @@ def nanagogo_download(url, output_dir='.', merge=True, info_only=False, **kwargs
|
||||
info = json.loads(get_content(api_url))
|
||||
|
||||
items = []
|
||||
if info['data']['posts']['post'] is None:
|
||||
return
|
||||
if info['data']['posts']['post']['body'] is None:
|
||||
return
|
||||
for i in info['data']['posts']['post']['body']:
|
||||
if 'image' in i:
|
||||
image_url = i['image']
|
||||
|
@ -1,48 +1,40 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['naver_download']
|
||||
import urllib.request, urllib.parse
|
||||
from ..common import *
|
||||
import urllib.request
|
||||
import urllib.parse
|
||||
import json
|
||||
import re
|
||||
|
||||
def naver_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
from ..util import log
|
||||
from ..common import get_content, download_urls, print_info, playlist_not_supported, url_size
|
||||
from .universal import *
|
||||
|
||||
assert re.search(r'http://tvcast.naver.com/v/', url), "URL is not supported"
|
||||
__all__ = ['naver_download_by_url']
|
||||
|
||||
html = get_html(url)
|
||||
contentid = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',html)
|
||||
videoid = contentid.group(1)
|
||||
inkey = contentid.group(2)
|
||||
assert videoid
|
||||
assert inkey
|
||||
info_key = urllib.parse.urlencode({'vid': videoid, 'inKey': inkey, })
|
||||
down_key = urllib.parse.urlencode({'masterVid': videoid,'protocol': 'p2p','inKey': inkey, })
|
||||
inf_xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?%s' % info_key )
|
||||
|
||||
from xml.dom.minidom import parseString
|
||||
doc_info = parseString(inf_xml)
|
||||
Subject = doc_info.getElementsByTagName('Subject')[0].firstChild
|
||||
title = Subject.data
|
||||
assert title
|
||||
|
||||
xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?%s' % down_key )
|
||||
doc = parseString(xml)
|
||||
|
||||
encodingoptions = doc.getElementsByTagName('EncodingOption')
|
||||
old_height = doc.getElementsByTagName('height')[0]
|
||||
real_url= ''
|
||||
#to download the highest resolution one,
|
||||
for node in encodingoptions:
|
||||
new_height = node.getElementsByTagName('height')[0]
|
||||
domain_node = node.getElementsByTagName('Domain')[0]
|
||||
uri_node = node.getElementsByTagName('uri')[0]
|
||||
if int(new_height.firstChild.data) > int (old_height.firstChild.data):
|
||||
real_url= domain_node.firstChild.data+ '/' +uri_node.firstChild.data
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
print_info(site_info, title, type, size)
|
||||
def naver_download_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ep = 'https://apis.naver.com/rmcnmv/rmcnmv/vod/play/v2.0/{}?key={}'
|
||||
page = get_content(url)
|
||||
try:
|
||||
vid = re.search(r"\"videoId\"\s*:\s*\"(.+?)\"", page).group(1)
|
||||
key = re.search(r"\"inKey\"\s*:\s*\"(.+?)\"", page).group(1)
|
||||
meta_str = get_content(ep.format(vid, key))
|
||||
meta_json = json.loads(meta_str)
|
||||
if 'errorCode' in meta_json:
|
||||
log.wtf(meta_json['errorCode'])
|
||||
title = meta_json['meta']['subject']
|
||||
videos = meta_json['videos']['list']
|
||||
video_list = sorted(videos, key=lambda video: video['encodingOption']['width'])
|
||||
video_url = video_list[-1]['source']
|
||||
# size = video_list[-1]['size']
|
||||
# result wrong size
|
||||
size = url_size(video_url)
|
||||
print_info(site_info, title, 'mp4', size)
|
||||
if not info_only:
|
||||
download_urls([real_url], title, ext, size, output_dir, merge = merge)
|
||||
download_urls([video_url], title, 'mp4', size, output_dir, **kwargs)
|
||||
except:
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
site_info = "tvcast.naver.com"
|
||||
download = naver_download
|
||||
site_info = "naver.com"
|
||||
download = naver_download_by_url
|
||||
download_playlist = playlist_not_supported('naver')
|
||||
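Note on the quality selection above: the new `naver_download_by_url` sorts `meta_json['videos']['list']` by `encodingOption.width` and takes the last entry. A minimal sketch of the same selection, assuming a response shaped like the keys used in the hunk; the sample data, the URLs and the helper name `pick_widest` are made up for illustration:

```python
def pick_widest(videos):
    # Hypothetical helper: return the entry with the largest encoded width.
    return max(videos, key=lambda video: video['encodingOption']['width'])

# Made-up sample in the shape the hunk reads (not real API output).
videos = [
    {'source': 'http://example.invalid/480.mp4', 'encodingOption': {'width': 854}},
    {'source': 'http://example.invalid/1080.mp4', 'encodingOption': {'width': 1920}},
    {'source': 'http://example.invalid/720.mp4', 'encodingOption': {'width': 1280}},
]

best = pick_widest(videos)
# Same pick as the sorted(...)[-1] form used in the diff when widths are distinct.
same = sorted(videos, key=lambda video: video['encodingOption']['width'])[-1]
```

When two entries share the largest width, `max` returns the first of them while `sorted(...)[-1]` returns the last, so the two forms can differ on ties.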
|
@ -22,14 +22,14 @@ def netease_hymn():
|
||||
"""
|
||||
|
||||
def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
rid = match1(url, r'id=(.*)')
|
||||
rid = match1(url, r'\Wid=(.*)')
|
||||
if rid is None:
|
||||
rid = match1(url, r'/(\d+)/?$')
|
||||
rid = match1(url, r'/(\d+)/?')
|
||||
if "album" in url:
|
||||
j = loads(get_content("http://music.163.com/api/album/%s?id=%s&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
|
||||
|
||||
artist_name = j['album']['artists'][0]['name']
|
||||
album_name = j['album']['name']
|
||||
album_name = j['album']['name'].strip()
|
||||
new_dir = output_dir + '/' + fs.legitimize("%s - %s" % (artist_name, album_name))
|
||||
if not info_only:
|
||||
if not os.path.exists(new_dir):
|
||||
@ -55,12 +55,14 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
|
||||
cover_url = j['result']['coverImgUrl']
|
||||
download_urls([cover_url], "cover", "jpg", 0, new_dir)
|
||||
|
||||
for i in j['result']['tracks']:
|
||||
netease_song_download(i, output_dir=new_dir, info_only=info_only)
|
||||
prefix_width = len(str(len(j['result']['tracks'])))
|
||||
for n, i in enumerate(j['result']['tracks']):
|
||||
playlist_prefix = '%%.%dd_' % prefix_width % n
|
||||
netease_song_download(i, output_dir=new_dir, info_only=info_only, playlist_prefix=playlist_prefix)
|
||||
try: # download lyrics
|
||||
assert kwargs['caption']
|
||||
l = loads(get_content("http://music.163.com/api/song/lyric/?id=%s&lv=-1&csrf_token=" % i['id'], headers={"Referer": "http://music.163.com/"}))
|
||||
netease_lyric_download(i, l["lrc"]["lyric"], output_dir=new_dir, info_only=info_only)
|
||||
netease_lyric_download(i, l["lrc"]["lyric"], output_dir=new_dir, info_only=info_only, playlist_prefix=playlist_prefix)
|
||||
except: pass
|
||||
|
||||
elif "song" in url:
|
||||
@ -85,10 +87,10 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
|
||||
j = loads(get_content("http://music.163.com/api/mv/detail/?id=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
|
||||
netease_video_download(j['data'], output_dir=output_dir, info_only=info_only)
|
||||
|
||||
def netease_lyric_download(song, lyric, output_dir='.', info_only=False):
|
||||
def netease_lyric_download(song, lyric, output_dir='.', info_only=False, playlist_prefix=""):
|
||||
if info_only: return
|
||||
|
||||
title = "%s. %s" % (song['position'], song['name'])
|
||||
title = "%s%s. %s" % (playlist_prefix, song['position'], song['name'])
|
||||
filename = '%s.lrc' % get_filename(title)
|
||||
print('Saving %s ...' % filename, end="", flush=True)
|
||||
with open(os.path.join(output_dir, filename),
|
||||
@ -103,8 +105,11 @@ def netease_video_download(vinfo, output_dir='.', info_only=False):
|
||||
netease_download_common(title, url_best,
|
||||
output_dir=output_dir, info_only=info_only)
|
||||
|
||||
def netease_song_download(song, output_dir='.', info_only=False):
|
||||
title = "%s. %s" % (song['position'], song['name'])
|
||||
def netease_song_download(song, output_dir='.', info_only=False, playlist_prefix=""):
|
||||
title = "%s%s. %s" % (playlist_prefix, song['position'], song['name'])
|
||||
url_best = "http://music.163.com/song/media/outer/url?id=" + \
|
||||
str(song['id']) + ".mp3"
|
||||
'''
|
||||
songNet = 'p' + song['mp3Url'].split('/')[2][1:]
|
||||
|
||||
if 'hMusic' in song and song['hMusic'] != None:
|
||||
@ -113,15 +118,15 @@ def netease_song_download(song, output_dir='.', info_only=False):
|
||||
url_best = song['mp3Url']
|
||||
elif 'bMusic' in song:
|
||||
url_best = make_url(songNet, song['bMusic']['dfsId'])
|
||||
|
||||
'''
|
||||
netease_download_common(title, url_best,
|
||||
output_dir=output_dir, info_only=info_only)
|
||||
|
||||
def netease_download_common(title, url_best, output_dir, info_only):
|
||||
songtype, ext, size = url_info(url_best)
|
||||
songtype, ext, size = url_info(url_best, faker=True)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
download_urls([url_best], title, ext, size, output_dir)
|
||||
download_urls([url_best], title, ext, size, output_dir, faker=True)
|
||||
|
||||
|
||||
def netease_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
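Note on `playlist_prefix`: the expression `'%%.%dd_' % prefix_width % n` formats twice. The first `%` builds a format string such as `'%.2d_'`, and the second applies it to the track index. A small sketch of what that produces; the function name `playlist_prefixes` is hypothetical:

```python
def playlist_prefixes(track_count):
    # Width is the number of digits in the track count, as in the diff.
    width = len(str(track_count))
    # '%%.%dd_' % width yields e.g. '%.2d_', which is then applied to n.
    return ['%%.%dd_' % width % n for n in range(track_count)]

prefixes = playlist_prefixes(12)
```

Because the diff uses `enumerate` without a start value, the first track gets index 0, so a 12-track playlist is numbered `00_` through `11_`.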
|
@ -31,10 +31,11 @@ context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
|
||||
nicovideo_login(user, password)
|
||||
|
||||
html = get_html(url) # necessary!
|
||||
title = unicodize(r1(r'<span class="videoHeaderTitle"[^>]*>([^<]+)</span>', html))
|
||||
title = r1(r'<title>(.+?)</title>', html)
|
||||
#title = unicodize(r1(r'<span class="videoHeaderTitle"[^>]*>([^<]+)</span>', html))
|
||||
|
||||
vid = url.split('/')[-1].split('?')[0]
|
||||
api_html = get_html('http://www.nicovideo.jp/api/getflv?v=%s' % vid)
|
||||
api_html = get_html('http://flapi.nicovideo.jp/api/getflv?v=%s' % vid)
|
||||
real_url = parse.unquote(r1(r'url=([^&]+)&', api_html))
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
|
@ -1,33 +0,0 @@
#!/usr/bin/env python

__all__ = ['panda_download']

from ..common import *
import json
import time

def panda_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
    roomid = url[url.rfind('/')+1:]
    json_request_url = 'http://www.panda.tv/api_room?roomid={}&pub_key=&_={}'.format(roomid, int(time.time()))
    content = get_html(json_request_url)
    errno = json.loads(content)['errno']
    errmsg = json.loads(content)['errmsg']
    if errno:
        raise ValueError("Errno : {}, Errmsg : {}".format(errno, errmsg))

    data = json.loads(content)['data']
    title = data.get('roominfo')['name']
    room_key = data.get('videoinfo')['room_key']
    plflag = data.get('videoinfo')['plflag'].split('_')
    status = data.get('videoinfo')['status']
    if status is not "2":
        raise ValueError("The live stream is not online! (status:%s)" % status)
    real_url = 'http://pl{}.live.panda.tv/live_panda/{}.flv'.format(plflag[1],room_key)

    print_info(site_info, title, 'flv', float('inf'))
    if not info_only:
        download_urls([real_url], title, 'flv', None, output_dir, merge = merge)

site_info = "panda.tv"
download = panda_download
download_playlist = playlist_not_supported('panda')
|
@ -1,154 +1,229 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['pptv_download', 'pptv_download_by_id']
|
||||
#__all__ = ['pptv_download', 'pptv_download_by_id']
|
||||
|
||||
from ..common import *
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
import re
|
||||
import time
|
||||
import urllib
|
||||
from random import random
|
||||
import random
|
||||
import binascii
|
||||
from xml.dom.minidom import parseString
|
||||
|
||||
|
||||
def constructKey(arg):
|
||||
def lshift(a, b):
|
||||
return (a << b) & 0xffffffff
|
||||
def rshift(a, b):
|
||||
if a >= 0:
|
||||
return a >> b
|
||||
return (0x100000000 + a) >> b
|
||||
|
||||
def str2hex(s):
|
||||
r=""
|
||||
for i in s[:8]:
|
||||
t=hex(ord(i))[2:]
|
||||
if len(t)==1:
|
||||
t="0"+t
|
||||
r+=t
|
||||
for i in range(16):
|
||||
r+=hex(int(15*random()))[2:]
|
||||
return r
|
||||
def le32_pack(b_str):
|
||||
result = 0
|
||||
result |= b_str[0]
|
||||
result |= (b_str[1] << 8)
|
||||
result |= (b_str[2] << 16)
|
||||
result |= (b_str[3] << 24)
|
||||
return result
|
||||
|
||||
#ABANDONED Because SERVER_KEY is static
|
||||
def getkey(s):
|
||||
#returns 1896220160
|
||||
l2=[i for i in s]
|
||||
l4=0
|
||||
l3=0
|
||||
while l4<len(l2):
|
||||
l5=l2[l4]
|
||||
l6=ord(l5)
|
||||
l7=l6<<((l4%4)*8)
|
||||
l3=l3^l7
|
||||
l4+=1
|
||||
return l3
|
||||
pass
|
||||
def tea_core(data, key_seg):
|
||||
delta = 2654435769
|
||||
|
||||
def rot(k,b): ##>>> in as3
|
||||
if k>=0:
|
||||
return k>>b
|
||||
elif k<0:
|
||||
return (2**32+k)>>b
|
||||
pass
|
||||
d0 = le32_pack(data[:4])
|
||||
d1 = le32_pack(data[4:8])
|
||||
|
||||
def lot(k,b):
|
||||
return (k<<b)%(2**32)
|
||||
sum_ = 0
|
||||
for rnd in range(32):
|
||||
sum_ = (sum_ + delta) & 0xffffffff
|
||||
p1 = (lshift(d1, 4) + key_seg[0]) & 0xffffffff
|
||||
p2 = (d1 + sum_) & 0xffffffff
|
||||
p3 = (rshift(d1, 5) + key_seg[1]) & 0xffffffff
|
||||
|
||||
#WTF?
|
||||
def encrypt(arg1,arg2):
|
||||
delta=2654435769
|
||||
l3=16;
|
||||
l4=getkey(arg2) #1896220160
|
||||
l8=[i for i in arg1]
|
||||
l10=l4;
|
||||
l9=[i for i in arg2]
|
||||
l5=lot(l10,8)|rot(l10,24)#101056625
|
||||
# assert l5==101056625
|
||||
l6=lot(l10,16)|rot(l10,16)#100692230
|
||||
# assert 100692230==l6
|
||||
l7=lot(l10,24)|rot(l10,8)
|
||||
# assert 7407110==l7
|
||||
l11=""
|
||||
l12=0
|
||||
l13=ord(l8[l12])<<0
|
||||
l14=ord(l8[l12+1])<<8
|
||||
l15=ord(l8[l12+2])<<16
|
||||
l16=ord(l8[l12+3])<<24
|
||||
l17=ord(l8[l12+4])<<0
|
||||
l18=ord(l8[l12+5])<<8
|
||||
l19=ord(l8[l12+6])<<16
|
||||
l20=ord(l8[l12+7])<<24
|
||||
mid_p = p1 ^ p2 ^ p3
|
||||
d0 = (d0 + mid_p) & 0xffffffff
|
||||
|
||||
l21=(((0|l13)|l14)|l15)|l16
|
||||
l22=(((0|l17)|l18)|l19)|l20
|
||||
p4 = (lshift(d0, 4) + key_seg[2]) & 0xffffffff
|
||||
p5 = (d0 + sum_) & 0xffffffff
|
||||
p6 = (rshift(d0, 5) + key_seg[3]) & 0xffffffff
|
||||
|
||||
l23=0
|
||||
l24=0
|
||||
while l24<32:
|
||||
l23=(l23+delta)%(2**32)
|
||||
l33=(lot(l22,4)+l4)%(2**32)
|
||||
l34=(l22+l23)%(2**32)
|
||||
l35=(rot(l22,5)+l5)%(2**32)
|
||||
l36=(l33^l34)^l35
|
||||
l21=(l21+l36)%(2**32)
|
||||
l37=(lot(l21,4)+l6)%(2**32)
|
||||
l38=(l21+l23)%(2**32)
|
||||
l39=(rot(l21,5))%(2**32)
|
||||
l40=(l39+l7)%(2**32)
|
||||
l41=((l37^l38)%(2**32)^l40)%(2**32)
|
||||
l22=(l22+l41)%(2**32)
|
||||
mid_p = p4 ^ p5 ^ p6
|
||||
d1 = (d1 + mid_p) & 0xffffffff
|
||||
|
||||
l24+=1
|
||||
return bytes(unpack_le32(d0) + unpack_le32(d1))
|
||||
|
||||
l11+=chr(rot(l21,0)&0xff)
|
||||
l11+=chr(rot(l21,8)&0xff)
|
||||
l11+=chr(rot(l21,16)&0xff)
|
||||
l11+=chr(rot(l21,24)&0xff)
|
||||
l11+=chr(rot(l22,0)&0xff)
|
||||
l11+=chr(rot(l22,8)&0xff)
|
||||
l11+=chr(rot(l22,16)&0xff)
|
||||
l11+=chr(rot(l22,24)&0xff)
|
||||
def ran_hex(size):
|
||||
result = []
|
||||
for i in range(size):
|
||||
result.append(hex(int(15 * random.random()))[2:])
|
||||
return ''.join(result)
|
||||
|
||||
return l11
|
||||
def zpad(b_str, size):
|
||||
size_diff = size - len(b_str)
|
||||
return b_str + bytes(size_diff)
|
||||
|
||||
def gen_key(t):
|
||||
key_seg = [1896220160,101056625, 100692230, 7407110]
|
||||
t_s = hex(int(t))[2:].encode('utf8')
|
||||
input_data = zpad(t_s, 16)
|
||||
out = tea_core(input_data, key_seg)
|
||||
return binascii.hexlify(out[:8]).decode('utf8') + ran_hex(16)
|
||||
|
||||
def unpack_le32(i32):
|
||||
result = []
|
||||
result.append(i32 & 0xff)
|
||||
i32 = rshift(i32, 8)
|
||||
result.append(i32 & 0xff)
|
||||
i32 = rshift(i32, 8)
|
||||
result.append(i32 & 0xff)
|
||||
i32 = rshift(i32, 8)
|
||||
result.append(i32 & 0xff)
|
||||
return result
|
||||
|
||||
def get_elem(elem, tag):
|
||||
return elem.getElementsByTagName(tag)
|
||||
|
||||
def get_attr(elem, attr):
|
||||
return elem.getAttribute(attr)
|
||||
|
||||
def get_text(elem):
|
||||
return elem.firstChild.nodeValue
|
||||
|
||||
def shift_time(time_str):
|
||||
ts = time_str[:-4]
|
||||
return time.mktime(time.strptime(ts)) - 60
|
||||
|
||||
def parse_pptv_xml(dom):
|
||||
channel = get_elem(dom, 'channel')[0]
|
||||
title = get_attr(channel, 'nm')
|
||||
file_list = get_elem(channel, 'file')[0]
|
||||
item_list = get_elem(file_list, 'item')
|
||||
streams_cnt = len(item_list)
|
||||
item_mlist = []
|
||||
for item in item_list:
|
||||
rid = get_attr(item, 'rid')
|
||||
file_type = get_attr(item, 'ft')
|
||||
size = get_attr(item, 'filesize')
|
||||
width = get_attr(item, 'width')
|
||||
height = get_attr(item, 'height')
|
||||
bitrate = get_attr(item, 'bitrate')
|
||||
res = '{}x{}@{}kbps'.format(width, height, bitrate)
|
||||
item_meta = (file_type, rid, size, res)
|
||||
item_mlist.append(item_meta)
|
||||
|
||||
dt_list = get_elem(dom, 'dt')
|
||||
dragdata_list = get_elem(dom, 'dragdata')
|
||||
|
||||
stream_mlist = []
|
||||
for dt in dt_list:
|
||||
file_type = get_attr(dt, 'ft')
|
||||
serv_time = get_text(get_elem(dt, 'st')[0])
|
||||
expr_time = get_text(get_elem(dt, 'key')[0])
|
||||
serv_addr = get_text(get_elem(dt, 'sh')[0])
|
||||
stream_meta = (file_type, serv_addr, expr_time, serv_time)
|
||||
stream_mlist.append(stream_meta)
|
||||
|
||||
segs_mlist = []
|
||||
for dd in dragdata_list:
|
||||
file_type = get_attr(dd, 'ft')
|
||||
seg_list = get_elem(dd, 'sgm')
|
||||
segs = []
|
||||
segs_size = []
|
||||
for seg in seg_list:
|
||||
rid = get_attr(seg, 'rid')
|
||||
size = get_attr(seg, 'fs')
|
||||
segs.append(rid)
|
||||
segs_size.append(size)
|
||||
segs_meta = (file_type, segs, segs_size)
|
||||
segs_mlist.append(segs_meta)
|
||||
return title, item_mlist, stream_mlist, segs_mlist
|
||||
|
||||
# merges the 3 metadata lists
|
||||
def merge_meta(item_mlist, stream_mlist, segs_mlist):
|
||||
streams = {}
|
||||
for i in range(len(segs_mlist)):
|
||||
streams[str(i)] = {}
|
||||
|
||||
for item in item_mlist:
|
||||
stream = streams[item[0]]
|
||||
stream['rid'] = item[1]
|
||||
stream['size'] = item[2]
|
||||
stream['res'] = item[3]
|
||||
|
||||
for s in stream_mlist:
|
||||
stream = streams[s[0]]
|
||||
stream['serv_addr'] = s[1]
|
||||
stream['expr_time'] = s[2]
|
||||
stream['serv_time'] = s[3]
|
||||
|
||||
for seg in segs_mlist:
|
||||
stream = streams[seg[0]]
|
||||
stream['segs'] = seg[1]
|
||||
stream['segs_size'] = seg[2]
|
||||
|
||||
return streams
|
||||
|
||||
|
||||
loc1=hex(int(arg))[2:]+(16-len(hex(int(arg))[2:]))*"\x00"
|
||||
SERVER_KEY="qqqqqww"+"\x00"*9
|
||||
res=encrypt(loc1,SERVER_KEY)
|
||||
return str2hex(res)
|
||||
def make_url(stream):
|
||||
host = stream['serv_addr']
|
||||
rid = stream['rid']
|
||||
key = gen_key(shift_time(stream['serv_time']))
|
||||
key_expr = stream['expr_time']
|
||||
|
||||
src = []
|
||||
for i, seg in enumerate(stream['segs']):
|
||||
url = 'http://{}/{}/{}?key={}&k={}'.format(host, i, rid, key, key_expr)
|
||||
url += '&type=web.fpp'
|
||||
src.append(url)
|
||||
return src
|
||||
|
||||
def pptv_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
|
||||
xml = get_html('http://web-play.pptv.com/webplay3-0-%s.xml?type=web.fpp' % id)
|
||||
#vt=3 means vod mode vt=5 means live mode
|
||||
host = r1(r'<sh>([^<>]+)</sh>', xml)
|
||||
k = r1(r'<key expire=[^<>]+>([^<>]+)</key>', xml)
|
||||
rid = r1(r'rid="([^"]+)"', xml)
|
||||
title = r1(r'nm="([^"]+)"', xml)
|
||||
class PPTV(VideoExtractor):
|
||||
name = 'PPTV'
|
||||
stream_types = [
|
||||
{'itag': '4'},
|
||||
{'itag': '3'},
|
||||
{'itag': '2'},
|
||||
{'itag': '1'},
|
||||
{'itag': '0'},
|
||||
]
|
||||
|
||||
st=r1(r'<st>([^<>]+)</st>',xml)[:-4]
|
||||
st=time.mktime(time.strptime(st))*1000-60*1000-time.time()*1000
|
||||
st+=time.time()*1000
|
||||
st=st/1000
|
||||
def prepare(self, **kwargs):
|
||||
headers = {
|
||||
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/69.0.3497.100 Safari/537.36"
|
||||
}
|
||||
self.vid = match1(self.url, r'https?://sports.pptv.com/vod/(\d+)/*')
|
||||
if self.url and not self.vid:
|
||||
if not re.match(r'https?://v.pptv.com/show/(\w+)\.html', self.url):
|
||||
raise ValueError('Unknown url pattern')
|
||||
page_content = get_content(self.url, headers)
|
||||
|
||||
key=constructKey(st)
|
||||
self.vid = match1(page_content, r'webcfg\s*=\s*{"id":\s*(\d+)')
|
||||
if not self.vid:
|
||||
request = urllib.request.Request(self.url, headers=headers)
|
||||
response = urllib.request.urlopen(request)
|
||||
self.vid = match1(response.url, r'https?://sports.pptv.com/vod/(\d+)/*')
|
||||
|
||||
pieces = re.findall('<sgm no="(\d+)"[^<>]+fs="(\d+)"', xml)
|
||||
numbers, fs = zip(*pieces)
|
||||
urls=["http://{}/{}/{}?key={}&fpp.ver=1.3.0.4&k={}&type=web.fpp".format(host,i,rid,key,k) for i in range(max(map(int,numbers))+1)]
|
||||
if not self.vid:
|
||||
raise ValueError('Cannot find id')
|
||||
api_url = 'http://web-play.pptv.com/webplay3-0-{}.xml'.format(self.vid)
|
||||
api_url += '?type=web.fpp&param=type=web.fpp&version=4'
|
||||
dom = parseString(get_content(api_url, headers))
|
||||
self.title, m_items, m_streams, m_segs = parse_pptv_xml(dom)
|
||||
xml_streams = merge_meta(m_items, m_streams, m_segs)
|
||||
for stream_id in xml_streams:
|
||||
stream_data = xml_streams[stream_id]
|
||||
src = make_url(stream_data)
|
||||
self.streams[stream_id] = {
|
||||
'container': 'mp4',
|
||||
'video_profile': stream_data['res'],
|
||||
'size': int(stream_data['size']),
|
||||
'src': src
|
||||
}
|
||||
|
||||
total_size = sum(map(int, fs))
|
||||
assert rid.endswith('.mp4')
|
||||
print_info(site_info, title, 'mp4', total_size)
|
||||
|
||||
if not info_only:
|
||||
try:
|
||||
download_urls(urls, title, 'mp4', total_size, output_dir = output_dir, merge = merge)
|
||||
except urllib.error.HTTPError:
|
||||
#for key expired
|
||||
pptv_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
def pptv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
assert re.match(r'http://v.pptv.com/show/(\w+)\.html$', url)
|
||||
html = get_html(url)
|
||||
id = r1(r'webcfg\s*=\s*{"id":\s*(\d+)', html)
|
||||
assert id
|
||||
pptv_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
site_info = "PPTV.com"
|
||||
download = pptv_download
|
||||
site = PPTV()
|
||||
#site_info = "PPTV.com"
|
||||
#download = pptv_download
|
||||
download = site.download_by_url
|
||||
download_playlist = playlist_not_supported('pptv')
|
||||
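Note on `le32_pack` / `unpack_le32`: both helpers implement little-endian 32-bit packing by hand. A minimal sketch, assuming only the standard library, that cross-checks the same bit operations against `struct` with the `'<I'` format; the helpers are re-declared here in compact form for illustration and are not the code from the diff:

```python
import struct

def le32_pack(b_str):
    # Combine four bytes into one unsigned 32-bit integer, least significant byte first.
    return b_str[0] | (b_str[1] << 8) | (b_str[2] << 16) | (b_str[3] << 24)

def unpack_le32(i32):
    # Split an unsigned 32-bit integer into four byte values, least significant first.
    return [(i32 >> shift) & 0xff for shift in (0, 8, 16, 24)]

sample = bytes([0x78, 0x56, 0x34, 0x12])
value = le32_pack(sample)
round_trip = bytes(unpack_le32(value))
```

For non-negative inputs below 2**32 the hand-rolled versions and `struct.pack('<I', ...)` / `struct.unpack('<I', ...)` agree, so either could be used inside `tea_core`.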
|
@ -1,40 +0,0 @@
#!/usr/bin/env python

__all__ = ['qianmo_download']

from ..common import *
import urllib.error
import json

def qianmo_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
    if re.match(r'http://qianmo.com/\w+', url):
        html = get_html(url)
        match = re.search(r'(.+?)var video =(.+?);', html)

        if match:
            video_info_json = json.loads(match.group(2))
            title = video_info_json['title']
            ext_video_id = video_info_json['ext_video_id']

            html = get_content('http://v.qianmo.com/player/{ext_video_id}'.format(ext_video_id = ext_video_id))
            c = json.loads(html)
            url_list = []
            for i in c['seg']: #Cannot do list comprehensions
                for a in c['seg'][i]:
                    for b in a['url']:
                        url_list.append(b[0])

            type_ = ''
            size = 0
            for url in url_list:
                _, type_, temp = url_info(url)
                size += temp

            type, ext, size = url_info(url)
            print_info(site_info, title, type_, size)
            if not info_only:
                download_urls(url_list, title, type_, total_size=None, output_dir=output_dir, merge=merge)

site_info = "qianmo"
download = qianmo_download
download_playlist = playlist_not_supported('qianmo')
|
@ -3,6 +3,7 @@
|
||||
|
||||
from ..common import *
|
||||
from ..extractor import VideoExtractor
|
||||
from ..util.log import *
|
||||
|
||||
from json import loads
|
||||
|
||||
@ -19,13 +20,32 @@ class QiE(VideoExtractor):
|
||||
id_dic = {i['video_profile']:(i['id']) for i in stream_types}
|
||||
|
||||
api_endpoint = 'http://www.qie.tv/api/v1/room/{room_id}'
|
||||
game_ep = 'http://live.qq.com/game/game_details/get_game_details_info/'
|
||||
|
||||
@staticmethod
|
||||
def get_vid_from_url(url):
|
||||
def get_room_id_from_url(self, match_id):
|
||||
meta = json.loads(get_content(self.game_ep + str(match_id)))
|
||||
if meta['error'] != 0:
|
||||
log.wtf('Error happens when accessing game_details api')
|
||||
rooms = meta['data']['anchor_data']
|
||||
for room in rooms:
|
||||
if room['is_use_room']:
|
||||
return room['room_id']
|
||||
log.wtf('No room available for match {}'.format(match_id))
|
||||
|
||||
def get_vid_from_url(self, url):
|
||||
"""Extracts video ID from live.qq.com.
|
||||
"""
|
||||
hit = re.search(r'live.qq.com/(\d+)', url)
|
||||
if hit is not None:
|
||||
return hit.group(1)
|
||||
hit = re.search(r'live.qq.com/directory/match/(\d+)', url)
|
||||
if hit is not None:
|
||||
return self.get_room_id_from_url(hit.group(1))
|
||||
html = get_content(url)
|
||||
return match1(html, r'room_id\":(\d+)')
|
||||
room_id = match1(html, r'room_id\":(\d+)')
|
||||
if room_id is None:
|
||||
log.wtf('Unknown page {}'.format(url))
|
||||
return room_id
|
||||
|
||||
def download_playlist_by_url(self, url, **kwargs):
|
||||
pass
|
||||
@ -38,7 +58,7 @@ class QiE(VideoExtractor):
|
||||
content = loads(content)
|
||||
self.title = content['data']['room_name']
|
||||
rtmp_url = content['data']['rtmp_url']
|
||||
#stream_avalable = [i['name'] for i in content['data']['stream']]
|
||||
#stream_available = [i['name'] for i in content['data']['stream']]
|
||||
stream_available = {}
|
||||
stream_available['normal'] = rtmp_url + '/' + content['data']['rtmp_live']
|
||||
if len(content['data']['rtmp_multi_bitrate']) > 0:
|
||||
|
77
src/you_get/extractors/qie_video.py
Normal file
@ -0,0 +1,77 @@
from ..common import *
from ..extractor import VideoExtractor
from ..util.log import *

import json
import math

class QieVideo(VideoExtractor):
    name = 'QiE Video'
    vid_patt = r'"stream_name":"(\d+)"'
    title_patt = r'"title":"([^\"]+)"'
    cdn = 'http://qietv-play.wcs.8686c.com/'
    ep = 'http://api.qiecdn.com/api/v1/video/stream/{}'
    stream_types = [
        {'id':'1080p', 'video_profile':'1920x1080', 'container':'m3u8'},
        {'id':'720p', 'video_profile':'1280x720', 'container':'m3u8'},
        {'id':'480p', 'video_profile':'853x480', 'container':'m3u8'}
    ]

    def get_vid_from_url(self):
        hit = re.search(self.__class__.vid_patt, self.page)
        if hit is None:
            log.wtf('Cannot get stream_id')
        return hit.group(1)

    def get_title(self):
        hit = re.search(self.__class__.title_patt, self.page)
        if hit is None:
            return self.vid
        return hit.group(1).strip()

    def prepare(self, **kwargs):
        self.page = get_content(self.url)
        if self.vid is None:
            self.vid = self.get_vid_from_url()
        self.title = self.get_title()
        meta = json.loads(get_content(self.__class__.ep.format(self.vid)))
        if meta['code'] != 200:
            log.wtf(meta['message'])
        for video in meta['result']['videos']:
            height = video['height']
            url = self.__class__.cdn + video['key']
            stream_meta = dict(m3u8_url=url, size=0, container='m3u8')
            video_profile = '{}x{}'.format(video['width'], video['height'])
            stream_meta['video_profile'] = video_profile
            for stream_type in self.__class__.stream_types:
                if height // 10 == int(stream_type['id'][:-1]) // 10:
                    # heights 480 through 489 all count as 480p here
                    stream_id = stream_type['id']
                    self.streams[stream_id] = stream_meta

    def extract(self, **kwargs):
        for stream_id in self.streams:
            self.streams[stream_id]['src'], dur = general_m3u8_extractor(self.streams[stream_id]['m3u8_url'])
            self.streams[stream_id]['video_profile'] += ', Duration: {}s'.format(math.floor(dur))

def general_m3u8_extractor(url):
    dur = 0
    base_url = url[:url.rfind('/')]
    m3u8_content = get_content(url).split('\n')
    result = []
    for line in m3u8_content:
        trimmed = line.strip()
        if len(trimmed) > 0:
            if trimmed.startswith('#'):
                if trimmed.startswith('#EXTINF'):
                    t_str = re.search(r'(\d+\.\d+)', trimmed).group(1)
                    dur += float(t_str)
            else:
                if trimmed.startswith('http'):
                    result.append(trimmed)
                else:
                    result.append(base_url + '/' + trimmed)
    return result, dur

site = QieVideo()
download_by_url = site.download_by_url
50
src/you_get/extractors/qingting.py
Normal file
@ -0,0 +1,50 @@
import json
import re

from ..common import get_content, playlist_not_supported, url_size
from ..extractors import VideoExtractor
from ..util import log

__all__ = ['qingting_download_by_url']


class Qingting(VideoExtractor):
    # every resource is described by its channel id and program id
    # so vid is a tuple (channel_id, program_id)

    name = 'Qingting'
    stream_types = [
        {'id': '_default'}
    ]

    ep = 'http://i.qingting.fm/wapi/channels/{}/programs/{}'
    file_host = 'http://od.qingting.fm/{}'
    mobile_pt = r'channels\/(\d+)\/programs/(\d+)'

    def prepare(self, **kwargs):
        if self.vid is None:
            hit = re.search(self.__class__.mobile_pt, self.url)
            self.vid = (hit.group(1), hit.group(2))

        ep_url = self.__class__.ep.format(self.vid[0], self.vid[1])
        meta = json.loads(get_content(ep_url))

        if meta['code'] != 0:
            log.wtf(meta['message']['errormsg'])

        file_path = self.__class__.file_host.format(meta['data']['file_path'])
        self.title = meta['data']['name']
        duration = str(meta['data']['duration']) + 's'

        self.streams['_default'] = {'src': [file_path], 'video_profile': duration, 'container': 'm4a'}

    def extract(self, **kwargs):
        self.streams['_default']['size'] = url_size(self.streams['_default']['src'][0])


def qingting_download_by_url(url, **kwargs):
    Qingting().download_by_url(url, **kwargs)

site_info = 'Qingting'
download = qingting_download_by_url
download_playlist = playlist_not_supported('Qingting')
|
@ -2,89 +2,151 @@
|
||||
|
||||
__all__ = ['qq_download']
|
||||
|
||||
from ..common import *
|
||||
from .qie import download as qieDownload
|
||||
from urllib.parse import urlparse,parse_qs
|
||||
from .qie_video import download_by_url as qie_video_download
|
||||
from ..common import *
|
||||
|
||||
headers = {
|
||||
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) QQLive/10275340/50192209 Chrome/43.0.2357.134 Safari/537.36 QBCore/3.43.561.202 QQBrowser/9.0.2524.400'
|
||||
}
|
||||
|
||||
|
||||
def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
|
||||
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3%2E2%2E19%2E333&platform=11&defnpayver=1&vid=' + vid
|
||||
info = get_html(info_api)
|
||||
|
||||
# http://v.sports.qq.com/#/cover/t0fqsm1y83r8v5j/a0026nvw5jr https://v.qq.com/x/cover/t0fqsm1y83r8v5j/a0026nvw5jr.html
|
||||
video_json = None
|
||||
platforms = [4100201, 11]
|
||||
for platform in platforms:
|
||||
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3.2.19.333&platform={}&defnpayver=1&defn=shd&vid={}'.format(platform, vid)
|
||||
info = get_content(info_api, headers)
|
||||
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
|
||||
parts_vid = video_json['vl']['vi'][0]['vid']
|
||||
parts_ti = video_json['vl']['vi'][0]['ti']
|
||||
parts_prefix = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
|
||||
parts_formats = video_json['fl']['fi']
|
||||
# find best quality
|
||||
# only looking for fhd(1080p) and shd(720p) here.
|
||||
# 480p usually come with a single file, will be downloaded as fallback.
|
||||
best_quality = ''
|
||||
for part_format in parts_formats:
|
||||
if part_format['name'] == 'fhd':
|
||||
best_quality = 'fhd'
|
||||
if not video_json.get('msg')=='cannot play outside':
|
||||
break
|
||||
fn_pre = video_json['vl']['vi'][0]['lnk']
|
||||
title = video_json['vl']['vi'][0]['ti']
|
||||
host = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
|
||||
seg_cnt = fc_cnt = video_json['vl']['vi'][0]['cl']['fc']
|
||||
|
||||
if part_format['name'] == 'shd':
|
||||
best_quality = 'shd'
|
||||
filename = video_json['vl']['vi'][0]['fn']
|
||||
if seg_cnt == 0:
|
||||
seg_cnt = 1
|
||||
else:
|
||||
fn_pre, magic_str, video_type = filename.split('.')
|
||||
|
||||
for part_format in parts_formats:
|
||||
if (not best_quality == '') and (not part_format['name'] == best_quality):
|
||||
continue
|
||||
part_format_id = part_format['id']
|
||||
part_format_sl = part_format['sl']
|
||||
if part_format_sl == 0:
|
||||
part_urls= []
|
||||
total_size = 0
|
||||
try:
|
||||
# For fhd(1080p), every part is about 100M and 6 minutes
|
||||
# try 100 parts here limited download longest single video of 10 hours.
|
||||
for part in range(1,100):
|
||||
filename = vid + '.p' + str(part_format_id % 1000) + '.' + str(part) + '.mp4'
|
||||
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format=%s&vid=%s&filename=%s" % (part_format_id, parts_vid, filename)
|
||||
#print(filename)
|
||||
#print(key_api)
|
||||
part_info = get_html(key_api)
|
||||
for part in range(1, seg_cnt+1):
|
||||
if fc_cnt == 0:
|
||||
# fix json parsing error
|
||||
# example:https://v.qq.com/x/page/w0674l9yrrh.html
|
||||
part_format_id = video_json['vl']['vi'][0]['cl']['keyid'].split('.')[-1]
|
||||
else:
|
||||
part_format_id = video_json['vl']['vi'][0]['cl']['ci'][part - 1]['keyid'].split('.')[1]
|
||||
filename = '.'.join([fn_pre, magic_str, str(part), video_type])
|
||||
|
||||
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format={}&vid={}&filename={}&appver=3.2.19.333".format(part_format_id, vid, filename)
|
||||
part_info = get_content(key_api, headers)
|
||||
key_json = json.loads(match1(part_info, r'QZOutputJson=(.*)')[:-1])
|
||||
#print(key_json)
|
||||
if key_json.get('key') is None:
|
||||
vkey = video_json['vl']['vi'][0]['fvkey']
|
||||
url = '{}{}?vkey={}'.format(video_json['vl']['vi'][0]['ul']['ui'][0]['url'], fn_pre + '.mp4', vkey)
|
||||
else:
|
||||
vkey = key_json['key']
|
||||
url = '%s/%s?vkey=%s' % (parts_prefix, filename, vkey)
|
||||
url = '{}{}?vkey={}'.format(host, filename, vkey)
|
||||
if not vkey:
|
||||
if part == 1:
|
||||
log.wtf(key_json['msg'])
|
||||
else:
|
||||
log.w(key_json['msg'])
|
||||
break
|
||||
if key_json.get('filename') is None:
|
||||
log.w(key_json['msg'])
|
||||
break
|
||||
|
||||
part_urls.append(url)
|
||||
_, ext, size = url_info(url, faker=True)
|
||||
_, ext, size = url_info(url)
|
||||
total_size += size
|
||||
except:
|
||||
pass
|
||||
print_info(site_info, parts_ti, ext, total_size)
|
||||
if not info_only:
|
||||
download_urls(part_urls, parts_ti, ext, total_size, output_dir=output_dir, merge=merge)
|
||||
else:
|
||||
fvkey = output_json['vl']['vi'][0]['fvkey']
|
||||
mp4 = output_json['vl']['vi'][0]['cl'].get('ci', None)
|
||||
if mp4:
|
||||
mp4 = mp4[0]['keyid'].replace('.10', '.p') + '.mp4'
|
||||
else:
|
||||
mp4 = output_json['vl']['vi'][0]['fn']
|
||||
url = '%s/%s?vkey=%s' % ( parts_prefix, mp4, fvkey )
|
||||
_, ext, size = url_info(url, faker=True)
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
print_info(site_info, title, ext, total_size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir=output_dir, merge=merge)
|
||||
download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge)
|
||||
|
||||
def kg_qq_download_by_shareid(shareid, output_dir='.', info_only=False, caption=False):
|
||||
BASE_URL = 'http://cgi.kg.qq.com/fcgi-bin/kg_ugc_getdetail'
|
||||
params_str = '?dataType=jsonp&jsonp=callback&jsonpCallback=jsopgetsonginfo&v=4&outCharset=utf-8&shareid=' + shareid
|
||||
url = BASE_URL + params_str
|
||||
content = get_content(url, headers)
|
||||
json_str = content[len('jsonpcallback('):-1]
|
||||
json_data = json.loads(json_str)
|
||||
|
||||
playurl = json_data['data']['playurl']
|
||||
videourl = json_data['data']['playurl_video']
|
||||
real_url = playurl if playurl else videourl
|
||||
real_url = real_url.replace('\/', '/')
|
||||
|
||||
ksong_mid = json_data['data']['ksong_mid']
|
||||
lyric_url = 'http://cgi.kg.qq.com/fcgi-bin/fcg_lyric?jsonpCallback=jsopgetlrcdata&outCharset=utf-8&ksongmid=' + ksong_mid
|
||||
lyric_data = get_content(lyric_url)
|
||||
lyric_string = lyric_data[len('jsopgetlrcdata('):-1]
|
||||
lyric_json = json.loads(lyric_string)
|
||||
lyric = lyric_json['data']['lyric']
|
||||
|
||||
title = match1(lyric, r'\[ti:([^\]]*)\]')
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
if not title:
|
||||
title = shareid
|
||||
|
||||
print_info('腾讯全民K歌', title, type, size)
|
||||
if not info_only:
|
||||
download_urls([real_url], title, ext, size, output_dir, merge=False)
|
||||
if caption:
|
||||
caption_filename = title + '.lrc'
|
||||
caption_path = output_dir + '/' + caption_filename
|
||||
with open(caption_path, 'w') as f:
|
||||
lrc_list = lyric.split('\r\n')
|
||||
for line in lrc_list:
|
||||
f.write(line)
|
||||
f.write('\n')
|
||||
|
||||
def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
""""""
|
||||
if 'live.qq.com' in url:
|
||||
qieDownload(url,output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
if re.match(r'https?://(m\.)?egame.qq.com/', url):
|
||||
from . import qq_egame
|
||||
qq_egame.qq_egame_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
#do redirect
|
||||
if 'v.qq.com/page' in url:
|
||||
# for URLs like this:
|
||||
# http://v.qq.com/page/k/9/7/k0194pwgw97.html
|
||||
content = get_html(url)
|
||||
url = match1(content,r'window\.location\.href="(.*?)"')
|
||||
if 'kg.qq.com' in url or 'kg2.qq.com' in url:
|
||||
shareid = url.split('?s=')[-1]
|
||||
caption = kwargs['caption']
|
||||
kg_qq_download_by_shareid(shareid, output_dir=output_dir, info_only=info_only, caption=caption)
|
||||
return
|
||||
|
||||
if 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
|
||||
content = get_html(url)
|
||||
if 'live.qq.com' in url:
|
||||
if 'live.qq.com/video/v' in url:
|
||||
qie_video_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
else:
|
||||
qieDownload(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
return
|
||||
|
||||
if 'mp.weixin.qq.com/s' in url:
|
||||
content = get_content(url, headers)
|
||||
vids = matchall(content, [r'[?;]vid=(\w+)'])
|
||||
for vid in vids:
|
||||
qq_download_by_vid(vid, vid, output_dir, merge, info_only)
|
||||
return
|
||||
|
||||
if 'kuaibao.qq.com/s/' in url:
|
||||
# https://kuaibao.qq.com/s/20180521V0Z9MH00
|
||||
nid = match1(url, r'/s/([^/&?#]+)')
|
||||
content = get_content('https://kuaibao.qq.com/getVideoRelate?id=' + nid)
|
||||
info_json = json.loads(content)
|
||||
vid=info_json['videoinfo']['vid']
|
||||
title=info_json['videoinfo']['title']
|
||||
elif 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
|
||||
# http://daxue.qq.com/content/content/id/2321
|
||||
content = get_content(url, headers)
|
||||
vid = match1(content, r'vid\s*=\s*"\s*([^"]+)"')
|
||||
title = match1(content, r'title">([^"]+)</p>')
|
||||
title = title.strip() if title else vid
|
||||
@ -92,17 +154,31 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
vid = match1(url, r'\bvid=(\w+)')
|
||||
# for embedded URLs; don't know what the title is
|
||||
title = vid
|
||||
elif 'view.inews.qq.com' in url:
|
||||
# view.inews.qq.com/a/20180521V0Z9MH00
|
||||
content = get_content(url, headers)
|
||||
vid = match1(content, r'"vid":"(\w+)"')
|
||||
title = match1(content, r'"title":"(\w+)"')
|
||||
else:
|
||||
content = get_html(url)
|
||||
vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
|
||||
vid = vid[0] if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
|
||||
content = get_content(url, headers)
|
||||
#vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
|
||||
rurl = match1(content, r'<link.*?rel\s*=\s*"canonical".*?href\s*="(.+?)".*?>') #https://v.qq.com/x/cover/9hpjiv5fhiyn86u/t0522x58xma.html
|
||||
vid = ""
|
||||
if rurl:
|
||||
vid = rurl.split('/')[-1].split('.')[0]
|
||||
# https://v.qq.com/x/page/d0552xbadkl.html https://y.qq.com/n/yqq/mv/v/g00268vlkzy.html
|
||||
if vid == "undefined" or vid == "index":
|
||||
vid = ""
|
||||
vid = vid if vid else url.split('/')[-1].split('.')[0] #https://v.qq.com/x/cover/ps6mnfqyrfo7es3/q0181hpdvo5.html?
|
||||
vid = vid if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
|
||||
if not vid:
|
||||
vid = match1(content, r'id"*\s*:\s*"(.+?)"')
|
||||
title = match1(content,r'<a.*?id\s*=\s*"%s".*?title\s*=\s*"(.+?)".*?>'%vid)
|
||||
title = match1(content, r'title">([^"]+)</p>') if not title else title
|
||||
title = match1(content, r'"title":"([^"]+)"') if not title else title
|
||||
title = vid if not title else title #general fallback
|
||||
|
||||
|
||||
|
||||
qq_download_by_vid(vid, title, output_dir, merge, info_only)
|
||||
|
||||
site_info = "QQ.com"
|
||||
|
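Note on the JSONP handling in the hunks above: the QQ code unwraps responses by fixed-length slicing (`content[len('jsonpcallback('):-1]`) or by a prefix match followed by `[:-1]` (`QZOutputJson=(.*)`). The following is a minimal sketch of a more tolerant unwrap, assuming the response embeds exactly one JSON object or array. `unwrap_jsonp` is a hypothetical helper for illustration and is not part of you-get.

```python
import json
import re


def unwrap_jsonp(text):
    """Return the JSON payload from 'callback({...})' or 'Name={...};' text.

    Hypothetical helper for illustration; assumes exactly one JSON object
    or array is embedded in the response.
    """
    # Greedy match from the first '{' or '[' to the last '}' or ']'.
    m = re.search(r'[\[{].*[\]}]', text, re.S)
    if m is None:
        raise ValueError('no JSON payload found')
    return json.loads(m.group(0))


if __name__ == '__main__':
    print(unwrap_jsonp('jsopgetlrcdata({"data": {"lyric": "[ti:demo]"}})'))
    print(unwrap_jsonp('QZOutputJson={"key": "abc", "filename": "x.mp4"};'))
```

Matching on the payload rather than on the wrapper length means the code does not need to know the callback name the server actually echoes back, which may or may not be a concern for these endpoints.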
44
src/you_get/extractors/qq_egame.py
Normal file
@ -0,0 +1,44 @@
import re
import json

from ..common import *
from ..extractors import VideoExtractor
from ..util import log
from ..util.strings import unescape_html

__all__ = ['qq_egame_download']


def qq_egame_download(url,
                      output_dir='.',
                      merge=True,
                      info_only=False,
                      **kwargs):
    uid = re.search('\d\d\d+', url)
    an_url = "https://m.egame.qq.com/live?anchorid={}&".format(uid.group(0))
    page = get_content(an_url)
    server_data = re.search(r'window\.serverData\s*=\s*({.+?});', page)
    if server_data is None:
        log.wtf('Can not find window.server_data')
    json_data = json.loads(server_data.group(1))
    if json_data['anchorInfo']['data']['isLive'] == 0:
        log.wtf('Offline...')
    live_info = json_data['liveInfo']['data']
    title = '{}_{}'.format(live_info['profileInfo']['nickName'],
                           live_info['videoInfo']['title'])
    real_url = live_info['videoInfo']['streamInfos'][0]['playUrl']

    print_info(site_info, title, 'flv', float('inf'))
    if not info_only:
        download_url_ffmpeg(
            real_url,
            title,
            'flv',
            params={},
            output_dir=output_dir,
            merge=merge)


site_info = "egame.qq.com"
download = qq_egame_download
download_playlist = playlist_not_supported('qq_egame')
|
@ -3,45 +3,50 @@
|
||||
__all__ = ['sina_download', 'sina_download_by_vid', 'sina_download_by_vkey']
|
||||
|
||||
from ..common import *
|
||||
from ..util.log import *
|
||||
|
||||
from hashlib import md5
|
||||
from random import randint
|
||||
from time import time
|
||||
from xml.dom.minidom import parseString
|
||||
import urllib.parse
|
||||
|
||||
def get_k(vid, rand):
|
||||
t = str(int('{0:b}'.format(int(time()))[:-6], 2))
|
||||
return md5((vid + 'Z6prk18aWxP278cVAH' + t + rand).encode('utf-8')).hexdigest()[:16] + t
|
||||
|
||||
def video_info_xml(vid):
|
||||
def api_req(vid):
|
||||
rand = "0.{0}{1}".format(randint(10000, 10000000), randint(10000, 10000000))
|
||||
url = 'http://ask.ivideo.sina.com.cn/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, get_k(vid, rand))
|
||||
xml = get_content(url, headers=fake_headers, decoded=True)
|
||||
t = str(int('{0:b}'.format(int(time()))[:-6], 2))
|
||||
k = md5((vid + 'Z6prk18aWxP278cVAH' + t + rand).encode('utf-8')).hexdigest()[:16] + t
|
||||
url = 'http://ask.ivideo.sina.com.cn/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, k)
|
||||
xml = get_content(url, headers=fake_headers)
|
||||
return xml
|
||||
|
||||
def video_info(xml):
|
||||
urls = re.findall(r'<url>(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?</url>', xml)
|
||||
name = match1(xml, r'<vname>(?:<!\[CDATA\[)?(.+?)(?:\]\]>)?</vname>')
|
||||
vstr = match1(xml, r'<vstr>(?:<!\[CDATA\[)?(.+?)(?:\]\]>)?</vstr>')
|
||||
return urls, name, vstr
|
||||
video = parseString(xml).getElementsByTagName('video')[0]
|
||||
result = video.getElementsByTagName('result')[0]
|
||||
if result.firstChild.nodeValue == 'error':
|
||||
message = video.getElementsByTagName('message')[0]
|
||||
return None, message.firstChild.nodeValue, None
|
||||
vname = video.getElementsByTagName('vname')[0].firstChild.nodeValue
|
||||
durls = video.getElementsByTagName('durl')
|
||||
|
||||
urls = []
|
||||
size = 0
|
||||
for durl in durls:
|
||||
url = durl.getElementsByTagName('url')[0].firstChild.nodeValue
|
||||
seg_size = durl.getElementsByTagName('filesize')[0].firstChild.nodeValue
|
||||
urls.append(url)
|
||||
size += int(seg_size)
|
||||
|
||||
return urls, vname, size
|
||||
|
||||
def sina_download_by_vid(vid, title=None, output_dir='.', merge=True, info_only=False):
|
||||
"""Downloads a Sina video by its unique vid.
|
||||
http://video.sina.com.cn/
|
||||
"""
|
||||
|
||||
xml = video_info_xml(vid)
|
||||
sina_download_by_xml(xml, title, output_dir, merge, info_only)
|
||||
|
||||
|
||||
def sina_download_by_xml(xml, title, output_dir, merge, info_only):
|
||||
urls, name, vstr = video_info(xml)
|
||||
title = title or name
|
||||
assert title
|
||||
size = 0
|
||||
for url in urls:
|
||||
_, _, temp = url_info(url)
|
||||
size += temp
|
||||
|
||||
xml = api_req(vid)
|
||||
urls, name, size = video_info(xml)
|
||||
if urls is None:
|
||||
log.wtf(name)
|
||||
title = name
|
||||
print_info(site_info, title, 'flv', size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, 'flv', size, output_dir = output_dir, merge = merge)
|
||||
@ -58,9 +63,40 @@ def sina_download_by_vkey(vkey, title=None, output_dir='.', merge=True, info_onl
|
||||
if not info_only:
|
||||
download_urls([url], title, 'flv', size, output_dir = output_dir, merge = merge)
|
||||
|
||||
def sina_zxt(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ep = 'http://s.video.sina.com.cn/video/play?video_id='
|
||||
frag = urllib.parse.urlparse(url).fragment
|
||||
if not frag:
|
||||
log.wtf('No video specified with fragment')
|
||||
meta = json.loads(get_content(ep + frag))
|
||||
if meta['code'] != 1:
|
||||
# Yes they use 1 for success.
|
||||
log.wtf(meta['message'])
|
||||
title = meta['data']['title']
|
||||
videos = sorted(meta['data']['videos'], key = lambda i: int(i['size']))
|
||||
|
||||
if len(videos) == 0:
|
||||
log.wtf('No video file returned by API server')
|
||||
|
||||
vid = videos[-1]['file_id']
|
||||
container = videos[-1]['type']
|
||||
size = int(videos[-1]['size'])
|
||||
|
||||
if container == 'hlv':
|
||||
container = 'flv'
|
||||
|
||||
urls, _, _ = video_info(api_req(vid))
|
||||
print_info(site_info, title, container, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, container, size, output_dir=output_dir, merge=merge, **kwargs)
|
||||
return
|
||||
|
||||
def sina_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
"""Downloads Sina videos by URL.
|
||||
"""
|
||||
if 'news.sina.com.cn/zxt' in url:
|
||||
sina_zxt(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
vid = match1(url, r'vid=(\d+)')
|
||||
if vid is None:
|
||||
@ -73,10 +109,14 @@ def sina_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
if vid is None:
|
||||
vid = match1(video_page, r'vid:"?(\d+)"?')
|
||||
if vid:
|
||||
title = match1(video_page, r'title\s*:\s*\'([^\']+)\'')
|
||||
sina_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
#title = match1(video_page, r'title\s*:\s*\'([^\']+)\'')
|
||||
sina_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
else:
|
||||
vkey = match1(video_page, r'vkey\s*:\s*"([^"]+)"')
|
||||
if vkey is None:
|
||||
vid = match1(url, r'#(\d+)')
|
||||
sina_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
return
|
||||
title = match1(video_page, r'title\s*:\s*"([^"]+)"')
|
||||
sina_download_by_vkey(vkey, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
|
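Note on the Sina key derivation above: `t` is produced by formatting the Unix time in binary and dropping the last six digits, which amounts to a 64-second time bucket, and the key is the first 16 hex characters of an MD5 digest followed by `t`. The following is a minimal sketch of the same arithmetic with the timestamp passed in so it can be checked offline. `time_bucket`, `make_key` and the `now` parameter are assumptions made for illustration; only the salt string is copied from the diff.

```python
from hashlib import md5

SALT = 'Z6prk18aWxP278cVAH'  # constant copied from the diff above


def time_bucket(now):
    """Drop the low six bits of the timestamp (a 64-second bucket)."""
    return int(now) >> 6


def make_key(vid, rand, now):
    """Hypothetical, testable variant of the key derivation shown above."""
    t = str(time_bucket(now))
    digest = md5((vid + SALT + t + rand).encode('utf-8')).hexdigest()
    return digest[:16] + t


if __name__ == '__main__':
    print(make_key('123456', '0.1234567', 1579786502))
```

For timestamps of at least 64, the right shift gives the same number as the binary-string slicing used in the diff, and it avoids the string round trip.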
@ -15,28 +15,30 @@ Changelog:
|
||||
new api
|
||||
'''
|
||||
|
||||
def real_url(host,vid,tvid,new,clipURL,ck):
|
||||
url = 'http://'+host+'/?prot=9&prod=flash&pt=1&file='+clipURL+'&new='+new +'&key='+ ck+'&vid='+str(vid)+'&uid='+str(int(time.time()*1000))+'&t='+str(random())+'&rb=1'
|
||||
return json.loads(get_html(url))['url']
|
||||
|
||||
def sohu_download(url, output_dir = '.', merge = True, info_only = False, extractor_proxy=None, **kwargs):
|
||||
def real_url(fileName, key, ch):
|
||||
url = "https://data.vod.itc.cn/ip?new=" + fileName + "&num=1&key=" + key + "&ch=" + ch + "&pt=1&pg=2&prod=h5n"
|
||||
return json.loads(get_html(url))['servers'][0]['url']
|
||||
|
||||
|
||||
def sohu_download(url, output_dir='.', merge=True, info_only=False, extractor_proxy=None, **kwargs):
|
||||
if re.match(r'http://share.vrs.sohu.com', url):
|
||||
vid = r1('id=(\d+)', url)
|
||||
else:
|
||||
html = get_html(url)
|
||||
vid = r1(r'\Wvid\s*[\:=]\s*[\'"]?(\d+)[\'"]?', html)
|
||||
vid = r1(r'\Wvid\s*[\:=]\s*[\'"]?(\d+)[\'"]?', html) or r1(r'bid:\'(\d+)\',', html) or r1(r'bid=(\d+)', html)
|
||||
assert vid
|
||||
|
||||
if re.match(r'http://tv.sohu.com/', url):
|
||||
if extractor_proxy:
|
||||
set_proxy(tuple(extractor_proxy.split(":")))
|
||||
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
|
||||
for qtyp in ["oriVid","superVid","highVid" ,"norVid","relativeId"]:
|
||||
if info and info.get("data", ""):
|
||||
for qtyp in ["oriVid", "superVid", "highVid", "norVid", "relativeId"]:
|
||||
if 'data' in info:
|
||||
hqvid = info['data'][qtyp]
|
||||
else:
|
||||
hqvid = info[qtyp]
|
||||
if hqvid != 0 and hqvid != vid :
|
||||
if hqvid != 0 and hqvid != vid:
|
||||
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % hqvid))
|
||||
if not 'allot' in info:
|
||||
continue
|
||||
@ -51,9 +53,8 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
|
||||
title = data['tvName']
|
||||
size = sum(data['clipsBytes'])
|
||||
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
|
||||
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
|
||||
clipURL = urlparse(clip).path
|
||||
urls.append(real_url(host,hqvid,tvid,new,clipURL,ck))
|
||||
for fileName, key in zip(data['su'], data['ck']):
|
||||
urls.append(real_url(fileName, key, data['ch']))
|
||||
# assert data['clipsURL'][0].endswith('.mp4')
|
||||
|
||||
else:
|
||||
@ -64,15 +65,15 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
|
||||
urls = []
|
||||
data = info['data']
|
||||
title = data['tvName']
|
||||
size = sum(map(int,data['clipsBytes']))
|
||||
size = sum(map(int, data['clipsBytes']))
|
||||
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
|
||||
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
|
||||
clipURL = urlparse(clip).path
|
||||
urls.append(real_url(host,vid,tvid,new,clipURL,ck))
|
||||
for fileName, key in zip(data['su'], data['ck']):
|
||||
urls.append(real_url(fileName, key, data['ch']))
|
||||
|
||||
print_info(site_info, title, 'mp4', size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, 'mp4', size, output_dir, refer = url, merge = merge)
|
||||
download_urls(urls, title, 'mp4', size, output_dir, refer=url, merge=merge)
|
||||
|
||||
|
||||
site_info = "Sohu.com"
|
||||
download = sohu_download
|
||||
|
@ -1,31 +1,80 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['soundcloud_download', 'soundcloud_download_by_id']
|
||||
__all__ = ['sndcd_download']
|
||||
|
||||
from ..common import *
|
||||
import re
|
||||
import json
|
||||
import urllib.error
|
||||
|
||||
def soundcloud_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
|
||||
assert title
|
||||
|
||||
#if info["downloadable"]:
|
||||
# url = 'https://api.soundcloud.com/tracks/' + id + '/download?client_id=b45b1aa10f1ac2941910a7f0d10f8e28'
|
||||
url = 'https://api.soundcloud.com/tracks/' + id + '/stream?client_id=02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea'
|
||||
assert url
|
||||
type, ext, size = url_info(url)
|
||||
def get_sndcd_apikey():
|
||||
home_page = get_content('https://soundcloud.com')
|
||||
js_url = re.findall(r'script crossorigin src="(.+?)"></script>', home_page)[-1]
|
||||
|
||||
client_id = get_content(js_url)
|
||||
return re.search(r'client_id:"(.+?)"', client_id).group(1)
|
||||
|
||||
|
||||
def get_resource_info(resource_url, client_id):
|
||||
cont = get_content(resource_url, decoded=True)
|
||||
|
||||
x = re.escape('forEach(function(e){n(e)})}catch(t){}})},')
|
||||
x = re.search(r'' + x + r'(.*)\);</script>', cont)
|
||||
|
||||
info = json.loads(x.group(1))[-1]['data'][0]
|
||||
|
||||
info = info['tracks'] if info.get('track_count') else [info]
|
||||
|
||||
ids = [i['id'] for i in info if i.get('comment_count') is None]
|
||||
ids = list(map(str, ids))
|
||||
ids_split = ['%2C'.join(ids[i:i+10]) for i in range(0, len(ids), 10)]
|
||||
api_url = 'https://api-v2.soundcloud.com/tracks?ids={ids}&client_id={client_id}&%5Bobject%20Object%5D=&app_version=1584348206&app_locale=en'
|
||||
|
||||
res = []
|
||||
for ids in ids_split:
|
||||
uri = api_url.format(ids=ids, client_id=client_id)
|
||||
cont = get_content(uri, decoded=True)
|
||||
res += json.loads(cont)
|
||||
|
||||
res = iter(res)
|
||||
info = [next(res) if i.get('comment_count') is None else i for i in info]
|
||||
|
||||
return info
|
||||
|
||||
|
||||
def sndcd_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
client_id = get_sndcd_apikey()
|
||||
|
||||
r_info = get_resource_info(url, client_id)
|
||||
|
||||
for info in r_info:
|
||||
title = info['title']
|
||||
metadata = info.get('publisher_metadata')
|
||||
|
||||
transcodings = info['media']['transcodings']
|
||||
sq = [i for i in transcodings if i['quality'] == 'sq']
|
||||
hq = [i for i in transcodings if i['quality'] == 'hq']
|
||||
# source url
|
||||
surl = sq[0] if hq == [] else hq[0]
|
||||
surl = surl['url']
|
||||
|
||||
uri = surl + '?client_id=' + client_id
|
||||
r = get_content(uri)
|
||||
surl = json.loads(r)['url']
|
||||
|
||||
m3u8 = get_content(surl)
|
||||
# url list
|
||||
urll = re.findall(r'http.*?(?=\n)', m3u8)
|
||||
|
||||
size = urls_size(urll)
|
||||
print_info(site_info, title, 'audio/mpeg', size)
|
||||
print(end='', flush=True)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge = merge)
|
||||
download_urls(urll, title=title, ext='mp3', total_size=size, output_dir=output_dir, merge=True)
|
||||
|
||||
def soundcloud_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
metadata = get_html('https://api.soundcloud.com/resolve.json?url=' + url + '&client_id=02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea')
|
||||
import json
|
||||
info = json.loads(metadata)
|
||||
title = info["title"]
|
||||
id = str(info["id"])
|
||||
|
||||
soundcloud_download_by_id(id, title, output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
site_info = "SoundCloud.com"
|
||||
download = soundcloud_download
|
||||
download_playlist = playlist_not_supported('soundcloud')
|
||||
download = sndcd_download
|
||||
download_playlist = sndcd_download
|
||||
|
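Note on the SoundCloud track lookup above: `get_resource_info` joins track ids with `%2C` (a URL-encoded comma) in groups of ten before calling the `tracks?ids=` endpoint. The following is a minimal sketch of that batching step on its own. `batch_ids` and its `size` and `sep` parameters are hypothetical names chosen for illustration; the group size of ten and the separator are taken from the diff.

```python
def batch_ids(ids, size=10, sep='%2C'):
    """Join ids into sep-delimited groups of at most `size` items each."""
    ids = [str(i) for i in ids]
    return [sep.join(ids[i:i + size]) for i in range(0, len(ids), size)]


if __name__ == '__main__':
    # 23 ids produce three groups: 10, 10 and 3 items.
    for group in batch_ids(range(1, 24)):
        print(group)
```

Keeping the batching in a small pure function makes the group boundaries easy to check without any network access.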
@ -7,9 +7,10 @@ import json
|
||||
|
||||
def ted_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url)
|
||||
metadata = json.loads(match1(html, r'({"talks"(.*)})\)'))
|
||||
patt = r'"__INITIAL_DATA__"\s*:\s*\{(.+)\}'
|
||||
metadata = json.loads('{' + match1(html, patt) + '}')
|
||||
title = metadata['talks'][0]['title']
|
||||
nativeDownloads = metadata['talks'][0]['nativeDownloads']
|
||||
nativeDownloads = metadata['talks'][0]['downloads']['nativeDownloads']
|
||||
for quality in ['high', 'medium', 'low']:
|
||||
if quality in nativeDownloads:
|
||||
url = nativeDownloads[quality]
|
||||
|
@ -1,83 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['thvideo_download']
|
||||
|
||||
from ..common import *
|
||||
from xml.dom.minidom import parseString
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def thvideo_cid_to_url(cid, p):
|
||||
"""int,int->list
|
||||
From Biligrab."""
|
||||
interface_url = 'http://thvideo.tv/api/playurl.php?cid={cid}-{p}'.format(cid = cid, p = p)
|
||||
data = get_content(interface_url)
|
||||
rawurl = []
|
||||
dom = parseString(data)
|
||||
|
||||
for node in dom.getElementsByTagName('durl'):
|
||||
url = node.getElementsByTagName('url')[0]
|
||||
rawurl.append(url.childNodes[0].data)
|
||||
return rawurl
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def th_video_get_title(url, p):
|
||||
""""""
|
||||
if re.match(r'http://thvideo.tv/v/\w+', url):
|
||||
html = get_content(url)
|
||||
title = match1(html, r'<meta property="og:title" content="([^"]*)"').strip()
|
||||
|
||||
video_list = match1(html, r'<li>cid=(.+)</li>').split('**')
|
||||
|
||||
if int(p) > 0: #not the 1st P or multi part
|
||||
title = title + ' - ' + [i.split('=')[-1:][0].split('|')[1] for i in video_list][p]
|
||||
|
||||
return title
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def thvideo_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
if re.match(r'http://thvideo.tv/v/\w+', url):
|
||||
if 'p' in kwargs and kwargs['p']:
|
||||
p = kwargs['p']
|
||||
else:
|
||||
p = int(match1(url, r'http://thvideo.tv/v/th\d+#(\d+)'))
|
||||
p -= 1
|
||||
|
||||
if not p or p < 0:
|
||||
p = 0
|
||||
|
||||
if 'title' in kwargs and kwargs['title']:
|
||||
title = kwargs['title']
|
||||
else:
|
||||
title = th_video_get_title(url, p)
|
||||
|
||||
cid = match1(url, r'http://thvideo.tv/v/th(\d+)')
|
||||
|
||||
type_ = ''
|
||||
size = 0
|
||||
urls = thvideo_cid_to_url(cid, p)
|
||||
|
||||
for url in urls:
|
||||
_, type_, temp = url_info(url)
|
||||
size += temp
|
||||
|
||||
print_info(site_info, title, type_, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def thvideo_download_playlist(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
""""""
|
||||
if re.match(r'http://thvideo.tv/v/\w+', url):
|
||||
html = get_content(url)
|
||||
video_list = match1(html, r'<li>cid=(.+)</li>').split('**')
|
||||
|
||||
title_base = th_video_get_title(url, 0)
|
||||
for p, v in video_list:
|
||||
part_title = [i.split('=')[-1:][0].split('|')[1] for i in video_list][p]
|
||||
title = title_base + part_title
|
||||
thvideo_download(url, output_dir, merge,
|
||||
info_only, p = p, title = title)
|
||||
|
||||
site_info = "THVideo"
|
||||
download = thvideo_download
|
||||
download_playlist = thvideo_download_playlist
|
47
src/you_get/extractors/tiktok.py
Normal file
@ -0,0 +1,47 @@
#!/usr/bin/env python

__all__ = ['tiktok_download']

from ..common import *

def tiktok_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
    referUrl = url.split('?')[0]
    headers = fake_headers

    # trick or treat
    html = get_content(url, headers=headers)
    data = r1(r'<script id="__NEXT_DATA__".*?>(.*?)</script>', html)
    info = json.loads(data)
    wid = info['props']['initialProps']['$wid']
    cookie = 'tt_webid=%s; tt_webid_v2=%s' % (wid, wid)

    # here's the cookie
    headers['Cookie'] = cookie

    # try again
    html = get_content(url, headers=headers)
    data = r1(r'<script id="__NEXT_DATA__".*?>(.*?)</script>', html)
    info = json.loads(data)
    wid = info['props']['initialProps']['$wid']
    cookie = 'tt_webid=%s; tt_webid_v2=%s' % (wid, wid)

    videoData = info['props']['pageProps']['itemInfo']['itemStruct']
    videoId = videoData['id']
    videoUrl = videoData['video']['downloadAddr']
    uniqueId = videoData['author'].get('uniqueId')
    nickName = videoData['author'].get('nickname')

    title = '%s [%s]' % (nickName or uniqueId, videoId)

    # we also need the referer
    headers['Referer'] = referUrl

    mime, ext, size = url_info(videoUrl, headers=headers)

    print_info(site_info, title, mime, size)
    if not info_only:
        download_urls([videoUrl], title, ext, size, output_dir=output_dir, merge=merge, headers=headers)

site_info = "TikTok.com"
download = tiktok_download
download_playlist = playlist_not_supported('tiktok')
|
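Note on the TikTok extractor above: it reads the page twice, each time pulling the JSON body out of the `<script id="__NEXT_DATA__">` tag, and uses the `$wid` value from the first pass to build a `tt_webid` cookie for the second. The following is a minimal offline sketch of the two pure steps (extracting the embedded JSON and formatting the cookie). `extract_next_data`, `webid_cookie` and the `SAMPLE` page are hypothetical and exist only for illustration; the regular expression and cookie format are copied from the diff.

```python
import json
import re


def extract_next_data(html):
    """Parse the JSON body of the <script id="__NEXT_DATA__"> tag, or None."""
    m = re.search(r'<script id="__NEXT_DATA__".*?>(.*?)</script>', html, re.S)
    return json.loads(m.group(1)) if m else None


def webid_cookie(wid):
    """Format the cookie string the same way as the diff above."""
    return 'tt_webid=%s; tt_webid_v2=%s' % (wid, wid)


# Made-up page fragment; a real page carries far more data.
SAMPLE = ('<html><script id="__NEXT_DATA__" type="application/json">'
          '{"props": {"initialProps": {"$wid": "42"}}}</script></html>')

if __name__ == '__main__':
    info = extract_next_data(SAMPLE)
    print(webid_cookie(info['props']['initialProps']['$wid']))
```

Returning `None` when the tag is missing lets the caller report a clear error instead of failing inside `json.loads`.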
86
src/you_get/extractors/toutiao.py
Normal file
@ -0,0 +1,86 @@
#!/usr/bin/env python
import binascii
import random
from json import loads
from urllib.parse import urlparse

from ..common import *

try:
    from base64 import decodebytes
except ImportError:
    from base64 import decodestring

    decodebytes = decodestring

__all__ = ['toutiao_download', ]


def random_with_n_digits(n):
    return random.randint(10 ** (n - 1), (10 ** n) - 1)


def sign_video_url(vid):
    r = str(random_with_n_digits(16))

    url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{vid}'.format(vid=vid)
    n = urlparse(url).path + '?r=' + r
    b_n = bytes(n, encoding="utf-8")
    s = binascii.crc32(b_n)
    aid = 1364
    ts = int(time.time() * 1000)
    return url + '?r={r}&s={s}&aid={aid}&vfrom=xgplayer&callback=axiosJsonpCallback1&_={ts}'.format(r=r, s=s, aid=aid,
                                                                                                      ts=ts)


class ToutiaoVideoInfo(object):

    def __init__(self):
        self.bitrate = None
        self.definition = None
        self.size = None
        self.height = None
        self.width = None
        self.type = None
        self.url = None

    def __str__(self):
        return json.dumps(self.__dict__)


def get_file_by_vid(video_id):
    vRet = []
    url = sign_video_url(video_id)
    ret = get_content(url)
    ret = loads(ret[20:-1])
    vlist = ret.get('data').get('video_list')
    if len(vlist) > 0:
        vInfo = vlist.get(sorted(vlist.keys(), reverse=True)[0])
        vUrl = vInfo.get('main_url')
        vUrl = decodebytes(vUrl.encode('ascii')).decode('ascii')
        videoInfo = ToutiaoVideoInfo()
        videoInfo.bitrate = vInfo.get('bitrate')
        videoInfo.definition = vInfo.get('definition')
        videoInfo.size = vInfo.get('size')
        videoInfo.height = vInfo.get('vheight')
        videoInfo.width = vInfo.get('vwidth')
        videoInfo.type = vInfo.get('vtype')
        videoInfo.url = vUrl
        vRet.append(videoInfo)
    return vRet


def toutiao_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
    html = get_html(url, faker=True)
    video_id = match1(html, r".*?videoId: '(?P<vid>.*)'")
    title = match1(html, '.*?<title>(?P<title>.*?)</title>')
    video_file_list = get_file_by_vid(video_id)  # call the API to fetch the video source files
    type, ext, size = url_info(video_file_list[0].url, faker=True)
    print_info(site_info=site_info, title=title, type=type, size=size)
    if not info_only:
        download_urls([video_file_list[0].url], title, ext, size, output_dir, merge=merge, faker=True)


site_info = "Toutiao.com"
download = toutiao_download
download_playlist = playlist_not_supported("toutiao")
|
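Note on `sign_video_url` above: the `s` query parameter is the CRC32 checksum of the URL path followed by `?r=` and a random 16-digit string. The following is a minimal sketch of that checksum step with the random value passed in, so the result is reproducible. `crc32_signature` and `DEMO_URL` are hypothetical names, and `VIDEO_ID` is a placeholder rather than a real video id.

```python
import binascii
from urllib.parse import urlparse

# Placeholder URL in the shape used by the diff; VIDEO_ID is not a real id.
DEMO_URL = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/VIDEO_ID'


def crc32_signature(url, r):
    """CRC32 of '<path>?r=<r>', the value sent as the `s` parameter above."""
    payload = urlparse(url).path + '?r=' + r
    return binascii.crc32(payload.encode('utf-8'))


if __name__ == '__main__':
    print(crc32_signature(DEMO_URL, '1234567890123456'))
```

Only the path takes part in the checksum, so the scheme and host of the URL do not affect the signature. On Python 3, `binascii.crc32` returns an unsigned 32-bit integer.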
@ -26,7 +26,10 @@ def tudou_download_by_id(id, title, output_dir = '.', merge = True, info_only =
|
||||
html = get_html('http://www.tudou.com/programs/view/%s/' % id)
|
||||
|
||||
iid = r1(r'iid\s*[:=]\s*(\S+)', html)
|
||||
try:
|
||||
title = r1(r'kw\s*[:=]\s*[\'\"]([^\n]+?)\'\s*\n', html).replace("\\'", "\'")
|
||||
except AttributeError:
|
||||
title = ''
|
||||
tudou_download_by_iid(iid, title, output_dir = output_dir, merge = merge, info_only = info_only)
|
||||
|
||||
def tudou_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
@ -42,16 +45,23 @@ def tudou_download(url, output_dir = '.', merge = True, info_only = False, **kwa
|
||||
if id:
|
||||
return tudou_download_by_id(id, title="", info_only=info_only)
|
||||
|
||||
html = get_decoded_html(url)
|
||||
html = get_content(url)
|
||||
|
||||
title = r1(r'kw\s*[:=]\s*[\'\"]([^\n]+?)\'\s*\n', html).replace("\\'", "\'")
|
||||
try:
|
||||
title = r1(r'\Wkw\s*[:=]\s*[\'\"]([^\n]+?)\'\s*\n', html).replace("\\'", "\'")
|
||||
assert title
|
||||
title = unescape_html(title)
|
||||
except AttributeError:
|
||||
title = match1(html, r'id=\"subtitle\"\s*title\s*=\s*\"([^\"]+)\"')
|
||||
if title is None:
|
||||
title = ''
|
||||
|
||||
vcode = r1(r'vcode\s*[:=]\s*\'([^\']+)\'', html)
|
||||
if vcode is None:
|
||||
vcode = match1(html, r'viden\s*[:=]\s*\"([\w+/=]+)\"')
|
||||
if vcode:
|
||||
from .youku import youku_download_by_vid
|
||||
return youku_download_by_vid(vcode, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return youku_download_by_vid(vcode, title=title, output_dir=output_dir, merge=merge, info_only=info_only, src='tudou', **kwargs)
|
||||
|
||||
iid = r1(r'iid\s*[:=]\s*(\d+)', html)
|
||||
if not iid:
|
||||
|
@ -13,7 +13,29 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only)
|
||||
return
|
||||
|
||||
html = parse.unquote(get_html(url)).replace('\/', '/')
|
||||
import ssl
|
||||
ssl_context = request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
|
||||
cookie_handler = request.HTTPCookieProcessor()
|
||||
opener = request.build_opener(ssl_context, cookie_handler)
|
||||
request.install_opener(opener)
|
||||
|
||||
page = get_html(url)
|
||||
form_key = match1(page, r'id="tumblr_form_key" content="([^"]+)"')
|
||||
if form_key is not None:
|
||||
# bypass GDPR consent page
|
||||
referer = 'https://www.tumblr.com/privacy/consent?redirect=%s' % parse.quote_plus(url)
|
||||
post_content('https://www.tumblr.com/svc/privacy/consent',
|
||||
headers={
|
||||
'Content-Type': 'application/json',
|
||||
'User-Agent': fake_headers['User-Agent'],
|
||||
'Referer': referer,
|
||||
'X-tumblr-form-key': form_key,
|
||||
'X-Requested-With': 'XMLHttpRequest'
|
||||
},
|
||||
post_data_raw='{"eu_resident":true,"gdpr_is_acceptable_age":true,"gdpr_consent_core":true,"gdpr_consent_first_party_ads":true,"gdpr_consent_third_party_ads":true,"gdpr_consent_search_history":true,"redirect_to":"%s","gdpr_reconsent":false}' % url)
|
||||
page = get_html(url, faker=True)
|
||||
|
||||
html = parse.unquote(page).replace('\/', '/')
|
||||
feed = r1(r'<meta property="og:type" content="tumblr-feed:(\w+)" />', html)
|
||||
|
||||
if feed in ['photo', 'photoset', 'entry'] or feed is None:
|
||||
@ -21,26 +43,36 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
page_title = r1(r'<meta name="description" content="([^"\n]+)', html) or \
|
||||
r1(r'<meta property="og:description" content="([^"\n]+)', html) or \
|
||||
r1(r'<title>([^<\n]*)', html)
|
||||
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.jpg)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.png)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^";]+_\d+\.gif)', html)
|
||||
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.jpg)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.png)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^";&]+_\d+\.gif)', html)
|
||||
|
||||
tuggles = {}
|
||||
for url in urls:
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
if url.endswith('.gif'):
|
||||
hd_url = url
|
||||
elif url.endswith('.jpg'):
|
||||
hd_url = r1(r'(.+)_\d+\.jpg$', url) + '_1280.jpg' # FIXME: decide actual quality
|
||||
elif url.endswith('.png'):
|
||||
hd_url = r1(r'(.+)_\d+\.png$', url) + '_1280.png' # FIXME: decide actual quality
|
||||
else:
|
||||
continue
|
||||
filename = parse.unquote(hd_url.split('/')[-1])
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
tumblr_id = r1(r'^tumblr_(.+)_\d+$', title)
|
||||
quality = int(r1(r'^tumblr_.+_(\d+)$', title))
|
||||
ext = filename.split('.')[-1]
|
||||
size = int(get_head(url)['Content-Length'])
|
||||
try:
|
||||
size = int(get_head(hd_url)['Content-Length'])
|
||||
if tumblr_id not in tuggles or tuggles[tumblr_id]['quality'] < quality:
|
||||
tuggles[tumblr_id] = {
|
||||
'title': title,
|
||||
'url': url,
|
||||
'url': hd_url,
|
||||
'quality': quality,
|
||||
'ext': ext,
|
||||
'size': size,
|
||||
}
|
||||
except: pass
|
||||
|
||||
if tuggles:
|
||||
size = sum([tuggles[t]['size'] for t in tuggles])
|
||||
@ -68,6 +100,11 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
real_url = r1(r'<source src="([^"]*)"', html)
|
||||
if not real_url:
|
||||
iframe_url = r1(r'<[^>]+tumblr_video_container[^>]+><iframe[^>]+src=[\'"]([^\'"]*)[\'"]', html)
|
||||
|
||||
if iframe_url is None:
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
if iframe_url:
|
||||
iframe_html = get_content(iframe_url, headers=fake_headers)
|
||||
real_url = r1(r'<video[^>]*>[\n ]*<source[^>]+src=[\'"]([^\'"]*)[\'"]', iframe_html)
|
||||
@ -92,11 +129,15 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
r1(r'<meta property="og:description" content="([^"]*)" />', html) or
|
||||
r1(r'<title>([^<\n]*)', html) or url.split("/")[4]).replace('\n', '')
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
# this is better
|
||||
vcode = r1(r'tumblr_(\w+)', real_url)
|
||||
real_url = 'https://vt.media.tumblr.com/tumblr_%s.mp4' % vcode
|
||||
|
||||
type, ext, size = url_info(real_url, faker=True)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([real_url], title, ext, size, output_dir, merge = merge)
|
||||
download_urls([real_url], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
site_info = "Tumblr.com"
|
||||
download = tumblr_download
|
||||
|
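The Tumblr hunk above keeps one entry per image ID and replaces it only when a higher-quality candidate turns up. A minimal sketch of that selection step, assuming URLs of the form `tumblr_<id>_<width>.<ext>`; the helper name `pick_best_by_id` and the sample URLs are hypothetical, and the sketch omits the `_1280` rewrite and the `Content-Length` lookup performed in the diff:

```python
import re


def pick_best_by_id(urls):
    """Group Tumblr-style image URLs by ID, keeping the widest variant of each."""
    best = {}
    for url in urls:
        m = re.search(r'/tumblr_(.+)_(\d+)\.(jpg|png|gif)$', url)
        if m is None:
            continue  # not a recognizable Tumblr media URL
        tumblr_id, quality, ext = m.group(1), int(m.group(2)), m.group(3)
        # Same comparison as in the diff: replace only on strictly higher quality.
        if tumblr_id not in best or best[tumblr_id]['quality'] < quality:
            best[tumblr_id] = {'url': url, 'quality': quality, 'ext': ext}
    return best


# Hypothetical sample input: two sizes of one image plus one GIF.
sample_urls = [
    'https://media.example/tumblr_abc_500.jpg',
    'https://media.example/tumblr_abc_1280.jpg',
    'https://media.example/tumblr_xyz_400.gif',
]
picked = pick_best_by_id(sample_urls)
```

Keying the dictionary on the ID rather than on the full URL is what lets several sizes of the same image collapse into a single download.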
@ -3,87 +3,102 @@
|
||||
__all__ = ['twitter_download']
|
||||
|
||||
from ..common import *
|
||||
from .universal import *
|
||||
from .vine import vine_download
|
||||
|
||||
def extract_m3u(source):
|
||||
r1 = get_content(source)
|
||||
s1 = re.findall(r'(/ext_tw_video/.*)', r1)
|
||||
s1 += re.findall(r'(/amplify_video/.*)', r1)
|
||||
r2 = get_content('https://video.twimg.com%s' % s1[-1])
|
||||
s2 = re.findall(r'(/ext_tw_video/.*)', r2)
|
||||
s2 += re.findall(r'(/amplify_video/.*)', r2)
|
||||
return ['https://video.twimg.com%s' % i for i in s2]
|
||||
|
||||
def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url)
|
||||
screen_name = r1(r'data-screen-name="([^"]*)"', html) or \
|
||||
if re.match(r'https?://pbs\.twimg\.com', url):
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
if re.match(r'https?://mobile', url): # normalize mobile URL
|
||||
url = 'https://' + match1(url, r'//mobile\.(.+)')
|
||||
|
||||
if re.match(r'https?://twitter\.com/i/moments/', url): # moments
|
||||
html = get_html(url, faker=True)
|
||||
paths = re.findall(r'data-permalink-path="([^"]+)"', html)
|
||||
for path in paths:
|
||||
twitter_download('https://twitter.com' + path,
|
||||
output_dir=output_dir,
|
||||
merge=merge,
|
||||
info_only=info_only,
|
||||
**kwargs)
|
||||
return
|
||||
|
||||
html = get_html(url, faker=False) # disable faker to prevent 302 infinite redirect
|
||||
screen_name = r1(r'twitter\.com/([^/]+)', url) or r1(r'data-screen-name="([^"]*)"', html) or \
|
||||
r1(r'<meta name="twitter:title" content="([^"]*)"', html)
|
||||
item_id = r1(r'data-item-id="([^"]*)"', html) or \
|
||||
item_id = r1(r'twitter\.com/[^/]+/status/(\d+)', url) or r1(r'data-item-id="([^"]*)"', html) or \
|
||||
r1(r'<meta name="twitter:site:id" content="([^"]*)"', html)
|
||||
page_title = "{} [{}]".format(screen_name, item_id)
|
||||
|
||||
try: # extract images
|
||||
urls = re.findall(r'property="og:image"\s*content="([^"]+:large)"', html)
|
||||
assert urls
|
||||
images = []
|
||||
for url in urls:
|
||||
url = ':'.join(url.split(':')[:-1]) + ':orig'
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
ext = url.split(':')[-2].split('.')[-1]
|
||||
size = int(get_head(url)['Content-Length'])
|
||||
images.append({'title': title,
|
||||
'url': url,
|
||||
'ext': ext,
|
||||
'size': size})
|
||||
size = sum([image['size'] for image in images])
|
||||
print_info(site_info, page_title, images[0]['ext'], size)
|
||||
authorization = 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
|
||||
|
||||
if not info_only:
|
||||
for image in images:
|
||||
title = image['title']
|
||||
ext = image['ext']
|
||||
size = image['size']
|
||||
url = image['url']
|
||||
print_info(site_info, title, ext, size)
|
||||
download_urls([url], title, ext, size,
|
||||
output_dir=output_dir)
|
||||
ga_url = 'https://api.twitter.com/1.1/guest/activate.json'
|
||||
ga_content = post_content(ga_url, headers={'authorization': authorization})
|
||||
guest_token = json.loads(ga_content)['guest_token']
|
||||
|
||||
except: # extract video
|
||||
# always use i/cards or videos url
|
||||
if not re.match(r'https?://twitter.com/i/', url):
|
||||
url = r1(r'<meta\s*property="og:video:url"\s*content="([^"]+)"', html)
|
||||
if not url:
|
||||
url = 'https://twitter.com/i/videos/%s' % item_id
|
||||
html = get_content(url)
|
||||
api_url = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
|
||||
api_content = get_content(api_url, headers={'authorization': authorization, 'x-guest-token': guest_token})
|
||||
|
||||
data_config = r1(r'data-config="([^"]*)"', html) or \
|
||||
r1(r'data-player-config="([^"]*)"', html)
|
||||
i = json.loads(unescape_html(data_config))
|
||||
if 'video_url' in i:
|
||||
source = i['video_url']
|
||||
if not item_id: page_title = i['tweet_id']
|
||||
elif 'playlist' in i:
|
||||
source = i['playlist'][0]['source']
|
||||
if not item_id: page_title = i['playlist'][0]['contentId']
|
||||
elif 'vmap_url' in i:
|
||||
vmap_url = i['vmap_url']
|
||||
vmap = get_content(vmap_url)
|
||||
source = r1(r'<MediaFile>\s*<!\[CDATA\[(.*)\]\]>', vmap)
|
||||
if not item_id: page_title = i['tweet_id']
|
||||
elif 'scribe_playlist_url' in i:
|
||||
scribe_playlist_url = i['scribe_playlist_url']
|
||||
return vine_download(scribe_playlist_url, output_dir, merge=merge, info_only=info_only)
|
||||
info = json.loads(api_content)
|
||||
if 'extended_entities' in info['globalObjects']['tweets'][item_id]:
|
||||
# if the tweet contains media, download them
|
||||
media = info['globalObjects']['tweets'][item_id]['extended_entities']['media']
|
||||
|
||||
try:
|
||||
urls = extract_m3u(source)
|
||||
except:
|
||||
urls = [source]
|
||||
elif info['globalObjects']['tweets'][item_id].get('is_quote_status') == True:
|
||||
# if the tweet does not contain media, but it quotes a tweet
|
||||
# and the quoted tweet contains media, download them
|
||||
item_id = info['globalObjects']['tweets'][item_id]['quoted_status_id_str']
|
||||
|
||||
api_url = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
|
||||
api_content = get_content(api_url, headers={'authorization': authorization, 'x-guest-token': guest_token})
|
||||
|
||||
info = json.loads(api_content)
|
||||
|
||||
if 'extended_entities' in info['globalObjects']['tweets'][item_id]:
|
||||
media = info['globalObjects']['tweets'][item_id]['extended_entities']['media']
|
||||
else:
|
||||
# quoted tweet has no media
|
||||
return
|
||||
|
||||
else:
|
||||
# no media, no quoted tweet
|
||||
return
|
||||
|
||||
for medium in media:
|
||||
if 'video_info' in medium:
|
||||
# FIXME: we're assuming one tweet only contains one video here
|
||||
variants = medium['video_info']['variants']
|
||||
variants = sorted(variants, key=lambda kv: kv.get('bitrate', 0))
|
||||
urls = [ variants[-1]['url'] ]
|
||||
size = urls_size(urls)
|
||||
mime, ext = 'video/mp4', 'mp4'
|
||||
mime, ext = variants[-1]['content_type'], 'mp4'
|
||||
|
||||
print_info(site_info, page_title, mime, size)
|
||||
if not info_only:
|
||||
download_urls(urls, page_title, ext, size, output_dir, merge=merge)
|
||||
|
||||
else:
|
||||
title = item_id + '_' + medium['media_url_https'].split('.')[-2].split('/')[-1]
|
||||
urls = [ medium['media_url_https'] + ':orig' ]
|
||||
size = urls_size(urls)
|
||||
ext = medium['media_url_https'].split('.')[-1]
|
||||
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, ext, size, output_dir, merge=merge)
|
||||
|
||||
|
||||
site_info = "Twitter.com"
|
||||
download = twitter_download
|
||||
download_playlist = playlist_not_supported('twitter')
|
||||
|
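The Twitter hunk above sorts `video_info['variants']` by bitrate and downloads the last one. A minimal sketch of that choice, assuming each variant is a dict with a `url`, a `content_type`, and an optional `bitrate`; the helper name `pick_best_variant` and the sample data are hypothetical, not taken from a real API response:

```python
def pick_best_variant(variants):
    """Return the variant with the highest bitrate.

    Variants without a 'bitrate' key (for example an HLS playlist entry)
    are treated as bitrate 0, as in the diff, so they sort first.
    """
    ranked = sorted(variants, key=lambda v: v.get('bitrate', 0))
    return ranked[-1]


# Hypothetical sample data shaped like the fields the diff reads.
sample_variants = [
    {'content_type': 'application/x-mpegURL', 'url': 'https://video.example/pl.m3u8'},
    {'content_type': 'video/mp4', 'bitrate': 2176000, 'url': 'https://video.example/720.mp4'},
    {'content_type': 'video/mp4', 'bitrate': 832000, 'url': 'https://video.example/360.mp4'},
]
best = pick_best_variant(sample_variants)
```

Defaulting a missing bitrate to 0 means a playlist entry is selected only when no variant carries a bitrate at all.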
137
src/you_get/extractors/ucas.py
Normal file
@ -0,0 +1,137 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['ucas_download', 'ucas_download_single', 'ucas_download_playlist']
|
||||
|
||||
from ..common import *
|
||||
import urllib.error
|
||||
import http.client
|
||||
from time import time
|
||||
from random import random
|
||||
import xml.etree.ElementTree as ET
|
||||
from copy import copy
|
||||
|
||||
"""
|
||||
Do not replace http.client with get_content
|
||||
for UCAS's server is not correctly returning data!
|
||||
"""
|
||||
|
||||
def dictify(r,root=True):
|
||||
"""http://stackoverflow.com/a/30923963/2946714"""
|
||||
if root:
|
||||
return {r.tag : dictify(r, False)}
|
||||
d=copy(r.attrib)
|
||||
if r.text:
|
||||
d["_text"]=r.text
|
||||
for x in r.findall("./*"):
|
||||
if x.tag not in d:
|
||||
d[x.tag]=[]
|
||||
d[x.tag].append(dictify(x,False))
|
||||
return d
|
||||
|
||||
def _get_video_query_url(resourceID):
|
||||
# has to be like this
|
||||
headers = {
|
||||
'DNT': '1',
|
||||
'Accept-Encoding': 'gzip, deflate',
|
||||
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
|
||||
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.47 Safari/537.36',
|
||||
'Accept': '*/*',
|
||||
'Referer': 'http://v.ucas.ac.cn/',
|
||||
'Connection': 'keep-alive',
|
||||
}
|
||||
conn = http.client.HTTPConnection("210.76.211.10")
|
||||
|
||||
conn.request("GET", "/vplus/remote.do?method=query2&loginname=videocas&pwd=af1c7a4c5f77f790722f7cae474c37e281203765d423a23b&resource=%5B%7B%22resourceID%22%3A%22" + resourceID + "%22%2C%22on%22%3A1%2C%22time%22%3A600%2C%22eid%22%3A100%2C%22w%22%3A800%2C%22h%22%3A600%7D%5D&timeStamp=" + str(int(time())), headers=headers)
|
||||
res = conn.getresponse()
|
||||
data = res.read()
|
||||
|
||||
info = data.decode("utf-8")
|
||||
return match1(info, r'video":"(.+)"')
|
||||
|
||||
def _get_virtualPath(video_query_url):
|
||||
#getResourceJsCode2
|
||||
html = get_content(video_query_url)
|
||||
|
||||
return match1(html, r"function\s+getVirtualPath\(\)\s+{\s+return\s+'(\w+)'")
|
||||
|
||||
|
||||
def _get_video_list(resourceID):
|
||||
""""""
|
||||
conn = http.client.HTTPConnection("210.76.211.10")
|
||||
|
||||
conn.request("GET", '/vplus/member/resource.do?isyulan=0&method=queryFlashXmlByResourceId&resourceId={resourceID}&randoms={randoms}'.format(resourceID = resourceID,
|
||||
randoms = random()))
|
||||
res = conn.getresponse()
|
||||
data = res.read()
|
||||
|
||||
video_xml = data.decode("utf-8")
|
||||
|
||||
root = ET.fromstring(video_xml.split('___!!!___')[0])
|
||||
|
||||
r = dictify(root)
|
||||
|
||||
huge_list = []
|
||||
# main
|
||||
huge_list.append([i['value'] for i in sorted(r['video']['mainUrl'][0]['_flv'][0]['part'][0]['video'], key=lambda k: int(k['index']))])
|
||||
|
||||
# sub
|
||||
if '_flv' in r['video']['subUrl'][0]:
|
||||
huge_list.append([i['value'] for i in sorted(r['video']['subUrl'][0]['_flv'][0]['part'][0]['video'], key=lambda k: int(k['index']))])
|
||||
|
||||
return huge_list
|
||||
|
||||
def _ucas_get_url_lists_by_resourceID(resourceID):
|
||||
video_query_url = _get_video_query_url(resourceID)
|
||||
assert video_query_url != '', 'Cannot find video GUID!'
|
||||
|
||||
virtualPath = _get_virtualPath(video_query_url)
|
||||
assert virtualPath != '', 'Cannot find virtualPath!'
|
||||
|
||||
url_lists = _get_video_list(resourceID)
|
||||
assert url_lists, 'Cannot find any URL to download!'
|
||||
|
||||
# make real url
|
||||
# credit to a mate in UCAS
|
||||
for video_type_id, video_urls in enumerate(url_lists):
|
||||
for k, path in enumerate(video_urls):
|
||||
url_lists[video_type_id][k] = 'http://210.76.211.10/vplus/member/resource.do?virtualPath={virtualPath}&method=getImgByStream&imgPath={path}'.format(virtualPath = virtualPath,
|
||||
path = path)
|
||||
|
||||
return url_lists
|
||||
|
||||
def ucas_download_single(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
'''video page'''
|
||||
html = get_content(url)
|
||||
# resourceID is UUID
|
||||
resourceID = re.findall( r'resourceID":"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})', html)[0]
|
||||
assert resourceID != '', 'Cannot find resourceID!'
|
||||
|
||||
title = match1(html, r'<div class="bc-h">(.+)</div>')
|
||||
url_lists = _ucas_get_url_lists_by_resourceID(resourceID)
|
||||
assert url_lists, 'Cannot find any URL of such class!'
|
||||
|
||||
for k, part in enumerate(url_lists):
|
||||
part_title = title + '_' + str(k)
|
||||
print_info(site_info, part_title, 'flv', 0)
|
||||
if not info_only:
|
||||
download_urls(part, part_title, 'flv', total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
def ucas_download_playlist(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
'''course page'''
|
||||
html = get_content(url)
|
||||
|
||||
parts = re.findall( r'(getplaytitle.do\?.+)"', html)
|
||||
assert parts, 'No part found!'
|
||||
|
||||
for part_path in parts:
|
||||
ucas_download('http://v.ucas.ac.cn/course/' + part_path, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
def ucas_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
if 'classid=' in url and 'getplaytitle.do' in url:
|
||||
ucas_download_single(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif 'CourseIndex.do' in url:
|
||||
ucas_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
site_info = "UCAS"
|
||||
download = ucas_download
|
||||
download_playlist = ucas_download_playlist
|
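`_get_video_list` above walks the result of `dictify` with paths such as `r['video']['mainUrl'][0]['_flv'][0]['part'][0]['video']`. A small sketch of how that shape arises, reusing `dictify` as it appears in the diff; the XML sample is an assumption inferred from those access paths, not a captured UCAS response:

```python
import xml.etree.ElementTree as ET
from copy import copy


def dictify(r, root=True):
    """Convert an Element into nested dicts; children become lists keyed by tag."""
    if root:
        return {r.tag: dictify(r, False)}
    d = copy(r.attrib)
    if r.text:
        d["_text"] = r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag] = []
        d[x.tag].append(dictify(x, False))
    return d


# Hypothetical XML with the nesting the extractor expects.
sample_xml = (
    '<video><mainUrl><_flv><part>'
    '<video index="1" value="b.flv"/>'
    '<video index="0" value="a.flv"/>'
    '</part></_flv></mainUrl></video>'
)
r = dictify(ET.fromstring(sample_xml))
parts = r['video']['mainUrl'][0]['_flv'][0]['part'][0]['video']
ordered = [p['value'] for p in sorted(parts, key=lambda k: int(k['index']))]
```

Every child tag maps to a list even when it occurs once, which is why the extractor indexes with `[0]` at each level.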
@ -6,12 +6,17 @@ from ..common import *
|
||||
from .embed import *
|
||||
|
||||
def universal_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
try:
|
||||
content_type = get_head(url, headers=fake_headers)['Content-Type']
|
||||
except:
|
||||
content_type = get_head(url, headers=fake_headers, get_method='GET')['Content-Type']
|
||||
if content_type.startswith('text/html'):
|
||||
try:
|
||||
embed_download(url, output_dir, merge=merge, info_only=info_only)
|
||||
except: pass
|
||||
else: return
|
||||
embed_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
except Exception:
|
||||
pass
|
||||
else:
|
||||
return
|
||||
|
||||
domains = url.split('/')[2].split('.')
|
||||
if len(domains) > 2: domains = domains[1:]
|
||||
@ -26,6 +31,38 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
if page_title:
|
||||
page_title = unescape_html(page_title)
|
||||
|
||||
meta_videos = re.findall(r'<meta property="og:video:url" content="([^"]*)"', page)
|
||||
if meta_videos:
|
||||
try:
|
||||
for meta_video in meta_videos:
|
||||
meta_video_url = unescape_html(meta_video)
|
||||
type_, ext, size = url_info(meta_video_url)
|
||||
print_info(site_info, page_title, type_, size)
|
||||
if not info_only:
|
||||
download_urls([meta_video_url], page_title,
|
||||
ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=True)
|
||||
except:
|
||||
pass
|
||||
else:
|
||||
return
|
||||
|
||||
hls_urls = re.findall(r'(https?://[^;"\'\\]+' + '\.m3u8?' +
|
||||
r'[^;"\'\\]*)', page)
|
||||
if hls_urls:
|
||||
try:
|
||||
for hls_url in hls_urls:
|
||||
type_, ext, size = url_info(hls_url)
|
||||
print_info(site_info, page_title, type_, size)
|
||||
if not info_only:
|
||||
download_url_ffmpeg(url=hls_url, title=page_title,
|
||||
ext='mp4', output_dir=output_dir)
|
||||
except:
|
||||
pass
|
||||
else:
|
||||
return
|
||||
|
||||
# most common media file extensions on the Internet
|
||||
media_exts = ['\.flv', '\.mp3', '\.mp4', '\.webm',
|
||||
'[-_]1\d\d\d\.jpe?g', '[-_][6-9]\d\d\.jpe?g', # tumblr
|
||||
@ -38,18 +75,49 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
|
||||
urls = []
|
||||
for i in media_exts:
|
||||
urls += re.findall(r'(https?://[^;"\'\\]+' + i + r'[^;"\'\\]*)', page)
|
||||
urls += re.findall(r'(https?://[^ ;&"\'\\<>]+' + i + r'[^ ;&"\'\\<>]*)', page)
|
||||
|
||||
p_urls = re.findall(r'(https?%3A%2F%2F[^;&]+' + i + r'[^;&]*)', page)
|
||||
p_urls = re.findall(r'(https?%3A%2F%2F[^;&"]+' + i + r'[^;&"]*)', page)
|
||||
urls += [parse.unquote(url) for url in p_urls]
|
||||
|
||||
q_urls = re.findall(r'(https?:\\\\/\\\\/[^;"\']+' + i + r'[^;"\']*)', page)
|
||||
q_urls = re.findall(r'(https?:\\\\/\\\\/[^ ;"\'<>]+' + i + r'[^ ;"\'<>]*)', page)
|
||||
urls += [url.replace('\\\\/', '/') for url in q_urls]
|
||||
|
||||
# a link href to an image is often an interesting one
|
||||
urls += re.findall(r'href="(https?://[^"]+\.jpe?g)"', page)
|
||||
urls += re.findall(r'href="(https?://[^"]+\.png)"', page)
|
||||
urls += re.findall(r'href="(https?://[^"]+\.gif)"', page)
|
||||
urls += re.findall(r'href="(https?://[^"]+\.jpe?g)"', page, re.I)
|
||||
urls += re.findall(r'href="(https?://[^"]+\.png)"', page, re.I)
|
||||
urls += re.findall(r'href="(https?://[^"]+\.gif)"', page, re.I)
|
||||
|
||||
# <img> with high widths
|
||||
urls += re.findall(r'<img src="([^"]*)"[^>]*width="\d\d\d+"', page, re.I)
|
||||
|
||||
# relative path
|
||||
rel_urls = []
|
||||
rel_urls += re.findall(r'href="(\.[^"]+\.jpe?g)"', page, re.I)
|
||||
rel_urls += re.findall(r'href="(\.[^"]+\.png)"', page, re.I)
|
||||
rel_urls += re.findall(r'href="(\.[^"]+\.gif)"', page, re.I)
|
||||
for rel_url in rel_urls:
|
||||
urls += [ r1(r'(.*/)', url) + rel_url ]
|
||||
|
||||
# site-relative path
|
||||
rel_urls = []
|
||||
rel_urls += re.findall(r'href="(/[^"]+\.jpe?g)"', page, re.I)
|
||||
rel_urls += re.findall(r'href="(/[^"]+\.png)"', page, re.I)
|
||||
rel_urls += re.findall(r'href="(/[^"]+\.gif)"', page, re.I)
|
||||
for rel_url in rel_urls:
|
||||
urls += [ r1(r'(https?://[^/]+)', url) + rel_url ]
|
||||
|
||||
# sometimes naive
|
||||
urls += re.findall(r'data-original="(https?://[^"]+\.jpe?g)"', page, re.I)
|
||||
urls += re.findall(r'data-original="(https?://[^"]+\.png)"', page, re.I)
|
||||
urls += re.findall(r'data-original="(https?://[^"]+\.gif)"', page, re.I)
|
||||
|
||||
# MPEG-DASH MPD
|
||||
mpd_urls = re.findall(r'src="(https?://[^"]+\.mpd)"', page)
|
||||
for mpd_url in mpd_urls:
|
||||
cont = get_content(mpd_url)
|
||||
base_url = r1(r'<BaseURL>(.*)</BaseURL>', cont)
|
||||
urls += [ r1(r'(.*/)[^/]*', mpd_url) + base_url ]
|
||||
|
||||
# have some candy!
|
||||
candies = []
|
||||
@ -57,23 +125,35 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
for url in set(urls):
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
if 5 <= len(filename) <= 80:
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
title = '.'.join(filename.split('.')[:-1]) or filename
|
||||
else:
|
||||
title = '%s' % i
|
||||
i += 1
|
||||
|
||||
if r1(r'(https://pinterest.com/pin/)', url):
|
||||
continue
|
||||
|
||||
candies.append({'url': url,
|
||||
'title': title})
|
||||
|
||||
for candy in candies:
|
||||
try:
|
||||
try:
|
||||
mime, ext, size = url_info(candy['url'], faker=False)
|
||||
assert size
|
||||
except:
|
||||
mime, ext, size = url_info(candy['url'], faker=True)
|
||||
if not size: size = float('Int')
|
||||
if not size: size = float('Inf')
|
||||
except:
|
||||
continue
|
||||
else:
|
||||
print_info(site_info, candy['title'], ext, size)
|
||||
if not info_only:
|
||||
try:
|
||||
download_urls([candy['url']], candy['title'], ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=False)
|
||||
except:
|
||||
download_urls([candy['url']], candy['title'], ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=True)
|
||||
@ -81,10 +161,10 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
|
||||
else:
|
||||
# direct download
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
ext = filename.split('.')[-1]
|
||||
_, _, size = url_info(url, faker=True)
|
||||
url_trunk = url.split('?')[0] # strip query string
|
||||
filename = parse.unquote(url_trunk.split('/')[-1]) or parse.unquote(url_trunk.split('/')[-2])
|
||||
title = '.'.join(filename.split('.')[:-1]) or filename
|
||||
_, ext, size = url_info(url, faker=True)
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size,
|
||||
|
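The direct-download branch above now strips the query string before deriving a filename and falls back to the previous path segment when the URL ends in a slash. A minimal sketch of that derivation, assuming plain `http(s)` URLs; the helper name `name_from_url` and the sample URLs are hypothetical:

```python
from urllib import parse


def name_from_url(url):
    """Derive (filename, title) from a URL, ignoring any query string."""
    url_trunk = url.split('?')[0]  # strip query string
    segments = url_trunk.split('/')
    # A trailing slash leaves an empty last segment; fall back to the one before it.
    filename = parse.unquote(segments[-1]) or parse.unquote(segments[-2])
    # Drop the extension, but keep the whole name if there is no dot.
    title = '.'.join(filename.split('.')[:-1]) or filename
    return filename, title


with_query = name_from_url('https://example.com/media/My%20Clip.mp4?token=abc')
trailing_slash = name_from_url('https://example.com/media/clip/')
```

The `or filename` fallback matters for extensionless names: without it the title would be an empty string.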
@ -1,44 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['videomega_download']
|
||||
|
||||
from ..common import *
|
||||
import ssl
|
||||
|
||||
def videomega_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
# Hot-plug cookie handler
|
||||
ssl_context = request.HTTPSHandler(
|
||||
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
|
||||
cookie_handler = request.HTTPCookieProcessor()
|
||||
opener = request.build_opener(ssl_context, cookie_handler)
|
||||
opener.addheaders = [('Referer', url),
|
||||
('Cookie', 'noadvtday=0')]
|
||||
request.install_opener(opener)
|
||||
|
||||
if re.search(r'view\.php', url):
|
||||
php_url = url
|
||||
else:
|
||||
content = get_content(url)
|
||||
m = re.search(r'ref="([^"]*)";\s*width="([^"]*)";\s*height="([^"]*)"', content)
|
||||
ref = m.group(1)
|
||||
width, height = m.group(2), m.group(3)
|
||||
php_url = 'http://videomega.tv/view.php?ref=%s&width=%s&height=%s' % (ref, width, height)
|
||||
content = get_content(php_url)
|
||||
|
||||
title = match1(content, r'<title>(.*)</title>')
|
||||
js = match1(content, r'(eval.*)')
|
||||
t = match1(js, r'\$\("\w+"\)\.\w+\("\w+","([^"]+)"\)')
|
||||
t = re.sub(r'(\w)', r'{\1}', t)
|
||||
t = t.translate({87 + i: str(i) for i in range(10, 36)})
|
||||
s = match1(js, r"'([^']+)'\.split").split('|')
|
||||
src = t.format(*s)
|
||||
|
||||
type, ext, size = url_info(src, faker=True)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([src], title, ext, size, output_dir, merge=merge, faker=True)
|
||||
|
||||
site_info = "Videomega.tv"
|
||||
download = videomega_download
|
||||
download_playlist = playlist_not_supported('videomega')
|
@ -1,40 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['vidto_download']
|
||||
|
||||
from ..common import *
|
||||
import pdb
|
||||
import time
|
||||
|
||||
|
||||
def vidto_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_content(url)
|
||||
params = {}
|
||||
r = re.findall(
|
||||
r'type="(?:hidden|submit)?"(?:.*?)name="(.+?)"\s* value="?(.+?)">', html)
|
||||
for name, value in r:
|
||||
params[name] = value
|
||||
data = parse.urlencode(params).encode('utf-8')
|
||||
req = request.Request(url)
|
||||
print("Please wait for 6 seconds...")
|
||||
time.sleep(6)
|
||||
print("Starting")
|
||||
new_html = request.urlopen(req, data).read().decode('utf-8', 'replace')
|
||||
new_stff = re.search('lnk_download" href="(.*?)">', new_html)
|
||||
if(new_stff):
|
||||
url = new_stff.group(1)
|
||||
title = params['fname']
|
||||
type = ""
|
||||
ext = ""
|
||||
a, b, size = url_info(url)
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
else:
|
||||
print("cannot find link, please review")
|
||||
pdb.set_trace()
|
||||
|
||||
|
||||
site_info = "vidto.me"
|
||||
download = vidto_download
|
||||
download_playlist = playlist_not_supported('vidto')
|
@ -3,7 +3,12 @@
|
||||
__all__ = ['vimeo_download', 'vimeo_download_by_id', 'vimeo_download_by_channel', 'vimeo_download_by_channel_id']
|
||||
|
||||
from ..common import *
|
||||
from ..util.log import *
|
||||
from ..extractor import VideoExtractor
|
||||
from json import loads
|
||||
import urllib.error
|
||||
import urllib.parse
|
||||
|
||||
access_token = 'f6785418277b72c7c87d3132c79eec24' #By Beining
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
@ -11,10 +16,10 @@ def vimeo_download_by_channel(url, output_dir='.', merge=False, info_only=False,
|
||||
"""str->None"""
|
||||
# https://vimeo.com/channels/464686
|
||||
channel_id = match1(url, r'http://vimeo.com/channels/(\w+)')
|
||||
vimeo_download_by_channel_id(channel_id, output_dir, merge, info_only)
|
||||
vimeo_download_by_channel_id(channel_id, output_dir, merge, info_only, **kwargs)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def vimeo_download_by_channel_id(channel_id, output_dir='.', merge=False, info_only=False):
|
||||
def vimeo_download_by_channel_id(channel_id, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
"""str/int->None"""
|
||||
html = get_content('https://api.vimeo.com/channels/{channel_id}/videos?access_token={access_token}'.format(channel_id=channel_id, access_token=access_token))
|
||||
data = loads(html)
|
||||
@ -25,15 +30,116 @@ def vimeo_download_by_channel_id(channel_id, output_dir='.', merge=False, info_o
|
||||
id_list.append(match1(i['uri'], r'/videos/(\w+)'))
|
||||
|
||||
for id in id_list:
|
||||
vimeo_download_by_id(id, None, output_dir, merge, info_only)
|
||||
try:
|
||||
vimeo_download_by_id(id, None, output_dir, merge, info_only, **kwargs)
|
||||
except urllib.error.URLError as e:
|
||||
log.w('{} failed with {}'.format(id, e))
|
||||
|
||||
class VimeoExtractor(VideoExtractor):
|
||||
stream_types = [
|
||||
{'id': '2160p', 'video_profile': '3840x2160'},
|
||||
{'id': '1440p', 'video_profile': '2560x1440'},
|
||||
{'id': '1080p', 'video_profile': '1920x1080'},
|
||||
{'id': '720p', 'video_profile': '1280x720'},
|
||||
{'id': '540p', 'video_profile': '960x540'},
|
||||
{'id': '360p', 'video_profile': '640x360'}
|
||||
]
|
||||
name = 'Vimeo'
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
headers = fake_headers.copy()
|
||||
if 'referer' in kwargs:
|
||||
headers['Referer'] = kwargs['referer']
|
||||
|
||||
try:
|
||||
page = get_content('https://vimeo.com/{}'.format(self.vid))
|
||||
cfg_patt = r'clip_page_config\s*=\s*(\{.+?\});'
|
||||
cfg = json.loads(match1(page, cfg_patt))
|
||||
video_page = get_content(cfg['player']['config_url'], headers=headers)
|
||||
self.title = cfg['clip']['title']
|
||||
info = json.loads(video_page)
|
||||
except Exception as e:
|
||||
page = get_content('https://player.vimeo.com/video/{}'.format(self.vid))
|
||||
self.title = r1(r'<title>([^<]+)</title>', page)
|
||||
info = json.loads(match1(page, r'var t=(\{.+?\});'))
|
||||
|
||||
plain = info['request']['files']['progressive']
|
||||
for s in plain:
|
||||
meta = dict(src=[s['url']], container='mp4')
|
||||
meta['video_profile'] = '{}x{}'.format(s['width'], s['height'])
|
||||
for stream in self.__class__.stream_types:
|
||||
if s['quality'] == stream['id']:
|
||||
self.streams[s['quality']] = meta
|
||||
self.master_m3u8 = info['request']['files']['hls']['cdns']
|
||||
|
||||
def extract(self, **kwargs):
|
||||
for s in self.streams:
|
||||
self.streams[s]['size'] = urls_size(self.streams[s]['src'])
|
||||
|
||||
master_m3u8s = []
|
||||
for m in self.master_m3u8:
|
||||
master_m3u8s.append(self.master_m3u8[m]['url'])
|
||||
|
||||
master_content = None
|
||||
master_url = None
|
||||
|
||||
for master_u in master_m3u8s:
|
||||
try:
|
||||
master_content = get_content(master_u).split('\n')
|
||||
except urllib.error.URLError:
|
||||
continue
|
||||
else:
|
||||
master_url = master_u
|
||||
|
||||
if master_content is None:
|
||||
return
|
||||
|
||||
lines = []
|
||||
for line in master_content:
|
||||
if len(line.strip()) > 0:
|
||||
lines.append(line.strip())
|
||||
|
||||
pos = 0
|
||||
while pos < len(lines):
|
||||
if lines[pos].startswith('#EXT-X-STREAM-INF'):
|
||||
patt = 'RESOLUTION=(\d+)x(\d+)'
|
||||
hit = re.search(patt, lines[pos])
|
||||
if hit is None:
|
||||
continue
|
||||
width = hit.group(1)
|
||||
height = hit.group(2)
|
||||
|
||||
if height in ('2160', '1440'):
|
||||
m3u8_url = urllib.parse.urljoin(master_url, lines[pos+1])
|
||||
meta = dict(m3u8_url=m3u8_url, container='m3u8')
|
||||
if height == '1440':
|
||||
meta['video_profile'] = '2560x1440'
|
||||
else:
|
||||
meta['video_profile'] = '3840x2160'
|
||||
meta['size'] = 0
|
||||
meta['src'] = general_m3u8_extractor(m3u8_url)
|
||||
self.streams[height+'p'] = meta
|
||||
|
||||
pos += 2
|
||||
else:
|
||||
pos += 1
|
||||
self.streams_sorted = []
|
||||
for stream_type in self.stream_types:
|
||||
if stream_type['id'] in self.streams:
|
||||
item = [('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())
|
||||
self.streams_sorted.append(dict(item))
|
||||
|
||||
|
||||
|
||||
def vimeo_download_by_id(id, title=None, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
'''
|
||||
try:
|
||||
# normal Vimeo video
|
||||
html = get_content('https://vimeo.com/' + id)
|
||||
config_url = unescape_html(r1(r'data-config-url="([^"]+)"', html))
|
||||
video_page = get_content(config_url, headers=fake_headers)
|
||||
title = r1(r'"title":"([^"]+)"', video_page)
|
||||
cfg_patt = r'clip_page_config\s*=\s*(\{.+?\});'
|
||||
cfg = json.loads(match1(html, cfg_patt))
|
||||
video_page = get_content(cfg['player']['config_url'], headers=fake_headers)
|
||||
title = cfg['clip']['title']
|
||||
info = loads(video_page)
|
||||
except:
|
||||
# embedded player - referer may be required
|
||||
@ -42,7 +148,7 @@ def vimeo_download_by_id(id, title=None, output_dir='.', merge=True, info_only=F
|
||||
|
||||
video_page = get_content('http://player.vimeo.com/video/%s' % id, headers=fake_headers)
|
||||
title = r1(r'<title>([^<]+)</title>', video_page)
|
||||
info = loads(match1(video_page, r'var t=(\{[^;]+\});'))
|
||||
info = loads(match1(video_page, r'var t=(\{.+?\});'))
|
||||
|
||||
streams = info['request']['files']['progressive']
|
||||
streams = sorted(streams, key=lambda i: i['height'])
|
||||
@ -53,6 +159,9 @@ def vimeo_download_by_id(id, title=None, output_dir='.', merge=True, info_only=F
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge, faker=True)
|
||||
'''
|
||||
site = VimeoExtractor()
|
||||
site.download_by_vid(id, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
def vimeo_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
if re.match(r'https?://vimeo.com/channels/\w+', url):
|
||||
|
@ -3,15 +3,26 @@
|
||||
__all__ = ['vine_download']
|
||||
|
||||
from ..common import *
|
||||
import json
|
||||
|
||||
|
||||
def vine_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
html = get_html(url)
|
||||
html = get_content(url)
|
||||
|
||||
vid = r1(r'vine.co/v/([^/]+)', url)
|
||||
video_id = r1(r'vine.co/v/([^/]+)', url)
|
||||
title = r1(r'<title>([^<]*)</title>', html)
|
||||
stream = r1(r'<meta property="twitter:player:stream" content="([^"]*)">', html)
|
||||
if not stream: # https://vine.co/v/.../card
|
||||
stream = r1(r'"videoUrl":"([^"]+)"', html).replace('\\/', '/')
|
||||
stream = r1(r'"videoUrl":"([^"]+)"', html)
|
||||
if stream:
|
||||
stream = stream.replace('\\/', '/')
|
||||
else:
|
||||
posts_url = 'https://archive.vine.co/posts/' + video_id + '.json'
|
||||
json_data = json.loads(get_content(posts_url))
|
||||
stream = json_data['videoDashUrl']
|
||||
title = json_data['description']
|
||||
if title == "":
|
||||
title = json_data['username'].replace(" ", "_") + "_" + video_id
|
||||
|
||||
mime, ext, size = url_info(stream)
|
||||
|
||||
@ -19,6 +30,7 @@ def vine_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
if not info_only:
|
||||
download_urls([stream], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
|
||||
site_info = "Vine.co"
|
||||
download = vine_download
|
||||
download_playlist = playlist_not_supported('vine')
|
||||
|
@ -22,6 +22,19 @@ def get_video_info(url):
|
||||
return url, title, ext, size
|
||||
|
||||
|
||||
def get_video_from_user_videolist(url):
|
||||
ep = 'https://vk.com/al_video.php'
|
||||
to_post = dict(act='show', al=1, module='direct', video=re.search(r'video(\d+_\d+)', url).group(1))
|
||||
page = post_content(ep, post_data=to_post)
|
||||
video_pt = r'<source src="(.+?)" type="video\/mp4"'
|
||||
url = re.search(video_pt, page).group(1)
|
||||
title = re.search(r'<div class="mv_title".+?>(.+?)</div>', page).group(1)
|
||||
mime, ext, size = url_info(url)
|
||||
print_info(site_info, title, mime, size)
|
||||
|
||||
return url, title, ext, size
|
||||
|
||||
|
||||
def get_image_info(url):
|
||||
image_page = get_content(url)
|
||||
# used for title - vk page owner
|
||||
@ -43,6 +56,8 @@ def vk_download(url, output_dir='.', stream_type=None, merge=True, info_only=Fal
|
||||
link, title, ext, size = get_video_info(url)
|
||||
elif re.match(r'(.+)vk\.com\/photo(.+)', url):
|
||||
link, title, ext, size = get_image_info(url)
|
||||
elif re.search(r'vk\.com\/video\d+_\d+', url):
|
||||
link, title, ext, size = get_video_from_user_videolist(url)
|
||||
else:
|
||||
raise NotImplementedError('Nothing to download here')
|
||||
|
||||
|
@ -28,51 +28,52 @@ def location_dec(str):
|
||||
return parse.unquote(out).replace("^", "0")
|
||||
|
||||
def xiami_download_lyric(lrc_url, file_name, output_dir):
|
||||
lrc = get_html(lrc_url, faker = True)
|
||||
lrc = get_content(lrc_url, headers=fake_headers)
|
||||
filename = get_filename(file_name)
|
||||
if len(lrc) > 0:
|
||||
with open(output_dir + "/" + filename + '.lrc', 'w', encoding='utf-8') as x:
|
||||
x.write(lrc)
|
||||
|
||||
def xiami_download_pic(pic_url, file_name, output_dir):
|
||||
from ..util.strings import get_filename
|
||||
pic_url = pic_url.replace('_1', '')
|
||||
pos = pic_url.rfind('.')
|
||||
ext = pic_url[pos:]
|
||||
pic = get_response(pic_url, faker = True).data
|
||||
pic = get_content(pic_url, headers=fake_headers, decoded=False)
|
||||
if len(pic) > 0:
|
||||
with open(output_dir + "/" + file_name.replace('/', '-') + ext, 'wb') as x:
|
||||
x.write(pic)
|
||||
|
||||
def xiami_download_song(sid, output_dir = '.', merge = True, info_only = False):
|
||||
xml = get_html('http://www.xiami.com/song/playlist/id/%s/object_name/default/object_id/0' % sid, faker = True)
|
||||
def xiami_download_song(sid, output_dir = '.', info_only = False):
|
||||
xml = get_content('http://www.xiami.com/song/playlist/id/%s/object_name/default/object_id/0' % sid, headers=fake_headers)
|
||||
doc = parseString(xml)
|
||||
i = doc.getElementsByTagName("track")[0]
|
||||
artist = i.getElementsByTagName("artist")[0].firstChild.nodeValue
|
||||
album_name = i.getElementsByTagName("album_name")[0].firstChild.nodeValue
|
||||
song_title = i.getElementsByTagName("title")[0].firstChild.nodeValue
|
||||
song_title = i.getElementsByTagName("name")[0].firstChild.nodeValue
|
||||
url = location_dec(i.getElementsByTagName("location")[0].firstChild.nodeValue)
|
||||
try:
|
||||
lrc_url = i.getElementsByTagName("lyric")[0].firstChild.nodeValue
|
||||
except:
|
||||
pass
|
||||
type, ext, size = url_info(url, faker = True)
|
||||
type_, ext, size = url_info(url, headers=fake_headers)
|
||||
if not ext:
|
||||
ext = 'mp3'
|
||||
|
||||
print_info(site_info, song_title, ext, size)
|
||||
if not info_only:
|
||||
file_name = "%s - %s - %s" % (song_title, artist, album_name)
|
||||
download_urls([url], file_name, ext, size, output_dir, merge = merge, faker = True)
|
||||
download_urls([url], file_name, ext, size, output_dir, headers=fake_headers)
|
||||
try:
|
||||
xiami_download_lyric(lrc_url, file_name, output_dir)
|
||||
except:
|
||||
pass
|
||||
|
||||
def xiami_download_showcollect(cid, output_dir = '.', merge = True, info_only = False):
|
||||
html = get_html('http://www.xiami.com/song/showcollect/id/' + cid, faker = True)
|
||||
def xiami_download_showcollect(cid, output_dir = '.', info_only = False):
|
||||
html = get_content('http://www.xiami.com/song/showcollect/id/' + cid, headers=fake_headers)
|
||||
collect_name = r1(r'<title>(.*)</title>', html)
|
||||
|
||||
xml = get_html('http://www.xiami.com/song/playlist/id/%s/type/3' % cid, faker = True)
|
||||
xml = get_content('http://www.xiami.com/song/playlist/id/%s/type/3' % cid, headers=fake_headers)
|
||||
doc = parseString(xml)
|
||||
output_dir = output_dir + "/" + "[" + collect_name + "]"
|
||||
tracks = doc.getElementsByTagName("track")
|
||||
@ -92,14 +93,14 @@ def xiami_download_showcollect(cid, output_dir = '.', merge = True, info_only =
|
||||
lrc_url = i.getElementsByTagName("lyric")[0].firstChild.nodeValue
|
||||
except:
|
||||
pass
|
||||
type, ext, size = url_info(url, faker = True)
|
||||
type_, ext, size = url_info(url, headers=fake_headers)
|
||||
if not ext:
|
||||
ext = 'mp3'
|
||||
|
||||
print_info(site_info, song_title, type, size)
|
||||
print_info(site_info, song_title, ext, size)
|
||||
if not info_only:
|
||||
file_name = "%02d.%s - %s - %s" % (track_nr, song_title, artist, album_name)
|
||||
download_urls([url], file_name, ext, size, output_dir, merge = merge, faker = True)
|
||||
download_urls([url], file_name, ext, size, output_dir, headers=fake_headers)
|
||||
try:
|
||||
xiami_download_lyric(lrc_url, file_name, output_dir)
|
||||
except:
|
||||
@ -107,17 +108,22 @@ def xiami_download_showcollect(cid, output_dir = '.', merge = True, info_only =
|
||||
|
||||
track_nr += 1
|
||||
|
||||
def xiami_download_album(aid, output_dir = '.', merge = True, info_only = False):
|
||||
xml = get_html('http://www.xiami.com/song/playlist/id/%s/type/1' % aid, faker = True)
|
||||
def xiami_download_album(aid, output_dir='.', info_only=False):
|
||||
xml = get_content('http://www.xiami.com/song/playlist/id/%s/type/1' % aid, headers=fake_headers)
|
||||
album_name = r1(r'<album_name><!\[CDATA\[(.*)\]\]>', xml)
|
||||
artist = r1(r'<artist><!\[CDATA\[(.*)\]\]>', xml)
|
||||
doc = parseString(xml)
|
||||
output_dir = output_dir + "/%s - %s" % (artist, album_name)
|
||||
tracks = doc.getElementsByTagName("track")
|
||||
track_list = doc.getElementsByTagName('trackList')[0]
|
||||
tracks = track_list.getElementsByTagName("track")
|
||||
track_nr = 1
|
||||
pic_exist = False
|
||||
for i in tracks:
|
||||
song_title = i.getElementsByTagName("title")[0].firstChild.nodeValue
|
||||
#in this xml track tag is used for both "track in a trackList" and track no
|
||||
#dirty here
|
||||
if i.firstChild.nodeValue is not None:
|
||||
continue
|
||||
song_title = i.getElementsByTagName("songName")[0].firstChild.nodeValue
|
||||
url = location_dec(i.getElementsByTagName("location")[0].firstChild.nodeValue)
|
||||
try:
|
||||
lrc_url = i.getElementsByTagName("lyric")[0].firstChild.nodeValue
|
||||
@ -125,14 +131,14 @@ def xiami_download_album(aid, output_dir = '.', merge = True, info_only = False)
|
||||
pass
|
||||
if not pic_exist:
|
||||
pic_url = i.getElementsByTagName("pic")[0].firstChild.nodeValue
|
||||
type, ext, size = url_info(url, faker = True)
|
||||
type_, ext, size = url_info(url, headers=fake_headers)
|
||||
if not ext:
|
||||
ext = 'mp3'
|
||||
|
||||
print_info(site_info, song_title, type, size)
|
||||
print_info(site_info, song_title, ext, size)
|
||||
if not info_only:
|
||||
file_name = "%02d.%s" % (track_nr, song_title)
|
||||
download_urls([url], file_name, ext, size, output_dir, merge = merge, faker = True)
|
||||
download_urls([url], file_name, ext, size, output_dir, headers=fake_headers)
|
||||
try:
|
||||
xiami_download_lyric(lrc_url, file_name, output_dir)
|
||||
except:
|
||||
@ -143,22 +149,66 @@ def xiami_download_album(aid, output_dir = '.', merge = True, info_only = False)
|
||||
|
||||
track_nr += 1
|
||||
|
||||
def xiami_download(url, output_dir = '.', stream_type = None, merge = True, info_only = False, **kwargs):
|
||||
def xiami_download_mv(url, output_dir='.', merge=True, info_only=False):
|
||||
# FIXME: broken merge
|
||||
page = get_content(url, headers=fake_headers)
|
||||
title = re.findall('<title>([^<]+)', page)[0]
|
||||
vid, uid = re.findall(r'vid:"(\d+)",uid:"(\d+)"', page)[0]
|
||||
api_url = 'http://cloud.video.taobao.com/videoapi/info.php?vid=%s&uid=%s' % (vid, uid)
|
||||
result = get_content(api_url, headers=fake_headers)
|
||||
doc = parseString(result)
|
||||
video_url = doc.getElementsByTagName("video_url")[-1].firstChild.nodeValue
|
||||
length = int(doc.getElementsByTagName("length")[-1].firstChild.nodeValue)
|
||||
|
||||
v_urls = []
|
||||
k_start = 0
|
||||
total_size = 0
|
||||
while True:
|
||||
k_end = k_start + 20000000
|
||||
if k_end >= length: k_end = length - 1
|
||||
v_url = video_url + '/start_%s/end_%s/1.flv' % (k_start, k_end)
|
||||
try:
|
||||
_, ext, size = url_info(v_url)
|
||||
except:
|
||||
break
|
||||
v_urls.append(v_url)
|
||||
total_size += size
|
||||
k_start = k_end + 1
|
||||
|
||||
print_info(site_info, title, ext, total_size)
|
||||
if not info_only:
|
||||
download_urls(v_urls, title, ext, total_size, output_dir, merge=merge, headers=fake_headers)
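The `while True` loop in `xiami_download_mv` only stops when `url_info` raises. As a hedged alternative, the segment boundaries can be computed from `length` alone. The sketch below is not the code in this commit: `byte_ranges` and `chunk_urls` are hypothetical names, and the URL layout is simply copied from the `start_%s/end_%s/1.flv` pattern above.

```python
def byte_ranges(length, chunk=20000000):
    """Yield inclusive (start, end) offsets that cover [0, length - 1]."""
    start = 0
    while start < length:
        # Same arithmetic as the loop above: the end offset is inclusive
        # and clamped to the last valid byte.
        end = min(start + chunk, length - 1)
        yield start, end
        start = end + 1


def chunk_urls(video_url, length, chunk=20000000):
    """Build one segment URL per byte range."""
    return ['%s/start_%s/end_%s/1.flv' % (video_url, s, e)
            for s, e in byte_ranges(length, chunk)]
```

Because the generator is bounded by `length`, it cannot loop forever if the server keeps answering requests past the end of the file.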
|
||||
|
||||
def xiami_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
#albums
|
||||
if re.match(r'http://www.xiami.com/album/\d+', url):
|
||||
id = r1(r'http://www.xiami.com/album/(\d+)', url)
|
||||
xiami_download_album(id, output_dir, merge, info_only)
|
||||
xiami_download_album(id, output_dir, info_only)
|
||||
elif re.match(r'http://www.xiami.com/album/\w+', url):
|
||||
page = get_content(url, headers=fake_headers)
|
||||
album_id = re.search(r'rel="canonical"\s+href="http://www.xiami.com/album/([^"]+)"', page).group(1)
|
||||
xiami_download_album(album_id, output_dir, info_only)
|
||||
|
||||
#collections
|
||||
if re.match(r'http://www.xiami.com/collect/\d+', url):
|
||||
id = r1(r'http://www.xiami.com/collect/(\d+)', url)
|
||||
xiami_download_showcollect(id, output_dir, merge, info_only)
|
||||
xiami_download_showcollect(id, output_dir, info_only)
|
||||
|
||||
if re.match('http://www.xiami.com/song/\d+', url):
|
||||
#single track
|
||||
if re.match(r'http://www.xiami.com/song/\d+\b', url):
|
||||
id = r1(r'http://www.xiami.com/song/(\d+)', url)
|
||||
xiami_download_song(id, output_dir, merge, info_only)
|
||||
xiami_download_song(id, output_dir, info_only)
|
||||
elif re.match(r'http://www.xiami.com/song/\w+', url):
|
||||
html = get_content(url, headers=fake_headers)
|
||||
id = r1(r'rel="canonical" href="http://www.xiami.com/song/([^"]+)"', html)
|
||||
xiami_download_song(id, output_dir, info_only)
|
||||
|
||||
if re.match(r'http://www.xiami.com/song/detail/id/\d+', url):
|
||||
id = r1(r'http://www.xiami.com/song/detail/id/(\d+)', url)
|
||||
xiami_download_song(id, output_dir, merge, info_only)
|
||||
xiami_download_song(id, output_dir, info_only)
|
||||
|
||||
if re.match('http://www.xiami.com/mv', url):
|
||||
xiami_download_mv(url, output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
site_info = "Xiami.com"
|
||||
download = xiami_download
|
||||
|
98
src/you_get/extractors/ximalaya.py
Normal file
@ -0,0 +1,98 @@
#!/usr/bin/env python

__all__ = ['ximalaya_download_playlist', 'ximalaya_download', 'ximalaya_download_by_id']

from ..common import *

import json
import re

stream_types = [
    {'itag': '1', 'container': 'm4a', 'bitrate': 'default'},
    {'itag': '2', 'container': 'm4a', 'bitrate': '32'},
    {'itag': '3', 'container': 'm4a', 'bitrate': '64'}
]

def ximalaya_download_by_id(id, title = None, output_dir = '.', info_only = False, stream_id = None):
    BASE_URL = 'http://www.ximalaya.com/tracks/'
    json_url = BASE_URL + id + '.json'
    json_data = json.loads(get_content(json_url, headers=fake_headers))
    if 'res' in json_data:
        if json_data['res'] == False:
            raise ValueError('Server reported id %s is invalid' % id)
    if 'is_paid' in json_data and json_data['is_paid']:
        if 'is_free' in json_data and not json_data['is_free']:
            raise ValueError('%s is paid item' % id)
    if (not title) and 'title' in json_data:
        title = json_data['title']
    #no size data in the json. should it be calculated?
    size = 0
    url = json_data['play_path_64']
    if stream_id:
        if stream_id == '1':
            url = json_data['play_path_32']
        elif stream_id == '0':
            url = json_data['play_path']
    logging.debug('ximalaya_download_by_id: %s' % url)
    ext = 'm4a'
    urls = [url]
    print('Site: %s' % site_info)
    print('title: %s' % title)
    if info_only:
        if stream_id:
            print_stream_info(stream_id)
        else:
            for item in range(0, len(stream_types)):
                print_stream_info(item)
    if not info_only:
        print('Type: MPEG-4 audio m4a')
        print('Size: N/A')
        download_urls(urls, title, ext, size, output_dir = output_dir, merge = False)

def ximalaya_download(url, output_dir = '.', info_only = False, stream_id = None, **kwargs):
    if re.match(r'http://www\.ximalaya\.com/(\d+)/sound/(\d+)', url):
        id = match1(url, r'http://www\.ximalaya\.com/\d+/sound/(\d+)')
    else:
        raise NotImplementedError(url)
    ximalaya_download_by_id(id, output_dir = output_dir, info_only = info_only, stream_id = stream_id)

def ximalaya_download_page(playlist_url, output_dir = '.', info_only = False, stream_id = None, **kwargs):
    if re.match(r'http://www\.ximalaya\.com/(\d+)/album/(\d+)', playlist_url):
        page_content = get_content(playlist_url)
        pattern = re.compile(r'<li sound_id="(\d+)"')
        ids = pattern.findall(page_content)
        for id in ids:
            try:
                ximalaya_download_by_id(id, output_dir=output_dir, info_only=info_only, stream_id=stream_id)
            except(ValueError):
                print("something wrong with %s, perhaps paid item?" % id)
    else:
        raise NotImplementedError(playlist_url)

def ximalaya_download_playlist(url, output_dir='.', info_only=False, stream_id=None, **kwargs):
    match_result = re.match(r'http://www\.ximalaya\.com/(\d+)/album/(\d+)', url)
    if not match_result:
        raise NotImplementedError(url)
    pages = []
    page_content = get_content(url)
    if page_content.find('<div class="pagingBar_wrapper"') == -1:
        pages.append(url)
    else:
        base_url = 'http://www.ximalaya.com/' + match_result.group(1) + '/album/' + match_result.group(2)
        html_str = '<a href=(\'|")\/' + match_result.group(1) + '\/album\/' + match_result.group(2) + '\?page='
        count = len(re.findall(html_str, page_content))
        for page_num in range(count):
            pages.append(base_url + '?page=' +str(page_num+1))
            print(pages[-1])
    for page in pages:
        ximalaya_download_page(page, output_dir=output_dir, info_only=info_only, stream_id=stream_id)
def print_stream_info(stream_id):
    print(' - itag: %s' % stream_id)
    print(' container: %s' % 'm4a')
    print(' bitrate: %s' % stream_types[int(stream_id)]['bitrate'])
    print(' size: %s' % 'N/A')
    print(' # download-with: you-get --itag=%s [URL]' % stream_id)

site_info = 'ximalaya.com'
download = ximalaya_download
download_playlist = ximalaya_download_playlist
|
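The page enumeration in `ximalaya_download_playlist` counts pagination links and builds one URL per page. A minimal sketch of that step as a pure function follows. It is not part of the commit: `album_page_urls` is a hypothetical name, and it assumes the same markup (`pagingBar_wrapper`, `?page=` links) that the extractor looks for.

```python
import re

ALBUM_RE = re.compile(r'http://www\.ximalaya\.com/(\d+)/album/(\d+)')


def album_page_urls(url, page_content):
    """Return the list of album page URLs implied by the pagination links."""
    m = ALBUM_RE.match(url)
    if m is None:
        raise NotImplementedError(url)
    uid, album = m.group(1), m.group(2)
    # No paging bar means the album fits on a single page.
    if '<div class="pagingBar_wrapper"' not in page_content:
        return [url]
    # Count links of the form <a href="/<uid>/album/<album>?page=...">.
    link = re.compile(r'<a href=[\'"]/%s/album/%s\?page=' % (uid, album))
    count = len(link.findall(page_content))
    base = 'http://www.ximalaya.com/%s/album/%s' % (uid, album)
    return ['%s?page=%d' % (base, n) for n in range(1, count + 1)]
```

Like the original, this treats the number of matching links as the number of pages, which holds only if the paging bar lists every page exactly once.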
44
src/you_get/extractors/xinpianchang.py
Normal file
@ -0,0 +1,44 @@
#!/usr/bin/env python

import re
import json
from ..extractor import VideoExtractor
from ..common import get_content, playlist_not_supported


class Xinpianchang(VideoExtractor):
    name = 'xinpianchang'
    stream_types = [
        {'id': '4K', 'quality': '超清 4K', 'video_profile': 'mp4-4K'},
        {'id': '2K', 'quality': '超清 2K', 'video_profile': 'mp4-2K'},
        {'id': '1080', 'quality': '高清 1080P', 'video_profile': 'mp4-FHD'},
        {'id': '720', 'quality': '高清 720P', 'video_profile': 'mp4-HD'},
        {'id': '540', 'quality': '清晰 540P', 'video_profile': 'mp4-SD'},
        {'id': '360', 'quality': '流畅 360P', 'video_profile': 'mp4-LD'}
    ]

    def prepare(self, **kwargs):
        # find key
        page_content = get_content(self.url)
        match_rule = r"vid = \"(.+?)\";"
        key = re.findall(match_rule, page_content)[0]

        # get videos info
        video_url = 'https://openapi-vtom.vmovier.com/v3/video/' + key + '?expand=resource'
        data = json.loads(get_content(video_url))
        self.title = data["data"]["video"]["title"]
        video_info = data["data"]["resource"]["progressive"]

        # set streams dict
        for video in video_info:
            url = video["https_url"]
            size = video["filesize"]
            profile = video["profile_code"]
            stype = [st for st in self.__class__.stream_types if st['video_profile'] == profile][0]

            stream_data = dict(src=[url], size=size, container='mp4', quality=stype['quality'])
            self.streams[stype['id']] = stream_data


download = Xinpianchang().download_by_url
download_playlist = playlist_not_supported('xinpianchang')
|
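In `prepare`, the list comprehension followed by `[0]` raises `IndexError` as soon as the API returns a `profile_code` that is missing from `stream_types`. One possible variation, shown only as a sketch, indexes the table by profile once and skips unknown profiles. `build_streams` and `PROFILE_TO_TYPE` are hypothetical names, and the table is shortened to two of the entries above.

```python
STREAM_TYPES = [
    {'id': '1080', 'quality': '高清 1080P', 'video_profile': 'mp4-FHD'},
    {'id': '720', 'quality': '高清 720P', 'video_profile': 'mp4-HD'},
]

# Index the table by profile code so each lookup is a single dict access.
PROFILE_TO_TYPE = {st['video_profile']: st for st in STREAM_TYPES}


def build_streams(video_info):
    """Map the API's progressive entries to a streams dict keyed by stream id."""
    streams = {}
    for video in video_info:
        stype = PROFILE_TO_TYPE.get(video['profile_code'])
        if stype is None:
            # Unknown profile: skip it rather than abort the whole extraction.
            continue
        streams[stype['id']] = dict(src=[video['https_url']],
                                    size=video['filesize'],
                                    container='mp4',
                                    quality=stype['quality'])
    return streams
```

Whether to skip or to fail loudly on an unknown profile is a design choice; skipping keeps the known qualities downloadable when the site adds a new one.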
@ -7,6 +7,24 @@ from urllib.parse import urlparse
|
||||
from json import loads
|
||||
import re
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def miaopai_download_by_smid(smid, output_dir = '.', merge = True, info_only = False):
|
||||
""""""
|
||||
api_endpoint = 'https://n.miaopai.com/api/aj_media/info.json?smid={smid}'.format(smid = smid)
|
||||
|
||||
html = get_content(api_endpoint)
|
||||
|
||||
api_content = loads(html)
|
||||
|
||||
video_url = api_content['data']['meta_data'][0]['play_urls']['l']
|
||||
title = api_content['data']['description']
|
||||
|
||||
type, ext, size = url_info(video_url)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([video_url], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def yixia_miaopai_download_by_scid(scid, output_dir = '.', merge = True, info_only = False):
|
||||
""""""
|
||||
@ -47,16 +65,18 @@ def yixia_xiaokaxiu_download_by_scid(scid, output_dir = '.', merge = True, info_
|
||||
def yixia_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
"""wrapper"""
|
||||
hostname = urlparse(url).hostname
|
||||
if 'miaopai.com' in hostname: #Miaopai
|
||||
if 'n.miaopai.com' == hostname:
|
||||
smid = match1(url, r'n\.miaopai\.com/media/([^.]+)')
|
||||
miaopai_download_by_smid(smid, output_dir, merge, info_only)
|
||||
return
|
||||
elif 'miaopai.com' in hostname: #Miaopai
|
||||
yixia_download_by_scid = yixia_miaopai_download_by_scid
|
||||
site_info = "Yixia Miaopai"
|
||||
|
||||
if re.match(r'http://www.miaopai.com/show/channel/\w+', url): #PC
|
||||
scid = match1(url, r'http://www.miaopai.com/show/channel/(.+)\.htm')
|
||||
elif re.match(r'http://www.miaopai.com/show/\w+', url): #PC
|
||||
scid = match1(url, r'http://www.miaopai.com/show/(.+)\.htm')
|
||||
elif re.match(r'http://m.miaopai.com/show/channel/\w+', url): #Mobile
|
||||
scid = match1(url, r'http://m.miaopai.com/show/channel/(.+)\.htm')
|
||||
scid = match1(url, r'miaopai\.com/show/channel/([^.]+)\.htm') or \
|
||||
match1(url, r'miaopai\.com/show/([^.]+)\.htm') or \
|
||||
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)\.htm') or \
|
||||
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)')
|
||||
|
||||
elif 'xiaokaxiu.com' in hostname: #Xiaokaxiu
|
||||
yixia_download_by_scid = yixia_xiaokaxiu_download_by_scid
|
||||
|
37
src/you_get/extractors/yizhibo.py
Normal file
@ -0,0 +1,37 @@
#!/usr/bin/env python

__all__ = ['yizhibo_download']

from ..common import *
import json
import time

def yizhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
    video_id = url[url.rfind('/')+1:].split(".")[0]
    json_request_url = 'http://www.yizhibo.com/live/h5api/get_basic_live_info?scid={}'.format(video_id)
    content = get_content(json_request_url)
    error = json.loads(content)['result']
    if (error != 1):
        raise ValueError("Error : {}".format(error))

    data = json.loads(content)
    title = data.get('data')['live_title']
    if (title == ''):
        title = data.get('data')['nickname']
    m3u8_url = data.get('data')['play_url']
    m3u8 = get_content(m3u8_url)
    base_url = "/".join(data.get('data')['play_url'].split("/")[:7])+"/"
    part_url = re.findall(r'([0-9]+\.ts)', m3u8)
    real_url = []
    for i in part_url:
        url = base_url + i
        real_url.append(url)
    print_info(site_info, title, 'ts', float('inf'))
    if not info_only:
        if player:
            launch_player(player, [m3u8_url])
        download_urls(real_url, title, 'ts', float('inf'), output_dir, merge = merge)

site_info = "yizhibo.com"
download = yizhibo_download
download_playlist = playlist_not_supported('yizhibo')
|
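`yizhibo_download` derives the segment base URL by keeping the first seven `/`-separated pieces of `play_url`, which ties it to one specific path depth. The sketch below swaps in `urllib.parse.urljoin`, which resolves each segment name against the playlist URL regardless of depth. This is an assumption-labeled alternative rather than the commit's method, and `segment_urls` is a hypothetical name.

```python
import re
from urllib.parse import urljoin

SEGMENT_RE = re.compile(r'([0-9]+\.ts)')


def segment_urls(play_url, m3u8_text):
    """Resolve every numbered .ts segment in the playlist against play_url."""
    # urljoin replaces the last path component of play_url with the segment name.
    return [urljoin(play_url, name) for name in SEGMENT_RE.findall(m3u8_text)]
```

For a playlist URL with exactly four directory levels, this gives the same result as the fixed `[:7]` slice.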
@ -4,217 +4,197 @@
|
||||
from ..common import *
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
import base64
|
||||
import ssl
|
||||
import time
|
||||
import traceback
|
||||
import json
|
||||
import urllib.request
|
||||
import urllib.parse
|
||||
|
||||
|
||||
def fetch_cna():
|
||||
|
||||
def quote_cna(val):
|
||||
if '%' in val:
|
||||
return val
|
||||
return urllib.parse.quote(val)
|
||||
|
||||
if cookies:
|
||||
for cookie in cookies:
|
||||
if cookie.name == 'cna' and cookie.domain == '.youku.com':
|
||||
log.i('Found cna in imported cookies. Use it')
|
||||
return quote_cna(cookie.value)
|
||||
url = 'http://log.mmstat.com/eg.js'
|
||||
req = urllib.request.urlopen(url)
|
||||
headers = req.getheaders()
|
||||
for header in headers:
|
||||
if header[0].lower() == 'set-cookie':
|
||||
n_v = header[1].split(';')[0]
|
||||
name, value = n_v.split('=')
|
||||
if name == 'cna':
|
||||
return quote_cna(value)
|
||||
log.w('It seems that the client failed to fetch a cna cookie. Please load your own cookie if possible')
|
||||
return quote_cna('DOG4EdW4qzsCAbZyXbU+t7Jt')
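The header scan in `fetch_cna` splits each cookie pair with `split('=')`, which raises `ValueError` when the cookie value itself contains `=` (base64 padding, for example). The sketch below uses `str.partition` instead, so only the first `=` separates name from value. It works on a plain list of `(name, value)` header tuples, and `cna_from_headers` is a hypothetical name, not a function in you-get.

```python
from urllib.parse import quote


def cna_from_headers(headers):
    """Return the percent-encoded cna cookie from (name, value) header pairs, or None."""
    for key, val in headers:
        if key.lower() != 'set-cookie':
            continue
        # Only the first attribute of a Set-Cookie header is the name=value pair.
        name, _, value = val.split(';')[0].partition('=')
        if name.strip() == 'cna':
            # Mirror quote_cna: leave already-encoded values untouched.
            return value if '%' in value else quote(value)
    return None
```

The return value of `http.client.HTTPResponse.getheaders()` has the shape this function expects.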
|
||||
|
||||
|
||||
class Youku(VideoExtractor):
|
||||
name = "优酷 (Youku)"
|
||||
mobile_ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36'
|
||||
dispatcher_url = 'vali.cp31.ott.cibntv.net'
|
||||
|
||||
# Last updated: 2015-11-24
|
||||
stream_types = [
|
||||
{'id': 'mp4hd3', 'alias-of' : 'hd3'},
|
||||
{'id': 'hd3', 'container': 'flv', 'video_profile': '1080P'},
|
||||
{'id': 'mp4hd2', 'alias-of' : 'hd2'},
|
||||
{'id': 'hd3v2', 'container': 'flv', 'video_profile': '1080P'},
|
||||
{'id': 'mp4hd3', 'container': 'mp4', 'video_profile': '1080P'},
|
||||
{'id': 'mp4hd3v2', 'container': 'mp4', 'video_profile': '1080P'},
|
||||
|
||||
{'id': 'hd2', 'container': 'flv', 'video_profile': '超清'},
|
||||
{'id': 'mp4hd', 'alias-of' : 'mp4'},
|
||||
{'id': 'mp4', 'container': 'mp4', 'video_profile': '高清'},
|
||||
{'id': 'flvhd', 'container': 'flv', 'video_profile': '标清'},
|
||||
{'id': 'hd2v2', 'container': 'flv', 'video_profile': '超清'},
|
||||
{'id': 'mp4hd2', 'container': 'mp4', 'video_profile': '超清'},
|
||||
{'id': 'mp4hd2v2', 'container': 'mp4', 'video_profile': '超清'},
|
||||
|
||||
{'id': 'mp4hd', 'container': 'mp4', 'video_profile': '高清'},
|
||||
# not really equivalent to mp4hd
|
||||
{'id': 'flvhd', 'container': 'flv', 'video_profile': '渣清'},
|
||||
{'id': '3gphd', 'container': 'mp4', 'video_profile': '渣清'},
|
||||
|
||||
{'id': 'mp4sd', 'container': 'mp4', 'video_profile': '标清'},
|
||||
# obsolete?
|
||||
{'id': 'flv', 'container': 'flv', 'video_profile': '标清'},
|
||||
{'id': '3gphd', 'container': '3gp', 'video_profile': '标清(3GP)'},
|
||||
{'id': 'mp4', 'container': 'mp4', 'video_profile': '标清'},
|
||||
]
|
||||
|
||||
f_code_1 = 'becaf9be'
|
||||
f_code_2 = 'bf7e5f01'
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
ctype = 12 #differ from 86
|
||||
self.ua = self.__class__.mobile_ua
|
||||
self.referer = 'http://v.youku.com'
|
||||
|
||||
def trans_e(a, c):
|
||||
"""str, str->str
|
||||
This is an RC4 encryption."""
|
||||
f = h = 0
|
||||
b = list(range(256))
|
||||
result = ''
|
||||
while h < 256:
|
||||
f = (f + b[h] + ord(a[h % len(a)])) % 256
|
||||
b[h], b[f] = b[f], b[h]
|
||||
h += 1
|
||||
q = f = h = 0
|
||||
while q < len(c):
|
||||
h = (h + 1) % 256
|
||||
f = (f + b[h]) % 256
|
||||
b[h], b[f] = b[f], b[h]
|
||||
if isinstance(c[q], int):
|
||||
result += chr(c[q] ^ b[(b[h] + b[f]) % 256])
|
||||
self.page = None
|
||||
self.video_list = None
|
||||
self.video_next = None
|
||||
self.password = None
|
||||
self.api_data = None
|
||||
self.api_error_code = None
|
||||
self.api_error_msg = None
|
||||
|
||||
self.ccode = '0590'
|
||||
# Found in http://g.alicdn.com/player/ykplayer/0.5.64/youku-player.min.js
|
||||
# grep -oE '"[0-9a-zA-Z+/=]{256}"' youku-player.min.js
|
||||
self.ckey = 'DIl58SLFxFNndSV1GFNnMQVYkx1PP5tKe1siZu/86PR1u/Wh1Ptd+WOZsHHWxysSfAOhNJpdVWsdVJNsfJ8Sxd8WKVvNfAS8aS8fAOzYARzPyPc3JvtnPHjTdKfESTdnuTW6ZPvk2pNDh4uFzotgdMEFkzQ5wZVXl2Pf1/Y6hLK0OnCNxBj3+nb0v72gZ6b0td+WOZsHHWxysSo/0y9D2K42SaB8Y/+aD2K42SaB8Y/+ahU+WOZsHcrxysooUeND'
|
||||
self.utid = None
|
||||
|
||||
def youku_ups(self):
|
||||
url = 'https://ups.youku.com/ups/get.json?vid={}&ccode={}'.format(self.vid, self.ccode)
|
||||
url += '&client_ip=192.168.1.1'
|
||||
url += '&utid=' + self.utid
|
||||
url += '&client_ts=' + str(int(time.time()))
|
||||
url += '&ckey=' + urllib.parse.quote(self.ckey)
|
||||
if self.password_protected:
|
||||
url += '&password=' + self.password
|
||||
headers = dict(Referer=self.referer)
|
||||
headers['User-Agent'] = self.ua
|
||||
api_meta = json.loads(get_content(url, headers=headers))
|
||||
|
||||
self.api_data = api_meta['data']
|
||||
data_error = self.api_data.get('error')
|
||||
if data_error:
|
||||
self.api_error_code = data_error.get('code')
|
||||
self.api_error_msg = data_error.get('note')
|
||||
if 'videos' in self.api_data:
|
||||
if 'list' in self.api_data['videos']:
|
||||
self.video_list = self.api_data['videos']['list']
|
||||
if 'next' in self.api_data['videos']:
|
||||
self.video_next = self.api_data['videos']['next']
|
||||
|
||||
@classmethod
|
||||
def change_cdn(cls, url):
|
||||
# if the cdn_url starts with an IP address, it is probably Youku's old CDN,
# which randomly rejects HTTP requests with status codes above 400;
# switching to the aliCDN dispatcher works better,
# or is at least a little more recoverable from HTTP 403
|
||||
if cls.dispatcher_url in url:
|
||||
return url
|
||||
elif 'k.youku.com' in url:
|
||||
return url
|
||||
else:
|
||||
result += chr(ord(c[q]) ^ b[(b[h] + b[f]) % 256])
|
||||
q += 1
|
||||
url_seg_list = list(urllib.parse.urlsplit(url))
|
||||
url_seg_list[1] = cls.dispatcher_url
|
||||
return urllib.parse.urlunsplit(url_seg_list)
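The host rewrite in `change_cdn` can be tried on its own with nothing but `urllib.parse`. The following is a minimal standalone sketch of the same idea. The dispatcher host is copied from `dispatcher_url` above, and the module-level function form (rather than a classmethod) is a simplification for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

DISPATCHER = 'vali.cp31.ott.cibntv.net'


def change_cdn(url, dispatcher=DISPATCHER):
    """Point a segment URL at the dispatcher host unless it already uses a known-good host."""
    if dispatcher in url or 'k.youku.com' in url:
        return url
    parts = list(urlsplit(url))
    # Index 1 of the split result is the network location (host[:port]).
    parts[1] = dispatcher
    return urlunsplit(parts)
```

Only the network location changes; scheme, path, query, and fragment pass through unmodified.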
|
||||
|
||||
return result
|
||||
def get_vid_from_url(self):
|
||||
# It's unreliable. check #1633
|
||||
b64p = r'([a-zA-Z0-9=]+)'
|
||||
p_list = [r'youku\.com/v_show/id_'+b64p,
|
||||
r'player\.youku\.com/player\.php/sid/'+b64p+r'/v\.swf',
|
||||
r'loader\.swf\?VideoIDS='+b64p,
|
||||
r'player\.youku\.com/embed/'+b64p]
|
||||
if not self.url:
|
||||
raise Exception('No url')
|
||||
for p in p_list:
|
||||
hit = re.search(p, self.url)
|
||||
if hit is not None:
|
||||
self.vid = hit.group(1)
|
||||
return
|
||||
|
||||
def generate_ep(self, no, streamfileids, sid, token):
|
||||
number = hex(int(str(no), 10))[2:].upper()
|
||||
if len(number) == 1:
|
||||
number = '0' + number
|
||||
fileid = streamfileids[0:8] + number + streamfileids[10:]
|
||||
ep = parse.quote(base64.b64encode(
|
||||
''.join(self.__class__.trans_e(
|
||||
self.f_code_2, #use the 86 fcode if using 86
|
||||
sid + '_' + fileid + '_' + token)).encode('latin1')),
|
||||
safe='~()*!.\''
|
||||
)
|
||||
return fileid, ep
|
||||
|
||||
# Obsolete -- used to parse m3u8 on pl.youku.com
|
||||
def parse_m3u8(m3u8):
|
||||
return re.findall(r'(http://[^?]+)\?ts_start=0', m3u8)
|
||||
|
||||
def oset(xs):
|
||||
"""Turns a list into an ordered set. (removes duplicates)"""
|
||||
mem = set()
|
||||
for x in xs:
|
||||
if x not in mem:
|
||||
mem.add(x)
|
||||
return mem
|
||||
|
||||
def get_vid_from_url(url):
|
||||
"""Extracts video ID from URL.
|
||||
"""
|
||||
return match1(url, r'youku\.com/v_show/id_([a-zA-Z0-9=]+)') or \
|
||||
match1(url, r'player\.youku\.com/player\.php/sid/([a-zA-Z0-9=]+)/v\.swf') or \
|
||||
match1(url, r'loader\.swf\?VideoIDS=([a-zA-Z0-9=]+)') or \
|
||||
match1(url, r'player\.youku\.com/embed/([a-zA-Z0-9=]+)')
|
||||
|
||||
def get_playlist_id_from_url(url):
|
||||
"""Extracts playlist ID from URL.
|
||||
"""
|
||||
return match1(url, r'youku\.com/albumlist/show\?id=([a-zA-Z0-9=]+)')
|
||||
|
||||
def download_playlist_by_url(self, url, **kwargs):
|
||||
self.url = url
|
||||
|
||||
try:
|
||||
playlist_id = self.__class__.get_playlist_id_from_url(self.url)
|
||||
assert playlist_id
|
||||
video_page = get_content('http://list.youku.com/albumlist/show?id=%s' % playlist_id)
|
||||
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
|
||||
# Parse multi-page playlists
|
||||
last_page_url = re.findall(r'href="(/albumlist/show\?id=%s[^"]+)" title="末页"' % playlist_id, video_page)[0]
|
||||
num_pages = int(re.findall(r'page=([0-9]+)\.htm', last_page_url)[0])
|
||||
if (num_pages > 0):
|
||||
# download one by one
|
||||
for pn in range(2, num_pages + 1):
|
||||
extra_page_url = re.sub(r'page=([0-9]+)\.htm', r'page=%s.htm' % pn, last_page_url)
|
||||
extra_page = get_content('http://list.youku.com' + extra_page_url)
|
||||
videos |= Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', extra_page))
|
||||
except:
|
||||
# Show full list of episodes
|
||||
if match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)'):
|
||||
ep_id = match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)')
|
||||
url = 'http://www.youku.com/show_episode/id_%s' % ep_id
|
||||
|
||||
video_page = get_content(url)
|
||||
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
|
||||
|
||||
self.title = r1(r'<meta name="title" content="([^"]+)"', video_page) or \
|
||||
r1(r'<title>([^<]+)', video_page)
|
||||
self.p_playlist()
|
||||
for video in videos:
|
||||
index = parse_query_param(video, 'f')
|
||||
try:
|
||||
self.__class__().download_by_url(video, index=index, **kwargs)
|
||||
except KeyboardInterrupt:
|
||||
raise
|
||||
except:
|
||||
exc_type, exc_value, exc_traceback = sys.exc_info()
|
||||
traceback.print_exception(exc_type, exc_value, exc_traceback)
|
||||
def get_vid_from_page(self):
|
||||
if not self.url:
|
||||
raise Exception('No url')
|
||||
self.page = get_content(self.url)
|
||||
hit = re.search(r'videoId2:"([A-Za-z0-9=]+)"', self.page)
|
||||
if hit is not None:
|
||||
self.vid = hit.group(1)
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
# Hot-plug cookie handler
|
||||
ssl_context = request.HTTPSHandler(
|
||||
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
|
||||
cookie_handler = request.HTTPCookieProcessor()
|
||||
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
|
||||
proxy = parse_host(kwargs['extractor_proxy'])
|
||||
proxy_handler = request.ProxyHandler({
|
||||
'http': '%s:%s' % proxy,
|
||||
'https': '%s:%s' % proxy,
|
||||
})
|
||||
else:
|
||||
proxy_handler = request.ProxyHandler({})
|
||||
opener = request.build_opener(ssl_context, cookie_handler, proxy_handler)
|
||||
opener.addheaders = [('Cookie','__ysuid={}'.format(time.time()))]
|
||||
request.install_opener(opener)
|
||||
|
||||
assert self.url or self.vid
|
||||
|
||||
if self.url and not self.vid:
|
||||
self.vid = self.__class__.get_vid_from_url(self.url)
|
||||
self.get_vid_from_url()
|
||||
|
||||
if self.vid is None:
|
||||
self.download_playlist_by_url(self.url, **kwargs)
|
||||
exit(0)
|
||||
self.get_vid_from_page()
|
||||
|
||||
#HACK!
|
||||
if 'api_url' in kwargs:
|
||||
api_url = kwargs['api_url'] #85
|
||||
api12_url = kwargs['api12_url'] #86
|
||||
self.ctype = kwargs['ctype']
|
||||
self.title = kwargs['title']
|
||||
if self.vid is None:
|
||||
log.wtf('Cannot fetch vid')
|
||||
|
||||
else:
|
||||
api_url = 'http://play.youku.com/play/get.json?vid=%s&ct=10' % self.vid
|
||||
api12_url = 'http://play.youku.com/play/get.json?vid=%s&ct=12' % self.vid
|
||||
if kwargs.get('src') and kwargs['src'] == 'tudou':
|
||||
self.ccode = '0512'
|
||||
|
||||
try:
|
||||
meta = json.loads(get_content(
|
||||
api_url,
|
||||
headers={'Referer': 'http://static.youku.com/'}
|
||||
))
|
||||
meta12 = json.loads(get_content(
|
||||
api12_url,
|
||||
headers={'Referer': 'http://static.youku.com/'}
|
||||
))
|
||||
data = meta['data']
|
||||
data12 = meta12['data']
|
||||
assert 'stream' in data
|
||||
except AssertionError:
|
||||
if 'error' in data:
|
||||
if data['error']['code'] == -202:
|
||||
# Password protected
|
||||
if kwargs.get('password') and kwargs['password']:
|
||||
self.password_protected = True
|
||||
self.password = kwargs['password']
|
||||
|
||||
self.utid = fetch_cna()
|
||||
time.sleep(3)
|
||||
self.youku_ups()
|
||||
|
||||
if self.api_data.get('stream') is None:
|
||||
if self.api_error_code == -6001: # wrong vid parsed from the page
|
||||
vid_from_url = self.vid
|
||||
self.get_vid_from_page()
|
||||
if vid_from_url == self.vid:
|
||||
log.wtf(self.api_error_msg)
|
||||
self.youku_ups()
|
||||
|
||||
if self.api_data.get('stream') is None:
|
||||
if self.api_error_code == -2002: # wrong password
|
||||
self.password_protected = True
|
||||
# it can be True already(from cli). offer another chance to retry
|
||||
self.password = input(log.sprint('Password: ', log.YELLOW))
|
||||
api_url += '&pwd={}'.format(self.password)
|
||||
api12_url += '&pwd={}'.format(self.password)
|
||||
meta = json.loads(get_content(
|
||||
api_url,
|
||||
headers={'Referer': 'http://static.youku.com/'}
|
||||
))
|
||||
meta12 = json.loads(get_content(
|
||||
api12_url,
|
||||
headers={'Referer': 'http://static.youku.com/'}
|
||||
))
|
||||
data = meta['data']
|
||||
data12 = meta12['data']
|
||||
self.youku_ups()
|
||||
|
||||
if self.api_data.get('stream') is None:
|
||||
if self.api_error_msg:
|
||||
log.wtf(self.api_error_msg)
|
||||
else:
|
||||
log.wtf('[Failed] ' + data['error']['note'])
|
||||
else:
|
||||
log.wtf('[Failed] Video not found.')
|
||||
|
||||
if not self.title: #86
|
||||
self.title = data['video']['title']
|
||||
self.ep = data12['security']['encrypt_string']
|
||||
self.ip = data12['security']['ip']
|
||||
|
||||
if 'stream' not in data and self.password_protected:
|
||||
log.wtf('[Failed] Wrong password.')
|
||||
log.wtf('Unknown error')
|
||||
|
||||
self.title = self.api_data['video']['title']
|
||||
stream_types = dict([(i['id'], i) for i in self.stream_types])
|
||||
audio_lang = data['stream'][0]['audio_lang']
|
||||
audio_lang = self.api_data['stream'][0]['audio_lang']
|
||||
|
||||
for stream in data['stream']:
|
||||
for stream in self.api_data['stream']:
|
||||
stream_id = stream['stream_type']
|
||||
is_preview = False
|
||||
if stream_id in stream_types and stream['audio_lang'] == audio_lang:
|
||||
if 'alias-of' in stream_types[stream_id]:
|
||||
stream_id = stream_types[stream_id]['alias-of']
|
||||
@ -225,175 +205,111 @@ class Youku(VideoExtractor):
|
||||
'video_profile': stream_types[stream_id]['video_profile'],
|
||||
'size': stream['size'],
|
||||
'pieces': [{
|
||||
'fileid': stream['stream_fileid'],
|
||||
'segs': stream['segs']
|
||||
}]
|
||||
}],
|
||||
'm3u8_url': stream['m3u8_url']
|
||||
}
|
||||
src = []
|
||||
for seg in stream['segs']:
|
||||
if seg.get('cdn_url'):
|
||||
src.append(self.__class__.change_cdn(seg['cdn_url']))
|
||||
else:
|
||||
is_preview = True
|
||||
self.streams[stream_id]['src'] = src
|
||||
else:
|
||||
self.streams[stream_id]['size'] += stream['size']
|
||||
self.streams[stream_id]['pieces'].append({
|
||||
'fileid': stream['stream_fileid'],
|
||||
'segs': stream['segs']
|
||||
})
|
||||
|
||||
self.streams_fallback = {}
|
||||
for stream in data12['stream']:
|
||||
stream_id = stream['stream_type']
|
||||
if stream_id in stream_types and stream['audio_lang'] == audio_lang:
|
||||
if 'alias-of' in stream_types[stream_id]:
|
||||
stream_id = stream_types[stream_id]['alias-of']
|
||||
|
||||
if stream_id not in self.streams_fallback:
|
||||
self.streams_fallback[stream_id] = {
|
||||
'container': stream_types[stream_id]['container'],
|
||||
'video_profile': stream_types[stream_id]['video_profile'],
|
||||
'size': stream['size'],
|
||||
'pieces': [{
|
||||
'fileid': stream['stream_fileid'],
|
||||
'segs': stream['segs']
|
||||
}]
|
||||
}
|
||||
src = []
|
||||
for seg in stream['segs']:
|
||||
if seg.get('cdn_url'):
|
||||
src.append(self.__class__.change_cdn(seg['cdn_url']))
|
||||
else:
|
||||
self.streams_fallback[stream_id]['size'] += stream['size']
|
||||
self.streams_fallback[stream_id]['pieces'].append({
|
||||
'fileid': stream['stream_fileid'],
|
||||
'segs': stream['segs']
|
||||
})
|
||||
is_preview = True
|
||||
self.streams[stream_id]['src'].extend(src)
|
||||
if is_preview:
|
||||
log.w('{} is a preview'.format(stream_id))
|
||||
|
||||
# Audio languages
|
||||
if 'dvd' in data and 'audiolang' in data['dvd']:
|
||||
self.audiolang = data['dvd']['audiolang']
|
||||
if 'dvd' in self.api_data:
|
||||
al = self.api_data['dvd'].get('audiolang')
|
||||
if al:
|
||||
self.audiolang = al
|
||||
for i in self.audiolang:
|
||||
i['url'] = 'http://v.youku.com/v_show/id_{}'.format(i['vid'])
|
||||
|
||||
def extract(self, **kwargs):
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
# Extract the stream
|
||||
stream_id = kwargs['stream_id']
|
||||
|
||||
if stream_id not in self.streams:
|
||||
log.e('[Error] Invalid video format.')
|
||||
log.e('Run \'-i\' command with no specific video format to view all available formats.')
|
||||
exit(2)
|
||||
else:
|
||||
# Extract stream with the best quality
|
||||
stream_id = self.streams_sorted[0]['id']
|
||||
|
||||
e_code = self.__class__.trans_e(
|
||||
self.f_code_1,
|
||||
base64.b64decode(bytes(self.ep, 'ascii'))
|
||||
)
|
||||
sid, token = e_code.split('_')
|
||||
|
||||
while True:
|
||||
def youku_download_playlist_by_url(url, **kwargs):
|
||||
video_page_pt = 'https?://v.youku.com/v_show/id_([A-Za-z0-9=]+)'
|
||||
js_cb_pt = r'\(({.+})\)'
|
||||
if re.match(video_page_pt, url):
|
||||
youku_obj = Youku()
|
||||
youku_obj.url = url
|
||||
youku_obj.prepare(**kwargs)
|
||||
total_episode = None
|
||||
try:
|
||||
ksegs = []
|
||||
pieces = self.streams[stream_id]['pieces']
|
||||
for piece in pieces:
|
||||
segs = piece['segs']
|
||||
streamfileid = piece['fileid']
|
||||
for no in range(0, len(segs)):
|
||||
k = segs[no]['key']
|
||||
if k == -1: break # we hit the paywall; stop here
|
||||
fileid, ep = self.__class__.generate_ep(self, no, streamfileid,
|
||||
sid, token)
|
||||
q = parse.urlencode(dict(
|
||||
ctype = self.ctype,
|
||||
ev = 1,
|
||||
K = k,
|
||||
ep = parse.unquote(ep),
|
||||
oip = str(self.ip),
|
||||
token = token,
|
||||
yxon = 1
|
||||
))
|
||||
u = 'http://k.youku.com/player/getFlvPath/sid/{sid}_00' \
|
||||
'/st/{container}/fileid/{fileid}?{q}'.format(
|
||||
sid = sid,
|
||||
container = self.streams[stream_id]['container'],
|
||||
fileid = fileid,
|
||||
q = q
|
||||
)
|
||||
ksegs += [i['server'] for i in json.loads(get_content(u))]
|
||||
|
||||
if (parse_host(ksegs[len(ksegs)-1])[0] == "vali.cp31.ott.cibntv.net"):
|
||||
ksegs.pop(len(ksegs)-1)
|
||||
except error.HTTPError as e:
|
||||
# Use fallback stream data in case of HTTP 404
|
||||
log.e('[Error] ' + str(e))
|
||||
self.streams = {}
|
||||
self.streams = self.streams_fallback
|
||||
total_episode = youku_obj.api_data['show']['episode_total']
|
||||
except KeyError:
|
||||
# Move on to next stream if best quality not available
|
||||
del self.streams_sorted[0]
|
||||
stream_id = self.streams_sorted[0]['id']
|
||||
else: break
|
||||
|
||||
if not kwargs['info_only']:
|
||||
self.streams[stream_id]['src'] = ksegs
|
||||
|
||||
def open_download_by_vid(self, client_id, vid, **kwargs):
|
||||
"""self, str, str, **kwargs->None
|
||||
|
||||
Arguments:
|
||||
client_id: An ID per client. For now we only know Acfun's
|
||||
such ID.
|
||||
|
||||
vid: A video ID for each video; starts with "C".
|
||||
|
||||
kwargs['embsig']: Youku COOP's anti hotlinking.
|
||||
For Acfun, an API call must be made to Acfun's
|
||||
server, or the "playsign" of the content of sign_url
|
||||
shall be empty.
|
||||
|
||||
Misc:
|
||||
Override the original one with VideoExtractor.
|
||||
|
||||
Author:
|
||||
Most of the credit goes to @ERioK, who provided the proof of concept.
|
||||
|
||||
History:
|
||||
Jul.28.2016 Youku COOP now has anti-hotlinking via embsig. """
|
||||
self.f_code_1 = '10ehfkbv' # can be retrieved by running r.translate with the keys and the list e
|
||||
self.f_code_2 = 'msjv7h2b'
|
||||
|
||||
# as in VideoExtractor
|
||||
self.url = None
|
||||
self.vid = vid
|
||||
self.name = "优酷开放平台 (Youku COOP)"
|
||||
|
||||
#A little bit of work before self.prepare
|
||||
|
||||
# Changed on Jul.28.2016: Youku COOP updated its platform to add anti-hotlinking
|
||||
if kwargs['embsig']:
|
||||
sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}&embsig={embsig}".format(client_id = client_id, video_id = vid, embsig = kwargs['embsig'])
|
||||
log.wtf('Cannot get total_episode for {}'.format(url))
|
||||
next_vid = youku_obj.vid
|
||||
for _ in range(total_episode):
|
||||
this_extractor = Youku()
|
||||
this_extractor.download_by_vid(next_vid, keep_obj=True, **kwargs)
|
||||
next_vid = this_extractor.video_next['encodevid']
|
||||
'''
|
||||
if youku_obj.video_list is None:
|
||||
log.wtf('Cannot find video list for {}'.format(url))
|
||||
else:
|
||||
sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}".format(client_id = client_id, video_id = vid)
|
||||
vid_list = [v['encodevid'] for v in youku_obj.video_list]
|
||||
for v in vid_list:
|
||||
Youku().download_by_vid(v, **kwargs)
|
||||
'''
|
||||
|
||||
playsign = json.loads(get_content(sign_url))['playsign']
|
||||
elif re.match('https?://list.youku.com/show/id_', url):
|
||||
# http://list.youku.com/show/id_z2ae8ee1c837b11e18195.html
|
||||
# official playlist
|
||||
page = get_content(url)
|
||||
show_id = re.search(r'showid:"(\d+)"', page).group(1)
|
||||
ep = 'http://list.youku.com/show/module?id={}&tab=showInfo&callback=jQuery'.format(show_id)
|
||||
xhr_page = get_content(ep).replace('\\/', '/').replace('\\"', '"')  # '\"' is just '"', so the second replace was a no-op
|
||||
video_url = re.search(r'(v.youku.com/v_show/id_(?:[A-Za-z0-9=]+)\.html)', xhr_page).group(1)
|
||||
youku_download_playlist_by_url('http://'+video_url, **kwargs)
|
||||
return
|
||||
elif re.match('https?://list.youku.com/albumlist/show/id_(\d+)\.html', url):
|
||||
# http://list.youku.com/albumlist/show/id_2336634.html
|
||||
# UGC playlist
|
||||
list_id = re.search('https?://list.youku.com/albumlist/show/id_(\d+)\.html', url).group(1)
|
||||
ep = 'http://list.youku.com/albumlist/items?id={}&page={}&size=20&ascending=1&callback=tuijsonp6'
|
||||
|
||||
#to be injected and replace ct10 and 12
|
||||
api85_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=85&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
|
||||
api86_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=86&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
|
||||
first_u = ep.format(list_id, 1)
|
||||
xhr_page = get_content(first_u)
|
||||
json_data = json.loads(re.search(js_cb_pt, xhr_page).group(1))
|
||||
video_cnt = json_data['data']['total']
|
||||
xhr_html = json_data['html']
|
||||
v_urls = re.findall(r'(v.youku.com/v_show/id_(?:[A-Za-z0-9=]+)\.html)', xhr_html)
|
||||
|
||||
self.prepare(api_url = api85_url, api12_url = api86_url, ctype = 86, **kwargs)
|
||||
if video_cnt > 20:
|
||||
req_cnt = video_cnt // 20
|
||||
for i in range(2, req_cnt+2):
|
||||
req_u = ep.format(list_id, i)
|
||||
xhr_page = get_content(req_u)
|
||||
json_data = json.loads(re.search(js_cb_pt, xhr_page).group(1).replace('\/', '/'))
|
||||
xhr_html = json_data['html']
|
||||
page_videos = re.findall(r'(v.youku.com/v_show/id_(?:[A-Za-z0-9=]+)\.html)', xhr_html)
|
||||
v_urls.extend(page_videos)
|
||||
for u in v_urls[0::2]:
|
||||
url = 'http://' + u
|
||||
Youku().download_by_url(url, **kwargs)
|
||||
return
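The paging loop above requests `video_cnt // 20` extra pages, which appears to issue one request past the end when the total is an exact multiple of 20. A sketch of the page arithmetic using ceiling division follows; `page_numbers` is a hypothetical helper and the page size of 20 is assumed from the `size=20` query parameter.

```python
def page_numbers(total, page_size=20):
    """List the 1-based page numbers needed to cover `total` items."""
    # -(-a // b) is integer ceiling division
    pages = -(-total // page_size)
    return list(range(1, pages + 1))


print(page_numbers(40))
print(page_numbers(41))
```

With 40 items this gives pages 1 and 2 only, whereas `range(2, 40 // 20 + 2)` would also request page 3.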
|
||||
|
||||
#exact copy from original VideoExtractor
|
||||
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
|
||||
unset_proxy()
|
||||
|
||||
try:
|
||||
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
|
||||
except:
|
||||
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
|
||||
def youku_download_by_url(url, **kwargs):
|
||||
Youku().download_by_url(url, **kwargs)
|
||||
|
||||
self.extract(**kwargs)
|
||||
|
||||
self.download(**kwargs)
|
||||
def youku_download_by_vid(vid, **kwargs):
|
||||
Youku().download_by_vid(vid, **kwargs)
|
||||
|
||||
site = Youku()
|
||||
download = site.download_by_url
|
||||
download_playlist = site.download_playlist_by_url
|
||||
|
||||
youku_download_by_vid = site.download_by_vid
|
||||
youku_open_download_by_vid = site.open_download_by_vid
|
||||
# Used by: acfun.py bilibili.py miomio.py tudou.py
|
||||
download = youku_download_by_url
|
||||
download_playlist = youku_download_playlist_by_url
|
||||
|
@ -8,35 +8,74 @@ from xml.dom.minidom import parseString
|
||||
class YouTube(VideoExtractor):
|
||||
name = "YouTube"
|
||||
|
||||
# YouTube media encoding options, in descending quality order.
|
||||
# Non-DASH YouTube media encoding options, in descending quality order.
|
||||
# http://en.wikipedia.org/wiki/YouTube#Quality_and_codecs. Retrieved July 17, 2014.
|
||||
stream_types = [
|
||||
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
#{'itag': '85', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '3-4', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
#{'itag': '102', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
#{'itag': '84', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2', 'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
|
||||
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
|
||||
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
#{'itag': '101', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
#{'itag': '100', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
#{'itag': '82', 'container': 'MP4', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '18', 'container': 'MP4', 'video_resolution': '270p/360p', 'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
{'itag': '18', 'container': 'MP4', 'video_resolution': '360p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p',
|
||||
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8',
|
||||
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
#{'itag': '83', 'container': 'MP4', 'video_resolution': '240p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '13', 'container': '3GP', 'video_resolution': '', 'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': ''},
|
||||
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
{'itag': '36', 'container': '3GP', 'video_resolution': '240p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175', 'audio_encoding': 'AAC', 'audio_bitrate': '36'},
|
||||
{'itag': '17', 'container': '3GP', 'video_resolution': '144p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05', 'audio_encoding': 'AAC', 'audio_bitrate': '24'},
|
||||
{'itag': '13', 'container': '3GP', 'video_resolution': '',
|
||||
'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': ''},
|
||||
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p',
|
||||
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25',
|
||||
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
{'itag': '36', 'container': '3GP', 'video_resolution': '240p',
|
||||
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '32'},
|
||||
{'itag': '17', 'container': '3GP', 'video_resolution': '144p',
|
||||
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '24'},
|
||||
]
|
||||
|
||||
def decipher(js, s):
|
||||
def s_to_sig(js, s):
|
||||
# Examples:
|
||||
# - https://www.youtube.com/yts/jsbin/player-da_DK-vflWlK-zq/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player-vflvABTsY/da_DK/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player-vfls4aurX/da_DK/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player_ias-vfl_RGK2l/en_US/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player-vflRjqq_w/da_DK/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player_ias-vfl-jbnrr/da_DK/base.js
|
||||
def tr_js(code):
|
||||
code = re.sub(r'function', r'def', code)
|
||||
code = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', code)
|
||||
@ -52,11 +91,15 @@ class YouTube(VideoExtractor):
|
||||
return code
|
||||
|
||||
js = js.replace('\n', ' ')
|
||||
f1 = match1(js, r'\w+\.sig\|\|([$\w]+)\(\w+\.\w+\)')
|
||||
f1 = match1(js, r'\.set\(\w+\.sp,encodeURIComponent\(([$\w]+)') or \
|
||||
match1(js, r'\.set\(\w+\.sp,\(0,window\.encodeURIComponent\)\(([$\w]+)') or \
|
||||
match1(js, r'\.set\(\w+\.sp,([$\w]+)\(\w+\.s\)\)') or \
|
||||
match1(js, r'"signature",([$\w]+)\(\w+\.\w+\)') or \
|
||||
match1(js, r'=([$\w]+)\(decodeURIComponent\(')
|
||||
f1def = match1(js, r'function %s(\(\w+\)\{[^\{]+\})' % re.escape(f1)) or \
|
||||
match1(js, r'\W%s=function(\(\w+\)\{[^\{]+\})' % re.escape(f1))
|
||||
f1def = re.sub(r'([$\w]+\.)([$\w]+\(\w+,\d+\))', r'\2', f1def)
|
||||
f1def = 'function %s%s' % (f1, f1def)
|
||||
f1def = 'function main_%s%s' % (f1, f1def) # prefix to avoid potential namespace conflict
|
||||
code = tr_js(f1def)
|
||||
f2s = set(re.findall(r'([$\w]+)\(\w+,\d+\)', f1def))
|
||||
for f2 in f2s:
|
||||
@ -67,23 +110,33 @@ class YouTube(VideoExtractor):
|
||||
else:
|
||||
f2def = re.search(r'[^$\w]%s:function\((\w+)\)(\{[^\{\}]+\})' % f2e, js)
|
||||
f2def = 'function {}({},b){}'.format(f2e, f2def.group(1), f2def.group(2))
|
||||
f2 = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', f2)
|
||||
f2 = re.sub(r'(as|if|in|is|or)', r'_\1', f2)
|
||||
f2 = re.sub(r'\$', '_dollar', f2)
|
||||
code = code + 'global %s\n' % f2 + tr_js(f2def)
|
||||
|
||||
f1 = re.sub(r'(as|if|in|is|or)', r'_\1', f1)
|
||||
f1 = re.sub(r'\$', '_dollar', f1)
|
||||
code = code + 'sig=%s(s)' % f1
|
||||
code = code + 'sig=main_%s(s)' % f1 # prefix to avoid potential namespace conflict
|
||||
exec(code, globals(), locals())
|
||||
return locals()['sig']
|
||||
|
||||
def chunk_by_range(url, size):
|
||||
urls = []
|
||||
chunk_size = 10485760
|
||||
start, end = 0, chunk_size - 1
|
||||
urls.append('%s&range=%s-%s' % (url, start, end))
|
||||
while end + 1 < size: # processed size < expected size
|
||||
start, end = end + 1, end + chunk_size
|
||||
urls.append('%s&range=%s-%s' % (url, start, end))
|
||||
return urls
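To make the range arithmetic easier to check, here is the same loop with the chunk size exposed as a parameter so that small numbers can be used. The `chunk_size` argument is an addition made for this illustration; the code above hard-codes 10485760 bytes.

```python
def chunk_by_range(url, size, chunk_size=10485760):
    """Split a download of `size` bytes into '&range=start-end' URLs."""
    urls = []
    start, end = 0, chunk_size - 1
    urls.append('%s&range=%s-%s' % (url, start, end))
    while end + 1 < size:  # processed size < expected size
        start, end = end + 1, end + chunk_size
        urls.append('%s&range=%s-%s' % (url, start, end))
    return urls


for u in chunk_by_range('u', 25, chunk_size=10):
    print(u)
```

The last range may extend past `size` (here `20-29` for 25 bytes); the loop presumably relies on the server truncating it.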
|
||||
|
||||
def get_url_from_vid(vid):
|
||||
return 'https://youtu.be/{}'.format(vid)
|
||||
|
||||
def get_vid_from_url(url):
|
||||
"""Extracts video ID from URL.
|
||||
"""
|
||||
return match1(url, r'youtu\.be/([^/]+)') or \
|
||||
return match1(url, r'youtu\.be/([^?/]+)') or \
|
||||
match1(url, r'youtube\.com/embed/([^/?]+)') or \
|
||||
match1(url, r'youtube\.com/v/([^/?]+)') or \
|
||||
match1(url, r'youtube\.com/watch/([^/?]+)') or \
|
||||
@ -104,31 +157,22 @@ class YouTube(VideoExtractor):
|
||||
log.wtf('[Failed] Unsupported URL pattern.')
|
||||
|
||||
video_page = get_content('https://www.youtube.com/playlist?list=%s' % playlist_id)
|
||||
from html.parser import HTMLParser
|
||||
videos = sorted([HTMLParser().unescape(video)
|
||||
for video in re.findall(r'<a href="(/watch\?[^"]+)"', video_page)
|
||||
if parse_query_param(video, 'index')],
|
||||
key=lambda video: parse_query_param(video, 'index'))
|
||||
ytInitialData = json.loads(match1(video_page, r'window\["ytInitialData"\]\s*=\s*(.+);'))
|
||||
|
||||
# Parse browse_ajax page for more videos to load
|
||||
load_more_href = match1(video_page, r'data-uix-load-more-href="([^"]+)"')
|
||||
while load_more_href:
|
||||
browse_ajax = get_content('https://www.youtube.com/%s' % load_more_href)
|
||||
browse_data = json.loads(browse_ajax)
|
||||
load_more_widget_html = browse_data['load_more_widget_html']
|
||||
content_html = browse_data['content_html']
|
||||
vs = set(re.findall(r'href="(/watch\?[^"]+)"', content_html))
|
||||
videos += sorted([HTMLParser().unescape(video)
|
||||
for video in list(vs)
|
||||
if parse_query_param(video, 'index')])
|
||||
load_more_href = match1(load_more_widget_html, r'data-uix-load-more-href="([^"]+)"')
|
||||
tab0 = ytInitialData['contents']['twoColumnBrowseResultsRenderer']['tabs'][0]
|
||||
itemSection0 = tab0['tabRenderer']['content']['sectionListRenderer']['contents'][0]
|
||||
playlistVideoList0 = itemSection0['itemSectionRenderer']['contents'][0]
|
||||
videos = playlistVideoList0['playlistVideoListRenderer']['contents']
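The nested lookups above walk a fixed path through the `ytInitialData` JSON. The sketch below runs the same path over a hand-made sample so the structure is visible; the sample data and the helper name `playlist_video_ids` are invented for illustration and only include the keys used in this hunk.

```python
def playlist_video_ids(initial_data):
    """Follow the renderer path used above and collect the video IDs."""
    tab0 = initial_data['contents']['twoColumnBrowseResultsRenderer']['tabs'][0]
    section0 = tab0['tabRenderer']['content']['sectionListRenderer']['contents'][0]
    video_list0 = section0['itemSectionRenderer']['contents'][0]
    videos = video_list0['playlistVideoListRenderer']['contents']
    return [v['playlistVideoRenderer']['videoId'] for v in videos]


# Minimal made-up structure with two entries
sample = {'contents': {'twoColumnBrowseResultsRenderer': {'tabs': [
    {'tabRenderer': {'content': {'sectionListRenderer': {'contents': [
        {'itemSectionRenderer': {'contents': [
            {'playlistVideoListRenderer': {'contents': [
                {'playlistVideoRenderer': {'videoId': 'aaa'}},
                {'playlistVideoRenderer': {'videoId': 'bbb'}},
            ]}}
        ]}}
    ]}}}}
]}}}

print(playlist_video_ids(sample))
```

Any change to the page layout raises `KeyError` or `IndexError` somewhere along this path, so callers may want to catch both.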
|
||||
|
||||
self.title = re.search(r'<meta name="title" content="([^"]+)"', video_page).group(1)
|
||||
self.p_playlist()
|
||||
for video in videos:
|
||||
vid = parse_query_param(video, 'v')
|
||||
index = parse_query_param(video, 'index')
|
||||
for index, video in enumerate(videos, 1):
|
||||
vid = video['playlistVideoRenderer']['videoId']
|
||||
try:
|
||||
self.__class__().download_by_url(self.__class__.get_url_from_vid(vid), index=index, **kwargs)
|
||||
except:
|
||||
pass
|
||||
# FIXME: show DASH stream sizes (by default) for playlist videos
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
assert self.url or self.vid
|
||||
@ -140,22 +184,50 @@ class YouTube(VideoExtractor):
|
||||
self.download_playlist_by_url(self.url, **kwargs)
|
||||
exit(0)
|
||||
|
||||
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}'.format(self.vid)))
|
||||
if re.search(r'\Wlist=', self.url) and not kwargs.get('playlist'):
|
||||
log.w('This video is from a playlist. (use --playlist to download all videos in the playlist.)')
|
||||
|
||||
# Get video info
|
||||
# 'eurl' is a magic parameter that can bypass age restriction
|
||||
# full form: 'eurl=https%3A%2F%2Fyoutube.googleapis.com%2Fv%2F{VIDEO_ID}'
|
||||
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}&eurl=https%3A%2F%2Fy'.format(self.vid)))
|
||||
logging.debug('STATUS: %s' % video_info['status'][0])
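`parse.parse_qs` maps every key to a list of values, which is why the code indexes with `[0]` and compares against `['ok']`. A short sketch of that behaviour follows; `first_value` is a hypothetical convenience helper, not something the extractor defines.

```python
from urllib.parse import parse_qs


def first_value(query, key, default=None):
    """Return the first value for `key` in a query string, or `default`."""
    values = parse_qs(query).get(key)
    return values[0] if values else default


info = parse_qs('status=ok&title=a+b')
print(info)
print(first_value('status=ok', 'status'))
```

A lookup such as `first_value(query, 'status')` returns `None` for a missing key instead of raising `KeyError`.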
|
||||
|
||||
ytplayer_config = None
|
||||
if 'status' not in video_info:
|
||||
log.wtf('[Failed] Unknown status.')
|
||||
|
||||
log.wtf('[Failed] Unknown status.', exit_code=None)
|
||||
raise
|
||||
elif video_info['status'] == ['ok']:
|
||||
if 'use_cipher_signature' not in video_info or video_info['use_cipher_signature'] == ['False']:
|
||||
self.title = parse.unquote_plus(video_info['title'][0])
|
||||
stream_list = video_info['url_encoded_fmt_stream_map'][0].split(',')
|
||||
|
||||
self.title = parse.unquote_plus(json.loads(video_info["player_response"][0])["videoDetails"]["title"])
|
||||
# Parse video page (for DASH)
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
try:
|
||||
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+?});', video_page).group(1))
|
||||
self.html5player = 'https:' + ytplayer_config['assets']['js']
|
||||
|
||||
# Workaround: get_video_info returns bad s. Why?
|
||||
if 'url_encoded_fmt_stream_map' not in ytplayer_config['args']:
|
||||
stream_list = json.loads(ytplayer_config['args']['player_response'])['streamingData']['formats']
|
||||
else:
|
||||
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
|
||||
#stream_list = ytplayer_config['args']['adaptive_fmts'].split(',')
|
||||
|
||||
if 'assets' in ytplayer_config:
|
||||
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
|
||||
elif re.search('([^"]*/base\.js)"', video_page):
|
||||
self.html5player = 'https://www.youtube.com' + re.search('([^"]*/base\.js)"', video_page).group(1)
|
||||
self.html5player = self.html5player.replace('\/', '/') # unescape URL
|
||||
else:
|
||||
self.html5player = None
|
||||
|
||||
except:
|
||||
if 'url_encoded_fmt_stream_map' not in video_info:
|
||||
stream_list = json.loads(video_info['player_response'][0])['streamingData']['formats']
|
||||
else:
|
||||
stream_list = video_info['url_encoded_fmt_stream_map'][0].split(',')
|
||||
if re.search('([^"]*/base\.js)"', video_page):
|
||||
self.html5player = 'https://www.youtube.com' + re.search('([^"]*/base\.js)"', video_page).group(1)
|
||||
else:
|
||||
self.html5player = None
|
||||
|
||||
else:
|
||||
@ -163,40 +235,78 @@ class YouTube(VideoExtractor):
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+?});', video_page).group(1))
|
||||
|
||||
self.title = ytplayer_config['args']['title']
|
||||
self.html5player = 'https:' + ytplayer_config['assets']['js']
|
||||
self.title = json.loads(ytplayer_config["args"]["player_response"])["videoDetails"]["title"]
|
||||
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
|
||||
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
|
||||
|
||||
elif video_info['status'] == ['fail']:
|
||||
logging.debug('ERRORCODE: %s' % video_info['errorcode'][0])
|
||||
if video_info['errorcode'] == ['150']:
|
||||
# FIXME: still relevant?
|
||||
if cookies:
|
||||
# Load necessary cookies into headers (for age-restricted videos)
|
||||
consent, ssid, hsid, sid = 'YES', '', '', ''
|
||||
for cookie in cookies:
|
||||
if cookie.domain.endswith('.youtube.com'):
|
||||
if cookie.name == 'SSID':
|
||||
ssid = cookie.value
|
||||
elif cookie.name == 'HSID':
|
||||
hsid = cookie.value
|
||||
elif cookie.name == 'SID':
|
||||
sid = cookie.value
|
||||
cookie_str = 'CONSENT=%s; SSID=%s; HSID=%s; SID=%s' % (consent, ssid, hsid, sid)
|
||||
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid,
|
||||
headers={'Cookie': cookie_str})
|
||||
else:
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
|
||||
try:
|
||||
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+});ytplayer', video_page).group(1))
|
||||
except:
|
||||
msg = re.search('class="message">([^<]+)<', video_page).group(1)
|
||||
log.wtf('[Failed] "%s"' % msg.strip())
|
||||
log.wtf('[Failed] Got message "%s". Try to login with --cookies.' % msg.strip())
|
||||
|
||||
if 'title' in ytplayer_config['args']:
|
||||
# 150 Restricted from playback on certain sites
|
||||
# Parse video page instead
|
||||
self.title = ytplayer_config['args']['title']
|
||||
self.html5player = 'https:' + ytplayer_config['assets']['js']
|
||||
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
|
||||
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
|
||||
else:
|
||||
log.wtf('[Error] The uploader has not made this video available in your country.')
|
||||
log.wtf('[Error] The uploader has not made this video available in your country.', exit_code=None)
|
||||
raise
|
||||
#self.title = re.search('<meta name="title" content="([^"]+)"', video_page).group(1)
|
||||
#stream_list = []
|
||||
|
||||
elif video_info['errorcode'] == ['100']:
|
||||
log.wtf('[Failed] This video does not exist.', exit_code=int(video_info['errorcode'][0]))
|
||||
log.wtf('[Failed] This video does not exist.', exit_code=None) #int(video_info['errorcode'][0])
|
||||
raise
|
||||
|
||||
else:
|
||||
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=int(video_info['errorcode'][0]))
|
||||
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=None) #int(video_info['errorcode'][0])
|
||||
raise
|
||||
|
||||
else:
|
||||
log.wtf('[Failed] Invalid status.')
|
||||
log.wtf('[Failed] Invalid status.', exit_code=None)
|
||||
raise
|
||||
|
||||
# YouTube Live
|
||||
if ytplayer_config and (ytplayer_config['args'].get('livestream') == '1' or ytplayer_config['args'].get('live_playback') == '1'):
|
||||
if 'hlsvp' in ytplayer_config['args']:
|
||||
hlsvp = ytplayer_config['args']['hlsvp']
|
||||
else:
|
||||
player_response= json.loads(ytplayer_config['args']['player_response'])
|
||||
log.e('[Failed] %s' % player_response['playabilityStatus']['reason'], exit_code=1)
|
||||
|
||||
if 'info_only' in kwargs and kwargs['info_only']:
|
||||
return
|
||||
else:
|
||||
download_url_ffmpeg(hlsvp, self.title, 'mp4')
|
||||
exit(0)
|
||||
|
||||
for stream in stream_list:
|
||||
if isinstance(stream, str):
|
||||
metadata = parse.parse_qs(stream)
|
||||
stream_itag = metadata['itag'][0]
|
||||
self.streams[stream_itag] = {
|
||||
@ -204,22 +314,34 @@ class YouTube(VideoExtractor):
|
||||
'url': metadata['url'][0],
|
||||
'sig': metadata['sig'][0] if 'sig' in metadata else None,
|
||||
's': metadata['s'][0] if 's' in metadata else None,
|
||||
'quality': metadata['quality'][0],
|
||||
'quality': metadata['quality'][0] if 'quality' in metadata else None,
|
||||
#'quality': metadata['quality_label'][0] if 'quality_label' in metadata else None,
|
||||
'type': metadata['type'][0],
|
||||
'mime': metadata['type'][0].split(';')[0],
|
||||
'container': mime_to_container(metadata['type'][0].split(';')[0]),
|
||||
}
|
||||
else:
|
||||
stream_itag = str(stream['itag'])
|
||||
self.streams[stream_itag] = {
|
||||
'itag': str(stream['itag']),
|
||||
'url': stream['url'] if 'url' in stream else None,
|
||||
'sig': None,
|
||||
's': None,
|
||||
'quality': stream['quality'],
|
||||
'type': stream['mimeType'],
|
||||
'mime': stream['mimeType'].split(';')[0],
|
||||
'container': mime_to_container(stream['mimeType'].split(';')[0]),
|
||||
}
|
||||
if 'signatureCipher' in stream:
|
||||
self.streams[stream_itag].update(dict([(_.split('=')[0], parse.unquote(_.split('=')[1]))
|
||||
for _ in stream['signatureCipher'].split('&')]))
|
||||
|
||||
# Prepare caption tracks
|
||||
try:
|
||||
caption_tracks = ytplayer_config['args']['caption_tracks'].split(',')
|
||||
caption_tracks = json.loads(ytplayer_config['args']['player_response'])['captions']['playerCaptionsTracklistRenderer']['captionTracks']
|
||||
for ct in caption_tracks:
|
||||
lang = None
|
||||
for i in ct.split('&'):
|
||||
[k, v] = i.split('=')
|
||||
if k == 'lc' and lang is None: lang = v
|
||||
if k == 'v' and v[0] != '.': lang = v # auto-generated
|
||||
if k == 'u': ttsurl = parse.unquote_plus(v)
|
||||
ttsurl, lang = ct['baseUrl'], ct['languageCode']
|
||||
|
||||
tts_xml = parseString(get_content(ttsurl))
|
||||
transcript = tts_xml.getElementsByTagName('transcript')[0]
|
||||
texts = transcript.getElementsByTagName('text')
|
||||
@ -245,7 +367,7 @@ class YouTube(VideoExtractor):
|
||||
self.caption_tracks[lang] = srt
|
||||
except: pass
|
||||
|
||||
# Prepare DASH streams
|
||||
# Prepare DASH streams (NOTE: not every video has DASH streams!)
|
||||
try:
|
||||
dashmpd = ytplayer_config['args']['dashmpd']
|
||||
dash_xml = parseString(get_content(dashmpd))
|
||||
@ -256,11 +378,17 @@ class YouTube(VideoExtractor):
|
||||
burls = rep.getElementsByTagName('BaseURL')
|
||||
dash_mp4_a_url = burls[0].firstChild.nodeValue
|
||||
dash_mp4_a_size = burls[0].getAttribute('yt:contentLength')
|
||||
if not dash_mp4_a_size:
|
||||
try: dash_mp4_a_size = url_size(dash_mp4_a_url)
|
||||
except: continue
|
||||
elif mimeType == 'audio/webm':
|
||||
rep = aset.getElementsByTagName('Representation')[-1]
|
||||
burls = rep.getElementsByTagName('BaseURL')
|
||||
dash_webm_a_url = burls[0].firstChild.nodeValue
|
||||
dash_webm_a_size = burls[0].getAttribute('yt:contentLength')
|
||||
if not dash_webm_a_size:
|
||||
try: dash_webm_a_size = url_size(dash_webm_a_url)
|
||||
except: continue
|
||||
elif mimeType == 'video/mp4':
|
||||
for rep in aset.getElementsByTagName('Representation'):
|
||||
w = int(rep.getAttribute('width'))
|
||||
@ -269,13 +397,18 @@ class YouTube(VideoExtractor):
|
||||
burls = rep.getElementsByTagName('BaseURL')
|
||||
dash_url = burls[0].firstChild.nodeValue
|
||||
dash_size = burls[0].getAttribute('yt:contentLength')
|
||||
if not dash_size:
|
||||
try: dash_size = url_size(dash_url)
|
||||
except: continue
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': '%sx%s' % (w, h),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'mp4',
|
||||
'src': [dash_url, dash_mp4_a_url],
|
||||
'src': [dash_urls, dash_mp4_a_urls],
|
||||
'size': int(dash_size) + int(dash_mp4_a_size)
|
||||
}
|
||||
elif mimeType == 'video/webm':
|
||||
@ -286,36 +419,85 @@ class YouTube(VideoExtractor):
|
||||
burls = rep.getElementsByTagName('BaseURL')
|
||||
dash_url = burls[0].firstChild.nodeValue
|
||||
dash_size = burls[0].getAttribute('yt:contentLength')
|
||||
if not dash_size:
|
||||
try: dash_size = url_size(dash_url)
|
||||
except: continue
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
dash_webm_a_urls = self.__class__.chunk_by_range(dash_webm_a_url, int(dash_webm_a_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': '%sx%s' % (w, h),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'webm',
|
||||
'src': [dash_url, dash_webm_a_url],
|
||||
'src': [dash_urls, dash_webm_a_urls],
|
||||
'size': int(dash_size) + int(dash_webm_a_size)
|
||||
}
|
||||
except:
|
||||
# VEVO
|
||||
if not self.html5player: return
|
||||
self.html5player = self.html5player.replace('\/', '/') # unescape URL (for age-restricted videos)
|
||||
self.js = get_content(self.html5player)
|
||||
if 'adaptive_fmts' in ytplayer_config['args']:
|
||||
|
||||
try:
|
||||
# Video info from video page (not always available)
|
||||
streams = [dict([(i.split('=')[0],
|
||||
parse.unquote(i.split('=')[1]))
|
||||
for i in afmt.split('&')])
|
||||
for afmt in ytplayer_config['args']['adaptive_fmts'].split(',')]
|
||||
except:
|
||||
if 'adaptive_fmts' in video_info:
|
||||
streams = [dict([(i.split('=')[0],
|
||||
parse.unquote(i.split('=')[1]))
|
||||
for i in afmt.split('&')])
|
||||
for afmt in video_info['adaptive_fmts'][0].split(',')]
|
||||
else:
|
||||
try:
|
||||
streams = json.loads(video_info['player_response'][0])['streamingData']['adaptiveFormats']
|
||||
except: # no DASH stream at all
|
||||
return
|
||||
# streams without contentLength got broken urls, just remove them (#2767)
|
||||
streams = [stream for stream in streams if 'contentLength' in stream]
|
||||
for stream in streams:
|
||||
stream['itag'] = str(stream['itag'])
|
||||
if 'qualityLabel' in stream:
|
||||
stream['quality_label'] = stream['qualityLabel']
|
||||
del stream['qualityLabel']
|
||||
if 'width' in stream:
|
||||
stream['size'] = '{}x{}'.format(stream['width'], stream['height'])
|
||||
del stream['width']
|
||||
del stream['height']
|
||||
stream['type'] = stream['mimeType']
|
||||
stream['clen'] = stream['contentLength']
|
||||
stream['init'] = '{}-{}'.format(
|
||||
stream['initRange']['start'],
|
||||
stream['initRange']['end'])
|
||||
stream['index'] = '{}-{}'.format(
|
||||
stream['indexRange']['start'],
|
||||
stream['indexRange']['end'])
|
||||
del stream['mimeType']
|
||||
del stream['contentLength']
|
||||
del stream['initRange']
|
||||
del stream['indexRange']
|
||||
if 'signatureCipher' in stream:
|
||||
stream.update(dict([(_.split('=')[0], parse.unquote(_.split('=')[1]))
|
||||
for _ in stream['signatureCipher'].split('&')]))
|
||||
del stream['signatureCipher']
|
||||
|
||||
for stream in streams: # get over speed limiting
|
||||
stream['url'] += '&ratebypass=yes'
|
||||
for stream in streams: # audio
|
||||
if stream['type'].startswith('audio/mp4'):
|
||||
dash_mp4_a_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_mp4_a_url += '&signature={}'.format(sig)
|
||||
sig = self.__class__.s_to_sig(self.js, stream['s'])
|
||||
dash_mp4_a_url += '&sig={}'.format(sig)
|
||||
dash_mp4_a_size = stream['clen']
|
||||
elif stream['type'].startswith('audio/webm'):
|
||||
dash_webm_a_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_webm_a_url += '&signature={}'.format(sig)
|
||||
sig = self.__class__.s_to_sig(self.js, stream['s'])
|
||||
dash_webm_a_url += '&sig={}'.format(sig)
|
||||
dash_webm_a_size = stream['clen']
|
||||
for stream in streams: # video
|
||||
if 'size' in stream:
|
||||
@ -323,35 +505,47 @@ class YouTube(VideoExtractor):
|
||||
mimeType = 'video/mp4'
|
||||
dash_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_url += '&signature={}'.format(sig)
|
||||
sig = self.__class__.s_to_sig(self.js, stream['s'])
|
||||
dash_url += '&sig={}'.format(sig)
|
||||
dash_size = stream['clen']
|
||||
itag = stream['itag']
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': stream['size'],
|
||||
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'mp4',
|
||||
'src': [dash_url, dash_mp4_a_url],
|
||||
'src': [dash_urls, dash_mp4_a_urls],
|
||||
'size': int(dash_size) + int(dash_mp4_a_size)
|
||||
}
|
||||
elif stream['type'].startswith('video/webm'):
|
||||
mimeType = 'video/webm'
|
||||
dash_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_url += '&signature={}'.format(sig)
|
||||
sig = self.__class__.s_to_sig(self.js, stream['s'])
|
||||
dash_url += '&sig={}'.format(sig)
|
||||
dash_size = stream['clen']
|
||||
itag = stream['itag']
|
||||
audio_url = None
|
||||
audio_size = None
|
||||
try:
|
||||
audio_url = dash_webm_a_url
|
||||
audio_size = int(dash_webm_a_size)
|
||||
except UnboundLocalError as e:
|
||||
audio_url = dash_mp4_a_url
|
||||
audio_size = int(dash_mp4_a_size)
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
audio_urls = self.__class__.chunk_by_range(audio_url, int(audio_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': stream['size'],
|
||||
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'webm',
|
||||
'src': [dash_url, dash_webm_a_url],
|
||||
'size': int(dash_size) + int(dash_webm_a_size)
|
||||
'src': [dash_urls, audio_urls],
|
||||
'size': int(dash_size) + int(audio_size)
|
||||
}
|
||||
|
||||
def extract(self, **kwargs):
|
||||
@ -374,13 +568,13 @@ class YouTube(VideoExtractor):
|
||||
src = self.streams[stream_id]['url']
|
||||
if self.streams[stream_id]['sig'] is not None:
|
||||
sig = self.streams[stream_id]['sig']
|
||||
src += '&signature={}'.format(sig)
|
||||
src += '&sig={}'.format(sig)
|
||||
elif self.streams[stream_id]['s'] is not None:
|
||||
if not hasattr(self, 'js'):
|
||||
self.js = get_content(self.html5player)
|
||||
s = self.streams[stream_id]['s']
|
||||
sig = self.__class__.decipher(self.js, s)
|
||||
src += '&signature={}'.format(sig)
|
||||
sig = self.__class__.s_to_sig(self.js, s)
|
||||
src += '&sig={}'.format(sig)
|
||||
|
||||
self.streams[stream_id]['src'] = [src]
|
||||
self.streams[stream_id]['size'] = urls_size(self.streams[stream_id]['src'])
|
||||
|
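The YouTube hunks above replace each single DASH URL with a list returned by `chunk_by_range(url, size)`, but the body of that helper is not part of this excerpt. The following is only a minimal sketch of what such a helper could look like. The 10 MiB default chunk size, the `&range=start-end` query parameter, and the `example.invalid` URL are assumptions made for illustration, not values confirmed by the diff.

```python
def chunk_by_range(url, size, chunk_size=10 * 1024 * 1024):
    """Split one media URL into a list of URLs, each asking for one byte range.

    Assumption: the server accepts a `range=start-end` query parameter with
    inclusive offsets. The default chunk size is an illustrative choice.
    """
    urls = []
    start = 0
    while start < size:
        # Inclusive end offset, clamped to the last byte of the file.
        end = min(start + chunk_size, size) - 1
        urls.append('%s&range=%s-%s' % (url, start, end))
        start = end + 1
    return urls


# Hypothetical URL, small numbers chosen so the result is easy to read.
demo_chunks = chunk_by_range('https://example.invalid/v?id=1', 25, chunk_size=10)
```

Returning a list for both the video and the audio track is what lets the hunks above store `'src': [dash_urls, dash_mp4_a_urls]` instead of two plain URLs.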
@ -3,73 +3,50 @@
|
||||
__all__ = ['zhanqi_download']
|
||||
|
||||
from ..common import *
|
||||
import re
|
||||
import base64
|
||||
import json
|
||||
import time
|
||||
import hashlib
|
||||
import base64
|
||||
from urllib.parse import urlparse
|
||||
|
||||
def zhanqi_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
html = get_content(url)
|
||||
video_type_patt = r'VideoType":"([^"]+)"'
|
||||
video_type = match1(html, video_type_patt)
|
||||
path = urlparse(url).path[1:]
|
||||
|
||||
#rtmp_base_patt = r'VideoUrl":"([^"]+)"'
|
||||
rtmp_id_patt = r'videoId":"([^"]+)"'
|
||||
vod_m3u8_id_patt = r'VideoID":"([^"]+)"'
|
||||
title_patt = r'<p class="title-name" title="[^"]+">([^<]+)</p>'
|
||||
title_patt_backup = r'<title>([^<]{1,9999})</title>'
|
||||
title = match1(html, title_patt) or match1(html, title_patt_backup)
|
||||
title = unescape_html(title)
|
||||
rtmp_base = "http://wshdl.load.cdn.zhanqi.tv/zqlive"
|
||||
vod_base = "http://dlvod.cdn.zhanqi.tv"
|
||||
rtmp_real_base = "rtmp://dlrtmp.cdn.zhanqi.tv/zqlive/"
|
||||
room_info = "http://www.zhanqi.tv/api/static/live.roomid/"
|
||||
KEY_MASK = "#{&..?!("
|
||||
ak2_pattern = r'ak2":"\d-([^|]+)'
|
||||
if not (path.startswith('videos') or path.startswith('v2/videos')): #url = "https://www.zhanqi.tv/huashan?param_s=1_0.2.0"
|
||||
path_list = path.split('/')
|
||||
room_id = path_list[1] if path_list[0] == 'topic' else path_list[0]
|
||||
zhanqi_live(room_id, merge=merge, output_dir=output_dir, info_only=info_only, **kwargs)
|
||||
else: #url = 'https://www.zhanqi.tv/videos/Lyingman/2017/01/182308.html'
|
||||
# https://www.zhanqi.tv/v2/videos/215593.html
|
||||
video_id = path.split('.')[0].split('/')[-1]
|
||||
zhanqi_video(video_id, merge=merge, output_dir=output_dir, info_only=info_only, **kwargs)
|
||||
|
||||
if video_type == "LIVE":
|
||||
rtmp_id = match1(html, rtmp_id_patt).replace('\\/','/')
|
||||
#request_url = rtmp_base+'/'+rtmp_id+'.flv?get_url=1'
|
||||
#real_url = get_html(request_url)
|
||||
html2 = get_content(room_info + rtmp_id.split("_")[0] + ".json")
|
||||
json_data = json.loads(html2)
|
||||
cdns = json_data["data"]["flashvars"]["cdns"]
|
||||
cdns = base64.b64decode(cdns).decode("utf-8")
|
||||
cdn = match1(cdns, ak2_pattern)
|
||||
cdn = base64.b64decode(cdn).decode("utf-8")
|
||||
key = ''
|
||||
i = 0
|
||||
while(i < len(cdn)):
|
||||
key = key + chr(ord(cdn[i]) ^ ord(KEY_MASK[i % 8]))
|
||||
i = i + 1
|
||||
time_hex = hex(int(time.time()))[2:]
|
||||
key = hashlib.md5(bytes(key + "/zqlive/" + rtmp_id + time_hex, "utf-8")).hexdigest()
|
||||
real_url = rtmp_real_base + '/' + rtmp_id + "?k=" + key + "&t=" + time_hex
|
||||
print_info(site_info, title, 'flv', float('inf'))
|
||||
def zhanqi_live(room_id, merge=True, output_dir='.', info_only=False, **kwargs):
|
||||
api_url = "https://www.zhanqi.tv/api/static/v2.1/room/domain/{}.json".format(room_id)
|
||||
json_data = json.loads(get_content(api_url))['data']
|
||||
status = json_data['status']
|
||||
if status != '4':
|
||||
raise Exception("The live stream is not online!")
|
||||
|
||||
nickname = json_data['nickname']
|
||||
title = nickname + ": " + json_data['title']
|
||||
video_levels = base64.b64decode(json_data['flashvars']['VideoLevels']).decode('utf8')
|
||||
m3u8_url = json.loads(video_levels)['streamUrl']
|
||||
|
||||
print_info(site_info, title, 'm3u8', 0, m3u8_url=m3u8_url, m3u8_type='master')
|
||||
if not info_only:
|
||||
download_rtmp_url(real_url, title, 'flv', {}, output_dir, merge = merge)
|
||||
#download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
|
||||
elif video_type == "VOD":
|
||||
vod_m3u8_request = vod_base + match1(html, vod_m3u8_id_patt).replace('\\/','/')
|
||||
vod_m3u8 = get_html(vod_m3u8_request)
|
||||
part_url = re.findall(r'(/[^#]+)\.ts',vod_m3u8)
|
||||
real_url = []
|
||||
for i in part_url:
|
||||
i = vod_base + i + ".ts"
|
||||
real_url.append(i)
|
||||
type_ = ''
|
||||
size = 0
|
||||
for url in real_url:
|
||||
_, type_, temp = url_info(url)
|
||||
size += temp or 0
|
||||
download_url_ffmpeg(m3u8_url, title, 'mp4', output_dir=output_dir, merge=merge)
|
||||
|
||||
print_info(site_info, title, type_ or 'ts', size)
|
||||
def zhanqi_video(video_id, output_dir='.', info_only=False, merge=True, **kwargs):
|
||||
api_url = 'https://www.zhanqi.tv/api/static/v2.1/video/{}.json'.format(video_id)
|
||||
json_data = json.loads(get_content(api_url))['data']
|
||||
|
||||
title = json_data['title']
|
||||
vid = json_data['flashvars']['VideoID']
|
||||
m3u8_url = 'http://dlvod.cdn.zhanqi.tv/' + vid
|
||||
urls = general_m3u8_extractor(m3u8_url)
|
||||
print_info(site_info, title, 'm3u8', 0)
|
||||
if not info_only:
|
||||
download_urls(real_url, title, type_ or 'ts', size, output_dir, merge = merge)
|
||||
else:
|
||||
NotImplementedError('Unknown_video_type')
|
||||
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
site_info = "zhanqi.tv"
|
||||
site_info = "www.zhanqi.tv"
|
||||
download = zhanqi_download
|
||||
download_playlist = playlist_not_supported('zhanqi')
|
||||
|
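The new `zhanqi_live` function above reads the stream address from a Base64-encoded JSON string stored under `flashvars.VideoLevels`. As a rough illustration of that decoding step only, the sketch below builds a made-up payload and decodes it again. The helper name `stream_url_from_video_levels`, the sample payload, and the `example.invalid` URL are hypothetical; the real API response is not shown in this diff.

```python
import base64
import json


def stream_url_from_video_levels(encoded):
    """Decode a Base64 string holding JSON and return its 'streamUrl' field."""
    decoded = base64.b64decode(encoded).decode('utf-8')
    return json.loads(decoded)['streamUrl']


# Hypothetical payload, encoded the same way the field is assumed to arrive.
sample_payload = {'streamUrl': 'https://example.invalid/live/master.m3u8'}
sample_encoded = base64.b64encode(
    json.dumps(sample_payload).encode('utf-8')).decode('ascii')

demo_stream_url = stream_url_from_video_levels(sample_encoded)
```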
55
src/you_get/extractors/zhibo.py
Normal file
@ -0,0 +1,55 @@
#!/usr/bin/env python

__all__ = ['zhibo_download']

from ..common import *

def zhibo_vedio_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
    # http://video.zhibo.tv/video/details/d103057f-663e-11e8-9d83-525400ccac43.html

    html = get_html(url)
    title = r1(r'<title>([\s\S]*)</title>', html)
    total_size = 0
    part_urls= []

    video_html = r1(r'<script type="text/javascript">([\s\S]*)</script></head>', html)

    # video_guessulike = r1(r"window.xgData =([s\S'\s\.]*)\'\;[\s\S]*window.vouchData", video_html)
    video_url = r1(r"window.vurl = \'([s\S'\s\.]*)\'\;[\s\S]*window.imgurl", video_html)
    part_urls.append(video_url)
    ext = video_url.split('.')[-1]

    print_info(site_info, title, ext, total_size)
    if not info_only:
        download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge)


def zhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
    if 'video.zhibo.tv' in url:
        zhibo_vedio_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
        return

    # if 'v.zhibo.tv' in url:
    # http://v.zhibo.tv/31609372
    html = get_html(url)
    title = r1(r'<title>([\s\S]*)</title>', html)
    is_live = r1(r"window.videoIsLive=\'([s\S'\s\.]*)\'\;[\s\S]*window.resDomain", html)
    if is_live != "1":
        raise ValueError("The live stream is not online! (Errno:%s)" % is_live)

    match = re.search(r"""
        ourStreamName .*?
        '(.*?)' .*?
        rtmpHighSource .*?
        '(.*?)' .*?
        '(.*?)'
    """, html, re.S | re.X)
    real_url = match.group(3) + match.group(1) + match.group(2)

    print_info(site_info, title, 'flv', float('inf'))
    if not info_only:
        download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)

site_info = "zhibo.tv"
download = zhibo_download
download_playlist = playlist_not_supported('zhibo')
|
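`zhibo_download` above assembles the live URL from three quoted strings that it captures with a verbose, dot-matches-newline regular expression, then joins them in the order group 3, group 1, group 2. The sketch below reuses the same pattern on a made-up page fragment to show how the groups line up. The `sample_html` text, the `example.invalid` address, and the helper name `build_stream_url` are hypothetical; the real page layout is not shown in the diff. Unlike the original, the sketch raises an error when nothing matches.

```python
import re

# Same pattern as in the diff: re.X ignores the layout whitespace,
# re.S lets '.' run across line breaks.
STREAM_PATTERN = re.compile(r"""
    ourStreamName .*?
    '(.*?)' .*?
    rtmpHighSource .*?
    '(.*?)' .*?
    '(.*?)'
""", re.S | re.X)


def build_stream_url(html):
    """Join the captured pieces as group 3 + group 1 + group 2."""
    match = STREAM_PATTERN.search(html)
    if match is None:
        raise ValueError('stream variables not found in page')
    return match.group(3) + match.group(1) + match.group(2)


# Hypothetical page fragment, only meant to exercise the pattern.
sample_html = """
var ourStreamName = 'room42';
var rtmpHighSource = '_high' || 'rtmp://example.invalid/live/';
"""

demo_real_url = build_stream_url(sample_html)
```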
79
src/you_get/extractors/zhihu.py
Normal file
@ -0,0 +1,79 @@
#!/usr/bin/env python

__all__ = ['zhihu_download', 'zhihu_download_playlist']

from ..common import *
import json


def zhihu_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
    paths = url.split("/")
    # question or column
    if len(paths) < 3 and len(paths) < 6:
        raise TypeError("URL does not conform to specifications, Support column and question only."
                        "Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
                        "https://www.zhihu.com/question/267782048/answer/490720324")

    if ("question" not in paths or "answer" not in paths) and "zhuanlan.zhihu.com" not in paths:
        raise TypeError("URL does not conform to specifications, Support column and question only."
                        "Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
                        "https://www.zhihu.com/question/267782048/answer/490720324")

    html = get_html(url, faker=True)
    title = match1(html, r'data-react-helmet="true">(.*?)</title>')
    for index, video_id in enumerate(matchall(html, [r'<a class="video-box" href="\S+video/(\d+)"'])):
        try:
            video_info = json.loads(
                get_content(r"https://lens.zhihu.com/api/videos/{}".format(video_id), headers=fake_headers))
        except json.decoder.JSONDecodeError:
            log.w("Video id not found:{}".format(video_id))
            continue

        play_list = video_info["playlist"]
        # first High Definition
        # second Second Standard Definition
        # third ld. What is ld ?
        # finally continue
        data = play_list.get("hd", play_list.get("sd", play_list.get("ld", None)))
        if not data:
            log.w("Video id No play address:{}".format(video_id))
            continue
        print_info(site_info, title, data["format"], data["size"])
        if not info_only:
            ext = "_{}.{}".format(index, data["format"])
            if kwargs.get("zhihu_offset"):
                ext = "_{}".format(kwargs["zhihu_offset"]) + ext
            download_urls([data["play_url"]], title, ext, data["size"],
                          output_dir=output_dir, merge=merge, **kwargs)


def zhihu_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
    if "question" not in url or "answer" in url: # question page
        raise TypeError("URL does not conform to specifications, Support question only."
                        " Example URL: https://www.zhihu.com/question/267782048")
    url = url.split("?")[0]
    if url[-1] == "/":
        question_id = url.split("/")[-2]
    else:
        question_id = url.split("/")[-1]
    videos_url = r"https://www.zhihu.com/api/v4/questions/{}/answers".format(question_id)
    try:
        questions = json.loads(get_content(videos_url))
    except json.decoder.JSONDecodeError:
        raise TypeError("Check whether the problem URL exists.Example URL: https://www.zhihu.com/question/267782048")

    count = 0
    while 1:
        for data in questions["data"]:
            kwargs["zhihu_offset"] = count
            zhihu_download("https://www.zhihu.com/question/{}/answer/{}".format(question_id, data["id"]),
                           output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
            count += 1
        if questions["paging"]["is_end"]:
            return
        questions = json.loads(get_content(questions["paging"]["next"], headers=fake_headers))


site_info = "zhihu.com"
download = zhihu_download
download_playlist = zhihu_download_playlist
|
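`zhihu_download` above picks a stream with nested `dict.get` calls, preferring `hd`, then `sd`, then `ld`. One possible way to express the same preference order as a loop is sketched below. The helper name `pick_best_stream` and the sample dictionaries are hypothetical. The sketch also differs slightly from the original: it skips a key whose value is empty or `None`, whereas the nested `get` calls only fall back when the key is absent.

```python
def pick_best_stream(play_list, preference=('hd', 'sd', 'ld')):
    """Return the first non-empty entry of play_list in preference order."""
    for key in preference:
        data = play_list.get(key)
        if data:
            return data
    return None


# Hypothetical playlist: no 'hd' entry, so 'sd' should win.
demo_play_list = {
    'sd': {'format': 'mp4', 'size': 1024, 'play_url': 'https://example.invalid/sd.mp4'},
    'ld': {'format': 'mp4', 'size': 512, 'play_url': 'https://example.invalid/ld.mp4'},
}
demo_choice = pick_best_stream(demo_play_list)
```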
@ -11,8 +11,25 @@ def output(video_extractor, pretty_print=True):
|
||||
out['title'] = ve.title
|
||||
out['site'] = ve.name
|
||||
out['streams'] = ve.streams
|
||||
try:
|
||||
if ve.dash_streams:
|
||||
out['streams'].update(ve.dash_streams)
|
||||
except AttributeError:
|
||||
pass
|
||||
try:
|
||||
if ve.audiolang:
|
||||
out['audiolang'] = ve.audiolang
|
||||
except AttributeError:
|
||||
pass
|
||||
extra = {}
|
||||
if getattr(ve, 'referer', None) is not None:
|
||||
extra["referer"] = ve.referer
|
||||
if getattr(ve, 'ua', None) is not None:
|
||||
extra["ua"] = ve.ua
|
||||
if extra:
|
||||
out["extra"] = extra
|
||||
if pretty_print:
|
||||
print(json.dumps(out, indent=4, sort_keys=True, ensure_ascii=False))
|
||||
print(json.dumps(out, indent=4, ensure_ascii=False))
|
||||
else:
|
||||
print(json.dumps(out))
|
||||
|
||||
@ -31,6 +48,11 @@ def print_info(site_info=None, title=None, type=None, size=None):
|
||||
|
||||
def download_urls(urls=None, title=None, ext=None, total_size=None, refer=None):
|
||||
ve = last_info
|
||||
if not ve:
|
||||
ve = VideoExtractor()
|
||||
ve.name = ''
|
||||
ve.url = urls
|
||||
ve.title=title
|
||||
# save download info in streams
|
||||
stream = {}
|
||||
stream['container'] = ext
|
||||
@ -42,4 +64,3 @@ def download_urls(urls=None, title=None, ext=None, total_size=None, refer=None):
|
||||
ve.streams = {}
|
||||
ve.streams['__default__'] = stream
|
||||
output(ve)
|
||||
|
||||
|
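The JSON output hunk above adds an `extra` dictionary that is filled only with the optional extractor attributes (`referer`, `ua`) that are actually set, and is attached to the output only when it is non-empty. A small sketch of that pattern follows. The function name `collect_extra` and the `DemoExtractor` class are hypothetical stand-ins, not names from the project.

```python
def collect_extra(ve):
    """Return a dict with only the optional attributes that are set on `ve`."""
    extra = {}
    # Attribute names taken from the hunk; a missing attribute reads as None.
    for key in ('referer', 'ua'):
        value = getattr(ve, key, None)
        if value is not None:
            extra[key] = value
    return extra


class DemoExtractor:
    """Hypothetical stand-in for an extractor; it has a referer but no ua."""
    referer = 'https://example.invalid/page'


demo_out = {'title': 'demo'}
demo_extra = collect_extra(DemoExtractor())
if demo_extra:
    # Attach the key only when there is something to report.
    demo_out['extra'] = demo_extra
```

Using `getattr` with a default avoids the `try`/`except AttributeError` blocks that the same hunk uses for `dash_streams` and `audiolang`.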
220
src/you_get/processor/ffmpeg.py
Normal file → Executable file
@ -1,69 +1,102 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
import os.path
|
||||
import logging
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
from ..util.strings import parameterize
|
||||
from ..common import print_more_compatible as print
|
||||
|
||||
try:
|
||||
from subprocess import DEVNULL
|
||||
except ImportError:
|
||||
# Python 3.2 or below
|
||||
import os
|
||||
import atexit
|
||||
DEVNULL = os.open(os.devnull, os.O_RDWR)
|
||||
atexit.register(lambda fd: os.close(fd), DEVNULL)
|
||||
|
||||
def get_usable_ffmpeg(cmd):
|
||||
try:
|
||||
p = subprocess.Popen([cmd, '-version'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
p = subprocess.Popen([cmd, '-version'], stdin=DEVNULL, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
out, err = p.communicate()
|
||||
vers = str(out, 'utf-8').split('\n')[0].split()
|
||||
assert (vers[0] == 'ffmpeg' and vers[2][0] > '0') or (vers[0] == 'avconv')
|
||||
#if the version is strange like 'N-1234-gd1111', set version to 2.0
|
||||
try:
|
||||
version = [int(i) for i in vers[2].split('.')]
|
||||
v = vers[2][1:] if vers[2][0] == 'n' else vers[2]
|
||||
version = [int(i) for i in v.split('.')]
|
||||
except:
|
||||
version = [1, 0]
|
||||
return cmd, version
|
||||
return cmd, 'ffprobe', version
|
||||
except:
|
||||
return None
|
||||
|
||||
FFMPEG, FFMPEG_VERSION = get_usable_ffmpeg('ffmpeg') or get_usable_ffmpeg('avconv') or (None, None)
|
||||
LOGLEVEL = ['-loglevel', 'quiet']
|
||||
FFMPEG, FFPROBE, FFMPEG_VERSION = get_usable_ffmpeg('ffmpeg') or get_usable_ffmpeg('avconv') or (None, None, None)
|
||||
if logging.getLogger().isEnabledFor(logging.DEBUG):
|
||||
LOGLEVEL = ['-loglevel', 'info']
|
||||
STDIN = None
|
||||
else:
|
||||
LOGLEVEL = ['-loglevel', 'quiet']
|
||||
STDIN = DEVNULL
|
||||
|
||||
def has_ffmpeg_installed():
|
||||
return FFMPEG is not None
|
||||
|
||||
# Given a list of segments and the output path, generates the concat
|
||||
# list and returns the path to the concat list.
|
||||
def generate_concat_list(files, output):
|
||||
concat_list_path = output + '.txt'
|
||||
concat_list_dir = os.path.dirname(concat_list_path)
|
||||
with open(concat_list_path, 'w', encoding='utf-8') as concat_list:
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
relpath = os.path.relpath(file, start=concat_list_dir)
|
||||
concat_list.write('file %s\n' % parameterize(relpath))
|
||||
return concat_list_path
|
||||
|
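The concat list that `generate_concat_list` writes can be illustrated with a self-contained sketch. The diff relies on the project's `parameterize` helper, whose body is not shown here, so `quote_for_concat` below is a hypothetical stand-in that follows FFmpeg's documented single-quote escaping; `write_concat_list` is likewise an illustrative name.

```python
import os

def quote_for_concat(path):
    """Hypothetical stand-in for parameterize(): wrap the path in single
    quotes and write any embedded single quote as '\\''."""
    return "'" + path.replace("'", "'\\''") + "'"

def write_concat_list(files, output):
    """Write '<output>.txt' listing the existing segments and return its path."""
    list_path = output + '.txt'
    list_dir = os.path.dirname(list_path)
    with open(list_path, 'w', encoding='utf-8') as concat_list:
        for segment in files:
            if os.path.isfile(segment):
                # Paths are written relative to the list file's directory
                relpath = os.path.relpath(segment, start=list_dir)
                concat_list.write('file %s\n' % quote_for_concat(relpath))
    return list_path
```

Writing paths relative to the list file means the list stays valid when the download directory is moved as a whole.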
||||
def ffmpeg_concat_av(files, output, ext):
|
||||
print('Merging video parts... ', end="", flush=True)
|
||||
params = [FFMPEG] + LOGLEVEL
|
||||
for file in files:
|
||||
if os.path.isfile(file): params.extend(['-i', file])
|
||||
params.extend(['-c', 'copy'])
|
||||
params.extend(['--', output])
|
||||
if subprocess.call(params, stdin=STDIN):
|
||||
print('Merging without re-encode failed.\nTry again re-encoding audio... ', end="", flush=True)
|
||||
try: os.remove(output)
|
||||
except FileNotFoundError: pass
|
||||
params = [FFMPEG] + LOGLEVEL
|
||||
for file in files:
|
||||
if os.path.isfile(file): params.extend(['-i', file])
|
||||
params.extend(['-c:v', 'copy'])
|
||||
if ext == 'mp4':
|
||||
params.extend(['-c:a', 'aac'])
|
||||
elif ext == 'webm':
|
||||
params.extend(['-c:a', 'vorbis'])
|
||||
params.extend(['-strict', 'experimental'])
|
||||
params.append(output)
|
||||
return subprocess.call(params)
|
||||
elif ext == 'webm':
|
||||
params.extend(['-c:a', 'opus'])
|
||||
params.extend(['--', output])
|
||||
return subprocess.call(params, stdin=STDIN)
|
||||
else:
|
||||
return 0
|
||||
|
||||
def ffmpeg_convert_ts_to_mkv(files, output='output.mkv'):
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
params = [FFMPEG] + LOGLEVEL
|
||||
params.extend(['-y', '-i', file, output])
|
||||
subprocess.call(params)
|
||||
params.extend(['-y', '-i', file])
|
||||
params.extend(['--', output])
|
||||
subprocess.call(params, stdin=STDIN)
|
||||
|
||||
return
|
||||
|
||||
def ffmpeg_concat_mp4_to_mpg(files, output='output.mpg'):
|
||||
# Use concat demuxer on FFmpeg >= 1.1
|
||||
if FFMPEG == 'ffmpeg' and (FFMPEG_VERSION[0] >= 2 or (FFMPEG_VERSION[0] == 1 and FFMPEG_VERSION[1] >= 1)):
|
||||
concat_list = open(output + '.txt', 'w', encoding="utf-8")
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
concat_list.write("file %s\n" % parameterize(file))
|
||||
concat_list.close()
|
||||
|
||||
params = [FFMPEG] + LOGLEVEL
|
||||
params.extend(['-f', 'concat', '-safe', '-1', '-y', '-i'])
|
||||
params.append(output + '.txt')
|
||||
params += ['-c', 'copy', output]
|
||||
|
||||
if subprocess.call(params) == 0:
|
||||
concat_list = generate_concat_list(files, output)
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-f', 'concat', '-safe', '-1',
|
||||
'-i', concat_list, '-c', 'copy']
|
||||
params.extend(['--', output])
|
||||
if subprocess.call(params, stdin=STDIN) == 0:
|
||||
os.remove(output + '.txt')
|
||||
return True
|
||||
else:
|
||||
@ -73,7 +106,7 @@ def ffmpeg_concat_mp4_to_mpg(files, output='output.mpg'):
|
||||
if os.path.isfile(file):
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
|
||||
params.extend([file, file + '.mpg'])
|
||||
subprocess.call(params)
|
||||
subprocess.call(params, stdin=STDIN)
|
||||
|
||||
inputs = [open(file + '.mpg', 'rb') for file in files]
|
||||
with open(output + '.mpg', 'wb') as o:
|
||||
@ -83,10 +116,9 @@ def ffmpeg_concat_mp4_to_mpg(files, output='output.mpg'):
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
|
||||
params.append(output + '.mpg')
|
||||
params += ['-vcodec', 'copy', '-acodec', 'copy']
|
||||
params.append(output)
|
||||
subprocess.call(params)
|
||||
params.extend(['--', output])
|
||||
|
||||
if subprocess.call(params) == 0:
|
||||
if subprocess.call(params, stdin=STDIN) == 0:
|
||||
for file in files:
|
||||
os.remove(file + '.mpg')
|
||||
os.remove(output + '.mpg')
|
||||
@ -101,10 +133,11 @@ def ffmpeg_concat_ts_to_mkv(files, output='output.mkv'):
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
params[-1] += file + '|'
|
||||
params += ['-f', 'matroska', '-c', 'copy', output]
|
||||
params += ['-f', 'matroska', '-c', 'copy']
|
||||
params.extend(['--', output])
|
||||
|
||||
try:
|
||||
if subprocess.call(params) == 0:
|
||||
if subprocess.call(params, stdin=STDIN) == 0:
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
@ -115,19 +148,12 @@ def ffmpeg_concat_flv_to_mp4(files, output='output.mp4'):
|
||||
print('Merging video parts... ', end="", flush=True)
|
||||
# Use concat demuxer on FFmpeg >= 1.1
|
||||
if FFMPEG == 'ffmpeg' and (FFMPEG_VERSION[0] >= 2 or (FFMPEG_VERSION[0] == 1 and FFMPEG_VERSION[1] >= 1)):
|
||||
concat_list = open(output + '.txt', 'w', encoding="utf-8")
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
# for escaping rules, see:
|
||||
# https://www.ffmpeg.org/ffmpeg-utils.html#Quoting-and-escaping
|
||||
concat_list.write("file %s\n" % parameterize(file))
|
||||
concat_list.close()
|
||||
|
||||
params = [FFMPEG] + LOGLEVEL + ['-f', 'concat', '-safe', '-1', '-y', '-i']
|
||||
params.append(output + '.txt')
|
||||
params += ['-c', 'copy', output]
|
||||
|
||||
subprocess.check_call(params)
|
||||
concat_list = generate_concat_list(files, output)
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-f', 'concat', '-safe', '-1',
|
||||
'-i', concat_list, '-c', 'copy',
|
||||
'-bsf:a', 'aac_adtstoasc']
|
||||
params.extend(['--', output])
|
||||
subprocess.check_call(params, stdin=STDIN)
|
||||
os.remove(output + '.txt')
|
||||
return True
|
||||
|
||||
@ -138,7 +164,7 @@ def ffmpeg_concat_flv_to_mp4(files, output='output.mp4'):
|
||||
params += ['-map', '0', '-c', 'copy', '-f', 'mpegts', '-bsf:v', 'h264_mp4toannexb']
|
||||
params.append(file + '.ts')
|
||||
|
||||
subprocess.call(params)
|
||||
subprocess.call(params, stdin=STDIN)
|
||||
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
|
||||
params.append('concat:')
|
||||
@ -147,32 +173,41 @@ def ffmpeg_concat_flv_to_mp4(files, output='output.mp4'):
|
||||
if os.path.isfile(f):
|
||||
params[-1] += f + '|'
|
||||
if FFMPEG == 'avconv':
|
||||
params += ['-c', 'copy', output]
|
||||
params += ['-c', 'copy']
|
||||
else:
|
||||
params += ['-c', 'copy', '-absf', 'aac_adtstoasc', output]
|
||||
params += ['-c', 'copy', '-absf', 'aac_adtstoasc']
|
||||
params.extend(['--', output])
|
||||
|
||||
if subprocess.call(params) == 0:
|
||||
if subprocess.call(params, stdin=STDIN) == 0:
|
||||
for file in files:
|
||||
os.remove(file + '.ts')
|
||||
return True
|
||||
else:
|
||||
raise
|
||||
|
||||
def ffmpeg_concat_mp3_to_mp3(files, output='output.mp3'):
|
||||
print('Merging video parts... ', end="", flush=True)
|
||||
|
||||
files = 'concat:' + '|'.join(files)
|
||||
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y']
|
||||
params += ['-i', files, '-acodec', 'copy']
|
||||
params.extend(['--', output])
|
||||
|
||||
subprocess.call(params)
|
||||
|
||||
return True
|
||||
|
||||
def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
|
||||
print('Merging video parts... ', end="", flush=True)
|
||||
# Use concat demuxer on FFmpeg >= 1.1
|
||||
if FFMPEG == 'ffmpeg' and (FFMPEG_VERSION[0] >= 2 or (FFMPEG_VERSION[0] == 1 and FFMPEG_VERSION[1] >= 1)):
|
||||
concat_list = open(output + '.txt', 'w', encoding="utf-8")
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
concat_list.write("file %s\n" % parameterize(file))
|
||||
concat_list.close()
|
||||
|
||||
params = [FFMPEG] + LOGLEVEL + ['-f', 'concat', '-safe', '-1', '-y', '-i']
|
||||
params.append(output + '.txt')
|
||||
params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc', output]
|
||||
|
||||
subprocess.check_call(params)
|
||||
concat_list = generate_concat_list(files, output)
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-f', 'concat', '-safe', '-1',
|
||||
'-i', concat_list, '-c', 'copy',
|
||||
'-bsf:a', 'aac_adtstoasc']
|
||||
params.extend(['--', output])
|
||||
subprocess.check_call(params, stdin=STDIN)
|
||||
os.remove(output + '.txt')
|
||||
return True
|
||||
|
||||
@ -183,7 +218,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
|
||||
params += ['-c', 'copy', '-f', 'mpegts', '-bsf:v', 'h264_mp4toannexb']
|
||||
params.append(file + '.ts')
|
||||
|
||||
subprocess.call(params)
|
||||
subprocess.call(params, stdin=STDIN)
|
||||
|
||||
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
|
||||
params.append('concat:')
|
||||
@ -192,19 +227,20 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
|
||||
if os.path.isfile(f):
|
||||
params[-1] += f + '|'
|
||||
if FFMPEG == 'avconv':
|
||||
params += ['-c', 'copy', output]
|
||||
params += ['-c', 'copy']
|
||||
else:
|
||||
params += ['-c', 'copy', '-absf', 'aac_adtstoasc', output]
|
||||
params += ['-c', 'copy', '-absf', 'aac_adtstoasc']
|
||||
params.extend(['--', output])
|
||||
|
||||
subprocess.check_call(params)
|
||||
subprocess.check_call(params, stdin=STDIN)
|
||||
for file in files:
|
||||
os.remove(file + '.ts')
|
||||
return True
|
||||
|
||||
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
|
||||
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.', stream=True):
|
||||
"""str, str->True
|
||||
WARNING: NOT THE SAME PARAMS AS OTHER FUNCTIONS!!!!!!
|
||||
You can basicly download anything with this function
|
||||
You can basically download anything with this function
|
||||
but better leave it alone with
|
||||
"""
|
||||
output = title + '.' + ext
|
||||
@ -212,25 +248,25 @@ def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
|
||||
if not (output_dir == '.'):
|
||||
output = output_dir + '/' + output
|
||||
|
||||
ffmpeg_params = []
|
||||
#should these exist...
|
||||
print('Downloading streaming content with FFmpeg, press q to stop recording...')
|
||||
if stream:
|
||||
ffmpeg_params = [FFMPEG] + ['-y', '-re', '-i']
|
||||
else:
|
||||
ffmpeg_params = [FFMPEG] + ['-y', '-i']
|
||||
ffmpeg_params.append(files) #not the same here!!!!
|
||||
|
||||
if FFMPEG == 'avconv': #who cares?
|
||||
ffmpeg_params += ['-c', 'copy']
|
||||
else:
|
||||
ffmpeg_params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc']
|
||||
|
||||
if params is not None:
|
||||
if len(params) > 0:
|
||||
for k, v in params:
|
||||
ffmpeg_params.append(k)
|
||||
ffmpeg_params.append(v)
|
||||
|
||||
|
||||
print('Downloading streaming content with FFmpeg, press q to stop recording...')
|
||||
ffmpeg_params = [FFMPEG] + ['-y', '-re', '-i']
|
||||
ffmpeg_params.append(files) #not the same here!!!!
|
||||
|
||||
if FFMPEG == 'avconv': #who cares?
|
||||
ffmpeg_params += ['-c', 'copy', output]
|
||||
else:
|
||||
ffmpeg_params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc']
|
||||
|
||||
ffmpeg_params.append(output)
|
||||
ffmpeg_params.extend(['--', output])
|
||||
|
||||
print(' '.join(ffmpeg_params))
|
||||
|
||||
@ -244,3 +280,31 @@ def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
|
||||
pass
|
||||
|
||||
return True
|
||||
|
||||
|
||||
def ffmpeg_concat_audio_and_video(files, output, ext):
|
||||
print('Merging video and audio parts... ', end="", flush=True)
|
||||
if has_ffmpeg_installed():
|
||||
params = [FFMPEG] + LOGLEVEL
|
||||
params.extend(['-f', 'concat'])
|
||||
params.extend(['-safe', '0']) # https://stackoverflow.com/questions/38996925/ffmpeg-concat-unsafe-file-name
|
||||
for file in files:
|
||||
if os.path.isfile(file):
|
||||
params.extend(['-i', file])
|
||||
params.extend(['-c:v', 'copy'])
|
||||
params.extend(['-c:a', 'aac'])
|
||||
params.extend(['-strict', 'experimental'])
|
||||
params.extend(['--', output + "." + ext])
|
||||
return subprocess.call(params, stdin=STDIN)
|
||||
else:
|
||||
raise EnvironmentError('No ffmpeg found')
|
||||
|
||||
|
||||
def ffprobe_get_media_duration(file):
|
||||
print('Getting {} duration'.format(file))
|
||||
params = [FFPROBE]
|
||||
params.extend(['-i', file])
|
||||
params.extend(['-show_entries', 'format=duration'])
|
||||
params.extend(['-v', 'quiet'])
|
||||
params.extend(['-of', 'csv=p=0'])
|
||||
return subprocess.check_output(params, stdin=STDIN, stderr=subprocess.STDOUT).decode().strip()
|
||||
|
@ -1,8 +1,8 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
import platform
|
||||
from .os import detect_os
|
||||
|
||||
def legitimize(text, os=platform.system()):
|
||||
def legitimize(text, os=detect_os()):
|
||||
"""Converts a string to a valid filename.
|
||||
"""
|
||||
|
||||
@ -13,7 +13,8 @@ def legitimize(text, os=platform.system()):
|
||||
ord('|'): '-',
|
||||
})
|
||||
|
||||
if os == 'Windows':
|
||||
# FIXME: do some filesystem detection
|
||||
if os == 'windows' or os == 'cygwin' or os == 'wsl':
|
||||
# Windows (non-POSIX namespace)
|
||||
text = text.translate({
|
||||
# Reserved in Windows VFAT and NTFS
|
||||
@ -28,10 +29,11 @@ def legitimize(text, os=platform.system()):
|
||||
ord('>'): '-',
|
||||
ord('['): '(',
|
||||
ord(']'): ')',
|
||||
ord('\t'): ' ',
|
||||
})
|
||||
else:
|
||||
# *nix
|
||||
if os == 'Darwin':
|
||||
if os == 'mac':
|
||||
# Mac OS HFS+
|
||||
text = text.translate({
|
||||
ord(':'): '-',
|
||||
@ -41,5 +43,5 @@ def legitimize(text, os=platform.system()):
|
||||
if text.startswith("."):
|
||||
text = text[1:]
|
||||
|
||||
text = text[:82] # Trim to 82 Unicode characters long
|
||||
text = text[:80] # Trim to 80 Unicode characters long
|
||||
return text
|
||||
|
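The per-platform translation in `legitimize` can be sketched as follows. This is a reduced illustration under stated assumptions: `sanitize_filename` is a hypothetical name, and only the character mappings visible in this diff are included, not the full table from the project.

```python
def sanitize_filename(text, os_name='linux', max_len=80):
    """Replace characters that are unsafe in filenames, then trim the result."""
    text = text.translate({ord('|'): '-'})
    if os_name in ('windows', 'cygwin', 'wsl'):
        # Windows (non-POSIX namespace)
        text = text.translate({
            ord('>'): '-',
            ord('['): '(',
            ord(']'): ')',
            ord('\t'): ' ',
        })
    else:
        if os_name == 'mac':
            # The colon is replaced on Mac OS HFS+
            text = text.translate({ord(':'): '-'})
        # Avoid creating hidden files on *nix
        if text.startswith('.'):
            text = text[1:]
    return text[:max_len]
```

`str.translate` with a dict keyed by code points applies all substitutions in one pass, which is why the source builds its tables with `ord(...)` keys.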
@ -5,13 +5,13 @@ from ..version import script_name
|
||||
|
||||
import os, sys
|
||||
|
||||
IS_ANSI_TERMINAL = os.getenv('TERM') in (
|
||||
TERM = os.getenv('TERM', '')
|
||||
IS_ANSI_TERMINAL = TERM in (
|
||||
'eterm-color',
|
||||
'linux',
|
||||
'screen',
|
||||
'vt100',
|
||||
'xterm',
|
||||
)
|
||||
) or TERM.startswith('xterm')
|
||||
|
||||
# ANSI escape code
|
||||
# See <http://en.wikipedia.org/wiki/ANSI_escape_code>
|
||||
@ -89,10 +89,14 @@ def e(message, exit_code=None):
|
||||
"""Print an error log message."""
|
||||
print_log(message, YELLOW, BOLD)
|
||||
if exit_code is not None:
|
||||
exit(exit_code)
|
||||
sys.exit(exit_code)
|
||||
|
||||
def wtf(message, exit_code=1):
|
||||
"""What a Terrible Failure!"""
|
||||
print_log(message, RED, BOLD)
|
||||
if exit_code is not None:
|
||||
exit(exit_code)
|
||||
sys.exit(exit_code)
|
||||
|
||||
def yes_or_no(message):
|
||||
ans = str(input('%s (y/N) ' % message)).lower().strip()
|
||||
return ans == 'y'
|
||||
|
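The terminal check above can be written as a small function for testing. A minimal sketch, assuming the same list of terminal names as the diff; `is_ansi_terminal` is a hypothetical name.

```python
import os

ANSI_TERMS = ('eterm-color', 'linux', 'screen', 'vt100', 'xterm')

def is_ansi_terminal(term=None):
    """Return True if TERM names a terminal assumed to support ANSI codes."""
    if term is None:
        # Defaulting to '' avoids calling startswith() on None when TERM is unset
        term = os.getenv('TERM', '')
    return term in ANSI_TERMS or term.startswith('xterm')
```

The `startswith('xterm')` clause is what lets variants such as `xterm-256color` pass without listing each one.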
32
src/you_get/util/os.py
Normal file
@ -0,0 +1,32 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
from platform import system
|
||||
|
||||
def detect_os():
|
||||
"""Detect operating system.
|
||||
"""
|
||||
|
||||
# Inspired by:
|
||||
# https://github.com/scivision/pybashutils/blob/78b7f2b339cb03b1c37df94015098bbe462f8526/pybashutils/windows_linux_detect.py
|
||||
|
||||
syst = system().lower()
|
||||
os = 'unknown'
|
||||
|
||||
if 'cygwin' in syst:
|
||||
os = 'cygwin'
|
||||
elif 'darwin' in syst:
|
||||
os = 'mac'
|
||||
elif 'linux' in syst:
|
||||
os = 'linux'
|
||||
# detect WSL https://github.com/Microsoft/BashOnWindows/issues/423
|
||||
try:
|
||||
with open('/proc/version', 'r') as f:
|
||||
if 'microsoft' in f.read().lower():
|
||||
os = 'wsl'
|
||||
except: pass
|
||||
elif 'windows' in syst:
|
||||
os = 'windows'
|
||||
elif 'bsd' in syst:
|
||||
os = 'bsd'
|
||||
|
||||
return os
|
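The branching in `detect_os` is easier to test when the inputs are passed in rather than read from the system. The sketch below is one possible refactoring, not the project's code: `classify_os` and its `proc_version` parameter are hypothetical, with `proc_version` standing in for the contents of `/proc/version`.

```python
def classify_os(system_name, proc_version=''):
    """Map a platform.system() string to the short names used by detect_os."""
    syst = system_name.lower()
    if 'cygwin' in syst:
        return 'cygwin'
    if 'darwin' in syst:
        return 'mac'
    if 'linux' in syst:
        # WSL reports itself as Linux but mentions Microsoft in /proc/version
        if 'microsoft' in proc_version.lower():
            return 'wsl'
        return 'linux'
    if 'windows' in syst:
        return 'windows'
    if 'bsd' in syst:
        return 'bsd'
    return 'unknown'
```

A caller could pass `platform.system()` and the text of `/proc/version` (when readable) to get the same result as the function in the diff.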
Some files were not shown because too many files have changed in this diff.